Annotation of dmsdos/doc/cvfinfo.doc, revision 1.1.1.1

1.1       root        1: This file contains some information about the compressed filesystem layout.
                      2: 
                      3:                         The CVF Hacker's Guide :-)
                      4:                       ==============================
                      5: 
                      6: WARNING: This is not official M$ specs. In fact, it's a hacker's document.
                      7:          I don't know M$ specs, so this file may contain incorrect 
                      8:          information. Use at your own risk (see the GPL for details).
                      9: 
                     10: WARNING 2: Several parts of the compressed filesystem internals are still
                     11:          unknown to me. If this document is inaccurate in some details, it's
                     12:          because I don't know it more exactly. Feel free to add your
                     13:          knowledge. 
                     14: 
                     15: 
                     16: CVF format overview
                     17: -------------------
                     18: 
                     19: version                        compression          SPC(*)   max. size
                     20: dos 6.0/6.2 doublespace        DS-0-2                 16       512MB
                     21: dos 6.22 drivespace            JM-0-0                 16       512MB
                     22: win95 doublespace/drivespace   DS-0-0                 16       512MB
                     23: win95 drivespace 3             JM-0-0,JM-0-1,SQ-0-0   64       2GB
                     24: 
                     25:   (*)=Sectors Per Cluster
                     26: 
                     27: General filesystem layout
                     28: -------------------------
                     29: 
                     30: Superblock       (1 sector)
                     31: BITFAT           (several sectors)
                     32: MDFAT            (~ twice as large as FAT)
                     33: Bootblock        (1 sector)
                     34: FAT (only one)   (several sectors)
                     35: Root directory   (some sectors)
                     36: Data area        (many sectors)
                     37: Final sector     (1 sector)
                     38: 
                     39: There's some slack (or "reserved space") between some filesystem structures,
                     40: but I don't know what it is good for. Perhaps M$ don't know either.
                     41: 
                     42: Sector counting
                     43: ---------------
                     44: 
                     45: The Superblock is referred to as sector 0. The remaining sectors are
                     46: counted consecutively from there.
                     47: 
                     48: Superblock layout
                     49: -----------------
                     50: 
                     51: Byte positions are counted beginning with 0 for the first byte. Integers are
                     52: in low byte first order. Only important fields are listed here, usual dos
                     53: fields are omitted.
                     54: 
                     55: Pos. 3-10: string: signature "MSDBL6.0" or "MSDSP6.0"
                     56: Pos. 45,46: *signed* integer: dcluster offset for MDFAT lookups
                     57: Pos. 36,37: first sector of MDFAT minus 1
                     58: Pos. 17,18: number of entries in root directory
                     59: Pos. 13: sectors per cluster
                     60: Pos. 39,40: sector number of Bootblock
                     61: Pos. 14,15: sector offset of FAT start (relative to Bootblock). I.e. to
                     62:             obtain the sector number of the first FAT sector add Pos. 14,15
                     63:             to Pos. 39,40.
                     64: Pos. 41,42: sector offset of root directory start (relative to Bootblock). To
                     65:             obtain the sector number of the first root directory sector add 
                     66:             Pos. 41,42 to Pos. 39,40.
                     67: Pos. 43,44: sector offset of Data area minus 2 (relative to Bootblock). To
                     68:             obtain the sector number of the first Data area sector add 
                     69:             Pos. 43,44 to Pos. 39,40 and finally add 2.
                     70: Pos. 51: version flag (0=dos 6.0/6.2 or win95 doublespace, 1=??, 
                     71:                        2=dos 6.22 drivespace, 3 or 0 ??=win95 drivespace 3)
                     72:          Hint: drivespace 3 format can be recognized safely by watching
                     73:                the sectors per cluster value. The version flag seems to lie
                     74:                for drivespace 3. 
                     75: Pos. 57-60: usually string "12  " or "16  " as the rest of "FAT12  " and
                     76:             "FAT16  " (the spaces are important), but here seems to be a bug
                     77:             in some doublespace versions. PLEASE IGNORE THIS VALUE, IT 
                     78:             SOMETIMES LIES. Use the Bootblock's value instead.
                     79: Pos. 62-63: Maximum size of the CVF in Megabytes.
                     80: Pos. 32-35: Faked total number of sectors (it is something like the real
                     81:             number of sectors in the data area multiplied by the
                     82:             compression ratio). This value is important because it determines
                     83:             the maximum cluster number that is currently allowed for the
                     84:             CVF according to this formula (don't ask me why):
                     85: 
                     86:                         (Pos.32-35)-(Pos.22,23)-(Pos.14,15)-(Pos.17,18)/16
                     87:             max_cluster=--------------------------------------------------- + 1
                     88:                                              (Pos.13)
                     89: 
                     90:             (rounded down). Be sure not to exceed the limits due to FAT/MDFAT
                     91:             size or CVF size here. Since this formula has been found by
                     92:             trial and error, it may not be true in all screwy cases.
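The field list above can be turned into a small decoder. This is only a sketch in Python, not code taken from dmsdos: the function name and dictionary keys are made up for illustration, and Pos. 22,23 is assumed to carry its usual dos meaning (sectors per FAT), since the list above omits the usual dos fields. The max_cluster division is done in integers so that the whole fraction is rounded down once.

```python
import struct

def parse_superblock(sb: bytes) -> dict:
    """Decode the Superblock fields listed above (low byte first)."""
    def u16(pos):
        return struct.unpack_from("<H", sb, pos)[0]

    boot = u16(39)                       # sector number of the Bootblock
    info = {
        "signature": sb[3:11].decode("ascii", "replace"),
        "sectors_per_cluster": sb[13],
        "fat_offset": u16(14),           # relative to the Bootblock
        "root_entries": u16(17),
        "sectors_per_fat": u16(22),      # ASSUMPTION: usual dos field
        "total_sectors": struct.unpack_from("<I", sb, 32)[0],  # faked value
        "mdfat_start": u16(36) + 1,      # stored value is "minus 1"
        "bootblock": boot,
        "dcluster": struct.unpack_from("<h", sb, 45)[0],       # signed
        "fat_start": boot + u16(14),
        "root_start": boot + u16(41),
        "data_start": boot + u16(43) + 2,
        "version_flag": sb[51],
        "max_size_mb": u16(62),
    }
    # (total - spf - fat_offset - root_entries/16) / spc + 1, rounded down.
    # Numerator and denominator are scaled by 16 to stay in integers.
    numerator = (info["total_sectors"] - info["sectors_per_fat"]
                 - info["fat_offset"]) * 16 - info["root_entries"]
    info["max_cluster"] = numerator // (16 * info["sectors_per_cluster"]) + 1
    return info
```

The result still has to be clamped to the limits imposed by the FAT/MDFAT size and the CVF size, as noted above.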
                     93: 
                     94: BITFAT layout
                     95: -------------
                     96: 
                     97: The BITFAT is a sector allocation map. Consider it as a list of bits each of
                     98: which represents one sector in the Data area. If a bit is set, the
                     99: appropriate sector contains data - if the bit is clear, the sector is free.
                    100: 
                    101: The first bit matches the first sector in the data area (and so on). The
                    102: bits are counted *wordwise* beginning with the most significant bit of the
                    103: word (where "word" means two bytes at once, low byte first).
                    104: 
                    105: So subtract the number of the first data sector from the number of the data
                    106: sector you want to look up in the BITFAT. Keep the result in
                    107: memory. Divide the resulting number by 16, round down, multiply by 2. Get 
                    108: the two bytes at this position in the BITFAT (counted from its beginning)
                    109: and store them as a word. Now look at the lowest 4 bits of the previously
                    110: memorized result - they represent the bit number (counted from the most
                    111: significant bit) in the word. This bit corresponds to the data sector.
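The lookup just described, written out as a minimal Python sketch. The function and parameter names are hypothetical; `bitfat` is assumed to hold the raw BITFAT bytes and `first_data_sector` the value computed from the Superblock.

```python
def bitfat_is_used(bitfat: bytes, sector: int, first_data_sector: int) -> bool:
    """Return True if the BITFAT marks the given data sector as allocated."""
    index = sector - first_data_sector        # the "memorized result"
    pos = (index // 16) * 2                   # byte position of the word
    word = bitfat[pos] | (bitfat[pos + 1] << 8)   # two bytes, low byte first
    bit = index & 15                          # lowest 4 bits
    return bool(word & (0x8000 >> bit))       # counted from the MSB
```

Note that bit number 0 is the most significant bit of the word, so the first data sector lives in the *second* byte on disk.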
                    112: 
                    113: WARNING: The BITFAT sometimes is incorrect due to a missing system shutdown 
                    114:          under dos. If you want to write to the filesystem, be sure to
                    115:          check (and, if necessary, repair) the BITFAT first. See below
                    116:          how to do this.
                    117: 
                    118: MDFAT layout
                    119: ------------
                    120: 
                    121: MDFAT is organised as a stream of long integers (4 bytes, for drivespace 3:
                    122: 5 bytes). The data are sector-aligned - this means for drivespace 3 that the
                    123: last two bytes of a sector are slack. Consider the bytes in usual order
                    124: (low byte first).
                    125: 
                    126: The MDFAT contains additional information about a cluster:
                    127: 
                    128:      3322222222221111111111            (doublespace/drivespace)
                    129:      10987654321098765432109876543210
                    130:      uchhhhllll?sssssssssssssssssssss
                    131: 
                    132:      333333333322222222221111111111    (drivespace 3)
                    133:      9876543210987654321098765432109876543210
                    134:      uchhhhhhllllllf?ssssssssssssssssssssssss
                    135: 
                    136: u=1: The cluster is used, u=0: the cluster is unused. In the latter case the
                    137:      whole entry should be zeroed. An unused cluster contains by definition
                    138:      only zeros (C notation: '\0'). This is important if a program insists 
                    139:      on reading unused clusters!
                    140: c=1: The cluster is not compressed, c=0: the cluster is compressed.
                    141: h:   Size of decompressed cluster minus 1 (measured in units of 512 bytes).
                    142:      E.g. 3 means (3+1)*512 bytes.
                    143: l:   Size of compressed cluster data minus 1 (measured in units of 512
                    144:      bytes). If the cluster is not compressed according to the c bit, this
                    145:      value is identical to h.
                    146: f:   fragmented bit for drivespace 3. If it is set the cluster is fragmented
                    147:      and needs some special treatment on read and write access.
                    148: ?:   Unknown. Seems to contain random garbage.
                    149: s:   starting sector minus 1. I.e. if you want to read the cluster, read (l+1) 
                    150:      sectors beginning with sector (s+1). If the c bit is zero, the data must
                    151:      be decompressed now.
                    152:      Important: if the cluster on disk is shorter than the filesystem's
                    153:      sectors per cluster value, the missing rest at the end has to be treated 
                    154:      as if it were zeroed out.
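The two bit layouts above translate directly into shifts and masks. A small Python sketch follows; the `MdfatEntry` type and `decode_mdfat` name are invented here for illustration, and the unknown '?' bit is simply dropped.

```python
from typing import NamedTuple

class MdfatEntry(NamedTuple):
    used: bool          # u bit
    uncompressed: bool  # c bit (1 = NOT compressed)
    h: int              # decompressed size in sectors, minus 1
    l: int              # on-disk size in sectors, minus 1
    fragmented: bool    # f bit (drivespace 3 only)
    s: int              # starting sector, minus 1

def decode_mdfat(raw: bytes) -> MdfatEntry:
    """Decode one raw MDFAT entry: 4 bytes, or 5 bytes for drivespace 3."""
    v = int.from_bytes(raw, "little")
    if len(raw) == 4:    # uchhhhllll?sssssssssssssssssssss
        return MdfatEntry(bool((v >> 31) & 1), bool((v >> 30) & 1),
                          (v >> 26) & 0xF, (v >> 22) & 0xF,
                          False, v & 0x1FFFFF)
    if len(raw) == 5:    # uchhhhhhllllllf?ssssssssssssssssssssssss
        return MdfatEntry(bool((v >> 39) & 1), bool((v >> 38) & 1),
                          (v >> 32) & 0x3F, (v >> 26) & 0x3F,
                          bool((v >> 25) & 1), v & 0xFFFFFF)
    raise ValueError("MDFAT entries are 4 or 5 bytes long")
```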
                    155: 
                    156: To look up information in the MDFAT, take the cluster number, add the
                    157: dcluster offset (which may be negative!) and take the appropriate entry 
                    158: counted from the beginning of the MDFAT. Don't ignore the sector alignment
                    159: for drivespace 3.
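For illustration, the byte position of an entry relative to the start of the MDFAT could be computed like this (Python sketch, hypothetical names). With 5-byte entries a 512-byte sector holds 102 entries, which leaves the two slack bytes mentioned above.

```python
SECTOR_SIZE = 512

def mdfat_entry_offset(cluster: int, dcluster: int, drivespace3: bool) -> int:
    """Byte offset of a cluster's MDFAT entry, counted from the MDFAT start."""
    n = cluster + dcluster                 # dcluster may be negative
    if n < 0:
        raise ValueError("entry number before start of MDFAT")
    if not drivespace3:
        return n * 4                       # plain stream of 4-byte entries
    per_sector = SECTOR_SIZE // 5          # 102 entries, 2 bytes of slack
    return (n // per_sector) * SECTOR_SIZE + (n % per_sector) * 5
```

The MDFAT itself starts at sector (Pos. 36,37)+1 according to the Superblock layout.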
                    160: 
                    161: Bootblock layout
                    162: ----------------
                    163: 
                    164: Emulates normal dos filesystem super block. Most dos fields are identical
                    165: to the Superblock except for the FAT16 or FAT12 string. The FAT bitsize string
                    166: that can be found in the Bootblock is correct while the one in the
                    167: Superblock may be garbage. Take a disk viewer and compare Bootblock and
                    168: Superblock yourself. There are slight differences, but I don't know exactly
                    169: where and why. You'd better never change anything in these blocks...
                    170: 
                    171: FAT layout
                    172: ----------
                    173: 
                    174: No need to explain. It's the same as in a normal dos filesystem. It may be
                    175: 12 or 16 bit according to the Bootblock, but *not* to the Superblock. This
                    176: seems to be a bug in doublespace - the Superblock's FAT bit size information
                    177: is sometimes wrong, so use the Bootblock's information.
                    178: 
                    179: Root directory
                    180: --------------
                    181: 
                    182: The same as in a normal dos filesystem. (The root directory is never
                    183: compressed.)
                    184: 
                    185: Data area
                    186: ---------
                    187: 
                    188: Well, that's the actual space for the data.
                    189: 
                    190: Final sector
                    191: ------------
                    192: 
                    193: Contains the signature "MDR". Must not be used by data. To find it you must
                    194: know the size of the CVF file. There's no pointer in the Superblock that
                    195: points to this sector.
                    196: 
                    197: Compressed clusters
                    198: -------------------
                    199: 
                    200: Compressed data (when the c bit is 0 in the MDFAT entry of a cluster) are
                    201: identified by a compression header. The header consists of 4 bytes which are
                    202: at the beginning of the compressed cluster data. The headers consist of two
                    203: bytes specifying the compression scheme and two bytes version number, and
                    204: usually look like this:
                    205: 
                    206: 'D', 'S', 0x00, 0x02, I write it as 'DS-0-2'
                    207: 'J', 'M', 0x00, 0x00
                    208: 'S', 'Q', 0x00, 0x00
                    209: 
                    210: The version number seems to be ignored though M$ claim that, for example,
                    211: 'High' (JM-0-1) compresses better than 'Normal' (JM-0-0). That's nonsense
                    212: from the compressed format point of view: the format is in fact the same.
                    213: Maybe the original M$ software uses different *compression algorithms* 
                    214: which may be more or less efficient, but they're not using different 
                    215: *compression schemes*. So in fact there are three schemes: DS, JM, and SQ.
                    216: DS and JM are quite similar; for a decompression algorithm see the dmsdos
                    217: or thsfs sources (both are GPL code, you may reuse it).
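A sketch of how the 4-byte header might be picked apart in Python. The function names are made up; the decompression itself is not shown here (see the dmsdos or thsfs sources for that).

```python
def parse_compression_header(data: bytes):
    """Split the 4-byte header into (scheme, version byte 1, version byte 2)."""
    if len(data) < 4:
        raise ValueError("compressed cluster shorter than its header")
    scheme = data[0:2].decode("ascii", "replace")
    if scheme not in ("DS", "JM", "SQ"):
        raise ValueError("unknown compression scheme")
    return scheme, data[2], data[3]

def header_label(data: bytes) -> str:
    """Format the header in the 'DS-0-2' notation used in this document."""
    scheme, major, minor = parse_compression_header(data)
    return f"{scheme}-{major}-{minor}"
```

Since the version bytes seem to be ignored, a reader would presumably dispatch on the scheme alone.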
                    218: 
                    219: As far as I know, dos 6.x versions of doublespace/drivespace never compress
                    220: directories and never cut them off (if only the first sectors of the cluster
                    221: are used, it is in fact possible to cut the cluster since the unused slack 
                    222: is, by definition, to be treated as if it were zeroed out). It is unknown
                    223: whether these versions can read compressed or shortened directories, but it
                    224: is certain they never compress or shorten them. So I recommend not doing it
                    225: either. drivespace 3 usually cuts off directories and sometimes even
                    226: compresses them though compression of directories is a great performance loss.
                    227: win95 doublespace/drivespace (not drivespace 3) never cuts directories but
                    228: sometimes compresses them as well.
                    229: 
                    230: Fragmented clusters
                    231: -------------------
                    232: 
                    233: To make things more complex, M$ have invented these strange things.
                    234: Unfortunately, they need some special treatment.
                    235: 
                    236: A fragmented cluster can be recognized by watching the 'f' bit in the MDFAT.
                    237: This bit only exists in drivespace 3 format.
                    238: 
                    239: The first sector of the cluster contains a fragmentation list. This list
                    240: contains entries each of which uses 4 bytes. The first one is the
                    241: fragmentation count - it specifies into how many fragments the cluster is
                    242: divided. It must be > 1 and <=64.
                    243: 
                    244: The following entries are pointers to fragments of data like this:
                    245: 
                    246:     3322222222221111111111
                    247:     10987654321098765432109876543210
                    248:     lllllluussssssssssssssssssssssss
                    249: 
                    250: s: start sector minus 1 - the fragment begins at sector (s+1).
                    251: u: unused and zero (?)
                    252: l: sector count minus 1 - the fragment contains (l+1) sectors beginning
                    253:    with sector (s+1). For a compressed cluster this counts raw data.
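Decoding one such pointer entry, as a Python sketch (the function name is hypothetical). It returns the real start sector and sector count, i.e. with the "minus 1" already undone; the two bits marked 'u' are ignored.

```python
def decode_fragment_pointer(raw: bytes):
    """Decode a 4-byte fragment pointer into (start_sector, sector_count)."""
    if len(raw) != 4:
        raise ValueError("fragment pointers are 4 bytes long")
    v = int.from_bytes(raw, "little")
    l = (v >> 26) & 0x3F        # top 6 bits: sector count minus 1
    s = v & 0xFFFFFF            # low 24 bits: start sector minus 1
    return s + 1, l + 1
```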
                    254: 
                    255: The first entry always points to the fragmentation list itself. I.e.
                    256: the s and l fields of the first fragmentation list entry are always the same
                    257: as the ones in the MDFAT entry. The first fragment is not restricted to
                    258: contain *only* the fragmentation list, however.
                    259: 
                    260: Now it becomes slightly difficult because the data are stored differently
                    261: depending on whether the cluster is compressed or not. If the cluster is
                    262: compressed the raw (compressed) data begin immediately after the last entry
                    263: of the fragmentation list. The byte position can be calculated by multiplying
                    264: the fragmentation count by 4. Further raw data can be found in the other
                    265: fragments in order.
                    266: 
                    267: If the cluster is not compressed, the (uncompressed) data begin in the
                    268: sector that follows the sector containing the fragmentation list. If the
                    269: first fragment has only the length of 1 sector the data begin in the second
                    270: fragment. Further data are in the fragments in order.
                    271: 
                    272: General rules for cluster access
                    273: --------------------------------
                    274: 
                    275: I'm assuming you want to access cluster number x (x!=0 i.e. not root directory
                    276: - this one should be clear without further explanation).
                    277: 
                    278: How to read cluster x from the compressed filesystem
                    279: ----------------------------------------------------
                    280:  
                    281:   * Get and decode the MDFAT entry for the cluster: look up entry number 
                    282:     (x+dcluster). dcluster and start of the MDFAT can be obtained from the
                    283:     Superblock.
                    284: 
                    285:   * If the MDFAT entry is unused (u bit clear), just return a cluster full of
                    286:     zeros (0x00).
                    287: 
                    288:   * Read (l+1) sectors beginning with sector (s+1).
                    289: 
                    290:   * If the cluster is fragmented ... uuhhhhh ... you'd better issue an
                    291:     error and encourage the user to boot win95 and defragment the drive.
                    292:     Otherwise read and interpret the fragmentation list now.
                    293: 
                    294:   * If the data are compressed (c bit clear) decompress them.
                    295: 
                    296:   * If the cluster is shortened (i.e. h+1 < sectors per cluster) zero out
                    297:     the rest of the cluster in memory. The sectors per cluster value can be
                    298:     obtained from the Superblock.
                    299: 
                    300: How to write cluster x to the compressed filesystem
                    301: ---------------------------------------------------
                    302: 
                    303: WARNING: Be sure you can trust your BITFAT, i.e. have it checked beforehand.
                    304:          See below how to do this.
                    305: 
                    306:   * Be sure to know whether the cluster may be shortened. The size in
                    307:     sectors minus 1 will become the h value of the MDFAT entry later.
                    308: 
                    309:   * If you want, compress the data. Be sure the data really become smaller.
                    310:     Determine the size of the compressed data in sectors and subtract 1 -
                    311:     this will become the l value of the MDFAT entry later. If you don't
                    312:     want to compress the data or the data turn out to be incompressible,
                    313:     set the l to the same value as h and use the uncompressed original data.
                    314:     DON'T ACTUALLY WRITE TO THE MDFAT AT THIS POINT!
                    315: 
                    316:   * Delete the old cluster x that may have been written earlier (see below).
                    317: 
                    318:   * Search for (l+1) free contiguous sectors in the BITFAT. Be prepared for
                    319:     failure here (i.e. if the disk is full or too fragmented). Allocate the 
                    320:     sectors by setting the appropriate bits in the BITFAT. Now you can create
                    321:     the MDFAT entry and write it to disk - remember to subtract 1 from the
                    322:     sector number when creating the s value of the MDFAT entry. Also don't
                    323:     forget to set the c bit if the data are not compressed.
                    324: 
                    325:   * Write the (l+1) sectors to disk beginning with sector (s+1).
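The allocation step can be sketched as a simple first-fit search over the BITFAT. This is illustrative Python with made-up names, and first-fit is just the simplest strategy, not necessarily what dos or dmsdos does. Indices are relative to the first data sector; `bitfat` must be a mutable `bytearray`.

```python
def _locate(index):
    """Byte position of the word and bit mask (MSB first) for a sector index."""
    return (index // 16) * 2, 0x8000 >> (index & 15)

def get_bit(bitfat, index):
    pos, mask = _locate(index)
    return bool((bitfat[pos] | (bitfat[pos + 1] << 8)) & mask)

def set_bit(bitfat, index, value):
    pos, mask = _locate(index)
    word = bitfat[pos] | (bitfat[pos + 1] << 8)
    word = (word | mask) if value else (word & ~mask)
    bitfat[pos] = word & 0xFF              # low byte first
    bitfat[pos + 1] = (word >> 8) & 0xFF

def allocate_run(bitfat, data_sectors, count):
    """Find and allocate `count` free contiguous sectors; None on failure."""
    run = 0
    for index in range(data_sectors):
        run = 0 if get_bit(bitfat, index) else run + 1
        if run == count:
            start = index - count + 1
            for i in range(start, index + 1):
                set_bit(bitfat, i, True)
            return start                   # relative to first data sector
    return None                            # disk full or too fragmented
```

With the returned index, the s value of the new MDFAT entry would be (first data sector + index - 1).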
                    326: 
                    327: How to delete cluster x in a compressed filesystem
                    328: --------------------------------------------------
                    329: 
                    330: WARNING: Be sure you can trust your BITFAT, i.e. have it checked beforehand.
                    331:          See below how to do this.
                    332: 
                    333:   * Get the appropriate MDFAT entry (x+dcluster). If it is unused (u bit
                    334:     clear) there's nothing to do.
                    335: 
                    336:   * If the cluster is fragmented, scan and check the fragmentation list
                    337:     and free up all the fragments.
                    338: 
                    339:   * Otherwise free up (l+1) sectors beginning with sector (s+1) in the BITFAT 
                    340:     by clearing the appropriate bits. Be sure to do a range check first so
                    341:     you don't corrupt the filesystem if there's garbage in the s field of
                    342:     the MDFAT entry.
                    343: 
                    344:   * Zero out the MDFAT entry completely. Don't just clear the used bit.
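A sketch of the unfragmented case in Python, with the range check done before any bit is touched. The names are hypothetical; `data_sectors` is assumed to be the number of sectors the BITFAT covers.

```python
def free_sectors(bitfat: bytearray, first_data_sector: int,
                 data_sectors: int, s: int, l: int) -> None:
    """Clear the BITFAT bits for (l+1) sectors beginning with sector (s+1)."""
    start = (s + 1) - first_data_sector    # index relative to the data area
    count = l + 1
    if start < 0 or start + count > data_sectors:
        raise ValueError("MDFAT entry points outside the data area")
    for index in range(start, start + count):
        pos = (index // 16) * 2
        word = bitfat[pos] | (bitfat[pos + 1] << 8)
        word &= ~(0x8000 >> (index & 15)) & 0xFFFF   # clear bit, MSB first
        bitfat[pos] = word & 0xFF
        bitfat[pos + 1] = word >> 8
```

After this, the whole MDFAT entry still has to be overwritten with zero bytes (4 of them, or 5 for drivespace 3).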
                    345: 
                    346: How to check and repair the BITFAT
                    347: ----------------------------------
                    348: 
                    349: Dos seems to recalculate the BITFAT on each bootup. This suggests that
                    350: even M$ programmers didn't trust it, so you shouldn't either if you plan
                    351: to write to the compressed partition.
                    352: 
                    353: It's easy. Just scan the complete MDFAT for used entries (u bit set). The
                    354: l and s values of each entry (don't forget to add 1 in each case) tell you
                    355: which sectors are allocated. Doing this for the whole MDFAT, you get a list of 
                    356: which sectors are used and which are free. Then you can compare this list to
                    357: the BITFAT. If you just keep the list in memory in the same bit encoding as
                    358: used in the real BITFAT, you can just write the complete list to disk and
                    359: replace the BITFAT by it. Uhh, yes, you may need up to 512 KB memory for
                    360: the data for this purpose...
                    361: 
                    362: If you are using drivespace 3 please keep in mind that you also have to
                    363: take care of fragmented clusters (i.e. check the fragmentation bit and scan
                    364: the fragmentation list if necessary).
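The rebuild can be sketched like this for the 4-byte (non-drivespace-3) MDFAT format. It is illustrative Python with made-up names: `mdfat` is assumed to hold the raw MDFAT bytes, and entries whose sectors fall outside the data area are simply skipped here, which is a choice of this sketch rather than documented behaviour. Fragmented clusters are not handled.

```python
def rebuild_bitfat(mdfat: bytes, first_data_sector: int,
                   data_sectors: int) -> bytearray:
    """Recompute the BITFAT from all used 4-byte MDFAT entries."""
    bitfat = bytearray(((data_sectors + 15) // 16) * 2)
    for off in range(0, len(mdfat) - 3, 4):
        v = int.from_bytes(mdfat[off:off + 4], "little")
        if not (v >> 31) & 1:                  # u bit clear: unused entry
            continue
        start = (v & 0x1FFFFF) + 1 - first_data_sector   # s + 1
        count = ((v >> 22) & 0xF) + 1                    # l + 1
        if start < 0 or start + count > data_sectors:
            continue                           # garbage entry, skipped here
        for index in range(start, start + count):
            pos = (index // 16) * 2
            word = bitfat[pos] | (bitfat[pos + 1] << 8)
            word |= 0x8000 >> (index & 15)     # set bit, MSB first
            bitfat[pos] = word & 0xFF
            bitfat[pos + 1] = word >> 8
    return bitfat
```

The result uses the same bit encoding as the on-disk BITFAT, so it can be compared byte by byte or written back as a replacement.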
                    365: 
                    366: Further related documents about compressed filesystems
                    367: ------------------------------------------------------
                    368: 
                    369:  - thsfs source (sunsite and mirrors)
                    370:  - dmsdosfs source (sunsite and mirrors)
                    371:  - Bill Gates' secret drawers
                    372:  - Murphy's law
