Annotation of dmsdos/doc/cvfinfo.doc, revision 1.1

This file contains some information about the compressed filesystem layout.

                        The CVF Hacker's Guide :-)
                      ==============================

WARNING: These are not official M$ specs. In fact, this is a hacker's document.
         I don't know the M$ specs, so this file may contain incorrect
         information. Use at your own risk (see the GPL for details).

WARNING 2: Several parts of the compressed filesystem internals are still
         unknown to me. If this document is inaccurate in some details, it's
         because I don't know them more exactly. Feel free to add your
         knowledge.


CVF format overview
-------------------

version                        compression          SPC(*)   max. size
dos 6.0/6.2 doublespace        DS-0-2                 16       512MB
dos 6.22 drivespace            JM-0-0                 16       512MB
win95 doublespace/drivespace   DS-0-0                 16       512MB
win95 drivespace 3             JM-0-0,JM-0-1,SQ-0-0   64       2GB

  (*)=Sectors Per Cluster

General filesystem layout
-------------------------

Superblock       (1 sector)
BITFAT           (several sectors)
MDFAT            (~ twice as large as FAT)
Bootblock        (1 sector)
FAT (only one)   (several sectors)
Root directory   (some sectors)
Data area        (many sectors)
Final sector     (1 sector)

There's some slack (or "reserved space") between some filesystem structures,
but I don't know what it is good for. Perhaps M$ don't know either.

Sector counting
---------------

The Superblock is referred to as sector 0. The remaining sectors are
numbered consecutively from there.

Superblock layout
-----------------

Byte positions are counted beginning with 0 for the first byte. Integers are
stored low byte first. Only the important fields are listed here; the usual
dos fields are omitted.

Pos. 3-10: string: signature "MSDBL6.0" or "MSDSP6.0"
Pos. 45,46: *signed* integer: dcluster offset for MDFAT lookups
Pos. 36,37: first sector of MDFAT minus 1
Pos. 17,18: number of entries in root directory
Pos. 13: sectors per cluster
Pos. 39,40: sector number of Bootblock
Pos. 14,15: sector offset of FAT start (relative to Bootblock). I.e. to
            obtain the sector number of the first FAT sector, add Pos. 14,15
            to Pos. 39,40.
Pos. 41,42: sector offset of root directory start (relative to Bootblock). To
            obtain the sector number of the first root directory sector, add
            Pos. 41,42 to Pos. 39,40.
Pos. 43,44: sector offset of Data area minus 2 (relative to Bootblock). To
            obtain the sector number of the first Data area sector, add
            Pos. 43,44 to Pos. 39,40 and finally add 2.
Pos. 51: version flag (0=dos 6.0/6.2 or win95 doublespace, 1=??,
                       2=dos 6.22 drivespace, 3 or 0 ??=win95 drivespace 3)
         Hint: drivespace 3 format can be recognized safely by checking
               the sectors per cluster value. The version flag seems to lie
               for drivespace 3.
Pos. 57-60: usually string "12  " or "16  " as the rest of "FAT12  " and
            "FAT16  " (the spaces are important), but there seems to be a bug
            in some doublespace versions. PLEASE IGNORE THIS VALUE, IT
            SOMETIMES LIES. Use the Bootblock's value instead.
Pos. 62,63: maximum size of the CVF in megabytes.
Pos. 32-35: Faked total number of sectors (roughly the real number of
            sectors in the data area multiplied by the compression ratio).
            This value is important because it determines the maximum
            cluster number that is currently allowed for the CVF according
            to this formula (don't ask me why):

                        (Pos.32-35)-(Pos.22,23)-(Pos.14,15)-(Pos.17,18)/16
            max_cluster=--------------------------------------------------- + 1
                                             (Pos.13)

            (rounded down). Pos. 22,23 is the usual dos "sectors per FAT"
            field. Be sure not to exceed the limits imposed by the FAT/MDFAT
            size or the CVF size here. Since this formula has been found by
            trial and error, it may not hold in all screwy cases.
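For illustration, here is a small Python sketch that decodes the fields above and turns the relative offsets into absolute sector numbers. It assumes `sb` holds the first 512 bytes of the CVF; the name `parse_superblock` and the dictionary keys are made up, and reading Pos. 22,23 as the sectors-per-FAT value is an assumption based on the usual dos boot sector layout. The max_cluster division is done in integers so that the result is rounded down.

```python
import struct


def parse_superblock(sb: bytes) -> dict:
    """Decode the CVF Superblock fields described above (hypothetical helper).

    sb must hold at least the first 512 bytes of the CVF. All integers are
    little endian ("low byte first").
    """
    def u16(pos: int) -> int:
        return struct.unpack_from("<H", sb, pos)[0]

    boot = u16(39)                    # Pos. 39,40: Bootblock sector
    fat_offset = u16(14)              # Pos. 14,15: FAT start relative to Bootblock
    root_entries = u16(17)            # Pos. 17,18
    spc = sb[13]                      # Pos. 13: sectors per cluster
    faked_sectors = struct.unpack_from("<I", sb, 32)[0]   # Pos. 32-35
    sectors_per_fat = u16(22)         # Pos. 22,23 (assumed: usual dos field)

    # (faked - spf - fat_offset - root_entries/16) / spc, rounded down, + 1.
    # Multiplying through by 16 keeps the whole calculation in integers.
    numerator16 = 16 * (faked_sectors - sectors_per_fat - fat_offset) - root_entries
    max_cluster = numerator16 // (16 * spc) + 1

    return {
        "signature": sb[3:11].decode("ascii", errors="replace"),  # Pos. 3-10
        "dcluster": struct.unpack_from("<h", sb, 45)[0],        # Pos. 45,46 (signed)
        "mdfat_start": u16(36) + 1,                               # Pos. 36,37 plus 1
        "sectors_per_cluster": spc,
        "drivespace3": spc == 64,     # per the hint above; the version flag may lie
        "bootblock": boot,
        "fat_start": boot + fat_offset,
        "root_start": boot + u16(41),
        "data_start": boot + u16(43) + 2,
        "version_flag": sb[51],
        "max_size_mb": u16(62),
        "max_cluster": max_cluster,
    }
```

The FAT bit size is deliberately not taken from the Superblock, since (as noted above) the Bootblock's value is the reliable one.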

BITFAT layout
-------------

The BITFAT is a sector allocation map. Consider it as a list of bits, each of
which represents one sector in the Data area. If a bit is set, the
corresponding sector contains data; if the bit is clear, the sector is free.

The first bit corresponds to the first sector in the data area (and so on).
The bits are counted *wordwise*, beginning with the most significant bit of
the word (where "word" means two bytes at once, low byte first).

So, to look up a data sector in the BITFAT, subtract the number of the first
data sector from the number of the sector you are interested in, and keep
the result in memory. Divide that result by 16, round down and multiply by
2. Read the two bytes at this position in the BITFAT (counted from its
beginning) and store them as a word. Now look at the lowest 4 bits of the
previously memorized result: they give the bit number (counted from the
most significant bit) in the word. This bit corresponds to the data sector.
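The lookup procedure above, written as a minimal Python sketch. It assumes the whole BITFAT has been read into a `bytes` object and that the first data sector is already known from the Superblock; the function name is hypothetical.

```python
def bitfat_is_allocated(bitfat: bytes, sector: int, first_data_sector: int) -> bool:
    """Return True if the BITFAT marks the absolute sector number as used."""
    index = sector - first_data_sector           # the "memorized" result
    offset = (index // 16) * 2                   # divide by 16, round down, times 2
    if index < 0 or offset + 2 > len(bitfat):
        raise ValueError("sector is outside the area covered by the BITFAT")
    word = int.from_bytes(bitfat[offset:offset + 2], "little")  # low byte first
    bit = index & 15                             # lowest 4 bits, counted from the MSB
    return bool(word & (0x8000 >> bit))
```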

WARNING: The BITFAT is sometimes incorrect because dos was not shut down
         properly. If you want to write to the filesystem, be sure to
         check (and, if necessary, repair) the BITFAT beforehand. See below
         how to do this.

MDFAT layout
------------

The MDFAT is organised as a stream of long integers (4 bytes; for drivespace
3: 5 bytes). The entries are sector-aligned, which means for drivespace 3 that
the last two bytes of each sector are slack. Read the bytes in the usual
order (low byte first).

The MDFAT contains additional information about a cluster:

     3322222222221111111111            (doublespace/drivespace)
     10987654321098765432109876543210
     uchhhhllll?sssssssssssssssssssss

     333333333322222222221111111111    (drivespace 3)
     9876543210987654321098765432109876543210
     uchhhhhhllllllf?ssssssssssssssssssssssss

u=1: The cluster is used, u=0: the cluster is unused. In the latter case the
     whole entry should be zeroed. An unused cluster by definition contains
     only zeros (C notation: '\0'). This is important if a program insists
     on reading unused clusters!
c=1: The cluster is not compressed, c=0: the cluster is compressed.
h:   Size of decompressed cluster minus 1 (measured in units of 512 bytes).
     E.g. 3 means (3+1)*512 bytes.
l:   Size of compressed cluster data minus 1 (measured in units of 512
     bytes). If the cluster is not compressed according to the c bit, this
     value is identical to h.
f:   Fragmented bit for drivespace 3. If it is set, the cluster is fragmented
     and needs some special treatment on read and write access.
?:   Unknown. Seems to contain random garbage.
s:   Starting sector minus 1. I.e. if you want to read the cluster, read (l+1)
     sectors beginning with sector (s+1). If the c bit is zero, the data must
     then be decompressed.
     Important: if the cluster on disk is shorter than the filesystem's
     sectors per cluster value, the missing rest at the end has to be treated
     as if it were zeroed out.
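A possible way to decode a raw MDFAT entry in Python, following the two bit diagrams above. The `MdfatEntry` class and `decode_mdfat` function are hypothetical names; whether the CVF is drivespace 3 has to be decided by the caller (e.g. from the sectors per cluster value, as hinted in the Superblock section).

```python
from dataclasses import dataclass


@dataclass
class MdfatEntry:
    used: bool          # u bit
    uncompressed: bool  # c bit (1 = not compressed)
    fragmented: bool    # f bit, drivespace 3 only (always False otherwise)
    h: int              # decompressed size in sectors, minus 1
    l: int              # on-disk size in sectors, minus 1
    s: int              # starting sector, minus 1


def decode_mdfat(raw: bytes, drivespace3: bool) -> MdfatEntry:
    """Decode one raw MDFAT entry (4 bytes, or 5 bytes for drivespace 3)."""
    if drivespace3:
        v = int.from_bytes(raw[:5], "little")
        return MdfatEntry(used=bool((v >> 39) & 1),
                          uncompressed=bool((v >> 38) & 1),
                          fragmented=bool((v >> 25) & 1),
                          h=(v >> 32) & 0x3F,    # 6 bits
                          l=(v >> 26) & 0x3F,    # 6 bits
                          s=v & 0xFFFFFF)        # 24 bits
    v = int.from_bytes(raw[:4], "little")
    return MdfatEntry(used=bool((v >> 31) & 1),
                      uncompressed=bool((v >> 30) & 1),
                      fragmented=False,
                      h=(v >> 26) & 0xF,         # 4 bits
                      l=(v >> 22) & 0xF,         # 4 bits
                      s=v & 0x1FFFFF)            # 21 bits
```

To read the cluster you would then read `e.l + 1` sectors starting at sector `e.s + 1`.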

To look up information in the MDFAT, take the cluster number, add the
dcluster offset (which may be negative!) and take the entry with that index,
counted from the beginning of the MDFAT. Don't ignore the sector alignment
for drivespace 3 (only 102 five-byte entries fit into one sector).
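Locating the entry is simple arithmetic once the sector alignment is taken into account: a sector holds 128 four-byte entries, but only 102 five-byte drivespace 3 entries (102 * 5 = 510, leaving the 2 slack bytes). A sketch, with a hypothetical function name; `mdfat_start` is Superblock Pos. 36,37 plus 1.

```python
SECTOR_SIZE = 512


def mdfat_entry_location(cluster: int, dcluster: int, mdfat_start: int,
                         drivespace3: bool) -> tuple:
    """Return (sector number, byte offset in that sector) of a cluster's MDFAT entry."""
    index = cluster + dcluster                  # dcluster may be negative
    if index < 0:
        raise ValueError("cluster number too small for this dcluster offset")
    entry_size = 5 if drivespace3 else 4
    per_sector = SECTOR_SIZE // entry_size      # 128, or 102 (+2 slack bytes)
    return mdfat_start + index // per_sector, (index % per_sector) * entry_size
```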

Bootblock layout
----------------

Emulates a normal dos filesystem super block. Most dos fields are identical
to those in the Superblock except for the FAT16 or FAT12 string. The FAT
bitsize string that can be found in the Bootblock is correct, while the one
in the Superblock may be garbage. Take a disk viewer and compare Bootblock
and Superblock yourself. There are slight differences, but I don't know
exactly where and why. You'd better never change anything in these blocks...

FAT layout
----------

No need to explain. It's the same as in a normal dos filesystem. It may be
12 or 16 bit according to the Bootblock, but *not* according to the
Superblock. This seems to be a bug in doublespace: the Superblock's FAT bit
size information is sometimes wrong, so use the Bootblock's information.

Root directory
--------------

The same as in a normal dos filesystem. (The root directory is never
compressed.)

Data area
---------

Well, that's the actual space for the data.

Final sector
------------

Contains the signature "MDR" and must not be used for data. To find it you
must know the size of the CVF file; there's no pointer in the Superblock
that points to this sector.

Compressed clusters
-------------------

Compressed data (when the c bit is 0 in the MDFAT entry of a cluster) are
identified by a compression header. The header consists of the first 4 bytes
of the compressed cluster data: two bytes specifying the compression scheme
and two bytes of version number. Headers usually look like this:

'D', 'S', 0x00, 0x02  (I write it as 'DS-0-2')
'J', 'M', 0x00, 0x00
'S', 'Q', 0x00, 0x00

The version number seems to be ignored, though M$ claim that, for example,
'High' (JM-0-1) compresses better than 'Normal' (JM-0-0). That's nonsense
from the compressed format point of view: the format is in fact the same.
Maybe the original M$ software uses different *compression algorithms*
which may be more or less efficient, but they're not using different
*compression schemes*. So in fact there are three schemes: DS, JM, and SQ.
DS and JM are quite similar; for a decompression algorithm see the dmsdos
or thsfs sources (both are GPL code, you may reuse it).
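Since only the first two header bytes select the format, a reader can dispatch on them and ignore the version bytes. This Python sketch (hypothetical names) only identifies the scheme; the actual DS/JM and SQ decompressors are not shown here.

```python
KNOWN_SCHEMES = ("DS", "JM", "SQ")


def compression_scheme(data: bytes) -> tuple:
    """Return (scheme, label) from the 4-byte header of compressed cluster data.

    The label (e.g. 'DS-0-2') includes the version bytes for diagnostics only;
    the scheme alone determines the compressed format.
    """
    if len(data) < 4:
        raise ValueError("too short for a compression header")
    scheme = data[:2].decode("ascii", errors="replace")
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown compression header {data[:4]!r}")
    return scheme, f"{scheme}-{data[2]}-{data[3]}"
```

A reader would then hand the data to a DS/JM or an SQ decompressor depending on the returned scheme.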

As far as I know, the dos 6.x versions of doublespace/drivespace never
compress directories and never shorten them (if only the first sectors of a
cluster are used, it is in fact possible to shorten the cluster, since the
unused slack is, by definition, treated as if it were zeroed out). It is
unknown whether these versions can read compressed or shortened directories,
but it is certain that they never compress or shorten them. So I recommend
not doing it either. drivespace 3 usually shortens directories and sometimes
even compresses them, although compressing directories costs a lot of
performance. win95 doublespace/drivespace (not drivespace 3) never shortens
directories but also compresses them sometimes.

Fragmented clusters
-------------------

To make things more complex, M$ have invented these strange things.
Unfortunately, they need some special treatment.

A fragmented cluster can be recognized by checking the 'f' bit in the MDFAT.
This bit only exists in drivespace 3 format.

The first sector of the cluster contains a fragmentation list. This list
consists of entries of 4 bytes each. The first one is the fragmentation
count: it specifies into how many fragments the cluster is divided. It must
be > 1 and <= 64.

The following entries are pointers to fragments of data like this:

    3322222222221111111111
    10987654321098765432109876543210
    lllllluussssssssssssssssssssssss

s: start sector minus 1 - the fragment begins at sector (s+1).
u: unused and zero (?)
l: sector count minus 1 - the fragment contains (l+1) sectors beginning
   with sector (s+1). If the cluster is compressed, these sectors hold raw
   (compressed) data.

The first pointer entry always points to the fragmentation list itself, i.e.
the s and l fields of the first pointer entry are always the same as the
ones in the MDFAT entry. The first fragment is not restricted to contain
*only* the fragmentation list, however.
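A sketch of parsing the fragmentation list in Python. It assumes the count occupies the whole first 4-byte entry and is followed by exactly `count` pointer entries; the function name and the optional consistency check against the MDFAT entry's s and l values are illustrative additions.

```python
def parse_fragmentation_list(first_sector: bytes, mdfat_s=None, mdfat_l=None) -> list:
    """Return [(start_sector, sector_count), ...] for a fragmented cluster.

    first_sector holds the 512 bytes at sector (s+1) of the MDFAT entry.
    If mdfat_s/mdfat_l are given, the first pointer entry is checked against them.
    """
    def entry(i: int) -> int:
        return int.from_bytes(first_sector[4 * i:4 * i + 4], "little")

    count = entry(0)
    if not 1 < count <= 64:
        raise ValueError(f"bad fragmentation count {count}")
    fragments = []
    for i in range(1, count + 1):
        v = entry(i)
        s = v & 0xFFFFFF            # start sector minus 1 (24 bits)
        l = (v >> 26) & 0x3F        # sector count minus 1 (6 bits)
        fragments.append((s + 1, l + 1))
    if mdfat_s is not None and fragments[0] != (mdfat_s + 1, mdfat_l + 1):
        raise ValueError("first fragment does not match the MDFAT entry")
    return fragments
```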

Now it becomes slightly difficult because the data are stored differently
depending on whether the cluster is compressed or not. If the cluster is
compressed, the raw (compressed) data begin immediately after the last entry
of the fragmentation list. Since the list consists of the count entry
followed by one pointer entry per fragment, this is byte position
(fragmentation count + 1) * 4 of the first sector. Further raw data can be
found in the other fragments, in order.

If the cluster is not compressed, the (uncompressed) data begin in the
sector that follows the sector containing the fragmentation list. If the
first fragment is only 1 sector long, the data begin in the second
fragment. Further data are in the following fragments, in order.
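The two cases can be combined into one gather routine. This is a sketch under assumptions: `read_sector` is a hypothetical callback returning 512 bytes, `fragments` is a list of (start sector, sector count) pairs decoded from the fragmentation list, and the compressed data are assumed to start 4 * (count + 1) bytes into the first sector (one count entry plus one pointer entry per fragment) and to continue through any remaining sectors of the first fragment before the other fragments.

```python
from typing import Callable, List, Tuple

SECTOR_SIZE = 512


def gather_fragmented(fragments: List[Tuple[int, int]],
                      read_sector: Callable[[int], bytes],
                      compressed: bool) -> bytes:
    """Concatenate the data of a fragmented cluster (first fragment holds the list)."""
    first_start, first_count = fragments[0]
    out = bytearray()
    if compressed:
        # Assumed: raw data follow the list (count entry + one entry per fragment).
        list_bytes = 4 * (len(fragments) + 1)
        out += read_sector(first_start)[list_bytes:]
    # Uncompressed data start with the sector after the list; in both cases the
    # remaining sectors of the first fragment and then the other fragments follow.
    for sector in range(first_start + 1, first_start + first_count):
        out += read_sector(sector)
    for start, count in fragments[1:]:
        for sector in range(start, start + count):
            out += read_sector(sector)
    return bytes(out)
```

If the first fragment is a single sector, the first loop does nothing and the uncompressed data come entirely from the second and later fragments, as described above.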

General rules for cluster access
--------------------------------

I'm assuming you want to access cluster number x (x != 0, i.e. not the root
directory; that one should be clear without further explanation).

How to read cluster x from the compressed filesystem
----------------------------------------------------

  * Get and decode the MDFAT entry for the cluster: look up entry number
    (x+dcluster). dcluster and the start of the MDFAT can be obtained from
    the Superblock.

  * If the MDFAT entry is unused (u bit clear), just return a cluster full of
    zeros (0x00).

  * Read (l+1) sectors beginning with sector (s+1).

  * If the cluster is fragmented ... uuhhhhh ... you'd better issue an
    error and encourage the user to boot win95 and defragment the drive.
    Otherwise, read and interpret the fragmentation list now.

  * If the data are compressed (c bit clear), decompress them.

  * If the cluster is shortened (i.e. h+1 < sectors per cluster), zero out
    the rest of the cluster in memory. The sectors per cluster value can be
    obtained from the Superblock.
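Put together, the read steps look roughly like the following Python sketch. It only decodes the 4-byte doublespace/drivespace MDFAT layout (a drivespace 3 reader would decode 5 bytes and, as recommended above, refuse clusters with the f bit set). `read_sector` and `decompress` are hypothetical callbacks; the decompression algorithms themselves are not covered by this document.

```python
from typing import Callable, Optional

SECTOR_SIZE = 512


def read_cluster(mdfat_value: int, sectors_per_cluster: int,
                 read_sector: Callable[[int], bytes],
                 decompress: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    """Read one cluster, given the raw 4-byte MDFAT value of entry (x+dcluster)."""
    cluster_size = sectors_per_cluster * SECTOR_SIZE
    if not (mdfat_value >> 31) & 1:               # u bit clear: unused cluster
        return bytes(cluster_size)
    uncompressed = (mdfat_value >> 30) & 1        # c bit
    h = (mdfat_value >> 26) & 0xF
    l = (mdfat_value >> 22) & 0xF
    s = mdfat_value & 0x1FFFFF
    raw = b"".join(read_sector(s + 1 + i) for i in range(l + 1))
    if uncompressed:
        data = raw
    elif decompress is None:
        raise NotImplementedError("no decompressor for DS/JM/SQ data supplied")
    else:
        data = decompress(raw)
    # Keep (h+1) sectors and zero-fill a shortened cluster up to full size.
    return data[:(h + 1) * SECTOR_SIZE].ljust(cluster_size, b"\0")
```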

How to write cluster x to the compressed filesystem
---------------------------------------------------

WARNING: Be sure you can trust your BITFAT, i.e. have it checked beforehand.
         See below how to do this.

  * Be sure to know whether the cluster may be shortened. Its size in
    sectors minus 1 will become the h value of the MDFAT entry later.

  * If you want, compress the data. Be sure the data really become smaller.
    Determine the size of the compressed data in sectors and subtract 1;
    this will become the l value of the MDFAT entry later. If you don't
    want to compress the data or the data turn out to be incompressible,
    set l to the same value as h and use the uncompressed original data.
    DON'T ACTUALLY WRITE TO THE MDFAT AT THIS POINT!

  * Delete the old cluster x that may have been written earlier (see below).

  * Search the BITFAT for (l+1) free contiguous sectors. Be prepared for
    failure here (e.g. if the disk is full or too fragmented). Allocate the
    sectors by setting the corresponding bits in the BITFAT. Now you can
    create the MDFAT entry and write it to disk. Remember to subtract 1 from
    the sector number when creating the s value of the MDFAT entry. Set the
    u bit, and don't forget to set the c bit if the data are not compressed.

  * Write the (l+1) sectors to disk beginning with sector (s+1).
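The BITFAT search and the construction of the new MDFAT value could look like this. The sketch uses a simple first-fit search, which is an assumption (the text does not say which free run to pick), and only produces the 4-byte doublespace/drivespace layout. The helper names are hypothetical; writing the returned value to the MDFAT and the data sectors to disk is left to the caller.

```python
def _bit_is_set(bitfat: bytearray, i: int) -> bool:
    off = (i // 16) * 2
    word = int.from_bytes(bitfat[off:off + 2], "little")
    return bool(word & (0x8000 >> (i % 16)))


def _set_bit(bitfat: bytearray, i: int) -> None:
    off = (i // 16) * 2
    word = int.from_bytes(bitfat[off:off + 2], "little") | (0x8000 >> (i % 16))
    bitfat[off:off + 2] = word.to_bytes(2, "little")


def allocate_cluster(bitfat: bytearray, data_sectors: int, first_data_sector: int,
                     h: int, l: int, compressed: bool) -> int:
    """Allocate (l+1) contiguous sectors and return the new 4-byte MDFAT value."""
    if not 0 <= l <= h <= 15:
        raise ValueError("h and l must fit in 4 bits, with l <= h")
    if not compressed and l != h:
        raise ValueError("uncompressed data require l == h")
    needed, run = l + 1, 0
    for i in range(data_sectors):               # first-fit search
        run = 0 if _bit_is_set(bitfat, i) else run + 1
        if run == needed:
            first = i - needed + 1
            break
    else:
        raise OSError("no free contiguous run of sectors in the BITFAT")
    s = first_data_sector + first - 1           # s = start sector minus 1
    if s > 0x1FFFFF:
        raise ValueError("start sector does not fit into the 21-bit s field")
    for i in range(first, first + needed):
        _set_bit(bitfat, i)                     # mark the sectors as used
    c = 0 if compressed else 1                  # c bit set = not compressed
    return (1 << 31) | (c << 30) | (h << 26) | (l << 22) | s
```

Following the order above, the caller would delete the old cluster first, then call `allocate_cluster`, write the MDFAT entry, and finally write the (l+1) sectors starting at sector s+1.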

How to delete cluster x in a compressed filesystem
--------------------------------------------------

WARNING: Be sure you can trust your BITFAT, i.e. have it checked beforehand.
         See below how to do this.

  * Get the corresponding MDFAT entry (x+dcluster). If it is unused (u bit
    clear), there's nothing to do.

  * If the cluster is fragmented, scan and check the fragmentation list
    and free up all the fragments.

  * Otherwise free up (l+1) sectors beginning with sector (s+1) in the BITFAT
    by clearing the corresponding bits. Be sure to do range checking first,
    so you don't corrupt the filesystem if there's garbage in the s field of
    the MDFAT entry.

  * Zero out the MDFAT entry completely. Don't just clear the used bit.
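A sketch of the delete steps for a non-fragmented cluster, again for the 4-byte layout only. It performs the range check before clearing any bit, so a garbage s field raises an error instead of corrupting the BITFAT. The function names are hypothetical; the caller writes the returned value (always 0) back into the MDFAT entry.

```python
def _clear_bit(bitfat: bytearray, i: int) -> None:
    off = (i // 16) * 2
    word = int.from_bytes(bitfat[off:off + 2], "little") & ~(0x8000 >> (i % 16))
    bitfat[off:off + 2] = word.to_bytes(2, "little")


def delete_cluster(mdfat_value: int, bitfat: bytearray, data_sectors: int,
                   first_data_sector: int) -> int:
    """Free a non-fragmented cluster in the BITFAT; return the new MDFAT value."""
    if not (mdfat_value >> 31) & 1:             # u bit clear: nothing to do
        return 0
    s = mdfat_value & 0x1FFFFF
    l = (mdfat_value >> 22) & 0xF
    first = s + 1 - first_data_sector           # BITFAT index of sector (s+1)
    if first < 0 or first + l + 1 > data_sectors:
        raise ValueError("MDFAT entry points outside the data area")
    for i in range(first, first + l + 1):
        _clear_bit(bitfat, i)
    return 0                                    # zero the whole entry, not just u
```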

How to check and repair the BITFAT
----------------------------------

Dos seems to recalculate the BITFAT on each bootup. This suggests that
even M$ programmers didn't trust it, so you shouldn't either if you plan
to write to the compressed partition.

It's easy. Just scan the complete MDFAT for used entries (u bit set). From
the l and s values (don't forget to add 1 in each case) you get the sectors
that are allocated. Doing this for the whole MDFAT, you get a list of which
sectors are used and which are free. Then you can compare this list to
the BITFAT. If you keep the list in memory in the same bit encoding as
the real BITFAT, you can simply write the complete list to disk and
replace the BITFAT with it. Uhh, yes, you may need up to 512 KB of memory
for this purpose...

If you are using drivespace 3, please keep in mind that you also have to
take care of fragmented clusters (i.e. check the fragmentation bit and scan
the fragmentation list if necessary).
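The check-and-repair procedure, as a Python sketch. It builds a fresh bitmap in the same wordwise, most-significant-bit-first encoding as the on-disk BITFAT, so the result can be compared with the old BITFAT and written back directly. `mdfat_values` is assumed to be an iterable of raw MDFAT entries as integers; fragmented drivespace 3 clusters are deliberately rejected rather than handled.

```python
def rebuild_bitfat(mdfat_values, drivespace3: bool, data_sectors: int,
                   first_data_sector: int) -> bytearray:
    """Recompute the BITFAT from raw MDFAT values (hypothetical helper)."""
    bitfat = bytearray(((data_sectors + 15) // 16) * 2)
    for v in mdfat_values:
        if drivespace3:
            used, frag = (v >> 39) & 1, (v >> 25) & 1
            s, l = v & 0xFFFFFF, (v >> 26) & 0x3F
        else:
            used, frag = (v >> 31) & 1, 0
            s, l = v & 0x1FFFFF, (v >> 22) & 0xF
        if not used:
            continue
        if frag:
            raise NotImplementedError("fragmented cluster: scan its fragmentation list")
        first = s + 1 - first_data_sector       # BITFAT index of sector (s+1)
        if first < 0 or first + l + 1 > data_sectors:
            raise ValueError(f"MDFAT value {v:#x} points outside the data area")
        for i in range(first, first + l + 1):   # mark (l+1) sectors as used
            off = (i // 16) * 2
            word = int.from_bytes(bitfat[off:off + 2], "little") | (0x8000 >> (i % 16))
            bitfat[off:off + 2] = word.to_bytes(2, "little")
    return bitfat
```

Comparing the result with the BITFAT read from disk tells you whether a repair is needed.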

Further related documents about compressed filesystems
------------------------------------------------------

 - thsfs source (sunsite and mirrors)
 - dmsdosfs source (sunsite and mirrors)
 - Bill Gates' secret drawers
 - Murphy's law
