This file contains some information about the compressed filesystem layout.

                        The CVF Hacker's Guide :-)
                      ==============================

WARNING: These are not official M$ specs. In fact, it's a hacker's document.
         I don't know the M$ specs, so this file may contain incorrect
         information. Use at your own risk (see the GPL for details).

WARNING 2: Several parts of the compressed filesystem internals are still
           unknown to me. If this document is inaccurate in some details,
           it's because I don't know the details any better. Feel free to
           add your knowledge.


CVF format overview
-------------------

  version                       compression            SPC(*)  max. size
  dos 6.0/6.2 doublespace       DS-0-2                 16      512MB
  dos 6.22 drivespace           JM-0-0                 16      512MB
  win95 doublespace/drivespace  DS-0-0                 16      512MB
  win95 drivespace 3            JM-0-0,JM-0-1,SQ-0-0   64      2GB

  (*)=Sectors Per Cluster

General filesystem layout
-------------------------

  Superblock       (1 sector)
  BITFAT           (several sectors)
  MDFAT            (~ twice as large as FAT)
  Bootblock        (1 sector)
  FAT (only one)   (several sectors)
  Root directory   (some sectors)
  Data area        (many sectors)
  Final sector     (1 sector)

There's some slack (or "reserved space") between some filesystem structures,
but I don't know what it is good for. Perhaps M$ don't know either.

Sector counting
---------------

The Superblock is referred to as sector 0. The remaining sectors are numbered
consecutively from there.

Superblock layout
-----------------

Byte positions are counted beginning with 0 for the first byte. Integers are
in low byte first order. Only the important fields are listed here; the usual
dos fields are omitted.

Pos. 3-10:  string: signature "MSDBL6.0" or "MSDSP6.0"
Pos. 45,46: *signed* integer: dcluster offset for MDFAT lookups
Pos. 36,37: first sector of MDFAT minus 1
Pos. 17,18: number of entries in root directory
Pos. 13:    sectors per cluster
Pos. 39,40: sector number of Bootblock
Pos. 14,15: sector offset of FAT start (relative to Bootblock). I.e. to
            obtain the sector number of the first FAT sector add Pos. 14,15
            to Pos. 39,40.
Pos. 41,42: sector offset of root directory start (relative to Bootblock). To
            obtain the sector number of the first root directory sector add
            Pos. 41,42 to Pos. 39,40.
Pos. 43,44: sector offset of Data area minus 2 (relative to Bootblock). To
            obtain the sector number of the first Data area sector add
            Pos. 43,44 to Pos. 39,40 and finally add 2.
Pos. 51:    version flag (0=dos 6.0/6.2 or win95 doublespace, 1=??,
            2=dos 6.22 drivespace, 3 or 0 ??=win95 drivespace 3)
            Hint: drivespace 3 format can be recognized reliably by checking
            the sectors per cluster value. The version flag seems to lie
            for drivespace 3.
Pos. 57-60: usually string "12  " or "16  " as the rest of "FAT12   " and
            "FAT16   " (the spaces are important), but there seems to be a
            bug here in some doublespace versions. PLEASE IGNORE THIS VALUE,
            IT SOMETIMES LIES. Use the Bootblock's value instead.
Pos. 62-63: Maximum size of the CVF in Megabytes.
Pos. 32-35: Faked total number of sectors (it is something like the real
            number of sectors in the data area multiplied by the
            compression ratio). This value is important because it determines
            the maximum cluster number that is currently allowed for the
            CVF according to this formula (don't ask me why):

                        (Pos.32-35)-(Pos.22,23)-(Pos.14,15)-(Pos.17,18)/16
            max_cluster=--------------------------------------------------- + 1
                                            (Pos.13)

            (rounded down). Be sure not to exceed the limits due to FAT/MDFAT
            size or CVF size here. Since this formula has been found by
            trial and error, it may not be true in all screwy cases.

BITFAT layout
-------------

The BITFAT is a sector allocation map. Consider it as a list of bits each of
which represents one sector in the Data area. If a bit is set, the
corresponding sector contains data - if the bit is clear, the sector is free.

The first bit matches the first sector in the data area (and so on). The
bits are counted *wordwise* beginning with the most significant bit of the
word (where "word" means two bytes at once, low byte first).

So subtract the number of the first data sector from the number of the data
sector you want to look up in the BITFAT. Keep the result in memory. Divide
it by 16, round down, and multiply by 2. Get the two bytes at this position
in the BITFAT (counted from its beginning) and store them as a word. Now look
at the lowest 4 bits of the previously memorized result - they represent the
bit number (counted from the most significant bit) in the word. This bit
corresponds to the data sector.

WARNING: The BITFAT sometimes is incorrect due to a missing system shutdown
         under dos. If you want to write to the filesystem, be sure to
         check (and, if necessary, repair) the BITFAT first. See below
         how to do this.

MDFAT layout
------------

MDFAT is organised as a stream of long integers (4 bytes, for drivespace 3:
5 bytes). The data are sector-aligned - this means for drivespace 3 that the
last two bytes of a sector are slack. Consider the bytes in usual order
(low byte first).

The MDFAT contains additional information about a cluster:

     3322222222221111111111                     (doublespace/drivespace)
     10987654321098765432109876543210
     uchhhhllll?sssssssssssssssssssss

     333333333322222222221111111111             (drivespace 3)
     9876543210987654321098765432109876543210
     uchhhhhhllllllf?ssssssssssssssssssssssss

u=1: The cluster is used, u=0: the cluster is unused. In the latter case the
     whole entry should be zeroed. By definition, an unused cluster contains
     only zeros (C notation: '\0'). This is important if a program insists
     on reading unused clusters!
c=1: The cluster is not compressed, c=0: the cluster is compressed.
h:   Size of decompressed cluster minus 1 (measured in units of 512 bytes).
     E.g. 3 means (3+1)*512 bytes.
l:   Size of compressed cluster data minus 1 (measured in units of 512
     bytes). If the cluster is not compressed according to the c bit, this
     value is identical to h.
f:   fragmented bit for drivespace 3. If it is set the cluster is fragmented
     and needs some special treatment on read and write access.
?:   Unknown. Seems to contain random garbage.
s:   starting sector minus 1. I.e. if you want to read the cluster, read (l+1)
     sectors beginning with sector (s+1). If the c bit is zero, the data must
     be decompressed now.
     Important: if the cluster on disk is shorter than the filesystem's
     sectors per cluster value, the missing rest at the end has to be treated
     as if it was zeroed out.

To look up information in the MDFAT, take the cluster number, add the
dcluster offset (which may be negative!) and take the corresponding entry
counted from the beginning of the MDFAT. Don't ignore the sector alignment
for drivespace 3.

Bootblock layout
----------------

Emulates a normal dos filesystem super block. Most dos fields are identical
to the Superblock except for the FAT16 or FAT12 string. The FAT bitsize string
that can be found in the Bootblock is correct while the one in the
Superblock may be garbage. Take a disk viewer and compare Bootblock and
Superblock yourself. There are slight differences, but I don't know exactly
where and why. You'd better never change anything in these blocks...

FAT layout
----------

No need to explain. It's the same as in a normal dos filesystem. It may be
12 or 16 bit according to the Bootblock, but *not* to the Superblock. This
seems to be a bug in doublespace - the Superblock's FAT bit size information
is sometimes wrong, so use the Bootblock's information.

Root directory
--------------

The same as in a normal dos filesystem. (The root directory is never
compressed.)

Data area
---------

Well, that's the actual space for the data.

Final sector
------------

Contains the signature "MDR". Must not be used by data. To find it you must
know the size of the CVF file. There's no pointer in the Superblock that
points to this sector.

Compressed clusters
-------------------

Compressed data (when the c bit is 0 in the MDFAT entry of a cluster) are
identified by a compression header. The header consists of 4 bytes which are
at the beginning of the compressed cluster data. The headers consist of two
bytes specifying the compression scheme and two bytes of version number, and
usually look like this:

     'D', 'S', 0x00, 0x02      (I write it as 'DS-0-2')
     'J', 'M', 0x00, 0x00
     'S', 'Q', 0x00, 0x00

The version number seems to be ignored though M$ claim that, for example,
'High' (JM-0-1) compresses better than 'Normal' (JM-0-0). That's nonsense
from the compressed format point of view, the format is in fact the same.
Maybe the original M$ software uses different *compression algorithms*
which may be more or less efficient, but they're not using different
*compression schemes*. So in fact there are three schemes: DS, JM, and SQ.
DS and JM are quite similar, for a decompression algorithm see the dmsdos
or thsfs sources (both are GPL code, you may reuse it).

As far as I know, dos 6.x versions of doublespace/drivespace never compress
directories and never cut them off (if only the first sectors of the cluster
are used, it is in fact possible to cut the cluster since the unused slack
is, by definition, to be treated as if it was zeroed out). It is unknown
whether these versions can read compressed or shortened directories, but it
is certain they never compress or shorten them. So I just recommend not to do
it either. drivespace 3 usually cuts off directories and sometimes even
compresses them though compression of directories is a great performance loss.
win95 doublespace/drivespace (not drivespace 3) never cuts directories but
also compresses them sometimes.

Fragmented clusters
-------------------

To make things more complex, M$ have invented these strange things.
Unfortunately, they need some special treatment.

A fragmented cluster can be recognized by checking the 'f' bit in the MDFAT.
This bit only exists in drivespace 3 format.

The first sector of the cluster contains a fragmentation list. This list
contains entries each of which uses 4 bytes. The first one is the
fragmentation count - it specifies into how many fragments the cluster is
divided. It must be > 1 and <=64.

The following entries are pointers to fragments of data like this:

     3322222222221111111111
     10987654321098765432109876543210
     lllllluussssssssssssssssssssssss

s: start sector minus 1 - the fragment begins at sector (s+1).
u: unused and zero (?)
l: sector count minus 1 - the fragment contains (l+1) sectors beginning
   with sector (s+1). This means raw data if compressed.

The first entry always points to the fragmentation list itself. I.e.
the s and l fields of the first fragmentation list entry are always the same
as the ones in the MDFAT entry. The first fragment is not restricted to
contain *only* the fragmentation list, however.

Now it becomes slightly difficult because the data are stored differently
depending on whether the cluster is compressed or not. If the cluster is
compressed the raw (compressed) data begin immediately after the last entry
of the fragmentation list. The byte position can be calculated by multiplying
the fragmentation count by 4. Further raw data can be found in the other
fragments in order.

If the cluster is not compressed, the (uncompressed) data begin in the
sector that follows the sector containing the fragmentation list. If the
first fragment has only the length of 1 sector the data begin in the second
fragment. Further data are in the fragments in order.

General rules for cluster access
--------------------------------

I'm assuming you want to access cluster number x (x!=0 i.e. not root directory
- this one should be clear without further explanation).

How to read cluster x from the compressed filesystem
----------------------------------------------------

 * Get and decode the MDFAT entry for the cluster: look up entry number
   (x+dcluster). dcluster and start of the MDFAT can be obtained from the
   Superblock.

 * If the MDFAT entry is unused (u bit clear), just return a cluster full of
   zeros (0x00).

 * Read (l+1) sectors beginning with sector (s+1).

 * If the cluster is fragmented ... uuhhhhh ... you'd better issue an
   error and encourage the user to boot win95 and defragment the drive.
   Otherwise read and interpret the fragmentation list now.

 * If the data are compressed (c bit clear) decompress them.

 * If the cluster is shortened (i.e. h+1 < sectors per cluster) zero out
   the rest of the cluster in memory. The sectors per cluster value can be
   obtained from the Superblock.

How to write cluster x to the compressed filesystem
---------------------------------------------------

WARNING: Be sure you can trust your BITFAT, i.e. have it checked first.
         See below how to do this.

 * Be sure to know whether the cluster may be shortened. The size in
   sectors minus 1 will become the h value of the MDFAT entry later.

 * If you want, compress the data. Be sure the data really become smaller.
   Determine the size of the compressed data in sectors and subtract 1 -
   this will become the l value of the MDFAT entry later. If you don't
   want to compress the data or the data turn out to be incompressible,
   set l to the same value as h and use the uncompressed original data.
   DON'T ACTUALLY WRITE TO THE MDFAT AT THIS POINT!

 * Delete the old cluster x that may have been written earlier (see below).

 * Search for (l+1) free contiguous sectors in the BITFAT. Be prepared for
   failure here (i.e. if the disk is full or too fragmented). Allocate the
   sectors by setting the corresponding bits in the BITFAT. Now you can create
   the MDFAT entry and write it to disk - remember to subtract 1 from the
   sector number when creating the s value of the MDFAT entry. Also don't
   forget to set the c bit if the data are not compressed.

 * Write the (l+1) sectors to disk beginning with sector (s+1).

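The bookkeeping part of the write steps above (BITFAT bit addressing, the
search for free sectors, and packing the MDFAT entry) can be sketched roughly
as follows. This is only an illustration of my reading of the layouts described
in this document, for the 4-byte doublespace/drivespace MDFAT format only (not
drivespace 3). All function names are made up for this sketch and do not come
from dmsdos or thsfs. BITFAT positions are given as "sector number minus first
data sector".

```python
import struct

def bitfat_get(bitfat, n):
    """Return the allocation bit for data sector index n (0 = first data sector)."""
    pos = (n // 16) * 2                        # byte offset of the 16-bit word
    word = bitfat[pos] | (bitfat[pos + 1] << 8)  # low byte first
    return (word >> (15 - (n & 15))) & 1       # bits counted from the MSB

def bitfat_set(bitfat, n, used):
    """Set or clear the allocation bit for data sector index n."""
    pos = (n // 16) * 2
    word = bitfat[pos] | (bitfat[pos + 1] << 8)
    mask = 1 << (15 - (n & 15))
    word = (word | mask) if used else (word & ~mask)
    bitfat[pos] = word & 0xFF
    bitfat[pos + 1] = (word >> 8) & 0xFF

def find_free_run(bitfat, data_sectors, count):
    """Return the first index of `count` contiguous free sectors, or None."""
    run = 0
    for n in range(data_sectors):
        run = 0 if bitfat_get(bitfat, n) else run + 1
        if run == count:
            return n - count + 1
    return None                                # disk full or too fragmented

def encode_mdfat(first_sector, stored_sectors, cluster_sectors, compressed):
    """Pack a used 4-byte MDFAT entry: u c hhhh llll ? s(21 bits)."""
    h = cluster_sectors - 1                    # decompressed size minus 1
    l = stored_sectors - 1                     # size on disk minus 1
    s = first_sector - 1                       # starting sector minus 1
    if not (0 <= l <= h <= 15 and 0 <= s < (1 << 21)):
        raise ValueError("field out of range for a 4-byte MDFAT entry")
    c = 0 if compressed else 1                 # c=1 means NOT compressed
    value = (1 << 31) | (c << 30) | (h << 26) | (l << 22) | s
    return struct.pack("<I", value)

def decode_mdfat(raw):
    """Unpack a 4-byte MDFAT entry into its fields (the '?' bit is ignored)."""
    (value,) = struct.unpack("<I", raw)
    return {
        "used": bool((value >> 31) & 1),
        "uncompressed": bool((value >> 30) & 1),
        "h": (value >> 26) & 0xF,
        "l": (value >> 22) & 0xF,
        "s": value & 0x1FFFFF,
    }
```

A writer would call find_free_run() with (l+1), mark the returned sectors with
bitfat_set(), add the first data sector number to get the real sector, and only
then build the entry with encode_mdfat().
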
How to delete cluster x in a compressed filesystem
--------------------------------------------------

WARNING: Be sure you can trust your BITFAT, i.e. have it checked first.
         See below how to do this.

 * Get the corresponding MDFAT entry (x+dcluster). If it is unused (u bit
   clear) there's nothing to do.

 * If the cluster is fragmented, scan and check the fragmentation list
   and free up all the fragments.

 * Otherwise free up (l+1) sectors beginning with sector (s+1) in the BITFAT
   by clearing the corresponding bits. Be sure to do a range check first so
   you don't corrupt the filesystem if there's garbage in the s field of
   the MDFAT entry.

 * Zero out the MDFAT entry completely. Don't just clear the used bit.

How to check and repair the BITFAT
----------------------------------

Dos seems to recalculate the BITFAT on each bootup. This suggests that
even M$ programmers didn't trust it, so you shouldn't either if you plan
to write to the compressed partition.

It's easy. Just scan the complete MDFAT for used entries (u bit set). From
the l and the s values (don't forget to add 1 in each case) you get the
sectors that are allocated. Doing this for the whole MDFAT, you get a list of
which sectors are used and which are free. Then you can compare this list to
the BITFAT. If you just keep the list in memory in the same bit encoding as
used in the real BITFAT, you can just write the complete list to disk and
replace the BITFAT by it. Uhh, yes, you may need up to 512 KB memory for
the data for this purpose...

If you are using drivespace 3 please keep in mind that you also have to
take care of fragmented clusters (i.e. check the fragmentation bit and scan
the fragmentation list if necessary).

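The BITFAT rebuild described above might look roughly like this. It is a
minimal sketch under several assumptions: 4-byte doublespace/drivespace MDFAT
entries only (so no fragmented clusters), the whole MDFAT already read into
memory as one byte string, and the first data sector number and the number of
data sectors already computed from the Superblock. The function name and its
parameters are made up for illustration. Garbage in an s field raises an error
here instead of being silently ignored.

```python
import struct

def rebuild_bitfat(mdfat, first_data_sector, data_sectors):
    """Recompute the BITFAT from all used 4-byte MDFAT entries."""
    # One bit per data sector, stored as 16-bit words (low byte first).
    bitfat = bytearray(((data_sectors + 15) // 16) * 2)
    for (entry,) in struct.iter_unpack("<I", mdfat):
        if not (entry >> 31) & 1:              # u bit clear: unused entry
            continue
        start = (entry & 0x1FFFFF) + 1         # s field plus 1
        count = ((entry >> 22) & 0xF) + 1      # l field plus 1
        for sector in range(start, start + count):
            n = sector - first_data_sector
            if not 0 <= n < data_sectors:      # range check against garbage
                raise ValueError("MDFAT entry points outside the data area")
            pos = (n // 16) * 2
            mask = 1 << (15 - (n & 15))        # bits counted from the MSB
            bitfat[pos] |= mask & 0xFF
            bitfat[pos + 1] |= mask >> 8
    return bitfat
```

The result uses the same bit encoding as the on-disk BITFAT, so it can be
compared byte by byte with what was read from disk, or written back in its
place.
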
Further related documents about compressed filesystems
------------------------------------------------------

- thsfs source (sunsite and mirrors)
- dmsdosfs source (sunsite and mirrors)
- Bill Gates' secret drawers
- Murphy's law