This file contains some information about the compressed filesystem layout.

The CVF Hacker's Guide :-)
==========================

WARNING:   This is not an official M$ specification. In fact, it's a hacker's
           document. I don't know the M$ specs, so this file may contain
           incorrect information. Use at your own risk (see the GPL for
           details).

WARNING 2: Several parts of the compressed filesystem internals are still
           unknown to me. If this document is inaccurate in some details, it's
           because I don't know them any better. Feel free to add your
           knowledge.


CVF format overview
-------------------

version                        compression            SPC(*)  max. size
dos 6.0/6.2 doublespace        DS-0-2                 16      512MB
dos 6.22 drivespace            JM-0-0                 16      512MB
win95 doublespace/drivespace   DS-0-0                 16      512MB
win95 drivespace 3             JM-0-0,JM-0-1,SQ-0-0   64      2GB

(*)=Sectors Per Cluster

General filesystem layout
-------------------------

Superblock      (1 sector)
BITFAT          (several sectors)
MDFAT           (~ twice as large as FAT)
Bootblock       (1 sector)
FAT (only one)  (several sectors)
Root directory  (some sectors)
Data area       (many sectors)
Final sector    (1 sector)

There's some slack (or "reserved space") between some filesystem structures,
but I don't know what it is good for. Perhaps M$ don't know either.

Sector counting
---------------

The Superblock is referred to as sector 0. The remaining sectors are counted
consecutively from there.

Superblock layout
-----------------

Byte positions are counted beginning with 0 for the first byte. Integers are
stored low byte first. Only the important fields are listed here; the usual
dos fields are omitted.

Pos. 3-10:  string: signature "MSDBL6.0" or "MSDSP6.0"
Pos. 45,46: *signed* integer: dcluster offset for MDFAT lookups
Pos. 36,37: first sector of MDFAT minus 1
Pos. 17,18: number of entries in root directory
Pos. 13:    sectors per cluster
Pos. 39,40: sector number of Bootblock
Pos. 14,15: sector offset of FAT start (relative to Bootblock). I.e. to
            obtain the sector number of the first FAT sector, add Pos. 14,15
            to Pos. 39,40.
Pos. 41,42: sector offset of root directory start (relative to Bootblock). To
            obtain the sector number of the first root directory sector, add
            Pos. 41,42 to Pos. 39,40.
Pos. 43,44: sector offset of Data area minus 2 (relative to Bootblock). To
            obtain the sector number of the first Data area sector, add
            Pos. 43,44 to Pos. 39,40 and finally add 2.
Pos. 51:    version flag (0=dos 6.0/6.2 or win95 doublespace, 1=??,
            2=dos 6.22 drivespace, 3 or 0 ??=win95 drivespace 3)
            Hint: drivespace 3 format can be recognized safely by looking at
            the sectors per cluster value. The version flag seems to lie
            for drivespace 3.
Pos. 57-60: usually string "12  " or "16  " as the rest of "FAT12   " and
            "FAT16   " (the spaces are important), but there seems to be a
            bug here in some doublespace versions. PLEASE IGNORE THIS VALUE,
            IT SOMETIMES LIES. Use the Bootblock's value instead.
Pos. 62,63: Maximum size of the CVF in Megabytes.
Pos. 32-35: Faked total number of sectors (it is something like the real
            number of sectors in the Data area multiplied by the
            compression ratio). This value is important because it determines
            the maximum cluster number that is currently allowed for the
            CVF according to this formula (don't ask me why):

                        (Pos.32-35)-(Pos.22,23)-(Pos.14,15)-(Pos.17,18)/16
            max_cluster=-------------------------------------------------- + 1
                                             (Pos.13)

            (rounded down). Be sure not to exceed the limits due to FAT/MDFAT
            size or CVF size here. Since this formula has been found by
            trial and error, it may not be true in all screwy cases.
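
The field list above can be turned into a small parser. The following is only
a sketch in Python, not code taken from dmsdos or thsfs. The function name and
the dictionary keys are made up for illustration, and Pos. 22,23 (which the
formula uses but the list does not explain) is assumed to be the usual dos
"sectors per FAT" field.

```python
import struct


def parse_superblock(sb: bytes) -> dict:
    """Extract the CVF fields listed above from a 512-byte Superblock."""

    def u16(pos: int) -> int:
        # Two-byte unsigned integer, low byte first.
        return struct.unpack_from("<H", sb, pos)[0]

    bootblock = u16(39)
    fields = {
        "signature": sb[3:11].decode("ascii", errors="replace"),
        "sectors_per_cluster": sb[13],
        "fat_offset": u16(14),
        "root_entries": u16(17),
        "sectors_per_fat": u16(22),  # assumption: usual dos field
        "faked_total_sectors": struct.unpack_from("<I", sb, 32)[0],
        "mdfat_start": u16(36) + 1,  # stored as "first sector minus 1"
        "bootblock": bootblock,
        "fat_start": bootblock + u16(14),
        "root_start": bootblock + u16(41),
        "data_start": bootblock + u16(43) + 2,
        "dcluster": struct.unpack_from("<h", sb, 45)[0],  # signed!
        "version_flag": sb[51],
        "max_size_mb": u16(62),
    }
    # max_cluster formula from the text, with the fraction rounded down.
    numerator = (fields["faked_total_sectors"] - fields["sectors_per_fat"]
                 - fields["fat_offset"] - fields["root_entries"] // 16)
    fields["max_cluster"] = numerator // fields["sectors_per_cluster"] + 1
    return fields
```

The FAT bit size is deliberately not read here, since the text says the
Superblock's value sometimes lies.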

BITFAT layout
-------------

The BITFAT is a sector allocation map. Consider it as a list of bits, each of
which represents one sector in the Data area. If a bit is set, the
corresponding sector contains data; if the bit is clear, the sector is free.

The first bit matches the first sector in the Data area (and so on). The
bits are counted *wordwise*, beginning with the most significant bit of the
word (where "word" means two bytes at once, low byte first).

So, to look up a data sector in the BITFAT, subtract the number of the first
data sector from the number of the data sector you are interested in. Keep
the result in memory. Divide it by 16, round down, and multiply by 2. Get the
two bytes at this position in the BITFAT (counted from its beginning) and
store them as a word. Now look at the lowest 4 bits of the previously
memorized result - they give the bit number (counted from the most
significant bit) in the word. This bit corresponds to the data sector.
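
The lookup just described fits in a few lines. This is a sketch in Python; the
function and parameter names are illustrative only, and `bitfat` is assumed to
hold the raw BITFAT bytes starting at its first byte.

```python
def bitfat_sector_used(bitfat: bytes, sector: int, data_start: int) -> bool:
    """Return True if the BITFAT marks the given data sector as used."""
    index = sector - data_start         # position relative to the Data area
    word_pos = (index // 16) * 2        # byte offset of the 16-bit word
    word = bitfat[word_pos] | (bitfat[word_pos + 1] << 8)  # low byte first
    bit = index & 15                    # lowest 4 bits, counted from the MSB
    return bool(word & (0x8000 >> bit))
```

Note that, because of the low-byte-first word order, the bit for the very
first data sector lives in the *second* byte of the BITFAT.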

WARNING: The BITFAT is sometimes incorrect because the system was not shut
         down properly under dos. If you want to write to the filesystem, be
         sure to check (and, if necessary, repair) the BITFAT first. See below
         how to do this.

MDFAT layout
------------

The MDFAT is organised as a stream of long integers (4 bytes, for drivespace
3: 5 bytes). The data are sector-aligned - for drivespace 3 this means that a
512-byte sector holds 102 entries and its last two bytes are slack. Consider
the bytes in usual order (low byte first).

The MDFAT contains additional information about a cluster:

        3322222222221111111111            (doublespace/drivespace)
        10987654321098765432109876543210
        uchhhhllll?sssssssssssssssssssss

        333333333322222222221111111111            (drivespace 3)
        9876543210987654321098765432109876543210
        uchhhhhhllllllf?ssssssssssssssssssssssss

u=1: The cluster is used, u=0: the cluster is unused. In the latter case the
     whole entry should be zeroed. An unused cluster contains by definition
     only zeros (C notation: '\0'). This is important if a program insists
     on reading unused clusters!
c=1: The cluster is not compressed, c=0: the cluster is compressed.
h:   Size of decompressed cluster minus 1 (measured in units of 512 bytes).
     E.g. 3 means (3+1)*512 bytes.
l:   Size of compressed cluster data minus 1 (measured in units of 512
     bytes). If the cluster is not compressed according to the c bit, this
     value is identical to h.
f:   fragmented bit for drivespace 3. If it is set, the cluster is fragmented
     and needs some special treatment on read and write access.
?:   Unknown. Seems to contain random garbage.
s:   starting sector minus 1. I.e. if you want to read the cluster, read (l+1)
     sectors beginning with sector (s+1). If the c bit is zero, the data must
     be decompressed now.
     Important: if the cluster on disk is shorter than the filesystem's
     sectors per cluster value, the missing rest at the end has to be treated
     as if it was zeroed out.

To look up information in the MDFAT, take the cluster number, add the
dcluster offset (which may be negative!) and take the corresponding entry
counted from the beginning of the MDFAT. Don't ignore the sector alignment
for drivespace 3.
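
As an illustration, here is a Python sketch of the entry lookup and of the bit
decoding shown in the two diagrams. All names are invented for this example.
The figure of 102 entries per sector assumes 512-byte sectors and follows from
the "last two bytes are slack" rule.

```python
def mdfat_entry_offset(cluster: int, dcluster: int, drivespace3: bool) -> int:
    """Byte offset of a cluster's entry, counted from the MDFAT start."""
    index = cluster + dcluster          # dcluster may be negative
    if drivespace3:
        # 102 five-byte entries per sector, then 2 bytes of slack.
        return (index // 102) * 512 + (index % 102) * 5
    return index * 4


def decode_mdfat_entry(raw: bytes, drivespace3: bool) -> dict:
    """Split a 4-byte (or 5-byte, drivespace 3) entry into its fields."""
    value = int.from_bytes(raw, "little")
    if drivespace3:
        return {
            "used": bool((value >> 39) & 1),
            "uncompressed": bool((value >> 38) & 1),   # the c bit
            "h": (value >> 32) & 0x3F,
            "l": (value >> 26) & 0x3F,
            "fragmented": bool((value >> 25) & 1),
            "s": value & 0xFFFFFF,
        }
    return {
        "used": bool((value >> 31) & 1),
        "uncompressed": bool((value >> 30) & 1),       # the c bit
        "h": (value >> 26) & 0xF,
        "l": (value >> 22) & 0xF,
        "fragmented": False,            # no f bit in this format
        "s": value & 0x1FFFFF,
    }
```

The unknown '?' bit is simply dropped. Remember that h, l and s are all stored
minus 1.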

Bootblock layout
----------------

The Bootblock emulates a normal dos filesystem super block. Most dos fields
are identical to the Superblock except for the FAT16 or FAT12 string. The FAT
bit size string that can be found in the Bootblock is correct while the one
in the Superblock may be garbage. Take a disk viewer and compare Bootblock and
Superblock yourself. There are slight differences, but I don't know exactly
where and why. You'd better never change anything in these blocks...

FAT layout
----------

No need to explain. It's the same as in a normal dos filesystem. It may be
12 or 16 bit according to the Bootblock, but *not* to the Superblock. This
seems to be a bug in doublespace - the Superblock's FAT bit size information
is sometimes wrong, so use the Bootblock's information.

Root directory
--------------

The same as in a normal dos filesystem. (The root directory is never
compressed.)

Data area
---------

Well, that's the actual space for the data.

Final sector
------------

Contains the signature "MDR". Must not be used for data. To find it you must
know the size of the CVF file, because there's no pointer in the Superblock
that points to this sector.
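
A quick sanity check on a CVF image could look like the Python sketch below.
It assumes 512-byte sectors and a file size that is a whole number of sectors.
Since this document does not say where inside the sector the signature sits,
the sketch only tests whether "MDR" occurs anywhere in the last sector. The
function name is hypothetical.

```python
def has_final_sector_signature(cvf: bytes) -> bool:
    """Check whether the last 512-byte sector of the CVF contains "MDR"."""
    if len(cvf) < 512 or len(cvf) % 512 != 0:
        return False                    # not a whole number of sectors
    return b"MDR" in cvf[-512:]
```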

Compressed clusters
-------------------

Compressed data (when the c bit is 0 in the MDFAT entry of a cluster) are
identified by a compression header. The header consists of 4 bytes at the
beginning of the compressed cluster data: two bytes specifying the
compression scheme and two bytes of version number. Headers usually look
like this:

        'D', 'S', 0x00, 0x02,      I write it as 'DS-0-2'
        'J', 'M', 0x00, 0x00
        'S', 'Q', 0x00, 0x00

The version number seems to be ignored though M$ claim that, for example,
'High' (JM-0-1) compresses better than 'Normal' (JM-0-0). That's nonsense
from the compressed format point of view; the format is in fact the same.
Maybe the original M$ software uses different *compression algorithms*
which may be more or less efficient, but they're not using different
*compression schemes*. So in fact there are three schemes: DS, JM, and SQ.
DS and JM are quite similar. For a decompression algorithm see the dmsdos
or thsfs sources (both are GPL code, you may reuse it).
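
Recognizing the header is straightforward. The Python sketch below only
identifies the scheme and version bytes and does not decompress anything. The
names are illustrative, and rejecting unknown schemes is a choice made for
this example.

```python
KNOWN_SCHEMES = ("DS", "JM", "SQ")


def parse_compression_header(data: bytes) -> tuple:
    """Return (scheme, version byte 1, version byte 2) of a compressed cluster."""
    if len(data) < 4:
        raise ValueError("compressed cluster data shorter than its header")
    scheme = data[:2].decode("ascii", errors="replace")
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown compression scheme {scheme!r}")
    return scheme, data[2], data[3]


def header_label(data: bytes) -> str:
    """Format the header in the 'DS-0-2' notation used in this document."""
    scheme, major, minor = parse_compression_header(data)
    return f"{scheme}-{major}-{minor}"
```

Following the observation above, a decompressor would dispatch on the scheme
only and ignore the two version bytes.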

As far as I know, dos 6.x versions of doublespace/drivespace never compress
directories and never cut them off (if only the first sectors of the cluster
are used, it is in fact possible to cut the cluster since the unused slack
is, by definition, to be treated as if it was zeroed out). It is unknown
whether these versions can read compressed or shortened directories, but it
is certain that they never compress or shorten them. So I recommend not doing
it either. drivespace 3 usually cuts off directories and sometimes even
compresses them, though compressing directories costs a lot of performance.
win95 doublespace/drivespace (not drivespace 3) never cuts directories but
also compresses them sometimes.

Fragmented clusters
-------------------

To make things more complex, M$ have invented these strange things.
Unfortunately, they need some special treatment.

A fragmented cluster can be recognized by looking at the 'f' bit in the MDFAT.
This bit only exists in drivespace 3 format.

The first sector of the cluster contains a fragmentation list. This list
consists of entries of 4 bytes each. The first one is the fragmentation
count - it specifies into how many fragments the cluster is divided. It must
be > 1 and <= 64.

The following entries are pointers to fragments of data like this:

        3322222222221111111111
        10987654321098765432109876543210
        lllllluussssssssssssssssssssssss

s: start sector minus 1 - the fragment begins at sector (s+1).
u: unused and zero (?)
l: sector count minus 1 - the fragment contains (l+1) sectors beginning
   with sector (s+1). For a compressed cluster this counts sectors of raw
   (compressed) data.

The first pointer entry always points to the fragmentation list itself. I.e.
the s and l fields of the first pointer entry are always the same as the ones
in the MDFAT entry. The first fragment is not restricted to contain *only*
the fragmentation list, however.

Now it becomes slightly difficult because the data are stored differently
depending on whether the cluster is compressed or not. If the cluster is
compressed, the raw (compressed) data begin immediately after the last entry
of the fragmentation list. The byte position can be calculated by multiplying
the fragmentation count by 4. Further raw data can be found in the other
fragments in order.

If the cluster is not compressed, the (uncompressed) data begin in the
sector that follows the sector containing the fragmentation list. If the
first fragment is only 1 sector long, the data begin in the second
fragment. Further data are in the fragments in order.
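
A fragmentation list can be decoded roughly as in this Python sketch. It
assumes that the count occupies the first 4-byte slot as a low-byte-first
integer and that the pointer entries follow in the next slots; that is one
reading of the description above and has not been checked against real
drivespace 3 volumes. The function name is hypothetical, and locating the
start of the data inside the first fragment is left out on purpose.

```python
import struct


def parse_fragment_list(first_sector: bytes) -> list:
    """Return [(start_sector, sector_count), ...] for a fragmented cluster."""
    count = struct.unpack_from("<I", first_sector, 0)[0]
    if not 1 < count <= 64:
        raise ValueError(f"implausible fragmentation count {count}")
    fragments = []
    for i in range(1, count + 1):
        value = struct.unpack_from("<I", first_sector, 4 * i)[0]
        start = (value & 0xFFFFFF) + 1  # s field is "start sector minus 1"
        length = (value >> 26) + 1      # l field is "sector count minus 1"
        fragments.append((start, length))
    return fragments
```

A careful reader would also verify that the first pointer matches the s and l
fields of the MDFAT entry, as stated above.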

General rules for cluster access
--------------------------------

I'm assuming you want to access cluster number x (x!=0, i.e. not the root
directory - that one should be clear without further explanation).

How to read cluster x from the compressed filesystem
----------------------------------------------------

 * Get and decode the MDFAT entry for the cluster: look up entry number
   (x+dcluster). dcluster and the start of the MDFAT can be obtained from the
   Superblock.

 * If the MDFAT entry is unused (u bit clear), just return a cluster full of
   zeros (0x00).

 * Read (l+1) sectors beginning with sector (s+1).

 * If the cluster is fragmented ... uuhhhhh ... you'd better issue an
   error and encourage the user to boot win95 and defragment the drive.
   Otherwise read and interpret the fragmentation list now.

 * If the data are compressed (c bit clear), decompress them.

 * If the cluster is shortened (i.e. h+1 < sectors per cluster), zero out
   the rest of the cluster in memory. The sectors per cluster value can be
   obtained from the Superblock.
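
The read procedure can be sketched in Python as follows. Everything here is
hypothetical scaffolding: `entry` is a dictionary with the decoded MDFAT
fields, `read_sectors(start, count)` returns raw sectors from the CVF, and
`decompress(raw, size)` stands for a DS/JM/SQ decompressor that this sketch
does not provide. Fragmented clusters are refused, as suggested above.

```python
SECTOR_SIZE = 512


def read_cluster(entry, sectors_per_cluster, read_sectors, decompress):
    """Return the full, zero-padded contents of one cluster."""
    cluster_size = sectors_per_cluster * SECTOR_SIZE
    if not entry["used"]:
        return bytes(cluster_size)      # unused cluster: all zeros
    if entry.get("fragmented"):
        raise NotImplementedError(
            "fragmented cluster - please defragment the drive under win95")
    # Read (l+1) sectors beginning with sector (s+1).
    raw = read_sectors(entry["s"] + 1, entry["l"] + 1)
    logical_size = (entry["h"] + 1) * SECTOR_SIZE
    if not entry["uncompressed"]:       # c bit clear: data are compressed
        raw = decompress(raw, logical_size)
    # A shortened cluster is padded with zeros up to the full cluster size.
    return raw[:logical_size].ljust(cluster_size, b"\0")
```

Passing the disk access and the decompressor in as callables keeps the sketch
independent of how the CVF is opened and of the compression scheme.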

How to write cluster x to the compressed filesystem
---------------------------------------------------

WARNING: Be sure you can trust your BITFAT, i.e. have it checked first.
         See below how to do this.

 * Find out whether the cluster may be shortened. The size in sectors
   minus 1 will become the h value of the MDFAT entry later.

 * If you want, compress the data. Be sure the data really become smaller.
   Determine the size of the compressed data in sectors and subtract 1 -
   this will become the l value of the MDFAT entry later. If you don't
   want to compress the data or the data turn out to be incompressible,
   set l to the same value as h and use the uncompressed original data.
   DON'T ACTUALLY WRITE TO THE MDFAT AT THIS POINT!

 * Delete the old cluster x that may have been written earlier (see below).

 * Search for (l+1) free contiguous sectors in the BITFAT. Be prepared for
   failure here (i.e. if the disk is full or too fragmented). Allocate the
   sectors by setting the corresponding bits in the BITFAT. Now you can create
   the MDFAT entry and write it to disk - remember to subtract 1 from the
   sector number when creating the s value of the MDFAT entry. Also don't
   forget to set the c bit if the data are not compressed.

 * Write the (l+1) sectors to disk beginning with sector (s+1).
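
The allocation step is the only tricky part of writing. Below is a Python
sketch of a first-fit search in the BITFAT; the first-fit strategy and all
names are assumptions of this example, not something known about the original
software. Bit indices are relative to the first sector of the Data area.

```python
def _locate(index: int) -> tuple:
    """Map a bit index to (byte position, bit mask) in the raw BITFAT."""
    mask = 0x8000 >> (index & 15)       # counted from the MSB of the word
    word_pos = (index // 16) * 2
    if mask & 0xFF:                     # bit lies in the low byte (stored first)
        return word_pos, mask
    return word_pos + 1, mask >> 8      # bit lies in the high byte


def mark_sectors(bitfat: bytearray, first: int, count: int, used: bool) -> None:
    """Set or clear the bits for `count` sectors starting at index `first`."""
    for index in range(first, first + count):
        pos, bit = _locate(index)
        if used:
            bitfat[pos] |= bit
        else:
            bitfat[pos] &= ~bit & 0xFF


def allocate_sectors(bitfat: bytearray, data_sectors: int, count: int):
    """Find and allocate `count` free contiguous sectors; None on failure."""
    run = 0
    for index in range(data_sectors):
        pos, bit = _locate(index)
        run = 0 if bitfat[pos] & bit else run + 1
        if run == count:
            first = index - count + 1
            mark_sectors(bitfat, first, count, True)
            return first
    return None                         # disk full or too fragmented
```

With `first` being the returned index and `data_start` the first Data area
sector, the s value of the new MDFAT entry is `data_start + first - 1`.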

How to delete cluster x in a compressed filesystem
--------------------------------------------------

WARNING: Be sure you can trust your BITFAT, i.e. have it checked first.
         See below how to do this.

 * Get the corresponding MDFAT entry (x+dcluster). If it is unused (u bit
   clear), there's nothing to do.

 * If the cluster is fragmented, scan and check the fragmentation list
   and free up all the fragments.

 * Otherwise free up (l+1) sectors beginning with sector (s+1) in the BITFAT
   by clearing the corresponding bits. Be sure to do a range check first so
   you don't corrupt the filesystem if there's garbage in the s field of
   the MDFAT entry.

 * Zero out the MDFAT entry completely. Don't just clear the used bit.
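
The range check deserves a few lines of its own. The Python sketch below only
works out which BITFAT bits would have to be cleared and refuses anything
that points outside the Data area; clearing the bits and zeroing the MDFAT
entry are left to the caller. The names and the `entry` dictionary layout are
hypothetical.

```python
def sectors_to_free(entry: dict, data_start: int, data_sectors: int):
    """Return (first BITFAT index, sector count) to clear, or None if unused."""
    if not entry["used"]:
        return None                     # nothing to do
    if entry.get("fragmented"):
        raise NotImplementedError("walk the fragmentation list instead")
    first = entry["s"] + 1 - data_start  # s is "starting sector minus 1"
    count = entry["l"] + 1
    if first < 0 or first + count > data_sectors:
        raise ValueError("MDFAT entry points outside the Data area")
    return first, count
```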

How to check and repair the BITFAT
----------------------------------

Dos seems to recalculate the BITFAT on each bootup. This suggests that
even M$ programmers didn't trust it, so you shouldn't either if you plan
to write to the compressed partition.

It's easy. Just scan the complete MDFAT for used entries (u bit set). From
the l and the s values (don't forget to add 1 in each case) you get which
sectors are allocated. Doing this for the whole MDFAT, you get a list of
which sectors are used and which are free. Then you can compare this list to
the BITFAT. If you keep the list in memory in the same bit encoding as
used in the real BITFAT, you can just write the complete list to disk and
replace the BITFAT with it. Uhh, yes, you may need up to 512 KB memory for
the data for this purpose...

If you are using drivespace 3, please keep in mind that you also have to
take care of fragmented clusters (i.e. check the fragmentation bit and scan
the fragmentation list if necessary).
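
Rebuilding the allocation map could look like this Python sketch. It takes
already decoded (used, s, l) triples, ignores fragmented clusters, and sizes
the result from the number of data sectors, which may differ from the size of
the BITFAT on disk. All names are made up for the example.

```python
def rebuild_bitfat(entries, data_start: int, data_sectors: int) -> bytearray:
    """Build a BITFAT image from (used, s, l) triples of all MDFAT entries."""
    bitfat = bytearray(((data_sectors + 15) // 16) * 2)
    for used, s, l in entries:
        if not used:
            continue
        # (l+1) sectors beginning with sector (s+1)
        for sector in range(s + 1, s + l + 2):
            index = sector - data_start
            if not 0 <= index < data_sectors:
                raise ValueError(f"sector {sector} is outside the Data area")
            word_pos = (index // 16) * 2
            mask = 0x8000 >> (index & 15)   # counted from the MSB of the word
            bitfat[word_pos] |= mask & 0xFF     # low byte is stored first
            bitfat[word_pos + 1] |= mask >> 8
    return bitfat
```

Comparing the result byte by byte with the BITFAT read from disk shows
whether a repair is needed.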

Further related documents about compressed filesystems
------------------------------------------------------

- thsfs source (sunsite and mirrors)
- dmsdosfs source (sunsite and mirrors)
- Bill Gates' secret drawers
- Murphy's law