1.1 root 1: == General ==
2:
3: A qcow2 image file is organized in units of constant size, which are called
4: (host) clusters. A cluster is the unit in which all allocations are done,
5: both for actual guest data and for image metadata.
6:
7: Likewise, the virtual disk as seen by the guest is divided into (guest)
8: clusters of the same size.
9:
10: All numbers in qcow2 are stored in Big Endian byte order.
11:
12:
13: == Header ==
14:
15: The first cluster of a qcow2 image contains the file header:
16:
17: Byte 0 - 3: magic
18: QCOW magic string ("QFI\xfb")
19:
20: 4 - 7: version
1.1.1.3 ! root 21: Version number (valid values are 2 and 3)
1.1 root 22:
23: 8 - 15: backing_file_offset
24: Offset into the image file at which the backing file name
25: is stored (NB: The string is not null terminated). 0 if the
26: image doesn't have a backing file.
27:
28: 16 - 19: backing_file_size
29: Length of the backing file name in bytes. Must not be
30: longer than 1023 bytes. Undefined if the image doesn't have
31: a backing file.
32:
33: 20 - 23: cluster_bits
34: Number of bits that are used for addressing an offset
35: within a cluster (1 << cluster_bits is the cluster size).
36: Must not be less than 9 (i.e. 512 byte clusters).
37:
38: Note: qemu as of today has an implementation limit of 2 MB
39: as the maximum cluster size and won't be able to open images
40: with larger cluster sizes.
41:
42: 24 - 31: size
43: Virtual disk size in bytes
44:
45: 32 - 35: crypt_method
46: 0 for no encryption
47: 1 for AES encryption
48:
49: 36 - 39: l1_size
50: Number of entries in the active L1 table
51:
52: 40 - 47: l1_table_offset
53: Offset into the image file at which the active L1 table
54: starts. Must be aligned to a cluster boundary.
55:
56: 48 - 55: refcount_table_offset
57: Offset into the image file at which the refcount table
58: starts. Must be aligned to a cluster boundary.
59:
60: 56 - 59: refcount_table_clusters
61: Number of clusters that the refcount table occupies
62:
63: 60 - 63: nb_snapshots
64: Number of snapshots contained in the image
65:
66: 64 - 71: snapshots_offset
67: Offset into the image file at which the snapshot table
68: starts. Must be aligned to a cluster boundary.
69:
1.1.1.3 ! root 70: If the version is 3 or higher, the header has the following additional fields.
! 71: For version 2, the values are assumed to be zero, unless specified otherwise
! 72: in the description of a field.
! 73:
! 74: 72 - 79: incompatible_features
! 75: Bitmask of incompatible features. An implementation must
! 76: fail to open an image if an unknown bit is set.
! 77:
! 78: Bits 0-63: Reserved (set to 0)
! 79:
! 80: 80 - 87: compatible_features
! 81: Bitmask of compatible features. An implementation can
! 82: safely ignore any unknown bits that are set.
! 83:
! 84: Bits 0-63: Reserved (set to 0)
! 85:
! 86: 88 - 95: autoclear_features
! 87: Bitmask of auto-clear features. An implementation may only
! 88: write to an image with unknown auto-clear features if it
! 89: clears the respective bits from this field first.
! 90:
! 91: Bits 0-63: Reserved (set to 0)
! 92:
! 93: 96 - 99: refcount_order
! 94: Describes the width of a reference count block entry (width
! 95: in bits = 1 << refcount_order). For version 2 images, the
! 96: order is always assumed to be 4 (i.e. the width is 16 bits).
! 97:
! 98: 100 - 103: header_length
! 99: Length of the header structure in bytes. For version 2
! 100: images, the length is always assumed to be 72 bytes.
! 101:
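The field list above maps directly onto a fixed big-endian layout. The following is a minimal sketch of a header reader, assuming the first bytes of the image are already in memory; the function name `parse_header` and the derived `cluster_size` key are illustrative and not part of the format.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

# Bytes 0-71, common to versions 2 and 3 (">" = big endian, no padding).
V2_FORMAT = ">4sIQIIQIIQQIIQ"
V2_FIELDS = (
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
)

# Bytes 72-103, present only when version >= 3.
V3_FORMAT = ">QQQII"
V3_FIELDS = (
    "incompatible_features", "compatible_features", "autoclear_features",
    "refcount_order", "header_length",
)


def parse_header(buf):
    """Decode the header fields listed above into a dict (illustrative helper)."""
    header = dict(zip(V2_FIELDS, struct.unpack_from(V2_FORMAT, buf, 0)))
    if header["magic"] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    if header["version"] not in (2, 3):
        raise ValueError("unsupported version %d" % header["version"])
    if header["version"] >= 3:
        header.update(zip(V3_FIELDS, struct.unpack_from(V3_FORMAT, buf, 72)))
    else:
        # Defaults that the text gives for version 2 images.
        header.update(incompatible_features=0, compatible_features=0,
                      autoclear_features=0, refcount_order=4, header_length=72)
    if header["cluster_bits"] < 9:
        raise ValueError("cluster_bits must be at least 9")
    # Every incompatible bit is reserved in this description, so any set bit
    # is unknown and the image must not be opened.
    if header["incompatible_features"] != 0:
        raise ValueError("unknown incompatible feature bits set")
    header["cluster_size"] = 1 << header["cluster_bits"]
    return header
```

The sketch leaves the upper cluster size limit unchecked, since the text describes that as an implementation limit rather than a format rule.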
1.1 root 102: Directly after the image header, optional sections called header extensions can
103: be stored. Each extension has a structure like the following:
104:
105: Byte 0 - 3: Header extension type:
106: 0x00000000 - End of the header extension area
107: 0xE2792ACA - Backing file format name
1.1.1.3 ! root 108: 0x6803f857 - Feature name table
1.1 root 109: other - Unknown header extension, can be safely
110: ignored
111:
112: 4 - 7: Length of the header extension data
113:
114: 8 - n: Header extension data
115:
116: n - m: Padding to round up the header extension size to the next
117: multiple of 8.
118:
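The extension layout above can be walked with a short loop. This is a sketch only, assuming `buf` holds the first cluster and `start` is the end of the header (72 for version 2, header_length for version 3); the name `iter_header_extensions` is hypothetical.

```python
import struct

EXT_END = 0x00000000
EXT_BACKING_FORMAT = 0xE2792ACA
EXT_FEATURE_NAME_TABLE = 0x6803F857


def iter_header_extensions(buf, start):
    """Yield (type, data) pairs until the end marker or the end of buf."""
    pos = start
    while pos + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, pos)
        if ext_type == EXT_END:
            return
        pos += 8
        if pos + length > len(buf):
            raise ValueError("header extension overruns the buffer")
        yield ext_type, bytes(buf[pos:pos + length])
        # The data is padded so that the next extension starts at a
        # multiple of 8 bytes.
        pos += (length + 7) & ~7
```

A caller would dispatch on the returned type and skip anything it does not recognise, since unknown extensions can be safely ignored.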
1.1.1.3 ! root 119: Unless stated otherwise, each header extension type shall appear at most once
! 120: in the same image.
! 121:
1.1 root 122: The remaining space between the end of the header extension area and the end of
1.1.1.3 ! root 123: the first cluster can be used for the backing file name. It is not allowed to
! 124: store other data here, so that an implementation can safely modify the header
! 125: and add extensions without harming data of compatible features that it
! 126: doesn't support. Compatible features that need space for additional data can
! 127: use a header extension.
! 128:
! 129:
! 130: == Feature name table ==
! 131:
! 132: The feature name table is an optional header extension that contains the names
! 133: of features used by the image. It can be used by applications that don't know
! 134: the respective feature (e.g. because the feature was introduced only later) to
! 135: display a useful error message.
! 136:
! 137: The number of entries in the feature name table is determined by the length of
! 138: the header extension data. Each entry looks like this:
! 139:
! 140: Byte 0: Type of feature (select feature bitmap)
! 141: 0: Incompatible feature
! 142: 1: Compatible feature
! 143: 2: Autoclear feature
! 144:
! 145: 1: Bit number within the selected feature bitmap (valid
! 146: values: 0-63)
! 147:
! 148: 2 - 47: Feature name (padded with zeros, but not necessarily null
! 149: terminated if it has full length)
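Since each entry is 48 bytes long, the table can be decoded by slicing the extension data. A minimal sketch follows, assuming `data` is the extension data as a bytes object; the function name and the ASCII decoding of names are assumptions, as the text does not specify a character encoding.

```python
FEATURE_TYPES = {0: "incompatible", 1: "compatible", 2: "autoclear"}
ENTRY_SIZE = 48  # 1 byte type + 1 byte bit number + 46 bytes name


def parse_feature_name_table(data):
    """Return a list of (feature type, bit number, name) tuples."""
    entries = []
    for pos in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        kind = data[pos]
        bit = data[pos + 1]
        raw_name = data[pos + 2:pos + ENTRY_SIZE]
        # The name is zero padded, but a full-length name has no terminator.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((FEATURE_TYPES.get(kind, kind), bit, name))
    return entries
```

An application that refuses to open an image because of an unknown bit could look the bit up in this list to name the missing feature in its error message.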
1.1 root 150:
151:
152: == Host cluster management ==
153:
154: qcow2 manages the allocation of host clusters by maintaining a reference count
155: for each host cluster. A refcount of 0 means that the cluster is free, 1 means
156: that it is used, and >= 2 means that it is used and any write access must
157: perform a COW (copy on write) operation.
158:
159: The refcounts are managed in a two-level table. The first level is called
160: refcount table and has a variable size (which is stored in the header). The
161: refcount table can cover multiple clusters, however it needs to be contiguous
162: in the image file.
163:
164: It contains pointers to the second level structures which are called refcount
165: blocks and are exactly one cluster in size.
166:
167: Given an offset into the image file, the refcount of its cluster can be obtained
168: as follows:
169:
170: refcount_block_entries = (cluster_size * 8 / refcount_bits)
171:
1.1.1.2 root 172: refcount_block_index = (offset / cluster_size) % refcount_block_entries
173: refcount_table_index = (offset / cluster_size) / refcount_block_entries
1.1 root 174:
175: refcount_block = load_cluster(refcount_table[refcount_table_index]);
176: return refcount_block[refcount_block_index];
177:
178: Refcount table entry:
179:
180: Bit 0 - 8: Reserved (set to 0)
181:
182: 9 - 63: Bits 9-63 of the offset into the image file at which the
183: refcount block starts. Must be aligned to a cluster
184: boundary.
185:
186: If this is 0, the corresponding refcount block has not yet
187: been allocated. All refcounts managed by this refcount block
188: are 0.
189:
1.1.1.3 ! root 190: Refcount block entry (x = refcount_bits - 1):
1.1 root 191:
1.1.1.3 ! root 192: Bit 0 - x: Reference count of the cluster. If refcount_bits implies a
! 193: sub-byte width, note that bit 0 means the least significant
! 194: bit in this context.
1.1 root 195:
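The lookup described in this section can be sketched as follows. The code assumes the whole image is available as a bytes-like object and handles only 16-bit refcounts (refcount_order 4); other widths, in particular sub-byte ones, would need different entry extraction. The function names are illustrative.

```python
import struct

# Bits 9-63 of a refcount table entry hold the refcount block offset.
REFTABLE_OFFSET_MASK = 0xFFFFFFFFFFFFFE00


def refcount_indices(offset, cluster_bits, refcount_bits=16):
    """Return (refcount_table_index, refcount_block_index) for a host offset."""
    cluster_size = 1 << cluster_bits
    entries = cluster_size * 8 // refcount_bits
    cluster_index = offset // cluster_size
    return cluster_index // entries, cluster_index % entries


def get_refcount(image, refcount_table_offset, offset, cluster_bits):
    """Look up the 16-bit refcount of the cluster that contains offset."""
    table_index, block_index = refcount_indices(offset, cluster_bits)
    (entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * table_index)
    block_offset = entry & REFTABLE_OFFSET_MASK
    if block_offset == 0:
        # Refcount block not allocated: every refcount it would hold is 0.
        return 0
    (refcount,) = struct.unpack_from(">H", image, block_offset + 2 * block_index)
    return refcount
```

The sketch does not check that table_index lies within the refcount table; a real reader would compare it against the table size derived from refcount_table_clusters.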
196:
197: == Cluster mapping ==
198:
199: Just as for refcounts, qcow2 uses a two-level structure for the mapping of
200: guest clusters to host clusters. These are called the L1 and L2 tables.
201:
202: The L1 table has a variable size (stored in the header) and may use multiple
203: clusters, however it must be contiguous in the image file. L2 tables are
204: exactly one cluster in size.
205:
206: Given an offset into the virtual disk, the offset into the image file can be
207: obtained as follows:
208:
209: l2_entries = (cluster_size / sizeof(uint64_t))
210:
211: l2_index = (offset / cluster_size) % l2_entries
212: l1_index = (offset / cluster_size) / l2_entries
213:
214: l2_table = load_cluster(l1_table[l1_index]);
215: cluster_offset = l2_table[l2_index];
216:
217: return cluster_offset + (offset % cluster_size)
218:
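The index arithmetic above can be checked with a small helper. This is a sketch with a hypothetical function name; it only splits a guest offset into its three parts and does not read any table.

```python
def split_guest_offset(offset, cluster_bits):
    """Return (l1_index, l2_index, offset_in_cluster) for a guest offset."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8  # each L2 entry is a 64-bit value
    cluster_index = offset // cluster_size
    return (cluster_index // l2_entries,
            cluster_index % l2_entries,
            offset % cluster_size)
```

With 64 KiB clusters (cluster_bits = 16), one L2 table holds 8192 entries and therefore maps 512 MiB of guest address space, so each L1 entry covers 512 MiB.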
219: L1 table entry:
220:
221: Bit 0 - 8: Reserved (set to 0)
222:
223: 9 - 55: Bits 9-55 of the offset into the image file at which the L2
224: table starts. Must be aligned to a cluster boundary. If the
225: offset is 0, the L2 table and all clusters described by this
226: L2 table are unallocated.
227:
228: 56 - 62: Reserved (set to 0)
229:
230: 63: 0 for an L2 table that is unused or requires COW, 1 if its
231: refcount is exactly one. This information is only accurate
232: in the active L1 table.
233:
1.1.1.3 ! root 234: L2 table entry:
1.1 root 235:
1.1.1.3 ! root 236: Bit 0 - 61: Cluster descriptor
! 237:
! 238: 62: 0 for standard clusters
! 239: 1 for compressed clusters
! 240:
! 241: 63: 0 for a cluster that is unused or requires COW, 1 if its
! 242: refcount is exactly one. This information is only accurate
! 243: in L2 tables that are reachable from the active L1
! 244: table.
! 245:
! 246: Standard Cluster Descriptor:
! 247:
! 248: Bit 0: If set to 1, the cluster reads as all zeros. The host
! 249: cluster offset can be used to describe a preallocation,
! 250: but it won't be used for reading data from this cluster,
! 251: nor is data read from the backing file if the cluster is
! 252: unallocated.
! 253:
! 254: With version 2, this is always 0.
! 255:
! 256: 1 - 8: Reserved (set to 0)
1.1 root 257:
258: 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a
259: cluster boundary. If the offset is 0, the cluster is
260: unallocated.
261:
262: 56 - 61: Reserved (set to 0)
263:
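The bit fields of L1 entries and of L2 entries holding a standard cluster descriptor can be decoded with a few masks. The following is a minimal sketch; the function names and dictionary keys are illustrative, and compressed entries are only detected, not decoded.

```python
FLAG_COPIED = 1 << 63      # refcount is exactly one
FLAG_COMPRESSED = 1 << 62  # L2 entries only
FLAG_ZERO = 1 << 0         # standard cluster descriptor, version 3 only
OFFSET_MASK = ((1 << 56) - 1) & ~0x1FF  # bits 9-55


def decode_l1_entry(entry):
    """Split an L1 entry into the L2 table offset and the refcount-one flag."""
    return {"l2_offset": entry & OFFSET_MASK,
            "copied": bool(entry & FLAG_COPIED)}


def decode_l2_entry(entry):
    """Decode an L2 entry; the descriptor is left raw for compressed clusters."""
    info = {"copied": bool(entry & FLAG_COPIED),
            "compressed": bool(entry & FLAG_COMPRESSED)}
    if info["compressed"]:
        info["descriptor"] = entry & ((1 << 62) - 1)  # bits 0-61
        return info
    info["reads_as_zero"] = bool(entry & FLAG_ZERO)
    info["host_offset"] = entry & OFFSET_MASK
    info["allocated"] = info["host_offset"] != 0
    return info
```

Note that a cluster can have the zero flag set together with a non-zero host offset, which describes a preallocation as explained above.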
264:
1.1.1.3 ! root 265: Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
1.1 root 266:
267: Bit 0 - x: Host cluster offset. This is usually _not_ aligned to a
268: cluster boundary!
269:
270: x+1 - 61: Compressed size of the images in sectors of 512 bytes
271:
272: If a cluster is unallocated, read requests shall read the data from the backing
1.1.1.3 ! root 273: file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
! 274: no backing file or the backing file is smaller than the image, they shall read
! 275: zeros for all parts that are not covered by the backing file.
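The fallback rule for unallocated clusters can be illustrated with a short helper. This is a sketch under simplifying assumptions: the backing file is modelled as an in-memory bytes object (or None if there is no backing file), and the function name is hypothetical.

```python
def read_unallocated(backing, offset, length):
    """Return length bytes for a read that hits only unallocated clusters."""
    if backing is None:
        # No backing file: everything reads as zeros.
        return bytes(length)
    chunk = backing[offset:offset + length]
    # Parts beyond the end of the backing file read as zeros.
    return chunk + bytes(length - len(chunk))
```

For example, reading 4 bytes at offset 4 from a 6-byte backing file yields its last 2 bytes followed by 2 zero bytes.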
1.1 root 276:
277:
278: == Snapshots ==
279:
280: qcow2 supports internal snapshots. Their basic principle of operation is to
281: switch the active L1 table, so that a different set of host clusters is
282: exposed to the guest.
283:
284: When creating a snapshot, the L1 table should be copied and the refcount of all
1.1.1.2 root 285: L2 tables and clusters reachable from this L1 table must be increased, so that
1.1 root 286: a write causes a COW and isn't visible in other snapshots.
287:
288: When loading a snapshot, bit 63 of all entries in the new active L1 table and
289: all L2 tables referenced by it must be reconstructed from the refcount table
290: because that bit is not required to be accurate in inactive L1 tables.
291:
292: A directory of all snapshots is stored in the snapshot table, a contiguous area
293: in the image file, whose starting offset and length are given by the header
294: fields snapshots_offset and nb_snapshots. The entries of the snapshot table
295: have variable length, depending on the length of ID, name and extra data.
296:
297: Snapshot table entry:
298:
299: Byte 0 - 7: Offset into the image file at which the L1 table for the
300: snapshot starts. Must be aligned to a cluster boundary.
301:
302: 8 - 11: Number of entries in the L1 table of the snapshot
303:
304: 12 - 13: Length of the unique ID string describing the snapshot
305:
306: 14 - 15: Length of the name of the snapshot
307:
308: 16 - 19: Time at which the snapshot was taken in seconds since the
309: Epoch
310:
311: 20 - 23: Subsecond part of the time at which the snapshot was taken
312: in nanoseconds
313:
314: 24 - 31: Time that the guest was running until the snapshot was
315: taken in nanoseconds
316:
317: 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved.
318: If there is VM state, it starts at the first cluster
319: described by the first L1 table entry that doesn't describe a
320: regular guest cluster (i.e. VM state is stored like guest
321: disk content, except that it is stored at offsets that are
322: larger than the virtual disk presented to the guest)
323:
324: 36 - 39: Size of extra data in the table entry (used for future
325: extensions of the format)
326:
1.1.1.3 ! root 327: variable: Extra data for future extensions. Unknown fields must be
! 328: ignored. Currently defined are (offset relative to snapshot
! 329: table entry):
! 330:
! 331: Byte 40 - 47: Size of the VM state in bytes. 0 if no VM
! 332: state is saved. If this field is present,
! 333: the 32-bit value in bytes 32-35 is ignored.
! 334:
! 335: Byte 48 - 55: Virtual disk size of the snapshot in bytes
! 336:
! 337: Version 3 images must include extra data at least up to
! 338: byte 55.
1.1 root 339:
340: variable: Unique ID string for the snapshot (not null terminated)
341:
342: variable: Name of the snapshot (not null terminated)
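A single snapshot table entry can be decoded as sketched below. The function name and dictionary keys are illustrative, and decoding the ID and name as UTF-8 is an assumption. Because this description does not say how consecutive entries are laid out relative to each other, the sketch parses one entry at a known position and does not try to locate the next one.

```python
import struct

# Bytes 0-39 of a snapshot table entry, big endian.
FIXED = struct.Struct(">QIHHIIQII")


def parse_snapshot_entry(buf, pos=0):
    """Decode the snapshot table entry that starts at byte pos of buf."""
    (l1_offset, l1_size, id_len, name_len, sec, nsec,
     vm_clock_ns, vm_state_size32, extra_size) = FIXED.unpack_from(buf, pos)
    snap = {"l1_table_offset": l1_offset, "l1_size": l1_size,
            "time_sec": sec, "time_nsec": nsec,
            "vm_clock_ns": vm_clock_ns, "vm_state_size": vm_state_size32}
    extra_start = pos + FIXED.size
    extra = bytes(buf[extra_start:extra_start + extra_size])
    if extra_size >= 8:
        # Bytes 40-47 override the 32-bit VM state size when present.
        snap["vm_state_size"] = struct.unpack_from(">Q", extra, 0)[0]
    if extra_size >= 16:
        # Bytes 48-55: virtual disk size of the snapshot.
        snap["disk_size"] = struct.unpack_from(">Q", extra, 8)[0]
    id_start = extra_start + extra_size
    name_start = id_start + id_len
    snap["id"] = bytes(buf[id_start:name_start]).decode("utf-8", "replace")
    snap["name"] = bytes(
        buf[name_start:name_start + name_len]).decode("utf-8", "replace")
    return snap
```

Unknown extra data beyond byte 55 is skipped by using the stored extra data size to find the ID string, which matches the rule that unknown fields must be ignored.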