1: == General ==
2:
3: A qcow2 image file is organized in units of constant size, which are called
4: (host) clusters. A cluster is the unit in which all allocations are done,
5: both for actual guest data and for image metadata.
6:
7: Likewise, the virtual disk as seen by the guest is divided into (guest)
8: clusters of the same size.
9:
10: All numbers in qcow2 are stored in Big Endian byte order.
11:
12:
13: == Header ==
14:
15: The first cluster of a qcow2 image contains the file header:
16:
17: Byte 0 - 3: magic
18: QCOW magic string ("QFI\xfb")
19:
20: 4 - 7: version
21: Version number (only valid value is 2)
22:
23: 8 - 15: backing_file_offset
24: Offset into the image file at which the backing file name
25: is stored (NB: The string is not null terminated). 0 if the
26: image doesn't have a backing file.
27:
28: 16 - 19: backing_file_size
29: Length of the backing file name in bytes. Must not be
30: longer than 1023 bytes. Undefined if the image doesn't have
31: a backing file.
32:
33: 20 - 23: cluster_bits
34: Number of bits that are used for addressing an offset
35: within a cluster (1 << cluster_bits is the cluster size).
36: Must not be less than 9 (i.e. 512 byte clusters).
37:
38: Note: qemu as of today has an implementation limit of 2 MB
39: as the maximum cluster size and won't be able to open images
40: with larger cluster sizes.
41:
42: 24 - 31: size
43: Virtual disk size in bytes
44:
45: 32 - 35: crypt_method
46: 0 for no encryption
47: 1 for AES encryption
48:
49: 36 - 39: l1_size
50: Number of entries in the active L1 table
51:
52: 40 - 47: l1_table_offset
53: Offset into the image file at which the active L1 table
54: starts. Must be aligned to a cluster boundary.
55:
56: 48 - 55: refcount_table_offset
57: Offset into the image file at which the refcount table
58: starts. Must be aligned to a cluster boundary.
59:
60: 56 - 59: refcount_table_clusters
61: Number of clusters that the refcount table occupies
62:
63: 60 - 63: nb_snapshots
64: Number of snapshots contained in the image
65:
66: 64 - 71: snapshots_offset
67: Offset into the image file at which the snapshot table
68: starts. Must be aligned to a cluster boundary.
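
The header layout above can be read with a single fixed-size unpack. The following is a minimal sketch, assuming the first 72 bytes of the image are already in memory; the names QCOW2_HEADER, Header and parse_header are illustrative and not part of the format.

```python
import struct
from collections import namedtuple

# One format character per header field, in file order; ">" selects
# big-endian byte order with no padding between fields.
QCOW2_HEADER = struct.Struct(">4sIQIIQIIQQIIQ")

Header = namedtuple("Header", [
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
])


def parse_header(buf):
    """Decode and sanity-check the 72-byte header at the start of buf."""
    if len(buf) < QCOW2_HEADER.size:
        raise ValueError("buffer shorter than the 72-byte header")
    hdr = Header._make(QCOW2_HEADER.unpack_from(buf, 0))
    if hdr.magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image (bad magic)")
    if hdr.version != 2:
        raise ValueError("unsupported version %d" % hdr.version)
    if hdr.cluster_bits < 9:
        raise ValueError("cluster_bits must be at least 9")
    return hdr
```

The cluster size then follows as 1 << hdr.cluster_bits. The sketch does not check the alignment requirements of the offset fields.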
69:
70: Directly after the image header, optional sections called header extensions can
71: be stored. Each extension has a structure like the following:
72:
73: Byte 0 - 3: Header extension type:
74: 0x00000000 - End of the header extension area
75: 0xE2792ACA - Backing file format name
76: other - Unknown header extension, can be safely
77: ignored
78:
79: 4 - 7: Length of the header extension data
80:
81: 8 - n: Header extension data
82:
83: n - m: Padding to round up the header extension size to the next
84: multiple of 8.
85:
86: The remaining space between the end of the header extension area and the end of
87: the first cluster can be used for other data. Usually, the backing file name is
88: stored there.
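
Walking the header extension area could look roughly like this. It is a sketch that assumes the extensions start at byte 72, directly after the header fields listed above, and that buf holds at least the first cluster; parse_header_extensions is a hypothetical helper name.

```python
import struct

EXT_END = 0x00000000
EXT_BACKING_FORMAT = 0xE2792ACA


def parse_header_extensions(buf, start=72):
    """Return a list of (type, data) tuples, stopping at the end marker."""
    extensions = []
    pos = start
    while pos + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, pos)
        if ext_type == EXT_END:
            break
        pos += 8
        if pos + length > len(buf):
            raise ValueError("header extension runs past the buffer")
        extensions.append((ext_type, bytes(buf[pos:pos + length])))
        # The data is padded so the next extension starts on a multiple of 8.
        pos += (length + 7) & ~7
    return extensions
```

Unknown extension types are returned like any other, so a caller can simply skip the ones it does not understand.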
89:
90:
91: == Host cluster management ==
92:
93: qcow2 manages the allocation of host clusters by maintaining a reference count
94: for each host cluster. A refcount of 0 means that the cluster is free, 1 means
95: that it is used, and >= 2 means that it is used and any write access must
96: perform a COW (copy on write) operation.
97:
98: The refcounts are managed in a two-level table. The first level is called the
99: refcount table and has a variable size (which is stored in the header). The
100: refcount table can cover multiple clusters, but it must be contiguous
101: in the image file.
102:
103: It contains pointers to the second level structures which are called refcount
104: blocks and are exactly one cluster in size.
105:
106: Given an offset into the image file, the refcount of its cluster can be obtained
107: as follows:
108:
109: refcount_block_entries = (cluster_size / sizeof(uint16_t))
110:
111: refcount_block_index = (offset / cluster_size) % refcount_block_entries
112: refcount_table_index = (offset / cluster_size) / refcount_block_entries
113:
114: refcount_block = load_cluster(refcount_table[refcount_table_index]);
115: return refcount_block[refcount_block_index];
116:
117: Refcount table entry:
118:
119: Bit 0 - 8: Reserved (set to 0)
120:
121: 9 - 63: Bits 9-63 of the offset into the image file at which the
122: refcount block starts. Must be aligned to a cluster
123: boundary.
124:
125: If this is 0, the corresponding refcount block has not yet
126: been allocated. All refcounts managed by this refcount block
127: are 0.
128:
129: Refcount block entry:
130:
131: Bit 0 - 15: Reference count of the cluster
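
Combining the lookup steps with the two entry layouts, a refcount query might be sketched as follows. The sketch assumes the whole image is available as a bytes-like object named image; get_refcount is a hypothetical helper, and it does not check the table index against refcount_table_clusters.

```python
import struct


def get_refcount(image, refcount_table_offset, cluster_bits, offset):
    """Return the refcount of the host cluster that contains offset."""
    cluster_size = 1 << cluster_bits
    entries_per_block = cluster_size // 2          # 16-bit entries

    cluster_index = offset // cluster_size
    block_index = cluster_index % entries_per_block
    table_index = cluster_index // entries_per_block

    (table_entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * table_index)
    block_offset = table_entry & ~0x1FF            # bits 0-8 are reserved
    if block_offset == 0:
        return 0                                   # block not allocated yet
    (refcount,) = struct.unpack_from(
        ">H", image, block_offset + 2 * block_index)
    return refcount
```

A result of 0 means the cluster is free, 1 means it is in use, and 2 or more means a write must copy the cluster first.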
132:
133:
134: == Cluster mapping ==
135:
136: Just as for refcounts, qcow2 uses a two-level structure for the mapping of
137: guest clusters to host clusters. These are called the L1 and L2 tables.
138:
139: The L1 table has a variable size (stored in the header) and may use multiple
140: clusters, however it must be contiguous in the image file. L2 tables are
141: exactly one cluster in size.
142:
143: Given an offset into the virtual disk, the offset into the image file can be
144: obtained as follows:
145:
146: l2_entries = (cluster_size / sizeof(uint64_t))
147:
148: l2_index = (offset / cluster_size) % l2_entries
149: l1_index = (offset / cluster_size) / l2_entries
150:
151: l2_table = load_cluster(l1_table[l1_index]);
152: cluster_offset = l2_table[l2_index];
153:
154: return cluster_offset + (offset % cluster_size)
155:
156: L1 table entry:
157:
158: Bit 0 - 8: Reserved (set to 0)
159:
160: 9 - 55: Bits 9-55 of the offset into the image file at which the L2
161: table starts. Must be aligned to a cluster boundary. If the
162: offset is 0, the L2 table and all clusters described by this
163: L2 table are unallocated.
164:
165: 56 - 62: Reserved (set to 0)
166:
167: 63: 0 for an L2 table that is unused or requires COW, 1 if its
168: refcount is exactly one. This information is only accurate
169: in the active L1 table.
170:
171: L2 table entry (for normal clusters):
172:
173: Bit 0 - 8: Reserved (set to 0)
174:
175: 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a
176: cluster boundary. If the offset is 0, the cluster is
177: unallocated.
178:
179: 56 - 61: Reserved (set to 0)
180:
181: 62: 0 (this cluster is not compressed)
182:
183: 63: 0 for a cluster that is unused or requires COW, 1 if its
184: refcount is exactly one. This information is only accurate
185: in L2 tables that are reachable from the active L1
186: table.
187:
188: L2 table entry (for compressed clusters; x = 62 - (cluster_bits - 8)):
189:
190: Bit 0 - x-1: Host cluster offset. This is usually _not_ aligned to a
191: cluster boundary!
192:
193: x - 61: Compressed size of the cluster in sectors of 512 bytes
194:
195: 62: 1 (this cluster is compressed using zlib)
196:
197: 63: 0 for a cluster that is unused or requires COW, 1 if its
198: refcount is exactly one. This information is only accurate
199: in L2 tables that are reachable from the active L1
200: table.
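
Splitting a compressed-cluster entry into its fields could be sketched as below. The sketch assumes x = 62 - (cluster_bits - 8), with the host offset in the bits below x and the size field in bits x to 61, so the size field is cluster_bits - 8 bits wide. decode_compressed_l2_entry is a hypothetical name, and the size field is returned as the raw stored value without further interpretation.

```python
def decode_compressed_l2_entry(entry, cluster_bits):
    """Split a compressed L2 entry into (host_offset, size_field, copied)."""
    if not entry & (1 << 62):
        raise ValueError("bit 62 is clear: not a compressed cluster")
    x = 62 - (cluster_bits - 8)
    host_offset = entry & ((1 << x) - 1)                     # bits 0 .. x-1
    size_field = (entry >> x) & ((1 << (cluster_bits - 8)) - 1)  # bits x .. 61
    copied = bool((entry >> 63) & 1)                         # bit 63
    return host_offset, size_field, copied
```

With 64 KB clusters (cluster_bits = 16) this gives x = 54, leaving 8 bits for the size field.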
201:
202: If a cluster is unallocated, read requests shall read the data from the backing
203: file. If there is no backing file or the backing file is smaller than the image,
204: they shall read zeros for all parts that are not covered by the backing file.
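
The mapping steps and entry layouts above can be put together as in the following sketch. It assumes the image is held in a bytes-like object and handles normal clusters only; guest_to_host is a hypothetical helper. A return value of None stands for "unallocated", in which case the caller falls back to the backing file or to zeros as described above.

```python
import struct

OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 of an L1 or normal L2 entry
COMPRESSED = 1 << 62


def guest_to_host(image, l1_table_offset, l1_size, cluster_bits, offset):
    """Map a guest offset to a host offset, or None if unallocated."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8                 # 64-bit entries
    cluster_index = offset // cluster_size
    l2_index = cluster_index % l2_entries
    l1_index = cluster_index // l2_entries
    if l1_index >= l1_size:
        raise ValueError("offset lies beyond the L1 table")

    (l1_entry,) = struct.unpack_from(
        ">Q", image, l1_table_offset + 8 * l1_index)
    l2_offset = l1_entry & OFFSET_MASK
    if l2_offset == 0:
        return None                                # no L2 table allocated

    (l2_entry,) = struct.unpack_from(">Q", image, l2_offset + 8 * l2_index)
    if l2_entry & COMPRESSED:
        raise NotImplementedError("compressed clusters are not handled here")
    cluster_offset = l2_entry & OFFSET_MASK
    if cluster_offset == 0:
        return None                                # cluster unallocated
    return cluster_offset + offset % cluster_size
```

Masking with OFFSET_MASK strips the reserved bits and the flag in bit 63 before the value is used as a file offset.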
205:
206:
207: == Snapshots ==
208:
209: qcow2 supports internal snapshots. Their basic principle of operation is to
210: switch the active L1 table, so that a different set of host clusters is
211: exposed to the guest.
212:
213: When creating a snapshot, the L1 table should be copied and the refcount of all
214: L2 tables and clusters reachable from this L1 table must be increased, so that
215: a write causes a COW and isn't visible in other snapshots.
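
The refcount bookkeeping for snapshot creation can be illustrated on a simplified in-memory model. Everything about the model is an assumption made for the example: the L1 table is a list of entries, l2_tables maps an L2 table's host offset to its list of entries, and refcounts maps host offsets to counts. Allocating space for the copied L1 table and handling compressed clusters are left out.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55
COMPRESSED = 1 << 62


def snapshot_refcounts(l1_table, l2_tables, refcounts):
    """Copy the L1 table and bump the refcount of everything it reaches."""
    snapshot_l1 = list(l1_table)
    for l1_entry in l1_table:
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue                               # no L2 table here
        refcounts[l2_offset] = refcounts.get(l2_offset, 0) + 1
        for l2_entry in l2_tables[l2_offset]:
            if l2_entry & COMPRESSED:
                raise NotImplementedError("compressed clusters not modelled")
            data_offset = l2_entry & OFFSET_MASK
            if data_offset != 0:
                refcounts[data_offset] = refcounts.get(data_offset, 0) + 1
    return snapshot_l1
```

After the call, every reachable L2 table and data cluster has a refcount of at least 2, so the next guest write to any of them has to copy first.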
216:
217: When loading a snapshot, bit 63 of all entries in the new active L1 table and
218: all L2 tables referenced by it must be reconstructed from the refcount table,
219: because this bit is not required to be accurate in inactive L1 tables.
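
Rebuilding bit 63 amounts to one refcount lookup per entry. A minimal sketch, assuming the entries are L1 entries or normal (uncompressed) L2 entries held in a list, and that refcount_of is a caller-supplied function returning the refcount for a host offset; both names are hypothetical.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55
COPIED = 1 << 63


def rebuild_copied_flags(entries, refcount_of):
    """Return the entries with bit 63 set only where the refcount is 1."""
    rebuilt = []
    for entry in entries:
        offset = entry & OFFSET_MASK
        if offset != 0 and refcount_of(offset) == 1:
            rebuilt.append(entry | COPIED)
        else:
            rebuilt.append(entry & ~COPIED)
    return rebuilt
```

The same function would be applied first to the new active L1 table and then to each L2 table it references.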
220:
221: A directory of all snapshots is stored in the snapshot table, a contiguous area
222: in the image file, whose starting offset and length are given by the header
223: fields snapshots_offset and nb_snapshots. The entries of the snapshot table
224: have variable length, depending on the length of ID, name and extra data.
225:
226: Snapshot table entry:
227:
228: Byte 0 - 7: Offset into the image file at which the L1 table for the
229: snapshot starts. Must be aligned to a cluster boundary.
230:
231: Number of entries in the L1 table of the snapshot
232:
233: 12 - 13: Length of the unique ID string describing the snapshot
234:
235: 14 - 15: Length of the name of the snapshot
236:
237: 16 - 19: Time at which the snapshot was taken in seconds since the
238: Epoch
239:
240: 20 - 23: Subsecond part of the time at which the snapshot was taken
241: in nanoseconds
242:
243: 24 - 31: Time that the guest was running until the snapshot was
244: taken in nanoseconds
245:
246: 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved.
247: If there is VM state, it starts at the first cluster
248: described by first L1 table entry that doesn't describe a
249: regular guest cluster (i.e. VM state is stored like guest
250: disk content, except that it is stored at offsets that are
251: larger than the virtual disk presented to the guest)
252:
253: 36 - 39: Size of extra data in the table entry (used for future
254: extensions of the format)
255:
256: variable: Extra data for future extensions. Must be ignored.
257:
258: variable: Unique ID string for the snapshot (not null terminated)
259:
260: variable: Name of the snapshot (not null terminated)
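
Decoding one snapshot table entry could be sketched as follows. The field names in Snapshot and the helper parse_snapshot_entry are illustrative. How consecutive entries are aligned is not stated in this section, so the sketch parses a single entry and reports where its last field ends.

```python
import struct
from collections import namedtuple

SNAPSHOT_FIXED = struct.Struct(">QIHHIIQII")   # bytes 0-39 of an entry

Snapshot = namedtuple("Snapshot", [
    "l1_table_offset", "l1_size", "date_sec", "date_nsec",
    "vm_clock_nsec", "vm_state_size", "id", "name",
])


def parse_snapshot_entry(buf, pos=0):
    """Return (Snapshot, end_position) for the entry starting at pos."""
    (l1_offset, l1_size, id_len, name_len, sec, nsec,
     vm_clock, vm_state_size, extra_size) = SNAPSHOT_FIXED.unpack_from(buf, pos)
    pos += SNAPSHOT_FIXED.size
    if pos + extra_size + id_len + name_len > len(buf):
        raise ValueError("snapshot entry runs past the buffer")
    pos += extra_size                          # extra data must be ignored
    snap_id = bytes(buf[pos:pos + id_len])     # not null terminated
    pos += id_len
    name = bytes(buf[pos:pos + name_len])      # not null terminated
    pos += name_len
    snap = Snapshot(l1_offset, l1_size, sec, nsec,
                    vm_clock, vm_state_size, snap_id, name)
    return snap, pos
```

The ID and name are returned as raw bytes, since this section does not specify a character encoding for them.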