Annotation of qemu/docs/specs/qcow2.txt, revision 1.1.1.3

1.1       root        1: == General ==
                      2: 
                      3: A qcow2 image file is organized in units of constant size, which are called
                      4: (host) clusters. A cluster is the unit in which all allocations are done,
                      5: both for actual guest data and for image metadata.
                      6: 
                      7: Likewise, the virtual disk as seen by the guest is divided into (guest)
                      8: clusters of the same size.
                      9: 
                     10: All numbers in qcow2 are stored in Big Endian byte order.
                     11: 
                     12: 
                     13: == Header ==
                     14: 
                     15: The first cluster of a qcow2 image contains the file header:
                     16: 
                     17:     Byte  0 -  3:   magic
                     18:                     QCOW magic string ("QFI\xfb")
                     19: 
                     20:           4 -  7:   version
1.1.1.3 ! root       21:                     Version number (valid values are 2 and 3)
1.1       root       22: 
                     23:           8 - 15:   backing_file_offset
                     24:                     Offset into the image file at which the backing file name
                     25:                     is stored (NB: The string is not null terminated). 0 if the
                     26:                     image doesn't have a backing file.
                     27: 
                     28:          16 - 19:   backing_file_size
                     29:                     Length of the backing file name in bytes. Must not be
                     30:                     longer than 1023 bytes. Undefined if the image doesn't have
                     31:                     a backing file.
                     32: 
                     33:          20 - 23:   cluster_bits
                     34:                     Number of bits that are used for addressing an offset
                     35:                     within a cluster (1 << cluster_bits is the cluster size).
                     36:                     Must not be less than 9 (i.e. 512 byte clusters).
                     37: 
                     38:                     Note: qemu as of today has an implementation limit of 2 MB
                     39:                     as the maximum cluster size and won't be able to open images
                     40:                     with larger cluster sizes.
                     41: 
                     42:          24 - 31:   size
                     43:                     Virtual disk size in bytes
                     44: 
                     45:          32 - 35:   crypt_method
                     46:                     0 for no encryption
                     47:                     1 for AES encryption
                     48: 
                     49:          36 - 39:   l1_size
                     50:                     Number of entries in the active L1 table
                     51: 
                     52:          40 - 47:   l1_table_offset
                     53:                     Offset into the image file at which the active L1 table
                     54:                     starts. Must be aligned to a cluster boundary.
                     55: 
                     56:          48 - 55:   refcount_table_offset
                     57:                     Offset into the image file at which the refcount table
                     58:                     starts. Must be aligned to a cluster boundary.
                     59: 
                     60:          56 - 59:   refcount_table_clusters
                     61:                     Number of clusters that the refcount table occupies
                     62: 
                     63:          60 - 63:   nb_snapshots
                     64:                     Number of snapshots contained in the image
                     65: 
                     66:          64 - 71:   snapshots_offset
                     67:                     Offset into the image file at which the snapshot table
                     68:                     starts. Must be aligned to a cluster boundary.
                     69: 
1.1.1.3 ! root       70: If the version is 3 or higher, the header has the following additional fields.
        !            71: For version 2, the values are assumed to be zero, unless specified otherwise
        !            72: in the description of a field.
        !            73: 
        !            74:          72 -  79:  incompatible_features
        !            75:                     Bitmask of incompatible features. An implementation must
        !            76:                     fail to open an image if an unknown bit is set.
        !            77: 
        !            78:                     Bits 0-63:  Reserved (set to 0)
        !            79: 
        !            80:          80 -  87:  compatible_features
        !            81:                     Bitmask of compatible features. An implementation can
        !            82:                     safely ignore any unknown bits that are set.
        !            83: 
        !            84:                     Bits 0-63:  Reserved (set to 0)
        !            85: 
        !            86:          88 -  95:  autoclear_features
        !            87:                     Bitmask of auto-clear features. An implementation may only
        !            88:                     write to an image with unknown auto-clear features if it
        !            89:                     clears the respective bits from this field first.
        !            90: 
        !            91:                     Bits 0-63:  Reserved (set to 0)
        !            92: 
        !            93:          96 -  99:  refcount_order
        !            94:                     Describes the width of a reference count block entry (width
        !            95:                     in bits = 1 << refcount_order). For version 2 images, the
        !            96:                     order is always assumed to be 4 (i.e. the width is 16 bits).
        !            97: 
        !            98:         100 - 103:  header_length
        !            99:                     Length of the header structure in bytes. For version 2
        !           100:                     images, the length is always assumed to be 72 bytes.
        !           101: 
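As an illustration of the header layout above, the following Python sketch unpacks the fields with the standard struct module. The function name parse_header, the dictionary keys and the derived cluster_size key are illustrative choices and not part of the format; the version 2 defaults are the ones stated in the field descriptions.

```python
import struct

# Bytes 0-71: fields common to versions 2 and 3 (big endian, no padding).
V2_FIELDS = ("magic", "version", "backing_file_offset", "backing_file_size",
             "cluster_bits", "size", "crypt_method", "l1_size",
             "l1_table_offset", "refcount_table_offset",
             "refcount_table_clusters", "nb_snapshots", "snapshots_offset")
V2_STRUCT = struct.Struct(">4sIQIIQIIQQIIQ")

# Bytes 72-103: additional fields for version 3.
V3_FIELDS = ("incompatible_features", "compatible_features",
             "autoclear_features", "refcount_order", "header_length")
V3_STRUCT = struct.Struct(">QQQII")


def parse_header(buf):
    """Return the header fields of a qcow2 image as a dict (hypothetical helper)."""
    if len(buf) < V2_STRUCT.size:
        raise ValueError("buffer too short for a qcow2 header")
    hdr = dict(zip(V2_FIELDS, V2_STRUCT.unpack_from(buf, 0)))
    if hdr["magic"] != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    if hdr["version"] not in (2, 3):
        raise ValueError("unsupported version %d" % hdr["version"])
    if hdr["cluster_bits"] < 9:
        raise ValueError("cluster_bits must not be less than 9")
    if hdr["version"] >= 3:
        if len(buf) < V2_STRUCT.size + V3_STRUCT.size:
            raise ValueError("buffer too short for a version 3 header")
        hdr.update(zip(V3_FIELDS, V3_STRUCT.unpack_from(buf, V2_STRUCT.size)))
    else:
        # Version 2: feature bitmasks are zero, refcount_order is 4 and
        # header_length is 72, as stated in the field descriptions.
        hdr.update(incompatible_features=0, compatible_features=0,
                   autoclear_features=0, refcount_order=4, header_length=72)
    hdr["cluster_size"] = 1 << hdr["cluster_bits"]  # derived, for convenience
    return hdr
```

A caller would typically read at least the first 104 bytes of the image file and pass them to parse_header.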
1.1       root      102: Directly after the image header, optional sections called header extensions can
                    103: be stored. Each extension has a structure like the following:
                    104: 
                    105:     Byte  0 -  3:   Header extension type:
                    106:                         0x00000000 - End of the header extension area
                    107:                         0xE2792ACA - Backing file format name
1.1.1.3 ! root      108:                         0x6803f857 - Feature name table
1.1       root      109:                         other      - Unknown header extension, can be safely
                    110:                                      ignored
                    111: 
                    112:           4 -  7:   Length of the header extension data
                    113: 
                    114:           8 -  n:   Header extension data
                    115: 
                    116:           n -  m:   Padding to round up the header extension size to the next
                    117:                     multiple of 8.
                    118: 
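The extension structure above can be walked with a short loop. This Python sketch is only an illustration: the name iter_header_extensions and its arguments are made up here, and the caller is assumed to pass the offset directly after the image header (header_length, or 72 for version 2 images).

```python
import struct


def iter_header_extensions(buf, start):
    """Yield (extension_type, data) pairs found from byte offset `start` on."""
    pos = start
    while pos + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, pos)
        if ext_type == 0x00000000:      # end of the header extension area
            return
        pos += 8
        data = bytes(buf[pos:pos + length])
        if len(data) < length:
            raise ValueError("truncated header extension")
        yield ext_type, data
        # Skip the data plus the padding that rounds it up to a multiple of 8.
        pos += (length + 7) & ~7
```

Unknown extension types are simply yielded like any other, so the caller can ignore them as the format allows.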
1.1.1.3 ! root      119: Unless stated otherwise, each header extension type shall appear at most once
        !           120: in the same image.
        !           121: 
1.1       root      122: The remaining space between the end of the header extension area and the end of
1.1.1.3 ! root      123: the first cluster can be used for the backing file name. It is not allowed to
        !           124: store other data here, so that an implementation can safely modify the header
        !           125: and add extensions without harming data of compatible features that it
        !           126: doesn't support. Compatible features that need space for additional data can
        !           127: use a header extension.
        !           128: 
        !           129: 
        !           130: == Feature name table ==
        !           131: 
        !           132: The feature name table is an optional header extension that contains the names
        !           133: of features used by the image. It can be used by applications that don't know
        !           134: the respective feature (e.g. because the feature was introduced only later) to
        !           135: display a useful error message.
        !           136: 
        !           137: The number of entries in the feature name table is determined by the length of
        !           138: the header extension data. Each entry looks like this:
        !           139: 
        !           140:     Byte       0:   Type of feature (select feature bitmap)
        !           141:                         0: Incompatible feature
        !           142:                         1: Compatible feature
        !           143:                         2: Autoclear feature
        !           144: 
        !           145:                1:   Bit number within the selected feature bitmap (valid
        !           146:                     values: 0-63)
        !           147: 
        !           148:           2 - 47:   Feature name (padded with zeros, but not necessarily null
        !           149:                     terminated if it has full length)
1.1       root      150: 
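A minimal sketch of decoding the feature name table, given the data of the 0x6803f857 header extension. The function name and the returned tuple shape are illustrative. Decoding the name as ASCII with replacement of invalid bytes is an assumption, since the text above does not state an encoding.

```python
FEATURE_TYPES = {0: "incompatible", 1: "compatible", 2: "autoclear"}
ENTRY_SIZE = 48  # 1 byte type + 1 byte bit number + 46 bytes name


def parse_feature_name_table(data):
    """Return a list of (feature_type, bit_number, name) tuples."""
    entries = []
    # A trailing partial entry, if any, is ignored.
    for pos in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = data[pos:pos + ENTRY_SIZE]
        feature_type = FEATURE_TYPES.get(entry[0], "unknown")
        bit_number = entry[1]
        # The name is zero padded, but has no terminator at full length.
        raw_name = entry[2:ENTRY_SIZE].split(b"\0", 1)[0]
        # Assumption: ASCII; the encoding is not specified above.
        entries.append((feature_type, bit_number,
                        raw_name.decode("ascii", errors="replace")))
    return entries
```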
                    151: 
                    152: == Host cluster management ==
                    153: 
                    154: qcow2 manages the allocation of host clusters by maintaining a reference count
                    155: for each host cluster. A refcount of 0 means that the cluster is free, 1 means
                    156: that it is used, and >= 2 means that it is used and any write access must
                    157: perform a COW (copy on write) operation.
                    158: 
                    159: The refcounts are managed in a two-level table. The first level is called
                    160: refcount table and has a variable size (which is stored in the header). The
                    161: refcount table can cover multiple clusters, however it needs to be contiguous
                    162: in the image file.
                    163: 
                    164: It contains pointers to the second level structures which are called refcount
                    165: blocks and are exactly one cluster in size.
                    166: 
                    167: Given an offset into the image file, the refcount of its cluster can be obtained
                    168: as follows:
                    169: 
                    170:     refcount_block_entries = (cluster_size / sizeof(uint16_t))
                    171: 
1.1.1.2   root      172:     refcount_block_index = (offset / cluster_size) % refcount_block_entries
                    173:     refcount_table_index = (offset / cluster_size) / refcount_block_entries
1.1       root      174: 
                    175:     refcount_block = load_cluster(refcount_table[refcount_table_index]);
                    176:     return refcount_block[refcount_block_index];
                    177: 
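The lookup above can be written out as runnable Python. This is a sketch under stated assumptions: the image is held in memory as a bytes-like object, refcount entries are 16 bits wide (refcount_order 4, matching the sizeof(uint16_t) in the pseudocode), and a cluster that lies beyond the refcount table is treated as having refcount 0, which the text does not state explicitly. The function name and parameters are illustrative.

```python
import struct

REFTABLE_OFFSET_MASK = 0xFFFFFFFFFFFFFE00  # bits 9-63 of a refcount table entry


def get_refcount(image, offset, cluster_bits,
                 refcount_table_offset, refcount_table_clusters):
    """Return the refcount of the host cluster containing `offset`."""
    cluster_size = 1 << cluster_bits
    refcount_block_entries = cluster_size // 2   # assumes 16-bit entries

    cluster_index = offset // cluster_size
    refcount_block_index = cluster_index % refcount_block_entries
    refcount_table_index = cluster_index // refcount_block_entries

    table_entries = refcount_table_clusters * cluster_size // 8
    if refcount_table_index >= table_entries:
        return 0  # assumption: not covered by the table means free

    (entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * refcount_table_index)
    block_offset = entry & REFTABLE_OFFSET_MASK
    if block_offset == 0:
        return 0  # refcount block not yet allocated: all refcounts are 0

    (refcount,) = struct.unpack_from(
        ">H", image, block_offset + 2 * refcount_block_index)
    return refcount
```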
                    178: Refcount table entry:
                    179: 
                    180:     Bit  0 -  8:    Reserved (set to 0)
                    181: 
                    182:          9 - 63:    Bits 9-63 of the offset into the image file at which the
                    183:                     refcount block starts. Must be aligned to a cluster
                    184:                     boundary.
                    185: 
                    186:                     If this is 0, the corresponding refcount block has not yet
                    187:                     been allocated. All refcounts managed by this refcount block
                    188:                     are 0.
                    189: 
1.1.1.3 ! root      190: Refcount block entry (x = refcount_bits - 1, refcount_bits = 1 << refcount_order):
1.1       root      191: 
1.1.1.3 ! root      192:     Bit  0 -  x:    Reference count of the cluster. If refcount_bits implies a
        !           193:                     sub-byte width, note that bit 0 means the least significant
        !           194:                     bit in this context.
1.1       root      195: 
                    196: 
                    197: == Cluster mapping ==
                    198: 
                    199: Just as for refcounts, qcow2 uses a two-level structure for the mapping of
                    200: guest clusters to host clusters. They are called L1 and L2 tables.
                    201: 
                    202: The L1 table has a variable size (stored in the header) and may use multiple
                    203: clusters, however it must be contiguous in the image file. L2 tables are
                    204: exactly one cluster in size.
                    205: 
                    206: Given an offset into the virtual disk, the offset into the image file can be
                    207: obtained as follows:
                    208: 
                    209:     l2_entries = (cluster_size / sizeof(uint64_t))
                    210: 
                    211:     l2_index = (offset / cluster_size) % l2_entries
                    212:     l1_index = (offset / cluster_size) / l2_entries
                    213: 
                    214:     l2_table = load_cluster(l1_table[l1_index]);
                    215:     cluster_offset = l2_table[l2_index];
                    216: 
                    217:     return cluster_offset + (offset % cluster_size)
                    218: 
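For illustration, the mapping above as a runnable Python sketch that also applies the offset masks from the entry layouts described in this section. It assumes the image is held in memory as a bytes-like object, returns None for unallocated clusters, does not interpret the zero flag, and does not handle compressed clusters. The function name, the parameters and the choice to raise for offsets beyond the L1 table are illustrative.

```python
import struct

OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 of an L1 or standard L2 entry
L2_COMPRESSED = 1 << 62            # bit 62 of an L2 entry


def guest_to_host(image, offset, cluster_bits, l1_table_offset, l1_size):
    """Map a guest offset to an offset into the image file, or None."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8          # 64-bit entries

    l2_index = (offset // cluster_size) % l2_entries
    l1_index = (offset // cluster_size) // l2_entries
    if l1_index >= l1_size:
        raise ValueError("offset is beyond the L1 table")

    (l1_entry,) = struct.unpack_from(">Q", image, l1_table_offset + 8 * l1_index)
    l2_table_offset = l1_entry & OFFSET_MASK
    if l2_table_offset == 0:
        return None                          # L2 table unallocated

    (l2_entry,) = struct.unpack_from(">Q", image, l2_table_offset + 8 * l2_index)
    if l2_entry & L2_COMPRESSED:
        raise NotImplementedError("compressed clusters are not handled here")
    cluster_offset = l2_entry & OFFSET_MASK
    if cluster_offset == 0:
        return None                          # cluster unallocated

    return cluster_offset + (offset % cluster_size)
```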
                    219: L1 table entry:
                    220: 
                    221:     Bit  0 -  8:    Reserved (set to 0)
                    222: 
                    223:          9 - 55:    Bits 9-55 of the offset into the image file at which the L2
                    224:                     table starts. Must be aligned to a cluster boundary. If the
                    225:                     offset is 0, the L2 table and all clusters described by this
                    226:                     L2 table are unallocated.
                    227: 
                    228:         56 - 62:    Reserved (set to 0)
                    229: 
                    230:              63:    0 for an L2 table that is unused or requires COW, 1 if its
                    231:                     refcount is exactly one. This information is only accurate
                    232:                     in the active L1 table.
                    233: 
1.1.1.3 ! root      234: L2 table entry:
1.1       root      235: 
1.1.1.3 ! root      236:     Bit  0 -  61:   Cluster descriptor
        !           237: 
        !           238:               62:   0 for standard clusters
        !           239:                     1 for compressed clusters
        !           240: 
        !           241:               63:   0 for a cluster that is unused or requires COW, 1 if its
        !           242:                     refcount is exactly one. This information is only accurate
        !           243:                     in L2 tables that are reachable from the active L1
        !           244:                     table.
        !           245: 
        !           246: Standard Cluster Descriptor:
        !           247: 
        !           248:     Bit       0:    If set to 1, the cluster reads as all zeros. The host
        !           249:                     cluster offset can be used to describe a preallocation,
        !           250:                     but it won't be used for reading data from this cluster,
        !           251:                     nor is data read from the backing file if the cluster is
        !           252:                     unallocated.
        !           253: 
        !           254:                     With version 2, this is always 0.
        !           255: 
        !           256:          1 -  8:    Reserved (set to 0)
1.1       root      257: 
                    258:          9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
                    259:                     cluster boundary. If the offset is 0, the cluster is
                    260:                     unallocated.
                    261: 
                    262:         56 - 61:    Reserved (set to 0)
                    263: 
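The bit fields of an L2 table entry and of the Standard Cluster Descriptor can be picked apart as follows. This Python sketch is illustrative only: the function name and dictionary keys are made up here, and the descriptor of a compressed cluster is returned undecoded.

```python
def decode_l2_entry(entry):
    """Split a 64-bit L2 table entry into its documented fields."""
    info = {
        "refcount_is_one": bool((entry >> 63) & 1),   # bit 63
        "compressed": bool((entry >> 62) & 1),        # bit 62
    }
    descriptor = entry & ((1 << 62) - 1)              # bits 0-61
    if info["compressed"]:
        info["descriptor"] = descriptor               # left undecoded here
    else:
        host_offset = descriptor & 0x00FFFFFFFFFFFE00  # bits 9-55
        info["reads_as_zeros"] = bool(descriptor & 1)  # bit 0 (always 0 in v2)
        info["host_offset"] = host_offset
        info["allocated"] = host_offset != 0
    return info
```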
                    264: 
1.1.1.3 ! root      265: Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
1.1       root      266: 
                    267:     Bit  0 -  x:    Host cluster offset. This is usually _not_ aligned to a
                    268:                     cluster boundary!
                    269: 
                    270:        x+1 - 61:    Compressed size of the images in sectors of 512 bytes
                    271: 
                    272: If a cluster is unallocated, read requests shall read the data from the backing
1.1.1.3 ! root      273: file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
        !           274: no backing file or the backing file is smaller than the image, they shall read
        !           275: zeros for all parts that are not covered by the backing file.
1.1       root      276: 
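The read rule for unallocated clusters without the zero flag can be sketched as below. For simplicity the backing file is assumed to be available as an in-memory bytes object, or None if the image has no backing file; the function name is illustrative.

```python
def read_unallocated(backing, offset, length):
    """Return `length` bytes at guest `offset` for an unallocated cluster."""
    if backing is None:
        return bytes(length)                 # no backing file: all zeros
    data = bytes(backing[offset:offset + length])
    # Parts not covered by the backing file read as zeros.
    return data + bytes(length - len(data))
```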
                    277: 
                    278: == Snapshots ==
                    279: 
                    280: qcow2 supports internal snapshots. Their basic principle of operation is to
                    281: switch the active L1 table, so that a different set of host clusters is
                    282: exposed to the guest.
                    283: 
                    284: When creating a snapshot, the L1 table should be copied and the refcount of all
1.1.1.2   root      285: L2 tables and clusters reachable from this L1 table must be increased, so that
1.1       root      286: a write causes a COW and isn't visible in other snapshots.
                    287: 
                    288: When loading a snapshot, bit 63 of all entries in the new active L1 table and
                    289: all L2 tables referenced by it must be reconstructed from the refcount table
                    290: as it doesn't need to be accurate in inactive L1 tables.
                    291: 
                    292: A directory of all snapshots is stored in the snapshot table, a contiguous area
                    293: in the image file, whose starting offset and length are given by the header
                    294: fields snapshots_offset and nb_snapshots. The entries of the snapshot table
                    295: have variable length, depending on the length of ID, name and extra data.
                    296: 
                    297: Snapshot table entry:
                    298: 
                    299:     Byte 0 -  7:    Offset into the image file at which the L1 table for the
                    300:                     snapshot starts. Must be aligned to a cluster boundary.
                    301: 
                    302:          8 - 11:    Number of entries in the L1 table of the snapshot
                    303: 
                    304:         12 - 13:    Length of the unique ID string describing the snapshot
                    305: 
                    306:         14 - 15:    Length of the name of the snapshot
                    307: 
                    308:         16 - 19:    Time at which the snapshot was taken in seconds since the
                    309:                     Epoch
                    310: 
                    311:         20 - 23:    Subsecond part of the time at which the snapshot was taken
                    312:                     in nanoseconds
                    313: 
                    314:         24 - 31:    Time that the guest was running until the snapshot was
                    315:                     taken in nanoseconds
                    316: 
                    317:         32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
                    318:                     If there is VM state, it starts at the first cluster
                    319:                     described by the first L1 table entry that doesn't describe a
                    320:                     regular guest cluster (i.e. VM state is stored like guest
                    321:                     disk content, except that it is stored at offsets that are
                    322:                     larger than the virtual disk presented to the guest)
                    323: 
                    324:         36 - 39:    Size of extra data in the table entry (used for future
                    325:                     extensions of the format)
                    326: 
1.1.1.3 ! root      327:         variable:   Extra data for future extensions. Unknown fields must be
        !           328:                     ignored. Currently defined are (offset relative to snapshot
        !           329:                     table entry):
        !           330: 
        !           331:                     Byte 40 - 47:   Size of the VM state in bytes. 0 if no VM
        !           332:                                     state is saved. If this field is present,
        !           333:                                     the 32-bit value in bytes 32-35 is ignored.
        !           334: 
        !           335:                     Byte 48 - 55:   Virtual disk size of the snapshot in bytes
        !           336: 
        !           337:                     Version 3 images must include extra data at least up to
        !           338:                     byte 55.
1.1       root      339: 
                    340:         variable:   Unique ID string for the snapshot (not null terminated)
                    341: 
                    342:         variable:   Name of the snapshot (not null terminated)
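
A sketch of decoding one snapshot table entry in Python. The function name and dictionary keys are illustrative. ID and name are returned as raw bytes because no encoding is stated above. The sketch parses a single entry and returns the position directly after the name; how consecutive entries are laid out relative to each other is not spelled out above, so walking the whole table is left to the caller.

```python
import struct

SNAPSHOT_FIXED = struct.Struct(">QIHHIIQII")   # bytes 0-39 of an entry


def parse_snapshot_entry(buf, pos=0):
    """Return (snapshot_dict, position_after_name) for the entry at `pos`."""
    (l1_table_offset, l1_size, id_len, name_len, date_sec, date_nsec,
     vm_clock_nsec, vm_state_size, extra_size) = SNAPSHOT_FIXED.unpack_from(buf, pos)
    p = pos + SNAPSHOT_FIXED.size

    extra = bytes(buf[p:p + extra_size])
    p += extra_size
    disk_size = None
    if len(extra) >= 8:    # bytes 40-47 override the 32-bit VM state size
        (vm_state_size,) = struct.unpack_from(">Q", extra, 0)
    if len(extra) >= 16:   # bytes 48-55: virtual disk size of the snapshot
        (disk_size,) = struct.unpack_from(">Q", extra, 8)

    snapshot_id = bytes(buf[p:p + id_len])
    p += id_len
    name = bytes(buf[p:p + name_len])
    p += name_len

    snap = {"l1_table_offset": l1_table_offset, "l1_size": l1_size,
            "date_sec": date_sec, "date_nsec": date_nsec,
            "vm_clock_nsec": vm_clock_nsec, "vm_state_size": vm_state_size,
            "disk_size": disk_size, "id": snapshot_id, "name": name}
    return snap, p
```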
