The memory API
==============

The memory API models the memory and I/O buses and controllers of a QEMU
machine. It attempts to allow modelling of:

- ordinary RAM
- memory-mapped I/O (MMIO)
- memory controllers that can dynamically reroute physical memory regions
  to different destinations

The memory model provides support for:

- tracking RAM changes by the guest
- setting up coalesced memory for KVM
- setting up ioeventfd regions for KVM

Memory is modelled as a tree (really an acyclic graph) of MemoryRegion
objects. The root of the tree is memory as seen from the CPU's viewpoint
(the system bus). Nodes in the tree represent other buses, memory
controllers, and memory regions that have been rerouted. Leaves are RAM and
MMIO regions.

Types of regions
----------------

There are four types of memory regions (all represented by a single C type,
MemoryRegion):

- RAM: a RAM region is simply a range of host memory that can be made
  available to the guest.

- MMIO: a range of guest memory that is implemented by host callbacks;
  each read or write causes a callback to be called on the host.

- container: a container simply includes other memory regions, each at
  a different offset. Containers are useful for grouping several regions
  into one unit. For example, a PCI BAR may be composed of a RAM region
  and an MMIO region.

  A container's subregions are usually non-overlapping. In some cases it is
  useful to have overlapping regions; for example, a memory controller that
  can overlay a subregion of RAM with MMIO or ROM, or a PCI controller
  that does not prevent cards from claiming overlapping BARs.

- alias: a subsection of another region. Aliases allow a region to be
  split apart into discontiguous regions. Examples of uses are memory banks
  used when the guest address space is smaller than the amount of RAM
  addressed, or a memory controller that splits main memory to expose a "PCI
  hole". Aliases may point to any type of region, including other aliases,
  but an alias may not point back to itself, directly or indirectly.


Region names
------------

Regions are assigned names by the constructor. For most regions these are
only used for debugging purposes, but RAM regions also use the name to identify
live migration sections. This means that RAM region names need to have ABI
stability.

Region lifecycle
----------------

A region is created by one of the constructor functions (memory_region_init*())
and destroyed by the destructor (memory_region_destroy()). In between,
a region can be added to an address space by using memory_region_add_subregion()
and removed using memory_region_del_subregion(). Region attributes may be
changed at any point; they take effect once the region becomes exposed to the
guest.

Overlapping regions and priority
--------------------------------
Usually, regions may not overlap each other; a memory address decodes into
exactly one target. In some cases it is useful to allow regions to overlap,
and sometimes to control which of the overlapping regions is visible to the
guest. This is done with memory_region_add_subregion_overlap(), which
allows the region to overlap any other region in the same container, and
specifies a priority that allows the core to decide which of two regions at
the same address is visible (the highest priority wins).

Visibility
----------
The memory core uses the following rules to select a memory region when the
guest accesses an address:

- all direct subregions of the root region are matched against the address, in
  descending priority order
  - if the address lies outside the region offset/size, the subregion is
    discarded
  - if the subregion is a leaf (RAM or MMIO), the search terminates
  - if the subregion is a container, the same algorithm is used within the
    subregion (after the address is adjusted by the subregion offset)
  - if the subregion is an alias, the search continues at the alias target
    (after the address is adjusted by the subregion offset and alias offset)
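
The selection rules above can be sketched in plain C. This is an illustrative
model only: struct region, resolve(), and all field names are hypothetical and
are not QEMU's MemoryRegion definition. The sketch assumes that each container
keeps its subregions sorted by descending priority, and that a container with
no matching child lets lower-priority siblings show through; neither
assumption is stated in the rules themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MemoryRegion; not QEMU's real definition. */
enum region_kind { REGION_RAM, REGION_MMIO, REGION_CONTAINER, REGION_ALIAS };

struct region {
    const char *name;
    enum region_kind kind;
    uint64_t offset;              /* offset within the parent container */
    uint64_t size;                /* size in bytes */
    int priority;                 /* informational; list order decides */
    struct region *alias_target;  /* REGION_ALIAS only */
    uint64_t alias_offset;        /* REGION_ALIAS only: offset into target */
    struct region **subregions;   /* REGION_CONTAINER only; assumed sorted
                                     by descending priority */
    size_t nr_subregions;
};

/*
 * Resolve 'addr', given relative to the start of region 'r'.
 * Returns the leaf (RAM or MMIO) region that handles the access and stores
 * the offset within that leaf in '*off', or returns NULL if nothing matches.
 */
const struct region *resolve(const struct region *r, uint64_t addr,
                             uint64_t *off)
{
    assert(off != NULL);

    if (addr >= r->size) {
        return NULL;              /* outside this region: discard */
    }

    switch (r->kind) {
    case REGION_RAM:
    case REGION_MMIO:
        *off = addr;              /* leaf: the search terminates */
        return r;
    case REGION_ALIAS:
        /* continue at the target, adjusted by the alias offset */
        return resolve(r->alias_target, addr + r->alias_offset, off);
    case REGION_CONTAINER:
        for (size_t i = 0; i < r->nr_subregions; i++) {
            const struct region *sub = r->subregions[i];

            if (addr < sub->offset || addr - sub->offset >= sub->size) {
                continue;         /* address not covered by this subregion */
            }
            /* recurse with the address adjusted by the subregion offset */
            const struct region *hit = resolve(sub, addr - sub->offset, off);
            if (hit != NULL) {
                return hit;
            }
            /* assumption: an empty match falls through to the next sibling */
        }
        return NULL;
    }
    return NULL;
}
```

Calling resolve() on the root container with a guest physical address yields
the leaf region and the offset inside it, which is the pair a dispatcher would
need to perform the RAM access or invoke an MMIO callback.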

Example memory map
------------------

system_memory: container@0-2^48-1
 |
 +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff)
 |
 +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff)
 |
 +---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff)
 |      (prio 1)
 |
 +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff)

pci (0-2^32-1)
 |
 +--- vga-area: container@0xa0000-0xbffff
 |      |
 |      +--- alias@0x00000-0x7fff  ---> #vram (0x010000-0x017fff)
 |      |
 |      +--- alias@0x08000-0xffff  ---> #vram (0x020000-0x027fff)
 |
 +---- vram: ram@0xe1000000-0xe1ffffff
 |
 +---- vga-mmio: mmio@0xe2000000-0xe200ffff

ram: ram@0x00000000-0xffffffff

This is a (simplified) PC memory map. The 4GB RAM block is mapped into the
system address space via two aliases: "lomem" is a 1:1 mapping of the first
3.5GB; "himem" maps the last 0.5GB at address 4GB. This leaves 0.5GB for the
so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with
4GB of memory.

The memory controller diverts addresses in the range 640K-768K to the PCI
address space. This is modelled using the "vga-window" alias, mapped at a
higher priority so it obscures the RAM at the same addresses. The vga window
can be removed by programming the memory controller; this is modelled by
removing the alias and exposing the RAM underneath.

The pci address space is not a direct child of the system address space, since
we only want parts of it to be visible (we accomplish this using aliases).
It has two subregions: vga-area models the legacy vga window and is occupied
by two 32K memory banks pointing at two sections of the framebuffer.
In addition, the vram is mapped as a BAR at address 0xe1000000, and an
additional BAR containing MMIO registers is mapped after it.

Note that if the guest maps a BAR outside the PCI hole, it will not be
visible, as the pci-hole alias clips it to a 0.5GB range.
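
The clipping effect of the pci-hole alias amounts to an interval
intersection, sketched below. The function name pci_hole_visible_bytes() and
the two constants are hypothetical helpers for this illustration; the window
bounds come from the example map. The sketch considers only the pci-hole
alias and ignores the separate "vga-window" alias.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the pci-hole alias from the example map (end is exclusive). */
#define PCI_HOLE_BASE 0xe0000000ULL
#define PCI_HOLE_END  0x100000000ULL

/*
 * Number of bytes of a BAR placed at [bar_base, bar_base + bar_size) in the
 * pci address space that the guest can reach through the pci-hole alias.
 */
uint64_t pci_hole_visible_bytes(uint64_t bar_base, uint64_t bar_size)
{
    /* the example pci address space covers 0 .. 2^32-1 */
    assert(bar_base + bar_size <= 0x100000000ULL);

    uint64_t lo = bar_base > PCI_HOLE_BASE ? bar_base : PCI_HOLE_BASE;
    uint64_t hi = bar_base + bar_size < PCI_HOLE_END ? bar_base + bar_size
                                                     : PCI_HOLE_END;

    return hi > lo ? hi - lo : 0;   /* empty intersection: nothing visible */
}
```

With the values from the map, the 16MB vram BAR at 0xe1000000 is fully
visible, whereas the same BAR programmed to 0xd0000000 would be invisible
through this alias.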

Attributes
----------

Various region attributes (read-only, dirty logging, coalesced mmio, ioeventfd)
can be changed during the region lifecycle. They take effect once the region
is made visible (which can be immediately, later, or never).

MMIO Operations
---------------

MMIO regions are provided with ->read() and ->write() callbacks; in addition
various constraints can be supplied to control how these callbacks are called:

- .valid.min_access_size, .valid.max_access_size define the access sizes
  (in bytes) which the device accepts; accesses outside this range will
  have device and bus specific behaviour (ignored, or machine check)
- .valid.aligned specifies that the device only accepts naturally aligned
  accesses. Unaligned accesses invoke device and bus specific behaviour.
- .impl.min_access_size, .impl.max_access_size define the access sizes
  (in bytes) supported by the *implementation*; other access sizes will be
  emulated using the ones available. For example a 4-byte write will be
  emulated using four 1-byte writes, if .impl.max_access_size = 1.
- .impl.valid specifies that the *implementation* only supports aligned
  accesses; unaligned accesses will be emulated by two aligned accesses.
- .old_portio and .old_mmio can be used to ease porting from code using
  cpu_register_io_memory() and register_ioport(). They should not be used
  in new code.
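
The emulation of a wide access by narrower ones, as described for
.impl.max_access_size, can be sketched like this. It is not the memory core's
code: split_write(), write_fn, struct byte_dev, and byte_dev_write() are
hypothetical names, and the sketch assumes little-endian ordering of the
pieces and power-of-two access sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical write callback shape, loosely following ->write(). */
typedef void (*write_fn)(void *opaque, uint64_t addr, uint64_t data,
                         unsigned size);

/*
 * Issue a guest write of 'size' bytes using accesses no larger than
 * 'impl_max' bytes. Assumes little-endian ordering: the least significant
 * piece goes to the lowest address.
 */
void split_write(write_fn write, void *opaque, uint64_t addr,
                 uint64_t value, unsigned size, unsigned impl_max)
{
    assert(size > 0 && size <= 8 && impl_max > 0);

    if (size <= impl_max) {
        write(opaque, addr, value, size);   /* no splitting required */
        return;
    }

    assert(size % impl_max == 0);
    /* impl_max < size <= 8 here, so the shift below is less than 64 bits */
    uint64_t mask = (1ULL << (impl_max * 8)) - 1;

    for (unsigned i = 0; i < size; i += impl_max) {
        write(opaque, addr + i, (value >> (i * 8)) & mask, impl_max);
    }
}

/* Hypothetical device whose implementation only handles 1-byte writes. */
struct byte_dev {
    uint8_t regs[8];
    unsigned calls;     /* number of callback invocations seen */
};

void byte_dev_write(void *opaque, uint64_t addr, uint64_t data,
                    unsigned size)
{
    struct byte_dev *dev = opaque;

    assert(size == 1 && addr < sizeof(dev->regs));
    dev->regs[addr] = (uint8_t)data;
    dev->calls++;
}
```

Passing byte_dev_write to split_write() with size 4 and impl_max 1 results in
four callback invocations, one per byte, matching the example in the list
above.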