File: qemu/qemu-tech.texi (QEMU 1.0.1)
Revision (vendor branch) of Tue Apr 24 19:17:37 2018 UTC by root; branches: qemu, MAIN; CVS tags: qemu1001, HEAD

\input texinfo @c -*- texinfo -*-
@c %**start of header
@setfilename

@documentlanguage en
@documentencoding UTF-8

@settitle QEMU Internals
@exampleindent 0
@paragraphindent 0
@c %**end of header

@ifinfo
@direntry
* QEMU Internals: (qemu-tech).   The QEMU Emulator Internals.
@end direntry
@end ifinfo

@iftex
@titlepage
@sp 7
@center @titlefont{QEMU Internals}
@sp 3
@end titlepage
@end iftex

@ifnottex
@node Top
@top

@menu
* Introduction::
* QEMU Internals::
* Regression Tests::
* Index::
@end menu
@end ifnottex

@contents

@node Introduction
@chapter Introduction

@menu
* intro_features::         Features
* intro_x86_emulation::    x86 and x86-64 emulation
* intro_arm_emulation::    ARM emulation
* intro_mips_emulation::   MIPS emulation
* intro_ppc_emulation::    PowerPC emulation
* intro_sparc_emulation::  Sparc32 and Sparc64 emulation
* intro_xtensa_emulation:: Xtensa emulation
* intro_other_emulation::  Other CPU emulation
@end menu

@node intro_features
@section Features

QEMU is a FAST! processor emulator using a portable dynamic
translator.

QEMU has two operating modes:

@itemize @minus

@item
Full system emulation. In this mode (full platform virtualization),
QEMU emulates a full system (usually a PC), including a processor and
various peripherals. It can be used to launch several different
Operating Systems at once without rebooting the host machine or to
debug system code.

@item
User mode emulation. In this mode (application level virtualization),
QEMU can launch processes compiled for one CPU on another CPU; however,
the Operating Systems must match. This can be used, for example, to ease
cross-compilation and cross-debugging.
@end itemize

As QEMU requires no host kernel driver to run, it is very safe and
easy to use.

QEMU generic features:

@itemize

@item User space only or full system emulation.

@item Using dynamic translation to native code for reasonable speed.

@item
Working on x86, x86_64 and PowerPC32/64 hosts. Being tested on ARM,
HPPA, Sparc32 and Sparc64. Previous versions had some support for
Alpha and S390 hosts, but TCG (see below) doesn't support those yet.

@item Self-modifying code support.

@item Precise exceptions support.

@item The virtual CPU is a library (@code{libqemu}) which can be used
in other projects (see @file{qemu/tests/qruncom.c} for an
example of user mode @code{libqemu} usage).

@item
Floating point library supporting both full software emulation and
native host FPU instructions.

@end itemize

QEMU user mode emulation features:
@itemize
@item Generic Linux system call converter, including most ioctls.

@item clone() emulation using the native CPU clone() so that the Linux scheduler handles threads.

@item Accurate signal handling by remapping host signals to target signals.
@end itemize

The Linux user emulator (Linux host only) can be used to launch the Wine
Windows API emulator (@url{}). A Darwin user
emulator (Darwin hosts only) exists and a BSD user emulator for BSD
hosts is under development. It would also be possible to develop a
similar user emulator for Solaris.

QEMU full system emulation features:
@itemize
@item
QEMU uses a full software MMU for maximum portability.

@item
QEMU can optionally use an in-kernel accelerator, such as KVM. The
accelerators execute some of the guest code natively, while QEMU
continues to emulate the rest of the machine.

@item
Various hardware devices can be emulated and in some cases, host
devices (e.g. serial and parallel ports, USB, drives) can be used
transparently by the guest Operating System. Host device passthrough
can be used for talking to external physical peripherals (e.g. a
webcam, modem or tape drive).

@item
Symmetric multiprocessing (SMP) even on a host with a single CPU. On an
SMP host system, QEMU can fully use only one CPU because atomic memory
accesses are difficult to implement efficiently.

@end itemize

@node intro_x86_emulation
@section x86 and x86-64 emulation

QEMU x86 target features:

@itemize

@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
LDT/GDT and IDT are emulated. VM86 mode is also supported to run
DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3,
and SSE4 as well as x86-64 SVM.

@item Support of host page sizes bigger than 4KB in user mode emulation.

@item QEMU can emulate itself on x86.

@item An extensive Linux x86 CPU test program is included (@file{tests/test-i386}).
It can be used to test other x86 virtual CPUs.

@end itemize

Current QEMU limitations:

@itemize

@item Limited x86-64 support.

@item IPC syscalls are missing.

@item The x86 segment limits and access rights are not tested at every
memory access (yet). Fortunately, very few OSes seem to rely on them for
normal use.

@end itemize

@node intro_arm_emulation
@section ARM emulation

@itemize

@item Full ARM 7 user emulation.

@item NWFPE FPU support included in user Linux emulation.

@item Can run most ARM Linux binaries.

@end itemize

@node intro_mips_emulation
@section MIPS emulation

@itemize

@item The system emulation allows full MIPS32/MIPS64 Release 2 emulation,
including privileged instructions, FPU and MMU, in both little and big
endian modes.

@item The Linux userland emulation can run many 32 bit MIPS Linux binaries.

@end itemize

Current QEMU limitations:

@itemize

@item Self-modifying code is not always handled correctly.

@item 64 bit userland emulation is not implemented.

@item The system emulation is not complete enough to run real firmware.

@item The watchpoint debug facility is not implemented.

@end itemize

@node intro_ppc_emulation
@section PowerPC emulation

@itemize

@item Full PowerPC 32 bit emulation, including privileged instructions,
FPU and MMU.

@item Can run most PowerPC Linux binaries.

@end itemize

@node intro_sparc_emulation
@section Sparc32 and Sparc64 emulation

@itemize

@item Full SPARC V8 emulation, including privileged
instructions, FPU and MMU. SPARC V9 emulation includes most privileged
and VIS instructions, FPU and I/D MMU. Alignment is fully enforced.

@item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and
some 64-bit SPARC Linux binaries.

@end itemize

Current QEMU limitations:

@itemize

@item IPC syscalls are missing.

@item Floating point exception support is buggy.

@item Atomic instructions are not correctly implemented.

@item There are still some problems with Sparc64 emulators.

@end itemize

@node intro_xtensa_emulation
@section Xtensa emulation

@itemize

@item Core Xtensa ISA emulation, including most options: code density,
loop, extended L32R, 16- and 32-bit multiplication, 32-bit division,
MAC16, miscellaneous operations, boolean, multiprocessor synchronization,
conditional store, exceptions, relocatable vectors, unaligned exception,
interrupts (including high priority and timer), hardware alignment,
region protection, region translation, MMU, windowed registers, thread
pointer, processor ID.

@item Not implemented options: FP coprocessor, coprocessor context,
data/instruction cache (including cache prefetch and locking), XLMI,
processor interface, debug. Also options not covered by the core ISA
(e.g. FLIX, wide branches) are not implemented.

@item Can run most Xtensa Linux binaries.

@item New core configuration that requires no additional instructions
may be created from overlay with minimal amount of hand-written code.

@end itemize

@node intro_other_emulation
@section Other CPU emulation

In addition to the above, QEMU supports emulation of other CPUs with
varying levels of success. These are:

@itemize

@item
Alpha
@item
CRIS
@item
M68k
@item
SH4
@end itemize

@node QEMU Internals
@chapter QEMU Internals

@menu
* QEMU compared to other emulators::
* Portable dynamic translation::
* Condition code optimisations::
* CPU state optimisations::
* Translation cache::
* Direct block chaining::
* Self-modifying code and translated code invalidation::
* Exception support::
* MMU emulation::
* Device emulation::
* Hardware interrupts::
* User emulation specific details::
* Bibliography::
@end menu

@node QEMU compared to other emulators
@section QEMU compared to other emulators

Like Bochs [3], QEMU emulates an x86 CPU. But QEMU is much faster than
Bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC
emulation while QEMU can emulate several processors.

Like Valgrind [2], QEMU does user space emulation and dynamic
translation. Valgrind is mainly a memory debugger, a role QEMU does not
fill (QEMU could be used to detect out-of-bounds memory accesses as
Valgrind does, but it cannot track uninitialised data as Valgrind
does). The Valgrind dynamic translator generates better code
than QEMU (in particular it does register allocation) but it is closely
tied to an x86 host and target and has no support for precise exceptions
and system emulation.

EM86 [4] is the closest project to user space QEMU (and QEMU still uses
some of its code, in particular the ELF file loader). EM86 was limited
to an Alpha host and used a proprietary and slow interpreter (the
interpreter part of the FX!32 Digital Win32 code translator [5]).

TWIN [6] is a Windows API emulator like Wine. It is less accurate than
Wine but includes a protected mode x86 interpreter to launch x86 Windows
executables. Such an approach has greater potential because most of the
Windows API is executed natively but it is far more difficult to develop
because all the data structures and function parameters exchanged
between the API and the x86 code must be converted.

User mode Linux [7] was the only solution before QEMU to launch a
Linux kernel as a process while not needing any host kernel
patches. However, user mode Linux requires heavy kernel patches while
QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is
slower.

The Plex86 [8] PC virtualizer is done in the same spirit as the now
obsolete qemu-fast system emulator. It requires a patched Linux kernel
to work (you cannot launch the same kernel on your PC), but the
patches are really small. As it is a PC virtualizer (no emulation is
done except for some privileged instructions), it has the potential of
being faster than QEMU. The downside is that a complicated (and
potentially unsafe) host kernel patch is needed.

The commercial PC Virtualizers (VMWare [9], VirtualPC [10], TwoOStwo
[11]) are faster than QEMU, but they all need specific, proprietary
and potentially unsafe host drivers. Moreover, they are unable to
provide cycle exact simulation as an emulator can.

VirtualBox [12], Xen [13] and KVM [14] are based on QEMU. QEMU-SystemC
[15] uses QEMU to simulate a system where some hardware devices are
developed in SystemC.

@node Portable dynamic translation
@section Portable dynamic translation

QEMU is a dynamic translator. When it first encounters a piece of code,
it converts it to the host instruction set. Usually dynamic translators
are very complicated and highly CPU dependent. QEMU uses some tricks
which make it relatively portable and simple while achieving good
performance.

After the release of version 0.9.1, QEMU switched to a new method of
generating code, Tiny Code Generator or TCG. TCG relaxes the
dependency on the exact version of the compiler used. The basic idea
is to split every target instruction into a couple of RISC-like TCG
ops (see @code{target-i386/translate.c}). Some optimizations can be
performed at this stage, including liveness analysis and trivial
constant expression evaluation. TCG ops are then implemented in the
host CPU back end, also known as TCG target (see
@code{tcg/i386/tcg-target.c}). For more information, please take a
look at @code{tcg/README}.

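The trivial constant expression evaluation mentioned above can be
pictured with a small Python sketch. It is not TCG code: the op names
(@code{movi}, @code{add}, @code{sub}), the tuple layout and the function
@code{fold_constants} are hypothetical and only model the idea of
rewriting ops whose inputs are already known at translation time.

@verbatim
```python
def fold_constants(ops):
    """Rewrite ops whose inputs are all translation-time constants as 'movi'."""
    known = {}  # temporary name -> constant value known at translation time
    out = []
    for op in ops:
        name, dst, *args = op
        if name == "movi":
            known[dst] = args[0]
            out.append(op)
        elif name in ("add", "sub") and all(a in known for a in args):
            a, b = (known[x] for x in args)
            value = (a + b if name == "add" else a - b) & 0xFFFFFFFF
            known[dst] = value
            out.append(("movi", dst, value))
        else:
            known.pop(dst, None)  # dst now holds a value only known at run time
            out.append(op)
    return out


# Hypothetical micro-op stream; "eax" stands for a guest register.
ops = [
    ("movi", "t0", 4),
    ("movi", "t1", 8),
    ("add", "t2", "t0", "t1"),    # both inputs constant: folded to movi t2, 12
    ("add", "eax", "eax", "t2"),  # eax is only known at run time: kept
]
folded = fold_constants(ops)
```
@end verbatim

In this sketch the third op becomes a constant load while the last one
is left for the back end, since one of its inputs depends on guest state.
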
@node Condition code optimisations
@section Condition code optimisations

Lazy evaluation of CPU condition codes (@code{EFLAGS} register on x86)
is important for CPUs where every instruction sets the condition
codes. It tends to be less important on conventional RISC systems
where condition codes are only updated when explicitly requested. On
Sparc64, costly update of both 32 and 64 bit condition codes can be
avoided with lazy evaluation.

Instead of computing the condition codes after each x86 instruction,
QEMU just stores one operand (called @code{CC_SRC}), the result
(called @code{CC_DST}) and the type of operation (called
@code{CC_OP}). When the condition codes are needed, the condition
codes can be calculated using this information. In addition, an
optimized calculation can be performed for some instruction types like
conditional branches.

@code{CC_OP} is almost never explicitly set in the generated code
because it is known at translation time.

The lazy condition code evaluation is used on x86, m68k, cris and
Sparc. ARM uses a simplified variant for the N and Z flags.

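The scheme above can be illustrated with a minimal Python sketch for
32-bit addition and subtraction. The class @code{LazyFlags}, the
constants @code{CC_OP_ADD} and @code{CC_OP_SUB} and the choice of the
second operand as the stored one are assumptions made for illustration;
only the three fields mirror @code{CC_OP}, @code{CC_SRC} and
@code{CC_DST} from the text.

@verbatim
```python
CC_OP_ADD, CC_OP_SUB = 0, 1  # hypothetical operation codes
MASK = 0xFFFFFFFF


class LazyFlags:
    def __init__(self):
        self.cc_op = CC_OP_ADD
        self.cc_src = 0
        self.cc_dst = 0

    def add(self, a, b):
        result = (a + b) & MASK
        # Only record what is needed to recompute the flags later.
        self.cc_op, self.cc_src, self.cc_dst = CC_OP_ADD, b, result
        return result

    def sub(self, a, b):
        result = (a - b) & MASK
        self.cc_op, self.cc_src, self.cc_dst = CC_OP_SUB, b, result
        return result

    def zero_flag(self):
        return self.cc_dst == 0

    def carry_flag(self):
        if self.cc_op == CC_OP_ADD:
            # Unsigned overflow iff the wrapped result is below an operand.
            return self.cc_dst < self.cc_src
        # SUB: recover the first operand, borrow iff it was below the second.
        first = (self.cc_dst + self.cc_src) & MASK
        return first < self.cc_src
```
@end verbatim

The arithmetic helpers do no flag computation at all; the cost is paid
only when @code{zero_flag()} or @code{carry_flag()} is actually called.
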
@node CPU state optimisations
@section CPU state optimisations

The target CPUs have many internal states which change the way they
evaluate instructions. In order to achieve good speed, the
translation phase assumes that some state information of the virtual
CPU cannot change within a translated block. The state is recorded in
the Translation Block (TB). If the state changes (e.g. privilege
level), a new TB will be generated and the previous TB won't be used
anymore until the state matches the state recorded in the previous TB.
For example, if the SS, DS and ES segments have a zero base, then the
translator does not even generate an addition for the segment base.

[The FPU stack pointer register is not handled that way yet].

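One way to picture this is a lookup table keyed by both the guest PC and
the state assumed at translation time. The following Python sketch uses
hypothetical names (@code{lookup_tb}, @code{translate},
@code{FLAG_USER}) and a plain dictionary; it is an illustration, not
QEMU's data structure.

@verbatim
```python
translations = {}      # (pc, state flags) -> translated block
translate_calls = []   # records each translation, for illustration only

FLAG_USER = 0x1        # hypothetical state bit (e.g. privilege level)


def translate(pc, flags):
    translate_calls.append((pc, flags))
    return f"code for pc={pc:#x} flags={flags:#x}"


def lookup_tb(pc, flags):
    """Return the block translated for this PC under this exact state."""
    key = (pc, flags)
    if key not in translations:
        translations[key] = translate(pc, flags)
    return translations[key]
```
@end verbatim

Looking up the same PC with a different state produces a second
translation, and returning to the earlier state reuses the first one.
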
@node Translation cache
@section Translation cache

A 32 MByte cache holds the most recently used translations. For
simplicity, it is completely flushed when it is full. A translation unit
contains just a single basic block (a block of x86 instructions
terminated by a jump or by a virtual CPU state change which the
translator cannot deduce statically).

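The flush-when-full policy is simple enough to sketch in a few lines of
Python. The class @code{TranslationCache} and its fields are
hypothetical, and the capacity is a parameter so that the example stays
small; the text above gives 32 MByte as the real size.

@verbatim
```python
class TranslationCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = {}       # guest PC -> translated code bytes
        self.flush_count = 0

    def insert(self, pc, code):
        # No per-block eviction: when space runs out, drop everything.
        if self.used + len(code) > self.capacity:
            self.flush()
        self.blocks[pc] = code
        self.used += len(code)

    def flush(self):
        self.blocks.clear()
        self.used = 0
        self.flush_count += 1
```
@end verbatim

Dropping everything at once avoids any bookkeeping about which block is
oldest, at the price of retranslating blocks that were still in use.
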
@node Direct block chaining
@section Direct block chaining

After each translated basic block is executed, QEMU uses the simulated
Program Counter (PC) and other CPU state information (such as the CS
segment base value) to find the next basic block.

In order to accelerate the most common cases where the new simulated PC
is known, QEMU can patch a basic block so that it jumps directly to the
next one.

The most portable code uses an indirect jump. An indirect jump makes
it easier to make the jump target modification atomic. On some host
architectures (such as x86 or PowerPC), the @code{JUMP} opcode is
directly patched so that the block chaining has no overhead.

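The effect of chaining can be modelled in Python by counting how often
the slow lookup is needed. This is a sketch under simplifying
assumptions: every block has exactly one statically known successor, and
the names @code{Block}, @code{chained} and @code{run} are hypothetical.

@verbatim
```python
class Block:
    def __init__(self, pc, next_pc):
        self.pc = pc
        self.next_pc = next_pc  # guest PC reached after this block
        self.chained = None     # patched-in direct successor, if any


def run(blocks, pc, steps):
    """Execute `steps` blocks; return how many slow lookups were needed."""
    lookups = 0
    block = None
    for _ in range(steps):
        if block is not None and block.chained is not None:
            block = block.chained          # direct jump, no lookup
        else:
            prev = block
            block = blocks[pc]             # slow path: find block by PC
            lookups += 1
            if prev is not None:
                prev.chained = block       # patch the previous block
        pc = block.next_pc
    return lookups


# Two blocks that jump to each other, as in a tight guest loop.
blocks = {0x10: Block(0x10, 0x20), 0x20: Block(0x20, 0x10)}
first_run = run(blocks, 0x10, 6)
second_run = run(blocks, 0x10, 6)
```
@end verbatim

Once both blocks are patched, only the entry into the loop still goes
through the lookup.
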
@node Self-modifying code and translated code invalidation
@section Self-modifying code and translated code invalidation

Self-modifying code is a special challenge in x86 emulation because no
instruction cache invalidation is signaled by the application when code
is modified.

When translated code is generated for a basic block, the corresponding
host page is write protected if it is not already read-only. Then, if
a write access is done to the page, Linux raises a SEGV signal. QEMU
then invalidates all the translated code in the page and enables write
accesses to the page.

Correct translated code invalidation is done efficiently by maintaining
a linked list of every translated block contained in a given page. Other
linked lists are also maintained to undo direct block chaining.

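The per-page bookkeeping can be sketched as follows. The Python model
below replaces the host page protection and the SEGV handler with an
explicit @code{write()} call, uses sets instead of linked lists, and
assumes that no block spans two pages; @code{CodePages} and its methods
are hypothetical names.

@verbatim
```python
PAGE_SIZE = 4096


class CodePages:
    def __init__(self):
        self.blocks_by_page = {}  # page number -> set of block start PCs
        self.protected = set()    # pages currently write-protected

    def add_block(self, pc):
        """Record a newly translated block and protect its page."""
        page = pc // PAGE_SIZE
        self.blocks_by_page.setdefault(page, set()).add(pc)
        self.protected.add(page)

    def write(self, addr):
        """Model a guest write; return the block PCs it invalidates."""
        page = addr // PAGE_SIZE
        if page not in self.protected:
            return set()              # ordinary data page: nothing to do
        self.protected.discard(page)  # re-enable writes to the page
        return self.blocks_by_page.pop(page, set())
```
@end verbatim

After the first write to a page holding translated code, further writes
to that page are cheap until a new block is translated from it.
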
On RISC targets, correctly written software uses memory barriers and
cache flushes, so some of the protection above would not be
necessary. However, QEMU still requires that the generated code always
matches the target instructions in memory in order to handle
exceptions correctly.

@node Exception support
@section Exception support

longjmp() is used when an exception such as division by zero is
encountered.

The host SIGSEGV and SIGBUS signal handlers are used to get invalid
memory accesses. The simulated program counter is found by
retranslating the corresponding basic block and by looking where the
host program counter was at the exception point.

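Recovering the simulated program counter can be pictured as rebuilding,
during retranslation, the list of host code offsets at which each guest
instruction starts, and then searching it. In this Python sketch the
input format (pairs of guest PC and host code size) and the function
names are assumptions made for illustration.

@verbatim
```python
import bisect


def translate_block(guest_insns):
    """guest_insns: list of (guest_pc, host_code_size) pairs.

    Returns the host offset at which each guest instruction starts,
    together with the matching guest PCs.
    """
    offsets, pcs, offset = [], [], 0
    for guest_pc, host_size in guest_insns:
        offsets.append(offset)
        pcs.append(guest_pc)
        offset += host_size
    return offsets, pcs


def guest_pc_at(guest_insns, host_offset):
    """Retranslate the block and map a faulting host offset to a guest PC."""
    offsets, pcs = translate_block(guest_insns)
    return pcs[bisect.bisect_right(offsets, host_offset) - 1]
```
@end verbatim

Nothing is stored per block in this model; the mapping is recomputed
only when a fault actually occurs.
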
The virtual CPU cannot retrieve the exact @code{EFLAGS} register because
in some cases it is not computed because of condition code
optimisations. It is not a big concern because the emulated code can
still be restarted in any case.

@node MMU emulation
@section MMU emulation

For system emulation QEMU supports a soft MMU. In that mode, the MMU
virtual to physical address translation is done at every memory
access. QEMU uses an address translation cache to speed up the
translation.

In order to avoid flushing the translated code each time the MMU
mappings change, QEMU uses a physically indexed translation cache. It
means that each basic block is indexed with its physical address.

When MMU mappings change, only the chaining of the basic blocks is
reset (i.e. a basic block can no longer jump directly to another one).

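The address translation cache can be sketched as a dictionary placed in
front of a page table lookup. The Python model below assumes 4 KB pages
and a flat page table given as a dictionary; @code{SoftMMU},
@code{translate} and @code{remap} are hypothetical names.

@verbatim
```python
PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1


class SoftMMU:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> physical page
        self.tlb = {}                 # cached translations
        self.walks = 0                # number of slow page table lookups

    def translate(self, vaddr):
        vpage = vaddr >> PAGE_BITS
        if vpage not in self.tlb:
            self.walks += 1
            # A missing entry raises KeyError, standing in for a page fault.
            self.tlb[vpage] = self.page_table[vpage]
        return (self.tlb[vpage] << PAGE_BITS) | (vaddr & PAGE_MASK)

    def remap(self, vpage, ppage):
        """Change a mapping and drop the stale cached translation."""
        self.page_table[vpage] = ppage
        self.tlb.pop(vpage, None)
```
@end verbatim

Translated blocks indexed by the physical address returned here stay
valid across @code{remap()}; only the cached translation is discarded.
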
@node Device emulation
@section Device emulation

Systems emulated by QEMU are organized by boards. At initialization
phase, each board instantiates a number of CPUs, devices, RAM and
ROM. Each device in turn can assign I/O ports or memory areas (for
MMIO) to its handlers. When the emulation starts, an access to the
ports or MMIO memory areas assigned to the device causes the
corresponding handler to be called.

RAM and ROM are handled more efficiently: only the offset to the host
memory needs to be added to the guest address.

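The dispatch between device handlers and plain RAM can be modelled as
follows. This Python sketch is not QEMU's device API: @code{Bus},
@code{register_mmio} and the byte-wide accessors are hypothetical, and
the device is a stand-in that returns a fixed status byte and logs
writes.

@verbatim
```python
class Bus:
    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.mmio = []  # (start, end, read handler, write handler)

    def register_mmio(self, start, size, read, write):
        self.mmio.append((start, start + size, read, write))

    def _find(self, addr):
        for start, end, read, write in self.mmio:
            if start <= addr < end:
                return start, read, write
        return None

    def read8(self, addr):
        hit = self._find(addr)
        if hit:
            start, read, _ = hit
            return read(addr - start)   # device handler gets an offset
        return self.ram[addr]           # RAM: direct access

    def write8(self, addr, value):
        hit = self._find(addr)
        if hit:
            start, _, write = hit
            write(addr - start, value)
        else:
            self.ram[addr] = value


# Hypothetical device mapped at 0x800, eight bytes wide.
sent = []
bus = Bus(ram_size=0x1000)
bus.register_mmio(0x800, 8,
                  read=lambda off: 0x60,
                  write=lambda off, v: sent.append((off, v)))
bus.write8(0x10, 0xAB)        # lands in RAM
bus.write8(0x800, ord("A"))   # calls the device write handler
```
@end verbatim

The linear search keeps the sketch short; a real implementation would
use a faster lookup structure.
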
The video RAM of VGA and other display cards is special: it can be
read or written directly like RAM, but write accesses cause the memory
to be marked with the VGA_DIRTY flag as well.

QEMU supports some device classes like serial and parallel ports, USB,
drives and network devices, by providing APIs for easier connection to
the generic, higher level implementations. The API hides the
implementation details from the devices, like native device use or
advanced block device formats like QCOW.

Usually the devices implement a reset method and register support for
saving and loading of the device state. The devices can also use
timers, especially together with the use of bottom halves (BHs).

@node Hardware interrupts
@section Hardware interrupts

In order to be faster, QEMU does not check at every basic block if a
hardware interrupt is pending. Instead, the user must asynchronously
call a specific function to tell that an interrupt is pending. This
function resets the chaining of the currently executing basic
block. It ensures that the execution will return soon to the main loop
of the CPU emulator. Then the main loop can test if the interrupt is
pending and handle it.

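The control flow described above can be modelled in Python with a single
boolean standing in for the chaining of the currently executing block.
All names here (@code{CPU}, @code{request_interrupt},
@code{main_loop_iteration}) are hypothetical, and the asynchronous call
is simulated by a callback invoked after each block.

@verbatim
```python
class CPU:
    def __init__(self):
        self.interrupt_pending = False
        self.chained = True   # stands in for the current block's chaining
        self.handled = 0

    def request_interrupt(self):
        """Called asynchronously, e.g. by a device model."""
        self.interrupt_pending = True
        self.chained = False  # break the chain so control returns soon

    def run_chained_blocks(self, on_block):
        """Run blocks back to back until the chain is broken."""
        executed = 0
        while self.chained:
            executed += 1
            on_block(executed)
        return executed

    def main_loop_iteration(self, on_block):
        executed = self.run_chained_blocks(on_block)
        if self.interrupt_pending:   # tested only here, not per block
            self.interrupt_pending = False
            self.handled += 1
        self.chained = True          # chaining is re-established afterwards
        return executed


cpu = CPU()
executed = cpu.main_loop_iteration(
    lambda n: cpu.request_interrupt() if n == 3 else None
)
```
@end verbatim

The blocks themselves never test the pending flag; the request only has
an effect because it breaks the chain.
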
@node User emulation specific details
@section User emulation specific details

@subsection Linux system call translation

QEMU includes a generic system call translator for Linux. It means that
the parameters of the system calls can be converted to fix the
endianness and 32/64 bit issues. The IOCTLs are converted with a generic
type description system (see @file{ioctls.h} and @file{thunk.c}).

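As an illustration of the endianness and width conversion, the following
Python sketch reads and writes a structure with two 32-bit signed fields
in guest memory, using the standard @code{struct} module. The helper
names and the choice of a big-endian 32-bit guest are assumptions; this
is not the type description system of @file{thunk.c}.

@verbatim
```python
import struct


def read_guest_timeval(guest_memory, addr, big_endian=True):
    """Decode a guest structure made of two 32-bit signed fields."""
    fmt = ">ii" if big_endian else "<ii"
    sec, usec = struct.unpack_from(fmt, guest_memory, addr)
    return sec, usec


def write_guest_timeval(guest_memory, addr, sec, usec, big_endian=True):
    """Encode host integers in the guest's byte order and field width."""
    fmt = ">ii" if big_endian else "<ii"
    struct.pack_into(fmt, guest_memory, addr, sec, usec)


mem = bytearray(16)
write_guest_timeval(mem, 4, 1, 500000)
```
@end verbatim

The explicit format string fixes both the byte order and the field
width, so the result does not depend on the host.
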
QEMU supports host CPUs which have pages bigger than 4KB. It records all
the mappings the process does and tries to emulate the @code{mmap()}
system calls in cases where the host @code{mmap()} call would fail
because of bad page alignment.

@subsection Linux signals

Normal and real-time signals are queued along with their information
(@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt
request is done to the virtual CPU. When it is interrupted, one queued
signal is handled by generating a stack frame in the virtual CPU as the
Linux kernel does. The @code{sigreturn()} system call is emulated to return
from the virtual signal handler.

Some signals (such as SIGALRM) directly come from the host. Other
signals are synthesized from the virtual CPU exceptions such as SIGFPE
when a division by zero is done (see @code{main.c:cpu_loop()}).

The blocked signal mask is still handled by the host Linux kernel so
that most signal system calls can be redirected directly to the host
Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system
calls need to be fully emulated (see @file{signal.c}).

@subsection clone() system call and threads

The Linux clone() system call is usually used to create a thread. QEMU
uses the host clone() system call so that real host threads are created
for each emulated thread. One virtual CPU instance is created for each
thread.

The virtual x86 CPU atomic operations are emulated with a global lock so
that their semantics are preserved.

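The global lock approach can be sketched in Python with host threads and
a single lock around every emulated read-modify-write. The names
@code{atomic_add} and @code{worker} and the dictionary used as guest
memory are hypothetical.

@verbatim
```python
import threading

_global_lock = threading.Lock()  # one lock shared by all virtual CPUs


def atomic_add(memory, addr, value):
    """Emulate a locked 32-bit read-modify-write; return the old value."""
    with _global_lock:
        old = memory[addr]
        memory[addr] = (old + value) & 0xFFFFFFFF
        return old


def worker(memory, addr, count):
    for _ in range(count):
        atomic_add(memory, addr, 1)


memory = {0x100: 0}
threads = [
    threading.Thread(target=worker, args=(memory, 0x100, 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
@end verbatim

A single lock is easy to reason about but serializes every atomic
operation, whichever address it touches.
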
Note that currently there are still some locking issues in QEMU. In
particular, the translation cache flush is not protected yet against
reentrancy.

@subsection Self-virtualization

QEMU was conceived so that ultimately it can emulate itself. Although
it is not very useful, it is an important test to show the power of the
emulator.

Achieving self-virtualization is not easy because there may be address
space conflicts. QEMU user emulators solve this problem by being built
as executable ELF shared objects, like the ELF interpreter. That
way, they can be relocated at load time.

@node Bibliography
@section Bibliography

@table @asis

@item [1]
@url{}, Optimizing
direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio
Riccardi.

@item [2]
@url{}, Valgrind, an open-source
memory debugger for x86-GNU/Linux, by Julian Seward.

@item [3]
@url{}, the Bochs IA-32 Emulator Project,
by Kevin Lawton et al.

@item [4]
@url{}, the EM86
x86 emulator on Alpha-Linux.

@item [5]
@url{},
DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
Chernoff and Ray Hookway.

@item [6]
@url{}, Windows API library emulation from
Willows Software.

@item [7]
@url{},
The User-mode Linux Kernel.

@item [8]
@url{},
The new Plex86 project.

@item [9]
@url{},
The VMWare PC virtualizer.

@item [10]
@url{},
The VirtualPC PC virtualizer.

@item [11]
@url{},
The TwoOStwo PC virtualizer.

@item [12]
@url{},
The VirtualBox PC virtualizer.

@item [13]
@url{},
The Xen hypervisor.

@item [14]
@url{},
Kernel Based Virtual Machine (KVM).

@item [15]
@url{},
QEMU-SystemC, a hardware co-simulator.

@end table

@node Regression Tests
@chapter Regression Tests

In the directory @file{tests/}, various interesting testing programs
are available. They are used for regression testing.

@menu
* test-i386::
* linux-test::
* qruncom.c::
@end menu

@node test-i386
@section @file{test-i386}

This program executes most of the 16 bit and 32 bit x86 instructions and
generates a text output. It can be compared with the output obtained with
a real CPU or another emulator. The target @code{make test} runs this
program and a @code{diff} on the generated output.

The Linux system call @code{modify_ldt()} is used to create x86 selectors
to test some 16 bit addressing and 32 bit with segmentation cases.

The Linux system call @code{vm86()} is used to test vm86 emulation.

Various exceptions are raised to test most of the x86 user space
exception reporting.

@node linux-test
@section @file{linux-test}

This program tests various Linux system calls. It is used to verify
that the system call parameters are correctly converted between target
and host CPUs.

@node qruncom.c
@section @file{qruncom.c}

Example of usage of @code{libqemu} to emulate a user mode i386 CPU.

@node Index
@chapter Index
@printindex cp

@bye