Annotation of qemu/tcg/README, revision 1.1.1.5

1.1       root        1: Tiny Code Generator - Fabrice Bellard.
                      2: 
                      3: 1) Introduction
                      4: 
                      5: TCG (Tiny Code Generator) began as a generic backend for a C
                      6: compiler. It was simplified to be used in QEMU. It also has its roots
                      7: in the QOP code generator written by Paul Brook. 
                      8: 
                      9: 2) Definitions
                     10: 
                     11: The TCG "target" is the architecture for which we generate the
                     12: code. It is of course not the same as the "target" of QEMU which is
                     13: the emulated architecture. As TCG started as a generic C backend used
                     14: for cross compiling, it is assumed that the TCG target is different
                     15: from the host, although it is never the case for QEMU.
                     16: 
                     17: A TCG "function" corresponds to a QEMU Translated Block (TB).
                     18: 
                     19: A TCG "temporary" is a variable only live in a basic
                     20: block. Temporaries are allocated explicitly in each function.
                     21: 
                     22: A TCG "local temporary" is a variable only live in a function. Local
                     23: temporaries are allocated explicitly in each function.
                     24: 
                     25: A TCG "global" is a variable which is live in all the functions
                     26: (equivalent of a C global variable). They are defined before the
                     27: functions defined. A TCG global can be a memory location (e.g. a QEMU
                     28: CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
                     29: or a memory location which is stored in a register outside QEMU TBs
                     30: (not implemented yet).
                     31: 
                     32: A TCG "basic block" corresponds to a list of instructions terminated
                     33: by a branch instruction. 
                     34: 
                     35: 3) Intermediate representation
                     36: 
                     37: 3.1) Introduction
                     38: 
                     39: TCG instructions operate on variables which are temporaries, local
                     40: temporaries or globals. TCG instructions and variables are strongly
                     41: typed. Two types are supported: 32 bit integers and 64 bit
                     42: integers. Pointers are defined as an alias to 32 bit or 64 bit
                     43: integers depending on the TCG target word size.
                     44: 
                     45: Each instruction has a fixed number of output variable operands, input
                     46: variable operands and always constant operands.
                     47: 
                     48: The notable exception is the call instruction which has a variable
                     49: number of outputs and inputs.
                     50: 
                     51: In the textual form, output operands usually come first, followed by
                     52: input operands, followed by constant operands. The output type is
                     53: included in the instruction name. Constants are prefixed with a '$'.
                     54: 
                     55: add_i32 t0, t1, t2  (t0 <- t1 + t2)
                     56: 
                     57: 3.2) Assumptions
                     58: 
                     59: * Basic blocks
                     60: 
                     61: - Basic blocks end after branches (e.g. brcond_i32 instruction),
                     62:   goto_tb and exit_tb instructions.
                     63: - Basic blocks start after the end of a previous basic block, or at a
                     64:   set_label instruction.
                     65: 
                     66: After the end of a basic block, the content of temporaries is
                     67: destroyed, but local temporaries and globals are preserved.
                     68: 
                     69: * Floating point types are not supported yet
                     70: 
                     71: * Pointers: depending on the TCG target, pointer size is 32 bit or 64
                     72:   bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
                     73:   TCG_TYPE_I64.
                     74: 
                     75: * Helpers:
                     76: 
                     77: Using the tcg_gen_helper_x_y it is possible to call any function
1.1.1.4   root       78: taking i32, i64 or pointer types. By default, before calling a helper,
1.1.1.3   root       79: all globals are stored at their canonical location and it is assumed
1.1.1.4   root       80: that the function can modify them. This can be overridden by the
1.1.1.3   root       81: TCG_CALL_CONST function modifier. By default, the helper is allowed to
1.1.1.4   root       82: modify the CPU state or raise an exception. This can be overridden by
1.1.1.3   root       83: the TCG_CALL_PURE function modifier, in which case the call to the
                     84: function is removed if the return value is not used.
1.1       root       85: 
                     86: On some TCG targets (e.g. x86), several calling conventions are
                     87: supported.
                     88: 
                     89: * Branches:
                     90: 
                     91: Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
                     92: explicit address. Conditional branches can only jump to labels.
                     93: 
                     94: 3.3) Code Optimizations
                     95: 
                     96: When generating instructions, you can count on at least the following
                     97: optimizations:
                     98: 
                     99: - Single instructions are simplified, e.g.
                    100: 
                    101:    and_i32 t0, t0, $0xffffffff
                    102:     
                    103:   is suppressed.
                    104: 
                    105: - A liveness analysis is done at the basic block level. The
                    106:   information is used to suppress moves from a dead variable to
                    107:   another one. It is also used to remove instructions which compute
                    108:   dead results. The later is especially useful for condition code
                    109:   optimization in QEMU.
                    110: 
                    111:   In the following example:
                    112: 
                    113:   add_i32 t0, t1, t2
                    114:   add_i32 t0, t0, $1
                    115:   mov_i32 t0, $1
                    116: 
                    117:   only the last instruction is kept.
                    118: 
                    119: 3.4) Instruction Reference
                    120: 
                    121: ********* Function call
                    122: 
                    123: * call <ret> <params> ptr
                    124: 
                    125: call function 'ptr' (pointer type)
                    126: 
                    127: <ret> optional 32 bit or 64 bit return value
                    128: <params> optional 32 bit or 64 bit parameters
                    129: 
                    130: ********* Jumps/Labels
                    131: 
                    132: * jmp t0
                    133: 
                    134: Absolute jump to address t0 (pointer type).
                    135: 
                    136: * set_label $label
                    137: 
                    138: Define label 'label' at the current program point.
                    139: 
                    140: * br $label
                    141: 
                    142: Jump to label.
                    143: 
                    144: * brcond_i32/i64 cond, t0, t1, label
                    145: 
                    146: Conditional jump if t0 cond t1 is true. cond can be:
                    147:     TCG_COND_EQ
                    148:     TCG_COND_NE
                    149:     TCG_COND_LT /* signed */
                    150:     TCG_COND_GE /* signed */
                    151:     TCG_COND_LE /* signed */
                    152:     TCG_COND_GT /* signed */
                    153:     TCG_COND_LTU /* unsigned */
                    154:     TCG_COND_GEU /* unsigned */
                    155:     TCG_COND_LEU /* unsigned */
                    156:     TCG_COND_GTU /* unsigned */
                    157: 
                    158: ********* Arithmetic
                    159: 
                    160: * add_i32/i64 t0, t1, t2
                    161: 
                    162: t0=t1+t2
                    163: 
                    164: * sub_i32/i64 t0, t1, t2
                    165: 
                    166: t0=t1-t2
                    167: 
                    168: * neg_i32/i64 t0, t1
                    169: 
                    170: t0=-t1 (two's complement)
                    171: 
                    172: * mul_i32/i64 t0, t1, t2
                    173: 
                    174: t0=t1*t2
                    175: 
                    176: * div_i32/i64 t0, t1, t2
                    177: 
                    178: t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
                    179: 
                    180: * divu_i32/i64 t0, t1, t2
                    181: 
                    182: t0=t1/t2 (unsigned). Undefined behavior if division by zero.
                    183: 
                    184: * rem_i32/i64 t0, t1, t2
                    185: 
                    186: t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
                    187: 
                    188: * remu_i32/i64 t0, t1, t2
                    189: 
                    190: t0=t1%t2 (unsigned). Undefined behavior if division by zero.
                    191: 
                    192: ********* Logical
                    193: 
                    194: * and_i32/i64 t0, t1, t2
                    195: 
                    196: t0=t1&t2
                    197: 
                    198: * or_i32/i64 t0, t1, t2
                    199: 
                    200: t0=t1|t2
                    201: 
                    202: * xor_i32/i64 t0, t1, t2
                    203: 
                    204: t0=t1^t2
                    205: 
                    206: * not_i32/i64 t0, t1
                    207: 
                    208: t0=~t1
                    209: 
                    210: * andc_i32/i64 t0, t1, t2
                    211: 
                    212: t0=t1&~t2
                    213: 
                    214: * eqv_i32/i64 t0, t1, t2
                    215: 
1.1.1.3   root      216: t0=~(t1^t2), or equivalently, t0=t1^~t2
1.1       root      217: 
                    218: * nand_i32/i64 t0, t1, t2
                    219: 
                    220: t0=~(t1&t2)
                    221: 
                    222: * nor_i32/i64 t0, t1, t2
                    223: 
                    224: t0=~(t1|t2)
                    225: 
                    226: * orc_i32/i64 t0, t1, t2
                    227: 
                    228: t0=t1|~t2
                    229: 
                    230: ********* Shifts/Rotates
                    231: 
                    232: * shl_i32/i64 t0, t1, t2
                    233: 
                    234: t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    235: 
                    236: * shr_i32/i64 t0, t1, t2
                    237: 
                    238: t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    239: 
                    240: * sar_i32/i64 t0, t1, t2
                    241: 
                    242: t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    243: 
                    244: * rotl_i32/i64 t0, t1, t2
                    245: 
                    246: Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    247: 
                    248: * rotr_i32/i64 t0, t1, t2
                    249: 
                    250: Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    251: 
                    252: ********* Misc
                    253: 
                    254: * mov_i32/i64 t0, t1
                    255: 
                    256: t0 = t1
                    257: 
                    258: Move t1 to t0 (both operands must have the same type).
                    259: 
                    260: * ext8s_i32/i64 t0, t1
                    261: ext8u_i32/i64 t0, t1
                    262: ext16s_i32/i64 t0, t1
                    263: ext16u_i32/i64 t0, t1
                    264: ext32s_i64 t0, t1
                    265: ext32u_i64 t0, t1
                    266: 
                    267: 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
                    268: 
1.1.1.2   root      269: * bswap16_i32/i64 t0, t1
1.1       root      270: 
1.1.1.3   root      271: 16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
                    272: bytes are set to zero.
1.1       root      273: 
1.1.1.2   root      274: * bswap32_i32/i64 t0, t1
1.1       root      275: 
1.1.1.3   root      276: 32 bit byte swap on a 32/64 bit value. With a 64 bit value, it assumes that
                    277: the four high order bytes are set to zero.
1.1       root      278: 
1.1.1.2   root      279: * bswap64_i64 t0, t1
1.1       root      280: 
                    281: 64 bit byte swap
                    282: 
                    283: * discard_i32/i64 t0
                    284: 
                    285: Indicate that the value of t0 won't be used later. It is useful to
                    286: force dead code elimination.
                    287: 
1.1.1.4   root      288: * deposit_i32/i64 dest, t1, t2, pos, len
                    289: 
                    290: Deposit T2 as a bitfield into T1, placing the result in DEST.
                    291: The bitfield is described by POS/LEN, which are immediate values:
                    292: 
                    293:   LEN - the length of the bitfield
                    294:   POS - the position of the first bit, counting from the LSB
                    295: 
                    296: For example, pos=8, len=4 indicates a 4-bit field at bit 8.
                    297: This operation would be equivalent to
                    298: 
                    299:   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
                    300: 
                    301: 
1.1.1.3   root      302: ********* Conditional moves
                    303: 
                    304: * setcond_i32/i64 cond, dest, t1, t2
                    305: 
                    306: dest = (t1 cond t2)
                    307: 
                    308: Set DEST to 1 if (T1 cond T2) is true, otherwise set to 0.
                    309: 
1.1       root      310: ********* Type conversions
                    311: 
                    312: * ext_i32_i64 t0, t1
                    313: Convert t1 (32 bit) to t0 (64 bit) and does sign extension
                    314: 
                    315: * extu_i32_i64 t0, t1
                    316: Convert t1 (32 bit) to t0 (64 bit) and does zero extension
                    317: 
                    318: * trunc_i64_i32 t0, t1
                    319: Truncate t1 (64 bit) to t0 (32 bit)
                    320: 
                    321: * concat_i32_i64 t0, t1, t2
                    322: Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
                    323: from t2 (32 bit).
                    324: 
                    325: * concat32_i64 t0, t1, t2
                    326: Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
                    327: from t2 (64 bit).
                    328: 
                    329: ********* Load/Store
                    330: 
                    331: * ld_i32/i64 t0, t1, offset
                    332: ld8s_i32/i64 t0, t1, offset
                    333: ld8u_i32/i64 t0, t1, offset
                    334: ld16s_i32/i64 t0, t1, offset
                    335: ld16u_i32/i64 t0, t1, offset
                    336: ld32s_i64 t0, t1, offset
                    337: ld32u_i64 t0, t1, offset
                    338: 
                    339: t0 = read(t1 + offset)
                    340: Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
                    341: offset must be a constant.
                    342: 
                    343: * st_i32/i64 t0, t1, offset
                    344: st8_i32/i64 t0, t1, offset
                    345: st16_i32/i64 t0, t1, offset
                    346: st32_i64 t0, t1, offset
                    347: 
                    348: write(t0, t1 + offset)
                    349: Write 8, 16, 32 or 64 bits to host memory.
                    350: 
1.1.1.3   root      351: ********* 64-bit target on 32-bit host support
                    352: 
                    353: The following opcodes are internal to TCG.  Thus they are to be implemented by
                    354: 32-bit host code generators, but are not to be emitted by guest translators.
                    355: They are emitted as needed by inline functions within "tcg-op.h".
                    356: 
                    357: * brcond2_i32 cond, t0_low, t0_high, t1_low, t1_high, label
                    358: 
                    359: Similar to brcond, except that the 64-bit values T0 and T1
                    360: are formed from two 32-bit arguments.
                    361: 
                    362: * add2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
                    363: * sub2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
                    364: 
                    365: Similar to add/sub, except that the 64-bit inputs T1 and T2 are
                    366: formed from two 32-bit arguments, and the 64-bit output T0
                    367: is returned in two 32-bit outputs.
                    368: 
                    369: * mulu2_i32 t0_low, t0_high, t1, t2
                    370: 
                    371: Similar to mul, except two 32-bit (unsigned) inputs T1 and T2 yielding
                    372: the full 64-bit product T0.  The later is returned in two 32-bit outputs.
                    373: 
                    374: * setcond2_i32 cond, dest, t1_low, t1_high, t2_low, t2_high
                    375: 
                    376: Similar to setcond, except that the 64-bit values T1 and T2 are
                    377: formed from two 32-bit arguments.  The result is a 32-bit value.
                    378: 
1.1       root      379: ********* QEMU specific operations
                    380: 
1.1.1.4   root      381: * exit_tb t0
1.1       root      382: 
                    383: Exit the current TB and return the value t0 (word type).
                    384: 
                    385: * goto_tb index
                    386: 
                    387: Exit the current TB and jump to the TB index 'index' (constant) if the
                    388: current TB was linked to this TB. Otherwise execute the next
                    389: instructions.
                    390: 
                    391: * qemu_ld8u t0, t1, flags
                    392: qemu_ld8s t0, t1, flags
                    393: qemu_ld16u t0, t1, flags
                    394: qemu_ld16s t0, t1, flags
1.1.1.3   root      395: qemu_ld32 t0, t1, flags
1.1       root      396: qemu_ld32u t0, t1, flags
                    397: qemu_ld32s t0, t1, flags
                    398: qemu_ld64 t0, t1, flags
                    399: 
1.1.1.3   root      400: Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU address
                    401: type. 'flags' contains the QEMU memory index (selects user or kernel access)
                    402: for example.
                    403: 
                    404: Note that "qemu_ld32" implies a 32-bit result, while "qemu_ld32u" and
                    405: "qemu_ld32s" imply a 64-bit result appropriately extended from 32 bits.
1.1       root      406: 
                    407: * qemu_st8 t0, t1, flags
                    408: qemu_st16 t0, t1, flags
                    409: qemu_st32 t0, t1, flags
                    410: qemu_st64 t0, t1, flags
                    411: 
                    412: Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
                    413: address type. 'flags' contains the QEMU memory index (selects user or
                    414: kernel access) for example.
                    415: 
                    416: Note 1: Some shortcuts are defined when the last operand is known to be
                    417: a constant (e.g. addi for add, movi for mov).
                    418: 
                    419: Note 2: When using TCG, the opcodes must never be generated directly
                    420: as some of them may not be available as "real" opcodes. Always use the
                    421: function tcg_gen_xxx(args).
                    422: 
                    423: 4) Backend
                    424: 
                    425: tcg-target.h contains the target specific definitions. tcg-target.c
                    426: contains the target specific code.
                    427: 
                    428: 4.1) Assumptions
                    429: 
                    430: The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
                    431: 64 bit. It is expected that the pointer has the same size as the word.
                    432: 
                    433: On a 32 bit target, all 64 bit operations are converted to 32 bits. A
                    434: few specific operations must be implemented to allow it (see add2_i32,
                    435: sub2_i32, brcond2_i32).
                    436: 
                    437: Floating point operations are not supported in this version. A
                    438: previous incarnation of the code generator had full support of them,
                    439: but it is better to concentrate on integer operations first.
                    440: 
                    441: On a 64 bit target, no assumption is made in TCG about the storage of
                    442: the 32 bit values in 64 bit registers.
                    443: 
                    444: 4.2) Constraints
                    445: 
                    446: GCC like constraints are used to define the constraints of every
                    447: instruction. Memory constraints are not supported in this
                    448: version. Aliases are specified in the input operands as for GCC.
                    449: 
                    450: The same register may be used for both an input and an output, even when
                    451: they are not explicitly aliased.  If an op expands to multiple target
                    452: instructions then care must be taken to avoid clobbering input values.
                    453: GCC style "early clobber" outputs are not currently supported.
                    454: 
                    455: A target can define specific register or constant constraints. If an
                    456: operation uses a constant input constraint which does not allow all
                    457: constants, it must also accept registers in order to have a fallback.
                    458: 
                    459: The movi_i32 and movi_i64 operations must accept any constants.
                    460: 
                    461: The mov_i32 and mov_i64 operations must accept any registers of the
                    462: same type.
                    463: 
                    464: The ld/st instructions must accept signed 32 bit constant offsets. It
                    465: can be implemented by reserving a specific register to compute the
                    466: address if the offset is too big.
                    467: 
                    468: The ld/st instructions must accept any destination (ld) or source (st)
                    469: register.
                    470: 
                    471: 4.3) Function call assumptions
                    472: 
                    473: - The only supported types for parameters and return value are: 32 and
                    474:   64 bit integers and pointer.
                    475: - The stack grows downwards.
                    476: - The first N parameters are passed in registers.
                    477: - The next parameters are passed on the stack by storing them as words.
                    478: - Some registers are clobbered during the call. 
                    479: - The function can return 0 or 1 value in registers. On a 32 bit
                    480:   target, functions must be able to return 2 values in registers for
                    481:   64 bit return type.
                    482: 
                    483: 5) Recommended coding rules for best performance
                    484: 
                    485: - Use globals to represent the parts of the QEMU CPU state which are
                    486:   often modified, e.g. the integer registers and the condition
                    487:   codes. TCG will be able to use host registers to store them.
                    488: 
                    489: - Avoid globals stored in fixed registers. They must be used only to
                    490:   store the pointer to the CPU state and possibly to store a pointer
                    491:   to a register window.
                    492: 
                    493: - Use temporaries. Use local temporaries only when really needed,
                    494:   e.g. when you need to use a value after a jump. Local temporaries
                    495:   introduce a performance hit in the current TCG implementation: their
                    496:   content is saved to memory at end of each basic block.
                    497: 
                    498: - Free temporaries and local temporaries when they are no longer used
                    499:   (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
                    500:   should free it after it is used. Freeing temporaries does not yield
                    501:   a better generated code, but it reduces the memory usage of TCG and
                    502:   the speed of the translation.
                    503: 
                    504: - Don't hesitate to use helpers for complicated or seldom used target
1.1.1.4   root      505:   instructions. There is little performance advantage in using TCG to
1.1       root      506:   implement target instructions taking more than about twenty TCG
1.1.1.5 ! root      507:   instructions. Note that this rule of thumb is more applicable to
        !           508:   helpers doing complex logic or arithmetic, where the C compiler has
        !           509:   scope to do a good job of optimisation; it is less relevant where
        !           510:   the instruction is mostly doing loads and stores, and in those cases
        !           511:   inline TCG may still be faster for longer sequences.
        !           512: 
        !           513: - The hard limit on the number of TCG instructions you can generate
        !           514:   per target instruction is set by MAX_OP_PER_INSTR in exec-all.h --
        !           515:   you cannot exceed this without risking a buffer overrun.
1.1       root      516: 
                    517: - Use the 'discard' instruction if you know that TCG won't be able to
                    518:   prove that a given global is "dead" at a given program point. The
                    519:   x86 target uses it to improve the condition codes optimisation.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.