Annotation of qemu/tcg/README, revision 1.1.1.1

1.1       root        1: Tiny Code Generator - Fabrice Bellard.
                      2: 
                      3: 1) Introduction
                      4: 
                      5: TCG (Tiny Code Generator) began as a generic backend for a C
                      6: compiler. It was simplified to be used in QEMU. It also has its roots
                      7: in the QOP code generator written by Paul Brook. 
                      8: 
                      9: 2) Definitions
                     10: 
                     11: The TCG "target" is the architecture for which we generate the
                     12: code. It is of course not the same as the "target" of QEMU which is
                     13: the emulated architecture. As TCG started as a generic C backend used
                     14: for cross compiling, it is assumed that the TCG target is different
                     15: from the host, although it is never the case for QEMU.
                     16: 
                     17: A TCG "function" corresponds to a QEMU Translated Block (TB).
                     18: 
                     19: A TCG "temporary" is a variable only live in a basic
                     20: block. Temporaries are allocated explicitly in each function.
                     21: 
                     22: A TCG "local temporary" is a variable only live in a function. Local
                     23: temporaries are allocated explicitly in each function.
                     24: 
                     25: A TCG "global" is a variable which is live in all the functions
                     26: (equivalent of a C global variable). They are defined before the
                     27: functions defined. A TCG global can be a memory location (e.g. a QEMU
                     28: CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
                     29: or a memory location which is stored in a register outside QEMU TBs
                     30: (not implemented yet).
                     31: 
                     32: A TCG "basic block" corresponds to a list of instructions terminated
                     33: by a branch instruction. 
                     34: 
                     35: 3) Intermediate representation
                     36: 
                     37: 3.1) Introduction
                     38: 
                     39: TCG instructions operate on variables which are temporaries, local
                     40: temporaries or globals. TCG instructions and variables are strongly
                     41: typed. Two types are supported: 32 bit integers and 64 bit
                     42: integers. Pointers are defined as an alias to 32 bit or 64 bit
                     43: integers depending on the TCG target word size.
                     44: 
                     45: Each instruction has a fixed number of output variable operands, input
                     46: variable operands and always constant operands.
                     47: 
                     48: The notable exception is the call instruction which has a variable
                     49: number of outputs and inputs.
                     50: 
                     51: In the textual form, output operands usually come first, followed by
                     52: input operands, followed by constant operands. The output type is
                     53: included in the instruction name. Constants are prefixed with a '$'.
                     54: 
                     55: add_i32 t0, t1, t2  (t0 <- t1 + t2)
                     56: 
                     57: 3.2) Assumptions
                     58: 
                     59: * Basic blocks
                     60: 
                     61: - Basic blocks end after branches (e.g. brcond_i32 instruction),
                     62:   goto_tb and exit_tb instructions.
                     63: - Basic blocks start after the end of a previous basic block, or at a
                     64:   set_label instruction.
                     65: 
                     66: After the end of a basic block, the content of temporaries is
                     67: destroyed, but local temporaries and globals are preserved.
                     68: 
                     69: * Floating point types are not supported yet
                     70: 
                     71: * Pointers: depending on the TCG target, pointer size is 32 bit or 64
                     72:   bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
                     73:   TCG_TYPE_I64.
                     74: 
                     75: * Helpers:
                     76: 
                     77: Using the tcg_gen_helper_x_y it is possible to call any function
                     78: taking i32, i64 or pointer types. Before calling an helper, all
                     79: globals are stored at their canonical location and it is assumed that
                     80: the function can modify them. In the future, function modifiers will
                     81: be allowed to tell that the helper does not read or write some globals.
                     82: 
                     83: On some TCG targets (e.g. x86), several calling conventions are
                     84: supported.
                     85: 
                     86: * Branches:
                     87: 
                     88: Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
                     89: explicit address. Conditional branches can only jump to labels.
                     90: 
                     91: 3.3) Code Optimizations
                     92: 
                     93: When generating instructions, you can count on at least the following
                     94: optimizations:
                     95: 
                     96: - Single instructions are simplified, e.g.
                     97: 
                     98:    and_i32 t0, t0, $0xffffffff
                     99:     
                    100:   is suppressed.
                    101: 
                    102: - A liveness analysis is done at the basic block level. The
                    103:   information is used to suppress moves from a dead variable to
                    104:   another one. It is also used to remove instructions which compute
                    105:   dead results. The later is especially useful for condition code
                    106:   optimization in QEMU.
                    107: 
                    108:   In the following example:
                    109: 
                    110:   add_i32 t0, t1, t2
                    111:   add_i32 t0, t0, $1
                    112:   mov_i32 t0, $1
                    113: 
                    114:   only the last instruction is kept.
                    115: 
                    116: 3.4) Instruction Reference
                    117: 
                    118: ********* Function call
                    119: 
                    120: * call <ret> <params> ptr
                    121: 
                    122: call function 'ptr' (pointer type)
                    123: 
                    124: <ret> optional 32 bit or 64 bit return value
                    125: <params> optional 32 bit or 64 bit parameters
                    126: 
                    127: ********* Jumps/Labels
                    128: 
                    129: * jmp t0
                    130: 
                    131: Absolute jump to address t0 (pointer type).
                    132: 
                    133: * set_label $label
                    134: 
                    135: Define label 'label' at the current program point.
                    136: 
                    137: * br $label
                    138: 
                    139: Jump to label.
                    140: 
                    141: * brcond_i32/i64 cond, t0, t1, label
                    142: 
                    143: Conditional jump if t0 cond t1 is true. cond can be:
                    144:     TCG_COND_EQ
                    145:     TCG_COND_NE
                    146:     TCG_COND_LT /* signed */
                    147:     TCG_COND_GE /* signed */
                    148:     TCG_COND_LE /* signed */
                    149:     TCG_COND_GT /* signed */
                    150:     TCG_COND_LTU /* unsigned */
                    151:     TCG_COND_GEU /* unsigned */
                    152:     TCG_COND_LEU /* unsigned */
                    153:     TCG_COND_GTU /* unsigned */
                    154: 
                    155: ********* Arithmetic
                    156: 
                    157: * add_i32/i64 t0, t1, t2
                    158: 
                    159: t0=t1+t2
                    160: 
                    161: * sub_i32/i64 t0, t1, t2
                    162: 
                    163: t0=t1-t2
                    164: 
                    165: * neg_i32/i64 t0, t1
                    166: 
                    167: t0=-t1 (two's complement)
                    168: 
                    169: * mul_i32/i64 t0, t1, t2
                    170: 
                    171: t0=t1*t2
                    172: 
                    173: * div_i32/i64 t0, t1, t2
                    174: 
                    175: t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
                    176: 
                    177: * divu_i32/i64 t0, t1, t2
                    178: 
                    179: t0=t1/t2 (unsigned). Undefined behavior if division by zero.
                    180: 
                    181: * rem_i32/i64 t0, t1, t2
                    182: 
                    183: t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
                    184: 
                    185: * remu_i32/i64 t0, t1, t2
                    186: 
                    187: t0=t1%t2 (unsigned). Undefined behavior if division by zero.
                    188: 
                    189: ********* Logical
                    190: 
                    191: * and_i32/i64 t0, t1, t2
                    192: 
                    193: t0=t1&t2
                    194: 
                    195: * or_i32/i64 t0, t1, t2
                    196: 
                    197: t0=t1|t2
                    198: 
                    199: * xor_i32/i64 t0, t1, t2
                    200: 
                    201: t0=t1^t2
                    202: 
                    203: * not_i32/i64 t0, t1
                    204: 
                    205: t0=~t1
                    206: 
                    207: * andc_i32/i64 t0, t1, t2
                    208: 
                    209: t0=t1&~t2
                    210: 
                    211: * eqv_i32/i64 t0, t1, t2
                    212: 
                    213: t0=~(t1^t2)
                    214: 
                    215: * nand_i32/i64 t0, t1, t2
                    216: 
                    217: t0=~(t1&t2)
                    218: 
                    219: * nor_i32/i64 t0, t1, t2
                    220: 
                    221: t0=~(t1|t2)
                    222: 
                    223: * orc_i32/i64 t0, t1, t2
                    224: 
                    225: t0=t1|~t2
                    226: 
                    227: ********* Shifts/Rotates
                    228: 
                    229: * shl_i32/i64 t0, t1, t2
                    230: 
                    231: t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    232: 
                    233: * shr_i32/i64 t0, t1, t2
                    234: 
                    235: t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    236: 
                    237: * sar_i32/i64 t0, t1, t2
                    238: 
                    239: t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    240: 
                    241: * rotl_i32/i64 t0, t1, t2
                    242: 
                    243: Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    244: 
                    245: * rotr_i32/i64 t0, t1, t2
                    246: 
                    247: Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    248: 
                    249: ********* Misc
                    250: 
                    251: * mov_i32/i64 t0, t1
                    252: 
                    253: t0 = t1
                    254: 
                    255: Move t1 to t0 (both operands must have the same type).
                    256: 
                    257: * ext8s_i32/i64 t0, t1
                    258: ext8u_i32/i64 t0, t1
                    259: ext16s_i32/i64 t0, t1
                    260: ext16u_i32/i64 t0, t1
                    261: ext32s_i64 t0, t1
                    262: ext32u_i64 t0, t1
                    263: 
                    264: 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
                    265: 
                    266: * bswap16_i32 t0, t1
                    267: 
                    268: 16 bit byte swap on a 32 bit value. The two high order bytes must be set
                    269: to zero.
                    270: 
                    271: * bswap_i32 t0, t1
                    272: 
                    273: 32 bit byte swap
                    274: 
                    275: * bswap_i64 t0, t1
                    276: 
                    277: 64 bit byte swap
                    278: 
                    279: * discard_i32/i64 t0
                    280: 
                    281: Indicate that the value of t0 won't be used later. It is useful to
                    282: force dead code elimination.
                    283: 
                    284: ********* Type conversions
                    285: 
                    286: * ext_i32_i64 t0, t1
                    287: Convert t1 (32 bit) to t0 (64 bit) and does sign extension
                    288: 
                    289: * extu_i32_i64 t0, t1
                    290: Convert t1 (32 bit) to t0 (64 bit) and does zero extension
                    291: 
                    292: * trunc_i64_i32 t0, t1
                    293: Truncate t1 (64 bit) to t0 (32 bit)
                    294: 
                    295: * concat_i32_i64 t0, t1, t2
                    296: Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
                    297: from t2 (32 bit).
                    298: 
                    299: * concat32_i64 t0, t1, t2
                    300: Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
                    301: from t2 (64 bit).
                    302: 
                    303: ********* Load/Store
                    304: 
                    305: * ld_i32/i64 t0, t1, offset
                    306: ld8s_i32/i64 t0, t1, offset
                    307: ld8u_i32/i64 t0, t1, offset
                    308: ld16s_i32/i64 t0, t1, offset
                    309: ld16u_i32/i64 t0, t1, offset
                    310: ld32s_i64 t0, t1, offset
                    311: ld32u_i64 t0, t1, offset
                    312: 
                    313: t0 = read(t1 + offset)
                    314: Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
                    315: offset must be a constant.
                    316: 
                    317: * st_i32/i64 t0, t1, offset
                    318: st8_i32/i64 t0, t1, offset
                    319: st16_i32/i64 t0, t1, offset
                    320: st32_i64 t0, t1, offset
                    321: 
                    322: write(t0, t1 + offset)
                    323: Write 8, 16, 32 or 64 bits to host memory.
                    324: 
                    325: ********* QEMU specific operations
                    326: 
                    327: * tb_exit t0
                    328: 
                    329: Exit the current TB and return the value t0 (word type).
                    330: 
                    331: * goto_tb index
                    332: 
                    333: Exit the current TB and jump to the TB index 'index' (constant) if the
                    334: current TB was linked to this TB. Otherwise execute the next
                    335: instructions.
                    336: 
                    337: * qemu_ld8u t0, t1, flags
                    338: qemu_ld8s t0, t1, flags
                    339: qemu_ld16u t0, t1, flags
                    340: qemu_ld16s t0, t1, flags
                    341: qemu_ld32u t0, t1, flags
                    342: qemu_ld32s t0, t1, flags
                    343: qemu_ld64 t0, t1, flags
                    344: 
                    345: Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU
                    346: address type. 'flags' contains the QEMU memory index (selects user or
                    347: kernel access) for example.
                    348: 
                    349: * qemu_st8 t0, t1, flags
                    350: qemu_st16 t0, t1, flags
                    351: qemu_st32 t0, t1, flags
                    352: qemu_st64 t0, t1, flags
                    353: 
                    354: Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
                    355: address type. 'flags' contains the QEMU memory index (selects user or
                    356: kernel access) for example.
                    357: 
                    358: Note 1: Some shortcuts are defined when the last operand is known to be
                    359: a constant (e.g. addi for add, movi for mov).
                    360: 
                    361: Note 2: When using TCG, the opcodes must never be generated directly
                    362: as some of them may not be available as "real" opcodes. Always use the
                    363: function tcg_gen_xxx(args).
                    364: 
                    365: 4) Backend
                    366: 
                    367: tcg-target.h contains the target specific definitions. tcg-target.c
                    368: contains the target specific code.
                    369: 
                    370: 4.1) Assumptions
                    371: 
                    372: The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
                    373: 64 bit. It is expected that the pointer has the same size as the word.
                    374: 
                    375: On a 32 bit target, all 64 bit operations are converted to 32 bits. A
                    376: few specific operations must be implemented to allow it (see add2_i32,
                    377: sub2_i32, brcond2_i32).
                    378: 
                    379: Floating point operations are not supported in this version. A
                    380: previous incarnation of the code generator had full support of them,
                    381: but it is better to concentrate on integer operations first.
                    382: 
                    383: On a 64 bit target, no assumption is made in TCG about the storage of
                    384: the 32 bit values in 64 bit registers.
                    385: 
                    386: 4.2) Constraints
                    387: 
                    388: GCC like constraints are used to define the constraints of every
                    389: instruction. Memory constraints are not supported in this
                    390: version. Aliases are specified in the input operands as for GCC.
                    391: 
                    392: The same register may be used for both an input and an output, even when
                    393: they are not explicitly aliased.  If an op expands to multiple target
                    394: instructions then care must be taken to avoid clobbering input values.
                    395: GCC style "early clobber" outputs are not currently supported.
                    396: 
                    397: A target can define specific register or constant constraints. If an
                    398: operation uses a constant input constraint which does not allow all
                    399: constants, it must also accept registers in order to have a fallback.
                    400: 
                    401: The movi_i32 and movi_i64 operations must accept any constants.
                    402: 
                    403: The mov_i32 and mov_i64 operations must accept any registers of the
                    404: same type.
                    405: 
                    406: The ld/st instructions must accept signed 32 bit constant offsets. It
                    407: can be implemented by reserving a specific register to compute the
                    408: address if the offset is too big.
                    409: 
                    410: The ld/st instructions must accept any destination (ld) or source (st)
                    411: register.
                    412: 
                    413: 4.3) Function call assumptions
                    414: 
                    415: - The only supported types for parameters and return value are: 32 and
                    416:   64 bit integers and pointer.
                    417: - The stack grows downwards.
                    418: - The first N parameters are passed in registers.
                    419: - The next parameters are passed on the stack by storing them as words.
                    420: - Some registers are clobbered during the call. 
                    421: - The function can return 0 or 1 value in registers. On a 32 bit
                    422:   target, functions must be able to return 2 values in registers for
                    423:   64 bit return type.
                    424: 
                    425: 5) Recommended coding rules for best performance
                    426: 
                    427: - Use globals to represent the parts of the QEMU CPU state which are
                    428:   often modified, e.g. the integer registers and the condition
                    429:   codes. TCG will be able to use host registers to store them.
                    430: 
                    431: - Avoid globals stored in fixed registers. They must be used only to
                    432:   store the pointer to the CPU state and possibly to store a pointer
                    433:   to a register window.
                    434: 
                    435: - Use temporaries. Use local temporaries only when really needed,
                    436:   e.g. when you need to use a value after a jump. Local temporaries
                    437:   introduce a performance hit in the current TCG implementation: their
                    438:   content is saved to memory at end of each basic block.
                    439: 
                    440: - Free temporaries and local temporaries when they are no longer used
                    441:   (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
                    442:   should free it after it is used. Freeing temporaries does not yield
                    443:   a better generated code, but it reduces the memory usage of TCG and
                    444:   the speed of the translation.
                    445: 
                    446: - Don't hesitate to use helpers for complicated or seldom used target
                    447:   intructions. There is little performance advantage in using TCG to
                    448:   implement target instructions taking more than about twenty TCG
                    449:   instructions.
                    450: 
                    451: - Use the 'discard' instruction if you know that TCG won't be able to
                    452:   prove that a given global is "dead" at a given program point. The
                    453:   x86 target uses it to improve the condition codes optimisation.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.