Annotation of qemu/tcg/README, revision 1.1.1.3

1.1       root        1: Tiny Code Generator - Fabrice Bellard.
                      2: 
                      3: 1) Introduction
                      4: 
                      5: TCG (Tiny Code Generator) began as a generic backend for a C
                      6: compiler. It was simplified to be used in QEMU. It also has its roots
                      7: in the QOP code generator written by Paul Brook. 
                      8: 
                      9: 2) Definitions
                     10: 
                     11: The TCG "target" is the architecture for which we generate the
                     12: code. It is of course not the same as the "target" of QEMU which is
                     13: the emulated architecture. As TCG started as a generic C backend used
                     14: for cross compiling, it is assumed that the TCG target is different
                     15: from the host, although it is never the case for QEMU.
                     16: 
                     17: A TCG "function" corresponds to a QEMU Translated Block (TB).
                     18: 
                     19: A TCG "temporary" is a variable only live in a basic
                     20: block. Temporaries are allocated explicitly in each function.
                     21: 
                     22: A TCG "local temporary" is a variable only live in a function. Local
                     23: temporaries are allocated explicitly in each function.
                     24: 
                     25: A TCG "global" is a variable which is live in all the functions
                     26: (equivalent of a C global variable). They are defined before the
                     27: functions defined. A TCG global can be a memory location (e.g. a QEMU
                     28: CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
                     29: or a memory location which is stored in a register outside QEMU TBs
                     30: (not implemented yet).
                     31: 
                     32: A TCG "basic block" corresponds to a list of instructions terminated
                     33: by a branch instruction. 
                     34: 
                     35: 3) Intermediate representation
                     36: 
                     37: 3.1) Introduction
                     38: 
                     39: TCG instructions operate on variables which are temporaries, local
                     40: temporaries or globals. TCG instructions and variables are strongly
                     41: typed. Two types are supported: 32 bit integers and 64 bit
                     42: integers. Pointers are defined as an alias to 32 bit or 64 bit
                     43: integers depending on the TCG target word size.
                     44: 
                     45: Each instruction has a fixed number of output variable operands, input
                     46: variable operands and always constant operands.
                     47: 
                     48: The notable exception is the call instruction which has a variable
                     49: number of outputs and inputs.
                     50: 
                     51: In the textual form, output operands usually come first, followed by
                     52: input operands, followed by constant operands. The output type is
                     53: included in the instruction name. Constants are prefixed with a '$'.
                     54: 
                     55: add_i32 t0, t1, t2  (t0 <- t1 + t2)
                     56: 
                     57: 3.2) Assumptions
                     58: 
                     59: * Basic blocks
                     60: 
                     61: - Basic blocks end after branches (e.g. brcond_i32 instruction),
                     62:   goto_tb and exit_tb instructions.
                     63: - Basic blocks start after the end of a previous basic block, or at a
                     64:   set_label instruction.
                     65: 
                     66: After the end of a basic block, the content of temporaries is
                     67: destroyed, but local temporaries and globals are preserved.
                     68: 
                     69: * Floating point types are not supported yet
                     70: 
                     71: * Pointers: depending on the TCG target, pointer size is 32 bit or 64
                     72:   bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
                     73:   TCG_TYPE_I64.
                     74: 
                     75: * Helpers:
                     76: 
                     77: Using the tcg_gen_helper_x_y it is possible to call any function
1.1.1.3 ! root       78: taking i32, i64 or pointer types. By default, before calling an helper,
        !            79: all globals are stored at their canonical location and it is assumed
        !            80: that the function can modify them. This can be overriden by the
        !            81: TCG_CALL_CONST function modifier. By default, the helper is allowed to
        !            82: modify the CPU state or raise an exception. This can be overriden by
        !            83: the TCG_CALL_PURE function modifier, in which case the call to the
        !            84: function is removed if the return value is not used.
1.1       root       85: 
                     86: On some TCG targets (e.g. x86), several calling conventions are
                     87: supported.
                     88: 
                     89: * Branches:
                     90: 
                     91: Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
                     92: explicit address. Conditional branches can only jump to labels.
                     93: 
                     94: 3.3) Code Optimizations
                     95: 
                     96: When generating instructions, you can count on at least the following
                     97: optimizations:
                     98: 
                     99: - Single instructions are simplified, e.g.
                    100: 
                    101:    and_i32 t0, t0, $0xffffffff
                    102:     
                    103:   is suppressed.
                    104: 
                    105: - A liveness analysis is done at the basic block level. The
                    106:   information is used to suppress moves from a dead variable to
                    107:   another one. It is also used to remove instructions which compute
                    108:   dead results. The later is especially useful for condition code
                    109:   optimization in QEMU.
                    110: 
                    111:   In the following example:
                    112: 
                    113:   add_i32 t0, t1, t2
                    114:   add_i32 t0, t0, $1
                    115:   mov_i32 t0, $1
                    116: 
                    117:   only the last instruction is kept.
                    118: 
                    119: 3.4) Instruction Reference
                    120: 
                    121: ********* Function call
                    122: 
                    123: * call <ret> <params> ptr
                    124: 
                    125: call function 'ptr' (pointer type)
                    126: 
                    127: <ret> optional 32 bit or 64 bit return value
                    128: <params> optional 32 bit or 64 bit parameters
                    129: 
                    130: ********* Jumps/Labels
                    131: 
                    132: * jmp t0
                    133: 
                    134: Absolute jump to address t0 (pointer type).
                    135: 
                    136: * set_label $label
                    137: 
                    138: Define label 'label' at the current program point.
                    139: 
                    140: * br $label
                    141: 
                    142: Jump to label.
                    143: 
                    144: * brcond_i32/i64 cond, t0, t1, label
                    145: 
                    146: Conditional jump if t0 cond t1 is true. cond can be:
                    147:     TCG_COND_EQ
                    148:     TCG_COND_NE
                    149:     TCG_COND_LT /* signed */
                    150:     TCG_COND_GE /* signed */
                    151:     TCG_COND_LE /* signed */
                    152:     TCG_COND_GT /* signed */
                    153:     TCG_COND_LTU /* unsigned */
                    154:     TCG_COND_GEU /* unsigned */
                    155:     TCG_COND_LEU /* unsigned */
                    156:     TCG_COND_GTU /* unsigned */
                    157: 
                    158: ********* Arithmetic
                    159: 
                    160: * add_i32/i64 t0, t1, t2
                    161: 
                    162: t0=t1+t2
                    163: 
                    164: * sub_i32/i64 t0, t1, t2
                    165: 
                    166: t0=t1-t2
                    167: 
                    168: * neg_i32/i64 t0, t1
                    169: 
                    170: t0=-t1 (two's complement)
                    171: 
                    172: * mul_i32/i64 t0, t1, t2
                    173: 
                    174: t0=t1*t2
                    175: 
                    176: * div_i32/i64 t0, t1, t2
                    177: 
                    178: t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
                    179: 
                    180: * divu_i32/i64 t0, t1, t2
                    181: 
                    182: t0=t1/t2 (unsigned). Undefined behavior if division by zero.
                    183: 
                    184: * rem_i32/i64 t0, t1, t2
                    185: 
                    186: t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
                    187: 
                    188: * remu_i32/i64 t0, t1, t2
                    189: 
                    190: t0=t1%t2 (unsigned). Undefined behavior if division by zero.
                    191: 
                    192: ********* Logical
                    193: 
                    194: * and_i32/i64 t0, t1, t2
                    195: 
                    196: t0=t1&t2
                    197: 
                    198: * or_i32/i64 t0, t1, t2
                    199: 
                    200: t0=t1|t2
                    201: 
                    202: * xor_i32/i64 t0, t1, t2
                    203: 
                    204: t0=t1^t2
                    205: 
                    206: * not_i32/i64 t0, t1
                    207: 
                    208: t0=~t1
                    209: 
                    210: * andc_i32/i64 t0, t1, t2
                    211: 
                    212: t0=t1&~t2
                    213: 
                    214: * eqv_i32/i64 t0, t1, t2
                    215: 
1.1.1.3 ! root      216: t0=~(t1^t2), or equivalently, t0=t1^~t2
1.1       root      217: 
                    218: * nand_i32/i64 t0, t1, t2
                    219: 
                    220: t0=~(t1&t2)
                    221: 
                    222: * nor_i32/i64 t0, t1, t2
                    223: 
                    224: t0=~(t1|t2)
                    225: 
                    226: * orc_i32/i64 t0, t1, t2
                    227: 
                    228: t0=t1|~t2
                    229: 
                    230: ********* Shifts/Rotates
                    231: 
                    232: * shl_i32/i64 t0, t1, t2
                    233: 
                    234: t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    235: 
                    236: * shr_i32/i64 t0, t1, t2
                    237: 
                    238: t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    239: 
                    240: * sar_i32/i64 t0, t1, t2
                    241: 
                    242: t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    243: 
                    244: * rotl_i32/i64 t0, t1, t2
                    245: 
                    246: Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    247: 
                    248: * rotr_i32/i64 t0, t1, t2
                    249: 
                    250: Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
                    251: 
                    252: ********* Misc
                    253: 
                    254: * mov_i32/i64 t0, t1
                    255: 
                    256: t0 = t1
                    257: 
                    258: Move t1 to t0 (both operands must have the same type).
                    259: 
                    260: * ext8s_i32/i64 t0, t1
                    261: ext8u_i32/i64 t0, t1
                    262: ext16s_i32/i64 t0, t1
                    263: ext16u_i32/i64 t0, t1
                    264: ext32s_i64 t0, t1
                    265: ext32u_i64 t0, t1
                    266: 
                    267: 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
                    268: 
1.1.1.2   root      269: * bswap16_i32/i64 t0, t1
1.1       root      270: 
1.1.1.3 ! root      271: 16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
        !           272: bytes are set to zero.
1.1       root      273: 
1.1.1.2   root      274: * bswap32_i32/i64 t0, t1
1.1       root      275: 
1.1.1.3 ! root      276: 32 bit byte swap on a 32/64 bit value. With a 64 bit value, it assumes that
        !           277: the four high order bytes are set to zero.
1.1       root      278: 
1.1.1.2   root      279: * bswap64_i64 t0, t1
1.1       root      280: 
                    281: 64 bit byte swap
                    282: 
                    283: * discard_i32/i64 t0
                    284: 
                    285: Indicate that the value of t0 won't be used later. It is useful to
                    286: force dead code elimination.
                    287: 
1.1.1.3 ! root      288: ********* Conditional moves
        !           289: 
        !           290: * setcond_i32/i64 cond, dest, t1, t2
        !           291: 
        !           292: dest = (t1 cond t2)
        !           293: 
        !           294: Set DEST to 1 if (T1 cond T2) is true, otherwise set to 0.
        !           295: 
1.1       root      296: ********* Type conversions
                    297: 
                    298: * ext_i32_i64 t0, t1
                    299: Convert t1 (32 bit) to t0 (64 bit) and does sign extension
                    300: 
                    301: * extu_i32_i64 t0, t1
                    302: Convert t1 (32 bit) to t0 (64 bit) and does zero extension
                    303: 
                    304: * trunc_i64_i32 t0, t1
                    305: Truncate t1 (64 bit) to t0 (32 bit)
                    306: 
                    307: * concat_i32_i64 t0, t1, t2
                    308: Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
                    309: from t2 (32 bit).
                    310: 
                    311: * concat32_i64 t0, t1, t2
                    312: Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
                    313: from t2 (64 bit).
                    314: 
                    315: ********* Load/Store
                    316: 
                    317: * ld_i32/i64 t0, t1, offset
                    318: ld8s_i32/i64 t0, t1, offset
                    319: ld8u_i32/i64 t0, t1, offset
                    320: ld16s_i32/i64 t0, t1, offset
                    321: ld16u_i32/i64 t0, t1, offset
                    322: ld32s_i64 t0, t1, offset
                    323: ld32u_i64 t0, t1, offset
                    324: 
                    325: t0 = read(t1 + offset)
                    326: Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
                    327: offset must be a constant.
                    328: 
                    329: * st_i32/i64 t0, t1, offset
                    330: st8_i32/i64 t0, t1, offset
                    331: st16_i32/i64 t0, t1, offset
                    332: st32_i64 t0, t1, offset
                    333: 
                    334: write(t0, t1 + offset)
                    335: Write 8, 16, 32 or 64 bits to host memory.
                    336: 
1.1.1.3 ! root      337: ********* 64-bit target on 32-bit host support
        !           338: 
        !           339: The following opcodes are internal to TCG.  Thus they are to be implemented by
        !           340: 32-bit host code generators, but are not to be emitted by guest translators.
        !           341: They are emitted as needed by inline functions within "tcg-op.h".
        !           342: 
        !           343: * brcond2_i32 cond, t0_low, t0_high, t1_low, t1_high, label
        !           344: 
        !           345: Similar to brcond, except that the 64-bit values T0 and T1
        !           346: are formed from two 32-bit arguments.
        !           347: 
        !           348: * add2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
        !           349: * sub2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
        !           350: 
        !           351: Similar to add/sub, except that the 64-bit inputs T1 and T2 are
        !           352: formed from two 32-bit arguments, and the 64-bit output T0
        !           353: is returned in two 32-bit outputs.
        !           354: 
        !           355: * mulu2_i32 t0_low, t0_high, t1, t2
        !           356: 
        !           357: Similar to mul, except two 32-bit (unsigned) inputs T1 and T2 yielding
        !           358: the full 64-bit product T0.  The later is returned in two 32-bit outputs.
        !           359: 
        !           360: * setcond2_i32 cond, dest, t1_low, t1_high, t2_low, t2_high
        !           361: 
        !           362: Similar to setcond, except that the 64-bit values T1 and T2 are
        !           363: formed from two 32-bit arguments.  The result is a 32-bit value.
        !           364: 
1.1       root      365: ********* QEMU specific operations
                    366: 
                    367: * tb_exit t0
                    368: 
                    369: Exit the current TB and return the value t0 (word type).
                    370: 
                    371: * goto_tb index
                    372: 
                    373: Exit the current TB and jump to the TB index 'index' (constant) if the
                    374: current TB was linked to this TB. Otherwise execute the next
                    375: instructions.
                    376: 
                    377: * qemu_ld8u t0, t1, flags
                    378: qemu_ld8s t0, t1, flags
                    379: qemu_ld16u t0, t1, flags
                    380: qemu_ld16s t0, t1, flags
1.1.1.3 ! root      381: qemu_ld32 t0, t1, flags
1.1       root      382: qemu_ld32u t0, t1, flags
                    383: qemu_ld32s t0, t1, flags
                    384: qemu_ld64 t0, t1, flags
                    385: 
1.1.1.3 ! root      386: Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU address
        !           387: type. 'flags' contains the QEMU memory index (selects user or kernel access)
        !           388: for example.
        !           389: 
        !           390: Note that "qemu_ld32" implies a 32-bit result, while "qemu_ld32u" and
        !           391: "qemu_ld32s" imply a 64-bit result appropriately extended from 32 bits.
1.1       root      392: 
                    393: * qemu_st8 t0, t1, flags
                    394: qemu_st16 t0, t1, flags
                    395: qemu_st32 t0, t1, flags
                    396: qemu_st64 t0, t1, flags
                    397: 
                    398: Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
                    399: address type. 'flags' contains the QEMU memory index (selects user or
                    400: kernel access) for example.
                    401: 
                    402: Note 1: Some shortcuts are defined when the last operand is known to be
                    403: a constant (e.g. addi for add, movi for mov).
                    404: 
                    405: Note 2: When using TCG, the opcodes must never be generated directly
                    406: as some of them may not be available as "real" opcodes. Always use the
                    407: function tcg_gen_xxx(args).
                    408: 
                    409: 4) Backend
                    410: 
                    411: tcg-target.h contains the target specific definitions. tcg-target.c
                    412: contains the target specific code.
                    413: 
                    414: 4.1) Assumptions
                    415: 
                    416: The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
                    417: 64 bit. It is expected that the pointer has the same size as the word.
                    418: 
                    419: On a 32 bit target, all 64 bit operations are converted to 32 bits. A
                    420: few specific operations must be implemented to allow it (see add2_i32,
                    421: sub2_i32, brcond2_i32).
                    422: 
                    423: Floating point operations are not supported in this version. A
                    424: previous incarnation of the code generator had full support of them,
                    425: but it is better to concentrate on integer operations first.
                    426: 
                    427: On a 64 bit target, no assumption is made in TCG about the storage of
                    428: the 32 bit values in 64 bit registers.
                    429: 
                    430: 4.2) Constraints
                    431: 
                    432: GCC like constraints are used to define the constraints of every
                    433: instruction. Memory constraints are not supported in this
                    434: version. Aliases are specified in the input operands as for GCC.
                    435: 
                    436: The same register may be used for both an input and an output, even when
                    437: they are not explicitly aliased.  If an op expands to multiple target
                    438: instructions then care must be taken to avoid clobbering input values.
                    439: GCC style "early clobber" outputs are not currently supported.
                    440: 
                    441: A target can define specific register or constant constraints. If an
                    442: operation uses a constant input constraint which does not allow all
                    443: constants, it must also accept registers in order to have a fallback.
                    444: 
                    445: The movi_i32 and movi_i64 operations must accept any constants.
                    446: 
                    447: The mov_i32 and mov_i64 operations must accept any registers of the
                    448: same type.
                    449: 
                    450: The ld/st instructions must accept signed 32 bit constant offsets. It
                    451: can be implemented by reserving a specific register to compute the
                    452: address if the offset is too big.
                    453: 
                    454: The ld/st instructions must accept any destination (ld) or source (st)
                    455: register.
                    456: 
                    457: 4.3) Function call assumptions
                    458: 
                    459: - The only supported types for parameters and return value are: 32 and
                    460:   64 bit integers and pointer.
                    461: - The stack grows downwards.
                    462: - The first N parameters are passed in registers.
                    463: - The next parameters are passed on the stack by storing them as words.
                    464: - Some registers are clobbered during the call. 
                    465: - The function can return 0 or 1 value in registers. On a 32 bit
                    466:   target, functions must be able to return 2 values in registers for
                    467:   64 bit return type.
                    468: 
                    469: 5) Recommended coding rules for best performance
                    470: 
                    471: - Use globals to represent the parts of the QEMU CPU state which are
                    472:   often modified, e.g. the integer registers and the condition
                    473:   codes. TCG will be able to use host registers to store them.
                    474: 
                    475: - Avoid globals stored in fixed registers. They must be used only to
                    476:   store the pointer to the CPU state and possibly to store a pointer
                    477:   to a register window.
                    478: 
                    479: - Use temporaries. Use local temporaries only when really needed,
                    480:   e.g. when you need to use a value after a jump. Local temporaries
                    481:   introduce a performance hit in the current TCG implementation: their
                    482:   content is saved to memory at end of each basic block.
                    483: 
                    484: - Free temporaries and local temporaries when they are no longer used
                    485:   (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
                    486:   should free it after it is used. Freeing temporaries does not yield
                    487:   a better generated code, but it reduces the memory usage of TCG and
                    488:   the speed of the translation.
                    489: 
                    490: - Don't hesitate to use helpers for complicated or seldom used target
                    491:   intructions. There is little performance advantage in using TCG to
                    492:   implement target instructions taking more than about twenty TCG
                    493:   instructions.
                    494: 
                    495: - Use the 'discard' instruction if you know that TCG won't be able to
                    496:   prove that a given global is "dead" at a given program point. The
                    497:   x86 target uses it to improve the condition codes optimisation.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.