|
|
1.1 ! root 1: Tiny Code Generator - Fabrice Bellard. ! 2: ! 3: 1) Introduction ! 4: ! 5: TCG (Tiny Code Generator) began as a generic backend for a C ! 6: compiler. It was simplified to be used in QEMU. It also has its roots ! 7: in the QOP code generator written by Paul Brook. ! 8: ! 9: 2) Definitions ! 10: ! 11: The TCG "target" is the architecture for which we generate the ! 12: code. It is of course not the same as the "target" of QEMU which is ! 13: the emulated architecture. As TCG started as a generic C backend used ! 14: for cross compiling, it is assumed that the TCG target is different ! 15: from the host, although it is never the case for QEMU. ! 16: ! 17: A TCG "function" corresponds to a QEMU Translated Block (TB). ! 18: ! 19: A TCG "temporary" is a variable only live in a basic ! 20: block. Temporaries are allocated explicitly in each function. ! 21: ! 22: A TCG "local temporary" is a variable only live in a function. Local ! 23: temporaries are allocated explicitly in each function. ! 24: ! 25: A TCG "global" is a variable which is live in all the functions ! 26: (equivalent of a C global variable). They are defined before the ! 27: functions defined. A TCG global can be a memory location (e.g. a QEMU ! 28: CPU register), a fixed host register (e.g. the QEMU CPU state pointer) ! 29: or a memory location which is stored in a register outside QEMU TBs ! 30: (not implemented yet). ! 31: ! 32: A TCG "basic block" corresponds to a list of instructions terminated ! 33: by a branch instruction. ! 34: ! 35: 3) Intermediate representation ! 36: ! 37: 3.1) Introduction ! 38: ! 39: TCG instructions operate on variables which are temporaries, local ! 40: temporaries or globals. TCG instructions and variables are strongly ! 41: typed. Two types are supported: 32 bit integers and 64 bit ! 42: integers. Pointers are defined as an alias to 32 bit or 64 bit ! 43: integers depending on the TCG target word size. ! 44: ! 45: Each instruction has a fixed number of output variable operands, input ! 46: variable operands and always constant operands. ! 47: ! 48: The notable exception is the call instruction which has a variable ! 49: number of outputs and inputs. ! 50: ! 51: In the textual form, output operands usually come first, followed by ! 52: input operands, followed by constant operands. The output type is ! 53: included in the instruction name. Constants are prefixed with a '$'. ! 54: ! 55: add_i32 t0, t1, t2 (t0 <- t1 + t2) ! 56: ! 57: 3.2) Assumptions ! 58: ! 59: * Basic blocks ! 60: ! 61: - Basic blocks end after branches (e.g. brcond_i32 instruction), ! 62: goto_tb and exit_tb instructions. ! 63: - Basic blocks start after the end of a previous basic block, or at a ! 64: set_label instruction. ! 65: ! 66: After the end of a basic block, the content of temporaries is ! 67: destroyed, but local temporaries and globals are preserved. ! 68: ! 69: * Floating point types are not supported yet ! 70: ! 71: * Pointers: depending on the TCG target, pointer size is 32 bit or 64 ! 72: bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or ! 73: TCG_TYPE_I64. ! 74: ! 75: * Helpers: ! 76: ! 77: Using the tcg_gen_helper_x_y it is possible to call any function ! 78: taking i32, i64 or pointer types. Before calling an helper, all ! 79: globals are stored at their canonical location and it is assumed that ! 80: the function can modify them. In the future, function modifiers will ! 81: be allowed to tell that the helper does not read or write some globals. ! 82: ! 83: On some TCG targets (e.g. x86), several calling conventions are ! 84: supported. ! 85: ! 86: * Branches: ! 87: ! 88: Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an ! 89: explicit address. Conditional branches can only jump to labels. ! 90: ! 91: 3.3) Code Optimizations ! 92: ! 93: When generating instructions, you can count on at least the following ! 94: optimizations: ! 95: ! 96: - Single instructions are simplified, e.g. ! 97: ! 98: and_i32 t0, t0, $0xffffffff ! 99: ! 100: is suppressed. ! 101: ! 102: - A liveness analysis is done at the basic block level. The ! 103: information is used to suppress moves from a dead variable to ! 104: another one. It is also used to remove instructions which compute ! 105: dead results. The later is especially useful for condition code ! 106: optimization in QEMU. ! 107: ! 108: In the following example: ! 109: ! 110: add_i32 t0, t1, t2 ! 111: add_i32 t0, t0, $1 ! 112: mov_i32 t0, $1 ! 113: ! 114: only the last instruction is kept. ! 115: ! 116: 3.4) Instruction Reference ! 117: ! 118: ********* Function call ! 119: ! 120: * call <ret> <params> ptr ! 121: ! 122: call function 'ptr' (pointer type) ! 123: ! 124: <ret> optional 32 bit or 64 bit return value ! 125: <params> optional 32 bit or 64 bit parameters ! 126: ! 127: ********* Jumps/Labels ! 128: ! 129: * jmp t0 ! 130: ! 131: Absolute jump to address t0 (pointer type). ! 132: ! 133: * set_label $label ! 134: ! 135: Define label 'label' at the current program point. ! 136: ! 137: * br $label ! 138: ! 139: Jump to label. ! 140: ! 141: * brcond_i32/i64 cond, t0, t1, label ! 142: ! 143: Conditional jump if t0 cond t1 is true. cond can be: ! 144: TCG_COND_EQ ! 145: TCG_COND_NE ! 146: TCG_COND_LT /* signed */ ! 147: TCG_COND_GE /* signed */ ! 148: TCG_COND_LE /* signed */ ! 149: TCG_COND_GT /* signed */ ! 150: TCG_COND_LTU /* unsigned */ ! 151: TCG_COND_GEU /* unsigned */ ! 152: TCG_COND_LEU /* unsigned */ ! 153: TCG_COND_GTU /* unsigned */ ! 154: ! 155: ********* Arithmetic ! 156: ! 157: * add_i32/i64 t0, t1, t2 ! 158: ! 159: t0=t1+t2 ! 160: ! 161: * sub_i32/i64 t0, t1, t2 ! 162: ! 163: t0=t1-t2 ! 164: ! 165: * neg_i32/i64 t0, t1 ! 166: ! 167: t0=-t1 (two's complement) ! 168: ! 169: * mul_i32/i64 t0, t1, t2 ! 170: ! 171: t0=t1*t2 ! 172: ! 173: * div_i32/i64 t0, t1, t2 ! 174: ! 175: t0=t1/t2 (signed). Undefined behavior if division by zero or overflow. ! 176: ! 177: * divu_i32/i64 t0, t1, t2 ! 178: ! 179: t0=t1/t2 (unsigned). Undefined behavior if division by zero. ! 180: ! 181: * rem_i32/i64 t0, t1, t2 ! 182: ! 183: t0=t1%t2 (signed). Undefined behavior if division by zero or overflow. ! 184: ! 185: * remu_i32/i64 t0, t1, t2 ! 186: ! 187: t0=t1%t2 (unsigned). Undefined behavior if division by zero. ! 188: ! 189: ********* Logical ! 190: ! 191: * and_i32/i64 t0, t1, t2 ! 192: ! 193: t0=t1&t2 ! 194: ! 195: * or_i32/i64 t0, t1, t2 ! 196: ! 197: t0=t1|t2 ! 198: ! 199: * xor_i32/i64 t0, t1, t2 ! 200: ! 201: t0=t1^t2 ! 202: ! 203: * not_i32/i64 t0, t1 ! 204: ! 205: t0=~t1 ! 206: ! 207: * andc_i32/i64 t0, t1, t2 ! 208: ! 209: t0=t1&~t2 ! 210: ! 211: * eqv_i32/i64 t0, t1, t2 ! 212: ! 213: t0=~(t1^t2) ! 214: ! 215: * nand_i32/i64 t0, t1, t2 ! 216: ! 217: t0=~(t1&t2) ! 218: ! 219: * nor_i32/i64 t0, t1, t2 ! 220: ! 221: t0=~(t1|t2) ! 222: ! 223: * orc_i32/i64 t0, t1, t2 ! 224: ! 225: t0=t1|~t2 ! 226: ! 227: ********* Shifts/Rotates ! 228: ! 229: * shl_i32/i64 t0, t1, t2 ! 230: ! 231: t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) ! 232: ! 233: * shr_i32/i64 t0, t1, t2 ! 234: ! 235: t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) ! 236: ! 237: * sar_i32/i64 t0, t1, t2 ! 238: ! 239: t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) ! 240: ! 241: * rotl_i32/i64 t0, t1, t2 ! 242: ! 243: Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) ! 244: ! 245: * rotr_i32/i64 t0, t1, t2 ! 246: ! 247: Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) ! 248: ! 249: ********* Misc ! 250: ! 251: * mov_i32/i64 t0, t1 ! 252: ! 253: t0 = t1 ! 254: ! 255: Move t1 to t0 (both operands must have the same type). ! 256: ! 257: * ext8s_i32/i64 t0, t1 ! 258: ext8u_i32/i64 t0, t1 ! 259: ext16s_i32/i64 t0, t1 ! 260: ext16u_i32/i64 t0, t1 ! 261: ext32s_i64 t0, t1 ! 262: ext32u_i64 t0, t1 ! 263: ! 264: 8, 16 or 32 bit sign/zero extension (both operands must have the same type) ! 265: ! 266: * bswap16_i32 t0, t1 ! 267: ! 268: 16 bit byte swap on a 32 bit value. The two high order bytes must be set ! 269: to zero. ! 270: ! 271: * bswap_i32 t0, t1 ! 272: ! 273: 32 bit byte swap ! 274: ! 275: * bswap_i64 t0, t1 ! 276: ! 277: 64 bit byte swap ! 278: ! 279: * discard_i32/i64 t0 ! 280: ! 281: Indicate that the value of t0 won't be used later. It is useful to ! 282: force dead code elimination. ! 283: ! 284: ********* Type conversions ! 285: ! 286: * ext_i32_i64 t0, t1 ! 287: Convert t1 (32 bit) to t0 (64 bit) and does sign extension ! 288: ! 289: * extu_i32_i64 t0, t1 ! 290: Convert t1 (32 bit) to t0 (64 bit) and does zero extension ! 291: ! 292: * trunc_i64_i32 t0, t1 ! 293: Truncate t1 (64 bit) to t0 (32 bit) ! 294: ! 295: * concat_i32_i64 t0, t1, t2 ! 296: Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half ! 297: from t2 (32 bit). ! 298: ! 299: * concat32_i64 t0, t1, t2 ! 300: Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half ! 301: from t2 (64 bit). ! 302: ! 303: ********* Load/Store ! 304: ! 305: * ld_i32/i64 t0, t1, offset ! 306: ld8s_i32/i64 t0, t1, offset ! 307: ld8u_i32/i64 t0, t1, offset ! 308: ld16s_i32/i64 t0, t1, offset ! 309: ld16u_i32/i64 t0, t1, offset ! 310: ld32s_i64 t0, t1, offset ! 311: ld32u_i64 t0, t1, offset ! 312: ! 313: t0 = read(t1 + offset) ! 314: Load 8, 16, 32 or 64 bits with or without sign extension from host memory. ! 315: offset must be a constant. ! 316: ! 317: * st_i32/i64 t0, t1, offset ! 318: st8_i32/i64 t0, t1, offset ! 319: st16_i32/i64 t0, t1, offset ! 320: st32_i64 t0, t1, offset ! 321: ! 322: write(t0, t1 + offset) ! 323: Write 8, 16, 32 or 64 bits to host memory. ! 324: ! 325: ********* QEMU specific operations ! 326: ! 327: * tb_exit t0 ! 328: ! 329: Exit the current TB and return the value t0 (word type). ! 330: ! 331: * goto_tb index ! 332: ! 333: Exit the current TB and jump to the TB index 'index' (constant) if the ! 334: current TB was linked to this TB. Otherwise execute the next ! 335: instructions. ! 336: ! 337: * qemu_ld8u t0, t1, flags ! 338: qemu_ld8s t0, t1, flags ! 339: qemu_ld16u t0, t1, flags ! 340: qemu_ld16s t0, t1, flags ! 341: qemu_ld32u t0, t1, flags ! 342: qemu_ld32s t0, t1, flags ! 343: qemu_ld64 t0, t1, flags ! 344: ! 345: Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU ! 346: address type. 'flags' contains the QEMU memory index (selects user or ! 347: kernel access) for example. ! 348: ! 349: * qemu_st8 t0, t1, flags ! 350: qemu_st16 t0, t1, flags ! 351: qemu_st32 t0, t1, flags ! 352: qemu_st64 t0, t1, flags ! 353: ! 354: Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU ! 355: address type. 'flags' contains the QEMU memory index (selects user or ! 356: kernel access) for example. ! 357: ! 358: Note 1: Some shortcuts are defined when the last operand is known to be ! 359: a constant (e.g. addi for add, movi for mov). ! 360: ! 361: Note 2: When using TCG, the opcodes must never be generated directly ! 362: as some of them may not be available as "real" opcodes. Always use the ! 363: function tcg_gen_xxx(args). ! 364: ! 365: 4) Backend ! 366: ! 367: tcg-target.h contains the target specific definitions. tcg-target.c ! 368: contains the target specific code. ! 369: ! 370: 4.1) Assumptions ! 371: ! 372: The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or ! 373: 64 bit. It is expected that the pointer has the same size as the word. ! 374: ! 375: On a 32 bit target, all 64 bit operations are converted to 32 bits. A ! 376: few specific operations must be implemented to allow it (see add2_i32, ! 377: sub2_i32, brcond2_i32). ! 378: ! 379: Floating point operations are not supported in this version. A ! 380: previous incarnation of the code generator had full support of them, ! 381: but it is better to concentrate on integer operations first. ! 382: ! 383: On a 64 bit target, no assumption is made in TCG about the storage of ! 384: the 32 bit values in 64 bit registers. ! 385: ! 386: 4.2) Constraints ! 387: ! 388: GCC like constraints are used to define the constraints of every ! 389: instruction. Memory constraints are not supported in this ! 390: version. Aliases are specified in the input operands as for GCC. ! 391: ! 392: The same register may be used for both an input and an output, even when ! 393: they are not explicitly aliased. If an op expands to multiple target ! 394: instructions then care must be taken to avoid clobbering input values. ! 395: GCC style "early clobber" outputs are not currently supported. ! 396: ! 397: A target can define specific register or constant constraints. If an ! 398: operation uses a constant input constraint which does not allow all ! 399: constants, it must also accept registers in order to have a fallback. ! 400: ! 401: The movi_i32 and movi_i64 operations must accept any constants. ! 402: ! 403: The mov_i32 and mov_i64 operations must accept any registers of the ! 404: same type. ! 405: ! 406: The ld/st instructions must accept signed 32 bit constant offsets. It ! 407: can be implemented by reserving a specific register to compute the ! 408: address if the offset is too big. ! 409: ! 410: The ld/st instructions must accept any destination (ld) or source (st) ! 411: register. ! 412: ! 413: 4.3) Function call assumptions ! 414: ! 415: - The only supported types for parameters and return value are: 32 and ! 416: 64 bit integers and pointer. ! 417: - The stack grows downwards. ! 418: - The first N parameters are passed in registers. ! 419: - The next parameters are passed on the stack by storing them as words. ! 420: - Some registers are clobbered during the call. ! 421: - The function can return 0 or 1 value in registers. On a 32 bit ! 422: target, functions must be able to return 2 values in registers for ! 423: 64 bit return type. ! 424: ! 425: 5) Recommended coding rules for best performance ! 426: ! 427: - Use globals to represent the parts of the QEMU CPU state which are ! 428: often modified, e.g. the integer registers and the condition ! 429: codes. TCG will be able to use host registers to store them. ! 430: ! 431: - Avoid globals stored in fixed registers. They must be used only to ! 432: store the pointer to the CPU state and possibly to store a pointer ! 433: to a register window. ! 434: ! 435: - Use temporaries. Use local temporaries only when really needed, ! 436: e.g. when you need to use a value after a jump. Local temporaries ! 437: introduce a performance hit in the current TCG implementation: their ! 438: content is saved to memory at end of each basic block. ! 439: ! 440: - Free temporaries and local temporaries when they are no longer used ! 441: (tcg_temp_free). Since tcg_const_x() also creates a temporary, you ! 442: should free it after it is used. Freeing temporaries does not yield ! 443: a better generated code, but it reduces the memory usage of TCG and ! 444: the speed of the translation. ! 445: ! 446: - Don't hesitate to use helpers for complicated or seldom used target ! 447: intructions. There is little performance advantage in using TCG to ! 448: implement target instructions taking more than about twenty TCG ! 449: instructions. ! 450: ! 451: - Use the 'discard' instruction if you know that TCG won't be able to ! 452: prove that a given global is "dead" at a given program point. The ! 453: x86 target uses it to improve the condition codes optimisation.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.