|
|
1.1 root 1: Tiny Code Generator - Fabrice Bellard.
2:
3: 1) Introduction
4:
5: TCG (Tiny Code Generator) began as a generic backend for a C
6: compiler. It was simplified to be used in QEMU. It also has its roots
7: in the QOP code generator written by Paul Brook.
8:
9: 2) Definitions
10:
11: The TCG "target" is the architecture for which we generate the
12: code. It is of course not the same as the "target" of QEMU which is
13: the emulated architecture. As TCG started as a generic C backend used
14: for cross compiling, it is assumed that the TCG target is different
15: from the host, although it is never the case for QEMU.
16:
17: A TCG "function" corresponds to a QEMU Translated Block (TB).
18:
19: A TCG "temporary" is a variable only live in a basic
20: block. Temporaries are allocated explicitly in each function.
21:
22: A TCG "local temporary" is a variable only live in a function. Local
23: temporaries are allocated explicitly in each function.
24:
25: A TCG "global" is a variable which is live in all the functions
26: (equivalent of a C global variable). They are defined before the
27: functions defined. A TCG global can be a memory location (e.g. a QEMU
28: CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
29: or a memory location which is stored in a register outside QEMU TBs
30: (not implemented yet).
31:
32: A TCG "basic block" corresponds to a list of instructions terminated
33: by a branch instruction.
34:
35: 3) Intermediate representation
36:
37: 3.1) Introduction
38:
39: TCG instructions operate on variables which are temporaries, local
40: temporaries or globals. TCG instructions and variables are strongly
41: typed. Two types are supported: 32 bit integers and 64 bit
42: integers. Pointers are defined as an alias to 32 bit or 64 bit
43: integers depending on the TCG target word size.
44:
45: Each instruction has a fixed number of output variable operands, input
46: variable operands and always constant operands.
47:
48: The notable exception is the call instruction which has a variable
49: number of outputs and inputs.
50:
51: In the textual form, output operands usually come first, followed by
52: input operands, followed by constant operands. The output type is
53: included in the instruction name. Constants are prefixed with a '$'.
54:
55: add_i32 t0, t1, t2 (t0 <- t1 + t2)
56:
57: 3.2) Assumptions
58:
59: * Basic blocks
60:
61: - Basic blocks end after branches (e.g. brcond_i32 instruction),
62: goto_tb and exit_tb instructions.
63: - Basic blocks start after the end of a previous basic block, or at a
64: set_label instruction.
65:
66: After the end of a basic block, the content of temporaries is
67: destroyed, but local temporaries and globals are preserved.
68:
69: * Floating point types are not supported yet
70:
71: * Pointers: depending on the TCG target, pointer size is 32 bit or 64
72: bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
73: TCG_TYPE_I64.
74:
75: * Helpers:
76:
77: Using the tcg_gen_helper_x_y it is possible to call any function
1.1.1.3 ! root 78: taking i32, i64 or pointer types. By default, before calling an helper,
! 79: all globals are stored at their canonical location and it is assumed
! 80: that the function can modify them. This can be overriden by the
! 81: TCG_CALL_CONST function modifier. By default, the helper is allowed to
! 82: modify the CPU state or raise an exception. This can be overriden by
! 83: the TCG_CALL_PURE function modifier, in which case the call to the
! 84: function is removed if the return value is not used.
1.1 root 85:
86: On some TCG targets (e.g. x86), several calling conventions are
87: supported.
88:
89: * Branches:
90:
91: Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
92: explicit address. Conditional branches can only jump to labels.
93:
94: 3.3) Code Optimizations
95:
96: When generating instructions, you can count on at least the following
97: optimizations:
98:
99: - Single instructions are simplified, e.g.
100:
101: and_i32 t0, t0, $0xffffffff
102:
103: is suppressed.
104:
105: - A liveness analysis is done at the basic block level. The
106: information is used to suppress moves from a dead variable to
107: another one. It is also used to remove instructions which compute
108: dead results. The later is especially useful for condition code
109: optimization in QEMU.
110:
111: In the following example:
112:
113: add_i32 t0, t1, t2
114: add_i32 t0, t0, $1
115: mov_i32 t0, $1
116:
117: only the last instruction is kept.
118:
119: 3.4) Instruction Reference
120:
121: ********* Function call
122:
123: * call <ret> <params> ptr
124:
125: call function 'ptr' (pointer type)
126:
127: <ret> optional 32 bit or 64 bit return value
128: <params> optional 32 bit or 64 bit parameters
129:
130: ********* Jumps/Labels
131:
132: * jmp t0
133:
134: Absolute jump to address t0 (pointer type).
135:
136: * set_label $label
137:
138: Define label 'label' at the current program point.
139:
140: * br $label
141:
142: Jump to label.
143:
144: * brcond_i32/i64 cond, t0, t1, label
145:
146: Conditional jump if t0 cond t1 is true. cond can be:
147: TCG_COND_EQ
148: TCG_COND_NE
149: TCG_COND_LT /* signed */
150: TCG_COND_GE /* signed */
151: TCG_COND_LE /* signed */
152: TCG_COND_GT /* signed */
153: TCG_COND_LTU /* unsigned */
154: TCG_COND_GEU /* unsigned */
155: TCG_COND_LEU /* unsigned */
156: TCG_COND_GTU /* unsigned */
157:
158: ********* Arithmetic
159:
160: * add_i32/i64 t0, t1, t2
161:
162: t0=t1+t2
163:
164: * sub_i32/i64 t0, t1, t2
165:
166: t0=t1-t2
167:
168: * neg_i32/i64 t0, t1
169:
170: t0=-t1 (two's complement)
171:
172: * mul_i32/i64 t0, t1, t2
173:
174: t0=t1*t2
175:
176: * div_i32/i64 t0, t1, t2
177:
178: t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
179:
180: * divu_i32/i64 t0, t1, t2
181:
182: t0=t1/t2 (unsigned). Undefined behavior if division by zero.
183:
184: * rem_i32/i64 t0, t1, t2
185:
186: t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
187:
188: * remu_i32/i64 t0, t1, t2
189:
190: t0=t1%t2 (unsigned). Undefined behavior if division by zero.
191:
192: ********* Logical
193:
194: * and_i32/i64 t0, t1, t2
195:
196: t0=t1&t2
197:
198: * or_i32/i64 t0, t1, t2
199:
200: t0=t1|t2
201:
202: * xor_i32/i64 t0, t1, t2
203:
204: t0=t1^t2
205:
206: * not_i32/i64 t0, t1
207:
208: t0=~t1
209:
210: * andc_i32/i64 t0, t1, t2
211:
212: t0=t1&~t2
213:
214: * eqv_i32/i64 t0, t1, t2
215:
1.1.1.3 ! root 216: t0=~(t1^t2), or equivalently, t0=t1^~t2
1.1 root 217:
218: * nand_i32/i64 t0, t1, t2
219:
220: t0=~(t1&t2)
221:
222: * nor_i32/i64 t0, t1, t2
223:
224: t0=~(t1|t2)
225:
226: * orc_i32/i64 t0, t1, t2
227:
228: t0=t1|~t2
229:
230: ********* Shifts/Rotates
231:
232: * shl_i32/i64 t0, t1, t2
233:
234: t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
235:
236: * shr_i32/i64 t0, t1, t2
237:
238: t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
239:
240: * sar_i32/i64 t0, t1, t2
241:
242: t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
243:
244: * rotl_i32/i64 t0, t1, t2
245:
246: Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
247:
248: * rotr_i32/i64 t0, t1, t2
249:
250: Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
251:
252: ********* Misc
253:
254: * mov_i32/i64 t0, t1
255:
256: t0 = t1
257:
258: Move t1 to t0 (both operands must have the same type).
259:
260: * ext8s_i32/i64 t0, t1
261: ext8u_i32/i64 t0, t1
262: ext16s_i32/i64 t0, t1
263: ext16u_i32/i64 t0, t1
264: ext32s_i64 t0, t1
265: ext32u_i64 t0, t1
266:
267: 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
268:
1.1.1.2 root 269: * bswap16_i32/i64 t0, t1
1.1 root 270:
1.1.1.3 ! root 271: 16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
! 272: bytes are set to zero.
1.1 root 273:
1.1.1.2 root 274: * bswap32_i32/i64 t0, t1
1.1 root 275:
1.1.1.3 ! root 276: 32 bit byte swap on a 32/64 bit value. With a 64 bit value, it assumes that
! 277: the four high order bytes are set to zero.
1.1 root 278:
1.1.1.2 root 279: * bswap64_i64 t0, t1
1.1 root 280:
281: 64 bit byte swap
282:
283: * discard_i32/i64 t0
284:
285: Indicate that the value of t0 won't be used later. It is useful to
286: force dead code elimination.
287:
1.1.1.3 ! root 288: ********* Conditional moves
! 289:
! 290: * setcond_i32/i64 cond, dest, t1, t2
! 291:
! 292: dest = (t1 cond t2)
! 293:
! 294: Set DEST to 1 if (T1 cond T2) is true, otherwise set to 0.
! 295:
1.1 root 296: ********* Type conversions
297:
298: * ext_i32_i64 t0, t1
299: Convert t1 (32 bit) to t0 (64 bit) and does sign extension
300:
301: * extu_i32_i64 t0, t1
302: Convert t1 (32 bit) to t0 (64 bit) and does zero extension
303:
304: * trunc_i64_i32 t0, t1
305: Truncate t1 (64 bit) to t0 (32 bit)
306:
307: * concat_i32_i64 t0, t1, t2
308: Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
309: from t2 (32 bit).
310:
311: * concat32_i64 t0, t1, t2
312: Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
313: from t2 (64 bit).
314:
315: ********* Load/Store
316:
317: * ld_i32/i64 t0, t1, offset
318: ld8s_i32/i64 t0, t1, offset
319: ld8u_i32/i64 t0, t1, offset
320: ld16s_i32/i64 t0, t1, offset
321: ld16u_i32/i64 t0, t1, offset
322: ld32s_i64 t0, t1, offset
323: ld32u_i64 t0, t1, offset
324:
325: t0 = read(t1 + offset)
326: Load 8, 16, 32 or 64 bits with or without sign extension from host memory.
327: offset must be a constant.
328:
329: * st_i32/i64 t0, t1, offset
330: st8_i32/i64 t0, t1, offset
331: st16_i32/i64 t0, t1, offset
332: st32_i64 t0, t1, offset
333:
334: write(t0, t1 + offset)
335: Write 8, 16, 32 or 64 bits to host memory.
336:
1.1.1.3 ! root 337: ********* 64-bit target on 32-bit host support
! 338:
! 339: The following opcodes are internal to TCG. Thus they are to be implemented by
! 340: 32-bit host code generators, but are not to be emitted by guest translators.
! 341: They are emitted as needed by inline functions within "tcg-op.h".
! 342:
! 343: * brcond2_i32 cond, t0_low, t0_high, t1_low, t1_high, label
! 344:
! 345: Similar to brcond, except that the 64-bit values T0 and T1
! 346: are formed from two 32-bit arguments.
! 347:
! 348: * add2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
! 349: * sub2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
! 350:
! 351: Similar to add/sub, except that the 64-bit inputs T1 and T2 are
! 352: formed from two 32-bit arguments, and the 64-bit output T0
! 353: is returned in two 32-bit outputs.
! 354:
! 355: * mulu2_i32 t0_low, t0_high, t1, t2
! 356:
! 357: Similar to mul, except two 32-bit (unsigned) inputs T1 and T2 yielding
! 358: the full 64-bit product T0. The later is returned in two 32-bit outputs.
! 359:
! 360: * setcond2_i32 cond, dest, t1_low, t1_high, t2_low, t2_high
! 361:
! 362: Similar to setcond, except that the 64-bit values T1 and T2 are
! 363: formed from two 32-bit arguments. The result is a 32-bit value.
! 364:
1.1 root 365: ********* QEMU specific operations
366:
367: * tb_exit t0
368:
369: Exit the current TB and return the value t0 (word type).
370:
371: * goto_tb index
372:
373: Exit the current TB and jump to the TB index 'index' (constant) if the
374: current TB was linked to this TB. Otherwise execute the next
375: instructions.
376:
377: * qemu_ld8u t0, t1, flags
378: qemu_ld8s t0, t1, flags
379: qemu_ld16u t0, t1, flags
380: qemu_ld16s t0, t1, flags
1.1.1.3 ! root 381: qemu_ld32 t0, t1, flags
1.1 root 382: qemu_ld32u t0, t1, flags
383: qemu_ld32s t0, t1, flags
384: qemu_ld64 t0, t1, flags
385:
1.1.1.3 ! root 386: Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU address
! 387: type. 'flags' contains the QEMU memory index (selects user or kernel access)
! 388: for example.
! 389:
! 390: Note that "qemu_ld32" implies a 32-bit result, while "qemu_ld32u" and
! 391: "qemu_ld32s" imply a 64-bit result appropriately extended from 32 bits.
1.1 root 392:
393: * qemu_st8 t0, t1, flags
394: qemu_st16 t0, t1, flags
395: qemu_st32 t0, t1, flags
396: qemu_st64 t0, t1, flags
397:
398: Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
399: address type. 'flags' contains the QEMU memory index (selects user or
400: kernel access) for example.
401:
402: Note 1: Some shortcuts are defined when the last operand is known to be
403: a constant (e.g. addi for add, movi for mov).
404:
405: Note 2: When using TCG, the opcodes must never be generated directly
406: as some of them may not be available as "real" opcodes. Always use the
407: function tcg_gen_xxx(args).
408:
409: 4) Backend
410:
411: tcg-target.h contains the target specific definitions. tcg-target.c
412: contains the target specific code.
413:
414: 4.1) Assumptions
415:
416: The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
417: 64 bit. It is expected that the pointer has the same size as the word.
418:
419: On a 32 bit target, all 64 bit operations are converted to 32 bits. A
420: few specific operations must be implemented to allow it (see add2_i32,
421: sub2_i32, brcond2_i32).
422:
423: Floating point operations are not supported in this version. A
424: previous incarnation of the code generator had full support of them,
425: but it is better to concentrate on integer operations first.
426:
427: On a 64 bit target, no assumption is made in TCG about the storage of
428: the 32 bit values in 64 bit registers.
429:
430: 4.2) Constraints
431:
432: GCC like constraints are used to define the constraints of every
433: instruction. Memory constraints are not supported in this
434: version. Aliases are specified in the input operands as for GCC.
435:
436: The same register may be used for both an input and an output, even when
437: they are not explicitly aliased. If an op expands to multiple target
438: instructions then care must be taken to avoid clobbering input values.
439: GCC style "early clobber" outputs are not currently supported.
440:
441: A target can define specific register or constant constraints. If an
442: operation uses a constant input constraint which does not allow all
443: constants, it must also accept registers in order to have a fallback.
444:
445: The movi_i32 and movi_i64 operations must accept any constants.
446:
447: The mov_i32 and mov_i64 operations must accept any registers of the
448: same type.
449:
450: The ld/st instructions must accept signed 32 bit constant offsets. It
451: can be implemented by reserving a specific register to compute the
452: address if the offset is too big.
453:
454: The ld/st instructions must accept any destination (ld) or source (st)
455: register.
456:
457: 4.3) Function call assumptions
458:
459: - The only supported types for parameters and return value are: 32 and
460: 64 bit integers and pointer.
461: - The stack grows downwards.
462: - The first N parameters are passed in registers.
463: - The next parameters are passed on the stack by storing them as words.
464: - Some registers are clobbered during the call.
465: - The function can return 0 or 1 value in registers. On a 32 bit
466: target, functions must be able to return 2 values in registers for
467: 64 bit return type.
468:
469: 5) Recommended coding rules for best performance
470:
471: - Use globals to represent the parts of the QEMU CPU state which are
472: often modified, e.g. the integer registers and the condition
473: codes. TCG will be able to use host registers to store them.
474:
475: - Avoid globals stored in fixed registers. They must be used only to
476: store the pointer to the CPU state and possibly to store a pointer
477: to a register window.
478:
479: - Use temporaries. Use local temporaries only when really needed,
480: e.g. when you need to use a value after a jump. Local temporaries
481: introduce a performance hit in the current TCG implementation: their
482: content is saved to memory at end of each basic block.
483:
484: - Free temporaries and local temporaries when they are no longer used
485: (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
486: should free it after it is used. Freeing temporaries does not yield
487: a better generated code, but it reduces the memory usage of TCG and
488: the speed of the translation.
489:
490: - Don't hesitate to use helpers for complicated or seldom used target
491: intructions. There is little performance advantage in using TCG to
492: implement target instructions taking more than about twenty TCG
493: instructions.
494:
495: - Use the 'discard' instruction if you know that TCG won't be able to
496: prove that a given global is "dead" at a given program point. The
497: x86 target uses it to improve the condition codes optimisation.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.