|
|
1.1 root 1: Tiny Code Generator - Fabrice Bellard.
2:
3: 1) Introduction
4:
5: TCG (Tiny Code Generator) began as a generic backend for a C
6: compiler. It was simplified to be used in QEMU. It also has its roots
7: in the QOP code generator written by Paul Brook.
8:
9: 2) Definitions
10:
11: The TCG "target" is the architecture for which we generate the
12: code. It is of course not the same as the "target" of QEMU which is
13: the emulated architecture. As TCG started as a generic C backend used
14: for cross compiling, it is assumed that the TCG target is different
15: from the host, although it is never the case for QEMU.
16:
17: A TCG "function" corresponds to a QEMU Translated Block (TB).
18:
19: A TCG "temporary" is a variable only live in a basic
20: block. Temporaries are allocated explicitly in each function.
21:
22: A TCG "local temporary" is a variable only live in a function. Local
23: temporaries are allocated explicitly in each function.
24:
25: A TCG "global" is a variable which is live in all the functions
26: (equivalent of a C global variable). They are defined before the
27: functions defined. A TCG global can be a memory location (e.g. a QEMU
28: CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
29: or a memory location which is stored in a register outside QEMU TBs
30: (not implemented yet).
31:
32: A TCG "basic block" corresponds to a list of instructions terminated
33: by a branch instruction.
34:
35: 3) Intermediate representation
36:
37: 3.1) Introduction
38:
39: TCG instructions operate on variables which are temporaries, local
40: temporaries or globals. TCG instructions and variables are strongly
41: typed. Two types are supported: 32 bit integers and 64 bit
42: integers. Pointers are defined as an alias to 32 bit or 64 bit
43: integers depending on the TCG target word size.
44:
45: Each instruction has a fixed number of output variable operands, input
46: variable operands and always constant operands.
47:
48: The notable exception is the call instruction which has a variable
49: number of outputs and inputs.
50:
51: In the textual form, output operands usually come first, followed by
52: input operands, followed by constant operands. The output type is
53: included in the instruction name. Constants are prefixed with a '$'.
54:
55: add_i32 t0, t1, t2 (t0 <- t1 + t2)
56:
57: 3.2) Assumptions
58:
59: * Basic blocks
60:
61: - Basic blocks end after branches (e.g. brcond_i32 instruction),
62: goto_tb and exit_tb instructions.
63: - Basic blocks start after the end of a previous basic block, or at a
64: set_label instruction.
65:
66: After the end of a basic block, the content of temporaries is
67: destroyed, but local temporaries and globals are preserved.
68:
69: * Floating point types are not supported yet
70:
71: * Pointers: depending on the TCG target, pointer size is 32 bit or 64
72: bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
73: TCG_TYPE_I64.
74:
75: * Helpers:
76:
77: Using the tcg_gen_helper_x_y it is possible to call any function
78: taking i32, i64 or pointer types. Before calling an helper, all
79: globals are stored at their canonical location and it is assumed that
80: the function can modify them. In the future, function modifiers will
81: be allowed to tell that the helper does not read or write some globals.
82:
83: On some TCG targets (e.g. x86), several calling conventions are
84: supported.
85:
86: * Branches:
87:
88: Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
89: explicit address. Conditional branches can only jump to labels.
90:
91: 3.3) Code Optimizations
92:
93: When generating instructions, you can count on at least the following
94: optimizations:
95:
96: - Single instructions are simplified, e.g.
97:
98: and_i32 t0, t0, $0xffffffff
99:
100: is suppressed.
101:
102: - A liveness analysis is done at the basic block level. The
103: information is used to suppress moves from a dead variable to
104: another one. It is also used to remove instructions which compute
105: dead results. The later is especially useful for condition code
106: optimization in QEMU.
107:
108: In the following example:
109:
110: add_i32 t0, t1, t2
111: add_i32 t0, t0, $1
112: mov_i32 t0, $1
113:
114: only the last instruction is kept.
115:
116: 3.4) Instruction Reference
117:
118: ********* Function call
119:
120: * call <ret> <params> ptr
121:
122: call function 'ptr' (pointer type)
123:
124: <ret> optional 32 bit or 64 bit return value
125: <params> optional 32 bit or 64 bit parameters
126:
127: ********* Jumps/Labels
128:
129: * jmp t0
130:
131: Absolute jump to address t0 (pointer type).
132:
133: * set_label $label
134:
135: Define label 'label' at the current program point.
136:
137: * br $label
138:
139: Jump to label.
140:
141: * brcond_i32/i64 cond, t0, t1, label
142:
143: Conditional jump if t0 cond t1 is true. cond can be:
144: TCG_COND_EQ
145: TCG_COND_NE
146: TCG_COND_LT /* signed */
147: TCG_COND_GE /* signed */
148: TCG_COND_LE /* signed */
149: TCG_COND_GT /* signed */
150: TCG_COND_LTU /* unsigned */
151: TCG_COND_GEU /* unsigned */
152: TCG_COND_LEU /* unsigned */
153: TCG_COND_GTU /* unsigned */
154:
155: ********* Arithmetic
156:
157: * add_i32/i64 t0, t1, t2
158:
159: t0=t1+t2
160:
161: * sub_i32/i64 t0, t1, t2
162:
163: t0=t1-t2
164:
165: * neg_i32/i64 t0, t1
166:
167: t0=-t1 (two's complement)
168:
169: * mul_i32/i64 t0, t1, t2
170:
171: t0=t1*t2
172:
173: * div_i32/i64 t0, t1, t2
174:
175: t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
176:
177: * divu_i32/i64 t0, t1, t2
178:
179: t0=t1/t2 (unsigned). Undefined behavior if division by zero.
180:
181: * rem_i32/i64 t0, t1, t2
182:
183: t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
184:
185: * remu_i32/i64 t0, t1, t2
186:
187: t0=t1%t2 (unsigned). Undefined behavior if division by zero.
188:
189: ********* Logical
190:
191: * and_i32/i64 t0, t1, t2
192:
193: t0=t1&t2
194:
195: * or_i32/i64 t0, t1, t2
196:
197: t0=t1|t2
198:
199: * xor_i32/i64 t0, t1, t2
200:
201: t0=t1^t2
202:
203: * not_i32/i64 t0, t1
204:
205: t0=~t1
206:
207: * andc_i32/i64 t0, t1, t2
208:
209: t0=t1&~t2
210:
211: * eqv_i32/i64 t0, t1, t2
212:
213: t0=~(t1^t2)
214:
215: * nand_i32/i64 t0, t1, t2
216:
217: t0=~(t1&t2)
218:
219: * nor_i32/i64 t0, t1, t2
220:
221: t0=~(t1|t2)
222:
223: * orc_i32/i64 t0, t1, t2
224:
225: t0=t1|~t2
226:
227: ********* Shifts/Rotates
228:
229: * shl_i32/i64 t0, t1, t2
230:
231: t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
232:
233: * shr_i32/i64 t0, t1, t2
234:
235: t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
236:
237: * sar_i32/i64 t0, t1, t2
238:
239: t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
240:
241: * rotl_i32/i64 t0, t1, t2
242:
243: Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
244:
245: * rotr_i32/i64 t0, t1, t2
246:
247: Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
248:
249: ********* Misc
250:
251: * mov_i32/i64 t0, t1
252:
253: t0 = t1
254:
255: Move t1 to t0 (both operands must have the same type).
256:
257: * ext8s_i32/i64 t0, t1
258: ext8u_i32/i64 t0, t1
259: ext16s_i32/i64 t0, t1
260: ext16u_i32/i64 t0, t1
261: ext32s_i64 t0, t1
262: ext32u_i64 t0, t1
263:
264: 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
265:
1.1.1.2 ! root 266: * bswap16_i32/i64 t0, t1
1.1 root 267:
1.1.1.2 ! root 268: 16 bit byte swap on a 32/64 bit value. The two/six high order bytes must be
! 269: set to zero.
1.1 root 270:
1.1.1.2 ! root 271: * bswap32_i32/i64 t0, t1
1.1 root 272:
1.1.1.2 ! root 273: 32 bit byte swap on a 32/64 bit value. With a 64 bit value, the four high
! 274: order bytes must be set to zero.
1.1 root 275:
1.1.1.2 ! root 276: * bswap64_i64 t0, t1
1.1 root 277:
278: 64 bit byte swap
279:
280: * discard_i32/i64 t0
281:
282: Indicate that the value of t0 won't be used later. It is useful to
283: force dead code elimination.
284:
285: ********* Type conversions
286:
287: * ext_i32_i64 t0, t1
288: Convert t1 (32 bit) to t0 (64 bit) and does sign extension
289:
290: * extu_i32_i64 t0, t1
291: Convert t1 (32 bit) to t0 (64 bit) and does zero extension
292:
293: * trunc_i64_i32 t0, t1
294: Truncate t1 (64 bit) to t0 (32 bit)
295:
296: * concat_i32_i64 t0, t1, t2
297: Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
298: from t2 (32 bit).
299:
300: * concat32_i64 t0, t1, t2
301: Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
302: from t2 (64 bit).
303:
304: ********* Load/Store
305:
306: * ld_i32/i64 t0, t1, offset
307: ld8s_i32/i64 t0, t1, offset
308: ld8u_i32/i64 t0, t1, offset
309: ld16s_i32/i64 t0, t1, offset
310: ld16u_i32/i64 t0, t1, offset
311: ld32s_i64 t0, t1, offset
312: ld32u_i64 t0, t1, offset
313:
314: t0 = read(t1 + offset)
315: Load 8, 16, 32 or 64 bits with or without sign extension from host memory.
316: offset must be a constant.
317:
318: * st_i32/i64 t0, t1, offset
319: st8_i32/i64 t0, t1, offset
320: st16_i32/i64 t0, t1, offset
321: st32_i64 t0, t1, offset
322:
323: write(t0, t1 + offset)
324: Write 8, 16, 32 or 64 bits to host memory.
325:
326: ********* QEMU specific operations
327:
328: * tb_exit t0
329:
330: Exit the current TB and return the value t0 (word type).
331:
332: * goto_tb index
333:
334: Exit the current TB and jump to the TB index 'index' (constant) if the
335: current TB was linked to this TB. Otherwise execute the next
336: instructions.
337:
338: * qemu_ld8u t0, t1, flags
339: qemu_ld8s t0, t1, flags
340: qemu_ld16u t0, t1, flags
341: qemu_ld16s t0, t1, flags
342: qemu_ld32u t0, t1, flags
343: qemu_ld32s t0, t1, flags
344: qemu_ld64 t0, t1, flags
345:
346: Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU
347: address type. 'flags' contains the QEMU memory index (selects user or
348: kernel access) for example.
349:
350: * qemu_st8 t0, t1, flags
351: qemu_st16 t0, t1, flags
352: qemu_st32 t0, t1, flags
353: qemu_st64 t0, t1, flags
354:
355: Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
356: address type. 'flags' contains the QEMU memory index (selects user or
357: kernel access) for example.
358:
359: Note 1: Some shortcuts are defined when the last operand is known to be
360: a constant (e.g. addi for add, movi for mov).
361:
362: Note 2: When using TCG, the opcodes must never be generated directly
363: as some of them may not be available as "real" opcodes. Always use the
364: function tcg_gen_xxx(args).
365:
366: 4) Backend
367:
368: tcg-target.h contains the target specific definitions. tcg-target.c
369: contains the target specific code.
370:
371: 4.1) Assumptions
372:
373: The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
374: 64 bit. It is expected that the pointer has the same size as the word.
375:
376: On a 32 bit target, all 64 bit operations are converted to 32 bits. A
377: few specific operations must be implemented to allow it (see add2_i32,
378: sub2_i32, brcond2_i32).
379:
380: Floating point operations are not supported in this version. A
381: previous incarnation of the code generator had full support of them,
382: but it is better to concentrate on integer operations first.
383:
384: On a 64 bit target, no assumption is made in TCG about the storage of
385: the 32 bit values in 64 bit registers.
386:
387: 4.2) Constraints
388:
389: GCC like constraints are used to define the constraints of every
390: instruction. Memory constraints are not supported in this
391: version. Aliases are specified in the input operands as for GCC.
392:
393: The same register may be used for both an input and an output, even when
394: they are not explicitly aliased. If an op expands to multiple target
395: instructions then care must be taken to avoid clobbering input values.
396: GCC style "early clobber" outputs are not currently supported.
397:
398: A target can define specific register or constant constraints. If an
399: operation uses a constant input constraint which does not allow all
400: constants, it must also accept registers in order to have a fallback.
401:
402: The movi_i32 and movi_i64 operations must accept any constants.
403:
404: The mov_i32 and mov_i64 operations must accept any registers of the
405: same type.
406:
407: The ld/st instructions must accept signed 32 bit constant offsets. It
408: can be implemented by reserving a specific register to compute the
409: address if the offset is too big.
410:
411: The ld/st instructions must accept any destination (ld) or source (st)
412: register.
413:
414: 4.3) Function call assumptions
415:
416: - The only supported types for parameters and return value are: 32 and
417: 64 bit integers and pointer.
418: - The stack grows downwards.
419: - The first N parameters are passed in registers.
420: - The next parameters are passed on the stack by storing them as words.
421: - Some registers are clobbered during the call.
422: - The function can return 0 or 1 value in registers. On a 32 bit
423: target, functions must be able to return 2 values in registers for
424: 64 bit return type.
425:
426: 5) Recommended coding rules for best performance
427:
428: - Use globals to represent the parts of the QEMU CPU state which are
429: often modified, e.g. the integer registers and the condition
430: codes. TCG will be able to use host registers to store them.
431:
432: - Avoid globals stored in fixed registers. They must be used only to
433: store the pointer to the CPU state and possibly to store a pointer
434: to a register window.
435:
436: - Use temporaries. Use local temporaries only when really needed,
437: e.g. when you need to use a value after a jump. Local temporaries
438: introduce a performance hit in the current TCG implementation: their
439: content is saved to memory at end of each basic block.
440:
441: - Free temporaries and local temporaries when they are no longer used
442: (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
443: should free it after it is used. Freeing temporaries does not yield
444: a better generated code, but it reduces the memory usage of TCG and
445: the speed of the translation.
446:
447: - Don't hesitate to use helpers for complicated or seldom used target
448: intructions. There is little performance advantage in using TCG to
449: implement target instructions taking more than about twenty TCG
450: instructions.
451:
452: - Use the 'discard' instruction if you know that TCG won't be able to
453: prove that a given global is "dead" at a given program point. The
454: x86 target uses it to improve the condition codes optimisation.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.