Tiny Code Generator - Fabrice Bellard.

1) Introduction

TCG (Tiny Code Generator) began as a generic backend for a C
compiler. It was simplified to be used in QEMU. It also has its roots
in the QOP code generator written by Paul Brook.

2) Definitions

The TCG "target" is the architecture for which we generate the
code. It is of course not the same as the "target" of QEMU which is
the emulated architecture. As TCG started as a generic C backend used
for cross compiling, it is assumed that the TCG target is different
from the host, although it is never the case for QEMU.

A TCG "function" corresponds to a QEMU Translated Block (TB).

A TCG "temporary" is a variable only live in a basic
block. Temporaries are allocated explicitly in each function.

A TCG "local temporary" is a variable only live in a function. Local
temporaries are allocated explicitly in each function.

A TCG "global" is a variable which is live in all the functions
(equivalent of a C global variable). Globals are defined before the
functions are defined. A TCG global can be a memory location (e.g. a QEMU
CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
or a memory location which is stored in a register outside QEMU TBs
(not implemented yet).

A TCG "basic block" corresponds to a list of instructions terminated
by a branch instruction.

3) Intermediate representation

3.1) Introduction

TCG instructions operate on variables which are temporaries, local
temporaries or globals. TCG instructions and variables are strongly
typed. Two types are supported: 32 bit integers and 64 bit
integers. Pointers are defined as an alias to 32 bit or 64 bit
integers depending on the TCG target word size.

Each instruction has a fixed number of output variable operands, input
variable operands and always constant operands.

The notable exception is the call instruction which has a variable
number of outputs and inputs.

In the textual form, output operands usually come first, followed by
input operands, followed by constant operands. The output type is
included in the instruction name. Constants are prefixed with a '$'.

add_i32 t0, t1, t2 (t0 <- t1 + t2)
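
To make the textual convention concrete, here is a minimal Python
sketch that splits one textual op into its name, type suffix and
operands. The function parse_op is hypothetical (it is not part of
TCG), and it assumes operands are separated by commas as in the
example above.

```python
def parse_op(line):
    """Split a textual TCG op into (name, type, operands).

    Operands prefixed with '$' are reported as constants; everything
    else is treated as a variable name.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    name, _, op_type = mnemonic.rpartition("_")
    if op_type not in ("i32", "i64"):
        # Ops such as 'br' or 'set_label' carry no type suffix.
        name, op_type = mnemonic, None
    operands = []
    for text in (part.strip() for part in rest.split(",") if part.strip()):
        kind = "const" if text.startswith("$") else "var"
        operands.append((kind, text.lstrip("$")))
    return name, op_type, operands
```

For "add_i32 t0, t1, t2" this yields the name "add", the type "i32"
and three variable operands, the first of which is the output.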

3.2) Assumptions

* Basic blocks

- Basic blocks end after branches (e.g. brcond_i32 instruction),
  goto_tb and exit_tb instructions.
- Basic blocks start after the end of a previous basic block, or at a
  set_label instruction.

After the end of a basic block, the content of temporaries is
destroyed, but local temporaries and globals are preserved.

* Floating point types are not supported yet

* Pointers: depending on the TCG target, pointer size is 32 bit or 64
  bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
  TCG_TYPE_I64.

* Helpers:

Using the tcg_gen_helper_x_y it is possible to call any function
taking i32, i64 or pointer types. Before calling a helper, all
globals are stored at their canonical location and it is assumed that
the function can modify them. In the future, function modifiers will
be allowed to tell that the helper does not read or write some globals.

On some TCG targets (e.g. x86), several calling conventions are
supported.

* Branches:

Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
explicit address. Conditional branches can only jump to labels.
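
The basic block rules listed under "Basic blocks" above can be
sketched in a few lines of Python. This is only an illustration on
textual ops, not how TCG itself is written; the set BLOCK_ENDERS is an
assumption that treats 'br' and 'jmp' as branches alongside the
instructions the text names explicitly.

```python
# Ops after which a basic block ends (assumed set, see lead-in).
BLOCK_ENDERS = {"br", "jmp", "brcond_i32", "brcond_i64", "goto_tb", "exit_tb"}

def split_basic_blocks(ops):
    """Group a list of textual ops into basic blocks."""
    blocks, current = [], []
    for op in ops:
        name = op.split()[0]
        if name == "set_label" and current:
            # A label starts a new block, so close the one in progress.
            blocks.append(current)
            current = []
        current.append(op)
        if name in BLOCK_ENDERS:
            # A branch is the last op of its block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

Any temporary written in one of the returned blocks must not be read
in a later one; only local temporaries and globals survive the
boundary.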

3.3) Code Optimizations

When generating instructions, you can count on at least the following
optimizations:

- Single instructions are simplified, e.g.

    and_i32 t0, t0, $0xffffffff

  is suppressed.

- A liveness analysis is done at the basic block level. The
  information is used to suppress moves from a dead variable to
  another one. It is also used to remove instructions which compute
  dead results. The latter is especially useful for condition code
  optimization in QEMU.

In the following example:

  add_i32 t0, t1, t2
  add_i32 t0, t0, $1
  mov_i32 t0, $1

only the last instruction is kept.
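
The reasoning behind this example can be reproduced with a small
backward pass. The Python sketch below is a simplified model, not the
TCG implementation: it assumes every op has exactly one output and no
side effects, and the tuple layout (name, output, inputs) is invented
for the illustration.

```python
def eliminate_dead_ops(block, live_out):
    """Backward liveness pass over one basic block.

    Each op is (name, output, inputs); constants are left out of
    'inputs'. 'live_out' names the variables still needed after the
    block.
    """
    live = set(live_out)
    kept = []
    for name, output, inputs in reversed(block):
        if output not in live:
            continue             # result is never read: drop the op
        live.discard(output)     # the previous value of the output is dead
        live.update(inputs)      # the inputs are needed by this op
        kept.append((name, output, inputs))
    kept.reverse()
    return kept

example = [
    ("add_i32", "t0", ["t1", "t2"]),
    ("add_i32", "t0", ["t0"]),
    ("mov_i32", "t0", []),
]
remaining = eliminate_dead_ops(example, live_out={"t0"})
```

Walking backwards, the final mov overwrites t0 without reading it, so
t0 is dead before it and both additions are dropped.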

3.4) Instruction Reference

********* Function call

* call <ret> <params> ptr

call function 'ptr' (pointer type)

<ret> optional 32 bit or 64 bit return value
<params> optional 32 bit or 64 bit parameters

********* Jumps/Labels

* jmp t0

Absolute jump to address t0 (pointer type).

* set_label $label

Define label 'label' at the current program point.

* br $label

Jump to label.

* brcond_i32/i64 cond, t0, t1, label

Conditional jump if t0 cond t1 is true. cond can be:
    TCG_COND_EQ
    TCG_COND_NE
    TCG_COND_LT /* signed */
    TCG_COND_GE /* signed */
    TCG_COND_LE /* signed */
    TCG_COND_GT /* signed */
    TCG_COND_LTU /* unsigned */
    TCG_COND_GEU /* unsigned */
    TCG_COND_LEU /* unsigned */
    TCG_COND_GTU /* unsigned */
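
The difference between the signed and unsigned conditions only shows
when the top bit of an operand is set. The following Python sketch
models the comparison on raw bit patterns; eval_cond and to_signed are
hypothetical helper names used for illustration only.

```python
def to_signed(value, bits=32):
    """Reinterpret a 'bits'-wide bit pattern as a two's complement value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def eval_cond(cond, a, b, bits=32):
    """Evaluate 'a cond b' for a TCG_COND_* name on 'bits'-wide operands."""
    mask = (1 << bits) - 1
    ua, ub = a & mask, b & mask
    sa, sb = to_signed(a, bits), to_signed(b, bits)
    table = {
        "TCG_COND_EQ": ua == ub,
        "TCG_COND_NE": ua != ub,
        "TCG_COND_LT": sa < sb,
        "TCG_COND_GE": sa >= sb,
        "TCG_COND_LE": sa <= sb,
        "TCG_COND_GT": sa > sb,
        "TCG_COND_LTU": ua < ub,
        "TCG_COND_GEU": ua >= ub,
        "TCG_COND_LEU": ua <= ub,
        "TCG_COND_GTU": ua > ub,
    }
    return table[cond]
```

With t0 = 0xffffffff and t1 = 1, TCG_COND_LT holds (t0 is -1 when
read as signed) while TCG_COND_LTU does not.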

********* Arithmetic

* add_i32/i64 t0, t1, t2

t0=t1+t2

* sub_i32/i64 t0, t1, t2

t0=t1-t2

* neg_i32/i64 t0, t1

t0=-t1 (two's complement)

* mul_i32/i64 t0, t1, t2

t0=t1*t2

* div_i32/i64 t0, t1, t2

t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.

* divu_i32/i64 t0, t1, t2

t0=t1/t2 (unsigned). Undefined behavior if division by zero.

* rem_i32/i64 t0, t1, t2

t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.

* remu_i32/i64 t0, t1, t2

t0=t1%t2 (unsigned). Undefined behavior if division by zero.
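
Since the division and remainder ops are undefined for some inputs, a
front end has to rule those inputs out before emitting them. The
Python sketch below lists the cases. The text does not spell out what
"overflow" means; the sketch assumes it is the usual two's complement
case where the most negative value is divided by -1. The function name
div_is_defined is hypothetical.

```python
def div_is_defined(t1, t2, bits=32, signed=True):
    """Return True when t1 / t2 (and t1 % t2) has a defined result.

    Operands are Python integers already in the signed or unsigned
    range of a 'bits'-wide value.
    """
    if t2 == 0:
        return False                  # division by zero
    int_min = -(1 << (bits - 1))
    if signed and t1 == int_min and t2 == -1:
        return False                  # quotient 2**(bits-1) does not fit
    return True
```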

********* Logical

* and_i32/i64 t0, t1, t2

t0=t1&t2

* or_i32/i64 t0, t1, t2

t0=t1|t2

* xor_i32/i64 t0, t1, t2

t0=t1^t2

* not_i32/i64 t0, t1

t0=~t1

* andc_i32/i64 t0, t1, t2

t0=t1&~t2

* eqv_i32/i64 t0, t1, t2

t0=~(t1^t2)

* nand_i32/i64 t0, t1, t2

t0=~(t1&t2)

* nor_i32/i64 t0, t1, t2

t0=~(t1|t2)

* orc_i32/i64 t0, t1, t2

t0=t1|~t2
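
The less common ops (andc, eqv, nand, nor, orc) are plain combinations
of the basic ones. As a quick cross-check of the formulas above, here
is a Python model of the 32 bit variants; the function names mirror
the op names but are otherwise just illustration.

```python
MASK32 = 0xFFFFFFFF

def andc_i32(t1, t2):
    return t1 & ~t2 & MASK32          # t1 & ~t2

def eqv_i32(t1, t2):
    return ~(t1 ^ t2) & MASK32        # ~(t1 ^ t2)

def nand_i32(t1, t2):
    return ~(t1 & t2) & MASK32        # ~(t1 & t2)

def nor_i32(t1, t2):
    return ~(t1 | t2) & MASK32        # ~(t1 | t2)

def orc_i32(t1, t2):
    return (t1 | ~t2) & MASK32        # t1 | ~t2
```

The final mask is needed only because Python integers are unbounded;
it keeps the result inside 32 bits.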

********* Shifts/Rotates

* shl_i32/i64 t0, t1, t2

t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)

* shr_i32/i64 t0, t1, t2

t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)

* sar_i32/i64 t0, t1, t2

t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)

* rotl_i32/i64 t0, t1, t2

Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)

* rotr_i32/i64 t0, t1, t2

Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
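
For readers who want to check a shift or rotate by hand, the following
Python sketch models the five ops on bit patterns. It is a reference
model only; the out-of-range counts that the text calls undefined are
rejected with an exception here, which is a choice made for the
sketch.

```python
def _check_count(count, bits):
    if not 0 <= count < bits:
        raise ValueError("shift count outside 0..bits-1 is undefined")

def shl(t1, t2, bits=32):
    _check_count(t2, bits)
    return (t1 << t2) & ((1 << bits) - 1)

def shr(t1, t2, bits=32):
    _check_count(t2, bits)
    return (t1 & ((1 << bits) - 1)) >> t2      # zeros shifted in

def sar(t1, t2, bits=32):
    _check_count(t2, bits)
    mask = (1 << bits) - 1
    t1 &= mask
    if t1 >> (bits - 1):
        t1 -= 1 << bits                        # reinterpret as negative
    return (t1 >> t2) & mask                   # sign bit shifted in

def rotl(t1, t2, bits=32):
    _check_count(t2, bits)
    mask = (1 << bits) - 1
    t1 &= mask
    return ((t1 << t2) | (t1 >> (bits - t2))) & mask

def rotr(t1, t2, bits=32):
    _check_count(t2, bits)
    mask = (1 << bits) - 1
    t1 &= mask
    return ((t1 >> t2) | (t1 << (bits - t2))) & mask
```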

********* Misc

* mov_i32/i64 t0, t1

t0 = t1

Move t1 to t0 (both operands must have the same type).

* ext8s_i32/i64 t0, t1
  ext8u_i32/i64 t0, t1
  ext16s_i32/i64 t0, t1
  ext16u_i32/i64 t0, t1
  ext32s_i64 t0, t1
  ext32u_i64 t0, t1

8, 16 or 32 bit sign/zero extension (both operands must have the same type)

* bswap16_i32 t0, t1

16 bit byte swap on a 32 bit value. The two high order bytes must be set
to zero.

* bswap_i32 t0, t1

32 bit byte swap

* bswap_i64 t0, t1

64 bit byte swap

* discard_i32/i64 t0

Indicate that the value of t0 won't be used later. It is useful to
force dead code elimination.
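
The extension and byte swap ops of this group are easy to get wrong by
one bit or one byte, so here is a small Python reference model. The
helper names are hypothetical. The check in bswap16_i32 reflects the
requirement above that the two high order bytes be zero; raising an
exception for it is a choice made for the sketch.

```python
def zero_extend(value, from_bits):
    """Model of ext8u/ext16u/ext32u: keep the low 'from_bits' bits."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Model of ext8s/ext16s/ext32s: replicate bit 'from_bits - 1' upwards."""
    low = value & ((1 << from_bits) - 1)
    if low >> (from_bits - 1):
        low -= 1 << from_bits
    return low & ((1 << to_bits) - 1)

def bswap(value, nbytes):
    """Reverse the byte order of an 'nbytes'-wide value."""
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")

def bswap16_i32(t1):
    if t1 >> 16:
        raise ValueError("the two high order bytes must be zero")
    return bswap(t1, 2)
```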

********* Type conversions

* ext_i32_i64 t0, t1
Convert t1 (32 bit) to t0 (64 bit) with sign extension

* extu_i32_i64 t0, t1
Convert t1 (32 bit) to t0 (64 bit) with zero extension

* trunc_i64_i32 t0, t1
Truncate t1 (64 bit) to t0 (32 bit)

* concat_i32_i64 t0, t1, t2
Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
from t2 (32 bit).

* concat32_i64 t0, t1, t2
Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
from t2 (64 bit).
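
As an illustration of how the 32 bit and 64 bit views relate, the
Python sketch below models concat_i32_i64 and trunc_i64_i32 on bit
patterns. The argument order (low half first) follows the description
above; the functions are a model for checking values, not TCG code.

```python
MASK32 = 0xFFFFFFFF

def concat_i32_i64(t1, t2):
    """Build a 64 bit value with t1 as the low half and t2 as the high half."""
    return ((t2 & MASK32) << 32) | (t1 & MASK32)

def trunc_i64_i32(t1):
    """Keep the low 32 bits of a 64 bit value."""
    return t1 & MASK32
```

Truncating the result of a concat gives back the low half, and
shifting it right by 32 gives back the high half.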

********* Load/Store

* ld_i32/i64 t0, t1, offset
  ld8s_i32/i64 t0, t1, offset
  ld8u_i32/i64 t0, t1, offset
  ld16s_i32/i64 t0, t1, offset
  ld16u_i32/i64 t0, t1, offset
  ld32s_i64 t0, t1, offset
  ld32u_i64 t0, t1, offset

t0 = read(t1 + offset)
Load 8, 16, 32 or 64 bits with or without sign extension from host memory.
offset must be a constant.

* st_i32/i64 t0, t1, offset
  st8_i32/i64 t0, t1, offset
  st16_i32/i64 t0, t1, offset
  st32_i64 t0, t1, offset

write(t0, t1 + offset)
Write 8, 16, 32 or 64 bits to host memory.

********* QEMU specific operations

* exit_tb t0

Exit the current TB and return the value t0 (word type).

* goto_tb index

Exit the current TB and jump to the TB index 'index' (constant) if the
current TB was linked to this TB. Otherwise execute the next
instructions.

* qemu_ld8u t0, t1, flags
  qemu_ld8s t0, t1, flags
  qemu_ld16u t0, t1, flags
  qemu_ld16s t0, t1, flags
  qemu_ld32u t0, t1, flags
  qemu_ld32s t0, t1, flags
  qemu_ld64 t0, t1, flags

Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU
address type. 'flags' contains, for example, the QEMU memory index
(which selects user or kernel access).

* qemu_st8 t0, t1, flags
  qemu_st16 t0, t1, flags
  qemu_st32 t0, t1, flags
  qemu_st64 t0, t1, flags

Store the data t0 at the QEMU CPU address t1. t1 has the QEMU CPU
address type. 'flags' contains, for example, the QEMU memory index
(which selects user or kernel access).

Note 1: Some shortcuts are defined when the last operand is known to be
a constant (e.g. addi for add, movi for mov).

Note 2: When using TCG, the opcodes must never be generated directly
as some of them may not be available as "real" opcodes. Always use the
function tcg_gen_xxx(args).

4) Backend

tcg-target.h contains the target specific definitions. tcg-target.c
contains the target specific code.

4.1) Assumptions

The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
64 bit. It is expected that the pointer has the same size as the word.

On a 32 bit target, all 64 bit operations are converted to 32 bits. A
few specific operations must be implemented to allow it (see add2_i32,
sub2_i32, brcond2_i32).
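
The idea behind such double-word operations is that a 64 bit value is
carried as two 32 bit halves and the carry (or borrow) is propagated
from the low half to the high half. The Python sketch below shows only
that idea. The function names and the operand order are invented for
the illustration and say nothing about the actual operand layout of
add2_i32 or sub2_i32.

```python
MASK32 = 0xFFFFFFFF

def add64_via_halves(al, ah, bl, bh):
    """Add two 64 bit values given as (low, high) 32 bit halves."""
    low = (al & MASK32) + (bl & MASK32)
    carry = low >> 32                          # 1 if the low half overflowed
    high = ((ah & MASK32) + (bh & MASK32) + carry) & MASK32
    return low & MASK32, high

def sub64_via_halves(al, ah, bl, bh):
    """Subtract b from a, both given as (low, high) 32 bit halves."""
    low = (al & MASK32) - (bl & MASK32)
    borrow = 1 if low < 0 else 0               # low half needed a borrow
    high = ((ah & MASK32) - (bh & MASK32) - borrow) & MASK32
    return low & MASK32, high
```

A 64 bit conditional branch can be built the same way, by comparing
the high halves first and the low halves only when the high halves are
equal.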

Floating point operations are not supported in this version. A
previous incarnation of the code generator had full support of them,
but it is better to concentrate on integer operations first.

On a 64 bit target, no assumption is made in TCG about the storage of
the 32 bit values in 64 bit registers.

4.2) Constraints

GCC like constraints are used to define the constraints of every
instruction. Memory constraints are not supported in this
version. Aliases are specified in the input operands as for GCC.

The same register may be used for both an input and an output, even when
they are not explicitly aliased. If an op expands to multiple target
instructions then care must be taken to avoid clobbering input values.
GCC style "early clobber" outputs are not currently supported.

A target can define specific register or constant constraints. If an
operation uses a constant input constraint which does not allow all
constants, it must also accept registers in order to have a fallback.

The movi_i32 and movi_i64 operations must accept any constants.

The mov_i32 and mov_i64 operations must accept any registers of the
same type.

The ld/st instructions must accept signed 32 bit constant offsets. This
can be implemented by reserving a specific register to compute the
address if the offset is too big.

The ld/st instructions must accept any destination (ld) or source (st)
register.

4.3) Function call assumptions

- The only supported types for parameters and return value are: 32 and
  64 bit integers and pointer.
- The stack grows downwards.
- The first N parameters are passed in registers.
- The next parameters are passed on the stack by storing them as words.
- Some registers are clobbered during the call.
- The function can return 0 or 1 value in registers. On a 32 bit
  target, functions must be able to return 2 values in registers for
  64 bit return type.

5) Recommended coding rules for best performance

- Use globals to represent the parts of the QEMU CPU state which are
  often modified, e.g. the integer registers and the condition
  codes. TCG will be able to use host registers to store them.

- Avoid globals stored in fixed registers. They must be used only to
  store the pointer to the CPU state and possibly to store a pointer
  to a register window.

- Use temporaries. Use local temporaries only when really needed,
  e.g. when you need to use a value after a jump. Local temporaries
  introduce a performance hit in the current TCG implementation: their
  content is saved to memory at end of each basic block.

- Free temporaries and local temporaries when they are no longer used
  (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
  should free it after it is used. Freeing temporaries does not yield
  better generated code, but it reduces the memory usage of TCG and
  speeds up the translation.

- Don't hesitate to use helpers for complicated or seldom used target
  instructions. There is little performance advantage in using TCG to
  implement target instructions taking more than about twenty TCG
  instructions.

- Use the 'discard' instruction if you know that TCG won't be able to
  prove that a given global is "dead" at a given program point. The
  x86 target uses it to improve the condition codes optimisation.