1: Info file gcc.info, produced by Makeinfo, -*- Text -*- from input
2: file gcc.texinfo.
3:
4: This file documents the use and the internals of the GNU compiler.
5:
6: Copyright (C) 1988 Free Software Foundation, Inc.
7:
8: Permission is granted to make and distribute verbatim copies of this
9: manual provided the copyright notice and this permission notice are
10: preserved on all copies.
11:
12: Permission is granted to copy and distribute modified versions of
13: this manual under the conditions for verbatim copying, provided also
14: that the section entitled ``GNU CC General Public License'' is
15: included exactly as in the original, and provided that the entire
16: resulting derived work is distributed under the terms of a permission
17: notice identical to this one.
18:
19: Permission is granted to copy and distribute translations of this
20: manual into another language, under the above conditions for modified
21: versions, except that the section entitled ``GNU CC General Public
22: License'' and this permission notice may be included in translations
23: approved by the Free Software Foundation instead of in the original
24: English.
25:
26:
27:
28: File: gcc.info, Node: Interface, Next: Passes, Prev: Portability, Up: Top
29:
30: Interfacing to GNU CC Output
31: ****************************
32:
33: GNU CC is normally configured to use the same function calling
34: convention that is in use on the target system. This is done with
35: the machine-description macros (*note Machine Macros::.).
36:
37: However, returning of structure and union values is done differently
38: on some target machines. As a result, functions compiled with PCC
39: returning such types cannot be called from code compiled with GNU CC,
40: and vice versa. This does not cause trouble often because few Unix
41: library routines return structures or unions.
42:
43: GNU CC code returns structures and unions that are 1, 2, 4 or 8 bytes
44: long in the same registers used for `int' or `double' return values.
45: (GNU CC typically allocates variables of such types in registers
46: also.) Structures and unions of other sizes are returned by storing
47: them into an address passed by the caller (usually in a register).
48: The machine-description macros `STRUCT_VALUE' and
49: `STRUCT_INCOMING_VALUE' tell GNU CC where to pass this address.
50:
51: By contrast, PCC on most target machines returns structures and
52: unions of any size by copying the data into an area of static
53: storage, and then returning the address of that storage as if it were
54: a pointer value. The caller must copy the data from that memory area
55: to the place where the value is wanted. This is slower than the
56: method used by GNU CC, and fails to be reentrant.
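
The two conventions can be imitated by hand in portable C. The following is a minimal sketch, not what either compiler emits: the type `struct big' and the functions `get_big_pcc_style' and `get_big_gnu_style' are hypothetical, and the address that GNU CC passes implicitly is written here as an explicit pointer parameter.

```c
/* Hypothetical structure, too large to be returned in registers. */
struct big { int v[4]; };

/* PCC style: fill a single static area and return its address.
   Every call overwrites the same storage, so this is not reentrant. */
struct big *get_big_pcc_style(int seed)
{
    static struct big area;
    int i;

    for (i = 0; i < 4; i++)
        area.v[i] = seed + i;
    return &area;
}

/* GNU CC style: the caller supplies the address where the value is
   wanted, and the callee stores the result there directly. */
void get_big_gnu_style(struct big *result, int seed)
{
    int i;

    for (i = 0; i < 4; i++)
        result->v[i] = seed + i;
}
```

With the static-area version, two calls return the same address, so the first result is lost unless the caller copies it out before calling again. With a caller-supplied address, each result lands in its own object and no extra copy is needed.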
57:
58: On some target machines, such as RISC machines and the 80386, the
59: standard system convention is to pass to the subroutine the address
60: of where to return the value. On these machines, GNU CC has been
61: configured to be compatible with the standard compiler, when this
62: method is used. It may not be compatible for structures of 1, 2, 4
63: or 8 bytes.
64:
65: GNU CC uses the system's standard convention for passing arguments.
66: On some machines, the first few arguments are passed in registers; in
67: others, all are passed on the stack. It would be possible to use
68: registers for argument passing on any machine, and this would
69: probably result in a significant speedup. But the result would be
70: complete incompatibility with code that follows the standard
71: convention. So this change is practical only if you are switching to
72: GNU CC as the sole C compiler for the system. We may implement
73: register argument passing on certain machines once we have a complete
74: GNU system so that we can compile the libraries with GNU CC.
75:
76: If you use `longjmp', beware of automatic variables. ANSI C says
77: that automatic variables that are not declared `volatile' have
78: undefined values after a `longjmp'. And this is all GNU CC promises
79: to do, because it is very difficult to restore register variables
80: correctly, and one of GNU CC's features is that it can put variables
81: in registers without your asking it to.
82:
83: If you want a variable to be unaltered by `longjmp', and you don't
84: want to write `volatile' because old C compilers don't accept it,
85: just take the address of the variable. If a variable's address is
86: ever taken, even if just to compute it and ignore it, then the
87: variable cannot go in a register:
88:
89:      {
90:        int careful;
91:        &careful;
92:        ...
93:      }
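
Where the compiler does accept `volatile', declaring the variable that way states the intent directly. A minimal sketch, in which the names `env', `bail_out' and `count_after_longjmp' are hypothetical:

```c
#include <setjmp.h>

static jmp_buf env;

/* Helper that never returns normally: it jumps back to the setjmp. */
static void bail_out(void)
{
    longjmp(env, 1);
}

/* Return the value of `count' as seen after the longjmp. */
int count_after_longjmp(void)
{
    volatile int count = 0;   /* volatile: may not live in a register */

    if (setjmp(env) == 0) {
        count = 5;            /* changed between setjmp and longjmp */
        bail_out();
    }
    return count;             /* well defined only because of volatile */
}
```

Without `volatile', the value returned here would be one of the undefined values the text warns about, since `count' is changed after the `setjmp' call.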
94:
95: Code compiled with GNU CC may call certain library routines. Most of
96: them handle arithmetic for which there are no instructions. This
97: includes multiply and divide on some machines, and floating point
98: operations on any machine for which floating point support is
99: disabled with `-msoft-float'. Some standard parts of the C library,
100: such as `bcopy' or `memcpy', are also called automatically. The
101: usual function call interface is used for calling the library routines.
102:
103: These library routines should be defined in the library `gnulib',
104: which GNU CC automatically searches whenever it links a program. On
105: machines that have multiply and divide instructions, if hardware
106: floating point is in use, normally `gnulib' is not needed, but it is
107: searched just in case.
108:
109: Each arithmetic function is defined in `gnulib.c' to use the
110: corresponding C arithmetic operator. As long as the file is compiled
111: with another C compiler, which supports all the C arithmetic
112: operators, this file will work portably. However, `gnulib.c' does
113: not work if compiled with GNU CC, because each arithmetic function
114: would compile into a call to itself!
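
The shape of such a definition can be sketched as follows. The names `sketch_mul' and `sketch_div' are hypothetical stand-ins, since this section does not list the real entry points in `gnulib.c'.

```c
/* Hypothetical stand-ins for two arithmetic routines of the kind
   described above; each one simply uses the C operator. */
int sketch_mul(int a, int b)
{
    /* On a machine with no multiply instruction, GNU CC would turn
       this `*' into a call to the library multiply routine, that is,
       into a call to this very function. */
    return a * b;
}

int sketch_div(int a, int b)
{
    return a / b;
}
```

A compiler that implements `*' and `/' by some other means (inline code or its own support routines) turns these bodies into real arithmetic, which is why the file has to be built with a compiler other than GNU CC.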
115:
116:
117:
118: File: gcc.info, Node: Passes, Next: RTL, Prev: Interface, Up: Top
119:
120: Passes and Files of the Compiler
121: ********************************
122:
123: The overall control structure of the compiler is in `toplev.c'. This
124: file is responsible for initialization, decoding arguments, opening
125: and closing files, and sequencing the passes.
126:
127: The parsing pass is invoked only once, to parse the entire input.
128: The RTL intermediate code for a function is generated as the function
129: is parsed, a statement at a time. Each statement is read in as a
130: syntax tree and then converted to RTL; then the storage for the tree
131: for the statement is reclaimed. Storage for types (and the
132: expressions for their sizes), declarations, and a representation of
133: the binding contours and how they nest, remains until the function is
134: finished being compiled; these are all needed to output the debugging
135: information.
136:
137: Each time the parsing pass reads a complete function definition or
138: top-level declaration, it calls the function `rest_of_compilation' or
139: `rest_of_decl_compilation' in `toplev.c', which are responsible for
140: all further processing necessary, ending with output of the assembler
141: language. All other compiler passes run, in sequence, within
142: `rest_of_compilation'. When that function returns from compiling a
143: function definition, the storage used for that function definition's
144: compilation is entirely freed, unless it is an inline function (*note
145: Inline::.).
146:
147: Here is a list of all the passes of the compiler and their source
148: files. Also included is a description of where debugging dumps can
149: be requested with `-d' options.
150:
151: * Parsing. This pass reads the entire text of a function
152: definition, constructing partial syntax trees. This and RTL
153: generation are no longer truly separate passes (formerly they
154: were), but it is easier to think of them as separate.
155:
156: The tree representation does not entirely follow C syntax,
157: because it is intended to support other languages as well.
158:
159: C data type analysis is also done in this pass, and every tree
160: node that represents an expression has a data type attached.
161: Variables are represented as declaration nodes.
162:
163: Constant folding and associative-law simplifications are also
164: done during this pass.
165:
166: The source files for parsing are `c-parse.y', `c-decl.c',
167: `c-typeck.c', `c-convert.c', `stor-layout.c', `fold-const.c',
168: and `tree.c'. The last three files are intended to be
169: language-independent. There are also header files `c-parse.h',
170: `c-tree.h', `tree.h' and `tree.def'. The last two define the
171: format of the tree representation.
172:
173: * RTL generation. This is the conversion of syntax tree into RTL
174: code. It is actually done statement-by-statement during
175: parsing, but for most purposes it can be thought of as a
176: separate pass.
177:
178: This is where the bulk of target-parameter-dependent code is
179: found, since often it is necessary for strategies to apply only
180: when certain standard kinds of instructions are available. The
181: purpose of named instruction patterns is to provide this
182: information to the RTL generation pass.
183:
184: Optimization is done in this pass for `if'-conditions that are
185: comparisons, boolean operations or conditional expressions.
186: Tail recursion is detected at this time also. Decisions are
187: made about how best to arrange loops and how to output `switch'
188: statements.
189:
190: The source files for RTL generation are `stmt.c', `expr.c',
191: `explow.c', `expmed.c', `optabs.c' and `emit-rtl.c'. Also, the
192: file `insn-emit.c', generated from the machine description by
193: the program `genemit', is used in this pass. The header file
194: `expr.h' is used for communication within this pass.
195:
196: The header files `insn-flags.h' and `insn-codes.h', generated
197: from the machine description by the programs `genflags' and
198: `gencodes', tell this pass which standard names are available
199: for use and which patterns correspond to them.
200:
201: Aside from debugging information output, none of the following
202: passes refers to the tree structure representation of the
203: function (only part of which is saved).
204:
205: The decision of whether the function can and should be expanded
206: inline in its subsequent callers is made at the end of RTL
207: generation. The function must meet certain criteria, currently
208: related to the size of the function and the types and number of
209: parameters it has. Note that this function may contain loops,
210: recursive calls to itself (tail-recursive functions can be
211: inlined!), and gotos; in short, all constructs supported by GNU CC.
212:
213: The option `-dr' causes a debugging dump of the RTL code after
214: this pass. This dump file's name is made by appending `.rtl' to
215: the input file name.
216:
217: * Jump optimization. This pass simplifies jumps to the following
218: instruction, jumps across jumps, and jumps to jumps. It deletes
219: unreferenced labels and unreachable code, except that
220: unreachable code that contains a loop is not recognized as
221: unreachable in this pass. (Such loops are deleted later in the
222: basic block analysis.)
223:
224: Jump optimization is performed two or three times. The first
225: time is immediately following RTL generation. The second time
226: is after CSE, but only if CSE says repeated jump optimization is
227: needed. The last time is right before the final pass. That
228: time, cross-jumping and deletion of no-op move instructions are
229: done together with the optimizations described above.
230:
231: The source file of this pass is `jump.c'.
232:
233: The option `-dj' causes a debugging dump of the RTL code after
234: this pass is run for the first time. This dump file's name is
235: made by appending `.jump' to the input file name.
236:
237: * Register scan. This pass finds the first and last use of each
238: register, as a guide for common subexpression elimination. Its
239: source is in `regclass.c'.
240:
241: * Common subexpression elimination. This pass also does constant
242: propagation. Its source file is `cse.c'. If constant
243: propagation causes conditional jumps to become unconditional or
244: to become no-ops, jump optimization is run again when CSE is
245: finished.
246:
247: The option `-ds' causes a debugging dump of the RTL code after
248: this pass. This dump file's name is made by appending `.cse' to
249: the input file name.
250:
251: * Loop optimization. This pass moves constant expressions out of
252: loops. Its source file is `loop.c'.
253:
254: The option `-dL' causes a debugging dump of the RTL code after
255: this pass. This dump file's name is made by appending `.loop'
256: to the input file name.
257:
258: * Stupid register allocation is performed at this point in a
259: nonoptimizing compilation. It does a little data flow analysis
260: as well. When stupid register allocation is in use, the next
261: pass executed is the reloading pass; the others in between are
262: skipped. The source file is `stupid.c'.
263:
264: * Data flow analysis (`flow.c'). This pass divides the program
265: into basic blocks (and in the process deletes unreachable
266: loops); then it computes which pseudo-registers are live at each
267: point in the program, and makes the first instruction that uses
268: a value point at the instruction that computed the value.
269:
270: This pass also deletes computations whose results are never
271: used, and combines memory references with add or subtract
272: instructions to make autoincrement or autodecrement addressing.
273:
274: The option `-df' causes a debugging dump of the RTL code after
275: this pass. This dump file's name is made by appending `.flow'
276: to the input file name. If stupid register allocation is in
277: use, this dump file reflects the full results of such allocation.
278:
279: * Instruction combination (`combine.c'). This pass attempts to
280: combine groups of two or three instructions that are related by
281: data flow into single instructions. It combines the RTL
282: expressions for the instructions by substitution, simplifies the
283: result using algebra, and then attempts to match the result
284: against the machine description.
285:
286: The option `-dc' causes a debugging dump of the RTL code after
287: this pass. This dump file's name is made by appending
288: `.combine' to the input file name.
289:
290: * Register class preferencing. The RTL code is scanned to find
291: out which register class is best for each pseudo register. The
292: source file is `regclass.c'.
293:
294: * Local register allocation (`local-alloc.c'). This pass
295: allocates hard registers to pseudo registers that are used only
296: within one basic block. Because the basic block is linear, it
297: can use fast and powerful techniques to do a very good job.
298:
299: The option `-dl' causes a debugging dump of the RTL code after
300: this pass. This dump file's name is made by appending `.lreg'
301: to the input file name.
302:
303: * Global register allocation (`global-alloc.c'). This pass
304: allocates hard registers for the remaining pseudo registers
305: (those whose life spans are not contained in one basic block).
306:
307: * Reloading. This pass renumbers pseudo registers with the
308: hardware register numbers they were allocated. Pseudo
309: registers that did not get hard registers are replaced with
310: stack slots. Then it finds instructions that are invalid
311: because a value has failed to end up in a register, or has ended
312: up in a register of the wrong kind. It fixes up these
313: instructions by reloading the problematical values temporarily
314: into registers. Additional instructions are generated to do the
315: copying.
316:
317: Source files are `reload.c' and `reload1.c', plus the header
318: `reload.h' used for communication between them.
319:
320: The option `-dg' causes a debugging dump of the RTL code after
321: this pass. This dump file's name is made by appending `.greg'
322: to the input file name.
323:
324: * Jump optimization is repeated, this time including cross-jumping
325: and deletion of no-op move instructions. Machine-specific
326: peephole optimizations are performed at the same time.
327:
328: The option `-dJ' causes a debugging dump of the RTL code after
329: this pass. This dump file's name is made by appending `.jump2'
330: to the input file name.
331:
332: * Final. This pass outputs the assembler code for the function.
333: It is also responsible for identifying spurious test and compare
334: instructions. The function entry and exit sequences are
335: generated directly as assembler code in this pass; they never
336: exist as RTL.
337:
338: The source files are `final.c' plus `insn-output.c'; the latter
339: is generated automatically from the machine description by the
340: tool `genoutput'. The header file `conditions.h' is used for
341: communication between these files.
342:
343: * Debugging information output. This is run after final because
344: it must output the stack slot offsets for pseudo registers that
345: did not get hard registers. Source files are `dbxout.c' for DBX
346: symbol table format and `symout.c' for GDB's own symbol table
347: format.
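
The `-d' letters and dump file suffixes named in the list above can be gathered into one table. This is only a summary sketch: the names `dump_entry', `dump_table' and `dump_suffix' are hypothetical and are not taken from `toplev.c'.

```c
#include <stddef.h>

/* Letters accepted after `-d' and the suffix of the dump file each one
   produces, collected from the pass descriptions above. */
struct dump_entry { char letter; const char *suffix; };

static const struct dump_entry dump_table[] = {
    { 'r', ".rtl" },      /* after RTL generation */
    { 'j', ".jump" },     /* after the first jump optimization */
    { 's', ".cse" },      /* after common subexpression elimination */
    { 'L', ".loop" },     /* after loop optimization */
    { 'f', ".flow" },     /* after data flow analysis */
    { 'c', ".combine" },  /* after instruction combination */
    { 'l', ".lreg" },     /* after local register allocation */
    { 'g', ".greg" },     /* after reloading */
    { 'J', ".jump2" },    /* after the last jump optimization */
};

/* Return the dump suffix for LETTER, or NULL if the list names none. */
const char *dump_suffix(char letter)
{
    size_t i;

    for (i = 0; i < sizeof dump_table / sizeof dump_table[0]; i++)
        if (dump_table[i].letter == letter)
            return dump_table[i].suffix;
    return NULL;
}
```

Note that the letters are case-sensitive: `-dj' and `-dJ' select different jump optimization dumps, and `-dl' and `-dL' select unrelated passes.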
348:
349: Some additional files are used by all or many passes:
350:
351: * Every pass uses `machmode.def', which defines the machine modes.
352:
353: * All the passes that work with RTL use the header files `rtl.h'
354: and `rtl.def', and subroutines in file `rtl.c'. The tools
355: `gen*' also use these files to read and work with the machine
356: description RTL.
357:
358: * Several passes refer to the header file `insn-config.h' which
359: contains a few parameters (C macro definitions) generated
360: automatically from the machine description RTL by the tool
361: `genconfig'.
362:
363: * Several passes use the instruction recognizer, which consists of
364: `recog.c' and `recog.h', plus the files `insn-recog.c' and
365: `insn-extract.c' that are generated automatically from the
366: machine description by the tools `genrecog' and `genextract'.
367:
368: * Several passes use the header files `regs.h' which defines the
369: information recorded about pseudo register usage, and
370: `basic-block.h' which defines the information recorded about
371: basic blocks.
372:
373: * `hard-reg-set.h' defines the type `HARD_REG_SET', a bit-vector
374: with a bit for each hard register, and some macros to manipulate
375: it. This type is just `int' if the machine has few enough hard
376: registers; otherwise it is an array of `int' and some of the
377: macros expand into loops.
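
The array form of such a bit-vector can be sketched as below. Everything prefixed `SKETCH_' or `sketch_' is hypothetical, including the register count of 70; the sketch uses `unsigned int' rather than `int' so that the shifts are well defined, and it does not reproduce the macros in `hard-reg-set.h'.

```c
#include <limits.h>

/* Hypothetical register count, chosen larger than the number of bits
   in an `int' so that the array representation is needed. */
#define SKETCH_NUM_HARD_REGS 70

#define SKETCH_INT_BITS ((int) (sizeof (unsigned int) * CHAR_BIT))
#define SKETCH_SET_WORDS \
  ((SKETCH_NUM_HARD_REGS + SKETCH_INT_BITS - 1) / SKETCH_INT_BITS)

typedef unsigned int sketch_reg_set[SKETCH_SET_WORDS];

/* Clearing the whole set has to loop over the words. */
#define SKETCH_CLEAR_SET(SET) \
  do { int i_; for (i_ = 0; i_ < SKETCH_SET_WORDS; i_++) (SET)[i_] = 0; } \
  while (0)

/* Setting or testing one bit needs only the word that holds it. */
#define SKETCH_SET_BIT(SET, REG) \
  ((SET)[(REG) / SKETCH_INT_BITS] |= 1u << ((REG) % SKETCH_INT_BITS))

#define SKETCH_TEST_BIT(SET, REG) \
  (((SET)[(REG) / SKETCH_INT_BITS] >> ((REG) % SKETCH_INT_BITS)) & 1u)

/* Count the registers present in SET. */
int sketch_count_regs(const sketch_reg_set set)
{
    int reg, n = 0;

    for (reg = 0; reg < SKETCH_NUM_HARD_REGS; reg++)
        if (SKETCH_TEST_BIT(set, reg))
            n++;
    return n;
}
```

When the register count fits in one word, the same macros could operate on a plain integer and the loop in the clearing macro would disappear, which is the distinction the description above draws.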
378:
379:
380:
381: File: gcc.info, Node: RTL, Next: Machine Desc, Prev: Passes, Up: Top
382:
383: RTL Representation
384: ******************
385:
386: Most of the work of the compiler is done on an intermediate
387: representation called register transfer language. In this language,
388: the instructions to be output are described, pretty much one by one,
389: in an algebraic form that describes what the instruction does.
390:
391: RTL is inspired by Lisp lists. It has both an internal form, made up
392: of structures that point at other structures, and a textual form that
393: is used in the machine description and in printed debugging dumps.
394: The textual form uses nested parentheses to indicate the pointers in
395: the internal form.
396:
397: * Menu:
398:
399: * RTL Objects:: Expressions vs vectors vs strings vs integers.
400: * Accessors:: Macros to access expression operands or vector elts.
401: * Flags:: Other flags in an RTL expression.
402: * Machine Modes:: Describing the size and format of a datum.
403: * Constants:: Expressions with constant values.
404: * Regs and Memory:: Expressions representing register contents or memory.
405: * Arithmetic:: Expressions representing arithmetic on other expressions.
406: * Comparisons:: Expressions representing comparison of expressions.
407: * Bit Fields:: Expressions representing bit-fields in memory or reg.
408: * Conversions:: Extending, truncating, floating or fixing.
409: * RTL Declarations:: Declaring volatility, constancy, etc.
410: * Side Effects:: Expressions for storing in registers, etc.
411: * Incdec:: Embedded side-effects for autoincrement addressing.
412: * Assembler:: Representing `asm' with operands.
413: * Insns:: Expression types for entire insns.
414: * Calls:: RTL representation of function call insns.
415: * Sharing:: Some expressions are unique; others *must* be copied.
416:
417:
418:
419: File: gcc.info, Node: RTL Objects, Next: Accessors, Prev: RTL, Up: RTL
420:
421: RTL Object Types
422: ================
423:
424: RTL uses four kinds of objects: expressions, integers, strings and
425: vectors. Expressions are the most important ones. An RTL expression
426: (``RTX'', for short) is a C structure, but it is usually referred to
427: with a pointer, a type that is given the typedef name `rtx'.
428:
429: An integer is simply an `int', and its written form uses decimal
430: digits. A string is a `char *'. Within RTL code, strings appear only
431: inside `symbol_ref' expressions, but they appear in other contexts in
432: the RTL expressions that make up machine descriptions.
433:
434: A string is a sequence of characters. In core it is represented as a
435: `char *' in usual C fashion, and it is written in C syntax as well.
436: However, strings in RTL may never be empty. If you write an empty
437: string in a machine description, it is represented in core as a null
438: pointer rather than as a pointer to a null character. In certain
439: contexts, these null pointers are valid in place of strings.
440:
441: A vector contains an arbitrary, specified number of pointers to
442: expressions. The number of elements in the vector is explicitly
443: present in the vector. The written form of a vector consists of
444: square brackets (`[...]') surrounding the elements, in sequence and
445: with whitespace separating them. Vectors of length zero are not
446: created; null pointers are used instead.
447:
448: Expressions are classified by "expression codes" (also called RTX
449: codes). The expression code is a name defined in `rtl.def', which is
450: also (in upper case) a C enumeration constant. The possible
451: expression codes and their meanings are machine-independent. The
452: code of an RTX can be extracted with the macro `GET_CODE (X)' and
453: altered with `PUT_CODE (X, NEWCODE)'.
454:
455: The expression code determines how many operands the expression
456: contains, and what kinds of objects they are. In RTL, unlike Lisp,
457: you cannot tell by looking at an operand what kind of object it is.
458: Instead, you must know from its context--from the expression code of
459: the containing expression. For example, in an expression of code
460: `subreg', the first operand is to be regarded as an expression and
461: the second operand as an integer. In an expression of code `plus',
462: there are two operands, both of which are to be regarded as
463: expressions. In a `symbol_ref' expression, there is one operand,
464: which is to be regarded as a string.
465:
466: Expressions are written as parentheses containing the name of the
467: expression type, its flags and machine mode if any, and then the
468: operands of the expression (separated by spaces).
469:
470: Expression code names in the `md' file are written in lower case, but
471: when they appear in C code they are written in upper case. In this
472: manual, they are shown as follows: `const_int'.
473:
474: In a few contexts a null pointer is valid where an expression is
475: normally wanted. The written form of this is `(nil)'.
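
The rules above for the written form can be tried out on a toy model. This is a sketch under stated assumptions, not the compiler's printer: the `toy_' names are hypothetical, only four expression codes are modelled, and flags and machine modes are left out of the output.

```c
#include <stdio.h>
#include <string.h>

/* Toy model only: the real `rtx' layout is defined in `rtl.h'. */
enum toy_code { TOY_CONST_INT, TOY_REG, TOY_SYMBOL_REF, TOY_PLUS };

/* Written name and operand kinds of each code: `e' expression,
   `i' integer, `s' string. */
static const char *const toy_name[] =
  { "const_int", "reg", "symbol_ref", "plus" };
static const char *const toy_format[] = { "i", "i", "s", "ee" };

union toy_operand { struct toy_rtx *e; int i; const char *s; };

struct toy_rtx {
    enum toy_code code;
    union toy_operand op[2];
};

/* Append the written form of X to the string in BUF (capacity SIZE).
   The kind of each operand is taken from the code of the containing
   expression, never from the operand itself. */
void toy_write(const struct toy_rtx *x, char *buf, size_t size)
{
    size_t len = strlen(buf);
    const char *fmt;
    int n;

    if (x == NULL) {                 /* a null pointer is written (nil) */
        snprintf(buf + len, size - len, "(nil)");
        return;
    }
    snprintf(buf + len, size - len, "(%s", toy_name[x->code]);
    fmt = toy_format[x->code];
    for (n = 0; fmt[n] != '\0'; n++) {
        len = strlen(buf);
        switch (fmt[n]) {
        case 'e':
            snprintf(buf + len, size - len, " ");
            toy_write(x->op[n].e, buf, size);
            break;
        case 'i':
            snprintf(buf + len, size - len, " %d", x->op[n].i);
            break;
        case 's':
            snprintf(buf + len, size - len, " \"%s\"", x->op[n].s);
            break;
        }
    }
    len = strlen(buf);
    snprintf(buf + len, size - len, ")");
}
```

The caller passes a buffer that already holds a string, normally the empty string. A `plus' whose operands are a `reg' with integer 38 and a `const_int' with integer 4 is written by this sketch as `(plus (reg 38) (const_int 4))'.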
476:
477:
478:
479: File: gcc.info, Node: Accessors, Next: Flags, Prev: RTL Objects, Up: RTL
480:
481: Access to Operands
482: ==================
483:
484: For each expression type `rtl.def' specifies the number of contained
485: objects and their kinds, with four possibilities: `e' for expression
486: (actually a pointer to an expression), `i' for integer, `s' for
487: string, and `E' for vector of expressions. The sequence of letters
488: for an expression code is called its "format". Thus, the format of
489: `subreg' is `ei'.
490:
491: Two other format characters are used occasionally: `u' and `0'. `u'
492: is equivalent to `e' except that it is printed differently in
493: debugging dumps, and `0' means a slot whose contents do not fit any
494: normal category. `0' slots are not printed at all in dumps, and are
495: often used in special ways by small parts of the compiler.
496:
497: There are macros to get the number of operands and the format of an
498: expression code:
499:
500: `GET_RTX_LENGTH (CODE)'
501: Number of operands of an RTX of code CODE.
502:
503: `GET_RTX_FORMAT (CODE)'
504: The format of an RTX of code CODE, as a C string.
505:
506: Operands of expressions are accessed using the macros `XEXP', `XINT'
507: and `XSTR'. Each of these macros takes two arguments: an
508: expression-pointer (RTX) and an operand number (counting from zero).
509: Thus,
510:
511: XEXP (X, 2)
512:
513: accesses operand 2 of expression X, as an expression.
514:
515: XINT (X, 2)
516:
517: accesses the same operand as an integer. `XSTR', used in the same
518: fashion, would access it as a string.
519:
520: Any operand can be accessed as an integer, as an expression or as a
521: string. You must choose the correct method of access for the kind of
522: value actually stored in the operand. You would do this based on the
523: expression code of the containing expression. That is also how you
524: would know how many operands there are.
525:
526: For example, if X is a `subreg' expression, you know that it has two
527: operands which can be correctly accessed as `XEXP (X, 0)' and `XINT
528: (X, 1)'. If you did `XINT (X, 0)', you would get the address of the
529: expression operand but cast as an integer; that might occasionally be
530: useful, but it would be cleaner to write `(int) XEXP (X, 0)'. `XEXP
531: (X, 1)' would also compile without error, and would return the
532: second, integer operand cast as an expression pointer, which would
533: probably result in a crash when accessed. Nothing stops you from
534: writing `XEXP (X, 28)' either, but this will access memory past the
535: end of the expression with unpredictable results.
536:
537: Access to operands which are vectors is more complicated. You can
538: use the macro `XVEC' to get the vector-pointer itself, or the macros
539: `XVECEXP' and `XVECLEN' to access the elements and length of a vector.
540:
541: `XVEC (EXP, IDX)'
542: Access the vector-pointer which is operand number IDX in EXP.
543:
544: `XVECLEN (EXP, IDX)'
545: Access the length (number of elements) in the vector which is in
546: operand number IDX in EXP. This value is an `int'.
547:
548: `XVECEXP (EXP, IDX, ELTNUM)'
549: Access element number ELTNUM in the vector which is in operand
550: number IDX in EXP. This value is an RTX.
551:
552: It is up to you to make sure that ELTNUM is not negative and is
553: less than `XVECLEN (EXP, IDX)'.
554:
555: All the macros defined in this section expand into lvalues and
556: therefore can be used to assign the operands, lengths and vector
557: elements as well as to access them.
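
Format-driven access can be sketched with toy accessors modelled on `XEXP', `XINT' and `XSTR'. The `toy_' and `TOY_' names and the structure layout are assumptions made for illustration; the real definitions are in `rtl.h'.

```c
#include <stddef.h>

/* Toy model only: one slot per operand, with no record of its kind. */
union toy_operand { struct toy_rtx *rtx; int rtint; char *rtstr; };

struct toy_rtx {
    const char *format;      /* stands in for GET_RTX_FORMAT (code) */
    union toy_operand fld[2];
};

/* Accessors in the style of XEXP, XINT and XSTR.  Each expands into an
   lvalue, so it can be used to read or to assign an operand. */
#define TOY_XEXP(X, N) ((X)->fld[N].rtx)
#define TOY_XINT(X, N) ((X)->fld[N].rtint)
#define TOY_XSTR(X, N) ((X)->fld[N].rtstr)

/* Add up every integer operand in the tree rooted at X, choosing the
   accessor for each operand from the format string. */
int toy_sum_ints(struct toy_rtx *x)
{
    int n, sum = 0;

    if (x == NULL)
        return 0;
    for (n = 0; x->format[n] != '\0'; n++) {
        if (x->format[n] == 'e')
            sum += toy_sum_ints(TOY_XEXP(x, n));
        else if (x->format[n] == 'i')
            sum += TOY_XINT(x, n);
        /* `s' operands carry no integer and are skipped. */
    }
    return sum;
}
```

For an expression with format `ei', such as a `subreg', the walk uses `TOY_XEXP' on operand 0 and `TOY_XINT' on operand 1. Nothing in the structure would prevent the opposite choice, which is why the format has to be consulted.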
558:
559:
560:
561: File: gcc.info, Node: Flags, Next: Machine Modes, Prev: Accessors, Up: RTL
562:
563: Flags in an RTL Expression
564: ==========================
565:
566: RTL expressions contain several flags (one-bit bit-fields) that are
567: used in certain types of expression. Most often they are accessed
568: with the following macros:
569:
570: `MEM_VOLATILE_P (X)'
571: In `mem' expressions, nonzero for volatile memory references.
572: Stored in the `volatil' field and printed as `/v'.
573:
574: `MEM_IN_STRUCT_P (X)'
575: In `mem' expressions, nonzero for reference to an entire
576: structure, union or array, or to a component of one. Zero for
577: references to a scalar variable or through a pointer to a scalar.
578: Stored in the `in_struct' field and printed as `/s'.
579:
580: `REG_USER_VAR_P (X)'
581: In a `reg', nonzero if it corresponds to a variable present in
582: the user's source code. Zero for temporaries generated
583: internally by the compiler. Stored in the `volatil' field and
584: printed as `/v'.
585:
586: `REG_FUNCTION_VALUE_P (X)'
587: Nonzero in a `reg' if it is the place in which this function's
588: value is going to be returned. (This happens only in a hard
589: register.) Stored in the `integrated' field and printed as `/i'.
590:
591: The same hard register may be used also for collecting the
592: values of functions called by this one, but
593: `REG_FUNCTION_VALUE_P' is zero in this kind of use.
594:
595: `RTX_UNCHANGING_P (X)'
596: Nonzero in a `reg' or `mem' if the value is not changed
597: explicitly by the current function. (If it is a memory
598: reference then it may be changed by other functions or by
599: aliasing.) Stored in the `unchanging' field and printed as `/u'.
600:
601: `RTX_INTEGRATED_P (INSN)'
602: Nonzero in an insn if it resulted from an in-line function call.
603: Stored in the `integrated' field and printed as `/i'. This may
604: be deleted; nothing currently depends on it.
605:
606: `INSN_DELETED_P (INSN)'
607: In an insn, nonzero if the insn has been deleted. Stored in the
608: `volatil' field and printed as `/v'.
609:
610: `CONSTANT_POOL_ADDRESS_P (X)'
611: Nonzero in a `symbol_ref' if it refers to part of the current
612: function's ``constants pool''. These are addresses close to the
613: beginning of the function, and GNU CC assumes they can be
614: addressed directly (perhaps with the help of base registers).
615: Stored in the `unchanging' field and printed as `/u'.
616:
617: These are the fields which the above macros refer to:
618:
619: `used'
620: This flag is used only momentarily, at the end of RTL generation
621: for a function, to count the number of times an expression
622: appears in insns. Expressions that appear more than once are
623: copied, according to the rules for shared structure (*note
624: Sharing::.).
625:
626: `volatil'
627: This flag is used in `mem' and `reg' expressions and in insns.
628: In RTL dump files, it is printed as `/v'.
629:
630: In a `mem' expression, it is 1 if the memory reference is
631: volatile. Volatile memory references may not be deleted,
632: reordered or combined.
633:
634: In a `reg' expression, it is 1 if the value is a user-level
635: variable. 0 indicates an internal compiler temporary.
636:
637: In an insn, 1 means the insn has been deleted.
638:
639: `in_struct'
640: This flag is used in `mem' expressions. It is 1 if the memory
641: datum referred to is all or part of a structure or array; 0 if
642: it is (or might be) a scalar variable. A reference through a C
643: pointer has 0 because the pointer might point to a scalar
644: variable.
645:
646: This information allows the compiler to determine something
647: about possible cases of aliasing.
648:
649: In an RTL dump, this flag is represented as `/s'.
650:
651: `unchanging'
652: This flag is used in `reg' and `mem' expressions. 1 means that
653: the value of the expression never changes (at least within the
654: current function).
655:
656: In an RTL dump, this flag is represented as `/u'.
657:
658: `integrated'
659: In some kinds of expressions, including insns, this flag means
660: the RTL was produced by procedure integration.
661:
662: In a `reg' expression, this flag indicates the register
663: containing the value to be returned by the current function. On
664: machines that pass parameters in registers, the same register
665: number may be used for parameters as well, but this flag is not
666: set on such uses.
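
The sharing of one bit among several macros can be sketched as follows. The field names follow the text above, but the `toy_' and `TOY_' names, the structure layout and the order in which the flag letters are written are assumptions made for this sketch.

```c
/* Toy model only: the layout of the real `rtx' structure is not shown
   in this section. */
enum toy_code { TOY_REG, TOY_MEM, TOY_INSN };

struct toy_rtx {
    enum toy_code code;
    unsigned int used : 1;
    unsigned int volatil : 1;
    unsigned int in_struct : 1;
    unsigned int unchanging : 1;
    unsigned int integrated : 1;
};

/* Three macros, one bit: which meaning applies depends on the code of
   the expression the macro is applied to. */
#define TOY_MEM_VOLATILE_P(X) ((X)->volatil)
#define TOY_REG_USER_VAR_P(X) ((X)->volatil)
#define TOY_INSN_DELETED_P(X) ((X)->volatil)

/* Write the dump letters for the flags of X into BUF, which must hold
   at least 9 bytes.  The `used' flag has no printed form here. */
void toy_flag_string(const struct toy_rtx *x, char *buf)
{
    char *p = buf;

    if (x->volatil)    { *p++ = '/'; *p++ = 'v'; }
    if (x->in_struct)  { *p++ = '/'; *p++ = 's'; }
    if (x->unchanging) { *p++ = '/'; *p++ = 'u'; }
    if (x->integrated) { *p++ = '/'; *p++ = 'i'; }
    *p = '\0';
}
```

Because the macros name the same field, a `/v' in a dump means ``volatile'' on a `mem', ``user variable'' on a `reg' and ``deleted'' on an insn; the letter alone does not say which.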
667:
668:
669:
670: File: gcc.info, Node: Machine Modes, Next: Constants, Prev: Flags, Up: RTL
671:
672: Machine Modes
673: =============
674:
675: A machine mode describes a size of data object and the representation
676: used for it. In the C code, machine modes are represented by an
677: enumeration type, `enum machine_mode', defined in `machmode.def'.
678: Each RTL expression has room for a machine mode and so do certain
679: kinds of tree expressions (declarations and types, to be precise).
680:
681: In debugging dumps and machine descriptions, the machine mode of an
682: RTL expression is written after the expression code with a colon to
683: separate them. The letters `mode' which appear at the end of each
684: machine mode name are omitted. For example, `(reg:SI 38)' is a `reg'
685: expression with machine mode `SImode'. If the mode is `VOIDmode', it
686: is not written at all.
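
The notation rule above can be sketched in C.  This is only an
illustration, not the compiler's own dump code: the function
`format_rtx_head' and its string-based interface are hypothetical,
invented here.  Only the rule itself (drop the trailing letters
`mode', and write nothing at all for `VOIDmode') comes from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GNU CC: write the head of an RTL
   expression as it would appear in a dump, e.g. "reg:SI" for code
   "reg" and mode name "SImode".  Returns what snprintf returns.  */
int
format_rtx_head (char *buf, size_t size, const char *code,
                 const char *mode_name)
{
  size_t len = strlen (mode_name);

  /* Every machine mode name ends in the letters "mode".  */
  assert (len >= 4 && strcmp (mode_name + len - 4, "mode") == 0);

  /* VOIDmode is expressed by the absence of any mode.  */
  if (strcmp (mode_name, "VOIDmode") == 0)
    return snprintf (buf, size, "%s", code);

  /* Otherwise: the code, a colon, then the name without "mode".  */
  return snprintf (buf, size, "%s:%.*s", code, (int) (len - 4), mode_name);
}
```

Under these assumptions, code `reg' with mode name `SImode' yields
`reg:SI', and code `const_int' with `VOIDmode' yields `const_int'.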
687:
688: Here is a table of machine modes.
689:
690: `QImode'
691: ``Quarter-Integer'' mode represents a single byte treated as an
692: integer.
693:
694: `HImode'
695: ``Half-Integer'' mode represents a two-byte integer.
696:
697: `SImode'
698: ``Single Integer'' mode represents a four-byte integer.
699:
700: `DImode'
701: ``Double Integer'' mode represents an eight-byte integer.
702:
703: `TImode'
704: ``Tetra Integer'' (?) mode represents a sixteen-byte integer.
705:
706: `SFmode'
707: ``Single Floating'' mode represents a single-precision (four
708: byte) floating point number.
709:
710: `DFmode'
711: ``Double Floating'' mode represents a double-precision (eight
712: byte) floating point number.
713:
714: `TFmode'
715: ``Tetra Floating'' mode represents a quadruple-precision
716: (sixteen byte) floating point number.
717:
718: `BLKmode'
719: ``Block'' mode represents values that are aggregates to which
720: none of the other modes apply. In RTL, only memory references
721: can have this mode, and only if they appear in string-move or
722: vector instructions. On machines which have no such
723: instructions, `BLKmode' will not appear in RTL.
724:
725: `VOIDmode'
726: Void mode means the absence of a mode or an unspecified mode.
727: For example, RTL expressions of code `const_int' have mode
728: `VOIDmode' because they can be taken to have whatever mode the
729: context requires. In debugging dumps of RTL, `VOIDmode' is
730: expressed by the absence of any mode.
731:
732: `EPmode'
733: ``Entry Pointer'' mode is intended to be used for function
734: variables in Pascal and other block structured languages. Such
735: values contain both a function address and a static chain
736: pointer for access to automatic variables of outer levels. This
737: mode is only partially implemented since C does not use it.
738:
739: `CSImode, ...'
740: ``Complex Single Integer'' mode stands for a complex number
741: represented as a pair of `SImode' integers. Any of the integer
742: and floating modes may have `C' prefixed to its name to obtain a
743: complex number mode. For example, there are `CQImode',
744: `CSFmode', and `CDFmode'. Since C does not support complex
745: numbers, these machine modes are only partially implemented.
746:
747: `BImode'
748: This is the machine mode of a bit-field in a structure. It is
749: used only in the syntax tree, never in RTL, and in the syntax
750: tree it appears only in declaration nodes. In C, it appears
751: only in `FIELD_DECL' nodes for structure fields defined with a
752: bit size.
753:
754: The machine description defines `Pmode' as a C macro which expands
755: into the machine mode used for addresses. Normally this is `SImode'.
756:
757: The only modes which a machine description must support are `QImode',
758: `SImode', `SFmode' and `DFmode'. The compiler will attempt to use
759: `DImode' for two-word structures and unions, but it would not be hard
760: to program it to avoid this. Likewise, you can arrange for the C
761: type `short int' to avoid using `HImode'. In the long term it would
762: be desirable to make the set of available machine modes
763: machine-dependent and eliminate all assumptions about specific
764: machine modes or their uses from the machine-independent code of the
765: compiler.
766:
767: Here are some C macros that relate to machine modes:
768:
769: `GET_MODE (X)'
770: Returns the machine mode of the RTX X.
771:
772: `PUT_MODE (X, NEWMODE)'
773: Alters the machine mode of the RTX X to be NEWMODE.
774:
775: `GET_MODE_SIZE (M)'
776: Returns the size in bytes of a datum of mode M.
777:
778: `GET_MODE_BITSIZE (M)'
779: Returns the size in bits of a datum of mode M.
780:
781: `GET_MODE_UNIT_SIZE (M)'
782: Returns the size in bytes of the subunits of a datum of mode M.
783: This is the same as `GET_MODE_SIZE' except in the case of
784: complex modes and `EPmode'. For them, the unit size is the size
785: of the real or imaginary part, or the size of the function
786: pointer or the context pointer.
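
The relationship among the size macros can be sketched with a toy
table.  Everything named `toy_' or `TOY_' below is a hypothetical
stand-in invented for illustration; the real macros get their data
from `machmode.def'.  The sketch assumes 8-bit bytes and keeps the
unit size in bytes so that it is directly comparable with the total
size.

```c
#include <assert.h>

/* Toy stand-ins for illustration only; not the compiler's tables.  */
enum toy_mode { T_QImode, T_HImode, T_SImode, T_DImode,
                T_SFmode, T_DFmode, T_CSImode, T_CDFmode, T_NUM_MODES };

/* Total size in bytes of each toy mode.  */
static const int toy_mode_size[T_NUM_MODES] = { 1, 2, 4, 8, 4, 8, 8, 16 };

/* Size in bytes of one subunit.  It differs from the total only for
   the complex modes, where it is the size of the real or imaginary
   part.  */
static const int toy_mode_unit_size[T_NUM_MODES] = { 1, 2, 4, 8, 4, 8, 4, 8 };

#define TOY_BITS_PER_BYTE 8   /* assumption: 8-bit bytes */

#define TOY_GET_MODE_SIZE(M)      (toy_mode_size[(int) (M)])
#define TOY_GET_MODE_BITSIZE(M)   (TOY_BITS_PER_BYTE * toy_mode_size[(int) (M)])
#define TOY_GET_MODE_UNIT_SIZE(M) (toy_mode_unit_size[(int) (M)])

/* Nonzero if M consists of more than one subunit in this toy table.  */
int
toy_mode_is_complex (enum toy_mode m)
{
  assert ((int) m >= 0 && (int) m < T_NUM_MODES);
  return TOY_GET_MODE_UNIT_SIZE (m) != TOY_GET_MODE_SIZE (m);
}
```

In this table the toy `CDFmode' has total size 16 and unit size 8,
while for `SImode' both are 4.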
787:
788:
789:
790: File: gcc.info, Node: Constants, Next: Regs and Memory, Prev: Machine Modes, Up: RTL
791:
792: Constant Expression Types
793: =========================
794:
795: The simplest RTL expressions are those that represent constant values.
796:
797: `(const_int I)'
798: This type of expression represents the integer value I. I is
799: customarily accessed with the macro `INTVAL' as in `INTVAL
800: (EXP)', which is equivalent to `XINT (EXP, 0)'.
801:
802: There is only one expression object for the integer value zero;
803: it is the value of the variable `const0_rtx'. Likewise, the
804: only expression for integer value one is found in `const1_rtx'.
805: Any attempt to create an expression of code `const_int' and
806: value zero or one will return `const0_rtx' or `const1_rtx' as
807: appropriate.
808:
809: `(const_double:M I0 I1)'
810: Represents a 64-bit constant of mode M. All floating point
811: constants are represented in this way, and so are 64-bit
812: `DImode' integer constants.
813:
814: The two integers I0 and I1 together contain the bits of the
815: value. If the constant is floating point (either single or
816: double precision), then they represent a `double'. To convert
817: them to a `double', do
818:
819: union { double d; int i[2];} u;
820: u.i[0] = XINT (x, 0);
821: u.i[1] = XINT (x, 1);
822:
823: and then refer to `u.d'.
824:
825: The global variables `dconst0_rtx' and `fconst0_rtx' hold
826: `const_double' expressions with value 0, in modes `DFmode' and
827: `SFmode', respectively.
828:
829: `(symbol_ref SYMBOL)'
830: Represents the value of an assembler label for data. SYMBOL is
831: a string that describes the name of the assembler label. If it
832: starts with a `*', the label is the rest of SYMBOL not including
833: the `*'. Otherwise, the label is SYMBOL, prefixed with `_'.
834:
835: `(label_ref LABEL)'
836: Represents the value of an assembler label for code. It
837: contains one operand, an expression, which must be a
838: `code_label' that appears in the instruction sequence to
839: identify the place where the label should go.
840:
841: The reason for using a distinct expression type for code label
842: references is so that jump optimization can distinguish them.
843:
844: `(const EXP)'
845: Represents a constant that is the result of an assembly-time
846: arithmetic computation. The operand, EXP, is an expression that
847: contains only constants (`const_int', `symbol_ref' and
848: `label_ref' expressions) combined with `plus' and `minus'.
849: However, not all combinations are valid, since the assembler
850: cannot do arbitrary arithmetic on relocatable symbols.
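
The label-naming rule given above for `symbol_ref' can be sketched as
follows.  The function `symbol_ref_label' is hypothetical and is not a
GNU CC function; it merely restates the rule from the text in C.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not a GNU CC function: write into BUF the
   assembler label denoted by the string SYMBOL of a `symbol_ref'.
   Returns what snprintf returns (the length of the label).  */
int
symbol_ref_label (char *buf, size_t size, const char *symbol)
{
  assert (symbol != NULL && symbol[0] != '\0');

  /* A leading `*' means the rest of the string is the label as is.  */
  if (symbol[0] == '*')
    return snprintf (buf, size, "%s", symbol + 1);

  /* Otherwise the label is the string prefixed with `_'.  */
  return snprintf (buf, size, "_%s", symbol);
}
```

Under this rule the string `foo' names the label `_foo', and the
string `*L12' names the label `L12'.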
851:
852:
853:
854: File: gcc.info, Node: Regs and Memory, Next: Arithmetic, Prev: Constants, Up: RTL
855:
856: Registers and Memory
857: ====================
858:
859: Here are the RTL expression types for describing access to machine
860: registers and to main memory.
861:
862: `(reg:M N)'
863: For small values of the integer N (less than
864: `FIRST_PSEUDO_REGISTER'), this stands for a reference to machine
865: register number N: a "hard register". For larger values of N,
866: it stands for a temporary value or "pseudo register". The
867: compiler's strategy is to generate code assuming an unlimited
868: number of such pseudo registers, and later convert them into
869: hard registers or into memory references.
870:
871: The symbol `FIRST_PSEUDO_REGISTER' is defined by the machine
872: description, since the number of hard registers on the machine
873: is an invariant characteristic of the machine. Note, however,
874: that not all of the machine registers must be general registers.
875: All the machine registers that can be used for storage of data
876: are given hard register numbers, even those that can be used
877: only in certain instructions or can hold only certain types of
878: data.
879:
880: Each pseudo register number used in a function's RTL code is
881: represented by a unique `reg' expression.
882:
883: M is the machine mode of the reference. It is necessary because
884: machines can generally refer to each register in more than one
885: mode. For example, a register may contain a full word but there
886: may be instructions to refer to it as a half word or as a single
887: byte, as well as instructions to refer to it as a floating point
888: number of various precisions.
889:
890: Even for a register that the machine can access in only one
891: mode, the mode must always be specified.
892:
893: A hard register may be accessed in various modes throughout one
894: function, but each pseudo register is given a natural mode and
895: is accessed only in that mode. When it is necessary to describe
896: an access to a pseudo register using a nonnatural mode, a
897: `subreg' expression is used.
898:
899: A `reg' expression with a machine mode that specifies more than
900: one word of data may actually stand for several consecutive
901: registers. If in addition the register number specifies a
902: hardware register, then it actually represents several
903: consecutive hardware registers starting with the specified one.
904:
905: Such multi-word hardware register `reg' expressions may not be
906: live across the boundary of a basic block. The lifetime
907: analysis pass does not know how to record properly that several
908: consecutive registers are actually live there, and therefore
909: register allocation would be confused. The CSE pass must go out
910: of its way to make sure the situation does not arise.
911:
912: `(subreg:M REG WORDNUM)'
913: `subreg' expressions are used to refer to a register in a
914: machine mode other than its natural one, or to refer to one
915: register of a multi-word `reg' that actually refers to several
916: registers.
917:
918: Each pseudo-register has a natural mode. If it is necessary to
919: operate on it in a different mode--for example, to perform a
920: fullword move instruction on a pseudo-register that contains a
921: single byte--the pseudo-register must be enclosed in a
922: `subreg'. In such a case, WORDNUM is zero.
923:
924: The other use of `subreg' is to extract the individual registers
925: of a multi-register value. Machine modes such as `DImode' and
926: `EPmode' indicate values longer than a word, values which
927: usually require two consecutive registers. To access one of the
928: registers, use a `subreg' with mode `SImode' and a WORDNUM that
929: says which register.
930:
931: The compilation parameter `WORDS_BIG_ENDIAN', if defined, says
932: that word number zero is the most significant part; otherwise,
933: it is the least significant part.
934:
935: Between the combiner pass and the reload pass, it is possible to
936: have a `subreg' which contains a `mem' instead of a `reg' as its
937: first operand. The reload pass eliminates these cases by
938: reloading the `mem' into a suitable register.
939:
940: Note that it is not valid to access a `DFmode' value in `SFmode'
941: using a `subreg'. On some machines the most significant part of
942: a `DFmode' value does not have the same format as a
943: single-precision floating value.
944:
945: `(cc0)'
946: This refers to the machine's condition code register. It has no
947: operands and may not have a machine mode. It may be validly
948: used in only two contexts: as the destination of an assignment
949: (in test and compare instructions) and in comparison operators
950: comparing against zero (`const_int' with value zero; that is to
951: say, `const0_rtx').
952:
953: There is only one expression object of code `cc0'; it is the
954: value of the variable `cc0_rtx'. Any attempt to create an
955: expression of code `cc0' will return `cc0_rtx'.
956:
957: One special thing about the condition code register is that
958: instructions can set it implicitly. On many machines, nearly
959: all instructions set the condition code based on the value that
960: they compute or store. It is not necessary to record these
961: actions explicitly in the RTL because the machine description
962: includes a prescription for recognizing the instructions that do
963: so (by means of the macro `NOTICE_UPDATE_CC'). Only
964: instructions whose sole purpose is to set the condition code,
965: and instructions that use the condition code, need mention
966: `(cc0)'.
967:
968: `(pc)'
969: This represents the machine's program counter. It has no
970: operands and may not have a machine mode. `(pc)' may be validly
971: used only in certain specific contexts in jump instructions.
972:
973: There is only one expression object of code `pc'; it is the
974: value of the variable `pc_rtx'. Any attempt to create an
975: expression of code `pc' will return `pc_rtx'.
976:
977: All instructions that do not jump alter the program counter
978: implicitly by incrementing it, but there is no need to mention
979: this in the RTL.
980:
981: `(mem:M ADDR)'
982: This RTX represents a reference to main memory at an address
983: represented by the expression ADDR. M specifies how large a
984: unit of memory is accessed.
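
The effect of `WORDS_BIG_ENDIAN' on the WORDNUM of a `subreg', as
described above, can be modelled with a small C function.  This is a
sketch under stated assumptions, not GNU CC code: the function
`subreg_word' is hypothetical, words are assumed to be 32 bits, and
the two-word value is assumed to be a 64-bit `DImode' quantity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GNU CC code: return the 32-bit word that
   `(subreg:SI (reg:DI ...) WORDNUM)' would denote, given the 64-bit
   VALUE held by the register pair.  WORDS_BIG_ENDIAN_P is nonzero to
   model a machine description that defines `WORDS_BIG_ENDIAN'.  */
uint32_t
subreg_word (uint64_t value, int wordnum, int words_big_endian_p)
{
  int want_high;

  assert (wordnum == 0 || wordnum == 1);

  /* Word zero is the most significant part when WORDS_BIG_ENDIAN is
     defined; otherwise it is the least significant part.  */
  want_high = words_big_endian_p ? (wordnum == 0) : (wordnum == 1);

  return want_high ? (uint32_t) (value >> 32) : (uint32_t) value;
}
```

The same WORDNUM therefore selects opposite halves of the value on
the two kinds of machine.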
985:
986:
987:
988: File: gcc.info, Node: Arithmetic, Next: Comparisons, Prev: Regs and Memory, Up: RTL
989:
990: RTL Expressions for Arithmetic
991: ==============================
992:
993: `(plus:M X Y)'
994: Represents the sum of the values represented by X and Y carried
995: out in machine mode M. This is valid only if X and Y both are
996: valid for mode M.
997:
998: `(minus:M X Y)'
999: Like `plus' but represents subtraction.
1000:
1001: `(minus X Y)'
1002: Represents the result of subtracting Y from X for purposes of
1003: comparison. The absence of a machine mode in the `minus'
1004: expression indicates that the result is computed without
1005: overflow, as if with infinite precision.
1006:
1007: Of course, machines can't really subtract with infinite precision.
1008: However, they can pretend to do so when only the sign of the
1009: result will be used, which is the case when the result is stored
1010: in `(cc0)'. And that is the only way this kind of expression
1011: may validly be used: as a value to be stored in the condition
1012: codes.
1013:
1014: `(neg:M X)'
1015: Represents the negation (subtraction from zero) of the value
1016: represented by X, carried out in mode M. X must be valid for
1017: mode M.
1018:
1019: `(mult:M X Y)'
1020: Represents the signed product of the values represented by X and
1021: Y carried out in machine mode M. If X and Y are both valid for
1022: mode M, this is ordinary size-preserving multiplication.
1023: Alternatively, both X and Y may be valid for a different,
1024: narrower mode. This represents the kind of multiplication that
1025: generates a product wider than the operands. Widening
1026: multiplication and same-size multiplication are completely
1027: distinct and supported by different machine instructions;
1028: machines may support one but not the other.
1029:
1030: `mult' may be used for floating point multiplication as well.
1031: Then M is a floating point machine mode.
1032:
1033: `(umult:M X Y)'
1034: Like `mult' but represents unsigned multiplication. It may be
1035: used in both same-size and widening forms, like `mult'. `umult'
1036: is used only for fixed-point multiplication.
1037:
1038: `(div:M X Y)'
1039: Represents the quotient in signed division of X by Y, carried
1040: out in machine mode M. If M is a floating-point mode, it
1041: represents the exact quotient; otherwise, the integerized
1042: quotient. If X and Y are both valid for mode M, this is
1043: ordinary size-preserving division. Some machines have division
1044: instructions in which the operands and quotient widths are not
1045: all the same; such instructions are represented by `div'
1046: expressions in which the machine modes are not all the same.
1047:
1048: `(udiv:M X Y)'
1049: Like `div' but represents unsigned division.
1050:
1051: `(mod:M X Y)'
1052: `(umod:M X Y)'
1053: Like `div' and `udiv' but represent the remainder instead of the
1054: quotient.
1055:
1056: `(not:M X)'
1057: Represents the bitwise complement of the value represented by X,
1058: carried out in mode M, which must be a fixed-point machine mode.
1059: X must be valid for mode M.
1060:
1061: `(and:M X Y)'
1062: Represents the bitwise logical-and of the values represented by
1063: X and Y, carried out in machine mode M. This is valid only if X
1064: and Y both are valid for mode M, which must be a fixed-point mode.
1065:
1066: `(ior:M X Y)'
1067: Represents the bitwise inclusive-or of the values represented by
1068: X and Y, carried out in machine mode M. This is valid only if X
1069: and Y both are valid for mode M, which must be a fixed-point mode.
1070:
1071: `(xor:M X Y)'
1072: Represents the bitwise exclusive-or of the values represented by
1073: X and Y, carried out in machine mode M. This is valid only if X
1074: and Y both are valid for mode M, which must be a fixed-point mode.
1075:
1076: `(lshift:M X C)'
1077: Represents the result of logically shifting X left by C places.
1078: X must be valid for the mode M, a fixed-point machine mode. C
1079: must be valid for a fixed-point mode; which mode is determined
1080: by the mode called for in the machine description entry for the
1081: left-shift instruction. For example, on the Vax, the mode of C
1082: is `QImode' regardless of M.
1083:
1084: On some machines, negative values of C may be meaningful; this
1085: is why logical left shift and arithmetic left shift are
1086: distinguished. For example, Vaxes have no right-shift
1087: instructions, and right shifts are represented as left-shift
1088: instructions whose counts happen to be negative constants or
1089: else computed (in a previous instruction) by negation.
1090:
1091: `(ashift:M X C)'
1092: Like `lshift' but for arithmetic left shift.
1093:
1094: `(lshiftrt:M X C)'
1095: `(ashiftrt:M X C)'
1096: Like `lshift' and `ashift' but for right shift.
1097:
1098: `(rotate:M X C)'
1099: `(rotatert:M X C)'
1100: Similar but represent left and right rotate.
1101:
1102: `(abs:M X)'
1103: Represents the absolute value of X, computed in mode M. X must
1104: be valid for M.
1105:
1106: `(sqrt:M X)'
1107: Represents the square root of X, computed in mode M. X must be
1108: valid for M. Most often M will be a floating point mode.
1109:
1110: `(ffs:M X)'
1111: Represents one plus the index of the least significant 1-bit
1112: in X, represented as an integer of mode M. (The value is zero
1113: if X is zero.) The mode of X need not be M; depending on the
1114: target machine, various mode combinations may be valid.
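
The value described for `ffs' can be modelled in portable C.  This is
an illustrative sketch, not code from GNU CC: the name `model_ffs' is
hypothetical and X is assumed to be a 32-bit quantity.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the value described for `(ffs:M X)' with a
   32-bit X: one plus the index of the least significant 1-bit, or
   zero if X is zero.  Not taken from GNU CC.  */
int
model_ffs (uint32_t x)
{
  int result = 1;

  if (x == 0)
    return 0;

  /* Shift right until the lowest set bit reaches bit 0, counting
     the positions passed over.  */
  while ((x & 1) == 0)
    {
      x >>= 1;
      result++;
    }

  assert (result >= 1 && result <= 32);
  return result;
}
```

For example, the model gives 1 for an X of 1 and 4 for an X of 8,
since bit 3 is the lowest bit set in 8.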
1115:
1116: