|
|
1.1 root 1: 1. Better optimization.
2:
1.1.1.4 ! root 3: * Constants in unused inline functions
! 4:
! 5: It would be nice to delay output of string constants so that string
! 6: constants mentioned in unused inline functions are never generated.
! 7: Perhaps this would also take care of string constants in dead code.
! 8:
! 9: The difficulty is in finding a clean way for the RTL which refers
! 10: to the constant (currently, only by an assembler symbol name)
! 11: to point to the constant and cause it to be output.
! 12:
1.1 root 13: * More cse
14:
15: The techniques for doing full global cse are described in the
16: red dragon book. It is likely to be slow and use a lot of memory,
17: but it might be worth offering as an additional option.
18:
19: It is probably possible to extend cse to a few very frequent cases
20: without so much expense.
21:
22: For example, it is not very hard to handle cse through if-then
23: statements with no else clauses. Here's how to do it. On reaching a
24: label, notice that the label's use-count is 1 and that the last
25: preceding jump jumps conditionally to this label. Now you know it
26: is a simple if-then statement. Remove from the hash table
27: all the expressions that were entered since that jump insn
28: and you can continue with cse.
29:
30: It is probably not hard to handle cse from the end of a loop
31: around to the beginning, and a few loops would be greatly sped
32: up by this.
33:
1.1.1.3 root 34: * Support more general tail-recursion among different functions.
1.1 root 35:
1.1.1.3 root 36: This might be possible under certain circumstances, such as when
37: the argument lists of the functions have the same lengths.
38: Perhaps it could be done with a special declaration.
39:
40: You would need to verify in the calling function that it does not
41: use the addresses of any local variables and does not use setjmp.
42:
43: * Put short statics vars at low addresses and use short addressing mode?
44: Useful on the 68000/68020 and perhaps on the 32000 series,
45: provided one has a linker that works with the feature.
46: This is said to make a 15% speedup on the 68000.
47: This brings to mind Hayes' changes for Stanford MIPS.
48:
49: * Detect dead stores into memory?
50:
51: A store into memory is dead if it is followed by another store into
52: the same location; and, in between, there is no reference to
53: that anything that might be that location (including no reference
54: to a variable address).
55:
56: * Loop optimization.
57:
58: Strength reduction and iteration variable elimination could be
59: smarter. They should know how to decide which iteration variables are
60: not worth making explicit because they can be computed as part of an
61: address calculation. Based on this information, they should decide
62: when it is desirable to eliminate one iteration iable and create
63: another in its place.
64:
65: It should be possible to compute what the value of an iteration
66: variable will be at the end of the loop, and eliminate the variable
67: within the loop by computing that value at the loop end.
68:
69: When a loop has a simple increment that adds 1,
70: instead of jumping in after the increment,
71: decrement the loop count and jump to the increment.
72: This allows aob insns to be used.
1.1 root 73:
1.1.1.2 root 74: * Using constraints on values.
1.1 root 75:
1.1.1.2 root 76: Many operations could be simplified based on knowledge of the
77: minimum and maximum possible values of a register at any particular time.
78: These limits could come from the data types in the tree, via rtl generation,
79: or they can be deduced from operations that are performed. For example,
80: the result of an `and' operation one of whose operands is 7 must be in
81: the range 0 to 7. Compare instructions also tell something about the
82: possible values of the operand, in the code beyond the test.
83:
84: Value constraints can be used to determine the results of a further
85: comparison. They can also indicate that certain `and' operations are
86: redundant. Constraints might permit a decrement and branch
87: instruction that checks zeroness to be used when the user has
88: specified to exit if negative.
1.1 root 89:
90: * Smarter reload pass.
91:
92: The reload pass as currently written can reload values only into registers
93: that are reserved for reloading. This means that in order to use a
94: register for reloading it must spill everything out of that register.
95:
96: It would be straightforward, though complicated, for reload1.c to keep
97: track, during its scan, of which hard registers were available at each
98: point in the function, and use for reloading even registers that were
99: free only at the point they were needed. This would avoid much spilling
100: and make better code.
101:
102: * Change the type of a variable.
103:
104: Sometimes a variable is declared as `int', it is assigned only once
105: from a value of type `char', and then it is used only by comparison
106: against constants. On many machines, better code would result if
107: the variable had type `char'. If the compiler could detect this
108: case, it could change the declaration of the variable and change
109: all the places that use it.
110:
111: * Order of subexpressions.
112:
113: It might be possible to make better code by paying attention
114: to the order in which to generate code for subexpressions of an expression.
115:
1.1.1.3 root 116: * More code motion.
1.1 root 117:
1.1.1.3 root 118: Consider hoisting common code up past conditional branches or
119: tablejumps.
120:
121: * Trace scheduling.
122:
123: This technique is said to be able to figure out which way a jump
124: will usually go, and rearrange the code to make that path the
125: faster one.
1.1 root 126:
127: * Distributive law.
128:
1.1.1.2 root 129: The C expression *(X + 4 * (Y + C)) compiles better on certain
130: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
131: modes. It may be tricky to determine when, and for which machines, to
132: use each alternative.
133:
1.1.1.3 root 134: Some work has been done on this, in combine.c.
135:
1.1.1.2 root 136: * Jump-execute-next.
137:
138: Many recent machines have jumps which optionally execute the following
139: instruction before the instruction jumped to, either conditionally or
140: unconditionally. To take advantage of this capability requires a new
141: compiler pass that would reorder instructions when possible. After
142: reload may be a good place for it.
143:
144: On some machines, the result of a load from memory is not available
145: until after the following instruction. The easiest way to support
146: these machines is to output each RTL load instruction as two assembler
147: instructions, the second being a no-op. Putting useful instructions
148: after the load instructions may be a similar task to putting them
149: after jump instructions.
150:
151: * Pipeline scheduling.
152:
153: On many machines, code gets faster if instructions are reordered
154: so that pipelines are kept full. Doing the best possible job of this
155: requires knowing which functional units each kind of instruction executes
156: in and how long the functional unit stays busy with it. Then the
157: goal is to reorder the instructions to keep many functional units
158: busy but never feed them so fast they must wait.
1.1 root 159:
1.1.1.3 root 160: * Can optimize by changing if (x) y; else z; into z; if (x) y;
161: if z and x do not interfere and z has no effects not undone by y.
162: This is desirable if z is faster than jumping.
163:
164: * For a two-insn loop on the 68020, such as
165: foo: movb a2@+,a3@+
166: jne foo
167: it is better to insert dbeq d0,foo before the jne.
168: d0 can be a junk register. The challenge is to fit this into
169: a portable framework: when can you detect this situation and
170: still be able to allocate a junk register?
171:
1.1 root 172: 2. Simpler porting.
173:
174: Right now, describing the target machine's instructions is done
175: cleanly, but describing its addressing mode is done with several
176: ad-hoc macro definitions. Porting would be much easier if there were
177: an RTL description for addressing modes like that for instructions.
178: Tools analogous to genflags and genrecog would generate macros from
179: this description.
180:
181: There would be one pattern in the address-description file for each
182: kind of addressing, and this pattern would have:
183:
184: * the RTL expression for the address
185: * C code to verify its validity (since that may depend on
186: the exact data).
187: * C code to print the address in assembler language.
188: * C code to convert the address into a valid one, if it is not valid.
189: (This would replace LEGITIMIZE_ADDRESS).
190: * Register constraints for all indeterminates that appear
191: in the RTL expression.
192:
193: 3. Other languages.
194:
1.1.1.2 root 195: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
196: desirable.
1.1 root 197:
1.1.1.2 root 198: Pascal, Modula-2 and Ada require the implementation of functions
199: within functions. Some of the mechanisms for this already exist.
1.1 root 200:
201: 4. Generalize the machine model.
202:
1.1.1.4 ! root 203: * Some new compiler features may be needed to do a good job on machines
1.1.1.2 root 204: where static data needs to be addressed using base registers.
1.1 root 205:
1.1.1.4 ! root 206: * Some machines have two stacks in different areas of memory, one used
1.1.1.2 root 207: for scalars and another for large objects. The compiler does not
208: now have a way to understand this.
1.1.1.3 root 209:
1.1.1.4 ! root 210: * Some machines use a caller-saves convention for saving registers over
! 211: function calls. Here is a design for how to handle such a convention.
! 212:
! 213: Classify all the hard registers as call-clobbered, then teach the
! 214: compiler how to put call-crossing pseudo-regs in call-clobbered hard
! 215: registers by generating save/restore insns around the call. Generate
! 216: the save/restores somewhere around the reload pass; this avoids
! 217: getting confused by cross jumping which happens after. Actually put
! 218: the save/restore insns into the RTL chain, so that cross-jumping and
! 219: peephole opt. will work on them.
! 220:
! 221: The time to generate these insns is when you can still tell what regs
! 222: are live at each call, but late enough to know exactly where the
! 223: pseudos have been allocated. Perhaps the best time is just before
! 224: reload_as_needed.
! 225:
! 226: This requires changes in local-alloc.c and global-alloc.c to be willing
! 227: to allocate call-crossing pseudos to call-clobbered registers.
! 228: That isn't hard if you do it with a special flag macro that simply says to
! 229: ignore (in those passes) whether a register is call-clobbered.
! 230: This macro would be defined if the target machine uses caller-saves.
! 231:
! 232: To get the best output, it is necessary to count the cost (in
! 233: additional save/restores) for each pseudo register of putting it in a
! 234: call-clobbered hard reg. This means counting *how many times* each
! 235: pseudo crosses a call. Depending on that value, it might be better to
! 236: put the pseudo in memory than in a call-clobbered register. Cost
! 237: counting would be done in flow.c, perhaps, and costs would affect
! 238: allocation decisions in local-alloc and global-alloc. Perhaps each
! 239: pseudo should have one priority for getting a call-saved reg and
! 240: another, lower (in general) priority for a call-clobbered reg.
! 241:
! 242: Then there would no longer be a need for the flag: the changes would
! 243: improve the output, on all machines.
! 244:
1.1.1.3 root 245: 5. Precompilation of header files.
246:
247: In the future, many programs will use thousands of lines of header files.
248: Compiling the headers might be slower than compiling the guts of any one
249: source file. Here is a scheme for precompiling header files to make
250: compilation faster for a sequence of headers which is often used.
251:
252: A precompiled header corresponds to a sequence of header files. The
253: preprocessor recognizes when the input starts with a sequence of
254: `#include' commands and searches a data base for a precompiled header
255: corresponding to that sequence. The modtimes of all these files are
256: stored in the data base so that one can tell whether the precompiled
257: header is still valid.
258:
259: For robustness, each directory should have its own collection of
260: precompiled headers and its own data base of them. Probably each
261: precompiled header would be a file and the data base would be one
262: more file.
263:
264: The data base records the entire collection of predefined macros and
265: their definitions, except for __FILE__, __LINE__ and __DATE__, for
266: each precompiled header. If this collection does not match the setup
267: at the start of the current compilation (including the results of -D
268: and -U switches), the precompiled header is inapplicable. It might
269: be possible to have distinct precompiled headers for the same sequence
270: of header files but different collections of predefined macros.
271:
272: The state of any option that affects macro processing, such as -ansi
273: or -traditional, must also be recorded, and the precompiled header is
274: valid only if these options match.
275:
276: The precompiled header contains an ordered series of strings. Some
277: strings are marked "unconditional"; these must be compiled each time
278: the precompiled header is used. Other strings are have keys, which
279: are identifiers. A string with keys must be compiled if at least one
280: of its keys is mentioned in the input. The order these strings appear
281: in the precompiled header is called their intrinsic order.
282:
283: The C preprocessor reads in the precompiled header file and scan all
284: the strings, making for each key an entry in the same symbol table
285: used for macros, pointing at a list of all the strings for which it is
286: a key. Each string must have a flag (one flag per string, not one per
287: key per string). The same code in `rescan' that detects references to
288: macros would detect a reference to a key and flag all of the strings
289: that it belongs to as needing to be output.
290:
291: Each of these strings is immediately recursively macroexpanded (i.e.
292: `rescan' is called), but the output from this is discarded. The
293: expansion is to detect any other keys mentioned in the string, and to
294: define any macros for which the string contains a #define. The key's
295: symbol table entry is be deleted to save time if the key is
296: encountered again, and to avoid an infinite recursion.
297:
298: The unconditional strings are macroexpanded with `rescan' (but the
299: output is discarded) at some time before anything is actually output.
300:
301: At the end of compilation, before any of the actual input text is
302: output, the list of strings is scanned in the intrinsic order, and
303: each string that is unconditional or flagged is output verbatim,
304: except that any #define lines are discarded.
305:
306: Precompiled headers would be constructed by explicit request with a
307: special tool. The first step is to run cpp on the sequence of header
308: files' contents. This would use a new option that would cause all
309: #define lines to be output unchanged as well as defining the macro.
310: The second step is to divide the output into strings, some keyed and
311: some unconditional. This division is done without changing the order
312: of the text being divided up.
1.1.1.4 ! root 313:
! 314: 6. Other possibly nice features.
! 315:
! 316: * cpp could have a #provide directive.
! 317: #provide would have the same syntax as #include,
! 318: and it would nullify any future #include directive
! 319: with the same argument. Thus, the file foo.h
! 320: could contain #provide <foo> to prevent itself from
! 321: being included twice.
! 322:
! 323: This is much cleaner than the alternative sometimes implemented,
! 324: which is to require the user to use something other than #include
! 325: in order to ensure inclusion only once.
! 326:
! 327: 7. Better documentation of how GCC works and how to port it.
! 328:
! 329: Here is an outline proposed by Allan Adler.
! 330:
! 331: I. Overview of this document
! 332: II. The machines on which GCC is implemented
! 333: A. Prose description of those characteristics of target machines and
! 334: their operating systems which are pertinent to the implementation
! 335: of GCC.
! 336: i. target machine characteristics
! 337: ii. comparison of this system of machine characteristics with
! 338: other systems of machine specification currently in use
! 339: B. Tables of the characteristics of the target machines on which
! 340: GCC is implemented.
! 341: C. A priori restrictions on the values of characteristics of target
! 342: machines, with special reference to those parts of the source code
! 343: which entail those restrictions
! 344: i. restrictions on individual characteristics
! 345: ii. restrictions involving relations between various characteristics
! 346: D. The use of GCC as a cross-compiler
! 347: i. cross-compilation to existing machines
! 348: ii. cross-compilation to non-existent machines
! 349: E. Assumptions which are made regarding the target machine
! 350: i. assumptions regarding the architecture of the target machine
! 351: ii. assumptions regarding the operating system of the target machine
! 352: iii. assumptions regarding software resident on the target machine
! 353: iv. where in the source code these assumptions are in effect made
! 354: III. A systematic approach to writing the files tm.h and xm.h
! 355: A. Macros which require special care or skill
! 356: B. Examples, with special reference to the underlying reasoning
! 357: IV. A systematic approach to writing the machine description file md
! 358: A. Minimal viable sets of insn descriptions
! 359: B. Examples, with special reference to the underlying reasoning
! 360: V. Uses of the file aux-output.c
! 361: VI. Specification of what constitutes correct performance of an
! 362: implementation of GCC
! 363: A. The components of GCC
! 364: B. The itinerary of a C program through GCC
! 365: C. A system of benchmark programs
! 366: D. What your RTL and assembler look should like with these benchmarks
! 367: E. Fine tuning for speed and size of compiled code
! 368: VII. A systematic procedure for debugging an implementation of GCC
! 369: A. Use of GDB
! 370: i. the macros in the file .gdbinit for GCC
! 371: ii. obstacles to the use of GDB
! 372: a. functions implemented as macros can't be called in GDB
! 373: B. Debugging without GDB
! 374: i. How to turn off the normal operation of GCC and access specific
! 375: parts of GCC
! 376: C. Debugging tools
! 377: D. Debugging the parser
! 378: i. how machine macros and insn definitions affect the parser
! 379: E. Debugging the recognizer
! 380: i. how machine macros and insn definitions affect the recognizer
! 381:
! 382: ditto for other components
! 383:
! 384: VIII. Data types used by GCC, with special reference to restrictions not
! 385: specified in the formal definition of the data type
! 386: IX. References to the literature for the algorithms used in GCC
! 387:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.