|
|
1.1 root 1: 1. Better optimization.
2:
3: * More cse
4:
5: The techniques for doing full global cse are described in the
6: red dragon book. It is likely to be slow and use a lot of memory,
7: but it might be worth offering as an additional option.
8:
9: It is probably possible to extend cse to a few very frequent cases
10: without so much expense.
11:
12: For example, it is not very hard to handle cse through if-then
13: statements with no else clauses. Here's how to do it. On reaching a
14: label, notice that the label's use-count is 1 and that the last
15: preceding jump jumps conditionally to this label. Now you know it
16: is a simple if-then statement. Remove from the hash table
17: all the expressions that were entered since that jump insn
18: and you can continue with cse.
19:
20: It is probably not hard to handle cse from the end of a loop
21: around to the beginning, and a few loops would be greatly sped
22: up by this.
23:
1.1.1.3 ! root 24: * Support more general tail-recursion among different functions.
1.1 root 25:
1.1.1.3 ! root 26: This might be possible under certain circumstances, such as when
! 27: the argument lists of the functions have the same lengths.
! 28: Perhaps it could be done with a special declaration.
! 29:
! 30: You would need to verify in the calling function that it does not
! 31: use the addresses of any local variables and does not use setjmp.
! 32:
! 33: * Put short statics vars at low addresses and use short addressing mode?
! 34: Useful on the 68000/68020 and perhaps on the 32000 series,
! 35: provided one has a linker that works with the feature.
! 36: This is said to make a 15% speedup on the 68000.
! 37: This brings to mind Hayes' changes for Stanford MIPS.
! 38:
! 39: * Detect dead stores into memory?
! 40:
! 41: A store into memory is dead if it is followed by another store into
! 42: the same location; and, in between, there is no reference to
! 43: that anything that might be that location (including no reference
! 44: to a variable address).
! 45:
! 46: * Loop optimization.
! 47:
! 48: Strength reduction and iteration variable elimination could be
! 49: smarter. They should know how to decide which iteration variables are
! 50: not worth making explicit because they can be computed as part of an
! 51: address calculation. Based on this information, they should decide
! 52: when it is desirable to eliminate one iteration iable and create
! 53: another in its place.
! 54:
! 55: It should be possible to compute what the value of an iteration
! 56: variable will be at the end of the loop, and eliminate the variable
! 57: within the loop by computing that value at the loop end.
! 58:
! 59: When a loop has a simple increment that adds 1,
! 60: instead of jumping in after the increment,
! 61: decrement the loop count and jump to the increment.
! 62: This allows aob insns to be used.
1.1 root 63:
1.1.1.2 root 64: * Using constraints on values.
1.1 root 65:
1.1.1.2 root 66: Many operations could be simplified based on knowledge of the
67: minimum and maximum possible values of a register at any particular time.
68: These limits could come from the data types in the tree, via rtl generation,
69: or they can be deduced from operations that are performed. For example,
70: the result of an `and' operation one of whose operands is 7 must be in
71: the range 0 to 7. Compare instructions also tell something about the
72: possible values of the operand, in the code beyond the test.
73:
74: Value constraints can be used to determine the results of a further
75: comparison. They can also indicate that certain `and' operations are
76: redundant. Constraints might permit a decrement and branch
77: instruction that checks zeroness to be used when the user has
78: specified to exit if negative.
1.1 root 79:
80: * Smarter reload pass.
81:
82: The reload pass as currently written can reload values only into registers
83: that are reserved for reloading. This means that in order to use a
84: register for reloading it must spill everything out of that register.
85:
86: It would be straightforward, though complicated, for reload1.c to keep
87: track, during its scan, of which hard registers were available at each
88: point in the function, and use for reloading even registers that were
89: free only at the point they were needed. This would avoid much spilling
90: and make better code.
91:
92: * Change the type of a variable.
93:
94: Sometimes a variable is declared as `int', it is assigned only once
95: from a value of type `char', and then it is used only by comparison
96: against constants. On many machines, better code would result if
97: the variable had type `char'. If the compiler could detect this
98: case, it could change the declaration of the variable and change
99: all the places that use it.
100:
101: * Order of subexpressions.
102:
103: It might be possible to make better code by paying attention
104: to the order in which to generate code for subexpressions of an expression.
105:
1.1.1.3 ! root 106: * More code motion.
1.1 root 107:
1.1.1.3 ! root 108: Consider hoisting common code up past conditional branches or
! 109: tablejumps.
! 110:
! 111: * Trace scheduling.
! 112:
! 113: This technique is said to be able to figure out which way a jump
! 114: will usually go, and rearrange the code to make that path the
! 115: faster one.
1.1 root 116:
117: * Distributive law.
118:
1.1.1.2 root 119: The C expression *(X + 4 * (Y + C)) compiles better on certain
120: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
121: modes. It may be tricky to determine when, and for which machines, to
122: use each alternative.
123:
1.1.1.3 ! root 124: Some work has been done on this, in combine.c.
! 125:
1.1.1.2 root 126: * Jump-execute-next.
127:
128: Many recent machines have jumps which optionally execute the following
129: instruction before the instruction jumped to, either conditionally or
130: unconditionally. To take advantage of this capability requires a new
131: compiler pass that would reorder instructions when possible. After
132: reload may be a good place for it.
133:
134: On some machines, the result of a load from memory is not available
135: until after the following instruction. The easiest way to support
136: these machines is to output each RTL load instruction as two assembler
137: instructions, the second being a no-op. Putting useful instructions
138: after the load instructions may be a similar task to putting them
139: after jump instructions.
140:
141: * Pipeline scheduling.
142:
143: On many machines, code gets faster if instructions are reordered
144: so that pipelines are kept full. Doing the best possible job of this
145: requires knowing which functional units each kind of instruction executes
146: in and how long the functional unit stays busy with it. Then the
147: goal is to reorder the instructions to keep many functional units
148: busy but never feed them so fast they must wait.
1.1 root 149:
1.1.1.3 ! root 150: * Can optimize by changing if (x) y; else z; into z; if (x) y;
! 151: if z and x do not interfere and z has no effects not undone by y.
! 152: This is desirable if z is faster than jumping.
! 153:
! 154: * For a two-insn loop on the 68020, such as
! 155: foo: movb a2@+,a3@+
! 156: jne foo
! 157: it is better to insert dbeq d0,foo before the jne.
! 158: d0 can be a junk register. The challenge is to fit this into
! 159: a portable framework: when can you detect this situation and
! 160: still be able to allocate a junk register?
! 161:
1.1 root 162: 2. Simpler porting.
163:
164: Right now, describing the target machine's instructions is done
165: cleanly, but describing its addressing mode is done with several
166: ad-hoc macro definitions. Porting would be much easier if there were
167: an RTL description for addressing modes like that for instructions.
168: Tools analogous to genflags and genrecog would generate macros from
169: this description.
170:
171: There would be one pattern in the address-description file for each
172: kind of addressing, and this pattern would have:
173:
174: * the RTL expression for the address
175: * C code to verify its validity (since that may depend on
176: the exact data).
177: * C code to print the address in assembler language.
178: * C code to convert the address into a valid one, if it is not valid.
179: (This would replace LEGITIMIZE_ADDRESS).
180: * Register constraints for all indeterminates that appear
181: in the RTL expression.
182:
183: 3. Other languages.
184:
1.1.1.2 root 185: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
186: desirable.
1.1 root 187:
1.1.1.2 root 188: Pascal, Modula-2 and Ada require the implementation of functions
189: within functions. Some of the mechanisms for this already exist.
1.1 root 190:
191: 4. Generalize the machine model.
192:
1.1.1.2 root 193: Some new compiler features may be needed to do a good job on machines
194: where static data needs to be addressed using base registers.
1.1 root 195:
1.1.1.2 root 196: Some machines have two stacks in different areas of memory, one used
197: for scalars and another for large objects. The compiler does not
198: now have a way to understand this.
1.1.1.3 ! root 199:
! 200: 5. Precompilation of header files.
! 201:
! 202: In the future, many programs will use thousands of lines of header files.
! 203: Compiling the headers might be slower than compiling the guts of any one
! 204: source file. Here is a scheme for precompiling header files to make
! 205: compilation faster for a sequence of headers which is often used.
! 206:
! 207: A precompiled header corresponds to a sequence of header files. The
! 208: preprocessor recognizes when the input starts with a sequence of
! 209: `#include' commands and searches a data base for a precompiled header
! 210: corresponding to that sequence. The modtimes of all these files are
! 211: stored in the data base so that one can tell whether the precompiled
! 212: header is still valid.
! 213:
! 214: For robustness, each directory should have its own collection of
! 215: precompiled headers and its own data base of them. Probably each
! 216: precompiled header would be a file and the data base would be one
! 217: more file.
! 218:
! 219: The data base records the entire collection of predefined macros and
! 220: their definitions, except for __FILE__, __LINE__ and __DATE__, for
! 221: each precompiled header. If this collection does not match the setup
! 222: at the start of the current compilation (including the results of -D
! 223: and -U switches), the precompiled header is inapplicable. It might
! 224: be possible to have distinct precompiled headers for the same sequence
! 225: of header files but different collections of predefined macros.
! 226:
! 227: The state of any option that affects macro processing, such as -ansi
! 228: or -traditional, must also be recorded, and the precompiled header is
! 229: valid only if these options match.
! 230:
! 231: The precompiled header contains an ordered series of strings. Some
! 232: strings are marked "unconditional"; these must be compiled each time
! 233: the precompiled header is used. Other strings are have keys, which
! 234: are identifiers. A string with keys must be compiled if at least one
! 235: of its keys is mentioned in the input. The order these strings appear
! 236: in the precompiled header is called their intrinsic order.
! 237:
! 238: The C preprocessor reads in the precompiled header file and scan all
! 239: the strings, making for each key an entry in the same symbol table
! 240: used for macros, pointing at a list of all the strings for which it is
! 241: a key. Each string must have a flag (one flag per string, not one per
! 242: key per string). The same code in `rescan' that detects references to
! 243: macros would detect a reference to a key and flag all of the strings
! 244: that it belongs to as needing to be output.
! 245:
! 246: Each of these strings is immediately recursively macroexpanded (i.e.
! 247: `rescan' is called), but the output from this is discarded. The
! 248: expansion is to detect any other keys mentioned in the string, and to
! 249: define any macros for which the string contains a #define. The key's
! 250: symbol table entry is be deleted to save time if the key is
! 251: encountered again, and to avoid an infinite recursion.
! 252:
! 253: The unconditional strings are macroexpanded with `rescan' (but the
! 254: output is discarded) at some time before anything is actually output.
! 255:
! 256: At the end of compilation, before any of the actual input text is
! 257: output, the list of strings is scanned in the intrinsic order, and
! 258: each string that is unconditional or flagged is output verbatim,
! 259: except that any #define lines are discarded.
! 260:
! 261: Precompiled headers would be constructed by explicit request with a
! 262: special tool. The first step is to run cpp on the sequence of header
! 263: files' contents. This would use a new option that would cause all
! 264: #define lines to be output unchanged as well as defining the macro.
! 265: The second step is to divide the output into strings, some keyed and
! 266: some unconditional. This division is done without changing the order
! 267: of the text being divided up.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.