|
|
1.1.1.6 root 1: 0. Improved efficiency.
2:
3: * Parse and output array initializers an element at a time, freeing
4: storage after each, instead of parsing the whole initializer first and
5: then outputting. This would reduce memory usage for large
6: initializers.
7:
1.1 root 8: 1. Better optimization.
9:
1.1.1.4 root 10: * Constants in unused inline functions
11:
12: It would be nice to delay output of string constants so that string
13: constants mentioned in unused inline functions are never generated.
14: Perhaps this would also take care of string constants in dead code.
15:
16: The difficulty is in finding a clean way for the RTL which refers
17: to the constant (currently, only by an assembler symbol name)
18: to point to the constant and cause it to be output.
19:
1.1 root 20: * More cse
21:
1.1.1.7 root 22: The techniques for doing full global cse are described in the red
23: dragon book, or (a different version) in Frederick Chow's thesis from
24: Stanford. It is likely to be slow and use a lot of memory, but it
25: might be worth offering as an additional option.
1.1 root 26:
27: It is probably possible to extend cse to a few very frequent cases
28: without so much expense.
29:
30: For example, it is not very hard to handle cse through if-then
31: statements with no else clauses. Here's how to do it. On reaching a
32: label, notice that the label's use-count is 1 and that the last
33: preceding jump jumps conditionally to this label. Now you know it
34: is a simple if-then statement. Remove from the hash table
35: all the expressions that were entered since that jump insn
36: and you can continue with cse.
37:
38: It is probably not hard to handle cse from the end of a loop
39: around to the beginning, and a few loops would be greatly sped
40: up by this.
41:
1.1.1.3 root 42: * Support more general tail-recursion among different functions.
1.1 root 43:
1.1.1.3 root 44: This might be possible under certain circumstances, such as when
45: the argument lists of the functions have the same lengths.
46: Perhaps it could be done with a special declaration.
47:
48: You would need to verify in the calling function that it does not
49: use the addresses of any local variables and does not use setjmp.
50:
51: * Put short statics vars at low addresses and use short addressing mode?
1.1.1.7 root 52:
1.1.1.3 root 53: Useful on the 68000/68020 and perhaps on the 32000 series,
54: provided one has a linker that works with the feature.
55: This is said to make a 15% speedup on the 68000.
1.1.1.7 root 56:
57: * Keep global variables in registers.
58:
59: Here is a scheme for doing this. A global variable, or a local variable
60: whose address is taken, can be kept in a register for an entire function
61: if it does not use non-constant memory addresses and (for globals only)
62: does not call other functions. If the entire function does not meet
63: this criterion, a loop may.
64:
65: The VAR_DECL for such a variable would have to have two RTL expressions:
66: the true home in memory, and the pseudo-register used temporarily.
67: It is necessary to emit insns to copy the memory location into the
68: pseudo-register at the beginning of the function or loop, and perhaps
69: back out at the end. These insns should have REG_EQUIV notes so that,
70: if the pseudo-register does not get a hard register, it is spilled into
71: the memory location which exists in any case.
72:
73: The easiest way to set up these insns is to modify the routine
74: put_var_into_stack so that it does not apply to the entire function
75: (sparing any loops which contain nothing dangerous) and to call it at
76: the end of the function regardless of where in the function the
77: address of a local variable is taken. It would be called
78: unconditionally at the end of the function for all relevant global
79: variables.
80:
81: For debugger output, the thing to do is to invent a new binding level
82: around the appropriate loop and define the variable name as a register
83: variable with that scope.
84:
85: * Live-range splitting.
86:
87: Currently a variable is allocated a hard register either for the full
88: extent of its use or not at all. Sometimes it would be good to
89: allocate a variable a hard register for just part of a function; for
90: example, through a particular loop where the variable is mostly used,
91: or outside of a particular loop where the variable is not used. (The
92: latter is nice because it might let the variable be in a register most
93: of the time even though the loop needs all the registers.)
94:
95: It might not be very hard to do this in global-alloc.c when a variable
96: fails to get a hard register for its entire life span.
97:
98: The first step is to find a loop in which the variable is live, but
99: which is not the whole life span or nearly so. It's probably best to
100: use a loop in which the variable is heavily used.
101:
102: Then create a new pseudo-register to represent the variable in that loop.
103: Substitute this for the old pseudo-register there, and insert move insns
104: to copy between the two at the loop entry and all exits. (When several
105: such moves are inserted at the same place, some new feature should be
106: added to say that none of those registers conflict merely because of
107: overlap between the new moves. And the reload pass should reorder them
108: so that a store precedes a load, for any given hard register.)
109:
110: After doing this for all the reasonable candidates, run global-alloc
111: over again. With luck, one of the two pseudo-registers will be fit
112: somewhere. It may even have a much higher priority due to its reduced
113: life span.
114:
115: There will be no room in general for the new pseudo-registers in
116: basic_block_live_at_start, so there will need to be a second such
117: matrix exclusively for the new ones. Various other vectors indexed by
118: register number will have to be made bigger, or there will have to be
119: secondary extender vectors just for global-alloc.
120:
121: A simple new feature could arrange that both pseudo-registers get the
122: same stack slot if they both fail to get hard registers.
123:
124: Other compilers split live ranges when they are not connected, or
125: try to split off pieces `at the edge'. I think splitting around loops
126: will provide more speedup.
127:
128: Creating a fake binding block and a new like-named variable with
129: shorter life span and different address might succeed in describing
130: this technique for the debugger.
1.1.1.3 root 131:
132: * Detect dead stores into memory?
133:
134: A store into memory is dead if it is followed by another store into
1.1.1.5 root 135: the same location; and, in between, there is no reference to anything
136: that might be that location (including no reference to a variable
137: address).
1.1.1.3 root 138:
139: * Loop optimization.
140:
141: Strength reduction and iteration variable elimination could be
142: smarter. They should know how to decide which iteration variables are
143: not worth making explicit because they can be computed as part of an
144: address calculation. Based on this information, they should decide
1.1.1.5 root 145: when it is desirable to eliminate one iteration variable and create
1.1.1.3 root 146: another in its place.
147:
148: It should be possible to compute what the value of an iteration
149: variable will be at the end of the loop, and eliminate the variable
150: within the loop by computing that value at the loop end.
151:
152: When a loop has a simple increment that adds 1,
153: instead of jumping in after the increment,
154: decrement the loop count and jump to the increment.
155: This allows aob insns to be used.
1.1 root 156:
1.1.1.2 root 157: * Using constraints on values.
1.1 root 158:
1.1.1.2 root 159: Many operations could be simplified based on knowledge of the
160: minimum and maximum possible values of a register at any particular time.
161: These limits could come from the data types in the tree, via rtl generation,
162: or they can be deduced from operations that are performed. For example,
163: the result of an `and' operation one of whose operands is 7 must be in
164: the range 0 to 7. Compare instructions also tell something about the
165: possible values of the operand, in the code beyond the test.
166:
167: Value constraints can be used to determine the results of a further
168: comparison. They can also indicate that certain `and' operations are
169: redundant. Constraints might permit a decrement and branch
170: instruction that checks zeroness to be used when the user has
171: specified to exit if negative.
1.1 root 172:
173: * Smarter reload pass.
174:
175: The reload pass as currently written can reload values only into registers
176: that are reserved for reloading. This means that in order to use a
177: register for reloading it must spill everything out of that register.
178:
179: It would be straightforward, though complicated, for reload1.c to keep
180: track, during its scan, of which hard registers were available at each
181: point in the function, and use for reloading even registers that were
182: free only at the point they were needed. This would avoid much spilling
183: and make better code.
184:
185: * Change the type of a variable.
186:
187: Sometimes a variable is declared as `int', it is assigned only once
188: from a value of type `char', and then it is used only by comparison
189: against constants. On many machines, better code would result if
190: the variable had type `char'. If the compiler could detect this
191: case, it could change the declaration of the variable and change
192: all the places that use it.
193:
194: * Order of subexpressions.
195:
196: It might be possible to make better code by paying attention
197: to the order in which to generate code for subexpressions of an expression.
198:
1.1.1.3 root 199: * More code motion.
1.1 root 200:
1.1.1.3 root 201: Consider hoisting common code up past conditional branches or
202: tablejumps.
203:
204: * Trace scheduling.
205:
206: This technique is said to be able to figure out which way a jump
207: will usually go, and rearrange the code to make that path the
208: faster one.
1.1 root 209:
210: * Distributive law.
211:
1.1.1.2 root 212: The C expression *(X + 4 * (Y + C)) compiles better on certain
213: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
214: modes. It may be tricky to determine when, and for which machines, to
215: use each alternative.
216:
1.1.1.3 root 217: Some work has been done on this, in combine.c.
218:
219: * Can optimize by changing if (x) y; else z; into z; if (x) y;
220: if z and x do not interfere and z has no effects not undone by y.
221: This is desirable if z is faster than jumping.
222:
223: * For a two-insn loop on the 68020, such as
224: foo: movb a2@+,a3@+
225: jne foo
226: it is better to insert dbeq d0,foo before the jne.
227: d0 can be a junk register. The challenge is to fit this into
228: a portable framework: when can you detect this situation and
229: still be able to allocate a junk register?
230:
1.1 root 231: 2. Simpler porting.
232:
233: Right now, describing the target machine's instructions is done
234: cleanly, but describing its addressing mode is done with several
235: ad-hoc macro definitions. Porting would be much easier if there were
236: an RTL description for addressing modes like that for instructions.
237: Tools analogous to genflags and genrecog would generate macros from
238: this description.
239:
240: There would be one pattern in the address-description file for each
241: kind of addressing, and this pattern would have:
242:
243: * the RTL expression for the address
244: * C code to verify its validity (since that may depend on
245: the exact data).
246: * C code to print the address in assembler language.
247: * C code to convert the address into a valid one, if it is not valid.
248: (This would replace LEGITIMIZE_ADDRESS).
249: * Register constraints for all indeterminates that appear
250: in the RTL expression.
251:
252: 3. Other languages.
253:
1.1.1.2 root 254: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
255: desirable.
1.1 root 256:
1.1.1.2 root 257: Pascal, Modula-2 and Ada require the implementation of functions
258: within functions. Some of the mechanisms for this already exist.
1.1 root 259:
1.1.1.7 root 260: 4. More extensions.
261:
262: * Label-addresses as expressions.
263:
264: It would be nice to have access to the addresses of labels; to be able to
265: store them in variables, or initialize vectors of them.
266:
267: Alas, `&label0' is the address of the variable named label0, which is
268: unrelated to the label with that name. Some other syntax is needed.
269: Perhaps colon as a unary operator? That is ambiguous with `?:' with
270: the middle operand omitted. Perhaps ^ as a unary operator? Perhaps
271: `__label__ label0' could mean the value of label0? Its type could be
272: `void *'. `goto *EXP' could be used to go to a value of type `void
273: *'--no ambiguity there.
274:
275: Jump optimization and flow analysis must know about computed jumps,
276: but that is not hard. Each basic block headed by a possible target of
277: computed jumps must be considered a successor of each basic block
278: ending in a computed jump. Aside from this, I believe no other
279: optimizer changes are needed.
280:
281: Next question: stack levels. In most functions, there is no problem,
282: but it would be a shame to make a feature that doesn't work together
283: with other features. Here is an idea:
284:
285: For each label that might need stack level restoration, construct a
286: shadow-label which will restore the stack and jump to the user-label.
287: Then use the address of the shadow label for label0 when someone asks
288: for that of label0. Jump optimization will delete all the shadow labels
289: if the function has no computed gotos.
290:
291: * Generated unique labels. Have some way of generating distinct labels
292: for use in extended asm statements. I don't know what a good syntax would
293: be.
294:
295: 5. Generalize the machine model.
1.1 root 296:
1.1.1.4 root 297: * Some new compiler features may be needed to do a good job on machines
1.1.1.2 root 298: where static data needs to be addressed using base registers.
1.1 root 299:
1.1.1.4 root 300: * Some machines have two stacks in different areas of memory, one used
1.1.1.2 root 301: for scalars and another for large objects. The compiler does not
302: now have a way to understand this.
1.1.1.3 root 303:
1.1.1.8 ! root 304: 6. Better documentation of how GCC works and how to port it.
1.1.1.4 root 305:
306: Here is an outline proposed by Allan Adler.
307:
308: I. Overview of this document
309: II. The machines on which GCC is implemented
310: A. Prose description of those characteristics of target machines and
311: their operating systems which are pertinent to the implementation
312: of GCC.
313: i. target machine characteristics
314: ii. comparison of this system of machine characteristics with
315: other systems of machine specification currently in use
316: B. Tables of the characteristics of the target machines on which
317: GCC is implemented.
318: C. A priori restrictions on the values of characteristics of target
319: machines, with special reference to those parts of the source code
320: which entail those restrictions
321: i. restrictions on individual characteristics
322: ii. restrictions involving relations between various characteristics
323: D. The use of GCC as a cross-compiler
324: i. cross-compilation to existing machines
325: ii. cross-compilation to non-existent machines
326: E. Assumptions which are made regarding the target machine
327: i. assumptions regarding the architecture of the target machine
328: ii. assumptions regarding the operating system of the target machine
329: iii. assumptions regarding software resident on the target machine
330: iv. where in the source code these assumptions are in effect made
331: III. A systematic approach to writing the files tm.h and xm.h
332: A. Macros which require special care or skill
333: B. Examples, with special reference to the underlying reasoning
334: IV. A systematic approach to writing the machine description file md
335: A. Minimal viable sets of insn descriptions
336: B. Examples, with special reference to the underlying reasoning
337: V. Uses of the file aux-output.c
338: VI. Specification of what constitutes correct performance of an
339: implementation of GCC
340: A. The components of GCC
341: B. The itinerary of a C program through GCC
342: C. A system of benchmark programs
1.1.1.5 root 343: D. What your RTL and assembler should look like with these benchmarks
1.1.1.4 root 344: E. Fine tuning for speed and size of compiled code
345: VII. A systematic procedure for debugging an implementation of GCC
346: A. Use of GDB
347: i. the macros in the file .gdbinit for GCC
348: ii. obstacles to the use of GDB
349: a. functions implemented as macros can't be called in GDB
350: B. Debugging without GDB
351: i. How to turn off the normal operation of GCC and access specific
352: parts of GCC
353: C. Debugging tools
354: D. Debugging the parser
355: i. how machine macros and insn definitions affect the parser
356: E. Debugging the recognizer
357: i. how machine macros and insn definitions affect the recognizer
358:
359: ditto for other components
360:
361: VIII. Data types used by GCC, with special reference to restrictions not
362: specified in the formal definition of the data type
363: IX. References to the literature for the algorithms used in GCC
364:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.