|
|
1.1 root 1: 0. Improved efficiency.
2:
3: * Parse and output array initializers an element at a time, freeing
4: storage after each, instead of parsing the whole initializer first and
5: then outputting. This would reduce memory usage for large
6: initializers.
7:
8: * See if the techniques describe in Oct 1991 SIGPLAN Notices
9: (Frazer and Hanson) are applicable to GCC.
10:
11: 1. Better optimization.
12:
13: * Constants in unused inline functions
14:
15: It would be nice to delay output of string constants so that string
16: constants mentioned in unused inline functions are never generated.
17: Perhaps this would also take care of string constants in dead code.
18:
19: The difficulty is in finding a clean way for the RTL which refers
20: to the constant (currently, only by an assembler symbol name)
21: to point to the constant and cause it to be output.
22:
23: * More cse
24:
25: The techniques for doing full global cse are described in the red
26: dragon book, or (a different version) in Frederick Chow's thesis from
27: Stanford. It is likely to be slow and use a lot of memory, but it
28: might be worth offering as an additional option.
29:
30: It is probably possible to extend cse to a few very frequent cases
31: without so much expense.
32:
33: For example, it is not very hard to handle cse through if-then
34: statements with no else clauses. Here's how to do it. On reaching a
35: label, notice that the label's use-count is 1 and that the last
36: preceding jump jumps conditionally to this label. Now you know it
37: is a simple if-then statement. Remove from the hash table
38: all the expressions that were entered since that jump insn
39: and you can continue with cse.
40:
41: It is probably not hard to handle cse from the end of a loop
42: around to the beginning, and a few loops would be greatly sped
43: up by this.
44:
45: * Optimize a sequence of if statements whose conditions are exclusive.
46:
47: It is possible to optimize
48:
49: if (x == 1) ...;
50: if (x == 2) ...;
51: if (x == 3) ...;
52:
53: into
54:
55: if (x == 1) ...;
56: else if (x == 2) ...;
57: else if (x == 3) ...;
58:
59: provided that x is not altered by the contents of the if statements.
60:
61: It's not certain whether this is worth doing. Perhaps programmers
62: nearly always write the else's themselves, leaving few opportunities
63: to improve anything.
64:
65: * Un-cse.
66:
67: Perhaps we should have an un-cse step right after cse, which tries to
68: replace a reg with its value if the value can be substituted for the
69: reg everywhere, if that looks like an improvement. Which is if the
70: reg is used only a few times. Use rtx_cost to determine if the
71: change is really an improvement.
72:
73: * Clean up how cse works.
74:
75: The scheme is that each value has just one hash entry. The
76: first_same_value and next_same_value chains are no longer needed.
77:
78: For arithmetic, each hash table elt has the following slots:
79:
80: * Operation. This is an rtx code.
81: * Mode.
82: * Operands 0, 1 and 2. These point to other hash table elements.
83:
84: So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
85: first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
86: points to. Then we put these elts into operands 0 and 1 of a new elt.
87: We put PLUS and SI into the new elt.
88:
89: Registers and mem refs would never be entered into the table as such.
90: However, the values they contain would be entered. There would be a
91: table indexed by regno which points at the hash entry for the value in
92: that reg.
93:
94: The hash entry index now plays the role of a qty number.
95: We still need qty_first_reg, reg_next_eqv, etc. to record which regs
96: share a particular qty.
97:
98: When a reg is used whose contents are unknown, we need to create a
99: hash table entry whose contents say "unknown", as a place holder for
100: whatever the reg contains. If that reg is added to something, then
101: the hash entry for the sum will refer to the "unknown" entry. Use
102: UNKNOWN for the rtx code in this entry. This replaces make_new_qty.
103:
104: For a constant, a unique hash entry would be made based on the
105: value of the constant.
106:
107: What about MEM? Each time a memory address is referenced, we need a
108: qty (a hash table elt) to represent what is in it. (Just as for a
109: register.) If this isn't known, create one, just as for a reg whose
110: contents are unknown.
111:
112: We need a way to find all mem refs that still contain a certain value.
113: Do this with a chain of hash elts (for memory addresses) that point to
114: locations that hold the value. The hash elt for the value itself should
115: point to the start of the chain. It would be good for the hash elt
116: for an address to point to the hash elt for the contents of that address
117: (but this ptr can be null if the contents have never been entered).
118:
119: With this data structure, nothing need ever be invalidated except
120: the lists of which regs or mems hold a particular value. It is easy
121: to see if there is a reg or mem that is equiv to a particular value.
122: If the value is constant, it is always explicitly constant.
123:
124: * Support more general tail-recursion among different functions.
125:
126: This might be possible under certain circumstances, such as when
127: the argument lists of the functions have the same lengths.
128: Perhaps it could be done with a special declaration.
129:
130: You would need to verify in the calling function that it does not
131: use the addresses of any local variables and does not use setjmp.
132:
133: * Put short statics vars at low addresses and use short addressing mode?
134:
135: Useful on the 68000/68020 and perhaps on the 32000 series,
136: provided one has a linker that works with the feature.
137: This is said to make a 15% speedup on the 68000.
138:
139: * Keep global variables in registers.
140:
141: Here is a scheme for doing this. A global variable, or a local variable
142: whose address is taken, can be kept in a register for an entire function
143: if it does not use non-constant memory addresses and (for globals only)
144: does not call other functions. If the entire function does not meet
145: this criterion, a loop may.
146:
147: The VAR_DECL for such a variable would have to have two RTL expressions:
148: the true home in memory, and the pseudo-register used temporarily.
149: It is necessary to emit insns to copy the memory location into the
150: pseudo-register at the beginning of the function or loop, and perhaps
151: back out at the end. These insns should have REG_EQUIV notes so that,
152: if the pseudo-register does not get a hard register, it is spilled into
153: the memory location which exists in any case.
154:
155: The easiest way to set up these insns is to modify the routine
156: put_var_into_stack so that it does not apply to the entire function
157: (sparing any loops which contain nothing dangerous) and to call it at
158: the end of the function regardless of where in the function the
159: address of a local variable is taken. It would be called
160: unconditionally at the end of the function for all relevant global
161: variables.
162:
163: For debugger output, the thing to do is to invent a new binding level
164: around the appropriate loop and define the variable name as a register
165: variable with that scope.
166:
167: * Live-range splitting.
168:
169: Currently a variable is allocated a hard register either for the full
170: extent of its use or not at all. Sometimes it would be good to
171: allocate a variable a hard register for just part of a function; for
172: example, through a particular loop where the variable is mostly used,
173: or outside of a particular loop where the variable is not used. (The
174: latter is nice because it might let the variable be in a register most
175: of the time even though the loop needs all the registers.)
176:
177: It might not be very hard to do this in global.c when a variable
178: fails to get a hard register for its entire life span.
179:
180: The first step is to find a loop in which the variable is live, but
181: which is not the whole life span or nearly so. It's probably best to
182: use a loop in which the variable is heavily used.
183:
184: Then create a new pseudo-register to represent the variable in that loop.
185: Substitute this for the old pseudo-register there, and insert move insns
186: to copy between the two at the loop entry and all exits. (When several
187: such moves are inserted at the same place, some new feature should be
188: added to say that none of those registers conflict merely because of
189: overlap between the new moves. And the reload pass should reorder them
190: so that a store precedes a load, for any given hard register.)
191:
192: After doing this for all the reasonable candidates, run global-alloc
193: over again. With luck, one of the two pseudo-registers will be fit
194: somewhere. It may even have a much higher priority due to its reduced
195: life span.
196:
197: There will be no room in general for the new pseudo-registers in
198: basic_block_live_at_start, so there will need to be a second such
199: matrix exclusively for the new ones. Various other vectors indexed by
200: register number will have to be made bigger, or there will have to be
201: secondary extender vectors just for global-alloc.
202:
203: A simple new feature could arrange that both pseudo-registers get the
204: same stack slot if they both fail to get hard registers.
205:
206: Other compilers split live ranges when they are not connected, or
207: try to split off pieces `at the edge'. I think splitting around loops
208: will provide more speedup.
209:
210: Creating a fake binding block and a new like-named variable with
211: shorter life span and different address might succeed in describing
212: this technique for the debugger.
213:
214: * Detect dead stores into memory?
215:
216: A store into memory is dead if it is followed by another store into
217: the same location; and, in between, there is no reference to anything
218: that might be that location (including no reference to a variable
219: address).
220:
221: * Loop optimization.
222:
223: Strength reduction and iteration variable elimination could be
224: smarter. They should know how to decide which iteration variables are
225: not worth making explicit because they can be computed as part of an
226: address calculation. Based on this information, they should decide
227: when it is desirable to eliminate one iteration variable and create
228: another in its place.
229:
230: It should be possible to compute what the value of an iteration
231: variable will be at the end of the loop, and eliminate the variable
232: within the loop by computing that value at the loop end.
233:
234: When a loop has a simple increment that adds 1,
235: instead of jumping in after the increment,
236: decrement the loop count and jump to the increment.
237: This allows aob insns to be used.
238:
239: * Using constraints on values.
240:
241: Many operations could be simplified based on knowledge of the
242: minimum and maximum possible values of a register at any particular time.
243: These limits could come from the data types in the tree, via rtl generation,
244: or they can be deduced from operations that are performed. For example,
245: the result of an `and' operation one of whose operands is 7 must be in
246: the range 0 to 7. Compare instructions also tell something about the
247: possible values of the operand, in the code beyond the test.
248:
249: Value constraints can be used to determine the results of a further
250: comparison. They can also indicate that certain `and' operations are
251: redundant. Constraints might permit a decrement and branch
252: instruction that checks zeroness to be used when the user has
253: specified to exit if negative.
254:
255: * Smarter reload pass.
256:
257: The reload pass as currently written can reload values only into registers
258: that are reserved for reloading. This means that in order to use a
259: register for reloading it must spill everything out of that register.
260:
261: It would be straightforward, though complicated, for reload1.c to keep
262: track, during its scan, of which hard registers were available at each
263: point in the function, and use for reloading even registers that were
264: free only at the point they were needed. This would avoid much spilling
265: and make better code.
266:
267: * Change the type of a variable.
268:
269: Sometimes a variable is declared as `int', it is assigned only once
270: from a value of type `char', and then it is used only by comparison
271: against constants. On many machines, better code would result if
272: the variable had type `char'. If the compiler could detect this
273: case, it could change the declaration of the variable and change
274: all the places that use it.
275:
276: * Better handling for very sparse switches.
277:
278: There may be cases where it would be better to compile a switch
279: statement to use a fixed hash table rather than the current
280: combination of jump tables and binary search.
281:
282: * Order of subexpressions.
283:
284: It might be possible to make better code by paying attention
285: to the order in which to generate code for subexpressions of an expression.
286:
287: * More code motion.
288:
289: Consider hoisting common code up past conditional branches or
290: tablejumps.
291:
292: * Trace scheduling.
293:
294: This technique is said to be able to figure out which way a jump
295: will usually go, and rearrange the code to make that path the
296: faster one.
297:
298: * Distributive law.
299:
300: The C expression *(X + 4 * (Y + C)) compiles better on certain
301: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
302: modes. It may be tricky to determine when, and for which machines, to
303: use each alternative.
304:
305: Some work has been done on this, in combine.c.
306:
307: * Can optimize by changing if (x) y; else z; into z; if (x) y;
308: if z and x do not interfere and z has no effects not undone by y.
309: This is desirable if z is faster than jumping.
310:
311: * For a two-insn loop on the 68020, such as
312: foo: movb a2@+,a3@+
313: jne foo
314: it is better to insert dbeq d0,foo before the jne.
315: d0 can be a junk register. The challenge is to fit this into
316: a portable framework: when can you detect this situation and
317: still be able to allocate a junk register?
318:
319: 2. Simpler porting.
320:
321: Right now, describing the target machine's instructions is done
322: cleanly, but describing its addressing mode is done with several
323: ad-hoc macro definitions. Porting would be much easier if there were
324: an RTL description for addressing modes like that for instructions.
325: Tools analogous to genflags and genrecog would generate macros from
326: this description.
327:
328: There would be one pattern in the address-description file for each
329: kind of addressing, and this pattern would have:
330:
331: * the RTL expression for the address
332: * C code to verify its validity (since that may depend on
333: the exact data).
334: * C code to print the address in assembler language.
335: * C code to convert the address into a valid one, if it is not valid.
336: (This would replace LEGITIMIZE_ADDRESS).
337: * Register constraints for all indeterminates that appear
338: in the RTL expression.
339:
340: 3. Other languages.
341:
342: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
343: desirable.
344:
345: Pascal, Modula-2 and Ada require the implementation of functions
346: within functions. Some of the mechanisms for this already exist.
347:
348: 4. More extensions.
349:
350: * Generated unique labels. Have some way of generating distinct labels
351: for use in extended asm statements. I don't know what a good syntax would
352: be.
353:
354: * A way of defining a structure containing a union, in which the choice of
355: union alternative is controlled by a previous structure component.
356:
357: Here is a possible syntax for this.
358:
359: struct foo {
360: enum { INT, DOUBLE } code;
361: auto union { case INT: int i; case DOUBLE: double d;} value : code;
362: };
363:
364: * Allow constructor expressions as lvalues, like this:
365:
366: (struct foo) {a, b, c} = foo();
367:
368: This would call foo, which returns a structure, and then store the
369: several components of the structure into the variables a, b, and c.
370:
371: 5. Generalize the machine model.
372:
373: * Some new compiler features may be needed to do a good job on machines
374: where static data needs to be addressed using base registers.
375:
376: * Some machines have two stacks in different areas of memory, one used
377: for scalars and another for large objects. The compiler does not
378: now have a way to understand this.
379:
380: 6. Useful warnings.
381:
382: * Warn about statements that are undefined because the order of
383: evaluation of increment operators makes a big difference. Here is an
384: example:
385:
386: *foo++ = hack (*foo);
387:
388: 7. Better documentation of how GCC works and how to port it.
389:
390: Here is an outline proposed by Allan Adler.
391:
392: I. Overview of this document
393: II. The machines on which GCC is implemented
394: A. Prose description of those characteristics of target machines and
395: their operating systems which are pertinent to the implementation
396: of GCC.
397: i. target machine characteristics
398: ii. comparison of this system of machine characteristics with
399: other systems of machine specification currently in use
400: B. Tables of the characteristics of the target machines on which
401: GCC is implemented.
402: C. A priori restrictions on the values of characteristics of target
403: machines, with special reference to those parts of the source code
404: which entail those restrictions
405: i. restrictions on individual characteristics
406: ii. restrictions involving relations between various characteristics
407: D. The use of GCC as a cross-compiler
408: i. cross-compilation to existing machines
409: ii. cross-compilation to non-existent machines
410: E. Assumptions which are made regarding the target machine
411: i. assumptions regarding the architecture of the target machine
412: ii. assumptions regarding the operating system of the target machine
413: iii. assumptions regarding software resident on the target machine
414: iv. where in the source code these assumptions are in effect made
415: III. A systematic approach to writing the files tm.h and xm.h
416: A. Macros which require special care or skill
417: B. Examples, with special reference to the underlying reasoning
418: IV. A systematic approach to writing the machine description file md
419: A. Minimal viable sets of insn descriptions
420: B. Examples, with special reference to the underlying reasoning
421: V. Uses of the file aux-output.c
422: VI. Specification of what constitutes correct performance of an
423: implementation of GCC
424: A. The components of GCC
425: B. The itinerary of a C program through GCC
426: C. A system of benchmark programs
427: D. What your RTL and assembler should look like with these benchmarks
428: E. Fine tuning for speed and size of compiled code
429: VII. A systematic procedure for debugging an implementation of GCC
430: A. Use of GDB
431: i. the macros in the file .gdbinit for GCC
432: ii. obstacles to the use of GDB
433: a. functions implemented as macros can't be called in GDB
434: B. Debugging without GDB
435: i. How to turn off the normal operation of GCC and access specific
436: parts of GCC
437: C. Debugging tools
438: D. Debugging the parser
439: i. how machine macros and insn definitions affect the parser
440: E. Debugging the recognizer
441: i. how machine macros and insn definitions affect the recognizer
442:
443: ditto for other components
444:
445: VIII. Data types used by GCC, with special reference to restrictions not
446: specified in the formal definition of the data type
447: IX. References to the literature for the algorithms used in GCC
448:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.