|
|
1.1 root 1: 1. Better optimization.
2:
3: * More cse
4:
5: The techniques for doing full global cse are described in the
6: red dragon book. It is likely to be slow and use a lot of memory,
7: but it might be worth offering as an additional option.
8:
9: It is probably possible to extend cse to a few very frequent cases
10: without so much expense.
11:
12: For example, it is not very hard to handle cse through if-then
13: statements with no else clauses. Here's how to do it. On reaching a
14: label, notice that the label's use-count is 1 and that the last
15: preceding jump jumps conditionally to this label. Now you know it
16: is a simple if-then statement. Remove from the hash table
17: all the expressions that were entered since that jump insn
18: and you can continue with cse.
19:
20: It is probably not hard to handle cse from the end of a loop
21: around to the beginning, and a few loops would be greatly sped
22: up by this.
23:
24: * Iteration variables and strength reduction.
25:
26: The red dragon book describes standard techniques for these kinds
27: of loop optimization. But be careful! These optimization techniques
28: don't always make the code better. You need to avoid performing
29: the standard transformations unless they are greatly worth while.
30:
31: In many common cases it is possible to deduce that an iteration
32: variable is always positive during the loop. This information
33: may make it possible to use decrement-and-branch instructions
34: whose branch conditions are inconvenient. For example, the 68000
35: `dbra' instruction branches if the value was not equal to zero.
36: Therefore, it is not applicable to `for (i = 10; i >= 0; i--)'
37: unless the compiler can know that I will never be negative
38: before it is decremented.
39:
40: * Special local optimizations.
41:
42: The instruction combiner finds only certain classes of local optimizations.
43: For example, it cannot use the 68020 instruction `cmp2' because it would
44: not think to combine the instructions that would be equivalent to a `cmp2'.
45:
46: In order to take advantage of such instructions, the combiner would need
47: special hints as to which instructions to consider combining. To be
48: generally useful, this feature would have to be controlled somehow
49: by new information in the machine description.
50:
51: * Smarter reload pass.
52:
53: The reload pass as currently written can reload values only into registers
54: that are reserved for reloading. This means that in order to use a
55: register for reloading it must spill everything out of that register.
56:
57: It would be straightforward, though complicated, for reload1.c to keep
58: track, during its scan, of which hard registers were available at each
59: point in the function, and use for reloading even registers that were
60: free only at the point they were needed. This would avoid much spilling
61: and make better code.
62:
63: * Change the type of a variable.
64:
65: Sometimes a variable is declared as `int', it is assigned only once
66: from a value of type `char', and then it is used only by comparison
67: against constants. On many machines, better code would result if
68: the variable had type `char'. If the compiler could detect this
69: case, it could change the declaration of the variable and change
70: all the places that use it.
71:
72: * Order of subexpressions.
73:
74: It might be possible to make better code by paying attention
75: to the order in which to generate code for subexpressions of an expression.
76:
77: * Better code for switch statements.
78:
79: If a switch statement has only a few cases, a sequence of conditional
80: branches is generated for it, rather than a jump table. It would
81: be better to output a binary tree of branches.
82:
83: * Distributive law.
84:
85: *(X + 4 * (Y + C)) compiles better as *(X + 4*C + 4*Y)
86: on some machines because of known addressing modes.
87: It may be tricky to determine when, and for which machines,
88: to use each alternative.
89:
90: 2. Simpler porting.
91:
92: Right now, describing the target machine's instructions is done
93: cleanly, but describing its addressing mode is done with several
94: ad-hoc macro definitions. Porting would be much easier if there were
95: an RTL description for addressing modes like that for instructions.
96: Tools analogous to genflags and genrecog would generate macros from
97: this description.
98:
99: There would be one pattern in the address-description file for each
100: kind of addressing, and this pattern would have:
101:
102: * the RTL expression for the address
103: * C code to verify its validity (since that may depend on
104: the exact data).
105: * C code to print the address in assembler language.
106: * C code to convert the address into a valid one, if it is not valid.
107: (This would replace LEGITIMIZE_ADDRESS).
108: * Register constraints for all indeterminates that appear
109: in the RTL expression.
110:
111: 3. Other languages.
112:
113: Front ends for Pascal, Fortran, Algol, Cobol and Ada are desirable.
114:
115: Pascal requires the implementation of functions within functions.
116: Some of the mechanisms for this already exist.
117:
118: 4. Generalize the machine model.
119:
120: 4.A. Parameters in registers.
121:
122: One some machines, conventions are that some parameters are passed
123: in general registers. The compiler currently cannot handle this.
124:
125: This requires changes in the code in expr.c for function calls.
126: For function entry, changes are required in stmt.c, and in
127: layout_parms, and perhaps also in final and in register allocation,
128: but the last should be minor.
129:
130: Where stmt.c now copies the stack slot into a pseudo register,
131: instead copy the special argument register into a pseudo register.
132: Use the pseudo register throughout the body of the function to
133: represent the parameter. That way, parameters can still be spilled
134: to the stack.
135:
136: 4.B. Jump-execute-next.
137:
138: Many recent machines have jumps which execute the following instruction
139: before the instruction jumped to. To take advantage of this capability
140: requires a new compiler pass that would reorder instructions when possible.
141: After reload is a good place for it.
142:
143: 5. Add a profiling feature like Berkeley's -pg,
144: or other debugging and measurement features.
145:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.