|
|
1.1 root 1: 1. Better optimization.
2:
3: * More cse
4:
5: The techniques for doing full global cse are described in the
6: red dragon book. It is likely to be slow and use a lot of memory,
7: but it might be worth offering as an additional option.
8:
9: It is probably possible to extend cse to a few very frequent cases
10: without so much expense.
11:
12: For example, it is not very hard to handle cse through if-then
13: statements with no else clauses. Here's how to do it. On reaching a
14: label, notice that the label's use-count is 1 and that the last
15: preceding jump jumps conditionally to this label. Now you know it
16: is a simple if-then statement. Remove from the hash table
17: all the expressions that were entered since that jump insn
18: and you can continue with cse.
19:
20: It is probably not hard to handle cse from the end of a loop
21: around to the beginning, and a few loops would be greatly sped
22: up by this.
23:
24: * Iteration variables and strength reduction.
25:
26: The red dragon book describes standard techniques for these kinds
27: of loop optimization. But be careful! These optimization techniques
28: don't always make the code better. You need to avoid performing
29: the standard transformations unless they are greatly worth while.
30:
31: In many common cases it is possible to deduce that an iteration
32: variable is always positive during the loop. This information
33: may make it possible to use decrement-and-branch instructions
34: whose branch conditions are inconvenient. For example, the 68000
35: `dbra' instruction branches if the value was not equal to zero.
36: Therefore, it is not applicable to `for (i = 10; i >= 0; i--)'
37: unless the compiler can know that I will never be negative
38: before it is decremented.
39:
1.1.1.2 ! root 40: * Using constraints on values.
1.1 root 41:
1.1.1.2 ! root 42: Many operations could be simplified based on knowledge of the
! 43: minimum and maximum possible values of a register at any particular time.
! 44: These limits could come from the data types in the tree, via rtl generation,
! 45: or they can be deduced from operations that are performed. For example,
! 46: the result of an `and' operation one of whose operands is 7 must be in
! 47: the range 0 to 7. Compare instructions also tell something about the
! 48: possible values of the operand, in the code beyond the test.
! 49:
! 50: Value constraints can be used to determine the results of a further
! 51: comparison. They can also indicate that certain `and' operations are
! 52: redundant. Constraints might permit a decrement and branch
! 53: instruction that checks zeroness to be used when the user has
! 54: specified to exit if negative.
1.1 root 55:
56: * Smarter reload pass.
57:
58: The reload pass as currently written can reload values only into registers
59: that are reserved for reloading. This means that in order to use a
60: register for reloading it must spill everything out of that register.
61:
62: It would be straightforward, though complicated, for reload1.c to keep
63: track, during its scan, of which hard registers were available at each
64: point in the function, and use for reloading even registers that were
65: free only at the point they were needed. This would avoid much spilling
66: and make better code.
67:
68: * Change the type of a variable.
69:
70: Sometimes a variable is declared as `int', it is assigned only once
71: from a value of type `char', and then it is used only by comparison
72: against constants. On many machines, better code would result if
73: the variable had type `char'. If the compiler could detect this
74: case, it could change the declaration of the variable and change
75: all the places that use it.
76:
77: * Order of subexpressions.
78:
79: It might be possible to make better code by paying attention
80: to the order in which to generate code for subexpressions of an expression.
81:
82: * Better code for switch statements.
83:
84: If a switch statement has only a few cases, a sequence of conditional
85: branches is generated for it, rather than a jump table. It would
86: be better to output a binary tree of branches.
87:
88: * Distributive law.
89:
1.1.1.2 ! root 90: The C expression *(X + 4 * (Y + C)) compiles better on certain
! 91: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
! 92: modes. It may be tricky to determine when, and for which machines, to
! 93: use each alternative.
! 94:
! 95: * Jump-execute-next.
! 96:
! 97: Many recent machines have jumps which optionally execute the following
! 98: instruction before the instruction jumped to, either conditionally or
! 99: unconditionally. To take advantage of this capability requires a new
! 100: compiler pass that would reorder instructions when possible. After
! 101: reload may be a good place for it.
! 102:
! 103: On some machines, the result of a load from memory is not available
! 104: until after the following instruction. The easiest way to support
! 105: these machines is to output each RTL load instruction as two assembler
! 106: instructions, the second being a no-op. Putting useful instructions
! 107: after the load instructions may be a similar task to putting them
! 108: after jump instructions.
! 109:
! 110: * Pipeline scheduling.
! 111:
! 112: On many machines, code gets faster if instructions are reordered
! 113: so that pipelines are kept full. Doing the best possible job of this
! 114: requires knowing which functional units each kind of instruction executes
! 115: in and how long the functional unit stays busy with it. Then the
! 116: goal is to reorder the instructions to keep many functional units
! 117: busy but never feed them so fast they must wait.
1.1 root 118:
119: 2. Simpler porting.
120:
121: Right now, describing the target machine's instructions is done
122: cleanly, but describing its addressing mode is done with several
123: ad-hoc macro definitions. Porting would be much easier if there were
124: an RTL description for addressing modes like that for instructions.
125: Tools analogous to genflags and genrecog would generate macros from
126: this description.
127:
128: There would be one pattern in the address-description file for each
129: kind of addressing, and this pattern would have:
130:
131: * the RTL expression for the address
132: * C code to verify its validity (since that may depend on
133: the exact data).
134: * C code to print the address in assembler language.
135: * C code to convert the address into a valid one, if it is not valid.
136: (This would replace LEGITIMIZE_ADDRESS).
137: * Register constraints for all indeterminates that appear
138: in the RTL expression.
139:
140: 3. Other languages.
141:
1.1.1.2 ! root 142: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
! 143: desirable.
1.1 root 144:
1.1.1.2 ! root 145: Pascal, Modula-2 and Ada require the implementation of functions
! 146: within functions. Some of the mechanisms for this already exist.
1.1 root 147:
148: 4. Generalize the machine model.
149:
1.1.1.2 ! root 150: Some new compiler features may be needed to do a good job on machines
! 151: where static data needs to be addressed using base registers.
1.1 root 152:
1.1.1.2 ! root 153: Some machines have two stacks in different areas of memory, one used
! 154: for scalars and another for large objects. The compiler does not
! 155: now have a way to understand this.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.