|
|
1.1.1.6 ! root 1: 0. Improved efficiency. ! 2: ! 3: * Parse and output array initializers an element at a time, freeing ! 4: storage after each, instead of parsing the whole initializer first and ! 5: then outputting. This would reduce memory usage for large ! 6: initializers. ! 7: 1.1 root 8: 1. Better optimization. 9: 1.1.1.4 root 10: * Constants in unused inline functions 11: 12: It would be nice to delay output of string constants so that string 13: constants mentioned in unused inline functions are never generated. 14: Perhaps this would also take care of string constants in dead code. 15: 16: The difficulty is in finding a clean way for the RTL which refers 17: to the constant (currently, only by an assembler symbol name) 18: to point to the constant and cause it to be output. 19: 1.1 root 20: * More cse 21: 22: The techniques for doing full global cse are described in the 23: red dragon book. It is likely to be slow and use a lot of memory, 24: but it might be worth offering as an additional option. 25: 26: It is probably possible to extend cse to a few very frequent cases 27: without so much expense. 28: 29: For example, it is not very hard to handle cse through if-then 30: statements with no else clauses. Here's how to do it. On reaching a 31: label, notice that the label's use-count is 1 and that the last 32: preceding jump jumps conditionally to this label. Now you know it 33: is a simple if-then statement. Remove from the hash table 34: all the expressions that were entered since that jump insn 35: and you can continue with cse. 36: 37: It is probably not hard to handle cse from the end of a loop 38: around to the beginning, and a few loops would be greatly sped 39: up by this. 40: 1.1.1.3 root 41: * Support more general tail-recursion among different functions. 1.1 root 42: 1.1.1.3 root 43: This might be possible under certain circumstances, such as when 44: the argument lists of the functions have the same lengths. 45: Perhaps it could be done with a special declaration. 46: 47: You would need to verify in the calling function that it does not 48: use the addresses of any local variables and does not use setjmp. 49: 50: * Put short statics vars at low addresses and use short addressing mode? 51: Useful on the 68000/68020 and perhaps on the 32000 series, 52: provided one has a linker that works with the feature. 53: This is said to make a 15% speedup on the 68000. 54: This brings to mind Hayes' changes for Stanford MIPS. 55: 56: * Detect dead stores into memory? 57: 58: A store into memory is dead if it is followed by another store into 1.1.1.5 root 59: the same location; and, in between, there is no reference to anything 60: that might be that location (including no reference to a variable 61: address). 1.1.1.3 root 62: 63: * Loop optimization. 64: 65: Strength reduction and iteration variable elimination could be 66: smarter. They should know how to decide which iteration variables are 67: not worth making explicit because they can be computed as part of an 68: address calculation. Based on this information, they should decide 1.1.1.5 root 69: when it is desirable to eliminate one iteration variable and create 1.1.1.3 root 70: another in its place. 71: 72: It should be possible to compute what the value of an iteration 73: variable will be at the end of the loop, and eliminate the variable 74: within the loop by computing that value at the loop end. 75: 76: When a loop has a simple increment that adds 1, 77: instead of jumping in after the increment, 78: decrement the loop count and jump to the increment. 79: This allows aob insns to be used. 1.1 root 80: 1.1.1.2 root 81: * Using constraints on values. 1.1 root 82: 1.1.1.2 root 83: Many operations could be simplified based on knowledge of the 84: minimum and maximum possible values of a register at any particular time. 85: These limits could come from the data types in the tree, via rtl generation, 86: or they can be deduced from operations that are performed. For example, 87: the result of an `and' operation one of whose operands is 7 must be in 88: the range 0 to 7. Compare instructions also tell something about the 89: possible values of the operand, in the code beyond the test. 90: 91: Value constraints can be used to determine the results of a further 92: comparison. They can also indicate that certain `and' operations are 93: redundant. Constraints might permit a decrement and branch 94: instruction that checks zeroness to be used when the user has 95: specified to exit if negative. 1.1 root 96: 97: * Smarter reload pass. 98: 99: The reload pass as currently written can reload values only into registers 100: that are reserved for reloading. This means that in order to use a 101: register for reloading it must spill everything out of that register. 102: 103: It would be straightforward, though complicated, for reload1.c to keep 104: track, during its scan, of which hard registers were available at each 105: point in the function, and use for reloading even registers that were 106: free only at the point they were needed. This would avoid much spilling 107: and make better code. 108: 109: * Change the type of a variable. 110: 111: Sometimes a variable is declared as `int', it is assigned only once 112: from a value of type `char', and then it is used only by comparison 113: against constants. On many machines, better code would result if 114: the variable had type `char'. If the compiler could detect this 115: case, it could change the declaration of the variable and change 116: all the places that use it. 117: 118: * Order of subexpressions. 119: 120: It might be possible to make better code by paying attention 121: to the order in which to generate code for subexpressions of an expression. 122: 1.1.1.3 root 123: * More code motion. 1.1 root 124: 1.1.1.3 root 125: Consider hoisting common code up past conditional branches or 126: tablejumps. 127: 128: * Trace scheduling. 129: 130: This technique is said to be able to figure out which way a jump 131: will usually go, and rearrange the code to make that path the 132: faster one. 1.1 root 133: 134: * Distributive law. 135: 1.1.1.2 root 136: The C expression *(X + 4 * (Y + C)) compiles better on certain 137: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing 138: modes. It may be tricky to determine when, and for which machines, to 139: use each alternative. 140: 1.1.1.3 root 141: Some work has been done on this, in combine.c. 142: 1.1.1.2 root 143: * Jump-execute-next. 144: 145: Many recent machines have jumps which optionally execute the following 146: instruction before the instruction jumped to, either conditionally or 147: unconditionally. To take advantage of this capability requires a new 148: compiler pass that would reorder instructions when possible. After 149: reload may be a good place for it. 150: 151: On some machines, the result of a load from memory is not available 152: until after the following instruction. The easiest way to support 153: these machines is to output each RTL load instruction as two assembler 154: instructions, the second being a no-op. Putting useful instructions 155: after the load instructions may be a similar task to putting them 156: after jump instructions. 157: 158: * Pipeline scheduling. 159: 160: On many machines, code gets faster if instructions are reordered 161: so that pipelines are kept full. Doing the best possible job of this 162: requires knowing which functional units each kind of instruction executes 163: in and how long the functional unit stays busy with it. Then the 164: goal is to reorder the instructions to keep many functional units 165: busy but never feed them so fast they must wait. 1.1 root 166: 1.1.1.3 root 167: * Can optimize by changing if (x) y; else z; into z; if (x) y; 168: if z and x do not interfere and z has no effects not undone by y. 169: This is desirable if z is faster than jumping. 170: 171: * For a two-insn loop on the 68020, such as 172: foo: movb a2@+,a3@+ 173: jne foo 174: it is better to insert dbeq d0,foo before the jne. 175: d0 can be a junk register. The challenge is to fit this into 176: a portable framework: when can you detect this situation and 177: still be able to allocate a junk register? 178: 1.1.1.6 ! root 179: * For the 80387 floating point, perhaps it would be possible to use 3 ! 180: or 4 registers in the stack to hold register variables. (It would be ! 181: necessary to keep track of how those slots move in the stack as other ! 182: pushes and pops are done.) This is probably very tricky, but if ! 183: you are a GCC wizard and you care about the speed of floating point on ! 184: an 80386, you might want to work on it. ! 185: 1.1 root 186: 2. Simpler porting. 187: 188: Right now, describing the target machine's instructions is done 189: cleanly, but describing its addressing mode is done with several 190: ad-hoc macro definitions. Porting would be much easier if there were 191: an RTL description for addressing modes like that for instructions. 192: Tools analogous to genflags and genrecog would generate macros from 193: this description. 194: 195: There would be one pattern in the address-description file for each 196: kind of addressing, and this pattern would have: 197: 198: * the RTL expression for the address 199: * C code to verify its validity (since that may depend on 200: the exact data). 201: * C code to print the address in assembler language. 202: * C code to convert the address into a valid one, if it is not valid. 203: (This would replace LEGITIMIZE_ADDRESS). 204: * Register constraints for all indeterminates that appear 205: in the RTL expression. 206: 207: 3. Other languages. 208: 1.1.1.2 root 209: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are 210: desirable. 1.1 root 211: 1.1.1.2 root 212: Pascal, Modula-2 and Ada require the implementation of functions 213: within functions. Some of the mechanisms for this already exist. 1.1 root 214: 215: 4. Generalize the machine model. 216: 1.1.1.4 root 217: * Some new compiler features may be needed to do a good job on machines 1.1.1.2 root 218: where static data needs to be addressed using base registers. 1.1 root 219: 1.1.1.4 root 220: * Some machines have two stacks in different areas of memory, one used 1.1.1.2 root 221: for scalars and another for large objects. The compiler does not 222: now have a way to understand this. 1.1.1.3 root 223: 224: 5. Precompilation of header files. 225: 226: In the future, many programs will use thousands of lines of header files. 227: Compiling the headers might be slower than compiling the guts of any one 228: source file. Here is a scheme for precompiling header files to make 229: compilation faster for a sequence of headers which is often used. 230: 231: A precompiled header corresponds to a sequence of header files. The 232: preprocessor recognizes when the input starts with a sequence of 233: `#include' commands and searches a data base for a precompiled header 234: corresponding to that sequence. The modtimes of all these files are 235: stored in the data base so that one can tell whether the precompiled 236: header is still valid. 237: 238: For robustness, each directory should have its own collection of 239: precompiled headers and its own data base of them. Probably each 240: precompiled header would be a file and the data base would be one 241: more file. 242: 243: The data base records the entire collection of predefined macros and 244: their definitions, except for __FILE__, __LINE__ and __DATE__, for 245: each precompiled header. If this collection does not match the setup 246: at the start of the current compilation (including the results of -D 247: and -U switches), the precompiled header is inapplicable. It might 248: be possible to have distinct precompiled headers for the same sequence 249: of header files but different collections of predefined macros. 250: 251: The state of any option that affects macro processing, such as -ansi 252: or -traditional, must also be recorded, and the precompiled header is 253: valid only if these options match. 254: 255: The precompiled header contains an ordered series of strings. Some 256: strings are marked "unconditional"; these must be compiled each time 257: the precompiled header is used. Other strings are have keys, which 258: are identifiers. A string with keys must be compiled if at least one 259: of its keys is mentioned in the input. The order these strings appear 260: in the precompiled header is called their intrinsic order. 261: 262: The C preprocessor reads in the precompiled header file and scan all 263: the strings, making for each key an entry in the same symbol table 264: used for macros, pointing at a list of all the strings for which it is 265: a key. Each string must have a flag (one flag per string, not one per 266: key per string). The same code in `rescan' that detects references to 267: macros would detect a reference to a key and flag all of the strings 268: that it belongs to as needing to be output. 269: 270: Each of these strings is immediately recursively macroexpanded (i.e. 271: `rescan' is called), but the output from this is discarded. The 272: expansion is to detect any other keys mentioned in the string, and to 273: define any macros for which the string contains a #define. The key's 274: symbol table entry is be deleted to save time if the key is 275: encountered again, and to avoid an infinite recursion. 276: 277: The unconditional strings are macroexpanded with `rescan' (but the 278: output is discarded) at some time before anything is actually output. 279: 280: At the end of compilation, before any of the actual input text is 281: output, the list of strings is scanned in the intrinsic order, and 282: each string that is unconditional or flagged is output verbatim, 283: except that any #define lines are discarded. 284: 285: Precompiled headers would be constructed by explicit request with a 286: special tool. The first step is to run cpp on the sequence of header 287: files' contents. This would use a new option that would cause all 288: #define lines to be output unchanged as well as defining the macro. 289: The second step is to divide the output into strings, some keyed and 290: some unconditional. This division is done without changing the order 291: of the text being divided up. 1.1.1.4 root 292: 1.1.1.5 root 293: [email protected] has some ideas on this subject also. 294: 1.1.1.6 ! root 295: 6. Better documentation of how GCC works and how to port it. 1.1.1.4 root 296: 297: Here is an outline proposed by Allan Adler. 298: 299: I. Overview of this document 300: II. The machines on which GCC is implemented 301: A. Prose description of those characteristics of target machines and 302: their operating systems which are pertinent to the implementation 303: of GCC. 304: i. target machine characteristics 305: ii. comparison of this system of machine characteristics with 306: other systems of machine specification currently in use 307: B. Tables of the characteristics of the target machines on which 308: GCC is implemented. 309: C. A priori restrictions on the values of characteristics of target 310: machines, with special reference to those parts of the source code 311: which entail those restrictions 312: i. restrictions on individual characteristics 313: ii. restrictions involving relations between various characteristics 314: D. The use of GCC as a cross-compiler 315: i. cross-compilation to existing machines 316: ii. cross-compilation to non-existent machines 317: E. Assumptions which are made regarding the target machine 318: i. assumptions regarding the architecture of the target machine 319: ii. assumptions regarding the operating system of the target machine 320: iii. assumptions regarding software resident on the target machine 321: iv. where in the source code these assumptions are in effect made 322: III. A systematic approach to writing the files tm.h and xm.h 323: A. Macros which require special care or skill 324: B. Examples, with special reference to the underlying reasoning 325: IV. A systematic approach to writing the machine description file md 326: A. Minimal viable sets of insn descriptions 327: B. Examples, with special reference to the underlying reasoning 328: V. Uses of the file aux-output.c 329: VI. Specification of what constitutes correct performance of an 330: implementation of GCC 331: A. The components of GCC 332: B. The itinerary of a C program through GCC 333: C. A system of benchmark programs 1.1.1.5 root 334: D. What your RTL and assembler should look like with these benchmarks 1.1.1.4 root 335: E. Fine tuning for speed and size of compiled code 336: VII. A systematic procedure for debugging an implementation of GCC 337: A. Use of GDB 338: i. the macros in the file .gdbinit for GCC 339: ii. obstacles to the use of GDB 340: a. functions implemented as macros can't be called in GDB 341: B. Debugging without GDB 342: i. How to turn off the normal operation of GCC and access specific 343: parts of GCC 344: C. Debugging tools 345: D. Debugging the parser 346: i. how machine macros and insn definitions affect the parser 347: E. Debugging the recognizer 348: i. how machine macros and insn definitions affect the recognizer 349: 350: ditto for other components 351: 352: VIII. Data types used by GCC, with special reference to restrictions not 353: specified in the formal definition of the data type 354: IX. References to the literature for the algorithms used in GCC 355:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.