|
|
1.1 ! root 1: 0. Improved efficiency. ! 2: ! 3: * Parse and output array initializers an element at a time, freeing ! 4: storage after each, instead of parsing the whole initializer first and ! 5: then outputting. This would reduce memory usage for large ! 6: initializers. ! 7: ! 8: * See if the techniques describe in Oct 1991 SIGPLAN Notices ! 9: (Frazer and Hanson) are applicable to GCC. ! 10: ! 11: 1. Better optimization. ! 12: ! 13: * Constants in unused inline functions ! 14: ! 15: It would be nice to delay output of string constants so that string ! 16: constants mentioned in unused inline functions are never generated. ! 17: Perhaps this would also take care of string constants in dead code. ! 18: ! 19: The difficulty is in finding a clean way for the RTL which refers ! 20: to the constant (currently, only by an assembler symbol name) ! 21: to point to the constant and cause it to be output. ! 22: ! 23: * More cse ! 24: ! 25: The techniques for doing full global cse are described in the red ! 26: dragon book, or (a different version) in Frederick Chow's thesis from ! 27: Stanford. It is likely to be slow and use a lot of memory, but it ! 28: might be worth offering as an additional option. ! 29: ! 30: It is probably possible to extend cse to a few very frequent cases ! 31: without so much expense. ! 32: ! 33: For example, it is not very hard to handle cse through if-then ! 34: statements with no else clauses. Here's how to do it. On reaching a ! 35: label, notice that the label's use-count is 1 and that the last ! 36: preceding jump jumps conditionally to this label. Now you know it ! 37: is a simple if-then statement. Remove from the hash table ! 38: all the expressions that were entered since that jump insn ! 39: and you can continue with cse. ! 40: ! 41: It is probably not hard to handle cse from the end of a loop ! 42: around to the beginning, and a few loops would be greatly sped ! 43: up by this. ! 44: ! 45: * Optimize a sequence of if statements whose conditions are exclusive. ! 46: ! 47: It is possible to optimize ! 48: ! 49: if (x == 1) ...; ! 50: if (x == 2) ...; ! 51: if (x == 3) ...; ! 52: ! 53: into ! 54: ! 55: if (x == 1) ...; ! 56: else if (x == 2) ...; ! 57: else if (x == 3) ...; ! 58: ! 59: provided that x is not altered by the contents of the if statements. ! 60: ! 61: It's not certain whether this is worth doing. Perhaps programmers ! 62: nearly always write the else's themselves, leaving few opportunities ! 63: to improve anything. ! 64: ! 65: * Un-cse. ! 66: ! 67: Perhaps we should have an un-cse step right after cse, which tries to ! 68: replace a reg with its value if the value can be substituted for the ! 69: reg everywhere, if that looks like an improvement. Which is if the ! 70: reg is used only a few times. Use rtx_cost to determine if the ! 71: change is really an improvement. ! 72: ! 73: * Clean up how cse works. ! 74: ! 75: The scheme is that each value has just one hash entry. The ! 76: first_same_value and next_same_value chains are no longer needed. ! 77: ! 78: For arithmetic, each hash table elt has the following slots: ! 79: ! 80: * Operation. This is an rtx code. ! 81: * Mode. ! 82: * Operands 0, 1 and 2. These point to other hash table elements. ! 83: ! 84: So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we ! 85: first enter (CONST_INT 104) and find the entry that (REG:SI 30) now ! 86: points to. Then we put these elts into operands 0 and 1 of a new elt. ! 87: We put PLUS and SI into the new elt. ! 88: ! 89: Registers and mem refs would never be entered into the table as such. ! 90: However, the values they contain would be entered. There would be a ! 91: table indexed by regno which points at the hash entry for the value in ! 92: that reg. ! 93: ! 94: The hash entry index now plays the role of a qty number. ! 95: We still need qty_first_reg, reg_next_eqv, etc. to record which regs ! 96: share a particular qty. ! 97: ! 98: When a reg is used whose contents are unknown, we need to create a ! 99: hash table entry whose contents say "unknown", as a place holder for ! 100: whatever the reg contains. If that reg is added to something, then ! 101: the hash entry for the sum will refer to the "unknown" entry. Use ! 102: UNKNOWN for the rtx code in this entry. This replaces make_new_qty. ! 103: ! 104: For a constant, a unique hash entry would be made based on the ! 105: value of the constant. ! 106: ! 107: What about MEM? Each time a memory address is referenced, we need a ! 108: qty (a hash table elt) to represent what is in it. (Just as for a ! 109: register.) If this isn't known, create one, just as for a reg whose ! 110: contents are unknown. ! 111: ! 112: We need a way to find all mem refs that still contain a certain value. ! 113: Do this with a chain of hash elts (for memory addresses) that point to ! 114: locations that hold the value. The hash elt for the value itself should ! 115: point to the start of the chain. It would be good for the hash elt ! 116: for an address to point to the hash elt for the contents of that address ! 117: (but this ptr can be null if the contents have never been entered). ! 118: ! 119: With this data structure, nothing need ever be invalidated except ! 120: the lists of which regs or mems hold a particular value. It is easy ! 121: to see if there is a reg or mem that is equiv to a particular value. ! 122: If the value is constant, it is always explicitly constant. ! 123: ! 124: * Support more general tail-recursion among different functions. ! 125: ! 126: This might be possible under certain circumstances, such as when ! 127: the argument lists of the functions have the same lengths. ! 128: Perhaps it could be done with a special declaration. ! 129: ! 130: You would need to verify in the calling function that it does not ! 131: use the addresses of any local variables and does not use setjmp. ! 132: ! 133: * Put short statics vars at low addresses and use short addressing mode? ! 134: ! 135: Useful on the 68000/68020 and perhaps on the 32000 series, ! 136: provided one has a linker that works with the feature. ! 137: This is said to make a 15% speedup on the 68000. ! 138: ! 139: * Keep global variables in registers. ! 140: ! 141: Here is a scheme for doing this. A global variable, or a local variable ! 142: whose address is taken, can be kept in a register for an entire function ! 143: if it does not use non-constant memory addresses and (for globals only) ! 144: does not call other functions. If the entire function does not meet ! 145: this criterion, a loop may. ! 146: ! 147: The VAR_DECL for such a variable would have to have two RTL expressions: ! 148: the true home in memory, and the pseudo-register used temporarily. ! 149: It is necessary to emit insns to copy the memory location into the ! 150: pseudo-register at the beginning of the function or loop, and perhaps ! 151: back out at the end. These insns should have REG_EQUIV notes so that, ! 152: if the pseudo-register does not get a hard register, it is spilled into ! 153: the memory location which exists in any case. ! 154: ! 155: The easiest way to set up these insns is to modify the routine ! 156: put_var_into_stack so that it does not apply to the entire function ! 157: (sparing any loops which contain nothing dangerous) and to call it at ! 158: the end of the function regardless of where in the function the ! 159: address of a local variable is taken. It would be called ! 160: unconditionally at the end of the function for all relevant global ! 161: variables. ! 162: ! 163: For debugger output, the thing to do is to invent a new binding level ! 164: around the appropriate loop and define the variable name as a register ! 165: variable with that scope. ! 166: ! 167: * Live-range splitting. ! 168: ! 169: Currently a variable is allocated a hard register either for the full ! 170: extent of its use or not at all. Sometimes it would be good to ! 171: allocate a variable a hard register for just part of a function; for ! 172: example, through a particular loop where the variable is mostly used, ! 173: or outside of a particular loop where the variable is not used. (The ! 174: latter is nice because it might let the variable be in a register most ! 175: of the time even though the loop needs all the registers.) ! 176: ! 177: It might not be very hard to do this in global.c when a variable ! 178: fails to get a hard register for its entire life span. ! 179: ! 180: The first step is to find a loop in which the variable is live, but ! 181: which is not the whole life span or nearly so. It's probably best to ! 182: use a loop in which the variable is heavily used. ! 183: ! 184: Then create a new pseudo-register to represent the variable in that loop. ! 185: Substitute this for the old pseudo-register there, and insert move insns ! 186: to copy between the two at the loop entry and all exits. (When several ! 187: such moves are inserted at the same place, some new feature should be ! 188: added to say that none of those registers conflict merely because of ! 189: overlap between the new moves. And the reload pass should reorder them ! 190: so that a store precedes a load, for any given hard register.) ! 191: ! 192: After doing this for all the reasonable candidates, run global-alloc ! 193: over again. With luck, one of the two pseudo-registers will be fit ! 194: somewhere. It may even have a much higher priority due to its reduced ! 195: life span. ! 196: ! 197: There will be no room in general for the new pseudo-registers in ! 198: basic_block_live_at_start, so there will need to be a second such ! 199: matrix exclusively for the new ones. Various other vectors indexed by ! 200: register number will have to be made bigger, or there will have to be ! 201: secondary extender vectors just for global-alloc. ! 202: ! 203: A simple new feature could arrange that both pseudo-registers get the ! 204: same stack slot if they both fail to get hard registers. ! 205: ! 206: Other compilers split live ranges when they are not connected, or ! 207: try to split off pieces `at the edge'. I think splitting around loops ! 208: will provide more speedup. ! 209: ! 210: Creating a fake binding block and a new like-named variable with ! 211: shorter life span and different address might succeed in describing ! 212: this technique for the debugger. ! 213: ! 214: * Detect dead stores into memory? ! 215: ! 216: A store into memory is dead if it is followed by another store into ! 217: the same location; and, in between, there is no reference to anything ! 218: that might be that location (including no reference to a variable ! 219: address). ! 220: ! 221: * Loop optimization. ! 222: ! 223: Strength reduction and iteration variable elimination could be ! 224: smarter. They should know how to decide which iteration variables are ! 225: not worth making explicit because they can be computed as part of an ! 226: address calculation. Based on this information, they should decide ! 227: when it is desirable to eliminate one iteration variable and create ! 228: another in its place. ! 229: ! 230: It should be possible to compute what the value of an iteration ! 231: variable will be at the end of the loop, and eliminate the variable ! 232: within the loop by computing that value at the loop end. ! 233: ! 234: When a loop has a simple increment that adds 1, ! 235: instead of jumping in after the increment, ! 236: decrement the loop count and jump to the increment. ! 237: This allows aob insns to be used. ! 238: ! 239: * Using constraints on values. ! 240: ! 241: Many operations could be simplified based on knowledge of the ! 242: minimum and maximum possible values of a register at any particular time. ! 243: These limits could come from the data types in the tree, via rtl generation, ! 244: or they can be deduced from operations that are performed. For example, ! 245: the result of an `and' operation one of whose operands is 7 must be in ! 246: the range 0 to 7. Compare instructions also tell something about the ! 247: possible values of the operand, in the code beyond the test. ! 248: ! 249: Value constraints can be used to determine the results of a further ! 250: comparison. They can also indicate that certain `and' operations are ! 251: redundant. Constraints might permit a decrement and branch ! 252: instruction that checks zeroness to be used when the user has ! 253: specified to exit if negative. ! 254: ! 255: * Smarter reload pass. ! 256: ! 257: The reload pass as currently written can reload values only into registers ! 258: that are reserved for reloading. This means that in order to use a ! 259: register for reloading it must spill everything out of that register. ! 260: ! 261: It would be straightforward, though complicated, for reload1.c to keep ! 262: track, during its scan, of which hard registers were available at each ! 263: point in the function, and use for reloading even registers that were ! 264: free only at the point they were needed. This would avoid much spilling ! 265: and make better code. ! 266: ! 267: * Change the type of a variable. ! 268: ! 269: Sometimes a variable is declared as `int', it is assigned only once ! 270: from a value of type `char', and then it is used only by comparison ! 271: against constants. On many machines, better code would result if ! 272: the variable had type `char'. If the compiler could detect this ! 273: case, it could change the declaration of the variable and change ! 274: all the places that use it. ! 275: ! 276: * Better handling for very sparse switches. ! 277: ! 278: There may be cases where it would be better to compile a switch ! 279: statement to use a fixed hash table rather than the current ! 280: combination of jump tables and binary search. ! 281: ! 282: * Order of subexpressions. ! 283: ! 284: It might be possible to make better code by paying attention ! 285: to the order in which to generate code for subexpressions of an expression. ! 286: ! 287: * More code motion. ! 288: ! 289: Consider hoisting common code up past conditional branches or ! 290: tablejumps. ! 291: ! 292: * Trace scheduling. ! 293: ! 294: This technique is said to be able to figure out which way a jump ! 295: will usually go, and rearrange the code to make that path the ! 296: faster one. ! 297: ! 298: * Distributive law. ! 299: ! 300: The C expression *(X + 4 * (Y + C)) compiles better on certain ! 301: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing ! 302: modes. It may be tricky to determine when, and for which machines, to ! 303: use each alternative. ! 304: ! 305: Some work has been done on this, in combine.c. ! 306: ! 307: * Can optimize by changing if (x) y; else z; into z; if (x) y; ! 308: if z and x do not interfere and z has no effects not undone by y. ! 309: This is desirable if z is faster than jumping. ! 310: ! 311: * For a two-insn loop on the 68020, such as ! 312: foo: movb a2@+,a3@+ ! 313: jne foo ! 314: it is better to insert dbeq d0,foo before the jne. ! 315: d0 can be a junk register. The challenge is to fit this into ! 316: a portable framework: when can you detect this situation and ! 317: still be able to allocate a junk register? ! 318: ! 319: 2. Simpler porting. ! 320: ! 321: Right now, describing the target machine's instructions is done ! 322: cleanly, but describing its addressing mode is done with several ! 323: ad-hoc macro definitions. Porting would be much easier if there were ! 324: an RTL description for addressing modes like that for instructions. ! 325: Tools analogous to genflags and genrecog would generate macros from ! 326: this description. ! 327: ! 328: There would be one pattern in the address-description file for each ! 329: kind of addressing, and this pattern would have: ! 330: ! 331: * the RTL expression for the address ! 332: * C code to verify its validity (since that may depend on ! 333: the exact data). ! 334: * C code to print the address in assembler language. ! 335: * C code to convert the address into a valid one, if it is not valid. ! 336: (This would replace LEGITIMIZE_ADDRESS). ! 337: * Register constraints for all indeterminates that appear ! 338: in the RTL expression. ! 339: ! 340: 3. Other languages. ! 341: ! 342: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are ! 343: desirable. ! 344: ! 345: Pascal, Modula-2 and Ada require the implementation of functions ! 346: within functions. Some of the mechanisms for this already exist. ! 347: ! 348: 4. More extensions. ! 349: ! 350: * Generated unique labels. Have some way of generating distinct labels ! 351: for use in extended asm statements. I don't know what a good syntax would ! 352: be. ! 353: ! 354: * A way of defining a structure containing a union, in which the choice of ! 355: union alternative is controlled by a previous structure component. ! 356: ! 357: Here is a possible syntax for this. ! 358: ! 359: struct foo { ! 360: enum { INT, DOUBLE } code; ! 361: auto union { case INT: int i; case DOUBLE: double d;} value : code; ! 362: }; ! 363: ! 364: * Allow constructor expressions as lvalues, like this: ! 365: ! 366: (struct foo) {a, b, c} = foo(); ! 367: ! 368: This would call foo, which returns a structure, and then store the ! 369: several components of the structure into the variables a, b, and c. ! 370: ! 371: 5. Generalize the machine model. ! 372: ! 373: * Some new compiler features may be needed to do a good job on machines ! 374: where static data needs to be addressed using base registers. ! 375: ! 376: * Some machines have two stacks in different areas of memory, one used ! 377: for scalars and another for large objects. The compiler does not ! 378: now have a way to understand this. ! 379: ! 380: 6. Useful warnings. ! 381: ! 382: * Warn about statements that are undefined because the order of ! 383: evaluation of increment operators makes a big difference. Here is an ! 384: example: ! 385: ! 386: *foo++ = hack (*foo); ! 387: ! 388: 7. Better documentation of how GCC works and how to port it. ! 389: ! 390: Here is an outline proposed by Allan Adler. ! 391: ! 392: I. Overview of this document ! 393: II. The machines on which GCC is implemented ! 394: A. Prose description of those characteristics of target machines and ! 395: their operating systems which are pertinent to the implementation ! 396: of GCC. ! 397: i. target machine characteristics ! 398: ii. comparison of this system of machine characteristics with ! 399: other systems of machine specification currently in use ! 400: B. Tables of the characteristics of the target machines on which ! 401: GCC is implemented. ! 402: C. A priori restrictions on the values of characteristics of target ! 403: machines, with special reference to those parts of the source code ! 404: which entail those restrictions ! 405: i. restrictions on individual characteristics ! 406: ii. restrictions involving relations between various characteristics ! 407: D. The use of GCC as a cross-compiler ! 408: i. cross-compilation to existing machines ! 409: ii. cross-compilation to non-existent machines ! 410: E. Assumptions which are made regarding the target machine ! 411: i. assumptions regarding the architecture of the target machine ! 412: ii. assumptions regarding the operating system of the target machine ! 413: iii. assumptions regarding software resident on the target machine ! 414: iv. where in the source code these assumptions are in effect made ! 415: III. A systematic approach to writing the files tm.h and xm.h ! 416: A. Macros which require special care or skill ! 417: B. Examples, with special reference to the underlying reasoning ! 418: IV. A systematic approach to writing the machine description file md ! 419: A. Minimal viable sets of insn descriptions ! 420: B. Examples, with special reference to the underlying reasoning ! 421: V. Uses of the file aux-output.c ! 422: VI. Specification of what constitutes correct performance of an ! 423: implementation of GCC ! 424: A. The components of GCC ! 425: B. The itinerary of a C program through GCC ! 426: C. A system of benchmark programs ! 427: D. What your RTL and assembler should look like with these benchmarks ! 428: E. Fine tuning for speed and size of compiled code ! 429: VII. A systematic procedure for debugging an implementation of GCC ! 430: A. Use of GDB ! 431: i. the macros in the file .gdbinit for GCC ! 432: ii. obstacles to the use of GDB ! 433: a. functions implemented as macros can't be called in GDB ! 434: B. Debugging without GDB ! 435: i. How to turn off the normal operation of GCC and access specific ! 436: parts of GCC ! 437: C. Debugging tools ! 438: D. Debugging the parser ! 439: i. how machine macros and insn definitions affect the parser ! 440: E. Debugging the recognizer ! 441: i. how machine macros and insn definitions affect the recognizer ! 442: ! 443: ditto for other components ! 444: ! 445: VIII. Data types used by GCC, with special reference to restrictions not ! 446: specified in the formal definition of the data type ! 447: IX. References to the literature for the algorithms used in GCC ! 448:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.