Annotation of gcc/PROJECTS, revision 1.1.1.3

1.1       root        1: 1. Better optimization.
                      2: 
                      3: * More cse
                      4: 
                      5: The techniques for doing full global cse are described in the
                      6: red dragon book.  It is likely to be slow and use a lot of memory,
                      7: but it might be worth offering as an additional option.
                      8: 
                      9: It is probably possible to extend cse to a few very frequent cases
                     10: without so much expense.
                     11: 
                     12: For example, it is not very hard to handle cse through if-then
                     13: statements with no else clauses.  Here's how to do it.  On reaching a
                     14: label, notice that the label's use-count is 1 and that the last
                     15: preceding jump jumps conditionally to this label.  Now you know it
                     16: is a simple if-then statement.  Remove from the hash table
                     17: all the expressions that were entered since that jump insn
                     18: and you can continue with cse.
                     19: 
                     20: It is probably not hard to handle cse from the end of a loop
                     21: around to the beginning, and a few loops would be greatly sped
                     22: up by this.
                     23: 
1.1.1.3 ! root       24: * Support more general tail-recursion among different functions.
1.1       root       25: 
1.1.1.3 ! root       26: This might be possible under certain circumstances, such as when
        !            27: the argument lists of the functions have the same lengths.
        !            28: Perhaps it could be done with a special declaration.
        !            29: 
        !            30: You would need to verify in the calling function that it does not
        !            31: use the addresses of any local variables and does not use setjmp.
        !            32: 
        !            33: * Put short statics vars at low addresses and use short addressing mode?
        !            34: Useful on the 68000/68020 and perhaps on the 32000 series,
        !            35: provided one has a linker that works with the feature.
        !            36: This is said to make a 15% speedup on the 68000.
        !            37: This brings to mind Hayes' changes for Stanford MIPS.
        !            38: 
        !            39: * Detect dead stores into memory?
        !            40: 
        !            41: A store into memory is dead if it is followed by another store into
        !            42: the same location; and, in between, there is no reference to
        !            43: that anything that might be that location (including no reference
        !            44: to a variable address).
        !            45: 
        !            46: * Loop optimization.
        !            47: 
        !            48: Strength reduction and iteration variable elimination could be
        !            49: smarter.  They should know how to decide which iteration variables are
        !            50: not worth making explicit because they can be computed as part of an
        !            51: address calculation.  Based on this information, they should decide
        !            52: when it is desirable to eliminate one iteration iable and create
        !            53: another in its place.
        !            54: 
        !            55: It should be possible to compute what the value of an iteration
        !            56: variable will be at the end of the loop, and eliminate the variable
        !            57: within the loop by computing that value at the loop end.
        !            58: 
        !            59: When a loop has a simple increment that adds 1,
        !            60: instead of jumping in after the increment,
        !            61: decrement the loop count and jump to the increment.
        !            62: This allows aob insns to be used.
1.1       root       63: 
1.1.1.2   root       64: * Using constraints on values.
1.1       root       65: 
1.1.1.2   root       66: Many operations could be simplified based on knowledge of the
                     67: minimum and maximum possible values of a register at any particular time.
                     68: These limits could come from the data types in the tree, via rtl generation,
                     69: or they can be deduced from operations that are performed.  For example,
                     70: the result of an `and' operation one of whose operands is 7 must be in
                     71: the range 0 to 7.  Compare instructions also tell something about the
                     72: possible values of the operand, in the code beyond the test.
                     73: 
                     74: Value constraints can be used to determine the results of a further
                     75: comparison.  They can also indicate that certain `and' operations are
                     76: redundant.  Constraints might permit a decrement and branch
                     77: instruction that checks zeroness to be used when the user has
                     78: specified to exit if negative.
1.1       root       79: 
                     80: * Smarter reload pass.
                     81: 
                     82: The reload pass as currently written can reload values only into registers
                     83: that are reserved for reloading.  This means that in order to use a
                     84: register for reloading it must spill everything out of that register.
                     85: 
                     86: It would be straightforward, though complicated, for reload1.c to keep
                     87: track, during its scan, of which hard registers were available at each
                     88: point in the function, and use for reloading even registers that were
                     89: free only at the point they were needed.  This would avoid much spilling
                     90: and make better code.
                     91: 
                     92: * Change the type of a variable.
                     93: 
                     94: Sometimes a variable is declared as `int', it is assigned only once
                     95: from a value of type `char', and then it is used only by comparison
                     96: against constants.  On many machines, better code would result if
                     97: the variable had type `char'.  If the compiler could detect this
                     98: case, it could change the declaration of the variable and change
                     99: all the places that use it.
                    100: 
                    101: * Order of subexpressions.
                    102: 
                    103: It might be possible to make better code by paying attention
                    104: to the order in which to generate code for subexpressions of an expression.
                    105: 
1.1.1.3 ! root      106: * More code motion.
1.1       root      107: 
1.1.1.3 ! root      108: Consider hoisting common code up past conditional branches or
        !           109: tablejumps.
        !           110: 
        !           111: * Trace scheduling.
        !           112: 
        !           113: This technique is said to be able to figure out which way a jump
        !           114: will usually go, and rearrange the code to make that path the
        !           115: faster one.
1.1       root      116: 
                    117: * Distributive law.
                    118: 
1.1.1.2   root      119: The C expression *(X + 4 * (Y + C)) compiles better on certain
                    120: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
                    121: modes.  It may be tricky to determine when, and for which machines, to
                    122: use each alternative.
                    123: 
1.1.1.3 ! root      124: Some work has been done on this, in combine.c.
        !           125: 
1.1.1.2   root      126: * Jump-execute-next.
                    127: 
                    128: Many recent machines have jumps which optionally execute the following
                    129: instruction before the instruction jumped to, either conditionally or
                    130: unconditionally.  To take advantage of this capability requires a new
                    131: compiler pass that would reorder instructions when possible.  After
                    132: reload may be a good place for it.
                    133: 
                    134: On some machines, the result of a load from memory is not available
                    135: until after the following instruction.  The easiest way to support
                    136: these machines is to output each RTL load instruction as two assembler
                    137: instructions, the second being a no-op.  Putting useful instructions
                    138: after the load instructions may be a similar task to putting them
                    139: after jump instructions.
                    140: 
                    141: * Pipeline scheduling.
                    142: 
                    143: On many machines, code gets faster if instructions are reordered
                    144: so that pipelines are kept full.  Doing the best possible job of this
                    145: requires knowing which functional units each kind of instruction executes
                    146: in and how long the functional unit stays busy with it.  Then the
                    147: goal is to reorder the instructions to keep many functional units
                    148: busy but never feed them so fast they must wait.
1.1       root      149: 
1.1.1.3 ! root      150: * Can optimize by changing if (x) y; else z; into z; if (x) y;
        !           151: if z and x do not interfere and z has no effects not undone by y.
        !           152: This is desirable if z is faster than jumping.
        !           153: 
        !           154: * For a two-insn loop on the 68020, such as
        !           155:   foo: movb    a2@+,a3@+
        !           156:        jne     foo
        !           157: it is better to insert dbeq d0,foo before the jne.
        !           158: d0 can be a junk register.  The challenge is to fit this into
        !           159: a portable framework: when can you detect this situation and
        !           160: still be able to allocate a junk register?
        !           161: 
1.1       root      162: 2. Simpler porting.
                    163: 
                    164: Right now, describing the target machine's instructions is done
                    165: cleanly, but describing its addressing mode is done with several
                    166: ad-hoc macro definitions.  Porting would be much easier if there were
                    167: an RTL description for addressing modes like that for instructions.
                    168: Tools analogous to genflags and genrecog would generate macros from
                    169: this description.
                    170: 
                    171: There would be one pattern in the address-description file for each
                    172: kind of addressing, and this pattern would have:
                    173: 
                    174:   * the RTL expression for the address
                    175:   * C code to verify its validity (since that may depend on
                    176:     the exact data).
                    177:   * C code to print the address in assembler language.
                    178:   * C code to convert the address into a valid one, if it is not valid.
                    179:     (This would replace LEGITIMIZE_ADDRESS).
                    180:   * Register constraints for all indeterminates that appear
                    181:     in the RTL expression.
                    182: 
                    183: 3. Other languages.
                    184: 
1.1.1.2   root      185: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
                    186: desirable.
1.1       root      187: 
1.1.1.2   root      188: Pascal, Modula-2 and Ada require the implementation of functions
                    189: within functions.  Some of the mechanisms for this already exist.
1.1       root      190: 
                    191: 4. Generalize the machine model.
                    192: 
1.1.1.2   root      193: Some new compiler features may be needed to do a good job on machines
                    194: where static data needs to be addressed using base registers.
1.1       root      195: 
1.1.1.2   root      196: Some machines have two stacks in different areas of memory, one used
                    197: for scalars and another for large objects.  The compiler does not
                    198: now have a way to understand this.
1.1.1.3 ! root      199: 
        !           200: 5. Precompilation of header files.
        !           201: 
        !           202: In the future, many programs will use thousands of lines of header files.
        !           203: Compiling the headers might be slower than compiling the guts of any one
        !           204: source file.  Here is a scheme for precompiling header files to make
        !           205: compilation faster for a sequence of headers which is often used.
        !           206: 
        !           207: A precompiled header corresponds to a sequence of header files.  The
        !           208: preprocessor recognizes when the input starts with a sequence of
        !           209: `#include' commands and searches a data base for a precompiled header
        !           210: corresponding to that sequence.  The modtimes of all these files are
        !           211: stored in the data base so that one can tell whether the precompiled
        !           212: header is still valid.
        !           213: 
        !           214: For robustness, each directory should have its own collection of
        !           215: precompiled headers and its own data base of them.  Probably each
        !           216: precompiled header would be a file and the data base would be one
        !           217: more file.
        !           218: 
        !           219: The data base records the entire collection of predefined macros and
        !           220: their definitions, except for __FILE__, __LINE__ and __DATE__, for
        !           221: each precompiled header.  If this collection does not match the setup
        !           222: at the start of the current compilation (including the results of -D
        !           223: and -U switches), the precompiled header is inapplicable.  It might
        !           224: be possible to have distinct precompiled headers for the same sequence
        !           225: of header files but different collections of predefined macros.
        !           226: 
        !           227: The state of any option that affects macro processing, such as -ansi
        !           228: or -traditional, must also be recorded, and the precompiled header is
        !           229: valid only if these options match.
        !           230: 
        !           231: The precompiled header contains an ordered series of strings.  Some
        !           232: strings are marked "unconditional"; these must be compiled each time
        !           233: the precompiled header is used.  Other strings are have keys, which
        !           234: are identifiers.  A string with keys must be compiled if at least one
        !           235: of its keys is mentioned in the input.  The order these strings appear
        !           236: in the precompiled header is called their intrinsic order.
        !           237: 
        !           238: The C preprocessor reads in the precompiled header file and scan all
        !           239: the strings, making for each key an entry in the same symbol table
        !           240: used for macros, pointing at a list of all the strings for which it is
        !           241: a key.  Each string must have a flag (one flag per string, not one per
        !           242: key per string).  The same code in `rescan' that detects references to
        !           243: macros would detect a reference to a key and flag all of the strings
        !           244: that it belongs to as needing to be output.
        !           245: 
        !           246: Each of these strings is immediately recursively macroexpanded (i.e.
        !           247: `rescan' is called), but the output from this is discarded.  The
        !           248: expansion is to detect any other keys mentioned in the string, and to
        !           249: define any macros for which the string contains a #define.  The key's
        !           250: symbol table entry is be deleted to save time if the key is
        !           251: encountered again, and to avoid an infinite recursion.
        !           252: 
        !           253: The unconditional strings are macroexpanded with `rescan' (but the
        !           254: output is discarded) at some time before anything is actually output.
        !           255: 
        !           256: At the end of compilation, before any of the actual input text is
        !           257: output, the list of strings is scanned in the intrinsic order, and
        !           258: each string that is unconditional or flagged is output verbatim,
        !           259: except that any #define lines are discarded.
        !           260: 
        !           261: Precompiled headers would be constructed by explicit request with a
        !           262: special tool.  The first step is to run cpp on the sequence of header
        !           263: files' contents.  This would use a new option that would cause all
        !           264: #define lines to be output unchanged as well as defining the macro.
        !           265: The second step is to divide the output into strings, some keyed and
        !           266: some unconditional.  This division is done without changing the order
        !           267: of the text being divided up.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.