Annotation of gcc/PROJECTS, revision 1.1.1.6

1.1.1.6 ! root        1: 0. Improved efficiency.
        !             2: 
        !             3: * Parse and output array initializers an element at a time, freeing
        !             4: storage after each, instead of parsing the whole initializer first and
        !             5: then outputting.  This would reduce memory usage for large
        !             6: initializers.
        !             7: 
1.1       root        8: 1. Better optimization.
                      9: 
1.1.1.4   root       10: * Constants in unused inline functions
                     11: 
                     12: It would be nice to delay output of string constants so that string
                     13: constants mentioned in unused inline functions are never generated.
                     14: Perhaps this would also take care of string constants in dead code.
                     15: 
                     16: The difficulty is in finding a clean way for the RTL which refers
                     17: to the constant (currently, only by an assembler symbol name)
                     18: to point to the constant and cause it to be output.
                     19: 
1.1       root       20: * More cse
                     21: 
                     22: The techniques for doing full global cse are described in the
                     23: red dragon book.  It is likely to be slow and use a lot of memory,
                     24: but it might be worth offering as an additional option.
                     25: 
                     26: It is probably possible to extend cse to a few very frequent cases
                     27: without so much expense.
                     28: 
                     29: For example, it is not very hard to handle cse through if-then
                     30: statements with no else clauses.  Here's how to do it.  On reaching a
                     31: label, notice that the label's use-count is 1 and that the last
                     32: preceding jump jumps conditionally to this label.  Now you know it
                     33: is a simple if-then statement.  Remove from the hash table
                     34: all the expressions that were entered since that jump insn
                     35: and you can continue with cse.
                     36: 
                     37: It is probably not hard to handle cse from the end of a loop
                     38: around to the beginning, and a few loops would be greatly sped
                     39: up by this.
                     40: 
1.1.1.3   root       41: * Support more general tail-recursion among different functions.
1.1       root       42: 
1.1.1.3   root       43: This might be possible under certain circumstances, such as when
                     44: the argument lists of the functions have the same lengths.
                     45: Perhaps it could be done with a special declaration.
                     46: 
                     47: You would need to verify in the calling function that it does not
                     48: use the addresses of any local variables and does not use setjmp.
                     49: 
                     50: * Put short statics vars at low addresses and use short addressing mode?
                     51: Useful on the 68000/68020 and perhaps on the 32000 series,
                     52: provided one has a linker that works with the feature.
                     53: This is said to make a 15% speedup on the 68000.
                     54: This brings to mind Hayes' changes for Stanford MIPS.
                     55: 
                     56: * Detect dead stores into memory?
                     57: 
                     58: A store into memory is dead if it is followed by another store into
1.1.1.5   root       59: the same location; and, in between, there is no reference to anything
                     60: that might be that location (including no reference to a variable
                     61: address).
1.1.1.3   root       62: 
                     63: * Loop optimization.
                     64: 
                     65: Strength reduction and iteration variable elimination could be
                     66: smarter.  They should know how to decide which iteration variables are
                     67: not worth making explicit because they can be computed as part of an
                     68: address calculation.  Based on this information, they should decide
1.1.1.5   root       69: when it is desirable to eliminate one iteration variable and create
1.1.1.3   root       70: another in its place.
                     71: 
                     72: It should be possible to compute what the value of an iteration
                     73: variable will be at the end of the loop, and eliminate the variable
                     74: within the loop by computing that value at the loop end.
                     75: 
                     76: When a loop has a simple increment that adds 1,
                     77: instead of jumping in after the increment,
                     78: decrement the loop count and jump to the increment.
                     79: This allows aob insns to be used.
1.1       root       80: 
1.1.1.2   root       81: * Using constraints on values.
1.1       root       82: 
1.1.1.2   root       83: Many operations could be simplified based on knowledge of the
                     84: minimum and maximum possible values of a register at any particular time.
                     85: These limits could come from the data types in the tree, via rtl generation,
                     86: or they can be deduced from operations that are performed.  For example,
                     87: the result of an `and' operation one of whose operands is 7 must be in
                     88: the range 0 to 7.  Compare instructions also tell something about the
                     89: possible values of the operand, in the code beyond the test.
                     90: 
                     91: Value constraints can be used to determine the results of a further
                     92: comparison.  They can also indicate that certain `and' operations are
                     93: redundant.  Constraints might permit a decrement and branch
                     94: instruction that checks zeroness to be used when the user has
                     95: specified to exit if negative.
1.1       root       96: 
                     97: * Smarter reload pass.
                     98: 
                     99: The reload pass as currently written can reload values only into registers
                    100: that are reserved for reloading.  This means that in order to use a
                    101: register for reloading it must spill everything out of that register.
                    102: 
                    103: It would be straightforward, though complicated, for reload1.c to keep
                    104: track, during its scan, of which hard registers were available at each
                    105: point in the function, and use for reloading even registers that were
                    106: free only at the point they were needed.  This would avoid much spilling
                    107: and make better code.
                    108: 
                    109: * Change the type of a variable.
                    110: 
                    111: Sometimes a variable is declared as `int', it is assigned only once
                    112: from a value of type `char', and then it is used only by comparison
                    113: against constants.  On many machines, better code would result if
                    114: the variable had type `char'.  If the compiler could detect this
                    115: case, it could change the declaration of the variable and change
                    116: all the places that use it.
                    117: 
                    118: * Order of subexpressions.
                    119: 
                    120: It might be possible to make better code by paying attention
                    121: to the order in which to generate code for subexpressions of an expression.
                    122: 
1.1.1.3   root      123: * More code motion.
1.1       root      124: 
1.1.1.3   root      125: Consider hoisting common code up past conditional branches or
                    126: tablejumps.
                    127: 
                    128: * Trace scheduling.
                    129: 
                    130: This technique is said to be able to figure out which way a jump
                    131: will usually go, and rearrange the code to make that path the
                    132: faster one.
1.1       root      133: 
                    134: * Distributive law.
                    135: 
1.1.1.2   root      136: The C expression *(X + 4 * (Y + C)) compiles better on certain
                    137: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
                    138: modes.  It may be tricky to determine when, and for which machines, to
                    139: use each alternative.
                    140: 
1.1.1.3   root      141: Some work has been done on this, in combine.c.
                    142: 
1.1.1.2   root      143: * Jump-execute-next.
                    144: 
                    145: Many recent machines have jumps which optionally execute the following
                    146: instruction before the instruction jumped to, either conditionally or
                    147: unconditionally.  To take advantage of this capability requires a new
                    148: compiler pass that would reorder instructions when possible.  After
                    149: reload may be a good place for it.
                    150: 
                    151: On some machines, the result of a load from memory is not available
                    152: until after the following instruction.  The easiest way to support
                    153: these machines is to output each RTL load instruction as two assembler
                    154: instructions, the second being a no-op.  Putting useful instructions
                    155: after the load instructions may be a similar task to putting them
                    156: after jump instructions.
                    157: 
                    158: * Pipeline scheduling.
                    159: 
                    160: On many machines, code gets faster if instructions are reordered
                    161: so that pipelines are kept full.  Doing the best possible job of this
                    162: requires knowing which functional units each kind of instruction executes
                    163: in and how long the functional unit stays busy with it.  Then the
                    164: goal is to reorder the instructions to keep many functional units
                    165: busy but never feed them so fast they must wait.
1.1       root      166: 
1.1.1.3   root      167: * Can optimize by changing if (x) y; else z; into z; if (x) y;
                    168: if z and x do not interfere and z has no effects not undone by y.
                    169: This is desirable if z is faster than jumping.
                    170: 
                    171: * For a two-insn loop on the 68020, such as
                    172:   foo: movb    a2@+,a3@+
                    173:        jne     foo
                    174: it is better to insert dbeq d0,foo before the jne.
                    175: d0 can be a junk register.  The challenge is to fit this into
                    176: a portable framework: when can you detect this situation and
                    177: still be able to allocate a junk register?
                    178: 
1.1.1.6 ! root      179: * For the 80387 floating point, perhaps it would be possible to use 3
        !           180: or 4 registers in the stack to hold register variables.  (It would be
        !           181: necessary to keep track of how those slots move in the stack as other
        !           182: pushes and pops are done.)  This is probably very tricky, but if
        !           183: you are a GCC wizard and you care about the speed of floating point on
        !           184: an 80386, you might want to work on it.
        !           185: 
1.1       root      186: 2. Simpler porting.
                    187: 
                    188: Right now, describing the target machine's instructions is done
                    189: cleanly, but describing its addressing mode is done with several
                    190: ad-hoc macro definitions.  Porting would be much easier if there were
                    191: an RTL description for addressing modes like that for instructions.
                    192: Tools analogous to genflags and genrecog would generate macros from
                    193: this description.
                    194: 
                    195: There would be one pattern in the address-description file for each
                    196: kind of addressing, and this pattern would have:
                    197: 
                    198:   * the RTL expression for the address
                    199:   * C code to verify its validity (since that may depend on
                    200:     the exact data).
                    201:   * C code to print the address in assembler language.
                    202:   * C code to convert the address into a valid one, if it is not valid.
                    203:     (This would replace LEGITIMIZE_ADDRESS).
                    204:   * Register constraints for all indeterminates that appear
                    205:     in the RTL expression.
                    206: 
                    207: 3. Other languages.
                    208: 
1.1.1.2   root      209: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
                    210: desirable.
1.1       root      211: 
1.1.1.2   root      212: Pascal, Modula-2 and Ada require the implementation of functions
                    213: within functions.  Some of the mechanisms for this already exist.
1.1       root      214: 
                    215: 4. Generalize the machine model.
                    216: 
1.1.1.4   root      217: * Some new compiler features may be needed to do a good job on machines
1.1.1.2   root      218: where static data needs to be addressed using base registers.
1.1       root      219: 
1.1.1.4   root      220: * Some machines have two stacks in different areas of memory, one used
1.1.1.2   root      221: for scalars and another for large objects.  The compiler does not
                    222: now have a way to understand this.
1.1.1.3   root      223: 
                    224: 5. Precompilation of header files.
                    225: 
                    226: In the future, many programs will use thousands of lines of header files.
                    227: Compiling the headers might be slower than compiling the guts of any one
                    228: source file.  Here is a scheme for precompiling header files to make
                    229: compilation faster for a sequence of headers which is often used.
                    230: 
                    231: A precompiled header corresponds to a sequence of header files.  The
                    232: preprocessor recognizes when the input starts with a sequence of
                    233: `#include' commands and searches a data base for a precompiled header
                    234: corresponding to that sequence.  The modtimes of all these files are
                    235: stored in the data base so that one can tell whether the precompiled
                    236: header is still valid.
                    237: 
                    238: For robustness, each directory should have its own collection of
                    239: precompiled headers and its own data base of them.  Probably each
                    240: precompiled header would be a file and the data base would be one
                    241: more file.
                    242: 
                    243: The data base records the entire collection of predefined macros and
                    244: their definitions, except for __FILE__, __LINE__ and __DATE__, for
                    245: each precompiled header.  If this collection does not match the setup
                    246: at the start of the current compilation (including the results of -D
                    247: and -U switches), the precompiled header is inapplicable.  It might
                    248: be possible to have distinct precompiled headers for the same sequence
                    249: of header files but different collections of predefined macros.
                    250: 
                    251: The state of any option that affects macro processing, such as -ansi
                    252: or -traditional, must also be recorded, and the precompiled header is
                    253: valid only if these options match.
                    254: 
                    255: The precompiled header contains an ordered series of strings.  Some
                    256: strings are marked "unconditional"; these must be compiled each time
                    257: the precompiled header is used.  Other strings are have keys, which
                    258: are identifiers.  A string with keys must be compiled if at least one
                    259: of its keys is mentioned in the input.  The order these strings appear
                    260: in the precompiled header is called their intrinsic order.
                    261: 
                    262: The C preprocessor reads in the precompiled header file and scan all
                    263: the strings, making for each key an entry in the same symbol table
                    264: used for macros, pointing at a list of all the strings for which it is
                    265: a key.  Each string must have a flag (one flag per string, not one per
                    266: key per string).  The same code in `rescan' that detects references to
                    267: macros would detect a reference to a key and flag all of the strings
                    268: that it belongs to as needing to be output.
                    269: 
                    270: Each of these strings is immediately recursively macroexpanded (i.e.
                    271: `rescan' is called), but the output from this is discarded.  The
                    272: expansion is to detect any other keys mentioned in the string, and to
                    273: define any macros for which the string contains a #define.  The key's
                    274: symbol table entry is be deleted to save time if the key is
                    275: encountered again, and to avoid an infinite recursion.
                    276: 
                    277: The unconditional strings are macroexpanded with `rescan' (but the
                    278: output is discarded) at some time before anything is actually output.
                    279: 
                    280: At the end of compilation, before any of the actual input text is
                    281: output, the list of strings is scanned in the intrinsic order, and
                    282: each string that is unconditional or flagged is output verbatim,
                    283: except that any #define lines are discarded.
                    284: 
                    285: Precompiled headers would be constructed by explicit request with a
                    286: special tool.  The first step is to run cpp on the sequence of header
                    287: files' contents.  This would use a new option that would cause all
                    288: #define lines to be output unchanged as well as defining the macro.
                    289: The second step is to divide the output into strings, some keyed and
                    290: some unconditional.  This division is done without changing the order
                    291: of the text being divided up.
1.1.1.4   root      292: 
1.1.1.5   root      293: [email protected] has some ideas on this subject also.
                    294: 
1.1.1.6 ! root      295: 6. Better documentation of how GCC works and how to port it.
1.1.1.4   root      296: 
                    297: Here is an outline proposed by Allan Adler.
                    298: 
                    299: I.    Overview of this document
                    300: II.   The machines on which GCC is implemented
                    301:     A. Prose description of those characteristics of target machines and
                    302:        their operating systems which are pertinent to the implementation
                    303:        of GCC.
                    304:        i. target machine characteristics
                    305:        ii. comparison of this system of machine characteristics with
                    306:            other systems of machine specification currently in use
                    307:     B. Tables of the characteristics of the target machines on which
                    308:        GCC is implemented.
                    309:     C. A priori restrictions on the values of characteristics of target 
                    310:        machines, with special reference to those parts of the source code
                    311:        which entail those restrictions
                    312:        i. restrictions on individual characteristics 
                    313:         ii. restrictions involving relations between various characteristics
                    314:     D. The use of GCC as a cross-compiler 
                    315:        i. cross-compilation to existing machines
                    316:        ii. cross-compilation to non-existent machines
                    317:     E. Assumptions which are made regarding the target machine
                    318:        i.  assumptions regarding the architecture of the target machine
                    319:        ii. assumptions regarding the operating system of the target machine
                    320:        iii. assumptions regarding software resident on the target machine
                    321:        iv. where in the source code these assumptions are in effect made
                    322: III.   A systematic approach to writing the files tm.h and xm.h
                    323:     A. Macros which require special care or skill
                    324:     B. Examples, with special reference to the underlying reasoning
                    325: IV.    A systematic approach to writing the machine description file md
                    326:     A. Minimal viable sets of insn descriptions
                    327:     B. Examples, with special reference to the underlying reasoning
                    328: V.     Uses of the file aux-output.c
                    329: VI.    Specification of what constitutes correct performance of an 
                    330:        implementation of GCC
                    331:     A. The components of GCC
                    332:     B. The itinerary of a C program through GCC
                    333:     C. A system of benchmark programs
1.1.1.5   root      334:     D. What your RTL and assembler should look like with these benchmarks
1.1.1.4   root      335:     E. Fine tuning for speed and size of compiled code
                    336: VII.   A systematic procedure for debugging an implementation of GCC
                    337:     A. Use of GDB
                    338:        i. the macros in the file .gdbinit for GCC
                    339:        ii. obstacles to the use of GDB
                    340:            a. functions implemented as macros can't be called in GDB
                    341:     B. Debugging without GDB
                    342:        i. How to turn off the normal operation of GCC and access specific
                    343:           parts of GCC
                    344:     C. Debugging tools
                    345:     D. Debugging the parser
                    346:        i. how machine macros and insn definitions affect the parser
                    347:     E. Debugging the recognizer
                    348:        i. how machine macros and insn definitions affect the recognizer
                    349: 
                    350: ditto for other components
                    351: 
                    352: VIII. Data types used by GCC, with special reference to restrictions not 
                    353:       specified in the formal definition of the data type
                    354: IX.   References to the literature for the algorithms used in GCC
                    355: 

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.