Annotation of gcc/PROJECTS, revision 1.1.1.5

1.1       root        1: 1. Better optimization.
                      2: 
1.1.1.4   root        3: * Constants in unused inline functions
                      4: 
                      5: It would be nice to delay output of string constants so that string
                      6: constants mentioned in unused inline functions are never generated.
                      7: Perhaps this would also take care of string constants in dead code.
                      8: 
                      9: The difficulty is in finding a clean way for the RTL which refers
                     10: to the constant (currently, only by an assembler symbol name)
                     11: to point to the constant and cause it to be output.
                     12: 
1.1       root       13: * More cse
                     14: 
                     15: The techniques for doing full global cse are described in the
                     16: red dragon book.  It is likely to be slow and use a lot of memory,
                     17: but it might be worth offering as an additional option.
                     18: 
                     19: It is probably possible to extend cse to a few very frequent cases
                     20: without so much expense.
                     21: 
                     22: For example, it is not very hard to handle cse through if-then
                     23: statements with no else clauses.  Here's how to do it.  On reaching a
                     24: label, notice that the label's use-count is 1 and that the last
                     25: preceding jump jumps conditionally to this label.  Now you know it
                     26: is a simple if-then statement.  Remove from the hash table
                     27: all the expressions that were entered since that jump insn
                     28: and you can continue with cse.
                     29: 
                     30: It is probably not hard to handle cse from the end of a loop
                     31: around to the beginning, and a few loops would be greatly sped
                     32: up by this.
                     33: 
1.1.1.3   root       34: * Support more general tail-recursion among different functions.
1.1       root       35: 
1.1.1.3   root       36: This might be possible under certain circumstances, such as when
                     37: the argument lists of the functions have the same lengths.
                     38: Perhaps it could be done with a special declaration.
                     39: 
                     40: You would need to verify in the calling function that it does not
                     41: use the addresses of any local variables and does not use setjmp.
                     42: 
                     43: * Put short statics vars at low addresses and use short addressing mode?
                     44: Useful on the 68000/68020 and perhaps on the 32000 series,
                     45: provided one has a linker that works with the feature.
                     46: This is said to make a 15% speedup on the 68000.
                     47: This brings to mind Hayes' changes for Stanford MIPS.
                     48: 
                     49: * Detect dead stores into memory?
                     50: 
                     51: A store into memory is dead if it is followed by another store into
1.1.1.5 ! root       52: the same location; and, in between, there is no reference to anything
        !            53: that might be that location (including no reference to a variable
        !            54: address).
1.1.1.3   root       55: 
                     56: * Loop optimization.
                     57: 
                     58: Strength reduction and iteration variable elimination could be
                     59: smarter.  They should know how to decide which iteration variables are
                     60: not worth making explicit because they can be computed as part of an
                     61: address calculation.  Based on this information, they should decide
1.1.1.5 ! root       62: when it is desirable to eliminate one iteration variable and create
1.1.1.3   root       63: another in its place.
                     64: 
                     65: It should be possible to compute what the value of an iteration
                     66: variable will be at the end of the loop, and eliminate the variable
                     67: within the loop by computing that value at the loop end.
                     68: 
                     69: When a loop has a simple increment that adds 1,
                     70: instead of jumping in after the increment,
                     71: decrement the loop count and jump to the increment.
                     72: This allows aob insns to be used.
1.1       root       73: 
1.1.1.2   root       74: * Using constraints on values.
1.1       root       75: 
1.1.1.2   root       76: Many operations could be simplified based on knowledge of the
                     77: minimum and maximum possible values of a register at any particular time.
                     78: These limits could come from the data types in the tree, via rtl generation,
                     79: or they can be deduced from operations that are performed.  For example,
                     80: the result of an `and' operation one of whose operands is 7 must be in
                     81: the range 0 to 7.  Compare instructions also tell something about the
                     82: possible values of the operand, in the code beyond the test.
                     83: 
                     84: Value constraints can be used to determine the results of a further
                     85: comparison.  They can also indicate that certain `and' operations are
                     86: redundant.  Constraints might permit a decrement and branch
                     87: instruction that checks zeroness to be used when the user has
                     88: specified to exit if negative.
1.1       root       89: 
                     90: * Smarter reload pass.
                     91: 
                     92: The reload pass as currently written can reload values only into registers
                     93: that are reserved for reloading.  This means that in order to use a
                     94: register for reloading it must spill everything out of that register.
                     95: 
                     96: It would be straightforward, though complicated, for reload1.c to keep
                     97: track, during its scan, of which hard registers were available at each
                     98: point in the function, and use for reloading even registers that were
                     99: free only at the point they were needed.  This would avoid much spilling
                    100: and make better code.
                    101: 
                    102: * Change the type of a variable.
                    103: 
                    104: Sometimes a variable is declared as `int', it is assigned only once
                    105: from a value of type `char', and then it is used only by comparison
                    106: against constants.  On many machines, better code would result if
                    107: the variable had type `char'.  If the compiler could detect this
                    108: case, it could change the declaration of the variable and change
                    109: all the places that use it.
                    110: 
                    111: * Order of subexpressions.
                    112: 
                    113: It might be possible to make better code by paying attention
                    114: to the order in which to generate code for subexpressions of an expression.
                    115: 
1.1.1.3   root      116: * More code motion.
1.1       root      117: 
1.1.1.3   root      118: Consider hoisting common code up past conditional branches or
                    119: tablejumps.
                    120: 
                    121: * Trace scheduling.
                    122: 
                    123: This technique is said to be able to figure out which way a jump
                    124: will usually go, and rearrange the code to make that path the
                    125: faster one.
1.1       root      126: 
                    127: * Distributive law.
                    128: 
1.1.1.2   root      129: The C expression *(X + 4 * (Y + C)) compiles better on certain
                    130: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
                    131: modes.  It may be tricky to determine when, and for which machines, to
                    132: use each alternative.
                    133: 
1.1.1.3   root      134: Some work has been done on this, in combine.c.
                    135: 
1.1.1.2   root      136: * Jump-execute-next.
                    137: 
                    138: Many recent machines have jumps which optionally execute the following
                    139: instruction before the instruction jumped to, either conditionally or
                    140: unconditionally.  To take advantage of this capability requires a new
                    141: compiler pass that would reorder instructions when possible.  After
                    142: reload may be a good place for it.
                    143: 
                    144: On some machines, the result of a load from memory is not available
                    145: until after the following instruction.  The easiest way to support
                    146: these machines is to output each RTL load instruction as two assembler
                    147: instructions, the second being a no-op.  Putting useful instructions
                    148: after the load instructions may be a similar task to putting them
                    149: after jump instructions.
                    150: 
                    151: * Pipeline scheduling.
                    152: 
                    153: On many machines, code gets faster if instructions are reordered
                    154: so that pipelines are kept full.  Doing the best possible job of this
                    155: requires knowing which functional units each kind of instruction executes
                    156: in and how long the functional unit stays busy with it.  Then the
                    157: goal is to reorder the instructions to keep many functional units
                    158: busy but never feed them so fast they must wait.
1.1       root      159: 
1.1.1.3   root      160: * Can optimize by changing if (x) y; else z; into z; if (x) y;
                    161: if z and x do not interfere and z has no effects not undone by y.
                    162: This is desirable if z is faster than jumping.
                    163: 
                    164: * For a two-insn loop on the 68020, such as
                    165:   foo: movb    a2@+,a3@+
                    166:        jne     foo
                    167: it is better to insert dbeq d0,foo before the jne.
                    168: d0 can be a junk register.  The challenge is to fit this into
                    169: a portable framework: when can you detect this situation and
                    170: still be able to allocate a junk register?
                    171: 
1.1       root      172: 2. Simpler porting.
                    173: 
                    174: Right now, describing the target machine's instructions is done
                    175: cleanly, but describing its addressing mode is done with several
                    176: ad-hoc macro definitions.  Porting would be much easier if there were
                    177: an RTL description for addressing modes like that for instructions.
                    178: Tools analogous to genflags and genrecog would generate macros from
                    179: this description.
                    180: 
                    181: There would be one pattern in the address-description file for each
                    182: kind of addressing, and this pattern would have:
                    183: 
                    184:   * the RTL expression for the address
                    185:   * C code to verify its validity (since that may depend on
                    186:     the exact data).
                    187:   * C code to print the address in assembler language.
                    188:   * C code to convert the address into a valid one, if it is not valid.
                    189:     (This would replace LEGITIMIZE_ADDRESS).
                    190:   * Register constraints for all indeterminates that appear
                    191:     in the RTL expression.
                    192: 
                    193: 3. Other languages.
                    194: 
1.1.1.2   root      195: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
                    196: desirable.
1.1       root      197: 
1.1.1.2   root      198: Pascal, Modula-2 and Ada require the implementation of functions
                    199: within functions.  Some of the mechanisms for this already exist.
1.1       root      200: 
                    201: 4. Generalize the machine model.
                    202: 
1.1.1.4   root      203: * Some new compiler features may be needed to do a good job on machines
1.1.1.2   root      204: where static data needs to be addressed using base registers.
1.1       root      205: 
1.1.1.4   root      206: * Some machines have two stacks in different areas of memory, one used
1.1.1.2   root      207: for scalars and another for large objects.  The compiler does not
                    208: now have a way to understand this.
1.1.1.3   root      209: 
                    210: 5. Precompilation of header files.
                    211: 
                    212: In the future, many programs will use thousands of lines of header files.
                    213: Compiling the headers might be slower than compiling the guts of any one
                    214: source file.  Here is a scheme for precompiling header files to make
                    215: compilation faster for a sequence of headers which is often used.
                    216: 
                    217: A precompiled header corresponds to a sequence of header files.  The
                    218: preprocessor recognizes when the input starts with a sequence of
                    219: `#include' commands and searches a data base for a precompiled header
                    220: corresponding to that sequence.  The modtimes of all these files are
                    221: stored in the data base so that one can tell whether the precompiled
                    222: header is still valid.
                    223: 
                    224: For robustness, each directory should have its own collection of
                    225: precompiled headers and its own data base of them.  Probably each
                    226: precompiled header would be a file and the data base would be one
                    227: more file.
                    228: 
                    229: The data base records the entire collection of predefined macros and
                    230: their definitions, except for __FILE__, __LINE__ and __DATE__, for
                    231: each precompiled header.  If this collection does not match the setup
                    232: at the start of the current compilation (including the results of -D
                    233: and -U switches), the precompiled header is inapplicable.  It might
                    234: be possible to have distinct precompiled headers for the same sequence
                    235: of header files but different collections of predefined macros.
                    236: 
                    237: The state of any option that affects macro processing, such as -ansi
                    238: or -traditional, must also be recorded, and the precompiled header is
                    239: valid only if these options match.
                    240: 
                    241: The precompiled header contains an ordered series of strings.  Some
                    242: strings are marked "unconditional"; these must be compiled each time
                    243: the precompiled header is used.  Other strings are have keys, which
                    244: are identifiers.  A string with keys must be compiled if at least one
                    245: of its keys is mentioned in the input.  The order these strings appear
                    246: in the precompiled header is called their intrinsic order.
                    247: 
                    248: The C preprocessor reads in the precompiled header file and scan all
                    249: the strings, making for each key an entry in the same symbol table
                    250: used for macros, pointing at a list of all the strings for which it is
                    251: a key.  Each string must have a flag (one flag per string, not one per
                    252: key per string).  The same code in `rescan' that detects references to
                    253: macros would detect a reference to a key and flag all of the strings
                    254: that it belongs to as needing to be output.
                    255: 
                    256: Each of these strings is immediately recursively macroexpanded (i.e.
                    257: `rescan' is called), but the output from this is discarded.  The
                    258: expansion is to detect any other keys mentioned in the string, and to
                    259: define any macros for which the string contains a #define.  The key's
                    260: symbol table entry is be deleted to save time if the key is
                    261: encountered again, and to avoid an infinite recursion.
                    262: 
                    263: The unconditional strings are macroexpanded with `rescan' (but the
                    264: output is discarded) at some time before anything is actually output.
                    265: 
                    266: At the end of compilation, before any of the actual input text is
                    267: output, the list of strings is scanned in the intrinsic order, and
                    268: each string that is unconditional or flagged is output verbatim,
                    269: except that any #define lines are discarded.
                    270: 
                    271: Precompiled headers would be constructed by explicit request with a
                    272: special tool.  The first step is to run cpp on the sequence of header
                    273: files' contents.  This would use a new option that would cause all
                    274: #define lines to be output unchanged as well as defining the macro.
                    275: The second step is to divide the output into strings, some keyed and
                    276: some unconditional.  This division is done without changing the order
                    277: of the text being divided up.
1.1.1.4   root      278: 
1.1.1.5 ! root      279: [email protected] has some ideas on this subject also.
        !           280: 
1.1.1.4   root      281: 6. Other possibly nice features.
                    282: 
                    283: * cpp could have a #provide directive.
                    284: #provide would have the same syntax as #include,
                    285: and it would nullify any future #include directive
                    286: with the same argument.  Thus, the file foo.h
                    287: could contain #provide <foo> to prevent itself from
                    288: being included twice.
                    289: 
                    290: This is much cleaner than the alternative sometimes implemented,
                    291: which is to require the user to use something other than #include
                    292: in order to ensure inclusion only once.
                    293: 
                    294: 7. Better documentation of how GCC works and how to port it.
                    295: 
                    296: Here is an outline proposed by Allan Adler.
                    297: 
                    298: I.    Overview of this document
                    299: II.   The machines on which GCC is implemented
                    300:     A. Prose description of those characteristics of target machines and
                    301:        their operating systems which are pertinent to the implementation
                    302:        of GCC.
                    303:        i. target machine characteristics
                    304:        ii. comparison of this system of machine characteristics with
                    305:            other systems of machine specification currently in use
                    306:     B. Tables of the characteristics of the target machines on which
                    307:        GCC is implemented.
                    308:     C. A priori restrictions on the values of characteristics of target 
                    309:        machines, with special reference to those parts of the source code
                    310:        which entail those restrictions
                    311:        i. restrictions on individual characteristics 
                    312:         ii. restrictions involving relations between various characteristics
                    313:     D. The use of GCC as a cross-compiler 
                    314:        i. cross-compilation to existing machines
                    315:        ii. cross-compilation to non-existent machines
                    316:     E. Assumptions which are made regarding the target machine
                    317:        i.  assumptions regarding the architecture of the target machine
                    318:        ii. assumptions regarding the operating system of the target machine
                    319:        iii. assumptions regarding software resident on the target machine
                    320:        iv. where in the source code these assumptions are in effect made
                    321: III.   A systematic approach to writing the files tm.h and xm.h
                    322:     A. Macros which require special care or skill
                    323:     B. Examples, with special reference to the underlying reasoning
                    324: IV.    A systematic approach to writing the machine description file md
                    325:     A. Minimal viable sets of insn descriptions
                    326:     B. Examples, with special reference to the underlying reasoning
                    327: V.     Uses of the file aux-output.c
                    328: VI.    Specification of what constitutes correct performance of an 
                    329:        implementation of GCC
                    330:     A. The components of GCC
                    331:     B. The itinerary of a C program through GCC
                    332:     C. A system of benchmark programs
1.1.1.5 ! root      333:     D. What your RTL and assembler should look like with these benchmarks
1.1.1.4   root      334:     E. Fine tuning for speed and size of compiled code
                    335: VII.   A systematic procedure for debugging an implementation of GCC
                    336:     A. Use of GDB
                    337:        i. the macros in the file .gdbinit for GCC
                    338:        ii. obstacles to the use of GDB
                    339:            a. functions implemented as macros can't be called in GDB
                    340:     B. Debugging without GDB
                    341:        i. How to turn off the normal operation of GCC and access specific
                    342:           parts of GCC
                    343:     C. Debugging tools
                    344:     D. Debugging the parser
                    345:        i. how machine macros and insn definitions affect the parser
                    346:     E. Debugging the recognizer
                    347:        i. how machine macros and insn definitions affect the recognizer
                    348: 
                    349: ditto for other components
                    350: 
                    351: VIII. Data types used by GCC, with special reference to restrictions not 
                    352:       specified in the formal definition of the data type
                    353: IX.   References to the literature for the algorithms used in GCC
                    354: 

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.