Annotation of gcc/PROJECTS, revision 1.1.1.4

1.1       root        1: 1. Better optimization.
                      2: 
1.1.1.4 ! root        3: * Constants in unused inline functions
        !             4: 
        !             5: It would be nice to delay output of string constants so that string
        !             6: constants mentioned in unused inline functions are never generated.
        !             7: Perhaps this would also take care of string constants in dead code.
        !             8: 
        !             9: The difficulty is in finding a clean way for the RTL which refers
        !            10: to the constant (currently, only by an assembler symbol name)
        !            11: to point to the constant and cause it to be output.
        !            12: 
1.1       root       13: * More cse
                     14: 
                     15: The techniques for doing full global cse are described in the
                     16: red dragon book.  It is likely to be slow and use a lot of memory,
                     17: but it might be worth offering as an additional option.
                     18: 
                     19: It is probably possible to extend cse to a few very frequent cases
                     20: without so much expense.
                     21: 
                     22: For example, it is not very hard to handle cse through if-then
                     23: statements with no else clauses.  Here's how to do it.  On reaching a
                     24: label, notice that the label's use-count is 1 and that the last
                     25: preceding jump jumps conditionally to this label.  Now you know it
                     26: is a simple if-then statement.  Remove from the hash table
                     27: all the expressions that were entered since that jump insn
                     28: and you can continue with cse.
                     29: 
                     30: It is probably not hard to handle cse from the end of a loop
                     31: around to the beginning, and a few loops would be greatly sped
                     32: up by this.
                     33: 
1.1.1.3   root       34: * Support more general tail-recursion among different functions.
1.1       root       35: 
1.1.1.3   root       36: This might be possible under certain circumstances, such as when
                     37: the argument lists of the functions have the same lengths.
                     38: Perhaps it could be done with a special declaration.
                     39: 
                     40: You would need to verify in the calling function that it does not
                     41: use the addresses of any local variables and does not use setjmp.
                     42: 
                     43: * Put short statics vars at low addresses and use short addressing mode?
                     44: Useful on the 68000/68020 and perhaps on the 32000 series,
                     45: provided one has a linker that works with the feature.
                     46: This is said to make a 15% speedup on the 68000.
                     47: This brings to mind Hayes' changes for Stanford MIPS.
                     48: 
                     49: * Detect dead stores into memory?
                     50: 
                     51: A store into memory is dead if it is followed by another store into
                     52: the same location; and, in between, there is no reference to
                     53: that anything that might be that location (including no reference
                     54: to a variable address).
                     55: 
                     56: * Loop optimization.
                     57: 
                     58: Strength reduction and iteration variable elimination could be
                     59: smarter.  They should know how to decide which iteration variables are
                     60: not worth making explicit because they can be computed as part of an
                     61: address calculation.  Based on this information, they should decide
                     62: when it is desirable to eliminate one iteration iable and create
                     63: another in its place.
                     64: 
                     65: It should be possible to compute what the value of an iteration
                     66: variable will be at the end of the loop, and eliminate the variable
                     67: within the loop by computing that value at the loop end.
                     68: 
                     69: When a loop has a simple increment that adds 1,
                     70: instead of jumping in after the increment,
                     71: decrement the loop count and jump to the increment.
                     72: This allows aob insns to be used.
1.1       root       73: 
1.1.1.2   root       74: * Using constraints on values.
1.1       root       75: 
1.1.1.2   root       76: Many operations could be simplified based on knowledge of the
                     77: minimum and maximum possible values of a register at any particular time.
                     78: These limits could come from the data types in the tree, via rtl generation,
                     79: or they can be deduced from operations that are performed.  For example,
                     80: the result of an `and' operation one of whose operands is 7 must be in
                     81: the range 0 to 7.  Compare instructions also tell something about the
                     82: possible values of the operand, in the code beyond the test.
                     83: 
                     84: Value constraints can be used to determine the results of a further
                     85: comparison.  They can also indicate that certain `and' operations are
                     86: redundant.  Constraints might permit a decrement and branch
                     87: instruction that checks zeroness to be used when the user has
                     88: specified to exit if negative.
1.1       root       89: 
                     90: * Smarter reload pass.
                     91: 
                     92: The reload pass as currently written can reload values only into registers
                     93: that are reserved for reloading.  This means that in order to use a
                     94: register for reloading it must spill everything out of that register.
                     95: 
                     96: It would be straightforward, though complicated, for reload1.c to keep
                     97: track, during its scan, of which hard registers were available at each
                     98: point in the function, and use for reloading even registers that were
                     99: free only at the point they were needed.  This would avoid much spilling
                    100: and make better code.
                    101: 
                    102: * Change the type of a variable.
                    103: 
                    104: Sometimes a variable is declared as `int', it is assigned only once
                    105: from a value of type `char', and then it is used only by comparison
                    106: against constants.  On many machines, better code would result if
                    107: the variable had type `char'.  If the compiler could detect this
                    108: case, it could change the declaration of the variable and change
                    109: all the places that use it.
                    110: 
                    111: * Order of subexpressions.
                    112: 
                    113: It might be possible to make better code by paying attention
                    114: to the order in which to generate code for subexpressions of an expression.
                    115: 
1.1.1.3   root      116: * More code motion.
1.1       root      117: 
1.1.1.3   root      118: Consider hoisting common code up past conditional branches or
                    119: tablejumps.
                    120: 
                    121: * Trace scheduling.
                    122: 
                    123: This technique is said to be able to figure out which way a jump
                    124: will usually go, and rearrange the code to make that path the
                    125: faster one.
1.1       root      126: 
                    127: * Distributive law.
                    128: 
1.1.1.2   root      129: The C expression *(X + 4 * (Y + C)) compiles better on certain
                    130: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
                    131: modes.  It may be tricky to determine when, and for which machines, to
                    132: use each alternative.
                    133: 
1.1.1.3   root      134: Some work has been done on this, in combine.c.
                    135: 
1.1.1.2   root      136: * Jump-execute-next.
                    137: 
                    138: Many recent machines have jumps which optionally execute the following
                    139: instruction before the instruction jumped to, either conditionally or
                    140: unconditionally.  To take advantage of this capability requires a new
                    141: compiler pass that would reorder instructions when possible.  After
                    142: reload may be a good place for it.
                    143: 
                    144: On some machines, the result of a load from memory is not available
                    145: until after the following instruction.  The easiest way to support
                    146: these machines is to output each RTL load instruction as two assembler
                    147: instructions, the second being a no-op.  Putting useful instructions
                    148: after the load instructions may be a similar task to putting them
                    149: after jump instructions.
                    150: 
                    151: * Pipeline scheduling.
                    152: 
                    153: On many machines, code gets faster if instructions are reordered
                    154: so that pipelines are kept full.  Doing the best possible job of this
                    155: requires knowing which functional units each kind of instruction executes
                    156: in and how long the functional unit stays busy with it.  Then the
                    157: goal is to reorder the instructions to keep many functional units
                    158: busy but never feed them so fast they must wait.
1.1       root      159: 
1.1.1.3   root      160: * Can optimize by changing if (x) y; else z; into z; if (x) y;
                    161: if z and x do not interfere and z has no effects not undone by y.
                    162: This is desirable if z is faster than jumping.
                    163: 
                    164: * For a two-insn loop on the 68020, such as
                    165:   foo: movb    a2@+,a3@+
                    166:        jne     foo
                    167: it is better to insert dbeq d0,foo before the jne.
                    168: d0 can be a junk register.  The challenge is to fit this into
                    169: a portable framework: when can you detect this situation and
                    170: still be able to allocate a junk register?
                    171: 
1.1       root      172: 2. Simpler porting.
                    173: 
                    174: Right now, describing the target machine's instructions is done
                    175: cleanly, but describing its addressing mode is done with several
                    176: ad-hoc macro definitions.  Porting would be much easier if there were
                    177: an RTL description for addressing modes like that for instructions.
                    178: Tools analogous to genflags and genrecog would generate macros from
                    179: this description.
                    180: 
                    181: There would be one pattern in the address-description file for each
                    182: kind of addressing, and this pattern would have:
                    183: 
                    184:   * the RTL expression for the address
                    185:   * C code to verify its validity (since that may depend on
                    186:     the exact data).
                    187:   * C code to print the address in assembler language.
                    188:   * C code to convert the address into a valid one, if it is not valid.
                    189:     (This would replace LEGITIMIZE_ADDRESS).
                    190:   * Register constraints for all indeterminates that appear
                    191:     in the RTL expression.
                    192: 
                    193: 3. Other languages.
                    194: 
1.1.1.2   root      195: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
                    196: desirable.
1.1       root      197: 
1.1.1.2   root      198: Pascal, Modula-2 and Ada require the implementation of functions
                    199: within functions.  Some of the mechanisms for this already exist.
1.1       root      200: 
                    201: 4. Generalize the machine model.
                    202: 
1.1.1.4 ! root      203: * Some new compiler features may be needed to do a good job on machines
1.1.1.2   root      204: where static data needs to be addressed using base registers.
1.1       root      205: 
1.1.1.4 ! root      206: * Some machines have two stacks in different areas of memory, one used
1.1.1.2   root      207: for scalars and another for large objects.  The compiler does not
                    208: now have a way to understand this.
1.1.1.3   root      209: 
1.1.1.4 ! root      210: * Some machines use a caller-saves convention for saving registers over
        !           211: function calls.  Here is a design for how to handle such a convention.
        !           212: 
        !           213: Classify all the hard registers as call-clobbered, then teach the
        !           214: compiler how to put call-crossing pseudo-regs in call-clobbered hard
        !           215: registers by generating save/restore insns around the call.  Generate
        !           216: the save/restores somewhere around the reload pass; this avoids
        !           217: getting confused by cross jumping which happens after.  Actually put
        !           218: the save/restore insns into the RTL chain, so that cross-jumping and
        !           219: peephole opt. will work on them.
        !           220: 
        !           221: The time to generate these insns is when you can still tell what regs
        !           222: are live at each call, but late enough to know exactly where the
        !           223: pseudos have been allocated.  Perhaps the best time is just before
        !           224: reload_as_needed.
        !           225: 
        !           226: This requires changes in local-alloc.c and global-alloc.c to be willing
        !           227: to allocate call-crossing pseudos to call-clobbered registers.
        !           228: That isn't hard if you do it with a special flag macro that simply says to
        !           229: ignore (in those passes) whether a register is call-clobbered.
        !           230: This macro would be defined if the target machine uses caller-saves.
        !           231: 
        !           232: To get the best output, it is necessary to count the cost (in
        !           233: additional save/restores) for each pseudo register of putting it in a
        !           234: call-clobbered hard reg.  This means counting *how many times* each
        !           235: pseudo crosses a call.  Depending on that value, it might be better to
        !           236: put the pseudo in memory than in a call-clobbered register.  Cost
        !           237: counting would be done in flow.c, perhaps, and costs would affect
        !           238: allocation decisions in local-alloc and global-alloc.  Perhaps each
        !           239: pseudo should have one priority for getting a call-saved reg and
        !           240: another, lower (in general) priority for a call-clobbered reg.
        !           241: 
        !           242: Then there would no longer be a need for the flag: the changes would
        !           243: improve the output, on all machines.
        !           244: 
1.1.1.3   root      245: 5. Precompilation of header files.
                    246: 
                    247: In the future, many programs will use thousands of lines of header files.
                    248: Compiling the headers might be slower than compiling the guts of any one
                    249: source file.  Here is a scheme for precompiling header files to make
                    250: compilation faster for a sequence of headers which is often used.
                    251: 
                    252: A precompiled header corresponds to a sequence of header files.  The
                    253: preprocessor recognizes when the input starts with a sequence of
                    254: `#include' commands and searches a data base for a precompiled header
                    255: corresponding to that sequence.  The modtimes of all these files are
                    256: stored in the data base so that one can tell whether the precompiled
                    257: header is still valid.
                    258: 
                    259: For robustness, each directory should have its own collection of
                    260: precompiled headers and its own data base of them.  Probably each
                    261: precompiled header would be a file and the data base would be one
                    262: more file.
                    263: 
                    264: The data base records the entire collection of predefined macros and
                    265: their definitions, except for __FILE__, __LINE__ and __DATE__, for
                    266: each precompiled header.  If this collection does not match the setup
                    267: at the start of the current compilation (including the results of -D
                    268: and -U switches), the precompiled header is inapplicable.  It might
                    269: be possible to have distinct precompiled headers for the same sequence
                    270: of header files but different collections of predefined macros.
                    271: 
                    272: The state of any option that affects macro processing, such as -ansi
                    273: or -traditional, must also be recorded, and the precompiled header is
                    274: valid only if these options match.
                    275: 
                    276: The precompiled header contains an ordered series of strings.  Some
                    277: strings are marked "unconditional"; these must be compiled each time
                    278: the precompiled header is used.  Other strings are have keys, which
                    279: are identifiers.  A string with keys must be compiled if at least one
                    280: of its keys is mentioned in the input.  The order these strings appear
                    281: in the precompiled header is called their intrinsic order.
                    282: 
                    283: The C preprocessor reads in the precompiled header file and scan all
                    284: the strings, making for each key an entry in the same symbol table
                    285: used for macros, pointing at a list of all the strings for which it is
                    286: a key.  Each string must have a flag (one flag per string, not one per
                    287: key per string).  The same code in `rescan' that detects references to
                    288: macros would detect a reference to a key and flag all of the strings
                    289: that it belongs to as needing to be output.
                    290: 
                    291: Each of these strings is immediately recursively macroexpanded (i.e.
                    292: `rescan' is called), but the output from this is discarded.  The
                    293: expansion is to detect any other keys mentioned in the string, and to
                    294: define any macros for which the string contains a #define.  The key's
                    295: symbol table entry is be deleted to save time if the key is
                    296: encountered again, and to avoid an infinite recursion.
                    297: 
                    298: The unconditional strings are macroexpanded with `rescan' (but the
                    299: output is discarded) at some time before anything is actually output.
                    300: 
                    301: At the end of compilation, before any of the actual input text is
                    302: output, the list of strings is scanned in the intrinsic order, and
                    303: each string that is unconditional or flagged is output verbatim,
                    304: except that any #define lines are discarded.
                    305: 
                    306: Precompiled headers would be constructed by explicit request with a
                    307: special tool.  The first step is to run cpp on the sequence of header
                    308: files' contents.  This would use a new option that would cause all
                    309: #define lines to be output unchanged as well as defining the macro.
                    310: The second step is to divide the output into strings, some keyed and
                    311: some unconditional.  This division is done without changing the order
                    312: of the text being divided up.
1.1.1.4 ! root      313: 
        !           314: 6. Other possibly nice features.
        !           315: 
        !           316: * cpp could have a #provide directive.
        !           317: #provide would have the same syntax as #include,
        !           318: and it would nullify any future #include directive
        !           319: with the same argument.  Thus, the file foo.h
        !           320: could contain #provide <foo> to prevent itself from
        !           321: being included twice.
        !           322: 
        !           323: This is much cleaner than the alternative sometimes implemented,
        !           324: which is to require the user to use something other than #include
        !           325: in order to ensure inclusion only once.
        !           326: 
        !           327: 7. Better documentation of how GCC works and how to port it.
        !           328: 
        !           329: Here is an outline proposed by Allan Adler.
        !           330: 
        !           331: I.    Overview of this document
        !           332: II.   The machines on which GCC is implemented
        !           333:     A. Prose description of those characteristics of target machines and
        !           334:        their operating systems which are pertinent to the implementation
        !           335:        of GCC.
        !           336:        i. target machine characteristics
        !           337:        ii. comparison of this system of machine characteristics with
        !           338:            other systems of machine specification currently in use
        !           339:     B. Tables of the characteristics of the target machines on which
        !           340:        GCC is implemented.
        !           341:     C. A priori restrictions on the values of characteristics of target 
        !           342:        machines, with special reference to those parts of the source code
        !           343:        which entail those restrictions
        !           344:        i. restrictions on individual characteristics 
        !           345:         ii. restrictions involving relations between various characteristics
        !           346:     D. The use of GCC as a cross-compiler 
        !           347:        i. cross-compilation to existing machines
        !           348:        ii. cross-compilation to non-existent machines
        !           349:     E. Assumptions which are made regarding the target machine
        !           350:        i.  assumptions regarding the architecture of the target machine
        !           351:        ii. assumptions regarding the operating system of the target machine
        !           352:        iii. assumptions regarding software resident on the target machine
        !           353:        iv. where in the source code these assumptions are in effect made
        !           354: III.   A systematic approach to writing the files tm.h and xm.h
        !           355:     A. Macros which require special care or skill
        !           356:     B. Examples, with special reference to the underlying reasoning
        !           357: IV.    A systematic approach to writing the machine description file md
        !           358:     A. Minimal viable sets of insn descriptions
        !           359:     B. Examples, with special reference to the underlying reasoning
        !           360: V.     Uses of the file aux-output.c
        !           361: VI.    Specification of what constitutes correct performance of an 
        !           362:        implementation of GCC
        !           363:     A. The components of GCC
        !           364:     B. The itinerary of a C program through GCC
        !           365:     C. A system of benchmark programs
        !           366:     D. What your RTL and assembler look should like with these benchmarks
        !           367:     E. Fine tuning for speed and size of compiled code
        !           368: VII.   A systematic procedure for debugging an implementation of GCC
        !           369:     A. Use of GDB
        !           370:        i. the macros in the file .gdbinit for GCC
        !           371:        ii. obstacles to the use of GDB
        !           372:            a. functions implemented as macros can't be called in GDB
        !           373:     B. Debugging without GDB
        !           374:        i. How to turn off the normal operation of GCC and access specific
        !           375:           parts of GCC
        !           376:     C. Debugging tools
        !           377:     D. Debugging the parser
        !           378:        i. how machine macros and insn definitions affect the parser
        !           379:     E. Debugging the recognizer
        !           380:        i. how machine macros and insn definitions affect the recognizer
        !           381: 
        !           382: ditto for other components
        !           383: 
        !           384: VIII. Data types used by GCC, with special reference to restrictions not 
        !           385:       specified in the formal definition of the data type
        !           386: IX.   References to the literature for the algorithms used in GCC
        !           387: 

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.