gcc/PROJECTS - annotate

Return to PROJECTS CVS log
Up to [GCC 1.x] / gcc
Annotation of gcc/PROJECTS, revision 1.1.1.7

1.1.1.6   root        1: 0. Improved efficiency.
                      2: 
                      3: * Parse and output array initializers an element at a time, freeing
                      4: storage after each, instead of parsing the whole initializer first and
                      5: then outputting.  This would reduce memory usage for large
                      6: initializers.
                      7: 
1.1       root        8: 1. Better optimization.
                      9: 
1.1.1.4   root       10: * Constants in unused inline functions
                     11: 
                     12: It would be nice to delay output of string constants so that string
                     13: constants mentioned in unused inline functions are never generated.
                     14: Perhaps this would also take care of string constants in dead code.
                     15: 
                     16: The difficulty is in finding a clean way for the RTL which refers
                     17: to the constant (currently, only by an assembler symbol name)
                     18: to point to the constant and cause it to be output.
                     19: 
1.1       root       20: * More cse
                     21: 
1.1.1.7 ! root       22: The techniques for doing full global cse are described in the red
        !            23: dragon book, or (a different version) in Frederick Chow's thesis from
        !            24: Stanford.  It is likely to be slow and use a lot of memory, but it
        !            25: might be worth offering as an additional option.
1.1       root       26: 
                     27: It is probably possible to extend cse to a few very frequent cases
                     28: without so much expense.
                     29: 
                     30: For example, it is not very hard to handle cse through if-then
                     31: statements with no else clauses.  Here's how to do it.  On reaching a
                     32: label, notice that the label's use-count is 1 and that the last
                     33: preceding jump jumps conditionally to this label.  Now you know it
                     34: is a simple if-then statement.  Remove from the hash table
                     35: all the expressions that were entered since that jump insn
                     36: and you can continue with cse.
                     37: 
                     38: It is probably not hard to handle cse from the end of a loop
                     39: around to the beginning, and a few loops would be greatly sped
                     40: up by this.
                     41: 
1.1.1.3   root       42: * Support more general tail-recursion among different functions.
1.1       root       43: 
1.1.1.3   root       44: This might be possible under certain circumstances, such as when
                     45: the argument lists of the functions have the same lengths.
                     46: Perhaps it could be done with a special declaration.
                     47: 
                     48: You would need to verify in the calling function that it does not
                     49: use the addresses of any local variables and does not use setjmp.
                     50: 
                     51: * Put short statics vars at low addresses and use short addressing mode?
1.1.1.7 ! root       52: 
1.1.1.3   root       53: Useful on the 68000/68020 and perhaps on the 32000 series,
                     54: provided one has a linker that works with the feature.
                     55: This is said to make a 15% speedup on the 68000.
1.1.1.7 ! root       56: 
        !            57: * Keep global variables in registers.
        !            58: 
        !            59: Here is a scheme for doing this.  A global variable, or a local variable
        !            60: whose address is taken, can be kept in a register for an entire function
        !            61: if it does not use non-constant memory addresses and (for globals only)
        !            62: does not call other functions.  If the entire function does not meet
        !            63: this criterion, a loop may.
        !            64: 
        !            65: The VAR_DECL for such a variable would have to have two RTL expressions:
        !            66: the true home in memory, and the pseudo-register used temporarily. 
        !            67: It is necessary to emit insns to copy the memory location into the
        !            68: pseudo-register at the beginning of the function or loop, and perhaps
        !            69: back out at the end.  These insns should have REG_EQUIV notes so that,
        !            70: if the pseudo-register does not get a hard register, it is spilled into
        !            71: the memory location which exists in any case.
        !            72: 
        !            73: The easiest way to set up these insns is to modify the routine
        !            74: put_var_into_stack so that it does not apply to the entire function
        !            75: (sparing any loops which contain nothing dangerous) and to call it at
        !            76: the end of the function regardless of where in the function the
        !            77: address of a local variable is taken.  It would be called
        !            78: unconditionally at the end of the function for all relevant global
        !            79: variables.
        !            80: 
        !            81: For debugger output, the thing to do is to invent a new binding level
        !            82: around the appropriate loop and define the variable name as a register
        !            83: variable with that scope.
        !            84: 
        !            85: * Live-range splitting.
        !            86: 
        !            87: Currently a variable is allocated a hard register either for the full
        !            88: extent of its use or not at all.  Sometimes it would be good to
        !            89: allocate a variable a hard register for just part of a function; for
        !            90: example, through a particular loop where the variable is mostly used,
        !            91: or outside of a particular loop where the variable is not used.  (The
        !            92: latter is nice because it might let the variable be in a register most
        !            93: of the time even though the loop needs all the registers.)
        !            94: 
        !            95: It might not be very hard to do this in global-alloc.c when a variable
        !            96: fails to get a hard register for its entire life span.
        !            97: 
        !            98: The first step is to find a loop in which the variable is live, but
        !            99: which is not the whole life span or nearly so.  It's probably best to
        !           100: use a loop in which the variable is heavily used.
        !           101: 
        !           102: Then create a new pseudo-register to represent the variable in that loop.
        !           103: Substitute this for the old pseudo-register there, and insert move insns
        !           104: to copy between the two at the loop entry and all exits.  (When several
        !           105: such moves are inserted at the same place, some new feature should be
        !           106: added to say that none of those registers conflict merely because of
        !           107: overlap between the new moves.  And the reload pass should reorder them
        !           108: so that a store precedes a load, for any given hard register.)
        !           109: 
        !           110: After doing this for all the reasonable candidates, run global-alloc
        !           111: over again.  With luck, one of the two pseudo-registers will be fit
        !           112: somewhere.  It may even have a much higher priority due to its reduced
        !           113: life span.
        !           114: 
        !           115: There will be no room in general for the new pseudo-registers in
        !           116: basic_block_live_at_start, so there will need to be a second such
        !           117: matrix exclusively for the new ones.  Various other vectors indexed by
        !           118: register number will have to be made bigger, or there will have to be
        !           119: secondary extender vectors just for global-alloc.
        !           120: 
        !           121: A simple new feature could arrange that both pseudo-registers get the
        !           122: same stack slot if they both fail to get hard registers.
        !           123: 
        !           124: Other compilers split live ranges when they are not connected, or
        !           125: try to split off pieces `at the edge'.  I think splitting around loops
        !           126: will provide more speedup.
        !           127: 
        !           128: Creating a fake binding block and a new like-named variable with
        !           129: shorter life span and different address might succeed in describing
        !           130: this technique for the debugger.
1.1.1.3   root      131: 
                    132: * Detect dead stores into memory?
                    133: 
                    134: A store into memory is dead if it is followed by another store into
1.1.1.5   root      135: the same location; and, in between, there is no reference to anything
                    136: that might be that location (including no reference to a variable
                    137: address).
1.1.1.3   root      138: 
                    139: * Loop optimization.
                    140: 
                    141: Strength reduction and iteration variable elimination could be
                    142: smarter.  They should know how to decide which iteration variables are
                    143: not worth making explicit because they can be computed as part of an
                    144: address calculation.  Based on this information, they should decide
1.1.1.5   root      145: when it is desirable to eliminate one iteration variable and create
1.1.1.3   root      146: another in its place.
                    147: 
                    148: It should be possible to compute what the value of an iteration
                    149: variable will be at the end of the loop, and eliminate the variable
                    150: within the loop by computing that value at the loop end.
                    151: 
                    152: When a loop has a simple increment that adds 1,
                    153: instead of jumping in after the increment,
                    154: decrement the loop count and jump to the increment.
                    155: This allows aob insns to be used.
1.1       root      156: 
1.1.1.2   root      157: * Using constraints on values.
1.1       root      158: 
1.1.1.2   root      159: Many operations could be simplified based on knowledge of the
                    160: minimum and maximum possible values of a register at any particular time.
                    161: These limits could come from the data types in the tree, via rtl generation,
                    162: or they can be deduced from operations that are performed.  For example,
                    163: the result of an `and' operation one of whose operands is 7 must be in
                    164: the range 0 to 7.  Compare instructions also tell something about the
                    165: possible values of the operand, in the code beyond the test.
                    166: 
                    167: Value constraints can be used to determine the results of a further
                    168: comparison.  They can also indicate that certain `and' operations are
                    169: redundant.  Constraints might permit a decrement and branch
                    170: instruction that checks zeroness to be used when the user has
                    171: specified to exit if negative.
1.1       root      172: 
                    173: * Smarter reload pass.
                    174: 
                    175: The reload pass as currently written can reload values only into registers
                    176: that are reserved for reloading.  This means that in order to use a
                    177: register for reloading it must spill everything out of that register.
                    178: 
                    179: It would be straightforward, though complicated, for reload1.c to keep
                    180: track, during its scan, of which hard registers were available at each
                    181: point in the function, and use for reloading even registers that were
                    182: free only at the point they were needed.  This would avoid much spilling
                    183: and make better code.
                    184: 
                    185: * Change the type of a variable.
                    186: 
                    187: Sometimes a variable is declared as `int', it is assigned only once
                    188: from a value of type `char', and then it is used only by comparison
                    189: against constants.  On many machines, better code would result if
                    190: the variable had type `char'.  If the compiler could detect this
                    191: case, it could change the declaration of the variable and change
                    192: all the places that use it.
                    193: 
                    194: * Order of subexpressions.
                    195: 
                    196: It might be possible to make better code by paying attention
                    197: to the order in which to generate code for subexpressions of an expression.
                    198: 
1.1.1.3   root      199: * More code motion.
1.1       root      200: 
1.1.1.3   root      201: Consider hoisting common code up past conditional branches or
                    202: tablejumps.
                    203: 
                    204: * Trace scheduling.
                    205: 
                    206: This technique is said to be able to figure out which way a jump
                    207: will usually go, and rearrange the code to make that path the
                    208: faster one.
1.1       root      209: 
                    210: * Distributive law.
                    211: 
1.1.1.2   root      212: The C expression *(X + 4 * (Y + C)) compiles better on certain
                    213: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
                    214: modes.  It may be tricky to determine when, and for which machines, to
                    215: use each alternative.
                    216: 
1.1.1.3   root      217: Some work has been done on this, in combine.c.
                    218: 
1.1.1.2   root      219: * Jump-execute-next.
                    220: 
                    221: Many recent machines have jumps which optionally execute the following
                    222: instruction before the instruction jumped to, either conditionally or
                    223: unconditionally.  To take advantage of this capability requires a new
                    224: compiler pass that would reorder instructions when possible.  After
                    225: reload may be a good place for it.
                    226: 
                    227: On some machines, the result of a load from memory is not available
                    228: until after the following instruction.  The easiest way to support
                    229: these machines is to output each RTL load instruction as two assembler
                    230: instructions, the second being a no-op.  Putting useful instructions
                    231: after the load instructions may be a similar task to putting them
                    232: after jump instructions.
                    233: 
                    234: * Pipeline scheduling.
                    235: 
                    236: On many machines, code gets faster if instructions are reordered
                    237: so that pipelines are kept full.  Doing the best possible job of this
                    238: requires knowing which functional units each kind of instruction executes
                    239: in and how long the functional unit stays busy with it.  Then the
                    240: goal is to reorder the instructions to keep many functional units
                    241: busy but never feed them so fast they must wait.
1.1       root      242: 
1.1.1.3   root      243: * Can optimize by changing if (x) y; else z; into z; if (x) y;
                    244: if z and x do not interfere and z has no effects not undone by y.
                    245: This is desirable if z is faster than jumping.
                    246: 
                    247: * For a two-insn loop on the 68020, such as
                    248:   foo: movb    a2@+,a3@+
                    249:        jne     foo
                    250: it is better to insert dbeq d0,foo before the jne.
                    251: d0 can be a junk register.  The challenge is to fit this into
                    252: a portable framework: when can you detect this situation and
                    253: still be able to allocate a junk register?
                    254: 
1.1.1.6   root      255: * For the 80387 floating point, perhaps it would be possible to use 3
                    256: or 4 registers in the stack to hold register variables.  (It would be
                    257: necessary to keep track of how those slots move in the stack as other
                    258: pushes and pops are done.)  This is probably very tricky, but if
                    259: you are a GCC wizard and you care about the speed of floating point on
                    260: an 80386, you might want to work on it.
                    261: 
1.1       root      262: 2. Simpler porting.
                    263: 
                    264: Right now, describing the target machine's instructions is done
                    265: cleanly, but describing its addressing mode is done with several
                    266: ad-hoc macro definitions.  Porting would be much easier if there were
                    267: an RTL description for addressing modes like that for instructions.
                    268: Tools analogous to genflags and genrecog would generate macros from
                    269: this description.
                    270: 
                    271: There would be one pattern in the address-description file for each
                    272: kind of addressing, and this pattern would have:
                    273: 
                    274:   * the RTL expression for the address
                    275:   * C code to verify its validity (since that may depend on
                    276:     the exact data).
                    277:   * C code to print the address in assembler language.
                    278:   * C code to convert the address into a valid one, if it is not valid.
                    279:     (This would replace LEGITIMIZE_ADDRESS).
                    280:   * Register constraints for all indeterminates that appear
                    281:     in the RTL expression.
                    282: 
                    283: 3. Other languages.
                    284: 
1.1.1.2   root      285: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
                    286: desirable.
1.1       root      287: 
1.1.1.2   root      288: Pascal, Modula-2 and Ada require the implementation of functions
                    289: within functions.  Some of the mechanisms for this already exist.
1.1       root      290: 
1.1.1.7 ! root      291: 4. More extensions.
        !           292: 
        !           293: * Label-addresses as expressions.
        !           294: 
        !           295: It would be nice to have access to the addresses of labels; to be able to
        !           296: store them in variables, or initialize vectors of them.
        !           297: 
        !           298: Alas, `&label0' is the address of the variable named label0, which is
        !           299: unrelated to the label with that name.  Some other syntax is needed.
        !           300: Perhaps colon as a unary operator?  That is ambiguous with `?:' with
        !           301: the middle operand omitted.  Perhaps ^ as a unary operator?  Perhaps
        !           302: `__label__ label0' could mean the value of label0?  Its type could be
        !           303: `void *'.  `goto *EXP' could be used to go to a value of type `void
        !           304: *'--no ambiguity there.
        !           305: 
        !           306: Jump optimization and flow analysis must know about computed jumps,
        !           307: but that is not hard.  Each basic block headed by a possible target of
        !           308: computed jumps must be considered a successor of each basic block
        !           309: ending in a computed jump.  Aside from this, I believe no other
        !           310: optimizer changes are needed.
        !           311: 
        !           312: Next question: stack levels.  In most functions, there is no problem,
        !           313: but it would be a shame to make a feature that doesn't work together
        !           314: with other features.  Here is an idea:
        !           315: 
        !           316: For each label that might need stack level restoration, construct a
        !           317: shadow-label which will restore the stack and jump to the user-label.
        !           318: Then use the address of the shadow label for label0 when someone asks
        !           319: for that of label0.  Jump optimization will delete all the shadow labels
        !           320: if the function has no computed gotos.
        !           321: 
        !           322: * Block structure for labels.
        !           323: 
        !           324: The ({...}) construct should serve as a lexical block for label names,
        !           325: so that the same label may be defined both inside and outside of it.
        !           326: Then macro definitions could use labels internally safely, by enclosing
        !           327: the label and the goto in one of these constructs.
        !           328: 
        !           329: * Generated unique labels.  Have some way of generating distinct labels
        !           330: for use in extended asm statements.  I don't know what a good syntax would
        !           331: be.
        !           332: 
        !           333: 5. Generalize the machine model.
1.1       root      334: 
1.1.1.4   root      335: * Some new compiler features may be needed to do a good job on machines
1.1.1.2   root      336: where static data needs to be addressed using base registers.
1.1       root      337: 
1.1.1.4   root      338: * Some machines have two stacks in different areas of memory, one used
1.1.1.2   root      339: for scalars and another for large objects.  The compiler does not
                    340: now have a way to understand this.
1.1.1.3   root      341: 
1.1.1.7 ! root      342: 6. Precompilation of header files.
1.1.1.3   root      343: 
                    344: In the future, many programs will use thousands of lines of header files.
                    345: Compiling the headers might be slower than compiling the guts of any one
                    346: source file.  Here is a scheme for precompiling header files to make
                    347: compilation faster for a sequence of headers which is often used.
                    348: 
                    349: A precompiled header corresponds to a sequence of header files.  The
                    350: preprocessor recognizes when the input starts with a sequence of
                    351: `#include' commands and searches a data base for a precompiled header
                    352: corresponding to that sequence.  The modtimes of all these files are
                    353: stored in the data base so that one can tell whether the precompiled
                    354: header is still valid.
                    355: 
                    356: For robustness, each directory should have its own collection of
                    357: precompiled headers and its own data base of them.  Probably each
                    358: precompiled header would be a file and the data base would be one
                    359: more file.
                    360: 
                    361: The data base records the entire collection of predefined macros and
                    362: their definitions, except for __FILE__, __LINE__ and __DATE__, for
                    363: each precompiled header.  If this collection does not match the setup
                    364: at the start of the current compilation (including the results of -D
                    365: and -U switches), the precompiled header is inapplicable.  It might
                    366: be possible to have distinct precompiled headers for the same sequence
                    367: of header files but different collections of predefined macros.
                    368: 
                    369: The state of any option that affects macro processing, such as -ansi
                    370: or -traditional, must also be recorded, and the precompiled header is
                    371: valid only if these options match.
                    372: 
                    373: The precompiled header contains an ordered series of strings.  Some
                    374: strings are marked "unconditional"; these must be compiled each time
                    375: the precompiled header is used.  Other strings are have keys, which
                    376: are identifiers.  A string with keys must be compiled if at least one
                    377: of its keys is mentioned in the input.  The order these strings appear
                    378: in the precompiled header is called their intrinsic order.
                    379: 
                    380: The C preprocessor reads in the precompiled header file and scan all
                    381: the strings, making for each key an entry in the same symbol table
                    382: used for macros, pointing at a list of all the strings for which it is
                    383: a key.  Each string must have a flag (one flag per string, not one per
                    384: key per string).  The same code in `rescan' that detects references to
                    385: macros would detect a reference to a key and flag all of the strings
                    386: that it belongs to as needing to be output.
                    387: 
                    388: Each of these strings is immediately recursively macroexpanded (i.e.
                    389: `rescan' is called), but the output from this is discarded.  The
                    390: expansion is to detect any other keys mentioned in the string, and to
                    391: define any macros for which the string contains a #define.  The key's
                    392: symbol table entry is be deleted to save time if the key is
                    393: encountered again, and to avoid an infinite recursion.
                    394: 
                    395: The unconditional strings are macroexpanded with `rescan' (but the
                    396: output is discarded) at some time before anything is actually output.
                    397: 
                    398: At the end of compilation, before any of the actual input text is
                    399: output, the list of strings is scanned in the intrinsic order, and
                    400: each string that is unconditional or flagged is output verbatim,
                    401: except that any #define lines are discarded.
                    402: 
                    403: Precompiled headers would be constructed by explicit request with a
                    404: special tool.  The first step is to run cpp on the sequence of header
                    405: files' contents.  This would use a new option that would cause all
                    406: #define lines to be output unchanged as well as defining the macro.
                    407: The second step is to divide the output into strings, some keyed and
                    408: some unconditional.  This division is done without changing the order
                    409: of the text being divided up.
1.1.1.4   root      410: 
1.1.1.5   root      411: [email protected] has some ideas on this subject also.
                    412: 
1.1.1.7 ! root      413: 7. Better documentation of how GCC works and how to port it.
1.1.1.4   root      414: 
                    415: Here is an outline proposed by Allan Adler.
                    416: 
                    417: I.    Overview of this document
                    418: II.   The machines on which GCC is implemented
                    419:     A. Prose description of those characteristics of target machines and
                    420:        their operating systems which are pertinent to the implementation
                    421:        of GCC.
                    422:        i. target machine characteristics
                    423:        ii. comparison of this system of machine characteristics with
                    424:            other systems of machine specification currently in use
                    425:     B. Tables of the characteristics of the target machines on which
                    426:        GCC is implemented.
                    427:     C. A priori restrictions on the values of characteristics of target 
                    428:        machines, with special reference to those parts of the source code
                    429:        which entail those restrictions
                    430:        i. restrictions on individual characteristics 
                    431:         ii. restrictions involving relations between various characteristics
                    432:     D. The use of GCC as a cross-compiler 
                    433:        i. cross-compilation to existing machines
                    434:        ii. cross-compilation to non-existent machines
                    435:     E. Assumptions which are made regarding the target machine
                    436:        i.  assumptions regarding the architecture of the target machine
                    437:        ii. assumptions regarding the operating system of the target machine
                    438:        iii. assumptions regarding software resident on the target machine
                    439:        iv. where in the source code these assumptions are in effect made
                    440: III.   A systematic approach to writing the files tm.h and xm.h
                    441:     A. Macros which require special care or skill
                    442:     B. Examples, with special reference to the underlying reasoning
                    443: IV.    A systematic approach to writing the machine description file md
                    444:     A. Minimal viable sets of insn descriptions
                    445:     B. Examples, with special reference to the underlying reasoning
                    446: V.     Uses of the file aux-output.c
                    447: VI.    Specification of what constitutes correct performance of an 
                    448:        implementation of GCC
                    449:     A. The components of GCC
                    450:     B. The itinerary of a C program through GCC
                    451:     C. A system of benchmark programs
1.1.1.5   root      452:     D. What your RTL and assembler should look like with these benchmarks
1.1.1.4   root      453:     E. Fine tuning for speed and size of compiled code
                    454: VII.   A systematic procedure for debugging an implementation of GCC
                    455:     A. Use of GDB
                    456:        i. the macros in the file .gdbinit for GCC
                    457:        ii. obstacles to the use of GDB
                    458:            a. functions implemented as macros can't be called in GDB
                    459:     B. Debugging without GDB
                    460:        i. How to turn off the normal operation of GCC and access specific
                    461:           parts of GCC
                    462:     C. Debugging tools
                    463:     D. Debugging the parser
                    464:        i. how machine macros and insn definitions affect the parser
                    465:     E. Debugging the recognizer
                    466:        i. how machine macros and insn definitions affect the recognizer
                    467: 
                    468: ditto for other components
                    469: 
                    470: VIII. Data types used by GCC, with special reference to restrictions not 
                    471:       specified in the formal definition of the data type
                    472: IX.   References to the literature for the algorithms used in GCC
                    473:
unix.superglobalmegacorp.com
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.