Annotation of GNUtools/cc/PROJECTS, revision 1.1.1.1

1.1       root        1: 0. Improved efficiency.
                      2: 
                      3: * Parse and output array initializers an element at a time, freeing
                      4: storage after each, instead of parsing the whole initializer first and
                      5: then outputting.  This would reduce memory usage for large
                      6: initializers.
                      7: 
                      8: * See if the techniques describe in Oct 1991 SIGPLAN Notices
                      9: (Frazer and Hanson) are applicable to GCC.
                     10: 
                     11: 1. Better optimization.
                     12: 
                     13: * Constants in unused inline functions
                     14: 
                     15: It would be nice to delay output of string constants so that string
                     16: constants mentioned in unused inline functions are never generated.
                     17: Perhaps this would also take care of string constants in dead code.
                     18: 
                     19: The difficulty is in finding a clean way for the RTL which refers
                     20: to the constant (currently, only by an assembler symbol name)
                     21: to point to the constant and cause it to be output.
                     22: 
                     23: * More cse
                     24: 
                     25: The techniques for doing full global cse are described in the red
                     26: dragon book, or (a different version) in Frederick Chow's thesis from
                     27: Stanford.  It is likely to be slow and use a lot of memory, but it
                     28: might be worth offering as an additional option.
                     29: 
                     30: It is probably possible to extend cse to a few very frequent cases
                     31: without so much expense.
                     32: 
                     33: For example, it is not very hard to handle cse through if-then
                     34: statements with no else clauses.  Here's how to do it.  On reaching a
                     35: label, notice that the label's use-count is 1 and that the last
                     36: preceding jump jumps conditionally to this label.  Now you know it
                     37: is a simple if-then statement.  Remove from the hash table
                     38: all the expressions that were entered since that jump insn
                     39: and you can continue with cse.
                     40: 
                     41: It is probably not hard to handle cse from the end of a loop
                     42: around to the beginning, and a few loops would be greatly sped
                     43: up by this.
                     44: 
                     45: * Optimize a sequence of if statements whose conditions are exclusive.
                     46: 
                     47: It is possible to optimize 
                     48: 
                     49:     if (x == 1) ...;
                     50:     if (x == 2) ...;
                     51:     if (x == 3) ...;
                     52: 
                     53: into
                     54: 
                     55:     if (x == 1) ...;
                     56:     else if (x == 2) ...;
                     57:     else if (x == 3) ...;
                     58: 
                     59: provided that x is not altered by the contents of the if statements.
                     60: 
                     61: It's not certain whether this is worth doing.  Perhaps programmers
                     62: nearly always write the else's themselves, leaving few opportunities
                     63: to improve anything.
                     64: 
                     65: * Un-cse.
                     66: 
                     67: Perhaps we should have an un-cse step right after cse, which tries to
                     68: replace a reg with its value if the value can be substituted for the
                     69: reg everywhere, if that looks like an improvement.  Which is if the
                     70: reg is used only a few times.  Use rtx_cost to determine if the
                     71: change is really an improvement.
                     72: 
                     73: * Clean up how cse works.
                     74: 
                     75: The scheme is that each value has just one hash entry.  The
                     76: first_same_value and next_same_value chains are no longer needed.
                     77: 
                     78: For arithmetic, each hash table elt has the following slots:
                     79: 
                     80: * Operation.  This is an rtx code.
                     81: * Mode.
                     82: * Operands 0, 1 and 2.  These point to other hash table elements.
                     83: 
                     84: So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
                     85: first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
                     86: points to.  Then we put these elts into operands 0 and 1 of a new elt.
                     87: We put PLUS and SI into the new elt.
                     88: 
                     89: Registers and mem refs would never be entered into the table as such.
                     90: However, the values they contain would be entered.  There would be a
                     91: table indexed by regno which points at the hash entry for the value in
                     92: that reg.
                     93: 
                     94: The hash entry index now plays the role of a qty number.
                     95: We still need qty_first_reg, reg_next_eqv, etc. to record which regs
                     96: share a particular qty.
                     97: 
                     98: When a reg is used whose contents are unknown, we need to create a
                     99: hash table entry whose contents say "unknown", as a place holder for
                    100: whatever the reg contains.  If that reg is added to something, then
                    101: the hash entry for the sum will refer to the "unknown" entry.  Use
                    102: UNKNOWN for the rtx code in this entry.  This replaces make_new_qty.
                    103: 
                    104: For a constant, a unique hash entry would be made based on the
                    105: value of the constant.
                    106: 
                    107: What about MEM?  Each time a memory address is referenced, we need a
                    108: qty (a hash table elt) to represent what is in it.  (Just as for a
                    109: register.)  If this isn't known, create one, just as for a reg whose
                    110: contents are unknown.
                    111: 
                    112: We need a way to find all mem refs that still contain a certain value.
                    113: Do this with a chain of hash elts (for memory addresses) that point to
                    114: locations that hold the value.  The hash elt for the value itself should
                    115: point to the start of the chain.  It would be good for the hash elt
                    116: for an address to point to the hash elt for the contents of that address
                    117: (but this ptr can be null if the contents have never been entered).
                    118: 
                    119: With this data structure, nothing need ever be invalidated except
                    120: the lists of which regs or mems hold a particular value.  It is easy
                    121: to see if there is a reg or mem that is equiv to a particular value.
                    122: If the value is constant, it is always explicitly constant.
                    123: 
                    124: * Support more general tail-recursion among different functions.
                    125: 
                    126: This might be possible under certain circumstances, such as when
                    127: the argument lists of the functions have the same lengths.
                    128: Perhaps it could be done with a special declaration.
                    129: 
                    130: You would need to verify in the calling function that it does not
                    131: use the addresses of any local variables and does not use setjmp.
                    132: 
                    133: * Put short statics vars at low addresses and use short addressing mode?
                    134: 
                    135: Useful on the 68000/68020 and perhaps on the 32000 series,
                    136: provided one has a linker that works with the feature.
                    137: This is said to make a 15% speedup on the 68000.
                    138: 
                    139: * Keep global variables in registers.
                    140: 
                    141: Here is a scheme for doing this.  A global variable, or a local variable
                    142: whose address is taken, can be kept in a register for an entire function
                    143: if it does not use non-constant memory addresses and (for globals only)
                    144: does not call other functions.  If the entire function does not meet
                    145: this criterion, a loop may.
                    146: 
                    147: The VAR_DECL for such a variable would have to have two RTL expressions:
                    148: the true home in memory, and the pseudo-register used temporarily. 
                    149: It is necessary to emit insns to copy the memory location into the
                    150: pseudo-register at the beginning of the function or loop, and perhaps
                    151: back out at the end.  These insns should have REG_EQUIV notes so that,
                    152: if the pseudo-register does not get a hard register, it is spilled into
                    153: the memory location which exists in any case.
                    154: 
                    155: The easiest way to set up these insns is to modify the routine
                    156: put_var_into_stack so that it does not apply to the entire function
                    157: (sparing any loops which contain nothing dangerous) and to call it at
                    158: the end of the function regardless of where in the function the
                    159: address of a local variable is taken.  It would be called
                    160: unconditionally at the end of the function for all relevant global
                    161: variables.
                    162: 
                    163: For debugger output, the thing to do is to invent a new binding level
                    164: around the appropriate loop and define the variable name as a register
                    165: variable with that scope.
                    166: 
                    167: * Live-range splitting.
                    168: 
                    169: Currently a variable is allocated a hard register either for the full
                    170: extent of its use or not at all.  Sometimes it would be good to
                    171: allocate a variable a hard register for just part of a function; for
                    172: example, through a particular loop where the variable is mostly used,
                    173: or outside of a particular loop where the variable is not used.  (The
                    174: latter is nice because it might let the variable be in a register most
                    175: of the time even though the loop needs all the registers.)
                    176: 
                    177: It might not be very hard to do this in global.c when a variable
                    178: fails to get a hard register for its entire life span.
                    179: 
                    180: The first step is to find a loop in which the variable is live, but
                    181: which is not the whole life span or nearly so.  It's probably best to
                    182: use a loop in which the variable is heavily used.
                    183: 
                    184: Then create a new pseudo-register to represent the variable in that loop.
                    185: Substitute this for the old pseudo-register there, and insert move insns
                    186: to copy between the two at the loop entry and all exits.  (When several
                    187: such moves are inserted at the same place, some new feature should be
                    188: added to say that none of those registers conflict merely because of
                    189: overlap between the new moves.  And the reload pass should reorder them
                    190: so that a store precedes a load, for any given hard register.)
                    191: 
                    192: After doing this for all the reasonable candidates, run global-alloc
                    193: over again.  With luck, one of the two pseudo-registers will be fit
                    194: somewhere.  It may even have a much higher priority due to its reduced
                    195: life span.
                    196: 
                    197: There will be no room in general for the new pseudo-registers in
                    198: basic_block_live_at_start, so there will need to be a second such
                    199: matrix exclusively for the new ones.  Various other vectors indexed by
                    200: register number will have to be made bigger, or there will have to be
                    201: secondary extender vectors just for global-alloc.
                    202: 
                    203: A simple new feature could arrange that both pseudo-registers get the
                    204: same stack slot if they both fail to get hard registers.
                    205: 
                    206: Other compilers split live ranges when they are not connected, or
                    207: try to split off pieces `at the edge'.  I think splitting around loops
                    208: will provide more speedup.
                    209: 
                    210: Creating a fake binding block and a new like-named variable with
                    211: shorter life span and different address might succeed in describing
                    212: this technique for the debugger.
                    213: 
                    214: * Detect dead stores into memory?
                    215: 
                    216: A store into memory is dead if it is followed by another store into
                    217: the same location; and, in between, there is no reference to anything
                    218: that might be that location (including no reference to a variable
                    219: address).
                    220: 
                    221: * Loop optimization.
                    222: 
                    223: Strength reduction and iteration variable elimination could be
                    224: smarter.  They should know how to decide which iteration variables are
                    225: not worth making explicit because they can be computed as part of an
                    226: address calculation.  Based on this information, they should decide
                    227: when it is desirable to eliminate one iteration variable and create
                    228: another in its place.
                    229: 
                    230: It should be possible to compute what the value of an iteration
                    231: variable will be at the end of the loop, and eliminate the variable
                    232: within the loop by computing that value at the loop end.
                    233: 
                    234: When a loop has a simple increment that adds 1,
                    235: instead of jumping in after the increment,
                    236: decrement the loop count and jump to the increment.
                    237: This allows aob insns to be used.
                    238: 
                    239: * Using constraints on values.
                    240: 
                    241: Many operations could be simplified based on knowledge of the
                    242: minimum and maximum possible values of a register at any particular time.
                    243: These limits could come from the data types in the tree, via rtl generation,
                    244: or they can be deduced from operations that are performed.  For example,
                    245: the result of an `and' operation one of whose operands is 7 must be in
                    246: the range 0 to 7.  Compare instructions also tell something about the
                    247: possible values of the operand, in the code beyond the test.
                    248: 
                    249: Value constraints can be used to determine the results of a further
                    250: comparison.  They can also indicate that certain `and' operations are
                    251: redundant.  Constraints might permit a decrement and branch
                    252: instruction that checks zeroness to be used when the user has
                    253: specified to exit if negative.
                    254: 
                    255: * Smarter reload pass.
                    256: 
                    257: The reload pass as currently written can reload values only into registers
                    258: that are reserved for reloading.  This means that in order to use a
                    259: register for reloading it must spill everything out of that register.
                    260: 
                    261: It would be straightforward, though complicated, for reload1.c to keep
                    262: track, during its scan, of which hard registers were available at each
                    263: point in the function, and use for reloading even registers that were
                    264: free only at the point they were needed.  This would avoid much spilling
                    265: and make better code.
                    266: 
                    267: * Change the type of a variable.
                    268: 
                    269: Sometimes a variable is declared as `int', it is assigned only once
                    270: from a value of type `char', and then it is used only by comparison
                    271: against constants.  On many machines, better code would result if
                    272: the variable had type `char'.  If the compiler could detect this
                    273: case, it could change the declaration of the variable and change
                    274: all the places that use it.
                    275: 
                    276: * Better handling for very sparse switches.
                    277: 
                    278: There may be cases where it would be better to compile a switch
                    279: statement to use a fixed hash table rather than the current
                    280: combination of jump tables and binary search.
                    281: 
                    282: * Order of subexpressions.
                    283: 
                    284: It might be possible to make better code by paying attention
                    285: to the order in which to generate code for subexpressions of an expression.
                    286: 
                    287: * More code motion.
                    288: 
                    289: Consider hoisting common code up past conditional branches or
                    290: tablejumps.
                    291: 
                    292: * Trace scheduling.
                    293: 
                    294: This technique is said to be able to figure out which way a jump
                    295: will usually go, and rearrange the code to make that path the
                    296: faster one.
                    297: 
                    298: * Distributive law.
                    299: 
                    300: The C expression *(X + 4 * (Y + C)) compiles better on certain
                    301: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
                    302: modes.  It may be tricky to determine when, and for which machines, to
                    303: use each alternative.
                    304: 
                    305: Some work has been done on this, in combine.c.
                    306: 
                    307: * Can optimize by changing if (x) y; else z; into z; if (x) y;
                    308: if z and x do not interfere and z has no effects not undone by y.
                    309: This is desirable if z is faster than jumping.
                    310: 
                    311: * For a two-insn loop on the 68020, such as
                    312:   foo: movb    a2@+,a3@+
                    313:        jne     foo
                    314: it is better to insert dbeq d0,foo before the jne.
                    315: d0 can be a junk register.  The challenge is to fit this into
                    316: a portable framework: when can you detect this situation and
                    317: still be able to allocate a junk register?
                    318: 
                    319: 2. Simpler porting.
                    320: 
                    321: Right now, describing the target machine's instructions is done
                    322: cleanly, but describing its addressing mode is done with several
                    323: ad-hoc macro definitions.  Porting would be much easier if there were
                    324: an RTL description for addressing modes like that for instructions.
                    325: Tools analogous to genflags and genrecog would generate macros from
                    326: this description.
                    327: 
                    328: There would be one pattern in the address-description file for each
                    329: kind of addressing, and this pattern would have:
                    330: 
                    331:   * the RTL expression for the address
                    332:   * C code to verify its validity (since that may depend on
                    333:     the exact data).
                    334:   * C code to print the address in assembler language.
                    335:   * C code to convert the address into a valid one, if it is not valid.
                    336:     (This would replace LEGITIMIZE_ADDRESS).
                    337:   * Register constraints for all indeterminates that appear
                    338:     in the RTL expression.
                    339: 
                    340: 3. Other languages.
                    341: 
                    342: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
                    343: desirable.
                    344: 
                    345: Pascal, Modula-2 and Ada require the implementation of functions
                    346: within functions.  Some of the mechanisms for this already exist.
                    347: 
                    348: 4. More extensions.
                    349: 
                    350: * Generated unique labels.  Have some way of generating distinct labels
                    351: for use in extended asm statements.  I don't know what a good syntax would
                    352: be.
                    353: 
                    354: * A way of defining a structure containing a union, in which the choice of
                    355: union alternative is controlled by a previous structure component.
                    356: 
                    357: Here is a possible syntax for this.
                    358: 
                    359: struct foo {
                    360:   enum { INT, DOUBLE } code;
                    361:   auto union { case INT: int i; case DOUBLE: double d;} value : code;
                    362: };
                    363: 
                    364: * Allow constructor expressions as lvalues, like this:
                    365: 
                    366:        (struct foo) {a, b, c} = foo();
                    367: 
                    368: This would call foo, which returns a structure, and then store the
                    369: several components of the structure into the variables a, b, and c.
                    370: 
                    371: 5. Generalize the machine model.
                    372: 
                    373: * Some new compiler features may be needed to do a good job on machines
                    374: where static data needs to be addressed using base registers.
                    375: 
                    376: * Some machines have two stacks in different areas of memory, one used
                    377: for scalars and another for large objects.  The compiler does not
                    378: now have a way to understand this.
                    379: 
                    380: 6. Useful warnings.
                    381: 
                    382: * Warn about statements that are undefined because the order of
                    383: evaluation of increment operators makes a big difference.  Here is an
                    384: example:
                    385: 
                    386:     *foo++ = hack (*foo);
                    387: 
                    388: 7. Better documentation of how GCC works and how to port it.
                    389: 
                    390: Here is an outline proposed by Allan Adler.
                    391: 
                    392: I.    Overview of this document
                    393: II.   The machines on which GCC is implemented
                    394:     A. Prose description of those characteristics of target machines and
                    395:        their operating systems which are pertinent to the implementation
                    396:        of GCC.
                    397:        i. target machine characteristics
                    398:        ii. comparison of this system of machine characteristics with
                    399:            other systems of machine specification currently in use
                    400:     B. Tables of the characteristics of the target machines on which
                    401:        GCC is implemented.
                    402:     C. A priori restrictions on the values of characteristics of target 
                    403:        machines, with special reference to those parts of the source code
                    404:        which entail those restrictions
                    405:        i. restrictions on individual characteristics 
                    406:         ii. restrictions involving relations between various characteristics
                    407:     D. The use of GCC as a cross-compiler 
                    408:        i. cross-compilation to existing machines
                    409:        ii. cross-compilation to non-existent machines
                    410:     E. Assumptions which are made regarding the target machine
                    411:        i.  assumptions regarding the architecture of the target machine
                    412:        ii. assumptions regarding the operating system of the target machine
                    413:        iii. assumptions regarding software resident on the target machine
                    414:        iv. where in the source code these assumptions are in effect made
                    415: III.   A systematic approach to writing the files tm.h and xm.h
                    416:     A. Macros which require special care or skill
                    417:     B. Examples, with special reference to the underlying reasoning
                    418: IV.    A systematic approach to writing the machine description file md
                    419:     A. Minimal viable sets of insn descriptions
                    420:     B. Examples, with special reference to the underlying reasoning
                    421: V.     Uses of the file aux-output.c
                    422: VI.    Specification of what constitutes correct performance of an 
                    423:        implementation of GCC
                    424:     A. The components of GCC
                    425:     B. The itinerary of a C program through GCC
                    426:     C. A system of benchmark programs
                    427:     D. What your RTL and assembler should look like with these benchmarks
                    428:     E. Fine tuning for speed and size of compiled code
                    429: VII.   A systematic procedure for debugging an implementation of GCC
                    430:     A. Use of GDB
                    431:        i. the macros in the file .gdbinit for GCC
                    432:        ii. obstacles to the use of GDB
                    433:            a. functions implemented as macros can't be called in GDB
                    434:     B. Debugging without GDB
                    435:        i. How to turn off the normal operation of GCC and access specific
                    436:           parts of GCC
                    437:     C. Debugging tools
                    438:     D. Debugging the parser
                    439:        i. how machine macros and insn definitions affect the parser
                    440:     E. Debugging the recognizer
                    441:        i. how machine macros and insn definitions affect the recognizer
                    442: 
                    443: ditto for other components
                    444: 
                    445: VIII. Data types used by GCC, with special reference to restrictions not 
                    446:       specified in the formal definition of the data type
                    447: IX.   References to the literature for the algorithms used in GCC
                    448: 

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.