GNUtools/cc/PROJECTS - annotate

Return to PROJECTS CVS log
Up to [Apple XNU] / GNUtools / cc
Annotation of GNUtools/cc/PROJECTS, revision 1.1

1.1     ! root        1: 0. Improved efficiency.
        !             2: 
        !             3: * Parse and output array initializers an element at a time, freeing
        !             4: storage after each, instead of parsing the whole initializer first and
        !             5: then outputting.  This would reduce memory usage for large
        !             6: initializers.
        !             7: 
        !             8: * See if the techniques describe in Oct 1991 SIGPLAN Notices
        !             9: (Frazer and Hanson) are applicable to GCC.
        !            10: 
        !            11: 1. Better optimization.
        !            12: 
        !            13: * Constants in unused inline functions
        !            14: 
        !            15: It would be nice to delay output of string constants so that string
        !            16: constants mentioned in unused inline functions are never generated.
        !            17: Perhaps this would also take care of string constants in dead code.
        !            18: 
        !            19: The difficulty is in finding a clean way for the RTL which refers
        !            20: to the constant (currently, only by an assembler symbol name)
        !            21: to point to the constant and cause it to be output.
        !            22: 
        !            23: * More cse
        !            24: 
        !            25: The techniques for doing full global cse are described in the red
        !            26: dragon book, or (a different version) in Frederick Chow's thesis from
        !            27: Stanford.  It is likely to be slow and use a lot of memory, but it
        !            28: might be worth offering as an additional option.
        !            29: 
        !            30: It is probably possible to extend cse to a few very frequent cases
        !            31: without so much expense.
        !            32: 
        !            33: For example, it is not very hard to handle cse through if-then
        !            34: statements with no else clauses.  Here's how to do it.  On reaching a
        !            35: label, notice that the label's use-count is 1 and that the last
        !            36: preceding jump jumps conditionally to this label.  Now you know it
        !            37: is a simple if-then statement.  Remove from the hash table
        !            38: all the expressions that were entered since that jump insn
        !            39: and you can continue with cse.
        !            40: 
        !            41: It is probably not hard to handle cse from the end of a loop
        !            42: around to the beginning, and a few loops would be greatly sped
        !            43: up by this.
        !            44: 
        !            45: * Optimize a sequence of if statements whose conditions are exclusive.
        !            46: 
        !            47: It is possible to optimize 
        !            48: 
        !            49:     if (x == 1) ...;
        !            50:     if (x == 2) ...;
        !            51:     if (x == 3) ...;
        !            52: 
        !            53: into
        !            54: 
        !            55:     if (x == 1) ...;
        !            56:     else if (x == 2) ...;
        !            57:     else if (x == 3) ...;
        !            58: 
        !            59: provided that x is not altered by the contents of the if statements.
        !            60: 
        !            61: It's not certain whether this is worth doing.  Perhaps programmers
        !            62: nearly always write the else's themselves, leaving few opportunities
        !            63: to improve anything.
        !            64: 
        !            65: * Un-cse.
        !            66: 
        !            67: Perhaps we should have an un-cse step right after cse, which tries to
        !            68: replace a reg with its value if the value can be substituted for the
        !            69: reg everywhere, if that looks like an improvement.  Which is if the
        !            70: reg is used only a few times.  Use rtx_cost to determine if the
        !            71: change is really an improvement.
        !            72: 
        !            73: * Clean up how cse works.
        !            74: 
        !            75: The scheme is that each value has just one hash entry.  The
        !            76: first_same_value and next_same_value chains are no longer needed.
        !            77: 
        !            78: For arithmetic, each hash table elt has the following slots:
        !            79: 
        !            80: * Operation.  This is an rtx code.
        !            81: * Mode.
        !            82: * Operands 0, 1 and 2.  These point to other hash table elements.
        !            83: 
        !            84: So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
        !            85: first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
        !            86: points to.  Then we put these elts into operands 0 and 1 of a new elt.
        !            87: We put PLUS and SI into the new elt.
        !            88: 
        !            89: Registers and mem refs would never be entered into the table as such.
        !            90: However, the values they contain would be entered.  There would be a
        !            91: table indexed by regno which points at the hash entry for the value in
        !            92: that reg.
        !            93: 
        !            94: The hash entry index now plays the role of a qty number.
        !            95: We still need qty_first_reg, reg_next_eqv, etc. to record which regs
        !            96: share a particular qty.
        !            97: 
        !            98: When a reg is used whose contents are unknown, we need to create a
        !            99: hash table entry whose contents say "unknown", as a place holder for
        !           100: whatever the reg contains.  If that reg is added to something, then
        !           101: the hash entry for the sum will refer to the "unknown" entry.  Use
        !           102: UNKNOWN for the rtx code in this entry.  This replaces make_new_qty.
        !           103: 
        !           104: For a constant, a unique hash entry would be made based on the
        !           105: value of the constant.
        !           106: 
        !           107: What about MEM?  Each time a memory address is referenced, we need a
        !           108: qty (a hash table elt) to represent what is in it.  (Just as for a
        !           109: register.)  If this isn't known, create one, just as for a reg whose
        !           110: contents are unknown.
        !           111: 
        !           112: We need a way to find all mem refs that still contain a certain value.
        !           113: Do this with a chain of hash elts (for memory addresses) that point to
        !           114: locations that hold the value.  The hash elt for the value itself should
        !           115: point to the start of the chain.  It would be good for the hash elt
        !           116: for an address to point to the hash elt for the contents of that address
        !           117: (but this ptr can be null if the contents have never been entered).
        !           118: 
        !           119: With this data structure, nothing need ever be invalidated except
        !           120: the lists of which regs or mems hold a particular value.  It is easy
        !           121: to see if there is a reg or mem that is equiv to a particular value.
        !           122: If the value is constant, it is always explicitly constant.
        !           123: 
        !           124: * Support more general tail-recursion among different functions.
        !           125: 
        !           126: This might be possible under certain circumstances, such as when
        !           127: the argument lists of the functions have the same lengths.
        !           128: Perhaps it could be done with a special declaration.
        !           129: 
        !           130: You would need to verify in the calling function that it does not
        !           131: use the addresses of any local variables and does not use setjmp.
        !           132: 
        !           133: * Put short statics vars at low addresses and use short addressing mode?
        !           134: 
        !           135: Useful on the 68000/68020 and perhaps on the 32000 series,
        !           136: provided one has a linker that works with the feature.
        !           137: This is said to make a 15% speedup on the 68000.
        !           138: 
        !           139: * Keep global variables in registers.
        !           140: 
        !           141: Here is a scheme for doing this.  A global variable, or a local variable
        !           142: whose address is taken, can be kept in a register for an entire function
        !           143: if it does not use non-constant memory addresses and (for globals only)
        !           144: does not call other functions.  If the entire function does not meet
        !           145: this criterion, a loop may.
        !           146: 
        !           147: The VAR_DECL for such a variable would have to have two RTL expressions:
        !           148: the true home in memory, and the pseudo-register used temporarily. 
        !           149: It is necessary to emit insns to copy the memory location into the
        !           150: pseudo-register at the beginning of the function or loop, and perhaps
        !           151: back out at the end.  These insns should have REG_EQUIV notes so that,
        !           152: if the pseudo-register does not get a hard register, it is spilled into
        !           153: the memory location which exists in any case.
        !           154: 
        !           155: The easiest way to set up these insns is to modify the routine
        !           156: put_var_into_stack so that it does not apply to the entire function
        !           157: (sparing any loops which contain nothing dangerous) and to call it at
        !           158: the end of the function regardless of where in the function the
        !           159: address of a local variable is taken.  It would be called
        !           160: unconditionally at the end of the function for all relevant global
        !           161: variables.
        !           162: 
        !           163: For debugger output, the thing to do is to invent a new binding level
        !           164: around the appropriate loop and define the variable name as a register
        !           165: variable with that scope.
        !           166: 
        !           167: * Live-range splitting.
        !           168: 
        !           169: Currently a variable is allocated a hard register either for the full
        !           170: extent of its use or not at all.  Sometimes it would be good to
        !           171: allocate a variable a hard register for just part of a function; for
        !           172: example, through a particular loop where the variable is mostly used,
        !           173: or outside of a particular loop where the variable is not used.  (The
        !           174: latter is nice because it might let the variable be in a register most
        !           175: of the time even though the loop needs all the registers.)
        !           176: 
        !           177: It might not be very hard to do this in global.c when a variable
        !           178: fails to get a hard register for its entire life span.
        !           179: 
        !           180: The first step is to find a loop in which the variable is live, but
        !           181: which is not the whole life span or nearly so.  It's probably best to
        !           182: use a loop in which the variable is heavily used.
        !           183: 
        !           184: Then create a new pseudo-register to represent the variable in that loop.
        !           185: Substitute this for the old pseudo-register there, and insert move insns
        !           186: to copy between the two at the loop entry and all exits.  (When several
        !           187: such moves are inserted at the same place, some new feature should be
        !           188: added to say that none of those registers conflict merely because of
        !           189: overlap between the new moves.  And the reload pass should reorder them
        !           190: so that a store precedes a load, for any given hard register.)
        !           191: 
        !           192: After doing this for all the reasonable candidates, run global-alloc
        !           193: over again.  With luck, one of the two pseudo-registers will be fit
        !           194: somewhere.  It may even have a much higher priority due to its reduced
        !           195: life span.
        !           196: 
        !           197: There will be no room in general for the new pseudo-registers in
        !           198: basic_block_live_at_start, so there will need to be a second such
        !           199: matrix exclusively for the new ones.  Various other vectors indexed by
        !           200: register number will have to be made bigger, or there will have to be
        !           201: secondary extender vectors just for global-alloc.
        !           202: 
        !           203: A simple new feature could arrange that both pseudo-registers get the
        !           204: same stack slot if they both fail to get hard registers.
        !           205: 
        !           206: Other compilers split live ranges when they are not connected, or
        !           207: try to split off pieces `at the edge'.  I think splitting around loops
        !           208: will provide more speedup.
        !           209: 
        !           210: Creating a fake binding block and a new like-named variable with
        !           211: shorter life span and different address might succeed in describing
        !           212: this technique for the debugger.
        !           213: 
        !           214: * Detect dead stores into memory?
        !           215: 
        !           216: A store into memory is dead if it is followed by another store into
        !           217: the same location; and, in between, there is no reference to anything
        !           218: that might be that location (including no reference to a variable
        !           219: address).
        !           220: 
        !           221: * Loop optimization.
        !           222: 
        !           223: Strength reduction and iteration variable elimination could be
        !           224: smarter.  They should know how to decide which iteration variables are
        !           225: not worth making explicit because they can be computed as part of an
        !           226: address calculation.  Based on this information, they should decide
        !           227: when it is desirable to eliminate one iteration variable and create
        !           228: another in its place.
        !           229: 
        !           230: It should be possible to compute what the value of an iteration
        !           231: variable will be at the end of the loop, and eliminate the variable
        !           232: within the loop by computing that value at the loop end.
        !           233: 
        !           234: When a loop has a simple increment that adds 1,
        !           235: instead of jumping in after the increment,
        !           236: decrement the loop count and jump to the increment.
        !           237: This allows aob insns to be used.
        !           238: 
        !           239: * Using constraints on values.
        !           240: 
        !           241: Many operations could be simplified based on knowledge of the
        !           242: minimum and maximum possible values of a register at any particular time.
        !           243: These limits could come from the data types in the tree, via rtl generation,
        !           244: or they can be deduced from operations that are performed.  For example,
        !           245: the result of an `and' operation one of whose operands is 7 must be in
        !           246: the range 0 to 7.  Compare instructions also tell something about the
        !           247: possible values of the operand, in the code beyond the test.
        !           248: 
        !           249: Value constraints can be used to determine the results of a further
        !           250: comparison.  They can also indicate that certain `and' operations are
        !           251: redundant.  Constraints might permit a decrement and branch
        !           252: instruction that checks zeroness to be used when the user has
        !           253: specified to exit if negative.
        !           254: 
        !           255: * Smarter reload pass.
        !           256: 
        !           257: The reload pass as currently written can reload values only into registers
        !           258: that are reserved for reloading.  This means that in order to use a
        !           259: register for reloading it must spill everything out of that register.
        !           260: 
        !           261: It would be straightforward, though complicated, for reload1.c to keep
        !           262: track, during its scan, of which hard registers were available at each
        !           263: point in the function, and use for reloading even registers that were
        !           264: free only at the point they were needed.  This would avoid much spilling
        !           265: and make better code.
        !           266: 
        !           267: * Change the type of a variable.
        !           268: 
        !           269: Sometimes a variable is declared as `int', it is assigned only once
        !           270: from a value of type `char', and then it is used only by comparison
        !           271: against constants.  On many machines, better code would result if
        !           272: the variable had type `char'.  If the compiler could detect this
        !           273: case, it could change the declaration of the variable and change
        !           274: all the places that use it.
        !           275: 
        !           276: * Better handling for very sparse switches.
        !           277: 
        !           278: There may be cases where it would be better to compile a switch
        !           279: statement to use a fixed hash table rather than the current
        !           280: combination of jump tables and binary search.
        !           281: 
        !           282: * Order of subexpressions.
        !           283: 
        !           284: It might be possible to make better code by paying attention
        !           285: to the order in which to generate code for subexpressions of an expression.
        !           286: 
        !           287: * More code motion.
        !           288: 
        !           289: Consider hoisting common code up past conditional branches or
        !           290: tablejumps.
        !           291: 
        !           292: * Trace scheduling.
        !           293: 
        !           294: This technique is said to be able to figure out which way a jump
        !           295: will usually go, and rearrange the code to make that path the
        !           296: faster one.
        !           297: 
        !           298: * Distributive law.
        !           299: 
        !           300: The C expression *(X + 4 * (Y + C)) compiles better on certain
        !           301: machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
        !           302: modes.  It may be tricky to determine when, and for which machines, to
        !           303: use each alternative.
        !           304: 
        !           305: Some work has been done on this, in combine.c.
        !           306: 
        !           307: * Can optimize by changing if (x) y; else z; into z; if (x) y;
        !           308: if z and x do not interfere and z has no effects not undone by y.
        !           309: This is desirable if z is faster than jumping.
        !           310: 
        !           311: * For a two-insn loop on the 68020, such as
        !           312:   foo: movb    a2@+,a3@+
        !           313:        jne     foo
        !           314: it is better to insert dbeq d0,foo before the jne.
        !           315: d0 can be a junk register.  The challenge is to fit this into
        !           316: a portable framework: when can you detect this situation and
        !           317: still be able to allocate a junk register?
        !           318: 
        !           319: 2. Simpler porting.
        !           320: 
        !           321: Right now, describing the target machine's instructions is done
        !           322: cleanly, but describing its addressing mode is done with several
        !           323: ad-hoc macro definitions.  Porting would be much easier if there were
        !           324: an RTL description for addressing modes like that for instructions.
        !           325: Tools analogous to genflags and genrecog would generate macros from
        !           326: this description.
        !           327: 
        !           328: There would be one pattern in the address-description file for each
        !           329: kind of addressing, and this pattern would have:
        !           330: 
        !           331:   * the RTL expression for the address
        !           332:   * C code to verify its validity (since that may depend on
        !           333:     the exact data).
        !           334:   * C code to print the address in assembler language.
        !           335:   * C code to convert the address into a valid one, if it is not valid.
        !           336:     (This would replace LEGITIMIZE_ADDRESS).
        !           337:   * Register constraints for all indeterminates that appear
        !           338:     in the RTL expression.
        !           339: 
        !           340: 3. Other languages.
        !           341: 
        !           342: Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
        !           343: desirable.
        !           344: 
        !           345: Pascal, Modula-2 and Ada require the implementation of functions
        !           346: within functions.  Some of the mechanisms for this already exist.
        !           347: 
        !           348: 4. More extensions.
        !           349: 
        !           350: * Generated unique labels.  Have some way of generating distinct labels
        !           351: for use in extended asm statements.  I don't know what a good syntax would
        !           352: be.
        !           353: 
        !           354: * A way of defining a structure containing a union, in which the choice of
        !           355: union alternative is controlled by a previous structure component.
        !           356: 
        !           357: Here is a possible syntax for this.
        !           358: 
        !           359: struct foo {
        !           360:   enum { INT, DOUBLE } code;
        !           361:   auto union { case INT: int i; case DOUBLE: double d;} value : code;
        !           362: };
        !           363: 
        !           364: * Allow constructor expressions as lvalues, like this:
        !           365: 
        !           366:        (struct foo) {a, b, c} = foo();
        !           367: 
        !           368: This would call foo, which returns a structure, and then store the
        !           369: several components of the structure into the variables a, b, and c.
        !           370: 
        !           371: 5. Generalize the machine model.
        !           372: 
        !           373: * Some new compiler features may be needed to do a good job on machines
        !           374: where static data needs to be addressed using base registers.
        !           375: 
        !           376: * Some machines have two stacks in different areas of memory, one used
        !           377: for scalars and another for large objects.  The compiler does not
        !           378: now have a way to understand this.
        !           379: 
        !           380: 6. Useful warnings.
        !           381: 
        !           382: * Warn about statements that are undefined because the order of
        !           383: evaluation of increment operators makes a big difference.  Here is an
        !           384: example:
        !           385: 
        !           386:     *foo++ = hack (*foo);
        !           387: 
        !           388: 7. Better documentation of how GCC works and how to port it.
        !           389: 
        !           390: Here is an outline proposed by Allan Adler.
        !           391: 
        !           392: I.    Overview of this document
        !           393: II.   The machines on which GCC is implemented
        !           394:     A. Prose description of those characteristics of target machines and
        !           395:        their operating systems which are pertinent to the implementation
        !           396:        of GCC.
        !           397:        i. target machine characteristics
        !           398:        ii. comparison of this system of machine characteristics with
        !           399:            other systems of machine specification currently in use
        !           400:     B. Tables of the characteristics of the target machines on which
        !           401:        GCC is implemented.
        !           402:     C. A priori restrictions on the values of characteristics of target 
        !           403:        machines, with special reference to those parts of the source code
        !           404:        which entail those restrictions
        !           405:        i. restrictions on individual characteristics 
        !           406:         ii. restrictions involving relations between various characteristics
        !           407:     D. The use of GCC as a cross-compiler 
        !           408:        i. cross-compilation to existing machines
        !           409:        ii. cross-compilation to non-existent machines
        !           410:     E. Assumptions which are made regarding the target machine
        !           411:        i.  assumptions regarding the architecture of the target machine
        !           412:        ii. assumptions regarding the operating system of the target machine
        !           413:        iii. assumptions regarding software resident on the target machine
        !           414:        iv. where in the source code these assumptions are in effect made
        !           415: III.   A systematic approach to writing the files tm.h and xm.h
        !           416:     A. Macros which require special care or skill
        !           417:     B. Examples, with special reference to the underlying reasoning
        !           418: IV.    A systematic approach to writing the machine description file md
        !           419:     A. Minimal viable sets of insn descriptions
        !           420:     B. Examples, with special reference to the underlying reasoning
        !           421: V.     Uses of the file aux-output.c
        !           422: VI.    Specification of what constitutes correct performance of an 
        !           423:        implementation of GCC
        !           424:     A. The components of GCC
        !           425:     B. The itinerary of a C program through GCC
        !           426:     C. A system of benchmark programs
        !           427:     D. What your RTL and assembler should look like with these benchmarks
        !           428:     E. Fine tuning for speed and size of compiled code
        !           429: VII.   A systematic procedure for debugging an implementation of GCC
        !           430:     A. Use of GDB
        !           431:        i. the macros in the file .gdbinit for GCC
        !           432:        ii. obstacles to the use of GDB
        !           433:            a. functions implemented as macros can't be called in GDB
        !           434:     B. Debugging without GDB
        !           435:        i. How to turn off the normal operation of GCC and access specific
        !           436:           parts of GCC
        !           437:     C. Debugging tools
        !           438:     D. Debugging the parser
        !           439:        i. how machine macros and insn definitions affect the parser
        !           440:     E. Debugging the recognizer
        !           441:        i. how machine macros and insn definitions affect the recognizer
        !           442: 
        !           443: ditto for other components
        !           444: 
        !           445: VIII. Data types used by GCC, with special reference to restrictions not 
        !           446:       specified in the formal definition of the data type
        !           447: IX.   References to the literature for the algorithms used in GCC
        !           448:
unix.superglobalmegacorp.com
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.