--- gcc/PROJECTS 2018/04/24 16:37:52 1.1
+++ gcc/PROJECTS 2018/04/24 16:51:24 1.1.1.5
@@ -1,5 +1,15 @@
 1. Better optimization.

+* Constants in unused inline functions
+
+It would be nice to delay output of string constants so that string
+constants mentioned in unused inline functions are never generated.
+Perhaps this would also take care of string constants in dead code.
+
+The difficulty is in finding a clean way for the RTL which refers
+to the constant (currently, only by an assembler symbol name)
+to point to the constant and cause it to be output.
+
 * More cse

 The techniques for doing full global cse are described in the
@@ -21,32 +31,61 @@ It is probably not hard to handle cse fr
 around to the beginning, and a few loops would be greatly sped up by this.

-* Iteration variables and strength reduction.
+* Support more general tail-recursion among different functions.

-The red dragon book describes standard techniques for these kinds
-of loop optimization. But be careful! These optimization techniques
-don't always make the code better. You need to avoid performing
-the standard transformations unless they are greatly worth while.
-
-In many common cases it is possible to deduce that an iteration
-variable is always positive during the loop. This information
-may make it possible to use decrement-and-branch instructions
-whose branch conditions are inconvenient. For example, the 68000
-`dbra' instruction branches if the value was not equal to zero.
-Therefore, it is not applicable to `for (i = 10; i >= 0; i--)'
-unless the compiler can know that I will never be negative
-before it is decremented.
-
-* Special local optimizations.
-
-The instruction combiner finds only certain classes of local optimizations.
-For example, it cannot use the 68020 instruction `cmp2' because it would
-not think to combine the instructions that would be equivalent to a `cmp2'.
-
-In order to take advantage of such instructions, the combiner would need
-special hints as to which instructions to consider combining. To be
-generally useful, this feature would have to be controlled somehow
-by new information in the machine description.
+This might be possible under certain circumstances, such as when
+the argument lists of the functions have the same lengths.
+Perhaps it could be done with a special declaration.
+
+You would need to verify in the calling function that it does not
+use the addresses of any local variables and does not use setjmp.
+
+* Put short static vars at low addresses and use short addressing mode?
+Useful on the 68000/68020 and perhaps on the 32000 series,
+provided one has a linker that works with the feature.
+This is said to make a 15% speedup on the 68000.
+This brings to mind Hayes' changes for Stanford MIPS.
+
+* Detect dead stores into memory?
+
+A store into memory is dead if it is followed by another store into
+the same location; and, in between, there is no reference to anything
+that might be that location (including no reference to a variable
+address).
+
+* Loop optimization.
+
+Strength reduction and iteration variable elimination could be
+smarter. They should know how to decide which iteration variables are
+not worth making explicit because they can be computed as part of an
+address calculation. Based on this information, they should decide
+when it is desirable to eliminate one iteration variable and create
+another in its place.
+
+It should be possible to compute what the value of an iteration
+variable will be at the end of the loop, and eliminate the variable
+within the loop by computing that value at the loop end.
+
+When a loop has a simple increment that adds 1,
+instead of jumping in after the increment,
+decrement the loop count and jump to the increment.
+This allows aob insns to be used.
+
+* Using constraints on values.
+
+Many operations could be simplified based on knowledge of the
+minimum and maximum possible values of a register at any particular time.
+These limits could come from the data types in the tree, via rtl generation,
+or they can be deduced from operations that are performed. For example,
+the result of an `and' operation one of whose operands is 7 must be in
+the range 0 to 7. Compare instructions also tell something about the
+possible values of the operand, in the code beyond the test.
+
+Value constraints can be used to determine the results of a further
+comparison. They can also indicate that certain `and' operations are
+redundant. Constraints might permit a decrement and branch
+instruction that checks zeroness to be used when the user has
+specified to exit if negative.

 * Smarter reload pass.

@@ -74,18 +113,61 @@ all the places that use it.
 It might be possible to make better code by paying attention to the order in which to generate code for subexpressions of an expression.

-* Better code for switch statements.
+* More code motion.
+
+Consider hoisting common code up past conditional branches or
+tablejumps.

-If a switch statement has only a few cases, a sequence of conditional
-branches is generated for it, rather than a jump table. It would
-be better to output a binary tree of branches.
+* Trace scheduling.
+
+This technique is said to be able to figure out which way a jump
+will usually go, and rearrange the code to make that path the
+faster one.

 * Distributive law.

-*(X + 4 * (Y + C)) compiles better as *(X + 4*C + 4*Y)
-on some machines because of known addressing modes.
-It may be tricky to determine when, and for which machines,
-to use each alternative.
+The C expression *(X + 4 * (Y + C)) compiles better on certain
+machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
+modes. It may be tricky to determine when, and for which machines, to
+use each alternative.
+
+Some work has been done on this, in combine.c.
+
+* Jump-execute-next.
+
+Many recent machines have jumps which optionally execute the following
+instruction before the instruction jumped to, either conditionally or
+unconditionally. To take advantage of this capability requires a new
+compiler pass that would reorder instructions when possible. After
+reload may be a good place for it.
+
+On some machines, the result of a load from memory is not available
+until after the following instruction. The easiest way to support
+these machines is to output each RTL load instruction as two assembler
+instructions, the second being a no-op. Putting useful instructions
+after the load instructions may be a similar task to putting them
+after jump instructions.
+
+* Pipeline scheduling.
+
+On many machines, code gets faster if instructions are reordered
+so that pipelines are kept full. Doing the best possible job of this
+requires knowing which functional units each kind of instruction executes
+in and how long the functional unit stays busy with it. Then the
+goal is to reorder the instructions to keep many functional units
+busy but never feed them so fast they must wait.
+
+* Can optimize by changing if (x) y; else z; into z; if (x) y;
+if z and x do not interfere and z has no effects not undone by y.
+This is desirable if z is faster than jumping.
+
+* For a two-insn loop on the 68020, such as
+  foo: movb a2@+,a3@+
+       jne foo
+it is better to insert dbeq d0,foo before the jne.
+d0 can be a junk register. The challenge is to fit this into
+a portable framework: when can you detect this situation and
+still be able to allocate a junk register?

 2. Simpler porting.

@@ -110,36 +192,163 @@ kind of addressing, and this pattern wou

 3. Other languages.

-Front ends for Pascal, Fortran, Algol, Cobol and Ada are desirable.
+Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
+desirable.

-Pascal requires the implementation of functions within functions.
-Some of the mechanisms for this already exist.
+Pascal, Modula-2 and Ada require the implementation of functions
+within functions. Some of the mechanisms for this already exist.

 4. Generalize the machine model.

-4.A. Parameters in registers.
-
-One some machines, conventions are that some parameters are passed
-in general registers. The compiler currently cannot handle this.
-
-This requires changes in the code in expr.c for function calls.
-For function entry, changes are required in stmt.c, and in
-layout_parms, and perhaps also in final and in register allocation,
-but the last should be minor.
-
-Where stmt.c now copies the stack slot into a pseudo register,
-instead copy the special argument register into a pseudo register.
-Use the pseudo register throughout the body of the function to
-represent the parameter. That way, parameters can still be spilled
-to the stack.
-
-4.B. Jump-execute-next.
-
-Many recent machines have jumps which execute the following instruction
-before the instruction jumped to. To take advantage of this capability
-requires a new compiler pass that would reorder instructions when possible.
-After reload is a good place for it.
+* Some new compiler features may be needed to do a good job on machines
+where static data needs to be addressed using base registers.

-5. Add a profiling feature like Berkeley's -pg,
-or other debugging and measurement features.
+* Some machines have two stacks in different areas of memory, one used
+for scalars and another for large objects. The compiler does not
+now have a way to understand this.
+
+5. Precompilation of header files.
+
+In the future, many programs will use thousands of lines of header files.
+Compiling the headers might be slower than compiling the guts of any one
+source file. Here is a scheme for precompiling header files to make
+compilation faster for a sequence of headers which is often used.
+
+A precompiled header corresponds to a sequence of header files.
The
+preprocessor recognizes when the input starts with a sequence of
+`#include' commands and searches a data base for a precompiled header
+corresponding to that sequence. The modtimes of all these files are
+stored in the data base so that one can tell whether the precompiled
+header is still valid.
+
+For robustness, each directory should have its own collection of
+precompiled headers and its own data base of them. Probably each
+precompiled header would be a file and the data base would be one
+more file.
+
+The data base records the entire collection of predefined macros and
+their definitions, except for __FILE__, __LINE__ and __DATE__, for
+each precompiled header. If this collection does not match the setup
+at the start of the current compilation (including the results of -D
+and -U switches), the precompiled header is inapplicable. It might
+be possible to have distinct precompiled headers for the same sequence
+of header files but different collections of predefined macros.
+
+The state of any option that affects macro processing, such as -ansi
+or -traditional, must also be recorded, and the precompiled header is
+valid only if these options match.
+
+The precompiled header contains an ordered series of strings. Some
+strings are marked "unconditional"; these must be compiled each time
+the precompiled header is used. Other strings have keys, which
+are identifiers. A string with keys must be compiled if at least one
+of its keys is mentioned in the input. The order these strings appear
+in the precompiled header is called their intrinsic order.
+
+The C preprocessor reads in the precompiled header file and scans all
+the strings, making for each key an entry in the same symbol table
+used for macros, pointing at a list of all the strings for which it is
+a key. Each string must have a flag (one flag per string, not one per
+key per string).
The same code in `rescan' that detects references to
+macros would detect a reference to a key and flag all of the strings
+that it belongs to as needing to be output.
+
+Each of these strings is immediately recursively macroexpanded (i.e.
+`rescan' is called), but the output from this is discarded. The
+expansion is to detect any other keys mentioned in the string, and to
+define any macros for which the string contains a #define. The key's
+symbol table entry is deleted to save time if the key is
+encountered again, and to avoid an infinite recursion.
+
+The unconditional strings are macroexpanded with `rescan' (but the
+output is discarded) at some time before anything is actually output.
+
+At the end of compilation, before any of the actual input text is
+output, the list of strings is scanned in the intrinsic order, and
+each string that is unconditional or flagged is output verbatim,
+except that any #define lines are discarded.
+
+Precompiled headers would be constructed by explicit request with a
+special tool. The first step is to run cpp on the sequence of header
+files' contents. This would use a new option that would cause all
+#define lines to be output unchanged as well as defining the macro.
+The second step is to divide the output into strings, some keyed and
+some unconditional. This division is done without changing the order
+of the text being divided up.
+
+JNC@lcs.mit.edu has some ideas on this subject also.
+
+6. Other possibly nice features.
+
+* cpp could have a #provide directive.
+#provide would have the same syntax as #include,
+and it would nullify any future #include directive
+with the same argument. Thus, the file foo.h
+could contain #provide to prevent itself from
+being included twice.
+
+This is much cleaner than the alternative sometimes implemented,
+which is to require the user to use something other than #include
+in order to ensure inclusion only once.
+
+7. Better documentation of how GCC works and how to port it.
+
+Here is an outline proposed by Allan Adler.
+
+I. Overview of this document
+II. The machines on which GCC is implemented
+    A. Prose description of those characteristics of target machines and
+       their operating systems which are pertinent to the implementation
+       of GCC.
+        i. target machine characteristics
+        ii. comparison of this system of machine characteristics with
+            other systems of machine specification currently in use
+    B. Tables of the characteristics of the target machines on which
+       GCC is implemented.
+    C. A priori restrictions on the values of characteristics of target
+       machines, with special reference to those parts of the source code
+       which entail those restrictions
+        i. restrictions on individual characteristics
+        ii. restrictions involving relations between various characteristics
+    D. The use of GCC as a cross-compiler
+        i. cross-compilation to existing machines
+        ii. cross-compilation to non-existent machines
+    E. Assumptions which are made regarding the target machine
+        i. assumptions regarding the architecture of the target machine
+        ii. assumptions regarding the operating system of the target machine
+        iii. assumptions regarding software resident on the target machine
+        iv. where in the source code these assumptions are in effect made
+III. A systematic approach to writing the files tm.h and xm.h
+    A. Macros which require special care or skill
+    B. Examples, with special reference to the underlying reasoning
+IV. A systematic approach to writing the machine description file md
+    A. Minimal viable sets of insn descriptions
+    B. Examples, with special reference to the underlying reasoning
+V. Uses of the file aux-output.c
+VI. Specification of what constitutes correct performance of an
+    implementation of GCC
+    A. The components of GCC
+    B. The itinerary of a C program through GCC
+    C. A system of benchmark programs
+    D. What your RTL and assembler should look like with these benchmarks
+    E.
Fine tuning for speed and size of compiled code
+VII. A systematic procedure for debugging an implementation of GCC
+    A. Use of GDB
+        i. the macros in the file .gdbinit for GCC
+        ii. obstacles to the use of GDB
+            a. functions implemented as macros can't be called in GDB
+    B. Debugging without GDB
+        i. How to turn off the normal operation of GCC and access specific
+           parts of GCC
+    C. Debugging tools
+    D. Debugging the parser
+        i. how machine macros and insn definitions affect the parser
+    E. Debugging the recognizer
+        i. how machine macros and insn definitions affect the recognizer
+
+ditto for other components
+
+VIII. Data types used by GCC, with special reference to restrictions not
+      specified in the formal definition of the data type
+IX. References to the literature for the algorithms used in GCC