--- gcc/PROJECTS 2018/04/24 16:38:49 1.1.1.2
+++ gcc/PROJECTS 2018/04/24 16:52:25 1.1.1.6
@@ -1,5 +1,22 @@
+0. Improved efficiency.
+
+* Parse and output array initializers an element at a time, freeing
+storage after each, instead of parsing the whole initializer first and
+then outputting. This would reduce memory usage for large
+initializers.
+
 1. Better optimization.
 
+* Constants in unused inline functions
+
+It would be nice to delay output of string constants so that string
+constants mentioned in unused inline functions are never generated.
+Perhaps this would also take care of string constants in dead code.
+
+The difficulty is in finding a clean way for the RTL which refers
+to the constant (currently, only by an assembler symbol name)
+to point to the constant and cause it to be output.
+
 * More cse
 
 The techniques for doing full global cse are described in the
@@ -21,21 +38,45 @@ It is probably not hard to handle cse fr
 around to the beginning, and a few loops would be greatly sped up by
 this.
 
-* Iteration variables and strength reduction.
+* Support more general tail-recursion among different functions.
 
-The red dragon book describes standard techniques for these kinds
-of loop optimization. But be careful! These optimization techniques
-don't always make the code better. You need to avoid performing
-the standard transformations unless they are greatly worth while.
-
-In many common cases it is possible to deduce that an iteration
-variable is always positive during the loop. This information
-may make it possible to use decrement-and-branch instructions
-whose branch conditions are inconvenient. For example, the 68000
-`dbra' instruction branches if the value was not equal to zero.
-Therefore, it is not applicable to `for (i = 10; i >= 0; i--)'
-unless the compiler can know that I will never be negative
-before it is decremented.
+This might be possible under certain circumstances, such as when
+the argument lists of the functions have the same lengths.
+Perhaps it could be done with a special declaration.
+
+You would need to verify in the calling function that it does not
+use the addresses of any local variables and does not use setjmp.
+
+* Put short static vars at low addresses and use short addressing mode?
+Useful on the 68000/68020 and perhaps on the 32000 series,
+provided one has a linker that works with the feature.
+This is said to make a 15% speedup on the 68000.
+This brings to mind Hayes' changes for Stanford MIPS.
+
+* Detect dead stores into memory?
+
+A store into memory is dead if it is followed by another store into
+the same location; and, in between, there is no reference to anything
+that might be that location (including no reference to a variable
+address).
+
+* Loop optimization.
+
+Strength reduction and iteration variable elimination could be
+smarter. They should know how to decide which iteration variables are
+not worth making explicit because they can be computed as part of an
+address calculation. Based on this information, they should decide
+when it is desirable to eliminate one iteration variable and create
+another in its place.
+
+It should be possible to compute what the value of an iteration
+variable will be at the end of the loop, and eliminate the variable
+within the loop by computing that value at the loop end.
+
+When a loop has a simple increment that adds 1,
+instead of jumping in after the increment,
+decrement the loop count and jump to the increment.
+This allows aob insns to be used.
 
 * Using constraints on values.
 
@@ -79,11 +120,16 @@ all the places that use it.
 It might be possible to make better code by paying attention to the
 order in which to generate code for subexpressions of an expression.
 
-* Better code for switch statements.
+* More code motion.
+
+Consider hoisting common code up past conditional branches or
+tablejumps.
+
+* Trace scheduling.
 
-If a switch statement has only a few cases, a sequence of conditional
-branches is generated for it, rather than a jump table. It would
-be better to output a binary tree of branches.
+This technique is said to be able to figure out which way a jump
+will usually go, and rearrange the code to make that path the
+faster one.
 
 * Distributive law.
 
@@ -92,6 +138,8 @@ machines if rewritten as *(X + 4*C + 4*Y
 modes. It may be tricky to determine when, and for which machines, to
 use each alternative.
 
+Some work has been done on this, in combine.c.
+
 * Jump-execute-next.
 
 Many recent machines have jumps which optionally execute the following
@@ -116,6 +164,25 @@ in and how long the functional unit stay
 goal is to reorder the instructions to keep many functional units
 busy but never feed them so fast they must wait.
 
+* Can optimize by changing if (x) y; else z; into z; if (x) y;
+if z and x do not interfere and z has no effects not undone by y.
+This is desirable if z is faster than jumping.
+
+* For a two-insn loop on the 68020, such as
+  foo:  movb  a2@+,a3@+
+        jne   foo
+it is better to insert dbeq d0,foo before the jne.
+d0 can be a junk register. The challenge is to fit this into
+a portable framework: when can you detect this situation and
+still be able to allocate a junk register?
+
+* For the 80387 floating point, perhaps it would be possible to use 3
+or 4 registers in the stack to hold register variables. (It would be
+necessary to keep track of how those slots move in the stack as other
+pushes and pops are done.) This is probably very tricky, but if
+you are a GCC wizard and you care about the speed of floating point on
+an 80386, you might want to work on it.
+
 2. Simpler porting.
 
 Right now, describing the target machine's instructions is done
@@ -147,9 +214,142 @@ within functions. Some of the mechanism
 
 4. Generalize the machine model.
 
-Some new compiler features may be needed to do a good job on machines
+* Some new compiler features may be needed to do a good job on machines
 where static data needs to be addressed using base registers.
 
-Some machines have two stacks in different areas of memory, one used
+* Some machines have two stacks in different areas of memory, one used
 for scalars and another for large objects. The compiler does not now
 have a way to understand this.
+
+5. Precompilation of header files.
+
+In the future, many programs will use thousands of lines of header files.
+Compiling the headers might be slower than compiling the guts of any one
+source file. Here is a scheme for precompiling header files to make
+compilation faster for a sequence of headers which is often used.
+
+A precompiled header corresponds to a sequence of header files. The
+preprocessor recognizes when the input starts with a sequence of
+`#include' commands and searches a data base for a precompiled header
+corresponding to that sequence. The modtimes of all these files are
+stored in the data base so that one can tell whether the precompiled
+header is still valid.
+
+For robustness, each directory should have its own collection of
+precompiled headers and its own data base of them. Probably each
+precompiled header would be a file and the data base would be one
+more file.
+
+The data base records the entire collection of predefined macros and
+their definitions, except for __FILE__, __LINE__ and __DATE__, for
+each precompiled header. If this collection does not match the setup
+at the start of the current compilation (including the results of -D
+and -U switches), the precompiled header is inapplicable. It might
+be possible to have distinct precompiled headers for the same sequence
+of header files but different collections of predefined macros.
+
+The state of any option that affects macro processing, such as -ansi
+or -traditional, must also be recorded, and the precompiled header is
+valid only if these options match.
+
+The precompiled header contains an ordered series of strings. Some
+strings are marked "unconditional"; these must be compiled each time
+the precompiled header is used. Other strings have keys, which
+are identifiers. A string with keys must be compiled if at least one
+of its keys is mentioned in the input. The order these strings appear
+in the precompiled header is called their intrinsic order.
+
+The C preprocessor reads in the precompiled header file and scans all
+the strings, making for each key an entry in the same symbol table
+used for macros, pointing at a list of all the strings for which it is
+a key. Each string must have a flag (one flag per string, not one per
+key per string). The same code in `rescan' that detects references to
+macros would detect a reference to a key and flag all of the strings
+that it belongs to as needing to be output.
+
+Each of these strings is immediately recursively macroexpanded (i.e.
+`rescan' is called), but the output from this is discarded. The
+expansion is to detect any other keys mentioned in the string, and to
+define any macros for which the string contains a #define. The key's
+symbol table entry is deleted to save time if the key is
+encountered again, and to avoid an infinite recursion.
+
+The unconditional strings are macroexpanded with `rescan' (but the
+output is discarded) at some time before anything is actually output.
+
+At the end of compilation, before any of the actual input text is
+output, the list of strings is scanned in the intrinsic order, and
+each string that is unconditional or flagged is output verbatim,
+except that any #define lines are discarded.
+
+Precompiled headers would be constructed by explicit request with a
+special tool. The first step is to run cpp on the sequence of header
+files' contents. This would use a new option that would cause all
+#define lines to be output unchanged as well as defining the macro.
+The second step is to divide the output into strings, some keyed and
+some unconditional. This division is done without changing the order
+of the text being divided up.
+
+JNC@lcs.mit.edu has some ideas on this subject also.
+
+6. Better documentation of how GCC works and how to port it.
+
+Here is an outline proposed by Allan Adler.
+
+I.    Overview of this document
+II.   The machines on which GCC is implemented
+    A. Prose description of those characteristics of target machines and
+       their operating systems which are pertinent to the implementation
+       of GCC.
+        i. target machine characteristics
+        ii. comparison of this system of machine characteristics with
+            other systems of machine specification currently in use
+    B. Tables of the characteristics of the target machines on which
+       GCC is implemented.
+    C. A priori restrictions on the values of characteristics of target
+       machines, with special reference to those parts of the source code
+       which entail those restrictions
+        i. restrictions on individual characteristics
+        ii. restrictions involving relations between various characteristics
+    D. The use of GCC as a cross-compiler
+        i. cross-compilation to existing machines
+        ii. cross-compilation to non-existent machines
+    E. Assumptions which are made regarding the target machine
+        i. assumptions regarding the architecture of the target machine
+        ii. assumptions regarding the operating system of the target machine
+        iii. assumptions regarding software resident on the target machine
+        iv. where in the source code these assumptions are in effect made
+III.  A systematic approach to writing the files tm.h and xm.h
+    A. Macros which require special care or skill
+    B. Examples, with special reference to the underlying reasoning
+IV.   A systematic approach to writing the machine description file md
+    A. Minimal viable sets of insn descriptions
+    B. Examples, with special reference to the underlying reasoning
+V.    Uses of the file aux-output.c
+VI.   Specification of what constitutes correct performance of an
+      implementation of GCC
+    A. The components of GCC
+    B. The itinerary of a C program through GCC
+    C. A system of benchmark programs
+    D. What your RTL and assembler should look like with these benchmarks
+    E. Fine tuning for speed and size of compiled code
+VII.  A systematic procedure for debugging an implementation of GCC
+    A. Use of GDB
+        i. the macros in the file .gdbinit for GCC
+        ii. obstacles to the use of GDB
+            a. functions implemented as macros can't be called in GDB
+    B. Debugging without GDB
+        i. How to turn off the normal operation of GCC and access specific
+           parts of GCC
+    C. Debugging tools
+    D. Debugging the parser
+        i. how machine macros and insn definitions affect the parser
+    E. Debugging the recognizer
+        i. how machine macros and insn definitions affect the recognizer
+
+ditto for other components
+
+VIII. Data types used by GCC, with special reference to restrictions not
+      specified in the formal definition of the data type
+IX.   References to the literature for the algorithms used in GCC
+
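
The "Detect dead stores into memory?" item in the patch above states a precise condition, which can be checked mechanically. What follows is a minimal sketch in C, assuming a hypothetical toy instruction list for straight-line code. The `Insn` type, the `OP_*` codes and `find_dead_stores` are illustrations only, not GCC's RTL or any existing pass. A reference through a variable address is treated as possibly touching every location, as the item requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction list for straight-line code (no branches,
   no calls); this is not GCC's RTL. */
typedef enum {
    OP_STORE,        /* store into the fixed location LOC */
    OP_LOAD,         /* load from the fixed location LOC */
    OP_REF_INDIRECT  /* any memory reference through a variable address */
} Op;

typedef struct {
    Op op;
    int loc;         /* ignored for OP_REF_INDIRECT */
} Insn;

/* Sets dead[i] for every OP_STORE that is followed by another store into
   the same location, with no intervening reference to anything that might
   be that location.  A variable-address reference might be any location,
   so it conservatively keeps the earlier store alive. */
void find_dead_stores(const Insn *insns, size_t n, bool *dead) {
    assert(n == 0 || (insns != NULL && dead != NULL));
    for (size_t i = 0; i < n; i++) {
        dead[i] = false;
        if (insns[i].op != OP_STORE)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            const Insn *x = &insns[j];
            if (x->op == OP_REF_INDIRECT)
                break;              /* might be the stored location */
            if (x->loc != insns[i].loc)
                continue;           /* a different fixed location */
            if (x->op == OP_STORE)
                dead[i] = true;     /* overwritten before any use */
            break;                  /* either way, this store is decided */
        }
    }
}
```

The scan is quadratic and only looks at straight-line code; it does not model calls, which a real pass would also have to treat as possible references.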