Annotation of researchv10no/cmd/lcc/doc/interface.tex, revision 1.1

1.1     ! root        1: \documentstyle{article}
        !             2: 
        !             3: \begin{document}
        !             4: \begin{titlepage}
        !             5: \normalsize
        !             6: \vspace*{.7in}
        !             7: \begin{center}
        !             8: \begin{tabular}{c}
        !             9: \bf\Large A Code Generation Interface for ANSI C \\[.5in]
        !            10: 
        !            11: Christopher W. Fraser \\
        !            12: \em AT\&T Bell Laboratories, 600 Mountain Avenue, \\
        !            13: \em Murray Hill, NJ 07974 \\[1ex]
        !            14: and \\[1ex]
        !            15: David R. Hanson \\
        !            16: \em Department of Computer Science, Princeton University, \\
        !            17: \em Princeton, NJ 08544 \\[.7in]
        !            18: 
        !            19: Research Report CS-TR-270-90 \\[1ex]
        !            20: July 1990 \\
        !            21: Last Revised September 1992 \\[.5in]
        !            22: 
        !            23: \end{tabular}
        !            24: \end{center}
        !            25: 
        !            26: \begin{abstract}
        !            27: \normalsize
        !            28: \verb|lcc| is a retargetable, production compiler for ANSI~C;
        !            29: it has been ported to the VAX, Motorola 68020, SPARC, and MIPS R3000,
        !            30: and some versions have been in use for over two years.
        !            31: It is smaller and faster than generally available alternatives,
        !            32: and its local code is comparable.
        !            33: This report describes the interface between the target-independent front end
        !            34: and the target-dependent back ends.
        !            35: The interface consists of shared data structures, a few functions,
        !            36: and a dag language. While this approach couples the front
        !            37: and back ends tightly, it results in efficient, compact compilers.
        !            38: The interface is illustrated by detailing a complete code generator
        !            39: that emits naive VAX code.
        !            40: \end{abstract}
        !            41: 
        !            42: \end{titlepage}
        !            43: 
        !            44: %\setcounter{secnumdepth}{0}
        !            45: %\bibliographystyle{abbrv}
        !            46: 
        !            47: \section{Introduction}
        !            48: 
        !            49: \verb|lcc| is a retargetable compiler for ANSI~C~\cite{ansi:Cstandard}.
        !            50: It has been ported to the VAX, Motorola 68020, SPARC, and MIPS R3000.
        !            51: It emits code that is comparable with that from other generally
        !            52: available C compilers, but it runs up to twice as fast and is about
        !            53: half the size~\cite{fraser:hanson:91b}.
        !            54: \verb|lcc| is in production use at Princeton University
        !            55: and AT\&T Bell Laboratories.
        !            56: 
        !            57: This paper describes the interface between the target-independent
        !            58: front end and the target-dependent back ends.\footnote{References~\cite{fraser:hanson:91a}
        !            59: and~\cite{fraser:hanson:92} were drawn from earlier versions of this report.
        !            60: This report is slightly more detailed and corresponds to the latest
        !            61: release of \verb|lcc|.}
        !            62: Good code-generation interfaces are hard to design.
        !            63: An inadequate interface may force each back end
        !            64: to do work that could otherwise be done once in the front end.
        !            65: Annotating frequently referenced variables for assignment
        !            66: to registers is an example.
        !            67: If the interface is too small, it
        !            68: may not encode enough information to exploit new
        !            69: machines thoroughly.  If the interface is too large, the back ends
        !            70: may be needlessly complicated.
        !            71: These competing demands
        !            72: require careful engineering, and re-engineering as new targets expose
        !            73: flaws. This paper reports the results of such experience.
        !            74: 
        !            75: The interface is illustrated with a {\em sample} code generator
        !            76: for the VAX.  This code generator emits naive code; i.e., it uses only
        !            77: the `RISC subset' of the VAX instruction set.  It is nevertheless
        !            78: complete:  when used with a conforming preprocessor and library, the
        !            79: compiler with this code generator passes the conformance section of
        !            80: Version 2.00 of the Plum-Hall Validation Suite for ANSI~C, except that
        !            81: it does not detect overflow in floating constants.
        !            82: The production versions of \verb|lcc| use the interface described here,
        !            83: but use back ends in which instruction selection and optimization are
        !            84: generated automatically from a compact specification~\cite{fraser:sigplan89}.%
        !            85: \footnote{The {\tt lcc} front end and a sample code generator are available
        !            86: for anonymous {\tt ftp} from {\tt princeton.edu}.
        !            87: The file {\tt README} in the directory {\tt pub/lcc} gives details.}
        !            88: 
        !            89: Unlike other interfaces, \verb|lcc|'s interface is not a monolithic
        !            90: intermediate language~\cite{tanenbaum:toplas:82}. Such interfaces
        !            91: promote decoupling between the front and back ends, but sacrifice
        !            92: compiler performance~\cite{tanenbaum:sigplan:89}. With \verb|lcc|'s
        !            93: interface, the front and back ends are tightly coupled; this approach
        !            94: yields efficient, compact compilers, but can complicate maintenance
        !            95: because changes to the front end may affect the back ends.  This
        !            96: complication is less important for standardized languages like ANSI~C
        !            97: because there will be few changes to the language.
        !            98: 
        !            99: The interface consists of a few shared data structures, 19 functions,
        !           100: most of which are very simple, and a 36-operator dag language, which
        !           101: encodes the executable code from a source program. The dag language
        !           102: corresponds to the `intermediate language' used in other compilers, but
        !           103: it is smaller than typical intermediate languages.
        !           104: The functions, which can be implemented as true functions or as macros,
        !           105: are listed in Appendix~\ref{appendix:interface} and are
        !           106: described in the sections below.
        !           107: 
        !           108: The front and back ends are clients of each other.  The front end calls
        !           109: on the back end to generate and emit code.  The back end calls on the
        !           110: front end to perform output, allocate storage, interrogate types, and
        !           111: manage nodes, symbols, and strings. These front-end services are
        !           112: performed by functions that are summarized in Appendix~\ref{appendix:functions}.
        !           113: 
        !           114: 
        !           115: \section{Configuration}
        !           116: 
        !           117: \label{configuration}
        !           118: Target-specific configuration parameters specify the widths and
        !           119: alignments of the basic datatypes and optionally define conditional
        !           120: compilation flags.  They are defined in a `header' file, \verb|config.h|,
        !           121: which is included when a target-specific \verb|lcc| is compiled.
        !           122: 
        !           123: The sample type metrics
        !           124: and conditional compilation flags are defined as follows.
        !           125: Target-specific components of the shared data structures
        !           126: are defined in Sections~\ref{symbols} and \ref{dags}.
        !           127: \begin{verbatim}
        !           128: typedef int Xinterface;
        !           129: \end{verbatim}
        !           130: 
        !           131: Type metrics are triples that give the size and alignment of the type
        !           132: in bytes, along with a flag indicating whether or not constants of that
        !           133: type can appear in instructions. A \verb|1| indicates that the
        !           134: constants {\em cannot} appear in instructions; the front end generates
        !           135: variables to hold them. The size of a type must be a multiple of its alignment.
        !           136: 
        !           137: Both the size and alignment for characters must be \verb|1|.
        !           138: Unsigned and long integers are assumed to have metrics
        !           139: identical to integer, and long doubles are assumed to have metrics
        !           140: identical to double. The front end correctly treats all of these types
        !           141: as separate, however.
        !           142: \verb|POINTER_METRICS| apply to all pointer types, and pointers
        !           143: must fit in unsigned integers.
        !           144: The alignment of a structure is the maximum of the alignments
        !           145: of its fields and \verb|STRUCT_ALIGN|.
        !           146: 
        !           147: If \verb|JUMP_ON_RETURN| is non-zero, the front end generates a jump to
        !           148: a generated label for each \verb|return| statement and defines this
        !           149: label at the end of each function.  Similar action is taken if the
        !           150: compiler is asked to generate data for a debugger, because some
        !           151: debuggers assume a common exit point.  Since the VAX does returns with
        !           152: one instruction, \verb|JUMP_ON_RETURN| is defined as \verb|0|.
        !           153: 
        !           154: By default, the front end generates code that evaluates function
        !           155: arguments from right to left;
        !           156: defining \verb|LEFT_TO_RIGHT| yields the opposite order.
        !           157: A sequence of bit fields is laid out from left to right
        !           158: in one or more unsigned values; defining \verb|LITTLE_ENDIAN|
        !           159: yields a right-to-left layout.
        !           160: Defining \verb|LITTLE_ENDIAN| for the sample
        !           161: makes the code compatible with existing VAX C compilers.
        !           162: The standard permits either argument evaluation
        !           163: or bit-field layout order, however.
        !           164: 
        !           165: The front end contains a few target-specific operations.
        !           166: These are protected by conditional compilation on \verb|VAX|,
        !           167: \verb|MIPS|, \verb|MC|, \verb|SPARC|, etc. Thus, \verb|VAX| is defined above.
        !           168: 
        !           169: \subsection{{\tt progbeg} and \tt progend}
        !           170: 
        !           171: \label{progbeg}\label{progend}
        !           172: During initialization, the front end calls
        !           173: \begin{verbatim}
        !           174: void progbeg(int argc, char *argv[])
        !           175: \end{verbatim}
        !           176: \verb|argv[0..argc-1]| point
        !           177: to those program arguments that are not recognized by the front end,
        !           178: e.g., target-specific options.
        !           179: Typical implementations of \verb|progbeg| process such
        !           180: options and initialize themselves.
        !           181: At the end of compilation, \verb|progend(void)| is called to give
        !           182: the back end an opportunity to finalize its output.
        !           183: 
        !           184: The sample \verb|progbeg| initializes a register usage mask
        !           185: \begin{verbatim}
        !           186: void progbeg(int argc, char *argv[]) {
        !           187:    rmask = ((~0)<<12)|1;
        !           188: }
        !           189: \end{verbatim}
        !           190: which is described in Section~\ref{dags:registers}.
        !           191: Finalization is unnecessary in the sample,
        !           192: so \verb|progend| is an empty function:
        !           193: \begin{verbatim}
        !           194: static void progend(void) {
        !           195: }
        !           196: \end{verbatim}
        !           197: All back-end functions except \verb|address| and \verb|gen| return nothing.
        !           198: 
        !           199: 
        !           200: \section{Symbols}
        !           201: 
        !           202: \label{symbols}
        !           203: The front and back ends share two major data structures: symbol table
        !           204: entries and dag nodes. Dag nodes are described in Section~\ref{dags}.
        !           205: Symbol table entries are used for variables, constants, and labels.
        !           206: They are represented by pointers to the following structure.
        !           207: \begin{verbatim}
        !           208: typedef struct symbol *Symbol;
        !           209: struct symbol {        /* symbol table entries: */
        !           210:    Xsymbol x;             /* type extension for code generator */
        !           211:    char *name;            /* name */
        !           212:    unsigned char scope;   /* scope level */
        !           213:    unsigned char sclass;  /* storage class */
        !           214:    unsigned defined:1;    /* 1 if defined */
        !           215:    unsigned temporary:1;  /* 1 if a temporary */
        !           216:    unsigned generated:1;  /* 1 if a generated identifier */
        !           217:    unsigned addressed:1;  /* 1 if its address is taken */
        !           218:    unsigned structarg:1;  /* 1 if parameter is a struct */
        !           219:    Type type;             /* data type */
        !           220:    union {
        !           221:       struct {           /* labels: */
        !           222:          int label;          /* label number */
        !           223:       } l;
        !           224:       struct {            /* constants: */
        !           225:          Value v;            /* value */
        !           226:          Symbol loc;         /* out-of-line location */
        !           227:       } c;
        !           228:       int seg;            /* globals, statics: definition segment */
        !           229:    } u;
        !           230: };
        !           231: \end{verbatim}
        !           232: The \verb|scope|, \verb|sclass|, and \verb|type| fields give the symbol's
        !           233: scope level, its storage class, and its type, respectively.
        !           234: Most of the bit fields flag self-explanatory attributes for each symbol;
        !           235: fields relevant only to the front end are elided above.
        !           236: 
        !           237: Scope values classify symbols as constants, labels, or variables.
        !           238: For labels, constants, and some variables,
        !           239: a field of the union \verb|u| supplies additional data.
        !           240: 
        !           241: Labels have a \verb|scope| equal to the enumeration constant \verb|LABELS|.
        !           242: The \verb|u.l.label| field is the numeric value of the label,
        !           243: and \verb|name| is the string representation of that value.
        !           244: Labels have no \verb|type| or \verb|sclass|.
        !           245: 
        !           246: Constants have a scope equal to \verb|CONSTANTS|, a \verb|sclass| equal
        !           247: to \verb|STATIC|, and a \verb|name|
        !           248: equal to the string representation of the constant.
        !           249: The actual value of the constant is stored in the \verb|u.c.v| field,
        !           250: which is defined by
        !           251: \begin{verbatim}
        !           252: typedef union value {   /* constant values: */
        !           253:    char sc;                /* signed */
        !           254:    short ss;               /* signed */
        !           255:    int i;                  /* signed */
        !           256:    unsigned char uc;
        !           257:    unsigned short us;
        !           258:    unsigned int u;
        !           259:    float f;
        !           260:    double d;
        !           261:    char *p;                /* pointer to anything */
        !           262: } Value;
        !           263: \end{verbatim}
        !           264: If a variable is generated to hold the constant,
        !           265: \verb|u.c.loc| points to the symbol-table entry for that variable.
        !           266: 
        !           267: Variables have a \verb|scope| equal to \verb|GLOBAL|, \verb|PARAM|,
        !           268: or \verb|LOCAL|$+k$ for nesting level $k$.
        !           269: \verb|sclass| is \verb|STATIC|, \verb|AUTO|, \verb|EXTERN|, or \verb|REGISTER|.
        !           270: The \verb|name| of most variables is the name used in the source code.
        !           271: For temporaries and other generated variables, \verb|name| is a digit sequence.
        !           272: \verb|temporary| and \verb|generated| are set for temporaries,
        !           273: and \verb|generated| is set for labels and other generated variables, e.g., those
        !           274: that hold constants.
        !           275: \verb|structarg| is described in Section~\ref{emit}.
        !           276: For global and static variables,
        !           277: \verb|u.seg| gives the logical segment in which the variable is defined
        !           278: (see Section~\ref{segment}).
        !           279: 
        !           280: The \verb|type| field for constants and variables points to
        !           281: a structure that describes types. The \verb|size| and \verb|align|
        !           282: fields of this structure give, respectively, the size and alignment constraints
        !           283: of the type in bytes.
        !           284: 
        !           285: The \verb|x| field is an `extension' in which the back end stores
        !           286: target-specific data for the symbol.
        !           287: The sample has only two fields:
        !           288: \begin{verbatim}
        !           289: typedef struct {
        !           290:    char *name;    /* name for back end */
        !           291:    int offset;    /* frame offset */
        !           292: } Xsymbol;
        !           293: \end{verbatim}
        !           294: This definition appears in the configuration file.
        !           295: \verb|p->name| identifies the symbol to the front end, but the back end
        !           296: may need to emit a different `name'. For example,
        !           297: the `name' for locals is usually an offset from a frame pointer.
        !           298: \verb|p->x.name| is the back end's name for the symbol.
        !           299: For parameters and locals, \verb|p->x.offset| is the
        !           300: offset from the frame pointer (see Sections~\ref{function} and~\ref{local}).
        !           301: 
        !           302: \subsection{\tt defsymbol}
        !           303: 
        !           304: \label{defsymbol}
        !           305: Whenever the front end defines a new symbol with
        !           306: \verb|scope| \verb|CONSTANTS|, \verb|LABELS|, or \verb|GLOBAL|
        !           307: or a static variable, it calls
        !           308: \verb|defsymbol(Symbol p)| to give the back end an opportunity to initialize
        !           309: its \verb|Xsymbol| fields. The sample's \verb|defsymbol| is
        !           310: \begin{verbatim}
        !           311: static void defsymbol(Symbol p) {
        !           312:    if (p->scope == CONSTANTS)
        !           313:       p->x.name = p->name;
        !           314:    else if (p->scope >= LOCAL && p->sclass == STATIC)
        !           315:       p->x.name = stringf("L%d", genlabel(1));
        !           316:    else if (p->generated)
        !           317:       p->x.name = stringf("L%s", p->name);
        !           318:    else
        !           319:       p->x.name = stringf("_%s", p->name);
        !           320: }
        !           321: \end{verbatim}
        !           322: The back end's name for a constant is just the constant itself.
        !           323: For generated symbols including labels, \verb|x.name| is the front end's name,
        !           324: which is a digit string, prefixed with an \verb|L|,
        !           325: and an underscore prefixes the names of other symbols.
        !           326: \verb|stringf| returns a pointer to a string formatted as specified
        !           327: by its \verb|printf|-style arguments.
        !           328: 
        !           329: For \verb|scope| \verb|PARAM| and  \verb|LOCAL|$+k$,
        !           330: the \verb|Xsymbol| fields are initialized
        !           331: by \verb|function| (see Section~\ref{function})
        !           332: and \verb|local| (see Section~\ref{local}), respectively,
        !           333: and symbols that represent address computations
        !           334: are initialized by \verb|address| (see Section~\ref{address}).
        !           335: 
        !           336: \subsection{{\tt import} and \tt export}
        !           337: 
        !           338: \label{import}\label{export}
        !           339: A symbol can be exported or imported.
        !           340: Non-static variables and functions are exported
        !           341: in order to be available to other,
        !           342: separately compiled modules. Likewise, variables and functions used
        !           343: in one module, but defined in another one, are imported
        !           344: in the former module.
        !           345: The front end identifies an exported or imported symbol
        !           346: by calling \verb|export(Symbol p)| or \verb|import(Symbol p)|. \verb|export|
        !           347: is always called {\em before} the symbol is defined;
        !           348: \verb|import|, however, may be called any time, before or after the symbol
        !           349: is used.
        !           350: 
        !           351: The sample back end emits an assembler directive for \verb|export|,
        !           352: but no output is required for \verb|import|:
        !           353: \begin{verbatim}
        !           354: static void export(Symbol p) {
        !           355:    print(".globl %s\n", p->x.name);
        !           356: }
        !           357: 
        !           358: static void import(Symbol p) {
        !           359: }
        !           360: \end{verbatim}
        !           361: \verb|print| is a front-end function similar to the standard \verb|printf|.
        !           362: 
        !           363: \subsection{\tt segment}
        !           364: 
        !           365: \label{segment}
        !           366: The front end manages four logical segments:
        !           367: \verb|CODE|, \verb|BSS|, \verb|DATA|, and \verb|LIT|.
        !           368: Executable code is emitted into the \verb|CODE| segment,
        !           369: uninitialized variables are defined in the \verb|BSS| segment,
        !           370: initialized variables are defined and initialized in the \verb|DATA| segment,
        !           371: and constants appear in the \verb|LIT| segment.
        !           372: 
        !           373: The front end announces a segment change by calling
        !           374: \verb|segment(int s)| where \verb|s| is one of the segments listed above.
        !           375: \verb|segment| maps the logical segments onto the segments
        !           376: provided by the target machine. The \verb|CODE| and \verb|LIT| segments
        !           377: can be mapped to read-only segments; the others must be mapped to read/write segments.
        !           378: The sample mapping is
        !           379: \begin{verbatim}
        !           380: static void segment(int s) {
        !           381:    switch (s) {
        !           382:    case CODE: print(".text\n");   break;
        !           383:    case  LIT: print(".text 1\n"); break;
        !           384:    case DATA:
        !           385:    case  BSS: print(".data\n");   break;
        !           386:    }
        !           387: }
        !           388: \end{verbatim}
        !           389: 
        !           390: \subsection{\tt global}
        !           391: 
        !           392: \label{global}
        !           393: \verb|global(Symbol p)| emits code to define a global variable.
        !           394: The front end will have already directed the definition
        !           395: to the appropriate logical segment by calling \verb|segment| and
        !           396: set \verb|p->u.seg| to that segment,
        !           397: and it will follow the call to \verb|global|
        !           398: with any appropriate calls to the data initialization functions.
        !           399: \verb|global| handles the necessary alignment adjustments and
        !           400: the actual definition.
        !           401: 
        !           402: The sample definition for global \verb|y| is simply
        !           403: \verb|y|'s \verb|x.name| field preceded by an alignment directive, if necessary:
        !           404: \begin{verbatim}
        !           405: static void global(Symbol p) {
        !           406:    switch (p->type->align) {
        !           407:    case 2: print(".align 1; "); break;
        !           408:    case 4: print(".align 2; "); break;
        !           409:    case 8: print(".align 3; "); break;
        !           410:    }
        !           411:    print("%s:", p->x.name);
        !           412: }
        !           413: \end{verbatim}
        !           414: 
        !           415: \subsection{\tt defconst}
        !           416: 
        !           417: \label{defconst}
        !           418: 
        !           419: \verb|defconst(int ty, Value v)| emits the scalar \verb|v|.
        !           420: \verb|ty| indicates which field of
        !           421: \verb|v| is to be emitted according the following table.
        !           422: 
        !           423: \begin{center}
        !           424: \begin{tabular}{lll}
        !           425: \tt ty & \tt v \it field       & \it type \\ \hline
        !           426: \tt C  & \tt v.uc              & character \\
        !           427: \tt S  & \tt v.us              & short \\
        !           428: \tt I  & \tt v.i               & int \\
        !           429: \tt U  & \tt v.u               & unsigned \\
        !           430: \tt P  & \tt v.p               & any pointer type \\
        !           431: \tt F  & \tt v.f               & float \\
        !           432: \tt D  & \tt v.d               & double \\
        !           433: \end{tabular}
        !           434: \end{center}
        !           435: The codes \verb|S|, \verb|I|, \ldots\ are identical to the type suffixes
        !           436: used for the dag operators, which are described in Section~\ref{dags}.
        !           437: The signed fields \verb|v.sc| and \verb|v.ss| can be used instead
        !           438: of \verb|v.uc| and \verb|v.us|, but \verb|defconst|
        !           439: must initialize only the specified number of bits.
        !           440: 
        !           441: The sample \verb|defconst| is
        !           442: \begin{verbatim}
        !           443: static void defconst(int ty, Value v) {
        !           444:    switch (ty) {
        !           445:    case C: print(".byte %d\n",   v.uc); break;
        !           446:    case S: print(".word %d\n",   v.us); break;
        !           447:    case I: print(".long %d\n",   v.i ); break;
        !           448:    case U: print(".long 0x%x\n", v.u ); break;
        !           449:    case P: print(".long 0x%x\n", v.p ); break;
        !           450:    case F:
        !           451:       print(".long 0x%x\n", ((unsigned *) &v.f)[0]);
        !           452:       break;
        !           453:    case D: 
        !           454:       print(".long 0x%x,0x%x\n", ((unsigned *) &v.d)[0],
        !           455:          ((unsigned *) &v.d)[1]);
        !           456:       break;
        !           457:    }
        !           458: }
        !           459: \end{verbatim}
        !           460: In the production compilers, \verb|defconst| accommodates cross-compilation,
        !           461: so it corrects for different representations and byte orders.
        !           462: 
        !           463: If \verb|ty| is \verb|P|, \verb|v.p| holds a numeric constant of some pointer type.
        !           464: These originate from declarations like \verb|char *p=(char *)0xF0|.
        !           465: \verb|defaddress| emits addresses relative to a symbol.
        !           466: 
        !           467: Few ANSI~C compilers can leave the encoding of floating-point constants
        !           468: to the assembler, because few assemblers can cope with the effect of
        !           469: casts on these constants.  For example, the correct initialization for
        !           470: \begin{verbatim}
        !           471: double x = (float)0.3;
        !           472: \end{verbatim}
        !           473: is \verb|.long 0x999a3f99,0x0|.
        !           474: The directive \verb|.double 0.3| would erroneously initialize \verb|x|
        !           475: to the equivalent of
        !           476: \begin{verbatim}
        !           477: .long 0x99993f99,0x0999a9999
        !           478: \end{verbatim}
        !           479: because it cannot represent the effect of the cast.
        !           480: 
        !           481: 
        !           482: \subsection{{\tt defstring}, {\tt defaddress}, and \tt space}
        !           483: 
        !           484: \label{defstring}
        !           485: \label{defaddress}
        !           486: \label{space}
        !           487: 
        !           488: \verb|defstring(int len, char *s)| emits code to initialize
        !           489: a string of length \verb|len| to the characters in \verb|s|.
        !           490: The front end converts escape sequences, like \verb|\n|, into
        !           491: the corresponding ASCII characters.
        !           492: 
        !           493: \verb|defaddress(Symbol p)| emits the address denoted by \verb|p|.
        !           494: \verb|space(int n)| emits code to allocate \verb|n| zero bytes.
        !           495: 
        !           496: The sample \verb|defstring|, \verb|defaddress|, and \verb|space| are
        !           497: \begin{verbatim}
        !           498: static void defstring(int len, char *s) {
        !           499:    while (len-- > 0)
        !           500:       print(".byte %d\n", *s++);
        !           501: }
        !           502: 
        !           503: static void defaddress(Symbol p) {
        !           504:    print(".long %s\n", p->x.name);
        !           505: }
        !           506: 
        !           507: static void space(int n) {
        !           508:    print(".space %d\n", n);
        !           509: }
        !           510: \end{verbatim}
        !           511: 
        !           512: 
        !           513: \section{Functions}
        !           514: 
        !           515: The front end completely consumes each function before passing any part
        !           516: of the function to the back end.  This organization permits certain
        !           517: optimizations.  For example, only by processing complete functions can
        !           518: the front end identify the locals and parameters whose address is not taken; only
        !           519: these variables may be assigned to registers.
        !           520: 
        !           521: \subsection{\tt function}
        !           522: 
        !           523: \label{function}
        !           524: 
        !           525: The front end accumulates functions into private data structures.  At
        !           526: the end of each function, it calls \verb|function| to generate and emit
        !           527: code. The typical form of \verb|function| is
        !           528: \begin{flushleft}\tt
        !           529: \verb|void function(Symbol f, Symbol caller[], Symbol callee[], int ncalls) {| \\
        !           530: \ \ \ \ldots {\rm initialize} \\
        !           531: \ \ \ gencode(caller, callee); \\
        !           532: \ \ \ \ldots {\rm emit prologue} \\
        !           533: \ \ \ emitcode(); \\
        !           534: \ \ \ \ldots {\rm emit epilogue} \\
        !           535: \verb|}|
        !           536: \end{flushleft}
        !           537: \verb|gencode| is a front-end routine that traverses the front end's
        !           538: private structures and passes each dag to the back end's \verb|gen|
        !           539: (see Section~\ref{gen}), which selects code, annotates the dag to
        !           540: record its selection, and returns a dag pointer. \verb|emitcode| is a
        !           541: front-end routine that traverses the private structures again and
        !           542: passes each of the pointers from \verb|gen| to \verb|emit| (see
        !           543: Section~\ref{emit}), which emits the code.
        !           544: 
        !           545: This organization offers the back end flexibility in
        !           546: generating function prologue and epilogue code.
        !           547: Before calling \verb|gencode|, \verb|function| initializes
        !           548: the \verb|Xsymbol| fields of the function's parameters, as described
        !           549: below, and does other per-function initializations, if necessary.
        !           550: After calling \verb|gencode|, the size of the activation record,
        !           551: or {\em frame}, the number of registers used, etc.\ are known;
        !           552: this information is usually needed to emit the prologue.
        !           553: After calling \verb|emitcode| to emit the code for the body of the function,
        !           554: \verb|function| emits the epilogue.
        !           555: 
        !           556: The argument \verb|f| to \verb|function| is the pointer
        !           557: to the symbol-table entry for the current function,
        !           558: and \verb|ncalls| is the number of calls to other functions
        !           559: made by the current function. \verb|ncalls| is useful
        !           560: on targets like the SPARC where `leaf' functions get special treatment.
        !           561: 
        !           562: \verb|caller| and \verb|callee| are arrays of pointers to symbol
        !           563: table entries; each is terminated with a zero pointer.
        !           564: The symbols in \verb|caller| are the function parameters
        !           565: as passed by a caller; those in \verb|callee| are the parameters
        !           566: as seen within the function. For most functions,
        !           567: the symbols in each array are the same, but they
        !           568: can differ in both \verb|sclass| and \verb|type|.
        !           569: For example, in
        !           570: \begin{verbatim}
        !           571: foo(x) float x; { ... }
        !           572: \end{verbatim}
        !           573: a call to \verb|foo| passes the actual argument as a double.
        !           574: Within \verb|foo|, \verb|x| is a float.
        !           575: Thus, \verb|caller[0]->type| refers to `double' and \verb|callee[0]->type|
        !           576: refers to `float.' And in
        !           577: \begin{verbatim}
        !           578: int strlen(register char *s) { ... }
        !           579: \end{verbatim}
        !           580: \verb|caller[0]->sclass| is \verb|AUTO| and \verb|callee[0]->sclass| is \verb|REGISTER|.
        !           581: Even without register declarations, the front end assigns
        !           582: frequently referenced parameters to the \verb|REGISTER| class,
        !           583: and \verb|callee|'s \verb|sclass| is set accordingly.
        !           584: This assignment is made only when there are no explicit register {\em locals}
        !           585: to avoid interfering with the programmer's intentions (see Section~\ref{local}).
        !           586: 
        !           587: \verb|caller| and \verb|callee| are passed to \verb|gencode|.
        !           588: If \verb|caller[i]->type| is not equal to \verb|callee[i]->type|
        !           589: or if \verb|caller[i]->sclass| is not equal to \verb|callee[i]->sclass|,
        !           590: \verb|gencode| generates an assignment of \verb|caller[i]|
        !           591: to \verb|callee[i]|. If the types are not equal, this assignment
        !           592: may include a conversion; for example, the assignment to \verb|x| in
        !           593: \verb|foo| includes a truncation of a double to a float.
        !           594: For parameters that include register declarations, \verb|function|
        !           595: must assign a register and initialize the \verb|x| field accordingly,
        !           596: or change the \verb|callee|'s \verb|sclass| to \verb|AUTO|.
        !           597: 
        !           598: \verb|function| could also change \verb|callee[i]->sclass| from \verb|AUTO|
        !           599: to \verb|REGISTER| if it wished to assign a register to that parameter.
        !           600: On the MIPS, for example, some of the parameters are passed in registers,
        !           601: so \verb|function| assigns those registers to the corresponding
        !           602: \verb|callee|s in leaf functions. If, however, \verb|callee[i]->addressed| is set,
        !           603: the address of the parameter is taken in the function body,
        !           604: and it must be stored in memory on most machines.
        !           605: 
        !           606: Initialization of the \verb|Xsymbol| fields of the symbols
        !           607: in \verb|caller| and \verb|callee| depends on the frame layout,
        !           608: which is target specific. Figure~\ref{fig:frame} shows the layout
        !           609: of the VAX frame. The stack grows towards
        !           610: lower addresses and towards the top of the page.
        !           611: 
        !           612: \begin{figure}
        !           613: \begin{center}
        !           614: \setlength{\unitlength}{12pt}
        !           615: \begin{picture}(10,16)
        !           616: \thinlines
        !           617: 
        !           618: \put( 0,14){\makebox(10, 2){argument build area}}
        !           619:        \put( 0,14){\line( 1, 0){10}}
        !           620:        \put(13,15.5){\vector(-1,0){3}}\put(13,15){\makebox(1.5,1){\tt sp}}
        !           621: \put( 0,10){\makebox(10, 4){locals}}
        !           622:        \put( 0,10){\line( 1, 0){10}}
        !           623: \put( 0, 9){\makebox(10, 1){\tt 0}}
        !           624:        \put( 0, 9){\line( 1, 0){10}}
        !           625:        \put(13, 9.5){\vector(-1,0){3}}\put(13, 9){\makebox(1.5,1){\tt fp}}
        !           626: \put( 0, 8){\makebox(10, 1){\sc psw}}
        !           627:        \put( 0, 8){\line( 1, 0){10}}
        !           628: \put( 0, 7){\makebox(10, 1){previous \tt ap}}
        !           629:        \put( 0, 7){\line( 1, 0){10}}
        !           630: \put( 0, 6){\makebox(10, 1){previous \tt fp}}
        !           631:        \put( 0, 6){\line( 1, 0){10}}
        !           632: \put( 0, 5){\makebox(10, 1){previous \tt pc}}
        !           633:        \put( 0, 5){\line( 1, 0){10}}
        !           634: \put( 0, 3){\makebox(10, 2){saved registers}}
        !           635:        \put( 0, 3){\line( 1, 0){10}}
        !           636: \put( 0, 2){\makebox(10, 1){argument count}}
        !           637:        \put( 0, 2){\line( 1, 0){10}}
        !           638:        \put(13, 2.5){\vector(-1,0){3}}\put(13, 2){\makebox(1.5,1){\tt ap}}
        !           639: \put( 0, 0){\makebox(10, 2){actual arguments}}
        !           640: 
        !           641: \thicklines
        !           642: \put( 0, 0){\line( 1, 0){10}}
        !           643: \put( 0, 0){\line( 0, 1){16}}
        !           644: \put(10,16){\line( 0,-1){16}}
        !           645: \put(10,16){\line(-1, 0){10}}
        !           646: 
        !           647: \end{picture}
        !           648: \end{center}
        !           649: \caption{VAX Frame Layout.\label{fig:frame}}
        !           650: \end{figure}
        !           651: 
        !           652: Arguments are referenced by displacement-mode addressing
        !           653: with positive offsets from register \verb|ap|, so
        !           654: the first argument is at address \verb|4(ap)|. Locals are referenced
        !           655: via negative offsets from \verb|fp|, e.g., the first local is at \verb|-4(fp)|.
        !           656: The `argument build area' is used to store arguments to functions that are
        !           657: called by the current function. The front end `un-nests' calls so that
        !           658: the back end does not need to deal with nested calls. The argument build area
        !           659: can thus be used for all calls and must be large enough to hold the largest
        !           660: argument list. When a function is called, the caller's argument build area
        !           661: becomes the callee's `actual arguments'.
        !           662: 
        !           663: Typical VAX calling sequences can handle nested calls,
        !           664: so using an argument build area is not strictly necessary.
        !           665: But other targets, such as the MIPS, require this approach,
        !           666: so it's used here to illustrate the technique. This approach also
        !           667: has the advantage that stack overflow can occur only at function
        !           668: entry, which is useful on targets that require explicit
        !           669: prologue code to detect stack overflow.
        !           670: 
        !           671: The sample version of \verb|function| is
        !           672: \begin{verbatim}
        !           673: static int framesize;     /* size of activation record */
        !           674: static int offset;        /* current frame offset */
        !           675: static int argbuildsize;  /* size of argument build area */
        !           676: 
        !           677: \end{verbatim}
        !           678: \begin{verbatim}
        !           679: static void function(Symbol f, Symbol caller[],
        !           680:    Symbol callee[], int ncalls) {
        !           681:    int i;
        !           682: 
        !           683:    offset = 4;
        !           684:    for (i = 0; caller[i] && callee[i]; i++) {
        !           685:       offset = roundup(offset, caller[i]->type->align);
        !           686:       callee[i]->x.offset = caller[i]->x.offset = offset;
        !           687:       callee[i]->x.name = caller[i]->x.name = stringf("%d(ap)", offset);
        !           688:       offset += caller[i]->type->size;
        !           689:       callee[i]->sclass = AUTO;
        !           690:    }
        !           691:    usedmask = argbuildsize = framesize = offset = 0;
        !           692:    gencode(caller, callee);
        !           693:    print("%s:.word 0x%x\n", f->x.name, usedmask&~0x3f);
        !           694:    framesize += 4*nregs + argbuildsize;
        !           695:    print("subl2 $%d,sp\n", framesize);
        !           696:    if (isstruct(freturn(f->type)))
        !           697:       print("movl r1,-4(fp)\n");
        !           698:    emitcode();
        !           699:    if (glevel > 1)
        !           700:       print("ret\n");
        !           701: }
        !           702: \end{verbatim}
        !           703: The VAX \verb|calls| instruction saves the general registers specified
        !           704: by the entry mask, \verb|ap|, \verb|fp|,
        !           705: and the return address, \verb|pc|, as shown in the frame figure above.
        !           706: \verb|function| computes the size of the locals and argument build area,
        !           707: given by \verb|framesize| and \verb|argbuildsize|, respectively.
        !           708: 
        !           709: The first part of \verb|function| initializes the \verb|x.offset| and \verb|x.name| fields
        !           710: of each \verb|caller| and \verb|callee| symbol to the appropriate offset and name, respectively.
        !           711: The running \verb|offset| is rounded up to the alignment for each argument.
        !           712: \verb|roundup(n,m)| is a front-end macro that returns \verb|n| rounded up to the
        !           713: next multiple of \verb|m|. Depending on the type metrics, the size of an
        !           714: argument may not be a multiple of longwords (e.g., 3-byte structures),
        !           715: but the front end ensures that the minimum alignment for each \verb|caller[i]|
        !           716: is that for integers, which keeps the VAX stack longword-aligned.
        !           717: \verb|stringd(n)| is a front-end function that returns the string
        !           718: representation of the integer \verb|n|.
        !           719: 
        !           720: This version of \verb|function| does not support register
        !           721: declarations, so each \verb|callee|'s storage \verb|sclass| is
        !           722: set to \verb|AUTO|.
        !           723: 
        !           724: During code generation, \verb|argbuildsize| is increased when
        !           725: code for calls is generated (see Section~\ref{gen}), \verb|offset| is adjusted in response
        !           726: to the definition of locals and block boundaries,
        !           727: and \verb|framesize| records \verb|offset|'s maximum
        !           728: (see Sections~\ref{local} and \ref{blockbeg}).
        !           729: Before calling \verb|gencode|, \verb|function| clears
        !           730: \verb|usedmask|, which records the registers used in the function body,
        !           731: and clears \verb|argbuildsize|, \verb|framesize|, and \verb|offset|.
        !           732: 
        !           733: After \verb|gencode| returns, \verb|usedmask| and \verb|framesize|
        !           734: hold the information needed to generate the prologue.
        !           735: \verb|framesize| is adjusted to include the argument build area
        !           736: and space for saving all of the registers.
        !           737: This space, which is in addition to that specified by the register save mask,
        !           738: is used to save registers across those instructions that
        !           739: destroy fixed registers, e.g., \verb|movc3|.
        !           740: The \verb|subl2| instruction allocates the remainder of the frame.
        !           741: 
        !           742: The last instruction of the prologue is emitted only for
        !           743: functions that return structures. For a function type \verb|ty|,
        !           744: \verb|freturn(ty)| gives the type of the value returned by the function,
        !           745: and \verb|isstruct(ty)| is true if \verb|ty| is a structure or union type.
        !           746: Section~\ref{dags:calls} gives details on returning structures.
        !           747: 
        !           748: If \verb|JUMP_ON_RETURN| had been defined as \verb|1| in the configuration,
        !           749: returns would have been followed by a jump to the label
        !           750: that follows the code emitted by \verb|emitcode|. If the front
        !           751: end is passed the \verb|-g|$n$ option, it sets the global
        !           752: variable \verb|glevel| to $n$
        !           753: and behaves as if \verb|JUMP_ON_RETURN| is \verb|1|, so the code
        !           754: above emits the necessary \verb|ret| instruction at the end
        !           755: of each function.
        !           756: 
        !           757: \subsection{\tt local}
        !           758: 
        !           759: \label{local}
        !           760: 
        !           761: During the execution of \verb|gencode|, the front end announces local
        !           762: variables by calling \verb|local(Symbol p)|, where \verb|p| points the
        !           763: relevant symbol-table entry.  It announces temporaries likewise; these
        !           764: have \verb|p->temporary| set. \verb|local| must initialize \verb|p|'s
        !           765: \verb|Xsymbol| fields. That is, it must set \verb|p->x.offset| and \verb|p->x.name|
        !           766: so that they identify a
        !           767: stack offset or register number, depending on the availability of
        !           768: registers and on the value of \verb|p->sclass|, which is \verb|AUTO| or
        !           769: \verb|REGISTER|.
        !           770: 
        !           771: For each block, the front end first announces locals with explicit
        !           772: register declarations, in order of declaration, to permit programmer
        !           773: control of register assignment. Then it announces the rest, starting
        !           774: with those that appear to be most frequently referenced.  It assigns
        !           775: \verb|REGISTER| class to even these locals {\em if} their address
        !           776: is never taken and if their estimated frequency of use exceeds two.
        !           777: This announcement order and \verb|sclass| override collaborate to put the most
        !           778: promising locals in registers even if no registers were declared.  As
        !           779: with parameters, \verb|local| could assign a register to \verb|p| and
        !           780: change \verb|p->sclass| from \verb|AUTO| to \verb|REGISTER|, but it
        !           781: should do so only if \verb|p->addressed| is not set.
        !           782: 
        !           783: If \verb|p->sclass| is \verb|REGISTER|, \verb|local|
        !           784: can decline to allocate a register by setting \verb|p->sclass| to \verb|AUTO|
        !           785: and initializing \verb|p->x.offset| and \verb|p->x.name| to the appropriate frame offset
        !           786: and address string, respectively.
        !           787: This choice is illustrated by the sample version:
        !           788: \begin{verbatim}
        !           789: static void local(Symbol p) {
        !           790:    offset = roundup(offset + p->type->size, p->type->align);
        !           791:    offset = roundup(offset, 4);
        !           792:    p->x.offset = -offset;
        !           793:    p->x.name = stringf("%d(fp)", -offset);
        !           794:    p->sclass = AUTO;
        !           795: }
        !           796: \end{verbatim}
        !           797: The second \verb|roundup| keeps \verb|offset| and hence the stack
        !           798: aligned on longwords, as described above.
        !           799: 
        !           800: \subsection{\tt address}
        !           801: 
        !           802: \label{address}
        !           803: The front end calls \verb|address(Symbol q, Symbol p, int n)| to
        !           804: initialize \verb|q->x| to a symbol that represents an address of the form $x+{\tt n}$,
        !           805: where $x$ is the address represented by \verb|p| and \verb|n| is positive or negative.
        !           806: Like \verb|defsymbol|, \verb|address| initializes \verb|q->x|, but
        !           807: does so based on the values of \verb|p->x| and \verb|n|.
        !           808: The sample \verb|address| is
        !           809: \begin{verbatim}
        !           810: static void address(Symbol q, Symbol p, int n) {
        !           811:    if (p->scope == GLOBAL || p->sclass == STATIC || p->sclass == EXTERN)
        !           812:       q->x.name = stringf("%s%s%d", p->x.name, n >= 0 ? "+" : "", n);
        !           813:    else {
        !           814:       q->x.offset = p->x.offset + n;
        !           815:       q->x.name = stringf("%d(%s)", q->x.offset,
        !           816:          p->scope == PARAM ? "ap" : "fp");
        !           817:    }
        !           818: }
        !           819: \end{verbatim}
        !           820: which computes \verb|q->x.offset| and \verb|q->x.name| for locals and parameters,
        !           821: or sets \verb|q->x.name| to \verb|p->x.name| concatenated with \verb|+n| or \verb|-n|
        !           822: for other variables. 
        !           823: 
        !           824: For example, in
        !           825: \begin{verbatim}
        !           826: struct node { struct node *link; int count; } a;
        !           827: f() { int b[10]; b[4] = a.count; ... }
        !           828: \end{verbatim}
        !           829: suppose \verb|a| and \verb|b| point to the symbol-table entries
        !           830: for \verb|a| and \verb|b|, respectively.
        !           831: \verb|a->x.name| is set to \verb|"_a"| by \verb|defsymbol|, and
        !           832: \verb|b->x.offset| and \verb|b->x.name| are set to, respectively,
        !           833: \verb|-40| \verb|"-40(fp)"| by \verb|local|.
        !           834: \verb|address(q1,a,4)| is called with \verb|q1| representing the address
        !           835: of \verb|a.count|, and \verb|q1->x.name| is set to \verb|"_a+4"|.
        !           836: Likewise, \verb|address(q2,b,16)| sets \verb|q2->x.offset|
        !           837: and \verb|q2->x.name| to, respectively, \verb|-24| and \verb|"-24(fp)"|,
        !           838: which together denote the address of \verb|b[4]|.
        !           839: 
        !           840: \verb|address| accepts globals, parameters, and locals,
        !           841: and is called only after these symbols have been initialized
        !           842: by {\tt defsymbol}, {\tt function}, or {\tt local}.
        !           843: 
        !           844: \subsection{{\tt blockbeg} and \tt blockend}
        !           845: 
        !           846: \label{blockbeg}\label{blockend}
        !           847: Source-language blocks bracket the lifetime of locals.
        !           848: \verb|gencode| announces the beginning and end of a block
        !           849: by calling \verb|blockbeg(Env *e)| and \verb|blockend(Env *e)|, respectively.
        !           850: \verb|Env| is target specific and typically includes
        !           851: the data necessary to reuse that portion of the local frame
        !           852: space associated with the block
        !           853: and to release any registers assigned to locals within the block.
        !           854: The sample \verb|Env| is
        !           855: \begin{verbatim}
        !           856: typedef struct {
        !           857:    unsigned rmask;
        !           858:    int offset;
        !           859: } Env;
        !           860: \end{verbatim}
        !           861: The fields save the values of \verb|rmask| and \verb|offset| at the beginning
        !           862: of a block so that they can be restored on the end of the block.
        !           863: The sample \verb|blockbeg| and \verb|blockend| are thus
        !           864: 
        !           865: \noindent
        !           866: \begin{minipage}[t]{.5\textwidth}
        !           867: \vspace{\parskip}
        !           868: \begin{verbatim}
        !           869: static void blockbeg(Env *e) {
        !           870:    e->rmask = rmask;
        !           871:    e->offset = offset;
        !           872: }
        !           873: \end{verbatim}
        !           874: \vspace{\parskip}
        !           875: \end{minipage}
        !           876: \begin{minipage}[t]{.5\textwidth}
        !           877: \vspace{\parskip}
        !           878: \begin{verbatim}
        !           879: static void blockend(Env *e) {
        !           880:    if (offset > framesize)
        !           881:       framesize = offset;
        !           882:    offset = e->offset;
        !           883:    rmask = e->rmask;
        !           884: }
        !           885: \end{verbatim}
        !           886: \vspace{\parskip}
        !           887: \end{minipage}
        !           888: 
        !           889: \noindent
        !           890: \verb|blockend| also updates \verb|framesize| if the locals for the current
        !           891: block require more space than previous blocks.
        !           892: The sample could do without the \verb|rmask| field, but
        !           893: if its \verb|local| assigned registers to locals, it would need
        !           894: the field to release those registers.
        !           895: 
        !           896: Temporaries --- locals with \verb|temporary| set --- to which
        !           897: \verb|local| assigned registers live only for the expressions in which
        !           898: they are used.  They are announced by \verb|local| as usual, but are
        !           899: used only in the dags passed to next call on \verb|gen|
        !           900: (see~\ref{gen}).  \verb|gen| can thus release all registers assigned to
        !           901: temporaries.
        !           902: 
        !           903: 
        !           904: \section{Dags}
        !           905: 
        !           906: \label{dags}
        !           907: Executable code is specified by dags.  A function body is a sequence of
        !           908: forests of dags, each of which is passed the back end via \verb|gen|,
        !           909: as described below.  Dag nodes, or simply nodes, are defined by
        !           910: \begin{verbatim}
        !           911: typedef struct node *Node;
        !           912: struct node {           /* dag nodes: */
        !           913:    Opcode op;              /* operator */
        !           914:    short count;            /* reference count */
        !           915:    Symbol syms[MAXSYMS];   /* symbols */
        !           916:    Node kids[MAXKIDS];     /* operands */
        !           917:    Node link;              /* next dag in the forest */
        !           918:    Xnode x;                /* back-end's type extension */
        !           919: };
        !           920: \end{verbatim}
        !           921: 
        !           922: The \verb|kids| point to the operand nodes.  Some operators also
        !           923: take symbol table pointers as operands; these appear in the \verb|syms|
        !           924: array.  The default and minimum allowable value for both \verb|MAXKIDS|
        !           925: and \verb|MAXSYMS| is \verb|2|; larger values can be defined for the
        !           926: back end's convenience in the configuration file. \verb|count|
        !           927: holds the number of references to this node from \verb|kids| in other
        !           928: nodes. \verb|link| points to the root of the next dag in the forest.
        !           929: 
        !           930: The \verb|x| field is the back end's `extension' to nodes.  The
        !           931: configuration defines the type \verb|Xnode| to hold the per-node data
        !           932: that the back end needs to generate code.  The sample \verb|Xnode| is
        !           933: \begin{verbatim}
        !           934: typedef struct {
        !           935:    int state;
        !           936:    unsigned visited:1; /* 1 if dag has been linearized */
        !           937:    int reg;            /* register number */
        !           938:    unsigned rmask;     /* unshifted register mask */
        !           939:    unsigned busy;      /* busy regs */
        !           940:    int argoffset;      /* ARG: argument offset */
        !           941:    Node next;          /* next node on emit list */
        !           942: } Xnode;
        !           943: \end{verbatim}
        !           944: Section~\ref{gen} describes the fields.
        !           945: 
        !           946: The \verb|op| field holds an operator. The last character of each is a
        !           947: {\em type suffix} from Table~\ref{dags:suffix-table}.
        !           948: For example, the generic operator \verb|ADD| has the variants
        !           949: \verb|ADDI|, \verb|ADDU|, \verb|ADDP|, \verb|ADDF|, and \verb|ADDD|.
        !           950: 
        !           951: \begin{table}
        !           952: \begin{center}
        !           953: \begin{tabular}{lll}
        !           954: \em type suffix        & \em type \\ \hline
        !           955: \tt C          & char \\
        !           956: \tt S          & short \\
        !           957: \tt I          & int \\
        !           958: \tt U          & unsigned \\
        !           959: \tt P          & any pointer type \\
        !           960: \tt F          & float \\
        !           961: \tt D          & double \\
        !           962: \tt B          & structure or block \\
        !           963: \tt V          & void \\
        !           964: \end{tabular}
        !           965: \end{center}
        !           966: \caption{Type Suffixes.\label{dags:suffix-table}}
        !           967: \end{table}
        !           968: 
        !           969: Table~\ref{dags:op-table} lists each generic operator, its valid type
        !           970: suffixes, and the number of \verb|kids| and \verb|syms| that it uses;
        !           971: multiple values for \verb|kids| indicate type-specific variations,
        !           972: which are detailed below.  For most operators, the type suffix denotes
        !           973: the type of operation to perform and the type of the result.
        !           974: Exceptions are \verb|ADDP|, in which an integer operand
        !           975: is added to a pointer operand, and \verb|SUBP|, which
        !           976: subtracts the second integer operand from the first pointer operand.
        !           977: The operators for assignment, comparison, arguments, and some calls return
        !           978: no results; their type suffixes denote the type of operation to perform.
        !           979: 
        !           980: \begin{table}
        !           981: \begin{center}
        !           982: \begin{tabular}{lllll}
        !           983: \tt syms & \tt kids & \it operator & \it type suffixes & \it operation \\ \hline
        !           984:  1     & 0     & \tt ADDRF     & \verb|    P    |      & address of a parameter \\
        !           985:  1     & 0     & \tt ADDRG     & \verb|    P    |      & address of a global \\
        !           986:  1     & 0     & \tt ADDRL     & \verb|    P    |      & address of a local\\
        !           987:  1     & 0     & \tt CNST      & \verb|CSIUPFD  |      & constant \\[1.5ex]
        !           988: 
        !           989:        & 1     & \tt BCOM      & \verb|   U     |      & bitwise complement \\
        !           990:        & 1     & \tt CVC       & \verb|  IU     |      & convert from \tt char \\
        !           991:        & 1     & \tt CVD       & \verb|  I  F   |      & convert from \tt double \\
        !           992:        & 1     & \tt CVF       & \verb|      D  |      & convert from \tt float \\
        !           993:        & 1     & \tt CVI       & \verb|CS U  D  |      & convert from \tt int \\
        !           994:        & 1     & \tt CVP       & \verb|   U     |      & convert from pointer \\
        !           995:        & 1     & \tt CVS       & \verb|  IU     |      & convert from \tt short \\
        !           996:        & 1     & \tt CVU       & \verb|CSI P    |      & convert from \tt unsigned \\
        !           997:        & 1     & \tt INDIR     & \verb|CSI PFDB |      & fetch \\
        !           998:        & 1     & \tt NEG       & \verb|  I  FD  |      & negation \\[1.5ex]
        !           999: 
        !          1000:        & 2     & \tt ADD       & \verb|  IUPFD  |      & addition \\
        !          1001:        & 2     & \tt BAND      & \verb|   U     |      & bitwise \sc AND \\
        !          1002:        & 2     & \tt BOR       & \verb|   U     |      & bitwise inclusive \sc OR \\
        !          1003:        & 2     & \tt BXOR      & \verb|   U     |      & bitwise exclusive \sc OR \\
        !          1004:        & 2     & \tt DIV       & \verb|  IU FD  |      & division \\
        !          1005:        & 2     & \tt LSH       & \verb|  IU     |      & left shift \\
        !          1006:        & 2     & \tt MOD       & \verb|  IU     |      & modulus \\
        !          1007:        & 2     & \tt MUL       & \verb|  IU FD  |      & multiplication \\
        !          1008:        & 2     & \tt RSH       & \verb|  IU     |      & right shift \\
        !          1009:        & 2     & \tt SUB       & \verb|  IUPFD  |      & subtraction \\[1.5ex]
        !          1010: 
        !          1011:  2     & 2     & \tt ASGN      & \verb|CSI PFDB |      & assignment \\
        !          1012:  1     & 2     & \tt EQ        & \verb|  I  FD  |      & jump if equal \\
        !          1013:  1     & 2     & \tt GE        & \verb|  IU FD  |      & jump if greater than or equal \\
        !          1014:  1     & 2     & \tt GT        & \verb|  IU FD  |      & jump if greater than \\
        !          1015:  1     & 2     & \tt LE        & \verb|  IU FD  |      & jump if less than or equal \\
        !          1016:  1     & 2     & \tt LT        & \verb|  IU FD  |      & jump if less than \\
        !          1017:  1     & 2     & \tt NE        & \verb|  I  FD  |      & jump if not equal \\[1.5ex]
        !          1018: 
        !          1019:  2     & 1     & \tt ARG       & \verb|  I PFDB |      & argument \\
        !          1020:  1     & 1 2   & \tt CALL      & \verb|  I  FDBV|      & function call \\
        !          1021:        & 0 1   & \tt RET       & \verb|  I  FD V|      & return from function \\[1.5ex]
        !          1022: 
        !          1023:        & 1     & \tt JUMP      & \verb|        V|      & unconditional jump \\
        !          1024:  1     & 0     & \tt LABEL     & \verb|        V|      & label definition \\
        !          1025: \end{tabular}
        !          1026: \end{center}
        !          1027: \caption{Node Operators.\label{dags:op-table}}
        !          1028: \end{table}
        !          1029: 
        !          1030: The leaf operators yield the address of a variable or the value of a
        !          1031: constant. \verb|syms[0]| identifies the variable or constant.
        !          1032: 
        !          1033: The unary operators accept and yield a number, except for \verb|INDIR|,
        !          1034: which accepts an address and yields the value at that address.
        !          1035: There is no \verb|BCOMI|; signed integers are complemented using
        !          1036: \verb|BCOMU|. The binary operators accept two numbers and yield one.
        !          1037: 
        !          1038: The type suffix for a conversion operator denotes the type of the
        !          1039: result.  For example, \verb|CVUI| converts an unsigned (\verb|U|) to an
        !          1040: signed integer (\verb|I|).  Conversions between unsigned and short and
        !          1041: between unsigned and character are unsigned conversions; those between
        !          1042: integer and short and between integer and character are signed
        !          1043: conversions.  For example, \verb|CVSU| converts an unsigned short to an
        !          1044: unsigned while \verb|CVSI| converts an signed short to a signed
        !          1045: integer.
        !          1046: 
        !          1047: The front end composes conversions to form those not in the table.
        !          1048: For example, it converts a short to a float by first
        !          1049: converting it to an int and then a double.  The 16
        !          1050: conversion operators are represented by arrows in Figure~\ref{fig:conversions}.
        !          1051: Composed conversions follow the path from the source
        !          1052: type to the destination type.
        !          1053: 
        !          1054: \begin{figure}
        !          1055: \begin{center}
        !          1056: \setlength{\unitlength}{7pt}
        !          1057: \begin{picture}(20,12)
        !          1058: \multiput(1,2)(6,0){3}{\begin{picture}(0,3)
        !          1059:        \put(0,0){\vector(0,1){3}}\put(0,0){\vector(0,-1){0}}\end{picture}}
        !          1060: \multiput(7,7)(6,0){2}{\begin{picture}(0,3)
        !          1061:        \put(0,0){\vector(0,1){3}}\put(0,0){\vector(0,-1){0}}\end{picture}}
        !          1062: \multiput(2,6)(6,0){3}{\begin{picture}(4,0)
        !          1063:        \put(0,0){\vector(1,0){4}}\put(0,0){\vector(-1,0){0}}\end{picture}}
        !          1064: \put( 1, 1){\makebox(0,0){\tt F}}
        !          1065: \put( 7, 1){\makebox(0,0){\tt S}}
        !          1066: \put(13, 1){\makebox(0,0){\tt S}}
        !          1067: 
        !          1068: \put( 1, 6){\makebox(0,0){\tt D}}
        !          1069: \put( 7, 6){\makebox(0,0){\tt I}}
        !          1070: \put(13, 6){\makebox(0,0){\tt U}}
        !          1071: \put(19, 6){\makebox(0,0){\tt P}}
        !          1072: 
        !          1073: \put( 7,11){\makebox(0,0){\tt C}}
        !          1074: \put(13,11){\makebox(0,0){\tt C}}
        !          1075: \end{picture}
        !          1076: \end{center}
        !          1077: \vspace{-1.5pc}\small\em
        !          1078: \caption{Conversions~\label{fig:conversions}}
        !          1079: \end{figure}
        !          1080: 
        !          1081: There is no \verb|CVUD|; conversion of an unsigned \verb|u| to a double
        !          1082: is done by the equivalent of the expression
        !          1083: \begin{verbatim}
        !          1084: (int)u >= 0 ? (double)(int)u : (double)(int)u + UINT_MAX + 1
        !          1085: \end{verbatim}
        !          1086: where \verb|UINT_MAX| is the ANSI-specified maximum value for an unsigned.
        !          1087: Likewise, there is no \verb|CVDU|; conversion of a double \verb|d| to an unsigned
        !          1088: is done by the equivalent of the expression
        !          1089: \begin{verbatim}
        !          1090: d >= INT_MAX + 1 ? (unsigned)(int)(d-(INT_MAX+1)) + INT_MAX + 1
        !          1091:                  : (unsigned)(int)d
        !          1092: \end{verbatim}
        !          1093: where \verb|INT_MAX| is the ANSI-specified maximum value for a signed integer.
        !          1094: 
        !          1095: \verb|ASGN| stores the value of \verb|kids[1]| in the cell addressed by
        !          1096: \verb|kids[0]|. \verb|syms[0]| and \verb|syms[1]| point to symbol table
        !          1097: entries for integer constants that give the size of the value and
        !          1098: its alignment, respectively. These are most useful for \verb|ASGNB|,
        !          1099: which implements structure assignment and other
        !          1100: `block moves' (e.g., initialization of automatic arrays).
        !          1101: 
        !          1102: For the comparisons, \verb|syms[0]| points to a symbol-table entry for
        !          1103: the label to jump to if the comparison is true.  \verb|JUMPV| is an
        !          1104: unconditional jump to the address computed by \verb|kids[0]|. \verb|LABEL|
        !          1105: defines the label given by \verb|syms[0]| and is otherwise a no-op.
        !          1106: As on most computers, signed comparisons are used for unsigned equals and not equals. 
        !          1107: 
        !          1108: Function calls have a \verb|CALL| node preceded by zero or more
        !          1109: \verb|ARG| nodes. The front end `un-nests'
        !          1110: function calls, so \verb|ARG| nodes are always associated with the next
        !          1111: \verb|CALL| node in the forest. The order of the \verb|ARG| nodes is
        !          1112: right-to-left unless the configuration parameter \verb|LEFT_TO_RIGHT|
        !          1113: is defined, which it is for the sample.
        !          1114: A \verb|CALL| node's \verb|syms[0]| points to a symbol
        !          1115: whose only non-null field is \verb|type|,
        !          1116: which is the function type of the callee.
        !          1117: 
        !          1118: \verb|ARG| nodes establish the value computed by \verb|kids[0]| as the
        !          1119: next argument.  \verb|syms[0]| and \verb|syms[1]| point to symbol table
        !          1120: entries for integer constants that give the size of the argument and
        !          1121: its alignment, respectively.
        !          1122: 
        !          1123: \label{dags:calls}
        !          1124: 
        !          1125: In \verb|CALL| nodes, \verb|kids[0]| computes the address of the callee.
        !          1126: \verb|CALLB| calls functions that return structures;
        !          1127: \verb|kids[1]| computes the address of a
        !          1128: temporary local variable to hold the returned value. There is no
        !          1129: \verb|RETB|; the front end uses a \verb|RETV| preceded by an
        !          1130: \verb|ASGNB| to the structure addressed by the first local. The
        !          1131: \verb|CALLB| code and the function prologue must collaborate to store the
        !          1132: \verb|CALLB|'s \verb|kids[1]| into the callee's first local.
        !          1133: \verb|function| (Section~\ref{function}), \verb|local| (Section~\ref{local})
        !          1134: and the code emitted below for \verb|CALLB| (Section~\ref{emit})
        !          1135: illustrate such collaboration.
        !          1136: \verb|CALLB| nodes have a \verb|count| of \verb|0| because the front end references
        !          1137: the temporary wherever the returned value is referenced.
        !          1138: 
        !          1139: In \verb|RET| nodes, \verb|kids[0]| computes the value returned, except
        !          1140: for \verb|RETV| nodes, which are childless.
        !          1141: 
        !          1142: Character and short integer arguments are always promoted to the
        !          1143: corresponding integer type even in the presence of a prototype.
        !          1144: The promoted values are converted back to the intended
        !          1145: type upon entry to the function. The front end accomplishes
        !          1146: this conversion by specifying the intended types for the callee
        !          1147: as described in Section~\ref{function}. 
        !          1148: For example, the body for
        !          1149: \begin{verbatim}
        !          1150: f(char c) { f(c); }
        !          1151: \end{verbatim}
        !          1152: becomes two forests, which are linearized below;
        !          1153: the \verb|syms| columns list the \verb|x.name| fields.
        !          1154: \begin{verbatim}
        !          1155:  #  op      count kids    syms
        !          1156:  1. ADDRFP   1            4(ap)
        !          1157:  2. ADDRFP   1            4(ap)
        !          1158:  3. INDIRI   1     2                                    
        !          1159:  4. CVIC     1     3                                    
        !          1160:  5. ASGNC    0     1  4                                 
        !          1161:                                   
        !          1162:  1. ADDRFP   1            4(ap)
        !          1163:  2. INDIRC   1     1                                    
        !          1164:  3. CVCI     1     2                                    
        !          1165:  4. ARGI     0     3      4 4                           
        !          1166:  5. ADDRGP   1            _f                             
        !          1167:  6. CALLI    0     5      4                             
        !          1168: \end{verbatim}
        !          1169: The first forest holds one dag, which converts the actual argument to the
        !          1170: intended type.  The second forest holds two dags.
        !          1171: The first (nodes 1--4) promotes \verb|c| to pass it as an integer,
        !          1172: and the second (nodes 5 \& 6) calls \verb|f|.
        !          1173: 
        !          1174: Unsigned variants of \verb|ASGN|, \verb|INDIR|, \verb|ARG|,
        !          1175: \verb|CALL|, and \verb|RET| were omitted as unnecessary.  Signed and
        !          1176: unsigned integers have the same size, so the corresponding signed
        !          1177: operator is used instead.  Likewise, there is no \verb|CALLP| or
        !          1178: \verb|RETP|. A pointer is returned by using \verb|CVPU| and
        !          1179: \verb|RETI|. A pointer-valued function is called by using \verb|CALLI|
        !          1180: and \verb|CVUP|.
        !          1181: 
        !          1182: In Table~\ref{dags:op-table}, the operators listed at and following \verb|ASGN|
        !          1183: are used for their side-effects. They always appear as roots in the
        !          1184: forest of dags, and they appear in the order in which they must be executed.
        !          1185: Except for \verb|CALLD|, \verb|CALLF| and \verb|CALLI|, their reference
        !          1186: counts are always zero.  Even these \verb|CALL| nodes have zero
        !          1187: reference counts when their values are unused.
        !          1188: 
        !          1189: \subsection{\tt gen}
        !          1190: 
        !          1191: \label{gen}
        !          1192: \label{dags:registers}
        !          1193: The front end calls \verb|gen| to select code.  It passes \verb|gen| a
        !          1194: forest of dags.  For example,
        !          1195: \begin{verbatim}
        !          1196: int i, *p; f() { i = *p++; }
        !          1197: \end{verbatim}
        !          1198: yields the forest linearized below.
        !          1199: \begin{verbatim}
        !          1200:  #  op      count kids      syms
        !          1201:  1. ADDRGP   2              _p
        !          1202:  2. INDIRP   2     1
        !          1203:  3. CNSTI    1              4
        !          1204:  4. ADDP     1     2  3
        !          1205:  5. ASGNP    0     1  4
        !          1206:  6. ADDRGP   1              _i
        !          1207:  7. INDIRI   1     2
        !          1208:  8. ASGNI    0     6  7
        !          1209: \end{verbatim}
        !          1210: This forest consists of three dags, rooted at nodes 2, 5, and 8 above.
        !          1211: The \verb|INDIRP| node, which fetches the value of \verb|p|, comes
        !          1212: before node 5, which changes \verb|p|, so the original value of
        !          1213: \verb|p| is available for subsequent use by node 7, which fetches the
        !          1214: integer pointed to by that value.
        !          1215: 
        !          1216: \verb|gen| traverses the forest and selects code, but it
        !          1217: emits nothing because it may be necessary to determine, for example,
        !          1218: the registers needed before the function prologue
        !          1219: can be emitted (see Section~\ref{function}).
        !          1220: So \verb|gen| merely annotates the nodes to identify the code selected,
        !          1221: and it returns a pointer that is ultimately passed to the back end's
        !          1222: \verb|emit| to actually output the code.  Once the front end calls
        !          1223: \verb|gen|, it does not inspect the contents of the nodes again, so
        !          1224: \verb|gen| may modify them freely.
        !          1225: 
        !          1226: The sample code generator emits naive code, so \verb|gen| concerns
        !          1227: itself mainly with register allocation. 
        !          1228: The sample \verb|gen|
        !          1229: \begin{verbatim}
        !          1230: static Node gen(Node p) {
        !          1231:    Node head, *last;
        !          1232: 
        !          1233:    for (last = &head; p; p = p->link)
        !          1234:       last = linearize(p, last, 0);
        !          1235:    for (p = head; p; p = p->x.next) {
        !          1236:       ralloc(p);
        !          1237:       if (p->count == 0 && sets(p))
        !          1238:          putreg(p);
        !          1239:    }
        !          1240:    return head;
        !          1241: }
        !          1242: \end{verbatim}
        !          1243: linearizes each dag in the forest, in execution order, and then
        !          1244: allocates registers for each node.  If the node sets a register, but no
        !          1245: subsequent node references it, which is indicated by a reference
        !          1246: \verb|count| of \verb|0|, \verb|gen| releases the register.  Finally,
        !          1247: it returns the linearized node list for traversal by \verb|emit|.
        !          1248: 
        !          1249: \verb|gen| linearizes the dags before allocating registers to simplify
        !          1250: the insertion of spills and reloads. The function
        !          1251: \begin{verbatim}
        !          1252: static Node *linearize(Node p, Node *last, Node next) {
        !          1253:    if (p && !p->x.visited) {
        !          1254:       last = linearize(p->kids[0], last, 0);
        !          1255:       last = linearize(p->kids[1], last, 0);
        !          1256:       p->x.visited = 1;
        !          1257:       *last = p;
        !          1258:       last = &p->x.next;
        !          1259:    }
        !          1260:    *last = next;
        !          1261:    return last;
        !          1262: }
        !          1263: \end{verbatim}
        !          1264: linearizes the dag at \verb|p| and inserts it into the growing list of
        !          1265: nodes between \verb|*last| and \verb|next|.
        !          1266: 
        !          1267: \verb|ralloc| handles register allocation for a single node.
        !          1268: \begin{verbatim}
        !          1269: static void ralloc(Node p) {
        !          1270:    int i;
        !          1271: 
        !          1272:    switch (generic(p->op)) {
        !          1273:    case ARG:
        !          1274:       argoffset = roundup(argoffset, p->syms[1]->u.c.v.i);
        !          1275:       p->x.argoffset = argoffset;
        !          1276:       argoffset += p->syms[0]->u.c.v.i;
        !          1277:       if (argoffset > argbuildsize)
        !          1278:          argbuildsize = roundup(argoffset, 4);
        !          1279:       break;
        !          1280:    case CALL:
        !          1281:       argoffset = 0;
        !          1282:       break;
        !          1283:    }
        !          1284:    for (i = 0; i < MAXKIDS; i++)
        !          1285:       putreg(p->kids[i]);
        !          1286:    p->x.busy = rmask;
        !          1287:    if (needsreg(p))
        !          1288:       getreg(p);
        !          1289: }
        !          1290: \end{verbatim}
        !          1291: \verb|generic| strips the type suffix from an operator and
        !          1292: thus simplifies the switch.  \verb|ralloc| calls \verb|putreg| to
        !          1293: release the registers allocated for the node's children and then it
        !          1294: calls \verb|getreg| to allocate a register for the node itself if it needs one.  It sets
        !          1295: the node's \verb|x.busy| to record the busy registers; \verb|emit|
        !          1296: needs this information for a few operators.  The register state is
        !          1297: encoded in \verb|rmask|; \verb|rmask&(1<<r)| is \verb|1| if register
        !          1298: \verb|r| is busy, for \verb|r| from \verb|0| to \verb|11|.
        !          1299: 
        !          1300: \verb|CALL| and \verb|ARG| nodes require extra steps.  \verb|ARG| nodes
        !          1301: produce no result; instead, they store the value computed by
        !          1302: \verb|kids[0]| into the next location in the argument build area.
        !          1303: \verb|syms[0]| and \verb|syms[1]| point to constant symbols that give
        !          1304: the size and alignment of the argument.  \verb|argoffset| is the
        !          1305: running offset into the argument build area.
        !          1306: \verb|argoffset| is rounded up to the appropriate alignment boundary, saved
        !          1307: in the node's \verb|x.argoffset| for use in \verb|emit|, and incremented
        !          1308: by the size of the argument.  \verb|argbuildsize| tracks the maximum
        !          1309: \verb|argoffset| needed by the current function.  The case for \verb|CALL| nodes
        !          1310: clears \verb|argoffset| for the next set of \verb|ARG| nodes.
        !          1311: 
        !          1312: \verb|getreg|
        !          1313: accepts a node and allocates a register for it:
        !          1314: \begin{verbatim}
        !          1315: static void getreg(Node p) {
        !          1316:    int r, m = optype(p->op) == D ? 3 : 1;
        !          1317: 
        !          1318:    for (r = 0; r < nregs; r++)
        !          1319:       if ((rmask&(m<<r)) == 0) {
        !          1320:          p->x.rmask = m;
        !          1321:          p->x.reg = r;
        !          1322:          rmask |= sets(p);
        !          1323:          usedmask |= sets(p);
        !          1324:          return;
        !          1325:       }
        !          1326:    r = spillee(p, m);
        !          1327:    spill(r, m, p);
        !          1328:    getreg(p);
        !          1329: }
        !          1330: \end{verbatim}
        !          1331: \verb|optype(op)| returns the type suffix of operator \verb|op|.
        !          1332: \verb|m| is set to the mask \verb|1| if
        !          1333: the result of \verb|p->op| needs an ordinary register and \verb|3| if it
        !          1334: needs a double register.
        !          1335: \verb|getreg| loops over the registers. If it finds one that's free,
        !          1336: it sets the node's \verb|x.reg| field to the register allocated and the
        !          1337: \verb|x.rmask| field to the mask. It also
        !          1338: updates \verb|usedmask|, which is used to generate the prologue
        !          1339: described in Section~\ref{function}; \verb|sets(p)| returns
        !          1340: \verb|p->x.rmask<<p->x.reg|.  If no registers are free, \verb|getreg|
        !          1341: spills a register and calls itself recursively to try again.
        !          1342: Section~\ref{spills} describes spills.
        !          1343: 
        !          1344: \verb|putreg| releases registers:
        !          1345: \begin{verbatim}
        !          1346: static void putreg(Node p) {
        !          1347:    if (p && --p->count <= 0)
        !          1348:       rmask &= ~sets(p);
        !          1349: }
        !          1350: \end{verbatim}
        !          1351: Dags can use result registers multiple times, so \verb|putreg|
        !          1352: decrements the reference count and frees the register only when the
        !          1353: last reference is removed by clearing the appropriate bits in
        !          1354: \verb|rmask|.
        !          1355: 
        !          1356: \subsection{\tt emit}
        !          1357: 
        !          1358: \label{emit}
        !          1359: \verb|emit| emits the linearized forest. The sample walks down the list,
        !          1360: switches on the opcode to identify the code to emit:
        !          1361: \begin{verbatim}
        !          1362: void emit(Node p) {
        !          1363:    for (; p; p = p->x.next) {
        !          1364:       Node a = p->kids[0], b = p->kids[1];
        !          1365:       int r = p->x.reg;
        !          1366:       switch (p->op) {
        !          1367:       case CNSTC: case CNSTI: case CNSTP:
        !          1368:       case CNSTS: case CNSTU:
        !          1369:          print("movl $%s,r%d\n", p->syms[0]->x.name, r);
        !          1370:          break;
        !          1371:       case ADDRGP: case ADDRFP: case ADDRLP:
        !          1372:          print("moval %s,r%d\n", p->syms[0]->x.name, r);
        !          1373:          break;
        !          1374:       ...
        !          1375:       }
        !          1376:    }
        !          1377: }
        !          1378: \end{verbatim}
        !          1379: The individual cases emit naive code for a single operator.
        !          1380: The \verb|CNST| cases above emit a VAX instruction that loads a constant
        !          1381: into a register. \verb|defsymbol| stored the constant string in
        !          1382: \verb|p->syms[0]->x.name|, and \verb|ralloc| stored the result register
        !          1383: name \verb|p->x.reg|. The \verb|ADDR| cases above emit a VAX
        !          1384: instruction that moves the address of a variable into the result
        !          1385: register.
        !          1386: 
        !          1387: Most of the unary operators share a common pattern, which the sample
        !          1388: abstracts into a macro:
        !          1389: \begin{verbatim}
        !          1390: #define suffix(p)    ".fdbwllll."[optype((p)->op)]
        !          1391: #define unary(inst)  print("%s%c r%d,r%d\n", inst, suffix(p), a->x.reg, r)
        !          1392: case BCOMU:                             unary("mcom" );  break;
        !          1393: case NEGD:  case NEGF:  case NEGI:      unary("mneg" );  break;
        !          1394: case CVCI:                              unary("cvtb" );  break;
        !          1395: case CVCU:                              unary("movzb");  break;
        !          1396: case CVSI:                              unary("cvtw" );  break;
        !          1397: case CVSU:                              unary("movzw");  break;
        !          1398: case CVDF:  case CVDI:                  unary("cvtd" );  break;
        !          1399: case CVFD:                              unary("cvtf" );  break;
        !          1400: case CVUC:  case CVUS:                  unary("cvtl" );  break;
        !          1401: case CVIC:  case CVIS:  case CVID:      unary("cvtl" );  break;
        !          1402: case CVIU:  case CVUI:                  unary("mov"  );  break;
        !          1403: case CVPU:  case CVUP:                  unary("mov"  );  break;
        !          1404: \end{verbatim}
        !          1405: \verb|unary| emits the VAX operator, the VAX type suffix --- which is
        !          1406: computed by indexing a string of such suffixes --- and the source and
        !          1407: destination registers. Most of the binary operators
        !          1408: \begin{verbatim}
        !          1409: #define binary(inst) print("%s%c3 r%d,r%d,r%d\n", inst, suffix(p), \
        !          1410:                         b->x.reg, a->x.reg, r)
        !          1411: case BANDU:                             binary("bic");   break;
        !          1412: case BORU:                              binary("bis");   break;
        !          1413: case BXORU:                             binary("xor");   break;
        !          1414: case ADDD:  case ADDF:                  binary("add");   break;
        !          1415: case ADDI:  case ADDP:  case ADDU:      binary("add");   break;
        !          1416: case SUBD:  case SUBF:                  binary("sub");   break;
        !          1417: case SUBI:  case SUBP:  case SUBU:      binary("sub");   break;
        !          1418: case MULD:  case MULF:                  binary("mul");   break;
        !          1419: case MULI:  case MULU:                  binary("mul");   break;
        !          1420: case DIVD:  case DIVF:  case DIVI:      binary("div");   break;
        !          1421: \end{verbatim}
        !          1422: and all of the comparisons
        !          1423: \begin{verbatim}
        !          1424: #define compare(cp)  print("cmp%c r%d,r%d; j%s %s\n", suffix(p), \
        !          1425:                         a->x.reg, b->x.reg, cp, p->syms[0]->x.name)
        !          1426: case EQD:   case EQF:   case EQI:       compare("eql" ); break;
        !          1427: case GED:   case GEF:   case GEI:       compare("geq" ); break;
        !          1428: case GEU:                               compare("gequ"); break;
        !          1429: case GTD:   case GTF:   case GTI:       compare("gtr" ); break;
        !          1430: case GTU:                               compare("gtru"); break;
        !          1431: case LED:   case LEF:   case LEI:       compare("leq" ); break;
        !          1432: case LEU:                               compare("lequ"); break;
        !          1433: case LTD:   case LTF:   case LTI:       compare("lss" ); break;
        !          1434: case LTU:                               compare("lssu"); break;
        !          1435: case NED:   case NEF:   case NEI:       compare("neq" ); break;
        !          1436: \end{verbatim}
        !          1437: are handled similarly.
        !          1438: 
        !          1439: The cases
        !          1440: \begin{verbatim}
        !          1441: case INDIRC: case INDIRD: case INDIRF: case INDIRI:
        !          1442: case INDIRP: case INDIRS:
        !          1443:    print("mov%c (r%d),r%d\n", suffix(p), a->x.reg, r);
        !          1444:    break;
        !          1445: case ASGNC: case ASGND: case ASGNF: case ASGNI: case ASGNP: case ASGNS:
        !          1446:    print("mov%c r%d,(r%d)\n", suffix(p), b->x.reg, a->x.reg);
        !          1447:    break;
        !          1448: case JUMPV:
        !          1449:    print("jmp (r%d)\n", a->x.reg);
        !          1450:    break;
        !          1451: case LABELV:
        !          1452:    print("%s:", p->syms[0]->x.name);
        !          1453:    break;
        !          1454: \end{verbatim}
        !          1455: emit code to indirectly load and store memory cells, to jump indirectly,
        !          1456: and to emit a label definition.
        !          1457: 
        !          1458: The cases
        !          1459: \begin{verbatim}
        !          1460: case ARGD: case ARGF: case ARGI: case ARGP:
        !          1461:    print("mov%c r%d,%d(sp)\n", suffix(p),
        !          1462:       a->x.reg, p->x.argoffset);
        !          1463:    break;
        !          1464: case CALLD: case CALLF: case CALLI: case CALLV:
        !          1465:    save(p->x.busy&0x3e);
        !          1466:    print("calls $0,(r%d)\n", a->x.reg);
        !          1467:    if (p->op != CALLV)
        !          1468:       print("mov%c r0,r%d\n", suffix(p), r);
        !          1469:    restore(p->x.busy&0x3e);
        !          1470:    break;
        !          1471: case RETD: case RETF: case RETI:
        !          1472:    print("mov%c r%d,r0; ret\n", suffix(p), a->x.reg);
        !          1473:    break;
        !          1474: case RETV:
        !          1475:    print("ret\n");
        !          1476:    break;
        !          1477: \end{verbatim}
        !          1478: emit code to move an argument onto the stack, to call a procedure, and
        !          1479: to return a value. The \verb|ARG| cases use \verb|x.argoffset|,
        !          1480: into which \verb|ralloc| stored the stack offset for the argument.
        !          1481: Procedures may destroy registers 1--5, so the \verb|CALL| cases use
        !          1482: \begin{verbatim}
        !          1483: static void save(unsigned mask) {
        !          1484:    int i;
        !          1485: 
        !          1486:    for (i = 1; i < nregs; i++)
        !          1487:       if (mask&(1<<i))
        !          1488:          print("movl r%d,%d(fp)\n", i,
        !          1489:             4*i - framesize + argbuildsize);
        !          1490: }
        !          1491: \end{verbatim}
        !          1492: to emit code to save any of those registers that are busy;
        !          1493: \verb|restore| is similar.
        !          1494: The \verb|RET| cases merely copy any return value into register 0 and return.
        !          1495: 
        !          1496: Several binary operators require special handling. The shift
        !          1497: instructions use a syntax that differs slightly from the other binary instructions:
        !          1498: \begin{verbatim}
        !          1499: case RSHI: case LSHI: case LSHU:
        !          1500:    print("ashl r%d,r%d,r%d\n", b->x.reg, a->x.reg, r);
        !          1501:    break;
        !          1502: \end{verbatim}
        !          1503: \verb|RSHI| and \verb|LSHI| generate the same code because the front
        !          1504: end negates the shift count for \verb|RSHI| if the configuration
        !          1505: parameter \verb|VAX| is defined. No instructions directly
        !          1506: implement unsigned right shift, division, or modulus, so \verb|emit|
        !          1507: uses a field-extraction instruction for the first and library calls for the others:
        !          1508: \begin{verbatim}
        !          1509: case RSHU:
        !          1510:    print("subl3 r%d,$32,r0; extzv r%d,r0,r%d,r%d\n",
        !          1511:       b->x.reg, b->x.reg, a->x.reg, r);
        !          1512:    break;
        !          1513: case DIVU:
        !          1514:    save(p->x.busy&0x3e);
        !          1515:    print("pushl r%d; pushl r%d; calls $2,udiv; movl r0,r%d\n",
        !          1516:       b->x.reg, a->x.reg, r);
        !          1517:    restore(p->x.busy&0x3e);
        !          1518:    break;
        !          1519: case MODU:
        !          1520:    save(p->x.busy&0x3e);
        !          1521:    print("pushl r%d; pushl r%d; calls $2,urem; movl r0,r%d\n",
        !          1522:       b->x.reg, a->x.reg, r);
        !          1523:    restore(p->x.busy&0x3e);
        !          1524:    break;
        !          1525: \end{verbatim}
        !          1526: Finally, ANSI C requires that \verb|a%b| equal \verb|a-(a/b)*b|,
        !          1527: so \verb|MODI| computes it just this way:
        !          1528: \begin{verbatim}
        !          1529: case MODI:
        !          1530:    print("divl3 r%d,r%d,r0; mull2 r%d,r0; subl3 r0,r%d,r%d\n",
        !          1531:       b->x.reg, a->x.reg, b->x.reg, a->x.reg, r);
        !          1532:    break;
        !          1533: \end{verbatim}
        !          1534: 
        !          1535: Only the structure instructions remain:
        !          1536: \begin{verbatim}
        !          1537: case INDIRB:
        !          1538:    print("moval (r%d),r%d\n", a->x.reg, r);
        !          1539:    break;
        !          1540: case ASGNB:
        !          1541:    save(p->x.busy&0x3f);
        !          1542:    print("movc3 $%s,(r%d),(r%d)\n", p->syms[0]->x.name,
        !          1543:       b->x.reg, a->x.reg);
        !          1544:    restore(p->x.busy&0x3f);
        !          1545:    break;
        !          1546: case ARGB:
        !          1547:    save(p->x.busy&0x3f);
        !          1548:    print("movc3 $%s,(r%d),%d(sp)\n", p->syms[0]->x.name,
        !          1549:       a->x.reg, p->x.argoffset);
        !          1550:    restore(p->x.busy&0x3f);
        !          1551:    break;
        !          1552: \end{verbatim}
        !          1553: The scalar \verb|INDIR|s and \verb|ASGN|s load and store values
        !          1554: directly, but structures won't fit in registers, so their instructions
        !          1555: manipulate {\em addresses} instead. \verb|ASGNB| uses a
        !          1556: block move instruction, which needs the size from
        !          1557: \verb|p->syms[0]->x.name|; it also destroys registers 0--5, so
        !          1558: \verb|emit| arranges to save and restore their values. \verb|ARGB|
        !          1559: operates similarly, but copies the structure into the stack instead.
        !          1560: Finally,
        !          1561: \begin{verbatim}
        !          1562: case CALLB:
        !          1563:    save(p->x.busy&0x3e);
        !          1564:    if (a->x.reg == 1) {
        !          1565:       print("movl r1,r0\n");
        !          1566:       a->x.reg = 0;
        !          1567:    }
        !          1568:    if (b->x.reg != 1)
        !          1569:       print("movl r%d,r1\n", b->x.reg);
        !          1570:    print("calls $0,(r%d)\n", a->x.reg);
        !          1571:    restore(p->x.busy&0x3e);
        !          1572:    break;
        !          1573: \end{verbatim}
        !          1574: is like an ordinary call, but it also passes the address at which to
        !          1575: store the return value in register 1; if the address of the {\em function}
        !          1576: is already in register 1, it is first moved out of the way into
        !          1577: register 0, which is not otherwise allocated.
        !          1578: 
        !          1579: If \verb|NOARGB| is defined (e.g., in the configuration file),
        !          1580: the front end uses Sun's convention for passing structures by value.
        !          1581: It builds dags that copy structure arguments to temporaries,
        !          1582: passes pointers to these temporaries, adds an extra indirection
        !          1583: to references to these parameters in the callee, and changes
        !          1584: the types of the callee's formals to reflect this convention.
        !          1585: It also sets \verb|structarg| for these parameters to distinquish
        !          1586: them from bona fide structure pointer parameters.
        !          1587: 
        !          1588: \section{Spills}
        !          1589: 
        !          1590: \label{spills}
        !          1591: When \verb|getreg| finds that all registers are busy,
        !          1592: one or more must be {\em spilled} now and {\em reloaded}
        !          1593: when the values are needed again.
        !          1594: Handling spills correctly is difficult and a common
        !          1595: source of compiler bugs. Test cases that expose
        !          1596: bugs in the spill code are necessarily complex.
        !          1597: As such, and for completeness,
        !          1598: the implementation described in this section is adapted
        !          1599: from that used in the production versions of \verb|lcc|.
        !          1600: Reference~\cite{fraser:hanson:92} describes the important
        !          1601: differences and explains the rationale behind this spiller.
        !          1602: 
        !          1603: The dag constructed by the front end minimizes the reevaluation of
        !          1604: common subexpressions. Spills are, in a sense,
        !          1605: a result of the front end's eagerness to avoid
        !          1606: reevaluation, and handling spills amounts to
        !          1607: `breaking the dags' generated by the front end.
        !          1608: A node representing a common subexpression is changed
        !          1609: so that it stores the value in a temporary, and
        !          1610: subsequent references to that node are edited
        !          1611: to load the value from the temporary.
        !          1612: 
        !          1613: Spilling involves three major steps:
        !          1614: Identifying the registers to spill,
        !          1615: generating the code for the spills at the correct
        !          1616: location the output stream, and
        !          1617: generating the code for the reloads, again at the correct locations.
        !          1618: These steps are performed at the end of \verb|getreg|:
        !          1619: \begin{verbatim}
        !          1620: static void getreg(Node p) {
        !          1621:    int r, m = optype(p->op) == D ? 3 : 1;
        !          1622:    ...
        !          1623:    r = spillee(p, m);
        !          1624:    spill(r, m, p);
        !          1625:    getreg(p);
        !          1626: }
        !          1627: \end{verbatim}
        !          1628: \verb|spillee| identifies the register (\verb|m==1|) or register pair (\verb|m==3|)
        !          1629: to be spilled on behalf of \verb|p|,
        !          1630: and \verb|spill| generates the spill and reload code.
        !          1631: Once \verb|r| has been spilled,
        !          1632: it is available,
        !          1633: and the call to \verb|getreg| is guaranteed to succeed.
        !          1634: 
        !          1635: \verb|ralloc| frees registers as soon as possible.
        !          1636: If the available registers are exhausted,
        !          1637: it is because there are multiple references
        !          1638: to the nodes holding the registers, which arise
        !          1639: from common subexpressions and from multiple assignment, augmented
        !          1640: assignment, and the operators \verb|++| and \verb|--|.
        !          1641: Consider the following program.
        !          1642: \begin{verbatim}
        !          1643: double a[10],b[10];
        !          1644: int i;
        !          1645: f(){ i = (a[i]+b[i])*(a[i]-b[i]); }
        !          1646: \end{verbatim}
        !          1647: The initial linearized forest is shown to the left of
        !          1648: the vertical line in the display below.
        !          1649: \begin{flushleft}\tt
        !          1650: \begin{tabular}{rlllc|cll}
        !          1651: \#\ & op      & kids    & syms  & count &count   & uses   & sets  \\
        !          1652:  1. & ADDRGP  &         & \_i   &   2   &   1    &        & r1    \\
        !          1653:  2. & INDIRI  &  1      &       &   1   &   0    & r1     & r2    \\
        !          1654:  3. & CNSTI   &         & 3     &   1   &   0    &        & r3    \\
        !          1655:  4. & LSHI    &  2  3   &       &   2   &   0    & r2 r3  & r2    \\
        !          1656:  5. & ADDRGP  &         & \_a   &   1   &   0    &        & r3    \\
        !          1657:  6. & ADDP    &  4  5   &       &   1   &   0    & r2 r3  & r3    \\
        !          1658:  7. & INDIRD  &  6      &       &   2   &   2    & r3     & r3 r4 \\
        !          1659:  8. & ADDRGP  &         & \_b   &   1   &   0    &        & r5    \\
        !          1660:  9. & ADDP    &  4  8   &       &   1   &   0    & r2 r5  & r2    \\
        !          1661: 10. & INDIRD  &  9      &       &   2   &   2    & r2    \\
        !          1662: 11. & ADDD    &  7 10   &       &   1   &   1    & r3 r4 \\
        !          1663: 12. & SUBD    &  7 10   &       &   1   &   1    & r3 r4 \\
        !          1664: 13. & MULD    & 11 12   &       &   1   &   1 \\
        !          1665: 14. & CVDI    & 13      &       &   1   &   1 \\
        !          1666: 15. & ASGNI   &  1 14   &       &   0   &   0    & r1 \\
        !          1667: \end{tabular}
        !          1668: \end{flushleft}
        !          1669: Nodes with \verb|count| fields greater than 1 represent
        !          1670: four common subexpressions: the lvalue of \verb|i| (node 1),
        !          1671: the addressing expression \verb|i<<3| (node 4), and
        !          1672: the rvalues of \verb|a[i]| (node 7) and \verb|b[i]| (node 10).
        !          1673: If only registers 1--5 are available, \verb|ralloc| runs
        !          1674: out of registers at node 10.
        !          1675: The linearized forest at that point is shown to the right of
        !          1676: the vertical line in the display above.
        !          1677: As indicated by the non-zero \verb|count|s and the \verb|sets| column,
        !          1678: node 1 is using \verb|r1| and
        !          1679: node 7 is using \verb|r3| and \verb|r4|.
        !          1680: Node 10 needs a register pair,
        !          1681: but only registers \verb|r2| and \verb|r5| are available.
        !          1682: Note that the \verb|count| of node 4, which is \verb|i<<3|,
        !          1683: has dropped to 0 because \verb|ralloc| has
        !          1684: processed both uses (nodes 6 and 9).
        !          1685: 
        !          1686: \verb|getreg| calls \verb|spillee| to identify a register pair
        !          1687: to be spilled so that it can be used for node 10.
        !          1688: \verb|spillee| chooses the register whose next use
        !          1689: is the most distant in the linearized forest.
        !          1690: This choice is analogous to the optimal demand paging strategy that
        !          1691: replaces pages whose next use is most distant
        !          1692: in the execution stream~\cite{freiburghouse}.
        !          1693: 
        !          1694: For each register,
        !          1695: \verb|spillee| simply searches down the linearized forest
        !          1696: for register uses and records the most distant.
        !          1697: \begin{verbatim}
        !          1698: static int spillee(Node dot, unsigned m) {
        !          1699:    int bestdist = -1, bestreg = 0, dist, r;
        !          1700:    Node q;
        !          1701: 
        !          1702:    for (r = 1; r < nregs - (m>>1); r++) {
        !          1703:       dist = 0;
        !          1704:       for (q = dot->x.next; q && !(uses(q)&(m<<r)); q = q->x.next)
        !          1705:          dist++;
        !          1706:       if (dist > bestdist) {
        !          1707:          bestdist = dist;
        !          1708:          bestreg = r;
        !          1709:       }
        !          1710:    }
        !          1711:    return bestreg;
        !          1712: }
        !          1713: \end{verbatim}
        !          1714: \verb|spillee| is called with \verb|dot| equal to the node at which
        !          1715: the spill is required, which is node 10 in the example above.
        !          1716: \verb|uses(p)| returns a bit mask giving the registers used by node \verb|p|,
        !          1717: i.e., the registers set by \verb|p|'s kids:
        !          1718: \begin{verbatim}
        !          1719: static unsigned uses(Node p) {
        !          1720:    int i;
        !          1721:    unsigned m = 0;
        !          1722: 
        !          1723:    for (i = 0; i < MAXKIDS; i++)
        !          1724:       if (p->kids[i])
        !          1725:          m |= sets(p->kids[i]);
        !          1726:    return m;
        !          1727: }
        !          1728: \end{verbatim}
        !          1729: 
        !          1730: In this example, \verb|spillee| finds \verb|r1| used
        !          1731: in node 15, which is at `distance' 4, and
        !          1732: \verb|r3| and \verb|r4| used in node 11 at distance 0.
        !          1733: Consequently, \verb|spillee| returns \verb|r1|, which denotes
        !          1734: both \verb|r1| and \verb|r2| because \verb|m| is 3.
        !          1735: 
        !          1736: The implementations of \verb|spillee| above and \verb|spill| below
        !          1737: assume that the instruction
        !          1738: emitted for each node reads the registers it uses before setting
        !          1739: the register allocated to it. This assumption
        !          1740: permits \verb|ralloc| to reassign registers used by a node to that node.
        !          1741: It also permits \verb|spillee| to start its scan {\em after} the
        !          1742: current instruction.
        !          1743: This assumption is invalid on machines with two-address instructions.
        !          1744: 
        !          1745: Actually spilling the register chosen by \verb|spillee| and inserting
        !          1746: the reloads is done by \verb|spill| and its associates, \verb|genspill|
        !          1747: and \verb|genreloads|.
        !          1748: In the example, these functions collaborate to `break the dag'
        !          1749: at node 1, which sets register \verb|r1| (\verb|r2| is already free).
        !          1750: \verb|genspill| generates a temporary and the nodes necessary
        !          1751: to store the value computed by node 1 into the temporary.
        !          1752: These new nodes are stitched into the linearized forest immediately
        !          1753: after node 1. \verb|genreloads| changes future {\em uses} of the
        !          1754: value computed by node 1 to reload the value from
        !          1755: the temporary instead of referencing node 1 directly.
        !          1756: The effect is to remove the common subexpression
        !          1757: by assigning it to a temporary.
        !          1758: 
        !          1759: \verb|spill| identifies the locations at which
        !          1760: to insert the spills by searching the linearized forest beyond \verb|dot|
        !          1761: for nodes that use the registers that are to be spilled, e.g.,
        !          1762: \verb|r1| in the example.
        !          1763: \begin{verbatim}
        !          1764: static void spill(int r, unsigned m, Node dot) {
        !          1765:    int i;
        !          1766:    Node p = dot;
        !          1767: 
        !          1768:    while (p = p->x.next)
        !          1769:       for (i = 0; i < MAXKIDS; i++)
        !          1770:          if (p->kids[i] && sets(p->kids[i])&(m<<r)) {
        !          1771:             Symbol temp = genspill(p->kids[i]);
        !          1772:             rmask &= ~sets(p->kids[i]);
        !          1773:             genreloads(dot, p->kids[i], temp);
        !          1774:          }
        !          1775: }
        !          1776: \end{verbatim}
        !          1777: \verb|genspill| allocates the temporary,
        !          1778: generates the spill code, and inserts it into the linearized forest
        !          1779: right after the node that set the spillee.
        !          1780: It returns the symbol-table entry for the temporary
        !          1781: so that it can be used by \verb|genreloads| to generate the reloads.
        !          1782: \begin{verbatim}
        !          1783: static Symbol genspill(Node p) {
        !          1784:    Symbol temp = newtemp(AUTO, typecode(p));
        !          1785:    Node q = p->x.next;
        !          1786: 
        !          1787:    linearize(newnode(ASGN + typecode(p),
        !          1788:       newnode(ADDRLP, 0, 0, temp), p, 0),
        !          1789:       &p->x.next, p->x.next);
        !          1790:    rmask &= ~1;
        !          1791:    for (p = p->x.next; p != q; p = p->x.next)
        !          1792:       ralloc(p);
        !          1793:    rmask |= 1;
        !          1794:    return temp;
        !          1795: }
        !          1796: \end{verbatim}
        !          1797: \verb|typecode| is like \verb|optype|, but maps \verb|U| to \verb|I|
        !          1798: because the front-end uses \verb|ASGNI| and \verb|INDIRI|
        !          1799: to store and load unsigned values. It also maps \verb|B| to
        !          1800: \verb|P| because registers always point structures.
        !          1801: The front-end's \verb|newtemp(sclass, type)|
        !          1802: allocates a new temporary or reuses an existing one if
        !          1803: its storage \verb|sclass| and \verb|type| code match those requested,
        !          1804: and it announces a new temporary by calling \verb|local| as usual.
        !          1805: 
        !          1806: The spill code is an \verb|ADDRLP| node to compute the
        !          1807: address of the temporary and an \verb|ASGN| node to store the value
        !          1808: into that address. The right operand of the \verb|ASGN| is simply
        !          1809: \verb|p|, the node that set the registers to spill, e.g.,
        !          1810: node 1 above.
        !          1811: A node is allocated and initialized by the front-end function
        !          1812: \verb|newnode(op,l,r,sym)|, where
        !          1813: \verb|op| is the operator, \verb|l| and \verb|r|
        !          1814: are \verb|kids[0]| and \verb|kids[1]|, and \verb|sym| is the symbol
        !          1815: table pointer for leaf nodes. \verb|newnode| also increments
        !          1816: the reference counts of \verb|l| and \verb|r|.
        !          1817: 
        !          1818: Finally, \verb|genspill| linearizes the spill code and
        !          1819: inserts it right after \verb|p|, which sets the spilled register.
        !          1820: The manipulation of \verb|rmask| ensures that \verb|ralloc|
        !          1821: assigns register \verb|r0| --- which is not otherwise allocated ---
        !          1822: to the \verb|ADDRLP| node to compute the address.
        !          1823: The linearized forest after \verb|genspill| returns is
        !          1824: \begin{verbatim}
        !          1825:  #  op      kids    syms  count  uses   sets
        !          1826:  1. ADDRGP          _i      1           r1
        !          1827: 16. ADDRLP          -4(fp)  0           r0
        !          1828: 17. ASGNP   16  1           0    r0 r1        
        !          1829:  2. INDIRI   1              0    r1     r2
        !          1830: ...     
        !          1831: \end{verbatim}
        !          1832: Nodes 16 and 17 are the spill code. This code references
        !          1833: node 1, so its \verb|count| goes to 2 momentarily
        !          1834: until \verb|ralloc| processes the reference from the spill code.
        !          1835: The remaining reference is from node 15.
        !          1836: Node 16's symbol \verb|-4(fp)| is the temporary location.
        !          1837: 
        !          1838: After \verb|genspill| returns the temporary,
        !          1839: \verb|spill| frees the registers set by the node and calls \verb|genreloads|.
        !          1840: \verb|genreloads| searches the linearized forest
        !          1841: after \verb|dot| for uses of \verb|p|.
        !          1842: \begin{verbatim}
        !          1843: static void genreloads(Node dot, Node p, Symbol temp) {
        !          1844:    int i;
        !          1845:    Node last;
        !          1846: 
        !          1847:    for (last = dot; dot = dot->x.next; last = dot)
        !          1848:       for (i = 0; i < MAXKIDS; i++)
        !          1849:          if (dot->kids[i] == p) {
        !          1850:             dot->kids[i] = newnode(INDIR + typecode(p),
        !          1851:                newnode(ADDRL+P, 0, 0, temp), 0, 0);
        !          1852:             dot->kids[i]->count = 1;
        !          1853:             p->count--;
        !          1854:             linearize(dot->kids[i], &last->x.next, last->x.next);
        !          1855:             last = dot->kids[i];
        !          1856:          }
        !          1857: }
        !          1858: \end{verbatim}
        !          1859: Each use of \verb|p| is changed
        !          1860: to a reload of the temporary \verb|temp|.
        !          1861: The reload code is an \verb|ADDRLP| node to compute the
        !          1862: address of the temporary and an \verb|INDIR| node to load the value
        !          1863: from that address. There is only one reference to each reload, so the
        !          1864: \verb|INDIR|'s \verb|count| is set accordingly,
        !          1865: and \verb|p->count| is decremented to reflect the change.
        !          1866: The reload code is linearized and inserted into the linearized node
        !          1867: list immediately {\em before} its use.
        !          1868: For the example above, a reload is placed before node 15.
        !          1869: Since the reload is beyond the point that \verb|gen| has reached,
        !          1870: registers are allocated for these nodes by subsequent calls to \verb|ralloc|.
        !          1871: 
        !          1872: The linearized forest after \verb|spill| returns is
        !          1873: \begin{verbatim}
        !          1874:  #  op      kids    syms  count  uses   sets
        !          1875:  1. ADDRGP          _i      0           r1
        !          1876: 16. ADDRLP          -4(fp)  0           r0
        !          1877: 17. ASGNP   16  1           0    r0 r1        
        !          1878:  2. INDIRI   1              0    r1     r2
        !          1879: ...
        !          1880: 14. CVDI    13              1                 
        !          1881: 18. ADDRLP          -4(fp)  1                 
        !          1882: 19. INDIRP  18              1                 
        !          1883: 15. ASGNI   19 14           0                 
        !          1884: \end{verbatim}
        !          1885: Nodes 18 and 19 are the reload.
        !          1886: Note that node 1's \verb|count| has become 0;
        !          1887: after inserting the reload, there are no unprocessed references.
        !          1888: 
        !          1889: Another spill occurs at node 11, which computes \verb|a[i]+b[i]|.
        !          1890: Registers \verb|r1| and \verb|r2|, which are set by node 10, are spilled,
        !          1891: and node 12 is edited to refer to the second temporary, which is reloaded
        !          1892: by nodes 22 and 23 inserted before node 12.
        !          1893: The last spill occurs at node 23, which is the reload
        !          1894: created for the second spill. \verb|r1| and \verb|r2|, which are
        !          1895: set by node 11, are spilled again, and node 13 is edited
        !          1896: to refer to the third temporary.
        !          1897: The final linearized forest and corresponding VAX code follows;
        !          1898: \verb|count| fields are omitted because they're all 0,
        !          1899: and each instruction shows which registers are used and set by each node.
        !          1900: \begin{verbatim}
        !          1901: VAX instruction     #  op      kids    syms
        !          1902: moval _i,r1         1. ADDRGP          _i
        !          1903: moval -4(fp),r0    16. ADDRLP          -4(fp)
        !          1904: movl r1,(r0)       17. ASGNP   16  1
        !          1905: movl (r1),r2        2. INDIRI   1
        !          1906: movl $3,r3          3. CNSTI           3
        !          1907: ashl r3,r2,r2       4. LSHI     2  3
        !          1908: moval _a,r3         5. ADDRGP          _a
        !          1909: addl3 r3,r2,r3      6. ADDP     4  5
        !          1910: movd (r3),r3        7. INDIRD   6
        !          1911: moval _b,r5         8. ADDRGP          _b
        !          1912: addl3 r5,r2,r2      9. ADDP     4  8
        !          1913: movd (r2),r1       10. INDIRD   9
        !          1914: moval -12(fp),r0   20. ADDRLP          -12(fp)
        !          1915: movd r1,(r0)       21. ASGND   20 10
        !          1916: addd3 r1,r3,r1     11. ADDD     7 10
        !          1917: moval -20(fp),r0   24. ADDRLP          -20(fp)
        !          1918: movd r1,(r0)       25. ASGND   24 11
        !          1919: moval -12(fp),r5   22. ADDRLP          -12(fp)
        !          1920: movd (r5),r1       23. INDIRD  22
        !          1921: subd3 r1,r3,r1     12. SUBD     7 23
        !          1922: moval -20(fp),r3   26. ADDRLP          -20(fp)
        !          1923: movd (r3),r3       27. INDIRD  26
        !          1924: muld3 r1,r3,r1     13. MULD    27 12
        !          1925: cvtdl r1,r1        14. CVDI    13
        !          1926: moval -4(fp),r2    18. ADDRLP          -4(fp)
        !          1927: movl (r2),r2       19. INDIRP  18
        !          1928: movl r1,(r2)       15. ASGNI   19 14
        !          1929: \end{verbatim}
        !          1930: Spills have added nodes 16--27.
        !          1931: 
        !          1932: 
        !          1933: \section{Discussion}
        !          1934: 
        !          1935: \verb|lcc|'s code generation interface is smaller than most
        !          1936: because it omits the inessential and makes simplifying assumptions.
        !          1937: These omissions and assumptions do, however,
        !          1938: limit the interface's applicability to other languages and machines.
        !          1939: 
        !          1940: The datatype assumptions detailed in Section~\ref{configuration},
        !          1941: e.g., that unsigneds, integers, and long integers all have the same size,
        !          1942: make it possible to have only 9 type suffixes and 109 type-specific operators.
        !          1943: Relaxing these assumptions would increase this vocabulary;
        !          1944: e.g., adding a suffix for long doubles would also add at least
        !          1945: 19 more operators.
        !          1946: 
        !          1947: The interface assumes that all pointer types have the same representation,
        !          1948: which precludes its use for word-addressed machines.
        !          1949: Differentiating between character and word pointers
        !          1950: would add another suffix and at least 13 more operators.
        !          1951: 
        !          1952: The operator repertoire omits some operators whose effect
        !          1953: can be synthesized from simpler ones.
        !          1954: For example, bit fields are accessed with shifting
        !          1955: and masking instead of specific bit-field operators, so
        !          1956: code quality may suffer on machines with bit-field instructions.
        !          1957: The front end special-cases one-bit fields
        !          1958: and generates efficient masking dags,
        !          1959: which often yields better code than code
        !          1960: that uses bit-field instructions.
        !          1961: 
        !          1962: The front end implements switch statements completely.
        !          1963: It generates a binary search of dense branch tables;
        !          1964: inline comparisons replace small tables~\cite{bernstein85}.
        !          1965: It fabricates the tables and indirect jumps
        !          1966: using the functions \verb|global| and \verb|defaddress|
        !          1967: and the \verb|JUMPV| operator.
        !          1968: This decision prevents back ends from using larger `case' instructions,
        !          1969: which usually combine a bounds check and an indirect branch through a table,
        !          1970: but these instructions are increasingly rare, and some don't suit C anyway.
        !          1971: For example, the branch table for the VAX's \verb|casel| instruction is limited
        !          1972: to 16-bit offsets.
        !          1973: 
        !          1974: The interface has no direct support for building
        !          1975: a flow graph and other structures that facilitate
        !          1976: global optimization. More elaborate versions of
        !          1977: \verb|function| and \verb|gen| could collaborate
        !          1978: to build the relevant structures, perform optimizations,
        !          1979: and invoke the simpler \verb|gen|.
        !          1980: The front end's \verb|gencode| and \verb|emitcode| traverse
        !          1981: an approximation of a flow graph; future work may focus on
        !          1982: a machine-independent optimizer that edits
        !          1983: this graph while preserving the current interface.
        !          1984: 
        !          1985: To date, the interface has been used only for ANSI~C.
        !          1986: But it has little that is specific to C, and it could be used for
        !          1987: similar languages and perhaps for languages with features like nested
        !          1988: procedures, objects, and exception handling.
        !          1989: Existing compilers for some object-oriented languages with these features,
        !          1990: such as C\verb|++|, Modula-3, and Eiffel, generate C, so, in principle,
        !          1991: the interface could be used for these languages.
        !          1992: 
        !          1993: %\bibliography{refs,lib}
        !          1994: \begin{thebibliography}{1}
        !          1995: 
        !          1996: \bibitem{ansi:Cstandard}
        !          1997: American National Standard Institute, Inc., New York.
        !          1998: \newblock {\em American National Standards for Information Systems, Programming
        !          1999:   Language C {ANSI} X3.159--1989}, 1990.
        !          2000: 
        !          2001: \bibitem{bernstein85}
        !          2002: R.~L. Bernstein.
        !          2003: \newblock Producing good code for the case statement.
        !          2004: \newblock {\em Software---Practice \& Experience}, 15(10):1021--1024, Oct.
        !          2005:   1985.
        !          2006: 
        !          2007: \bibitem{fraser:sigplan89}
        !          2008: C.~W. Fraser.
        !          2009: \newblock A language for writing code generators.
        !          2010: \newblock {\em Proceedings of the SIGPLAN'89 Conference on Programming Language
        !          2011:   Design and Implementation, SIGPLAN Notices}, 24(7):238--245, July 1989.
        !          2012: 
        !          2013: \bibitem{fraser:hanson:91a}
        !          2014: C.~W. Fraser and D.~R. Hanson.
        !          2015: \newblock A code generation interface for {ANSI C}.
        !          2016: \newblock {\em Software---Practice \& Experience}, 21(9):963--988, Sept. 1991.
        !          2017: 
        !          2018: \bibitem{fraser:hanson:91b}
        !          2019: C.~W. Fraser and D.~R. Hanson.
        !          2020: \newblock A retargetable compiler for {ANSI C}.
        !          2021: \newblock {\em SIGPLAN Notices}, 26(10):29--43, Oct. 1991.
        !          2022: 
        !          2023: \bibitem{fraser:hanson:92}
        !          2024: C.~W. Fraser and D.~R. Hanson.
        !          2025: \newblock Simple register spilling in a retargetable compiler.
        !          2026: \newblock {\em Software---Practice \& Experience}, 22(1):85--99, Jan. 1992.
        !          2027: 
        !          2028: \bibitem{freiburghouse}
        !          2029: R.~A. Freiburghouse.
        !          2030: \newblock Register allocation via usage counts.
        !          2031: \newblock {\em Communications of the ACM}, 17(11):638--642, Nov. 1974.
        !          2032: 
        !          2033: \bibitem{tanenbaum:sigplan:89}
        !          2034: A.~S. Tanenbaum, M.~F. Kaashoek, K.~G. Langendoen, and C.~J.~H. Jacobs.
        !          2035: \newblock The design of very fast portable compilers.
        !          2036: \newblock {\em SIGPLAN Notices}, 24(11):125--131, Nov. 1989.
        !          2037: 
        !          2038: \bibitem{tanenbaum:toplas:82}
        !          2039: A.~S. Tanenbaum, H.~van Staveren, and J.~W. Stevenson.
        !          2040: \newblock Using peephole optimization on intermediate code.
        !          2041: \newblock {\em ACM Transactions on Programming Languages and Systems},
        !          2042:   4(1):21--36, Jan. 1982.
        !          2043: 
        !          2044: \end{thebibliography}
        !          2045: 
        !          2046: \appendix
        !          2047: 
        !          2048: \section{Interface Functions}
        !          2049: \label{appendix:interface}
        !          2050: 
        !          2051: \begin{center}
        !          2052: \begin{tabular}{cl}
        !          2053: \em Section    & \em Interface Function \\ \hline
        !          2054: \ref{address}  & \tt void address(Symbol, Symbol, int) \\
        !          2055: \ref{blockbeg} & \tt void blockbeg(Env *e) \\
        !          2056: \ref{blockend} & \tt void blockend(Env *e) \\
        !          2057: \ref{defaddress}& \tt void defaddress(Symbol) \\
        !          2058: \ref{defconst} & \tt void defconst(int, Value) \\
        !          2059: \ref{defstring}        & \tt void defstring(int, char *) \\
        !          2060: \ref{defsymbol}        & \tt void defsymbol(Symbol) \\
        !          2061: \ref{emit}     & \tt void emit(Node) \\
        !          2062: \ref{export}   & \tt void export(Symbol) \\
        !          2063: \ref{function} & \tt void function(Symbol, Symbol [], Symbol [], int) \\
        !          2064: \ref{gen}      & \tt Node gen(Node) \\
        !          2065: \ref{global}   & \tt void global(Symbol) \\
        !          2066: \ref{import}   & \tt void import(Symbol) \\
        !          2067: \ref{local}    & \tt void local(Symbol) \\
        !          2068: \ref{progbeg}  & \tt void progbeg(int, char *[]) \\
        !          2069: \ref{progend}  & \tt void progend(void) \\
        !          2070: \ref{segment}  & \tt void segment(int) \\
        !          2071: \ref{space}    & \tt void space(int) \\
        !          2072: \end{tabular}
        !          2073: \end{center}
        !          2074: 
        !          2075: 
        !          2076: \section{Front-End Functions}
        !          2077: 
        !          2078: \label{appendix:functions}
        !          2079: 
        !          2080: \begin{description}
        !          2081: 
        !          2082: \item[{\tt void *alloc(int n)}] permanently allocates \verb|n| bytes and returns
        !          2083: a pointer to the first byte.  No alignment is guaranteed.
        !          2084: 
        !          2085: \item[{\tt void fprint(int fd, char *fmt, \ldots)}] prints up to 5 arguments on
        !          2086: the file descriptor \verb|fd|. See \verb|print| for formatting details.
        !          2087: If \verb|fd| is not \verb|1| (standard output),
        !          2088: \verb|fprint| flushes the output buffer for \verb|fd| via \verb|outflush|.
        !          2089: 
        !          2090: \item[{\tt Type freturn(Type t)}] is the type of the return value for
        !          2091: function type \verb|t|.
        !          2092: 
        !          2093: \item[{\tt int generic(int op)}] is the generic version
        !          2094: of the type-specific operator \verb|op|.
        !          2095: 
        !          2096: \item[{\tt int genlabel(int n)}] increments the generated identifier
        !          2097: counter by \verb|n| and returns its previous value.
        !          2098: 
        !          2099: \item[{\tt int is\it type\/\tt(Type t)}] are a set of type predicates that
        !          2100: return non-zero if type \verb|t| is the type in the following table.
        !          2101: 
        !          2102: \begin{center}
        !          2103: \begin{tabular}{lllll}
        !          2104: \it predicate  & \it type      && \it predicate        & \it type \\ \hline
        !          2105: \tt isarith    & arithmetic    && \tt isint            & integral \\
        !          2106: \tt isarray    & array         && \tt isptr            & pointer \\
        !          2107: \tt ischar     & character     && \tt isscalar         & scalar \\
        !          2108: \tt isdouble   & double        && \tt isstruct         & structure or union \\
        !          2109: \tt isenum     & enumeration   && \tt isunion          & union \\
        !          2110: \tt isfloat    & floating      && \tt isunsigned       & unsigned \\
        !          2111: \tt isfunc     & function      && \\
        !          2112: \end{tabular}
        !          2113: \end{center}
        !          2114: 
        !          2115: \item[{\tt Node newnode(int op, Node l, Node r, Symbol sym)}] allocates a node
        !          2116: and initializes the \verb|op| field to \verb|op|, \verb|kids[0]| and \verb|kids[1]|
        !          2117: to \verb|l| and \verb|r|, and \verb|syms[0]| to \verb|sym| and returns
        !          2118: a pointer to the new node. If \verb|l| and \verb|r| are non-null,
        !          2119: their \verb|count| fields are incremented.
        !          2120: 
        !          2121: \item[{\tt Symbol newconst(Value v, int t)}] installs an constant with value \verb|v|
        !          2122: and type suffix \verb|t| into the symbol table, if necessary,
        !          2123: and returns a pointer to the symbol-table entry.
        !          2124: 
        !          2125: \item[{\tt Symbol newtemp(int sclass, int t)}] creates a temporary
        !          2126: with storage class \verb|sclass| and a type synonymous with type suffix \verb|t|
        !          2127: and returns a pointer the symbol-table entry.
        !          2128: If an existing temporary with the appropriate class and type is available,
        !          2129: it is used; otherwise, the new temporary is announced by calling \verb|local|.
        !          2130: 
        !          2131: \item[{\tt void outflush(void)}] writes the current output buffer
        !          2132: to the standard output, if it's not empty.
        !          2133: 
        !          2134: \item[{\tt void outs(char *s)}] appends string \verb|s| to the output buffer
        !          2135: for standard output and calls \verb|outflush|
        !          2136: if the resulting buffer pointer is within 80 characters
        !          2137: of the end of the buffer.
        !          2138: 
        !          2139: \item[{\tt int optype(op)}] is the type suffix of the type-specific operator \verb|op|.
        !          2140: 
        !          2141: \item[{\tt void print(char *fmt, \ldots)}] prints up to 5 arguments on standard output
        !          2142: similar to \verb|printf|. The supported formats are
        !          2143: \verb|%c|, \verb|%d|, \verb|%o|, \verb|%x|, and \verb|%s|, 
        !          2144: and precision and field width specifications are not supported.
        !          2145: \verb|print| calls \verb|outflush| if it prints a newline character from
        !          2146: \verb|fmt| within 80 characters of the end of the output buffer,
        !          2147: and each format except \verb|%c| does the actual output
        !          2148: with \verb|outs|, which may also flush the buffer.
        !          2149: 
        !          2150: \item[{\tt int roundup(int n, int m)}] is \verb|n| rounded up to the
        !          2151: next multiple of \verb|m| where \verb|m| is a power of 2.
        !          2152: 
        !          2153: \item[{\tt char *string(char *s)}] installs \verb|s| in the string space,
        !          2154: if necessary, and returns a pointer to the installed copy.
        !          2155: 
        !          2156: \item[{\tt char *stringd(int n)}] returns the string representation of \verb|n|;
        !          2157: the returned string is installed in the string space by \verb|string|.
        !          2158: 
        !          2159: \item[{\tt char *stringf(char *fmt, \ldots)}] formats up to 5 arguments
        !          2160: into a string in the string space and returns a pointer to that string.
        !          2161: See \verb|print| for formatting details.
        !          2162: 
        !          2163: \item[{\tt void *talloc(int n)}] temporarily allocates \verb|n| bytes and returns
        !          2164: a pointer to the first byte. The storage is released at the end of
        !          2165: the current function.  No alignment is guaranteed.
        !          2166: 
        !          2167: \item[{\tt int ttob(Type t)}] is the type suffix synonymous with type \verb|t|.
        !          2168: 
        !          2169: \item[{\tt int variadic(Type t)}] is true if type \verb|t| denotes a variadic function.
        !          2170: 
        !          2171: \end{description}
        !          2172: 
        !          2173: \end{document}

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.