Annotation of 43BSDReno/share/doc/smm/19.porttour/porttour2.ms, revision 1.1.1.1

1.1       root        1: .\"    @(#)porttour2.ms        6.3 (Berkeley) 5/30/86
                      2: .\"
                      3: .SH
                      4: Pass Two
                      5: .PP
                      6: Code generation is far less well understood than
                      7: parsing or lexical analysis, and for this reason
                      8: the second pass is far harder to discuss in a file by file manner.
                      9: A great deal of the difficulty is in understanding the issues
                     10: and the strategies employed to meet them.
                     11: Any particular function is likely to be reasonably straightforward.
                     12: .PP
                     13: Thus, this part of the paper will concentrate a good deal on the broader
                     14: aspects of strategy in the code generator,
                     15: and will not get too intimate with the details.
                     16: .SH
                     17: Overview
                     18: .PP
                     19: It is difficult to organize a code generator to be flexible enough to
                     20: generate code for a large number of machines,
                     21: and still be efficient for any one of them.
                     22: Flexibility is also important when it comes time to tune
                     23: the code generator to improve the output code quality.
                     24: On the other hand, too much flexibility can lead to semantically
                     25: incorrect code, and potentially a combinatorial explosion in the
                     26: number of cases to be considered in the compiler.
                     27: .PP
                     28: One goal of the code generator is to have a high degree of correctness.
                     29: It is very desirable to have the compiler detect its own inability to
                     30: generate correct code, rather than to produce incorrect code.
                     31: This goal is achieved by having a simple model of the job
                     32: to be done (e.g., an expression tree)
                     33: and a simple model of the machine state
                     34: (e.g., which registers are free).
                     35: The act of generating an instruction performs a transformation
                     36: on the tree and the machine state;
                     37: hopefully, the tree eventually gets
                     38: reduced to a single node.
                     39: If each of these instruction/transformation pairs is correct,
                     40: and if the machine state model really represents the actual machine,
                     41: and if the transformations reduce the input tree to the desired single node, then the
                     42: output code will be correct.
                     43: .PP
                     44: For most real machines, there is no definitive theory of code generation that
                     45: encompasses all the C operators.
                     46: Thus the selection of which instruction/transformations to generate, and in what
                     47: order, will have a heuristic flavor.
                     48: If, for some expression tree, no transformation applies, or, more
                     49: seriously, if the heuristics select a sequence of instruction/transformations
                     50: that do not in fact reduce the tree, the compiler
                     51: will report its inability to generate code, and abort.
                     52: .PP
                     53: A major part of the code generator is concerned with the model and the transformations.
                     54: Most of this is machine independent, or depends only on simple tables.
                     55: The flexibility comes from the heuristics that guide the transformations
                     56: of the trees, the selection of subgoals, and the ordering of the computation.
                     57: .SH
                     58: The Machine Model
                     59: .PP
                     60: The machine is assumed to have a number of registers, of at most two different
                     61: types:
                     62: .I A
                     63: and
                     64: .I B .
                     65: Within each register class, there may be scratch (temporary) registers and dedicated registers
                     66: (e.g., register variables, the stack pointer, etc.).
                     67: Requests to allocate and free registers involve only the temporary registers.
                     68: .PP
                     69: Each of the registers in the machine is given a name and a number
                     70: in the
                     71: .I mac2defs.h
                     72: file; the numbers are used as indices into various tables
                     73: that describe the registers, so they should be kept small.
                     74: One such table is the
                     75: .I rstatus
                     76: table on file
                     77: .I local2.c .
                     78: This table is indexed by register number, and
                     79: contains expressions made up from manifest constants describing the register types:
                     80: SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's,
                     81: and SBREG and SBREG\(orSTBREG similarly for BREG's.
                     82: There are macros that access this information:
                     83: .I isbreg(r)
                     84: returns true if register number
                     85: .I r
                     86: is a BREG, and
                     87: .I istreg(r)
                     88: returns true if register number
                     89: .I r
                     90: is a temporary AREG or BREG.
                     91: Another table,
                     92: .I rnames ,
                     93: contains the register names; this is used when
                     94: putting out assembler code and diagnostics.
                     95: .PP
                     96: The usage of registers is kept track of by an array called
                     97: .I busy .
                     98: .I Busy[r]
                     99: is the number of uses of register
                    100: .I r
                    101: in the current tree being processed.
                    102: The allocation and freeing of registers will be discussed later as part of the code generation
                    103: algorithm.
                    104: .SH
                    105: General Organization
                    106: .PP
                    107: As mentioned above, the second pass reads lines from
                    108: the intermediate file, copying through to the output unchanged
                    109: any lines that begin with a `)', and making note of the
                    110: information about stack usage and register allocation contained on
                    111: lines beginning with `]' and `['.
                    112: The expression trees, whose beginning is indicated by a line beginning with `.',
                    113: are read and rebuilt into trees.
                    114: If the compiler is loaded as one pass, the expression trees are
                    115: immediately available to the code generator.
                    116: .PP
                    117: The actual code generation is done by a hierarchy of routines.
                    118: The routine
                    119: .I delay
                    120: is first given the tree; it attempts to delay some postfix
                    121: \fB+\|+\fP and \fB\-\|\-\fP computations that might reasonably be done after the
                    122: smoke clears.
                    123: It also attempts to handle comma (`,') operators by computing the
                    124: left side expression first, and then rewriting the tree
                    125: to eliminate the operator.
                    126: .I Delay
                    127: calls
                    128: .I codgen
                    129: to control the actual code generation process.
                    130: .I Codgen
                    131: takes as arguments a pointer to the expression tree,
                    132: and a second argument that, for socio-historical reasons, is called a
                    133: .I cookie .
                    134: The cookie describes a set of goals that would be acceptable
                    135: for the code generation: these are assigned to individual bits,
                    136: so they may be logically or'ed together to form a large number of possible goals.
                    137: Among the possible goals are FOREFF (compute for side effects only;
                    138: don't worry about the value), INTEMP (compute and store value into a temporary location
                    139: in memory), INAREG (compute into an A register), INTAREG (compute into a scratch
                    140: A register), INBREG and INTBREG similarly, FORCC (compute for condition codes),
                    141: and FORARG (compute it as a function argument; e.g., stack it if appropriate).
                    142: .PP
                    143: .I Codgen
                    144: first canonicalizes the tree by calling
                    145: .I canon .
                    146: This routine looks for certain transformations that might now be
                    147: applicable to the tree.
                    148: One, which is very common and very powerful, is to
                    149: fold together an indirection operator (UNARY MUL)
                    150: and a register (REG); in most machines, this combination is
                    151: addressable directly, and so is similar to a NAME in its
                    152: behavior.
                    153: The UNARY MUL and REG are folded together to make
                    154: another node type called OREG.
                    155: In fact, in many machines it is possible to directly address not just the
                    156: cell pointed to by a register, but also cells differing by a constant
                    157: offset from the cell pointed to by the register.
                    158: .I Canon
                    159: also looks for such cases, calling the
                    160: machine dependent routine 
                    161: .I notoff
                    162: to decide if the offset is acceptable (for example, in the IBM 370 the offset
                    163: must be between 0 and 4095 bytes).
                    164: Another optimization is to replace bit field operations
                    165: by shifts and masks if the operation involves extracting the field.
                    166: Finally, a machine dependent routine,
                    167: .I sucomp ,
                    168: is called that computes the Sethi-Ullman numbers
                    169: for the tree (see below).
                    170: .PP
                    171: After the tree is canonicalized,
                    172: .I codgen
                    173: calls the routine
                    174: .I store
                    175: whose job is to select a subtree of the tree to be computed
                    176: and (usually) stored before beginning the
                    177: computation of the full tree.
                    178: .I Store
                    179: must return a tree that can be computed without need
                    180: for any temporary storage locations.
                    181: In effect, the only store operations generated while processing the subtree must be as a response
                    182: to explicit assignment operators in the tree.
                    183: This division of the job marks one of the more significant, and successful,
                    184: departures from most other compilers.
                    185: It means that the code generator can operate
                    186: under the assumption that there are enough
                    187: registers to do its job, without worrying about
                    188: temporary storage.
                    189: If a store into a temporary appears in the output, it is always
                    190: as a direct result of logic in the
                    191: .I store
                    192: routine; this makes debugging easier.
                    193: .PP
                    194: One consequence of this organization is that
                    195: code is not generated by a treewalk.
                    196: There are theoretical results that support this decision.
                    197: .[
                    198: Aho Johnson Optimal Code Trees jacm
                    199: .]
                    200: It may be desirable to compute several subtrees and store
                    201: them before tackling the whole tree;
                    202: if a subtree is to be stored, this is known before the code
                    203: generation for the subtree is begun, and the subtree is computed
                    204: when all scratch registers are available.
                    205: .PP
                    206: The
                    207: .I store
                    208: routine decides what subtrees, if any, should be
                    209: stored by making use of numbers,
                    210: called
                    211: .I "Sethi-Ullman numbers" ,
                    212: that give, for each subtree of an expression tree,
                    213: the minimum number of scratch registers required to
                    214: compile the subtree, without any stores into temporaries.
                    215: .[
                    216: Sethi Ullman jacm 1970
                    217: .]
                    218: These numbers are computed by the machine-dependent
                    219: routine
                    220: .I sucomp ,
                    221: called by
                    222: .I canon .
                    223: The basic notion is that, knowing the Sethi-Ullman numbers for
                    224: the descendants of a node, and knowing the operator of the
                    225: node and some information about the machine, the
                    226: Sethi-Ullman number of the node itself can be computed.
                    227: If the Sethi-Ullman number for a tree exceeds the number of scratch
                    228: registers available, some subtree must be stored.
                    229: Unfortunately, the theory behind the Sethi-Ullman numbers
                    230: applies only to uselessly simple machines and operators.
                    231: For the rich set of C operators, and for machines with
                    232: asymmetric registers, register pairs, different kinds of registers,
                    233: and exceptional forms of addressing,
                    234: the theory cannot be applied directly.
                    235: The basic idea of estimation is a good one, however,
                    236: and well worth applying; the application, especially
                    237: when the compiler comes to be tuned for high code
                    238: quality, goes beyond the park of theory into the
                    239: swamp of heuristics.
                    240: This topic will be taken up again later, when more of the compiler
                    241: structure has been described.
                    242: .PP
                    243: After examining the Sethi-Ullman numbers,
                    244: .I store
                    245: selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in
                    246: the external variables
                    247: .I stotree
                    248: and
                    249: .I stocook .
                    250: If a subtree has been selected, or if
                    251: the whole tree is ready to be processed, the
                    252: routine
                    253: .I order
                    254: is called, with a tree and cookie.
                    255: .I Order
                    256: generates code for trees that
                    257: do not require temporary locations.
                    258: .I Order
                    259: may make recursive calls on itself, and,
                    260: in some cases, on
                    261: .I codgen ;
                    262: for example, when processing the operators \fB&&\fP, \fB\(or\(or\fP, and comma (`,'), that have
                    263: a left to right evaluation, it is
                    264: incorrect for
                    265: .I store
                    266: examine the right operand for subtrees to be stored.
                    267: In these cases,
                    268: .I order
                    269: will call
                    270: .I codgen
                    271: recursively when it is permissible to work on the right operand.
                    272: A similar issue arises with the \fB? :\fP operator.
                    273: .PP
                    274: The
                    275: .I order
                    276: routine works by matching the current tree with
                    277: a set of code templates.
                    278: If a template is discovered that will
                    279: match the current tree and cookie, the associated assembly language
                    280: statement or statements are generated.
                    281: The tree is then rewritten,
                    282: as specified by the template, to represent the effect of the output instruction(s).
                    283: If no template match is found, first an attempt is made to find a match with a
                    284: different cookie; for example, in order to compute
                    285: an expression with cookie INTEMP (store into a temporary storage location),
                    286: it is usually necessary to compute the expression into a scratch register
                    287: first.
                    288: If all attempts to match the tree fail, the heuristic part of
                    289: the algorithm becomes dominant.
                    290: Control is typically given to one of a number of machine-dependent routines
                    291: that may in turn recursively call
                    292: .I order
                    293: to achieve a subgoal of the computation (for example, one of the
                    294: arguments may be computed into a temporary register).
                    295: After this subgoal has been achieved, the process begins again with the
                    296: modified tree.
                    297: If the machine-dependent heuristics are unable to reduce the tree further,
                    298: a number of default rewriting rules may be considered appropriate.
                    299: For example, if the left operand of a \fB+\fP is a scratch
                    300: register, the \fB+\fP can be replaced by a \fB+=\fP operator;
                    301: the tree may then match a template.
                    302: .PP
                    303: To close this introduction, we will discuss the steps in compiling
                    304: code for the expression
                    305: .DS
                    306: \fIa\fR += \fIb\fR
                    307: .DE
                    308: where
                    309: .I a
                    310: and
                    311: .I b
                    312: are static variables.
                    313: .PP
                    314: To begin with, the whole expression tree is examined with cookie FOREFF, and
                    315: no match is found.  Search with other cookies is equally fruitless, so an
                    316: attempt at rewriting is made.
                    317: Suppose we are dealing with the Interdata 8/32 for the moment.
                    318: It is recognized that the left hand and right hand sides of the \fB+=\fP operator
                    319: are addressable, and in particular the left hand side has no
                    320: side effects, so it is permissible to rewrite this as
                    321: .DS
                    322: \fIa\fR = \fIa\fR + \fIb\fR
                    323: .DE
                    324: and this is done.
                    325: No match is found on this tree either, so a machine dependent rewrite is done; it is recognized
                    326: that the left hand side of the assignment is addressable, but the right hand side is not
                    327: in a register, so
                    328: .I order
                    329: is called recursively, being asked to put the right
                    330: hand side of the assignment into a register.
                    331: This invocation of
                    332: .I order
                    333: searches the tree for a match, and fails.
                    334: The machine dependent rule for \fB+\fP
                    335: notices that the right hand operand is addressable;
                    336: it decides to put the left operand into a scratch register.
                    337: Another recursive call to
                    338: .I order
                    339: is made, with the tree
                    340: consisting solely of the leaf
                    341: .I a ,
                    342: and the cookie asking that the value be placed into a scratch register.
                    343: This now matches a template, and a load instruction is emitted.
                    344: The node consisting of
                    345: .I a
                    346: is rewritten in place to represent the register into which
                    347: .I a
                    348: is loaded,
                    349: and this third call to
                    350: .I order
                    351: returns.
                    352: The second call to
                    353: .I order
                    354: now finds that it has the tree
                    355: .DS
                    356: \fBreg\fR + \fIb\fR
                    357: .DE
                    358: to consider.
                    359: Once again, there is no match, but the default rewriting rule rewrites
                    360: the \fB+\fP as a \fB+=\fP operator, since the left operand is a scratch register.
                    361: When this is done, there is a match: in fact,
                    362: .DS
                    363: \fBreg\fR += \fIb\fR
                    364: .DE
                    365: simply describes the effect of the add instruction
                    366: on a typical machine.
                    367: After the add is emitted, the tree is rewritten
                    368: to consist merely of the register node, since the result of the add
                    369: is now in the register.
                    370: This agrees with the cookie passed to the second invocation of
                    371: .I order ,
                    372: so this invocation
                    373: terminates, returning to the first level.
                    374: The original tree has now
                    375: become
                    376: .DS
                    377: \fIa\fR = \fBreg\fR
                    378: .DE
                    379: which matches a template for the store instruction.
                    380: The store is output, and the tree rewritten to become
                    381: just a single register node.
                    382: At this point, since the top level call to
                    383: .I order
                    384: was
                    385: interested only in side effects, the call to
                    386: .I order
                    387: returns, and the code generation is completed;
                    388: we have generated a load, add, and store, as might have been expected.
                    389: .PP
                    390: The effect of machine architecture on this is considerable.
                    391: For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage''
                    392: instruction, so the strategy is quite different;
                    393: .I b
                    394: is loaded in to
                    395: a register, and then an add to storage instruction generated
                    396: to add this register in to
                    397: .I a .
                    398: The transformations, involving as they do the semantics of C,
                    399: are largely machine independent.
                    400: The decisions as to when to use them, however, are
                    401: almost totally machine dependent.
                    402: .PP
                    403: Having given a broad outline of
                    404: the code generation process, we shall next consider the
                    405: heart of it: the templates.
                    406: This leads naturally into discussions of template matching and register allocation,
                    407: and finally a discussion of the machine dependent interfaces and strategies.
                    408: .SH
                    409: The Templates
                    410: .PP
                    411: The templates describe the effect of the target machine instructions
                    412: on the model of computation around which the compiler is organized.
                    413: In effect, each template has five logical sections, and represents an assertion
                    414: of the form:
                    415: .IP
                    416: .B If
                    417: we have a subtree of a given shape (1), and we have a goal (cookie) or goals to
                    418: achieve (2), and we have sufficient free resources (3),
                    419: .B then
                    420: we may emit an instruction or instructions (4), and
                    421: rewrite the subtree in a particular manner (5),
                    422: and the rewritten tree will achieve the desired goals.
                    423: .PP
                    424: These five sections will be discussed in more
                    425: detail later.  First, we give an example of a
                    426: template:
                    427: .DS
                    428: .ta 1i 2i 3i 4i 5i
                    429: ASG PLUS,      INAREG,
                    430:        SAREG,  TINT,
                    431:        SNAME,  TINT,
                    432:                0,      RLEFT,
                    433:                "       add     AL,AR\en",
                    434: .DE
                    435: The top line specifies the operator (\fB+=\fP) and the cookie (compute the
                    436: value of the subtree into an AREG).
                    437: The second and third lines specify the left and right descendants,
                    438: respectively,
                    439: of the \fB+=\fP operator.
                    440: The left descendant must be a REG node, representing an
                    441: A register, and have integer type, while the right side must be a NAME node,
                    442: and also have integer type.
                    443: The fourth line contains the resource requirements (no scratch registers
                    444: or temporaries needed), and the rewriting rule (replace the subtree by the left descendant).
                    445: Finally, the quoted string on the last line represents the output to the assembler:
                    446: lower case letters, tabs, spaces, etc. are copied
                    447: .I verbatim .
                    448: to the output; upper case letters trigger various macro-like expansions.
                    449: Thus,
                    450: .B AL
                    451: would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em
                    452: presumably the register number.
                    453: Similarly,
                    454: .B AR
                    455: would expand into the name of the right operand.
                    456: The
                    457: .I add
                    458: instruction of the last section might well be
                    459: emitted by this template.
                    460: .PP
                    461: In principle, it would be possible to make separate templates
                    462: for all legal combinations of operators, cookies, types, and shapes.
                    463: In practice, the number of combinations is very large.
                    464: Thus, a considerable amount of mechanism is present to
                    465: permit a large number of subtrees to be matched
                    466: by a single template.
                    467: Most of the shape and type specifiers are individual bits, and can
                    468: be logically
                    469: or'ed
                    470: together.
                    471: There are a number of special descriptors for matching classes of
                    472: operators.
                    473: The cookies can also be combined.
                    474: As an example of the kind of template
                    475: that really arises in practice, the
                    476: actual template for the Interdata 8/32
                    477: that subsumes the above example is:
                    478: .DS
                    479: .ta 1i 2i 3i 4i 5i
                    480: ASG OPSIMP,    INAREG\(orFORCC,
                    481:        SAREG,  TINT\(orTUNSIGNED\(orTPOINT,
                    482:        SAREG\(orSNAME\(orSOREG\(orSCON,        TINT\(orTUNSIGNED\(orTPOINT,
                    483:                0,      RLEFT\(orRESCC,
                    484:                "       OI      AL,AR\en",
                    485: .DE
                    486: Here, OPSIMP represents the operators
                    487: +, \-, \(or, &, and ^.
                    488: The
                    489: .B OI
                    490: macro in the output string expands into the
                    491: appropriate \fBI\fRnteger \fBO\fRpcode for the operator.
                    492: The left and right sides can be integers, unsigned, or pointer types.
                    493: The right side can be, in addition to a name, a register,
                    494: a memory location whose address is given by a register and displacement (OREG),
                    495: or a constant.
                    496: Finally, these instructions set the condition codes,
                    497: and so can be used in condition contexts:
                    498: the cookie and rewriting rules reflect this.
                    499: .SH
                    500: The Template Matching Algorithm
                    501: .PP
                    502: The heart of the second pass is the template matching
                    503: algorithm, in the routine
                    504: .I match .
                    505: .I Match
                    506: is called with a tree and a cookie; it attempts to match
                    507: the given tree against some template that will transform it
                    508: according to one of the goals given in the cookie.
                    509: If a match is successful, the transformation is
                    510: applied;
                    511: .I expand
                    512: is called to generate the assembly code, and then
                    513: .I reclaim
                    514: rewrites the tree, and reclaims the resources, such
                    515: as registers, that might have become free as a result
                    516: of the generated code.
                    517: .PP
                    518: This part of the compiler is among the most time critical.
                    519: There is a spectrum of implementation techniques available
                    520: for doing this matching.
                    521: The most naive algorithm simply looks at the templates one by one.
                    522: This can be considerably improved upon by restricting the search
                    523: for an acceptable template.
                    524: It would be possible to do better than this if the templates were given
                    525: to a separate program that ate them and generated a template
                    526: matching subroutine.
                    527: This would make maintenance of the compiler much more
                    528: complicated, however, so this has not been done.
                    529: .PP
                    530: The matching algorithm is actually carried out by restricting
                    531: the range in the table that must be searched for each opcode.
                    532: This introduces a number of complications, however, and needs a
                    533: bit of sympathetic help by the person constructing the
                    534: compiler in order to obtain best results.
                    535: The exact tuning of this algorithm continues; it
                    536: is best to consult the code and comments in
                    537: .I match
                    538: for the latest version.
                    539: .PP
                    540: In order to match a template to a tree,
                    541: it is necessary to match not only the cookie and the
                    542: operator of the root, but also the types and shapes of the
                    543: left and right descendants (if any) of the tree.
                    544: A convention is established here that is carried out throughout
                    545: the second pass of the compiler.
                    546: If a node represents a unary operator, the single descendant
                    547: is always the ``left'' descendant.
                    548: If a node represents a unary operator or a leaf node (no descendants)
                    549: the ``right'' descendant is taken by convention to be the node itself.
                    550: This enables templates to easily match leaves and conversion operators, for example,
                    551: without any additional mechanism in the matching program.
                    552: .PP
                    553: The type matching is straightforward; it is possible to specify any combination
                    554: of basic types, general pointers, and pointers to one or more of
                    555: the basic types.
                    556: The shape matching is somewhat more complicated, but still pretty simple.
                    557: Templates have a collection of possible operand shapes
                    558: on which the opcode might match.
                    559: In the simplest case, an
                    560: .I add
                    561: operation might be able to add to either a register variable
                    562: or a scratch register, and might be able (with appropriate
                    563: help from the assembler) to add an integer constant (ICON), a static
                    564: memory cell (NAME), or a stack location (OREG).
                    565: .PP
                    566: It is usually attractive to specify a number of such shapes,
                    567: and distinguish between them when the assembler output is produced.
                    568: It is possible to describe the union of many elementary
                    569: shapes such as ICON, NAME, OREG,
                    570: AREG or BREG
                    571: (both scratch and register forms), etc.
                    572: To handle at least the simple forms of indirection, one can also
                    573: match some more complicated forms of trees: STARNM and STARREG
                    574: can match more complicated trees headed by an indirection operator,
                    575: and SFLD can match certain trees headed by a FLD operator.
                    576: These patterns call machine dependent routines that match the
                    577: patterns of interest on a given machine.
                    578: The shape SWADD may be used to recognize NAME or OREG
                    579: nodes that lie on word boundaries: this may be of some importance
                    580: on word addressed machines.
                    581: Finally, there are some special shapes: these may not
                    582: be used in conjunction with the other shapes, but may be
                    583: defined and extended in machine dependent ways.
                    584: The special shapes SZERO, SONE, and SMONE are predefined and match
                    585: constants 0, 1, and \-1, respectively; others are easy to add
                    586: and match by using the machine dependent routine
                    587: .I special .
                    588: .PP
                    589: When a template has been found that matches the root of the tree,
                    590: the cookie, and the shapes and types of the descendants,
                    591: there is still one bar to a total match: the template may
                    592: call for some resources (for example, a scratch register).
                    593: The routine
                    594: .I allo
                    595: is called, and it attempts to allocate the resources.
                    596: If it cannot, the match fails; no resources are
                    597: allocated.
                    598: If successful, the allocated resources are given numbers
                    599: 1, 2, etc. for later reference when the assembly code is
                    600: generated.
                    601: The routines
                    602: .I expand
                    603: and
                    604: .I reclaim
                    605: are then called.
                    606: The
                    607: .I match
                    608: routine then returns a special value, MDONE.
                    609: If no match was found, the value MNOPE is returned;
                    610: this is a signal to the caller to try more cookie
                    611: values, or attempt a rewriting rule.
                    612: .I Match
                    613: is also used to select rewriting rules, although
                    614: the way of doing this is pretty straightforward.
                    615: A special cookie, FORREW, is used to ask
                    616: .I match
                    617: to search for a rewriting rule.
                    618: The rewriting rules are keyed to various opcodes; most
                    619: are carried out in
                    620: .I order .
                    621: Since the question of when to rewrite is one of the key issues in
                    622: code generation, it will be taken up again later.
                    623: .SH
                    624: Register Allocation
                    625: .PP
                    626: The register allocation routines, and the allocation strategy,
                    627: play a central role in the correctness of the code generation algorithm.
                    628: If there are bugs in the Sethi-Ullman computation that cause the
                    629: number of needed registers to be underestimated,
                    630: the compiler may run out of scratch registers;
                    631: it is essential that the allocator keep track of those registers that
                    632: are free and busy, in order to detect such conditions.
                    633: .PP
                    634: Allocation of registers takes place as the result of a template
                    635: match; the routine
                    636: .I allo
                    637: is called with a word describing the number of A registers,
                    638: B registers, and temporary locations needed.
                    639: The allocation of temporary locations on the stack is relatively
                    640: straightforward, and will not be further covered; the
                    641: bookkeeping is a bit tricky, but conceptually trivial, and requests
                    642: for temporary space on the stack will never fail.
                    643: .PP
                    644: Register allocation is less straightforward.
                    645: The two major complications are
                    646: .I pairing
                    647: and
                    648: .I sharing .
                    649: In many machines, some operations (such as multiplication
                    650: and division), and/or some types (such as longs or double precision)
                    651: require even/odd pairs of registers.
                    652: Operations of the first type are exceptionally difficult to
                    653: deal with in the compiler; in fact, their theoretical
                    654: properties are rather bad as well.
                    655: .[
                    656: Aho Johnson Ullman Multiregister
                    657: .]
                    658: The second issue is dealt with rather more successfully;
                    659: a machine dependent function called
                    660: .I szty(t)
                    661: is called that returns 1 or 2, depending on the
                    662: number of A registers required to hold an object of type
                    663: .I t .
                    664: If
                    665: .I szty
                    666: returns 2, an even/odd pair of A registers is allocated
                    667: for each request.
                    668: As part of its duties, the routine
                    669: .I usable
                    670: finds usable register pairs for various operations.
                    671: This task is not as easy as it sounds;
                    672: it does not suffice to merely use
                    673: .I szty
                    674: on the expression tree,
                    675: since there are situations in which
                    676: a register pair temporary is needed even though
                    677: the result of the expression requires only one register.
                    678: This can occur with assignment operator expressions
                    679: which have \fBint\fP type but a \fBdouble\fP right hand side,
                    680: or with relational expressions where
                    681: one operand is \fBfloat\fP and the other \fBdouble\fP.
                    682: .PP
                    683: The other issue, sharing, is more subtle, but
                    684: important for good code quality.
                    685: When registers are allocated, it
                    686: is possible to reuse registers that hold address
                    687: information, and use them to contain the values
                    688: computed or accessed.
                    689: For example, on the IBM 360, if register 2 has
                    690: a pointer to an integer in it, we may load the
                    691: integer into register 2 itself by saying:
                    692: .DS
                    693: L      2,0(2)
                    694: .DE
                    695: If register 2 had a byte pointer, however, the sequence for
                    696: loading a character involves clearing the target
                    697: register first, and then inserting the desired character:
                    698: .DS
                    699: SR     3,3
                    700: IC     3,0(2)
                    701: .DE
                    702: In the first case, if register 3 were used as the target,
                    703: it would lead to a larger number of registers
                    704: used for the expression than were required; the compiler would
                    705: generate inefficient code.
                    706: On the other hand, if register 2 were used as the target in the second
                    707: case, the code would simply be wrong.
                    708: In the first case, register 2 can be 
                    709: .I shared
                    710: while in the second, it cannot.
                    711: .PP
                    712: In the specification of the register needs in the templates,
                    713: it is possible to indicate whether required scratch registers
                    714: may be shared with possible registers on the left or the right of the input tree.
                    715: In order that a register be shared, it must be scratch, and it must
                    716: be used only once, on the appropriate side of the tree being compiled.
                    717: .PP
                    718: The
                    719: .I allo
                    720: routine thus has a bit more to do than meets the eye;
                    721: it calls
                    722: .I freereg
                    723: to obtain a free register for each A and B register request.
                    724: .I Freereg
                    725: makes multiple calls on the routine
                    726: .I usable
                    727: to decide if a given register can be used to satisfy
                    728: a given need.
                    729: .I Usable
                    730: calls
                    731: .I shareit
                    732: if the register is busy, but might be shared.
                    733: Finally,
                    734: .I shareit
                    735: calls
                    736: .I ushare
                    737: to decide if the desired register is actually in the appropriate
                    738: subtree, and can be shared.
                    739: .PP
                    740: Just to add additional complexity, on some machines (such as the IBM 370) it
                    741: is possible to have ``double indexing'' forms of
                    742: addressing; these are represented by OREG's
                    743: with the base and index registers encoded into the register field.
                    744: While the register allocation and deallocation
                    745: .I "per se"
                    746: is not made more difficult by this phenomenon, the code itself
                    747: is somewhat more complex.
                    748: .PP
                    749: Having allocated the registers and expanded the assembly language,
                    750: it is time to reclaim the resources; the routine
                    751: .I reclaim
                    752: does this.
                    753: Many operations produce more than one result.
                    754: For example, many arithmetic operations may produce
                    755: a value in a register, and also set the condition
                    756: codes.
                    757: Assignment operations may leave results both in a register and in memory.
                    758: .I Reclaim
                    759: is passed three parameters; the tree and cookie
                    760: that were matched, and the rewriting field of the template.
                    761: The rewriting field allows the specification of possible results;
                    762: the tree is rewritten to reflect the results of the operation.
                    763: If the tree was computed for side effects only (FOREFF),
                    764: the tree is freed, and all resources in it reclaimed.
                    765: If the tree was computed for condition codes, the resources
                    766: are also freed, and the tree replaced by a special
                    767: node type, FORCC.
                    768: Otherwise, the value may be found in the left
                    769: argument of the root, the right argument of the root,
                    770: or one of the temporary resources allocated.
                    771: In these cases, first the resources of the tree,
                    772: and the newly allocated resources,
                    773: are
                    774: freed; then the resources needed by the result
                    775: are made busy again.
                    776: The final result must always match the shape of the input cookie;
                    777: otherwise, the compiler error
                    778: ``cannot reclaim''
                    779: is generated.
                    780: There are some machine dependent ways of
                    781: preferring results in registers or memory when
                    782: there are multiple results matching multiple goals in the cookie.
                    783: .PP
                    784: .I Reclaim
                    785: also implements, in a curious way,
                    786: C's ``usual arithmetic conversions''.
                    787: When a value is generated into a temporary register,
                    788: .I reclaim
                    789: decides what the type and size of the result will be.
                    790: Unless automatic conversion is specifically suppressed
                    791: in the code template with the \fBT\fP macro,
                    792: \fIreclaim\fP converts \fBchar\fP and \fBshort\fP
                    793: results to \fBint\fP,
                    794: \fBunsigned char\fP and \fBunsigned short\fP results
                    795: to \fBunsigned int\fP,
                    796: and \fBfloat\fP into \fBdouble\fP
                    797: (for double only floating point arithmetic).
                    798: This conversion is a simple type pun;
                    799: no instructions for converting the value
                    800: are actually emitted.
                    801: This implies that registers must always
                    802: contain a value that is at least as wide
                    803: as a register,
                    804: which greatly restricts the range of possible templates.
                    805: .SH
                    806: The Machine Dependent Interface
                    807: .PP
                    808: The files
                    809: .I order.c ,
                    810: .I local2.c ,
                    811: and
                    812: .I table.c ,
                    813: as well as the header file
                    814: .I mac2defs ,
                    815: represent the machine dependent portion of the second pass.
                    816: The machine dependent portion can be roughly divided into
                    817: two: the easy portion and the hard portion.
                    818: The easy portion
                    819: tells the compiler the names of the registers, and arranges that
                    820: the compiler generate the proper assembler formats, opcode names, location counters, etc.
                    821: The hard portion involves the Sethi\-Ullman computation, the
                    822: rewriting rules, and, to some extent, the templates.
                    823: It is hard because there are no real algorithms that apply;
                    824: most of this portion is based on heuristics.
                    825: This section discusses the easy portion; the next several
                    826: sections will discuss the hard portion.
                    827: .PP
                    828: If the compiler is adapted from a compiler for a machine
                    829: of similar architecture, the easy part is indeed easy.
                    830: In
                    831: .I mac2defs ,
                    832: the register numbers are defined, as well as various parameters for
                    833: the stack frame, and various macros that describe the machine architecture.
                    834: If double indexing is to be permitted, for example, the symbol
                    835: R2REGS is defined.
                    836: Also, a number of macros that are involved in function call processing,
                    837: especially for unusual function call mechanisms, are defined here.
                    838: .PP
                    839: In
                    840: .I local2.c ,
                    841: a large number of simple functions are defined.
                    842: These do things such as write out opcodes, register names,
                    843: and address forms for the assembler.
                    844: Part of the function call code is defined here; that is nontrivial
                    845: to design, but typically rather straightforward to implement.
                    846: Among the easy routines in
                    847: .I order.c
                    848: are routines for generating a created label,
                    849: defining a label, and generating the arguments
                    850: of a function call.
                    851: .PP
                    852: These routines tend to have a local effect, and depend on a fairly straightforward way
                    853: on the target assembler and the design decisions already made about
                    854: the compiler.
                    855: Thus they will not be further treated here.
                    856: .SH
                    857: The Rewriting Rules
                    858: .PP
                    859: When a tree fails to match any template, it becomes
                    860: a candidate for rewriting.
                    861: Before the tree is rewritten,
                    862: the machine dependent routine
                    863: .I nextcook
                    864: is called with the tree and the cookie; it suggests
                    865: another cookie that might be a better candidate for the
                    866: matching of the tree.
                    867: If all else fails, the templates are searched with the cookie
                    868: FORREW, to look for a rewriting rule.
                    869: The rewriting rules are of two kinds;
                    870: for most of the common operators, there are
                    871: machine dependent rewriting rules that may be applied;
                    872: these are handled by machine dependent functions
                    873: that are called and given the tree to be computed.
                    874: These routines may recursively call
                    875: .I order
                    876: or
                    877: .I codgen
                    878: to cause certain subgoals to be achieved;
                    879: if they actually call for some alteration of the tree,
                    880: they return 1, and the
                    881: code generation algorithm recanonicalizes and tries again.
                    882: If these routines choose not to deal with the tree, the
                    883: default rewriting rules are applied.
                    884: .PP
                    885: The assignment operators, when rewritten, call the routine
                    886: .I setasg .
                    887: This is assumed to rewrite the tree at least to the point where there are
                    888: no side effects in the left hand side.
                    889: If there is still no template match,
                    890: a default rewriting is done that causes
                    891: an expression such as
                    892: .DS
                    893: .I "a += b"
                    894: .DE
                    895: to be rewritten as
                    896: .DS
                    897: .I "a = a + b"
                    898: .DE
                    899: This is a useful default for certain mixtures of strange types
                    900: (for example, when
                    901: .I a
                    902: is a bit field and
                    903: .I b
                    904: an character) that
                    905: otherwise might need separate table entries.
                    906: .PP
                    907: Simple assignment, structure assignment, and all forms of calls
                    908: are handled completely by the machine dependent routines.
                    909: For historical reasons, the routines generating the calls return
                    910: 1 on failure, 0 on success, unlike the other routines.
                    911: .PP
                    912: The machine dependent routine
                    913: .I setbin
                    914: handles binary operators; it too must do most of the job.
                    915: In particular, when it returns 0, it must do so with
                    916: the left hand side in a temporary register.
                    917: The default rewriting rule in this case is to convert the
                    918: binary operator into the associated assignment operator;
                    919: since the left hand side is assumed to be a temporary register,
                    920: this preserves the semantics and often allows a considerable
                    921: saving in the template table.
                    922: .PP
                    923: The increment and decrement operators may be dealt with with
                    924: the machine dependent routine
                    925: .I setincr .
                    926: If this routine chooses not to deal with the tree, the rewriting rule replaces
                    927: .DS
                    928: .I "x ++"
                    929: .DE
                    930: by
                    931: .DS
                    932: .I "( (x += 1) \- 1)"
                    933: .DE
                    934: which preserves the semantics.
                    935: Once again, this is not too attractive for the most common
                    936: cases, but can generate close to optimal code when the
                    937: type of x is unusual.
                    938: .PP
                    939: Finally, the indirection (UNARY MUL) operator is also handled
                    940: in a special way.
                    941: The machine dependent routine
                    942: .I offstar
                    943: is extremely important for the efficient generation of code.
                    944: .I Offstar
                    945: is called with a tree that is the direct descendant of a UNARY MUL node;
                    946: its job is to transform this tree so that the combination of
                    947: UNARY MUL with the transformed tree becomes addressable.
                    948: On most machines,
                    949: .I offstar
                    950: can simply compute the tree into an A or B register,
                    951: depending on the architecture, and then
                    952: .I canon
                    953: will make the resulting tree into an OREG.
                    954: On many machines,
                    955: .I offstar
                    956: can profitably choose to do less work than computing
                    957: its entire argument into a register.
                    958: For example, if the target machine supports OREG's
                    959: with a constant offset from a register, and
                    960: .I offstar
                    961: is called
                    962: with a tree of the form
                    963: .DS
                    964: .I "expr + const"
                    965: .DE
                    966: where
                    967: .I const
                    968: is a constant, then
                    969: .I offstar
                    970: need only compute
                    971: .I expr
                    972: into the appropriate form of register.
                    973: On machines that support double indexing,
                    974: .I offstar
                    975: may have even more choice as to how to proceed.
                    976: The proper tuning of
                    977: .I offstar ,
                    978: which is not typically too difficult, should be one of the
                    979: first tries at optimization attempted by the
                    980: compiler writer.
                    981: .SH
                    982: The Sethi-Ullman Computation
                    983: .PP
                    984: The heart of the heuristics is the computation of the Sethi-Ullman numbers.
                    985: This computation is closely linked with the rewriting rules and the
                    986: templates.
                    987: As mentioned before, the Sethi-Ullman numbers are expected to
                    988: estimate the number of scratch registers needed to compute
                    989: the subtrees without using any stores.
                    990: However, the original theory does not apply to real machines.
                    991: For one thing, the theory assumes that all registers
                    992: are interchangeable.
                    993: Real machines have general purpose, floating point, and index registers,
                    994: register pairs, etc.
                    995: The theory also does not account for side effects;
                    996: this rules out various forms of pathology that arise
                    997: from assignment and assignment operators.
                    998: Condition codes are also undreamed of.
                    999: Finally, the influence of types, conversions, and the
                   1000: various addressability restrictions and extensions of real
                   1001: machines are also ignored.
                   1002: .PP
                   1003: Nevertheless, for a ``useless'' theory,
                   1004: the basic insight of Sethi and Ullman is amazingly
                   1005: useful in a real compiler.
                   1006: The notion that one should attempt to estimate the
                   1007: resource needs of trees before starting the
                   1008: code generation provides a natural means of splitting the
                   1009: code generation problem, and provides a bit of redundancy
                   1010: and self checking in the compiler.
                   1011: Moreover, if writing the
                   1012: Sethi-Ullman routines is hard, describing, writing, and debugging the
                   1013: alternative (routines that attempt to free up registers by stores into
                   1014: temporaries ``on the fly'') is even worse.
                   1015: Nevertheless, it should be clearly understood that these routines exist in a realm
                   1016: where there is no ``right'' way to write them;
                   1017: it is an art, the realm of heuristics, and, consequently, a major
                   1018: source of bugs in the compiler.
                   1019: Often, the early, crude versions of these routines give little trouble;
                   1020: only after
                   1021: the compiler is actually working
                   1022: and the
                   1023: code quality is being improved do serious problem have to be faced.
                   1024: Having a simple, regular machine architecture is worth quite
                   1025: a lot at this time.
                   1026: .PP
                   1027: The major problems arise from asymmetries in the registers: register pairs,
                   1028: having different kinds of registers, and the related problem of
                   1029: needing more than one register (frequently a pair) to store certain
                   1030: data
                   1031: types (such as longs or doubles).
                   1032: There appears to be no general way of treating this problem;
                   1033: solutions have to be fudged for each machine where the problem arises.
                   1034: On the Honeywell 66, for example, there are only two general purpose registers,
                   1035: so a need for a pair is the same as the need for two registers.
                   1036: On the IBM 370, the register pair (0,1) is used to do multiplications and divisions;
                   1037: registers 0 and 1 are not generally considered part of the scratch registers, and so
                   1038: do not require allocation explicitly.
                   1039: On the Interdata 8/32, after much consideration, the
                   1040: decision was made not to try to deal with the register pair issue;
                   1041: operations such as multiplication and division that required pairs
                   1042: were simply assumed to take all of the scratch registers.
                   1043: Several weeks of effort had failed to produce
                   1044: an algorithm that seemed to have much chance of running successfully
                   1045: without inordinate debugging effort.
                   1046: The difficulty of this issue should not be minimized; it represents one of the
                   1047: main intellectual efforts in porting the compiler.
                   1048: Nevertheless, this problem has been fudged with a degree of
                   1049: success on nearly a dozen machines, so the compiler writer should
                   1050: not abandon hope.
                   1051: .PP
                   1052: The Sethi-Ullman computations interact with the
                   1053: rest of the compiler in a number of rather subtle ways.
                   1054: As already discussed, the
                   1055: .I store
                   1056: routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult
                   1057: to compute in registers, and must be stored.
                   1058: There are also subtle interactions between the
                   1059: rewriting routines and the Sethi-Ullman numbers.
                   1060: Suppose we have a tree such as
                   1061: .DS
                   1062: .I "A \- B"
                   1063: .DE
                   1064: where
                   1065: .I A
                   1066: and
                   1067: .I B
                   1068: are expressions; suppose further that
                   1069: .I B
                   1070: takes two registers, and
                   1071: .I A
                   1072: one.
                   1073: It is possible to compute the full expression in two registers by
                   1074: first computing
                   1075: .I B ,
                   1076: and then, using the scratch register
                   1077: used by
                   1078: .I B ,
                   1079: but not containing the answer, compute
                   1080: .I A .
                   1081: The subtraction can then be done, computing the expression.
                   1082: (Note that this assumes a number of things, not the least of which
                   1083: are register-to-register subtraction operators and symmetric
                   1084: registers.)
                   1085: If the machine dependent routine
                   1086: .I setbin ,
                   1087: however, is not prepared to recognize this case
                   1088: and compute the more difficult side of the expression
                   1089: first, the
                   1090: Sethi-Ullman number must be set to three.
                   1091: Thus, the
                   1092: Sethi-Ullman number for a tree should represent the code that
                   1093: the machine dependent routines are actually willing to generate.
                   1094: .PP
                   1095: The interaction can go the other way.
                   1096: If we take an expression such as
                   1097: .DS
                   1098: * ( \fIp\fP + \fIi\fP )
                   1099: .DE
                   1100: where
                   1101: .I p
                   1102: is a pointer and
                   1103: .I i
                   1104: an integer,
                   1105: this can probably be done in one register on most machines.
                   1106: Thus, its Sethi-Ullman number would probably be set to one.
                   1107: If double indexing is possible in the machine, a possible way
                   1108: of computing the expression is to load both
                   1109: .I p
                   1110: and
                   1111: .I i
                   1112: into registers, and then use double indexing.
                   1113: This would use two scratch registers; in such a case,
                   1114: it is possible that the scratch registers might be unobtainable,
                   1115: or might make some other part of the computation run out of
                   1116: registers.
                   1117: The usual solution is to cause
                   1118: .I offstar
                   1119: to ignore opportunities for double indexing that would tie up more scratch
                   1120: registers than the Sethi-Ullman number had reserved.
                   1121: .PP
                   1122: In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application
                   1123: of the portable compiler.
                   1124: It is also a frequent source of bugs.
                   1125: Algorithms are available that will produce nearly optimal code
                   1126: for specialized machines, but unfortunately most existing machines
                   1127: are far removed from these ideals.
                   1128: The best way of proceeding in practice is to start with a compiler
                   1129: for a similar machine to the target, and proceed very
                   1130: carefully.
                   1131: .SH
                   1132: Register Allocation
                   1133: .PP
                   1134: After the Sethi-Ullman numbers are computed,
                   1135: .I order
                   1136: calls a routine,
                   1137: .I rallo ,
                   1138: that does register allocation, if appropriate.
                   1139: This routine does relatively little, in general;
                   1140: this is especially true if the target machine
                   1141: is fairly regular.
                   1142: There are a few cases where it is assumed that
                   1143: the result of a computation takes place in a particular register;
                   1144: switch and function return are the two major places.
                   1145: The expression tree has a field,
                   1146: .I rall ,
                   1147: that may be filled with a register number; this is taken
                   1148: to be a preferred register, and the first temporary
                   1149: register allocated by a template match will be this preferred one, if
                   1150: it is free.
                   1151: If not, no particular action is taken; this is just a heuristic.
                   1152: If no register preference is present, the field contains NOPREF.
                   1153: In some cases, the result must be placed in
                   1154: a given register, no matter what.
                   1155: The register number is placed in
                   1156: .I rall ,
                   1157: and the mask MUSTDO is logically or'ed in with it.
                   1158: In this case, if the subtree is requested in a register, and comes
                   1159: back in a register other than the demanded one, it is moved
                   1160: by calling the routine
                   1161: .I rmove .
                   1162: If the target register for this move is busy, it is a compiler
                   1163: error.
                   1164: .PP
                   1165: Note that this mechanism is the only one that will ever cause a register-to-register
                   1166: move between scratch registers (unless such a move is buried in the depths of
                   1167: some template).
                   1168: This simplifies debugging.
                   1169: In some cases, there is a rather strange interaction between
                   1170: the register allocation and the Sethi-Ullman number;
                   1171: if there is an operator or situation requiring a
                   1172: particular register, the allocator and the Sethi-Ullman
                   1173: computation must conspire to ensure that the target
                   1174: register is not being used by some intermediate result of some far-removed computation.
                   1175: This is most easily done by making the special operation take
                   1176: all of the free registers, preventing any other partially-computed
                   1177: results from cluttering up the works.
                   1178: .SH
                   1179: Template Shortcuts
                   1180: .PP
                   1181: Some operations are just too hard or too clumsy
                   1182: to be implemented in code templates
                   1183: on a particular architecture.
                   1184: .PP
                   1185: One way to handle such operations is to replace them with function calls.
                   1186: The intermediate file reading code in
                   1187: .I reader.c
                   1188: contains a call to an implementation dependent macro MYREADER;
                   1189: this can be defined to call various routines
                   1190: which walk the code tree and
                   1191: perform transformations.
                   1192: On the \*V, for example,
                   1193: unsigned division and remainder operations
                   1194: are far too complex to encode in a template.
                   1195: The routine
                   1196: .I hardops
                   1197: is called from a tree walk in
                   1198: .I myreader
                   1199: to detect these operations and
                   1200: replace them with calls to the C runtime functions
                   1201: .I udiv
                   1202: and
                   1203: .I urem .
                   1204: (There are complementary functions
                   1205: .I audiv
                   1206: and
                   1207: .I aurem
                   1208: which are provided as support for unsigned assignment operator expressions;
                   1209: they are different from \fIudiv\fP and \fIurem\fP
                   1210: because the left hand side of an assignment operator expression
                   1211: must be evaluated only once.)
                   1212: Note that arithmetic support routines are always expensive;
                   1213: the compiler makes an effort to notice common operations
                   1214: such as unsigned division by a constant power of two
                   1215: and generates optimal code for these inline.
                   1216: .PP
                   1217: Another escape involves the routine
                   1218: .I zzzcode .
                   1219: This function is called from
                   1220: .I expand
                   1221: to process template macros which start with the character
                   1222: .B Z .
                   1223: On the \*V,
                   1224: many complex code generation problems
                   1225: are swept under the rug into \fIzzzcode\fP.
                   1226: Scalar type conversions are a particularly annoying issue;
                   1227: they are primarily handled using the macro \fBZA\fP.
                   1228: Rather than creating a template for each possible
                   1229: conversion and result,
                   1230: which would be tedious and complex given
                   1231: C's many scalar types,
                   1232: this macro allows the compiler to take shortcuts.
                   1233: Tough conversions such as \fBunsigned\fP into \fBdouble\fP
                   1234: are easily handled using special code under \fBZA\fP.
                   1235: One convention which makes scalar conversions
                   1236: somewhat more difficult than they might otherwise be
                   1237: is the strict requirement that
                   1238: values in registers must have a type
                   1239: that is as wide or wider than a single register.
                   1240: This convention is used primarily to implement
                   1241: the ``usual arithmetic conversions'' of C,
                   1242: but it can get in the way when
                   1243: converting between (say) a \fBchar\fP value
                   1244: and an \fBunsigned short\fP.
                   1245: A routine named \fIcollapsible\fP is used
                   1246: to determine whether one operation or two
                   1247: is needed to produce a register-width result.
                   1248: .PP
                   1249: Another convenient macro is \fBZP\fP.
                   1250: This macro is used to generate an appropriate
                   1251: conditional test after a comparison.
                   1252: This makes it possible to avoid
                   1253: a profusion of template entries
                   1254: which essentially duplicate each other,
                   1255: one entry for each type of test
                   1256: multiplied by the number of different
                   1257: comparison conditions.
                   1258: A related macro, \fBZN\fP,
                   1259: is used to normalize the result
                   1260: of a relational test by
                   1261: producing an integer 0 or 1.
                   1262: .PP
                   1263: The macro \fBZS\fP does the unlovely job
                   1264: of generating code for structure assignments.
                   1265: It tests the size of the structure
                   1266: to see what \*V instruction can be used to move it,
                   1267: and is capable of emitting a block move
                   1268: instruction for large structures.
                   1269: On other architectures this macro
                   1270: could be used to generate a function call
                   1271: to a block copy routine.
                   1272: .PP
                   1273: The macro \fBZG\fP was recently introduced
                   1274: to handle the thorny issue of
                   1275: assignment operator expressions which
                   1276: have an integral left hand side
                   1277: and a floating point right hand side.
                   1278: These expressions are passed to the code generator
                   1279: without the usual type balancing so that
                   1280: good code can be generated for them.
                   1281: Older versions of the portable compiler computed
                   1282: these expressions with integer arithmetic;
                   1283: with the \fBZG\fP operator,
                   1284: the current compiler can convert the left hand side
                   1285: to the appropriate floating type,
                   1286: compute the expression with floating point arithmetic,
                   1287: convert the result back to integral type
                   1288: and store it in the left hand side.
                   1289: These operations are performed by
                   1290: recursive calls to \fIzzzcode\fP
                   1291: and other routines related to \fIexpand\fP.
                   1292: .PP
                   1293: An assortment of other macros finish the job
                   1294: of interpreting code templates.
                   1295: Among the more interesting ones:
                   1296: \fBZC\fP produces the number of words
                   1297: pushed on the argument stack,
                   1298: which is useful for function calls;
                   1299: \fBZD\fP and \fBZE\fP produce
                   1300: constant increment and decrement operations;
                   1301: \fBZL\fR and \fBZR\fP produce
                   1302: the assembler letter code (\fBl\fP, \fBw\fP or \fBb\fP)
                   1303: corresponding to the size and type
                   1304: of the left and right operand respectively.
                   1305: .SH
                   1306: Shared Code
                   1307: .PP
                   1308: The
                   1309: .I lint
                   1310: utility shares sources with the portable compiler.
                   1311: \fILint\fP uses all of the machine independent pass 1 sources,
                   1312: and adds its own set of ``machine dependent'' routines,
                   1313: contained mostly in \fIlint.c\fP.
                   1314: \fILint\fP uses a private intermediate file format
                   1315: and a private pass 2 whose source is \fIlpass2.c\fP.
                   1316: Several modifications were made to the C scanner
                   1317: in \fIscan.c\fP, conditionally compiled with
                   1318: the symbol LINT,
                   1319: in order to support \fIlint\fP's convention
                   1320: of passing ``pragma'' information inside special comments.
                   1321: A few other minor modifications were also
                   1322: made, \fIe.g.\fP to skip over \fIasm\fP statements.
                   1323: .PP
                   1324: The \*f and \fIpc\fP compilers use a code generator
                   1325: which shares sources with pass 2 of the portable compiler.
                   1326: This code generator is very similar to pass 2
                   1327: but uses a different intermediate file format.
                   1328: Three source files are needed in addition to the pass 2 sources.
                   1329: .I fort.c
                   1330: is a machine independent source file which
                   1331: contains a pass 2 main routine that replaces the equivalent routine in
                   1332: .I reader.c ,
                   1333: together with several routines for reading the binary intermediate file.
                   1334: .I fort.c
                   1335: includes the machine dependent file
                   1336: .I fort.h ,
                   1337: which defines two trivial label generation routines.
                   1338: A header file
                   1339: .I /usr/include/pcc.h
                   1340: defines opcode and type symbols which
                   1341: are needed to provide a standard intermediate file format;
                   1342: this file is also included by the Fortran and Pascal compilers.
                   1343: The creation of this header file
                   1344: made it necessary to make
                   1345: some changes in the way the portable C compiler is built.
                   1346: These changes were made with the aim
                   1347: of minimizing the number of lines changed
                   1348: in the original sources.
                   1349: Macro symbols in
                   1350: .I pcc.h
                   1351: are flagged with a unique prefix to avoid
                   1352: symbol name collisions in the Fortran and Pascal compilers,
                   1353: which have their own internal opcode and type symbols.
                   1354: A
                   1355: .I sed (1)
                   1356: script is used to strip these prefixes,
                   1357: producing an include file named
                   1358: .I pcclocal.h
                   1359: which is specific to the portable C compiler
                   1360: and contains opcode symbols which
                   1361: are compatible with the original opcode symbols.
                   1362: A similar
                   1363: .I sed
                   1364: script is used to produce a file of Yacc tokens for the C grammar.
                   1365: .PP
                   1366: A number of changes to existing source files
                   1367: were made to accommodate the Fortran-style pass 2.
                   1368: These changes are conditionally compiled
                   1369: using the symbol FORT.
                   1370: Many changes were needed to implement
                   1371: single-precision arithmetic;
                   1372: other changes concern such things as
                   1373: the avoidance of floating point move instructions,
                   1374: which on the \*V can cause floating point faults
                   1375: when a datum is not a normalized floating point value.
                   1376: In earlier implementations of the Fortran-style pass 2 there were
                   1377: a number of stub files which served only
                   1378: to define the symbol FORT in a particular
                   1379: source file; these files have been removed for 4.3BSD
                   1380: in favor of a new compilation strategy which
                   1381: yields up to three different objects from
                   1382: a single source file, depending on what
                   1383: compilation control symbols are defined for that file.
                   1384: .PP
                   1385: The Fortran-style pass 2 uses a Polish Postfix intermediate file.
                   1386: The file is in binary format,
                   1387: and is logically divided into a stream of 32-bit records.
                   1388: Each record consists of an (\fIopcode, value, type\fP) triple,
                   1389: possibly followed inline by more descriptive information.
                   1390: The
                   1391: .I opcode
                   1392: and
                   1393: .I type
                   1394: are selected from the list in
                   1395: .I pcc.h ;
                   1396: the type encodes a basic type,
                   1397: around which may be wrapped type modifiers
                   1398: such as ``pointer to'' or ``array of'' to
                   1399: produce more complex types.
                   1400: The function of the
                   1401: .I value
                   1402: parameter depends on the opcode;
                   1403: it may be used for a flag,
                   1404: a register number
                   1405: or the value of a constant,
                   1406: or it may be unused.
                   1407: The optional inline data is often a null-terminated string,
                   1408: but it may also be a binary offset from a register
                   1409: or from a symbolic constant;
                   1410: sometimes both a string and an offset appear.
                   1411: .PP
                   1412: Here are a few samples of intermediate file records
                   1413: and their interpretation:
                   1414: .KS
                   1415: .TS
                   1416: expand, tab(:);
                   1417: c  c  c  c  c
                   1418: ^  ^  ^  c  ^
                   1419: l  lB l  l  l.
                   1420: Opcode:Type:Value:Optional:Interpretation
                   1421: :::Data:
                   1422: .sp 0.5v
                   1423: =
                   1424: .sp 0.5v
                   1425: ICON:int:flag=0:binary=5:T{
                   1426: .Bl
                   1427: the integer constant 5
                   1428: T}
                   1429: .Sp
                   1430: NAME:char:flag=1:T{
                   1431: .nf
                   1432: binary=1,
                   1433: string="_foo_"
                   1434: T}:T{
                   1435: .Bl
                   1436: a \fBcharacter*1\fP element in
                   1437: a Fortran common block \fIfoo\fP
                   1438: at offset 1
                   1439: T}
                   1440: .Sp
                   1441: OREG:char:reg=11:T{
                   1442: .nf
                   1443: offset=1,
                   1444: string="v.2-v.1"
                   1445: T}:T{
                   1446: .Bl
                   1447: the second element
                   1448: of a Fortran \fBcharacter*1\fP array,
                   1449: expressed as an offset from a static base register
                   1450: T}
                   1451: .Sp
                   1452: PLUS:float:::T{
                   1453: .Bl
                   1454: a single precision add
                   1455: T}
                   1456: .Sp
                   1457: FTEXT::size=2:string=".text 0":T{
                   1458: .Bl
                   1459: an inline assembler directive
                   1460: of length 2 (32-bit records)
                   1461: T}
                   1462: .TE
                   1463: .KE
                   1464: .SH
                   1465: Compiler Bugs
                   1466: .PP
                   1467: The portable compiler has an excellent record of generating correct code.
                   1468: The requirement for reasonable cooperation between the register allocation,
                   1469: Sethi-Ullman computation, rewriting rules, and templates builds quite a bit
                   1470: of redundancy into the compiling process.
                   1471: The effect of this is that, in a surprisingly short time, the compiler will
                   1472: start generating correct code for those
                   1473: programs that it can compile.
                   1474: The hard part of the job then becomes finding and
                   1475: eliminating those situations where the compiler refuses to
                   1476: compile a program because it knows it cannot do it right.
                   1477: For example, a template may simply be missing; this may either
                   1478: give a compiler error of the form ``no match for op ...'' , or cause
                   1479: the compiler to go into an infinite loop applying various rewriting rules.
                   1480: The compiler has a variable,
                   1481: .I nrecur ,
                   1482: that is set to 0 at the beginning of an expressions, and
                   1483: incremented at key spots in the
                   1484: compilation process; if this parameter gets too large, the
                   1485: compiler decides that it is in a loop, and aborts.
                   1486: Loops are also characteristic of botches in the machine-dependent rewriting rules.
                   1487: Bad Sethi-Ullman computations usually cause the scratch registers
                   1488: to run out; this often means
                   1489: that the Sethi-Ullman number was underestimated, so
                   1490: .I store
                   1491: did not store something it should have; alternatively,
                   1492: it can mean that the rewriting rules were not smart enough to
                   1493: find the sequence that
                   1494: .I sucomp
                   1495: assumed would be used.
                   1496: .PP
                   1497: The best approach when a compiler error is detected involves several stages.
                   1498: First, try to get a small example program that steps on the bug.
                   1499: Second, turn on various debugging flags in the code generator, and follow the
                   1500: tree through the process of being matched and rewritten.
                   1501: Some flags of interest are
                   1502: \fB\-e\fP, which prints the expression tree,
                   1503: \fB\-r\fP, which gives information about the allocation of registers,
                   1504: \fB\-a\fP, which gives information about the performance of
                   1505: .I rallo ,
                   1506: and \fB\-o\fP, which gives information about the behavior of
                   1507: .I order .
                   1508: This technique should allow most bugs to be found relatively quickly.
                   1509: .PP
                   1510: Unfortunately, finding the bug is usually not enough; it must also
                   1511: be fixed!
                   1512: The difficulty arises because a fix to the particular bug of interest tends
                   1513: to break other code that already works.
                   1514: Regression tests, tests that compare the performance of
                   1515: a new compiler against the performance of an older one, are very
                   1516: valuable in preventing major catastrophes.
                   1517: .SH
                   1518: Compiler Extensions
                   1519: .PP
                   1520: The portable C compiler makes a few extensions
                   1521: to the language described by Ritchie.
                   1522: .PP
                   1523: .I "Single precision arithmetic."
                   1524: ``All floating arithmetic in C is carried out in double-precision;
                   1525: whenever a
                   1526: .B float
                   1527: appears in a an expression it is lengthened to
                   1528: .B double
                   1529: by zero-padding its fraction.''
                   1530: \*-Dennis Ritchie.
                   1531: .[
                   1532: kernighan ritchie prentice 1978
                   1533: .]
                   1534: Programmers who would like to use C
                   1535: to write numerical applications
                   1536: often shy away from it because
                   1537: C programs cannot perform single precision arithmetic.
                   1538: On machines such as the \*V
                   1539: which can cleanly support arithmetic on two (or more)
                   1540: sizes of floating point values,
                   1541: programs which can take advantage of
                   1542: single precision arithmetic will run faster.
                   1543: A very popular proposal for the ANSI C standard states that
                   1544: implementations may perform single precision computations
                   1545: with single precision arithmetic;
                   1546: some actual C implementations already do this,
                   1547: and now the Berkeley compiler joins them.
                   1548: .PP
                   1549: The changes are implemented in the compiler with a set of
                   1550: conditional compilation directives based on the symbol SPRECC;
                   1551: thus two compilers are generated,
                   1552: one with only double precision arithmetic
                   1553: and one with both double and single precision arithmetic.
                   1554: The
                   1555: .I cc
                   1556: program uses a flag
                   1557: .B \-f
                   1558: to select the single/double version of the compiler (\fI/lib/sccom\fP)
                   1559: instead of the default double only version (\fI/lib/ccom\fP).
                   1560: It is expected that at some time in the future
                   1561: the double only compiler will be retired
                   1562: and the single/double compiler will become the default.
                   1563: .PP
                   1564: There are a few implementation details
                   1565: of the single/double compiler which
                   1566: will be of interest to users and compiler porters.
                   1567: To maintain compatibility with functions compiled by the double only compiler,
                   1568: single precision actual arguments are still coerced to double precision,
                   1569: and formal arguments which are declared single precision
                   1570: are still ``really'' double precision.
                   1571: This may change if function prototypes of the sort
                   1572: proposed for the ANSI C standard are eventually adopted.
                   1573: Floating point constants are now classified into single precision
                   1574: and double precision types.
                   1575: The precision of a constant is determined from context;
                   1576: if a floating constant appears in an arithmetic expression
                   1577: with a single precision value,
                   1578: the constant is treated as having single precision type
                   1579: and the arithmetic expression is computed
                   1580: using single precision arithmetic.
                   1581: .PP
                   1582: Remarkably little code in the compiler
                   1583: needed to be changed to implement the single/double compiler.
                   1584: In many cases the changes overlapped with
                   1585: special cases which are used for the Fortran-style
                   1586: pass 2 (\fI/lib/f1\fP).
                   1587: Most of the single precision changes were implemented by Sam Leffler.
                   1588: .PP
                   1589: .I "Preprocessor extensions."
                   1590: The portable C compiler is normally distributed
                   1591: with a macro preprocessor written by J. F. Reiser.
                   1592: This preprocessor implements the features described
                   1593: in Ritchie's reference manual;
                   1594: it removes comments, expands macro definitions
                   1595: and removes or inserts code based on conditional compilation directives.
                   1596: Two interesting extensions are provided by this version of the preprocessor:
                   1597: .IP \(bu
                   1598: When comments are removed, no white space is necessarily substituted;
                   1599: this has the effect of \fIre-tokenizing\fP code,
                   1600: since the PCC will reanalyze the input.
                   1601: Macros can thus create new tokens by clever use of comments.
                   1602: For example, the macro definition ``#define foo(a,b) a/**/b''
                   1603: creates a macro \fIfoo\fP which
                   1604: concatenates its two arguments, forming a new token.
                   1605: .IP \(bu
                   1606: Macro bodies are analyzed for macro arguments
                   1607: without regard to the boundaries of string or character constants.
                   1608: The definition ``#define bar(a) "a\en"'' creates a macro
                   1609: which returns the literal form of its argument embedded
                   1610: in a string with a newline appended.
                   1611: .PP
                   1612: These extensions are not portable to
                   1613: a number of other C preprocessors.
                   1614: They may be replaced in the future by corresponding ANSI C features,
                   1615: when the ANSI C standard has been formalized.
                   1616: .SH
                   1617: Summary and Conclusion
                   1618: .PP
                   1619: The portable compiler has been a useful tool for providing C
                   1620: capability on a large number of diverse machines,
                   1621: and for testing a number of theoretical
                   1622: constructs in a practical setting.
                   1623: It has many blemishes, both in style and functionality.
                   1624: It has been applied to many more machines than first
                   1625: anticipated, of a much wider range than originally dreamed of.
                   1626: Its use has also spread much faster than expected, leaving parts of
                   1627: the compiler still somewhat raw in shape.
                   1628: .PP
                   1629: On the theoretical side, there is some hope that the
                   1630: skeleton of the
                   1631: .I sucomp
                   1632: routine could be generated for many machines directly from the
                   1633: templates; this would give a considerable boost
                   1634: to the portability and correctness of the compiler,
                   1635: but might affect tunability and code quality.
                   1636: There is also room for more optimization, both within
                   1637: .I optim
                   1638: and in the form of a portable ``peephole'' optimizer.
                   1639: .PP
                   1640: On the practical, development side,
                   1641: the compiler could probably be sped up and made smaller
                   1642: without doing too much violence to its basic structure.
                   1643: Parts of the compiler deserve to be rewritten;
                   1644: the initialization code, register allocation, and
                   1645: parser are prime candidates.
                   1646: It might be that doing some or all of the parsing
                   1647: with a recursive descent parser might
                   1648: save enough space and time to be worthwhile;
                   1649: it would certainly ease the problem of moving the
                   1650: compiler to an environment where
                   1651: .I Yacc
                   1652: is not already present.
                   1653: .SH
                   1654: Acknowledgements
                   1655: .PP
                   1656: I would like to thank the many people who have
                   1657: sympathetically, and even enthusiastically, helped me grapple
                   1658: with what has been a frustrating program to write, test, and install.
                   1659: D. M. Ritchie and E. N. Pinson provided needed early
                   1660: encouragement and philosophical guidance;
                   1661: M. E. Lesk,
                   1662: R. Muha, T. G. Peterson,
                   1663: G. Riddle, L. Rosler,
                   1664: R. W. Mitze,
                   1665: B. R. Rowland,
                   1666: S. I. Feldman,
                   1667: and
                   1668: T. B. London
                   1669: have all contributed ideas, gripes, and all, at one time or another,
                   1670: climbed ``into the pits'' with me to help debug.
                   1671: Without their help this effort would have not been possible;
                   1672: with it, it was often kind of fun.
                   1673: \*-S. C. Johnson
                   1674: .sp
                   1675: .PP
                   1676: Many people have contributed fixes and improvements
                   1677: to the current Berkeley version of the compiler.
                   1678: A number of really valuable fixes were contributed by
                   1679: Ralph Campbell, Sam Leffler, Kirk McKusick, Arthur Olsen, Donn Seeley,
                   1680: Don Speck and Chris Torek,
                   1681: but most of the bugs were spotted by
                   1682: the legions of virtuous C programmers
                   1683: who were kind enough to let us know
                   1684: that the compiler was broken
                   1685: and when the heck were we going to get it fixed?
                   1686: Thank you all.
                   1687: \*-Donn Seeley
                   1688: .sp 2
                   1689: .LP
                   1690: .[
                   1691: $LIST$
                   1692: .]
                   1693: .LP

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.