Annotation of 43BSDReno/share/doc/smm/19.porttour/porttour2.ms, revision 1.1

1.1     ! root        1: .\"    @(#)porttour2.ms        6.3 (Berkeley) 5/30/86
        !             2: .\"
        !             3: .SH
        !             4: Pass Two
        !             5: .PP
        !             6: Code generation is far less well understood than
        !             7: parsing or lexical analysis, and for this reason
        !             8: the second pass is far harder to discuss in a file by file manner.
        !             9: A great deal of the difficulty is in understanding the issues
        !            10: and the strategies employed to meet them.
        !            11: Any particular function is likely to be reasonably straightforward.
        !            12: .PP
        !            13: Thus, this part of the paper will concentrate a good deal on the broader
        !            14: aspects of strategy in the code generator,
        !            15: and will not get too intimate with the details.
        !            16: .SH
        !            17: Overview
        !            18: .PP
        !            19: It is difficult to organize a code generator to be flexible enough to
        !            20: generate code for a large number of machines,
        !            21: and still be efficient for any one of them.
        !            22: Flexibility is also important when it comes time to tune
        !            23: the code generator to improve the output code quality.
        !            24: On the other hand, too much flexibility can lead to semantically
        !            25: incorrect code, and potentially a combinatorial explosion in the
        !            26: number of cases to be considered in the compiler.
        !            27: .PP
        !            28: One goal of the code generator is to have a high degree of correctness.
        !            29: It is very desirable to have the compiler detect its own inability to
        !            30: generate correct code, rather than to produce incorrect code.
        !            31: This goal is achieved by having a simple model of the job
        !            32: to be done (e.g., an expression tree)
        !            33: and a simple model of the machine state
        !            34: (e.g., which registers are free).
        !            35: The act of generating an instruction performs a transformation
        !            36: on the tree and the machine state;
        !            37: hopefully, the tree eventually gets
        !            38: reduced to a single node.
        !            39: If each of these instruction/transformation pairs is correct,
        !            40: and if the machine state model really represents the actual machine,
        !            41: and if the transformations reduce the input tree to the desired single node, then the
        !            42: output code will be correct.
        !            43: .PP
        !            44: For most real machines, there is no definitive theory of code generation that
        !            45: encompasses all the C operators.
        !            46: Thus the selection of which instruction/transformations to generate, and in what
        !            47: order, will have a heuristic flavor.
        !            48: If, for some expression tree, no transformation applies, or, more
        !            49: seriously, if the heuristics select a sequence of instruction/transformations
        !            50: that do not in fact reduce the tree, the compiler
        !            51: will report its inability to generate code, and abort.
        !            52: .PP
        !            53: A major part of the code generator is concerned with the model and the transformations.
        !            54: Most of this is machine independent, or depends only on simple tables.
        !            55: The flexibility comes from the heuristics that guide the transformations
        !            56: of the trees, the selection of subgoals, and the ordering of the computation.
        !            57: .SH
        !            58: The Machine Model
        !            59: .PP
        !            60: The machine is assumed to have a number of registers, of at most two different
        !            61: types:
        !            62: .I A
        !            63: and
        !            64: .I B .
        !            65: Within each register class, there may be scratch (temporary) registers and dedicated registers
        !            66: (e.g., register variables, the stack pointer, etc.).
        !            67: Requests to allocate and free registers involve only the temporary registers.
        !            68: .PP
        !            69: Each of the registers in the machine is given a name and a number
        !            70: in the
        !            71: .I mac2defs.h
        !            72: file; the numbers are used as indices into various tables
        !            73: that describe the registers, so they should be kept small.
        !            74: One such table is the
        !            75: .I rstatus
        !            76: table on file
        !            77: .I local2.c .
        !            78: This table is indexed by register number, and
        !            79: contains expressions made up from manifest constants describing the register types:
        !            80: SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's,
        !            81: and SBREG and SBREG\(orSTBREG similarly for BREG's.
        !            82: There are macros that access this information:
        !            83: .I isbreg(r)
        !            84: returns true if register number
        !            85: .I r
        !            86: is a BREG, and
        !            87: .I istreg(r)
        !            88: returns true if register number
        !            89: .I r
        !            90: is a temporary AREG or BREG.
        !            91: Another table,
        !            92: .I rnames ,
        !            93: contains the register names; this is used when
        !            94: putting out assembler code and diagnostics.
        !            95: .PP
        !            96: The usage of registers is kept track of by an array called
        !            97: .I busy .
        !            98: .I Busy[r]
        !            99: is the number of uses of register
        !           100: .I r
        !           101: in the current tree being processed.
        !           102: The allocation and freeing of registers will be discussed later as part of the code generation
        !           103: algorithm.
        !           104: .SH
        !           105: General Organization
        !           106: .PP
        !           107: As mentioned above, the second pass reads lines from
        !           108: the intermediate file, copying through to the output unchanged
        !           109: any lines that begin with a `)', and making note of the
        !           110: information about stack usage and register allocation contained on
        !           111: lines beginning with `]' and `['.
        !           112: The expression trees, whose beginning is indicated by a line beginning with `.',
        !           113: are read and rebuilt into trees.
        !           114: If the compiler is loaded as one pass, the expression trees are
        !           115: immediately available to the code generator.
        !           116: .PP
        !           117: The actual code generation is done by a hierarchy of routines.
        !           118: The routine
        !           119: .I delay
        !           120: is first given the tree; it attempts to delay some postfix
        !           121: \fB+\|+\fP and \fB\-\|\-\fP computations that might reasonably be done after the
        !           122: smoke clears.
        !           123: It also attempts to handle comma (`,') operators by computing the
        !           124: left side expression first, and then rewriting the tree
        !           125: to eliminate the operator.
        !           126: .I Delay
        !           127: calls
        !           128: .I codgen
        !           129: to control the actual code generation process.
        !           130: .I Codgen
        !           131: takes as arguments a pointer to the expression tree,
        !           132: and a second argument that, for socio-historical reasons, is called a
        !           133: .I cookie .
        !           134: The cookie describes a set of goals that would be acceptable
        !           135: for the code generation: these are assigned to individual bits,
        !           136: so they may be logically or'ed together to form a large number of possible goals.
        !           137: Among the possible goals are FOREFF (compute for side effects only;
        !           138: don't worry about the value), INTEMP (compute and store value into a temporary location
        !           139: in memory), INAREG (compute into an A register), INTAREG (compute into a scratch
        !           140: A register), INBREG and INTBREG similarly, FORCC (compute for condition codes),
        !           141: and FORARG (compute it as a function argument; e.g., stack it if appropriate).
        !           142: .PP
        !           143: .I Codgen
        !           144: first canonicalizes the tree by calling
        !           145: .I canon .
        !           146: This routine looks for certain transformations that might now be
        !           147: applicable to the tree.
        !           148: One, which is very common and very powerful, is to
        !           149: fold together an indirection operator (UNARY MUL)
        !           150: and a register (REG); in most machines, this combination is
        !           151: addressable directly, and so is similar to a NAME in its
        !           152: behavior.
        !           153: The UNARY MUL and REG are folded together to make
        !           154: another node type called OREG.
        !           155: In fact, in many machines it is possible to directly address not just the
        !           156: cell pointed to by a register, but also cells differing by a constant
        !           157: offset from the cell pointed to by the register.
        !           158: .I Canon
        !           159: also looks for such cases, calling the
        !           160: machine dependent routine 
        !           161: .I notoff
        !           162: to decide if the offset is acceptable (for example, in the IBM 370 the offset
        !           163: must be between 0 and 4095 bytes).
        !           164: Another optimization is to replace bit field operations
        !           165: by shifts and masks if the operation involves extracting the field.
        !           166: Finally, a machine dependent routine,
        !           167: .I sucomp ,
        !           168: is called that computes the Sethi-Ullman numbers
        !           169: for the tree (see below).
        !           170: .PP
        !           171: After the tree is canonicalized,
        !           172: .I codgen
        !           173: calls the routine
        !           174: .I store
        !           175: whose job is to select a subtree of the tree to be computed
        !           176: and (usually) stored before beginning the
        !           177: computation of the full tree.
        !           178: .I Store
        !           179: must return a tree that can be computed without need
        !           180: for any temporary storage locations.
        !           181: In effect, the only store operations generated while processing the subtree must be as a response
        !           182: to explicit assignment operators in the tree.
        !           183: This division of the job marks one of the more significant, and successful,
        !           184: departures from most other compilers.
        !           185: It means that the code generator can operate
        !           186: under the assumption that there are enough
        !           187: registers to do its job, without worrying about
        !           188: temporary storage.
        !           189: If a store into a temporary appears in the output, it is always
        !           190: as a direct result of logic in the
        !           191: .I store
        !           192: routine; this makes debugging easier.
        !           193: .PP
        !           194: One consequence of this organization is that
        !           195: code is not generated by a treewalk.
        !           196: There are theoretical results that support this decision.
        !           197: .[
        !           198: Aho Johnson Optimal Code Trees jacm
        !           199: .]
        !           200: It may be desirable to compute several subtrees and store
        !           201: them before tackling the whole tree;
        !           202: if a subtree is to be stored, this is known before the code
        !           203: generation for the subtree is begun, and the subtree is computed
        !           204: when all scratch registers are available.
        !           205: .PP
        !           206: The
        !           207: .I store
        !           208: routine decides what subtrees, if any, should be
        !           209: stored by making use of numbers,
        !           210: called
        !           211: .I "Sethi-Ullman numbers" ,
        !           212: that give, for each subtree of an expression tree,
        !           213: the minimum number of scratch registers required to
        !           214: compile the subtree, without any stores into temporaries.
        !           215: .[
        !           216: Sethi Ullman jacm 1970
        !           217: .]
        !           218: These numbers are computed by the machine-dependent
        !           219: routine
        !           220: .I sucomp ,
        !           221: called by
        !           222: .I canon .
        !           223: The basic notion is that, knowing the Sethi-Ullman numbers for
        !           224: the descendants of a node, and knowing the operator of the
        !           225: node and some information about the machine, the
        !           226: Sethi-Ullman number of the node itself can be computed.
        !           227: If the Sethi-Ullman number for a tree exceeds the number of scratch
        !           228: registers available, some subtree must be stored.
        !           229: Unfortunately, the theory behind the Sethi-Ullman numbers
        !           230: applies only to uselessly simple machines and operators.
        !           231: For the rich set of C operators, and for machines with
        !           232: asymmetric registers, register pairs, different kinds of registers,
        !           233: and exceptional forms of addressing,
        !           234: the theory cannot be applied directly.
        !           235: The basic idea of estimation is a good one, however,
        !           236: and well worth applying; the application, especially
        !           237: when the compiler comes to be tuned for high code
        !           238: quality, goes beyond the park of theory into the
        !           239: swamp of heuristics.
        !           240: This topic will be taken up again later, when more of the compiler
        !           241: structure has been described.
        !           242: .PP
        !           243: After examining the Sethi-Ullman numbers,
        !           244: .I store
        !           245: selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in
        !           246: the external variables
        !           247: .I stotree
        !           248: and
        !           249: .I stocook .
        !           250: If a subtree has been selected, or if
        !           251: the whole tree is ready to be processed, the
        !           252: routine
        !           253: .I order
        !           254: is called, with a tree and cookie.
        !           255: .I Order
        !           256: generates code for trees that
        !           257: do not require temporary locations.
        !           258: .I Order
        !           259: may make recursive calls on itself, and,
        !           260: in some cases, on
        !           261: .I codgen ;
        !           262: for example, when processing the operators \fB&&\fP, \fB\(or\(or\fP, and comma (`,'), that have
        !           263: a left to right evaluation, it is
        !           264: incorrect for
        !           265: .I store
        !           266: examine the right operand for subtrees to be stored.
        !           267: In these cases,
        !           268: .I order
        !           269: will call
        !           270: .I codgen
        !           271: recursively when it is permissible to work on the right operand.
        !           272: A similar issue arises with the \fB? :\fP operator.
        !           273: .PP
        !           274: The
        !           275: .I order
        !           276: routine works by matching the current tree with
        !           277: a set of code templates.
        !           278: If a template is discovered that will
        !           279: match the current tree and cookie, the associated assembly language
        !           280: statement or statements are generated.
        !           281: The tree is then rewritten,
        !           282: as specified by the template, to represent the effect of the output instruction(s).
        !           283: If no template match is found, first an attempt is made to find a match with a
        !           284: different cookie; for example, in order to compute
        !           285: an expression with cookie INTEMP (store into a temporary storage location),
        !           286: it is usually necessary to compute the expression into a scratch register
        !           287: first.
        !           288: If all attempts to match the tree fail, the heuristic part of
        !           289: the algorithm becomes dominant.
        !           290: Control is typically given to one of a number of machine-dependent routines
        !           291: that may in turn recursively call
        !           292: .I order
        !           293: to achieve a subgoal of the computation (for example, one of the
        !           294: arguments may be computed into a temporary register).
        !           295: After this subgoal has been achieved, the process begins again with the
        !           296: modified tree.
        !           297: If the machine-dependent heuristics are unable to reduce the tree further,
        !           298: a number of default rewriting rules may be considered appropriate.
        !           299: For example, if the left operand of a \fB+\fP is a scratch
        !           300: register, the \fB+\fP can be replaced by a \fB+=\fP operator;
        !           301: the tree may then match a template.
        !           302: .PP
        !           303: To close this introduction, we will discuss the steps in compiling
        !           304: code for the expression
        !           305: .DS
        !           306: \fIa\fR += \fIb\fR
        !           307: .DE
        !           308: where
        !           309: .I a
        !           310: and
        !           311: .I b
        !           312: are static variables.
        !           313: .PP
        !           314: To begin with, the whole expression tree is examined with cookie FOREFF, and
        !           315: no match is found.  Search with other cookies is equally fruitless, so an
        !           316: attempt at rewriting is made.
        !           317: Suppose we are dealing with the Interdata 8/32 for the moment.
        !           318: It is recognized that the left hand and right hand sides of the \fB+=\fP operator
        !           319: are addressable, and in particular the left hand side has no
        !           320: side effects, so it is permissible to rewrite this as
        !           321: .DS
        !           322: \fIa\fR = \fIa\fR + \fIb\fR
        !           323: .DE
        !           324: and this is done.
        !           325: No match is found on this tree either, so a machine dependent rewrite is done; it is recognized
        !           326: that the left hand side of the assignment is addressable, but the right hand side is not
        !           327: in a register, so
        !           328: .I order
        !           329: is called recursively, being asked to put the right
        !           330: hand side of the assignment into a register.
        !           331: This invocation of
        !           332: .I order
        !           333: searches the tree for a match, and fails.
        !           334: The machine dependent rule for \fB+\fP
        !           335: notices that the right hand operand is addressable;
        !           336: it decides to put the left operand into a scratch register.
        !           337: Another recursive call to
        !           338: .I order
        !           339: is made, with the tree
        !           340: consisting solely of the leaf
        !           341: .I a ,
        !           342: and the cookie asking that the value be placed into a scratch register.
        !           343: This now matches a template, and a load instruction is emitted.
        !           344: The node consisting of
        !           345: .I a
        !           346: is rewritten in place to represent the register into which
        !           347: .I a
        !           348: is loaded,
        !           349: and this third call to
        !           350: .I order
        !           351: returns.
        !           352: The second call to
        !           353: .I order
        !           354: now finds that it has the tree
        !           355: .DS
        !           356: \fBreg\fR + \fIb\fR
        !           357: .DE
        !           358: to consider.
        !           359: Once again, there is no match, but the default rewriting rule rewrites
        !           360: the \fB+\fP as a \fB+=\fP operator, since the left operand is a scratch register.
        !           361: When this is done, there is a match: in fact,
        !           362: .DS
        !           363: \fBreg\fR += \fIb\fR
        !           364: .DE
        !           365: simply describes the effect of the add instruction
        !           366: on a typical machine.
        !           367: After the add is emitted, the tree is rewritten
        !           368: to consist merely of the register node, since the result of the add
        !           369: is now in the register.
        !           370: This agrees with the cookie passed to the second invocation of
        !           371: .I order ,
        !           372: so this invocation
        !           373: terminates, returning to the first level.
        !           374: The original tree has now
        !           375: become
        !           376: .DS
        !           377: \fIa\fR = \fBreg\fR
        !           378: .DE
        !           379: which matches a template for the store instruction.
        !           380: The store is output, and the tree rewritten to become
        !           381: just a single register node.
        !           382: At this point, since the top level call to
        !           383: .I order
        !           384: was
        !           385: interested only in side effects, the call to
        !           386: .I order
        !           387: returns, and the code generation is completed;
        !           388: we have generated a load, add, and store, as might have been expected.
        !           389: .PP
        !           390: The effect of machine architecture on this is considerable.
        !           391: For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage''
        !           392: instruction, so the strategy is quite different;
        !           393: .I b
        !           394: is loaded in to
        !           395: a register, and then an add to storage instruction generated
        !           396: to add this register in to
        !           397: .I a .
        !           398: The transformations, involving as they do the semantics of C,
        !           399: are largely machine independent.
        !           400: The decisions as to when to use them, however, are
        !           401: almost totally machine dependent.
        !           402: .PP
        !           403: Having given a broad outline of
        !           404: the code generation process, we shall next consider the
        !           405: heart of it: the templates.
        !           406: This leads naturally into discussions of template matching and register allocation,
        !           407: and finally a discussion of the machine dependent interfaces and strategies.
        !           408: .SH
        !           409: The Templates
        !           410: .PP
        !           411: The templates describe the effect of the target machine instructions
        !           412: on the model of computation around which the compiler is organized.
        !           413: In effect, each template has five logical sections, and represents an assertion
        !           414: of the form:
        !           415: .IP
        !           416: .B If
        !           417: we have a subtree of a given shape (1), and we have a goal (cookie) or goals to
        !           418: achieve (2), and we have sufficient free resources (3),
        !           419: .B then
        !           420: we may emit an instruction or instructions (4), and
        !           421: rewrite the subtree in a particular manner (5),
        !           422: and the rewritten tree will achieve the desired goals.
        !           423: .PP
        !           424: These five sections will be discussed in more
        !           425: detail later.  First, we give an example of a
        !           426: template:
        !           427: .DS
        !           428: .ta 1i 2i 3i 4i 5i
        !           429: ASG PLUS,      INAREG,
        !           430:        SAREG,  TINT,
        !           431:        SNAME,  TINT,
        !           432:                0,      RLEFT,
        !           433:                "       add     AL,AR\en",
        !           434: .DE
        !           435: The top line specifies the operator (\fB+=\fP) and the cookie (compute the
        !           436: value of the subtree into an AREG).
        !           437: The second and third lines specify the left and right descendants,
        !           438: respectively,
        !           439: of the \fB+=\fP operator.
        !           440: The left descendant must be a REG node, representing an
        !           441: A register, and have integer type, while the right side must be a NAME node,
        !           442: and also have integer type.
        !           443: The fourth line contains the resource requirements (no scratch registers
        !           444: or temporaries needed), and the rewriting rule (replace the subtree by the left descendant).
        !           445: Finally, the quoted string on the last line represents the output to the assembler:
        !           446: lower case letters, tabs, spaces, etc. are copied
        !           447: .I verbatim .
        !           448: to the output; upper case letters trigger various macro-like expansions.
        !           449: Thus,
        !           450: .B AL
        !           451: would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em
        !           452: presumably the register number.
        !           453: Similarly,
        !           454: .B AR
        !           455: would expand into the name of the right operand.
        !           456: The
        !           457: .I add
        !           458: instruction of the last section might well be
        !           459: emitted by this template.
        !           460: .PP
        !           461: In principle, it would be possible to make separate templates
        !           462: for all legal combinations of operators, cookies, types, and shapes.
        !           463: In practice, the number of combinations is very large.
        !           464: Thus, a considerable amount of mechanism is present to
        !           465: permit a large number of subtrees to be matched
        !           466: by a single template.
        !           467: Most of the shape and type specifiers are individual bits, and can
        !           468: be logically
        !           469: or'ed
        !           470: together.
        !           471: There are a number of special descriptors for matching classes of
        !           472: operators.
        !           473: The cookies can also be combined.
        !           474: As an example of the kind of template
        !           475: that really arises in practice, the
        !           476: actual template for the Interdata 8/32
        !           477: that subsumes the above example is:
        !           478: .DS
        !           479: .ta 1i 2i 3i 4i 5i
        !           480: ASG OPSIMP,    INAREG\(orFORCC,
        !           481:        SAREG,  TINT\(orTUNSIGNED\(orTPOINT,
        !           482:        SAREG\(orSNAME\(orSOREG\(orSCON,        TINT\(orTUNSIGNED\(orTPOINT,
        !           483:                0,      RLEFT\(orRESCC,
        !           484:                "       OI      AL,AR\en",
        !           485: .DE
        !           486: Here, OPSIMP represents the operators
        !           487: +, \-, \(or, &, and ^.
        !           488: The
        !           489: .B OI
        !           490: macro in the output string expands into the
        !           491: appropriate \fBI\fRnteger \fBO\fRpcode for the operator.
        !           492: The left and right sides can be integers, unsigned, or pointer types.
        !           493: The right side can be, in addition to a name, a register,
        !           494: a memory location whose address is given by a register and displacement (OREG),
        !           495: or a constant.
        !           496: Finally, these instructions set the condition codes,
        !           497: and so can be used in condition contexts:
        !           498: the cookie and rewriting rules reflect this.
        !           499: .SH
        !           500: The Template Matching Algorithm
        !           501: .PP
        !           502: The heart of the second pass is the template matching
        !           503: algorithm, in the routine
        !           504: .I match .
        !           505: .I Match
        !           506: is called with a tree and a cookie; it attempts to match
        !           507: the given tree against some template that will transform it
        !           508: according to one of the goals given in the cookie.
        !           509: If a match is successful, the transformation is
        !           510: applied;
        !           511: .I expand
        !           512: is called to generate the assembly code, and then
        !           513: .I reclaim
        !           514: rewrites the tree, and reclaims the resources, such
        !           515: as registers, that might have become free as a result
        !           516: of the generated code.
        !           517: .PP
        !           518: This part of the compiler is among the most time critical.
        !           519: There is a spectrum of implementation techniques available
        !           520: for doing this matching.
        !           521: The most naive algorithm simply looks at the templates one by one.
        !           522: This can be considerably improved upon by restricting the search
        !           523: for an acceptable template.
        !           524: It would be possible to do better than this if the templates were given
        !           525: to a separate program that ate them and generated a template
        !           526: matching subroutine.
        !           527: This would make maintenance of the compiler much more
        !           528: complicated, however, so this has not been done.
        !           529: .PP
        !           530: The matching algorithm is actually carried out by restricting
        !           531: the range in the table that must be searched for each opcode.
        !           532: This introduces a number of complications, however, and needs a
        !           533: bit of sympathetic help by the person constructing the
        !           534: compiler in order to obtain best results.
        !           535: The exact tuning of this algorithm continues; it
        !           536: is best to consult the code and comments in
        !           537: .I match
        !           538: for the latest version.
        !           539: .PP
        !           540: In order to match a template to a tree,
        !           541: it is necessary to match not only the cookie and the
        !           542: operator of the root, but also the types and shapes of the
        !           543: left and right descendants (if any) of the tree.
        !           544: A convention is established here that is carried out throughout
        !           545: the second pass of the compiler.
        !           546: If a node represents a unary operator, the single descendant
        !           547: is always the ``left'' descendant.
        !           548: If a node represents a unary operator or a leaf node (no descendants)
        !           549: the ``right'' descendant is taken by convention to be the node itself.
        !           550: This enables templates to easily match leaves and conversion operators, for example,
        !           551: without any additional mechanism in the matching program.
        !           552: .PP
        !           553: The type matching is straightforward; it is possible to specify any combination
        !           554: of basic types, general pointers, and pointers to one or more of
        !           555: the basic types.
        !           556: The shape matching is somewhat more complicated, but still pretty simple.
        !           557: Templates have a collection of possible operand shapes
        !           558: on which the opcode might match.
        !           559: In the simplest case, an
        !           560: .I add
        !           561: operation might be able to add to either a register variable
        !           562: or a scratch register, and might be able (with appropriate
        !           563: help from the assembler) to add an integer constant (ICON), a static
        !           564: memory cell (NAME), or a stack location (OREG).
        !           565: .PP
        !           566: It is usually attractive to specify a number of such shapes,
        !           567: and distinguish between them when the assembler output is produced.
        !           568: It is possible to describe the union of many elementary
        !           569: shapes such as ICON, NAME, OREG,
        !           570: AREG or BREG
        !           571: (both scratch and register forms), etc.
        !           572: To handle at least the simple forms of indirection, one can also
        !           573: match some more complicated forms of trees: STARNM and STARREG
        !           574: can match more complicated trees headed by an indirection operator,
        !           575: and SFLD can match certain trees headed by a FLD operator.
        !           576: These patterns call machine dependent routines that match the
        !           577: patterns of interest on a given machine.
        !           578: The shape SWADD may be used to recognize NAME or OREG
        !           579: nodes that lie on word boundaries: this may be of some importance
        !           580: on word addressed machines.
        !           581: Finally, there are some special shapes: these may not
        !           582: be used in conjunction with the other shapes, but may be
        !           583: defined and extended in machine dependent ways.
        !           584: The special shapes SZERO, SONE, and SMONE are predefined and match
        !           585: constants 0, 1, and \-1, respectively; others are easy to add
        !           586: and match by using the machine dependent routine
        !           587: .I special .
        !           588: .PP
        !           589: When a template has been found that matches the root of the tree,
        !           590: the cookie, and the shapes and types of the descendants,
        !           591: there is still one bar to a total match: the template may
        !           592: call for some resources (for example, a scratch register).
        !           593: The routine
        !           594: .I allo
        !           595: is called, and it attempts to allocate the resources.
        !           596: If it cannot, the match fails; no resources are
        !           597: allocated.
        !           598: If successful, the allocated resources are given numbers
        !           599: 1, 2, etc. for later reference when the assembly code is
        !           600: generated.
        !           601: The routines
        !           602: .I expand
        !           603: and
        !           604: .I reclaim
        !           605: are then called.
        !           606: The
        !           607: .I match
        !           608: routine then returns a special value, MDONE.
        !           609: If no match was found, the value MNOPE is returned;
        !           610: this is a signal to the caller to try more cookie
        !           611: values, or attempt a rewriting rule.
        !           612: .I Match
        !           613: is also used to select rewriting rules, although
        !           614: the way of doing this is pretty straightforward.
        !           615: A special cookie, FORREW, is used to ask
        !           616: .I match
        !           617: to search for a rewriting rule.
        !           618: The rewriting rules are keyed to various opcodes; most
        !           619: are carried out in
        !           620: .I order .
        !           621: Since the question of when to rewrite is one of the key issues in
        !           622: code generation, it will be taken up again later.
        !           623: .SH
        !           624: Register Allocation
        !           625: .PP
        !           626: The register allocation routines, and the allocation strategy,
        !           627: play a central role in the correctness of the code generation algorithm.
        !           628: If there are bugs in the Sethi-Ullman computation that cause the
        !           629: number of needed registers to be underestimated,
        !           630: the compiler may run out of scratch registers;
        !           631: it is essential that the allocator keep track of those registers that
        !           632: are free and busy, in order to detect such conditions.
        !           633: .PP
        !           634: Allocation of registers takes place as the result of a template
        !           635: match; the routine
        !           636: .I allo
        !           637: is called with a word describing the number of A registers,
        !           638: B registers, and temporary locations needed.
        !           639: The allocation of temporary locations on the stack is relatively
        !           640: straightforward, and will not be further covered; the
        !           641: bookkeeping is a bit tricky, but conceptually trivial, and requests
        !           642: for temporary space on the stack will never fail.
        !           643: .PP
        !           644: Register allocation is less straightforward.
        !           645: The two major complications are
        !           646: .I pairing
        !           647: and
        !           648: .I sharing .
        !           649: In many machines, some operations (such as multiplication
        !           650: and division), and/or some types (such as longs or double precision)
        !           651: require even/odd pairs of registers.
        !           652: Operations of the first type are exceptionally difficult to
        !           653: deal with in the compiler; in fact, their theoretical
        !           654: properties are rather bad as well.
        !           655: .[
        !           656: Aho Johnson Ullman Multiregister
        !           657: .]
        !           658: The second issue is dealt with rather more successfully;
        !           659: a machine dependent function called
        !           660: .I szty(t)
        !           661: is called that returns 1 or 2, depending on the
        !           662: number of A registers required to hold an object of type
        !           663: .I t .
        !           664: If
        !           665: .I szty
        !           666: returns 2, an even/odd pair of A registers is allocated
        !           667: for each request.
        !           668: As part of its duties, the routine
        !           669: .I usable
        !           670: finds usable register pairs for various operations.
        !           671: This task is not as easy as it sounds;
        !           672: it does not suffice to merely use
        !           673: .I szty
        !           674: on the expression tree,
        !           675: since there are situations in which
        !           676: a register pair temporary is needed even though
        !           677: the result of the expression requires only one register.
        !           678: This can occur with assignment operator expressions
        !           679: which have \fBint\fP type but a \fBdouble\fP right hand side,
        !           680: or with relational expressions where
        !           681: one operand is \fBfloat\fP and the other \fBdouble\fP.
        !           682: .PP
        !           683: The other issue, sharing, is more subtle, but
        !           684: important for good code quality.
        !           685: When registers are allocated, it
        !           686: is possible to reuse registers that hold address
        !           687: information, and use them to contain the values
        !           688: computed or accessed.
        !           689: For example, on the IBM 360, if register 2 has
        !           690: a pointer to an integer in it, we may load the
        !           691: integer into register 2 itself by saying:
        !           692: .DS
        !           693: L      2,0(2)
        !           694: .DE
        !           695: If register 2 had a byte pointer, however, the sequence for
        !           696: loading a character involves clearing the target
        !           697: register first, and then inserting the desired character:
        !           698: .DS
        !           699: SR     3,3
        !           700: IC     3,0(2)
        !           701: .DE
        !           702: In the first case, if register 3 were used as the target,
        !           703: it would lead to a larger number of registers
        !           704: used for the expression than were required; the compiler would
        !           705: generate inefficient code.
        !           706: On the other hand, if register 2 were used as the target in the second
        !           707: case, the code would simply be wrong.
        !           708: In the first case, register 2 can be 
        !           709: .I shared
        !           710: while in the second, it cannot.
        !           711: .PP
        !           712: In the specification of the register needs in the templates,
        !           713: it is possible to indicate whether required scratch registers
        !           714: may be shared with possible registers on the left or the right of the input tree.
        !           715: In order that a register be shared, it must be scratch, and it must
        !           716: be used only once, on the appropriate side of the tree being compiled.
        !           717: .PP
        !           718: The
        !           719: .I allo
        !           720: routine thus has a bit more to do than meets the eye;
        !           721: it calls
        !           722: .I freereg
        !           723: to obtain a free register for each A and B register request.
        !           724: .I Freereg
        !           725: makes multiple calls on the routine
        !           726: .I usable
        !           727: to decide if a given register can be used to satisfy
        !           728: a given need.
        !           729: .I Usable
        !           730: calls
        !           731: .I shareit
        !           732: if the register is busy, but might be shared.
        !           733: Finally,
        !           734: .I shareit
        !           735: calls
        !           736: .I ushare
        !           737: to decide if the desired register is actually in the appropriate
        !           738: subtree, and can be shared.
        !           739: .PP
        !           740: Just to add additional complexity, on some machines (such as the IBM 370) it
        !           741: is possible to have ``double indexing'' forms of
        !           742: addressing; these are represented by OREG's
        !           743: with the base and index registers encoded into the register field.
        !           744: While the register allocation and deallocation
        !           745: .I "per se"
        !           746: is not made more difficult by this phenomenon, the code itself
        !           747: is somewhat more complex.
        !           748: .PP
        !           749: Having allocated the registers and expanded the assembly language,
        !           750: it is time to reclaim the resources; the routine
        !           751: .I reclaim
        !           752: does this.
        !           753: Many operations produce more than one result.
        !           754: For example, many arithmetic operations may produce
        !           755: a value in a register, and also set the condition
        !           756: codes.
        !           757: Assignment operations may leave results both in a register and in memory.
        !           758: .I Reclaim
        !           759: is passed three parameters; the tree and cookie
        !           760: that were matched, and the rewriting field of the template.
        !           761: The rewriting field allows the specification of possible results;
        !           762: the tree is rewritten to reflect the results of the operation.
        !           763: If the tree was computed for side effects only (FOREFF),
        !           764: the tree is freed, and all resources in it reclaimed.
        !           765: If the tree was computed for condition codes, the resources
        !           766: are also freed, and the tree replaced by a special
        !           767: node type, FORCC.
        !           768: Otherwise, the value may be found in the left
        !           769: argument of the root, the right argument of the root,
        !           770: or one of the temporary resources allocated.
        !           771: In these cases, first the resources of the tree,
        !           772: and the newly allocated resources,
        !           773: are
        !           774: freed; then the resources needed by the result
        !           775: are made busy again.
        !           776: The final result must always match the shape of the input cookie;
        !           777: otherwise, the compiler error
        !           778: ``cannot reclaim''
        !           779: is generated.
        !           780: There are some machine dependent ways of
        !           781: preferring results in registers or memory when
        !           782: there are multiple results matching multiple goals in the cookie.
        !           783: .PP
        !           784: .I Reclaim
        !           785: also implements, in a curious way,
        !           786: C's ``usual arithmetic conversions''.
        !           787: When a value is generated into a temporary register,
        !           788: .I reclaim
        !           789: decides what the type and size of the result will be.
        !           790: Unless automatic conversion is specifically suppressed
        !           791: in the code template with the \fBT\fP macro,
        !           792: \fIreclaim\fP converts \fBchar\fP and \fBshort\fP
        !           793: results to \fBint\fP,
        !           794: \fBunsigned char\fP and \fBunsigned short\fP results
        !           795: to \fBunsigned int\fP,
        !           796: and \fBfloat\fP into \fBdouble\fP
        !           797: (for double only floating point arithmetic).
        !           798: This conversion is a simple type pun;
        !           799: no instructions for converting the value
        !           800: are actually emitted.
        !           801: This implies that registers must always
        !           802: contain a value that is at least as wide
        !           803: as a register,
        !           804: which greatly restricts the range of possible templates.
        !           805: .SH
        !           806: The Machine Dependent Interface
        !           807: .PP
        !           808: The files
        !           809: .I order.c ,
        !           810: .I local2.c ,
        !           811: and
        !           812: .I table.c ,
        !           813: as well as the header file
        !           814: .I mac2defs ,
        !           815: represent the machine dependent portion of the second pass.
        !           816: The machine dependent portion can be roughly divided into
        !           817: two: the easy portion and the hard portion.
        !           818: The easy portion
        !           819: tells the compiler the names of the registers, and arranges that
        !           820: the compiler generate the proper assembler formats, opcode names, location counters, etc.
        !           821: The hard portion involves the Sethi\-Ullman computation, the
        !           822: rewriting rules, and, to some extent, the templates.
        !           823: It is hard because there are no real algorithms that apply;
        !           824: most of this portion is based on heuristics.
        !           825: This section discusses the easy portion; the next several
        !           826: sections will discuss the hard portion.
        !           827: .PP
        !           828: If the compiler is adapted from a compiler for a machine
        !           829: of similar architecture, the easy part is indeed easy.
        !           830: In
        !           831: .I mac2defs ,
        !           832: the register numbers are defined, as well as various parameters for
        !           833: the stack frame, and various macros that describe the machine architecture.
        !           834: If double indexing is to be permitted, for example, the symbol
        !           835: R2REGS is defined.
        !           836: Also, a number of macros that are involved in function call processing,
        !           837: especially for unusual function call mechanisms, are defined here.
        !           838: .PP
        !           839: In
        !           840: .I local2.c ,
        !           841: a large number of simple functions are defined.
        !           842: These do things such as write out opcodes, register names,
        !           843: and address forms for the assembler.
        !           844: Part of the function call code is defined here; that is nontrivial
        !           845: to design, but typically rather straightforward to implement.
        !           846: Among the easy routines in
        !           847: .I order.c
        !           848: are routines for generating a created label,
        !           849: defining a label, and generating the arguments
        !           850: of a function call.
        !           851: .PP
        !           852: These routines tend to have a local effect, and depend on a fairly straightforward way
        !           853: on the target assembler and the design decisions already made about
        !           854: the compiler.
        !           855: Thus they will not be further treated here.
        !           856: .SH
        !           857: The Rewriting Rules
        !           858: .PP
        !           859: When a tree fails to match any template, it becomes
        !           860: a candidate for rewriting.
        !           861: Before the tree is rewritten,
        !           862: the machine dependent routine
        !           863: .I nextcook
        !           864: is called with the tree and the cookie; it suggests
        !           865: another cookie that might be a better candidate for the
        !           866: matching of the tree.
        !           867: If all else fails, the templates are searched with the cookie
        !           868: FORREW, to look for a rewriting rule.
        !           869: The rewriting rules are of two kinds;
        !           870: for most of the common operators, there are
        !           871: machine dependent rewriting rules that may be applied;
        !           872: these are handled by machine dependent functions
        !           873: that are called and given the tree to be computed.
        !           874: These routines may recursively call
        !           875: .I order
        !           876: or
        !           877: .I codgen
        !           878: to cause certain subgoals to be achieved;
        !           879: if they actually call for some alteration of the tree,
        !           880: they return 1, and the
        !           881: code generation algorithm recanonicalizes and tries again.
        !           882: If these routines choose not to deal with the tree, the
        !           883: default rewriting rules are applied.
        !           884: .PP
        !           885: The assignment operators, when rewritten, call the routine
        !           886: .I setasg .
        !           887: This is assumed to rewrite the tree at least to the point where there are
        !           888: no side effects in the left hand side.
        !           889: If there is still no template match,
        !           890: a default rewriting is done that causes
        !           891: an expression such as
        !           892: .DS
        !           893: .I "a += b"
        !           894: .DE
        !           895: to be rewritten as
        !           896: .DS
        !           897: .I "a = a + b"
        !           898: .DE
        !           899: This is a useful default for certain mixtures of strange types
        !           900: (for example, when
        !           901: .I a
        !           902: is a bit field and
        !           903: .I b
        !           904: an character) that
        !           905: otherwise might need separate table entries.
        !           906: .PP
        !           907: Simple assignment, structure assignment, and all forms of calls
        !           908: are handled completely by the machine dependent routines.
        !           909: For historical reasons, the routines generating the calls return
        !           910: 1 on failure, 0 on success, unlike the other routines.
        !           911: .PP
        !           912: The machine dependent routine
        !           913: .I setbin
        !           914: handles binary operators; it too must do most of the job.
        !           915: In particular, when it returns 0, it must do so with
        !           916: the left hand side in a temporary register.
        !           917: The default rewriting rule in this case is to convert the
        !           918: binary operator into the associated assignment operator;
        !           919: since the left hand side is assumed to be a temporary register,
        !           920: this preserves the semantics and often allows a considerable
        !           921: saving in the template table.
        !           922: .PP
        !           923: The increment and decrement operators may be dealt with with
        !           924: the machine dependent routine
        !           925: .I setincr .
        !           926: If this routine chooses not to deal with the tree, the rewriting rule replaces
        !           927: .DS
        !           928: .I "x ++"
        !           929: .DE
        !           930: by
        !           931: .DS
        !           932: .I "( (x += 1) \- 1)"
        !           933: .DE
        !           934: which preserves the semantics.
        !           935: Once again, this is not too attractive for the most common
        !           936: cases, but can generate close to optimal code when the
        !           937: type of x is unusual.
        !           938: .PP
        !           939: Finally, the indirection (UNARY MUL) operator is also handled
        !           940: in a special way.
        !           941: The machine dependent routine
        !           942: .I offstar
        !           943: is extremely important for the efficient generation of code.
        !           944: .I Offstar
        !           945: is called with a tree that is the direct descendant of a UNARY MUL node;
        !           946: its job is to transform this tree so that the combination of
        !           947: UNARY MUL with the transformed tree becomes addressable.
        !           948: On most machines,
        !           949: .I offstar
        !           950: can simply compute the tree into an A or B register,
        !           951: depending on the architecture, and then
        !           952: .I canon
        !           953: will make the resulting tree into an OREG.
        !           954: On many machines,
        !           955: .I offstar
        !           956: can profitably choose to do less work than computing
        !           957: its entire argument into a register.
        !           958: For example, if the target machine supports OREG's
        !           959: with a constant offset from a register, and
        !           960: .I offstar
        !           961: is called
        !           962: with a tree of the form
        !           963: .DS
        !           964: .I "expr + const"
        !           965: .DE
        !           966: where
        !           967: .I const
        !           968: is a constant, then
        !           969: .I offstar
        !           970: need only compute
        !           971: .I expr
        !           972: into the appropriate form of register.
        !           973: On machines that support double indexing,
        !           974: .I offstar
        !           975: may have even more choice as to how to proceed.
        !           976: The proper tuning of
        !           977: .I offstar ,
        !           978: which is not typically too difficult, should be one of the
        !           979: first tries at optimization attempted by the
        !           980: compiler writer.
        !           981: .SH
        !           982: The Sethi-Ullman Computation
        !           983: .PP
        !           984: The heart of the heuristics is the computation of the Sethi-Ullman numbers.
        !           985: This computation is closely linked with the rewriting rules and the
        !           986: templates.
        !           987: As mentioned before, the Sethi-Ullman numbers are expected to
        !           988: estimate the number of scratch registers needed to compute
        !           989: the subtrees without using any stores.
        !           990: However, the original theory does not apply to real machines.
        !           991: For one thing, the theory assumes that all registers
        !           992: are interchangeable.
        !           993: Real machines have general purpose, floating point, and index registers,
        !           994: register pairs, etc.
        !           995: The theory also does not account for side effects;
        !           996: this rules out various forms of pathology that arise
        !           997: from assignment and assignment operators.
        !           998: Condition codes are also undreamed of.
        !           999: Finally, the influence of types, conversions, and the
        !          1000: various addressability restrictions and extensions of real
        !          1001: machines are also ignored.
        !          1002: .PP
        !          1003: Nevertheless, for a ``useless'' theory,
        !          1004: the basic insight of Sethi and Ullman is amazingly
        !          1005: useful in a real compiler.
        !          1006: The notion that one should attempt to estimate the
        !          1007: resource needs of trees before starting the
        !          1008: code generation provides a natural means of splitting the
        !          1009: code generation problem, and provides a bit of redundancy
        !          1010: and self checking in the compiler.
        !          1011: Moreover, if writing the
        !          1012: Sethi-Ullman routines is hard, describing, writing, and debugging the
        !          1013: alternative (routines that attempt to free up registers by stores into
        !          1014: temporaries ``on the fly'') is even worse.
        !          1015: Nevertheless, it should be clearly understood that these routines exist in a realm
        !          1016: where there is no ``right'' way to write them;
        !          1017: it is an art, the realm of heuristics, and, consequently, a major
        !          1018: source of bugs in the compiler.
        !          1019: Often, the early, crude versions of these routines give little trouble;
        !          1020: only after
        !          1021: the compiler is actually working
        !          1022: and the
        !          1023: code quality is being improved do serious problem have to be faced.
        !          1024: Having a simple, regular machine architecture is worth quite
        !          1025: a lot at this time.
        !          1026: .PP
        !          1027: The major problems arise from asymmetries in the registers: register pairs,
        !          1028: having different kinds of registers, and the related problem of
        !          1029: needing more than one register (frequently a pair) to store certain
        !          1030: data
        !          1031: types (such as longs or doubles).
        !          1032: There appears to be no general way of treating this problem;
        !          1033: solutions have to be fudged for each machine where the problem arises.
        !          1034: On the Honeywell 66, for example, there are only two general purpose registers,
        !          1035: so a need for a pair is the same as the need for two registers.
        !          1036: On the IBM 370, the register pair (0,1) is used to do multiplications and divisions;
        !          1037: registers 0 and 1 are not generally considered part of the scratch registers, and so
        !          1038: do not require allocation explicitly.
        !          1039: On the Interdata 8/32, after much consideration, the
        !          1040: decision was made not to try to deal with the register pair issue;
        !          1041: operations such as multiplication and division that required pairs
        !          1042: were simply assumed to take all of the scratch registers.
        !          1043: Several weeks of effort had failed to produce
        !          1044: an algorithm that seemed to have much chance of running successfully
        !          1045: without inordinate debugging effort.
        !          1046: The difficulty of this issue should not be minimized; it represents one of the
        !          1047: main intellectual efforts in porting the compiler.
        !          1048: Nevertheless, this problem has been fudged with a degree of
        !          1049: success on nearly a dozen machines, so the compiler writer should
        !          1050: not abandon hope.
        !          1051: .PP
        !          1052: The Sethi-Ullman computations interact with the
        !          1053: rest of the compiler in a number of rather subtle ways.
        !          1054: As already discussed, the
        !          1055: .I store
        !          1056: routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult
        !          1057: to compute in registers, and must be stored.
        !          1058: There are also subtle interactions between the
        !          1059: rewriting routines and the Sethi-Ullman numbers.
        !          1060: Suppose we have a tree such as
        !          1061: .DS
        !          1062: .I "A \- B"
        !          1063: .DE
        !          1064: where
        !          1065: .I A
        !          1066: and
        !          1067: .I B
        !          1068: are expressions; suppose further that
        !          1069: .I B
        !          1070: takes two registers, and
        !          1071: .I A
        !          1072: one.
        !          1073: It is possible to compute the full expression in two registers by
        !          1074: first computing
        !          1075: .I B ,
        !          1076: and then, using the scratch register
        !          1077: used by
        !          1078: .I B ,
        !          1079: but not containing the answer, compute
        !          1080: .I A .
        !          1081: The subtraction can then be done, computing the expression.
        !          1082: (Note that this assumes a number of things, not the least of which
        !          1083: are register-to-register subtraction operators and symmetric
        !          1084: registers.)
        !          1085: If the machine dependent routine
        !          1086: .I setbin ,
        !          1087: however, is not prepared to recognize this case
        !          1088: and compute the more difficult side of the expression
        !          1089: first, the
        !          1090: Sethi-Ullman number must be set to three.
        !          1091: Thus, the
        !          1092: Sethi-Ullman number for a tree should represent the code that
        !          1093: the machine dependent routines are actually willing to generate.
        !          1094: .PP
        !          1095: The interaction can go the other way.
        !          1096: If we take an expression such as
        !          1097: .DS
        !          1098: * ( \fIp\fP + \fIi\fP )
        !          1099: .DE
        !          1100: where
        !          1101: .I p
        !          1102: is a pointer and
        !          1103: .I i
        !          1104: an integer,
        !          1105: this can probably be done in one register on most machines.
        !          1106: Thus, its Sethi-Ullman number would probably be set to one.
        !          1107: If double indexing is possible in the machine, a possible way
        !          1108: of computing the expression is to load both
        !          1109: .I p
        !          1110: and
        !          1111: .I i
        !          1112: into registers, and then use double indexing.
        !          1113: This would use two scratch registers; in such a case,
        !          1114: it is possible that the scratch registers might be unobtainable,
        !          1115: or might make some other part of the computation run out of
        !          1116: registers.
        !          1117: The usual solution is to cause
        !          1118: .I offstar
        !          1119: to ignore opportunities for double indexing that would tie up more scratch
        !          1120: registers than the Sethi-Ullman number had reserved.
        !          1121: .PP
        !          1122: In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application
        !          1123: of the portable compiler.
        !          1124: It is also a frequent source of bugs.
        !          1125: Algorithms are available that will produce nearly optimal code
        !          1126: for specialized machines, but unfortunately most existing machines
        !          1127: are far removed from these ideals.
        !          1128: The best way of proceeding in practice is to start with a compiler
        !          1129: for a similar machine to the target, and proceed very
        !          1130: carefully.
        !          1131: .SH
        !          1132: Register Allocation
        !          1133: .PP
        !          1134: After the Sethi-Ullman numbers are computed,
        !          1135: .I order
        !          1136: calls a routine,
        !          1137: .I rallo ,
        !          1138: that does register allocation, if appropriate.
        !          1139: This routine does relatively little, in general;
        !          1140: this is especially true if the target machine
        !          1141: is fairly regular.
        !          1142: There are a few cases where it is assumed that
        !          1143: the result of a computation takes place in a particular register;
        !          1144: switch and function return are the two major places.
        !          1145: The expression tree has a field,
        !          1146: .I rall ,
        !          1147: that may be filled with a register number; this is taken
        !          1148: to be a preferred register, and the first temporary
        !          1149: register allocated by a template match will be this preferred one, if
        !          1150: it is free.
        !          1151: If not, no particular action is taken; this is just a heuristic.
        !          1152: If no register preference is present, the field contains NOPREF.
        !          1153: In some cases, the result must be placed in
        !          1154: a given register, no matter what.
        !          1155: The register number is placed in
        !          1156: .I rall ,
        !          1157: and the mask MUSTDO is logically or'ed in with it.
        !          1158: In this case, if the subtree is requested in a register, and comes
        !          1159: back in a register other than the demanded one, it is moved
        !          1160: by calling the routine
        !          1161: .I rmove .
        !          1162: If the target register for this move is busy, it is a compiler
        !          1163: error.
        !          1164: .PP
        !          1165: Note that this mechanism is the only one that will ever cause a register-to-register
        !          1166: move between scratch registers (unless such a move is buried in the depths of
        !          1167: some template).
        !          1168: This simplifies debugging.
        !          1169: In some cases, there is a rather strange interaction between
        !          1170: the register allocation and the Sethi-Ullman number;
        !          1171: if there is an operator or situation requiring a
        !          1172: particular register, the allocator and the Sethi-Ullman
        !          1173: computation must conspire to ensure that the target
        !          1174: register is not being used by some intermediate result of some far-removed computation.
        !          1175: This is most easily done by making the special operation take
        !          1176: all of the free registers, preventing any other partially-computed
        !          1177: results from cluttering up the works.
        !          1178: .SH
        !          1179: Template Shortcuts
        !          1180: .PP
        !          1181: Some operations are just too hard or too clumsy
        !          1182: to be implemented in code templates
        !          1183: on a particular architecture.
        !          1184: .PP
        !          1185: One way to handle such operations is to replace them with function calls.
        !          1186: The intermediate file reading code in
        !          1187: .I reader.c
        !          1188: contains a call to an implementation dependent macro MYREADER;
        !          1189: this can be defined to call various routines
        !          1190: which walk the code tree and
        !          1191: perform transformations.
        !          1192: On the \*V, for example,
        !          1193: unsigned division and remainder operations
        !          1194: are far too complex to encode in a template.
        !          1195: The routine
        !          1196: .I hardops
        !          1197: is called from a tree walk in
        !          1198: .I myreader
        !          1199: to detect these operations and
        !          1200: replace them with calls to the C runtime functions
        !          1201: .I udiv
        !          1202: and
        !          1203: .I urem .
        !          1204: (There are complementary functions
        !          1205: .I audiv
        !          1206: and
        !          1207: .I aurem
        !          1208: which are provided as support for unsigned assignment operator expressions;
        !          1209: they are different from \fIudiv\fP and \fIurem\fP
        !          1210: because the left hand side of an assignment operator expression
        !          1211: must be evaluated only once.)
        !          1212: Note that arithmetic support routines are always expensive;
        !          1213: the compiler makes an effort to notice common operations
        !          1214: such as unsigned division by a constant power of two
        !          1215: and generates optimal code for these inline.
        !          1216: .PP
        !          1217: Another escape involves the routine
        !          1218: .I zzzcode .
        !          1219: This function is called from
        !          1220: .I expand
        !          1221: to process template macros which start with the character
        !          1222: .B Z .
        !          1223: On the \*V,
        !          1224: many complex code generation problems
        !          1225: are swept under the rug into \fIzzzcode\fP.
        !          1226: Scalar type conversions are a particularly annoying issue;
        !          1227: they are primarily handled using the macro \fBZA\fP.
        !          1228: Rather than creating a template for each possible
        !          1229: conversion and result,
        !          1230: which would be tedious and complex given
        !          1231: C's many scalar types,
        !          1232: this macro allows the compiler to take shortcuts.
        !          1233: Tough conversions such as \fBunsigned\fP into \fBdouble\fP
        !          1234: are easily handled using special code under \fBZA\fP.
        !          1235: One convention which makes scalar conversions
        !          1236: somewhat more difficult than they might otherwise be
        !          1237: is the strict requirement that
        !          1238: values in registers must have a type
        !          1239: that is as wide or wider than a single register.
        !          1240: This convention is used primarily to implement
        !          1241: the ``usual arithmetic conversions'' of C,
        !          1242: but it can get in the way when
        !          1243: converting between (say) a \fBchar\fP value
        !          1244: and an \fBunsigned short\fP.
        !          1245: A routine named \fIcollapsible\fP is used
        !          1246: to determine whether one operation or two
        !          1247: is needed to produce a register-width result.
        !          1248: .PP
        !          1249: Another convenient macro is \fBZP\fP.
        !          1250: This macro is used to generate an appropriate
        !          1251: conditional test after a comparison.
        !          1252: This makes it possible to avoid
        !          1253: a profusion of template entries
        !          1254: which essentially duplicate each other,
        !          1255: one entry for each type of test
        !          1256: multiplied by the number of different
        !          1257: comparison conditions.
        !          1258: A related macro, \fBZN\fP,
        !          1259: is used to normalize the result
        !          1260: of a relational test by
        !          1261: producing an integer 0 or 1.
        !          1262: .PP
        !          1263: The macro \fBZS\fP does the unlovely job
        !          1264: of generating code for structure assignments.
        !          1265: It tests the size of the structure
        !          1266: to see what \*V instruction can be used to move it,
        !          1267: and is capable of emitting a block move
        !          1268: instruction for large structures.
        !          1269: On other architectures this macro
        !          1270: could be used to generate a function call
        !          1271: to a block copy routine.
        !          1272: .PP
        !          1273: The macro \fBZG\fP was recently introduced
        !          1274: to handle the thorny issue of
        !          1275: assignment operator expressions which
        !          1276: have an integral left hand side
        !          1277: and a floating point right hand side.
        !          1278: These expressions are passed to the code generator
        !          1279: without the usual type balancing so that
        !          1280: good code can be generated for them.
        !          1281: Older versions of the portable compiler computed
        !          1282: these expressions with integer arithmetic;
        !          1283: with the \fBZG\fP operator,
        !          1284: the current compiler can convert the left hand side
        !          1285: to the appropriate floating type,
        !          1286: compute the expression with floating point arithmetic,
        !          1287: convert the result back to integral type
        !          1288: and store it in the left hand side.
        !          1289: These operations are performed by
        !          1290: recursive calls to \fIzzzcode\fP
        !          1291: and other routines related to \fIexpand\fP.
        !          1292: .PP
        !          1293: An assortment of other macros finish the job
        !          1294: of interpreting code templates.
        !          1295: Among the more interesting ones:
        !          1296: \fBZC\fP produces the number of words
        !          1297: pushed on the argument stack,
        !          1298: which is useful for function calls;
        !          1299: \fBZD\fP and \fBZE\fP produce
        !          1300: constant increment and decrement operations;
        !          1301: \fBZL\fR and \fBZR\fP produce
        !          1302: the assembler letter code (\fBl\fP, \fBw\fP or \fBb\fP)
        !          1303: corresponding to the size and type
        !          1304: of the left and right operand respectively.
        !          1305: .SH
        !          1306: Shared Code
        !          1307: .PP
        !          1308: The
        !          1309: .I lint
        !          1310: utility shares sources with the portable compiler.
        !          1311: \fILint\fP uses all of the machine independent pass 1 sources,
        !          1312: and adds its own set of ``machine dependent'' routines,
        !          1313: contained mostly in \fIlint.c\fP.
        !          1314: \fILint\fP uses a private intermediate file format
        !          1315: and a private pass 2 whose source is \fIlpass2.c\fP.
        !          1316: Several modifications were made to the C scanner
        !          1317: in \fIscan.c\fP, conditionally compiled with
        !          1318: the symbol LINT,
        !          1319: in order to support \fIlint\fP's convention
        !          1320: of passing ``pragma'' information inside special comments.
        !          1321: A few other minor modifications were also
        !          1322: made, \fIe.g.\fP to skip over \fIasm\fP statements.
        !          1323: .PP
        !          1324: The \*f and \fIpc\fP compilers use a code generator
        !          1325: which shares sources with pass 2 of the portable compiler.
        !          1326: This code generator is very similar to pass 2
        !          1327: but uses a different intermediate file format.
        !          1328: Three source files are needed in addition to the pass 2 sources.
        !          1329: .I fort.c
        !          1330: is a machine independent source file which
        !          1331: contains a pass 2 main routine that replaces the equivalent routine in
        !          1332: .I reader.c ,
        !          1333: together with several routines for reading the binary intermediate file.
        !          1334: .I fort.c
        !          1335: includes the machine dependent file
        !          1336: .I fort.h ,
        !          1337: which defines two trivial label generation routines.
        !          1338: A header file
        !          1339: .I /usr/include/pcc.h
        !          1340: defines opcode and type symbols which
        !          1341: are needed to provide a standard intermediate file format;
        !          1342: this file is also included by the Fortran and Pascal compilers.
        !          1343: The creation of this header file
        !          1344: made it necessary to make
        !          1345: some changes in the way the portable C compiler is built.
        !          1346: These changes were made with the aim
        !          1347: of minimizing the number of lines changed
        !          1348: in the original sources.
        !          1349: Macro symbols in
        !          1350: .I pcc.h
        !          1351: are flagged with a unique prefix to avoid
        !          1352: symbol name collisions in the Fortran and Pascal compilers,
        !          1353: which have their own internal opcode and type symbols.
        !          1354: A
        !          1355: .I sed (1)
        !          1356: script is used to strip these prefixes,
        !          1357: producing an include file named
        !          1358: .I pcclocal.h
        !          1359: which is specific to the portable C compiler
        !          1360: and contains opcode symbols which
        !          1361: are compatible with the original opcode symbols.
        !          1362: A similar
        !          1363: .I sed
        !          1364: script is used to produce a file of Yacc tokens for the C grammar.
        !          1365: .PP
        !          1366: A number of changes to existing source files
        !          1367: were made to accommodate the Fortran-style pass 2.
        !          1368: These changes are conditionally compiled
        !          1369: using the symbol FORT.
        !          1370: Many changes were needed to implement
        !          1371: single-precision arithmetic;
        !          1372: other changes concern such things as
        !          1373: the avoidance of floating point move instructions,
        !          1374: which on the \*V can cause floating point faults
        !          1375: when a datum is not a normalized floating point value.
        !          1376: In earlier implementations of the Fortran-style pass 2 there were
        !          1377: a number of stub files which served only
        !          1378: to define the symbol FORT in a particular
        !          1379: source file; these files have been removed for 4.3BSD
        !          1380: in favor of a new compilation strategy which
        !          1381: yields up to three different objects from
        !          1382: a single source file, depending on what
        !          1383: compilation control symbols are defined for that file.
        !          1384: .PP
        !          1385: The Fortran-style pass 2 uses a Polish Postfix intermediate file.
        !          1386: The file is in binary format,
        !          1387: and is logically divided into a stream of 32-bit records.
        !          1388: Each record consists of an (\fIopcode, value, type\fP) triple,
        !          1389: possibly followed inline by more descriptive information.
        !          1390: The
        !          1391: .I opcode
        !          1392: and
        !          1393: .I type
        !          1394: are selected from the list in
        !          1395: .I pcc.h ;
        !          1396: the type encodes a basic type,
        !          1397: around which may be wrapped type modifiers
        !          1398: such as ``pointer to'' or ``array of'' to
        !          1399: produce more complex types.
        !          1400: The function of the
        !          1401: .I value
        !          1402: parameter depends on the opcode;
        !          1403: it may be used for a flag,
        !          1404: a register number
        !          1405: or the value of a constant,
        !          1406: or it may be unused.
        !          1407: The optional inline data is often a null-terminated string,
        !          1408: but it may also be a binary offset from a register
        !          1409: or from a symbolic constant;
        !          1410: sometimes both a string and an offset appear.
        !          1411: .PP
        !          1412: Here are a few samples of intermediate file records
        !          1413: and their interpretation:
        !          1414: .KS
        !          1415: .TS
        !          1416: expand, tab(:);
        !          1417: c  c  c  c  c
        !          1418: ^  ^  ^  c  ^
        !          1419: l  lB l  l  l.
        !          1420: Opcode:Type:Value:Optional:Interpretation
        !          1421: :::Data:
        !          1422: .sp 0.5v
        !          1423: =
        !          1424: .sp 0.5v
        !          1425: ICON:int:flag=0:binary=5:T{
        !          1426: .Bl
        !          1427: the integer constant 5
        !          1428: T}
        !          1429: .Sp
        !          1430: NAME:char:flag=1:T{
        !          1431: .nf
        !          1432: binary=1,
        !          1433: string="_foo_"
        !          1434: T}:T{
        !          1435: .Bl
        !          1436: a \fBcharacter*1\fP element in
        !          1437: a Fortran common block \fIfoo\fP
        !          1438: at offset 1
        !          1439: T}
        !          1440: .Sp
        !          1441: OREG:char:reg=11:T{
        !          1442: .nf
        !          1443: offset=1,
        !          1444: string="v.2-v.1"
        !          1445: T}:T{
        !          1446: .Bl
        !          1447: the second element
        !          1448: of a Fortran \fBcharacter*1\fP array,
        !          1449: expressed as an offset from a static base register
        !          1450: T}
        !          1451: .Sp
        !          1452: PLUS:float:::T{
        !          1453: .Bl
        !          1454: a single precision add
        !          1455: T}
        !          1456: .Sp
        !          1457: FTEXT::size=2:string=".text 0":T{
        !          1458: .Bl
        !          1459: an inline assembler directive
        !          1460: of length 2 (32-bit records)
        !          1461: T}
        !          1462: .TE
        !          1463: .KE
        !          1464: .SH
        !          1465: Compiler Bugs
        !          1466: .PP
        !          1467: The portable compiler has an excellent record of generating correct code.
        !          1468: The requirement for reasonable cooperation between the register allocation,
        !          1469: Sethi-Ullman computation, rewriting rules, and templates builds quite a bit
        !          1470: of redundancy into the compiling process.
        !          1471: The effect of this is that, in a surprisingly short time, the compiler will
        !          1472: start generating correct code for those
        !          1473: programs that it can compile.
        !          1474: The hard part of the job then becomes finding and
        !          1475: eliminating those situations where the compiler refuses to
        !          1476: compile a program because it knows it cannot do it right.
        !          1477: For example, a template may simply be missing; this may either
        !          1478: give a compiler error of the form ``no match for op ...'' , or cause
        !          1479: the compiler to go into an infinite loop applying various rewriting rules.
        !          1480: The compiler has a variable,
        !          1481: .I nrecur ,
        !          1482: that is set to 0 at the beginning of an expressions, and
        !          1483: incremented at key spots in the
        !          1484: compilation process; if this parameter gets too large, the
        !          1485: compiler decides that it is in a loop, and aborts.
        !          1486: Loops are also characteristic of botches in the machine-dependent rewriting rules.
        !          1487: Bad Sethi-Ullman computations usually cause the scratch registers
        !          1488: to run out; this often means
        !          1489: that the Sethi-Ullman number was underestimated, so
        !          1490: .I store
        !          1491: did not store something it should have; alternatively,
        !          1492: it can mean that the rewriting rules were not smart enough to
        !          1493: find the sequence that
        !          1494: .I sucomp
        !          1495: assumed would be used.
        !          1496: .PP
        !          1497: The best approach when a compiler error is detected involves several stages.
        !          1498: First, try to get a small example program that steps on the bug.
        !          1499: Second, turn on various debugging flags in the code generator, and follow the
        !          1500: tree through the process of being matched and rewritten.
        !          1501: Some flags of interest are
        !          1502: \fB\-e\fP, which prints the expression tree,
        !          1503: \fB\-r\fP, which gives information about the allocation of registers,
        !          1504: \fB\-a\fP, which gives information about the performance of
        !          1505: .I rallo ,
        !          1506: and \fB\-o\fP, which gives information about the behavior of
        !          1507: .I order .
        !          1508: This technique should allow most bugs to be found relatively quickly.
        !          1509: .PP
        !          1510: Unfortunately, finding the bug is usually not enough; it must also
        !          1511: be fixed!
        !          1512: The difficulty arises because a fix to the particular bug of interest tends
        !          1513: to break other code that already works.
        !          1514: Regression tests, tests that compare the performance of
        !          1515: a new compiler against the performance of an older one, are very
        !          1516: valuable in preventing major catastrophes.
        !          1517: .SH
        !          1518: Compiler Extensions
        !          1519: .PP
        !          1520: The portable C compiler makes a few extensions
        !          1521: to the language described by Ritchie.
        !          1522: .PP
        !          1523: .I "Single precision arithmetic."
        !          1524: ``All floating arithmetic in C is carried out in double-precision;
        !          1525: whenever a
        !          1526: .B float
        !          1527: appears in a an expression it is lengthened to
        !          1528: .B double
        !          1529: by zero-padding its fraction.''
        !          1530: \*-Dennis Ritchie.
        !          1531: .[
        !          1532: kernighan ritchie prentice 1978
        !          1533: .]
        !          1534: Programmers who would like to use C
        !          1535: to write numerical applications
        !          1536: often shy away from it because
        !          1537: C programs cannot perform single precision arithmetic.
        !          1538: On machines such as the \*V
        !          1539: which can cleanly support arithmetic on two (or more)
        !          1540: sizes of floating point values,
        !          1541: programs which can take advantage of
        !          1542: single precision arithmetic will run faster.
        !          1543: A very popular proposal for the ANSI C standard states that
        !          1544: implementations may perform single precision computations
        !          1545: with single precision arithmetic;
        !          1546: some actual C implementations already do this,
        !          1547: and now the Berkeley compiler joins them.
        !          1548: .PP
        !          1549: The changes are implemented in the compiler with a set of
        !          1550: conditional compilation directives based on the symbol SPRECC;
        !          1551: thus two compilers are generated,
        !          1552: one with only double precision arithmetic
        !          1553: and one with both double and single precision arithmetic.
        !          1554: The
        !          1555: .I cc
        !          1556: program uses a flag
        !          1557: .B \-f
        !          1558: to select the single/double version of the compiler (\fI/lib/sccom\fP)
        !          1559: instead of the default double only version (\fI/lib/ccom\fP).
        !          1560: It is expected that at some time in the future
        !          1561: the double only compiler will be retired
        !          1562: and the single/double compiler will become the default.
        !          1563: .PP
        !          1564: There are a few implementation details
        !          1565: of the single/double compiler which
        !          1566: will be of interest to users and compiler porters.
        !          1567: To maintain compatibility with functions compiled by the double only compiler,
        !          1568: single precision actual arguments are still coerced to double precision,
        !          1569: and formal arguments which are declared single precision
        !          1570: are still ``really'' double precision.
        !          1571: This may change if function prototypes of the sort
        !          1572: proposed for the ANSI C standard are eventually adopted.
        !          1573: Floating point constants are now classified into single precision
        !          1574: and double precision types.
        !          1575: The precision of a constant is determined from context;
        !          1576: if a floating constant appears in an arithmetic expression
        !          1577: with a single precision value,
        !          1578: the constant is treated as having single precision type
        !          1579: and the arithmetic expression is computed
        !          1580: using single precision arithmetic.
        !          1581: .PP
        !          1582: Remarkably little code in the compiler
        !          1583: needed to be changed to implement the single/double compiler.
        !          1584: In many cases the changes overlapped with
        !          1585: special cases which are used for the Fortran-style
        !          1586: pass 2 (\fI/lib/f1\fP).
        !          1587: Most of the single precision changes were implemented by Sam Leffler.
        !          1588: .PP
        !          1589: .I "Preprocessor extensions."
        !          1590: The portable C compiler is normally distributed
        !          1591: with a macro preprocessor written by J. F. Reiser.
        !          1592: This preprocessor implements the features described
        !          1593: in Ritchie's reference manual;
        !          1594: it removes comments, expands macro definitions
        !          1595: and removes or inserts code based on conditional compilation directives.
        !          1596: Two interesting extensions are provided by this version of the preprocessor:
        !          1597: .IP \(bu
        !          1598: When comments are removed, no white space is necessarily substituted;
        !          1599: this has the effect of \fIre-tokenizing\fP code,
        !          1600: since the PCC will reanalyze the input.
        !          1601: Macros can thus create new tokens by clever use of comments.
        !          1602: For example, the macro definition ``#define foo(a,b) a/**/b''
        !          1603: creates a macro \fIfoo\fP which
        !          1604: concatenates its two arguments, forming a new token.
        !          1605: .IP \(bu
        !          1606: Macro bodies are analyzed for macro arguments
        !          1607: without regard to the boundaries of string or character constants.
        !          1608: The definition ``#define bar(a) "a\en"'' creates a macro
        !          1609: which returns the literal form of its argument embedded
        !          1610: in a string with a newline appended.
        !          1611: .PP
        !          1612: These extensions are not portable to
        !          1613: a number of other C preprocessors.
        !          1614: They may be replaced in the future by corresponding ANSI C features,
        !          1615: when the ANSI C standard has been formalized.
        !          1616: .SH
        !          1617: Summary and Conclusion
        !          1618: .PP
        !          1619: The portable compiler has been a useful tool for providing C
        !          1620: capability on a large number of diverse machines,
        !          1621: and for testing a number of theoretical
        !          1622: constructs in a practical setting.
        !          1623: It has many blemishes, both in style and functionality.
        !          1624: It has been applied to many more machines than first
        !          1625: anticipated, of a much wider range than originally dreamed of.
        !          1626: Its use has also spread much faster than expected, leaving parts of
        !          1627: the compiler still somewhat raw in shape.
        !          1628: .PP
        !          1629: On the theoretical side, there is some hope that the
        !          1630: skeleton of the
        !          1631: .I sucomp
        !          1632: routine could be generated for many machines directly from the
        !          1633: templates; this would give a considerable boost
        !          1634: to the portability and correctness of the compiler,
        !          1635: but might affect tunability and code quality.
        !          1636: There is also room for more optimization, both within
        !          1637: .I optim
        !          1638: and in the form of a portable ``peephole'' optimizer.
        !          1639: .PP
        !          1640: On the practical, development side,
        !          1641: the compiler could probably be sped up and made smaller
        !          1642: without doing too much violence to its basic structure.
        !          1643: Parts of the compiler deserve to be rewritten;
        !          1644: the initialization code, register allocation, and
        !          1645: parser are prime candidates.
        !          1646: It might be that doing some or all of the parsing
        !          1647: with a recursive descent parser might
        !          1648: save enough space and time to be worthwhile;
        !          1649: it would certainly ease the problem of moving the
        !          1650: compiler to an environment where
        !          1651: .I Yacc
        !          1652: is not already present.
        !          1653: .SH
        !          1654: Acknowledgements
        !          1655: .PP
        !          1656: I would like to thank the many people who have
        !          1657: sympathetically, and even enthusiastically, helped me grapple
        !          1658: with what has been a frustrating program to write, test, and install.
        !          1659: D. M. Ritchie and E. N. Pinson provided needed early
        !          1660: encouragement and philosophical guidance;
        !          1661: M. E. Lesk,
        !          1662: R. Muha, T. G. Peterson,
        !          1663: G. Riddle, L. Rosler,
        !          1664: R. W. Mitze,
        !          1665: B. R. Rowland,
        !          1666: S. I. Feldman,
        !          1667: and
        !          1668: T. B. London
        !          1669: have all contributed ideas, gripes, and all, at one time or another,
        !          1670: climbed ``into the pits'' with me to help debug.
        !          1671: Without their help this effort would have not been possible;
        !          1672: with it, it was often kind of fun.
        !          1673: \*-S. C. Johnson
        !          1674: .sp
        !          1675: .PP
        !          1676: Many people have contributed fixes and improvements
        !          1677: to the current Berkeley version of the compiler.
        !          1678: A number of really valuable fixes were contributed by
        !          1679: Ralph Campbell, Sam Leffler, Kirk McKusick, Arthur Olsen, Donn Seeley,
        !          1680: Don Speck and Chris Torek,
        !          1681: but most of the bugs were spotted by
        !          1682: the legions of virtuous C programmers
        !          1683: who were kind enough to let us know
        !          1684: that the compiler was broken
        !          1685: and when the heck were we going to get it fixed?
        !          1686: Thank you all.
        !          1687: \*-Donn Seeley
        !          1688: .sp 2
        !          1689: .LP
        !          1690: .[
        !          1691: $LIST$
        !          1692: .]
        !          1693: .LP

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.