|
|
1.1 ! root 1: .\" @(#)porttour2.ms 6.3 (Berkeley) 5/30/86 ! 2: .\" ! 3: .SH ! 4: Pass Two ! 5: .PP ! 6: Code generation is far less well understood than ! 7: parsing or lexical analysis, and for this reason ! 8: the second pass is far harder to discuss in a file by file manner. ! 9: A great deal of the difficulty is in understanding the issues ! 10: and the strategies employed to meet them. ! 11: Any particular function is likely to be reasonably straightforward. ! 12: .PP ! 13: Thus, this part of the paper will concentrate a good deal on the broader ! 14: aspects of strategy in the code generator, ! 15: and will not get too intimate with the details. ! 16: .SH ! 17: Overview ! 18: .PP ! 19: It is difficult to organize a code generator to be flexible enough to ! 20: generate code for a large number of machines, ! 21: and still be efficient for any one of them. ! 22: Flexibility is also important when it comes time to tune ! 23: the code generator to improve the output code quality. ! 24: On the other hand, too much flexibility can lead to semantically ! 25: incorrect code, and potentially a combinatorial explosion in the ! 26: number of cases to be considered in the compiler. ! 27: .PP ! 28: One goal of the code generator is to have a high degree of correctness. ! 29: It is very desirable to have the compiler detect its own inability to ! 30: generate correct code, rather than to produce incorrect code. ! 31: This goal is achieved by having a simple model of the job ! 32: to be done (e.g., an expression tree) ! 33: and a simple model of the machine state ! 34: (e.g., which registers are free). ! 35: The act of generating an instruction performs a transformation ! 36: on the tree and the machine state; ! 37: hopefully, the tree eventually gets ! 38: reduced to a single node. ! 39: If each of these instruction/transformation pairs is correct, ! 40: and if the machine state model really represents the actual machine, ! 41: and if the transformations reduce the input tree to the desired single node, then the ! 42: output code will be correct. ! 43: .PP ! 44: For most real machines, there is no definitive theory of code generation that ! 45: encompasses all the C operators. ! 46: Thus the selection of which instruction/transformations to generate, and in what ! 47: order, will have a heuristic flavor. ! 48: If, for some expression tree, no transformation applies, or, more ! 49: seriously, if the heuristics select a sequence of instruction/transformations ! 50: that do not in fact reduce the tree, the compiler ! 51: will report its inability to generate code, and abort. ! 52: .PP ! 53: A major part of the code generator is concerned with the model and the transformations. ! 54: Most of this is machine independent, or depends only on simple tables. ! 55: The flexibility comes from the heuristics that guide the transformations ! 56: of the trees, the selection of subgoals, and the ordering of the computation. ! 57: .SH ! 58: The Machine Model ! 59: .PP ! 60: The machine is assumed to have a number of registers, of at most two different ! 61: types: ! 62: .I A ! 63: and ! 64: .I B . ! 65: Within each register class, there may be scratch (temporary) registers and dedicated registers ! 66: (e.g., register variables, the stack pointer, etc.). ! 67: Requests to allocate and free registers involve only the temporary registers. ! 68: .PP ! 69: Each of the registers in the machine is given a name and a number ! 70: in the ! 71: .I mac2defs.h ! 72: file; the numbers are used as indices into various tables ! 73: that describe the registers, so they should be kept small. ! 74: One such table is the ! 75: .I rstatus ! 76: table on file ! 77: .I local2.c . ! 78: This table is indexed by register number, and ! 79: contains expressions made up from manifest constants describing the register types: ! 80: SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's, ! 81: and SBREG and SBREG\(orSTBREG similarly for BREG's. ! 82: There are macros that access this information: ! 83: .I isbreg(r) ! 84: returns true if register number ! 85: .I r ! 86: is a BREG, and ! 87: .I istreg(r) ! 88: returns true if register number ! 89: .I r ! 90: is a temporary AREG or BREG. ! 91: Another table, ! 92: .I rnames , ! 93: contains the register names; this is used when ! 94: putting out assembler code and diagnostics. ! 95: .PP ! 96: The usage of registers is kept track of by an array called ! 97: .I busy . ! 98: .I Busy[r] ! 99: is the number of uses of register ! 100: .I r ! 101: in the current tree being processed. ! 102: The allocation and freeing of registers will be discussed later as part of the code generation ! 103: algorithm. ! 104: .SH ! 105: General Organization ! 106: .PP ! 107: As mentioned above, the second pass reads lines from ! 108: the intermediate file, copying through to the output unchanged ! 109: any lines that begin with a `)', and making note of the ! 110: information about stack usage and register allocation contained on ! 111: lines beginning with `]' and `['. ! 112: The expression trees, whose beginning is indicated by a line beginning with `.', ! 113: are read and rebuilt into trees. ! 114: If the compiler is loaded as one pass, the expression trees are ! 115: immediately available to the code generator. ! 116: .PP ! 117: The actual code generation is done by a hierarchy of routines. ! 118: The routine ! 119: .I delay ! 120: is first given the tree; it attempts to delay some postfix ! 121: \fB+\|+\fP and \fB\-\|\-\fP computations that might reasonably be done after the ! 122: smoke clears. ! 123: It also attempts to handle comma (`,') operators by computing the ! 124: left side expression first, and then rewriting the tree ! 125: to eliminate the operator. ! 126: .I Delay ! 127: calls ! 128: .I codgen ! 129: to control the actual code generation process. ! 130: .I Codgen ! 131: takes as arguments a pointer to the expression tree, ! 132: and a second argument that, for socio-historical reasons, is called a ! 133: .I cookie . ! 134: The cookie describes a set of goals that would be acceptable ! 135: for the code generation: these are assigned to individual bits, ! 136: so they may be logically or'ed together to form a large number of possible goals. ! 137: Among the possible goals are FOREFF (compute for side effects only; ! 138: don't worry about the value), INTEMP (compute and store value into a temporary location ! 139: in memory), INAREG (compute into an A register), INTAREG (compute into a scratch ! 140: A register), INBREG and INTBREG similarly, FORCC (compute for condition codes), ! 141: and FORARG (compute it as a function argument; e.g., stack it if appropriate). ! 142: .PP ! 143: .I Codgen ! 144: first canonicalizes the tree by calling ! 145: .I canon . ! 146: This routine looks for certain transformations that might now be ! 147: applicable to the tree. ! 148: One, which is very common and very powerful, is to ! 149: fold together an indirection operator (UNARY MUL) ! 150: and a register (REG); in most machines, this combination is ! 151: addressable directly, and so is similar to a NAME in its ! 152: behavior. ! 153: The UNARY MUL and REG are folded together to make ! 154: another node type called OREG. ! 155: In fact, in many machines it is possible to directly address not just the ! 156: cell pointed to by a register, but also cells differing by a constant ! 157: offset from the cell pointed to by the register. ! 158: .I Canon ! 159: also looks for such cases, calling the ! 160: machine dependent routine ! 161: .I notoff ! 162: to decide if the offset is acceptable (for example, in the IBM 370 the offset ! 163: must be between 0 and 4095 bytes). ! 164: Another optimization is to replace bit field operations ! 165: by shifts and masks if the operation involves extracting the field. ! 166: Finally, a machine dependent routine, ! 167: .I sucomp , ! 168: is called that computes the Sethi-Ullman numbers ! 169: for the tree (see below). ! 170: .PP ! 171: After the tree is canonicalized, ! 172: .I codgen ! 173: calls the routine ! 174: .I store ! 175: whose job is to select a subtree of the tree to be computed ! 176: and (usually) stored before beginning the ! 177: computation of the full tree. ! 178: .I Store ! 179: must return a tree that can be computed without need ! 180: for any temporary storage locations. ! 181: In effect, the only store operations generated while processing the subtree must be as a response ! 182: to explicit assignment operators in the tree. ! 183: This division of the job marks one of the more significant, and successful, ! 184: departures from most other compilers. ! 185: It means that the code generator can operate ! 186: under the assumption that there are enough ! 187: registers to do its job, without worrying about ! 188: temporary storage. ! 189: If a store into a temporary appears in the output, it is always ! 190: as a direct result of logic in the ! 191: .I store ! 192: routine; this makes debugging easier. ! 193: .PP ! 194: One consequence of this organization is that ! 195: code is not generated by a treewalk. ! 196: There are theoretical results that support this decision. ! 197: .[ ! 198: Aho Johnson Optimal Code Trees jacm ! 199: .] ! 200: It may be desirable to compute several subtrees and store ! 201: them before tackling the whole tree; ! 202: if a subtree is to be stored, this is known before the code ! 203: generation for the subtree is begun, and the subtree is computed ! 204: when all scratch registers are available. ! 205: .PP ! 206: The ! 207: .I store ! 208: routine decides what subtrees, if any, should be ! 209: stored by making use of numbers, ! 210: called ! 211: .I "Sethi-Ullman numbers" , ! 212: that give, for each subtree of an expression tree, ! 213: the minimum number of scratch registers required to ! 214: compile the subtree, without any stores into temporaries. ! 215: .[ ! 216: Sethi Ullman jacm 1970 ! 217: .] ! 218: These numbers are computed by the machine-dependent ! 219: routine ! 220: .I sucomp , ! 221: called by ! 222: .I canon . ! 223: The basic notion is that, knowing the Sethi-Ullman numbers for ! 224: the descendants of a node, and knowing the operator of the ! 225: node and some information about the machine, the ! 226: Sethi-Ullman number of the node itself can be computed. ! 227: If the Sethi-Ullman number for a tree exceeds the number of scratch ! 228: registers available, some subtree must be stored. ! 229: Unfortunately, the theory behind the Sethi-Ullman numbers ! 230: applies only to uselessly simple machines and operators. ! 231: For the rich set of C operators, and for machines with ! 232: asymmetric registers, register pairs, different kinds of registers, ! 233: and exceptional forms of addressing, ! 234: the theory cannot be applied directly. ! 235: The basic idea of estimation is a good one, however, ! 236: and well worth applying; the application, especially ! 237: when the compiler comes to be tuned for high code ! 238: quality, goes beyond the park of theory into the ! 239: swamp of heuristics. ! 240: This topic will be taken up again later, when more of the compiler ! 241: structure has been described. ! 242: .PP ! 243: After examining the Sethi-Ullman numbers, ! 244: .I store ! 245: selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in ! 246: the external variables ! 247: .I stotree ! 248: and ! 249: .I stocook . ! 250: If a subtree has been selected, or if ! 251: the whole tree is ready to be processed, the ! 252: routine ! 253: .I order ! 254: is called, with a tree and cookie. ! 255: .I Order ! 256: generates code for trees that ! 257: do not require temporary locations. ! 258: .I Order ! 259: may make recursive calls on itself, and, ! 260: in some cases, on ! 261: .I codgen ; ! 262: for example, when processing the operators \fB&&\fP, \fB\(or\(or\fP, and comma (`,'), that have ! 263: a left to right evaluation, it is ! 264: incorrect for ! 265: .I store ! 266: examine the right operand for subtrees to be stored. ! 267: In these cases, ! 268: .I order ! 269: will call ! 270: .I codgen ! 271: recursively when it is permissible to work on the right operand. ! 272: A similar issue arises with the \fB? :\fP operator. ! 273: .PP ! 274: The ! 275: .I order ! 276: routine works by matching the current tree with ! 277: a set of code templates. ! 278: If a template is discovered that will ! 279: match the current tree and cookie, the associated assembly language ! 280: statement or statements are generated. ! 281: The tree is then rewritten, ! 282: as specified by the template, to represent the effect of the output instruction(s). ! 283: If no template match is found, first an attempt is made to find a match with a ! 284: different cookie; for example, in order to compute ! 285: an expression with cookie INTEMP (store into a temporary storage location), ! 286: it is usually necessary to compute the expression into a scratch register ! 287: first. ! 288: If all attempts to match the tree fail, the heuristic part of ! 289: the algorithm becomes dominant. ! 290: Control is typically given to one of a number of machine-dependent routines ! 291: that may in turn recursively call ! 292: .I order ! 293: to achieve a subgoal of the computation (for example, one of the ! 294: arguments may be computed into a temporary register). ! 295: After this subgoal has been achieved, the process begins again with the ! 296: modified tree. ! 297: If the machine-dependent heuristics are unable to reduce the tree further, ! 298: a number of default rewriting rules may be considered appropriate. ! 299: For example, if the left operand of a \fB+\fP is a scratch ! 300: register, the \fB+\fP can be replaced by a \fB+=\fP operator; ! 301: the tree may then match a template. ! 302: .PP ! 303: To close this introduction, we will discuss the steps in compiling ! 304: code for the expression ! 305: .DS ! 306: \fIa\fR += \fIb\fR ! 307: .DE ! 308: where ! 309: .I a ! 310: and ! 311: .I b ! 312: are static variables. ! 313: .PP ! 314: To begin with, the whole expression tree is examined with cookie FOREFF, and ! 315: no match is found. Search with other cookies is equally fruitless, so an ! 316: attempt at rewriting is made. ! 317: Suppose we are dealing with the Interdata 8/32 for the moment. ! 318: It is recognized that the left hand and right hand sides of the \fB+=\fP operator ! 319: are addressable, and in particular the left hand side has no ! 320: side effects, so it is permissible to rewrite this as ! 321: .DS ! 322: \fIa\fR = \fIa\fR + \fIb\fR ! 323: .DE ! 324: and this is done. ! 325: No match is found on this tree either, so a machine dependent rewrite is done; it is recognized ! 326: that the left hand side of the assignment is addressable, but the right hand side is not ! 327: in a register, so ! 328: .I order ! 329: is called recursively, being asked to put the right ! 330: hand side of the assignment into a register. ! 331: This invocation of ! 332: .I order ! 333: searches the tree for a match, and fails. ! 334: The machine dependent rule for \fB+\fP ! 335: notices that the right hand operand is addressable; ! 336: it decides to put the left operand into a scratch register. ! 337: Another recursive call to ! 338: .I order ! 339: is made, with the tree ! 340: consisting solely of the leaf ! 341: .I a , ! 342: and the cookie asking that the value be placed into a scratch register. ! 343: This now matches a template, and a load instruction is emitted. ! 344: The node consisting of ! 345: .I a ! 346: is rewritten in place to represent the register into which ! 347: .I a ! 348: is loaded, ! 349: and this third call to ! 350: .I order ! 351: returns. ! 352: The second call to ! 353: .I order ! 354: now finds that it has the tree ! 355: .DS ! 356: \fBreg\fR + \fIb\fR ! 357: .DE ! 358: to consider. ! 359: Once again, there is no match, but the default rewriting rule rewrites ! 360: the \fB+\fP as a \fB+=\fP operator, since the left operand is a scratch register. ! 361: When this is done, there is a match: in fact, ! 362: .DS ! 363: \fBreg\fR += \fIb\fR ! 364: .DE ! 365: simply describes the effect of the add instruction ! 366: on a typical machine. ! 367: After the add is emitted, the tree is rewritten ! 368: to consist merely of the register node, since the result of the add ! 369: is now in the register. ! 370: This agrees with the cookie passed to the second invocation of ! 371: .I order , ! 372: so this invocation ! 373: terminates, returning to the first level. ! 374: The original tree has now ! 375: become ! 376: .DS ! 377: \fIa\fR = \fBreg\fR ! 378: .DE ! 379: which matches a template for the store instruction. ! 380: The store is output, and the tree rewritten to become ! 381: just a single register node. ! 382: At this point, since the top level call to ! 383: .I order ! 384: was ! 385: interested only in side effects, the call to ! 386: .I order ! 387: returns, and the code generation is completed; ! 388: we have generated a load, add, and store, as might have been expected. ! 389: .PP ! 390: The effect of machine architecture on this is considerable. ! 391: For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage'' ! 392: instruction, so the strategy is quite different; ! 393: .I b ! 394: is loaded in to ! 395: a register, and then an add to storage instruction generated ! 396: to add this register in to ! 397: .I a . ! 398: The transformations, involving as they do the semantics of C, ! 399: are largely machine independent. ! 400: The decisions as to when to use them, however, are ! 401: almost totally machine dependent. ! 402: .PP ! 403: Having given a broad outline of ! 404: the code generation process, we shall next consider the ! 405: heart of it: the templates. ! 406: This leads naturally into discussions of template matching and register allocation, ! 407: and finally a discussion of the machine dependent interfaces and strategies. ! 408: .SH ! 409: The Templates ! 410: .PP ! 411: The templates describe the effect of the target machine instructions ! 412: on the model of computation around which the compiler is organized. ! 413: In effect, each template has five logical sections, and represents an assertion ! 414: of the form: ! 415: .IP ! 416: .B If ! 417: we have a subtree of a given shape (1), and we have a goal (cookie) or goals to ! 418: achieve (2), and we have sufficient free resources (3), ! 419: .B then ! 420: we may emit an instruction or instructions (4), and ! 421: rewrite the subtree in a particular manner (5), ! 422: and the rewritten tree will achieve the desired goals. ! 423: .PP ! 424: These five sections will be discussed in more ! 425: detail later. First, we give an example of a ! 426: template: ! 427: .DS ! 428: .ta 1i 2i 3i 4i 5i ! 429: ASG PLUS, INAREG, ! 430: SAREG, TINT, ! 431: SNAME, TINT, ! 432: 0, RLEFT, ! 433: " add AL,AR\en", ! 434: .DE ! 435: The top line specifies the operator (\fB+=\fP) and the cookie (compute the ! 436: value of the subtree into an AREG). ! 437: The second and third lines specify the left and right descendants, ! 438: respectively, ! 439: of the \fB+=\fP operator. ! 440: The left descendant must be a REG node, representing an ! 441: A register, and have integer type, while the right side must be a NAME node, ! 442: and also have integer type. ! 443: The fourth line contains the resource requirements (no scratch registers ! 444: or temporaries needed), and the rewriting rule (replace the subtree by the left descendant). ! 445: Finally, the quoted string on the last line represents the output to the assembler: ! 446: lower case letters, tabs, spaces, etc. are copied ! 447: .I verbatim . ! 448: to the output; upper case letters trigger various macro-like expansions. ! 449: Thus, ! 450: .B AL ! 451: would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em ! 452: presumably the register number. ! 453: Similarly, ! 454: .B AR ! 455: would expand into the name of the right operand. ! 456: The ! 457: .I add ! 458: instruction of the last section might well be ! 459: emitted by this template. ! 460: .PP ! 461: In principle, it would be possible to make separate templates ! 462: for all legal combinations of operators, cookies, types, and shapes. ! 463: In practice, the number of combinations is very large. ! 464: Thus, a considerable amount of mechanism is present to ! 465: permit a large number of subtrees to be matched ! 466: by a single template. ! 467: Most of the shape and type specifiers are individual bits, and can ! 468: be logically ! 469: or'ed ! 470: together. ! 471: There are a number of special descriptors for matching classes of ! 472: operators. ! 473: The cookies can also be combined. ! 474: As an example of the kind of template ! 475: that really arises in practice, the ! 476: actual template for the Interdata 8/32 ! 477: that subsumes the above example is: ! 478: .DS ! 479: .ta 1i 2i 3i 4i 5i ! 480: ASG OPSIMP, INAREG\(orFORCC, ! 481: SAREG, TINT\(orTUNSIGNED\(orTPOINT, ! 482: SAREG\(orSNAME\(orSOREG\(orSCON, TINT\(orTUNSIGNED\(orTPOINT, ! 483: 0, RLEFT\(orRESCC, ! 484: " OI AL,AR\en", ! 485: .DE ! 486: Here, OPSIMP represents the operators ! 487: +, \-, \(or, &, and ^. ! 488: The ! 489: .B OI ! 490: macro in the output string expands into the ! 491: appropriate \fBI\fRnteger \fBO\fRpcode for the operator. ! 492: The left and right sides can be integers, unsigned, or pointer types. ! 493: The right side can be, in addition to a name, a register, ! 494: a memory location whose address is given by a register and displacement (OREG), ! 495: or a constant. ! 496: Finally, these instructions set the condition codes, ! 497: and so can be used in condition contexts: ! 498: the cookie and rewriting rules reflect this. ! 499: .SH ! 500: The Template Matching Algorithm ! 501: .PP ! 502: The heart of the second pass is the template matching ! 503: algorithm, in the routine ! 504: .I match . ! 505: .I Match ! 506: is called with a tree and a cookie; it attempts to match ! 507: the given tree against some template that will transform it ! 508: according to one of the goals given in the cookie. ! 509: If a match is successful, the transformation is ! 510: applied; ! 511: .I expand ! 512: is called to generate the assembly code, and then ! 513: .I reclaim ! 514: rewrites the tree, and reclaims the resources, such ! 515: as registers, that might have become free as a result ! 516: of the generated code. ! 517: .PP ! 518: This part of the compiler is among the most time critical. ! 519: There is a spectrum of implementation techniques available ! 520: for doing this matching. ! 521: The most naive algorithm simply looks at the templates one by one. ! 522: This can be considerably improved upon by restricting the search ! 523: for an acceptable template. ! 524: It would be possible to do better than this if the templates were given ! 525: to a separate program that ate them and generated a template ! 526: matching subroutine. ! 527: This would make maintenance of the compiler much more ! 528: complicated, however, so this has not been done. ! 529: .PP ! 530: The matching algorithm is actually carried out by restricting ! 531: the range in the table that must be searched for each opcode. ! 532: This introduces a number of complications, however, and needs a ! 533: bit of sympathetic help by the person constructing the ! 534: compiler in order to obtain best results. ! 535: The exact tuning of this algorithm continues; it ! 536: is best to consult the code and comments in ! 537: .I match ! 538: for the latest version. ! 539: .PP ! 540: In order to match a template to a tree, ! 541: it is necessary to match not only the cookie and the ! 542: operator of the root, but also the types and shapes of the ! 543: left and right descendants (if any) of the tree. ! 544: A convention is established here that is carried out throughout ! 545: the second pass of the compiler. ! 546: If a node represents a unary operator, the single descendant ! 547: is always the ``left'' descendant. ! 548: If a node represents a unary operator or a leaf node (no descendants) ! 549: the ``right'' descendant is taken by convention to be the node itself. ! 550: This enables templates to easily match leaves and conversion operators, for example, ! 551: without any additional mechanism in the matching program. ! 552: .PP ! 553: The type matching is straightforward; it is possible to specify any combination ! 554: of basic types, general pointers, and pointers to one or more of ! 555: the basic types. ! 556: The shape matching is somewhat more complicated, but still pretty simple. ! 557: Templates have a collection of possible operand shapes ! 558: on which the opcode might match. ! 559: In the simplest case, an ! 560: .I add ! 561: operation might be able to add to either a register variable ! 562: or a scratch register, and might be able (with appropriate ! 563: help from the assembler) to add an integer constant (ICON), a static ! 564: memory cell (NAME), or a stack location (OREG). ! 565: .PP ! 566: It is usually attractive to specify a number of such shapes, ! 567: and distinguish between them when the assembler output is produced. ! 568: It is possible to describe the union of many elementary ! 569: shapes such as ICON, NAME, OREG, ! 570: AREG or BREG ! 571: (both scratch and register forms), etc. ! 572: To handle at least the simple forms of indirection, one can also ! 573: match some more complicated forms of trees: STARNM and STARREG ! 574: can match more complicated trees headed by an indirection operator, ! 575: and SFLD can match certain trees headed by a FLD operator. ! 576: These patterns call machine dependent routines that match the ! 577: patterns of interest on a given machine. ! 578: The shape SWADD may be used to recognize NAME or OREG ! 579: nodes that lie on word boundaries: this may be of some importance ! 580: on word addressed machines. ! 581: Finally, there are some special shapes: these may not ! 582: be used in conjunction with the other shapes, but may be ! 583: defined and extended in machine dependent ways. ! 584: The special shapes SZERO, SONE, and SMONE are predefined and match ! 585: constants 0, 1, and \-1, respectively; others are easy to add ! 586: and match by using the machine dependent routine ! 587: .I special . ! 588: .PP ! 589: When a template has been found that matches the root of the tree, ! 590: the cookie, and the shapes and types of the descendants, ! 591: there is still one bar to a total match: the template may ! 592: call for some resources (for example, a scratch register). ! 593: The routine ! 594: .I allo ! 595: is called, and it attempts to allocate the resources. ! 596: If it cannot, the match fails; no resources are ! 597: allocated. ! 598: If successful, the allocated resources are given numbers ! 599: 1, 2, etc. for later reference when the assembly code is ! 600: generated. ! 601: The routines ! 602: .I expand ! 603: and ! 604: .I reclaim ! 605: are then called. ! 606: The ! 607: .I match ! 608: routine then returns a special value, MDONE. ! 609: If no match was found, the value MNOPE is returned; ! 610: this is a signal to the caller to try more cookie ! 611: values, or attempt a rewriting rule. ! 612: .I Match ! 613: is also used to select rewriting rules, although ! 614: the way of doing this is pretty straightforward. ! 615: A special cookie, FORREW, is used to ask ! 616: .I match ! 617: to search for a rewriting rule. ! 618: The rewriting rules are keyed to various opcodes; most ! 619: are carried out in ! 620: .I order . ! 621: Since the question of when to rewrite is one of the key issues in ! 622: code generation, it will be taken up again later. ! 623: .SH ! 624: Register Allocation ! 625: .PP ! 626: The register allocation routines, and the allocation strategy, ! 627: play a central role in the correctness of the code generation algorithm. ! 628: If there are bugs in the Sethi-Ullman computation that cause the ! 629: number of needed registers to be underestimated, ! 630: the compiler may run out of scratch registers; ! 631: it is essential that the allocator keep track of those registers that ! 632: are free and busy, in order to detect such conditions. ! 633: .PP ! 634: Allocation of registers takes place as the result of a template ! 635: match; the routine ! 636: .I allo ! 637: is called with a word describing the number of A registers, ! 638: B registers, and temporary locations needed. ! 639: The allocation of temporary locations on the stack is relatively ! 640: straightforward, and will not be further covered; the ! 641: bookkeeping is a bit tricky, but conceptually trivial, and requests ! 642: for temporary space on the stack will never fail. ! 643: .PP ! 644: Register allocation is less straightforward. ! 645: The two major complications are ! 646: .I pairing ! 647: and ! 648: .I sharing . ! 649: In many machines, some operations (such as multiplication ! 650: and division), and/or some types (such as longs or double precision) ! 651: require even/odd pairs of registers. ! 652: Operations of the first type are exceptionally difficult to ! 653: deal with in the compiler; in fact, their theoretical ! 654: properties are rather bad as well. ! 655: .[ ! 656: Aho Johnson Ullman Multiregister ! 657: .] ! 658: The second issue is dealt with rather more successfully; ! 659: a machine dependent function called ! 660: .I szty(t) ! 661: is called that returns 1 or 2, depending on the ! 662: number of A registers required to hold an object of type ! 663: .I t . ! 664: If ! 665: .I szty ! 666: returns 2, an even/odd pair of A registers is allocated ! 667: for each request. ! 668: As part of its duties, the routine ! 669: .I usable ! 670: finds usable register pairs for various operations. ! 671: This task is not as easy as it sounds; ! 672: it does not suffice to merely use ! 673: .I szty ! 674: on the expression tree, ! 675: since there are situations in which ! 676: a register pair temporary is needed even though ! 677: the result of the expression requires only one register. ! 678: This can occur with assignment operator expressions ! 679: which have \fBint\fP type but a \fBdouble\fP right hand side, ! 680: or with relational expressions where ! 681: one operand is \fBfloat\fP and the other \fBdouble\fP. ! 682: .PP ! 683: The other issue, sharing, is more subtle, but ! 684: important for good code quality. ! 685: When registers are allocated, it ! 686: is possible to reuse registers that hold address ! 687: information, and use them to contain the values ! 688: computed or accessed. ! 689: For example, on the IBM 360, if register 2 has ! 690: a pointer to an integer in it, we may load the ! 691: integer into register 2 itself by saying: ! 692: .DS ! 693: L 2,0(2) ! 694: .DE ! 695: If register 2 had a byte pointer, however, the sequence for ! 696: loading a character involves clearing the target ! 697: register first, and then inserting the desired character: ! 698: .DS ! 699: SR 3,3 ! 700: IC 3,0(2) ! 701: .DE ! 702: In the first case, if register 3 were used as the target, ! 703: it would lead to a larger number of registers ! 704: used for the expression than were required; the compiler would ! 705: generate inefficient code. ! 706: On the other hand, if register 2 were used as the target in the second ! 707: case, the code would simply be wrong. ! 708: In the first case, register 2 can be ! 709: .I shared ! 710: while in the second, it cannot. ! 711: .PP ! 712: In the specification of the register needs in the templates, ! 713: it is possible to indicate whether required scratch registers ! 714: may be shared with possible registers on the left or the right of the input tree. ! 715: In order that a register be shared, it must be scratch, and it must ! 716: be used only once, on the appropriate side of the tree being compiled. ! 717: .PP ! 718: The ! 719: .I allo ! 720: routine thus has a bit more to do than meets the eye; ! 721: it calls ! 722: .I freereg ! 723: to obtain a free register for each A and B register request. ! 724: .I Freereg ! 725: makes multiple calls on the routine ! 726: .I usable ! 727: to decide if a given register can be used to satisfy ! 728: a given need. ! 729: .I Usable ! 730: calls ! 731: .I shareit ! 732: if the register is busy, but might be shared. ! 733: Finally, ! 734: .I shareit ! 735: calls ! 736: .I ushare ! 737: to decide if the desired register is actually in the appropriate ! 738: subtree, and can be shared. ! 739: .PP ! 740: Just to add additional complexity, on some machines (such as the IBM 370) it ! 741: is possible to have ``double indexing'' forms of ! 742: addressing; these are represented by OREG's ! 743: with the base and index registers encoded into the register field. ! 744: While the register allocation and deallocation ! 745: .I "per se" ! 746: is not made more difficult by this phenomenon, the code itself ! 747: is somewhat more complex. ! 748: .PP ! 749: Having allocated the registers and expanded the assembly language, ! 750: it is time to reclaim the resources; the routine ! 751: .I reclaim ! 752: does this. ! 753: Many operations produce more than one result. ! 754: For example, many arithmetic operations may produce ! 755: a value in a register, and also set the condition ! 756: codes. ! 757: Assignment operations may leave results both in a register and in memory. ! 758: .I Reclaim ! 759: is passed three parameters; the tree and cookie ! 760: that were matched, and the rewriting field of the template. ! 761: The rewriting field allows the specification of possible results; ! 762: the tree is rewritten to reflect the results of the operation. ! 763: If the tree was computed for side effects only (FOREFF), ! 764: the tree is freed, and all resources in it reclaimed. ! 765: If the tree was computed for condition codes, the resources ! 766: are also freed, and the tree replaced by a special ! 767: node type, FORCC. ! 768: Otherwise, the value may be found in the left ! 769: argument of the root, the right argument of the root, ! 770: or one of the temporary resources allocated. ! 771: In these cases, first the resources of the tree, ! 772: and the newly allocated resources, ! 773: are ! 774: freed; then the resources needed by the result ! 775: are made busy again. ! 776: The final result must always match the shape of the input cookie; ! 777: otherwise, the compiler error ! 778: ``cannot reclaim'' ! 779: is generated. ! 780: There are some machine dependent ways of ! 781: preferring results in registers or memory when ! 782: there are multiple results matching multiple goals in the cookie. ! 783: .PP ! 784: .I Reclaim ! 785: also implements, in a curious way, ! 786: C's ``usual arithmetic conversions''. ! 787: When a value is generated into a temporary register, ! 788: .I reclaim ! 789: decides what the type and size of the result will be. ! 790: Unless automatic conversion is specifically suppressed ! 791: in the code template with the \fBT\fP macro, ! 792: \fIreclaim\fP converts \fBchar\fP and \fBshort\fP ! 793: results to \fBint\fP, ! 794: \fBunsigned char\fP and \fBunsigned short\fP results ! 795: to \fBunsigned int\fP, ! 796: and \fBfloat\fP into \fBdouble\fP ! 797: (for double only floating point arithmetic). ! 798: This conversion is a simple type pun; ! 799: no instructions for converting the value ! 800: are actually emitted. ! 801: This implies that registers must always ! 802: contain a value that is at least as wide ! 803: as a register, ! 804: which greatly restricts the range of possible templates. ! 805: .SH ! 806: The Machine Dependent Interface ! 807: .PP ! 808: The files ! 809: .I order.c , ! 810: .I local2.c , ! 811: and ! 812: .I table.c , ! 813: as well as the header file ! 814: .I mac2defs , ! 815: represent the machine dependent portion of the second pass. ! 816: The machine dependent portion can be roughly divided into ! 817: two: the easy portion and the hard portion. ! 818: The easy portion ! 819: tells the compiler the names of the registers, and arranges that ! 820: the compiler generate the proper assembler formats, opcode names, location counters, etc. ! 821: The hard portion involves the Sethi\-Ullman computation, the ! 822: rewriting rules, and, to some extent, the templates. ! 823: It is hard because there are no real algorithms that apply; ! 824: most of this portion is based on heuristics. ! 825: This section discusses the easy portion; the next several ! 826: sections will discuss the hard portion. ! 827: .PP ! 828: If the compiler is adapted from a compiler for a machine ! 829: of similar architecture, the easy part is indeed easy. ! 830: In ! 831: .I mac2defs , ! 832: the register numbers are defined, as well as various parameters for ! 833: the stack frame, and various macros that describe the machine architecture. ! 834: If double indexing is to be permitted, for example, the symbol ! 835: R2REGS is defined. ! 836: Also, a number of macros that are involved in function call processing, ! 837: especially for unusual function call mechanisms, are defined here. ! 838: .PP ! 839: In ! 840: .I local2.c , ! 841: a large number of simple functions are defined. ! 842: These do things such as write out opcodes, register names, ! 843: and address forms for the assembler. ! 844: Part of the function call code is defined here; that is nontrivial ! 845: to design, but typically rather straightforward to implement. ! 846: Among the easy routines in ! 847: .I order.c ! 848: are routines for generating a created label, ! 849: defining a label, and generating the arguments ! 850: of a function call. ! 851: .PP ! 852: These routines tend to have a local effect, and depend on a fairly straightforward way ! 853: on the target assembler and the design decisions already made about ! 854: the compiler. ! 855: Thus they will not be further treated here. ! 856: .SH ! 857: The Rewriting Rules ! 858: .PP ! 859: When a tree fails to match any template, it becomes ! 860: a candidate for rewriting. ! 861: Before the tree is rewritten, ! 862: the machine dependent routine ! 863: .I nextcook ! 864: is called with the tree and the cookie; it suggests ! 865: another cookie that might be a better candidate for the ! 866: matching of the tree. ! 867: If all else fails, the templates are searched with the cookie ! 868: FORREW, to look for a rewriting rule. ! 869: The rewriting rules are of two kinds; ! 870: for most of the common operators, there are ! 871: machine dependent rewriting rules that may be applied; ! 872: these are handled by machine dependent functions ! 873: that are called and given the tree to be computed. ! 874: These routines may recursively call ! 875: .I order ! 876: or ! 877: .I codgen ! 878: to cause certain subgoals to be achieved; ! 879: if they actually call for some alteration of the tree, ! 880: they return 1, and the ! 881: code generation algorithm recanonicalizes and tries again. ! 882: If these routines choose not to deal with the tree, the ! 883: default rewriting rules are applied. ! 884: .PP ! 885: The assignment operators, when rewritten, call the routine ! 886: .I setasg . ! 887: This is assumed to rewrite the tree at least to the point where there are ! 888: no side effects in the left hand side. ! 889: If there is still no template match, ! 890: a default rewriting is done that causes ! 891: an expression such as ! 892: .DS ! 893: .I "a += b" ! 894: .DE ! 895: to be rewritten as ! 896: .DS ! 897: .I "a = a + b" ! 898: .DE ! 899: This is a useful default for certain mixtures of strange types ! 900: (for example, when ! 901: .I a ! 902: is a bit field and ! 903: .I b ! 904: an character) that ! 905: otherwise might need separate table entries. ! 906: .PP ! 907: Simple assignment, structure assignment, and all forms of calls ! 908: are handled completely by the machine dependent routines. ! 909: For historical reasons, the routines generating the calls return ! 910: 1 on failure, 0 on success, unlike the other routines. ! 911: .PP ! 912: The machine dependent routine ! 913: .I setbin ! 914: handles binary operators; it too must do most of the job. ! 915: In particular, when it returns 0, it must do so with ! 916: the left hand side in a temporary register. ! 917: The default rewriting rule in this case is to convert the ! 918: binary operator into the associated assignment operator; ! 919: since the left hand side is assumed to be a temporary register, ! 920: this preserves the semantics and often allows a considerable ! 921: saving in the template table. ! 922: .PP ! 923: The increment and decrement operators may be dealt with with ! 924: the machine dependent routine ! 925: .I setincr . ! 926: If this routine chooses not to deal with the tree, the rewriting rule replaces ! 927: .DS ! 928: .I "x ++" ! 929: .DE ! 930: by ! 931: .DS ! 932: .I "( (x += 1) \- 1)" ! 933: .DE ! 934: which preserves the semantics. ! 935: Once again, this is not too attractive for the most common ! 936: cases, but can generate close to optimal code when the ! 937: type of x is unusual. ! 938: .PP ! 939: Finally, the indirection (UNARY MUL) operator is also handled ! 940: in a special way. ! 941: The machine dependent routine ! 942: .I offstar ! 943: is extremely important for the efficient generation of code. ! 944: .I Offstar ! 945: is called with a tree that is the direct descendant of a UNARY MUL node; ! 946: its job is to transform this tree so that the combination of ! 947: UNARY MUL with the transformed tree becomes addressable. ! 948: On most machines, ! 949: .I offstar ! 950: can simply compute the tree into an A or B register, ! 951: depending on the architecture, and then ! 952: .I canon ! 953: will make the resulting tree into an OREG. ! 954: On many machines, ! 955: .I offstar ! 956: can profitably choose to do less work than computing ! 957: its entire argument into a register. ! 958: For example, if the target machine supports OREG's ! 959: with a constant offset from a register, and ! 960: .I offstar ! 961: is called ! 962: with a tree of the form ! 963: .DS ! 964: .I "expr + const" ! 965: .DE ! 966: where ! 967: .I const ! 968: is a constant, then ! 969: .I offstar ! 970: need only compute ! 971: .I expr ! 972: into the appropriate form of register. ! 973: On machines that support double indexing, ! 974: .I offstar ! 975: may have even more choice as to how to proceed. ! 976: The proper tuning of ! 977: .I offstar , ! 978: which is not typically too difficult, should be one of the ! 979: first tries at optimization attempted by the ! 980: compiler writer. ! 981: .SH ! 982: The Sethi-Ullman Computation ! 983: .PP ! 984: The heart of the heuristics is the computation of the Sethi-Ullman numbers. ! 985: This computation is closely linked with the rewriting rules and the ! 986: templates. ! 987: As mentioned before, the Sethi-Ullman numbers are expected to ! 988: estimate the number of scratch registers needed to compute ! 989: the subtrees without using any stores. ! 990: However, the original theory does not apply to real machines. ! 991: For one thing, the theory assumes that all registers ! 992: are interchangeable. ! 993: Real machines have general purpose, floating point, and index registers, ! 994: register pairs, etc. ! 995: The theory also does not account for side effects; ! 996: this rules out various forms of pathology that arise ! 997: from assignment and assignment operators. ! 998: Condition codes are also undreamed of. ! 999: Finally, the influence of types, conversions, and the ! 1000: various addressability restrictions and extensions of real ! 1001: machines are also ignored. ! 1002: .PP ! 1003: Nevertheless, for a ``useless'' theory, ! 1004: the basic insight of Sethi and Ullman is amazingly ! 1005: useful in a real compiler. ! 1006: The notion that one should attempt to estimate the ! 1007: resource needs of trees before starting the ! 1008: code generation provides a natural means of splitting the ! 1009: code generation problem, and provides a bit of redundancy ! 1010: and self checking in the compiler. ! 1011: Moreover, if writing the ! 1012: Sethi-Ullman routines is hard, describing, writing, and debugging the ! 1013: alternative (routines that attempt to free up registers by stores into ! 1014: temporaries ``on the fly'') is even worse. ! 1015: Nevertheless, it should be clearly understood that these routines exist in a realm ! 1016: where there is no ``right'' way to write them; ! 1017: it is an art, the realm of heuristics, and, consequently, a major ! 1018: source of bugs in the compiler. ! 1019: Often, the early, crude versions of these routines give little trouble; ! 1020: only after ! 1021: the compiler is actually working ! 1022: and the ! 1023: code quality is being improved do serious problem have to be faced. ! 1024: Having a simple, regular machine architecture is worth quite ! 1025: a lot at this time. ! 1026: .PP ! 1027: The major problems arise from asymmetries in the registers: register pairs, ! 1028: having different kinds of registers, and the related problem of ! 1029: needing more than one register (frequently a pair) to store certain ! 1030: data ! 1031: types (such as longs or doubles). ! 1032: There appears to be no general way of treating this problem; ! 1033: solutions have to be fudged for each machine where the problem arises. ! 1034: On the Honeywell 66, for example, there are only two general purpose registers, ! 1035: so a need for a pair is the same as the need for two registers. ! 1036: On the IBM 370, the register pair (0,1) is used to do multiplications and divisions; ! 1037: registers 0 and 1 are not generally considered part of the scratch registers, and so ! 1038: do not require allocation explicitly. ! 1039: On the Interdata 8/32, after much consideration, the ! 1040: decision was made not to try to deal with the register pair issue; ! 1041: operations such as multiplication and division that required pairs ! 1042: were simply assumed to take all of the scratch registers. ! 1043: Several weeks of effort had failed to produce ! 1044: an algorithm that seemed to have much chance of running successfully ! 1045: without inordinate debugging effort. ! 1046: The difficulty of this issue should not be minimized; it represents one of the ! 1047: main intellectual efforts in porting the compiler. ! 1048: Nevertheless, this problem has been fudged with a degree of ! 1049: success on nearly a dozen machines, so the compiler writer should ! 1050: not abandon hope. ! 1051: .PP ! 1052: The Sethi-Ullman computations interact with the ! 1053: rest of the compiler in a number of rather subtle ways. ! 1054: As already discussed, the ! 1055: .I store ! 1056: routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult ! 1057: to compute in registers, and must be stored. ! 1058: There are also subtle interactions between the ! 1059: rewriting routines and the Sethi-Ullman numbers. ! 1060: Suppose we have a tree such as ! 1061: .DS ! 1062: .I "A \- B" ! 1063: .DE ! 1064: where ! 1065: .I A ! 1066: and ! 1067: .I B ! 1068: are expressions; suppose further that ! 1069: .I B ! 1070: takes two registers, and ! 1071: .I A ! 1072: one. ! 1073: It is possible to compute the full expression in two registers by ! 1074: first computing ! 1075: .I B , ! 1076: and then, using the scratch register ! 1077: used by ! 1078: .I B , ! 1079: but not containing the answer, compute ! 1080: .I A . ! 1081: The subtraction can then be done, computing the expression. ! 1082: (Note that this assumes a number of things, not the least of which ! 1083: are register-to-register subtraction operators and symmetric ! 1084: registers.) ! 1085: If the machine dependent routine ! 1086: .I setbin , ! 1087: however, is not prepared to recognize this case ! 1088: and compute the more difficult side of the expression ! 1089: first, the ! 1090: Sethi-Ullman number must be set to three. ! 1091: Thus, the ! 1092: Sethi-Ullman number for a tree should represent the code that ! 1093: the machine dependent routines are actually willing to generate. ! 1094: .PP ! 1095: The interaction can go the other way. ! 1096: If we take an expression such as ! 1097: .DS ! 1098: * ( \fIp\fP + \fIi\fP ) ! 1099: .DE ! 1100: where ! 1101: .I p ! 1102: is a pointer and ! 1103: .I i ! 1104: an integer, ! 1105: this can probably be done in one register on most machines. ! 1106: Thus, its Sethi-Ullman number would probably be set to one. ! 1107: If double indexing is possible in the machine, a possible way ! 1108: of computing the expression is to load both ! 1109: .I p ! 1110: and ! 1111: .I i ! 1112: into registers, and then use double indexing. ! 1113: This would use two scratch registers; in such a case, ! 1114: it is possible that the scratch registers might be unobtainable, ! 1115: or might make some other part of the computation run out of ! 1116: registers. ! 1117: The usual solution is to cause ! 1118: .I offstar ! 1119: to ignore opportunities for double indexing that would tie up more scratch ! 1120: registers than the Sethi-Ullman number had reserved. ! 1121: .PP ! 1122: In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application ! 1123: of the portable compiler. ! 1124: It is also a frequent source of bugs. ! 1125: Algorithms are available that will produce nearly optimal code ! 1126: for specialized machines, but unfortunately most existing machines ! 1127: are far removed from these ideals. ! 1128: The best way of proceeding in practice is to start with a compiler ! 1129: for a similar machine to the target, and proceed very ! 1130: carefully. ! 1131: .SH ! 1132: Register Allocation ! 1133: .PP ! 1134: After the Sethi-Ullman numbers are computed, ! 1135: .I order ! 1136: calls a routine, ! 1137: .I rallo , ! 1138: that does register allocation, if appropriate. ! 1139: This routine does relatively little, in general; ! 1140: this is especially true if the target machine ! 1141: is fairly regular. ! 1142: There are a few cases where it is assumed that ! 1143: the result of a computation takes place in a particular register; ! 1144: switch and function return are the two major places. ! 1145: The expression tree has a field, ! 1146: .I rall , ! 1147: that may be filled with a register number; this is taken ! 1148: to be a preferred register, and the first temporary ! 1149: register allocated by a template match will be this preferred one, if ! 1150: it is free. ! 1151: If not, no particular action is taken; this is just a heuristic. ! 1152: If no register preference is present, the field contains NOPREF. ! 1153: In some cases, the result must be placed in ! 1154: a given register, no matter what. ! 1155: The register number is placed in ! 1156: .I rall , ! 1157: and the mask MUSTDO is logically or'ed in with it. ! 1158: In this case, if the subtree is requested in a register, and comes ! 1159: back in a register other than the demanded one, it is moved ! 1160: by calling the routine ! 1161: .I rmove . ! 1162: If the target register for this move is busy, it is a compiler ! 1163: error. ! 1164: .PP ! 1165: Note that this mechanism is the only one that will ever cause a register-to-register ! 1166: move between scratch registers (unless such a move is buried in the depths of ! 1167: some template). ! 1168: This simplifies debugging. ! 1169: In some cases, there is a rather strange interaction between ! 1170: the register allocation and the Sethi-Ullman number; ! 1171: if there is an operator or situation requiring a ! 1172: particular register, the allocator and the Sethi-Ullman ! 1173: computation must conspire to ensure that the target ! 1174: register is not being used by some intermediate result of some far-removed computation. ! 1175: This is most easily done by making the special operation take ! 1176: all of the free registers, preventing any other partially-computed ! 1177: results from cluttering up the works. ! 1178: .SH ! 1179: Template Shortcuts ! 1180: .PP ! 1181: Some operations are just too hard or too clumsy ! 1182: to be implemented in code templates ! 1183: on a particular architecture. ! 1184: .PP ! 1185: One way to handle such operations is to replace them with function calls. ! 1186: The intermediate file reading code in ! 1187: .I reader.c ! 1188: contains a call to an implementation dependent macro MYREADER; ! 1189: this can be defined to call various routines ! 1190: which walk the code tree and ! 1191: perform transformations. ! 1192: On the \*V, for example, ! 1193: unsigned division and remainder operations ! 1194: are far too complex to encode in a template. ! 1195: The routine ! 1196: .I hardops ! 1197: is called from a tree walk in ! 1198: .I myreader ! 1199: to detect these operations and ! 1200: replace them with calls to the C runtime functions ! 1201: .I udiv ! 1202: and ! 1203: .I urem . ! 1204: (There are complementary functions ! 1205: .I audiv ! 1206: and ! 1207: .I aurem ! 1208: which are provided as support for unsigned assignment operator expressions; ! 1209: they are different from \fIudiv\fP and \fIurem\fP ! 1210: because the left hand side of an assignment operator expression ! 1211: must be evaluated only once.) ! 1212: Note that arithmetic support routines are always expensive; ! 1213: the compiler makes an effort to notice common operations ! 1214: such as unsigned division by a constant power of two ! 1215: and generates optimal code for these inline. ! 1216: .PP ! 1217: Another escape involves the routine ! 1218: .I zzzcode . ! 1219: This function is called from ! 1220: .I expand ! 1221: to process template macros which start with the character ! 1222: .B Z . ! 1223: On the \*V, ! 1224: many complex code generation problems ! 1225: are swept under the rug into \fIzzzcode\fP. ! 1226: Scalar type conversions are a particularly annoying issue; ! 1227: they are primarily handled using the macro \fBZA\fP. ! 1228: Rather than creating a template for each possible ! 1229: conversion and result, ! 1230: which would be tedious and complex given ! 1231: C's many scalar types, ! 1232: this macro allows the compiler to take shortcuts. ! 1233: Tough conversions such as \fBunsigned\fP into \fBdouble\fP ! 1234: are easily handled using special code under \fBZA\fP. ! 1235: One convention which makes scalar conversions ! 1236: somewhat more difficult than they might otherwise be ! 1237: is the strict requirement that ! 1238: values in registers must have a type ! 1239: that is as wide or wider than a single register. ! 1240: This convention is used primarily to implement ! 1241: the ``usual arithmetic conversions'' of C, ! 1242: but it can get in the way when ! 1243: converting between (say) a \fBchar\fP value ! 1244: and an \fBunsigned short\fP. ! 1245: A routine named \fIcollapsible\fP is used ! 1246: to determine whether one operation or two ! 1247: is needed to produce a register-width result. ! 1248: .PP ! 1249: Another convenient macro is \fBZP\fP. ! 1250: This macro is used to generate an appropriate ! 1251: conditional test after a comparison. ! 1252: This makes it possible to avoid ! 1253: a profusion of template entries ! 1254: which essentially duplicate each other, ! 1255: one entry for each type of test ! 1256: multiplied by the number of different ! 1257: comparison conditions. ! 1258: A related macro, \fBZN\fP, ! 1259: is used to normalize the result ! 1260: of a relational test by ! 1261: producing an integer 0 or 1. ! 1262: .PP ! 1263: The macro \fBZS\fP does the unlovely job ! 1264: of generating code for structure assignments. ! 1265: It tests the size of the structure ! 1266: to see what \*V instruction can be used to move it, ! 1267: and is capable of emitting a block move ! 1268: instruction for large structures. ! 1269: On other architectures this macro ! 1270: could be used to generate a function call ! 1271: to a block copy routine. ! 1272: .PP ! 1273: The macro \fBZG\fP was recently introduced ! 1274: to handle the thorny issue of ! 1275: assignment operator expressions which ! 1276: have an integral left hand side ! 1277: and a floating point right hand side. ! 1278: These expressions are passed to the code generator ! 1279: without the usual type balancing so that ! 1280: good code can be generated for them. ! 1281: Older versions of the portable compiler computed ! 1282: these expressions with integer arithmetic; ! 1283: with the \fBZG\fP operator, ! 1284: the current compiler can convert the left hand side ! 1285: to the appropriate floating type, ! 1286: compute the expression with floating point arithmetic, ! 1287: convert the result back to integral type ! 1288: and store it in the left hand side. ! 1289: These operations are performed by ! 1290: recursive calls to \fIzzzcode\fP ! 1291: and other routines related to \fIexpand\fP. ! 1292: .PP ! 1293: An assortment of other macros finish the job ! 1294: of interpreting code templates. ! 1295: Among the more interesting ones: ! 1296: \fBZC\fP produces the number of words ! 1297: pushed on the argument stack, ! 1298: which is useful for function calls; ! 1299: \fBZD\fP and \fBZE\fP produce ! 1300: constant increment and decrement operations; ! 1301: \fBZL\fR and \fBZR\fP produce ! 1302: the assembler letter code (\fBl\fP, \fBw\fP or \fBb\fP) ! 1303: corresponding to the size and type ! 1304: of the left and right operand respectively. ! 1305: .SH ! 1306: Shared Code ! 1307: .PP ! 1308: The ! 1309: .I lint ! 1310: utility shares sources with the portable compiler. ! 1311: \fILint\fP uses all of the machine independent pass 1 sources, ! 1312: and adds its own set of ``machine dependent'' routines, ! 1313: contained mostly in \fIlint.c\fP. ! 1314: \fILint\fP uses a private intermediate file format ! 1315: and a private pass 2 whose source is \fIlpass2.c\fP. ! 1316: Several modifications were made to the C scanner ! 1317: in \fIscan.c\fP, conditionally compiled with ! 1318: the symbol LINT, ! 1319: in order to support \fIlint\fP's convention ! 1320: of passing ``pragma'' information inside special comments. ! 1321: A few other minor modifications were also ! 1322: made, \fIe.g.\fP to skip over \fIasm\fP statements. ! 1323: .PP ! 1324: The \*f and \fIpc\fP compilers use a code generator ! 1325: which shares sources with pass 2 of the portable compiler. ! 1326: This code generator is very similar to pass 2 ! 1327: but uses a different intermediate file format. ! 1328: Three source files are needed in addition to the pass 2 sources. ! 1329: .I fort.c ! 1330: is a machine independent source file which ! 1331: contains a pass 2 main routine that replaces the equivalent routine in ! 1332: .I reader.c , ! 1333: together with several routines for reading the binary intermediate file. ! 1334: .I fort.c ! 1335: includes the machine dependent file ! 1336: .I fort.h , ! 1337: which defines two trivial label generation routines. ! 1338: A header file ! 1339: .I /usr/include/pcc.h ! 1340: defines opcode and type symbols which ! 1341: are needed to provide a standard intermediate file format; ! 1342: this file is also included by the Fortran and Pascal compilers. ! 1343: The creation of this header file ! 1344: made it necessary to make ! 1345: some changes in the way the portable C compiler is built. ! 1346: These changes were made with the aim ! 1347: of minimizing the number of lines changed ! 1348: in the original sources. ! 1349: Macro symbols in ! 1350: .I pcc.h ! 1351: are flagged with a unique prefix to avoid ! 1352: symbol name collisions in the Fortran and Pascal compilers, ! 1353: which have their own internal opcode and type symbols. ! 1354: A ! 1355: .I sed (1) ! 1356: script is used to strip these prefixes, ! 1357: producing an include file named ! 1358: .I pcclocal.h ! 1359: which is specific to the portable C compiler ! 1360: and contains opcode symbols which ! 1361: are compatible with the original opcode symbols. ! 1362: A similar ! 1363: .I sed ! 1364: script is used to produce a file of Yacc tokens for the C grammar. ! 1365: .PP ! 1366: A number of changes to existing source files ! 1367: were made to accommodate the Fortran-style pass 2. ! 1368: These changes are conditionally compiled ! 1369: using the symbol FORT. ! 1370: Many changes were needed to implement ! 1371: single-precision arithmetic; ! 1372: other changes concern such things as ! 1373: the avoidance of floating point move instructions, ! 1374: which on the \*V can cause floating point faults ! 1375: when a datum is not a normalized floating point value. ! 1376: In earlier implementations of the Fortran-style pass 2 there were ! 1377: a number of stub files which served only ! 1378: to define the symbol FORT in a particular ! 1379: source file; these files have been removed for 4.3BSD ! 1380: in favor of a new compilation strategy which ! 1381: yields up to three different objects from ! 1382: a single source file, depending on what ! 1383: compilation control symbols are defined for that file. ! 1384: .PP ! 1385: The Fortran-style pass 2 uses a Polish Postfix intermediate file. ! 1386: The file is in binary format, ! 1387: and is logically divided into a stream of 32-bit records. ! 1388: Each record consists of an (\fIopcode, value, type\fP) triple, ! 1389: possibly followed inline by more descriptive information. ! 1390: The ! 1391: .I opcode ! 1392: and ! 1393: .I type ! 1394: are selected from the list in ! 1395: .I pcc.h ; ! 1396: the type encodes a basic type, ! 1397: around which may be wrapped type modifiers ! 1398: such as ``pointer to'' or ``array of'' to ! 1399: produce more complex types. ! 1400: The function of the ! 1401: .I value ! 1402: parameter depends on the opcode; ! 1403: it may be used for a flag, ! 1404: a register number ! 1405: or the value of a constant, ! 1406: or it may be unused. ! 1407: The optional inline data is often a null-terminated string, ! 1408: but it may also be a binary offset from a register ! 1409: or from a symbolic constant; ! 1410: sometimes both a string and an offset appear. ! 1411: .PP ! 1412: Here are a few samples of intermediate file records ! 1413: and their interpretation: ! 1414: .KS ! 1415: .TS ! 1416: expand, tab(:); ! 1417: c c c c c ! 1418: ^ ^ ^ c ^ ! 1419: l lB l l l. ! 1420: Opcode:Type:Value:Optional:Interpretation ! 1421: :::Data: ! 1422: .sp 0.5v ! 1423: = ! 1424: .sp 0.5v ! 1425: ICON:int:flag=0:binary=5:T{ ! 1426: .Bl ! 1427: the integer constant 5 ! 1428: T} ! 1429: .Sp ! 1430: NAME:char:flag=1:T{ ! 1431: .nf ! 1432: binary=1, ! 1433: string="_foo_" ! 1434: T}:T{ ! 1435: .Bl ! 1436: a \fBcharacter*1\fP element in ! 1437: a Fortran common block \fIfoo\fP ! 1438: at offset 1 ! 1439: T} ! 1440: .Sp ! 1441: OREG:char:reg=11:T{ ! 1442: .nf ! 1443: offset=1, ! 1444: string="v.2-v.1" ! 1445: T}:T{ ! 1446: .Bl ! 1447: the second element ! 1448: of a Fortran \fBcharacter*1\fP array, ! 1449: expressed as an offset from a static base register ! 1450: T} ! 1451: .Sp ! 1452: PLUS:float:::T{ ! 1453: .Bl ! 1454: a single precision add ! 1455: T} ! 1456: .Sp ! 1457: FTEXT::size=2:string=".text 0":T{ ! 1458: .Bl ! 1459: an inline assembler directive ! 1460: of length 2 (32-bit records) ! 1461: T} ! 1462: .TE ! 1463: .KE ! 1464: .SH ! 1465: Compiler Bugs ! 1466: .PP ! 1467: The portable compiler has an excellent record of generating correct code. ! 1468: The requirement for reasonable cooperation between the register allocation, ! 1469: Sethi-Ullman computation, rewriting rules, and templates builds quite a bit ! 1470: of redundancy into the compiling process. ! 1471: The effect of this is that, in a surprisingly short time, the compiler will ! 1472: start generating correct code for those ! 1473: programs that it can compile. ! 1474: The hard part of the job then becomes finding and ! 1475: eliminating those situations where the compiler refuses to ! 1476: compile a program because it knows it cannot do it right. ! 1477: For example, a template may simply be missing; this may either ! 1478: give a compiler error of the form ``no match for op ...'' , or cause ! 1479: the compiler to go into an infinite loop applying various rewriting rules. ! 1480: The compiler has a variable, ! 1481: .I nrecur , ! 1482: that is set to 0 at the beginning of an expressions, and ! 1483: incremented at key spots in the ! 1484: compilation process; if this parameter gets too large, the ! 1485: compiler decides that it is in a loop, and aborts. ! 1486: Loops are also characteristic of botches in the machine-dependent rewriting rules. ! 1487: Bad Sethi-Ullman computations usually cause the scratch registers ! 1488: to run out; this often means ! 1489: that the Sethi-Ullman number was underestimated, so ! 1490: .I store ! 1491: did not store something it should have; alternatively, ! 1492: it can mean that the rewriting rules were not smart enough to ! 1493: find the sequence that ! 1494: .I sucomp ! 1495: assumed would be used. ! 1496: .PP ! 1497: The best approach when a compiler error is detected involves several stages. ! 1498: First, try to get a small example program that steps on the bug. ! 1499: Second, turn on various debugging flags in the code generator, and follow the ! 1500: tree through the process of being matched and rewritten. ! 1501: Some flags of interest are ! 1502: \fB\-e\fP, which prints the expression tree, ! 1503: \fB\-r\fP, which gives information about the allocation of registers, ! 1504: \fB\-a\fP, which gives information about the performance of ! 1505: .I rallo , ! 1506: and \fB\-o\fP, which gives information about the behavior of ! 1507: .I order . ! 1508: This technique should allow most bugs to be found relatively quickly. ! 1509: .PP ! 1510: Unfortunately, finding the bug is usually not enough; it must also ! 1511: be fixed! ! 1512: The difficulty arises because a fix to the particular bug of interest tends ! 1513: to break other code that already works. ! 1514: Regression tests, tests that compare the performance of ! 1515: a new compiler against the performance of an older one, are very ! 1516: valuable in preventing major catastrophes. ! 1517: .SH ! 1518: Compiler Extensions ! 1519: .PP ! 1520: The portable C compiler makes a few extensions ! 1521: to the language described by Ritchie. ! 1522: .PP ! 1523: .I "Single precision arithmetic." ! 1524: ``All floating arithmetic in C is carried out in double-precision; ! 1525: whenever a ! 1526: .B float ! 1527: appears in a an expression it is lengthened to ! 1528: .B double ! 1529: by zero-padding its fraction.'' ! 1530: \*-Dennis Ritchie. ! 1531: .[ ! 1532: kernighan ritchie prentice 1978 ! 1533: .] ! 1534: Programmers who would like to use C ! 1535: to write numerical applications ! 1536: often shy away from it because ! 1537: C programs cannot perform single precision arithmetic. ! 1538: On machines such as the \*V ! 1539: which can cleanly support arithmetic on two (or more) ! 1540: sizes of floating point values, ! 1541: programs which can take advantage of ! 1542: single precision arithmetic will run faster. ! 1543: A very popular proposal for the ANSI C standard states that ! 1544: implementations may perform single precision computations ! 1545: with single precision arithmetic; ! 1546: some actual C implementations already do this, ! 1547: and now the Berkeley compiler joins them. ! 1548: .PP ! 1549: The changes are implemented in the compiler with a set of ! 1550: conditional compilation directives based on the symbol SPRECC; ! 1551: thus two compilers are generated, ! 1552: one with only double precision arithmetic ! 1553: and one with both double and single precision arithmetic. ! 1554: The ! 1555: .I cc ! 1556: program uses a flag ! 1557: .B \-f ! 1558: to select the single/double version of the compiler (\fI/lib/sccom\fP) ! 1559: instead of the default double only version (\fI/lib/ccom\fP). ! 1560: It is expected that at some time in the future ! 1561: the double only compiler will be retired ! 1562: and the single/double compiler will become the default. ! 1563: .PP ! 1564: There are a few implementation details ! 1565: of the single/double compiler which ! 1566: will be of interest to users and compiler porters. ! 1567: To maintain compatibility with functions compiled by the double only compiler, ! 1568: single precision actual arguments are still coerced to double precision, ! 1569: and formal arguments which are declared single precision ! 1570: are still ``really'' double precision. ! 1571: This may change if function prototypes of the sort ! 1572: proposed for the ANSI C standard are eventually adopted. ! 1573: Floating point constants are now classified into single precision ! 1574: and double precision types. ! 1575: The precision of a constant is determined from context; ! 1576: if a floating constant appears in an arithmetic expression ! 1577: with a single precision value, ! 1578: the constant is treated as having single precision type ! 1579: and the arithmetic expression is computed ! 1580: using single precision arithmetic. ! 1581: .PP ! 1582: Remarkably little code in the compiler ! 1583: needed to be changed to implement the single/double compiler. ! 1584: In many cases the changes overlapped with ! 1585: special cases which are used for the Fortran-style ! 1586: pass 2 (\fI/lib/f1\fP). ! 1587: Most of the single precision changes were implemented by Sam Leffler. ! 1588: .PP ! 1589: .I "Preprocessor extensions." ! 1590: The portable C compiler is normally distributed ! 1591: with a macro preprocessor written by J. F. Reiser. ! 1592: This preprocessor implements the features described ! 1593: in Ritchie's reference manual; ! 1594: it removes comments, expands macro definitions ! 1595: and removes or inserts code based on conditional compilation directives. ! 1596: Two interesting extensions are provided by this version of the preprocessor: ! 1597: .IP \(bu ! 1598: When comments are removed, no white space is necessarily substituted; ! 1599: this has the effect of \fIre-tokenizing\fP code, ! 1600: since the PCC will reanalyze the input. ! 1601: Macros can thus create new tokens by clever use of comments. ! 1602: For example, the macro definition ``#define foo(a,b) a/**/b'' ! 1603: creates a macro \fIfoo\fP which ! 1604: concatenates its two arguments, forming a new token. ! 1605: .IP \(bu ! 1606: Macro bodies are analyzed for macro arguments ! 1607: without regard to the boundaries of string or character constants. ! 1608: The definition ``#define bar(a) "a\en"'' creates a macro ! 1609: which returns the literal form of its argument embedded ! 1610: in a string with a newline appended. ! 1611: .PP ! 1612: These extensions are not portable to ! 1613: a number of other C preprocessors. ! 1614: They may be replaced in the future by corresponding ANSI C features, ! 1615: when the ANSI C standard has been formalized. ! 1616: .SH ! 1617: Summary and Conclusion ! 1618: .PP ! 1619: The portable compiler has been a useful tool for providing C ! 1620: capability on a large number of diverse machines, ! 1621: and for testing a number of theoretical ! 1622: constructs in a practical setting. ! 1623: It has many blemishes, both in style and functionality. ! 1624: It has been applied to many more machines than first ! 1625: anticipated, of a much wider range than originally dreamed of. ! 1626: Its use has also spread much faster than expected, leaving parts of ! 1627: the compiler still somewhat raw in shape. ! 1628: .PP ! 1629: On the theoretical side, there is some hope that the ! 1630: skeleton of the ! 1631: .I sucomp ! 1632: routine could be generated for many machines directly from the ! 1633: templates; this would give a considerable boost ! 1634: to the portability and correctness of the compiler, ! 1635: but might affect tunability and code quality. ! 1636: There is also room for more optimization, both within ! 1637: .I optim ! 1638: and in the form of a portable ``peephole'' optimizer. ! 1639: .PP ! 1640: On the practical, development side, ! 1641: the compiler could probably be sped up and made smaller ! 1642: without doing too much violence to its basic structure. ! 1643: Parts of the compiler deserve to be rewritten; ! 1644: the initialization code, register allocation, and ! 1645: parser are prime candidates. ! 1646: It might be that doing some or all of the parsing ! 1647: with a recursive descent parser might ! 1648: save enough space and time to be worthwhile; ! 1649: it would certainly ease the problem of moving the ! 1650: compiler to an environment where ! 1651: .I Yacc ! 1652: is not already present. ! 1653: .SH ! 1654: Acknowledgements ! 1655: .PP ! 1656: I would like to thank the many people who have ! 1657: sympathetically, and even enthusiastically, helped me grapple ! 1658: with what has been a frustrating program to write, test, and install. ! 1659: D. M. Ritchie and E. N. Pinson provided needed early ! 1660: encouragement and philosophical guidance; ! 1661: M. E. Lesk, ! 1662: R. Muha, T. G. Peterson, ! 1663: G. Riddle, L. Rosler, ! 1664: R. W. Mitze, ! 1665: B. R. Rowland, ! 1666: S. I. Feldman, ! 1667: and ! 1668: T. B. London ! 1669: have all contributed ideas, gripes, and all, at one time or another, ! 1670: climbed ``into the pits'' with me to help debug. ! 1671: Without their help this effort would have not been possible; ! 1672: with it, it was often kind of fun. ! 1673: \*-S. C. Johnson ! 1674: .sp ! 1675: .PP ! 1676: Many people have contributed fixes and improvements ! 1677: to the current Berkeley version of the compiler. ! 1678: A number of really valuable fixes were contributed by ! 1679: Ralph Campbell, Sam Leffler, Kirk McKusick, Arthur Olsen, Donn Seeley, ! 1680: Don Speck and Chris Torek, ! 1681: but most of the bugs were spotted by ! 1682: the legions of virtuous C programmers ! 1683: who were kind enough to let us know ! 1684: that the compiler was broken ! 1685: and when the heck were we going to get it fixed? ! 1686: Thank you all. ! 1687: \*-Donn Seeley ! 1688: .sp 2 ! 1689: .LP ! 1690: .[ ! 1691: $LIST$ ! 1692: .] ! 1693: .LP
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.