|
|
1.1 ! root 1: .EQ ! 2: delim ## ! 3: .EN ! 4: .DS C ! 5: \f3 Twig: A Brief Description\f1 ! 6: .DE ! 7: .sp 2 ex ! 8: .PP ! 9: Twig is a language for processing trees based on the ! 10: algorithms described above. A twig program consists of a set ! 11: of pattern-action rules together with associated declarations. Patterns ! 12: describe trees to be matched. Actions calculate costs, perform tree ! 13: manipulations and other functions such as emitting code. ! 14: A twig program is translated by the twig ! 15: preprocessor into subroutines and tables in the host language which is C. ! 16: .PP ! 17: There are many ways to represent trees and costs. To give the user more ! 18: choice in representation, twig treats them as abstract data ! 19: types (ADT). Its manipulation of trees and costs are strictly performed ! 20: through calls to subroutines provided by the user. ! 21: .sp 1ex ! 22: .SH ! 23: Rules ! 24: .PP ! 25: The fundamental unit of a twig program is a rule. ! 26: .sp 1ex ! 27: .ti 1.5in ! 28: \f2label \f3: \f2tree_pattern \f3[ \f2cost \f3] [ = \f2action \f3]\f1 ! 29: .sp 1ex ! 30: .PP ! 31: Intuitively, the pattern is used to match a subtree. The \f2cost\f1 code ! 32: fragment is then evaluated. The resulting cost is recorded by ! 33: the matcher for use in dynamic programming. The \f2action\f1 is executed ! 34: if the rule is part of the least cost cover of the tree. ! 35: .PP ! 36: If the \f2cost\f1 part is missing, Twig will insert default code ! 37: that returns the special value, DEFAULT_COST. ! 38: A missing \f2action\f1 part indicates that no action will be taken when a ! 39: match is found. ! 40: .SH ! 41: Tree Patterns ! 42: .PP ! 43: Tree patterns are specified ! 44: in prefix parenthesized form and can be described by the ! 45: following BNF: ! 46: .DS ! 47: .na ! 48: .nf ! 49: \f2tree_pattern \(-> node_id | label_id ! 50: tree_pattern \(-> node_id (tree_list) ! 51: tree_list \(-> tree,tree_list | tree ! 52: .DE ! 53: .ad ! 54: .fi ! 55: For example, the tree, ! 56: .DS ! 57: .sp 1in ! 58: .DE ! 59: will be written in twig as ! 60: .sp 1ex ! 61: .DS ! 62: .na ! 63: .nf ! 64: \f3oper(leftOp,rightOp)\f1 ! 65: .DE ! 66: .ad ! 67: .fi ! 68: .PP ! 69: There are two types of symbols: \f2node_id\f1s and \f2label_id\f1s. ! 70: \f2Node_id\f1s are used to denote internal nodes and leaves. ! 71: \f2Label_id\f1s label tree patterns and are analogous to ! 72: non-terminals in context free grammars. ! 73: For example, the following twig rules without their action parts describe ! 74: simple expression trees with the \f3plus\f1 operator. ! 75: .DS ! 76: \f3 ! 77: .na ! 78: .nf ! 79: expr: plus(expr, expr) ! 80: expr: identifier ! 81: expr: constant ! 82: .DE ! 83: .ad ! 84: .fi ! 85: \f3Identifier\f1 and \f3constant\f1 are \f2node_id\f1s ! 86: representing leaves, and ! 87: \f2plus\f1 is a \f2node_id\f1s representing an internal node whereas ! 88: \f3expr\f1 is a ! 89: \f2label_id\f1. Leaves of patterns that are \f2label_id\f1s are called ! 90: \f2labelled leaves\f1. ! 91: In the first rule, both leaves of the pattern are labelled. ! 92: .PP ! 93: Twig assigns an integer to each node symbol and label symbol. These ! 94: integers are used by the twig pattern matcher as encodings for the ! 95: symbols. ! 96: As the matcher traverses the ! 97: tree, a user supplied subroutine is called to provide an integer ! 98: corresponding to each node. ! 99: .SH ! 100: Costs ! 101: .PP ! 102: To increase the flexibility of representing costs, the tree matcher ! 103: views costs as an ADT. ! 104: For example, costs may be represented as an integer or ! 105: as a vector of integers with each element representing the cost of a specific ! 106: resource. ! 107: A cost ADT suitable for Twig must have the following four definitions: ! 108: .IP ! 109: \(bu \f3COST\f1 is a C type. ! 110: Although the proper functioning of the tree matcher ! 111: does not depend on the details of the \f3COST\f1 type, it must have ! 112: the type information for storage allocation purposes. ! 113: .IP ! 114: \(bu ! 115: \f3INFINITY\f1 is the maximum attainable cost value. The matcher uses ! 116: \f3INFINITY\f1 to initialize its data structures. ! 117: .IP ! 118: \(bu ! 119: \f3DEFAULT_COST\f1 is the cost ! 120: value returned by rules without a cost part. ! 121: .IP ! 122: \(bu ! 123: \f3COSTLESS\f1 is a function of ! 124: two cost values. It must return true if and only if the first argument is ! 125: less than the second. ! 126: .sp 2ex ! 127: .SH ! 128: Trees ! 129: .PP ! 130: As with costs, Twig treats trees as an ADT. Here, using an ADT is ! 131: even more important because trees come in a variety of shapes and ! 132: representations. ! 133: Twig would be overly complicated if it had to ! 134: know any more than the fundamental properties of trees. Thus, ! 135: twig treats trees as an acyclic directed ! 136: graph of almost featureless nodes with one ! 137: distinguished node, the root. Each node has only one feature and that ! 138: is an integer corresponding to the \f2node_id\f1s discussed above. ! 139: To provide this ! 140: view to Twig, the user must provide the following definitions and ! 141: subroutines. ! 142: .IP ! 143: \(bu \f3NODE\f1 is the type of a node. The actual ! 144: details of ! 145: the NODE are irrelevant and Twig uses this definition only for storage ! 146: allocation and declaration purposes. ! 147: .IP ! 148: \(bu \f3NONODE\f1 is a constant of type \f3NODE\f1. It is a distinguished ! 149: value returned by the routines defined below to represent a null value. ! 150: .IP ! 151: \(bu ! 152: \f3mtGetNodes#(r,n)# returns the #n#th child of \f2r\f1 where \f2r\f1 ! 153: is a \f3NODE\f1. The ! 154: leftmost child is \f3mtGetNodes\f2(r,1)\f1. If #n>degree(r)# ! 155: then the function should return \f3NONODE\f1. Twig expects the ! 156: expression \f3mtGetNodes\f2(\f3NONODE\f2, 0)\f1 ! 157: to denote the root node of ! 158: the subject tree. ! 159: .IP ! 160: \(bu ! 161: \f3mtValue\f2(r)\f1 returns the integer associated with \f2r\f1. ! 162: .IP ! 163: \(bu ! 164: \f3mtSetNodes\f2(r,n,c)\f1 replaces the \f2n\f1th child of \f2r\f1 with ! 165: \f2c\f1. This routine may be used to replace whole subtrees with others. ! 166: .sp 2ex ! 167: .SH ! 168: Pattern Matcher Operation ! 169: .PP ! 170: The pattern matcher operates in two phases: the costing phase and the ! 171: execution phase. ! 172: .PP ! 173: In the costing phase, the matcher performs a preorder traversal of ! 174: a subject tree and discovers matches from the bottom up. At the same time, it ! 175: builds a skeleton tree that is structurally isomorphic to the subject tree. ! 176: When a match is discovered the cost clause of the pattern is invoked to ! 177: calculate the cost. Many patterns with different labels could match at any ! 178: given node but only the lowest cost pattern for each label is recorded in ! 179: the skeleton. ! 180: .PP ! 181: When a pattern is matched, its label is then used as input to the pattern ! 182: matcher so that matching of patterns with labelled leaves can begin. ! 183: This ! 184: process ! 185: is analogous to a reduce action in ! 186: bottom up parsers. In twig, multiple reductions are ! 187: tracked. ! 188: .PP ! 189: The cost part of a rule can also assign a \f2mode\f1 to a match. The ! 190: \f2mode\f1 controls how action parts are to be executed. ! 191: Actions parts of \f2defer\f1 mode matches are not executed until the ! 192: execution phase. \f2Rewrite\f1 mode matches cause the action part to ! 193: be executed immediately during the costing phase to transform the ! 194: matched subtree. Once the rewrite mode action is executed, the costing ! 195: phase continues by purging all information in the skeleton for the old ! 196: subtree and then scanning the newly transformed subtree. ! 197: \f2Rewrite\f1 actions are similar ! 198: to actions that are invoked on a reduction in ! 199: LR parsing. ! 200: .PP ! 201: In the case where multiple rewrite actions are possible the one with ! 202: the lowest cost will be executed. Furthermore, a rewrite action is ! 203: considered only if its cost is lower than all the defer mode actions. ! 204: Rewrite mode actions that are not executed are treated in the same ! 205: manner as defer mode actions in the execution phase. ! 206: .PP ! 207: When the root of the subject tree is reached in the preorder ! 208: traversal, the execution phase begins. The lowest cost match is then ! 209: chosen at the root and executed. In this context, execution involves ! 210: executing the actions of the labelled leaves before invoking the code ! 211: in the current action. ! 212: .SH ! 213: Lexical Issues and Conventions ! 214: .PP ! 215: Currently the ! 216: following are keywords of twig in no particular order: ! 217: .DS ! 218: .TS ! 219: center; ! 220: l l l. ! 221: label node cost ! 222: action prologue insert ! 223: .TE ! 224: .DE ! 225: They have special meanings and may not be used as ! 226: identifiers. ! 227: ;:(),= are special characters. All blanks, tabs, formfeeds and ! 228: newlines are ignored by twig but they may not appear inside ! 229: identifiers and numbers. Identifiers are nonempty strings ! 230: made of letters, digits or underscores and starting with a letter. ! 231: Numbers are nonempty string of digits. Code fragments or action parts ! 232: are enclosed in braces. Inside code fragments, C lexical rules must ! 233: be obeyed except that strings of the form $...$ that are not inside C ! 234: strings are interpreted by twig. In the following sections, $id$ ! 235: denotes an identifier; \f2int1\f1, \f2int2\f1 denotes numbers; \f2ccode\f1 ! 236: denotes C code fragments; ... indicates repetition of the previous ! 237: item; [...] indicates that ... is optional. ! 238: .PP ! 239: The input to the Twig program will be referred to as the subject tree. ! 240: .sp 2ex ! 241: .SH ! 242: Prologue and Inserts ! 243: \f1 ! 244: .DS ! 245: \f3prologue \f2ccode\f1; ! 246: .DE ! 247: signals to twig that \f2ccode\f1 should be inserted at the ! 248: beginning of the output source file. \f2Ccode\f1 should contain ! 249: definitions relevant to the C code in ! 250: rules that are defined later in the twig source file. ! 251: .DS ! 252: \f3insert \f2ccode\f1; ! 253: .DE ! 254: should cause \f2ccode\f1 to be placed into the source file. ! 255: There can be multiple inserts and they will be placed into the source ! 256: file ! 257: in the order that they appear. ! 258: .SH ! 259: Declarations ! 260: .PP ! 261: All twig identifiers are declared before they are used. ! 262: .DS ! 263: \f3node\f1 \f2id\f1[(\f2int1\f1)] [= \f2int2\f1] ...; ! 264: .DE ! 265: A node declaration declares one or more identifier to be ! 266: associated with nodes of the subject tree. The identifiers are ! 267: assigned numbers by twig but the user can overide the assigned number ! 268: by specifying \f2int2\f1. The degree of the node identifer, i.e. the ! 269: number of children, is assumed to be fixed. Normally, twig will ! 270: deduce the degree when \f2id\f1 is used in a rule. However, the user may ! 271: explicitly give the degree by specifying \f2int1\f1. ! 272: .DS ! 273: \f3label\f2 id\f1 ...; ! 274: .DE ! 275: A label declaration indicates that the following $id$'s are to be used ! 276: as labels of rules. ! 277: .sp 2ex ! 278: .SH ! 279: Costs and Action code ! 280: .PP ! 281: Code fragments such as the \f2cost\f1 and \f2action\f1 clauses of a rule are ! 282: specfied by enclosing C code in braces. All legal C constructs are ! 283: permitted in code fragments. In addition, the following are provided ! 284: for convenient access to the subject tree and internal data ! 285: structures of the matcher. ! 286: .sp 1ex ! 287: .IP ! 288: \(bu $%\f2n\f1$ denotes a pointer value to the matcher data ! 289: structure for the $n$th labelled leaf. Thus to access the cost value ! 290: associated with that leaf, the notation $%\f2n\f1$\(->cost may be used. ! 291: .IP ! 292: \(bu ! 293: $$ denotes a pointer value to the root of the subject ! 294: tree. ! 295: .IP ! 296: \(bu ! 297: $#{n sub 1}.{n sub 2}.{n sub 3}...{n sub {k-1}}.{n sub k}#$ denotes ! 298: a pointer value to the ${n sub k}$th child of the $#{n sub {k-1}}#th$ ! 299: child of the ! 300: $#{n sub {k-2}}#$th child of ... of the $#{n sub 1 }#$th child of the root of ! 301: the subject tree. Each $#{n sub i}#$ is a positive non-zero integer. ! 302: .PP ! 303: Some special constructs are available to code fragments in the cost ! 304: part of the specification. The ! 305: statement "\f3ABORT;\f1" when encountered during the execution of ! 306: the cost code, signals to the matcher that this pattern is to be ! 307: rejected. When a "\f3REWRITE;\f1" statement is encountered, control ! 308: is returned to the matcher and the rule will become a \f2rewrite\f1 ! 309: mode match. When the end of the cost code fragment is reached, ! 310: control is returned to the matcher and the rule becomes a \f2defer\f1 ! 311: mode match. Cost values are returned to the matcher by assigning to ! 312: a variable named ! 313: "\f3cost\f1" in the cost clause. If no assignment is ! 314: made to \f3cost\f1 then the returned cost will be ! 315: \f3DEFAULT_COST\f1. ! 316: .PP ! 317: Action clauses are expected to return a new tree which will replace ! 318: the subject tree. This is done by returning using the standard C ! 319: "\f3return\f2(new_tree);\f1" statement ! 320: where \f2new_tree\f1 is of type ! 321: \f3NODEPTR\f1. When the end of the action clause is encountered, the ! 322: matcher resumes execution and the subject tree is not replaced. ! 323: .bp ! 324: .SH ! 325: An Example ! 326: .PP ! 327: The following are examples of twig rules that generate VAX code for ! 328: the subtract instruction: ! 329: .DS ! 330: .na ! 331: .nf ! 332: prologue { ! 333: NODE gettermp(); ! 334: }; ! 335: node long constant sub; ! 336: label operand temp; ! 337: ! 338: operand: long; /* rule 1 */ ! 339: operand: constant; /* rule 2 */ ! 340: operand: temp; /* rule 3 */ ! 341: temp: operand; /* rule 4 */ ! 342: { cost = TEMP_COST+$%1$\(->cost; } ! 343: = { ! 344: NODE t = gettemp(); ! 345: emit("mov", $$, t, 0); ! 346: return(t); ! 347: }; ! 348: ! 349: operand: sub(operand,opernad) /* rule 5 */ ! 350: { cost = $%1$\(->cost-$%2$\(->cost+SUB_COST; } ! 351: = { ! 352: NODE t = gettemp(); ! 353: emit("sub", $1$, $2$, t, 0); ! 354: return(t); ! 355: }; ! 356: ! 357: temp: sub(temp,constant) /* rule 6 $*/ ! 358: { ! 359: if(value($2$)==1) ! 360: cost = $%1$\(->cost+DEC_COST; ! 361: else ABORT; ! 362: } = { ! 363: emit("dec", $1$, 0); ! 364: return($1$); ! 365: }; ! 366: .DE ! 367: .IP ! 368: \(bu ! 369: Rules 3 and 4 form a loop. ! 370: The potential loop: temp\(->operand\(->temp\(->operand... is ! 371: broken ! 372: by the matcher recognizing that the cost of the second match of temp ! 373: does not cost less than the first match of temp. ! 374: .IP ! 375: \(bu In the cost clause of rule 5, the cost is the sum of the ! 376: leaves plus the cost of the subtract instruction. ! 377: The action clause emits code to add the two operands ! 378: and leave the result in a temporary location. The temporary is ! 379: returned as a substitution for the subject tree. ! 380: .IP ! 381: \(bu Rule 6 handles a special case where the left operand is ! 382: already in a temporary and the constant is one. In this case, the ! 383: temporary is directly decremented and returned as the new tree. ! 384: .bp ! 385: .DS C ! 386: \f3Experimental Results\f1 ! 387: .DE ! 388: .sp 2ex ! 389: .PP ! 390: A code generator for the VAX has been build using Twig and incorporated ! 391: into the pcc2 compiler. The following table summarizes the results ! 392: for three benchmark programs. ! 393: The compiler using the Twig code generator is abbreviated \f2tcc\f1. ! 394: The times were obtained on a VAX-11/780 ! 395: running 4.2bsd UNIX. ! 396: The first number is the time spent in the compiler. ! 397: The parenthesized number is the time spent in the operating system. ! 398: .TS ! 399: center; ! 400: c c c c c ! 401: l n n n n. ! 402: lines tcc pcc2 improvement ! 403: _ ! 404: grep.c 458 9.9(1.4) 11.4(0.9) 13% ! 405: pmach.c 610 13.2(1.5) 13.5(0.6) 2% ! 406: reader.c 1005 23.3(1.6) 31.2(0.9) 25% ! 407: .TE ! 408: .PP ! 409: The running time for tcc is slightly ! 410: faster than that of pcc2. ! 411: However, The higher system times of tcc indicates that tcc place ! 412: higher demands on the operating system. ! 413: In the case of pmach.c, the total CPU time is actually more than that ! 414: of pcc2's. ! 415: .PP ! 416: Twig requires more dynamic storage than pcc2. For a tree, the dynamic ! 417: storage requirement on the average is number of nodes * 200 ! 418: bytes ! 419: while the worst case is about number of nodes * 1000 bytes. ! 420: However this storage is reclaimed and reused for every new tree. ! 421: The larger system time ! 422: is caused by calls to the runtime system for dynamic storage. ! 423: .PP ! 424: Writing and modifying tcc was quick. ! 425: The basic ! 426: VAX without indexed addressing mode can be described in about two weeks. ! 427: This was done concurrently with debugging much of the table ! 428: walker. Adding indexed modes took ! 429: only several hours once we were confident that the rest of the ! 430: description was correct. ! 431: .PP ! 432: One reason for the ease of modifcation is that rules can be added ! 433: independent of the other rules. Looping can be completely ignored since ! 434: dynamic programming eliminates that possiblity. ! 435: Table generation was extremely fast or about 4.2 seconds for the ! 436: VAX description. Thus testing the description could be done quickly. ! 437: .PP ! 438: The tables were very small. For the VAX, there were only 109 rules. ! 439: This required 7.5K bytes of table space. ! 440: The whole code generator is another 40K ! 441: bytes. ! 442: .SH ! 443: Code Quality ! 444: .PP ! 445: The code generated by the Twig for the test programs were of the ! 446: same quality as code generated by pcc2. ! 447: However, Twig does better targeting. ! 448: .DS ! 449: tcc: ! 450: .ti +4em ! 451: addl3 -4(fp),-8(fp),-12(fp) ! 452: .sp 1ex ! 453: pcc2: ! 454: .ti +4em ! 455: addl3 -4(fp),-8(fp),r0 ! 456: .ti +4em ! 457: movl r0,-12(fp) ! 458: .DE ! 459: On the other hand, pcc2 handled temporary registers better ! 460: and for an equivalent sequence of code, Twig may use ! 461: more temporary registers. ! 462: .DS ! 463: tcc: ! 464: .ti +4em ! 465: cvtbl (r0),r1 ! 466: pcc2: ! 467: .ti +4em ! 468: cvtbl (r0),r0 ! 469: .DE ! 470: This is actually a symptom of how temporary registers are allocated in tcc. ! 471: Registers are allocated before code is emitted and thus any registers freed ! 472: during code emissions will not be reused until the next code sequence. ! 473: .SH ! 474: Room for improvement ! 475: .PP ! 476: Currently Twig is slow compared to other table-driven systems such as ! 477: [Henry] and [Ganapathi]. However, we have had less than one year of ! 478: experience with Twig and we believe that its performance can be ! 479: improved because ! 480: .IP ! 481: \(bu Unit productions such as ! 482: .DS ! 483: operand: temp ! 484: .DE ! 485: in the above example increases the running time of Twig ! 486: since the closure operation for unit production is done at run-time. ! 487: .IP ! 488: \(bu Machine representations. The current compact ! 489: representation of the machine require linear searching except at the ! 490: start state where the large branching factor compelled us to use an ! 491: array scheme. Using another representation would speed up the ! 492: matcher. For example, implementing the VAX description as an array is ! 493: estimated to use about 50K bytes but would provide a significant ! 494: performance improvement. ! 495: .IP ! 496: \(bu Twig performs many more procedure calls than pcc2. ! 497: Every machine step requires at least one procedure call. Contrast ! 498: this to YACC where a machine step is performed in line. Procedure ! 499: calls on the VAX are expensive.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.