|
|
1.1 ! root 1: \magnification=1200 ! 2: \centerline{\bf Twig: A Brief Description} ! 3: \bigskip ! 4: Twig is a language for processing trees. A twig program consists of a set ! 5: of pattern-action rules together with associated declarations. Patterns ! 6: describe trees to be matched. Actions calculate costs, perform tree ! 7: manipulations and other functions such as emitting code. ! 8: A twig program is translated by the twig ! 9: preprocessor into subroutines and tables in the host language which is C in ! 10: the current implementation. ! 11: ! 12: There are many ways to represent trees and costs. To give the user more ! 13: choice in representation, twig treats them as abstract data ! 14: types (ADT). Its manipulation of trees and costs are strictly performed ! 15: through calls to subroutines provided by the user. ! 16: \medskip ! 17: \line{\bf Rules\hfil} ! 18: The fundamental unit of a twig program is a rule. ! 19: \medskip ! 20: \line{\hfil\vbox{\hsize=3.5truein ! 21: \noindent$label\_id$ {\bf :} $tree\_pattern$\quad {\bf [} $cost$ {\bf ]} ! 22: {\bf [=} $action$ {\bf ]}}\hfil} ! 23: \medskip ! 24: Intuitively, the pattern is used to match a subtree. The $cost$ code ! 25: fragment is then evaluated. The resulting cost is recorded by ! 26: the matcher for use in dynamic programming. The $action$ is executed ! 27: if the rule is part of the least cost cover of the tree. ! 28: ! 29: If the $cost$ part is missing, Twig will insert default code ! 30: that returns the special value, {\tt DEFAULT\_COST}. ! 31: A missing $action$ part indicates that nothing will be done when a ! 32: match is found. ! 33: \medskip ! 34: \line{\bf Tree Patterns\hfil} ! 35: Tree patterns are specified ! 36: in prefix parenthesized form and can be described by the ! 37: following BNF: ! 38: \medskip ! 39: \line{\hfil\vbox{\hsize=3.5truein\obeylines\parindent=0pt\parskip=0pt ! 40: $tree\_pattern\to node\_id\;\vert\;label\_id$ ! 41: $tree\_pattern\to node\_id \;{\bf (}\; tree\_list \;{\bf )}$ ! 42: $tree\_list \to tree \;{\bf ,}\; tree\_list \;\vert\; tree$ ! 43: }\hfil} ! 44: ! 45: For example, the tree, ! 46: \midinsert ! 47: \vskip 1.0truein ! 48: \endinsert ! 49: \noindent will be written in twig as ! 50: \medskip ! 51: \line{\hfil\vbox{\hsize=3.5truein\tt oper(leftOp, rightOp)}\hfil} ! 52: \medskip ! 53: There are two types of symbols: $node\_id$\/s and $label\_id$\/s ! 54: $Node\_id$\/s are used to denote internal nodes and leaves. ! 55: $Label\_id$\/s label tree patterns and are analogous to ! 56: non-terminals in context free grammars. ! 57: For example, the following twig rules without their action parts describe ! 58: simple expression trees with the {\tt plus} operator. ! 59: \medskip ! 60: \line{\hfil\vbox{ ! 61: \tt\noindent\halign{ #:\quad & #\cr ! 62: expr & plus(expr, expr)\cr ! 63: expr & identifier\cr ! 64: expr & constant\cr ! 65: }}\hfil} ! 66: \medskip ! 67: \noindent Here, {\tt identifier} and {\tt constant} are $node\_id$\/s ! 68: representing leaves, and ! 69: {\tt plus} is a $node\_id$\/s representing an internal node whereas ! 70: {\tt expr} is a ! 71: $label\_id$. Leaves of patterns that are $label\_id$\/s are called ! 72: {\sl labelled leaves}. ! 73: In the first rule, both leaves of the pattern are labelled. ! 74: ! 75: Twig assigns an integer to each node symbol and label symbol. These ! 76: integers are used by the twig pattern matcher as encodings for the ! 77: symbols. ! 78: As the matcher traverses the ! 79: tree, a user supplied subroutine is called to provide an integer ! 80: corresponding to each node. ! 81: \medskip ! 82: \line{\bf Costs\hfil} ! 83: To increase the flexibility of representing costs, the tree matcher ! 84: views costs as an ADT. ! 85: For example, costs may be represented as an integer or ! 86: as a vector of integers with each element representing the cost of a specific ! 87: resource. ! 88: A cost ADT suitable for Twig must have the following four definitions: ! 89: \medskip ! 90: \noindent$\bullet$ {\tt COST} is a C type. ! 91: Although the proper functioning of the tree matcher ! 92: does not depend on the details of the {\tt COST} type, it must have ! 93: the type information for storage allocation purposes. ! 94: \smallskip ! 95: \noindent$\bullet$ ! 96: {\tt INFINITY} is the maximum attainable cost value. The matcher uses {\tt ! 97: INFINITY} to initialize its data structures. ! 98: \smallskip\noindent$\bullet$ ! 99: {\tt DEFAULT\_COST} is the cost ! 100: value returned by rules without a cost part. ! 101: \smallskip\noindent$\bullet$ ! 102: {\tt COSTLESS} is a function of ! 103: two cost values. It must return true if and only if the first argument is ! 104: less than the second. ! 105: \medskip ! 106: \line{\bf Trees\hfil} ! 107: ! 108: As with costs, Twig treats trees as an ADT. Here, using an ADT is ! 109: even more important because trees come in a variety of shapes and ! 110: representations. ! 111: Twig would be overly complicated if it had to ! 112: know any more than the fundamental properties of trees. Thus, ! 113: twig treats trees as an acyclic directed ! 114: graph of almost featureless nodes with one ! 115: distinguished node, the root. Each node has only one feature and that ! 116: is an integer corresponding to the $node\_id$\/s discussed above. ! 117: To provide this ! 118: view to Twig, the user must provide the following definitions and ! 119: subroutines. ! 120: \medskip ! 121: \noindent$\bullet$ {\tt NODE} is the type of a node. The actual ! 122: details of ! 123: the NODE are irrelevant and Twig uses this definition only for storage ! 124: allocation and declaration purposes. ! 125: \smallskip ! 126: \noindent$\bullet$ {\tt NONODE} is a constant of type ! 127: {\tt NODE}. It is a distinguished ! 128: value returned by the routines defined below to represent a null value. ! 129: \smallskip ! 130: \noindent$\bullet$ ! 131: {\tt mtGetNodes$(r,n)$} returns the $n$th child of $r$ where $r$ ! 132: is a {\tt NODE}. The ! 133: leftmost child is {\tt mtGetNodes}$(r,1)$. If $n > \hbox{degree}(r)$ ! 134: then the function should return {\tt NONODE}. Twig expects the ! 135: expression {\tt mtGetNodes}$({\tt NONODE}, 0)$ ! 136: to denote the root node of ! 137: the subject tree. ! 138: \smallskip ! 139: \noindent$\bullet$ ! 140: {\tt mtValue$(r)$} returns the integer associated with $r$. ! 141: \smallskip ! 142: \noindent$\bullet$ ! 143: {\tt mtSetNodes$(r,n,c)$} replaces the $n$th child of $r$ with ! 144: $c$. This routine may be used to replace whole subtrees with others. ! 145: \medskip ! 146: \line{ \bf Pattern Matcher Operation \hfil} ! 147: ! 148: The pattern matcher operates in two phases: the costing phase and the execution ! 149: phase. ! 150: ! 151: In the costing phase, the matcher performs a preorder traversal of ! 152: a subject tree and discovers matches from the bottom up. At the same time, it ! 153: builds a skeleton tree that is structurally isomorphic to the subject tree. ! 154: When a match is discovered the cost clause of the pattern is invoked to ! 155: calculate the cost. Many patterns with different labels could match at any ! 156: given node but only the lowest cost pattern for each label is recorded in ! 157: the skeleton. ! 158: ! 159: When a pattern is matched, its label is then used as input to the pattern ! 160: matcher so that matching of patterns with labelled leaves can begin. ! 161: This ! 162: process ! 163: is analogous to a reduce action in ! 164: bottom up parsers. In twig, multiple reductions are ! 165: tracked. ! 166: ! 167: The cost part of a rule can also assign a {\sl mode} to a match. The {\sl ! 168: mode} controls how action parts are to be executed. ! 169: Actions parts of {\sl defer} mode matches are not executed until the ! 170: execution phase. {\sl Rewrite} mode matches cause the action part to ! 171: be executed immediately during the costing phase to transform the ! 172: matched subtree. Once the rewrite mode action is executed, the costing ! 173: phase continues by purging all information in the skeleton for the old ! 174: subtree and then scanning the newly transformed subtree. {\sl ! 175: Rewrite} actions are similar to actions that are invoked on a reduction in ! 176: LR parsing. ! 177: ! 178: In the case where multiple rewrite actions are possible the one with ! 179: the lowest cost will be executed. Furthermore, a rewrite action is ! 180: considered only if its cost is lower than all the defer mode actions. ! 181: Rewrite mode actions that are not executed are treated in the same ! 182: manner as defer mode actions in the execution phase. ! 183: ! 184: When the root of the subject tree is reached in the preorder ! 185: traversal, the execution phase begins. The lowest cost match is then ! 186: chosen at the root and executed. In this context, execution involves ! 187: executing the actions of the labelled leaves before invoking the code ! 188: in the current action. ! 189: \medskip ! 190: \line{\bf Lexical Issues and Conventions\hfil} ! 191: Currently the ! 192: following are keywords of twig in no particular order: ! 193: \medskip ! 194: \line{\hfil\vbox{\hsize=3.5truein\tt ! 195: \halign{ # \hfil & # \hfil & # \hfil \cr ! 196: label & node & cost \cr ! 197: action & prologue & insert\cr ! 198: }}\hfil} ! 199: \medskip ! 200: \noindent They have special meanings and may not be used as ! 201: identifiers. ! 202: ;:(),= are special characters. All blanks, tabs, formfeeds and ! 203: newlines are ignored by twig but they may not appear inside ! 204: identifiers and numbers. Identifiers are nonempty strings ! 205: made of letters, digits or underscores and starting with a letter. ! 206: Numbers are nonempty string of digits. Code fragments or action parts ! 207: are enclosed in braces. Inside code fragments, C lexical rules must ! 208: be obeyed except that strings of the form $\$...\$$ that are not inside C ! 209: strings are interpreted by twig. In the following sections, $id$ ! 210: denotes an identifier; $int_1$, $int_2$ denotes numbers; $ccode$ ! 211: denotes C code fragments; $...$ indicates repetition of the previous ! 212: item; $[...]$ indicates that $...$ is optional. ! 213: ! 214: The input to the Twig program will be referred to as the subject tree. ! 215: \medskip ! 216: \line{\bf Prologue and Inserts\hfil} ! 217: ! 218: \noindent\line{\hfil \vbox{\hsize=3.5truein \tt prologue $ccode$;}\hfil} ! 219: \medskip ! 220: \noindent signals to twig that $ccode$ should be inserted at the ! 221: beginning of the output source file. $Ccode$ should contain ! 222: definitions relevant to the C code in ! 223: rules that are defined later in the twig source file. ! 224: \medskip ! 225: \noindent\line{\hfil\tt\vbox{\hsize=3.5truein insert $ccode$;}\hfil} ! 226: \medskip ! 227: \noindent should cause $ccode$ to be placed into the source file. ! 228: There can be multiple inserts and they will be placed into the source ! 229: file ! 230: in the order that they appear. ! 231: \medskip ! 232: \line{\bf Declarations\hfil} ! 233: All twig identifiers are declared before they are used. ! 234: \medskip ! 235: \line{\hfil \tt node $id[(int_1)] [= int_2] ...$;\hfil} ! 236: \medskip ! 237: \noindent A node declaration declares one or more identifier to be ! 238: associated with nodes of the subject tree. The identifiers are ! 239: assigned numbers by twig but the user can overide the assigned number ! 240: by specifying $int_2$. The degree of the node identifer, i.e. the ! 241: number of children, is assumed to be fixed. Normally, twig will ! 242: deduce the degree when $id$ is used in a rule. However, the user may ! 243: explicitly give the degree by specifying $int_1$. ! 244: \medskip ! 245: \line{\hfil\vbox{\hsize=3.5truein\tt label $id ...$;}\hfil} ! 246: \medskip ! 247: A label declaration indicates that the following $id$'s are to be used ! 248: as labels of rules. ! 249: \medskip\line{\bf Costs and Action code\hfil} ! 250: ! 251: Code fragments such as the $cost$ and $action$ clauses of a rule are ! 252: specfied by enclosing C code in braces. All legal C constructs are ! 253: permitted in code fragments. In addition, the following are provided ! 254: for convenient access to the subject tree and internal data ! 255: structures of the matcher. ! 256: \smallskip ! 257: {\parindent=0pt ! 258: $\bullet$ $\$\%n\$$ denotes a pointer value to the matcher data ! 259: structure for the $n$th labelled leaf. Thus to access the cost value ! 260: associated with that leaf, the notation $\$\%n\$\to${\tt cost} may be used. ! 261: ! 262: $\bullet$ $\$\$$ denotes a pointer value to the root of the subject ! 263: tree. ! 264: ! 265: $\bullet$ $\$n_1.n_2.n_3.\, \hbox{...}\,.n_{k-1}.n_k\$$ denotes ! 266: a pointer value to the $n_k$th child of the $n_{k-1}th$ child of the ! 267: $n_{k-2}$th child of \hbox{...} of the $n_1$th child of the root of ! 268: the subject tree. Each $n_i$ is a positive non-zero integer. ! 269: } ! 270: \medskip ! 271: ! 272: Some special constructs are available to code fragments in the cost ! 273: part of the specification. The ! 274: statement ``{\tt ABORT;}'' when encountered during the execution of ! 275: the cost code, signals to the matcher that this pattern is to be ! 276: rejected. When a ``{\tt REWRITE;}'' statement is encountered, control ! 277: is returned to the matcher and the rule will become a {\sl rewrite} ! 278: mode match. When the end of the cost code fragment is reached, ! 279: control is returned to the matcher and the rule becomes a {\sl defer} ! 280: mode match. Cost values are returned to the matcher by assigning to ! 281: the ``{\tt cost}'' variable in the cost clause. If no assignment is ! 282: made to the {\tt cost} variable then the returned cost will be {\tt ! 283: DEFAULT\_COST}. ! 284: ! 285: Action clauses are expected to return a new tree which will replace ! 286: the subject tree. This is done by returning using the standard C ! 287: ``{\tt return($new\_tree$); }'' statement ! 288: where {\tt $new\_tree$} is of type ! 289: {\tt NODEPTR}. When the end of the action clause is encountered, the ! 290: matcher resumes execution and the subject tree is not replaced. ! 291: \vfil\eject ! 292: \line{\bf An Example\hfil} ! 293: ! 294: The following are examples of twig rules that generate VAX code for ! 295: the subtract instruction: ! 296: \midinsert ! 297: \moveright 1truein \vbox{\settabs 9 \columns ! 298: \+prologue & $\{$\cr ! 299: \smallskip ! 300: \+NODE & gettemp();\cr ! 301: \smallskip ! 302: \+$\}$;\cr\smallskip ! 303: \+node&long constant sub;\cr ! 304: \+label&operand temp;\cr ! 305: \smallskip ! 306: \+operand:&long;&&&$/*$ rule 1 $*/$\cr ! 307: \+operand:&constant;&&&$/*$ rule 2 $*/$\cr ! 308: \+operand:&temp;&&&$/*$ rule 3 $*/$\cr ! 309: \smallskip ! 310: \+temp:&operand&&&$/*$ rule 4 $*/$\cr ! 311: \+&$\{$ cost = {\tt TEMP\_COST}+$\$\%1\$\to$cost;$\}$\cr ! 312: \+&$=\{$\cr ! 313: \+&&NODE t = gettemp();\cr ! 314: \+&& emit(``mov'', $\$\$$, t, 0);\cr ! 315: \+&&return(t);\cr ! 316: \+&$\}$;\cr ! 317: \smallskip ! 318: \+operand:&sub(operand,operand)&&&$/*$ rule 5 $*/$\cr ! 319: \+&$\{$cost = $\$\%1\$\to$cost + $\$\%2\$\to$cost+{\tt SUB\_COST};$\}$\cr ! 320: \+&=$\{$\cr ! 321: \+&&NODE t = gettemp();\cr ! 322: \+&&emit(``sub'', $\$1\$$, $\$2\$$, t, 0);\cr ! 323: \+&&return(t);\cr ! 324: \+&$\}$;\cr ! 325: \smallskip ! 326: \+temp:&sub(temp,constant)&&&$/*$ rule 6 $*/$\cr ! 327: \+&$\{$\cr ! 328: \+&&if(value($\$2\$$)==1)\cr ! 329: \+&&&cost = $\$\%1\$\to$cost+{\tt DEC\_COST};\cr ! 330: \+&&else ABORT;\cr ! 331: \+&$\}$=$\{$\cr ! 332: \+&&emit(``dec'', $\$1\$$, 0);\cr ! 333: \+&&return($\$1\$$);\cr ! 334: \+&$\}$;\cr ! 335: } ! 336: \endinsert ! 337: {\parindent=0pt ! 338: $\bullet$ Rules 3 and 4 form a loop. ! 339: The potential loop: temp$\to$operand$\to$temp$\to$operand$...$ is ! 340: broken ! 341: by the matcher recognizing that the cost of the second match of temp ! 342: does not cost less than the first match of temp. ! 343: ! 344: $\bullet$ Rule 4 localizes the testing of whether an operand is a ! 345: temporary. It is not necessary but makes the coding of other ! 346: cost clauses less tedious. ! 347: ! 348: $\bullet$ In the cost clause of rule 5, the cost is the sum of the ! 349: leaves plus the cost of the subtract instruction. ! 350: The action clause emits code to add the two operands ! 351: and leave the result in a temporary location. The temporary is ! 352: returned as a substitution for the subject tree. ! 353: ! 354: $\bullet$ Rule 6 handles a special case where the left operand is ! 355: already in a temporary and the constant is one. In this case, the ! 356: temporary is directly decremented and returned as the new tree. ! 357: }\vfil\eject ! 358: \end
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.