|
|
1.1 root 1: \magnification=1200
2: \centerline{\bf Twig: A Brief Description}
3: \bigskip
4: Twig is a language for processing trees. A twig program consists of a set
5: of pattern-action rules together with associated declarations. Patterns
6: describe trees to be matched. Actions calculate costs, perform tree
7: manipulations and other functions such as emitting code.
8: A twig program is translated by the twig
9: preprocessor into subroutines and tables in the host language which is C in
10: the current implementation.
11:
12: There are many ways to represent trees and costs. To give the user more
13: choice in representation, twig treats them as abstract data
14: types (ADT). Its manipulation of trees and costs are strictly performed
15: through calls to subroutines provided by the user.
16: \medskip
17: \line{\bf Rules\hfil}
18: The fundamental unit of a twig program is a rule.
19: \medskip
20: \line{\hfil\vbox{\hsize=3.5truein
21: \noindent$label\_id$ {\bf :} $tree\_pattern$\quad {\bf [} $cost$ {\bf ]}
22: {\bf [=} $action$ {\bf ]}}\hfil}
23: \medskip
24: Intuitively, the pattern is used to match a subtree. The $cost$ code
25: fragment is then evaluated. The resulting cost is recorded by
26: the matcher for use in dynamic programming. The $action$ is executed
27: if the rule is part of the least cost cover of the tree.
28:
29: If the $cost$ part is missing, Twig will insert default code
30: that returns the special value, {\tt DEFAULT\_COST}.
31: A missing $action$ part indicates that nothing will be done when a
32: match is found.
33: \medskip
34: \line{\bf Tree Patterns\hfil}
35: Tree patterns are specified
36: in prefix parenthesized form and can be described by the
37: following BNF:
38: \medskip
39: \line{\hfil\vbox{\hsize=3.5truein\obeylines\parindent=0pt\parskip=0pt
40: $tree\_pattern\to node\_id\;\vert\;label\_id$
41: $tree\_pattern\to node\_id \;{\bf (}\; tree\_list \;{\bf )}$
42: $tree\_list \to tree \;{\bf ,}\; tree\_list \;\vert\; tree$
43: }\hfil}
44:
45: For example, the tree,
46: \midinsert
47: \vskip 1.0truein
48: \endinsert
49: \noindent will be written in twig as
50: \medskip
51: \line{\hfil\vbox{\hsize=3.5truein\tt oper(leftOp, rightOp)}\hfil}
52: \medskip
53: There are two types of symbols: $node\_id$\/s and $label\_id$\/s
54: $Node\_id$\/s are used to denote internal nodes and leaves.
55: $Label\_id$\/s label tree patterns and are analogous to
56: non-terminals in context free grammars.
57: For example, the following twig rules without their action parts describe
58: simple expression trees with the {\tt plus} operator.
59: \medskip
60: \line{\hfil\vbox{
61: \tt\noindent\halign{ #:\quad & #\cr
62: expr & plus(expr, expr)\cr
63: expr & identifier\cr
64: expr & constant\cr
65: }}\hfil}
66: \medskip
67: \noindent Here, {\tt identifier} and {\tt constant} are $node\_id$\/s
68: representing leaves, and
69: {\tt plus} is a $node\_id$\/s representing an internal node whereas
70: {\tt expr} is a
71: $label\_id$. Leaves of patterns that are $label\_id$\/s are called
72: {\sl labelled leaves}.
73: In the first rule, both leaves of the pattern are labelled.
74:
75: Twig assigns an integer to each node symbol and label symbol. These
76: integers are used by the twig pattern matcher as encodings for the
77: symbols.
78: As the matcher traverses the
79: tree, a user supplied subroutine is called to provide an integer
80: corresponding to each node.
81: \medskip
82: \line{\bf Costs\hfil}
83: To increase the flexibility of representing costs, the tree matcher
84: views costs as an ADT.
85: For example, costs may be represented as an integer or
86: as a vector of integers with each element representing the cost of a specific
87: resource.
88: A cost ADT suitable for Twig must have the following four definitions:
89: \medskip
90: \noindent$\bullet$ {\tt COST} is a C type.
91: Although the proper functioning of the tree matcher
92: does not depend on the details of the {\tt COST} type, it must have
93: the type information for storage allocation purposes.
94: \smallskip
95: \noindent$\bullet$
96: {\tt INFINITY} is the maximum attainable cost value. The matcher uses {\tt
97: INFINITY} to initialize its data structures.
98: \smallskip\noindent$\bullet$
99: {\tt DEFAULT\_COST} is the cost
100: value returned by rules without a cost part.
101: \smallskip\noindent$\bullet$
102: {\tt COSTLESS} is a function of
103: two cost values. It must return true if and only if the first argument is
104: less than the second.
105: \medskip
106: \line{\bf Trees\hfil}
107:
108: As with costs, Twig treats trees as an ADT. Here, using an ADT is
109: even more important because trees come in a variety of shapes and
110: representations.
111: Twig would be overly complicated if it had to
112: know any more than the fundamental properties of trees. Thus,
113: twig treats trees as an acyclic directed
114: graph of almost featureless nodes with one
115: distinguished node, the root. Each node has only one feature and that
116: is an integer corresponding to the $node\_id$\/s discussed above.
117: To provide this
118: view to Twig, the user must provide the following definitions and
119: subroutines.
120: \medskip
121: \noindent$\bullet$ {\tt NODE} is the type of a node. The actual
122: details of
123: the NODE are irrelevant and Twig uses this definition only for storage
124: allocation and declaration purposes.
125: \smallskip
126: \noindent$\bullet$ {\tt NONODE} is a constant of type
127: {\tt NODE}. It is a distinguished
128: value returned by the routines defined below to represent a null value.
129: \smallskip
130: \noindent$\bullet$
131: {\tt mtGetNodes$(r,n)$} returns the $n$th child of $r$ where $r$
132: is a {\tt NODE}. The
133: leftmost child is {\tt mtGetNodes}$(r,1)$. If $n > \hbox{degree}(r)$
134: then the function should return {\tt NONODE}. Twig expects the
135: expression {\tt mtGetNodes}$({\tt NONODE}, 0)$
136: to denote the root node of
137: the subject tree.
138: \smallskip
139: \noindent$\bullet$
140: {\tt mtValue$(r)$} returns the integer associated with $r$.
141: \smallskip
142: \noindent$\bullet$
143: {\tt mtSetNodes$(r,n,c)$} replaces the $n$th child of $r$ with
144: $c$. This routine may be used to replace whole subtrees with others.
145: \medskip
146: \line{ \bf Pattern Matcher Operation \hfil}
147:
148: The pattern matcher operates in two phases: the costing phase and the execution
149: phase.
150:
151: In the costing phase, the matcher performs a preorder traversal of
152: a subject tree and discovers matches from the bottom up. At the same time, it
153: builds a skeleton tree that is structurally isomorphic to the subject tree.
154: When a match is discovered the cost clause of the pattern is invoked to
155: calculate the cost. Many patterns with different labels could match at any
156: given node but only the lowest cost pattern for each label is recorded in
157: the skeleton.
158:
159: When a pattern is matched, its label is then used as input to the pattern
160: matcher so that matching of patterns with labelled leaves can begin.
161: This
162: process
163: is analogous to a reduce action in
164: bottom up parsers. In twig, multiple reductions are
165: tracked.
166:
167: The cost part of a rule can also assign a {\sl mode} to a match. The {\sl
168: mode} controls how action parts are to be executed.
169: Actions parts of {\sl defer} mode matches are not executed until the
170: execution phase. {\sl Rewrite} mode matches cause the action part to
171: be executed immediately during the costing phase to transform the
172: matched subtree. Once the rewrite mode action is executed, the costing
173: phase continues by purging all information in the skeleton for the old
174: subtree and then scanning the newly transformed subtree. {\sl
175: Rewrite} actions are similar to actions that are invoked on a reduction in
176: LR parsing.
177:
178: In the case where multiple rewrite actions are possible the one with
179: the lowest cost will be executed. Furthermore, a rewrite action is
180: considered only if its cost is lower than all the defer mode actions.
181: Rewrite mode actions that are not executed are treated in the same
182: manner as defer mode actions in the execution phase.
183:
184: When the root of the subject tree is reached in the preorder
185: traversal, the execution phase begins. The lowest cost match is then
186: chosen at the root and executed. In this context, execution involves
187: executing the actions of the labelled leaves before invoking the code
188: in the current action.
189: \medskip
190: \line{\bf Lexical Issues and Conventions\hfil}
191: Currently the
192: following are keywords of twig in no particular order:
193: \medskip
194: \line{\hfil\vbox{\hsize=3.5truein\tt
195: \halign{ # \hfil & # \hfil & # \hfil \cr
196: label & node & cost \cr
197: action & prologue & insert\cr
198: }}\hfil}
199: \medskip
200: \noindent They have special meanings and may not be used as
201: identifiers.
202: ;:(),= are special characters. All blanks, tabs, formfeeds and
203: newlines are ignored by twig but they may not appear inside
204: identifiers and numbers. Identifiers are nonempty strings
205: made of letters, digits or underscores and starting with a letter.
206: Numbers are nonempty string of digits. Code fragments or action parts
207: are enclosed in braces. Inside code fragments, C lexical rules must
208: be obeyed except that strings of the form $\$...\$$ that are not inside C
209: strings are interpreted by twig. In the following sections, $id$
210: denotes an identifier; $int_1$, $int_2$ denotes numbers; $ccode$
211: denotes C code fragments; $...$ indicates repetition of the previous
212: item; $[...]$ indicates that $...$ is optional.
213:
214: The input to the Twig program will be referred to as the subject tree.
215: \medskip
216: \line{\bf Prologue and Inserts\hfil}
217:
218: \noindent\line{\hfil \vbox{\hsize=3.5truein \tt prologue $ccode$;}\hfil}
219: \medskip
220: \noindent signals to twig that $ccode$ should be inserted at the
221: beginning of the output source file. $Ccode$ should contain
222: definitions relevant to the C code in
223: rules that are defined later in the twig source file.
224: \medskip
225: \noindent\line{\hfil\tt\vbox{\hsize=3.5truein insert $ccode$;}\hfil}
226: \medskip
227: \noindent should cause $ccode$ to be placed into the source file.
228: There can be multiple inserts and they will be placed into the source
229: file
230: in the order that they appear.
231: \medskip
232: \line{\bf Declarations\hfil}
233: All twig identifiers are declared before they are used.
234: \medskip
235: \line{\hfil \tt node $id[(int_1)] [= int_2] ...$;\hfil}
236: \medskip
237: \noindent A node declaration declares one or more identifier to be
238: associated with nodes of the subject tree. The identifiers are
239: assigned numbers by twig but the user can overide the assigned number
240: by specifying $int_2$. The degree of the node identifer, i.e. the
241: number of children, is assumed to be fixed. Normally, twig will
242: deduce the degree when $id$ is used in a rule. However, the user may
243: explicitly give the degree by specifying $int_1$.
244: \medskip
245: \line{\hfil\vbox{\hsize=3.5truein\tt label $id ...$;}\hfil}
246: \medskip
247: A label declaration indicates that the following $id$'s are to be used
248: as labels of rules.
249: \medskip\line{\bf Costs and Action code\hfil}
250:
251: Code fragments such as the $cost$ and $action$ clauses of a rule are
252: specfied by enclosing C code in braces. All legal C constructs are
253: permitted in code fragments. In addition, the following are provided
254: for convenient access to the subject tree and internal data
255: structures of the matcher.
256: \smallskip
257: {\parindent=0pt
258: $\bullet$ $\$\%n\$$ denotes a pointer value to the matcher data
259: structure for the $n$th labelled leaf. Thus to access the cost value
260: associated with that leaf, the notation $\$\%n\$\to${\tt cost} may be used.
261:
262: $\bullet$ $\$\$$ denotes a pointer value to the root of the subject
263: tree.
264:
265: $\bullet$ $\$n_1.n_2.n_3.\, \hbox{...}\,.n_{k-1}.n_k\$$ denotes
266: a pointer value to the $n_k$th child of the $n_{k-1}th$ child of the
267: $n_{k-2}$th child of \hbox{...} of the $n_1$th child of the root of
268: the subject tree. Each $n_i$ is a positive non-zero integer.
269: }
270: \medskip
271:
272: Some special constructs are available to code fragments in the cost
273: part of the specification. The
274: statement ``{\tt ABORT;}'' when encountered during the execution of
275: the cost code, signals to the matcher that this pattern is to be
276: rejected. When a ``{\tt REWRITE;}'' statement is encountered, control
277: is returned to the matcher and the rule will become a {\sl rewrite}
278: mode match. When the end of the cost code fragment is reached,
279: control is returned to the matcher and the rule becomes a {\sl defer}
280: mode match. Cost values are returned to the matcher by assigning to
281: the ``{\tt cost}'' variable in the cost clause. If no assignment is
282: made to the {\tt cost} variable then the returned cost will be {\tt
283: DEFAULT\_COST}.
284:
285: Action clauses are expected to return a new tree which will replace
286: the subject tree. This is done by returning using the standard C
287: ``{\tt return($new\_tree$); }'' statement
288: where {\tt $new\_tree$} is of type
289: {\tt NODEPTR}. When the end of the action clause is encountered, the
290: matcher resumes execution and the subject tree is not replaced.
291: \vfil\eject
292: \line{\bf An Example\hfil}
293:
294: The following are examples of twig rules that generate VAX code for
295: the subtract instruction:
296: \midinsert
297: \moveright 1truein \vbox{\settabs 9 \columns
298: \+prologue & $\{$\cr
299: \smallskip
300: \+NODE & gettemp();\cr
301: \smallskip
302: \+$\}$;\cr\smallskip
303: \+node&long constant sub;\cr
304: \+label&operand temp;\cr
305: \smallskip
306: \+operand:&long;&&&$/*$ rule 1 $*/$\cr
307: \+operand:&constant;&&&$/*$ rule 2 $*/$\cr
308: \+operand:&temp;&&&$/*$ rule 3 $*/$\cr
309: \smallskip
310: \+temp:&operand&&&$/*$ rule 4 $*/$\cr
311: \+&$\{$ cost = {\tt TEMP\_COST}+$\$\%1\$\to$cost;$\}$\cr
312: \+&$=\{$\cr
313: \+&&NODE t = gettemp();\cr
314: \+&& emit(``mov'', $\$\$$, t, 0);\cr
315: \+&&return(t);\cr
316: \+&$\}$;\cr
317: \smallskip
318: \+operand:&sub(operand,operand)&&&$/*$ rule 5 $*/$\cr
319: \+&$\{$cost = $\$\%1\$\to$cost + $\$\%2\$\to$cost+{\tt SUB\_COST};$\}$\cr
320: \+&=$\{$\cr
321: \+&&NODE t = gettemp();\cr
322: \+&&emit(``sub'', $\$1\$$, $\$2\$$, t, 0);\cr
323: \+&&return(t);\cr
324: \+&$\}$;\cr
325: \smallskip
326: \+temp:&sub(temp,constant)&&&$/*$ rule 6 $*/$\cr
327: \+&$\{$\cr
328: \+&&if(value($\$2\$$)==1)\cr
329: \+&&&cost = $\$\%1\$\to$cost+{\tt DEC\_COST};\cr
330: \+&&else ABORT;\cr
331: \+&$\}$=$\{$\cr
332: \+&&emit(``dec'', $\$1\$$, 0);\cr
333: \+&&return($\$1\$$);\cr
334: \+&$\}$;\cr
335: }
336: \endinsert
337: {\parindent=0pt
338: $\bullet$ Rules 3 and 4 form a loop.
339: The potential loop: temp$\to$operand$\to$temp$\to$operand$...$ is
340: broken
341: by the matcher recognizing that the cost of the second match of temp
342: does not cost less than the first match of temp.
343:
344: $\bullet$ Rule 4 localizes the testing of whether an operand is a
345: temporary. It is not necessary but makes the coding of other
346: cost clauses less tedious.
347:
348: $\bullet$ In the cost clause of rule 5, the cost is the sum of the
349: leaves plus the cost of the subtract instruction.
350: The action clause emits code to add the two operands
351: and leave the result in a temporary location. The temporary is
352: returned as a substitution for the subject tree.
353:
354: $\bullet$ Rule 6 handles a special case where the left operand is
355: already in a temporary and the constant is one. In this case, the
356: temporary is directly decremented and returned as the new tree.
357: }\vfil\eject
358: \end
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.