|
|
1.1 root 1: .EQ
2: delim ##
3: .EN
4: .DS C
5: \f3 Twig: A Brief Description\f1
6: .DE
7: .sp 2 ex
8: .PP
9: Twig is a language for processing trees based on the
10: algorithms described above. A twig program consists of a set
11: of pattern-action rules together with associated declarations. Patterns
12: describe trees to be matched. Actions calculate costs, perform tree
13: manipulations and other functions such as emitting code.
14: A twig program is translated by the twig
15: preprocessor into subroutines and tables in the host language which is C.
16: .PP
17: There are many ways to represent trees and costs. To give the user more
18: choice in representation, twig treats them as abstract data
19: types (ADT). Its manipulation of trees and costs are strictly performed
20: through calls to subroutines provided by the user.
21: .sp 1ex
22: .SH
23: Rules
24: .PP
25: The fundamental unit of a twig program is a rule.
26: .sp 1ex
27: .ti 1.5in
28: \f2label \f3: \f2tree_pattern \f3[ \f2cost \f3] [ = \f2action \f3]\f1
29: .sp 1ex
30: .PP
31: Intuitively, the pattern is used to match a subtree. The \f2cost\f1 code
32: fragment is then evaluated. The resulting cost is recorded by
33: the matcher for use in dynamic programming. The \f2action\f1 is executed
34: if the rule is part of the least cost cover of the tree.
35: .PP
36: If the \f2cost\f1 part is missing, Twig will insert default code
37: that returns the special value, DEFAULT_COST.
38: A missing \f2action\f1 part indicates that no action will be taken when a
39: match is found.
40: .SH
41: Tree Patterns
42: .PP
43: Tree patterns are specified
44: in prefix parenthesized form and can be described by the
45: following BNF:
46: .DS
47: .na
48: .nf
49: \f2tree_pattern \(-> node_id | label_id
50: tree_pattern \(-> node_id (tree_list)
51: tree_list \(-> tree,tree_list | tree
52: .DE
53: .ad
54: .fi
55: For example, the tree,
56: .DS
57: .sp 1in
58: .DE
59: will be written in twig as
60: .sp 1ex
61: .DS
62: .na
63: .nf
64: \f3oper(leftOp,rightOp)\f1
65: .DE
66: .ad
67: .fi
68: .PP
69: There are two types of symbols: \f2node_id\f1s and \f2label_id\f1s.
70: \f2Node_id\f1s are used to denote internal nodes and leaves.
71: \f2Label_id\f1s label tree patterns and are analogous to
72: non-terminals in context free grammars.
73: For example, the following twig rules without their action parts describe
74: simple expression trees with the \f3plus\f1 operator.
75: .DS
76: \f3
77: .na
78: .nf
79: expr: plus(expr, expr)
80: expr: identifier
81: expr: constant
82: .DE
83: .ad
84: .fi
85: \f3Identifier\f1 and \f3constant\f1 are \f2node_id\f1s
86: representing leaves, and
87: \f2plus\f1 is a \f2node_id\f1s representing an internal node whereas
88: \f3expr\f1 is a
89: \f2label_id\f1. Leaves of patterns that are \f2label_id\f1s are called
90: \f2labelled leaves\f1.
91: In the first rule, both leaves of the pattern are labelled.
92: .PP
93: Twig assigns an integer to each node symbol and label symbol. These
94: integers are used by the twig pattern matcher as encodings for the
95: symbols.
96: As the matcher traverses the
97: tree, a user supplied subroutine is called to provide an integer
98: corresponding to each node.
99: .SH
100: Costs
101: .PP
102: To increase the flexibility of representing costs, the tree matcher
103: views costs as an ADT.
104: For example, costs may be represented as an integer or
105: as a vector of integers with each element representing the cost of a specific
106: resource.
107: A cost ADT suitable for Twig must have the following four definitions:
108: .IP
109: \(bu \f3COST\f1 is a C type.
110: Although the proper functioning of the tree matcher
111: does not depend on the details of the \f3COST\f1 type, it must have
112: the type information for storage allocation purposes.
113: .IP
114: \(bu
115: \f3INFINITY\f1 is the maximum attainable cost value. The matcher uses
116: \f3INFINITY\f1 to initialize its data structures.
117: .IP
118: \(bu
119: \f3DEFAULT_COST\f1 is the cost
120: value returned by rules without a cost part.
121: .IP
122: \(bu
123: \f3COSTLESS\f1 is a function of
124: two cost values. It must return true if and only if the first argument is
125: less than the second.
126: .sp 2ex
127: .SH
128: Trees
129: .PP
130: As with costs, Twig treats trees as an ADT. Here, using an ADT is
131: even more important because trees come in a variety of shapes and
132: representations.
133: Twig would be overly complicated if it had to
134: know any more than the fundamental properties of trees. Thus,
135: twig treats trees as an acyclic directed
136: graph of almost featureless nodes with one
137: distinguished node, the root. Each node has only one feature and that
138: is an integer corresponding to the \f2node_id\f1s discussed above.
139: To provide this
140: view to Twig, the user must provide the following definitions and
141: subroutines.
142: .IP
143: \(bu \f3NODE\f1 is the type of a node. The actual
144: details of
145: the NODE are irrelevant and Twig uses this definition only for storage
146: allocation and declaration purposes.
147: .IP
148: \(bu \f3NONODE\f1 is a constant of type \f3NODE\f1. It is a distinguished
149: value returned by the routines defined below to represent a null value.
150: .IP
151: \(bu
152: \f3mtGetNodes#(r,n)# returns the #n#th child of \f2r\f1 where \f2r\f1
153: is a \f3NODE\f1. The
154: leftmost child is \f3mtGetNodes\f2(r,1)\f1. If #n>degree(r)#
155: then the function should return \f3NONODE\f1. Twig expects the
156: expression \f3mtGetNodes\f2(\f3NONODE\f2, 0)\f1
157: to denote the root node of
158: the subject tree.
159: .IP
160: \(bu
161: \f3mtValue\f2(r)\f1 returns the integer associated with \f2r\f1.
162: .IP
163: \(bu
164: \f3mtSetNodes\f2(r,n,c)\f1 replaces the \f2n\f1th child of \f2r\f1 with
165: \f2c\f1. This routine may be used to replace whole subtrees with others.
166: .sp 2ex
167: .SH
168: Pattern Matcher Operation
169: .PP
170: The pattern matcher operates in two phases: the costing phase and the
171: execution phase.
172: .PP
173: In the costing phase, the matcher performs a preorder traversal of
174: a subject tree and discovers matches from the bottom up. At the same time, it
175: builds a skeleton tree that is structurally isomorphic to the subject tree.
176: When a match is discovered the cost clause of the pattern is invoked to
177: calculate the cost. Many patterns with different labels could match at any
178: given node but only the lowest cost pattern for each label is recorded in
179: the skeleton.
180: .PP
181: When a pattern is matched, its label is then used as input to the pattern
182: matcher so that matching of patterns with labelled leaves can begin.
183: This
184: process
185: is analogous to a reduce action in
186: bottom up parsers. In twig, multiple reductions are
187: tracked.
188: .PP
189: The cost part of a rule can also assign a \f2mode\f1 to a match. The
190: \f2mode\f1 controls how action parts are to be executed.
191: Actions parts of \f2defer\f1 mode matches are not executed until the
192: execution phase. \f2Rewrite\f1 mode matches cause the action part to
193: be executed immediately during the costing phase to transform the
194: matched subtree. Once the rewrite mode action is executed, the costing
195: phase continues by purging all information in the skeleton for the old
196: subtree and then scanning the newly transformed subtree.
197: \f2Rewrite\f1 actions are similar
198: to actions that are invoked on a reduction in
199: LR parsing.
200: .PP
201: In the case where multiple rewrite actions are possible the one with
202: the lowest cost will be executed. Furthermore, a rewrite action is
203: considered only if its cost is lower than all the defer mode actions.
204: Rewrite mode actions that are not executed are treated in the same
205: manner as defer mode actions in the execution phase.
206: .PP
207: When the root of the subject tree is reached in the preorder
208: traversal, the execution phase begins. The lowest cost match is then
209: chosen at the root and executed. In this context, execution involves
210: executing the actions of the labelled leaves before invoking the code
211: in the current action.
212: .SH
213: Lexical Issues and Conventions
214: .PP
215: Currently the
216: following are keywords of twig in no particular order:
217: .DS
218: .TS
219: center;
220: l l l.
221: label node cost
222: action prologue insert
223: .TE
224: .DE
225: They have special meanings and may not be used as
226: identifiers.
227: ;:(),= are special characters. All blanks, tabs, formfeeds and
228: newlines are ignored by twig but they may not appear inside
229: identifiers and numbers. Identifiers are nonempty strings
230: made of letters, digits or underscores and starting with a letter.
231: Numbers are nonempty string of digits. Code fragments or action parts
232: are enclosed in braces. Inside code fragments, C lexical rules must
233: be obeyed except that strings of the form $...$ that are not inside C
234: strings are interpreted by twig. In the following sections, $id$
235: denotes an identifier; \f2int1\f1, \f2int2\f1 denotes numbers; \f2ccode\f1
236: denotes C code fragments; ... indicates repetition of the previous
237: item; [...] indicates that ... is optional.
238: .PP
239: The input to the Twig program will be referred to as the subject tree.
240: .sp 2ex
241: .SH
242: Prologue and Inserts
243: \f1
244: .DS
245: \f3prologue \f2ccode\f1;
246: .DE
247: signals to twig that \f2ccode\f1 should be inserted at the
248: beginning of the output source file. \f2Ccode\f1 should contain
249: definitions relevant to the C code in
250: rules that are defined later in the twig source file.
251: .DS
252: \f3insert \f2ccode\f1;
253: .DE
254: should cause \f2ccode\f1 to be placed into the source file.
255: There can be multiple inserts and they will be placed into the source
256: file
257: in the order that they appear.
258: .SH
259: Declarations
260: .PP
261: All twig identifiers are declared before they are used.
262: .DS
263: \f3node\f1 \f2id\f1[(\f2int1\f1)] [= \f2int2\f1] ...;
264: .DE
265: A node declaration declares one or more identifier to be
266: associated with nodes of the subject tree. The identifiers are
267: assigned numbers by twig but the user can overide the assigned number
268: by specifying \f2int2\f1. The degree of the node identifer, i.e. the
269: number of children, is assumed to be fixed. Normally, twig will
270: deduce the degree when \f2id\f1 is used in a rule. However, the user may
271: explicitly give the degree by specifying \f2int1\f1.
272: .DS
273: \f3label\f2 id\f1 ...;
274: .DE
275: A label declaration indicates that the following $id$'s are to be used
276: as labels of rules.
277: .sp 2ex
278: .SH
279: Costs and Action code
280: .PP
281: Code fragments such as the \f2cost\f1 and \f2action\f1 clauses of a rule are
282: specfied by enclosing C code in braces. All legal C constructs are
283: permitted in code fragments. In addition, the following are provided
284: for convenient access to the subject tree and internal data
285: structures of the matcher.
286: .sp 1ex
287: .IP
288: \(bu $%\f2n\f1$ denotes a pointer value to the matcher data
289: structure for the $n$th labelled leaf. Thus to access the cost value
290: associated with that leaf, the notation $%\f2n\f1$\(->cost may be used.
291: .IP
292: \(bu
293: $$ denotes a pointer value to the root of the subject
294: tree.
295: .IP
296: \(bu
297: $#{n sub 1}.{n sub 2}.{n sub 3}...{n sub {k-1}}.{n sub k}#$ denotes
298: a pointer value to the ${n sub k}$th child of the $#{n sub {k-1}}#th$
299: child of the
300: $#{n sub {k-2}}#$th child of ... of the $#{n sub 1 }#$th child of the root of
301: the subject tree. Each $#{n sub i}#$ is a positive non-zero integer.
302: .PP
303: Some special constructs are available to code fragments in the cost
304: part of the specification. The
305: statement "\f3ABORT;\f1" when encountered during the execution of
306: the cost code, signals to the matcher that this pattern is to be
307: rejected. When a "\f3REWRITE;\f1" statement is encountered, control
308: is returned to the matcher and the rule will become a \f2rewrite\f1
309: mode match. When the end of the cost code fragment is reached,
310: control is returned to the matcher and the rule becomes a \f2defer\f1
311: mode match. Cost values are returned to the matcher by assigning to
312: a variable named
313: "\f3cost\f1" in the cost clause. If no assignment is
314: made to \f3cost\f1 then the returned cost will be
315: \f3DEFAULT_COST\f1.
316: .PP
317: Action clauses are expected to return a new tree which will replace
318: the subject tree. This is done by returning using the standard C
319: "\f3return\f2(new_tree);\f1" statement
320: where \f2new_tree\f1 is of type
321: \f3NODEPTR\f1. When the end of the action clause is encountered, the
322: matcher resumes execution and the subject tree is not replaced.
323: .bp
324: .SH
325: An Example
326: .PP
327: The following are examples of twig rules that generate VAX code for
328: the subtract instruction:
329: .DS
330: .na
331: .nf
332: prologue {
333: NODE gettermp();
334: };
335: node long constant sub;
336: label operand temp;
337:
338: operand: long; /* rule 1 */
339: operand: constant; /* rule 2 */
340: operand: temp; /* rule 3 */
341: temp: operand; /* rule 4 */
342: { cost = TEMP_COST+$%1$\(->cost; }
343: = {
344: NODE t = gettemp();
345: emit("mov", $$, t, 0);
346: return(t);
347: };
348:
349: operand: sub(operand,opernad) /* rule 5 */
350: { cost = $%1$\(->cost-$%2$\(->cost+SUB_COST; }
351: = {
352: NODE t = gettemp();
353: emit("sub", $1$, $2$, t, 0);
354: return(t);
355: };
356:
357: temp: sub(temp,constant) /* rule 6 $*/
358: {
359: if(value($2$)==1)
360: cost = $%1$\(->cost+DEC_COST;
361: else ABORT;
362: } = {
363: emit("dec", $1$, 0);
364: return($1$);
365: };
366: .DE
367: .IP
368: \(bu
369: Rules 3 and 4 form a loop.
370: The potential loop: temp\(->operand\(->temp\(->operand... is
371: broken
372: by the matcher recognizing that the cost of the second match of temp
373: does not cost less than the first match of temp.
374: .IP
375: \(bu In the cost clause of rule 5, the cost is the sum of the
376: leaves plus the cost of the subtract instruction.
377: The action clause emits code to add the two operands
378: and leave the result in a temporary location. The temporary is
379: returned as a substitution for the subject tree.
380: .IP
381: \(bu Rule 6 handles a special case where the left operand is
382: already in a temporary and the constant is one. In this case, the
383: temporary is directly decremented and returned as the new tree.
384: .bp
385: .DS C
386: \f3Experimental Results\f1
387: .DE
388: .sp 2ex
389: .PP
390: A code generator for the VAX has been build using Twig and incorporated
391: into the pcc2 compiler. The following table summarizes the results
392: for three benchmark programs.
393: The compiler using the Twig code generator is abbreviated \f2tcc\f1.
394: The times were obtained on a VAX-11/780
395: running 4.2bsd UNIX.
396: The first number is the time spent in the compiler.
397: The parenthesized number is the time spent in the operating system.
398: .TS
399: center;
400: c c c c c
401: l n n n n.
402: lines tcc pcc2 improvement
403: _
404: grep.c 458 9.9(1.4) 11.4(0.9) 13%
405: pmach.c 610 13.2(1.5) 13.5(0.6) 2%
406: reader.c 1005 23.3(1.6) 31.2(0.9) 25%
407: .TE
408: .PP
409: The running time for tcc is slightly
410: faster than that of pcc2.
411: However, The higher system times of tcc indicates that tcc place
412: higher demands on the operating system.
413: In the case of pmach.c, the total CPU time is actually more than that
414: of pcc2's.
415: .PP
416: Twig requires more dynamic storage than pcc2. For a tree, the dynamic
417: storage requirement on the average is number of nodes * 200
418: bytes
419: while the worst case is about number of nodes * 1000 bytes.
420: However this storage is reclaimed and reused for every new tree.
421: The larger system time
422: is caused by calls to the runtime system for dynamic storage.
423: .PP
424: Writing and modifying tcc was quick.
425: The basic
426: VAX without indexed addressing mode can be described in about two weeks.
427: This was done concurrently with debugging much of the table
428: walker. Adding indexed modes took
429: only several hours once we were confident that the rest of the
430: description was correct.
431: .PP
432: One reason for the ease of modifcation is that rules can be added
433: independent of the other rules. Looping can be completely ignored since
434: dynamic programming eliminates that possiblity.
435: Table generation was extremely fast or about 4.2 seconds for the
436: VAX description. Thus testing the description could be done quickly.
437: .PP
438: The tables were very small. For the VAX, there were only 109 rules.
439: This required 7.5K bytes of table space.
440: The whole code generator is another 40K
441: bytes.
442: .SH
443: Code Quality
444: .PP
445: The code generated by the Twig for the test programs were of the
446: same quality as code generated by pcc2.
447: However, Twig does better targeting.
448: .DS
449: tcc:
450: .ti +4em
451: addl3 -4(fp),-8(fp),-12(fp)
452: .sp 1ex
453: pcc2:
454: .ti +4em
455: addl3 -4(fp),-8(fp),r0
456: .ti +4em
457: movl r0,-12(fp)
458: .DE
459: On the other hand, pcc2 handled temporary registers better
460: and for an equivalent sequence of code, Twig may use
461: more temporary registers.
462: .DS
463: tcc:
464: .ti +4em
465: cvtbl (r0),r1
466: pcc2:
467: .ti +4em
468: cvtbl (r0),r0
469: .DE
470: This is actually a symptom of how temporary registers are allocated in tcc.
471: Registers are allocated before code is emitted and thus any registers freed
472: during code emissions will not be reused until the next code sequence.
473: .SH
474: Room for improvement
475: .PP
476: Currently Twig is slow compared to other table-driven systems such as
477: [Henry] and [Ganapathi]. However, we have had less than one year of
478: experience with Twig and we believe that its performance can be
479: improved because
480: .IP
481: \(bu Unit productions such as
482: .DS
483: operand: temp
484: .DE
485: in the above example increases the running time of Twig
486: since the closure operation for unit production is done at run-time.
487: .IP
488: \(bu Machine representations. The current compact
489: representation of the machine require linear searching except at the
490: start state where the large branching factor compelled us to use an
491: array scheme. Using another representation would speed up the
492: matcher. For example, implementing the VAX description as an array is
493: estimated to use about 50K bytes but would provide a significant
494: performance improvement.
495: .IP
496: \(bu Twig performs many more procedure calls than pcc2.
497: Every machine step requires at least one procedure call. Contrast
498: this to YACC where a machine step is performed in line. Procedure
499: calls on the VAX are expensive.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.