|
|
1.1 root 1: \documentstyle[11pt,titlepage]{article}
2: % style
3: \setlength{\textwidth}{15cm}
4: \setlength{\oddsidemargin}{4.6mm} % 30 mm vasen marginaali
5: \setlength{\evensidemargin}{\oddsidemargin}
6: \setlength{\textheight}{22cm}
7: \setlength{\topmargin}{-8mm}
8: \newcommand{\mltwig}{ML-Twig }
9:
10: \title{Preliminary Version of \mltwig User Manual}
11: \author{Jussi Rintanen \\ Helsinki University of Technology}
12: \date{June 10, 1989}
13:
14: \begin{document}
15: \maketitle
16: \tableofcontents
17: \newpage
18: \section{Introduction}
19:
20: \mltwig is the Standard ML counterpart of Twig
21: \cite{ahoganapathitjiang,tjiang}, a general tree manipulation program
22: implemented in C by S. W. K. Tjiang, which can be used for various purposes,
23: especially for writing code generators \cite{appel}.
24: \mltwig is not a clone of Twig, but a similar tool adjusted to fit
25: the Standard ML environment.
26: The user writes a specification of a tree manipulation program,
27: which is then compiled by \mltwig to an ordinary Standard ML program.
28:
29: A specification defines a representation for the trees to be manipulated,
30: and a set of tree translation rules called a tree translation scheme.
31: The rules of the tree translation scheme can be seen as tree rewriting
32: rules, by means of which a tree can be reduced bottom-up to a single node.
33: Often, a tree translation scheme is redundant, and the reduction can
34: proceed using different rules. This redundancy can be taken
35: advantage of by associating a cost with each rule. The reduction chosen
36: by \mltwig is the cheapest. In the determination of the cheapest
37: reduction, a tree pattern matching algorithm and a dynamic programming
38: algorithm are used \cite{ahojohnson,hoffmannodonnell}.
39:
40: During the tree reduction process, the program produced by \mltwig maps
41: the user tree to a result value. The result value may be a new tree of
42: the same type as the original tree, or more generally, any Standard ML value.
43: A behaviour facilitating side-effects is also supported by means of
44: a top-down execution mechanism, which is similar to that of Twig.
45: The efficiency of \mltwig compiler is acceptable, but the run-time system
46: could be more efficient. Therefore we currently regard \mltwig as
47: an experimental tool.
48:
49: \section{\mltwig Specifications}
50: \subsection{Syntax}
51:
52: The syntax of \mltwig specifications is presented below.
53:
54: \newenvironment{grammar}{\goodbreak\medskip\obeylines\parskip=1pt}{\medskip\goodbreak}
55: \newcommand{\NT}[1]{ {\em #1} }
56: \newcommand{\TE}[1]{ {\tt #1} }
57: \newcommand{\RW}[1]{ {\tt\bf #1} }
58: \newcommand{\IS}{ $\Rightarrow$ }
59: \newcommand{\OR}{ $|$ }
60: \newcommand{\EMPTY}{ $\epsilon$ }
61: \newcommand{\TAB}{ \hskip8mm }
62: \newcommand{\VTAB}{ \vskip3mm }
63: \newcommand{\LPA}{ {\bf (} }
64: \newcommand{\RPA}{ {\bf )} }
65: \newcommand{\COM}{ {\bf ,} }
66: \newcommand{\RBR}{ ${\bf ]}$ }
67: \newcommand{\LBR}{ ${\bf [}$ }
68: \newcommand{\SEC}{ {\bf ;} }
69: \newcommand{\COL}{ {\bf :} }
70: \newcommand{\EQ} { {\bf =} }
71:
72: \begin{grammar}
73: \VTAB
74: \NT{specification}\IS\NT{head}\NT{body}
75:
76: \NT{head}\IS\NT{head-declaration}\OR\NT{head-declaration}\NT{head}
77:
78: \NT{head-declaration}\IS\RW{prologue}\LPA\NT{ml-code}\RPA\SEC
79: \TAB\OR\RW{default\_cost}\LPA \NT{ml-code}\RPA \SEC
80: \TAB\OR\RW{structure\_name}\TE{id} \SEC
81: \TAB\OR\RW{label}\NT{label-list}\SEC
82: \TAB\OR\RW{node}\NT{node-list}\SEC
83:
84: \NT{body}\IS\NT{bodyd}\OR\NT{bodyd}\NT{body}
85:
86: \NT{bodyd}\IS\NT{rule}\OR\RW{REWRITE}\NT{rule}\OR\RW{TOPDOWN}\NT{rule}\OR\RW{insert}\LPA\NT{ml-code}\RPA\SEC
87:
88: \VTAB
89:
90: \NT{node-list}\IS\NT{node}\OR\NT{node-list}\COM\NT{node}
91:
92: \NT{node}\IS\TE{nodeid}(\TE{int})
93:
94: \NT{label-list}\IS\TE{label}\OR\NT{label-list}\TE{label}
95:
96: \NT{label}\IS\TE{labelid}\RW{of}\NT{ml-code}
97:
98: \VTAB
99:
100: \NT{rule}\IS\TE{labelid}\NT{tree-pattern}\NT{cost}\NT{action}\SEC
101:
102: \VTAB
103:
104: \NT{cost}\IS\COL(\NT{ml-code})\OR\EMPTY
105:
106: \NT{action}\IS\EQ(\NT{ml-code})
107:
108: \NT{ml-code}\IS\TE{mltoken}\OR\NT{ml-code}\TE{mltoken}
109:
110: \NT{tree-pattern}\IS\TE{labelid}\OR\TE{nodeid}\OR\TE{nodeid}\LPA\NT{tree-list} \RPA
111:
112: \NT{tree-list}\IS\NT{tree-pattern}\OR\NT{tree-list}\COM\NT{tree-pattern}
113:
114: \VTAB
115: \end{grammar}
116:
117: The prologue, the node and label declarations, and the rules
118: are obligatory. The prologue must precede the inserts.
119: The node and label declarations must precede the rules.
120:
121: White space, new lines and comments can be used in all parts of a
122: specification. The lexical analyzer of \mltwig is basically the same
123: as that of Standard ML. Symbolic identifiers beginning with a \$
124: should be used with caution because of presence of the special tree reference
125: tokens, see Section \ref{rules} for details.
126: The following words are reserved, and may not be used as label or node symbols.
127:
128: \begin{verbatim}
129: default_cost insert label node prologue
130: REWRITE TOPDOWN structure_name
131: \end{verbatim}
132:
133: \subsection{Label and Node Declarations}
134:
135: \mltwig sees the user tree abstractly as a simple tree, where each node
136: and leaf is annotated with a symbol.
137: The symbols are declared in the node declaration.
138: The declaration begins with the word {\tt node}, which is followed
139: by a comma-separated list of node symbols with the arity of the symbol
140: in parentheses. The arity of a leaf node is 0.
141: The user may not use reserved words of Standard ML as node symbols.
142: The type {\tt symbol} is generated by \mltwig from the {\tt node} declaration.
143: It contains nullary data constructors named after the node symbols.
144: E.g. for the node declaration
145: \begin{verbatim}
146: node PLUS(2), MINUS(2), MUL(2), DIV(2), CONST(0);
147: \end{verbatim}
148: the following nullary constructors are available for the user to be
149: used by the function {\tt node\_value}, see Section \ref{prologue}.
150: \begin{verbatim}
151: PLUS MINUS MUL DIV CONST
152: \end{verbatim}
153:
154: During the construction of the optimal cover, matched subtrees are labeled with
155: the replacement labels of the matching rules. The replacement
156: labels are declared in the label declaration. For each label, there
157: is an associated type. The declaration is of the form
158: \begin{verbatim}
159: label lab1 of ty1 | lab2 of ty2 | ... ;
160: \end{verbatim}
161: An example label declaration is
162: \begin{verbatim}
163: label Name of string | Const of int | Fb of foo -> bar list;
164: \end{verbatim}
165: The expression in the action part of a rule is of the same type as
166: the replacement label of the rule. The value of the expression may
167: be used in the rules matching at the parent nodes of the user tree.
168:
169: \subsection{Prologue}\label{prologue}
170: In the prologue part the user writes the values for interfacing the
171: user tree and cost representations to the \mltwig run time system:
172: \begin{verbatim}
173: type cost
174: val cost_less : cost * cost -> bool
175:
176: type tree
177: val get_subtrees : tree -> tree list
178: val node_value : tree -> symbol
179: \end{verbatim}
180:
181: The costs of the rules are represented by the type {\tt cost}.
182: The only operation for this type used by \mltwig is
183: the comparison function {\tt cost\_less}, which returns true, if
184: the first argument cost is smaller than the second.
185:
186: The type {\tt tree} is the type of the trees to be fed to the tree manipulator
187: program produced by \mltwig. The tree processor accesses the tree type
188: only with the functions {\tt get\_subtrees} and {\tt node\_value}.
189: {\tt Get\_subtrees} returns the list of subtrees of a tree. If there are
190: no subtrees, it returns the empty list. {\tt Node\_value} returns the
191: symbol associated with the root node of a tree.
192:
193: \subsection{Default Values}
194: A default cost may be defined to be used for the rules without an
195: explicit cost expression.
196: The syntax of the default cost declaration is
197: \begin{verbatim}
198: default_cost ( ... );
199: \end{verbatim}
200:
201: The default cost is represented as a function of type {\tt cost list -> cost},
202: where the argument {\tt cost list} contains the costs of the children.
203: The values defined in the prologue and the inserts may be used
204: in defining the default cost.
205:
206: The user may name the structure produced my \mltwig by providing
207: a structure name declaration. The syntax is simply
208: \begin{quote}
209: {\tt structure\_name} {\em id} {\tt ;}
210: \end{quote}
211:
212: \subsection{Inserts and Rules}\label{rules}
213: The body of an \mltwig specification consists of a set of rules,
214: possibly interleaved with inserts. An insert is a sequence of
215: Standard ML code in the same as the prologue. The syntax is similarly
216: \begin{verbatim}
217: insert ( ... );
218: \end{verbatim}
219: There are three kinds of rules in \mltwig: ordinary, rewrite and
220: topdown. Rewrite rules are used for replacing a subtree
221: with another during the selection of an optimal cover. Topdown
222: rules are used in conjunction with side-effects, when it is
223: essential to execute the action parts of the parent and the children
224: in a certain order.
225:
226: The three kinds of rules correspond the different match modes of Twig.
227: Our scheme is more restricted, but we think it suits better
228: to the implementation and parent language Standard ML.
229:
230: The same syntax is used for all kinds of rules, except that rewrite and
231: topdown rules are prefixed with the identifiers {\tt REWRITE} and {\tt TOPDOWN},
232: respectively. The syntax of rules is
233: \begin{verbatim}
234: replacement tree-pattern { : ( cost-expr ) } = ( action-expr );
235: \end{verbatim}
236:
237: A rule matches at a node of the user tree, if the root symbol of the
238: tree pattern is the same as the symbol in the node of the user tree,
239: and the subtrees of the tree pattern match the subtrees of the user tree.
240: If the tree pattern is a label, it matches the user tree at a node,
241: if a rule with the same label as replacement has matched at the node.
242:
243: The cost expression is the price of matching a rule. The action expression
244: is primarily evaluated for a value, but it could also have side-effects.
245: A cover of a user tree is a set of rules, using which the tree can be
246: reduced to a single node.
247: The most inexpensive cover of a user tree is the cover with the least
248: cost rule matching at the root node.
249:
250: \mltwig selectes the most inexpensive cover, and executes the
251: actions associated with the rules constituting the cover. During
252: the selection phase, if the least cost rule matching at a node is
253: a rewrite rule, the action expression of the rule is evaluated for a new
254: tree to be used in place of the original subtree.
255:
256: \subsubsection{Tree Patterns}
257:
258: The tree patterns are in prefix notation, i.e. the root symbol
259: of a tree pattern is followed by parenthesized subpatterns.
260: A leaf node is not followed by parentheses.
261: An example of a tree pattern:
262: \begin{verbatim}
263: OP(PLUS,exp,OP(binop,exp,exp))
264: \end{verbatim}
265: Obviously, the {\tt OP} symbol is a node symbol of arity 3.
266: The symbols {\tt PLUS}, {\tt binop} and {\tt exp} could be either
267: node symbols of arity 0 or labels, that depends on the node and
268: label declarations of the specification.
269:
270: \subsubsection{Costs}
271: The cost expression of a rule must be of type {\tt cost}. The costs
272: of the rules which matched at the nodes corresponding the labeled
273: leafs of the tree pattern may be accessed using identifiers named
274: after the following convention. If a label occurs only once
275: in the tree pattern, the identifier used to access the cost is the
276: label itself. If a label occurs two or more times, the different
277: occurences are indexed with a number. For example, in the tree
278: pattern (labels are in lower case)
279: \begin{verbatim}
280: MOVE(reg,OP(binop,exp,exp))
281: \end{verbatim}
282: the cost of the rule matching at the node corresponding the {\tt reg}
283: node could be accessed with the identifier {\tt reg}, and the rest of
284: the labels with the identifiers {\tt binop}, {\tt exp1} and {\tt exp2},
285: in that order. If a name is ambiguous, the user is responsible to rename
286: the labels. An example of an ambiguity can be seen in the following
287: tree pattern.
288: \begin{verbatim}
289: N(a,a1,a)
290: \end{verbatim}
291: The names of the labels would be {\tt a1}, {\tt a1} and {\tt a2}.
292:
293: All values defined in the prologue and the inserts can also be used.
294: A subtree of the current node can be accessed by a special token
295: by writing a period-separated list of subtree numbers between a
296: pair of \$'s. For example, the third subtree of the second subtree
297: of the current node can be accessed using \$2.3\$. As a special case,
298: the current node can be accessed using \$\$. The default cost,
299: as specified in the default cost declaration, can be referenced
300: with the identifier DC. A rule can be rejected by calling
301: the function {\tt ABORT} with the argument {\tt ()}.
302:
303: \subsubsection{Actions}
304: There are some values available for all three kinds of rules:
305: the values defined in the prologue and in the inserts,
306: as well as the tree reference tokens described in the previous section.
307:
308: The ordinary and topdown rules return a value of the same type as the
309: replacement label of the rule. Rewrite rules return a value of type {\tt tree}.
310:
311: In ordinary rules, the values of the action expressions of
312: the rules matching at the nodes corresponding the labeled leafs of
313: the tree pattern can be accessed in the same way as costs in the
314: cost expression. The types of the values are the types of the labels
315: as described in the label declaration.
316:
317: When evaluating the action expression of a topdown rule, the action
318: expression of the children rules have not yet been evaluated. Instead,
319: there are available funtions for evaluating the children expressions,
320: which the user can apply in any order she wishes. The functions are
321: named the same in way as the cost expressions of the children and the
322: action expression of the children in ordinary rules, except that they
323: are prefixed with the string ''{\tt DO}''. For example, if the
324: label {\tt stm} is of type {\tt string list}, then in the action
325: expression of a rule with the tree pattern
326: \begin{verbatim}
327: SEQ(stm,stm)
328: \end{verbatim}
329: the following functions can be used:
330: \begin{verbatim}
331: DOstm1 : unit -> string list
332: DOstm2 : unit -> string list
333: \end{verbatim}
334: For rewrite rules, only the common values are available.
335:
336: \subsubsection{Unit Rules}
337: An \mltwig specification often contains rules with a tree pattern
338: consisting of just one label, i.e. unit rules. The problem with
339: unit rules is that they can form cycles. For example, if rules {\em a b},
340: {\em b c} and {\em c a} were present, and a rule with the replacement
341: {\em b} had matched, the tree pattern matcher would match the first
342: rule, then the third, the second, the first and so on.
343:
344: In \mltwig, the unit rule cycles are cut, when the initial label is
345: encountered the second time. Another possibility would be to cut, when
346: the first matched unit rule is encountered the second time.
347:
348: \section{How to Run \mltwig and a Program Produced by It}
349: The current implementation of \mltwig has been written with
350: Standard ML of New Jersey 0.33 in mind. However, it should
351: be fairly easy to port it to other implementations of Standard ML.
352:
353: There are two ways to use \mltwig. The first way is to compile it
354: with the batch compiler, and use it
355: independently of the interactive system. For this purpose, there is
356: a simple ''user interface'', which prints a title and asks the user
357: for the name of the specification file.
358: The second way is to load \mltwig into the interactive system, and call
359: the function {\tt Main.main} with a file name as the argument.
360:
361: \mltwig reads the specification file, and produces a new file containing
362: the tree processor program. The name of the new file is the name of
363: the specification file suffixed with ''.sml''.
364: The resulting file contains a structure of the form
365: \begin{verbatim}
366: structure NNN =
367: struct
368: structure User =
369: struct
370: ...
371: end
372: exception NoCover
373: exception InternalError of string
374: val translate : User.tree -> User.result = ...
375: end;
376: \end{verbatim}
377: The name of the structure ({\tt NNN}) is the name specified in the
378: structure\_name declaration, or the default name {\tt TreeProcessor}.
379: The substructure {\tt User} contains all the code of the prologue and
380: the inserts, and the definition of the type {\tt symbol}.
381: The function {\tt translate} is the actual function, which
382: is applied to a user tree. If no cover was found, the exception
383: {\tt NoCover} is raised. If an internal error occurs in the function
384: {\tt translate}, the exception {\tt InternalError} is raised.
385: The value returned by {\tt translate} is the value returned by
386: the rule matched at the root node of the user tree. It is tagged
387: with a data constructor, which is the replacement label of the
388: rule at the root node of the least cost cover prefixed with ''\_''.
389: The structure described above is not self-contained, and a functor
390: located in a separate file must be loaded before it.
391:
392: %\section{A Short Implementation Description}
393: \section{An Example \mltwig Specification}
394:
395: Our example is a specification of a simple ''code generator'', which
396: compiles arithmetic expressions to instructions for a simple stack machine.
397: The instruction set of the machine is redundant: in addition to the simple
398: arithmetic instructions, it has a combined addition--multiplication
399: instruction, which is cheaper than the separate instructions
400: performing the same calculation.
401: \newpage
402: \begin{verbatim}
403: (* Example of a complete ML-Twig specification: *)
404: (* Evaluation of expressions *)
405:
406: node Plus(2), Minus(2), Mul(2);
407: node Const(0);
408:
409: prologue (
410:
411: (* The type and function definitions *)
412:
413: datatype tree = Tree of (tree * symbol * tree) | Leaf of int
414: type cost = int
415:
416: fun get_subtrees(Leaf _) = []
417: | get_subtrees(Tree (t1,_,t2)) = [t1,t2]
418: fun node_value(Tree(_,ope,_)) = ope
419: | node_value (Leaf _) = Const
420: val cost_less : int * int -> bool = (op <)
421:
422: fun constValue (Leaf i) = i
423: datatype instr = PUSH of int | PLUS | MINUS | MUL | PLUSMUL
424: );
425:
426: label Expr of instr list;
427:
428: default_cost( fn subexprcost => fold (op +) subexprcost 0);
429:
430: (* Rules *)
431:
432: Expr Const =([PUSH (constValue $$)]);
433:
434: Expr Plus(Expr,Expr) : (2+Expr1+Expr2) =(Expr1@Expr2@[PLUS]);
435:
436: Expr Minus(Expr,Expr) : (2+Expr1+Expr2) =(Expr1@Expr2@[MINUS]);
437:
438: Expr Mul(Expr,Expr) : (2+DC) =(Expr1@Expr2@[MUL]);
439:
440: Expr Mul(Expr,Plus(Expr,Expr)) : (3+DC) =(Expr1@Expr2@Expr3@[PLUSMUL]);
441:
442: REWRITE Expr Mul(Plus(Expr,Expr),Expr) : (0) =(Tree($2$,Mul,$1$));
443:
444: (* End *)
445: \end{verbatim}
446: \newpage
447: \begin{thebibliography}{99}
448:
449: \bibitem{ahocorasick} {\sc A. V. Aho and M. J. Corasick},
450: {\em Efficient String Pattern Matching: An Aid to Bibliographic Search}.
451: Communications of The ACM, 18(June 1975), pp. 333-340.
452:
453: \bibitem{ahoganapathitjiang} {\sc A. V. Aho, M. Ganapathi and S. W. K.
454: Tjiang}, {\em Code Generation Using Tree Matching and Dynamic Programming},
455: AT\&T Bell Laboratories, 600 Mountain Avenue, NJ 07974, 1986.
456:
457: \bibitem{ahojohnson} {\sc A. V. Aho and S. C. Johnson}, {\em Optimal Code
458: Generation for Expression Trees}, Journal of The ACM, 23(July 1975),
459: pp. 488-501.
460:
461: \bibitem{appel} {\sc A. W. Appel}, {\em Concise Specifications of Locally
462: Optimal Code Generators}, Technical Report CS-TR-080-87, Princeton
463: University, Department of Computer Science, February 1987.
464:
465: \bibitem{harpermacqueenmilner} {\sc R. Harper, D. MacQueen and R. Milner},
466: {\em Standard ML}, Laboratory of Foundations of Computer Science,
467: University of Edinburgh, Technical Report ECS-LFCS-82-2 (1986).
468:
469: \bibitem{hoffmannodonnell} {\sc C. M. Hoffmann and M. J. O'Donnell},
470: {\em Pattern Matching in Trees}, Journal of The ACM, 29(January 1982),
471: pp. 68-95.
472:
473: \bibitem{tjiang} {\sc Tjiang, S. W. K.}, {\em Twig Reference Manual},
474: Technical Report, AT\&T Bell Laboratories, 1986.
475:
476: \end{thebibliography}
477:
478: \end{document}
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.