Annotation of researchv10no/cmd/sml/lib/twig/man.tex, revision 1.1.1.1

1.1       root        1: \documentstyle[11pt,titlepage]{article}
                      2: % style
                      3: \setlength{\textwidth}{15cm}
                      4: \setlength{\oddsidemargin}{4.6mm} % 30 mm vasen marginaali
                      5: \setlength{\evensidemargin}{\oddsidemargin}
                      6: \setlength{\textheight}{22cm}
                      7: \setlength{\topmargin}{-8mm}
                      8: \newcommand{\mltwig}{ML-Twig }
                      9: 
                     10: \title{Preliminary Version of \mltwig User Manual}
                     11: \author{Jussi Rintanen \\ Helsinki University of Technology}
                     12: \date{June 10, 1989}
                     13: 
                     14: \begin{document}
                     15: \maketitle
                     16: \tableofcontents
                     17: \newpage
                     18: \section{Introduction}
                     19: 
                     20: \mltwig is the Standard ML counterpart of Twig
                     21: \cite{ahoganapathitjiang,tjiang}, a general tree manipulation program
                     22: implemented in C by S. W. K. Tjiang, which can be used for various purposes,
                     23: especially for writing code generators \cite{appel}.
                     24: \mltwig is not a clone of Twig, but a similar tool adjusted to fit
                     25: the Standard ML environment.
                     26: The user writes a specification of a tree manipulation program,
                     27: which is then compiled by \mltwig to an ordinary Standard ML program.
                     28: 
                     29: A specification defines a representation for the trees to be manipulated,
                     30: and a set of tree translation rules called a tree translation scheme.
                     31: The rules of the tree translation scheme can be seen as tree rewriting
                     32: rules, by means of which a tree can be reduced bottom-up to a single node.
                     33: Often, a tree translation scheme is redundant, and the reduction can
                     34: proceed using different rules. This redundancy can be taken
                     35: advantage of by associating a cost with each rule. The reduction chosen
                     36: by \mltwig is the cheapest. In the determination of the cheapest
                     37: reduction, a tree pattern matching algorithm and a dynamic programming
                     38: algorithm are used \cite{ahojohnson,hoffmannodonnell}.
                     39: 
                     40: During the tree reduction process, the program produced by \mltwig maps
                     41: the user tree to a result value. The result value may be a new tree of
                     42: the same type as the original tree, or more generally, any Standard ML value.
                     43: A behaviour facilitating side-effects is also supported by means of
                     44: a top-down execution mechanism, which is similar to that of Twig.
                     45: The efficiency of \mltwig compiler is acceptable, but the run-time system
                     46: could be more efficient. Therefore we currently regard \mltwig as
                     47: an experimental tool.
                     48: 
                     49: \section{\mltwig Specifications}
                     50: \subsection{Syntax}
                     51: 
                     52: The syntax of \mltwig specifications is presented below.
                     53: 
                     54: \newenvironment{grammar}{\goodbreak\medskip\obeylines\parskip=1pt}{\medskip\goodbreak}
                     55: \newcommand{\NT}[1]{ {\em #1} }
                     56: \newcommand{\TE}[1]{ {\tt #1} }
                     57: \newcommand{\RW}[1]{ {\tt\bf #1} }
                     58: \newcommand{\IS}{ $\Rightarrow$ }
                     59: \newcommand{\OR}{ $|$ }
                     60: \newcommand{\EMPTY}{ $\epsilon$ }
                     61: \newcommand{\TAB}{ \hskip8mm }
                     62: \newcommand{\VTAB}{ \vskip3mm }
                     63: \newcommand{\LPA}{ {\bf (} }
                     64: \newcommand{\RPA}{ {\bf )} }
                     65: \newcommand{\COM}{ {\bf ,} }
                     66: \newcommand{\RBR}{ ${\bf ]}$ }
                     67: \newcommand{\LBR}{ ${\bf [}$ }
                     68: \newcommand{\SEC}{ {\bf ;} }
                     69: \newcommand{\COL}{ {\bf :} }
                     70: \newcommand{\EQ} { {\bf =} }
                     71: 
                     72: \begin{grammar}
                     73: \VTAB
                     74: \NT{specification}\IS\NT{head}\NT{body}
                     75:         
                     76: \NT{head}\IS\NT{head-declaration}\OR\NT{head-declaration}\NT{head}
                     77:         
                     78: \NT{head-declaration}\IS\RW{prologue}\LPA\NT{ml-code}\RPA\SEC
                     79: \TAB\OR\RW{default\_cost}\LPA \NT{ml-code}\RPA \SEC
                     80: \TAB\OR\RW{structure\_name}\TE{id} \SEC
                     81: \TAB\OR\RW{label}\NT{label-list}\SEC
                     82: \TAB\OR\RW{node}\NT{node-list}\SEC
                     83:         
                     84: \NT{body}\IS\NT{bodyd}\OR\NT{bodyd}\NT{body} 
                     85:         
                     86: \NT{bodyd}\IS\NT{rule}\OR\RW{REWRITE}\NT{rule}\OR\RW{TOPDOWN}\NT{rule}\OR\RW{insert}\LPA\NT{ml-code}\RPA\SEC
                     87: 
                     88: \VTAB
                     89: 
                     90: \NT{node-list}\IS\NT{node}\OR\NT{node-list}\COM\NT{node}
                     91: 
                     92: \NT{node}\IS\TE{nodeid}(\TE{int}) 
                     93: 
                     94: \NT{label-list}\IS\TE{label}\OR\NT{label-list}\TE{label}
                     95: 
                     96: \NT{label}\IS\TE{labelid}\RW{of}\NT{ml-code}
                     97: 
                     98: \VTAB
                     99:         
                    100: \NT{rule}\IS\TE{labelid}\NT{tree-pattern}\NT{cost}\NT{action}\SEC
                    101: 
                    102: \VTAB
                    103: 
                    104: \NT{cost}\IS\COL(\NT{ml-code})\OR\EMPTY
                    105:   
                    106: \NT{action}\IS\EQ(\NT{ml-code})
                    107: 
                    108: \NT{ml-code}\IS\TE{mltoken}\OR\NT{ml-code}\TE{mltoken}
                    109: 
                    110: \NT{tree-pattern}\IS\TE{labelid}\OR\TE{nodeid}\OR\TE{nodeid}\LPA\NT{tree-list} \RPA 
                    111:    
                    112: \NT{tree-list}\IS\NT{tree-pattern}\OR\NT{tree-list}\COM\NT{tree-pattern}
                    113: 
                    114: \VTAB
                    115: \end{grammar}
                    116: 
                    117: The prologue, the node and label declarations, and the rules
                    118: are obligatory. The prologue must precede the inserts.
                    119: The node and label declarations must precede the rules.
                    120: 
                    121: White space, new lines and comments can be used in all parts of a
                    122: specification. The lexical analyzer of \mltwig is basically the same
                    123: as that of Standard ML. Symbolic identifiers beginning with a \$
                    124: should be used with caution because of presence of the special tree reference
                    125: tokens, see Section \ref{rules} for details.
                    126: The following words are reserved, and may not be used as label or node symbols.
                    127: 
                    128: \begin{verbatim}
                    129:    default_cost   insert   label   node   prologue
                    130:    REWRITE   TOPDOWN   structure_name
                    131: \end{verbatim}
                    132: 
                    133: \subsection{Label and Node Declarations}
                    134: 
                    135: \mltwig sees the user tree abstractly as a simple tree, where each node
                    136: and leaf is annotated with a symbol.
                    137: The symbols are declared in the node declaration.
                    138: The declaration begins with the word {\tt node}, which is followed
                    139: by a comma-separated list of node symbols with the arity of the symbol
                    140: in parentheses. The arity of a leaf node is 0.
                    141: The user may not use reserved words of Standard ML as node symbols.
                    142: The type {\tt symbol} is generated by \mltwig from the {\tt node} declaration.
                    143: It contains nullary data constructors named after the node symbols.
                    144: E.g. for the node declaration
                    145: \begin{verbatim}
                    146:    node PLUS(2), MINUS(2), MUL(2), DIV(2), CONST(0);
                    147: \end{verbatim}
                    148: the following nullary constructors are available for the user to be
                    149: used by the function {\tt node\_value}, see Section \ref{prologue}.
                    150: \begin{verbatim}
                    151:      PLUS MINUS MUL DIV CONST
                    152: \end{verbatim}
                    153: 
                    154: During the construction of the optimal cover, matched subtrees are labeled with
                    155: the replacement labels of the matching rules. The replacement
                    156: labels are declared in the label declaration. For each label, there
                    157: is an associated type. The declaration is of the form
                    158: \begin{verbatim}
                    159:      label lab1 of ty1 | lab2 of ty2 | ... ;
                    160: \end{verbatim}
                    161: An example label declaration is
                    162: \begin{verbatim}
                    163:      label Name of string | Const of int | Fb of foo -> bar list;
                    164: \end{verbatim}
                    165: The expression in the action part of a rule is of the same type as
                    166: the replacement label of the rule. The value of the expression may
                    167: be used in the rules matching at the parent nodes of the user tree.
                    168: 
                    169: \subsection{Prologue}\label{prologue}
                    170: In the prologue part the user writes the values for interfacing the
                    171: user tree and cost representations to the \mltwig run time system:
                    172: \begin{verbatim}
                    173:   type cost
                    174:   val cost_less : cost * cost -> bool
                    175: 
                    176:   type tree
                    177:   val get_subtrees : tree -> tree list
                    178:   val node_value : tree -> symbol
                    179: \end{verbatim}
                    180: 
                    181: The costs of the rules are represented by the type {\tt cost}.
                    182: The only operation for this type used by \mltwig is
                    183: the comparison function {\tt cost\_less}, which returns true, if
                    184: the first argument cost is smaller than the second.
                    185: 
                    186: The type {\tt tree} is the type of the trees to be fed to the tree manipulator
                    187: program produced by \mltwig. The tree processor accesses the tree type
                    188: only with the functions {\tt get\_subtrees} and {\tt node\_value}.
                    189: {\tt Get\_subtrees} returns the list of subtrees of a tree. If there are
                    190: no subtrees, it returns the empty list. {\tt Node\_value} returns the
                    191: symbol associated with the root node of a tree.
                    192: 
                    193: \subsection{Default Values}
                    194: A default cost may be defined to be used for the rules without an
                    195: explicit cost expression.
                    196: The syntax of the default cost declaration is
                    197: \begin{verbatim}
                    198:      default_cost ( ... );
                    199: \end{verbatim}
                    200: 
                    201: The default cost is represented as a function of type {\tt cost list -> cost},
                    202: where the argument {\tt cost list} contains the costs of the children.
                    203: The values defined in the prologue and the inserts may be used
                    204: in defining the default cost.
                    205: 
                    206: The user may name the structure produced my \mltwig by providing
                    207: a structure name declaration. The syntax is simply
                    208: \begin{quote}
                    209:   {\tt structure\_name} {\em id} {\tt ;}
                    210: \end{quote}
                    211: 
                    212: \subsection{Inserts and Rules}\label{rules}
                    213: The body of an \mltwig specification consists of a set of rules,
                    214: possibly interleaved with inserts. An insert is a sequence of
                    215: Standard ML code in the same as the prologue. The syntax is similarly
                    216: \begin{verbatim}
                    217:      insert ( ... );
                    218: \end{verbatim}
                    219: There are three kinds of rules in \mltwig: ordinary, rewrite and
                    220: topdown. Rewrite rules are used for replacing a subtree
                    221: with another during the selection of an optimal cover. Topdown
                    222: rules are used in conjunction with side-effects, when it is
                    223: essential to execute the action parts of the parent and the children
                    224: in a certain order.
                    225: 
                    226: The three kinds of rules correspond the different match modes of Twig.
                    227: Our scheme is more restricted, but we think it suits better
                    228: to the implementation and parent language Standard ML.
                    229: 
                    230: The same syntax is used for all kinds of rules, except that rewrite and
                    231: topdown rules are prefixed with the identifiers {\tt REWRITE} and {\tt TOPDOWN},
                    232: respectively. The syntax of rules is
                    233: \begin{verbatim}
                    234:      replacement tree-pattern { : ( cost-expr ) } = ( action-expr );
                    235: \end{verbatim}
                    236: 
                    237: A rule matches at a node of the user tree, if the root symbol of the
                    238: tree pattern is the same as the symbol in the node of the user tree,
                    239: and the subtrees of the tree pattern match the subtrees of the user tree.
                    240: If the tree pattern is a label, it matches the user tree at a node,
                    241: if a rule with the same label as replacement has matched at the node.
                    242: 
                    243: The cost expression is the price of matching a rule. The action expression
                    244: is primarily evaluated for a value, but it could also have side-effects.
                    245: A cover of a user tree is a set of rules, using which the tree can be
                    246: reduced to a single node.
                    247: The most inexpensive cover of a user tree is the cover with the least
                    248: cost rule matching at the root node.
                    249: 
                    250: \mltwig selectes the most inexpensive cover, and executes the
                    251: actions associated with the rules constituting the cover. During
                    252: the selection phase, if the least cost rule matching at a node is
                    253: a rewrite rule, the action expression of the rule is evaluated for a new
                    254: tree to be used in place of the original subtree.
                    255: 
                    256: \subsubsection{Tree Patterns}
                    257: 
                    258: The tree patterns are in prefix notation, i.e. the root symbol
                    259: of a tree pattern is followed by parenthesized subpatterns.
                    260: A leaf node is not followed by parentheses.
                    261: An example of a tree pattern:
                    262: \begin{verbatim}
                    263:   OP(PLUS,exp,OP(binop,exp,exp))
                    264: \end{verbatim}
                    265: Obviously, the {\tt OP} symbol is a node symbol of arity 3.
                    266: The symbols {\tt PLUS}, {\tt binop} and {\tt exp} could be either
                    267: node symbols of arity 0 or labels, that depends on the node and
                    268: label declarations of the specification.
                    269: 
                    270: \subsubsection{Costs}
                    271: The cost expression of a rule must be of type {\tt cost}. The costs
                    272: of the rules which matched at the nodes corresponding the labeled
                    273: leafs of the tree pattern may be accessed using identifiers named
                    274: after the following convention. If a label occurs only once
                    275: in the tree pattern, the identifier used to access the cost is the
                    276: label itself. If a label occurs two or more times, the different
                    277: occurences are indexed with a number. For example, in the tree
                    278: pattern (labels are in lower case)
                    279: \begin{verbatim}
                    280:   MOVE(reg,OP(binop,exp,exp))
                    281: \end{verbatim}
                    282: the cost of the rule matching at the node corresponding the {\tt reg}
                    283: node could be accessed with the identifier {\tt reg}, and the rest of
                    284: the labels with the identifiers {\tt binop}, {\tt exp1} and {\tt exp2},
                    285: in that order. If a name is ambiguous, the user is responsible to rename
                    286: the labels. An example of an ambiguity can be seen in the following
                    287: tree pattern.
                    288: \begin{verbatim}
                    289:   N(a,a1,a)
                    290: \end{verbatim}
                    291: The names of the labels would be {\tt a1}, {\tt a1} and {\tt a2}.
                    292: 
                    293: All values defined in the prologue and the inserts can also be used.
                    294: A subtree of the current node can be accessed by a special token
                    295: by writing a period-separated list of subtree numbers between a
                    296: pair of \$'s. For example, the third subtree of the second subtree
                    297: of the current node can be accessed using \$2.3\$. As a special case,
                    298: the current node can be accessed using \$\$. The default cost,
                    299: as specified in the default cost declaration, can be referenced
                    300: with the identifier DC. A rule can be rejected by calling
                    301: the function {\tt ABORT} with the argument {\tt ()}.
                    302: 
                    303: \subsubsection{Actions}
                    304: There are some values available for all three kinds of rules:
                    305: the values defined in the prologue and in the inserts,
                    306: as well as the tree reference tokens described in the previous section.
                    307: 
                    308: The ordinary and topdown rules return a value of the same type as the
                    309: replacement label of the rule. Rewrite rules return a value of type {\tt tree}.
                    310: 
                    311: In ordinary rules, the values of the action expressions of
                    312: the rules matching at the nodes corresponding the labeled leafs of
                    313: the tree pattern can be accessed in the same way as costs in the
                    314: cost expression. The types of the values are the types of the labels
                    315: as described in the label declaration.
                    316: 
                    317: When evaluating the action expression of a topdown rule, the action
                    318: expression of the children rules have not yet been evaluated. Instead,
                    319: there are available funtions for evaluating the children expressions,
                    320: which the user can apply in any order she wishes. The functions are
                    321: named the same in way as the cost expressions of the children and the
                    322: action expression of the children in ordinary rules, except that they
                    323: are prefixed with the string ''{\tt DO}''. For example, if the
                    324: label {\tt stm} is of type {\tt string list}, then in the action
                    325: expression of a rule with the tree pattern
                    326: \begin{verbatim}
                    327:     SEQ(stm,stm)
                    328: \end{verbatim}
                    329: the following functions can be used:
                    330: \begin{verbatim}
                    331:      DOstm1 : unit -> string list
                    332:      DOstm2 : unit -> string list
                    333: \end{verbatim}
                    334: For rewrite rules, only the common values are available.
                    335: 
                    336: \subsubsection{Unit Rules}
                    337: An \mltwig specification often contains rules with a tree pattern
                    338: consisting of just one label, i.e. unit rules. The problem with
                    339: unit rules is that they can form cycles. For example, if rules {\em a b},
                    340: {\em b c} and {\em c a} were present, and a rule with the replacement
                    341: {\em b} had matched, the tree pattern matcher would match the first
                    342: rule, then the third, the second, the first and so on.
                    343: 
                    344: In \mltwig, the unit rule cycles are cut, when the initial label is
                    345: encountered the second time. Another possibility would be to cut, when
                    346: the first matched unit rule is encountered the second time.
                    347: 
                    348: \section{How to Run \mltwig and a Program Produced by It}
                    349: The current implementation of \mltwig has been written with
                    350: Standard ML of New Jersey 0.33 in mind. However, it should
                    351: be fairly easy to port it to other implementations of Standard ML.
                    352: 
                    353: There are two ways to use \mltwig. The first way is to compile it
                    354: with the batch compiler, and use it
                    355: independently of the interactive system. For this purpose, there is
                    356: a simple ''user interface'', which prints a title and asks the user
                    357: for the name of the specification file.
                    358: The second way is to load \mltwig into the interactive system, and call
                    359: the function {\tt Main.main} with a file name as the argument.
                    360: 
                    361: \mltwig reads the specification file, and produces a new file containing
                    362: the tree processor program. The name of the new file is the name of
                    363: the specification file suffixed with ''.sml''. 
                    364: The resulting file contains a structure of the form
                    365: \begin{verbatim}
                    366:         structure NNN =
                    367:           struct
                    368:             structure User =
                    369:               struct
                    370:                 ...
                    371:               end
                    372:             exception NoCover
                    373:             exception InternalError of string
                    374:             val translate : User.tree -> User.result = ...
                    375:         end;
                    376: \end{verbatim}
                    377: The name of the structure ({\tt NNN}) is the name specified in the
                    378: structure\_name declaration, or the default name {\tt TreeProcessor}.
                    379: The substructure {\tt User} contains all the code of the prologue and
                    380: the inserts, and the definition of the type {\tt symbol}.
                    381: The function {\tt translate} is the actual function, which
                    382: is applied to a user tree. If no cover was found, the exception
                    383: {\tt NoCover} is raised. If an internal error occurs in the function
                    384: {\tt translate}, the exception {\tt InternalError} is raised.
                    385: The value returned by {\tt translate} is the value returned by
                    386: the rule matched at the root node of the user tree. It is tagged
                    387: with a data constructor, which is the replacement label of the
                    388: rule at the root node of the least cost cover prefixed with ''\_''.
                    389: The structure described above is not self-contained, and a functor
                    390: located in a separate file must be loaded before it.
                    391: 
                    392: %\section{A Short Implementation Description}
                    393: \section{An Example \mltwig Specification}
                    394: 
                    395: Our example is a specification of a simple ''code generator'', which
                    396: compiles arithmetic expressions to instructions for a simple stack machine.
                    397: The instruction set of the machine is redundant: in addition to the simple
                    398: arithmetic instructions, it has a combined addition--multiplication
                    399: instruction, which is cheaper than the separate instructions
                    400: performing the same calculation.
                    401: \newpage
                    402: \begin{verbatim}
                    403:    (* Example of a complete ML-Twig specification:        *)
                    404:    (* Evaluation of expressions                           *)
                    405: 
                    406:    node   Plus(2), Minus(2), Mul(2);
                    407:    node   Const(0);
                    408: 
                    409:    prologue (
                    410:    
                    411:    (* The type and function definitions                   *)
                    412: 
                    413:       datatype tree = Tree of (tree * symbol * tree) | Leaf of int
                    414:       type cost = int
                    415: 
                    416:       fun get_subtrees(Leaf _) = []
                    417:         | get_subtrees(Tree (t1,_,t2)) = [t1,t2]
                    418:       fun node_value(Tree(_,ope,_))  = ope
                    419:         | node_value (Leaf _) = Const
                    420:       val cost_less : int * int -> bool = (op <)
                    421: 
                    422:       fun constValue (Leaf i) = i
                    423:       datatype instr = PUSH of int | PLUS | MINUS | MUL | PLUSMUL
                    424:    );
                    425: 
                    426:    label Expr of instr list;
                    427:    
                    428:    default_cost( fn subexprcost => fold (op +) subexprcost 0);
                    429: 
                    430:    (* Rules                                               *)
                    431: 
                    432:    Expr Const            =([PUSH (constValue $$)]);
                    433:    
                    434:    Expr Plus(Expr,Expr) :   (2+Expr1+Expr2) =(Expr1@Expr2@[PLUS]);
                    435:    
                    436:    Expr Minus(Expr,Expr) :  (2+Expr1+Expr2) =(Expr1@Expr2@[MINUS]);
                    437:    
                    438:    Expr Mul(Expr,Expr) :    (2+DC)          =(Expr1@Expr2@[MUL]);
                    439: 
                    440:    Expr Mul(Expr,Plus(Expr,Expr)) : (3+DC)  =(Expr1@Expr2@Expr3@[PLUSMUL]);
                    441: 
                    442:    REWRITE Expr Mul(Plus(Expr,Expr),Expr) : (0)  =(Tree($2$,Mul,$1$));
                    443:    
                    444:    (* End                                                 *)
                    445: \end{verbatim}   
                    446: \newpage   
                    447: \begin{thebibliography}{99}
                    448: 
                    449: \bibitem{ahocorasick} {\sc A. V. Aho and M. J. Corasick},
                    450: {\em Efficient String Pattern Matching: An Aid to Bibliographic Search}.
                    451: Communications of The ACM, 18(June 1975), pp. 333-340.
                    452: 
                    453: \bibitem{ahoganapathitjiang} {\sc A. V. Aho, M. Ganapathi and S. W. K.
                    454: Tjiang}, {\em Code Generation Using Tree Matching and Dynamic Programming},
                    455: AT\&T Bell Laboratories, 600 Mountain Avenue, NJ 07974, 1986.
                    456: 
                    457: \bibitem{ahojohnson} {\sc A. V. Aho and S. C. Johnson}, {\em Optimal Code
                    458: Generation for Expression Trees}, Journal of The ACM, 23(July 1975),
                    459: pp. 488-501.
                    460: 
                    461: \bibitem{appel} {\sc A. W. Appel}, {\em Concise Specifications of Locally
                    462: Optimal Code Generators}, Technical Report CS-TR-080-87, Princeton
                    463: University, Department of Computer Science, February 1987.
                    464: 
                    465: \bibitem{harpermacqueenmilner} {\sc R. Harper, D. MacQueen and R. Milner},
                    466: {\em Standard ML}, Laboratory of Foundations of Computer Science,
                    467: University of Edinburgh, Technical Report ECS-LFCS-82-2 (1986).
                    468: 
                    469: \bibitem{hoffmannodonnell} {\sc C. M. Hoffmann and M. J. O'Donnell},
                    470: {\em Pattern Matching in Trees}, Journal of The ACM, 29(January 1982),
                    471: pp. 68-95.
                    472: 
                    473: \bibitem{tjiang} {\sc Tjiang, S. W. K.}, {\em Twig Reference Manual},
                    474: Technical Report, AT\&T Bell Laboratories, 1986.
                    475: 
                    476: \end{thebibliography}
                    477: 
                    478: \end{document}

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.