Annotation of researchv10no/cmd/twig/doc/manual/twig0.tex, revision 1.1

1.1     ! root        1: \documentstyle[titlepage,fullpage,12pt,clbiba]{article}
        !             2: \begin{document}
        !             3: \newcommand{\twigcomp}{{\tt twig}}
        !             4: \newcommand{\Twigcomp}{{\tt Twig}}
        !             5: \newcommand{\twiglang}{{\it twig}}
        !             6: \newcommand{\Twiglang}{{\it Twig}}
        !             7: \title{Twig Reference Manual}
        !             8: 
        !             9: \date{
        !            10: \begin{minipage}{.8\textwidth}
        !            11: \normalsize
        !            12: \noindent
        !            13: \Twiglang{} is a language for manipulating trees.  A \twiglang{}
        !            14: program consists of a set
        !            15: of pattern-action rules together with associated declarations.  Patterns
        !            16: describe trees to be matched.  Actions calculate costs, perform tree
        !            17: manipulations and other functions such as emitting code.  A \twiglang{} program is
        !            18: translated by the \twigcomp{} compiler into subroutines and tables in a host
        !            19: language.  In the current implementation, the host language is C.\\[6pt]
        !            20: A \twiglang{} program manipulates trees by first finding a minimum cost covering of
        !            21: the input tree.  The actions of the rules whose pattern parts composes the
        !            22: covering is then executed. The minimum cost covering is determined using
        !            23: dynamic programming.  This technique naturally resolves many ambiguities that
        !            24: may be in the specifications.\\[6pt]
        !            25: The prime purpose of \twiglang{} is to create tree manipulation programs.
        !            26: One interesting application of tree manipulation is code generation and
        !            27: \twiglang{} has been used to implement a code generator for the 
        !            28: pcc2 compiler on the VAX.
        !            29: \end{minipage}}
        !            30: 
        !            31: \author{Steven W.K. Tjiang}
        !            32: 
        !            33: \maketitle
        !            34: \tableofcontents\newpage
        !            35: \section{Introduction}
        !            36: 
        !            37: \Twiglang{} is a language for writing tree manipulation programs.
        !            38: A \twiglang{} program does this by using a matcher to find a minimal
        !            39: cost covering of a tree.
        !            40: The covering is composed of
        !            41: user supplied templates.
        !            42: Trees are manipulated by
        !            43: executing the actions associated with the templates of their coverings.
        !            44: These actions may rewrite the tree, update other data structures, or
        !            45: generate output.
        !            46: 
        !            47: Figure~\ref{overview} gives an overview of the \twiglang{} system.
        !            48: The round boxes in the figure indicates files and the sharp cornered boxes
        !            49: are processing programs.
        !            50: The user writes the \twiglang{} specification file.
        !            51: This file is then compiled by the \twigcomp{} compiler into
        !            52: host language files.
        !            53: In the current version of \twiglang{}, the host language is C.
        !            54: The host language files are then compiled and linked with
        !            55: other user supplied files.
        !            56: The resulting program represented by the large box at the bottom
        !            57: of figure~\ref{overview} is the tree manipulation program.
        !            58: \begin{figure}[hp]
        !            59: \input{pic}
        !            60: \caption{The {\bf Twig } system}
        !            61: \label{overview}
        !            62: \end{figure}
        !            63: The details of this box will be discussed later.
        !            64: 
        !            65: \section{Motivation}
        !            66: 
        !            67: Although \Twiglang{} is a language for manipulating trees
        !            68: it was originally motivated by
        !            69: pattern directed code generation.  In many compilers, the typical
        !            70: structure is a {\sl front end} and a {\sl back end}.  The front end reads
        !            71: the input source program, checks its syntax and contextual conditions,
        !            72: and builds an intermediate representation of the program.
        !            73: The language used to do this representation is often called the
        !            74: {\it intermediate representation} and will be abbreviated as IR.
        !            75: The back end
        !            76: takes the IR and translates it into the target machine code.
        !            77: 
        !            78: That isn't the whole story but it's close.  Both ends may be 
        !            79: sets of programs rather than one
        !            80: program.  There may also be a
        !            81: {\it middle} that performs transformations on the
        !            82: IR.  The intention of these
        !            83: transformations is to improve the target code that comes out of the compiler.
        !            84: For this discussion, we can ignore the middle.
        !            85: 
        !            86: IRs can vary greatly from compiler to compiler.
        !            87: However the IRs are implemented, they are encodings of graphs.
        !            88: Usually three types of graphs are evident\cite{dragon}.
        !            89: There is a flowgraph whose nodes are {\sl basic blocks} and
        !            90: the edges corresponds to transfer of control between the basic blocks.
        !            91: Basic blocks are not featureless.  They are graphs themselves.
        !            92: Some compilers will represent them as a forest of
        !            93: {\it directed acyclic graphs} or DAGs.
        !            94: The nodes are data values and operations.
        !            95: Since operations yields values, one can think of nodes as
        !            96: corresponding to values.  An edge exists from node $a$ to
        !            97: node $b$ if and only if the value corresponding to $a$ depends on the
        !            98: value corresponding to $b$.  Other compilers will use trees instead of DAGs.
        !            99: Trees are not
        !           100: as general as DAGs and these compilers overcome that by using some
        !           101: type of labelling or copying.
        !           102: 
        !           103: Writing a front end usually
        !           104: involves writing a recognizer
        !           105: for the source language, designing a symbol table,
        !           106: and determining an error recovery strategy.  What has made this process easy
        !           107: are parser generators like YACC\cite{yacc}.
        !           108: They separate out the task of recognizing the
        !           109: source language from the other details.  This makes the task
        !           110: intellectually more manageable.
        !           111: The compiler implementor now specifies the source language via grammar
        !           112: productions.  Symbol table manipulations,
        !           113: and IR construction rules are attached to productions and activated
        !           114: when a source language construct is recognized.
        !           115: The front end is then {\it automatically} generated by processing the
        !           116: specification with a parser generator.
        !           117: The result is
        !           118: a high quality front end whose performance is close to that of a
        !           119: hand-crafted front end.
        !           120: 
        !           121: Providing the same type of facilities for automatically building back
        !           122: ends has been much more difficult.  Extending current front end
        !           123: parsers to do back ends
        !           124: has been workable but not entirely successful.  The basic idea is that
        !           125: the back end implementor writes a {\it machine grammar}
        !           126: for the IR and code generation
        !           127: occurs as actions that are executed when patterns are found
        !           128: in the IR\cite{gana,henry}.
        !           129: Unfortunately, these grammars are large and ambiguous.  Using
        !           130: parsers to handle them have usually met with many combinatorial problems
        !           131: \cite{henry}.  Furthermore, these code generators are slow
        !           132: although their output is often of high quality.
        !           133: The software engineering advantages one see with parsers in the front end
        !           134: are not so evident in this circumstance.  Instead of having to deal with
        !           135: the complexities of code generation, the implementor now has to deal with
        !           136: the complexities of the very large machine grammar.
        !           137: 
        !           138: Another approach at automatically building back ends is to start with
        !           139: known good algorithms for generating code.  From these algorithms, we
        !           140: can build an abstract interpreter and then specifications can be written
        !           141: to drive them.  We hope that by doing this we can derive a much more natural
        !           142: code generation specification technique.
        !           143: \Twiglang{} is one possible result of this approach.  The algorithms
        !           144: that form the basis of \twiglang{} are presented in \cite{optimalcode} and
        !           145: \cite{treematch}.  \Twiglang{}
        !           146: specifications are much less complex than parser based
        !           147: specifications.  Ambiguities are handled naturally and do not complicate
        !           148: the specifications.
        !           149: 
        !           150: In \cite{optimalcode}, it is shown how optimal code can be generated
        !           151: for an expression tree
        !           152: given a description of a machine's instruction set.
        !           153: This description is in term of tree templates.
        !           154: The algorithm first determines a minimal cost covering of
        !           155: an expression tree using dynamic programming.
        !           156: The covering is composed from the templates in the
        !           157: description.  The code generated is then the instructions
        !           158: represented by the templates in the covering.
        !           159: This algorithm was shown to generate the shortest code sequences for an
        !           160: abstract machine and
        !           161: its running time was shown to be linear in the number of nodes in the
        !           162: expression tree in \cite{optimalcode}.
        !           163: 
        !           164: This is all very well in theory but what about in practice.
        !           165: Code length is  not always the quantity that one wishes to minimize.
        !           166: One might wish to minimize the number of memory accesses or machine cycles
        !           167: executed by the generated code.
        !           168: Fortunately, the algorithm can also minimize these quantities.
        !           169: In fact, it can minimize any strictly increasing cost function.
        !           170: Let $I$ denote a machine instruction and
        !           171: $I_0I_1I_2...I_n$ denote a sequence of instructions.
        !           172: Then a cost function $f$ is strictly increasing if and only if
        !           173: $f(I_0I_1I_2...I_n) < f(I_0I_1I_2...I_nI_{n+1})$ for all
        !           174: instruction sequences $I_0I_1I_2...I_n$ and instruction
        !           175: $I_{n+1}$.  Another shortcoming in \cite{optimalcode}
        !           176: is that no efficient algorithm is given for matching the templates
        !           177: in the tree.  This is where \cite{treematch} comes in.  In that paper,
        !           178: two algorithms are presented for tree matching.  \Twiglang{} combines
        !           179: the work presented in the two papers.
        !           180: 
        !           181: \Twiglang{} differs from the algorithm as presented in \cite{optimalcode} in
        !           182: the following ways:
        !           183: \begin{itemize}
        !           184: \item
        !           185: It provides a convenient specification language for the templates.
        !           186: \item
        !           187: Tree templates in \cite{optimalcode} were labelled with either {\sl Register} or
        !           188: {\sl Memory}.  This label being the storage class that the instruction corresponding
        !           189: to the template would put a result.
        !           190: \Twiglang{} generalize tree labels so that
        !           191: they can be any symbol.
        !           192: \item
        !           193: Instead of associating an instruction with each template, \twiglang{} associates
        !           194: an action.
        !           195: \item
        !           196: In \cite{optimalcode}, code is generated by a bottom up traversal of the
        !           197: minimum cost covering.  \Twiglang{} allows a top down traversal of the cover
        !           198: and also the ability for the actions to rewrite parts of the
        !           199: tree.
        !           200: \item
        !           201: The algorithm in \cite{optimalcode} determines the evaluation order
        !           202: which gives the optimal code.
        !           203: \Twiglang{} does not do this.
        !           204: The evaluation order determination was omitted based on the conjecture that
        !           205: most of the time, simple left to right evaluation of the leaves will yield
        !           206: near optimal results.
        !           207: In situations where a nonstandard evaluation order is required, a new
        !           208: \twiglang{} pattern could be added that codes the order explicitly
        !           209: (see Section~\ref{exec:phase}).
        !           210: \end{itemize}
        !           211: 
        !           212: In this manual, a brief discussion will be given of the \twiglang{} language
        !           213: and how \twiglang{} programs can be incorporated with C.
        !           214: 
        !           215: Many statements have been made in this manual without proof.
        !           216: If the reader wishes to pursue it further, more details about the algorithms
        !           217: will be presented in an upcoming paper,
        !           218: \cite{aho:gana:tjiang}.
        !           219: 
        !           220: \section{The \Twiglang{} Language}
        !           221: \subsection{Lexical Issues and Conventions}
        !           222: Currently the
        !           223: following identifiers are reserved keywords of \twiglang{}:
        !           224: 
        !           225: \begin{tabbing}
        !           226: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
        !           227: action \> cost \> insert \\
        !           228: label \> node \> prologue
        !           229: \end{tabbing}
        !           230: 
        !           231: \noindent ;:(),= are special characters.  All blanks, tabs, formfeeds and
        !           232: newlines are ignored by \twiglang{} but they may not appear inside
        !           233: identifiers and numbers.  Identifiers are nonempty strings
        !           234: made of letters, digits or underscores and starting with a letter.
        !           235: Numbers are nonempty string of digits.  Code fragments or action parts
        !           236: are enclosed in braces.  Inside code fragments, C lexical rules must
        !           237: be obeyed except that strings of the form $\$...\$$ that are not inside C
        !           238: strings have special meaning to \twiglang{}.  In the following sections, $id$
        !           239: denotes an identifier; $int_1$, $int_2$ denotes numbers; $ccode$
        !           240: denotes C code fragments; $...$ indicates repetition of the previous
        !           241: item; $[...]$ indicates that $...$ is optional.
        !           242: 
        !           243: The input to the \twiglang{} program will be referred to as the subject tree.
        !           244: 
        !           245: \subsection{Rules}
        !           246: 
        !           247: The fundamental unit of a \twiglang{} program is a rule.
        !           248: 
        !           249: \begin{verse}
        !           250: $label\_id$ {\bf :} $tree\_pattern$\quad {\bf [} $cost$ {\bf ]}
        !           251: {\bf [ =} $action$ {\bf ]}
        !           252: \end{verse}
        !           253: 
        !           254: \noindent Intuitively, the pattern is used to match a subtree.
        !           255: The $cost$ code
        !           256: fragment is then evaluated.  The resulting cost is recorded by
        !           257: the matcher for use in dynamic programming.  The $action$ is executed
        !           258: if the rule is part of the least cost cover of the tree.
        !           259: The details of pattern matching will be discussed in Section~\ref{pattern}.
        !           260: 
        !           261: If the $cost$ part is missing, \twigcomp{} will insert default code
        !           262: that returns the special value, {\tt DEFAULT\_COST}.
        !           263: A missing $action$ part indicates that nothing will be done when a
        !           264: match is found.
        !           265: 
        !           266: \subsection{Tree Patterns}
        !           267: \label{TreePatterns}
        !           268: Tree patterns are specified
        !           269: in prefix parenthesized form and can be described by the
        !           270: following BNF:
        !           271: 
        !           272: \begin{verse}
        !           273: $tree\_pattern\to node\_id\;\vert\;label\_id$\\
        !           274: $tree\_pattern\to node\_id \;{\bf (}\; tree\_list \;{\bf )}$\\
        !           275: $tree\_list \to tree \;{\bf ,}\; tree\_list \;\vert\; tree$\\
        !           276: \end{verse}
        !           277: 
        !           278: \begin{figure}
        !           279: \unitlength 0.1in
        !           280: \begin{center}
        !           281: \begin{picture}(25,16)
        !           282: \put(7,0){\framebox(4,3){\tt id}}
        !           283: \put(20,0){\framebox(5,3){\tt const}}
        !           284: \put(13,6){\framebox(5,3){\tt plus}}
        !           285: \put(6,12){\framebox(5,3){\tt plus}}
        !           286: \put(0,6){\framebox(4,3){\tt id}}
        !           287: \put(11,3){\line(2,3){2}}
        !           288: \put(20,3){\line(-2,3){2}}
        !           289: \put(13,9){\line(-2,3){2}}
        !           290: \put(4,9){\line(2,3){2}}
        !           291: \end{picture}
        !           292: \end{center}
        !           293: \vspace{0.5in}
        !           294: \begin{verse}
        !           295: \begin{center}
        !           296: \tt plus(id, plus(id, const))
        !           297: \end{center}
        !           298: \end{verse}
        !           299: \caption{A Tree and its prefixed form}
        !           300: \label{treepattern}
        !           301: \end{figure}
        !           302: \noindent Figure~\ref{treepattern} gives an example of a tree pattern
        !           303: and its prefix parenthesized form.
        !           304: 
        !           305: There are two types of symbols: $node\_id$\/s and $label\_id$\/s.
        !           306: $Node\_id$\/s are used to denote internal nodes and leaves.
        !           307: $Label\_id$\/s label tree patterns and are analogous to
        !           308: nonterminals in context free grammars.
        !           309: For example, the following \twiglang{} rules without their action parts describe
        !           310: simple expression trees with the {\tt plus} operator.
        !           311: 
        !           312: \begin{verse}
        !           313: {\tt\begin{tabbing}
        !           314: expr: \= plus(expr, expr)\\
        !           315: expr: \> identifier\\
        !           316: expr: \> constant
        !           317: \end{tabbing}}
        !           318: \end{verse}
        !           319: 
        !           320: \noindent Here, {\tt identifier} and {\tt constant} are $node\_id$\/s
        !           321: representing leaves, and
        !           322: {\tt plus} is a $node\_id$ representing an internal node whereas
        !           323: {\tt expr} is a
        !           324: $label\_id$.  Pattern leaves that are $label\_id$\/s are called
        !           325: {\sl labelled leaves}.
        !           326: In the first rule, both leaves of the pattern are labelled.
        !           327: 
        !           328: \Twigcomp{} associates an integer with each node symbol and label symbol.  These
        !           329: integers are used by the \twiglang{} pattern matcher as encodings for the
        !           330: symbols.
        !           331: As the matcher traverses the
        !           332: tree, a user supplied subroutine is called to provide an integer
        !           333: corresponding to each node.
        !           334: 
        !           335: \subsection{Costs}
        !           336: 
        !           337: To increase the flexibility of representing costs, the tree matcher
        !           338: views costs as an {\sl abstract data type} or ADT.
        !           339: For example, costs may be represented as an integer or
        !           340: as a vector of integers with each element representing the cost of a specific
        !           341: resource.
        !           342: A cost ADT suitable for \twiglang{} must have the following four definitions:
        !           343: 
        !           344: \begin{itemize}
        !           345: \item {\tt COST} is a C type.
        !           346: Although the proper functioning of the tree matcher
        !           347: does not depend on the internal details of the {\tt COST} type, it must have
        !           348: the type information for storage allocation purposes.
        !           349: \item {\tt INFINITY} is the maximum attainable cost value.
        !           350: The matcher uses {\tt INFINITY} to initialize its data structures.
        !           351: \item {\tt DEFAULT\_COST} is the cost
        !           352: value returned by rules without a cost part.
        !           353: \item
        !           354: {\tt COSTLESS} is a function of
        !           355: two cost values.  It must return true if and only if the first argument is
        !           356: less than the second.
        !           357: \end{itemize}
        !           358: 
        !           359: \subsection{Trees}
        !           360: 
        !           361: As with costs, trees are treated as an ADT.  Here, using an ADT is
        !           362: even more important because trees come in a variety of shapes and
        !           363: representations.
        !           364: \Twiglang{} would be overly complicated if it had to
        !           365: know any more than the fundamental properties of trees.  Thus,
        !           366: \twiglang{} treats trees as an acyclic directed
        !           367: graph of almost featureless nodes with one
        !           368: distinguished node, the root.  Each node has only one feature and that
        !           369: is an integer corresponding to the $node\_id$\/s discussed above.
        !           370: To provide this
        !           371: view to \twiglang{}, the user must provide the following definitions and
        !           372: subroutines.
        !           373: 
        !           374: \begin{itemize}
        !           375: \item {\tt NODEPTR} is the type of a node.  The actual
        !           376: details of
        !           377: the {\tt NODEPTR}
        !           378: are irrelevant and \twigcomp{} uses this definition only for storage
        !           379: allocation and declaration purposes.
        !           380: \item
        !           381: {\tt NULL} is used to denote a null {\tt NODEPTR} value.
        !           382: In the current implementation this is the same {\tt NULL} used by the
        !           383: C standard I/O package and need not be defined by the user.
        !           384: This restricts the {\tt NODEPTR} representation to be a pointer.
        !           385: \item
        !           386: {\tt mtGetNodes$(r,n)$} returns the $n$th child of $r$ where $r$
        !           387: is a {\tt NODEPTR}.  The
        !           388: leftmost child is {\tt mtGetNodes}$(r,1)$.  If $n > \hbox{degree}(r)$
        !           389: then the function should return {\tt NULL}.  \Twigcomp{} expects the
        !           390: expression {\tt mtGetNodes}$({\tt NULL}, 0)$
        !           391: to denote the root node of
        !           392: the subject tree.
        !           393: \item
        !           394: {\tt mtValue$(r)$} returns the integer associated with $r$
        !           395: (See section~\ref{TreePatterns}).
        !           396: \item
        !           397: {\tt mtSetNodes$(r,n,c)$} replaces the $n$th child of $r$ with
        !           398: $c$.  This routine may be used to replace whole subtrees with others.
        !           399: {\tt mtSetNodes}$({\tt NULL},0,c)$ is used by \twigcomp{} to set the
        !           400: root of the subject tree to $c$.
        !           401: \end{itemize}
        !           402: 
        !           403: \subsection{Prologue and Inserts}
        !           404: 
        !           405: \begin{verse}
        !           406: {\tt prologue} $ccode$;
        !           407: \end{verse}
        !           408: 
        !           409: \noindent signals to \twigcomp{} that $ccode$ should be inserted at the
        !           410: beginning of the output source file.  $Ccode$ should contain
        !           411: definitions relevant to the C code in
        !           412: rules that are defined later in the \twiglang{} source file.
        !           413: 
        !           414: \begin{verse}
        !           415: {\tt insert} $ccode$;
        !           416: \end{verse}
        !           417: 
        !           418: \noindent causes $ccode$ to be placed into the source file.
        !           419: There can be multiple inserts and they will be placed into the source
        !           420: file
        !           421: in the order that they appear.
        !           422: 
        !           423: \subsection{Declarations}
        !           424: 
        !           425: All \twiglang{} identifiers are declared before they are used.
        !           426: 
        !           427: \begin{verse}
        !           428: {\tt node $id[(int_1)] [= int_2] ...$;\hfil}
        !           429: \end{verse}
        !           430: 
        !           431: \noindent A node declaration declares one or more identifier to be
        !           432: associated with nodes of the subject tree.  The identifiers are
        !           433: assigned numbers by \twigcomp{} but the user can overide the assigned number
        !           434: by specifying $int_2$.  The degree of the node identifer, the
        !           435: number of children, is assumed to be fixed.  Normally, \twigcomp{} will
        !           436: deduce the degree when $id$ is used in a rule.  However, the user may
        !           437: explicitly give the degree by specifying $int_1$.
        !           438: 
        !           439: \begin{verse}
        !           440: {\tt label $id ...$;}
        !           441: \end{verse}
        !           442: 
        !           443: A label declaration indicates that the following $id$'s are to be used
        !           444: as labels of rules.
        !           445: 
        !           446: \subsection{Costs and Action code}
        !           447: 
        !           448: Code fragments such as the $cost$ and $action$ clauses of a rule are
        !           449: specfied by enclosing C code in braces.  All legal C constructs are
        !           450: permitted in code fragments.  In addition, the following are provided
        !           451: for convenient access to the subject tree and internal data
        !           452: structures of the matcher.
        !           453: 
        !           454: \begin{itemize}
        !           455: \item $\$\%n\$$ denotes a pointer value to the matcher data
        !           456: structure for the $n$th labelled leaf.  Thus to access the cost value
        !           457: associated with that leaf, the notation $\$\%n\$\to${\tt cost} may be used.
        !           458: 
        !           459: \item $\$\$$ denotes a pointer value to the root of the subject
        !           460: tree.
        !           461: 
        !           462: \item $\$n_1.n_2.n_3.\, \hbox{...}\,.n_{k-1}.n_k\$$ denotes
        !           463: a pointer value to the $n_k$th child of the $n_{k-1}th$ child of the
        !           464: $n_{k-2}$th child of \hbox{...} of the $n_1$th child of the root of
        !           465: the subject tree.  Each $n_i$ is a positive non-zero integer.
        !           466: \end{itemize}
        !           467: 
        !           468: Some special constructs are available to code fragments in the cost
        !           469: part of the specification.  The
        !           470: statement ``{\tt ABORT;}'' when encountered during the execution of
        !           471: the cost code, signals to the matcher that this pattern is to be
        !           472: rejected.  When a ``{\tt REWRITE;}'' statement is encountered, control
        !           473: is returned to the matcher and the rule will become a {\sl rewrite}
        !           474: mode match.  When the end of the cost code fragment is reached,
        !           475: control is returned to the matcher and the rule becomes a {\sl defer}
        !           476: mode match.
        !           477: The statement ``{\tt TOPDOWN;}'' is like ``{\tt REWRITE;}'' except that
        !           478: the mode of the match becomes {\sl topdown}.  The meanings of these modes
        !           479: will be explained in Section~\ref{pattern}.
        !           480: 
        !           481: Cost values are returned to the matcher by assigning to
        !           482: the ``{\tt cost}'' variable in the cost clause.  If no assignment is
        !           483: made to the {\tt cost} variable then the returned cost will be {\tt
        !           484: DEFAULT\_COST}.
        !           485: 
        !           486: Action clauses are expected to return a new tree which will replace
        !           487: the subject tree.  This is done by returning using the
        !           488: ``{\tt return($new\_tree$);}'' statement.
        !           489: where {\tt $new\_tree$} is of type
        !           490: {\tt NODEPTR}.
        !           491: If execution reaches the end of the action clause, the
        !           492: matcher resumes execution and the subject tree is not modified.
        !           493: 
        !           494: \section{Pattern Matcher Operation}
        !           495: \label{pattern}
        !           496: 
        !           497: The
        !           498: pattern matcher operates in two phases: the costing phase and the execution
        !           499: phase.
        !           500: During
        !           501: the costing phase, a minimal cost covering of the subject tree is found.
        !           502: The execution phase invokes the actions that are associated with the patterns
        !           503: making up the covering.  Execution phases may begin during the costing phase
        !           504: to execute the covering of a subtree as described in Section~\ref{exec:phase}.
        !           505: 
        !           506: \subsection{The Costing Phase}
        !           507: \label{cost:phase}
        !           508: 
        !           509: Let the patterns of the \twiglang{} specifications be interpreted as grammar
        !           510: productions with {\sl label\_id}s as non-terminals and {\sl node\_id}s
        !           511: as terminals.
        !           512: If we add a production
        !           513: $S\,\to\,\hbox{\sl label\_id}$ for all {\sl label\_id}s where $S$ is the
        !           514: start symbol,
        !           515: then a covering for a tree is analogous to a derivation
        !           516: for the prefixed form of the tree.
        !           517: Figure~\ref{gram:interp} shows some patterns and its corresponding grammar.
        !           518: \begin{figure}
        !           519: \begin{center}
        !           520: \begin{verse}
        !           521: \tt
        !           522: e: plus(e,e)\\
        !           523: e: plus(e, plus(e,e))\\
        !           524: e: id\\
        !           525: e: const\\
        !           526: \end{verse}
        !           527: \end{center}
        !           528: \vspace{0.3in}
        !           529: \begin{center}\begin{verse}
        !           530: $S\to e$\\
        !           531: $e\to plus(e,e)$\\
        !           532: $e\to plus(e,plus(e,e)$\\
        !           533: $e\to id$\\
        !           534: $e\to const$\\
        !           535: \end{verse}\end{center}
        !           536: \caption{Pattern and Grammar}
        !           537: \label{gram:interp}
        !           538: \end{figure}
        !           539: The productions of the form $S\to\hbox{\sl label\_id}$ reflect the fact that
        !           540: any {\sl label\_id} may start a cover.
        !           541: 
        !           542: The matcher finds the least cost cover by doing a preorder traversal of
        !           543: the subject tree.
        !           544: At the same time, it
        !           545: builds a skeleton tree that is structurally isomorphic to the subject tree.
        !           546: When a match is discovered the cost clause of the pattern is invoked to
        !           547: calculate the cost.  Many patterns with different labels could match at any
        !           548: given node but only the lowest cost pattern for each label is recorded in
        !           549: the skeleton.
        !           550: 
        !           551: When a pattern is matched, its label is then used as input to the pattern
        !           552: matcher so that matching of patterns with labelled leaves can begin.
        !           553: This process
        !           554: is analogous to a reduce action in
        !           555: bottom up parsers.
        !           556: 
        !           557: \subsection{The Execution Phase}
        !           558: \label{exec:phase}
        !           559: 
        !           560: The execution phase starts when the costing phase is complete or when
        !           561: a sufficiently low cost {\sl rewrite} mode rule is encountered.
        !           562: Let $M$ be a matching
        !           563: rule and $L_1, L_2, L_3, ... ,L_n$ be
        !           564: the labelled leaves of the matching pattern.
        !           565: The following procedure is used to execute the action parts of $M$.
        !           566: 
        !           567: \vbox{
        !           568: \vspace{2ex}
        !           569: \noindent {\bf Procedure \it Execute}
        !           570: 
        !           571: \begin{itemize}
        !           572: \item If $M$ is a {\sl deferred} mode match then
        !           573: execution occurs by first applying {\it Execute} to the $L_i$s
        !           574: from left to right.  That is, in the order that they appear in
        !           575: the prefix form of the tree.
        !           576: When all the $L_i$s are executed, the action part of $M$ is
        !           577: invoked.
        !           578: 
        !           579: \item If $M$ is a {\sl rewrite } mode match then
        !           580: just the action part of $M$ is executed.  When the action part returns
        !           581: the matcher deletes the skeleton corresponding to the subtree that
        !           582: $M$ matched and rescans the new subtree that may have been substituted
        !           583: by executing $M$.
        !           584: 
        !           585: \item If $M$ is a {\sl topdown} mode match then only
        !           586: the action part of $M$ is executed.  To execute a
        !           587: labelled leaves, $L_i$, $M$'s action part may do so explicitly
        !           588: by using the {\tt tDO($\$\%i\$$)} macro.  This allows
        !           589: the user to arbitrary order the execution of the labelled leaves.
        !           590: \end{itemize}
        !           591: }
        !           592: 
        !           593: \noindent When the costing phase is over, the execution phase is started
        !           594: by picking the lowest cost match at the root of the subject tree.
        !           595: This will be the root of the minimum cost cover.
        !           596: The match is executed as described above.
        !           597: 
        !           598: The execution phase may begin before the costing phase is over.
        !           599: This occurs when a rewrite rule matches and its cost is lower than
        !           600: all other matches at that subtree.  In this situation the rewrite rule
        !           601: is executed immediately and the new rewritten subtree is scanned once more.
        !           602: The prescence of rewrite rules throws a wrench in the theoretical niceties
        !           603: of the matching algorithm.  For example, there is now no guarantee
        !           604: that the algorithm will terminate because the tree can
        !           605: be repeatedly enlarged.
        !           606: Rewrite rules can be dangerous.  They can
        !           607: be triggered unintentionally if the user is not careful to abort them
        !           608: in situations where they are not wanted.
        !           609: However, if used with care they can help to reduce the size of the \twiglang{}
        !           610: specification.
        !           611: 
        !           612: \section{Some Examples}
        !           613: 
        !           614: \subsection{Expression Trees to Prefix}
        !           615: A \twiglang{} program to print out the prefix form of expression
        !           616: parse trees
        !           617: is shown in Figure~\ref{trees:prefix}.
        !           618: 
        !           619: \begin{itemize}
        !           620: 
        !           621: \item The rules do not have cost calculations.
        !           622: Since there are no
        !           623: ambiguities costs are not necessary.
        !           624: 
        !           625: \item The second rule is a topdown mode rule.  This is essential in printing
        !           626: the prefix form.  If it was omitted the postfix form would be printed.
        !           627: 
        !           628: \item Before any matching can begin {\tt \_matchinit} must be called.
        !           629: Trees can then be processed by calling {\tt \_match} after arranging that
        !           630: the call
        !           631: {\tt mtGetNodes}$({\tt NULL},0)$ returns the root of the subject tree.
        !           632: \end{itemize}
        !           633: 
        !           634: \begin{figure}[tp]
        !           635: \begin{tabbing}
        !           636: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
        !           637: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
        !           638: prologue \> $\{$\\[4pt]
        !           639: typedef \> struct node $*$\>\>{\tt NODEPTR};\\
        !           640: \#define \> {\tt COST} \>\> int\\
        !           641: \#define \> {\tt INFINITY}\>\>  100000\\
        !           642: \#define \> {\tt DEFAULT\_COST} \>\> 0\\[4pt]
        !           643: {\tt NODEPTR} Root;\\
        !           644: struct code \> $\{$\\
        !           645: \> char op; \> \> $/*$ null if node is a leaf $*/$\\
        !           646: \> NODEPTR left, right;\\
        !           647: \> char $*$id;\\
        !           648: $\}$;\\[4pt]
        !           649: $\}$;\\[6pt]
        !           650: node \> nOp nIdent;\\
        !           651: label \> lExpr;\\[6pt]
        !           652: lExpr:\> nIdent\\
        !           653: \>= $\{$ printf(``$\%$s'', id); $\}$;\\[4pt]
        !           654: lExpr:\>nOp(lExpr,lExpr)\\
        !           655: \>$\{$ {\tt TOPDOWN};$\}$\\
        !           656: \>= $\{$\\
        !           657: \>\>putchar($\$\$\to$op);\\
        !           658: \>\>tDO($\$\%1\$$);\\
        !           659: \>\>tDO($\$\%2\$$);\\
        !           660: \>$\}$;\\[6pt]
        !           661: insert\>$\{$\\[4pt]
        !           662: mtValue(root)\\
        !           663: \>{\tt NODEPTR} root;\\
        !           664: $\{$\\
        !           665: \>if(root$\to$op==0)\\
        !           666: \>\>return(nIdent);\\
        !           667: \>else\\
        !           668: \>\>return(nOp);\\
        !           669: $\}$\\[4pt]
        !           670: \end{tabbing}
        !           671: \caption{Printing Expression trees to Prefix form}
        !           672: \end{figure}
        !           673: \addtocounter{figure}{-1}
        !           674: \begin{figure}[tp]
        !           675: \begin{tabbing}
        !           676: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
        !           677: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
        !           678: mtGetNodes(r,n)\\
        !           679: \>{\tt NODEPTR} r;\\
        !           680: \>int n;\\
        !           681: $\{$\\
        !           682: \>if(n==1)\\
        !           683: \>\>return(r$\to$left);\\
        !           684: \>else if(n==1)\\
        !           685: \>\>return(r$\to$right);\\
        !           686: \>else if(n==0 $\&\&$ r=={\tt NULL})\\
        !           687: \>\>return(Root);\\
        !           688: \>else return({\tt NULL});\\
        !           689: $\}$\\[4pt]
        !           690: mtSetNodes(r,n, c)\\
        !           691: \>{\tt NODEPTR} r, c;\\
        !           692: \>int n;\\
        !           693: $\{$\\
        !           694: \>if(n==1)\\
        !           695: \>\>r$\to$left = c;\\
        !           696: \>else if(n==2)\\
        !           697: \>\>r$\to$right = c;\\
        !           698: \>else if(n==0)\\
        !           699: \>\>Root = c;\\
        !           700: $\}$\\[4pt]
        !           701: main(argc,argv)\>\>\>$/*$ called by user only $*/$\\
        !           702: \>char $**$argv;\\
        !           703: $\{$\\
        !           704: \>\_matchinit();\>\>$/*$ initialize the matcher $*/$\\
        !           705: \>$...$ {\sl get a tree and set Root to it}\\
        !           706: \>\_match();\>\>$/*$ do the match $*/$\\
        !           707: $\}$\\
        !           708: $\}$\\
        !           709: \end{tabbing}
        !           710: \caption{(cont.) Printing Expression trees to Prefix form}
        !           711: \label{trees:prefix}
        !           712: \end{figure}
        !           713: 
        !           714: \subsection{Short Circuited Boolean Evaluation}
        !           715: 
        !           716: Short circuited boolean evaluation is naturally topdown.
        !           717: A fragment of \twiglang{} program that implements this
        !           718: is shown in Figure~\ref{short:circuit}.
        !           719: 
        !           720: \begin{itemize}
        !           721: 
        !           722: \item Labels for true and false branches are passed down to
        !           723: rules below by putting them into the nodes.  Another way of
        !           724: accomplishing this is to use an auxillary argument stack.
        !           725: 
        !           726: \item The rules assume that every test generates a branch to both
        !           727: the false and true label.  On conventional machines, extra branches
        !           728: can be eliminated by recognizing that one could generate code that
        !           729: falls through to its true or false labels.  This can be done by
        !           730: separating {\sl lTest} into {\sl lTrueTest} and {\sl lFalseTest} which
        !           731: branches only when true and false respectively.  Rules can then be
        !           732: written to take advantage of this.
        !           733: \end{itemize}
        !           734: 
        !           735: \begin{figure}
        !           736: \begin{tabbing}
        !           737: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
        !           738: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
        !           739: prologue \> $\{$\\[4pt]
        !           740: \#include\>``otherdefs.h''\\
        !           741: union node $\{$\\
        !           742: \> $...$ \>\>$/*$ definition of other node types $*/$\\
        !           743: \> union $\{$\\
        !           744: \>\> int operation;\\
        !           745: \>\> {\tt NODEPTR} left, right;\\
        !           746: \>\> {\tt LABEL} falselab, truelab;\>\>$/*$ {\tt LABEL} will defined
        !           747: elsewhere $*/$\\
        !           748: \>\> $...$ \>\> $/*$ other definitions relevant to {\sl test nodes} $*/$\\
        !           749: \> $\}$ test;
        !           750: \> $...$ \>\>$/*$ definition of other node types $*/$\\
        !           751: $\}$;\\[4pt]
        !           752: $\}$;\\[6pt]
        !           753: node\>nAnd nOr nNot nLess;\\
        !           754: label\>lTest lExpr;\\[6pt]
        !           755: lTest:\>nAnd(lTest,lTest)\\
        !           756: \>$\{$ {\tt TOPDOWN}; $\}$\\
        !           757: \>=$\{$\\
        !           758: \>\>$\$1\$\to$test.falselab = $\$\$\to$test.falselab;\\
        !           759: \>\>$\$1\$\to$test.truelab = getlabel();\\
        !           760: \>\>tDO($\$\%1\$$);\\
        !           761: \>\>printlab($\$1\$\to$test.truelab);\\
        !           762: \>\>$\$2\$\to$test.falselab = $\$\$\to$test.falselab;\\
        !           763: \>\>$\$2\$\to$test.truelab = $\$\$\to$test.truelab;\\
        !           764: \>\>tDO($\$\%2\$$);\\
        !           765: \>$\}$;\\[4pt]
        !           766: lTest:\>nOr(lTest,lTest)\\
        !           767: \>$\{$ {\tt TOPDOWN}; $\}$\\
        !           768: \>=$\{$\\
        !           769: \>\>$\$1\$\to$test.falselab = getlabel();\\
        !           770: \>\>$\$1\$\to$test.truelab = $\$\$\to$test.truelab;\\
        !           771: \>\>tDO($\$\%1\$$);\\
        !           772: \>\>$\$2\$\to$test.truelab = $\$\$\to$test.truelab;\\
        !           773: \>\>$\$2\$\to$test.falselab = $\$\$\to$test.falselab;\\
        !           774: \>\>tDO($\$\%2\$$);\\
        !           775: \>$\}$;\\
        !           776: \end{tabbing}
        !           777: \caption{Short circuited Boolean Evaluation}
        !           778: \label{short:circuit}
        !           779: \end{figure}
        !           780: \addtocounter{figure}{-1}
        !           781: \begin{figure}[tp]
        !           782: \begin{tabbing}
        !           783: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
        !           784: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
        !           785: lTest:\>nNot(lTest)\\
        !           786: \>$\{$ {\tt TOPDOWN}; $\}$\\
        !           787: \>=$\{$\\
        !           788: \>\>$\$1\$\to$test.truelab = $\$\$\to$test.falselab;\\
        !           789: \>\>$\$1\$\to$test.falselab = $\$\$\to$test.truelab;\\
        !           790: \>\>tDO($\$\%1\$$);\\
        !           791: \>$\}$;\\[4pt]
        !           792: lTest:\>nLess(lExpr,lExpr)\>\>$/*$ {\sl not topdown anymore } $*/$\\
        !           793: \>=$\{$ $...$ {\sl generate code for comparison } $...$ $\}$;\\
        !           794: \end{tabbing}
        !           795: \caption{(cont.) Short circuited Boolean Evaluation}
        !           796: \end{figure}
        !           797: 
        !           798: \subsection{Code Generation}
        !           799: 
        !           800: Figure~\ref{example} is an 
        !           801: example of \twiglang{} program that
        !           802: can be used to generate VAX code for
        !           803: the subtract instruction:
        !           804: 
        !           805: \begin{figure}
        !           806: \begin{tabbing}
        !           807: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
        !           808: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
        !           809: prologue \> $\{$\\
        !           810: $/*$ otherdefs.h will have a type definition for {\tt NODEPTR} $*/$\\
        !           811: \#include\>``otherdefs.h''\\
        !           812: \#define\>{\tt TEMP\_COST}\>\>5\\
        !           813: \#define\>{\tt SUB\_COST}\>\>30\\
        !           814: \#define\>{\tt DEC\_COST}\>\>10\\
        !           815: NODEPTR \> gettemp();\\
        !           816: $\}$;\\[4pt]
        !           817: node \> long constant sub;\\
        !           818: label \> operand temp;\\[4pt]
        !           819: operand: \> long;\>\>\>$/*$ rule 1 $*/$\\
        !           820: operand:\>constant;\>\>\>$/*$ rule 2 $*/$\\
        !           821: operand:\>temp;\>\>\>$/*$ rule 3 $*/$\\[4pt]
        !           822: temp:\>operand\>\>\>$/*$ rule 4 $*/$\\
        !           823: \>$\{$ cost = {\tt TEMP\_COST}+$\$\%1\$\to$cost;$\}$\\
        !           824: \>$=\{$\\
        !           825: \>\>NODEPTR t = gettemp();\\
        !           826: \>\> emit(``mov'', $\$\$$, t, 0);\\
        !           827: \>\>return(t);\\
        !           828: \>$\}$;\\[4pt]
        !           829: operand:\>sub(operand,operand)\>\>\>$/*$ rule 5 $*/$\\
        !           830: \>$\{$cost = $\$\%1\$\to$cost + $\$\%2\$\to$cost+{\tt SUB\_COST};$\}$\\
        !           831: \>=$\{$\\
        !           832: \>\>NODEPTR t = gettemp();\\
        !           833: \>\>emit(``sub'', $\$1\$$, $\$2\$$, t, 0);\\
        !           834: \>\>return(t);\\
        !           835: \>$\}$;\\[4pt]
        !           836: temp:\>sub(temp,constant)\>\>\>$/*$ rule 6 $*/$\\
        !           837: \>$\{$\\
        !           838: \>\>if(value($\$2\$$)==1)\\
        !           839: \>\>\>cost = $\$\%1\$\to$cost+{\tt DEC\_COST};\\
        !           840: \>\>else ABORT;\\
        !           841: \>$\}$=$\{$\\
        !           842: \>\>emit(``dec'', $\$1\$$, 0);\\
        !           843: \>\>return($\$1\$$);\\
        !           844: \>$\}$;
        !           845: \end{tabbing}
        !           846: \caption{Small Code Generation Example}
        !           847: \label{example}
        !           848: \end{figure}
        !           849: 
        !           850: \begin{itemize}
        !           851: \item Rules 3 and 4 form a loop.
        !           852: The potential loop: temp$\to$operand$\to$temp$\to$operand$...$ is
        !           853: broken
        !           854: by the matcher recognizing that the cost of the second match of temp
        !           855: does not cost less than the first match of temp.
        !           856: 
        !           857: \item In the cost clause of rule 5, the cost is the sum of the
        !           858: leaves plus the cost of the subtract instruction.
        !           859: The action clause emits code to add the two operands
        !           860: and leave the result in a temporary location.  The temporary is
        !           861: returned as a substitution for the subject tree.
        !           862: 
        !           863: \item Rule 6 handles a special case where the left operand is
        !           864: already in a temporary and the constant is one.  In this case, the
        !           865: temporary is directly decremented and returned as the new tree.
        !           866: \end{itemize}
        !           867: 
        !           868: \section{How to run \Twigcomp{}}
        !           869: 
        !           870: Once a user has written the specification, \twigcomp{} is used to compile it
        !           871: into a subroutine.
        !           872: Figure~\ref{overview} gives an overview of how \twigcomp{} does this.
        !           873: In Figure~\ref{overview}, rounded boxes indicate files and sharp
        !           874: cornered boxes
        !           875: are processing programs.  The user creates the \twiglang{} specification which
        !           876: is represented by the top box.  The specification file must have the
        !           877: suffix ``{\tt .mt}''.  Let's say we have one called {\tt arbor.mt}.
        !           878: To invoke the \twigcomp{} compiler we type:
        !           879: 
        !           880: \begin{verse}
        !           881: {\bf twig} [--{\bf w$xx$}] arbor.mt
        !           882: \end{verse}
        !           883: 
        !           884: The proprocessor reads {\tt arbor.mt} and if there are no errors
        !           885: will create two files: {\tt symbols.h} and {\tt walker.c}.
        !           886: To build {\tt walker.c}, \twigcomp{} uses a template file called {\tt walker.$xx$}
        !           887: where $xx$ is the argument of the optional --{\bf w} flag.
        !           888: If the flag is omitted then $xx$ defaults to ``{\tt c1}''.
        !           889: 
        !           890: {\tt Symbols.h} and {\tt walker.c}
        !           891: can then compiled and linked with the other modules of the
        !           892: user's program.  These other modules should provide the
        !           893: Tree and Cost ADTs described above.
        !           894: {\tt Walker.h} is an auxilliary include file that is referenced by {
        !           895: \tt walker.c}.
        !           896: A typical compile command is:
        !           897: 
        !           898: \begin{verse}
        !           899: {\bf cc} -- {\bf I\sl include\_dir} {\bf -c} walker.c
        !           900: \end{verse}
        !           901: 
        !           902: The --{\bf I\sl include\_dir} argument causes {\bf cc} to look for
        !           903: walker.h in {\sl include\_dir}.  The exact value of {\sl include\_dir}
        !           904: will depend on where the files are stored at your site.
        !           905: The tree matcher is initialized by calling {\tt \_matchinit} and matching
        !           906: can begin by calling {\tt \_match} from
        !           907: the user program.
        !           908: 
        !           909: \section{Organization of the \Twigcomp{} Compiler}
        !           910: The preprocessor is written completely in C and YACC.
        !           911: It can be roughly broken up into four modules:
        !           912: \begin{itemize}
        !           913: \item Lexer --- The lexer performs lexical analysis.
        !           914: The input is tokenized by this module and passed to the parser
        !           915: as required.  Identifiers are looked up in the symbol table
        !           916: and space is reserved for them if they have never been encountered.
        !           917: \item Parser --- This part is written in YACC and recognizes the
        !           918: \twiglang{} language.  When a language construct is recognized actions
        !           919: are invoked.  Declarations, prologues and inserts will invoke symbol
        !           920: table operations when recognized.  Rules are broken up and stored
        !           921: in the symbol table too.  The tree patterns in the rules are unflattened
        !           922: and passed to the machine builder.
        !           923: \item Machine Builder --- This module takes tree patterns and incrementally
        !           924: builds an internal representation of the matching automaton.
        !           925: The builder takes this
        !           926: tree and then enumerates each of the possible paths from the root to the
        !           927: leaves.  These paths are treated as strings and a string matching automaton
        !           928: is built as described in \cite{fgrep} and \cite{treematch}.
        !           929: 
        !           930: Inside the builder the partially built machine is kept in a linked list
        !           931: form.  Each machine state is a linked list and each node represents
        !           932: a state transition.  When the tables for the machine are written out into {
        !           933: \tt walker.c}, the linked list structure is
        !           934: translated into a table of sixteen bit integers.
        !           935: Each transistion is stored as two integers.  The first is
        !           936: the integer corresponding to the symbol that causes the transistion and the
        !           937: second is the index of the next state in the table.
        !           938: 
        !           939: \item Symbol Table ---  The central data structure in this symbol table is
        !           940: the hash table.
        !           941: Each entry in the hash table is a bucket --- a linked list of symbol table
        !           942: entries.  There is no rehashing.  All colliding items are placed
        !           943: in the linked list.
        !           944: This is a simple
        !           945: and adequately efficient arrangement.  Depending on the type of symbols, the
        !           946: entry may point to other data structures.  Entries for $label\_id$s point to
        !           947: lists of trees.  $Node\_id$ entries record the integers that have been
        !           948: associates with them.  Other entries may point to data structures holding a
        !           949: textual representation of C code that forms action and cost parts.
        !           950: \end{itemize}
        !           951: 
        !           952: Here is a list of the files that form the \twigcomp{} compiler's source
        !           953: and their approximate function.
        !           954: \begin{itemize}
        !           955: \item {\tt common.h} --- This file contains data definitions for how
        !           956: trees are represented in the parser.  It also contains type definitions for
        !           957: external functions.
        !           958: \item {\tt code.h} --- This file contains definitions for how code fragments
        !           959: are stored.
        !           960: \item {\tt sym.h} --- This file defines the major data structures in the
        !           961: symbol table.
        !           962: \item {\tt machine.h} ---  This file defines the how the matching automaton
        !           963: is represented internally.
        !           964: \item {\tt twig.y} --- This is the parser.
        !           965: \item {\tt sym.c} ---  This is the symbol table manager.
        !           966: \item {\tt path.c} --- The path string enumerator.
        !           967: \item {\tt machine.c} --- The machine builder.
        !           968: \item {\tt lex.c } --- The lexer.
        !           969: \item {\tt tree.c}, {\tt io.c}, and {\tt code.c} --- Miscellaneous
        !           970: routines for manipulating trees, performing I/O and manipulating
        !           971: code fragments.
        !           972: \end{itemize}
        !           973: 
        !           974: \section{Performance}
        !           975: 
        !           976: So far, the only experience we have with \twiglang{}
        !           977: is a VAX code generator written
        !           978: for the pcc2 compiler.  The \twiglang{} code generator is 25\% faster than
        !           979: the original code generator.
        !           980: The maintainability and modifiability of the code generator has
        !           981: improved.  For example adding the indexed addressing mode
        !           982: into the code generator took only a few hours.  The target code quality
        !           983: is just as good as the original code generator.
        !           984: 
        !           985: The \twigcomp{} table
        !           986: generation algorithm is fast.  For the VAX machine description which
        !           987: consist of 109 rules, the generation time was 4.2 seconds on a 780.
        !           988: The generation time could be increased
        !           989: by two orders of magnitude
        !           990: before other table driven system can compete with \twigcomp{} on this basis.
        !           991: Furthermore, the tables are small.  The VAX description is only 7.5K
        !           992: and the text space for the walker is 30K.
        !           993: Again this is much smaller than those of other table driven systems.
        !           994: 
        !           995: \Twiglang{} is currently a research tool.  Several things can be done to improve
        !           996: \twiglang{}'s performance.
        !           997: \begin{itemize}
        !           998: \item  \Twiglang{} does a lot of procedure calls.  Every machine transition
        !           999: require at least one procedure call.  Contrast this to YACC where a machine
        !          1000: transition is done in line.
        !          1001: \item \Twiglang{} performs a very expensive closure operation with respect to
        !          1002: unit rules.  A unit rule is the analogue of a unit production and has the
        !          1003: form:
        !          1004: 
        !          1005: \begin{verse}
        !          1006: $label_1$: $label_2$;
        !          1007: \end{verse}
        !          1008: 
        !          1009: \noindent This closure is done at run time.
        !          1010: It requires many procedure calls and manipulates complex data structures.
        !          1011: We are looking at ways of doing this
        !          1012: at table generation time or to hash the results of the closures so that
        !          1013: redundant calculations can be avoided.  The problem with doing them
        !          1014: in the table generator is the possible explosion in its
        !          1015: running time.
        !          1016: 
        !          1017: \item Many of the cost parts for the rules are similar and hence some
        !          1018: tests are performed and recalculated many times
        !          1019: on a node.  \Twigcomp{} should be clever
        !          1020: enough to factor out some of these tests and do them only once.
        !          1021: However, the information required for \twigcomp{} to do this is not easily
        !          1022: available.  It is hidden in the C code fragments.
        !          1023: 
        !          1024: \item The current compact
        !          1025: representation of the matching automaton
        !          1026: require linear searching except at the
        !          1027: start state where the large branching factor compelled us to use an
        !          1028: array scheme.  Using another representation would speed up the
        !          1029: matcher.  For example, implementing the VAX description as an array is
        !          1030: estimated to use about 50K bytes but may provide
        !          1031: performance improvement.
        !          1032: 
        !          1033: \end{itemize}
        !          1034: \bibliography{mybib}
        !          1035: \bibliographystyle{plain}
        !          1036: \end{document}

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.