Annotation of researchv10no/cmd/twig/doc/manual/twig0.tex, revision 1.1.1.1

1.1       root        1: \documentstyle[titlepage,fullpage,12pt,clbiba]{article}
                      2: \begin{document}
                      3: \newcommand{\twigcomp}{{\tt twig}}
                      4: \newcommand{\Twigcomp}{{\tt Twig}}
                      5: \newcommand{\twiglang}{{\it twig}}
                      6: \newcommand{\Twiglang}{{\it Twig}}
                      7: \title{Twig Reference Manual}
                      8: 
                      9: \date{
                     10: \begin{minipage}{.8\textwidth}
                     11: \normalsize
                     12: \noindent
                     13: \Twiglang{} is a language for manipulating trees.  A \twiglang{}
                     14: program consists of a set
                     15: of pattern-action rules together with associated declarations.  Patterns
                     16: describe trees to be matched.  Actions calculate costs, perform tree
                     17: manipulations and other functions such as emitting code.  A \twiglang{} program is
                     18: translated by the \twigcomp{} compiler into subroutines and tables in a host
                     19: language.  In the current implementation, the host language is C.\\[6pt]
                     20: A \twiglang{} program manipulates trees by first finding a minimum cost covering of
                     21: the input tree.  The actions of the rules whose pattern parts composes the
                     22: covering is then executed. The minimum cost covering is determined using
                     23: dynamic programming.  This technique naturally resolves many ambiguities that
                     24: may be in the specifications.\\[6pt]
                     25: The prime purpose of \twiglang{} is to create tree manipulation programs.
                     26: One interesting application of tree manipulation is code generation and
                     27: \twiglang{} has been used to implement a code generator for the 
                     28: pcc2 compiler on the VAX.
                     29: \end{minipage}}
                     30: 
                     31: \author{Steven W.K. Tjiang}
                     32: 
                     33: \maketitle
                     34: \tableofcontents\newpage
                     35: \section{Introduction}
                     36: 
                     37: \Twiglang{} is a language for writing tree manipulation programs.
                     38: A \twiglang{} program does this by using a matcher to find a minimal
                     39: cost covering of a tree.
                     40: The covering is composed of
                     41: user supplied templates.
                     42: Trees are manipulated by
                     43: executing the actions associated with the templates of their coverings.
                     44: These actions may rewrite the tree, update other data structures, or
                     45: generate output.
                     46: 
                     47: Figure~\ref{overview} gives an overview of the \twiglang{} system.
                     48: The round boxes in the figure indicates files and the sharp cornered boxes
                     49: are processing programs.
                     50: The user writes the \twiglang{} specification file.
                     51: This file is then compiled by the \twigcomp{} compiler into
                     52: host language files.
                     53: In the current version of \twiglang{}, the host language is C.
                     54: The host language files are then compiled and linked with
                     55: other user supplied files.
                     56: The resulting program represented by the large box at the bottom
                     57: of figure~\ref{overview} is the tree manipulation program.
                     58: \begin{figure}[hp]
                     59: \input{pic}
                     60: \caption{The {\bf Twig } system}
                     61: \label{overview}
                     62: \end{figure}
                     63: The details of this box will be discussed later.
                     64: 
                     65: \section{Motivation}
                     66: 
                     67: Although \Twiglang{} is a language for manipulating trees
                     68: it was originally motivated by
                     69: pattern directed code generation.  In many compilers, the typical
                     70: structure is a {\sl front end} and a {\sl back end}.  The front end reads
                     71: the input source program, checks its syntax and contextual conditions,
                     72: and builds an intermediate representation of the program.
                     73: The language used to do this representation is often called the
                     74: {\it intermediate representation} and will be abbreviated as IR.
                     75: The back end
                     76: takes the IR and translates it into the target machine code.
                     77: 
                     78: That isn't the whole story but it's close.  Both ends may be 
                     79: sets of programs rather than one
                     80: program.  There may also be a
                     81: {\it middle} that performs transformations on the
                     82: IR.  The intention of these
                     83: transformations is to improve the target code that comes out of the compiler.
                     84: For this discussion, we can ignore the middle.
                     85: 
                     86: IRs can vary greatly from compiler to compiler.
                     87: However the IRs are implemented, they are encodings of graphs.
                     88: Usually three types of graphs are evident\cite{dragon}.
                     89: There is a flowgraph whose nodes are {\sl basic blocks} and
                     90: the edges corresponds to transfer of control between the basic blocks.
                     91: Basic blocks are not featureless.  They are graphs themselves.
                     92: Some compilers will represent them as a forest of
                     93: {\it directed acyclic graphs} or DAGs.
                     94: The nodes are data values and operations.
                     95: Since operations yields values, one can think of nodes as
                     96: corresponding to values.  An edge exists from node $a$ to
                     97: node $b$ if and only if the value corresponding to $a$ depends on the
                     98: value corresponding to $b$.  Other compilers will use trees instead of DAGs.
                     99: Trees are not
                    100: as general as DAGs and these compilers overcome that by using some
                    101: type of labelling or copying.
                    102: 
                    103: Writing a front end usually
                    104: involves writing a recognizer
                    105: for the source language, designing a symbol table,
                    106: and determining an error recovery strategy.  What has made this process easy
                    107: are parser generators like YACC\cite{yacc}.
                    108: They separate out the task of recognizing the
                    109: source language from the other details.  This makes the task
                    110: intellectually more manageable.
                    111: The compiler implementor now specifies the source language via grammar
                    112: productions.  Symbol table manipulations,
                    113: and IR construction rules are attached to productions and activated
                    114: when a source language construct is recognized.
                    115: The front end is then {\it automatically} generated by processing the
                    116: specification with a parser generator.
                    117: The result is
                    118: a high quality front end whose performance is close to that of a
                    119: hand-crafted front end.
                    120: 
                    121: Providing the same type of facilities for automatically building back
                    122: ends has been much more difficult.  Extending current front end
                    123: parsers to do back ends
                    124: has been workable but not entirely successful.  The basic idea is that
                    125: the back end implementor writes a {\it machine grammar}
                    126: for the IR and code generation
                    127: occurs as actions that are executed when patterns are found
                    128: in the IR\cite{gana,henry}.
                    129: Unfortunately, these grammars are large and ambiguous.  Using
                    130: parsers to handle them have usually met with many combinatorial problems
                    131: \cite{henry}.  Furthermore, these code generators are slow
                    132: although their output is often of high quality.
                    133: The software engineering advantages one see with parsers in the front end
                    134: are not so evident in this circumstance.  Instead of having to deal with
                    135: the complexities of code generation, the implementor now has to deal with
                    136: the complexities of the very large machine grammar.
                    137: 
                    138: Another approach at automatically building back ends is to start with
                    139: known good algorithms for generating code.  From these algorithms, we
                    140: can build an abstract interpreter and then specifications can be written
                    141: to drive them.  We hope that by doing this we can derive a much more natural
                    142: code generation specification technique.
                    143: \Twiglang{} is one possible result of this approach.  The algorithms
                    144: that form the basis of \twiglang{} are presented in \cite{optimalcode} and
                    145: \cite{treematch}.  \Twiglang{}
                    146: specifications are much less complex than parser based
                    147: specifications.  Ambiguities are handled naturally and do not complicate
                    148: the specifications.
                    149: 
                    150: In \cite{optimalcode}, it is shown how optimal code can be generated
                    151: for an expression tree
                    152: given a description of a machine's instruction set.
                    153: This description is in term of tree templates.
                    154: The algorithm first determines a minimal cost covering of
                    155: an expression tree using dynamic programming.
                    156: The covering is composed from the templates in the
                    157: description.  The code generated is then the instructions
                    158: represented by the templates in the covering.
                    159: This algorithm was shown to generate the shortest code sequences for an
                    160: abstract machine and
                    161: its running time was shown to be linear in the number of nodes in the
                    162: expression tree in \cite{optimalcode}.
                    163: 
                    164: This is all very well in theory but what about in practice.
                    165: Code length is  not always the quantity that one wishes to minimize.
                    166: One might wish to minimize the number of memory accesses or machine cycles
                    167: executed by the generated code.
                    168: Fortunately, the algorithm can also minimize these quantities.
                    169: In fact, it can minimize any strictly increasing cost function.
                    170: Let $I$ denote a machine instruction and
                    171: $I_0I_1I_2...I_n$ denote a sequence of instructions.
                    172: Then a cost function $f$ is strictly increasing if and only if
                    173: $f(I_0I_1I_2...I_n) < f(I_0I_1I_2...I_nI_{n+1})$ for all
                    174: instruction sequences $I_0I_1I_2...I_n$ and instruction
                    175: $I_{n+1}$.  Another shortcoming in \cite{optimalcode}
                    176: is that no efficient algorithm is given for matching the templates
                    177: in the tree.  This is where \cite{treematch} comes in.  In that paper,
                    178: two algorithms are presented for tree matching.  \Twiglang{} combines
                    179: the work presented in the two papers.
                    180: 
                    181: \Twiglang{} differs from the algorithm as presented in \cite{optimalcode} in
                    182: the following ways:
                    183: \begin{itemize}
                    184: \item
                    185: It provides a convenient specification language for the templates.
                    186: \item
                    187: Tree templates in \cite{optimalcode} were labelled with either {\sl Register} or
                    188: {\sl Memory}.  This label being the storage class that the instruction corresponding
                    189: to the template would put a result.
                    190: \Twiglang{} generalize tree labels so that
                    191: they can be any symbol.
                    192: \item
                    193: Instead of associating an instruction with each template, \twiglang{} associates
                    194: an action.
                    195: \item
                    196: In \cite{optimalcode}, code is generated by a bottom up traversal of the
                    197: minimum cost covering.  \Twiglang{} allows a top down traversal of the cover
                    198: and also the ability for the actions to rewrite parts of the
                    199: tree.
                    200: \item
                    201: The algorithm in \cite{optimalcode} determines the evaluation order
                    202: which gives the optimal code.
                    203: \Twiglang{} does not do this.
                    204: The evaluation order determination was omitted based on the conjecture that
                    205: most of the time, simple left to right evaluation of the leaves will yield
                    206: near optimal results.
                    207: In situations where a nonstandard evaluation order is required, a new
                    208: \twiglang{} pattern could be added that codes the order explicitly
                    209: (see Section~\ref{exec:phase}).
                    210: \end{itemize}
                    211: 
                    212: In this manual, a brief discussion will be given of the \twiglang{} language
                    213: and how \twiglang{} programs can be incorporated with C.
                    214: 
                    215: Many statements have been made in this manual without proof.
                    216: If the reader wishes to pursue it further, more details about the algorithms
                    217: will be presented in an upcoming paper,
                    218: \cite{aho:gana:tjiang}.
                    219: 
                    220: \section{The \Twiglang{} Language}
                    221: \subsection{Lexical Issues and Conventions}
                    222: Currently the
                    223: following identifiers are reserved keywords of \twiglang{}:
                    224: 
                    225: \begin{tabbing}
                    226: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
                    227: action \> cost \> insert \\
                    228: label \> node \> prologue
                    229: \end{tabbing}
                    230: 
                    231: \noindent ;:(),= are special characters.  All blanks, tabs, formfeeds and
                    232: newlines are ignored by \twiglang{} but they may not appear inside
                    233: identifiers and numbers.  Identifiers are nonempty strings
                    234: made of letters, digits or underscores and starting with a letter.
                    235: Numbers are nonempty string of digits.  Code fragments or action parts
                    236: are enclosed in braces.  Inside code fragments, C lexical rules must
                    237: be obeyed except that strings of the form $\$...\$$ that are not inside C
                    238: strings have special meaning to \twiglang{}.  In the following sections, $id$
                    239: denotes an identifier; $int_1$, $int_2$ denotes numbers; $ccode$
                    240: denotes C code fragments; $...$ indicates repetition of the previous
                    241: item; $[...]$ indicates that $...$ is optional.
                    242: 
                    243: The input to the \twiglang{} program will be referred to as the subject tree.
                    244: 
                    245: \subsection{Rules}
                    246: 
                    247: The fundamental unit of a \twiglang{} program is a rule.
                    248: 
                    249: \begin{verse}
                    250: $label\_id$ {\bf :} $tree\_pattern$\quad {\bf [} $cost$ {\bf ]}
                    251: {\bf [ =} $action$ {\bf ]}
                    252: \end{verse}
                    253: 
                    254: \noindent Intuitively, the pattern is used to match a subtree.
                    255: The $cost$ code
                    256: fragment is then evaluated.  The resulting cost is recorded by
                    257: the matcher for use in dynamic programming.  The $action$ is executed
                    258: if the rule is part of the least cost cover of the tree.
                    259: The details of pattern matching will be discussed in Section~\ref{pattern}.
                    260: 
                    261: If the $cost$ part is missing, \twigcomp{} will insert default code
                    262: that returns the special value, {\tt DEFAULT\_COST}.
                    263: A missing $action$ part indicates that nothing will be done when a
                    264: match is found.
                    265: 
                    266: \subsection{Tree Patterns}
                    267: \label{TreePatterns}
                    268: Tree patterns are specified
                    269: in prefix parenthesized form and can be described by the
                    270: following BNF:
                    271: 
                    272: \begin{verse}
                    273: $tree\_pattern\to node\_id\;\vert\;label\_id$\\
                    274: $tree\_pattern\to node\_id \;{\bf (}\; tree\_list \;{\bf )}$\\
                    275: $tree\_list \to tree \;{\bf ,}\; tree\_list \;\vert\; tree$\\
                    276: \end{verse}
                    277: 
                    278: \begin{figure}
                    279: \unitlength 0.1in
                    280: \begin{center}
                    281: \begin{picture}(25,16)
                    282: \put(7,0){\framebox(4,3){\tt id}}
                    283: \put(20,0){\framebox(5,3){\tt const}}
                    284: \put(13,6){\framebox(5,3){\tt plus}}
                    285: \put(6,12){\framebox(5,3){\tt plus}}
                    286: \put(0,6){\framebox(4,3){\tt id}}
                    287: \put(11,3){\line(2,3){2}}
                    288: \put(20,3){\line(-2,3){2}}
                    289: \put(13,9){\line(-2,3){2}}
                    290: \put(4,9){\line(2,3){2}}
                    291: \end{picture}
                    292: \end{center}
                    293: \vspace{0.5in}
                    294: \begin{verse}
                    295: \begin{center}
                    296: \tt plus(id, plus(id, const))
                    297: \end{center}
                    298: \end{verse}
                    299: \caption{A Tree and its prefixed form}
                    300: \label{treepattern}
                    301: \end{figure}
                    302: \noindent Figure~\ref{treepattern} gives an example of a tree pattern
                    303: and its prefix parenthesized form.
                    304: 
                    305: There are two types of symbols: $node\_id$\/s and $label\_id$\/s.
                    306: $Node\_id$\/s are used to denote internal nodes and leaves.
                    307: $Label\_id$\/s label tree patterns and are analogous to
                    308: nonterminals in context free grammars.
                    309: For example, the following \twiglang{} rules without their action parts describe
                    310: simple expression trees with the {\tt plus} operator.
                    311: 
                    312: \begin{verse}
                    313: {\tt\begin{tabbing}
                    314: expr: \= plus(expr, expr)\\
                    315: expr: \> identifier\\
                    316: expr: \> constant
                    317: \end{tabbing}}
                    318: \end{verse}
                    319: 
                    320: \noindent Here, {\tt identifier} and {\tt constant} are $node\_id$\/s
                    321: representing leaves, and
                    322: {\tt plus} is a $node\_id$ representing an internal node whereas
                    323: {\tt expr} is a
                    324: $label\_id$.  Pattern leaves that are $label\_id$\/s are called
                    325: {\sl labelled leaves}.
                    326: In the first rule, both leaves of the pattern are labelled.
                    327: 
                    328: \Twigcomp{} associates an integer with each node symbol and label symbol.  These
                    329: integers are used by the \twiglang{} pattern matcher as encodings for the
                    330: symbols.
                    331: As the matcher traverses the
                    332: tree, a user supplied subroutine is called to provide an integer
                    333: corresponding to each node.
                    334: 
                    335: \subsection{Costs}
                    336: 
                    337: To increase the flexibility of representing costs, the tree matcher
                    338: views costs as an {\sl abstract data type} or ADT.
                    339: For example, costs may be represented as an integer or
                    340: as a vector of integers with each element representing the cost of a specific
                    341: resource.
                    342: A cost ADT suitable for \twiglang{} must have the following four definitions:
                    343: 
                    344: \begin{itemize}
                    345: \item {\tt COST} is a C type.
                    346: Although the proper functioning of the tree matcher
                    347: does not depend on the internal details of the {\tt COST} type, it must have
                    348: the type information for storage allocation purposes.
                    349: \item {\tt INFINITY} is the maximum attainable cost value.
                    350: The matcher uses {\tt INFINITY} to initialize its data structures.
                    351: \item {\tt DEFAULT\_COST} is the cost
                    352: value returned by rules without a cost part.
                    353: \item
                    354: {\tt COSTLESS} is a function of
                    355: two cost values.  It must return true if and only if the first argument is
                    356: less than the second.
                    357: \end{itemize}
                    358: 
                    359: \subsection{Trees}
                    360: 
                    361: As with costs, trees are treated as an ADT.  Here, using an ADT is
                    362: even more important because trees come in a variety of shapes and
                    363: representations.
                    364: \Twiglang{} would be overly complicated if it had to
                    365: know any more than the fundamental properties of trees.  Thus,
                    366: \twiglang{} treats trees as an acyclic directed
                    367: graph of almost featureless nodes with one
                    368: distinguished node, the root.  Each node has only one feature and that
                    369: is an integer corresponding to the $node\_id$\/s discussed above.
                    370: To provide this
                    371: view to \twiglang{}, the user must provide the following definitions and
                    372: subroutines.
                    373: 
                    374: \begin{itemize}
                    375: \item {\tt NODEPTR} is the type of a node.  The actual
                    376: details of
                    377: the {\tt NODEPTR}
                    378: are irrelevant and \twigcomp{} uses this definition only for storage
                    379: allocation and declaration purposes.
                    380: \item
                    381: {\tt NULL} is used to denote a null {\tt NODEPTR} value.
                    382: In the current implementation this is the same {\tt NULL} used by the
                    383: C standard I/O package and need not be defined by the user.
                    384: This restricts the {\tt NODEPTR} representation to be a pointer.
                    385: \item
                    386: {\tt mtGetNodes$(r,n)$} returns the $n$th child of $r$ where $r$
                    387: is a {\tt NODEPTR}.  The
                    388: leftmost child is {\tt mtGetNodes}$(r,1)$.  If $n > \hbox{degree}(r)$
                    389: then the function should return {\tt NULL}.  \Twigcomp{} expects the
                    390: expression {\tt mtGetNodes}$({\tt NULL}, 0)$
                    391: to denote the root node of
                    392: the subject tree.
                    393: \item
                    394: {\tt mtValue$(r)$} returns the integer associated with $r$
                    395: (See section~\ref{TreePatterns}).
                    396: \item
                    397: {\tt mtSetNodes$(r,n,c)$} replaces the $n$th child of $r$ with
                    398: $c$.  This routine may be used to replace whole subtrees with others.
                    399: {\tt mtSetNodes}$({\tt NULL},0,c)$ is used by \twigcomp{} to set the
                    400: root of the subject tree to $c$.
                    401: \end{itemize}
                    402: 
                    403: \subsection{Prologue and Inserts}
                    404: 
                    405: \begin{verse}
                    406: {\tt prologue} $ccode$;
                    407: \end{verse}
                    408: 
                    409: \noindent signals to \twigcomp{} that $ccode$ should be inserted at the
                    410: beginning of the output source file.  $Ccode$ should contain
                    411: definitions relevant to the C code in
                    412: rules that are defined later in the \twiglang{} source file.
                    413: 
                    414: \begin{verse}
                    415: {\tt insert} $ccode$;
                    416: \end{verse}
                    417: 
                    418: \noindent causes $ccode$ to be placed into the source file.
                    419: There can be multiple inserts and they will be placed into the source
                    420: file
                    421: in the order that they appear.
                    422: 
                    423: \subsection{Declarations}
                    424: 
                    425: All \twiglang{} identifiers are declared before they are used.
                    426: 
                    427: \begin{verse}
                    428: {\tt node $id[(int_1)] [= int_2] ...$;\hfil}
                    429: \end{verse}
                    430: 
                    431: \noindent A node declaration declares one or more identifier to be
                    432: associated with nodes of the subject tree.  The identifiers are
                    433: assigned numbers by \twigcomp{} but the user can overide the assigned number
                    434: by specifying $int_2$.  The degree of the node identifer, the
                    435: number of children, is assumed to be fixed.  Normally, \twigcomp{} will
                    436: deduce the degree when $id$ is used in a rule.  However, the user may
                    437: explicitly give the degree by specifying $int_1$.
                    438: 
                    439: \begin{verse}
                    440: {\tt label $id ...$;}
                    441: \end{verse}
                    442: 
                    443: A label declaration indicates that the following $id$'s are to be used
                    444: as labels of rules.
                    445: 
                    446: \subsection{Costs and Action code}
                    447: 
                    448: Code fragments such as the $cost$ and $action$ clauses of a rule are
                    449: specfied by enclosing C code in braces.  All legal C constructs are
                    450: permitted in code fragments.  In addition, the following are provided
                    451: for convenient access to the subject tree and internal data
                    452: structures of the matcher.
                    453: 
                    454: \begin{itemize}
                    455: \item $\$\%n\$$ denotes a pointer value to the matcher data
                    456: structure for the $n$th labelled leaf.  Thus to access the cost value
                    457: associated with that leaf, the notation $\$\%n\$\to${\tt cost} may be used.
                    458: 
                    459: \item $\$\$$ denotes a pointer value to the root of the subject
                    460: tree.
                    461: 
                    462: \item $\$n_1.n_2.n_3.\, \hbox{...}\,.n_{k-1}.n_k\$$ denotes
                    463: a pointer value to the $n_k$th child of the $n_{k-1}th$ child of the
                    464: $n_{k-2}$th child of \hbox{...} of the $n_1$th child of the root of
                    465: the subject tree.  Each $n_i$ is a positive non-zero integer.
                    466: \end{itemize}
                    467: 
                    468: Some special constructs are available to code fragments in the cost
                    469: part of the specification.  The
                    470: statement ``{\tt ABORT;}'' when encountered during the execution of
                    471: the cost code, signals to the matcher that this pattern is to be
                    472: rejected.  When a ``{\tt REWRITE;}'' statement is encountered, control
                    473: is returned to the matcher and the rule will become a {\sl rewrite}
                    474: mode match.  When the end of the cost code fragment is reached,
                    475: control is returned to the matcher and the rule becomes a {\sl defer}
                    476: mode match.
                    477: The statement ``{\tt TOPDOWN;}'' is like ``{\tt REWRITE;}'' except that
                    478: the mode of the match becomes {\sl topdown}.  The meanings of these modes
                    479: will be explained in Section~\ref{pattern}.
                    480: 
                    481: Cost values are returned to the matcher by assigning to
                    482: the ``{\tt cost}'' variable in the cost clause.  If no assignment is
                    483: made to the {\tt cost} variable then the returned cost will be {\tt
                    484: DEFAULT\_COST}.
                    485: 
                    486: Action clauses are expected to return a new tree which will replace
                    487: the subject tree.  This is done by returning using the
                    488: ``{\tt return($new\_tree$);}'' statement.
                    489: where {\tt $new\_tree$} is of type
                    490: {\tt NODEPTR}.
                    491: If execution reaches the end of the action clause, the
                    492: matcher resumes execution and the subject tree is not modified.
                    493: 
                    494: \section{Pattern Matcher Operation}
                    495: \label{pattern}
                    496: 
                    497: The
                    498: pattern matcher operates in two phases: the costing phase and the execution
                    499: phase.
                    500: During
                    501: the costing phase, a minimal cost covering of the subject tree is found.
                    502: The execution phase invokes the actions that are associated with the patterns
                    503: making up the covering.  Execution phases may begin during the costing phase
                    504: to execute the covering of a subtree as described in Section~\ref{exec:phase}.
                    505: 
                    506: \subsection{The Costing Phase}
                    507: \label{cost:phase}
                    508: 
                    509: Let the patterns of the \twiglang{} specifications be interpreted as grammar
                    510: productions with {\sl label\_id}s as non-terminals and {\sl node\_id}s
                    511: as terminals.
                    512: If we add a production
                    513: $S\,\to\,\hbox{\sl label\_id}$ for all {\sl label\_id}s where $S$ is the
                    514: start symbol,
                    515: then a covering for a tree is analogous to a derivation
                    516: for the prefixed form of the tree.
                    517: Figure~\ref{gram:interp} shows some patterns and its corresponding grammar.
                    518: \begin{figure}
                    519: \begin{center}
                    520: \begin{verse}
                    521: \tt
                    522: e: plus(e,e)\\
                    523: e: plus(e, plus(e,e))\\
                    524: e: id\\
                    525: e: const\\
                    526: \end{verse}
                    527: \end{center}
                    528: \vspace{0.3in}
                    529: \begin{center}\begin{verse}
                    530: $S\to e$\\
                    531: $e\to plus(e,e)$\\
                    532: $e\to plus(e,plus(e,e)$\\
                    533: $e\to id$\\
                    534: $e\to const$\\
                    535: \end{verse}\end{center}
                    536: \caption{Pattern and Grammar}
                    537: \label{gram:interp}
                    538: \end{figure}
                    539: The productions of the form $S\to\hbox{\sl label\_id}$ reflect the fact that
                    540: any {\sl label\_id} may start a cover.
                    541: 
                    542: The matcher finds the least cost cover by doing a preorder traversal of
                    543: the subject tree.
                    544: At the same time, it
                    545: builds a skeleton tree that is structurally isomorphic to the subject tree.
                    546: When a match is discovered the cost clause of the pattern is invoked to
                    547: calculate the cost.  Many patterns with different labels could match at any
                    548: given node but only the lowest cost pattern for each label is recorded in
                    549: the skeleton.
                    550: 
                    551: When a pattern is matched, its label is then used as input to the pattern
                    552: matcher so that matching of patterns with labelled leaves can begin.
                    553: This process
                    554: is analogous to a reduce action in
                    555: bottom up parsers.
                    556: 
                    557: \subsection{The Execution Phase}
                    558: \label{exec:phase}
                    559: 
                    560: The execution phase starts when the costing phase is complete or when
                    561: a sufficiently low cost {\sl rewrite} mode rule is encountered.
                    562: Let $M$ be a matching
                    563: rule and $L_1, L_2, L_3, ... ,L_n$ be
                    564: the labelled leaves of the matching pattern.
                    565: The following procedure is used to execute the action parts of $M$.
                    566: 
                    567: \vbox{
                    568: \vspace{2ex}
                    569: \noindent {\bf Procedure \it Execute}
                    570: 
                    571: \begin{itemize}
                    572: \item If $M$ is a {\sl deferred} mode match then
                    573: execution occurs by first applying {\it Execute} to the $L_i$s
                    574: from left to right.  That is, in the order that they appear in
                    575: the prefix form of the tree.
                    576: When all the $L_i$s are executed, the action part of $M$ is
                    577: invoked.
                    578: 
                    579: \item If $M$ is a {\sl rewrite } mode match then
                    580: just the action part of $M$ is executed.  When the action part returns
                    581: the matcher deletes the skeleton corresponding to the subtree that
                    582: $M$ matched and rescans the new subtree that may have been substituted
                    583: by executing $M$.
                    584: 
                    585: \item If $M$ is a {\sl topdown} mode match then only
                    586: the action part of $M$ is executed.  To execute a
                    587: labelled leaves, $L_i$, $M$'s action part may do so explicitly
                    588: by using the {\tt tDO($\$\%i\$$)} macro.  This allows
                    589: the user to arbitrary order the execution of the labelled leaves.
                    590: \end{itemize}
                    591: }
                    592: 
                    593: \noindent When the costing phase is over, the execution phase is started
                    594: by picking the lowest cost match at the root of the subject tree.
                    595: This will be the root of the minimum cost cover.
                    596: The match is executed as described above.
                    597: 
                    598: The execution phase may begin before the costing phase is over.
                    599: This occurs when a rewrite rule matches and its cost is lower than
                    600: all other matches at that subtree.  In this situation the rewrite rule
                    601: is executed immediately and the new rewritten subtree is scanned once more.
                    602: The prescence of rewrite rules throws a wrench in the theoretical niceties
                    603: of the matching algorithm.  For example, there is now no guarantee
                    604: that the algorithm will terminate because the tree can
                    605: be repeatedly enlarged.
                    606: Rewrite rules can be dangerous.  They can
                    607: be triggered unintentionally if the user is not careful to abort them
                    608: in situations where they are not wanted.
                    609: However, if used with care they can help to reduce the size of the \twiglang{}
                    610: specification.
                    611: 
                    612: \section{Some Examples}
                    613: 
                    614: \subsection{Expression Trees to Prefix}
                    615: A \twiglang{} program to print out the prefix form of expression
                    616: parse trees
                    617: is shown in Figure~\ref{trees:prefix}.
                    618: 
                    619: \begin{itemize}
                    620: 
                    621: \item The rules do not have cost calculations.
                    622: Since there are no
                    623: ambiguities costs are not necessary.
                    624: 
                    625: \item The second rule is a topdown mode rule.  This is essential in printing
                    626: the prefix form.  If it was omitted the postfix form would be printed.
                    627: 
                    628: \item Before any matching can begin {\tt \_matchinit} must be called.
                    629: Trees can then be processed by calling {\tt \_match} after arranging that
                    630: the call
                    631: {\tt mtGetNodes}$({\tt NULL},0)$ returns the root of the subject tree.
                    632: \end{itemize}
                    633: 
                    634: \begin{figure}[tp]
                    635: \begin{tabbing}
                    636: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
                    637: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
                    638: prologue \> $\{$\\[4pt]
                    639: typedef \> struct node $*$\>\>{\tt NODEPTR};\\
                    640: \#define \> {\tt COST} \>\> int\\
                    641: \#define \> {\tt INFINITY}\>\>  100000\\
                    642: \#define \> {\tt DEFAULT\_COST} \>\> 0\\[4pt]
                    643: {\tt NODEPTR} Root;\\
                    644: struct code \> $\{$\\
                    645: \> char op; \> \> $/*$ null if node is a leaf $*/$\\
                    646: \> NODEPTR left, right;\\
                    647: \> char $*$id;\\
                    648: $\}$;\\[4pt]
                    649: $\}$;\\[6pt]
                    650: node \> nOp nIdent;\\
                    651: label \> lExpr;\\[6pt]
                    652: lExpr:\> nIdent\\
                    653: \>= $\{$ printf(``$\%$s'', id); $\}$;\\[4pt]
                    654: lExpr:\>nOp(lExpr,lExpr)\\
                    655: \>$\{$ {\tt TOPDOWN};$\}$\\
                    656: \>= $\{$\\
                    657: \>\>putchar($\$\$\to$op);\\
                    658: \>\>tDO($\$\%1\$$);\\
                    659: \>\>tDO($\$\%2\$$);\\
                    660: \>$\}$;\\[6pt]
                    661: insert\>$\{$\\[4pt]
                    662: mtValue(root)\\
                    663: \>{\tt NODEPTR} root;\\
                    664: $\{$\\
                    665: \>if(root$\to$op==0)\\
                    666: \>\>return(nIdent);\\
                    667: \>else\\
                    668: \>\>return(nOp);\\
                    669: $\}$\\[4pt]
                    670: \end{tabbing}
                    671: \caption{Printing Expression trees to Prefix form}
                    672: \end{figure}
                    673: \addtocounter{figure}{-1}
                    674: \begin{figure}[tp]
                    675: \begin{tabbing}
                    676: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
                    677: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
                    678: mtGetNodes(r,n)\\
                    679: \>{\tt NODEPTR} r;\\
                    680: \>int n;\\
                    681: $\{$\\
                    682: \>if(n==1)\\
                    683: \>\>return(r$\to$left);\\
                    684: \>else if(n==1)\\
                    685: \>\>return(r$\to$right);\\
                    686: \>else if(n==0 $\&\&$ r=={\tt NULL})\\
                    687: \>\>return(Root);\\
                    688: \>else return({\tt NULL});\\
                    689: $\}$\\[4pt]
                    690: mtSetNodes(r,n, c)\\
                    691: \>{\tt NODEPTR} r, c;\\
                    692: \>int n;\\
                    693: $\{$\\
                    694: \>if(n==1)\\
                    695: \>\>r$\to$left = c;\\
                    696: \>else if(n==2)\\
                    697: \>\>r$\to$right = c;\\
                    698: \>else if(n==0)\\
                    699: \>\>Root = c;\\
                    700: $\}$\\[4pt]
                    701: main(argc,argv)\>\>\>$/*$ called by user only $*/$\\
                    702: \>char $**$argv;\\
                    703: $\{$\\
                    704: \>\_matchinit();\>\>$/*$ initialize the matcher $*/$\\
                    705: \>$...$ {\sl get a tree and set Root to it}\\
                    706: \>\_match();\>\>$/*$ do the match $*/$\\
                    707: $\}$\\
                    708: $\}$\\
                    709: \end{tabbing}
                    710: \caption{(cont.) Printing Expression trees to Prefix form}
                    711: \label{trees:prefix}
                    712: \end{figure}
                    713: 
                    714: \subsection{Short Circuited Boolean Evaluation}
                    715: 
                    716: Short circuited boolean evaluation is naturally topdown.
                    717: A fragment of \twiglang{} program that implements this
                    718: is shown in Figure~\ref{short:circuit}.
                    719: 
                    720: \begin{itemize}
                    721: 
                    722: \item Labels for true and false branches are passed down to
                    723: rules below by putting them into the nodes.  Another way of
                    724: accomplishing this is to use an auxillary argument stack.
                    725: 
                    726: \item The rules assume that every test generates a branch to both
                    727: the false and true label.  On conventional machines, extra branches
                    728: can be eliminated by recognizing that one could generate code that
                    729: falls through to its true or false labels.  This can be done by
                    730: separating {\sl lTest} into {\sl lTrueTest} and {\sl lFalseTest} which
                    731: branches only when true and false respectively.  Rules can then be
                    732: written to take advantage of this.
                    733: \end{itemize}
                    734: 
                    735: \begin{figure}
                    736: \begin{tabbing}
                    737: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
                    738: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
                    739: prologue \> $\{$\\[4pt]
                    740: \#include\>``otherdefs.h''\\
                    741: union node $\{$\\
                    742: \> $...$ \>\>$/*$ definition of other node types $*/$\\
                    743: \> union $\{$\\
                    744: \>\> int operation;\\
                    745: \>\> {\tt NODEPTR} left, right;\\
                    746: \>\> {\tt LABEL} falselab, truelab;\>\>$/*$ {\tt LABEL} will defined
                    747: elsewhere $*/$\\
                    748: \>\> $...$ \>\> $/*$ other definitions relevant to {\sl test nodes} $*/$\\
                    749: \> $\}$ test;
                    750: \> $...$ \>\>$/*$ definition of other node types $*/$\\
                    751: $\}$;\\[4pt]
                    752: $\}$;\\[6pt]
                    753: node\>nAnd nOr nNot nLess;\\
                    754: label\>lTest lExpr;\\[6pt]
                    755: lTest:\>nAnd(lTest,lTest)\\
                    756: \>$\{$ {\tt TOPDOWN}; $\}$\\
                    757: \>=$\{$\\
                    758: \>\>$\$1\$\to$test.falselab = $\$\$\to$test.falselab;\\
                    759: \>\>$\$1\$\to$test.truelab = getlabel();\\
                    760: \>\>tDO($\$\%1\$$);\\
                    761: \>\>printlab($\$1\$\to$test.truelab);\\
                    762: \>\>$\$2\$\to$test.falselab = $\$\$\to$test.falselab;\\
                    763: \>\>$\$2\$\to$test.truelab = $\$\$\to$test.truelab;\\
                    764: \>\>tDO($\$\%2\$$);\\
                    765: \>$\}$;\\[4pt]
                    766: lTest:\>nOr(lTest,lTest)\\
                    767: \>$\{$ {\tt TOPDOWN}; $\}$\\
                    768: \>=$\{$\\
                    769: \>\>$\$1\$\to$test.falselab = getlabel();\\
                    770: \>\>$\$1\$\to$test.truelab = $\$\$\to$test.truelab;\\
                    771: \>\>tDO($\$\%1\$$);\\
                    772: \>\>$\$2\$\to$test.truelab = $\$\$\to$test.truelab;\\
                    773: \>\>$\$2\$\to$test.falselab = $\$\$\to$test.falselab;\\
                    774: \>\>tDO($\$\%2\$$);\\
                    775: \>$\}$;\\
                    776: \end{tabbing}
                    777: \caption{Short circuited Boolean Evaluation}
                    778: \label{short:circuit}
                    779: \end{figure}
                    780: \addtocounter{figure}{-1}
                    781: \begin{figure}[tp]
                    782: \begin{tabbing}
                    783: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
                    784: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
                    785: lTest:\>nNot(lTest)\\
                    786: \>$\{$ {\tt TOPDOWN}; $\}$\\
                    787: \>=$\{$\\
                    788: \>\>$\$1\$\to$test.truelab = $\$\$\to$test.falselab;\\
                    789: \>\>$\$1\$\to$test.falselab = $\$\$\to$test.truelab;\\
                    790: \>\>tDO($\$\%1\$$);\\
                    791: \>$\}$;\\[4pt]
                    792: lTest:\>nLess(lExpr,lExpr)\>\>$/*$ {\sl not topdown anymore } $*/$\\
                    793: \>=$\{$ $...$ {\sl generate code for comparison } $...$ $\}$;\\
                    794: \end{tabbing}
                    795: \caption{(cont.) Short circuited Boolean Evaluation}
                    796: \end{figure}
                    797: 
                    798: \subsection{Code Generation}
                    799: 
                    800: Figure~\ref{example} is an 
                    801: example of \twiglang{} program that
                    802: can be used to generate VAX code for
                    803: the subtract instruction:
                    804: 
                    805: \begin{figure}
                    806: \begin{tabbing}
                    807: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=
                    808: \hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\hspace{1in}\=\kill
                    809: prologue \> $\{$\\
                    810: $/*$ otherdefs.h will have a type definition for {\tt NODEPTR} $*/$\\
                    811: \#include\>``otherdefs.h''\\
                    812: \#define\>{\tt TEMP\_COST}\>\>5\\
                    813: \#define\>{\tt SUB\_COST}\>\>30\\
                    814: \#define\>{\tt DEC\_COST}\>\>10\\
                    815: NODEPTR \> gettemp();\\
                    816: $\}$;\\[4pt]
                    817: node \> long constant sub;\\
                    818: label \> operand temp;\\[4pt]
                    819: operand: \> long;\>\>\>$/*$ rule 1 $*/$\\
                    820: operand:\>constant;\>\>\>$/*$ rule 2 $*/$\\
                    821: operand:\>temp;\>\>\>$/*$ rule 3 $*/$\\[4pt]
                    822: temp:\>operand\>\>\>$/*$ rule 4 $*/$\\
                    823: \>$\{$ cost = {\tt TEMP\_COST}+$\$\%1\$\to$cost;$\}$\\
                    824: \>$=\{$\\
                    825: \>\>NODEPTR t = gettemp();\\
                    826: \>\> emit(``mov'', $\$\$$, t, 0);\\
                    827: \>\>return(t);\\
                    828: \>$\}$;\\[4pt]
                    829: operand:\>sub(operand,operand)\>\>\>$/*$ rule 5 $*/$\\
                    830: \>$\{$cost = $\$\%1\$\to$cost + $\$\%2\$\to$cost+{\tt SUB\_COST};$\}$\\
                    831: \>=$\{$\\
                    832: \>\>NODEPTR t = gettemp();\\
                    833: \>\>emit(``sub'', $\$1\$$, $\$2\$$, t, 0);\\
                    834: \>\>return(t);\\
                    835: \>$\}$;\\[4pt]
                    836: temp:\>sub(temp,constant)\>\>\>$/*$ rule 6 $*/$\\
                    837: \>$\{$\\
                    838: \>\>if(value($\$2\$$)==1)\\
                    839: \>\>\>cost = $\$\%1\$\to$cost+{\tt DEC\_COST};\\
                    840: \>\>else ABORT;\\
                    841: \>$\}$=$\{$\\
                    842: \>\>emit(``dec'', $\$1\$$, 0);\\
                    843: \>\>return($\$1\$$);\\
                    844: \>$\}$;
                    845: \end{tabbing}
                    846: \caption{Small Code Generation Example}
                    847: \label{example}
                    848: \end{figure}
                    849: 
                    850: \begin{itemize}
                    851: \item Rules 3 and 4 form a loop.
                    852: The potential loop: temp$\to$operand$\to$temp$\to$operand$...$ is
                    853: broken
                    854: by the matcher recognizing that the cost of the second match of temp
                    855: does not cost less than the first match of temp.
                    856: 
                    857: \item In the cost clause of rule 5, the cost is the sum of the
                    858: leaves plus the cost of the subtract instruction.
                    859: The action clause emits code to add the two operands
                    860: and leave the result in a temporary location.  The temporary is
                    861: returned as a substitution for the subject tree.
                    862: 
                    863: \item Rule 6 handles a special case where the left operand is
                    864: already in a temporary and the constant is one.  In this case, the
                    865: temporary is directly decremented and returned as the new tree.
                    866: \end{itemize}
                    867: 
                    868: \section{How to run \Twigcomp{}}
                    869: 
                    870: Once a user has written the specification, \twigcomp{} is used to compile it
                    871: into a subroutine.
                    872: Figure~\ref{overview} gives an overview of how \twigcomp{} does this.
                    873: In Figure~\ref{overview}, rounded boxes indicate files and sharp
                    874: cornered boxes
                    875: are processing programs.  The user creates the \twiglang{} specification which
                    876: is represented by the top box.  The specification file must have the
                    877: suffix ``{\tt .mt}''.  Let's say we have one called {\tt arbor.mt}.
                    878: To invoke the \twigcomp{} compiler we type:
                    879: 
                    880: \begin{verse}
                    881: {\bf twig} [--{\bf w$xx$}] arbor.mt
                    882: \end{verse}
                    883: 
                    884: The proprocessor reads {\tt arbor.mt} and if there are no errors
                    885: will create two files: {\tt symbols.h} and {\tt walker.c}.
                    886: To build {\tt walker.c}, \twigcomp{} uses a template file called {\tt walker.$xx$}
                    887: where $xx$ is the argument of the optional --{\bf w} flag.
                    888: If the flag is omitted then $xx$ defaults to ``{\tt c1}''.
                    889: 
                    890: {\tt Symbols.h} and {\tt walker.c}
                    891: can then compiled and linked with the other modules of the
                    892: user's program.  These other modules should provide the
                    893: Tree and Cost ADTs described above.
                    894: {\tt Walker.h} is an auxilliary include file that is referenced by {
                    895: \tt walker.c}.
                    896: A typical compile command is:
                    897: 
                    898: \begin{verse}
                    899: {\bf cc} -- {\bf I\sl include\_dir} {\bf -c} walker.c
                    900: \end{verse}
                    901: 
                    902: The --{\bf I\sl include\_dir} argument causes {\bf cc} to look for
                    903: walker.h in {\sl include\_dir}.  The exact value of {\sl include\_dir}
                    904: will depend on where the files are stored at your site.
                    905: The tree matcher is initialized by calling {\tt \_matchinit} and matching
                    906: can begin by calling {\tt \_match} from
                    907: the user program.
                    908: 
                    909: \section{Organization of the \Twigcomp{} Compiler}
                    910: The preprocessor is written completely in C and YACC.
                    911: It can be roughly broken up into four modules:
                    912: \begin{itemize}
                    913: \item Lexer --- The lexer performs lexical analysis.
                    914: The input is tokenized by this module and passed to the parser
                    915: as required.  Identifiers are looked up in the symbol table
                    916: and space is reserved for them if they have never been encountered.
                    917: \item Parser --- This part is written in YACC and recognizes the
                    918: \twiglang{} language.  When a language construct is recognized actions
                    919: are invoked.  Declarations, prologues and inserts will invoke symbol
                    920: table operations when recognized.  Rules are broken up and stored
                    921: in the symbol table too.  The tree patterns in the rules are unflattened
                    922: and passed to the machine builder.
                    923: \item Machine Builder --- This module takes tree patterns and incrementally
                    924: builds an internal representation of the matching automaton.
                    925: The builder takes this
                    926: tree and then enumerates each of the possible paths from the root to the
                    927: leaves.  These paths are treated as strings and a string matching automaton
                    928: is built as described in \cite{fgrep} and \cite{treematch}.
                    929: 
                    930: Inside the builder the partially built machine is kept in a linked list
                    931: form.  Each machine state is a linked list and each node represents
                    932: a state transition.  When the tables for the machine are written out into {
                    933: \tt walker.c}, the linked list structure is
                    934: translated into a table of sixteen bit integers.
                    935: Each transistion is stored as two integers.  The first is
                    936: the integer corresponding to the symbol that causes the transistion and the
                    937: second is the index of the next state in the table.
                    938: 
                    939: \item Symbol Table ---  The central data structure in this symbol table is
                    940: the hash table.
                    941: Each entry in the hash table is a bucket --- a linked list of symbol table
                    942: entries.  There is no rehashing.  All colliding items are placed
                    943: in the linked list.
                    944: This is a simple
                    945: and adequately efficient arrangement.  Depending on the type of symbols, the
                    946: entry may point to other data structures.  Entries for $label\_id$s point to
                    947: lists of trees.  $Node\_id$ entries record the integers that have been
                    948: associates with them.  Other entries may point to data structures holding a
                    949: textual representation of C code that forms action and cost parts.
                    950: \end{itemize}
                    951: 
                    952: Here is a list of the files that form the \twigcomp{} compiler's source
                    953: and their approximate function.
                    954: \begin{itemize}
                    955: \item {\tt common.h} --- This file contains data definitions for how
                    956: trees are represented in the parser.  It also contains type definitions for
                    957: external functions.
                    958: \item {\tt code.h} --- This file contains definitions for how code fragments
                    959: are stored.
                    960: \item {\tt sym.h} --- This file defines the major data structures in the
                    961: symbol table.
                    962: \item {\tt machine.h} ---  This file defines the how the matching automaton
                    963: is represented internally.
                    964: \item {\tt twig.y} --- This is the parser.
                    965: \item {\tt sym.c} ---  This is the symbol table manager.
                    966: \item {\tt path.c} --- The path string enumerator.
                    967: \item {\tt machine.c} --- The machine builder.
                    968: \item {\tt lex.c } --- The lexer.
                    969: \item {\tt tree.c}, {\tt io.c}, and {\tt code.c} --- Miscellaneous
                    970: routines for manipulating trees, performing I/O and manipulating
                    971: code fragments.
                    972: \end{itemize}
                    973: 
                    974: \section{Performance}
                    975: 
                    976: So far, the only experience we have with \twiglang{}
                    977: is a VAX code generator written
                    978: for the pcc2 compiler.  The \twiglang{} code generator is 25\% faster than
                    979: the original code generator.
                    980: The maintainability and modifiability of the code generator has
                    981: improved.  For example adding the indexed addressing mode
                    982: into the code generator took only a few hours.  The target code quality
                    983: is just as good as the original code generator.
                    984: 
                    985: The \twigcomp{} table
                    986: generation algorithm is fast.  For the VAX machine description which
                    987: consist of 109 rules, the generation time was 4.2 seconds on a 780.
                    988: The generation time could be increased
                    989: by two orders of magnitude
                    990: before other table driven system can compete with \twigcomp{} on this basis.
                    991: Furthermore, the tables are small.  The VAX description is only 7.5K
                    992: and the text space for the walker is 30K.
                    993: Again this is much smaller than those of other table driven systems.
                    994: 
                    995: \Twiglang{} is currently a research tool.  Several things can be done to improve
                    996: \twiglang{}'s performance.
                    997: \begin{itemize}
                    998: \item  \Twiglang{} does a lot of procedure calls.  Every machine transition
                    999: require at least one procedure call.  Contrast this to YACC where a machine
                   1000: transition is done in line.
                   1001: \item \Twiglang{} performs a very expensive closure operation with respect to
                   1002: unit rules.  A unit rule is the analogue of a unit production and has the
                   1003: form:
                   1004: 
                   1005: \begin{verse}
                   1006: $label_1$: $label_2$;
                   1007: \end{verse}
                   1008: 
                   1009: \noindent This closure is done at run time.
                   1010: It requires many procedure calls and manipulates complex data structures.
                   1011: We are looking at ways of doing this
                   1012: at table generation time or to hash the results of the closures so that
                   1013: redundant calculations can be avoided.  The problem with doing them
                   1014: in the table generator is the possible explosion in its
                   1015: running time.
                   1016: 
                   1017: \item Many of the cost parts for the rules are similar and hence some
                   1018: tests are performed and recalculated many times
                   1019: on a node.  \Twigcomp{} should be clever
                   1020: enough to factor out some of these tests and do them only once.
                   1021: However, the information required for \twigcomp{} to do this is not easily
                   1022: available.  It is hidden in the C code fragments.
                   1023: 
                   1024: \item The current compact
                   1025: representation of the matching automaton
                   1026: require linear searching except at the
                   1027: start state where the large branching factor compelled us to use an
                   1028: array scheme.  Using another representation would speed up the
                   1029: matcher.  For example, implementing the VAX description as an array is
                   1030: estimated to use about 50K bytes but may provide
                   1031: performance improvement.
                   1032: 
                   1033: \end{itemize}
                   1034: \bibliography{mybib}
                   1035: \bibliographystyle{plain}
                   1036: \end{document}

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.