Annotation of researchv10no/cmd/twig/doc/manual/twigb.tex, revision 1.1.1.1

1.1       root        1: \magnification=1200
                      2: \centerline{\bf Twig: A Brief Description}
                      3: \bigskip
                      4: Twig is a language for processing trees.  A twig program consists of a set
                      5: of pattern-action rules together with associated declarations.  Patterns
                      6: describe trees to be matched.  Actions calculate costs, perform tree
                      7: manipulations and other functions such as emitting code.
                      8: A twig program is translated by the twig
                      9: preprocessor into subroutines and tables in the host language which is C in
                     10: the current implementation.
                     11: 
                     12: There are many ways to represent trees and costs.  To give the user more
                     13: choice in representation, twig treats them as abstract data
                     14: types (ADT).  Its manipulation of trees and costs are strictly performed
                     15: through calls to subroutines provided by the user.
                     16: \medskip
                     17: \line{\bf Rules\hfil}
                     18: The fundamental unit of a twig program is a rule.
                     19: \medskip
                     20: \line{\hfil\vbox{\hsize=3.5truein
                     21: \noindent$label\_id$ {\bf :} $tree\_pattern$\quad {\bf [} $cost$ {\bf ]}
                     22: {\bf [=} $action$ {\bf ]}}\hfil}
                     23: \medskip
                     24: Intuitively, the pattern is used to match a subtree.  The $cost$ code
                     25: fragment is then evaluated.  The resulting cost is recorded by
                     26: the matcher for use in dynamic programming.  The $action$ is executed
                     27: if the rule is part of the least cost cover of the tree.
                     28: 
                     29: If the $cost$ part is missing, Twig will insert default code
                     30: that returns the special value, {\tt DEFAULT\_COST}.
                     31: A missing $action$ part indicates that nothing will be done when a
                     32: match is found.
                     33: \medskip
                     34: \line{\bf Tree Patterns\hfil}
                     35: Tree patterns are specified
                     36: in prefix parenthesized form and can be described by the
                     37: following BNF:
                     38: \medskip
                     39: \line{\hfil\vbox{\hsize=3.5truein\obeylines\parindent=0pt\parskip=0pt
                     40: $tree\_pattern\to node\_id\;\vert\;label\_id$
                     41: $tree\_pattern\to node\_id \;{\bf (}\; tree\_list \;{\bf )}$
                     42: $tree\_list \to tree \;{\bf ,}\; tree\_list \;\vert\; tree$
                     43: }\hfil}
                     44: 
                     45: For example, the tree,
                     46: \midinsert
                     47: \vskip 1.0truein
                     48: \endinsert
                     49: \noindent will be written in twig as
                     50: \medskip
                     51: \line{\hfil\vbox{\hsize=3.5truein\tt oper(leftOp, rightOp)}\hfil}
                     52: \medskip
                     53: There are two types of symbols: $node\_id$\/s and $label\_id$\/s
                     54: $Node\_id$\/s are used to denote internal nodes and leaves.
                     55: $Label\_id$\/s label tree patterns and are analogous to
                     56: non-terminals in context free grammars.
                     57: For example, the following twig rules without their action parts describe
                     58: simple expression trees with the {\tt plus} operator.
                     59: \medskip
                     60: \line{\hfil\vbox{
                     61: \tt\noindent\halign{ #:\quad & #\cr
                     62: expr & plus(expr, expr)\cr
                     63: expr & identifier\cr
                     64: expr & constant\cr
                     65: }}\hfil}
                     66: \medskip
                     67: \noindent Here, {\tt identifier} and {\tt constant} are $node\_id$\/s
                     68: representing leaves, and
                     69: {\tt plus} is a $node\_id$\/s representing an internal node whereas
                     70: {\tt expr} is a
                     71: $label\_id$.  Leaves of patterns that are $label\_id$\/s are called
                     72: {\sl labelled leaves}.
                     73: In the first rule, both leaves of the pattern are labelled.
                     74: 
                     75: Twig assigns an integer to each node symbol and label symbol.  These
                     76: integers are used by the twig pattern matcher as encodings for the
                     77: symbols.
                     78: As the matcher traverses the
                     79: tree, a user supplied subroutine is called to provide an integer
                     80: corresponding to each node.
                     81: \medskip
                     82: \line{\bf Costs\hfil}
                     83: To increase the flexibility of representing costs, the tree matcher
                     84: views costs as an ADT.
                     85: For example, costs may be represented as an integer or
                     86: as a vector of integers with each element representing the cost of a specific
                     87: resource.
                     88: A cost ADT suitable for Twig must have the following four definitions:
                     89: \medskip
                     90: \noindent$\bullet$ {\tt COST} is a C type.
                     91: Although the proper functioning of the tree matcher
                     92: does not depend on the details of the {\tt COST} type, it must have
                     93: the type information for storage allocation purposes.
                     94: \smallskip
                     95: \noindent$\bullet$
                     96: {\tt INFINITY} is the maximum attainable cost value.  The matcher uses {\tt
                     97: INFINITY} to initialize its data structures.
                     98: \smallskip\noindent$\bullet$
                     99: {\tt DEFAULT\_COST} is the cost
                    100: value returned by rules without a cost part.
                    101: \smallskip\noindent$\bullet$
                    102: {\tt COSTLESS} is a function of
                    103: two cost values.  It must return true if and only if the first argument is
                    104: less than the second.
                    105: \medskip
                    106: \line{\bf Trees\hfil}
                    107: 
                    108: As with costs, Twig treats trees as an ADT.  Here, using an ADT is
                    109: even more important because trees come in a variety of shapes and
                    110: representations.
                    111: Twig would be overly complicated if it had to
                    112: know any more than the fundamental properties of trees.  Thus,
                    113: twig treats trees as an acyclic directed
                    114: graph of almost featureless nodes with one
                    115: distinguished node, the root.  Each node has only one feature and that
                    116: is an integer corresponding to the $node\_id$\/s discussed above.
                    117: To provide this
                    118: view to Twig, the user must provide the following definitions and
                    119: subroutines.
                    120: \medskip
                    121: \noindent$\bullet$ {\tt NODE} is the type of a node.  The actual
                    122: details of
                    123: the NODE are irrelevant and Twig uses this definition only for storage
                    124: allocation and declaration purposes.
                    125: \smallskip
                    126: \noindent$\bullet$ {\tt NONODE} is a constant of type
                    127: {\tt NODE}.  It is a distinguished
                    128: value returned by the routines defined below to represent a null value.
                    129: \smallskip
                    130: \noindent$\bullet$
                    131: {\tt mtGetNodes$(r,n)$} returns the $n$th child of $r$ where $r$
                    132: is a {\tt NODE}.  The
                    133: leftmost child is {\tt mtGetNodes}$(r,1)$.  If $n > \hbox{degree}(r)$
                    134: then the function should return {\tt NONODE}.  Twig expects the
                    135: expression {\tt mtGetNodes}$({\tt NONODE}, 0)$
                    136: to denote the root node of
                    137: the subject tree.
                    138: \smallskip
                    139: \noindent$\bullet$
                    140: {\tt mtValue$(r)$} returns the integer associated with $r$.
                    141: \smallskip
                    142: \noindent$\bullet$
                    143: {\tt mtSetNodes$(r,n,c)$} replaces the $n$th child of $r$ with
                    144: $c$.  This routine may be used to replace whole subtrees with others.
                    145: \medskip
                    146: \line{ \bf Pattern Matcher Operation \hfil}
                    147: 
                    148: The pattern matcher operates in two phases: the costing phase and the execution
                    149: phase.
                    150: 
                    151: In the costing phase, the matcher performs a preorder traversal of
                    152: a subject tree and discovers matches from the bottom up.  At the same time, it
                    153: builds a skeleton tree that is structurally isomorphic to the subject tree.
                    154: When a match is discovered the cost clause of the pattern is invoked to
                    155: calculate the cost.  Many patterns with different labels could match at any
                    156: given node but only the lowest cost pattern for each label is recorded in
                    157: the skeleton.
                    158: 
                    159: When a pattern is matched, its label is then used as input to the pattern
                    160: matcher so that matching of patterns with labelled leaves can begin.
                    161: This
                    162: process
                    163: is analogous to a reduce action in
                    164: bottom up parsers.  In twig, multiple reductions are
                    165: tracked.
                    166: 
                    167: The cost part of a rule can also assign a {\sl mode} to a match.  The {\sl
                    168: mode} controls how action parts are to be executed.
                    169: Actions parts of {\sl defer} mode matches are not executed until the
                    170: execution phase.  {\sl Rewrite} mode matches cause the action part to
                    171: be executed immediately during the costing phase to transform the
                    172: matched subtree.  Once the rewrite mode action is executed, the costing
                    173: phase continues by purging all information in the skeleton for the old
                    174: subtree and then scanning the newly transformed subtree.  {\sl
                    175: Rewrite} actions are similar to actions that are invoked on a reduction in
                    176: LR parsing.
                    177: 
                    178: In the case where multiple rewrite actions are possible the one with
                    179: the lowest cost will be executed.  Furthermore, a rewrite action is
                    180: considered only if its cost is lower than all the defer mode actions.
                    181: Rewrite mode actions that are not executed are treated in the same
                    182: manner as defer mode actions in the execution phase.
                    183: 
                    184: When the root of the subject tree is reached in the preorder
                    185: traversal, the execution phase begins.  The lowest cost match is then
                    186: chosen at the root and executed.  In this context, execution involves
                    187: executing the actions of the labelled leaves before invoking the code
                    188: in the current action.
                    189: \medskip
                    190: \line{\bf Lexical Issues and Conventions\hfil}
                    191: Currently the
                    192: following are keywords of twig in no particular order:
                    193: \medskip
                    194: \line{\hfil\vbox{\hsize=3.5truein\tt
                    195: \halign{ # \hfil & # \hfil & # \hfil \cr
                    196: label & node & cost \cr
                    197: action & prologue & insert\cr
                    198: }}\hfil}
                    199: \medskip
                    200: \noindent  They have special meanings and may not be used as
                    201: identifiers.
                    202: ;:(),= are special characters.  All blanks, tabs, formfeeds and
                    203: newlines are ignored by twig but they may not appear inside
                    204: identifiers and numbers.  Identifiers are nonempty strings
                    205: made of letters, digits or underscores and starting with a letter.
                    206: Numbers are nonempty string of digits.  Code fragments or action parts
                    207: are enclosed in braces.  Inside code fragments, C lexical rules must
                    208: be obeyed except that strings of the form $\$...\$$ that are not inside C
                    209: strings are interpreted by twig.  In the following sections, $id$
                    210: denotes an identifier; $int_1$, $int_2$ denotes numbers; $ccode$
                    211: denotes C code fragments; $...$ indicates repetition of the previous
                    212: item; $[...]$ indicates that $...$ is optional.
                    213: 
                    214: The input to the Twig program will be referred to as the subject tree.
                    215: \medskip
                    216: \line{\bf Prologue and Inserts\hfil}
                    217: 
                    218: \noindent\line{\hfil \vbox{\hsize=3.5truein \tt prologue $ccode$;}\hfil}
                    219: \medskip
                    220: \noindent signals to twig that $ccode$ should be inserted at the
                    221: beginning of the output source file.  $Ccode$ should contain
                    222: definitions relevant to the C code in
                    223: rules that are defined later in the twig source file.
                    224: \medskip
                    225: \noindent\line{\hfil\tt\vbox{\hsize=3.5truein insert $ccode$;}\hfil}
                    226: \medskip
                    227: \noindent should cause $ccode$ to be placed into the source file.
                    228: There can be multiple inserts and they will be placed into the source
                    229: file
                    230: in the order that they appear.
                    231: \medskip
                    232: \line{\bf Declarations\hfil}
                    233: All twig identifiers are declared before they are used.
                    234: \medskip
                    235: \line{\hfil \tt node $id[(int_1)] [= int_2] ...$;\hfil}
                    236: \medskip
                    237: \noindent A node declaration declares one or more identifier to be
                    238: associated with nodes of the subject tree.  The identifiers are
                    239: assigned numbers by twig but the user can overide the assigned number
                    240: by specifying $int_2$.  The degree of the node identifer, i.e. the
                    241: number of children, is assumed to be fixed.  Normally, twig will
                    242: deduce the degree when $id$ is used in a rule.  However, the user may
                    243: explicitly give the degree by specifying $int_1$.
                    244: \medskip
                    245: \line{\hfil\vbox{\hsize=3.5truein\tt label $id ...$;}\hfil}
                    246: \medskip
                    247: A label declaration indicates that the following $id$'s are to be used
                    248: as labels of rules.
                    249: \medskip\line{\bf Costs and Action code\hfil}
                    250: 
                    251: Code fragments such as the $cost$ and $action$ clauses of a rule are
                    252: specfied by enclosing C code in braces.  All legal C constructs are
                    253: permitted in code fragments.  In addition, the following are provided
                    254: for convenient access to the subject tree and internal data
                    255: structures of the matcher.
                    256: \smallskip
                    257: {\parindent=0pt
                    258: $\bullet$ $\$\%n\$$ denotes a pointer value to the matcher data
                    259: structure for the $n$th labelled leaf.  Thus to access the cost value
                    260: associated with that leaf, the notation $\$\%n\$\to${\tt cost} may be used.
                    261: 
                    262: $\bullet$ $\$\$$ denotes a pointer value to the root of the subject
                    263: tree.
                    264: 
                    265: $\bullet$ $\$n_1.n_2.n_3.\, \hbox{...}\,.n_{k-1}.n_k\$$ denotes
                    266: a pointer value to the $n_k$th child of the $n_{k-1}th$ child of the
                    267: $n_{k-2}$th child of \hbox{...} of the $n_1$th child of the root of
                    268: the subject tree.  Each $n_i$ is a positive non-zero integer.
                    269: }
                    270: \medskip
                    271: 
                    272: Some special constructs are available to code fragments in the cost
                    273: part of the specification.  The
                    274: statement ``{\tt ABORT;}'' when encountered during the execution of
                    275: the cost code, signals to the matcher that this pattern is to be
                    276: rejected.  When a ``{\tt REWRITE;}'' statement is encountered, control
                    277: is returned to the matcher and the rule will become a {\sl rewrite}
                    278: mode match.  When the end of the cost code fragment is reached,
                    279: control is returned to the matcher and the rule becomes a {\sl defer}
                    280: mode match.  Cost values are returned to the matcher by assigning to
                    281: the ``{\tt cost}'' variable in the cost clause.  If no assignment is
                    282: made to the {\tt cost} variable then the returned cost will be {\tt
                    283: DEFAULT\_COST}.
                    284: 
                    285: Action clauses are expected to return a new tree which will replace
                    286: the subject tree.  This is done by returning using the standard C
                    287: ``{\tt return($new\_tree$); }'' statement
                    288: where {\tt $new\_tree$} is of type
                    289: {\tt NODEPTR}.  When the end of the action clause is encountered, the
                    290: matcher resumes execution and the subject tree is not replaced.
                    291: \vfil\eject
                    292: \line{\bf An Example\hfil}
                    293: 
                    294: The following are examples of twig rules that generate VAX code for
                    295: the subtract instruction:
                    296: \midinsert
                    297: \moveright 1truein \vbox{\settabs 9 \columns
                    298: \+prologue & $\{$\cr
                    299: \smallskip
                    300: \+NODE & gettemp();\cr
                    301: \smallskip
                    302: \+$\}$;\cr\smallskip
                    303: \+node&long constant sub;\cr
                    304: \+label&operand temp;\cr
                    305: \smallskip
                    306: \+operand:&long;&&&$/*$ rule 1 $*/$\cr
                    307: \+operand:&constant;&&&$/*$ rule 2 $*/$\cr
                    308: \+operand:&temp;&&&$/*$ rule 3 $*/$\cr
                    309: \smallskip
                    310: \+temp:&operand&&&$/*$ rule 4 $*/$\cr
                    311: \+&$\{$ cost = {\tt TEMP\_COST}+$\$\%1\$\to$cost;$\}$\cr
                    312: \+&$=\{$\cr
                    313: \+&&NODE t = gettemp();\cr
                    314: \+&& emit(``mov'', $\$\$$, t, 0);\cr
                    315: \+&&return(t);\cr
                    316: \+&$\}$;\cr
                    317: \smallskip
                    318: \+operand:&sub(operand,operand)&&&$/*$ rule 5 $*/$\cr
                    319: \+&$\{$cost = $\$\%1\$\to$cost + $\$\%2\$\to$cost+{\tt SUB\_COST};$\}$\cr
                    320: \+&=$\{$\cr
                    321: \+&&NODE t = gettemp();\cr
                    322: \+&&emit(``sub'', $\$1\$$, $\$2\$$, t, 0);\cr
                    323: \+&&return(t);\cr
                    324: \+&$\}$;\cr
                    325: \smallskip
                    326: \+temp:&sub(temp,constant)&&&$/*$ rule 6 $*/$\cr
                    327: \+&$\{$\cr
                    328: \+&&if(value($\$2\$$)==1)\cr
                    329: \+&&&cost = $\$\%1\$\to$cost+{\tt DEC\_COST};\cr
                    330: \+&&else ABORT;\cr
                    331: \+&$\}$=$\{$\cr
                    332: \+&&emit(``dec'', $\$1\$$, 0);\cr
                    333: \+&&return($\$1\$$);\cr
                    334: \+&$\}$;\cr
                    335: }
                    336: \endinsert
                    337: {\parindent=0pt
                    338: $\bullet$ Rules 3 and 4 form a loop.
                    339: The potential loop: temp$\to$operand$\to$temp$\to$operand$...$ is
                    340: broken
                    341: by the matcher recognizing that the cost of the second match of temp
                    342: does not cost less than the first match of temp.
                    343: 
                    344: $\bullet$ Rule 4 localizes the testing of whether an operand is a
                    345: temporary.  It is not necessary but makes the coding of other
                    346: cost clauses less tedious.
                    347: 
                    348: $\bullet$ In the cost clause of rule 5, the cost is the sum of the
                    349: leaves plus the cost of the subtract instruction.
                    350: The action clause emits code to add the two operands
                    351: and leave the result in a temporary location.  The temporary is
                    352: returned as a substitution for the subject tree.
                    353: 
                    354: $\bullet$ Rule 6 handles a special case where the left operand is
                    355: already in a temporary and the constant is one.  In this case, the
                    356: temporary is directly decremented and returned as the new tree.
                    357: }\vfil\eject
                    358: \end

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.