Annotation of researchv10no/cmd/twig/doc/manual/twigb.tex, revision 1.1

1.1     ! root        1: \magnification=1200
        !             2: \centerline{\bf Twig: A Brief Description}
        !             3: \bigskip
        !             4: Twig is a language for processing trees.  A twig program consists of a set
        !             5: of pattern-action rules together with associated declarations.  Patterns
        !             6: describe trees to be matched.  Actions calculate costs, perform tree
        !             7: manipulations and other functions such as emitting code.
        !             8: A twig program is translated by the twig
        !             9: preprocessor into subroutines and tables in the host language which is C in
        !            10: the current implementation.
        !            11: 
        !            12: There are many ways to represent trees and costs.  To give the user more
        !            13: choice in representation, twig treats them as abstract data
        !            14: types (ADT).  Its manipulation of trees and costs are strictly performed
        !            15: through calls to subroutines provided by the user.
        !            16: \medskip
        !            17: \line{\bf Rules\hfil}
        !            18: The fundamental unit of a twig program is a rule.
        !            19: \medskip
        !            20: \line{\hfil\vbox{\hsize=3.5truein
        !            21: \noindent$label\_id$ {\bf :} $tree\_pattern$\quad {\bf [} $cost$ {\bf ]}
        !            22: {\bf [=} $action$ {\bf ]}}\hfil}
        !            23: \medskip
        !            24: Intuitively, the pattern is used to match a subtree.  The $cost$ code
        !            25: fragment is then evaluated.  The resulting cost is recorded by
        !            26: the matcher for use in dynamic programming.  The $action$ is executed
        !            27: if the rule is part of the least cost cover of the tree.
        !            28: 
        !            29: If the $cost$ part is missing, Twig will insert default code
        !            30: that returns the special value, {\tt DEFAULT\_COST}.
        !            31: A missing $action$ part indicates that nothing will be done when a
        !            32: match is found.
        !            33: \medskip
        !            34: \line{\bf Tree Patterns\hfil}
        !            35: Tree patterns are specified
        !            36: in prefix parenthesized form and can be described by the
        !            37: following BNF:
        !            38: \medskip
        !            39: \line{\hfil\vbox{\hsize=3.5truein\obeylines\parindent=0pt\parskip=0pt
        !            40: $tree\_pattern\to node\_id\;\vert\;label\_id$
        !            41: $tree\_pattern\to node\_id \;{\bf (}\; tree\_list \;{\bf )}$
        !            42: $tree\_list \to tree \;{\bf ,}\; tree\_list \;\vert\; tree$
        !            43: }\hfil}
        !            44: 
        !            45: For example, the tree,
        !            46: \midinsert
        !            47: \vskip 1.0truein
        !            48: \endinsert
        !            49: \noindent will be written in twig as
        !            50: \medskip
        !            51: \line{\hfil\vbox{\hsize=3.5truein\tt oper(leftOp, rightOp)}\hfil}
        !            52: \medskip
        !            53: There are two types of symbols: $node\_id$\/s and $label\_id$\/s
        !            54: $Node\_id$\/s are used to denote internal nodes and leaves.
        !            55: $Label\_id$\/s label tree patterns and are analogous to
        !            56: non-terminals in context free grammars.
        !            57: For example, the following twig rules without their action parts describe
        !            58: simple expression trees with the {\tt plus} operator.
        !            59: \medskip
        !            60: \line{\hfil\vbox{
        !            61: \tt\noindent\halign{ #:\quad & #\cr
        !            62: expr & plus(expr, expr)\cr
        !            63: expr & identifier\cr
        !            64: expr & constant\cr
        !            65: }}\hfil}
        !            66: \medskip
        !            67: \noindent Here, {\tt identifier} and {\tt constant} are $node\_id$\/s
        !            68: representing leaves, and
        !            69: {\tt plus} is a $node\_id$\/s representing an internal node whereas
        !            70: {\tt expr} is a
        !            71: $label\_id$.  Leaves of patterns that are $label\_id$\/s are called
        !            72: {\sl labelled leaves}.
        !            73: In the first rule, both leaves of the pattern are labelled.
        !            74: 
        !            75: Twig assigns an integer to each node symbol and label symbol.  These
        !            76: integers are used by the twig pattern matcher as encodings for the
        !            77: symbols.
        !            78: As the matcher traverses the
        !            79: tree, a user supplied subroutine is called to provide an integer
        !            80: corresponding to each node.
        !            81: \medskip
        !            82: \line{\bf Costs\hfil}
        !            83: To increase the flexibility of representing costs, the tree matcher
        !            84: views costs as an ADT.
        !            85: For example, costs may be represented as an integer or
        !            86: as a vector of integers with each element representing the cost of a specific
        !            87: resource.
        !            88: A cost ADT suitable for Twig must have the following four definitions:
        !            89: \medskip
        !            90: \noindent$\bullet$ {\tt COST} is a C type.
        !            91: Although the proper functioning of the tree matcher
        !            92: does not depend on the details of the {\tt COST} type, it must have
        !            93: the type information for storage allocation purposes.
        !            94: \smallskip
        !            95: \noindent$\bullet$
        !            96: {\tt INFINITY} is the maximum attainable cost value.  The matcher uses {\tt
        !            97: INFINITY} to initialize its data structures.
        !            98: \smallskip\noindent$\bullet$
        !            99: {\tt DEFAULT\_COST} is the cost
        !           100: value returned by rules without a cost part.
        !           101: \smallskip\noindent$\bullet$
        !           102: {\tt COSTLESS} is a function of
        !           103: two cost values.  It must return true if and only if the first argument is
        !           104: less than the second.
        !           105: \medskip
        !           106: \line{\bf Trees\hfil}
        !           107: 
        !           108: As with costs, Twig treats trees as an ADT.  Here, using an ADT is
        !           109: even more important because trees come in a variety of shapes and
        !           110: representations.
        !           111: Twig would be overly complicated if it had to
        !           112: know any more than the fundamental properties of trees.  Thus,
        !           113: twig treats trees as an acyclic directed
        !           114: graph of almost featureless nodes with one
        !           115: distinguished node, the root.  Each node has only one feature and that
        !           116: is an integer corresponding to the $node\_id$\/s discussed above.
        !           117: To provide this
        !           118: view to Twig, the user must provide the following definitions and
        !           119: subroutines.
        !           120: \medskip
        !           121: \noindent$\bullet$ {\tt NODE} is the type of a node.  The actual
        !           122: details of
        !           123: the NODE are irrelevant and Twig uses this definition only for storage
        !           124: allocation and declaration purposes.
        !           125: \smallskip
        !           126: \noindent$\bullet$ {\tt NONODE} is a constant of type
        !           127: {\tt NODE}.  It is a distinguished
        !           128: value returned by the routines defined below to represent a null value.
        !           129: \smallskip
        !           130: \noindent$\bullet$
        !           131: {\tt mtGetNodes$(r,n)$} returns the $n$th child of $r$ where $r$
        !           132: is a {\tt NODE}.  The
        !           133: leftmost child is {\tt mtGetNodes}$(r,1)$.  If $n > \hbox{degree}(r)$
        !           134: then the function should return {\tt NONODE}.  Twig expects the
        !           135: expression {\tt mtGetNodes}$({\tt NONODE}, 0)$
        !           136: to denote the root node of
        !           137: the subject tree.
        !           138: \smallskip
        !           139: \noindent$\bullet$
        !           140: {\tt mtValue$(r)$} returns the integer associated with $r$.
        !           141: \smallskip
        !           142: \noindent$\bullet$
        !           143: {\tt mtSetNodes$(r,n,c)$} replaces the $n$th child of $r$ with
        !           144: $c$.  This routine may be used to replace whole subtrees with others.
        !           145: \medskip
        !           146: \line{ \bf Pattern Matcher Operation \hfil}
        !           147: 
        !           148: The pattern matcher operates in two phases: the costing phase and the execution
        !           149: phase.
        !           150: 
        !           151: In the costing phase, the matcher performs a preorder traversal of
        !           152: a subject tree and discovers matches from the bottom up.  At the same time, it
        !           153: builds a skeleton tree that is structurally isomorphic to the subject tree.
        !           154: When a match is discovered the cost clause of the pattern is invoked to
        !           155: calculate the cost.  Many patterns with different labels could match at any
        !           156: given node but only the lowest cost pattern for each label is recorded in
        !           157: the skeleton.
        !           158: 
        !           159: When a pattern is matched, its label is then used as input to the pattern
        !           160: matcher so that matching of patterns with labelled leaves can begin.
        !           161: This
        !           162: process
        !           163: is analogous to a reduce action in
        !           164: bottom up parsers.  In twig, multiple reductions are
        !           165: tracked.
        !           166: 
        !           167: The cost part of a rule can also assign a {\sl mode} to a match.  The {\sl
        !           168: mode} controls how action parts are to be executed.
        !           169: Actions parts of {\sl defer} mode matches are not executed until the
        !           170: execution phase.  {\sl Rewrite} mode matches cause the action part to
        !           171: be executed immediately during the costing phase to transform the
        !           172: matched subtree.  Once the rewrite mode action is executed, the costing
        !           173: phase continues by purging all information in the skeleton for the old
        !           174: subtree and then scanning the newly transformed subtree.  {\sl
        !           175: Rewrite} actions are similar to actions that are invoked on a reduction in
        !           176: LR parsing.
        !           177: 
        !           178: In the case where multiple rewrite actions are possible the one with
        !           179: the lowest cost will be executed.  Furthermore, a rewrite action is
        !           180: considered only if its cost is lower than all the defer mode actions.
        !           181: Rewrite mode actions that are not executed are treated in the same
        !           182: manner as defer mode actions in the execution phase.
        !           183: 
        !           184: When the root of the subject tree is reached in the preorder
        !           185: traversal, the execution phase begins.  The lowest cost match is then
        !           186: chosen at the root and executed.  In this context, execution involves
        !           187: executing the actions of the labelled leaves before invoking the code
        !           188: in the current action.
        !           189: \medskip
        !           190: \line{\bf Lexical Issues and Conventions\hfil}
        !           191: Currently the
        !           192: following are keywords of twig in no particular order:
        !           193: \medskip
        !           194: \line{\hfil\vbox{\hsize=3.5truein\tt
        !           195: \halign{ # \hfil & # \hfil & # \hfil \cr
        !           196: label & node & cost \cr
        !           197: action & prologue & insert\cr
        !           198: }}\hfil}
        !           199: \medskip
        !           200: \noindent  They have special meanings and may not be used as
        !           201: identifiers.
        !           202: ;:(),= are special characters.  All blanks, tabs, formfeeds and
        !           203: newlines are ignored by twig but they may not appear inside
        !           204: identifiers and numbers.  Identifiers are nonempty strings
        !           205: made of letters, digits or underscores and starting with a letter.
        !           206: Numbers are nonempty string of digits.  Code fragments or action parts
        !           207: are enclosed in braces.  Inside code fragments, C lexical rules must
        !           208: be obeyed except that strings of the form $\$...\$$ that are not inside C
        !           209: strings are interpreted by twig.  In the following sections, $id$
        !           210: denotes an identifier; $int_1$, $int_2$ denotes numbers; $ccode$
        !           211: denotes C code fragments; $...$ indicates repetition of the previous
        !           212: item; $[...]$ indicates that $...$ is optional.
        !           213: 
        !           214: The input to the Twig program will be referred to as the subject tree.
        !           215: \medskip
        !           216: \line{\bf Prologue and Inserts\hfil}
        !           217: 
        !           218: \noindent\line{\hfil \vbox{\hsize=3.5truein \tt prologue $ccode$;}\hfil}
        !           219: \medskip
        !           220: \noindent signals to twig that $ccode$ should be inserted at the
        !           221: beginning of the output source file.  $Ccode$ should contain
        !           222: definitions relevant to the C code in
        !           223: rules that are defined later in the twig source file.
        !           224: \medskip
        !           225: \noindent\line{\hfil\tt\vbox{\hsize=3.5truein insert $ccode$;}\hfil}
        !           226: \medskip
        !           227: \noindent should cause $ccode$ to be placed into the source file.
        !           228: There can be multiple inserts and they will be placed into the source
        !           229: file
        !           230: in the order that they appear.
        !           231: \medskip
        !           232: \line{\bf Declarations\hfil}
        !           233: All twig identifiers are declared before they are used.
        !           234: \medskip
        !           235: \line{\hfil \tt node $id[(int_1)] [= int_2] ...$;\hfil}
        !           236: \medskip
        !           237: \noindent A node declaration declares one or more identifier to be
        !           238: associated with nodes of the subject tree.  The identifiers are
        !           239: assigned numbers by twig but the user can overide the assigned number
        !           240: by specifying $int_2$.  The degree of the node identifer, i.e. the
        !           241: number of children, is assumed to be fixed.  Normally, twig will
        !           242: deduce the degree when $id$ is used in a rule.  However, the user may
        !           243: explicitly give the degree by specifying $int_1$.
        !           244: \medskip
        !           245: \line{\hfil\vbox{\hsize=3.5truein\tt label $id ...$;}\hfil}
        !           246: \medskip
        !           247: A label declaration indicates that the following $id$'s are to be used
        !           248: as labels of rules.
        !           249: \medskip\line{\bf Costs and Action code\hfil}
        !           250: 
        !           251: Code fragments such as the $cost$ and $action$ clauses of a rule are
        !           252: specfied by enclosing C code in braces.  All legal C constructs are
        !           253: permitted in code fragments.  In addition, the following are provided
        !           254: for convenient access to the subject tree and internal data
        !           255: structures of the matcher.
        !           256: \smallskip
        !           257: {\parindent=0pt
        !           258: $\bullet$ $\$\%n\$$ denotes a pointer value to the matcher data
        !           259: structure for the $n$th labelled leaf.  Thus to access the cost value
        !           260: associated with that leaf, the notation $\$\%n\$\to${\tt cost} may be used.
        !           261: 
        !           262: $\bullet$ $\$\$$ denotes a pointer value to the root of the subject
        !           263: tree.
        !           264: 
        !           265: $\bullet$ $\$n_1.n_2.n_3.\, \hbox{...}\,.n_{k-1}.n_k\$$ denotes
        !           266: a pointer value to the $n_k$th child of the $n_{k-1}th$ child of the
        !           267: $n_{k-2}$th child of \hbox{...} of the $n_1$th child of the root of
        !           268: the subject tree.  Each $n_i$ is a positive non-zero integer.
        !           269: }
        !           270: \medskip
        !           271: 
        !           272: Some special constructs are available to code fragments in the cost
        !           273: part of the specification.  The
        !           274: statement ``{\tt ABORT;}'' when encountered during the execution of
        !           275: the cost code, signals to the matcher that this pattern is to be
        !           276: rejected.  When a ``{\tt REWRITE;}'' statement is encountered, control
        !           277: is returned to the matcher and the rule will become a {\sl rewrite}
        !           278: mode match.  When the end of the cost code fragment is reached,
        !           279: control is returned to the matcher and the rule becomes a {\sl defer}
        !           280: mode match.  Cost values are returned to the matcher by assigning to
        !           281: the ``{\tt cost}'' variable in the cost clause.  If no assignment is
        !           282: made to the {\tt cost} variable then the returned cost will be {\tt
        !           283: DEFAULT\_COST}.
        !           284: 
        !           285: Action clauses are expected to return a new tree which will replace
        !           286: the subject tree.  This is done by returning using the standard C
        !           287: ``{\tt return($new\_tree$); }'' statement
        !           288: where {\tt $new\_tree$} is of type
        !           289: {\tt NODEPTR}.  When the end of the action clause is encountered, the
        !           290: matcher resumes execution and the subject tree is not replaced.
        !           291: \vfil\eject
        !           292: \line{\bf An Example\hfil}
        !           293: 
        !           294: The following are examples of twig rules that generate VAX code for
        !           295: the subtract instruction:
        !           296: \midinsert
        !           297: \moveright 1truein \vbox{\settabs 9 \columns
        !           298: \+prologue & $\{$\cr
        !           299: \smallskip
        !           300: \+NODE & gettemp();\cr
        !           301: \smallskip
        !           302: \+$\}$;\cr\smallskip
        !           303: \+node&long constant sub;\cr
        !           304: \+label&operand temp;\cr
        !           305: \smallskip
        !           306: \+operand:&long;&&&$/*$ rule 1 $*/$\cr
        !           307: \+operand:&constant;&&&$/*$ rule 2 $*/$\cr
        !           308: \+operand:&temp;&&&$/*$ rule 3 $*/$\cr
        !           309: \smallskip
        !           310: \+temp:&operand&&&$/*$ rule 4 $*/$\cr
        !           311: \+&$\{$ cost = {\tt TEMP\_COST}+$\$\%1\$\to$cost;$\}$\cr
        !           312: \+&$=\{$\cr
        !           313: \+&&NODE t = gettemp();\cr
        !           314: \+&& emit(``mov'', $\$\$$, t, 0);\cr
        !           315: \+&&return(t);\cr
        !           316: \+&$\}$;\cr
        !           317: \smallskip
        !           318: \+operand:&sub(operand,operand)&&&$/*$ rule 5 $*/$\cr
        !           319: \+&$\{$cost = $\$\%1\$\to$cost + $\$\%2\$\to$cost+{\tt SUB\_COST};$\}$\cr
        !           320: \+&=$\{$\cr
        !           321: \+&&NODE t = gettemp();\cr
        !           322: \+&&emit(``sub'', $\$1\$$, $\$2\$$, t, 0);\cr
        !           323: \+&&return(t);\cr
        !           324: \+&$\}$;\cr
        !           325: \smallskip
        !           326: \+temp:&sub(temp,constant)&&&$/*$ rule 6 $*/$\cr
        !           327: \+&$\{$\cr
        !           328: \+&&if(value($\$2\$$)==1)\cr
        !           329: \+&&&cost = $\$\%1\$\to$cost+{\tt DEC\_COST};\cr
        !           330: \+&&else ABORT;\cr
        !           331: \+&$\}$=$\{$\cr
        !           332: \+&&emit(``dec'', $\$1\$$, 0);\cr
        !           333: \+&&return($\$1\$$);\cr
        !           334: \+&$\}$;\cr
        !           335: }
        !           336: \endinsert
        !           337: {\parindent=0pt
        !           338: $\bullet$ Rules 3 and 4 form a loop.
        !           339: The potential loop: temp$\to$operand$\to$temp$\to$operand$...$ is
        !           340: broken
        !           341: by the matcher recognizing that the cost of the second match of temp
        !           342: does not cost less than the first match of temp.
        !           343: 
        !           344: $\bullet$ Rule 4 localizes the testing of whether an operand is a
        !           345: temporary.  It is not necessary but makes the coding of other
        !           346: cost clauses less tedious.
        !           347: 
        !           348: $\bullet$ In the cost clause of rule 5, the cost is the sum of the
        !           349: leaves plus the cost of the subtract instruction.
        !           350: The action clause emits code to add the two operands
        !           351: and leave the result in a temporary location.  The temporary is
        !           352: returned as a substitution for the subject tree.
        !           353: 
        !           354: $\bullet$ Rule 6 handles a special case where the left operand is
        !           355: already in a temporary and the constant is one.  In this case, the
        !           356: temporary is directly decremented and returned as the new tree.
        !           357: }\vfil\eject
        !           358: \end

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.