Annotation of researchv10no/cmd/twig/doc/manual/twigb.t, revision 1.1.1.1

1.1       root        1: .EQ
                      2: delim ##
                      3: .EN
                      4: .DS C
                      5: \f3 Twig: A Brief Description\f1
                      6: .DE
                      7: .sp 2 ex
                      8: .PP
                      9: Twig is a language for processing trees based on the
                     10: algorithms described above.  A twig program consists of a set
                     11: of pattern-action rules together with associated declarations.  Patterns
                     12: describe trees to be matched.  Actions calculate costs, perform tree
                     13: manipulations and other functions such as emitting code.
                     14: A twig program is translated by the twig
                     15: preprocessor into subroutines and tables in the host language which is C.
                     16: .PP
                     17: There are many ways to represent trees and costs.  To give the user more
                     18: choice in representation, twig treats them as abstract data
                     19: types (ADT).  Its manipulation of trees and costs are strictly performed
                     20: through calls to subroutines provided by the user.
                     21: .sp 1ex
                     22: .SH
                     23: Rules
                     24: .PP
                     25: The fundamental unit of a twig program is a rule.
                     26: .sp 1ex
                     27: .ti 1.5in
                     28: \f2label \f3: \f2tree_pattern \f3[ \f2cost \f3] [ = \f2action \f3]\f1
                     29: .sp 1ex
                     30: .PP
                     31: Intuitively, the pattern is used to match a subtree.  The \f2cost\f1 code
                     32: fragment is then evaluated.  The resulting cost is recorded by
                     33: the matcher for use in dynamic programming.  The \f2action\f1 is executed
                     34: if the rule is part of the least cost cover of the tree.
                     35: .PP
                     36: If the \f2cost\f1 part is missing, Twig will insert default code
                     37: that returns the special value, DEFAULT_COST.
                     38: A missing \f2action\f1 part indicates that no action will be taken when a
                     39: match is found.
                     40: .SH
                     41: Tree Patterns
                     42: .PP
                     43: Tree patterns are specified
                     44: in prefix parenthesized form and can be described by the
                     45: following BNF:
                     46: .DS
                     47: .na
                     48: .nf
                     49: \f2tree_pattern \(-> node_id | label_id
                     50: tree_pattern \(-> node_id (tree_list)
                     51: tree_list \(-> tree,tree_list | tree
                     52: .DE
                     53: .ad
                     54: .fi
                     55: For example, the tree,
                     56: .DS
                     57: .sp 1in
                     58: .DE
                     59: will be written in twig as
                     60: .sp 1ex
                     61: .DS
                     62: .na
                     63: .nf
                     64: \f3oper(leftOp,rightOp)\f1
                     65: .DE
                     66: .ad
                     67: .fi
                     68: .PP
                     69: There are two types of symbols: \f2node_id\f1s and \f2label_id\f1s.
                     70: \f2Node_id\f1s are used to denote internal nodes and leaves.
                     71: \f2Label_id\f1s label tree patterns and are analogous to
                     72: non-terminals in context free grammars.
                     73: For example, the following twig rules without their action parts describe
                     74: simple expression trees with the \f3plus\f1 operator.
                     75: .DS
                     76: \f3
                     77: .na
                     78: .nf
                     79: expr: plus(expr, expr)
                     80: expr: identifier
                     81: expr: constant
                     82: .DE
                     83: .ad
                     84: .fi
                     85: \f3Identifier\f1 and \f3constant\f1 are \f2node_id\f1s
                     86: representing leaves, and
                     87: \f2plus\f1 is a \f2node_id\f1s representing an internal node whereas
                     88: \f3expr\f1 is a
                     89: \f2label_id\f1.  Leaves of patterns that are \f2label_id\f1s are called
                     90: \f2labelled leaves\f1.
                     91: In the first rule, both leaves of the pattern are labelled.
                     92: .PP
                     93: Twig assigns an integer to each node symbol and label symbol.  These
                     94: integers are used by the twig pattern matcher as encodings for the
                     95: symbols.
                     96: As the matcher traverses the
                     97: tree, a user supplied subroutine is called to provide an integer
                     98: corresponding to each node.
                     99: .SH
                    100: Costs
                    101: .PP
                    102: To increase the flexibility of representing costs, the tree matcher
                    103: views costs as an ADT.
                    104: For example, costs may be represented as an integer or
                    105: as a vector of integers with each element representing the cost of a specific
                    106: resource.
                    107: A cost ADT suitable for Twig must have the following four definitions:
                    108: .IP
                    109: \(bu \f3COST\f1 is a C type.
                    110: Although the proper functioning of the tree matcher
                    111: does not depend on the details of the \f3COST\f1 type, it must have
                    112: the type information for storage allocation purposes.
                    113: .IP
                    114: \(bu
                    115: \f3INFINITY\f1 is the maximum attainable cost value.  The matcher uses
                    116: \f3INFINITY\f1 to initialize its data structures.
                    117: .IP
                    118: \(bu
                    119: \f3DEFAULT_COST\f1 is the cost
                    120: value returned by rules without a cost part.
                    121: .IP
                    122: \(bu
                    123: \f3COSTLESS\f1 is a function of
                    124: two cost values.  It must return true if and only if the first argument is
                    125: less than the second.
                    126: .sp 2ex
                    127: .SH
                    128: Trees
                    129: .PP
                    130: As with costs, Twig treats trees as an ADT.  Here, using an ADT is
                    131: even more important because trees come in a variety of shapes and
                    132: representations.
                    133: Twig would be overly complicated if it had to
                    134: know any more than the fundamental properties of trees.  Thus,
                    135: twig treats trees as an acyclic directed
                    136: graph of almost featureless nodes with one
                    137: distinguished node, the root.  Each node has only one feature and that
                    138: is an integer corresponding to the \f2node_id\f1s discussed above.
                    139: To provide this
                    140: view to Twig, the user must provide the following definitions and
                    141: subroutines.
                    142: .IP
                    143: \(bu \f3NODE\f1 is the type of a node.  The actual
                    144: details of
                    145: the NODE are irrelevant and Twig uses this definition only for storage
                    146: allocation and declaration purposes.
                    147: .IP
                    148: \(bu \f3NONODE\f1 is a constant of type \f3NODE\f1.  It is a distinguished
                    149: value returned by the routines defined below to represent a null value.
                    150: .IP
                    151: \(bu
                    152: \f3mtGetNodes#(r,n)# returns the #n#th child of \f2r\f1 where \f2r\f1
                    153: is a \f3NODE\f1.  The
                    154: leftmost child is \f3mtGetNodes\f2(r,1)\f1.  If #n>degree(r)#
                    155: then the function should return \f3NONODE\f1.  Twig expects the
                    156: expression \f3mtGetNodes\f2(\f3NONODE\f2, 0)\f1
                    157: to denote the root node of
                    158: the subject tree.
                    159: .IP
                    160: \(bu
                    161: \f3mtValue\f2(r)\f1 returns the integer associated with \f2r\f1.
                    162: .IP
                    163: \(bu
                    164: \f3mtSetNodes\f2(r,n,c)\f1 replaces the \f2n\f1th child of \f2r\f1 with
                    165: \f2c\f1.  This routine may be used to replace whole subtrees with others.
                    166: .sp 2ex
                    167: .SH
                    168: Pattern Matcher Operation
                    169: .PP
                    170: The pattern matcher operates in two phases: the costing phase and the
                    171: execution phase.
                    172: .PP
                    173: In the costing phase, the matcher performs a preorder traversal of
                    174: a subject tree and discovers matches from the bottom up.  At the same time, it
                    175: builds a skeleton tree that is structurally isomorphic to the subject tree.
                    176: When a match is discovered the cost clause of the pattern is invoked to
                    177: calculate the cost.  Many patterns with different labels could match at any
                    178: given node but only the lowest cost pattern for each label is recorded in
                    179: the skeleton.
                    180: .PP
                    181: When a pattern is matched, its label is then used as input to the pattern
                    182: matcher so that matching of patterns with labelled leaves can begin.
                    183: This
                    184: process
                    185: is analogous to a reduce action in
                    186: bottom up parsers.  In twig, multiple reductions are
                    187: tracked.
                    188: .PP
                    189: The cost part of a rule can also assign a \f2mode\f1 to a match.  The
                    190: \f2mode\f1 controls how action parts are to be executed.
                    191: Actions parts of \f2defer\f1 mode matches are not executed until the
                    192: execution phase.  \f2Rewrite\f1 mode matches cause the action part to
                    193: be executed immediately during the costing phase to transform the
                    194: matched subtree.  Once the rewrite mode action is executed, the costing
                    195: phase continues by purging all information in the skeleton for the old
                    196: subtree and then scanning the newly transformed subtree.
                    197: \f2Rewrite\f1 actions are similar
                    198: to actions that are invoked on a reduction in
                    199: LR parsing.
                    200: .PP
                    201: In the case where multiple rewrite actions are possible the one with
                    202: the lowest cost will be executed.  Furthermore, a rewrite action is
                    203: considered only if its cost is lower than all the defer mode actions.
                    204: Rewrite mode actions that are not executed are treated in the same
                    205: manner as defer mode actions in the execution phase.
                    206: .PP
                    207: When the root of the subject tree is reached in the preorder
                    208: traversal, the execution phase begins.  The lowest cost match is then
                    209: chosen at the root and executed.  In this context, execution involves
                    210: executing the actions of the labelled leaves before invoking the code
                    211: in the current action.
                    212: .SH
                    213: Lexical Issues and Conventions
                    214: .PP
                    215: Currently the
                    216: following are keywords of twig in no particular order:
                    217: .DS
                    218: .TS
                    219: center;
                    220: l l l.
                    221: label  node    cost
                    222: action prologue        insert
                    223: .TE
                    224: .DE
                    225: They have special meanings and may not be used as
                    226: identifiers.
                    227: ;:(),= are special characters.  All blanks, tabs, formfeeds and
                    228: newlines are ignored by twig but they may not appear inside
                    229: identifiers and numbers.  Identifiers are nonempty strings
                    230: made of letters, digits or underscores and starting with a letter.
                    231: Numbers are nonempty string of digits.  Code fragments or action parts
                    232: are enclosed in braces.  Inside code fragments, C lexical rules must
                    233: be obeyed except that strings of the form $...$ that are not inside C
                    234: strings are interpreted by twig.  In the following sections, $id$
                    235: denotes an identifier; \f2int1\f1, \f2int2\f1 denotes numbers; \f2ccode\f1
                    236: denotes C code fragments; ... indicates repetition of the previous
                    237: item; [...] indicates that ... is optional.
                    238: .PP
                    239: The input to the Twig program will be referred to as the subject tree.
                    240: .sp 2ex
                    241: .SH
                    242: Prologue and Inserts
                    243: \f1
                    244: .DS
                    245: \f3prologue \f2ccode\f1;
                    246: .DE
                    247: signals to twig that \f2ccode\f1 should be inserted at the
                    248: beginning of the output source file.  \f2Ccode\f1 should contain
                    249: definitions relevant to the C code in
                    250: rules that are defined later in the twig source file.
                    251: .DS
                    252: \f3insert \f2ccode\f1;
                    253: .DE
                    254: should cause \f2ccode\f1 to be placed into the source file.
                    255: There can be multiple inserts and they will be placed into the source
                    256: file
                    257: in the order that they appear.
                    258: .SH
                    259: Declarations
                    260: .PP
                    261: All twig identifiers are declared before they are used.
                    262: .DS
                    263: \f3node\f1 \f2id\f1[(\f2int1\f1)] [= \f2int2\f1] ...;
                    264: .DE
                    265: A node declaration declares one or more identifier to be
                    266: associated with nodes of the subject tree.  The identifiers are
                    267: assigned numbers by twig but the user can overide the assigned number
                    268: by specifying \f2int2\f1.  The degree of the node identifer, i.e. the
                    269: number of children, is assumed to be fixed.  Normally, twig will
                    270: deduce the degree when \f2id\f1 is used in a rule.  However, the user may
                    271: explicitly give the degree by specifying \f2int1\f1.
                    272: .DS
                    273: \f3label\f2 id\f1 ...;
                    274: .DE
                    275: A label declaration indicates that the following $id$'s are to be used
                    276: as labels of rules.
                    277: .sp 2ex
                    278: .SH
                    279: Costs and Action code
                    280: .PP
                    281: Code fragments such as the \f2cost\f1 and \f2action\f1 clauses of a rule are
                    282: specfied by enclosing C code in braces.  All legal C constructs are
                    283: permitted in code fragments.  In addition, the following are provided
                    284: for convenient access to the subject tree and internal data
                    285: structures of the matcher.
                    286: .sp 1ex
                    287: .IP
                    288: \(bu $%\f2n\f1$ denotes a pointer value to the matcher data
                    289: structure for the $n$th labelled leaf.  Thus to access the cost value
                    290: associated with that leaf, the notation $%\f2n\f1$\(->cost may be used.
                    291: .IP
                    292: \(bu
                    293: $$ denotes a pointer value to the root of the subject
                    294: tree.
                    295: .IP
                    296: \(bu
                    297: $#{n sub 1}.{n sub 2}.{n sub 3}...{n sub {k-1}}.{n sub k}#$ denotes
                    298: a pointer value to the ${n sub k}$th child of the $#{n sub {k-1}}#th$
                    299: child of the
                    300: $#{n sub {k-2}}#$th child of ... of the $#{n sub 1 }#$th child of the root of
                    301: the subject tree.  Each $#{n sub i}#$ is a positive non-zero integer.
                    302: .PP
                    303: Some special constructs are available to code fragments in the cost
                    304: part of the specification.  The
                    305: statement "\f3ABORT;\f1" when encountered during the execution of
                    306: the cost code, signals to the matcher that this pattern is to be
                    307: rejected.  When a "\f3REWRITE;\f1" statement is encountered, control
                    308: is returned to the matcher and the rule will become a \f2rewrite\f1
                    309: mode match.  When the end of the cost code fragment is reached,
                    310: control is returned to the matcher and the rule becomes a \f2defer\f1
                    311: mode match.  Cost values are returned to the matcher by assigning to
                    312: a variable named
                    313: "\f3cost\f1" in the cost clause.  If no assignment is
                    314: made to \f3cost\f1 then the returned cost will be
                    315: \f3DEFAULT_COST\f1.
                    316: .PP
                    317: Action clauses are expected to return a new tree which will replace
                    318: the subject tree.  This is done by returning using the standard C
                    319: "\f3return\f2(new_tree);\f1" statement
                    320: where \f2new_tree\f1 is of type
                    321: \f3NODEPTR\f1.  When the end of the action clause is encountered, the
                    322: matcher resumes execution and the subject tree is not replaced.
                    323: .bp
                    324: .SH
                    325: An Example
                    326: .PP
                    327: The following are examples of twig rules that generate VAX code for
                    328: the subtract instruction:
                    329: .DS
                    330: .na
                    331: .nf
                    332: prologue {
                    333:        NODE gettermp();
                    334: };
                    335: node   long constant sub;
                    336: label  operand temp;
                    337: 
                    338: operand: long;                 /* rule 1 */
                    339: operand: constant;             /* rule 2 */
                    340: operand: temp;                 /* rule 3 */
                    341: temp:   operand;               /* rule 4 */
                    342:        { cost = TEMP_COST+$%1$\(->cost; }
                    343:        = {
                    344:                NODE t = gettemp();
                    345:                emit("mov", $$, t, 0);
                    346:                return(t);
                    347:        };
                    348: 
                    349: operand: sub(operand,opernad)  /* rule 5 */
                    350:        { cost = $%1$\(->cost-$%2$\(->cost+SUB_COST; }
                    351:        = {
                    352:                NODE t = gettemp();
                    353:                emit("sub", $1$, $2$, t, 0);
                    354:                return(t);
                    355:        };
                    356: 
                    357: temp:  sub(temp,constant)      /* rule 6 $*/
                    358:        {
                    359:                if(value($2$)==1)
                    360:                        cost = $%1$\(->cost+DEC_COST;
                    361:                else ABORT;
                    362:        } = {
                    363:                emit("dec", $1$, 0);
                    364:                return($1$);
                    365:        };
                    366: .DE
                    367: .IP
                    368: \(bu
                    369: Rules 3 and 4 form a loop.
                    370: The potential loop: temp\(->operand\(->temp\(->operand... is
                    371: broken
                    372: by the matcher recognizing that the cost of the second match of temp
                    373: does not cost less than the first match of temp.
                    374: .IP
                    375: \(bu In the cost clause of rule 5, the cost is the sum of the
                    376: leaves plus the cost of the subtract instruction.
                    377: The action clause emits code to add the two operands
                    378: and leave the result in a temporary location.  The temporary is
                    379: returned as a substitution for the subject tree.
                    380: .IP
                    381: \(bu Rule 6 handles a special case where the left operand is
                    382: already in a temporary and the constant is one.  In this case, the
                    383: temporary is directly decremented and returned as the new tree.
                    384: .bp
                    385: .DS C
                    386: \f3Experimental Results\f1
                    387: .DE
                    388: .sp 2ex
                    389: .PP
                    390: A code generator for the VAX has been build using Twig and incorporated
                    391: into the pcc2 compiler.  The following table summarizes the results
                    392: for three benchmark programs.
                    393: The compiler using the Twig code generator is abbreviated \f2tcc\f1.
                    394: The times were obtained on a VAX-11/780
                    395: running 4.2bsd UNIX.
                    396: The first number is the time spent in the compiler.
                    397: The parenthesized number is the time spent in the operating system.
                    398: .TS
                    399: center;
                    400: c c c c c
                    401: l n n n n.
                    402:        lines   tcc     pcc2    improvement
                    403: _
                    404: grep.c 458     9.9(1.4)        11.4(0.9)       13%
                    405: pmach.c        610     13.2(1.5)       13.5(0.6)       2%
                    406: reader.c       1005    23.3(1.6)       31.2(0.9)       25%
                    407: .TE
                    408: .PP
                    409: The running time for tcc is slightly
                    410: faster than that of pcc2.
                    411: However, The higher system times of tcc indicates that tcc place
                    412: higher demands on the operating system.
                    413: In the case of pmach.c, the total CPU time is actually more than that
                    414: of pcc2's.
                    415: .PP
                    416: Twig requires more dynamic storage than pcc2.  For a tree, the dynamic
                    417: storage requirement on the average is  number of nodes * 200
                    418: bytes
                    419: while the worst case is about number of nodes * 1000 bytes.
                    420: However this storage is reclaimed and reused for every new tree.
                    421: The larger system time
                    422: is caused by calls to the runtime system for dynamic storage.
                    423: .PP
                    424: Writing and modifying tcc was quick.
                    425: The basic
                    426: VAX without indexed addressing mode can be described in about two weeks.
                    427: This was done concurrently with debugging much of the table
                    428: walker.  Adding indexed modes took
                    429: only several hours once we were confident that the rest of the
                    430: description was correct.
                    431: .PP
                    432: One reason for the ease of modifcation is that rules can be added
                    433: independent of the other rules.  Looping can be completely ignored since
                    434: dynamic programming eliminates that possiblity.
                    435: Table generation was extremely fast or about 4.2 seconds for the
                    436: VAX description.  Thus testing the description could be done quickly.
                    437: .PP
                    438: The tables were very small.  For the VAX, there were only 109 rules.
                    439: This required 7.5K bytes of table space.
                    440: The whole code generator is another 40K
                    441: bytes.
                    442: .SH
                    443: Code Quality
                    444: .PP
                    445: The code generated by the Twig for the test programs were of the
                    446: same quality as code generated by pcc2.
                    447: However, Twig does better targeting.
                    448: .DS
                    449: tcc:
                    450: .ti +4em
                    451: addl3  -4(fp),-8(fp),-12(fp)
                    452: .sp 1ex
                    453: pcc2:
                    454: .ti +4em
                    455: addl3  -4(fp),-8(fp),r0
                    456: .ti +4em
                    457: movl   r0,-12(fp)
                    458: .DE
                    459: On the other hand, pcc2 handled temporary registers better
                    460: and for an equivalent sequence of code, Twig may use
                    461: more temporary registers.
                    462: .DS
                    463: tcc:
                    464: .ti +4em
                    465:        cvtbl   (r0),r1
                    466: pcc2:
                    467: .ti +4em
                    468:        cvtbl   (r0),r0
                    469: .DE
                    470: This is actually a symptom of how temporary registers are allocated in tcc.
                    471: Registers are allocated before code is emitted and thus any registers freed
                    472: during code emissions will not be reused until the next code sequence.
                    473: .SH
                    474: Room for improvement
                    475: .PP
                    476: Currently Twig is slow compared to other table-driven systems such as
                    477: [Henry] and [Ganapathi].  However, we have had less than one year of
                    478: experience with Twig and we believe that its performance can be
                    479: improved because
                    480: .IP
                    481: \(bu Unit productions such as
                    482: .DS
                    483: operand: temp
                    484: .DE
                    485: in the above example increases the running time of Twig
                    486: since the closure operation for unit production is done at run-time.
                    487: .IP
                    488: \(bu Machine representations.  The current compact
                    489: representation of the machine require linear searching except at the
                    490: start state where the large branching factor compelled us to use an
                    491: array scheme.  Using another representation would speed up the
                    492: matcher.  For example, implementing the VAX description as an array is
                    493: estimated to use about 50K bytes but would provide a significant
                    494: performance improvement.
                    495: .IP
                    496: \(bu Twig performs many more procedure calls than pcc2.
                    497: Every machine step requires at least one procedure call.  Contrast
                    498: this to YACC where a machine step is performed in line.  Procedure
                    499: calls on the VAX are expensive.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.