Annotation of researchv10no/cmd/twig/doc/manual/twigb.t, revision 1.1

1.1     ! root        1: .EQ
        !             2: delim ##
        !             3: .EN
        !             4: .DS C
        !             5: \f3 Twig: A Brief Description\f1
        !             6: .DE
        !             7: .sp 2 ex
        !             8: .PP
        !             9: Twig is a language for processing trees based on the
        !            10: algorithms described above.  A twig program consists of a set
        !            11: of pattern-action rules together with associated declarations.  Patterns
        !            12: describe trees to be matched.  Actions calculate costs, perform tree
        !            13: manipulations and other functions such as emitting code.
        !            14: A twig program is translated by the twig
        !            15: preprocessor into subroutines and tables in the host language which is C.
        !            16: .PP
        !            17: There are many ways to represent trees and costs.  To give the user more
        !            18: choice in representation, twig treats them as abstract data
        !            19: types (ADT).  Its manipulation of trees and costs are strictly performed
        !            20: through calls to subroutines provided by the user.
        !            21: .sp 1ex
        !            22: .SH
        !            23: Rules
        !            24: .PP
        !            25: The fundamental unit of a twig program is a rule.
        !            26: .sp 1ex
        !            27: .ti 1.5in
        !            28: \f2label \f3: \f2tree_pattern \f3[ \f2cost \f3] [ = \f2action \f3]\f1
        !            29: .sp 1ex
        !            30: .PP
        !            31: Intuitively, the pattern is used to match a subtree.  The \f2cost\f1 code
        !            32: fragment is then evaluated.  The resulting cost is recorded by
        !            33: the matcher for use in dynamic programming.  The \f2action\f1 is executed
        !            34: if the rule is part of the least cost cover of the tree.
        !            35: .PP
        !            36: If the \f2cost\f1 part is missing, Twig will insert default code
        !            37: that returns the special value, DEFAULT_COST.
        !            38: A missing \f2action\f1 part indicates that no action will be taken when a
        !            39: match is found.
        !            40: .SH
        !            41: Tree Patterns
        !            42: .PP
        !            43: Tree patterns are specified
        !            44: in prefix parenthesized form and can be described by the
        !            45: following BNF:
        !            46: .DS
        !            47: .na
        !            48: .nf
        !            49: \f2tree_pattern \(-> node_id | label_id
        !            50: tree_pattern \(-> node_id (tree_list)
        !            51: tree_list \(-> tree,tree_list | tree
        !            52: .DE
        !            53: .ad
        !            54: .fi
        !            55: For example, the tree,
        !            56: .DS
        !            57: .sp 1in
        !            58: .DE
        !            59: will be written in twig as
        !            60: .sp 1ex
        !            61: .DS
        !            62: .na
        !            63: .nf
        !            64: \f3oper(leftOp,rightOp)\f1
        !            65: .DE
        !            66: .ad
        !            67: .fi
        !            68: .PP
        !            69: There are two types of symbols: \f2node_id\f1s and \f2label_id\f1s.
        !            70: \f2Node_id\f1s are used to denote internal nodes and leaves.
        !            71: \f2Label_id\f1s label tree patterns and are analogous to
        !            72: non-terminals in context free grammars.
        !            73: For example, the following twig rules without their action parts describe
        !            74: simple expression trees with the \f3plus\f1 operator.
        !            75: .DS
        !            76: \f3
        !            77: .na
        !            78: .nf
        !            79: expr: plus(expr, expr)
        !            80: expr: identifier
        !            81: expr: constant
        !            82: .DE
        !            83: .ad
        !            84: .fi
        !            85: \f3Identifier\f1 and \f3constant\f1 are \f2node_id\f1s
        !            86: representing leaves, and
        !            87: \f2plus\f1 is a \f2node_id\f1s representing an internal node whereas
        !            88: \f3expr\f1 is a
        !            89: \f2label_id\f1.  Leaves of patterns that are \f2label_id\f1s are called
        !            90: \f2labelled leaves\f1.
        !            91: In the first rule, both leaves of the pattern are labelled.
        !            92: .PP
        !            93: Twig assigns an integer to each node symbol and label symbol.  These
        !            94: integers are used by the twig pattern matcher as encodings for the
        !            95: symbols.
        !            96: As the matcher traverses the
        !            97: tree, a user supplied subroutine is called to provide an integer
        !            98: corresponding to each node.
        !            99: .SH
        !           100: Costs
        !           101: .PP
        !           102: To increase the flexibility of representing costs, the tree matcher
        !           103: views costs as an ADT.
        !           104: For example, costs may be represented as an integer or
        !           105: as a vector of integers with each element representing the cost of a specific
        !           106: resource.
        !           107: A cost ADT suitable for Twig must have the following four definitions:
        !           108: .IP
        !           109: \(bu \f3COST\f1 is a C type.
        !           110: Although the proper functioning of the tree matcher
        !           111: does not depend on the details of the \f3COST\f1 type, it must have
        !           112: the type information for storage allocation purposes.
        !           113: .IP
        !           114: \(bu
        !           115: \f3INFINITY\f1 is the maximum attainable cost value.  The matcher uses
        !           116: \f3INFINITY\f1 to initialize its data structures.
        !           117: .IP
        !           118: \(bu
        !           119: \f3DEFAULT_COST\f1 is the cost
        !           120: value returned by rules without a cost part.
        !           121: .IP
        !           122: \(bu
        !           123: \f3COSTLESS\f1 is a function of
        !           124: two cost values.  It must return true if and only if the first argument is
        !           125: less than the second.
        !           126: .sp 2ex
        !           127: .SH
        !           128: Trees
        !           129: .PP
        !           130: As with costs, Twig treats trees as an ADT.  Here, using an ADT is
        !           131: even more important because trees come in a variety of shapes and
        !           132: representations.
        !           133: Twig would be overly complicated if it had to
        !           134: know any more than the fundamental properties of trees.  Thus,
        !           135: twig treats trees as an acyclic directed
        !           136: graph of almost featureless nodes with one
        !           137: distinguished node, the root.  Each node has only one feature and that
        !           138: is an integer corresponding to the \f2node_id\f1s discussed above.
        !           139: To provide this
        !           140: view to Twig, the user must provide the following definitions and
        !           141: subroutines.
        !           142: .IP
        !           143: \(bu \f3NODE\f1 is the type of a node.  The actual
        !           144: details of
        !           145: the NODE are irrelevant and Twig uses this definition only for storage
        !           146: allocation and declaration purposes.
        !           147: .IP
        !           148: \(bu \f3NONODE\f1 is a constant of type \f3NODE\f1.  It is a distinguished
        !           149: value returned by the routines defined below to represent a null value.
        !           150: .IP
        !           151: \(bu
        !           152: \f3mtGetNodes#(r,n)# returns the #n#th child of \f2r\f1 where \f2r\f1
        !           153: is a \f3NODE\f1.  The
        !           154: leftmost child is \f3mtGetNodes\f2(r,1)\f1.  If #n>degree(r)#
        !           155: then the function should return \f3NONODE\f1.  Twig expects the
        !           156: expression \f3mtGetNodes\f2(\f3NONODE\f2, 0)\f1
        !           157: to denote the root node of
        !           158: the subject tree.
        !           159: .IP
        !           160: \(bu
        !           161: \f3mtValue\f2(r)\f1 returns the integer associated with \f2r\f1.
        !           162: .IP
        !           163: \(bu
        !           164: \f3mtSetNodes\f2(r,n,c)\f1 replaces the \f2n\f1th child of \f2r\f1 with
        !           165: \f2c\f1.  This routine may be used to replace whole subtrees with others.
        !           166: .sp 2ex
        !           167: .SH
        !           168: Pattern Matcher Operation
        !           169: .PP
        !           170: The pattern matcher operates in two phases: the costing phase and the
        !           171: execution phase.
        !           172: .PP
        !           173: In the costing phase, the matcher performs a preorder traversal of
        !           174: a subject tree and discovers matches from the bottom up.  At the same time, it
        !           175: builds a skeleton tree that is structurally isomorphic to the subject tree.
        !           176: When a match is discovered the cost clause of the pattern is invoked to
        !           177: calculate the cost.  Many patterns with different labels could match at any
        !           178: given node but only the lowest cost pattern for each label is recorded in
        !           179: the skeleton.
        !           180: .PP
        !           181: When a pattern is matched, its label is then used as input to the pattern
        !           182: matcher so that matching of patterns with labelled leaves can begin.
        !           183: This
        !           184: process
        !           185: is analogous to a reduce action in
        !           186: bottom up parsers.  In twig, multiple reductions are
        !           187: tracked.
        !           188: .PP
        !           189: The cost part of a rule can also assign a \f2mode\f1 to a match.  The
        !           190: \f2mode\f1 controls how action parts are to be executed.
        !           191: Actions parts of \f2defer\f1 mode matches are not executed until the
        !           192: execution phase.  \f2Rewrite\f1 mode matches cause the action part to
        !           193: be executed immediately during the costing phase to transform the
        !           194: matched subtree.  Once the rewrite mode action is executed, the costing
        !           195: phase continues by purging all information in the skeleton for the old
        !           196: subtree and then scanning the newly transformed subtree.
        !           197: \f2Rewrite\f1 actions are similar
        !           198: to actions that are invoked on a reduction in
        !           199: LR parsing.
        !           200: .PP
        !           201: In the case where multiple rewrite actions are possible the one with
        !           202: the lowest cost will be executed.  Furthermore, a rewrite action is
        !           203: considered only if its cost is lower than all the defer mode actions.
        !           204: Rewrite mode actions that are not executed are treated in the same
        !           205: manner as defer mode actions in the execution phase.
        !           206: .PP
        !           207: When the root of the subject tree is reached in the preorder
        !           208: traversal, the execution phase begins.  The lowest cost match is then
        !           209: chosen at the root and executed.  In this context, execution involves
        !           210: executing the actions of the labelled leaves before invoking the code
        !           211: in the current action.
        !           212: .SH
        !           213: Lexical Issues and Conventions
        !           214: .PP
        !           215: Currently the
        !           216: following are keywords of twig in no particular order:
        !           217: .DS
        !           218: .TS
        !           219: center;
        !           220: l l l.
        !           221: label  node    cost
        !           222: action prologue        insert
        !           223: .TE
        !           224: .DE
        !           225: They have special meanings and may not be used as
        !           226: identifiers.
        !           227: ;:(),= are special characters.  All blanks, tabs, formfeeds and
        !           228: newlines are ignored by twig but they may not appear inside
        !           229: identifiers and numbers.  Identifiers are nonempty strings
        !           230: made of letters, digits or underscores and starting with a letter.
        !           231: Numbers are nonempty string of digits.  Code fragments or action parts
        !           232: are enclosed in braces.  Inside code fragments, C lexical rules must
        !           233: be obeyed except that strings of the form $...$ that are not inside C
        !           234: strings are interpreted by twig.  In the following sections, $id$
        !           235: denotes an identifier; \f2int1\f1, \f2int2\f1 denotes numbers; \f2ccode\f1
        !           236: denotes C code fragments; ... indicates repetition of the previous
        !           237: item; [...] indicates that ... is optional.
        !           238: .PP
        !           239: The input to the Twig program will be referred to as the subject tree.
        !           240: .sp 2ex
        !           241: .SH
        !           242: Prologue and Inserts
        !           243: \f1
        !           244: .DS
        !           245: \f3prologue \f2ccode\f1;
        !           246: .DE
        !           247: signals to twig that \f2ccode\f1 should be inserted at the
        !           248: beginning of the output source file.  \f2Ccode\f1 should contain
        !           249: definitions relevant to the C code in
        !           250: rules that are defined later in the twig source file.
        !           251: .DS
        !           252: \f3insert \f2ccode\f1;
        !           253: .DE
        !           254: should cause \f2ccode\f1 to be placed into the source file.
        !           255: There can be multiple inserts and they will be placed into the source
        !           256: file
        !           257: in the order that they appear.
        !           258: .SH
        !           259: Declarations
        !           260: .PP
        !           261: All twig identifiers are declared before they are used.
        !           262: .DS
        !           263: \f3node\f1 \f2id\f1[(\f2int1\f1)] [= \f2int2\f1] ...;
        !           264: .DE
        !           265: A node declaration declares one or more identifier to be
        !           266: associated with nodes of the subject tree.  The identifiers are
        !           267: assigned numbers by twig but the user can overide the assigned number
        !           268: by specifying \f2int2\f1.  The degree of the node identifer, i.e. the
        !           269: number of children, is assumed to be fixed.  Normally, twig will
        !           270: deduce the degree when \f2id\f1 is used in a rule.  However, the user may
        !           271: explicitly give the degree by specifying \f2int1\f1.
        !           272: .DS
        !           273: \f3label\f2 id\f1 ...;
        !           274: .DE
        !           275: A label declaration indicates that the following $id$'s are to be used
        !           276: as labels of rules.
        !           277: .sp 2ex
        !           278: .SH
        !           279: Costs and Action code
        !           280: .PP
        !           281: Code fragments such as the \f2cost\f1 and \f2action\f1 clauses of a rule are
        !           282: specfied by enclosing C code in braces.  All legal C constructs are
        !           283: permitted in code fragments.  In addition, the following are provided
        !           284: for convenient access to the subject tree and internal data
        !           285: structures of the matcher.
        !           286: .sp 1ex
        !           287: .IP
        !           288: \(bu $%\f2n\f1$ denotes a pointer value to the matcher data
        !           289: structure for the $n$th labelled leaf.  Thus to access the cost value
        !           290: associated with that leaf, the notation $%\f2n\f1$\(->cost may be used.
        !           291: .IP
        !           292: \(bu
        !           293: $$ denotes a pointer value to the root of the subject
        !           294: tree.
        !           295: .IP
        !           296: \(bu
        !           297: $#{n sub 1}.{n sub 2}.{n sub 3}...{n sub {k-1}}.{n sub k}#$ denotes
        !           298: a pointer value to the ${n sub k}$th child of the $#{n sub {k-1}}#th$
        !           299: child of the
        !           300: $#{n sub {k-2}}#$th child of ... of the $#{n sub 1 }#$th child of the root of
        !           301: the subject tree.  Each $#{n sub i}#$ is a positive non-zero integer.
        !           302: .PP
        !           303: Some special constructs are available to code fragments in the cost
        !           304: part of the specification.  The
        !           305: statement "\f3ABORT;\f1" when encountered during the execution of
        !           306: the cost code, signals to the matcher that this pattern is to be
        !           307: rejected.  When a "\f3REWRITE;\f1" statement is encountered, control
        !           308: is returned to the matcher and the rule will become a \f2rewrite\f1
        !           309: mode match.  When the end of the cost code fragment is reached,
        !           310: control is returned to the matcher and the rule becomes a \f2defer\f1
        !           311: mode match.  Cost values are returned to the matcher by assigning to
        !           312: a variable named
        !           313: "\f3cost\f1" in the cost clause.  If no assignment is
        !           314: made to \f3cost\f1 then the returned cost will be
        !           315: \f3DEFAULT_COST\f1.
        !           316: .PP
        !           317: Action clauses are expected to return a new tree which will replace
        !           318: the subject tree.  This is done by returning using the standard C
        !           319: "\f3return\f2(new_tree);\f1" statement
        !           320: where \f2new_tree\f1 is of type
        !           321: \f3NODEPTR\f1.  When the end of the action clause is encountered, the
        !           322: matcher resumes execution and the subject tree is not replaced.
        !           323: .bp
        !           324: .SH
        !           325: An Example
        !           326: .PP
        !           327: The following are examples of twig rules that generate VAX code for
        !           328: the subtract instruction:
        !           329: .DS
        !           330: .na
        !           331: .nf
        !           332: prologue {
        !           333:        NODE gettermp();
        !           334: };
        !           335: node   long constant sub;
        !           336: label  operand temp;
        !           337: 
        !           338: operand: long;                 /* rule 1 */
        !           339: operand: constant;             /* rule 2 */
        !           340: operand: temp;                 /* rule 3 */
        !           341: temp:   operand;               /* rule 4 */
        !           342:        { cost = TEMP_COST+$%1$\(->cost; }
        !           343:        = {
        !           344:                NODE t = gettemp();
        !           345:                emit("mov", $$, t, 0);
        !           346:                return(t);
        !           347:        };
        !           348: 
        !           349: operand: sub(operand,opernad)  /* rule 5 */
        !           350:        { cost = $%1$\(->cost-$%2$\(->cost+SUB_COST; }
        !           351:        = {
        !           352:                NODE t = gettemp();
        !           353:                emit("sub", $1$, $2$, t, 0);
        !           354:                return(t);
        !           355:        };
        !           356: 
        !           357: temp:  sub(temp,constant)      /* rule 6 $*/
        !           358:        {
        !           359:                if(value($2$)==1)
        !           360:                        cost = $%1$\(->cost+DEC_COST;
        !           361:                else ABORT;
        !           362:        } = {
        !           363:                emit("dec", $1$, 0);
        !           364:                return($1$);
        !           365:        };
        !           366: .DE
        !           367: .IP
        !           368: \(bu
        !           369: Rules 3 and 4 form a loop.
        !           370: The potential loop: temp\(->operand\(->temp\(->operand... is
        !           371: broken
        !           372: by the matcher recognizing that the cost of the second match of temp
        !           373: does not cost less than the first match of temp.
        !           374: .IP
        !           375: \(bu In the cost clause of rule 5, the cost is the sum of the
        !           376: leaves plus the cost of the subtract instruction.
        !           377: The action clause emits code to add the two operands
        !           378: and leave the result in a temporary location.  The temporary is
        !           379: returned as a substitution for the subject tree.
        !           380: .IP
        !           381: \(bu Rule 6 handles a special case where the left operand is
        !           382: already in a temporary and the constant is one.  In this case, the
        !           383: temporary is directly decremented and returned as the new tree.
        !           384: .bp
        !           385: .DS C
        !           386: \f3Experimental Results\f1
        !           387: .DE
        !           388: .sp 2ex
        !           389: .PP
        !           390: A code generator for the VAX has been build using Twig and incorporated
        !           391: into the pcc2 compiler.  The following table summarizes the results
        !           392: for three benchmark programs.
        !           393: The compiler using the Twig code generator is abbreviated \f2tcc\f1.
        !           394: The times were obtained on a VAX-11/780
        !           395: running 4.2bsd UNIX.
        !           396: The first number is the time spent in the compiler.
        !           397: The parenthesized number is the time spent in the operating system.
        !           398: .TS
        !           399: center;
        !           400: c c c c c
        !           401: l n n n n.
        !           402:        lines   tcc     pcc2    improvement
        !           403: _
        !           404: grep.c 458     9.9(1.4)        11.4(0.9)       13%
        !           405: pmach.c        610     13.2(1.5)       13.5(0.6)       2%
        !           406: reader.c       1005    23.3(1.6)       31.2(0.9)       25%
        !           407: .TE
        !           408: .PP
        !           409: The running time for tcc is slightly
        !           410: faster than that of pcc2.
        !           411: However, The higher system times of tcc indicates that tcc place
        !           412: higher demands on the operating system.
        !           413: In the case of pmach.c, the total CPU time is actually more than that
        !           414: of pcc2's.
        !           415: .PP
        !           416: Twig requires more dynamic storage than pcc2.  For a tree, the dynamic
        !           417: storage requirement on the average is  number of nodes * 200
        !           418: bytes
        !           419: while the worst case is about number of nodes * 1000 bytes.
        !           420: However this storage is reclaimed and reused for every new tree.
        !           421: The larger system time
        !           422: is caused by calls to the runtime system for dynamic storage.
        !           423: .PP
        !           424: Writing and modifying tcc was quick.
        !           425: The basic
        !           426: VAX without indexed addressing mode can be described in about two weeks.
        !           427: This was done concurrently with debugging much of the table
        !           428: walker.  Adding indexed modes took
        !           429: only several hours once we were confident that the rest of the
        !           430: description was correct.
        !           431: .PP
        !           432: One reason for the ease of modifcation is that rules can be added
        !           433: independent of the other rules.  Looping can be completely ignored since
        !           434: dynamic programming eliminates that possiblity.
        !           435: Table generation was extremely fast or about 4.2 seconds for the
        !           436: VAX description.  Thus testing the description could be done quickly.
        !           437: .PP
        !           438: The tables were very small.  For the VAX, there were only 109 rules.
        !           439: This required 7.5K bytes of table space.
        !           440: The whole code generator is another 40K
        !           441: bytes.
        !           442: .SH
        !           443: Code Quality
        !           444: .PP
        !           445: The code generated by the Twig for the test programs were of the
        !           446: same quality as code generated by pcc2.
        !           447: However, Twig does better targeting.
        !           448: .DS
        !           449: tcc:
        !           450: .ti +4em
        !           451: addl3  -4(fp),-8(fp),-12(fp)
        !           452: .sp 1ex
        !           453: pcc2:
        !           454: .ti +4em
        !           455: addl3  -4(fp),-8(fp),r0
        !           456: .ti +4em
        !           457: movl   r0,-12(fp)
        !           458: .DE
        !           459: On the other hand, pcc2 handled temporary registers better
        !           460: and for an equivalent sequence of code, Twig may use
        !           461: more temporary registers.
        !           462: .DS
        !           463: tcc:
        !           464: .ti +4em
        !           465:        cvtbl   (r0),r1
        !           466: pcc2:
        !           467: .ti +4em
        !           468:        cvtbl   (r0),r0
        !           469: .DE
        !           470: This is actually a symptom of how temporary registers are allocated in tcc.
        !           471: Registers are allocated before code is emitted and thus any registers freed
        !           472: during code emissions will not be reused until the next code sequence.
        !           473: .SH
        !           474: Room for improvement
        !           475: .PP
        !           476: Currently Twig is slow compared to other table-driven systems such as
        !           477: [Henry] and [Ganapathi].  However, we have had less than one year of
        !           478: experience with Twig and we believe that its performance can be
        !           479: improved because
        !           480: .IP
        !           481: \(bu Unit productions such as
        !           482: .DS
        !           483: operand: temp
        !           484: .DE
        !           485: in the above example increases the running time of Twig
        !           486: since the closure operation for unit production is done at run-time.
        !           487: .IP
        !           488: \(bu Machine representations.  The current compact
        !           489: representation of the machine require linear searching except at the
        !           490: start state where the large branching factor compelled us to use an
        !           491: array scheme.  Using another representation would speed up the
        !           492: matcher.  For example, implementing the VAX description as an array is
        !           493: estimated to use about 50K bytes but would provide a significant
        !           494: performance improvement.
        !           495: .IP
        !           496: \(bu Twig performs many more procedure calls than pcc2.
        !           497: Every machine step requires at least one procedure call.  Contrast
        !           498: this to YACC where a machine step is performed in line.  Procedure
        !           499: calls on the VAX are expensive.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.