Annotation of 43BSDReno/share/doc/ps1/15.yacc/ss4, revision 1.1.1.1

1.1       root        1: .\"    @(#)ss4 6.1 (Berkeley) 5/8/86
                      2: .\"
                      3: .SH
                      4: 4: How the Parser Works
                      5: .PP
                      6: Yacc turns the specification file into a C program, which
                      7: parses the input according to the specification given.
                      8: The algorithm used to go from the
                      9: specification to the parser is complex, and will not be discussed
                     10: here (see
                     11: the references for more information).
                     12: The parser itself, however, is relatively simple,
                     13: and understanding how it works, while
                     14: not strictly necessary, will nevertheless make
                     15: treatment of error recovery and ambiguities much more
                     16: comprehensible.
                     17: .PP
                     18: The parser produced by Yacc consists
                     19: of a finite state machine with a stack.
                     20: The parser is also capable of reading and remembering the next
                     21: input token (called the
                     22: .I lookahead
                     23: token).
                     24: The
                     25: .I "current state"
                     26: is always the one on the top of the stack.
                     27: The states of the finite state machine are given
                     28: small integer labels; initially, the machine is in state 0,
                     29: the stack contains only state 0, and no lookahead token has been read.
                     30: .PP
                     31: The machine has only four actions available to it, called
                     32: .I shift ,
                     33: .I reduce ,
                     34: .I accept ,
                     35: and
                     36: .I error .
                     37: A move of the parser is done as follows:
                     38: .IP 1.
                     39: Based on its current state, the parser decides
                     40: whether it needs a lookahead token to decide
                     41: what action should be done; if it needs one, and does
                     42: not have one, it calls
                     43: .I yylex
                     44: to obtain the next token.
                     45: .IP 2.
                     46: Using the current state, and the lookahead token
                     47: if needed, the parser decides on its next action, and
                     48: carries it out.
                     49: This may result in states being pushed onto the stack, or popped off of
                     50: the stack, and in the lookahead token being processed
                     51: or left alone.
                     52: .PP
                     53: The
                     54: .I shift
                     55: action is the most common action the parser takes.
                     56: Whenever a shift action is taken, there is always
                     57: a lookahead token.
                     58: For example, in state 56 there may be
                     59: an action:
                     60: .DS
                     61:        IF      shift 34
                     62: .DE
                     63: which says, in state 56, if the lookahead token is IF,
                     64: the current state (56) is pushed down on the stack,
                     65: and state 34 becomes the current state (on the
                     66: top of the stack).
                     67: The lookahead token is cleared.
                     68: .PP
                     69: The
                     70: .I reduce
                     71: action keeps the stack from growing without
                     72: bounds.
                     73: Reduce actions are appropriate when the parser has seen
                     74: the right hand side of a grammar rule,
                     75: and is prepared to announce that it has seen
                     76: an instance of the rule, replacing the right hand side
                     77: by the left hand side.
                     78: It may be necessary to consult the lookahead token
                     79: to decide whether to reduce, but
                     80: usually it is not; in fact, the default
                     81: action (represented by a ``.'') is often a reduce action.
                     82: .PP
                     83: Reduce actions are associated with individual grammar rules.
                     84: Grammar rules are also given small integer
                     85: numbers, leading to some confusion.
                     86: The action
                     87: .DS
                     88:        \fB.\fR reduce 18
                     89: .DE
                     90: refers to
                     91: .I "grammar rule"
                     92: 18, while the action
                     93: .DS
                     94:        IF      shift 34
                     95: .DE
                     96: refers to
                     97: .I state
                     98: 34.
                     99: .PP
                    100: Suppose the rule being reduced is
                    101: .DS
                    102: A      \fB:\fR x  y  z    ;
                    103: .DE
                    104: The reduce action depends on the
                    105: left hand symbol (A in this case), and the number of
                    106: symbols on the right hand side (three in this case).
                    107: To reduce, first pop off the top three states
                    108: from the stack
                    109: (In general, the number of states popped equals the number of symbols on the
                    110: right side of the rule).
                    111: In effect, these states were the ones
                    112: put on the stack while recognizing
                    113: .I x ,
                    114: .I y ,
                    115: and
                    116: .I z ,
                    117: and no longer serve any useful purpose.
                    118: After popping these states, a state is uncovered
                    119: which was the state the parser was in before beginning to
                    120: process the rule.
                    121: Using this uncovered state, and the symbol
                    122: on the left side of the rule, perform what is in
                    123: effect a shift of A.
                    124: A new state is obtained, pushed onto the stack, and parsing continues.
                    125: There are significant differences between the processing of
                    126: the left hand symbol and an ordinary shift of a token,
                    127: however, so this action is called a
                    128: .I goto
                    129: action.
                    130: In particular, the lookahead token is cleared by a shift, and
                    131: is not affected by a goto.
                    132: In any case, the uncovered state contains an entry such as:
                    133: .DS
                    134:        A       goto 20
                    135: .DE
                    136: causing state 20 to be pushed
                    137: onto the stack, and become the current state.
                    138: .PP
                    139: In effect, the reduce action ``turns back the clock'' in the parse,
                    140: popping the states off the stack to go back to the
                    141: state where the right hand side of the rule was first seen.
                    142: The parser then behaves as if it had seen the left side at that time.
                    143: If the right hand side of the rule is empty,
                    144: no states are popped off of the stack: the uncovered state
                    145: is in fact the current state.
                    146: .PP
                    147: The reduce action is also important in the treatment of user-supplied
                    148: actions and values.
                    149: When a rule is reduced, the code supplied with the rule is executed
                    150: before the stack is adjusted.
                    151: In addition to the stack holding the states, another stack,
                    152: running in parallel with it, holds the values returned
                    153: from the lexical analyzer and the actions.
                    154: When a shift takes place, the external variable
                    155: .I yylval
                    156: is copied onto the value stack.
                    157: After the return from the user code, the reduction is carried out.
                    158: When the
                    159: .I goto
                    160: action is done, the external variable
                    161: .I yyval
                    162: is copied onto the value stack.
                    163: The pseudo-variables $1, $2, etc., refer to the value stack.
                    164: .PP
                    165: The other two parser actions are conceptually much simpler.
                    166: The
                    167: .I accept
                    168: action indicates that the entire input has been seen and
                    169: that it matches the specification.
                    170: This action appears only when the lookahead token is 
                    171: the endmarker, and indicates that the parser has successfully
                    172: done its job.
                    173: The
                    174: .I error
                    175: action, on the other hand, represents a place where the parser
                    176: can no longer continue parsing according to the specification.
                    177: The input tokens it has seen, together with the lookahead token,
                    178: cannot be followed by anything that would result
                    179: in a legal input.
                    180: The parser reports an error, and attempts to recover the situation and
                    181: resume parsing: the error recovery (as opposed to the detection of error)
                    182: will be covered in Section 7.
                    183: .PP
                    184: It is time for an example!
                    185: Consider the specification
                    186: .DS
                    187: %token  DING  DONG  DELL
                    188: %%
                    189: rhyme  :       sound  place
                    190:        ;
                    191: sound  :       DING  DONG
                    192:        ;
                    193: place  :       DELL
                    194:        ;
                    195: .DE
                    196: .PP
                    197: When Yacc is invoked with the
                    198: .B \-v
                    199: option, a file called
                    200: .I y.output
                    201: is produced, with a human-readable description of the parser.
                    202: The
                    203: .I y.output
                    204: file corresponding to the above grammar (with some statistics
                    205: stripped off the end) is:
                    206: .DS
                    207: state 0
                    208:        $accept  :  \_rhyme  $end 
                    209: 
                    210:        DING  shift 3
                    211:        .  error
                    212: 
                    213:        rhyme  goto 1
                    214:        sound  goto 2
                    215: 
                    216: state 1
                    217:        $accept  :   rhyme\_$end 
                    218: 
                    219:        $end  accept
                    220:        .  error
                    221: 
                    222: state 2
                    223:        rhyme  :   sound\_place 
                    224: 
                    225:        DELL  shift 5
                    226:        .  error
                    227: 
                    228:        place   goto 4
                    229: 
                    230: state 3
                    231:        sound   :   DING\_DONG 
                    232: 
                    233:        DONG  shift 6
                    234:        .  error
                    235: 
                    236: state 4
                    237:        rhyme  :   sound  place\_    (1)
                    238: 
                    239:        .   reduce  1
                    240: 
                    241: state 5
                    242:        place  :   DELL\_    (3)
                    243: 
                    244:        .   reduce  3
                    245: 
                    246: state 6
                    247:        sound   :   DING  DONG\_    (2)
                    248: 
                    249:        .   reduce  2
                    250: .DE
                    251: Notice that, in addition to the actions for each state, there is a
                    252: description of the parsing rules being processed in each
                    253: state.  The \_ character is used to indicate
                    254: what has been seen, and what is yet to come, in each rule.
                    255: Suppose the input is
                    256: .DS
                    257: DING  DONG  DELL
                    258: .DE
                    259: It is instructive to follow the steps of the parser while
                    260: processing this input.
                    261: .PP
                    262: Initially, the current state is state 0.
                    263: The parser needs to refer to the input in order to decide
                    264: between the actions available in state 0, so
                    265: the first token,
                    266: .I DING ,
                    267: is read, becoming the lookahead token.
                    268: The action in state 0 on
                    269: .I DING
                    270: is
                    271: is ``shift 3'', so state 3 is pushed onto the stack,
                    272: and the lookahead token is cleared.
                    273: State 3 becomes the current state.
                    274: The next token,
                    275: .I DONG ,
                    276: is read, becoming the lookahead token.
                    277: The action in state 3 on the token
                    278: .I DONG
                    279: is ``shift 6'',
                    280: so state 6 is pushed onto the stack, and the lookahead is cleared.
                    281: The stack now contains 0, 3, and 6.
                    282: In state 6, without even consulting the lookahead,
                    283: the parser reduces by rule 2.
                    284: .DS
                    285:        sound  :   DING  DONG
                    286: .DE
                    287: This rule has two symbols on the right hand side, so
                    288: two states, 6 and 3, are popped off of the stack, uncovering state 0.
                    289: Consulting the description of state 0, looking for a goto on 
                    290: .I sound ,
                    291: .DS
                    292:        sound   goto 2
                    293: .DE
                    294: is obtained; thus state 2 is pushed onto the stack,
                    295: becoming the current state.
                    296: .PP
                    297: In state 2, the next token,
                    298: .I DELL ,
                    299: must be read.
                    300: The action is ``shift 5'', so state 5 is pushed onto the stack, which now has
                    301: 0, 2, and 5 on it, and the lookahead token is cleared.
                    302: In state 5, the only action is to reduce by rule 3.
                    303: This has one symbol on the right hand side, so one state, 5,
                    304: is popped off, and state 2 is uncovered.
                    305: The goto in state 2 on
                    306: .I place ,
                    307: the left side of rule 3,
                    308: is state 4.
                    309: Now, the stack contains 0, 2, and 4.
                    310: In state 4, the only action is to reduce by rule 1.
                    311: There are two symbols on the right, so the top two states are popped off,
                    312: uncovering state 0 again.
                    313: In state 0, there is a goto on
                    314: .I rhyme
                    315: causing the parser to enter state 1.
                    316: In state 1, the input is read; the endmarker is obtained,
                    317: indicated by ``$end'' in the
                    318: .I y.output
                    319: file.
                    320: The action in state 1 when the endmarker is seen is to accept,
                    321: successfully ending the parse.
                    322: .PP
                    323: The reader is urged to consider how the parser works
                    324: when confronted with such incorrect strings as
                    325: .I "DING DONG DONG" ,
                    326: .I "DING DONG" ,
                    327: .I "DING DONG DELL DELL" ,
                    328: etc.
                    329: A few minutes spend with this and other simple examples will
                    330: probably be repaid when problems arise in more complicated contexts.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.