Annotation of 43BSDReno/share/doc/ps1/15.yacc/ss4, revision 1.1

1.1     ! root        1: .\"    @(#)ss4 6.1 (Berkeley) 5/8/86
        !             2: .\"
        !             3: .SH
        !             4: 4: How the Parser Works
        !             5: .PP
        !             6: Yacc turns the specification file into a C program, which
        !             7: parses the input according to the specification given.
        !             8: The algorithm used to go from the
        !             9: specification to the parser is complex, and will not be discussed
        !            10: here (see
        !            11: the references for more information).
        !            12: The parser itself, however, is relatively simple,
        !            13: and understanding how it works, while
        !            14: not strictly necessary, will nevertheless make
        !            15: treatment of error recovery and ambiguities much more
        !            16: comprehensible.
        !            17: .PP
        !            18: The parser produced by Yacc consists
        !            19: of a finite state machine with a stack.
        !            20: The parser is also capable of reading and remembering the next
        !            21: input token (called the
        !            22: .I lookahead
        !            23: token).
        !            24: The
        !            25: .I "current state"
        !            26: is always the one on the top of the stack.
        !            27: The states of the finite state machine are given
        !            28: small integer labels; initially, the machine is in state 0,
        !            29: the stack contains only state 0, and no lookahead token has been read.
        !            30: .PP
        !            31: The machine has only four actions available to it, called
        !            32: .I shift ,
        !            33: .I reduce ,
        !            34: .I accept ,
        !            35: and
        !            36: .I error .
        !            37: A move of the parser is done as follows:
        !            38: .IP 1.
        !            39: Based on its current state, the parser decides
        !            40: whether it needs a lookahead token to decide
        !            41: what action should be done; if it needs one, and does
        !            42: not have one, it calls
        !            43: .I yylex
        !            44: to obtain the next token.
        !            45: .IP 2.
        !            46: Using the current state, and the lookahead token
        !            47: if needed, the parser decides on its next action, and
        !            48: carries it out.
        !            49: This may result in states being pushed onto the stack, or popped off of
        !            50: the stack, and in the lookahead token being processed
        !            51: or left alone.
        !            52: .PP
        !            53: The
        !            54: .I shift
        !            55: action is the most common action the parser takes.
        !            56: Whenever a shift action is taken, there is always
        !            57: a lookahead token.
        !            58: For example, in state 56 there may be
        !            59: an action:
        !            60: .DS
        !            61:        IF      shift 34
        !            62: .DE
        !            63: which says, in state 56, if the lookahead token is IF,
        !            64: the current state (56) is pushed down on the stack,
        !            65: and state 34 becomes the current state (on the
        !            66: top of the stack).
        !            67: The lookahead token is cleared.
        !            68: .PP
        !            69: The
        !            70: .I reduce
        !            71: action keeps the stack from growing without
        !            72: bounds.
        !            73: Reduce actions are appropriate when the parser has seen
        !            74: the right hand side of a grammar rule,
        !            75: and is prepared to announce that it has seen
        !            76: an instance of the rule, replacing the right hand side
        !            77: by the left hand side.
        !            78: It may be necessary to consult the lookahead token
        !            79: to decide whether to reduce, but
        !            80: usually it is not; in fact, the default
        !            81: action (represented by a ``.'') is often a reduce action.
        !            82: .PP
        !            83: Reduce actions are associated with individual grammar rules.
        !            84: Grammar rules are also given small integer
        !            85: numbers, leading to some confusion.
        !            86: The action
        !            87: .DS
        !            88:        \fB.\fR reduce 18
        !            89: .DE
        !            90: refers to
        !            91: .I "grammar rule"
        !            92: 18, while the action
        !            93: .DS
        !            94:        IF      shift 34
        !            95: .DE
        !            96: refers to
        !            97: .I state
        !            98: 34.
        !            99: .PP
        !           100: Suppose the rule being reduced is
        !           101: .DS
        !           102: A      \fB:\fR x  y  z    ;
        !           103: .DE
        !           104: The reduce action depends on the
        !           105: left hand symbol (A in this case), and the number of
        !           106: symbols on the right hand side (three in this case).
        !           107: To reduce, first pop off the top three states
        !           108: from the stack
        !           109: (In general, the number of states popped equals the number of symbols on the
        !           110: right side of the rule).
        !           111: In effect, these states were the ones
        !           112: put on the stack while recognizing
        !           113: .I x ,
        !           114: .I y ,
        !           115: and
        !           116: .I z ,
        !           117: and no longer serve any useful purpose.
        !           118: After popping these states, a state is uncovered
        !           119: which was the state the parser was in before beginning to
        !           120: process the rule.
        !           121: Using this uncovered state, and the symbol
        !           122: on the left side of the rule, perform what is in
        !           123: effect a shift of A.
        !           124: A new state is obtained, pushed onto the stack, and parsing continues.
        !           125: There are significant differences between the processing of
        !           126: the left hand symbol and an ordinary shift of a token,
        !           127: however, so this action is called a
        !           128: .I goto
        !           129: action.
        !           130: In particular, the lookahead token is cleared by a shift, and
        !           131: is not affected by a goto.
        !           132: In any case, the uncovered state contains an entry such as:
        !           133: .DS
        !           134:        A       goto 20
        !           135: .DE
        !           136: causing state 20 to be pushed
        !           137: onto the stack, and become the current state.
        !           138: .PP
        !           139: In effect, the reduce action ``turns back the clock'' in the parse,
        !           140: popping the states off the stack to go back to the
        !           141: state where the right hand side of the rule was first seen.
        !           142: The parser then behaves as if it had seen the left side at that time.
        !           143: If the right hand side of the rule is empty,
        !           144: no states are popped off of the stack: the uncovered state
        !           145: is in fact the current state.
        !           146: .PP
        !           147: The reduce action is also important in the treatment of user-supplied
        !           148: actions and values.
        !           149: When a rule is reduced, the code supplied with the rule is executed
        !           150: before the stack is adjusted.
        !           151: In addition to the stack holding the states, another stack,
        !           152: running in parallel with it, holds the values returned
        !           153: from the lexical analyzer and the actions.
        !           154: When a shift takes place, the external variable
        !           155: .I yylval
        !           156: is copied onto the value stack.
        !           157: After the return from the user code, the reduction is carried out.
        !           158: When the
        !           159: .I goto
        !           160: action is done, the external variable
        !           161: .I yyval
        !           162: is copied onto the value stack.
        !           163: The pseudo-variables $1, $2, etc., refer to the value stack.
        !           164: .PP
        !           165: The other two parser actions are conceptually much simpler.
        !           166: The
        !           167: .I accept
        !           168: action indicates that the entire input has been seen and
        !           169: that it matches the specification.
        !           170: This action appears only when the lookahead token is 
        !           171: the endmarker, and indicates that the parser has successfully
        !           172: done its job.
        !           173: The
        !           174: .I error
        !           175: action, on the other hand, represents a place where the parser
        !           176: can no longer continue parsing according to the specification.
        !           177: The input tokens it has seen, together with the lookahead token,
        !           178: cannot be followed by anything that would result
        !           179: in a legal input.
        !           180: The parser reports an error, and attempts to recover the situation and
        !           181: resume parsing: the error recovery (as opposed to the detection of error)
        !           182: will be covered in Section 7.
        !           183: .PP
        !           184: It is time for an example!
        !           185: Consider the specification
        !           186: .DS
        !           187: %token  DING  DONG  DELL
        !           188: %%
        !           189: rhyme  :       sound  place
        !           190:        ;
        !           191: sound  :       DING  DONG
        !           192:        ;
        !           193: place  :       DELL
        !           194:        ;
        !           195: .DE
        !           196: .PP
        !           197: When Yacc is invoked with the
        !           198: .B \-v
        !           199: option, a file called
        !           200: .I y.output
        !           201: is produced, with a human-readable description of the parser.
        !           202: The
        !           203: .I y.output
        !           204: file corresponding to the above grammar (with some statistics
        !           205: stripped off the end) is:
        !           206: .DS
        !           207: state 0
        !           208:        $accept  :  \_rhyme  $end 
        !           209: 
        !           210:        DING  shift 3
        !           211:        .  error
        !           212: 
        !           213:        rhyme  goto 1
        !           214:        sound  goto 2
        !           215: 
        !           216: state 1
        !           217:        $accept  :   rhyme\_$end 
        !           218: 
        !           219:        $end  accept
        !           220:        .  error
        !           221: 
        !           222: state 2
        !           223:        rhyme  :   sound\_place 
        !           224: 
        !           225:        DELL  shift 5
        !           226:        .  error
        !           227: 
        !           228:        place   goto 4
        !           229: 
        !           230: state 3
        !           231:        sound   :   DING\_DONG 
        !           232: 
        !           233:        DONG  shift 6
        !           234:        .  error
        !           235: 
        !           236: state 4
        !           237:        rhyme  :   sound  place\_    (1)
        !           238: 
        !           239:        .   reduce  1
        !           240: 
        !           241: state 5
        !           242:        place  :   DELL\_    (3)
        !           243: 
        !           244:        .   reduce  3
        !           245: 
        !           246: state 6
        !           247:        sound   :   DING  DONG\_    (2)
        !           248: 
        !           249:        .   reduce  2
        !           250: .DE
        !           251: Notice that, in addition to the actions for each state, there is a
        !           252: description of the parsing rules being processed in each
        !           253: state.  The \_ character is used to indicate
        !           254: what has been seen, and what is yet to come, in each rule.
        !           255: Suppose the input is
        !           256: .DS
        !           257: DING  DONG  DELL
        !           258: .DE
        !           259: It is instructive to follow the steps of the parser while
        !           260: processing this input.
        !           261: .PP
        !           262: Initially, the current state is state 0.
        !           263: The parser needs to refer to the input in order to decide
        !           264: between the actions available in state 0, so
        !           265: the first token,
        !           266: .I DING ,
        !           267: is read, becoming the lookahead token.
        !           268: The action in state 0 on
        !           269: .I DING
        !           270: is
        !           271: is ``shift 3'', so state 3 is pushed onto the stack,
        !           272: and the lookahead token is cleared.
        !           273: State 3 becomes the current state.
        !           274: The next token,
        !           275: .I DONG ,
        !           276: is read, becoming the lookahead token.
        !           277: The action in state 3 on the token
        !           278: .I DONG
        !           279: is ``shift 6'',
        !           280: so state 6 is pushed onto the stack, and the lookahead is cleared.
        !           281: The stack now contains 0, 3, and 6.
        !           282: In state 6, without even consulting the lookahead,
        !           283: the parser reduces by rule 2.
        !           284: .DS
        !           285:        sound  :   DING  DONG
        !           286: .DE
        !           287: This rule has two symbols on the right hand side, so
        !           288: two states, 6 and 3, are popped off of the stack, uncovering state 0.
        !           289: Consulting the description of state 0, looking for a goto on 
        !           290: .I sound ,
        !           291: .DS
        !           292:        sound   goto 2
        !           293: .DE
        !           294: is obtained; thus state 2 is pushed onto the stack,
        !           295: becoming the current state.
        !           296: .PP
        !           297: In state 2, the next token,
        !           298: .I DELL ,
        !           299: must be read.
        !           300: The action is ``shift 5'', so state 5 is pushed onto the stack, which now has
        !           301: 0, 2, and 5 on it, and the lookahead token is cleared.
        !           302: In state 5, the only action is to reduce by rule 3.
        !           303: This has one symbol on the right hand side, so one state, 5,
        !           304: is popped off, and state 2 is uncovered.
        !           305: The goto in state 2 on
        !           306: .I place ,
        !           307: the left side of rule 3,
        !           308: is state 4.
        !           309: Now, the stack contains 0, 2, and 4.
        !           310: In state 4, the only action is to reduce by rule 1.
        !           311: There are two symbols on the right, so the top two states are popped off,
        !           312: uncovering state 0 again.
        !           313: In state 0, there is a goto on
        !           314: .I rhyme
        !           315: causing the parser to enter state 1.
        !           316: In state 1, the input is read; the endmarker is obtained,
        !           317: indicated by ``$end'' in the
        !           318: .I y.output
        !           319: file.
        !           320: The action in state 1 when the endmarker is seen is to accept,
        !           321: successfully ending the parse.
        !           322: .PP
        !           323: The reader is urged to consider how the parser works
        !           324: when confronted with such incorrect strings as
        !           325: .I "DING DONG DONG" ,
        !           326: .I "DING DONG" ,
        !           327: .I "DING DONG DELL DELL" ,
        !           328: etc.
        !           329: A few minutes spend with this and other simple examples will
        !           330: probably be repaid when problems arise in more complicated contexts.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.