|
|
1.1 ! root 1: .\" @(#)ss4 6.1 (Berkeley) 5/8/86 ! 2: .\" ! 3: .SH ! 4: 4: How the Parser Works ! 5: .PP ! 6: Yacc turns the specification file into a C program, which ! 7: parses the input according to the specification given. ! 8: The algorithm used to go from the ! 9: specification to the parser is complex, and will not be discussed ! 10: here (see ! 11: the references for more information). ! 12: The parser itself, however, is relatively simple, ! 13: and understanding how it works, while ! 14: not strictly necessary, will nevertheless make ! 15: treatment of error recovery and ambiguities much more ! 16: comprehensible. ! 17: .PP ! 18: The parser produced by Yacc consists ! 19: of a finite state machine with a stack. ! 20: The parser is also capable of reading and remembering the next ! 21: input token (called the ! 22: .I lookahead ! 23: token). ! 24: The ! 25: .I "current state" ! 26: is always the one on the top of the stack. ! 27: The states of the finite state machine are given ! 28: small integer labels; initially, the machine is in state 0, ! 29: the stack contains only state 0, and no lookahead token has been read. ! 30: .PP ! 31: The machine has only four actions available to it, called ! 32: .I shift , ! 33: .I reduce , ! 34: .I accept , ! 35: and ! 36: .I error . ! 37: A move of the parser is done as follows: ! 38: .IP 1. ! 39: Based on its current state, the parser decides ! 40: whether it needs a lookahead token to decide ! 41: what action should be done; if it needs one, and does ! 42: not have one, it calls ! 43: .I yylex ! 44: to obtain the next token. ! 45: .IP 2. ! 46: Using the current state, and the lookahead token ! 47: if needed, the parser decides on its next action, and ! 48: carries it out. ! 49: This may result in states being pushed onto the stack, or popped off of ! 50: the stack, and in the lookahead token being processed ! 51: or left alone. ! 52: .PP ! 53: The ! 54: .I shift ! 55: action is the most common action the parser takes. ! 56: Whenever a shift action is taken, there is always ! 57: a lookahead token. ! 58: For example, in state 56 there may be ! 59: an action: ! 60: .DS ! 61: IF shift 34 ! 62: .DE ! 63: which says, in state 56, if the lookahead token is IF, ! 64: the current state (56) is pushed down on the stack, ! 65: and state 34 becomes the current state (on the ! 66: top of the stack). ! 67: The lookahead token is cleared. ! 68: .PP ! 69: The ! 70: .I reduce ! 71: action keeps the stack from growing without ! 72: bounds. ! 73: Reduce actions are appropriate when the parser has seen ! 74: the right hand side of a grammar rule, ! 75: and is prepared to announce that it has seen ! 76: an instance of the rule, replacing the right hand side ! 77: by the left hand side. ! 78: It may be necessary to consult the lookahead token ! 79: to decide whether to reduce, but ! 80: usually it is not; in fact, the default ! 81: action (represented by a ``.'') is often a reduce action. ! 82: .PP ! 83: Reduce actions are associated with individual grammar rules. ! 84: Grammar rules are also given small integer ! 85: numbers, leading to some confusion. ! 86: The action ! 87: .DS ! 88: \fB.\fR reduce 18 ! 89: .DE ! 90: refers to ! 91: .I "grammar rule" ! 92: 18, while the action ! 93: .DS ! 94: IF shift 34 ! 95: .DE ! 96: refers to ! 97: .I state ! 98: 34. ! 99: .PP ! 100: Suppose the rule being reduced is ! 101: .DS ! 102: A \fB:\fR x y z ; ! 103: .DE ! 104: The reduce action depends on the ! 105: left hand symbol (A in this case), and the number of ! 106: symbols on the right hand side (three in this case). ! 107: To reduce, first pop off the top three states ! 108: from the stack ! 109: (In general, the number of states popped equals the number of symbols on the ! 110: right side of the rule). ! 111: In effect, these states were the ones ! 112: put on the stack while recognizing ! 113: .I x , ! 114: .I y , ! 115: and ! 116: .I z , ! 117: and no longer serve any useful purpose. ! 118: After popping these states, a state is uncovered ! 119: which was the state the parser was in before beginning to ! 120: process the rule. ! 121: Using this uncovered state, and the symbol ! 122: on the left side of the rule, perform what is in ! 123: effect a shift of A. ! 124: A new state is obtained, pushed onto the stack, and parsing continues. ! 125: There are significant differences between the processing of ! 126: the left hand symbol and an ordinary shift of a token, ! 127: however, so this action is called a ! 128: .I goto ! 129: action. ! 130: In particular, the lookahead token is cleared by a shift, and ! 131: is not affected by a goto. ! 132: In any case, the uncovered state contains an entry such as: ! 133: .DS ! 134: A goto 20 ! 135: .DE ! 136: causing state 20 to be pushed ! 137: onto the stack, and become the current state. ! 138: .PP ! 139: In effect, the reduce action ``turns back the clock'' in the parse, ! 140: popping the states off the stack to go back to the ! 141: state where the right hand side of the rule was first seen. ! 142: The parser then behaves as if it had seen the left side at that time. ! 143: If the right hand side of the rule is empty, ! 144: no states are popped off of the stack: the uncovered state ! 145: is in fact the current state. ! 146: .PP ! 147: The reduce action is also important in the treatment of user-supplied ! 148: actions and values. ! 149: When a rule is reduced, the code supplied with the rule is executed ! 150: before the stack is adjusted. ! 151: In addition to the stack holding the states, another stack, ! 152: running in parallel with it, holds the values returned ! 153: from the lexical analyzer and the actions. ! 154: When a shift takes place, the external variable ! 155: .I yylval ! 156: is copied onto the value stack. ! 157: After the return from the user code, the reduction is carried out. ! 158: When the ! 159: .I goto ! 160: action is done, the external variable ! 161: .I yyval ! 162: is copied onto the value stack. ! 163: The pseudo-variables $1, $2, etc., refer to the value stack. ! 164: .PP ! 165: The other two parser actions are conceptually much simpler. ! 166: The ! 167: .I accept ! 168: action indicates that the entire input has been seen and ! 169: that it matches the specification. ! 170: This action appears only when the lookahead token is ! 171: the endmarker, and indicates that the parser has successfully ! 172: done its job. ! 173: The ! 174: .I error ! 175: action, on the other hand, represents a place where the parser ! 176: can no longer continue parsing according to the specification. ! 177: The input tokens it has seen, together with the lookahead token, ! 178: cannot be followed by anything that would result ! 179: in a legal input. ! 180: The parser reports an error, and attempts to recover the situation and ! 181: resume parsing: the error recovery (as opposed to the detection of error) ! 182: will be covered in Section 7. ! 183: .PP ! 184: It is time for an example! ! 185: Consider the specification ! 186: .DS ! 187: %token DING DONG DELL ! 188: %% ! 189: rhyme : sound place ! 190: ; ! 191: sound : DING DONG ! 192: ; ! 193: place : DELL ! 194: ; ! 195: .DE ! 196: .PP ! 197: When Yacc is invoked with the ! 198: .B \-v ! 199: option, a file called ! 200: .I y.output ! 201: is produced, with a human-readable description of the parser. ! 202: The ! 203: .I y.output ! 204: file corresponding to the above grammar (with some statistics ! 205: stripped off the end) is: ! 206: .DS ! 207: state 0 ! 208: $accept : \_rhyme $end ! 209: ! 210: DING shift 3 ! 211: . error ! 212: ! 213: rhyme goto 1 ! 214: sound goto 2 ! 215: ! 216: state 1 ! 217: $accept : rhyme\_$end ! 218: ! 219: $end accept ! 220: . error ! 221: ! 222: state 2 ! 223: rhyme : sound\_place ! 224: ! 225: DELL shift 5 ! 226: . error ! 227: ! 228: place goto 4 ! 229: ! 230: state 3 ! 231: sound : DING\_DONG ! 232: ! 233: DONG shift 6 ! 234: . error ! 235: ! 236: state 4 ! 237: rhyme : sound place\_ (1) ! 238: ! 239: . reduce 1 ! 240: ! 241: state 5 ! 242: place : DELL\_ (3) ! 243: ! 244: . reduce 3 ! 245: ! 246: state 6 ! 247: sound : DING DONG\_ (2) ! 248: ! 249: . reduce 2 ! 250: .DE ! 251: Notice that, in addition to the actions for each state, there is a ! 252: description of the parsing rules being processed in each ! 253: state. The \_ character is used to indicate ! 254: what has been seen, and what is yet to come, in each rule. ! 255: Suppose the input is ! 256: .DS ! 257: DING DONG DELL ! 258: .DE ! 259: It is instructive to follow the steps of the parser while ! 260: processing this input. ! 261: .PP ! 262: Initially, the current state is state 0. ! 263: The parser needs to refer to the input in order to decide ! 264: between the actions available in state 0, so ! 265: the first token, ! 266: .I DING , ! 267: is read, becoming the lookahead token. ! 268: The action in state 0 on ! 269: .I DING ! 270: is ! 271: is ``shift 3'', so state 3 is pushed onto the stack, ! 272: and the lookahead token is cleared. ! 273: State 3 becomes the current state. ! 274: The next token, ! 275: .I DONG , ! 276: is read, becoming the lookahead token. ! 277: The action in state 3 on the token ! 278: .I DONG ! 279: is ``shift 6'', ! 280: so state 6 is pushed onto the stack, and the lookahead is cleared. ! 281: The stack now contains 0, 3, and 6. ! 282: In state 6, without even consulting the lookahead, ! 283: the parser reduces by rule 2. ! 284: .DS ! 285: sound : DING DONG ! 286: .DE ! 287: This rule has two symbols on the right hand side, so ! 288: two states, 6 and 3, are popped off of the stack, uncovering state 0. ! 289: Consulting the description of state 0, looking for a goto on ! 290: .I sound , ! 291: .DS ! 292: sound goto 2 ! 293: .DE ! 294: is obtained; thus state 2 is pushed onto the stack, ! 295: becoming the current state. ! 296: .PP ! 297: In state 2, the next token, ! 298: .I DELL , ! 299: must be read. ! 300: The action is ``shift 5'', so state 5 is pushed onto the stack, which now has ! 301: 0, 2, and 5 on it, and the lookahead token is cleared. ! 302: In state 5, the only action is to reduce by rule 3. ! 303: This has one symbol on the right hand side, so one state, 5, ! 304: is popped off, and state 2 is uncovered. ! 305: The goto in state 2 on ! 306: .I place , ! 307: the left side of rule 3, ! 308: is state 4. ! 309: Now, the stack contains 0, 2, and 4. ! 310: In state 4, the only action is to reduce by rule 1. ! 311: There are two symbols on the right, so the top two states are popped off, ! 312: uncovering state 0 again. ! 313: In state 0, there is a goto on ! 314: .I rhyme ! 315: causing the parser to enter state 1. ! 316: In state 1, the input is read; the endmarker is obtained, ! 317: indicated by ``$end'' in the ! 318: .I y.output ! 319: file. ! 320: The action in state 1 when the endmarker is seen is to accept, ! 321: successfully ending the parse. ! 322: .PP ! 323: The reader is urged to consider how the parser works ! 324: when confronted with such incorrect strings as ! 325: .I "DING DONG DONG" , ! 326: .I "DING DONG" , ! 327: .I "DING DONG DELL DELL" , ! 328: etc. ! 329: A few minutes spend with this and other simple examples will ! 330: probably be repaid when problems arise in more complicated contexts.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.