|
|
1.1 root 1: .\" @(#)ss5 6.1 (Berkeley) 5/8/86
2: .\"
3: .SH
4: 5: Ambiguity and Conflicts
5: .PP
6: A set of grammar rules is
7: .I ambiguous
8: if there is some input string that can be structured in two or more different ways.
9: For example, the grammar rule
10: .DS
11: expr : expr \'\-\' expr
12: .DE
13: is a natural way of expressing the fact that one way of forming an arithmetic expression
14: is to put two other expressions together with a minus sign between them.
15: Unfortunately, this grammar rule does not
16: completely specify the way that all complex inputs
17: should be structured.
18: For example, if the input is
19: .DS
20: expr \- expr \- expr
21: .DE
22: the rule allows this input to be structured as either
23: .DS
24: ( expr \- expr ) \- expr
25: .DE
26: or as
27: .DS
28: expr \- ( expr \- expr )
29: .DE
30: (The first is called
31: .I "left association" ,
32: the second
33: .I "right association" ).
34: .PP
35: Yacc detects such ambiguities when it is attempting to build the parser.
36: It is instructive to consider the problem that confronts the parser when it is
37: given an input such as
38: .DS
39: expr \- expr \- expr
40: .DE
41: When the parser has read the second expr, the input that it has seen:
42: .DS
43: expr \- expr
44: .DE
45: matches the right side of the grammar rule above.
46: The parser could
47: .I reduce
48: the input by applying this rule;
49: after applying the rule;
50: the input is reduced to
51: .I expr (the
52: left side of the rule).
53: The parser would then read the final part of the input:
54: .DS
55: \- expr
56: .DE
57: and again reduce.
58: The effect of this is to take the left associative interpretation.
59: .PP
60: Alternatively, when the parser has seen
61: .DS
62: expr \- expr
63: .DE
64: it could defer the immediate application of the rule, and continue reading
65: the input until it had seen
66: .DS
67: expr \- expr \- expr
68: .DE
69: It could then apply the rule to the rightmost three symbols, reducing them to
70: .I expr
71: and leaving
72: .DS
73: expr \- expr
74: .DE
75: Now the rule can be reduced once more; the effect is to
76: take the right associative interpretation.
77: Thus, having read
78: .DS
79: expr \- expr
80: .DE
81: the parser can do two legal things, a shift or a reduction, and has no way of
82: deciding between them.
83: This is called a
84: .I "shift / reduce conflict" .
85: It may also happen that the parser has a choice of two legal reductions;
86: this is called a
87: .I "reduce / reduce conflict" .
88: Note that there are never any ``Shift/shift'' conflicts.
89: .PP
90: When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
91: It does this by selecting one of the valid steps wherever it has a choice.
92: A rule describing which choice to make in a given situation is called
93: a
94: .I "disambiguating rule" .
95: .PP
96: Yacc invokes two disambiguating rules by default:
97: .IP 1.
98: In a shift/reduce conflict, the default is to do the shift.
99: .IP 2.
100: In a reduce/reduce conflict, the default is to reduce by the
101: .I earlier
102: grammar rule (in the input sequence).
103: .PP
104: Rule 1 implies that reductions are deferred whenever there is a choice,
105: in favor of shifts.
106: Rule 2 gives the user rather crude control over the behavior of the parser
107: in this situation, but reduce/reduce conflicts should be avoided whenever possible.
108: .PP
109: Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
110: require a more complex parser than Yacc can construct.
111: The use of actions within rules can also cause conflicts, if the action must
112: be done before the parser can be sure which rule is being recognized.
113: In these cases, the application of disambiguating rules is inappropriate,
114: and leads to an incorrect parser.
115: For this reason, Yacc
116: always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2.
117: .PP
118: In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
119: possible to rewrite the grammar rules so that the same inputs are read but there are no
120: conflicts.
121: For this reason, most previous parser generators
122: have considered conflicts to be fatal errors.
123: Our experience has suggested that this rewriting is somewhat unnatural,
124: and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
125: .PP
126: As an example of the power of disambiguating rules, consider a fragment from a programming
127: language involving an ``if-then-else'' construction:
128: .DS
129: stat : IF \'(\' cond \')\' stat
130: | IF \'(\' cond \')\' stat ELSE stat
131: ;
132: .DE
133: In these rules,
134: .I IF
135: and
136: .I ELSE
137: are tokens,
138: .I cond
139: is a nonterminal symbol describing
140: conditional (logical) expressions, and
141: .I stat
142: is a nonterminal symbol describing statements.
143: The first rule will be called the
144: .ul
145: simple-if
146: rule, and the
147: second the
148: .ul
149: if-else
150: rule.
151: .PP
152: These two rules form an ambiguous construction, since input of the form
153: .DS
154: IF ( C1 ) IF ( C2 ) S1 ELSE S2
155: .DE
156: can be structured according to these rules in two ways:
157: .DS
158: IF ( C1 ) {
159: IF ( C2 ) S1
160: }
161: ELSE S2
162: .DE
163: or
164: .DS
165: IF ( C1 ) {
166: IF ( C2 ) S1
167: ELSE S2
168: }
169: .DE
170: The second interpretation is the one given in most programming languages
171: having this construct.
172: Each
173: .I ELSE
174: is associated with the last preceding
175: ``un-\fIELSE'\fRd''
176: .I IF .
177: In this example, consider the situation where the parser has seen
178: .DS
179: IF ( C1 ) IF ( C2 ) S1
180: .DE
181: and is looking at the
182: .I ELSE .
183: It can immediately
184: reduce
185: by the simple-if rule to get
186: .DS
187: IF ( C1 ) stat
188: .DE
189: and then read the remaining input,
190: .DS
191: ELSE S2
192: .DE
193: and reduce
194: .DS
195: IF ( C1 ) stat ELSE S2
196: .DE
197: by the if-else rule.
198: This leads to the first of the above groupings of the input.
199: .PP
200: On the other hand, the
201: .I ELSE
202: may be shifted,
203: .I S2
204: read, and then the right hand portion of
205: .DS
206: IF ( C1 ) IF ( C2 ) S1 ELSE S2
207: .DE
208: can be reduced by the if-else rule to get
209: .DS
210: IF ( C1 ) stat
211: .DE
212: which can be reduced by the simple-if rule.
213: This leads to the second of the above groupings of the input, which
214: is usually desired.
215: .PP
216: Once again the parser can do two valid things \- there is a shift/reduce conflict.
217: The application of disambiguating rule 1 tells the parser to shift in this case,
218: which leads to the desired grouping.
219: .PP
220: This shift/reduce conflict arises only when there is a particular current input symbol,
221: .I ELSE ,
222: and particular inputs already seen, such as
223: .DS
224: IF ( C1 ) IF ( C2 ) S1
225: .DE
226: In general, there may be many conflicts, and each one
227: will be associated with an input symbol and
228: a set of previously read inputs.
229: The previously read inputs are characterized by the
230: state
231: of the parser.
232: .PP
233: The conflict messages of Yacc are best understood
234: by examining the verbose (\fB\-v\fR) option output file.
235: For example, the output corresponding to the above
236: conflict state might be:
237: .DS L
238: 23: shift/reduce conflict (shift 45, reduce 18) on ELSE
239:
240: state 23
241:
242: stat : IF ( cond ) stat\_ (18)
243: stat : IF ( cond ) stat\_ELSE stat
244:
245: ELSE shift 45
246: . reduce 18
247:
248: .DE
249: The first line describes the conflict, giving the state and the input symbol.
250: The ordinary state description follows, giving
251: the grammar rules active in the state, and the parser actions.
252: Recall that the underline marks the
253: portion of the grammar rules which has been seen.
254: Thus in the example, in state 23 the parser has seen input corresponding
255: to
256: .DS
257: IF ( cond ) stat
258: .DE
259: and the two grammar rules shown are active at this time.
260: The parser can do two possible things.
261: If the input symbol is
262: .I ELSE ,
263: it is possible to shift into state
264: 45.
265: State 45 will have, as part of its description, the line
266: .DS
267: stat : IF ( cond ) stat ELSE\_stat
268: .DE
269: since the
270: .I ELSE
271: will have been shifted in this state.
272: Back in state 23, the alternative action, described by ``\fB.\fR'',
273: is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
274: in this case, if the input symbol is not
275: .I ELSE ,
276: the parser reduces by grammar rule 18:
277: .DS
278: stat : IF \'(\' cond \')\' stat
279: .DE
280: Once again, notice that the numbers following ``shift'' commands refer to other states,
281: while the numbers following ``reduce'' commands refer to grammar
282: rule numbers.
283: In the
284: .I y.output
285: file, the rule numbers are printed after those rules which can be reduced.
286: In most one states, there will be at most reduce action possible in the
287: state, and this will be the default command.
288: The user who encounters unexpected shift/reduce conflicts will probably want to
289: look at the verbose output to decide whether the default actions are appropriate.
290: In really tough cases, the user might need to know more about
291: the behavior and construction of the parser than can be covered here.
292: In this case, one of the theoretical references
293: .[
294: Aho Johnson Surveys Parsing
295: .]
296: .[
297: Aho Johnson Ullman Deterministic Ambiguous
298: .]
299: .[
300: Aho Ullman Principles Design
301: .]
302: might be consulted; the services of a local guru might also be appropriate.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.