Annotation of 43BSDReno/share/doc/ps1/15.yacc/ss5, revision 1.1.1.1

1.1       root        1: .\"    @(#)ss5 6.1 (Berkeley) 5/8/86
                      2: .\"
                      3: .SH
                      4: 5: Ambiguity and Conflicts
                      5: .PP
                      6: A set of grammar rules is
                      7: .I ambiguous
                      8: if there is some input string that can be structured in two or more different ways.
                      9: For example, the grammar rule
                     10: .DS
                     11: expr   :       expr  \'\-\'  expr
                     12: .DE
                     13: is a natural way of expressing the fact that one way of forming an arithmetic expression
                     14: is to put two other expressions together with a minus sign between them.
                     15: Unfortunately, this grammar rule does not
                     16: completely specify the way that all complex inputs
                     17: should be structured.
                     18: For example, if the input is
                     19: .DS
                     20: expr  \-  expr  \-  expr
                     21: .DE
                     22: the rule allows this input to be structured as either
                     23: .DS
                     24: (  expr  \-  expr  )  \-  expr
                     25: .DE
                     26: or as
                     27: .DS
                     28: expr  \-  (  expr  \-  expr  )
                     29: .DE
                     30: (The first is called
                     31: .I "left association" ,
                     32: the second
                     33: .I "right association" ).
                     34: .PP
                     35: Yacc detects such ambiguities when it is attempting to build the parser.
                     36: It is instructive to consider the problem that confronts the parser when it is
                     37: given an input such as
                     38: .DS
                     39: expr  \-  expr  \-  expr
                     40: .DE
                     41: When the parser has read the second expr, the input that it has seen:
                     42: .DS
                     43: expr  \-  expr
                     44: .DE
                     45: matches the right side of the grammar rule above.
                     46: The parser could
                     47: .I reduce
                     48: the input by applying this rule;
                     49: after applying the rule;
                     50: the input is reduced to
                     51: .I expr (the
                     52: left side of the rule).
                     53: The parser would then read the final part of the input:
                     54: .DS
                     55: \-  expr
                     56: .DE
                     57: and again reduce.
                     58: The effect of this is to take the left associative interpretation.
                     59: .PP
                     60: Alternatively, when the parser has seen
                     61: .DS
                     62: expr  \-  expr
                     63: .DE
                     64: it could defer the immediate application of the rule, and continue reading
                     65: the input until it had seen
                     66: .DS
                     67: expr  \-  expr  \-  expr
                     68: .DE
                     69: It could then apply the rule to the rightmost three symbols, reducing them to
                     70: .I expr
                     71: and leaving
                     72: .DS
                     73: expr  \-  expr
                     74: .DE
                     75: Now the rule can be reduced once more; the effect is to
                     76: take the right associative interpretation.
                     77: Thus, having read
                     78: .DS
                     79: expr  \-  expr
                     80: .DE
                     81: the parser can do two legal things, a shift or a reduction, and has no way of
                     82: deciding between them.
                     83: This is called a
                     84: .I "shift / reduce conflict" .
                     85: It may also happen that the parser has a choice of two legal reductions;
                     86: this is called a
                     87: .I "reduce / reduce conflict" .
                     88: Note that there are never any ``Shift/shift'' conflicts.
                     89: .PP
                     90: When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
                     91: It does this by selecting one of the valid steps wherever it has a choice.
                     92: A rule describing which choice to make in a given situation is called
                     93: a
                     94: .I "disambiguating rule" .
                     95: .PP
                     96: Yacc invokes two disambiguating rules by default:
                     97: .IP 1.
                     98: In a shift/reduce conflict, the default is to do the shift.
                     99: .IP 2.
                    100: In a reduce/reduce conflict, the default is to reduce by the
                    101: .I earlier
                    102: grammar rule (in the input sequence).
                    103: .PP
                    104: Rule 1 implies that reductions are deferred whenever there is a choice,
                    105: in favor of shifts.
                    106: Rule 2 gives the user rather crude control over the behavior of the parser
                    107: in this situation, but reduce/reduce conflicts should be avoided whenever possible.
                    108: .PP
                    109: Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
                    110: require a more complex parser than Yacc can construct.
                    111: The use of actions within rules can also cause conflicts, if the action must
                    112: be done before the parser can be sure which rule is being recognized.
                    113: In these cases, the application of disambiguating rules is inappropriate,
                    114: and leads to an incorrect parser.
                    115: For this reason, Yacc
                    116: always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2.
                    117: .PP
                    118: In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
                    119: possible to rewrite the grammar rules so that the same inputs are read but there are no
                    120: conflicts.
                    121: For this reason, most previous parser generators
                    122: have considered conflicts to be fatal errors.
                    123: Our experience has suggested that this rewriting is somewhat unnatural,
                    124: and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
                    125: .PP
                    126: As an example of the power of disambiguating rules, consider a fragment from a programming
                    127: language involving an ``if-then-else'' construction:
                    128: .DS
                    129: stat   :       IF  \'(\'  cond  \')\'  stat
                    130:        |       IF  \'(\'  cond  \')\'  stat  ELSE  stat
                    131:        ;
                    132: .DE
                    133: In these rules,
                    134: .I IF
                    135: and
                    136: .I ELSE
                    137: are tokens,
                    138: .I cond
                    139: is a nonterminal symbol describing
                    140: conditional (logical) expressions, and
                    141: .I stat
                    142: is a nonterminal symbol describing statements.
                    143: The first rule will be called the
                    144: .ul
                    145: simple-if
                    146: rule, and the
                    147: second the
                    148: .ul
                    149: if-else
                    150: rule.
                    151: .PP
                    152: These two rules form an ambiguous construction, since input of the form
                    153: .DS
                    154: IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
                    155: .DE
                    156: can be structured according to these rules in two ways:
                    157: .DS
                    158: IF  (  C1  )  {
                    159:        IF  (  C2  )  S1
                    160:        }
                    161: ELSE  S2
                    162: .DE
                    163: or
                    164: .DS
                    165: IF  (  C1  )  {
                    166:        IF  (  C2  )  S1
                    167:        ELSE  S2
                    168:        }
                    169: .DE
                    170: The second interpretation is the one given in most programming languages
                    171: having this construct.
                    172: Each
                    173: .I ELSE
                    174: is associated with the last preceding
                    175: ``un-\fIELSE'\fRd''
                    176: .I IF .
                    177: In this example, consider the situation where the parser has seen
                    178: .DS
                    179: IF  (  C1  )  IF  (  C2  )  S1
                    180: .DE
                    181: and is looking at the
                    182: .I ELSE .
                    183: It can immediately
                    184: reduce
                    185: by the simple-if rule to get
                    186: .DS
                    187: IF  (  C1  )  stat
                    188: .DE
                    189: and then read the remaining input,
                    190: .DS
                    191: ELSE  S2
                    192: .DE
                    193: and reduce
                    194: .DS
                    195: IF  (  C1  )  stat  ELSE  S2
                    196: .DE
                    197: by the if-else rule.
                    198: This leads to the first of the above groupings of the input.
                    199: .PP
                    200: On the other hand, the
                    201: .I ELSE
                    202: may be shifted,
                    203: .I S2
                    204: read, and then the right hand portion of
                    205: .DS
                    206: IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
                    207: .DE
                    208: can be reduced by the if-else rule to get
                    209: .DS
                    210: IF  (  C1  )  stat
                    211: .DE
                    212: which can be reduced by the simple-if rule.
                    213: This leads to the second of the above groupings of the input, which
                    214: is usually desired.
                    215: .PP
                    216: Once again the parser can do two valid things \- there is a shift/reduce conflict.
                    217: The application of disambiguating rule 1 tells the parser to shift in this case,
                    218: which leads to the desired grouping.
                    219: .PP
                    220: This shift/reduce conflict arises only when there is a particular current input symbol,
                    221: .I ELSE ,
                    222: and particular inputs already seen, such as
                    223: .DS
                    224: IF  (  C1  )  IF  (  C2  )  S1
                    225: .DE
                    226: In general, there may be many conflicts, and each one
                    227: will be associated with an input symbol and
                    228: a set of previously read inputs.
                    229: The previously read inputs are characterized by the
                    230: state
                    231: of the parser.
                    232: .PP
                    233: The conflict messages of Yacc are best understood
                    234: by examining the verbose (\fB\-v\fR) option output file.
                    235: For example, the output corresponding to the above
                    236: conflict state might be:
                    237: .DS L
                    238: 23: shift/reduce conflict (shift 45, reduce 18) on ELSE
                    239: 
                    240: state 23
                    241: 
                    242:          stat  :  IF  (  cond  )  stat\_         (18)
                    243:          stat  :  IF  (  cond  )  stat\_ELSE  stat
                    244: 
                    245:         ELSE     shift 45
                    246:         .        reduce 18
                    247: 
                    248: .DE
                    249: The first line describes the conflict, giving the state and the input symbol.
                    250: The ordinary state description follows, giving
                    251: the grammar rules active in the state, and the parser actions.
                    252: Recall that the underline marks the
                    253: portion of the grammar rules which has been seen.
                    254: Thus in the example, in state 23 the parser has seen input corresponding
                    255: to
                    256: .DS
                    257: IF  (  cond  )  stat
                    258: .DE
                    259: and the two grammar rules shown are active at this time.
                    260: The parser can do two possible things.
                    261: If the input symbol is
                    262: .I ELSE ,
                    263: it is possible to shift into state
                    264: 45.
                    265: State 45 will have, as part of its description, the line
                    266: .DS
                    267: stat  :  IF  (  cond  )  stat  ELSE\_stat
                    268: .DE
                    269: since the
                    270: .I ELSE
                    271: will have been shifted in this state.
                    272: Back in state 23, the alternative action, described by ``\fB.\fR'',
                    273: is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
                    274: in this case, if the input symbol is not
                    275: .I ELSE ,
                    276: the parser reduces by grammar rule 18:
                    277: .DS
                    278: stat  :  IF  \'(\'  cond  \')\'  stat
                    279: .DE
                    280: Once again, notice that the numbers following ``shift'' commands refer to other states,
                    281: while the numbers following ``reduce'' commands refer to grammar
                    282: rule numbers.
                    283: In the
                    284: .I y.output
                    285: file, the rule numbers are printed after those rules which can be reduced.
                    286: In most one states, there will be at most reduce action possible in the
                    287: state, and this will be the default command.
                    288: The user who encounters unexpected shift/reduce conflicts will probably want to
                    289: look at the verbose output to decide whether the default actions are appropriate.
                    290: In really tough cases, the user might need to know more about
                    291: the behavior and construction of the parser than can be covered here.
                    292: In this case, one of the theoretical references
                    293: .[
                    294: Aho Johnson Surveys Parsing
                    295: .]
                    296: .[
                    297: Aho Johnson Ullman Deterministic Ambiguous
                    298: .]
                    299: .[
                    300: Aho Ullman Principles Design
                    301: .]
                    302: might be consulted; the services of a local guru might also be appropriate.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.