.\"	@(#)ss7	6.1 (Berkeley) 5/8/86
.\"
.SH
7: Error Handling
.PP
Error handling is an extremely difficult area, and many of the problems are semantic ones.
When an error is found, for example, it may be necessary to reclaim parse tree storage,
delete or alter symbol table entries, and, typically, set switches to avoid generating any further output.
.PP
It is seldom acceptable to stop all processing when an error is found; it is more useful to continue
scanning the input to find further syntax errors.
This leads to the problem of getting the parser ``restarted'' after an error.
A general class of algorithms to do this involves discarding a number of tokens
from the input string, and attempting to adjust the parser so that input can continue.
.PP
To allow the user some control over this process,
Yacc provides a simple, but reasonably general, feature.
The token name ``error'' is reserved for error handling.
This name can be used in grammar rules;
in effect, it suggests places where errors are expected, and recovery might take place.
When an error is detected, the parser pops its stack until it enters a state where the token ``error'' is legal.
It then behaves as if the token ``error'' were the current lookahead token,
and performs the action encountered.
The lookahead token is then reset to the token that caused the error.
If no special error rules have been specified, the processing halts when an error is detected.
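.PP
The popping step described above can be sketched as a small C function.
This is only an illustrative model under stated assumptions, not Yacc's actual implementation:
the name pop_until_error_legal, the table error_legal, and the
representation of the stack as an array of state numbers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the stack-popping step.
 * stack[0..depth-1] holds state numbers, with the top of the stack last.
 * error_legal[s] is nonzero if the token "error" is legal in state s.
 * Returns the number of states left on the stack once a state that
 * accepts "error" is on top, or 0 if the stack empties first (the case
 * in which processing halts).
 */
size_t pop_until_error_legal(const int *stack, size_t depth,
                             const int *error_legal, size_t nstates)
{
    while (depth > 0) {
        int state = stack[depth - 1];
        assert(state >= 0 && (size_t)state < nstates);
        if (error_legal[state])
            return depth;   /* recovery can begin in this state */
        depth--;            /* pop one state and try the one below */
    }
    return 0;               /* no state accepts "error" */
}
```

With a stack holding states 0, 3, 5, 7 and ``error'' legal only in state 3,
the function pops states 7 and 5 and returns 2.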
.PP
In order to prevent a cascade of error messages, the parser, after
detecting an error, remains in error state until three tokens have been successfully
read and shifted.
If an error is detected when the parser is already in error state,
no message is given, and the input token is quietly deleted.
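.PP
The effect of the three-token rule on the number of messages can be illustrated
with a simplified C model.
It assumes each input token is already classified as shiftable or erroneous,
which a real parser decides from its tables;
the function name count_reported_errors and the counter errflag are
hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of message suppression in error state.
 * ok[i] is nonzero if token i can be shifted, zero if it is in error.
 * Returns the number of error messages that would be issued.
 */
int count_reported_errors(const int *ok, size_t n)
{
    int errflag = 0;    /* tokens still to be shifted before leaving error state */
    int reports = 0;
    size_t i;

    assert(ok != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (ok[i]) {
            if (errflag > 0)
                errflag--;      /* one more token successfully shifted */
        } else {
            if (errflag == 0)
                reports++;      /* a fresh error: give a message */
            /* otherwise already in error state: token dropped, no message */
            errflag = 3;        /* three shifts are again required */
        }
    }
    return reports;
}
```

For the sequence good, bad, good, good, bad, good, good, good, bad,
the model reports two errors: the second bad token arrives after only two
successful shifts, so it is deleted without a message.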
.PP
As an example, a rule of the form
.DS
stat	:	error
.DE
would, in effect, mean that on a syntax error the parser would attempt to skip over the statement
in which the error was seen.
More precisely, the parser will
scan ahead, looking for three tokens that might legally follow
a statement, and start processing at the first of these; if
the beginnings of statements are not sufficiently distinctive, it may make a
false start in the middle of a statement, and end up reporting a
second error where there is in fact no error.
.PP
Actions may be used with these special error rules.
These actions might attempt to reinitialize tables, reclaim symbol table space, etc.
.PP
Error rules such as the above are very general, but difficult to control.
Somewhat easier are rules such as
.DS
stat	:	error  \';\'
.DE
Here, when there is an error, the parser attempts to skip over the statement, but
will do so by skipping to the next \';\'.
All tokens after the error and before the next \';\' cannot be shifted, and are discarded.
When the \';\' is seen, this rule will be reduced, and any ``cleanup''
action associated with it performed.
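.PP
The token-discarding behavior of such a rule can be mimicked by a short C function.
This is a sketch only: it represents tokens as single characters, and the name
skip_to_semicolon and its parameters are hypothetical, not part of Yacc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of recovery with a rule of the form  stat : error ';'
 * tokens[0..n-1] is the input, one character per token, and err_pos is
 * the index of the token at which the error was detected.
 * Scanning starts at the offending token itself, since it becomes the
 * lookahead again after "error" is shifted.
 * Returns the index of the ';' that ends the recovery, or n if the
 * input runs out first; *discarded receives the number of tokens dropped.
 */
size_t skip_to_semicolon(const char *tokens, size_t n, size_t err_pos,
                         size_t *discarded)
{
    size_t i = err_pos;

    assert(tokens != NULL && discarded != NULL && err_pos <= n);
    while (i < n && tokens[i] != ';')
        i++;                    /* this token cannot be shifted: drop it */
    *discarded = i - err_pos;
    return i;
}
```

For the token sequence a = + b ; c with the error detected at the +,
two tokens (+ and b) are dropped and parsing resumes at the \';\'.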
.PP
Another form of error rule arises in interactive applications, where
it may be desirable to permit a line to be reentered after an error.
A possible error rule might be
.DS
input	:	error  \'\en\'  {  printf( "Reenter last line: " );  }  input
		{	$$  =  $4;  }
.DE
There is one potential difficulty with this approach;
the parser must correctly process three input tokens before it
admits that it has correctly resynchronized after the error.
If the reentered line contains an error
in the first two tokens, the parser deletes the offending tokens,
and gives no message; this is clearly unacceptable.
For this reason, there is a mechanism that
can be used to force the parser
to believe that an error has been fully recovered from.
The statement
.DS
yyerrok ;
.DE
in an action
resets the parser to its normal mode.
The last example is better written
.DS
input	:	error  \'\en\'
			{	yyerrok;
				printf( "Reenter last line: " );   }
		input
			{	$$  =  $4;  }
	;
.DE
.PP
As mentioned above, the token seen immediately
after the ``error'' symbol is the input token at which the
error was discovered.
Sometimes, this is inappropriate; for example, an
error recovery action might
take upon itself the job of finding the correct place to resume input.
In this case,
the previous lookahead token must be cleared.
The statement
.DS
yyclearin ;
.DE
in an action will have this effect.
For example, suppose the action after error
were to call some sophisticated resynchronization routine,
supplied by the user, that attempted to advance the input to the
beginning of the next valid statement.
After this routine was called, the next token returned by yylex would presumably
be the first token in a legal statement;
the old, illegal token must be discarded, and the error state reset.
This could be done by a rule like
.DS
stat	:	error
			{	resynch();
				yyerrok ;
				yyclearin ;   }
	;
.DE
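.PP
The routine resynch is left to the user.
One deliberately naive possibility is sketched below.
It assumes that the lexer reads from an in-memory string through a shared cursor
and that every statement ends with a semicolon;
the cursor input_pos and the helpers set_input and rest_of_input are
hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input cursor, assumed to be shared with the lexer. */
static const char *input_pos = "";

/* Point the cursor at a new input string. */
void set_input(const char *s)
{
    assert(s != NULL);
    input_pos = s;
}

/* Return the input that has not yet been consumed. */
const char *rest_of_input(void)
{
    return input_pos;
}

/*
 * Naive resynchronization: skip past the next ';' and any white space
 * after it, so that the next token the lexer returns is the first token
 * of the following statement (or end of input).
 */
void resynch(void)
{
    while (*input_pos != '\0' && *input_pos != ';')
        input_pos++;
    if (*input_pos == ';')
        input_pos++;
    while (*input_pos == ' ' || *input_pos == '\t' || *input_pos == '\n')
        input_pos++;
}
```

A production routine would more likely be driven by tokens than by characters,
so that a semicolon inside a string or comment is not mistaken for a statement end.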
.PP
These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser
from many errors;
moreover, the user can get control to deal with
the error actions required by other portions of the program.