|
|
1.1 root 1: .\" @(#)ss0 6.1 (Berkeley) 5/8/86
2: .\"
3: .SH
4: 0: Introduction
5: .PP
6: Yacc provides a general tool for imposing structure on the input to a computer program.
7: The Yacc user prepares a
8: specification of the input process; this includes rules
9: describing the input structure, code to be invoked when these
10: rules are recognized, and a low-level routine to do the
11: basic input.
12: Yacc then generates a function to control the input process.
13: This function, called a
14: .I parser ,
15: calls the user-supplied low-level input routine
16: (the
17: .I "lexical analyzer" )
18: to pick up the basic items
19: (called
20: .I tokens )
21: from the input stream.
22: These tokens are organized according to the input structure rules,
23: called
24: .I "grammar rules" \|;
25: when one of these rules has been recognized,
26: then user code supplied for this rule, an
27: .I action ,
28: is invoked; actions have the ability to return values and
29: make use of the values of other actions.
30: .PP
31: Yacc is written in a portable dialect of C
32: .[
33: Ritchie Kernighan Language Prentice
34: .]
35: and the actions, and output subroutine, are in C as well.
36: Moreover, many of the syntactic conventions of Yacc follow C.
37: .PP
38: The heart of the input specification is a collection of grammar rules.
39: Each rule describes an allowable structure and gives it a name.
40: For example, one grammar rule might be
41: .DS
42: date : month\_name day \',\' year ;
43: .DE
44: Here,
45: .I date ,
46: .I month\_name ,
47: .I day ,
48: and
49: .I year
50: represent structures of interest in the input process;
51: presumably,
52: .I month\_name ,
53: .I day ,
54: and
55: .I year
56: are defined elsewhere.
57: The comma ``,'' is enclosed in single quotes; this implies that the
58: comma is to appear literally in the input.
59: The colon and semicolon merely serve as punctuation in the rule, and have
60: no significance in controlling the input.
61: Thus, with proper definitions, the input
62: .DS
63: July 4, 1776
64: .DE
65: might be matched by the above rule.
66: .PP
67: An important part of the input process is carried out by the
68: lexical analyzer.
69: This user routine reads the input stream, recognizing the lower level structures,
70: and communicates these tokens
71: to the parser.
72: For historical reasons, a structure recognized by the lexical analyzer is called a
73: .I "terminal symbol" ,
74: while the structure recognized by the parser is called a
75: .I "nonterminal symbol" .
76: To avoid confusion, terminal symbols will usually be referred to as
77: .I tokens .
78: .PP
79: There is considerable leeway in deciding whether to recognize structures using the lexical
80: analyzer or grammar rules.
81: For example, the rules
82: .DS
83: month\_name : \'J\' \'a\' \'n\' ;
84: month\_name : \'F\' \'e\' \'b\' ;
85:
86: . . .
87:
88: month\_name : \'D\' \'e\' \'c\' ;
89: .DE
90: might be used in the above example.
91: The lexical analyzer would only need to recognize individual letters, and
92: .I month\_name
93: would be a nonterminal symbol.
94: Such low-level rules tend to waste time and space, and may
95: complicate the specification beyond Yacc's ability to deal with it.
96: Usually, the lexical analyzer would
97: recognize the month names,
98: and return an indication that a
99: .I month\_name
100: was seen; in this case,
101: .I month\_name
102: would be a token.
103: .PP
104: Literal characters such as ``,'' must also be passed through the lexical
105: analyzer, and are also considered tokens.
106: .PP
107: Specification files are very flexible.
108: It is realively easy to add to the above example the rule
109: .DS
110: date : month \'/\' day \'/\' year ;
111: .DE
112: allowing
113: .DS
114: 7 / 4 / 1776
115: .DE
116: as a synonym for
117: .DS
118: July 4, 1776
119: .DE
120: In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
121: and little danger of disrupting existing input.
122: .PP
123: The input being read may not conform to the
124: specifications.
125: These input errors are detected as early as is theoretically possible with a
126: left-to-right scan;
127: thus, not only is the chance of reading and computing with bad
128: input data substantially reduced, but the bad data can usually be quickly found.
129: Error handling,
130: provided as part of the input specifications,
131: permits the reentry of bad data,
132: or the continuation of the input process after skipping over the bad data.
133: .PP
134: In some cases, Yacc fails to produce a parser when given a set of
135: specifications.
136: For example, the specifications may be self contradictory, or they may
137: require a more powerful recognition mechanism than that available to Yacc.
138: The former cases represent design errors;
139: the latter cases
140: can often be corrected
141: by making
142: the lexical analyzer
143: more powerful, or by rewriting some of the grammar rules.
144: While Yacc cannot handle all possible specifications, its power
145: compares favorably with similar systems;
146: moreover, the
147: constructions which are difficult for Yacc to handle are
148: also frequently difficult for human beings to handle.
149: Some users have reported that the discipline of formulating valid
150: Yacc specifications for their input revealed errors of
151: conception or design early in the program development.
152: .PP
153: The theory underlying Yacc has been described elsewhere.
154: .[
155: Aho Johnson Surveys LR Parsing
156: .]
157: .[
158: Aho Johnson Ullman Ambiguous Grammars
159: .]
160: .[
161: Aho Ullman Principles Compiler Design
162: .]
163: Yacc has been extensively used in numerous practical applications,
164: including
165: .I lint ,
166: .[
167: Johnson Lint
168: .]
169: the Portable C Compiler,
170: .[
171: Johnson Portable Compiler Theory
172: .]
173: and a system for typesetting mathematics.
174: .[
175: Kernighan Cherry typesetting system CACM
176: .]
177: .PP
178: The next several sections describe the
179: basic process of preparing a Yacc specification;
180: Section 1 describes the preparation of grammar rules,
181: Section 2 the preparation of the user supplied actions associated with these rules,
182: and Section 3 the preparation of lexical analyzers.
183: Section 4 describes the operation of the parser.
184: Section 5 discusses various reasons why Yacc may be unable to produce a
185: parser from a specification, and what to do about it.
186: Section 6 describes a simple mechanism for
187: handling operator precedences in arithmetic expressions.
188: Section 7 discusses error detection and recovery.
189: Section 8 discusses the operating environment and special features
190: of the parsers Yacc produces.
191: Section 9 gives some suggestions which should improve the
192: style and efficiency of the specifications.
193: Section 10 discusses some advanced topics, and Section 11 gives
194: acknowledgements.
195: Appendix A has a brief example, and Appendix B gives a
196: summary of the Yacc input syntax.
197: Appendix C gives an example using some of the more advanced
198: features of Yacc, and, finally,
199: Appendix D describes mechanisms and syntax
200: no longer actively supported, but
201: provided for historical continuity with older versions of Yacc.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.