.\" @(#)ss1 6.1 (Berkeley) 5/8/86
.\"
.tr *\(**
.tr |\(or
.SH
1: Basic Specifications
.PP
Names refer to either tokens or nonterminal symbols.
Yacc requires
token names to be declared as such.
In addition, for reasons discussed in Section 3, it is often desirable
to include the lexical analyzer as part of the specification file;
it may be useful to include other programs as well.
Thus, every specification file consists of three sections:
the
.I declarations ,
.I "(grammar) rules" ,
and
.I programs .
The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
.PP
In other words, a full specification file looks like
.DS
declarations
%%
rules
%%
programs
.DE
.PP
The declaration section may be empty.
Moreover, if the programs section is omitted, the second %% mark may be omitted also;
thus, the smallest legal Yacc specification is
.DS
%%
rules
.DE
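.PP
As an illustration, a small specification with all three sections might look like the following sketch.
The token name DIGIT and the nonterminal name number are hypothetical, chosen only for this example;
the %token declaration is described later in this section, and the programs section is left as a
comment where a lexical analyzer would normally go.
.DS
%token DIGIT
%%
number : DIGIT ;
number : number DIGIT ;
%%
/* programs section: the lexical analyzer and other C code go here */
.DE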
.PP
Blanks, tabs, and newlines are ignored except
that they may not appear in names or multi-character reserved symbols.
Comments may appear wherever a name is legal; they are enclosed
in /* . . . */, as in C and PL/I.
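.PP
For instance, the rule sketched below is spread over several lines and interleaved with comments,
yet it should be read by Yacc exactly as if it had been written on one line as ``pair : KEY VALUE ;''.
The names pair, KEY, and VALUE are hypothetical.
.DS
/* a comment may precede a rule */
pair : KEY      /* or follow a name */
       VALUE
     ;
.DE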
.PP
The rules section is made up of one or more grammar rules.
A grammar rule has the form:
.DS
A : BODY ;
.DE
A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals.
The colon and the semicolon are Yacc punctuation.
.PP
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and
non-initial digits.
Upper and lower case letters are distinct.
The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
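.PP
A few names that fit this description are listed below; all of them are invented for illustration.
Because case is significant, stat and STAT are two different names.
.DS
stat
STAT
expr.list
_tmp
arg2
.DE
By contrast, a name such as 2nd would not be accepted, since a digit may not come first.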
.PP
A literal consists of a character enclosed in single quotes ``\'''.
As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes
are recognized.
Thus
.DS
\'\en\' newline
\'\er\' return
\'\e\'\' single quote ``\'''
\'\e\e\' backslash ``\e''
\'\et\' tab
\'\eb\' backspace
\'\ef\' form feed
\'\exxx\' ``xxx'' in octal
.DE
For a number of technical reasons, the
\s-2NUL\s0
character (\'\e0\' or 0) should never
be used in grammar rules.
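.PP
In a rule body, literals are typically used for punctuation and operators.
The following sketch, in which the names expr, term, and NUMBER are hypothetical, uses three literals
alongside ordinary names:
.DS
expr : expr \'+\' term ;
expr : term ;
term : \'(\' expr \')\' ;
term : NUMBER ;
.DE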
.PP
If there are several grammar rules with the same left hand side, the vertical bar ``|''
can be used to avoid rewriting the left hand side.
In addition,
the semicolon at the end of a rule can be dropped before a vertical bar.
Thus the grammar rules
.DS
A : B C D ;
A : E F ;
A : G ;
.DE
can be given to Yacc as
.DS
A : B C D
| E F
| G
;
.DE
It is not necessary that all grammar rules with the same left hand side appear together in the grammar rules section,
although grouping them makes the input much more readable and easier to change.
.PP
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
.DS
empty : ;
.DE
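.PP
A common use of an empty rule is a construct that may be repeated zero or more times.
In the sketch below, item_list and ITEM are hypothetical names, and the comment merely marks
the empty alternative for the human reader; Yacc ignores it.
.DS
item_list : /* empty */
          | item_list ITEM
          ;
.DE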
.PP
Names representing tokens must be declared; this is most simply done by writing
.DS
%token name1 name2 . . .
.DE
in the declarations section.
(See Sections 3, 5, and 6 for much more discussion.)
Every name not defined in the declarations section is assumed to represent a nonterminal symbol.
Every nonterminal symbol must appear on the left side of at least one rule.
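.PP
As a sketch, the declaration below names three hypothetical tokens.
In the rules that follow it, stat and cond are taken to be nonterminal symbols simply because they are
not declared, and each of them appears on the left side of a rule.
.DS
%token IF THEN NUMBER
%%
stat : IF cond THEN stat
     | NUMBER
     ;
cond : NUMBER
     ;
.DE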
.PP
Of all the nonterminal symbols, one, called the
.I "start symbol" ,
has particular importance.
The parser is designed to recognize the start symbol; thus,
this symbol represents the largest,
most general structure described by the grammar rules.
By default,
the start symbol is taken to be the left hand side of the first
grammar rule in the rules section.
It is possible, and in fact desirable, to declare the start
symbol explicitly in the declarations section using the %start keyword:
.DS
%start symbol
.DE
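.PP
An explicit declaration lets the rules be ordered for readability rather than by position.
In the sketch below (all names are hypothetical), program is the start symbol even though
the rule for stat comes first:
.DS
%token NAME
%start program
%%
stat : NAME \'=\' NAME \';\'
     ;
program : stat
        | program stat
        ;
.DE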
.PP
The end of the input to the parser is signaled by a special token, called the
.I endmarker .
If the tokens up to, but not including, the endmarker form a structure
which matches the start symbol, the parser function returns to its caller
after the endmarker is seen; it
.I accepts
the input.
If the endmarker is seen in any other context, it is an error.
.PP
It is the job of the user-supplied lexical analyzer
to return the endmarker when appropriate; see Section 3, below.
Usually the endmarker represents some reasonably obvious
I/O status, such as ``end-of-file'' or ``end-of-record''.
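.PP
The following is a minimal sketch, not a definitive implementation, of a specification whose
programs section holds a lexical analyzer that returns the endmarker at end-of-file.
It assumes the conventional interface described in Section 3: the analyzer is a C function
named yylex, and a return value of zero serves as the endmarker.
The names DIGIT and number are hypothetical, and a main routine and an error routine are assumed
to be supplied elsewhere, for example by a library.
.DS
%token DIGIT
%start number
%%
number : DIGIT
       | number DIGIT
       ;
%%
#include <stdio.h>
#include <ctype.h>

/* hypothetical lexical analyzer; Section 3 describes the real interface */
int yylex(void)
{
    int c;

    /* skip blanks, tabs, and newlines */
    while ((c = getchar()) == \' \' || c == \'\et\' || c == \'\en\')
        ;
    if (c == EOF)
        return 0;        /* endmarker: end-of-file */
    if (isdigit(c))
        return DIGIT;
    return c;            /* any other character is returned as itself */
}
.DE
With this grammar, any character other than a digit or white space reaches the parser as a token
that no rule mentions, and so is treated as an error.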