|
|
1.1 root 1: .\" @(#)ss9 6.1 (Berkeley) 5/8/86
2: .\"
3: .SH
4: 9: Hints for Preparing Specifications
5: .PP
6: This section contains miscellaneous hints on preparing efficient, easy to change,
7: and clear specifications.
8: The individual subsections are more or less
9: independent.
10: .SH
11: Input Style
12: .PP
13: It is difficult to
14: provide rules with substantial actions
15: and still have a readable specification file.
16: The following style hints owe much to Brian Kernighan.
17: .IP a.
18: Use all capital letters for token names, all lower case letters for
19: nonterminal names.
20: This rule comes under the heading of ``knowing who to blame when
21: things go wrong.''
22: .IP b.
23: Put grammar rules and actions on separate lines.
24: This allows either to be changed without
25: an automatic need to change the other.
26: .IP c.
27: Put all rules with the same left hand side together.
28: Put the left hand side in only once, and let all
29: following rules begin with a vertical bar.
30: .IP d.
31: Put a semicolon only after the last rule with a given left hand side,
32: and put the semicolon on a separate line.
33: This allows new rules to be easily added.
34: .IP e.
35: Indent rule bodies by two tab stops, and action bodies by three
36: tab stops.
37: .PP
38: The example in Appendix A is written following this style, as are
39: the examples in the text of this paper (where space permits).
40: The user must make up his own mind about these stylistic questions;
41: the central problem, however, is to make the rules visible through
42: the morass of action code.
43: .SH
44: Left Recursion
45: .PP
46: The algorithm used by the Yacc parser encourages so called ``left recursive''
47: grammar rules: rules of the form
48: .DS
49: name : name rest_of_rule ;
50: .DE
51: These rules frequently arise when
52: writing specifications of sequences and lists:
53: .DS
54: list : item
55: | list \',\' item
56: ;
57: .DE
58: and
59: .DS
60: seq : item
61: | seq item
62: ;
63: .DE
64: In each of these cases, the first rule
65: will be reduced for the first item only, and the second rule
66: will be reduced for the second and all succeeding items.
67: .PP
68: With right recursive rules, such as
69: .DS
70: seq : item
71: | item seq
72: ;
73: .DE
74: the parser would be a bit bigger, and the items would be seen, and reduced,
75: from right to left.
76: More seriously, an internal stack in the parser
77: would be in danger of overflowing if a very long sequence were read.
78: Thus, the user should use left recursion wherever reasonable.
79: .PP
80: It is worth considering whether a sequence with zero
81: elements has any meaning, and if so, consider writing
82: the sequence specification with an empty rule:
83: .DS
84: seq : /* empty */
85: | seq item
86: ;
87: .DE
88: Once again, the first rule would always be reduced exactly once, before the
89: first item was read,
90: and then the second rule would be reduced once for each item read.
91: Permitting empty sequences
92: often leads to increased generality.
93: However, conflicts might arise if Yacc is asked to decide
94: which empty sequence it has seen, when it hasn't seen enough to
95: know!
96: .SH
97: Lexical Tie-ins
98: .PP
99: Some lexical decisions depend on context.
100: For example, the lexical analyzer might want to
101: delete blanks normally, but not within quoted strings.
102: Or names might be entered into a symbol table in declarations,
103: but not in expressions.
104: .PP
105: One way of handling this situation is
106: to create a global flag that is
107: examined by the lexical analyzer, and set by actions.
108: For example, suppose a program
109: consists of 0 or more declarations, followed by 0 or more statements.
110: Consider:
111: .DS
112: %{
113: int dflag;
114: %}
115: ... other declarations ...
116:
117: %%
118:
119: prog : decls stats
120: ;
121:
122: decls : /* empty */
123: { dflag = 1; }
124: | decls declaration
125: ;
126:
127: stats : /* empty */
128: { dflag = 0; }
129: | stats statement
130: ;
131:
132: ... other rules ...
133: .DE
134: The flag
135: .I dflag
136: is now 0 when reading statements, and 1 when reading declarations,
137: .ul
138: except for the first token in the first statement.
139: This token must be seen by the parser before it can tell that
140: the declaration section has ended and the statements have
141: begun.
142: In many cases, this single token exception does not
143: affect the lexical scan.
144: .PP
145: This kind of ``backdoor'' approach can be elaborated
146: to a noxious degree.
147: Nevertheless, it represents a way of doing some things
148: that are difficult, if not impossible, to
149: do otherwise.
150: .SH
151: Reserved Words
152: .PP
153: Some programming languages
154: permit the user to
155: use words like ``if'', which are normally reserved,
156: as label or variable names, provided that such use does not
157: conflict with the legal use of these names in the programming language.
158: This is extremely hard to do in the framework of Yacc;
159: it is difficult to pass information to the lexical analyzer
160: telling it ``this instance of `if' is a keyword, and that instance is a variable''.
161: The user can make a stab at it, using the
162: mechanism described in the last subsection,
163: but it is difficult.
164: .PP
165: A number of ways of making this easier are under advisement.
166: Until then, it is better that the keywords be
167: .I reserved \|;
168: that is, be forbidden for use as variable names.
169: There are powerful stylistic reasons for preferring this, anyway.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.