|
|
1.1 root 1: .\" @(#)e5 6.1 (Berkeley) 5/22/86
2: .\"
3: .NH
4: Language Theory
5: .PP
6: The basic structure of the language is
7: not a particularly original one.
8: Equations are pictured as a set of ``boxes,''
9: pieced together in various ways.
10: For example, something with a subscript is
11: just a box followed by another box moved downward
12: and shrunk
13: by an appropriate amount.
14: A fraction is just a box centered above another box,
15: at the right altitude,
16: with a line of correct length drawn between them.
17: .PP
18: The grammar for the language is shown below.
19: For purposes of exposition, we have collapsed
20: some productions. In the original grammar, there
21: are about 70 productions, but many of these
22: are simple ones used only to guarantee
23: that some keyword is recognized early enough in the parsing process.
24: Symbols in
25: capital letters
26: are terminal symbols;
27: lower case
28: symbols are non-terminals, i.e., syntactic categories.
29: The vertical bar \(bv indicates an alternative;
30: the brackets [ ] indicate optional material.
31: A
32: .UC TEXT
33: is a string of non-blank characters or
34: any string inside double quotes;
35: the other terminal symbols represent literal occurrences
36: of the corresponding keyword.
37: .P1
38: .ce 0
39: .ta .3i
40: .ps 9
41: .ne 17
42: .in 1
43: eqn : box | eqn box
44: .sp 5p
45: box : text
46: | { eqn }
47: | box OVER box
48: | SQRT box
49: | box SUB box | box SUP box
50: | [ L | C | R ]PILE { list }
51: | LEFT text eqn [ RIGHT text ]
52: | box [ FROM box ] [ TO box ]
53: | SIZE text box
54: | [ROMAN | BOLD | ITALIC] box
55: | box [HAT | BAR | DOT | DOTDOT | TILDE]
56: | DEFINE text text
57: .sp 5p
58: list : eqn | list ABOVE eqn
59: .sp 5p
60: text : TEXT
61: .ps 10
62: .in 0
63: .P2
64: .PP
65: The grammar makes it obvious why there are few exceptions.
66: For example, the observation that something can be replaced by a more complicated something
67: in braces is implicit in the productions:
68: .P1
69: .ce 0
70: eqn : box | eqn box
71: box : text | { eqn }
72: .P2
73: Anywhere a single character could be used,
74: .ul
75: any
76: legal construction can be used.
77: .PP
78: Clearly, our grammar is highly ambiguous.
79: What, for instance, do we do with the input
80: .P1
81: a over b over c ?
82: .P2
83: Is it
84: .P1
85: {a over b} over c
86: .P2
87: or is it
88: .P1
89: a over {b over c} ?
90: .P2
91: .PP
92: To answer questions like this, the grammar
93: is supplemented with a small set of rules that describe the precedence
94: and associativity
95: of operators.
96: In particular, we specify (more or less arbitrarily)
97: that
98: .ul
99: over
100: associates to the left,
101: so the first alternative above is the one chosen.
102: On the other hand,
103: .ul
104: sub
105: and
106: .ul
107: sup
108: bind to the right,
109: because this is closer to standard mathematical practice.
110: That is, we assume $x sup a sup b$ is $x sup {(a sup b )}$,
111: not $(x sup a ) sup b$.
112: .PP
113: The precedence rules resolve the ambiguity in a construction like
114: .P1
115: a sup 2 over b
116: .P2
117: We define
118: .ul
119: sup
120: to have a higher precedence than
121: .ul
122: over,
123: so this construction is parsed as
124: $a sup 2 over b$ instead of $a sup {2 over b}$.
125: .PP
126: Naturally, a user can always
127: force a particular parsing
128: by placing braces around expressions.
129: .PP
130: The ambiguous grammar approach seems to be quite useful.
131: The grammar we use is small enough to be easily understood,
132: for it contains none of the productions that would be
133: normally used for resolving ambiguity.
134: Instead the supplemental information about
135: precedence and associativity (also small enough to be understood)
136: provides the compiler-compiler
137: with the information it needs
138: to make a fast, deterministic parser for
139: the specific language we want.
140: When the language is supplemented by the disambiguating rules,
141: it is in fact
142: .UC LR(1)
143: and thus easy to parse[5].
144: .PP
145: The output code is generated as the input is scanned.
146: Any time a production
147: of the grammar is recognized,
148: (potentially) some
149: .UC TROFF
150: commands are output.
151: For example, when the lexical analyzer
152: reports that it has found a
153: .UC TEXT
154: (i.e., a string of contiguous characters),
155: we have recognized the production:
156: .P1
157: text : TEXT
158: .P2
159: The translation of this is simple.
160: We generate a local name for the string,
161: then hand the name and the string to
162: .UC TROFF,
163: and let
164: .UC TROFF
165: perform the storage management.
166: All we save is the name of the string,
167: its height, and its baseline.
168: .PP
169: As another example,
170: the translation associated with the production
171: .P1
172: box : box OVER box
173: .P2
174: is:
175: .P1
176: .ce 0
177: .in 1
178: .ne 14
179: Width of output box =
180: slightly more than largest input width
181: Height of output box =
182: slightly more than sum of input heights
183: Base of output box =
184: slightly more than height of bottom input box
185: String describing output box =
186: move down;
187: move right enough to center bottom box;
188: draw bottom box (i.e., copy string for bottom box);
189: move up; move left enough to center top box;
190: draw top box (i.e., copy string for top box);
191: move down and left; draw line full width;
192: return to proper base line.
193: .in 0
194: .P2
195: Most of the other productions have
196: equally simple semantic actions.
197: Picturing the output as a set of properly placed boxes
198: makes the right sequence of positioning commands
199: quite obvious.
200: The main difficulty is in finding the right numbers to use
201: for esthetically pleasing positioning.
202: .PP
203: With a grammar, it is usually clear how to extend the language.
204: For instance, one of our users
205: suggested a
206: .UC TENSOR
207: operator, to make constructions like
208: .EQ
209: ~ sub size 7 m sup size 7 l
210: {bold T from n to k} sub size 7 i sup size 7 j
211: .EN
212: Grammatically, this is easy:
213: it is sufficient to add a production like
214: .P1
215: box : TENSOR { list }
216: .P2
217: Semantically, we need only juggle the boxes to the right places.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.