|
|
1.1 ! root 1: .\" @(#)e5 6.1 (Berkeley) 5/22/86 ! 2: .\" ! 3: .NH ! 4: Language Theory ! 5: .PP ! 6: The basic structure of the language is ! 7: not a particularly original one. ! 8: Equations are pictured as a set of ``boxes,'' ! 9: pieced together in various ways. ! 10: For example, something with a subscript is ! 11: just a box followed by another box moved downward ! 12: and shrunk ! 13: by an appropriate amount. ! 14: A fraction is just a box centered above another box, ! 15: at the right altitude, ! 16: with a line of correct length drawn between them. ! 17: .PP ! 18: The grammar for the language is shown below. ! 19: For purposes of exposition, we have collapsed ! 20: some productions. In the original grammar, there ! 21: are about 70 productions, but many of these ! 22: are simple ones used only to guarantee ! 23: that some keyword is recognized early enough in the parsing process. ! 24: Symbols in ! 25: capital letters ! 26: are terminal symbols; ! 27: lower case ! 28: symbols are non-terminals, i.e., syntactic categories. ! 29: The vertical bar \(bv indicates an alternative; ! 30: the brackets [ ] indicate optional material. ! 31: A ! 32: .UC TEXT ! 33: is a string of non-blank characters or ! 34: any string inside double quotes; ! 35: the other terminal symbols represent literal occurrences ! 36: of the corresponding keyword. ! 37: .P1 ! 38: .ce 0 ! 39: .ta .3i ! 40: .ps 9 ! 41: .ne 17 ! 42: .in 1 ! 43: eqn : box | eqn box ! 44: .sp 5p ! 45: box : text ! 46: | { eqn } ! 47: | box OVER box ! 48: | SQRT box ! 49: | box SUB box | box SUP box ! 50: | [ L | C | R ]PILE { list } ! 51: | LEFT text eqn [ RIGHT text ] ! 52: | box [ FROM box ] [ TO box ] ! 53: | SIZE text box ! 54: | [ROMAN | BOLD | ITALIC] box ! 55: | box [HAT | BAR | DOT | DOTDOT | TILDE] ! 56: | DEFINE text text ! 57: .sp 5p ! 58: list : eqn | list ABOVE eqn ! 59: .sp 5p ! 60: text : TEXT ! 61: .ps 10 ! 62: .in 0 ! 63: .P2 ! 64: .PP ! 65: The grammar makes it obvious why there are few exceptions. ! 66: For example, the observation that something can be replaced by a more complicated something ! 67: in braces is implicit in the productions: ! 68: .P1 ! 69: .ce 0 ! 70: eqn : box | eqn box ! 71: box : text | { eqn } ! 72: .P2 ! 73: Anywhere a single character could be used, ! 74: .ul ! 75: any ! 76: legal construction can be used. ! 77: .PP ! 78: Clearly, our grammar is highly ambiguous. ! 79: What, for instance, do we do with the input ! 80: .P1 ! 81: a over b over c ? ! 82: .P2 ! 83: Is it ! 84: .P1 ! 85: {a over b} over c ! 86: .P2 ! 87: or is it ! 88: .P1 ! 89: a over {b over c} ? ! 90: .P2 ! 91: .PP ! 92: To answer questions like this, the grammar ! 93: is supplemented with a small set of rules that describe the precedence ! 94: and associativity ! 95: of operators. ! 96: In particular, we specify (more or less arbitrarily) ! 97: that ! 98: .ul ! 99: over ! 100: associates to the left, ! 101: so the first alternative above is the one chosen. ! 102: On the other hand, ! 103: .ul ! 104: sub ! 105: and ! 106: .ul ! 107: sup ! 108: bind to the right, ! 109: because this is closer to standard mathematical practice. ! 110: That is, we assume $x sup a sup b$ is $x sup {(a sup b )}$, ! 111: not $(x sup a ) sup b$. ! 112: .PP ! 113: The precedence rules resolve the ambiguity in a construction like ! 114: .P1 ! 115: a sup 2 over b ! 116: .P2 ! 117: We define ! 118: .ul ! 119: sup ! 120: to have a higher precedence than ! 121: .ul ! 122: over, ! 123: so this construction is parsed as ! 124: $a sup 2 over b$ instead of $a sup {2 over b}$. ! 125: .PP ! 126: Naturally, a user can always ! 127: force a particular parsing ! 128: by placing braces around expressions. ! 129: .PP ! 130: The ambiguous grammar approach seems to be quite useful. ! 131: The grammar we use is small enough to be easily understood, ! 132: for it contains none of the productions that would be ! 133: normally used for resolving ambiguity. ! 134: Instead the supplemental information about ! 135: precedence and associativity (also small enough to be understood) ! 136: provides the compiler-compiler ! 137: with the information it needs ! 138: to make a fast, deterministic parser for ! 139: the specific language we want. ! 140: When the language is supplemented by the disambiguating rules, ! 141: it is in fact ! 142: .UC LR(1) ! 143: and thus easy to parse[5]. ! 144: .PP ! 145: The output code is generated as the input is scanned. ! 146: Any time a production ! 147: of the grammar is recognized, ! 148: (potentially) some ! 149: .UC TROFF ! 150: commands are output. ! 151: For example, when the lexical analyzer ! 152: reports that it has found a ! 153: .UC TEXT ! 154: (i.e., a string of contiguous characters), ! 155: we have recognized the production: ! 156: .P1 ! 157: text : TEXT ! 158: .P2 ! 159: The translation of this is simple. ! 160: We generate a local name for the string, ! 161: then hand the name and the string to ! 162: .UC TROFF, ! 163: and let ! 164: .UC TROFF ! 165: perform the storage management. ! 166: All we save is the name of the string, ! 167: its height, and its baseline. ! 168: .PP ! 169: As another example, ! 170: the translation associated with the production ! 171: .P1 ! 172: box : box OVER box ! 173: .P2 ! 174: is: ! 175: .P1 ! 176: .ce 0 ! 177: .in 1 ! 178: .ne 14 ! 179: Width of output box = ! 180: slightly more than largest input width ! 181: Height of output box = ! 182: slightly more than sum of input heights ! 183: Base of output box = ! 184: slightly more than height of bottom input box ! 185: String describing output box = ! 186: move down; ! 187: move right enough to center bottom box; ! 188: draw bottom box (i.e., copy string for bottom box); ! 189: move up; move left enough to center top box; ! 190: draw top box (i.e., copy string for top box); ! 191: move down and left; draw line full width; ! 192: return to proper base line. ! 193: .in 0 ! 194: .P2 ! 195: Most of the other productions have ! 196: equally simple semantic actions. ! 197: Picturing the output as a set of properly placed boxes ! 198: makes the right sequence of positioning commands ! 199: quite obvious. ! 200: The main difficulty is in finding the right numbers to use ! 201: for esthetically pleasing positioning. ! 202: .PP ! 203: With a grammar, it is usually clear how to extend the language. ! 204: For instance, one of our users ! 205: suggested a ! 206: .UC TENSOR ! 207: operator, to make constructions like ! 208: .EQ ! 209: ~ sub size 7 m sup size 7 l ! 210: {bold T from n to k} sub size 7 i sup size 7 j ! 211: .EN ! 212: Grammatically, this is easy: ! 213: it is sufficient to add a production like ! 214: .P1 ! 215: box : TENSOR { list } ! 216: .P2 ! 217: Semantically, we need only juggle the boxes to the right places.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.