Annotation of 43BSDReno/share/doc/usd/26.eqn/e5, revision 1.1.1.1

1.1       root        1: .\"    @(#)e5  6.1 (Berkeley) 5/22/86
                      2: .\"
                      3: .NH
                      4: Language Theory
                      5: .PP
                      6: The basic structure of the language is
                      7: not a particularly original one.
                      8: Equations are pictured as a set of ``boxes,''
                      9: pieced together in various ways.
                     10: For example, something with a subscript is
                     11: just a box followed by another box moved downward
                     12: and shrunk
                     13: by an appropriate amount.
                     14: A fraction is just a box centered above another box,
                     15: at the right altitude,
                     16: with a line of correct length drawn between them.
                     17: .PP
                     18: The grammar for the language is shown below.
                     19: For purposes of exposition, we have collapsed
                     20: some productions. In the original grammar, there
                     21: are about 70 productions, but many of these
                     22: are simple ones used only to guarantee
                     23: that some keyword is recognized early enough in the parsing process.
                     24: Symbols in
                     25: capital letters
                     26: are terminal symbols; 
                     27: lower case
                     28: symbols are non-terminals, i.e., syntactic categories.
                     29: The vertical bar \(bv indicates an alternative;
                     30: the brackets [ ] indicate optional material.
                     31: A
                     32: .UC TEXT
                     33: is a string of non-blank characters or
                     34: any string inside double quotes;
                     35: the other terminal symbols represent literal occurrences
                     36: of the corresponding keyword.
                     37: .P1
                     38: .ce 0
                     39: .ta .3i
                     40: .ps 9
                     41: .ne 17
                     42: .in 1
                     43: eqn    : box | eqn box
                     44: .sp 5p
                     45: box    : text
                     46:        | { eqn }
                     47:        | box OVER box
                     48:        | SQRT box
                     49:        | box SUB box | box SUP box
                     50:        | [ L | C | R ]PILE { list }
                     51:        | LEFT text eqn [ RIGHT text ]
                     52:        | box [ FROM box ] [ TO box ]
                     53:        | SIZE text box
                     54:        | [ROMAN | BOLD | ITALIC] box
                     55:        | box [HAT | BAR | DOT | DOTDOT | TILDE]
                     56:        | DEFINE text text
                     57: .sp 5p
                     58: list   : eqn | list ABOVE eqn
                     59: .sp 5p
                     60: text   : TEXT
                     61: .ps 10
                     62: .in 0
                     63: .P2
                     64: .PP
                     65: The grammar makes it obvious why there are few exceptions.
                     66: For example, the observation that something can be replaced by a more complicated something 
                     67: in braces is implicit in the productions:
                     68: .P1
                     69: .ce 0
                     70:    eqn : box | eqn box
                     71:    box : text | { eqn }
                     72: .P2
                     73: Anywhere a single character could be used,
                     74: .ul
                     75: any
                     76: legal construction can be used.
                     77: .PP
                     78: Clearly, our grammar is highly ambiguous.
                     79: What, for instance, do we do with the input
                     80: .P1
                     81: a over b over c  ?
                     82: .P2
                     83: Is it
                     84: .P1
                     85: {a over b} over c 
                     86: .P2
                     87: or is it
                     88: .P1
                     89: a over {b over c}  ?
                     90: .P2
                     91: .PP
                     92: To answer questions like this, the grammar
                     93: is supplemented with a small set of rules that describe the precedence 
                     94: and associativity
                     95: of operators.
                     96: In particular, we specify (more or less arbitrarily)
                     97: that
                     98: .ul
                     99: over
                    100: associates to the left,
                    101: so the first alternative above is the one chosen.
                    102: On the other hand, 
                    103: .ul
                    104: sub
                    105: and
                    106: .ul
                    107: sup
                    108: bind to the right,
                    109: because this is closer to standard mathematical practice.
                    110: That is, we assume $x sup a sup b$ is $x sup {(a sup b )}$,
                    111: not  $(x sup a ) sup b$.
                    112: .PP
                    113: The precedence rules resolve the ambiguity in a construction like
                    114: .P1
                    115: a sup 2 over b
                    116: .P2
                    117: We define
                    118: .ul
                    119: sup
                    120: to have a higher precedence than
                    121: .ul
                    122: over,
                    123: so this construction is parsed as
                    124: $a sup 2 over b$ instead of $a sup {2 over b}$.
                    125: .PP
                    126: Naturally, a user can always
                    127: force a particular parsing
                    128: by placing braces around expressions.
                    129: .PP
                    130: The ambiguous grammar approach seems to be quite useful.
                    131: The grammar we use is small enough to be easily understood,
                    132: for it contains none of the productions that would be
                    133: normally used for resolving ambiguity.
                    134: Instead the supplemental information about
                    135: precedence and associativity (also small enough to be understood)
                    136: provides the compiler-compiler
                    137: with the information it needs
                    138: to make a fast, deterministic parser for
                    139: the specific language we want.
                    140: When the language is supplemented by the disambiguating rules,
                    141: it is in fact
                    142: .UC LR(1)
                    143: and thus easy to parse[5].
                    144: .PP
                    145: The output code is generated as the input is scanned.
                    146: Any time a production
                    147: of the grammar is recognized,
                    148: (potentially) some
                    149: .UC TROFF
                    150: commands are output.
                    151: For example, when the lexical analyzer 
                    152: reports that it has found a
                    153: .UC TEXT
                    154: (i.e., a string of contiguous characters),
                    155: we have recognized the production:
                    156: .P1
                    157: text    : TEXT
                    158: .P2
                    159: The translation of this is simple.
                    160: We generate a local name for the string,
                    161: then hand the name and the string to
                    162: .UC TROFF,
                    163: and let
                    164: .UC TROFF
                    165: perform the storage management.
                    166: All we save is the name of the string,
                    167: its height, and its baseline.
                    168: .PP
                    169: As another example,
                    170: the translation associated with the production
                    171: .P1
                    172: box    : box OVER box
                    173: .P2
                    174: is:
                    175: .P1
                    176: .ce 0
                    177: .in 1
                    178: .ne 14
                    179: Width of output box =
                    180:   slightly more than largest input width
                    181: Height of output box =
                    182:   slightly more than sum of input heights
                    183: Base of output box =
                    184:   slightly more than height of bottom input box
                    185: String describing output box =
                    186:   move down;
                    187:   move right enough to center bottom box;
                    188:   draw bottom box (i.e., copy string for bottom box);
                    189:   move up; move left enough to center top box;
                    190:   draw top box (i.e., copy string for top box);
                    191:   move down and left; draw line full width;
                    192:   return to proper base line.
                    193: .in 0
                    194: .P2
                    195: Most of the other productions have 
                    196: equally simple semantic actions.
                    197: Picturing the output as a set of properly placed boxes
                    198: makes the right sequence of positioning commands
                    199: quite obvious.
                    200: The main difficulty is in finding the right numbers to use
                    201: for esthetically pleasing positioning.
                    202: .PP
                    203: With a grammar, it is usually clear how to extend the language.
                    204: For instance, one of our users
                    205: suggested a
                    206: .UC TENSOR
                    207: operator, to make constructions like
                    208: .EQ
                    209: ~ sub size 7 m sup size 7 l
                    210: {bold T from n to k} sub size 7 i sup size 7 j
                    211: .EN
                    212: Grammatically, this is easy:
                    213: it is sufficient to add a production like
                    214: .P1
                    215: box    : TENSOR { list }
                    216: .P2
                    217: Semantically, we need only juggle the boxes to the right places.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.