.\" 43BSDReno/share/doc/ps1/15.yacc/ss1, revision 1.1.1.1
.\"    @(#)ss1 6.1 (Berkeley) 5/8/86
.\"
.tr *\(**
.tr |\(or
.SH
1: Basic Specifications
.PP
Names refer to either tokens or nonterminal symbols.
Yacc requires
token names to be declared as such.
In addition, for reasons discussed in Section 3, it is often desirable
to include the lexical analyzer as part of the specification file;
it may be useful to include other programs as well.
Thus, every specification file consists of three sections:
the
.I declarations ,
.I "(grammar) rules" ,
and
.I programs .
The sections are separated by double percent ``%%'' marks.
(The percent sign ``%'' is generally used in Yacc specifications as an escape character.)
.PP
In other words, a full specification file looks like
.DS
declarations
%%
rules
%%
programs
.DE
.PP
The declarations section may be empty.
Moreover, if the programs section is omitted, the second %% mark may be omitted also;
thus, the smallest legal Yacc specification is
.DS
%%
rules
.DE
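.PP
As a rough illustration of this layout, the C function below locates the ``%%'' marks in a specification held in memory.
It is a hypothetical helper, not part of Yacc, and it assumes that each mark stands alone on its own line.
Only the first two marks are counted, since everything after the second one belongs to the programs section.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Yacc): find the "%%" lines that separate
 * the declarations, rules, and programs sections.  Assumes each mark sits
 * alone on a line.  Returns how many marks were found (0, 1, or 2) and, for
 * each one, stores in marks[] the offset of the first byte after its line.
 */
int find_section_marks(const char *spec, size_t marks[2])
{
    int found = 0;
    const char *line = spec;

    while (*line != '\0' && found < 2) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len == 2 && line[0] == '%' && line[1] == '%')
            marks[found++] = (size_t)(line - spec) + len + (end ? 1 : 0);
        if (end == NULL)
            break;                  /* last line had no newline */
        line = end + 1;
    }
    return found;                   /* marks after the second are ignored */
}
```

The smallest specification shown above yields a single mark, with the rules starting right after it.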
.PP
Blanks, tabs, and newlines are ignored except
that they may not appear in names or multi-character reserved symbols.
Comments may appear wherever a name is legal; they are enclosed
in /* . . . */, as in C and PL/I.
.PP
The rules section is made up of one or more grammar rules.
A grammar rule has the form:
.DS
A  :  BODY  ;
.DE
A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals.
The colon and the semicolon are Yacc punctuation.
.PP
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and
non-initial digits.
Upper and lower case letters are distinct.
The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
.PP
A literal consists of a character enclosed in single quotes ``\'''.
As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes
are recognized.
Thus
.DS
\'\en\'        newline
\'\er\'        return
\'\e\'\'       single quote ``\'''
\'\e\e\'       backslash ``\e''
\'\et\'        tab
\'\eb\'        backspace
\'\ef\'        form feed
\'\exxx\'      ``xxx'' in octal
.DE
For a number of technical reasons, the
\s-2NUL\s0
character (\'\e0\' or 0) should never
be used in grammar rules.
.PP
If there are several grammar rules with the same left hand side, the vertical bar ``|''
can be used to avoid rewriting the left hand side.
In addition,
the semicolon at the end of a rule can be dropped before a vertical bar.
Thus the grammar rules
.DS
A      :       B  C  D   ;
A      :       E  F   ;
A      :       G   ;
.DE
can be given to Yacc as
.DS
A      :       B  C  D
       |       E  F
       |       G
       ;
.DE
It is not necessary that all grammar rules with the same left side appear together in the rules section,
although grouping them makes the input much more readable and easier to change.
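.PP
Since the vertical bar is only a shorthand, a rule written with it can be mechanically expanded back into separate rules.
The hypothetical C function below does exactly that.
It assumes that every name, colon, bar, and semicolon is separated from its neighbors by blanks, tabs, or newlines, and that the rule contains only names and this punctuation.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RULE 256

/*
 * Hypothetical illustration (not Yacc's implementation): expand one rule
 * "LHS : alt | alt | ... ;" into separate rules "LHS : alt ;", written to
 * out[0], out[1], ...  Returns the number of rules, or -1 on malformed input.
 */
int expand_alternatives(const char *rule, char out[][MAX_RULE], int max_out)
{
    char buf[MAX_RULE];
    char *lhs, *tok;
    int n = 0;

    if (strlen(rule) >= sizeof buf || max_out < 1)
        return -1;
    strcpy(buf, rule);                        /* strtok modifies its input */

    lhs = strtok(buf, " \t\n");
    tok = strtok(NULL, " \t\n");
    if (lhs == NULL || tok == NULL || strcmp(tok, ":") != 0)
        return -1;

    snprintf(out[0], MAX_RULE, "%s :", lhs);  /* start the first rule */
    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        if (strcmp(tok, "|") == 0 || strcmp(tok, ";") == 0) {
            strcat(out[n], " ;");             /* close the current rule */
            n++;
            if (strcmp(tok, ";") == 0)
                return n;                     /* end of the whole rule */
            if (n >= max_out)
                return -1;
            snprintf(out[n], MAX_RULE, "%s :", lhs);  /* next alternative */
        } else {
            strcat(out[n], " ");              /* append a body name */
            strcat(out[n], tok);
        }
    }
    return -1;                                /* no terminating ';' */
}
```

Applied to ``A : B C D | E F | G ;'', it produces the three separate rules listed first above.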
.PP
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
.DS
empty :   ;
.DE
.PP
Names representing tokens must be declared; this is most simply done by writing
.DS
%token   name1  name2 . . .
.DE
in the declarations section.
(See Sections 3, 5, and 6 for much more discussion.)
Every name not defined in the declarations section is assumed to represent a nonterminal symbol.
Every nonterminal symbol must appear on the left side of at least one rule.
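.PP
These two rules suggest a simple consistency check, sketched here in C.
The sketch uses a hypothetical in-memory form of the grammar, struct rule, holding a left side and a body of at most eight names, and reports the first body name that is neither declared with %token nor the left side of any rule.
Literals are outside its scope.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BODY 8

/* Hypothetical in-memory form of one rule: lhs : rhs[0] rhs[1] ... ; */
struct rule {
    const char *lhs;
    const char *rhs[MAX_BODY];   /* body names; unused slots are NULL */
};

/* True if name was declared with %token. */
static bool is_token(const char *name, const char *const *tokens, size_t ntokens)
{
    for (size_t i = 0; i < ntokens; i++)
        if (strcmp(name, tokens[i]) == 0)
            return true;
    return false;
}

/* True if name appears on the left side of at least one rule. */
static bool has_rule(const char *name, const struct rule *rules, size_t nrules)
{
    for (size_t i = 0; i < nrules; i++)
        if (strcmp(name, rules[i].lhs) == 0)
            return true;
    return false;
}

/*
 * Every body name that is not a declared token is a nonterminal and so
 * needs a rule of its own.  Returns the first name that breaks this, or
 * NULL if the grammar passes the check.
 */
const char *first_undefined_nonterminal(const char *const *tokens, size_t ntokens,
                                        const struct rule *rules, size_t nrules)
{
    for (size_t i = 0; i < nrules; i++) {
        for (size_t j = 0; j < MAX_BODY && rules[i].rhs[j] != NULL; j++) {
            const char *name = rules[i].rhs[j];
            if (!is_token(name, tokens, ntokens) && !has_rule(name, rules, nrules))
                return name;
        }
    }
    return NULL;
}
```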
.PP
Of all the nonterminal symbols, one, called the
.I "start symbol" ,
has particular importance.
The parser is designed to recognize the start symbol; thus,
this symbol represents the largest,
most general structure described by the grammar rules.
By default,
the start symbol is taken to be the left hand side of the first
grammar rule in the rules section.
It is possible, and in fact desirable, to declare the start
symbol explicitly in the declarations section using the %start keyword:
.DS
%start   symbol
.DE
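.PP
The choice of start symbol amounts to a short decision, sketched here in C with hypothetical names.
The array lhs holds the left sides of the rules in the order they appear, and declared_start is NULL when there is no %start declaration.
The sketch also rejects a %start name that no rule defines, since the start symbol is a nonterminal and every nonterminal must appear on the left side of some rule.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the start symbol, or NULL if none can be
 * chosen.  lhs[] lists rule left sides in order of appearance.
 */
const char *start_symbol(const char *declared_start,
                         const char *const *lhs, size_t nrules)
{
    if (nrules == 0)
        return NULL;                  /* no rules, so no start symbol */
    if (declared_start == NULL)
        return lhs[0];                /* default: left side of the first rule */
    for (size_t i = 0; i < nrules; i++)
        if (strcmp(lhs[i], declared_start) == 0)
            return declared_start;    /* %start names a defined nonterminal */
    return NULL;                      /* %start names a symbol no rule defines */
}
```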
.PP
The end of the input to the parser is signaled by a special token, called the
.I endmarker .
If the tokens up to, but not including, the endmarker form a structure
which matches the start symbol, the parser function returns to its caller
after the endmarker is seen; it
.I accepts
the input.
If the endmarker is seen in any other context, it is an error.
.PP
It is the job of the user-supplied lexical analyzer
to return the endmarker when appropriate; see Section 3, below.
Usually the endmarker represents some reasonably obvious
I/O status, such as ``end-of-file'' or ``end-of-record''.
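.PP
As a sketch of the lexical analyzer's side of this arrangement, the function below tokenizes a string and reports the endmarker by returning 0, the value Yacc-generated parsers treat as the endmarker (Section 3 discusses this further).
The input variable lex_input and the token number chosen for NUMBER are assumptions made for this example; a real analyzer would take its token numbers from the definitions Yacc produces.

```c
#include <ctype.h>

#define NUMBER 257               /* hypothetical token number for this sketch */

static const char *lex_input;    /* hypothetical: the text being tokenized */
int yylval;                      /* value of the most recent NUMBER token */

/*
 * Minimal lexical analyzer: skips blanks and tabs, returns NUMBER for a
 * run of decimal digits (setting yylval), returns any other character as
 * a token by itself, and returns 0, the endmarker, at the end of the string.
 */
int yylex(void)
{
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;
    if (*lex_input == '\0')
        return 0;                                   /* endmarker */
    if (isdigit((unsigned char)*lex_input)) {
        yylval = 0;
        while (isdigit((unsigned char)*lex_input))
            yylval = yylval * 10 + (*lex_input++ - '0');
        return NUMBER;
    }
    return (unsigned char)*lex_input++;             /* single-character token */
}
```

Once the string is used up, every further call keeps returning 0, so the parser sees the endmarker however often it asks.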