Annotation of 43BSDReno/share/doc/ps1/15.yacc/ss0, revision 1.1.1.1

1.1       root        1: .\"    @(#)ss0 6.1 (Berkeley) 5/8/86
                      2: .\"
                      3: .SH
                      4: 0: Introduction
                      5: .PP
                      6: Yacc provides a general tool for imposing structure on the input to a computer program.
                      7: The Yacc user prepares a
                      8: specification of the input process; this includes rules
                      9: describing the input structure, code to be invoked when these
                     10: rules are recognized, and a low-level routine to do the
                     11: basic input.
                     12: Yacc then generates a function to control the input process.
                     13: This function, called a
                     14: .I parser ,
                     15: calls the user-supplied low-level input routine
                     16: (the
                     17: .I "lexical analyzer" )
                     18: to pick up the basic items
                     19: (called
                     20: .I tokens )
                     21: from the input stream.
                     22: These tokens are organized according to the input structure rules,
                     23: called
                     24: .I "grammar rules" \|;
                     25: when one of these rules has been recognized,
                     26: then user code supplied for this rule, an
                     27: .I action ,
                     28: is invoked; actions have the ability to return values and
                     29: make use of the values of other actions.
                     30: .PP
                     31: Yacc is written in a portable dialect of C
                     32: .[
                     33: Ritchie Kernighan Language Prentice
                     34: .]
                     35: and the actions, and output subroutine, are in C as well.
                     36: Moreover, many of the syntactic conventions of Yacc follow C.
                     37: .PP
                     38: The heart of the input specification is a collection of grammar rules.
                     39: Each rule describes an allowable structure and gives it a name.
                     40: For example, one grammar rule might be
                     41: .DS
                     42: date  :  month\_name  day  \',\'  year   ;
                     43: .DE
                     44: Here,
                     45: .I date ,
                     46: .I month\_name ,
                     47: .I day ,
                     48: and
                     49: .I year
                     50: represent structures of interest in the input process;
                     51: presumably,
                     52: .I month\_name ,
                     53: .I day ,
                     54: and
                     55: .I year
                     56: are defined elsewhere.
                     57: The comma ``,'' is enclosed in single quotes; this implies that the
                     58: comma is to appear literally in the input.
                     59: The colon and semicolon merely serve as punctuation in the rule, and have
                     60: no significance in controlling the input.
                     61: Thus, with proper definitions, the input
                     62: .DS
                     63: July  4, 1776
                     64: .DE
                     65: might be matched by the above rule.
                     66: .PP
                     67: An important part of the input process is carried out by the
                     68: lexical analyzer.
                     69: This user routine reads the input stream, recognizing the lower level structures,
                     70: and communicates these tokens
                     71: to the parser.
                     72: For historical reasons, a structure recognized by the lexical analyzer is called a
                     73: .I "terminal symbol" ,
                     74: while the structure recognized by the parser is called a
                     75: .I "nonterminal symbol" .
                     76: To avoid confusion, terminal symbols will usually be referred to as
                     77: .I tokens .
                     78: .PP
                     79: There is considerable leeway in deciding whether to recognize structures using the lexical
                     80: analyzer or grammar rules.
                     81: For example, the rules
                     82: .DS
                     83: month\_name  :  \'J\' \'a\' \'n\'   ;
                     84: month\_name  :  \'F\' \'e\' \'b\'   ;
                     85: 
                     86:          . . .
                     87: 
                     88: month\_name  :  \'D\' \'e\' \'c\'   ;
                     89: .DE
                     90: might be used in the above example.
                     91: The lexical analyzer would only need to recognize individual letters, and
                     92: .I month\_name
                     93: would be a nonterminal symbol.
                     94: Such low-level rules tend to waste time and space, and may
                     95: complicate the specification beyond Yacc's ability to deal with it.
                     96: Usually, the lexical analyzer would
                     97: recognize the month names,
                     98: and return an indication that a
                     99: .I month\_name
                    100: was seen; in this case,
                    101: .I month\_name
                    102: would be a token.
                    103: .PP
                    104: Literal characters such as ``,'' must also be passed through the lexical
                    105: analyzer, and are also considered tokens.
                    106: .PP
                    107: Specification files are very flexible.
                    108: It is realively easy to add to the above example the rule
                    109: .DS
                    110: date  :  month \'/\' day \'/\' year   ;
                    111: .DE
                    112: allowing
                    113: .DS
                    114: 7 / 4 / 1776
                    115: .DE
                    116: as a synonym for
                    117: .DS
                    118: July 4, 1776
                    119: .DE
                    120: In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
                    121: and little danger of disrupting existing input.
                    122: .PP
                    123: The input being read may not conform to the
                    124: specifications.
                    125: These input errors are detected as early as is theoretically possible with a
                    126: left-to-right scan;
                    127: thus, not only is the chance of reading and computing with bad
                    128: input data substantially reduced, but the bad data can usually be quickly found.
                    129: Error handling,
                    130: provided as part of the input specifications,
                    131: permits the reentry of bad data,
                    132: or the continuation of the input process after skipping over the bad data.
                    133: .PP
                    134: In some cases, Yacc fails to produce a parser when given a set of
                    135: specifications.
                    136: For example, the specifications may be self contradictory, or they may
                    137: require a more powerful recognition mechanism than that available to Yacc.
                    138: The former cases represent design errors;
                    139: the latter cases
                    140: can often be corrected
                    141: by making
                    142: the lexical analyzer
                    143: more powerful, or by rewriting some of the grammar rules.
                    144: While Yacc cannot handle all possible specifications, its power
                    145: compares favorably with similar systems;
                    146: moreover, the
                    147: constructions which are difficult for Yacc to handle are
                    148: also frequently difficult for human beings to handle.
                    149: Some users have reported that the discipline of formulating valid
                    150: Yacc specifications for their input revealed errors of
                    151: conception or design early in the program development.
                    152: .PP
                    153: The theory underlying Yacc has been described elsewhere.
                    154: .[
                    155: Aho Johnson Surveys LR Parsing
                    156: .]
                    157: .[
                    158: Aho Johnson Ullman Ambiguous Grammars
                    159: .]
                    160: .[
                    161: Aho Ullman Principles Compiler Design
                    162: .]
                    163: Yacc has been extensively used in numerous practical applications,
                    164: including
                    165: .I lint ,
                    166: .[
                    167: Johnson Lint
                    168: .]
                    169: the Portable C Compiler,
                    170: .[
                    171: Johnson Portable Compiler Theory
                    172: .]
                    173: and a system for typesetting mathematics.
                    174: .[
                    175: Kernighan Cherry typesetting system CACM
                    176: .]
                    177: .PP
                    178: The next several sections describe the
                    179: basic process of preparing a Yacc specification;
                    180: Section 1 describes the preparation of grammar rules,
                    181: Section 2 the preparation of the user supplied actions associated with these rules,
                    182: and Section 3 the preparation of lexical analyzers.
                    183: Section 4 describes the operation of the parser.
                    184: Section 5 discusses various reasons why Yacc may be unable to produce a
                    185: parser from a specification, and what to do about it.
                    186: Section 6 describes a simple mechanism for
                    187: handling operator precedences in arithmetic expressions.
                    188: Section 7 discusses error detection and recovery.
                    189: Section 8 discusses the operating environment and special features
                    190: of the parsers Yacc produces.
                    191: Section 9 gives some suggestions which should improve the
                    192: style and efficiency of the specifications.
                    193: Section 10 discusses some advanced topics, and Section 11 gives
                    194: acknowledgements.
                    195: Appendix A has a brief example, and Appendix B gives a
                    196: summary of the Yacc input syntax.
                    197: Appendix C gives an example using some of the more advanced
                    198: features of Yacc, and, finally,
                    199: Appendix D describes mechanisms and syntax
                    200: no longer actively supported, but
                    201: provided for historical continuity with older versions of Yacc.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.