|
|
1.1 ! root 1: .\" @(#)ss0 6.1 (Berkeley) 5/8/86 ! 2: .\" ! 3: .SH ! 4: 0: Introduction ! 5: .PP ! 6: Yacc provides a general tool for imposing structure on the input to a computer program. ! 7: The Yacc user prepares a ! 8: specification of the input process; this includes rules ! 9: describing the input structure, code to be invoked when these ! 10: rules are recognized, and a low-level routine to do the ! 11: basic input. ! 12: Yacc then generates a function to control the input process. ! 13: This function, called a ! 14: .I parser , ! 15: calls the user-supplied low-level input routine ! 16: (the ! 17: .I "lexical analyzer" ) ! 18: to pick up the basic items ! 19: (called ! 20: .I tokens ) ! 21: from the input stream. ! 22: These tokens are organized according to the input structure rules, ! 23: called ! 24: .I "grammar rules" \|; ! 25: when one of these rules has been recognized, ! 26: then user code supplied for this rule, an ! 27: .I action , ! 28: is invoked; actions have the ability to return values and ! 29: make use of the values of other actions. ! 30: .PP ! 31: Yacc is written in a portable dialect of C ! 32: .[ ! 33: Ritchie Kernighan Language Prentice ! 34: .] ! 35: and the actions, and output subroutine, are in C as well. ! 36: Moreover, many of the syntactic conventions of Yacc follow C. ! 37: .PP ! 38: The heart of the input specification is a collection of grammar rules. ! 39: Each rule describes an allowable structure and gives it a name. ! 40: For example, one grammar rule might be ! 41: .DS ! 42: date : month\_name day \',\' year ; ! 43: .DE ! 44: Here, ! 45: .I date , ! 46: .I month\_name , ! 47: .I day , ! 48: and ! 49: .I year ! 50: represent structures of interest in the input process; ! 51: presumably, ! 52: .I month\_name , ! 53: .I day , ! 54: and ! 55: .I year ! 56: are defined elsewhere. ! 57: The comma ``,'' is enclosed in single quotes; this implies that the ! 58: comma is to appear literally in the input. ! 59: The colon and semicolon merely serve as punctuation in the rule, and have ! 60: no significance in controlling the input. ! 61: Thus, with proper definitions, the input ! 62: .DS ! 63: July 4, 1776 ! 64: .DE ! 65: might be matched by the above rule. ! 66: .PP ! 67: An important part of the input process is carried out by the ! 68: lexical analyzer. ! 69: This user routine reads the input stream, recognizing the lower level structures, ! 70: and communicates these tokens ! 71: to the parser. ! 72: For historical reasons, a structure recognized by the lexical analyzer is called a ! 73: .I "terminal symbol" , ! 74: while the structure recognized by the parser is called a ! 75: .I "nonterminal symbol" . ! 76: To avoid confusion, terminal symbols will usually be referred to as ! 77: .I tokens . ! 78: .PP ! 79: There is considerable leeway in deciding whether to recognize structures using the lexical ! 80: analyzer or grammar rules. ! 81: For example, the rules ! 82: .DS ! 83: month\_name : \'J\' \'a\' \'n\' ; ! 84: month\_name : \'F\' \'e\' \'b\' ; ! 85: ! 86: . . . ! 87: ! 88: month\_name : \'D\' \'e\' \'c\' ; ! 89: .DE ! 90: might be used in the above example. ! 91: The lexical analyzer would only need to recognize individual letters, and ! 92: .I month\_name ! 93: would be a nonterminal symbol. ! 94: Such low-level rules tend to waste time and space, and may ! 95: complicate the specification beyond Yacc's ability to deal with it. ! 96: Usually, the lexical analyzer would ! 97: recognize the month names, ! 98: and return an indication that a ! 99: .I month\_name ! 100: was seen; in this case, ! 101: .I month\_name ! 102: would be a token. ! 103: .PP ! 104: Literal characters such as ``,'' must also be passed through the lexical ! 105: analyzer, and are also considered tokens. ! 106: .PP ! 107: Specification files are very flexible. ! 108: It is realively easy to add to the above example the rule ! 109: .DS ! 110: date : month \'/\' day \'/\' year ; ! 111: .DE ! 112: allowing ! 113: .DS ! 114: 7 / 4 / 1776 ! 115: .DE ! 116: as a synonym for ! 117: .DS ! 118: July 4, 1776 ! 119: .DE ! 120: In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, ! 121: and little danger of disrupting existing input. ! 122: .PP ! 123: The input being read may not conform to the ! 124: specifications. ! 125: These input errors are detected as early as is theoretically possible with a ! 126: left-to-right scan; ! 127: thus, not only is the chance of reading and computing with bad ! 128: input data substantially reduced, but the bad data can usually be quickly found. ! 129: Error handling, ! 130: provided as part of the input specifications, ! 131: permits the reentry of bad data, ! 132: or the continuation of the input process after skipping over the bad data. ! 133: .PP ! 134: In some cases, Yacc fails to produce a parser when given a set of ! 135: specifications. ! 136: For example, the specifications may be self contradictory, or they may ! 137: require a more powerful recognition mechanism than that available to Yacc. ! 138: The former cases represent design errors; ! 139: the latter cases ! 140: can often be corrected ! 141: by making ! 142: the lexical analyzer ! 143: more powerful, or by rewriting some of the grammar rules. ! 144: While Yacc cannot handle all possible specifications, its power ! 145: compares favorably with similar systems; ! 146: moreover, the ! 147: constructions which are difficult for Yacc to handle are ! 148: also frequently difficult for human beings to handle. ! 149: Some users have reported that the discipline of formulating valid ! 150: Yacc specifications for their input revealed errors of ! 151: conception or design early in the program development. ! 152: .PP ! 153: The theory underlying Yacc has been described elsewhere. ! 154: .[ ! 155: Aho Johnson Surveys LR Parsing ! 156: .] ! 157: .[ ! 158: Aho Johnson Ullman Ambiguous Grammars ! 159: .] ! 160: .[ ! 161: Aho Ullman Principles Compiler Design ! 162: .] ! 163: Yacc has been extensively used in numerous practical applications, ! 164: including ! 165: .I lint , ! 166: .[ ! 167: Johnson Lint ! 168: .] ! 169: the Portable C Compiler, ! 170: .[ ! 171: Johnson Portable Compiler Theory ! 172: .] ! 173: and a system for typesetting mathematics. ! 174: .[ ! 175: Kernighan Cherry typesetting system CACM ! 176: .] ! 177: .PP ! 178: The next several sections describe the ! 179: basic process of preparing a Yacc specification; ! 180: Section 1 describes the preparation of grammar rules, ! 181: Section 2 the preparation of the user supplied actions associated with these rules, ! 182: and Section 3 the preparation of lexical analyzers. ! 183: Section 4 describes the operation of the parser. ! 184: Section 5 discusses various reasons why Yacc may be unable to produce a ! 185: parser from a specification, and what to do about it. ! 186: Section 6 describes a simple mechanism for ! 187: handling operator precedences in arithmetic expressions. ! 188: Section 7 discusses error detection and recovery. ! 189: Section 8 discusses the operating environment and special features ! 190: of the parsers Yacc produces. ! 191: Section 9 gives some suggestions which should improve the ! 192: style and efficiency of the specifications. ! 193: Section 10 discusses some advanced topics, and Section 11 gives ! 194: acknowledgements. ! 195: Appendix A has a brief example, and Appendix B gives a ! 196: summary of the Yacc input syntax. ! 197: Appendix C gives an example using some of the more advanced ! 198: features of Yacc, and, finally, ! 199: Appendix D describes mechanisms and syntax ! 200: no longer actively supported, but ! 201: provided for historical continuity with older versions of Yacc.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.