.\" @(#)ss3 6.1 (Berkeley) 5/8/86
.\"
.SH
3: Lexical Analysis
.PP
The user must supply a lexical analyzer to read the input stream and communicate tokens
(with values, if desired) to the parser.
The lexical analyzer is an integer-valued function called
.I yylex .
The function returns an integer, the
.I "token number" ,
representing the kind of token read.
If there is a value associated with that token, it should be assigned
to the external variable
.I yylval .
.PP
The parser and the lexical analyzer must agree on these token numbers in order for
communication between them to take place.
The numbers may be chosen by Yacc, or chosen by the user.
In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer
to return these numbers symbolically.
For example, suppose that the token name DIGIT has been defined in the declarations section of the
Yacc specification file.
The relevant portion of the lexical analyzer might look like:
.DS
yylex(){
	extern int yylval;
	int c;
	. . .
	c = getchar();
	. . .
	switch( c ) {
		. . .
	case \'0\':
	case \'1\':
	  . . .
	case \'9\':
		yylval = c\-\'0\';
		return( DIGIT );
		. . .
		}
	. . .
.DE
.PP
The intent is to return a token number of DIGIT, and a value equal to the numerical value of the
digit.
Provided that the lexical analyzer code is placed in the programs section of the specification file,
the identifier DIGIT will be defined as the token number associated
with the token DIGIT.
.PP
This mechanism leads to clear,
easily modified lexical analyzers; the only pitfall is the need
to avoid using any token names in the grammar that are reserved
or significant in C or the parser; for example, the use of
token names
.I if
or
.I while
will almost certainly cause severe
difficulties when the lexical analyzer is compiled.
The token name
.I error
is reserved for error handling, and should not be used naively
(see Section 7).
.PP
As mentioned above, the token numbers may be chosen by Yacc or by the user.
In the default situation, the numbers are chosen by Yacc.
The default token number for a literal
character is the numerical value of the character in the local character set.
Other names are assigned token numbers
starting at 257.
.PP
To assign a token number to a token (including literals),
the first appearance of the token name or literal
.I
in the declarations section
.R
can be immediately followed by
a nonnegative integer.
This integer is taken to be the token number of the name or literal.
Names and literals not defined by this mechanism retain their default definition.
It is important that all token numbers be distinct.
.PP
For historical reasons, the endmarker must have token
number 0 or negative.
This token number cannot be redefined by the user; thus, all
lexical analyzers should be prepared to return 0 or negative as a token number
upon reaching the end of their input.
.PP
A very useful tool for constructing lexical analyzers is
the
.I Lex
program developed by Mike Lesk.
.[
Lesk Lex
.]
These lexical analyzers are designed to work in close
harmony with Yacc parsers.
The specifications for these lexical analyzers
use regular expressions instead of grammar rules.
Lex can be easily used to produce quite complicated lexical analyzers,
but there remain some languages (such as FORTRAN) which do not
fit any theoretical framework, and whose lexical analyzers
must be crafted by hand.
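.PP
The conventions of this section (a token number as the return value, a value in
.I yylval ,
literal characters returned as themselves, and 0 at the end of the input)
can be collected into one small hand-written analyzer.
What follows is only a minimal sketch under stated assumptions, not code produced by Yacc:
DIGIT is hard-coded to 257 as a stand-in for the definition Yacc would normally supply,
.I yylval
is defined locally so that the fragment is self-contained,
and the names lex_input and lex_set_input are hypothetical helpers
that read from a string rather than from getchar().

```c
#include <ctype.h>

/* Assumption: Yacc would normally generate this definition from the
   declarations section; 257 is the first default number for a name. */
#define DIGIT 257

/* Assumption: normally defined by the generated parser; defined here
   only so that the sketch is self-contained. */
int yylval;

/* Hypothetical input source: a NUL-terminated string, used here
   in place of getchar() so the sketch is easy to exercise. */
static const char *lex_input = "";

/* Hypothetical helper: point the analyzer at a new input string. */
void lex_set_input(const char *text)
{
    lex_input = text;
}

int yylex(void)
{
    unsigned char c;

    /* Skip blanks and tabs; they are not tokens in this sketch. */
    while (isblank((unsigned char)*lex_input))
        lex_input++;

    /* End of input: return the endmarker, token number 0.
       The pointer is not advanced, so later calls return 0 again. */
    if (*lex_input == 0)
        return 0;

    c = (unsigned char)*lex_input++;

    if (isdigit(c)) {
        yylval = c - '0';   /* value: the numerical value of the digit */
        return DIGIT;       /* token number: the symbolic name DIGIT */
    }

    /* Any other character is a literal; by default its token number
       is its value in the local character set. */
    return c;
}
```

.PP
Returning the character itself for anything that is not a digit relies on the
default numbering of literals; if a literal were given an explicit token number
in the declarations section, the analyzer would have to return that number instead.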