Annotation of 43BSDReno/share/doc/ps1/15.yacc/ss3, revision 1.1.1.1

1.1       root        1: .\"    @(#)ss3 6.1 (Berkeley) 5/8/86
                      2: .\"
                      3: .SH
                      4: 3: Lexical Analysis
                      5: .PP
                      6: The user must supply a lexical analyzer to read the input stream and communicate tokens
                      7: (with values, if desired) to the parser.
                      8: The lexical analyzer is an integer-valued function called
                      9: .I yylex .
                     10: The function returns an integer, the
                     11: .I "token number" ,
                     12: representing the kind of token read.
                     13: If there is a value associated with that token, it should be assigned
                     14: to the external variable
                     15: .I yylval .
                     16: .PP
                     17: The parser and the lexical analyzer must agree on these token numbers in order for
                     18: communication between them to take place.
                     19: The numbers may be chosen by Yacc, or chosen by the user.
                     20: In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer
                     21: to return these numbers symbolically.
                     22: For example, suppose that the token name DIGIT has been defined in the declarations section of the
                     23: Yacc specification file.
                     24: The relevant portion of the lexical analyzer might look like:
                     25: .DS
                     26: yylex(){
                     27:        extern int yylval;
                     28:        int c;
                     29:        . . .
                     30:        c = getchar();
                     31:        . . .
                     32:        switch( c ) {
                     33:                . . .
                     34:        case \'0\':
                     35:        case \'1\':
                     36:          . . .
                     37:        case \'9\':
                     38:                yylval = c\-\'0\';
                     39:                return( DIGIT );
                     40:                . . .
                     41:                }
                     42:        . . .
                     43: .DE
                     44: .PP
                     45: The intent is to return a token number of DIGIT, and a value equal to the numerical value of the
                     46: digit.
                     47: Provided that the lexical analyzer code is placed in the programs section of the specification file,
                     48: the identifier DIGIT will be defined as the token number associated
                     49: with the token DIGIT.
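The fragment above elides everything around the switch. A minimal, self-contained sketch of the same idea follows, under several assumptions that are not part of the original text: DIGIT and yylval are defined by hand (Yacc would normally supply both), the input comes from a string set through a hypothetical helper lex_set_input rather than from getchar, and every non-digit character is returned as a literal token.

```c
/* Assumption: in a real build Yacc supplies this definition; 257 is
   the first default number for a named token. */
#define DIGIT 257

/* Assumption: defined here so the sketch compiles alone; normally the
   parser defines yylval and the lexical analyzer declares it extern. */
int yylval;

/* Hypothetical input source: a string rather than standard input. */
static const char *lex_input = "";

void lex_set_input(const char *s)
{
    lex_input = s;
}

int yylex(void)
{
    int c;

    /* Blanks and tabs separate tokens but are not tokens themselves. */
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;

    if (*lex_input == '\0')
        return 0;                    /* end of input */

    c = (unsigned char)*lex_input++;
    if (c >= '0' && c <= '9') {
        yylval = c - '0';            /* numerical value of the digit */
        return DIGIT;
    }
    return c;                        /* any other character stands for itself */
}
```

After lex_set_input("7 +"), successive calls to yylex would yield DIGIT (with yylval set to 7), then the character value of '+', then 0.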
                     50: .PP
                     51: This mechanism leads to clear,
                     52: easily modified lexical analyzers.
                     53: The only pitfall is the need to avoid using any token names in the grammar that are reserved
                     54: or significant in C or the parser; for example, the use of
                     55: token names
                     56: .I if
                     57: or
                     58: .I while
                     59: will almost certainly cause severe
                     60: difficulties when the lexical analyzer is compiled.
                     61: The token name
                     62: .I error
                     63: is reserved for error handling, and should not be used naively
                     64: (see Section 7).
                     65: .PP
                     66: As mentioned above, the token numbers may be chosen by Yacc or by the user.
                     67: In the default situation, the numbers are chosen by Yacc.
                     68: The default token number for a literal
                     69: character is the numerical value of the character in the local character set.
                     70: Other names are assigned token numbers
                     71: starting at 257.
                     72: .PP
                     73: To assign a token number to a token (including literals),
                     74: the first appearance of the token name or literal
                     75: .I
                     76: in the declarations section
                     77: .R
                     78: can be immediately followed by
                     79: a nonnegative integer.
                     80: This integer is taken to be the token number of the name or literal.
                     81: Names and literals not defined by this mechanism retain their default definition.
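When numbers are chosen by hand, a user-chosen number can collide with the default number of a literal or of another name. As a rough illustration only, the hypothetical helper below (token_numbers_distinct is not part of Yacc) checks a hand-maintained list of token numbers for duplicates.

```c
#include <stddef.h>

/* Return 1 if every entry in nums[0..n-1] is a distinct token number,
   0 if any two entries are equal. The pairwise comparison is quadratic,
   which should be acceptable for the small token sets of most grammars. */
int token_numbers_distinct(const int *nums, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (nums[i] == nums[j])
                return 0;
    return 1;
}
```

For instance, a list holding the value of the literal '+', a default name at 257, and a user-chosen 300 would pass, while a list in which two names were both given 300 would not.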
                     82: It is important that all token numbers be distinct.
                     83: .PP
                     84: For historical reasons, the endmarker must have a token
                     85: number that is 0 or negative.
                     86: This token number cannot be redefined by the user; thus, all
                     87: lexical analyzers should be prepared to return 0 or a negative value as the token number
                     88: upon reaching the end of their input.
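One way to honor the endmarker rule with stdio is to map EOF to 0 explicitly, as in the sketch below. The stream variable lex_in and the helper lex_set_stream are assumptions made for illustration; a real analyzer would more likely read standard input directly, and would recognize more than single-character literals.

```c
#include <stdio.h>

/* Hypothetical input stream, set by the caller before scanning. */
static FILE *lex_in;

void lex_set_stream(FILE *fp)
{
    lex_in = fp;
}

/* Return the next character as a literal token, or 0 at end of input. */
int yylex(void)
{
    int c = getc(lex_in);

    if (c == EOF)
        return 0;        /* endmarker: must be 0 or negative */
    return c;
}
```

Since EOF is itself negative in C, returning it unchanged would also satisfy the rule as stated; mapping it to 0 simply makes the intent explicit. Note that this sketch cannot distinguish a NUL byte in the input from the end of input.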
                     89: .PP
                     90: A very useful tool for constructing lexical analyzers is
                     91: the
                     92: .I Lex
                     93: program developed by Mike Lesk.
                     94: .[
                     95: Lesk Lex
                     96: .]
                     97: The lexical analyzers that Lex produces are designed to work in close
                     98: harmony with Yacc parsers.
                     99: Their specifications
                    100: use regular expressions instead of grammar rules.
                    101: Lex can be easily used to produce quite complicated lexical analyzers,
                    102: but there remain some languages (such as FORTRAN) which do not
                    103: fit any theoretical framework, and whose lexical analyzers
                    104: must be crafted by hand.
