Annotation of 43BSDReno/share/doc/ps1/15.yacc/ss3, revision 1.1

1.1     ! root        1: .\"    @(#)ss3 6.1 (Berkeley) 5/8/86
        !             2: .\"
        !             3: .SH
        !             4: 3: Lexical Analysis
        !             5: .PP
        !             6: The user must supply a lexical analyzer to read the input stream and communicate tokens
        !             7: (with values, if desired) to the parser.
        !             8: The lexical analyzer is an integer-valued function called
        !             9: .I yylex .
        !            10: The function returns an integer, the
        !            11: .I "token number" ,
        !            12: representing the kind of token read.
        !            13: If there is a value associated with that token, it should be assigned
        !            14: to the external variable
        !            15: .I yylval .
        !            16: .PP
        !            17: The parser and the lexical analyzer must agree on these token numbers in order for
        !            18: communication between them to take place.
        !            19: The numbers may be chosen by Yacc, or chosen by the user.
        !            20: In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer
        !            21: to return these numbers symbolically.
        !            22: For example, suppose that the token name DIGIT has been defined in the declarations section of the
        !            23: Yacc specification file.
        !            24: The relevant portion of the lexical analyzer might look like:
        !            25: .DS
        !            26: int yylex(){
        !            27:        extern int yylval;
        !            28:        int c;
        !            29:        . . .
        !            30:        c = getchar();
        !            31:        . . .
        !            32:        switch( c ) {
        !            33:                . . .
        !            34:        case \'0\':
        !            35:        case \'1\':
        !            36:          . . .
        !            37:        case \'9\':
        !            38:                yylval = c\-\'0\';
        !            39:                return( DIGIT );
        !            40:                . . .
        !            41:                }
        !            42:        . . .
        !            43: .DE
        !            44: .PP
        !            45: The intent is to return a token number of DIGIT, and a value equal to the numerical value of the
        !            46: digit.
        !            47: Provided that the lexical analyzer code is placed in the programs section of the specification file,
        !            48: the identifier DIGIT will be defined as the token number associated
        !            49: with the token DIGIT.
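A complete version of the fragment above might look like the following sketch. It rests on assumptions that are not part of the original text: DIGIT is given the number 257 by hand (in a real specification Yacc supplies the definition), yylval is defined locally (normally the generated parser defines it), and the switch is moved into a hypothetical helper, classify, so that it can be exercised one character at a time.

```c
#include <stdio.h>

#define DIGIT 257   /* assumption: normally supplied by Yacc */

int yylval;         /* assumption: normally defined by the generated parser */

/* Hypothetical helper: map one input character to a token number. */
int classify(int c)
{
	switch (c) {
	case '0': case '1': case '2': case '3': case '4':
	case '5': case '6': case '7': case '8': case '9':
		yylval = c - '0';   /* value of the digit */
		return DIGIT;       /* kind of token read */
	default:
		/* Any other character is returned as its own token number.
		   EOF (a negative value) is passed through unchanged. */
		return c;
	}
}

int yylex(void)
{
	return classify(getchar());
}
```

With these definitions, classify('7') returns DIGIT and leaves 7 in yylval, while classify('+') returns the character code of '+'.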
        !            50: .PP
        !            51: This mechanism leads to clear,
        !            52: easily modified lexical analyzers.
        !            53: The only pitfall is the need to avoid token names in the grammar
        !            54: that are reserved or significant in C or the parser.
        !            55: For example, the use of token names
        !            56: .I if
        !            57: or
        !            58: .I while
        !            59: will almost certainly cause severe
        !            60: difficulties when the lexical analyzer is compiled.
        !            61: The token name
        !            62: .I error
        !            63: is reserved for error handling, and should not be used naively
        !            64: (see Section 7).
        !            65: .PP
        !            66: As mentioned above, the token numbers may be chosen by Yacc or by the user.
        !            67: In the default situation, the numbers are chosen by Yacc.
        !            68: The default token number for a literal
        !            69: character is the numerical value of the character in the local character set.
        !            70: Other names are assigned token numbers
        !            71: starting at 257.
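To make the default numbering concrete, the sketch below imitates the definitions that might result for three hypothetical token names, NAME, NUMBER, and ASSIGN. The specific numbers are an assumption: the text guarantees only that names start at 257, and an implementation may not number them consecutively. The helper token_is_literal is likewise hypothetical and assumes an 8-bit character set.

```c
/* Assumed default numbers for three illustrative token names. */
#define NAME   257
#define NUMBER 258
#define ASSIGN 259

/*
 * A literal such as '+' needs no definition: by default its token
 * number is its character code.  Hypothetical helper that tells a
 * literal apart from a named token under the default numbering.
 */
int token_is_literal(int tok)
{
	return tok > 0 && tok <= 255;
}
```

A lexical analyzer can therefore return '+' directly for a plus sign, and return NAME, NUMBER, or ASSIGN for the named tokens.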
        !            72: .PP
        !            73: To assign a token number to a token (including literals),
        !            74: the first appearance of the token name or literal
        !            75: .I
        !            76: in the declarations section
        !            77: .R
        !            78: can be immediately followed by
        !            79: a nonnegative integer.
        !            80: This integer is taken to be the token number of the name or literal.
        !            81: Names and literals not defined by this mechanism retain their default definition.
        !            82: It is important that all token numbers be distinct.
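As an illustration, a declaration such as ``%token DIGIT 300'' in the declarations section would be expected to make DIGIT available to the lexical analyzer with the value 300, as in the definitions below. The names and numbers are illustrative only. Because distinctness is then the user's responsibility, the sketch also includes a hypothetical helper, tokens_distinct, that checks a table of token numbers for duplicates.

```c
#include <stddef.h>

/* Illustrative user-chosen token numbers. */
#define DIGIT  300
#define LETTER 301

/* Hypothetical check: return 1 if no two token numbers coincide, else 0. */
int tokens_distinct(const int *toks, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < i; j++)
			if (toks[i] == toks[j])
				return 0;
	return 1;
}
```

Such a check could be run over every number in use, including the character codes of any literals that keep their default values.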
        !            83: .PP
        !            84: For historical reasons, the endmarker must have a token
        !            85: number that is 0 or negative.
        !            86: This token number cannot be redefined by the user; thus, all
        !            87: lexical analyzers should be prepared to return a token number that is 0 or negative
        !            88: upon reaching the end of their input.
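The endmarker convention can be sketched as follows. The input source here is a hypothetical string pointer, lex_input, chosen so that the example is easy to exercise; a real analyzer would more likely read its input with getchar. The driver count_tokens is also hypothetical and simply stands in for a parser that keeps calling yylex until the endmarker arrives.

```c
/* Hypothetical input source standing in for the real input stream. */
static const char *lex_input = "";

int yylex(void)
{
	if (*lex_input == '\0')
		return 0;                       /* endmarker: end of input */
	return (unsigned char)*lex_input++; /* each character is its own token */
}

/* Hypothetical driver: consume tokens until the endmarker is seen. */
int count_tokens(const char *s)
{
	int n = 0;

	lex_input = s;
	while (yylex() > 0)   /* 0 or negative means end of input */
		n++;
	return n;
}
```

Note that yylex keeps returning 0 if it is called again after the input is exhausted, since the pointer is not advanced past the terminating null.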
        !            89: .PP
        !            90: A very useful tool for constructing lexical analyzers is
        !            91: the
        !            92: .I Lex
        !            93: program developed by Mike Lesk.
        !            94: .[
        !            95: Lesk Lex
        !            96: .]
        !            97: The lexical analyzers that Lex produces are designed to work in close
        !            98: harmony with Yacc parsers.
        !            99: The specifications for these lexical analyzers
        !           100: use regular expressions instead of grammar rules.
        !           101: Lex can be easily used to produce quite complicated lexical analyzers,
        !           102: but there remain some languages (such as FORTRAN) which do not
        !           103: fit any theoretical framework, and whose lexical analyzers
        !           104: must be crafted by hand.
