43BSDReno/pgrm/lex/lex.1 - annotate

Annotation of 43BSDReno/pgrm/lex/lex.1, revision 1.1.1.1

1.1       root        1: .\" Copyright (c) 1990 The Regents of the University of California.
                      2: .\" All rights reserved.
                      3: .\"
                      4: .\" Redistribution and use in source and binary forms are permitted provided
                      5: .\" that: (1) source distributions retain this entire copyright notice and
                      6: .\" comment, and (2) distributions including binaries display the following
                      7: .\" acknowledgement:  ``This product includes software developed by the
                      8: .\" University of California, Berkeley and its contributors'' in the
                      9: .\" documentation or other materials provided with the distribution and in
                     10: .\" all advertising materials mentioning features or use of this software.
                     11: .\" Neither the name of the University nor the names of its contributors may
                     12: .\" be used to endorse or promote products derived from this software without
                     13: .\" specific prior written permission.
                     14: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
                     15: .\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
                     16: .\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
                     17: .\"
                     18: .\"     @(#)lex.1      5.10 (Berkeley) 7/24/90
                     19: .\"
                     20: .Dd July 24, 1990
                     21: .Dt LEX 1
                     22: .Sh NAME
                     23: .Nm lex
                     24: .Nd fast lexical analyzer generator
                     25: .Sh SYNOPSIS
                     26: .Nm lex
                     27: .Ob
                     28: .Op Fl bcdfinpstvFILT8
                     29: .Cx Fl C
                     30: .Op efmF
                     31: .Cx
                     32: .Cx Fl S
                     33: .Ar skeleton
                     34: .Cx
                     35: .Oe
                     36: .Nm lex
                     37: .Ar
                     38: .Sh DESCRIPTION
                     39: .Nm Lex
                     40: is a tool for generating
                     41: .Ar scanners :
                     42: programs which recognize lexical patterns in text.
                     43: .Nm Lex
                     44: reads
                     45: the given input files, or its standard input if no file names are given,
                     46: for a description of a scanner to generate.  The description is in
                     47: the form of pairs
                     48: of regular expressions and C code, called
                     49: .Em rules .
                     50: .Nm Lex
                     51: generates as output a C source file,
                     52: .Pa lex.yy.c ,
                     53: which defines a routine
                     54: .Fn yylex .
                     55: This file is compiled and linked with the
                     56: .Fl lfl
                     57: library to produce an executable.  When the executable is run,
                     58: it analyzes its input for occurrences
                     59: of the regular expressions.  Whenever it finds one, it executes
                     60: the corresponding C code.
                     61: .Pp
                     62: For full documentation, see
                     63: .Em Lexdoc .
                     64: This manual entry is intended for use as a quick reference.
                     65: .Sh OPTIONS
                     66: .Nm Lex
                     67: has the following options:
                     68: .Tw Ds
                     69: .Tp Fl b
                     70: Generate backtracking information to
                     71: .Va lex.backtrack .
                     72: This is a list of scanner states which require backtracking
                     73: and the input characters on which they do so.  By adding rules one
                     74: can remove backtracking states.  If all backtracking states
                     75: are eliminated and
                     76: .Fl f
                     77: or
                     78: .Fl F
                     79: is used, the generated scanner will run faster.
                     80: .Tp Fl c
                     81: is a do-nothing, deprecated option included for POSIX compliance.
                     82: .Pp
                     83: .Ar NOTE :
                     84: in previous releases of
                     85: .Nm Lex
                     86: .Op Fl c
                     87: specified table-compression options.  This functionality is
                     88: now given by the
                     89: .Fl C
                     90: flag.  To ease the impact of this change, when
                     91: .Nm lex
                     92: encounters
                     93: .Fl c ,
                     94: it currently issues a warning message and assumes that
                     95: .Fl C
                     96: was desired instead.  In the future this "promotion" of
                     97: .Fl c
                     98: to
                     99: .Fl C
                    100: will go away in the name of full POSIX compliance (unless
                    101: the POSIX meaning is removed first).
                    102: .Tp Fl d
                    103: makes the generated scanner run in
                    104: .Ar debug
                    105: mode.  Whenever a pattern is recognized and the global
                    106: .Va yy_flex_debug
                    107: is non-zero (which is the default), the scanner will
                    108: write to
                    109: .Li stderr
                    110: a line of the form:
                    111: .Pp
                    112: .Dl --accepting rule at line 53 ("the matched text")
                    113: .Pp
                    114: The line number refers to the location of the rule in the file
                    115: defining the scanner (i.e., the file that was fed to lex).  Messages
                    116: are also generated when the scanner backtracks, accepts the
                    117: default rule, reaches the end of its input buffer (or encounters
                    118: a NUL; the two look the same as far as the scanner's concerned),
                    119: or reaches an end-of-file.
                    120: .Tp Fl f
                    121: specifies (take your pick)
                    122: .Em full table
                    123: or
                    124: .Em fast scanner .
                    125: No table compression is done.  The result is large but fast.
                    126: This option is equivalent to
                    127: .Fl Cf
                    128: (see below).
                    129: .Tp Fl i
                    130: instructs
                    131: .Nm lex
                    132: to generate a
                    133: .Em case-insensitive
                    134: scanner.  The case of letters given in the
                    135: .Nm lex
                    136: input patterns will
                    137: be ignored, and tokens in the input will be matched regardless of case.  The
                    138: matched text given in
                    139: .Va yytext
                    140: will have the preserved case (i.e., it will not be folded).
                    141: .Tp Fl n
                    142: is another do-nothing, deprecated option included only for
                    143: POSIX compliance.
                    144: .Tp Fl p
                    145: generates a performance report to stderr.  The report
                    146: consists of comments regarding features of the
                    147: .Nm lex
                    148: input file which will cause a loss of performance in the resulting scanner.
                    149: .Tp Fl s
                    150: causes the
                    151: .Ar default rule
                    152: (that unmatched scanner input is echoed to
                    153: .Ar stdout )
                    154: to be suppressed.  If the scanner encounters input that does not
                    155: match any of its rules, it aborts with an error.
                    156: .Tp Fl t
                    157: instructs
                    158: .Nm lex
                    159: to write the scanner it generates to standard output instead
                    160: of
                    161: .Pa lex.yy.c .
                    162: .Tp Fl v
                    163: specifies that
                    164: .Nm lex
                    165: should write to
                    166: .Li stderr
                    167: a summary of statistics regarding the scanner it generates.
                    168: .Tp Fl F
                    169: specifies that the
                    170: .Em fast
                    171: scanner table representation should be used.  This representation is
                    172: about as fast as the full table representation
                    173: .Pq Fl f ,
                    174: and for some sets of patterns will be considerably smaller (and for
                    175: others, larger).  See
                    176: .Em Lexdoc
                    177: for details.
                    178: .Pp
                    179: This option is equivalent to
                    180: .Fl CF
                    181: (see below).
                    182: .Tp Fl I
                    183: instructs
                    184: .Nm lex
                    185: to generate an
                    186: .Em interactive
                    187: scanner, that is, a scanner which stops immediately rather than
                    188: looking ahead if it knows
                    189: that the currently scanned text cannot be part of a longer rule's match.
                    190: Again, see
                    191: .Em Lexdoc
                    192: for details.
                    193: .Pp
                    194: Note,
                    195: .Fl I
                    196: cannot be used in conjunction with
                    197: .Em full
                    198: or
                    199: .Em fast tables ,
                    200: i.e., the
                    201: .Fl f , F , Cf ,
                    202: or
                    203: .Fl CF
                    204: flags.
                    205: .Tp Fl L
                    206: instructs
                    207: .Nm lex
                    208: not to generate
                    209: .Li #line
                    210: directives in
                    211: .Pa lex.yy.c .
                    212: The default is to generate such directives so error
                    213: messages in the actions will be correctly
                    214: located with respect to the original
                    215: .Nm lex
                    216: input file, and not to
                    217: the fairly meaningless line numbers of
                    218: .Pa lex.yy.c .
                    219: .Tp Fl T
                    220: makes
                    221: .Nm lex
                    222: run in
                    223: .Em trace
                    224: mode.  It will generate a lot of messages to
                    225: .Li stdout
                    226: concerning
                    227: the form of the input and the resultant non-deterministic and deterministic
                    228: finite automata.  This option is mostly for use in maintaining
                    229: .Nm lex .
                    230: .Tp Fl 8
                    231: instructs
                    232: .Nm lex
                    233: to generate an 8-bit scanner.
                    234: On some sites, this is the default.  On others, the default
                    235: is 7-bit characters.  To see which is the case, check the verbose
                    236: .Pq Fl v
                    237: output for "equivalence classes created".  If the denominator of
                    238: the number shown is 128, then by default
                    239: .Nm lex
                    240: is generating 7-bit characters.  If it is 256, then the default is
                    241: 8-bit characters.
                    242: .Tc Fl C
                    243: .Op Cm efmF
                    244: .Cx
                    245: controls the degree of table compression. The default setting is
                    246: .Fl Cem .
                    247: .Pp
                    248: .Tw Ds
                    249: .Tp Fl C
                    250: A lone
                    251: .Fl C
                    252: specifies that the scanner tables should be compressed but neither
                    253: equivalence classes nor meta-equivalence classes should be used.
                    254: .Tp Fl \&Ce
                    255: directs
                    256: .Nm lex
                    257: to construct
                    258: .Em equivalence classes ,
                    259: i.e., sets of characters
                    260: which have identical lexical properties.
                    261: Equivalence classes usually give
                    262: dramatic reductions in the final table/object file sizes (typically
                    263: a factor of 2-5) and are pretty cheap performance-wise (one array
                    264: look-up per character scanned).
                    265: .Tp Fl \&Cf
                    266: specifies that the
                    267: .Em full
                    268: scanner tables should be generated -
                    269: .Nm lex
                    270: should not compress the
                    271: tables by taking advantage of similar transition functions for
                    272: different states.
                    273: .Tp Fl \&CF
                    274: specifies that the alternate fast scanner representation (described in
                    275: .Em Lexdoc )
                    276: should be used.
                    277: .Tp Fl \&Cm
                    278: directs
                    279: .Nm lex
                    280: to construct
                    281: .Em meta-equivalence classes ,
                    282: which are sets of equivalence classes (or characters, if equivalence
                    283: classes are not being used) that are commonly used together.  Meta-equivalence
                    284: classes are often a big win when using compressed tables, but they
                    285: have a moderate performance impact (one or two "if" tests and one
                    286: array look-up per character scanned).
                    287: .Tp Fl Cem
                    288: (default)
                    289: Generate both equivalence classes
                    290: and meta-equivalence classes.  This setting provides the highest
                    291: degree of table compression.
                    292: .Tp
                    293: .Pp
                    294: Faster-executing scanners can be traded off at the cost of larger tables with
                    295: the following generally being true:
                    296: .Pp
                    297: .Ds C
                    298: slowest & smallest
                    299:       -Cem
                    300:       -Cm
                    301:       -Ce
                    302:       -C
                    303:       -C{f,F}e
                    304:       -C{f,F}
                    305: fastest & largest
                    306: .De
                    307: .Pp
                    308: .Fl C
                    309: options are not cumulative; whenever the flag is encountered, the
                    310: previous -C settings are forgotten.
                    311: .Pp
                    312: The options
                    313: .Fl \&Cf
                    314: or
                    315: .Fl \&CF
                    316: and
                    317: .Fl \&Cm
                    318: do not make sense together - there is no opportunity for meta-equivalence
                    319: classes if the table is not being compressed.  Otherwise the options
                    320: may be freely mixed.
                    321: .Tc Fl S
                    322: .Ar skeleton_file
                    323: .Cx
                    324: overrides the default skeleton file from which
                    325: .Nm lex
                    326: constructs its scanners.  Useful for
                    327: .Nm lex
                    328: maintenance or development.
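The "one array look-up per character scanned" cost of equivalence classes (see -Ce above) can be pictured with a small hand-written sketch in C. It is not the table layout lex emits; the names ec, next_state and match_identifier are hypothetical, ASCII is assumed, and the automaton recognizes only the pattern [a-zA-Z][a-zA-Z0-9]*.

```c
#include <assert.h>
#include <stddef.h>

/* Three equivalence classes stand in for all 256 character codes. */
enum { EC_OTHER, EC_DIGIT, EC_LETTER, NCLASSES };
enum { S_START, S_IDENT, S_DEAD, NSTATES };

static unsigned char ec[256];   /* character -> equivalence class */
static int ec_ready = 0;

/* Transition table indexed by class, not by character:
 * NSTATES * NCLASSES entries instead of NSTATES * 256. */
static const unsigned char next_state[NSTATES][NCLASSES] = {
    /* S_START */ { S_DEAD, S_DEAD,  S_IDENT },
    /* S_IDENT */ { S_DEAD, S_IDENT, S_IDENT },
    /* S_DEAD  */ { S_DEAD, S_DEAD,  S_DEAD  },
};

static void build_classes(void)
{
    int c;

    for (c = 0; c < 256; c++)
        ec[c] = EC_OTHER;
    for (c = '0'; c <= '9'; c++)
        ec[c] = EC_DIGIT;
    for (c = 'a'; c <= 'z'; c++)    /* contiguous in ASCII (assumed) */
        ec[c] = EC_LETTER;
    for (c = 'A'; c <= 'Z'; c++)
        ec[c] = EC_LETTER;
    ec_ready = 1;
}

/* Length of the longest identifier at the start of s, or 0 if none. */
size_t match_identifier(const char *s)
{
    size_t i, longest = 0;
    int state = S_START;

    assert(s != NULL);
    if (!ec_ready)
        build_classes();
    for (i = 0; s[i] != '\0'; i++) {
        /* The extra ec[] look-up is the per-character cost of classes. */
        state = next_state[state][ec[(unsigned char)s[i]]];
        if (state == S_DEAD)
            break;
        if (state == S_IDENT)
            longest = i + 1;
    }
    return longest;
}
```

Here the transition table holds 9 entries rather than 768, at the price of one extra array index per input character, which is the trade-off the -Ce description refers to.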
                    329: .Sh SUMMARY OF LEX REGULAR EXPRESSIONS
                    330: The patterns in the input are written using an extended set of regular
                    331: expressions.  These are:
                    332: .Pp
                    333: .Dw 8n
                    334: .Di L
                    335: .Dp Li x
                    336: match the character 'x'
                    337: .Dp Li \&.
                    338: any character except newline
                    339: .Dp Op Li xyz
                    340: a "character class"; in this case, the pattern
                    341: matches either an 'x', a 'y', or a 'z'
                    342: .Dp Op Li abj-oZ
                    343: a "character class" with a range in it; matches
                    344: an 'a', a 'b', any letter from 'j' through 'o',
                    345: or a 'Z'
                    346: .Dp Op \&Li ^A-Z
                    347: a "negated character class", i.e., any character
                    348: but those in the class.  In this case, any
                    349: character EXCEPT an uppercase letter.
                    350: .Dp Op \&Li ^A-Z\en
                    351: any character EXCEPT an uppercase letter or
                    352: a newline
                    353: .Dp Li r*
                    354: zero or more r's, where r is any regular expression
                    355: .Dp Li r+
                    356: one or more r's
                    357: .Dp Li r?
                    358: zero or one r's (that is, "an optional r")
                    359: .Dp Li r{2,5}
                    360: anywhere from two to five r's
                    361: .Dp Li r{2,}
                    362: two or more r's
                    363: .Dp Li r{4}
                    364: exactly 4 r's
                    365: .Dp Li {name}
                    366: the expansion of the "name" definition
                    367: (see above)
                    368: .Dc Op Li xyz
                    369: .Li \&\e"foo"
                    370: .Cx
                    371: the literal string:
                    372: [xyz]"foo
                    373: .Dp Li \&\eX
                    374: if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                    375: then the ANSI-C interpretation of \ex.
                    376: Otherwise, a literal 'X' (used to escape
                    377: operators such as '*')
                    378: .Dp Li \&\e123
                    379: the character with octal value 123
                    380: .Dp Li \&\ex2a
                    381: the character with hexadecimal value 2a
                    382: .Dp Li (r)
                    383: match an r; parentheses are used to override
                    384: precedence (see below)
                    385: .Dp Li rs
                    386: the regular expression r followed by the
                    387: regular expression s; called "concatenation"
                    388: .Dp Li r|s
                    389: either an r or an s
                    390: .Dp Li r/s
                    391: an r but only if it is followed by an s.  The
                    392: s is not part of the matched text.  This type
                    393: of pattern is called "trailing context".
                    394: .Dp Li \&^r
                    395: an r, but only at the beginning of a line
                    396: .Dp Li r$
                    397: an r, but only at the end of a line.  Equivalent
                    398: to "r/\en".
                    399: .Dp Li <s>r
                    400: an r, but only in start condition s (see
                    401: below for discussion of start conditions)
                    402: .Dp Li <s1,s2,s3>r
                    403: same, but in any of start conditions s1,
                    404: s2, or s3
                    405: .Dp Li <<EOF>>
                    406: an end-of-file
                    407: .Dp Li <s1,s2><<EOF>>
                    408: an end-of-file when in start condition s1 or s2
                    409: .Dp
                    410: The regular expressions listed above are grouped according to
                    411: precedence, from highest precedence at the top to lowest at the bottom.
                    412: Those grouped together have equal precedence.
                    413: .Pp
                    414: Some notes on patterns:
                    415: .Pp
                    416: Negated character classes
                    417: .Ar match newlines
                    418: unless "\en" (or an equivalent escape sequence) is one of the
                    419: characters explicitly present in the negated character class
                    420: (e.g., " [^A-Z\en] ").
                    421: .Pp
                    422: A rule can have at most one instance of trailing context (the '/' operator
                    423: or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
                    424: can only occur at the beginning of a pattern and, like '/' and '$',
                    425: cannot be grouped inside parentheses.  The following are all illegal:
                    426: .Pp
                    427: .Ds C
                    428: foo/bar$
                    429: foo(bar$)
                    430: foo^bar
                    431: <sc1>foo<sc2>bar
                    432: .De
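A few of the operators above can be tried out without running lex at all. The sketch below uses the POSIX regcomp()/regexec() routines purely as a stand-in: lex builds its own automata, and only the bracket-expression and interval syntax shown here is assumed to overlap. Trailing context, start conditions, <<EOF>>, {name} expansion and quoted strings are lex-only and are not covered. The helper name whole_match is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns 1 if the whole of text matches pattern, 0 if it does not,
 * and -1 if the pattern is too long or does not compile.
 */
int whole_match(const char *pattern, const char *text)
{
    char anchored[128];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);
    /* Anchor at both ends so that only a complete match counts. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    /* REG_NEWLINE is deliberately not set: newline is an ordinary character. */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, whole_match("[^A-Z]", "\n") succeeds while whole_match("[^A-Z\n]", "\n") does not, which mirrors the note that a negated character class matches a newline unless the newline is listed explicitly. Note that the "\n" inside the second pattern is turned into a real newline by the C compiler; POSIX bracket expressions do not interpret backslash escapes themselves.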
                    433: .Sh SUMMARY OF SPECIAL ACTIONS
                    434: In addition to arbitrary C code, the following can appear in actions:
                    435: .Tw Fl
                    436: .Tp Ic ECHO
                    437: Copies
                    438: .Va yytext
                    439: to the scanner's output.
                    440: .Tp Ic BEGIN
                    441: Followed by the name of a start condition places the scanner in the
                    442: corresponding start condition.
                    443: .Tp Ic REJECT
                    444: Directs the scanner to proceed on to the "second best" rule which matched the
                    445: input (or a prefix of the input).
                    446: .Va yytext
                    447: and
                    448: .Va yyleng
                    449: are set up appropriately.  Note that
                    450: .Ic REJECT
                    451: is a particularly expensive feature in terms of scanner performance;
                    452: if it is used in
                    453: .Em any
                    454: of the scanner's actions it will slow down
                    455: .Em all
                    456: of the scanner's matching.  Furthermore,
                    457: .Ic REJECT
                    458: cannot be used with the
                    459: .Fl f
                    460: or
                    461: .Fl F
                    462: options.
                    463: .Pp
                    464: Note also that unlike the other special actions,
                    465: .Ic REJECT
                    466: is a
                    467: .Em branch ;
                    468: code immediately following it in the action will
                    469: .Em not
                    470: be executed.
                    471: .Tp Fn yymore
                    472: tells the scanner that the next time it matches a rule, the corresponding
                    473: token should be
                    474: .Em appended
                    475: onto the current value of
                    476: .Va yytext
                    477: rather than replacing it.
                    478: .Tp Fn yyless \&n
                    479: returns all but the first
                    480: .Ar n
                    481: characters of the current token back to the input stream, where they
                    482: will be rescanned when the scanner looks for the next match.
                    483: .Va yytext
                    484: and
                    485: .Va yyleng
                    486: are adjusted appropriately (e.g.,
                    487: .Va yyleng
                    488: will now be equal to
                    489: .Ar n ) .
                    490: .Tp Fn unput c
                    491: puts the character
                    492: .Ar c
                    493: back onto the input stream.  It will be the next character scanned.
                    494: .Tp Fn input
                    495: reads the next character from the input stream (this routine is called
                    496: .Fn yyinput
                    497: if the scanner is compiled using
                    498: .Em C \&+\&+ ) .
                    499: .Tp Fn yyterminate
                    500: can be used in lieu of a return statement in an action.  It terminates
                    501: the scanner and returns a 0 to the scanner's caller, indicating "all done".
                    502: .Pp
                    503: By default,
                    504: .Fn yyterminate
                    505: is also called when an end-of-file is encountered.  It is a macro and
                    506: may be redefined.
                    507: .Tp Ic YY_NEW_FILE
                    508: is an action available only in <<EOF>> rules.  It means "Okay, I've
                    509: set up a new input file, continue scanning".
                    510: .Tp Fn yy_create_buffer file size
                    511: takes a
                    512: .Ic FILE
                    513: pointer and an integer
                    514: .Ar size .
                    515: It returns a YY_BUFFER_STATE
                    516: handle to a new input buffer large enough to accommodate
                    517: .Ar size
                    518: characters and associated with the given file.  When in doubt, use
                    519: .Ar YY_BUF_SIZE
                    520: for the size.
                    521: .Tp Fn yy_switch_to_buffer new_buffer
                    522: switches the scanner's processing to scan for tokens from
                    523: the given buffer, which must be a YY_BUFFER_STATE.
                    524: .Tp Fn yy_delete_buffer buffer
                    525: deletes the given buffer.
                    526: .Tp
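The effect of yyless(n) and yymore() on yytext and yyleng, as described above, can be followed in a toy model. This is only a sketch of the documented behaviour under simplifying assumptions (the whole input is one string and a "match" is just a given number of characters); it is not how the generated scanner is implemented, and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The current token is input[start .. end). */
struct toy_scanner {
    const char *input;
    size_t start, end;
    int more;           /* set by toy_yymore(), cleared by the next match */
};

/* Pretend the next rule matched len characters. */
void toy_accept(struct toy_scanner *s, size_t len)
{
    if (!s->more)
        s->start = s->end;  /* normal case: the new token replaces the old */
    s->end += len;          /* after yymore(): the new text is appended    */
    s->more = 0;
}

size_t toy_yyleng(const struct toy_scanner *s)
{
    return s->end - s->start;
}

/* Keep the first n characters; the rest will be scanned again. */
void toy_yyless(struct toy_scanner *s, size_t n)
{
    assert(n <= toy_yyleng(s));
    s->end = s->start + n;
}

void toy_yymore(struct toy_scanner *s)
{
    s->more = 1;
}

/* Copy the current token into out as a NUL-terminated string. */
void toy_yytext(const struct toy_scanner *s, char *out, size_t outsz)
{
    size_t n = toy_yyleng(s);

    assert(outsz > n);
    memcpy(out, s->input + s->start, n);
    out[n] = '\0';
}
```

Starting from the input "foobarbaz", accepting six characters gives the token "foobar"; toy_yyless(&s, 3) shrinks it to "foo" and leaves "bar" to be matched next; calling toy_yymore(&s) before accepting three more characters then yields "barbaz" with a length of 6.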
                    527: .Sh \&VALUES\ AVAILABLE\ TO THE USER
                    528: .Tw Fl
                    529: .Tp Va \&char \&*yytext
                    530: holds the text of the current token.  It may not be modified.
                    531: .Tp Va \&int yyleng
                    532: holds the length of the current token.  It may not be modified.
                    533: .Tp Va FILE  \&*yyin
                    534: is the file which by default
                    535: .Nm lex
                    536: reads from.  It may be redefined but doing so only makes sense before
                    537: scanning begins.  Changing it in the middle of scanning will have
                    538: unexpected results since
                    539: .Nm lex
                    540: buffers its input.  Once scanning terminates because an end-of-file
                    541: has been seen,
                    542: .Fn void\ yyrestart FILE\ *new_file
                    543: may be called to point
                    544: .Va yyin
                    545: at the new input file.
                    546: .Tp Va FILE  \&*yyout
                    547: is the file to which
                    548: .Ar ECHO
                    549: actions are done.  It can be reassigned by the user.
                    550: .Tp Va YY_CURRENT_BUFFER
                    551: returns a
                    552: YY_BUFFER_STATE
                    553: handle to the current buffer.
                    554: .Tp
                    555: .Sh MACROS THE USER CAN REDEFINE
                    556: .Tw Fl
                    557: .Tp Va YY_DECL
                    558: controls how the scanning routine is declared.
                    559: By default, it is "int yylex()", or, if prototypes are being
                    560: used, "int yylex(void)".  This definition may be changed by redefining
                    561: the "YY_DECL" macro.  Note that
                    562: if you give arguments to the scanning routine using a
                    563: K&R-style/non-prototyped function declaration, you must terminate
                    564: the definition with a semi-colon (;).
                    565: .Tp Va YY_INPUT
                    566: The nature of how the scanner
                    567: gets its input can be controlled by redefining the
                    568: YY_INPUT
                    569: macro.
                    570: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
                    571: action is to place up to
                    572: .Ar max_size
                    573: characters in the character array
                    574: .Ar buf
                    575: and return in the integer variable
                    576: .Ar result
                    577: either the
                    578: number of characters read or the constant YY_NULL (0 on Unix systems)
                    579: to indicate EOF.  The default YY_INPUT reads from the
                    580: global file-pointer "yyin".
                    581: A sample redefinition of YY_INPUT (in the definitions
                    582: section of the input file):
                    583: .Pp
                    584: .Ds I
                    585: %{
                    586: #undef YY_INPUT
                    587: #define YY_INPUT(buf,result,max_size) \\
                    588:     result = ((buf[0] = getchar()) == EOF) ? YY_NULL : 1;
                    589: %}
                    590: .De
                    591: .Pp
                    592: When the scanner receives an end-of-file indication from YY_INPUT,
                    593: it then checks the
                    594: .Fn yywrap
                    595: function.  If
                    596: .Fn yywrap
                    597: returns false (zero), then it is assumed that the
                    598: function has gone ahead and set up
                    599: .Va yyin
                    600: to point to another input file, and scanning continues.  If it returns
                    601: true (non-zero), then the scanner terminates, returning 0 to its
                    602: caller.
                    603: .Tp Va yywrap
                    604: The default
                    605: .Fn yywrap
                    606: always returns 1.  Presently, to redefine it you must first
                    607: "#undef yywrap", as it is currently implemented as a macro.  It is
                    608: likely that
                    609: .Fn yywrap
                    610: will soon be defined to be a function rather than a macro.
                    611: .Tp Va YY_USER_ACTION
                    612: can be redefined to provide an action
                    613: which is always executed prior to the matched rule's action.
                    614: .Tp Va YY_USER_INIT
                    615: The macro
                    616: .Va YY_USER_INIT
                    617: may be redefined to provide an action which is always executed before
                    618: the first scan.
                    619: .Tp Va YY_BREAK
                    620: In the generated scanner, the actions are all gathered in one large
                    621: switch statement and separated using
                    622: .Va YY_BREAK ,
                    623: which may be redefined.  By default, it is simply a "break", to separate
                    624: each rule's action from the following rule's.
                    625: .Tp
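                         .Pp
                         As a minimal sketch of the
                         .Fn yywrap
                         behavior described above, and assuming the current macro implementation,
                         the following input continues scanning with further input files.
                         The variable
                         .Va files
                         and the single rule are hypothetical;
                         .Va files
                         is assumed to be pointed at a NULL-terminated list of file names
                         by code in the same input file before the first scan.
                         Files that cannot be opened are silently skipped.
                         .Pp
                         .Ds I
                         %{
                         #include <stdio.h>
                         #undef yywrap
                         int yywrap();
                         static char **files;    /* hypothetical: remaining input file names */
                         %}
                         %%
                         [a-z]+    ECHO;
                         %%
                         int yywrap()
                         {
                             while (files != NULL && *files != NULL) {
                                 FILE *fp = fopen(*files++, "r");
                                 if (fp != NULL) {
                                     yyin = fp;  /* scanning continues with this file */
                                     return 0;
                                 }
                             }
                             return 1;           /* no more input; the scanner terminates */
                         }
                         .De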
                    626: .Sh FILES
                    627: .Dw lex.backtrack
                    628: .Di L
                    629: .Dp Pa lex.skel
                    630: skeleton scanner.
                    631: .Dp Pa lex.yy.c
                    632: generated scanner
                    633: (called
                    634: .Pa lexyy.c
                    635: on some systems).
                    636: .Dp Pa lex.backtrack
                    637: backtracking information for
                    638: .Fl b
                    639: flag
                    640: (called
                    641: .Pa lex.bck
                    642: on some systems).
                    643: .Dp
                    644: .Sh SEE ALSO
                    645: .Xr lex 1 ,
                    646: .Xr yacc 1 ,
                    647: .Xr sed 1 ,
                    648: .Xr awk 1 .
                    649: .br
                    650: .Em lexdoc
                    651: .br
                    652: M.
                    653: E.
                    654: Lesk and E.
                    655: Schmidt,
                    656: .Em LEX \- Lexical Analyzer Generator
                    657: .Sh DIAGNOSTICS
                    658: .Tw Fl
                    659: .Tp Li reject_used_but_not_detected undefined
                    660: or
                    661: .Tp Li yymore_used_but_not_detected undefined
                    662: These errors can occur at compile time.
                    663: They indicate that the
                    664: scanner uses
                    665: .Ic REJECT
                    666: or
                    667: .Fn yymore
                    668: but that
                    669: .Nm lex
                    670: failed to notice the fact,
                    671: meaning that
                    672: .Nm lex
                    673: scanned the first two sections looking for occurrences of these actions
                    674: and failed to find any,
                    675: but somehow you snuck some in (via a #include
                    676: file,
                    677: for example).
                    678: Make an explicit reference to the action in your
                    679: .Nm lex
                    680: input file.
                    681: Note that previously
                    682: .Nm lex
                    683: supported a
                    684: .Li %used/%unused
                    685: mechanism for dealing with this problem;
                    686: this feature is still supported
                    687: but now deprecated,
                    688: and will go away soon unless the author hears from
                    689: people who can argue compellingly that they need it.
                    690: .Tp Li lex scanner jammed
                    691: a scanner compiled with
                    692: .Fl s
                    693: has encountered an input string which wasn't matched by
                    694: any of its rules.
                    695: .Tp Li lex input buffer overflowed
                    696: a scanner rule matched a string long enough to overflow the
                    697: scanner's internal input buffer (16K bytes, controlled by
                    698: .Va YY_BUF_MAX
                    699: in
                    700: .Pa lex.skel ) .
                    701: .Tp Li scanner requires  \&\-8 flag
                    702: Your scanner specification includes recognizing 8-bit characters and
                    703: you did not specify the -8 flag (and your site has not installed lex
                    704: with -8 as the default).
                    705: .Tp Li too many  \&%t classes!
                    706: You managed to put every single character into its own %t class.
                    707: .Nm Lex
                    708: requires that at least one of the classes share characters.
                    709: .Tp
                    710: .Sh HISTORY
                    711: A
                    712: .Nm lex
                    713: appeared in Version 6 AT&T Unix.
                    714: The version this man page describes is
                    715: derived from code contributed by Vern Paxson.
                    716: .Sh AUTHOR
                    717: Vern Paxson, with the help of many ideas and much inspiration from
                    718: Van Jacobson.  Original version by Jef Poskanzer.
                    719: .Pp
                    720: See
                    721: .Em Lexdoc
                    722: for additional credits and the address to send comments to.
                    723: .Sh BUGS
                    724: .Pp
                    725: Some trailing context
                    726: patterns cannot be properly matched and generate
                    727: warning messages ("Dangerous trailing context").  These are
                    728: patterns where the ending of the
                    729: first part of the rule matches the beginning of the second
                    730: part, such as "zx*/xy*", where the 'x*' matches the 'x' at
                    731: the beginning of the trailing context.  (Note that the POSIX draft
                    732: states that the text matched by such patterns is undefined.)
                    733: .Pp
                    734: For some trailing context rules, parts which are actually fixed-length are
                    735: not recognized as such, leading to the above-mentioned performance loss.
                    736: In particular, parts using '\&|' or {n} (such as "foo{3}") are always
                    737: considered variable-length.
                    738: .Pp
                    739: Combining trailing context with the special '\&|' action can result in
                    740: .Em fixed
                    741: trailing context being turned into the more expensive
                    742: .Em variable
                    743: trailing context.  This happens in the following example:
                    744: .Pp
                    745: .Ds C
                    746: %%
                    747: abc  \&|
                    748: xyz/def
                    749: .De
                    750: .Pp
                    751: Use of
                    752: .Fn unput
                    753: invalidates yytext and yyleng.
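                         .Pp
                         One way to cope with this, sketched here under stated assumptions,
                         is to copy the matched text before the first call to
                         .Fn unput .
                         The rule, the buffer
                         .Va word
                         and its size limit are hypothetical.
                         The pushed-back colon is rescanned as ordinary input.
                         .Pp
                         .Ds I
                         %{
                         #include <string.h>
                         %}
                         %%
                         [a-z]+:   {
                                   char word[128];            /* hypothetical size limit */
                                   int n = yyleng - 1;        /* length without the colon */
                                   if (n > 127)
                                       n = 127;
                                   strncpy(word, yytext, n);  /* copy before unput() */
                                   word[n] = 0;
                                   unput(':');                /* yytext, yyleng now invalid */
                                   puts(word);
                                   }
                         .De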
                    754: .Pp
                    755: Use of
                    756: .Fn unput
                    757: to push back more text than was matched can
                    758: result in the pushed-back text matching a beginning-of-line ('^')
                    759: rule even though it didn't come at the beginning of the line
                    760: (though this is rare!).
                    761: .Pp
                    762: Pattern-matching of NUL's is substantially slower than matching other
                    763: characters.
                    764: .Pp
                    765: .Nm Lex
                    766: does not generate correct #line directives for code internal
                    767: to the scanner; thus, bugs in
                    768: .Pa lex.skel
                    769: yield bogus line numbers.
                    770: .Pp
                    771: Due to both buffering of input and read-ahead, you cannot intermix
                    772: calls to <stdio.h> routines, such as
                    773: .Fn getchar ,
                    774: with
                    775: .Nm lex
                    776: rules and expect it to work.  Call
                    777: .Fn input
                    778: instead.
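                         .Pp
                         As an illustration, an action that needs to consume characters itself
                         might be sketched as follows.
                         The rule is hypothetical; it skips a C-style comment using
                         .Fn input
                         only, and assumes that
                         .Fn input
                         returns EOF at the end of the input.
                         .Pp
                         .Ds I
                         %%
                         "/*"      {
                                   int c;
                                   for (;;) {
                                       while ((c = input()) != '*' && c != EOF)
                                           ;            /* skip the text of the comment */
                                       if (c == '*') {
                                           while ((c = input()) == '*')
                                               ;
                                           if (c == '/')
                                               break;   /* found the end of the comment */
                                       }
                                       if (c == EOF)
                                           break;       /* unterminated comment */
                                   }
                                   }
                         .De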
                    779: .Pp
                    780: The total number of table entries listed by the
                    781: .Fl v
                    782: flag excludes the table entries needed to determine
                    783: what rule has been matched.  The number of entries is equal
                    784: to the number of DFA states if the scanner does not use
                    785: .Ic REJECT ,
                    786: and somewhat greater than the number of states if it does.
                    787: .Pp
                    788: .Ic REJECT
                    789: cannot be used with the
                    790: .Fl f
                    791: or
                    792: .Fl F
                    793: options.
                    794: .Pp
                    795: Some of the macros, such as
                    796: .Fn yywrap ,
                    797: may in the future become functions which live in the
                    798: .Fl lfl
                    799: library.  This will doubtless break a lot of code, but may be
                    800: required for POSIX-compliance.
                    801: .Pp
                    802: The
                    803: .Nm lex
                    804: internal algorithms need documentation.