43BSDReno/pgrm/lex/lex.1 - annotate

Annotation of 43BSDReno/pgrm/lex/lex.1, revision 1.1

1.1     ! root        1: .\" Copyright (c) 1990 The Regents of the University of California.
        !             2: .\" All rights reserved.
        !             3: .\"
        !             4: .\" Redistribution and use in source and binary forms are permitted provided
        !             5: .\" that: (1) source distributions retain this entire copyright notice and
        !             6: .\" comment, and (2) distributions including binaries display the following
        !             7: .\" acknowledgement:  ``This product includes software developed by the
        !             8: .\" University of California, Berkeley and its contributors'' in the
        !             9: .\" documentation or other materials provided with the distribution and in
        !            10: .\" all advertising materials mentioning features or use of this software.
        !            11: .\" Neither the name of the University nor the names of its contributors may
        !            12: .\" be used to endorse or promote products derived from this software without
        !            13: .\" specific prior written permission.
        !            14: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
        !            15: .\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
        !            16: .\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
        !            17: .\"
        !            18: .\"     @(#)lex.1      5.10 (Berkeley) 7/24/90
        !            19: .\"
        !            20: .Dd July 24, 1990
        !            21: .Dt LEX 1
        !            22: .Sh NAME
        !            23: .Nm lex
        !            24: .Nd fast lexical analyzer generator
        !            25: .Sh SYNOPSIS
        !            26: .Nm lex
        !            27: .Ob
        !            28: .Op Fl bcdfinpstvFILT8
        !            29: .Cx Fl C
        !            30: .Op efmF
        !            31: .Cx
        !            32: .Cx Fl S
        !            33: .Ar skeleton
        !            34: .Cx
        !            35: .Oe
        !            36: .Nm lex
        !            37: .Ar
        !            38: .Sh DESCRIPTION
        !            39: .Nm Lex
        !            40: is a tool for generating
        !            41: .Ar scanners :
        !            42: programs which recognize lexical patterns in text.
        !            43: .Nm Lex
        !            44: reads
        !            45: the given input files, or its standard input if no file names are given,
        !            46: for a description of a scanner to generate.  The description is in
        !            47: the form of pairs
        !            48: of regular expressions and C code, called
        !            49: .Em rules .
        !            50: .Nm Lex
        !            51: generates as output a C source file,
        !            52: .Pa lex.yy.c ,
        !            53: which defines a routine
        !            54: .Fn yylex .
        !            55: This file is compiled and linked with the
        !            56: .Fl lfl
        !            57: library to produce an executable.  When the executable is run,
        !            58: it analyzes its input for occurrences
        !            59: of the regular expressions.  Whenever it finds one, it executes
        !            60: the corresponding C code.
        !            61: .Pp
        !            62: For full documentation, see
        !            63: .Em Lexdoc .
        !            64: This manual entry is intended for use as a quick reference.
        !            65: .Sh OPTIONS
        !            66: .Nm Lex
        !            67: has the following options:
        !            68: .Tw Ds
        !            69: .Tp Fl b
        !            70: Generate backtracking information to
        !            71: .Va lex.backtrack .
        !            72: This is a list of scanner states which require backtracking
        !            73: and the input characters on which they do so.  By adding rules one
        !            74: can remove backtracking states.  If all backtracking states
        !            75: are eliminated and
        !            76: .Fl f
        !            77: or
        !            78: .Fl F
        !            79: is used, the generated scanner will run faster.
        !            80: .Tp Fl c
        !            81: is a do-nothing, deprecated option included for POSIX compliance.
        !            82: .Pp
        !            83: .Ar NOTE :
        !            84: in previous releases of
        !            85: .Nm Lex
        !            86: .Op Fl c
        !            87: specified table-compression options.  This functionality is
        !            88: now given by the
        !            89: .Fl C
        !            90: flag.  To ease the impact of this change, when
        !            91: .Nm lex
        !            92: encounters
        !            93: .Fl c ,
        !            94: it currently issues a warning message and assumes that
        !            95: .Fl C
        !            96: was desired instead.  In the future this "promotion" of
        !            97: .Fl c
        !            98: to
        !            99: .Fl C
        !           100: will go away in the name of full POSIX compliance (unless
        !           101: the POSIX meaning is removed first).
        !           102: .Tp Fl d
        !           103: makes the generated scanner run in
        !           104: .Ar debug
        !           105: mode.  Whenever a pattern is recognized and the global
        !           106: .Va yy_flex_debug
        !           107: is non-zero (which is the default), the scanner will
        !           108: write to
        !           109: .Li stderr
        !           110: a line of the form:
        !           111: .Pp
        !           112: .Dl --accepting rule at line 53 ("the matched text")
        !           113: .Pp
        !           114: The line number refers to the location of the rule in the file
        !           115: defining the scanner (i.e., the file that was fed to lex).  Messages
        !           116: are also generated when the scanner backtracks, accepts the
        !           117: default rule, reaches the end of its input buffer (or encounters
        !           118: a NUL; the two look the same as far as the scanner's concerned),
        !           119: or reaches an end-of-file.
        !           120: .Tp Fl f
        !           121: specifies (take your pick)
        !           122: .Em full table
        !           123: or
        !           124: .Em fast scanner .
        !           125: No table compression is done.  The result is large but fast.
        !           126: This option is equivalent to
        !           127: .Fl Cf
        !           128: (see below).
        !           129: .Tp Fl i
        !           130: instructs
        !           131: .Nm lex
        !           132: to generate a
        !           133: .Em case-insensitive
        !           134: scanner.  The case of letters given in the
        !           135: .Nm lex
        !           136: input patterns will
        !           137: be ignored, and tokens in the input will be matched regardless of case.  The
        !           138: matched text given in
        !           139: .Va yytext
        !           140: will have the preserved case (i.e., it will not be folded).
        !           141: .Tp Fl n
        !           142: is another do-nothing, deprecated option included only for
        !           143: POSIX compliance.
        !           144: .Tp Fl p
        !           145: generates a performance report to stderr.  The report
        !           146: consists of comments regarding features of the
        !           147: .Nm lex
        !           148: input file which will cause a loss of performance in the resulting scanner.
        !           149: .Tp Fl s
        !           150: causes the
        !           151: .Ar default rule
        !           152: (that unmatched scanner input is echoed to
        !           153: .Ar stdout )
        !           154: to be suppressed.  If the scanner encounters input that does not
        !           155: match any of its rules, it aborts with an error.
        !           156: .Tp Fl t
        !           157: instructs
        !           158: .Nm lex
        !           159: to write the scanner it generates to standard output instead
        !           160: of
        !           161: .Pa lex.yy.c .
        !           162: .Tp Fl v
        !           163: specifies that
        !           164: .Nm lex
        !           165: should write to
        !           166: .Li stderr
        !           167: a summary of statistics regarding the scanner it generates.
        !           168: .Tp Fl F
        !           169: specifies that the
        !           170: .Em fast
        !           171: scanner table representation should be used.  This representation is
        !           172: about as fast as the full table representation
        !           173: .Pq Fl f ,
        !           174: and for some sets of patterns will be considerably smaller (and for
        !           175: others, larger).  See
        !           176: .Em Lexdoc
        !           177: for details.
        !           178: .Pp
        !           179: This option is equivalent to
        !           180: .Fl CF
        !           181: (see below).
        !           182: .Tp Fl I
        !           183: instructs
        !           184: .Nm lex
        !           185: to generate an
        !           186: .Em interactive
        !           187: scanner, that is, a scanner which stops immediately rather than
        !           188: looking ahead if it knows
        !           189: that the currently scanned text cannot be part of a longer rule's match.
        !           190: Again, see
        !           191: .Em Lexdoc
        !           192: for details.
        !           193: .Pp
        !           194: Note,
        !           195: .Fl I
        !           196: cannot be used in conjunction with
        !           197: .Em full
        !           198: or
        !           199: .Em fast tables ,
        !           200: i.e., the
        !           201: .Fl f , F , Cf ,
        !           202: or
        !           203: .Fl CF
        !           204: flags.
        !           205: .Tp Fl L
        !           206: instructs
        !           207: .Nm lex
        !           208: not to generate
        !           209: .Li #line
        !           210: directives in
        !           211: .Pa lex.yy.c .
        !           212: The default is to generate such directives so error
        !           213: messages in the actions will be correctly
        !           214: located with respect to the original
        !           215: .Nm lex
        !           216: input file, and not to
        !           217: the fairly meaningless line numbers of
        !           218: .Pa lex.yy.c .
        !           219: .Tp Fl T
        !           220: makes
        !           221: .Nm lex
        !           222: run in
        !           223: .Em trace
        !           224: mode.  It will generate a lot of messages to
        !           225: .Li stdout
        !           226: concerning
        !           227: the form of the input and the resultant non-deterministic and deterministic
        !           228: finite automata.  This option is mostly for use in maintaining
        !           229: .Nm lex .
        !           230: .Tp Fl 8
        !           231: instructs
        !           232: .Nm lex
        !           233: to generate an 8-bit scanner.
        !           234: On some sites, this is the default.  On others, the default
        !           235: is 7-bit characters.  To see which is the case, check the verbose
        !           236: .Pq Fl v
        !           237: output for "equivalence classes created".  If the denominator of
        !           238: the number shown is 128, then by default
        !           239: .Nm lex
        !           240: is generating 7-bit characters.  If it is 256, then the default is
        !           241: 8-bit characters.
        !           242: .Tc Fl C
        !           243: .Op Cm efmF
        !           244: .Cx
        !           245: controls the degree of table compression. The default setting is
        !           246: .Fl Cem .
        !           247: .Pp
        !           248: .Tw Ds
        !           249: .Tp Fl C
        !           250: A lone
        !           251: .Fl C
        !           252: specifies that the scanner tables should be compressed but neither
        !           253: equivalence classes nor meta-equivalence classes should be used.
        !           254: .Tp Fl \&Ce
        !           255: directs
        !           256: .Nm lex
        !           257: to construct
        !           258: .Em equivalence classes ,
        !           259: i.e., sets of characters
        !           260: which have identical lexical properties.
        !           261: Equivalence classes usually give
        !           262: dramatic reductions in the final table/object file sizes (typically
        !           263: a factor of 2-5) and are pretty cheap performance-wise (one array
        !           264: look-up per character scanned).
        !           265: .Tp Fl \&Cf
        !           266: specifies that the
        !           267: .Em full
        !           268: scanner tables should be generated -
        !           269: .Nm lex
        !           270: should not compress the
        !           271: tables by taking advantage of similar transition functions for
        !           272: different states.
        !           273: .Tp Fl \&CF
        !           274: specifies that the alternate fast scanner representation (described in
        !           275: .Em Lexdoc )
        !           276: should be used.
        !           277: .Tp Fl \&Cm
        !           278: directs
        !           279: .Nm lex
        !           280: to construct
        !           281: .Em meta-equivalence classes ,
        !           282: which are sets of equivalence classes (or characters, if equivalence
        !           283: classes are not being used) that are commonly used together.  Meta-equivalence
        !           284: classes are often a big win when using compressed tables, but they
        !           285: have a moderate performance impact (one or two "if" tests and one
        !           286: array look-up per character scanned).
        !           287: .Tp Fl Cem
        !           288: (default)
        !           289: Generate both equivalence classes
        !           290: and meta-equivalence classes.  This setting provides the highest
        !           291: degree of table compression.
        !           292: .Tp
        !           293: .Pp
        !           294: Faster-executing scanners can be had at the cost of larger tables,
        !           295: with the following generally being true:
        !           296: .Pp
        !           297: .Ds C
        !           298: slowest & smallest
        !           299:       -Cem
        !           300:       -Cm
        !           301:       -Ce
        !           302:       -C
        !           303:       -C{f,F}e
        !           304:       -C{f,F}
        !           305: fastest & largest
        !           306: .De
        !           307: .Pp
        !           308: .Fl C
        !           309: options are not cumulative; whenever the flag is encountered, the
        !           310: previous -C settings are forgotten.
        !           311: .Pp
        !           312: The options
        !           313: .Fl \&Cf
        !           314: or
        !           315: .Fl \&CF
        !           316: and
        !           317: .Fl \&Cm
        !           318: do not make sense together - there is no opportunity for meta-equivalence
        !           319: classes if the table is not being compressed.  Otherwise the options
        !           320: may be freely mixed.
        !           321: .Tc Fl S
        !           322: .Ar skeleton_file
        !           323: .Cx
        !           324: overrides the default skeleton file from which
        !           325: .Nm lex
        !           326: constructs its scanners.  Useful for
        !           327: .Nm lex
        !           328: maintenance or development.
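The equivalence classes built by -Ce can be pictured with a small C sketch. It is not the algorithm lex itself uses, only an illustration under simple assumptions: the alphabet is 7-bit, the scanner's character sets are given as plain strings of their members, and there are at most 32 of them. The name build_ec and its input format are hypothetical.

```c
#include <string.h>

#define NCHARS 128   /* 7-bit alphabet, as in a 7-bit scanner */

/* Give every character an equivalence-class number: two characters share a
 * class when they belong to exactly the same character sets.  sets[] holds
 * nsets strings (nsets <= 32), each listing the members of one set.
 * Returns the number of classes created. */
int build_ec(const char *const sets[], int nsets, int ec[NCHARS])
{
    unsigned sig[NCHARS];   /* bit s is set when the character is in sets[s] */
    int nclasses = 0;

    for (int c = 0; c < NCHARS; c++) {
        sig[c] = 0;
        for (int s = 0; s < nsets; s++)
            /* strchr() would find the terminating NUL, so skip c == 0 */
            if (c != 0 && strchr(sets[s], c) != NULL)
                sig[c] |= 1u << s;
    }
    for (int c = 0; c < NCHARS; c++) {
        ec[c] = -1;
        for (int p = 0; p < c; p++)      /* reuse the class of an earlier twin */
            if (sig[p] == sig[c]) {
                ec[c] = ec[p];
                break;
            }
        if (ec[c] < 0)
            ec[c] = nclasses++;          /* first character of a new class */
    }
    return nclasses;
}
```

With the two sets "0123456789" and "abcdef" the sketch yields three classes (digits, the letters a through f, and everything else), so a transition table indexed by class needs 3 columns rather than 128, at the price of one array look-up per character scanned.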
        !           329: .Sh SUMMARY OF LEX REGULAR EXPRESSIONS
        !           330: The patterns in the input are written using an extended set of regular
        !           331: expressions.  These are:
        !           332: .Pp
        !           333: .Dw 8n
        !           334: .Di L
        !           335: .Dp Li x
        !           336: match the character 'x'
        !           337: .Dp Li \&.
        !           338: any character except newline
        !           339: .Dp Op Li xyz
        !           340: a "character class"; in this case, the pattern
        !           341: matches either an 'x', a 'y', or a 'z'
        !           342: .Dp Op Li abj-oZ
        !           343: a "character class" with a range in it; matches
        !           344: an 'a', a 'b', any letter from 'j' through 'o',
        !           345: or a 'Z'
        !           346: .Dp Op \&Li ^A-Z
        !           347: a "negated character class", i.e., any character
        !           348: but those in the class.  In this case, any
        !           349: character EXCEPT an uppercase letter.
        !           350: .Dp Op \&Li ^A-Z\en
        !           351: any character EXCEPT an uppercase letter or
        !           352: a newline
        !           353: .Dp Li r*
        !           354: zero or more r's, where r is any regular expression
        !           355: .Dp Li r+
        !           356: one or more r's
        !           357: .Dp Li r?
        !           358: zero or one r's (that is, "an optional r")
        !           359: .Dp Li r{2,5}
        !           360: anywhere from two to five r's
        !           361: .Dp Li r{2,}
        !           362: two or more r's
        !           363: .Dp Li r{4}
        !           364: exactly 4 r's
        !           365: .Dp Li {name}
        !           366: the expansion of the "name" definition
        !           367: (see above)
        !           368: .Dc Op Li xyz
        !           369: .Li \&\e"foo"
        !           370: .Cx
        !           371: the literal string:
        !           372: [xyz]"foo
        !           373: .Dp Li \&\eX
        !           374: if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
        !           375: then the ANSI-C interpretation of \ex.
        !           376: Otherwise, a literal 'X' (used to escape
        !           377: operators such as '*')
        !           378: .Dp Li \&\e123
        !           379: the character with octal value 123
        !           380: .Dp Li \&\ex2a
        !           381: the character with hexadecimal value 2a
        !           382: .Dp Li (r)
        !           383: match an r; parentheses are used to override
        !           384: precedence (see below)
        !           385: .Dp Li rs
        !           386: the regular expression r followed by the
        !           387: regular expression s; called "concatenation"
        !           388: .Dp Li r|s
        !           389: either an r or an s
        !           390: .Dp Li r/s
        !           391: an r but only if it is followed by an s.  The
        !           392: s is not part of the matched text.  This type
        !           393: of pattern is called "trailing context".
        !           394: .Dp Li \&^r
        !           395: an r, but only at the beginning of a line
        !           396: .Dp Li r$
        !           397: an r, but only at the end of a line.  Equivalent
        !           398: to "r/\en".
        !           399: .Dp Li <s>r
        !           400: an r, but only in start condition s (see
        !           401: below for discussion of start conditions)
        !           402: .Dp Li <s1,s2,s3>r
        !           403: same, but in any of start conditions s1,
        !           404: s2, or s3
        !           405: .Dp Li <<EOF>>
        !           406: an end-of-file
        !           407: .Dp Li <s1,s2><<EOF>>
        !           408: an end-of-file when in start condition s1 or s2
        !           409: .Dp
        !           410: The regular expressions listed above are grouped according to
        !           411: precedence, from highest precedence at the top to lowest at the bottom.
        !           412: Those grouped together have equal precedence.
        !           413: .Pp
        !           414: Some notes on patterns:
        !           415: .Pp
        !           416: Negated character classes
        !           417: .Ar match newlines
        !           418: unless "\en" (or an equivalent escape sequence) is one of the
        !           419: characters explicitly present in the negated character class
        !           420: (e.g., " [^A-Z\en] ").
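The newline behaviour of negated character classes can be tried out without generating a scanner. The C sketch below uses the POSIX regcomp() and regexec() routines as a stand-in for lex patterns. That is an assumption of the example, not a description of lex: the two pattern languages differ in other respects (for instance, escapes are not interpreted inside POSIX bracket expressions, so the newline is written as a C escape here). They agree on character classes, ranges and repeat counts when REG_NEWLINE is not given. The helper name matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches the POSIX extended regular expression pattern,
 * 0 otherwise.  Callers anchor the pattern with ^ and $ themselves. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc, found;

    /* No REG_NEWLINE: a newline is treated as an ordinary character. */
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);                 /* the sketch assumes a valid pattern */
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

Under these assumptions matches("^[^A-Z]$", "\n") returns 1, while matches("^[^A-Z\n]$", "\n") returns 0 because the newline is named explicitly in the negated class.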
        !           421: .Pp
        !           422: A rule can have at most one instance of trailing context (the '/' operator
        !           423: or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
        !           424: can only occur at the beginning of a pattern, and, as well as with '/' and '$',
        !           425: cannot be grouped inside parentheses.  The following are all illegal:
        !           426: .Pp
        !           427: .Ds C
        !           428: foo/bar$
        !           429: foo(bar$)
        !           430: foo^bar
        !           431: <sc1>foo<sc2>bar
        !           432: .De
        !           433: .Sh SUMMARY OF SPECIAL ACTIONS
        !           434: In addition to arbitrary C code, the following can appear in actions:
        !           435: .Tw Fl
        !           436: .Tp Ic ECHO
        !           437: Copies
        !           438: .Va yytext
        !           439: to the scanner's output.
        !           440: .Tp Ic BEGIN
        !           441: Followed by the name of a start condition places the scanner in the
        !           442: corresponding start condition.
        !           443: .Tp Ic REJECT
        !           444: Directs the scanner to proceed on to the "second best" rule which matched the
        !           445: input (or a prefix of the input).
        !           446: .Va yytext
        !           447: and
        !           448: .Va yyleng
        !           449: are set up appropriately.  Note that
        !           450: .Ic REJECT
        !           451: is a particularly expensive feature in terms of scanner performance;
        !           452: if it is used in
        !           453: .Em any
        !           454: of the scanner's actions it will slow down
        !           455: .Em all
        !           456: of the scanner's matching.  Furthermore,
        !           457: .Ic REJECT
        !           458: cannot be used with the
        !           459: .Fl f
        !           460: or
        !           461: .Fl F
        !           462: options.
        !           463: .Pp
        !           464: Note also that unlike the other special actions,
        !           465: .Ic REJECT
        !           466: is a
        !           467: .Em branch ;
        !           468: code immediately following it in the action will
        !           469: .Em not
        !           470: be executed.
        !           471: .Tp Fn yymore
        !           472: tells the scanner that the next time it matches a rule, the corresponding
        !           473: token should be
        !           474: .Em appended
        !           475: onto the current value of
        !           476: .Va yytext
        !           477: rather than replacing it.
        !           478: .Tp Fn yyless \&n
        !           479: returns all but the first
        !           480: .Ar n
        !           481: characters of the current token back to the input stream, where they
        !           482: will be rescanned when the scanner looks for the next match.
        !           483: .Va yytext
        !           484: and
        !           485: .Va yyleng
        !           486: are adjusted appropriately (e.g.,
        !           487: .Va yyleng
        !           488: will now be equal to
        !           489: .Ar n ) .
        !           490: .Tp Fn unput c
        !           491: puts the character
        !           492: .Ar c
        !           493: back onto the input stream.  It will be the next character scanned.
        !           494: .Tp Fn input
        !           495: reads the next character from the input stream (this routine is called
        !           496: .Fn yyinput
        !           497: if the scanner is compiled using
        !           498: .Em C \&+\&+ ) .
        !           499: .Tp Fn yyterminate
        !           500: can be used in lieu of a return statement in an action.  It terminates
        !           501: the scanner and returns a 0 to the scanner's caller, indicating "all done".
        !           502: .Pp
        !           503: By default,
        !           504: .Fn yyterminate
        !           505: is also called when an end-of-file is encountered.  It is a macro and
        !           506: may be redefined.
        !           507: .Tp Ic YY_NEW_FILE
        !           508: is an action available only in <<EOF>> rules.  It means "Okay, I've
        !           509: set up a new input file, continue scanning".
        !           510: .Tp Fn yy_create_buffer file size
        !           511: takes a
        !           512: .Ic FILE
        !           513: pointer and an integer
        !           514: .Ar size .
        !           515: It returns a YY_BUFFER_STATE
        !           516: handle to a new input buffer large enough to accommodate
        !           517: .Ar size
        !           518: characters and associated with the given file.  When in doubt, use
        !           519: .Ar YY_BUF_SIZE
        !           520: for the size.
        !           521: .Tp Fn yy_switch_to_buffer new_buffer
        !           522: switches the scanner's processing to scan for tokens from
        !           523: the given buffer, which must be a YY_BUFFER_STATE.
        !           524: .Tp Fn yy_delete_buffer buffer
        !           525: deletes the given buffer.
        !           526: .Tp
        !           527: .Sh \&VALUES\ AVAILABLE\ TO THE USER
        !           528: .Tw Fl
        !           529: .Tp Va \&char \&*yytext
        !           530: holds the text of the current token.  It may not be modified.
        !           531: .Tp Va \&int yyleng
        !           532: holds the length of the current token.  It may not be modified.
        !           533: .Tp Va FILE  \&*yyin
        !           534: is the file which by default
        !           535: .Nm lex
        !           536: reads from.  It may be redefined but doing so only makes sense before
        !           537: scanning begins.  Changing it in the middle of scanning will have
        !           538: unexpected results since
        !           539: .Nm lex
        !           540: buffers its input.  Once scanning terminates because an end-of-file
        !           541: has been seen,
        !           542: .Fn void\ yyrestart FILE\ *new_file
        !           543: may be called to point
        !           544: .Va yyin
        !           545: at the new input file.
        !           546: .Tp Va FILE  \&*yyout
        !           547: is the file to which
        !           548: .Ar ECHO
        !           549: actions are done.  It can be reassigned by the user.
        !           550: .Tp Va YY_CURRENT_BUFFER
        !           551: returns a
        !           552: YY_BUFFER_STATE
        !           553: handle to the current buffer.
        !           554: .Tp
        !           555: .Sh MACROS THE USER CAN REDEFINE
        !           556: .Tw Fl
        !           557: .Tp Va YY_DECL
        !           558: controls how the scanning routine is declared.
        !           559: By default, it is "int yylex()", or, if prototypes are being
        !           560: used, "int yylex(void)".  This definition may be changed by redefining
        !           561: the "YY_DECL" macro.  Note that
        !           562: if you give arguments to the scanning routine using a
        !           563: K&R-style/non-prototyped function declaration, you must terminate
        !           564: the definition with a semi-colon (;).
        !           565: .Tp Va YY_INPUT
        !           566: The nature of how the scanner
        !           567: gets its input can be controlled by redefining the
        !           568: YY_INPUT
        !           569: macro.
        !           570: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
        !           571: action is to place up to
        !           572: .Ar max_size
        !           573: characters in the character array
        !           574: .Ar buf
        !           575: and return in the integer variable
        !           576: .Ar result
        !           577: either the
        !           578: number of characters read or the constant YY_NULL (0 on Unix systems)
        !           579: to indicate EOF.  The default YY_INPUT reads from the
        !           580: global file-pointer "yyin".
        !           581: A sample redefinition of YY_INPUT (in the definitions
        !           582: section of the input file):
        !           583: .Pp
        !           584: .Ds I
        !           585: %{
        !           586: #undef YY_INPUT
        !           587: #define YY_INPUT(buf,result,max_size) \\
        !           588:     { int c = getchar(); result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); }
        !           589: %}
        !           590: .De
        !           591: .Tp Va YY_INPUT
        !           592: When the scanner receives an end-of-file indication from YY_INPUT,
        !           593: it then checks the
        !           594: .Fn yywrap
        !           595: function.  If
        !           596: .Fn yywrap
        !           597: returns false (zero), then it is assumed that the
        !           598: function has gone ahead and set up
        !           599: .Va yyin
        !           600: to point to another input file, and scanning continues.  If it returns
        !           601: true (non-zero), then the scanner terminates, returning 0 to its
        !           602: caller.
        !           603: .Tp Va yywrap
        !           604: The default
        !           605: .Fn yywrap
        !           606: always returns 1.  Presently, to redefine it you must first
        !           607: "#undef yywrap", as it is currently implemented as a macro.  It is
        !           608: likely that
        !           609: .Fn yywrap
        !           610: will soon be defined to be a function rather than a macro.
        !           611: .Tp Va YY_USER_ACTION
        !           612: can be redefined to provide an action
        !           613: which is always executed prior to the matched rule's action.
        !           614: .Tp Va YY_USER_INIT
        !           615: The macro
        !           616: .Va YY_USER_INIT
        !           617: may be redefined to provide an action which is always executed before
        !           618: the first scan.
        !           619: .Tp Va YY_BREAK
        !           620: In the generated scanner, the actions are all gathered in one large
        !           621: switch statement and separated using
        !           622: .Va YY_BREAK ,
        !           623: which may be redefined.  By default, it is simply a "break", to separate
        !           624: each rule's action from the following rule's.
        !           625: .Tp
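The contracts described in the list above can be sketched in plain C without running lex at all. This is only an illustration under stated assumptions: the names wrap_sketch, input_sketch, scan_sketch and the in-memory input queue are hypothetical stand-ins (a real scanner reads from yyin and is generated from lex.skel), and only the three macro names are taken from the text. The toy scanner treats every character as one match, with rule 1 for digits and rule 2 for anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-ins for input files; a real scanner uses yyin. */
static const char **pending_inputs;        /* inputs not yet started */
static int pending_count;
static const char *current_input;          /* plays the role of yyin */
static int init_runs, action_runs, digits; /* counters for illustration only */

/* User-redefinable hooks, named as in the text above. */
#define YY_USER_INIT   init_runs++;
#define YY_USER_ACTION action_runs++;
#define YY_BREAK       break;

/* yywrap contract: 0 means another input is ready, non-zero means stop. */
static int wrap_sketch(void)
{
    if (pending_count == 0) {
        current_input = NULL;
        return 1;
    }
    assert(pending_inputs != NULL);
    current_input = *pending_inputs++;
    pending_count--;
    return 0;
}

/* YY_INPUT contract: copy up to max_size characters into buf and return
 * the count, or 0 (YY_NULL on Unix systems) at end of input. */
static int input_sketch(char *buf, int max_size)
{
    size_t n;

    assert(buf != NULL && max_size > 0);
    if (current_input == NULL)
        return 0;
    n = strlen(current_input);
    if (n > (size_t)max_size)
        n = (size_t)max_size;
    memcpy(buf, current_input, n);
    current_input += n;
    return (int)n;
}

/* Toy scanner loop: returns 0 once wrap_sketch reports no more input. */
static int scan_sketch(void)
{
    static int started;
    char buf[4];
    int n, i;

    if (!started) {
        started = 1;
        YY_USER_INIT                 /* runs once, before the first scan */
    }
    for (;;) {
        n = input_sketch(buf, (int)sizeof buf);
        if (n == 0) {
            if (wrap_sketch())       /* true: terminate, returning 0 */
                return 0;
            continue;                /* false: keep scanning new input */
        }
        for (i = 0; i < n; i++) {
            int rule = (buf[i] >= '0' && buf[i] <= '9') ? 1 : 2;

            YY_USER_ACTION           /* runs before every rule's action */
            switch (rule) {
            case 1:
                digits++;
                YY_BREAK
            case 2:
                /* no action for non-digits */
                YY_BREAK
            }
        }
    }
}
```

To try it, point pending_inputs at an array of strings, set pending_count, and call scan_sketch(); an empty string in the queue behaves like an empty file and is simply skipped through wrap_sketch.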
        !           626: .Sh FILES
        !           627: .Dw lex.backtrack
        !           628: .Di L
        !           629: .Dp Pa lex.skel
        !           630: skeleton scanner.
        !           631: .Dp Pa lex.yy.c
        !           632: generated scanner
        !           633: (called
        !           634: .Pa lexyy.c
        !           635: on some systems).
        !           636: .Dp Pa lex.backtrack
        !           637: backtracking information for
        !           638: .Fl b
        !           639: flag
        !           640: (called
        !           641: .Pa lex.bck
        !           642: on some systems).
        !           643: .Dp
        !           644: .Sh SEE ALSO
        !           645: .Xr lex 1 ,
        !           646: .Xr yacc 1 ,
        !           647: .Xr sed 1 ,
        !           648: .Xr awk 1 .
        !           649: .br
        !           650: .Em lexdoc
        !           651: .br
        !           652: M.
        !           653: E.
        !           654: Lesk and E.
        !           655: Schmidt,
        !           656: .Em LEX \- Lexical Analyzer Generator
        !           657: .Sh DIAGNOSTICS
        !           658: .Tw Fl
        !           659: .Tp Li reject_used_but_not_detected undefined
        !           660: or
        !           661: .Tp Li yymore_used_but_not_detected undefined
        !           662: These errors can occur at compile time.
        !           663: They indicate that the
        !           664: scanner uses
        !           665: .Ic REJECT
        !           666: or
        !           667: .Fn yymore
        !           668: but that
        !           669: .Nm lex
        !           670: failed to notice the fact,
        !           671: meaning that
        !           672: .Nm lex
        !           673: scanned the first two sections looking for occurrences of these actions
        !           674: and failed to find any,
        !           675: but somehow you snuck some in (via a #include
        !           676: file,
        !           677: for example).
        !           678: Make an explicit reference to the action in your
        !           679: .Nm lex
        !           680: input file.
        !           681: Note that previously
        !           682: .Nm lex
        !           683: supported a
        !           684: .Li %used/%unused
        !           685: mechanism for dealing with this problem;
        !           686: this feature is still supported
        !           687: but now deprecated,
        !           688: and will go away soon unless the author hears from
        !           689: people who can argue compellingly that they need it.
        !           690: .Tp Li lex scanner jammed
        !           691: a scanner compiled with
        !           692: .Fl s
        !           693: has encountered an input string which wasn't matched by
        !           694: any of its rules.
        !           695: .Tp Li lex input buffer overflowed
        !           696: a scanner rule matched a string long enough to overflow the
        !           697: scanner's internal input buffer (16K bytes, controlled by
        !           698: .Va YY_BUF_MAX
        !           699: in
        !           700: .Pa lex.skel ) .
        !           701: .Tp Li scanner requires  \&\-8 flag
        !           702: Your scanner specification includes recognizing 8-bit characters and
        !           703: you did not specify the -8 flag (and your site has not installed lex
        !           704: with -8 as the default).
        !           705: .Tp Li too many  \&%t classes!
        !           706: You managed to put every single character into its own %t class.
        !           707: .Nm Lex
        !           708: requires that at least one of the classes share characters.
        !           709: .Tp
        !           710: .Sh HISTORY
        !           711: A
        !           712: .Nm lex
        !           713: appeared in Version 6 AT&T Unix.
        !           714: The version this man page describes is
        !           715: derived from code contributed by Vern Paxson.
        !           716: .Sh AUTHOR
        !           717: Vern Paxson, with the help of many ideas and much inspiration from
        !           718: Van Jacobson.  Original version by Jef Poskanzer.
        !           719: .Pp
        !           720: See
        !           721: .Em Lexdoc
        !           722: for additional credits and the address to send comments to.
        !           723: .Sh BUGS
        !           724: .Pp
        !           725: Some trailing context
        !           726: patterns cannot be properly matched and generate
        !           727: warning messages ("Dangerous trailing context").  These are
        !           728: patterns where the ending of the
        !           729: first part of the rule matches the beginning of the second
        !           730: part, such as "zx*/xy*", where the 'x*' matches the 'x' at
        !           731: the beginning of the trailing context.  (Note that the POSIX draft
        !           732: states that the text matched by such patterns is undefined.)
        !           733: .Pp
        !           734: For some trailing context rules, parts which are actually fixed-length are
        !           735: not recognized as such, leading to the above-mentioned performance loss.
        !           736: In particular, parts using '\&|' or {n} (such as "foo{3}") are always
        !           737: considered variable-length.
        !           738: .Pp
        !           739: Combining trailing context with the special '\&|' action can result in
        !           740: .Em fixed
        !           741: trailing context being turned into the more expensive
        !           742: .Em variable
        !           743: trailing context.  This happens in the following example:
        !           744: .Pp
        !           745: .Ds C
        !           746: %%
        !           747: abc  \&|
        !           748: xyz/def
        !           749: .De
        !           750: .Pp
        !           751: Use of
        !           752: .Fn unput
        !           753: invalidates yytext and yyleng.
        !           754: .Pp
        !           755: Use of
        !           756: .Fn unput
        !           757: to push back more text than was matched can
        !           758: result in the pushed-back text matching a beginning-of-line ('^')
        !           759: rule even though it didn't come at the beginning of the line
        !           760: (though this is rare!).
        !           761: .Pp
        !           762: Pattern-matching of NUL's is substantially slower than matching other
        !           763: characters.
        !           764: .Pp
        !           765: .Nm Lex
        !           766: does not generate correct #line directives for code internal
        !           767: to the scanner; thus, bugs in
        !           768: .Pa lex.skel
        !           769: yield bogus line numbers.
        !           770: .Pp
        !           771: Due to both buffering of input and read-ahead, you cannot intermix
        !           772: calls to <stdio.h> routines, such as
        !           773: .Fn getchar ,
        !           774: with
        !           775: .Nm lex
        !           776: rules and expect it to work.  Call
        !           777: .Fn input
        !           778: instead.
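Why the two kinds of reads cannot be mixed can be shown with a small stand-alone sketch. Everything here is hypothetical and for illustration only: raw_getchar stands in for a direct stdio call on the shared input, buffered_input stands in for the scanner's input routine, and the 8-byte read-ahead buffer is far smaller than a real scanner's.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical shared input source standing in for stdin. */
static const char *source = "";

/* Hypothetical scanner-side read-ahead buffer, filled a block at a time. */
static char ahead[8];
static int ahead_len, ahead_pos;

/* Reset the source and discard any read-ahead. */
static void set_source(const char *text)
{
    assert(text != NULL);
    source = text;
    ahead_len = ahead_pos = 0;
}

/* Direct read from the source, like calling getchar() behind the scanner's back. */
static int raw_getchar(void)
{
    return *source ? (unsigned char)*source++ : -1;
}

/* Analogue of input(): serves characters from the buffer, refilling it in blocks. */
static int buffered_input(void)
{
    if (ahead_pos == ahead_len) {
        ahead_len = ahead_pos = 0;
        while (ahead_len < (int)sizeof ahead) {
            int c = raw_getchar();

            if (c < 0)
                break;
            ahead[ahead_len++] = (char)c;
        }
        if (ahead_len == 0)
            return -1;               /* end of input */
    }
    return (unsigned char)ahead[ahead_pos++];
}
```

After set_source("abcdefghij"), one call to buffered_input returns 'a' but has already pulled "abcdefgh" into the buffer, so a following raw_getchar returns 'i' rather than 'b'. The characters in between are still delivered, in order, but only to callers of buffered_input.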
        !           779: .Pp
        !           780: The total number of table entries listed by the
        !           781: .Fl v
        !           782: flag excludes the table entries needed to determine
        !           783: which rule has been matched.  The number of excluded entries is equal
        !           784: to the number of DFA states if the scanner does not use
        !           785: .Ic REJECT ,
        !           786: and somewhat greater than the number of states if it does.
        !           787: .Pp
        !           788: .Ic REJECT
        !           789: cannot be used with the
        !           790: .Fl f
        !           791: or
        !           792: .Fl F
        !           793: options.
        !           794: .Pp
        !           795: Some of the macros, such as
        !           796: .Fn yywrap ,
        !           797: may in the future become functions which live in the
        !           798: .Fl lfl
        !           799: library.  This will doubtless break a lot of code, but may be
        !           800: required for POSIX-compliance.
        !           801: .Pp
        !           802: The
        !           803: .Nm lex
        !           804: internal algorithms need documentation.