43BSDReno/pgrm/lex/flexdoc.1 - annotate

Annotation of 43BSDReno/pgrm/lex/flexdoc.1, revision 1.1

1.1     ! root        1: .TH FLEX 1 "26 May 1990" "Version 2.3"
        !             2: .SH NAME
        !             3: flex - fast lexical analyzer generator
        !             4: .SH SYNOPSIS
        !             5: .B flex
        !             6: .B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
        !             7: .I [filename ...]
        !             8: .SH DESCRIPTION
        !             9: .I flex
        !            10: is a tool for generating
        !            11: .I scanners:
        !            12: programs which recognize lexical patterns in text.
        !            13: .I flex
        !            14: reads
        !            15: the given input files, or its standard input if no file names are given,
        !            16: for a description of a scanner to generate.  The description is in
        !            17: the form of pairs
        !            18: of regular expressions and C code, called
        !            19: .I rules.  flex
        !            20: generates as output a C source file,
        !            21: .B lex.yy.c,
        !            22: which defines a routine
        !            23: .B yylex().
        !            24: This file is compiled and linked with the
        !            25: .B -lfl
        !            26: library to produce an executable.  When the executable is run,
        !            27: it analyzes its input for occurrences
        !            28: of the regular expressions.  Whenever it finds one, it executes
        !            29: the corresponding C code.
        !            30: .SH SOME SIMPLE EXAMPLES
        !            31: .LP
        !            32: First some simple examples to get the flavor of how one uses
        !            33: .I flex.
        !            34: The following
        !            35: .I flex
        !            36: input specifies a scanner which, whenever it encounters the string
        !            37: "username", will replace it with the user's login name:
        !            38: .nf
        !            39: 
        !            40:     %%
        !            41:     username    printf( "%s", getlogin() );
        !            42: 
        !            43: .fi
        !            44: By default, any text not matched by a
        !            45: .I flex
        !            46: scanner
        !            47: is copied to the output, so the net effect of this scanner is
        !            48: to copy its input file to its output with each occurrence
        !            49: of "username" expanded.
        !            50: In this input, there is just one rule.  "username" is the
        !            51: .I pattern
        !            52: and the "printf" is the
        !            53: .I action.
        !            54: The "%%" marks the beginning of the rules.
        !            55: .LP
        !            56: Here's another simple example:
        !            57: .nf
        !            58: 
        !            59:         int num_lines = 0, num_chars = 0;
        !            60: 
        !            61:     %%
        !            62:     \\n    ++num_lines; ++num_chars;
        !            63:     .     ++num_chars;
        !            64: 
        !            65:     %%
        !            66:     main()
        !            67:         {
        !            68:         yylex();
        !            69:         printf( "# of lines = %d, # of chars = %d\\n",
        !            70:                 num_lines, num_chars );
        !            71:         }
        !            72: 
        !            73: .fi
        !            74: This scanner counts the number of characters and the number
        !            75: of lines in its input (it produces no output other than the
        !            76: final report on the counts).  The first line
        !            77: declares two globals, "num_lines" and "num_chars", which are accessible
        !            78: both inside
        !            79: .B yylex()
        !            80: and in the
        !            81: .B main()
        !            82: routine declared after the second "%%".  There are two rules, one
        !            83: which matches a newline ("\\n") and increments both the line count and
        !            84: the character count, and one which matches any character other than
        !            85: a newline (indicated by the "." regular expression).
        !            86: .LP
        !            87: A somewhat more complicated example:
        !            88: .nf
        !            89: 
        !            90:     /* scanner for a toy Pascal-like language */
        !            91: 
        !            92:     %{
        !            93:     /* need this for the call to atof() below */
        !            94:     #include <math.h>
        !            95:     %}
        !            96: 
        !            97:     DIGIT    [0-9]
        !            98:     ID       [a-z][a-z0-9]*
        !            99: 
        !           100:     %%
        !           101: 
        !           102:     {DIGIT}+    {
        !           103:                 printf( "An integer: %s (%d)\\n", yytext,
        !           104:                         atoi( yytext ) );
        !           105:                 }
        !           106: 
        !           107:     {DIGIT}+"."{DIGIT}*        {
        !           108:                 printf( "A float: %s (%g)\\n", yytext,
        !           109:                         atof( yytext ) );
        !           110:                 }
        !           111: 
        !           112:     if|then|begin|end|procedure|function        {
        !           113:                 printf( "A keyword: %s\\n", yytext );
        !           114:                 }
        !           115: 
        !           116:     {ID}        printf( "An identifier: %s\\n", yytext );
        !           117: 
        !           118:     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
        !           119: 
        !           120:     "{"[^}\\n]*"}"     /* eat up one-line comments */
        !           121: 
        !           122:     [ \\t\\n]+          /* eat up whitespace */
        !           123: 
        !           124:     .           printf( "Unrecognized character: %s\\n", yytext );
        !           125: 
        !           126:     %%
        !           127: 
        !           128:     main( argc, argv )
        !           129:     int argc;
        !           130:     char **argv;
        !           131:         {
        !           132:         ++argv, --argc;  /* skip over program name */
        !           133:         if ( argc > 0 )
        !           134:                 yyin = fopen( argv[0], "r" );
        !           135:         else
        !           136:                 yyin = stdin;
        !           137:         
        !           138:         yylex();
        !           139:         }
        !           140: 
        !           141: .fi
        !           142: These are the beginnings of a simple scanner for a language like
        !           143: Pascal.  It identifies different types of
        !           144: .I tokens
        !           145: and reports on what it has seen.
        !           146: .LP
        !           147: The details of this example will be explained in the following
        !           148: sections.
        !           149: .SH FORMAT OF THE INPUT FILE
        !           150: The
        !           151: .I flex
        !           152: input file consists of three sections, separated by a line with just
        !           153: .B %%
        !           154: in it:
        !           155: .nf
        !           156: 
        !           157:     definitions
        !           158:     %%
        !           159:     rules
        !           160:     %%
        !           161:     user code
        !           162: 
        !           163: .fi
        !           164: The
        !           165: .I definitions
        !           166: section contains declarations of simple
        !           167: .I name
        !           168: definitions to simplify the scanner specification, and declarations of
        !           169: .I start conditions,
        !           170: which are explained in a later section.
        !           171: .LP
        !           172: Name definitions have the form:
        !           173: .nf
        !           174: 
        !           175:     name definition
        !           176: 
        !           177: .fi
        !           178: The "name" is a word beginning with a letter or an underscore ('_')
        !           179: followed by zero or more letters, digits, '_', or '-' (dash).
        !           180: The definition is taken to begin at the first non-white-space character
        !           181: following the name and continuing to the end of the line.
        !           182: The definition can subsequently be referred to using "{name}", which
        !           183: will expand to "(definition)".  For example,
        !           184: .nf
        !           185: 
        !           186:     DIGIT    [0-9]
        !           187:     ID       [a-z][a-z0-9]*
        !           188: 
        !           189: .fi
        !           190: defines "DIGIT" to be a regular expression which matches a
        !           191: single digit, and
        !           192: "ID" to be a regular expression which matches a letter
        !           193: followed by zero-or-more letters-or-digits.
        !           194: A subsequent reference to
        !           195: .nf
        !           196: 
        !           197:     {DIGIT}+"."{DIGIT}*
        !           198: 
        !           199: .fi
        !           200: is identical to
        !           201: .nf
        !           202: 
        !           203:     ([0-9])+"."([0-9])*
        !           204: 
        !           205: .fi
        !           206: and matches one-or-more digits followed by a '.' followed
        !           207: by zero-or-more digits.
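.LP
The expansion rule described above can be sketched in C.  This is only
an illustration of the textual substitution of "{name}" by
"(definition)"; it is not taken from the flex sources.  The names
name_def and expand_names are hypothetical, and the sketch assumes that
quoted strings need no special treatment.

```c
#include <stdio.h>
#include <string.h>

/* One name definition, e.g. { "DIGIT", "[0-9]" }.  Hypothetical type. */
struct name_def {
    const char *name;
    const char *definition;
};

/*
 * Copy `pattern` into `out`, replacing each "{name}" that matches an
 * entry in `defs` with "(definition)".  Anything else, including brace
 * expressions such as "{2,5}", is copied unchanged.  Returns 0 on
 * success and -1 if `out` is too small.
 */
int expand_names(const char *pattern, const struct name_def *defs,
                 size_t ndefs, char *out, size_t outsize)
{
    size_t used = 0;

    while (*pattern) {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const struct name_def *hit = NULL;

        if (close != NULL) {
            /* Length of the text between the braces. */
            size_t len = (size_t)(close - pattern - 1);
            for (size_t i = 0; i < ndefs; ++i) {
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pattern + 1, len) == 0) {
                    hit = &defs[i];
                    break;
                }
            }
        }

        if (hit != NULL) {
            /* Known name: emit the parenthesized definition. */
            int n = snprintf(out + used, outsize - used, "(%s)",
                             hit->definition);
            if (n < 0 || (size_t)n >= outsize - used)
                return -1;
            used += (size_t)n;
            pattern = close + 1;
        } else {
            /* Ordinary character: copy it through. */
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *pattern++;
        }
    }
    if (used >= outsize)
        return -1;
    out[used] = 0;
    return 0;
}
```

Given the DIGIT definition shown earlier, passing the pattern
{DIGIT}+"."{DIGIT}* to this function should yield ([0-9])+"."([0-9])*.
The parentheses matter: they keep a following operator such as '+'
applied to the whole definition rather than to its last element.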
        !           208: .LP
        !           209: The
        !           210: .I rules
        !           211: section of the
        !           212: .I flex
        !           213: input contains a series of rules of the form:
        !           214: .nf
        !           215: 
        !           216:     pattern   action
        !           217: 
        !           218: .fi
        !           219: where the pattern must be unindented and the action must begin
        !           220: on the same line.
        !           221: .LP
        !           222: See below for a further description of patterns and actions.
        !           223: .LP
        !           224: Finally, the user code section is simply copied to
        !           225: .B lex.yy.c
        !           226: verbatim.
        !           227: It is used for companion routines which call or are called
        !           228: by the scanner.  The presence of this section is optional;
        !           229: if it is missing, the second
        !           230: .B %%
        !           231: in the input file may be skipped, too.
        !           232: .LP
        !           233: In the definitions and rules sections, any
        !           234: .I indented
        !           235: text or text enclosed in
        !           236: .B %{
        !           237: and
        !           238: .B %}
        !           239: is copied verbatim to the output (with the %{}'s removed).
        !           240: The %{}'s must appear unindented on lines by themselves.
        !           241: .LP
        !           242: In the rules section,
        !           243: any indented or %{} text appearing before the
        !           244: first rule may be used to declare variables
        !           245: which are local to the scanning routine and (after the declarations)
        !           246: code which is to be executed whenever the scanning routine is entered.
        !           247: Other indented or %{} text in the rule section is still copied to the output,
        !           248: but its meaning is not well-defined and it may well cause compile-time
        !           249: errors (this feature is present for
        !           250: .I POSIX
        !           251: compliance; see below for other such features).
        !           252: .LP
        !           253: In the definitions section, an unindented comment (i.e., a line
        !           254: beginning with "/*") is also copied verbatim to the output up
        !           255: to the next "*/".  Also, any line in the definitions section
        !           256: beginning with '#' is ignored, though this style of comment is
        !           257: deprecated and may go away in the future.
        !           258: .SH PATTERNS
        !           259: The patterns in the input are written using an extended set of regular
        !           260: expressions.  These are:
        !           261: .nf
        !           262: 
        !           263:     x          match the character 'x'
        !           264:     .          any character except newline
        !           265:     [xyz]      a "character class"; in this case, the pattern
        !           266:                  matches either an 'x', a 'y', or a 'z'
        !           267:     [abj-oZ]   a "character class" with a range in it; matches
        !           268:                  an 'a', a 'b', any letter from 'j' through 'o',
        !           269:                  or a 'Z'
        !           270:     [^A-Z]     a "negated character class", i.e., any character
        !           271:                  but those in the class.  In this case, any
        !           272:                  character EXCEPT an uppercase letter.
        !           273:     [^A-Z\\n]   any character EXCEPT an uppercase letter or
        !           274:                  a newline
        !           275:     r*         zero or more r's, where r is any regular expression
        !           276:     r+         one or more r's
        !           277:     r?         zero or one r's (that is, "an optional r")
        !           278:     r{2,5}     anywhere from two to five r's
        !           279:     r{2,}      two or more r's
        !           280:     r{4}       exactly 4 r's
        !           281:     {name}     the expansion of the "name" definition
        !           282:                (see above)
        !           283:     "[xyz]\\"foo"
        !           284:                the literal string: [xyz]"foo
        !           285:     \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
        !           286:                  then the ANSI-C interpretation of \\x.
        !           287:                  Otherwise, a literal 'X' (used to escape
        !           288:                  operators such as '*')
        !           289:     \\123       the character with octal value 123
        !           290:     \\x2a       the character with hexadecimal value 2a
        !           291:     (r)        match an r; parentheses are used to override
        !           292:                  precedence (see below)
        !           293: 
        !           294: 
        !           295:     rs         the regular expression r followed by the
        !           296:                  regular expression s; called "concatenation"
        !           297: 
        !           298: 
        !           299:     r|s        either an r or an s
        !           300: 
        !           301: 
        !           302:     r/s        an r but only if it is followed by an s.  The
        !           303:                  s is not part of the matched text.  This type
        !           304:                  of pattern is called "trailing context".
        !           305:     ^r         an r, but only at the beginning of a line
        !           306:     r$         an r, but only at the end of a line.  Equivalent
        !           307:                  to "r/\\n".
        !           308: 
        !           309: 
        !           310:     <s>r       an r, but only in start condition s (see
        !           311:                below for discussion of start conditions)
        !           312:     <s1,s2,s3>r
        !           313:                same, but in any of start conditions s1,
        !           314:                s2, or s3
        !           315: 
        !           316: 
        !           317:     <<EOF>>    an end-of-file
        !           318:     <s1,s2><<EOF>>
        !           319:                an end-of-file when in start condition s1 or s2
        !           320: 
        !           321: .fi
        !           322: The regular expressions listed above are grouped according to
        !           323: precedence, from highest precedence at the top to lowest at the bottom.
        !           324: Those grouped together have equal precedence.  For example,
        !           325: .nf
        !           326: 
        !           327:     foo|bar*
        !           328: 
        !           329: .fi
        !           330: is the same as
        !           331: .nf
        !           332: 
        !           333:     (foo)|(ba(r*))
        !           334: 
        !           335: .fi
        !           336: since the '*' operator has higher precedence than concatenation,
        !           337: and concatenation higher than alternation ('|').  This pattern
        !           338: therefore matches
        !           339: .I either
        !           340: the string "foo"
        !           341: .I or
        !           342: the string "ba" followed by zero-or-more r's.
        !           343: To match "foo" or zero-or-more "bar"'s, use:
        !           344: .nf
        !           345: 
        !           346:     foo|(bar)*
        !           347: 
        !           348: .fi
        !           349: and to match zero-or-more "foo"'s-or-"bar"'s:
        !           350: .nf
        !           351: 
        !           352:     (foo|bar)*
        !           353: 
        !           354: .fi
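.LP
The three precedence examples above can be tried out without running
flex.  The sketch below uses the POSIX regcomp() and regexec() routines
with extended regular expressions, which are assumed here to give '*',
concatenation, and '|' the same relative precedence as flex does; it
says nothing about other flex operators.  The helper name matches_whole
is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if `text` as a whole matches the POSIX extended regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern
 * cannot be compiled.
 */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern so the anchors apply to all of it. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "barrr" should match foo|bar* while "barbar" should
not; "barbar" should match foo|(bar)*; and "foobarfoo" should match
only (foo|bar)*.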
        !           355: .LP
        !           356: Some notes on patterns:
        !           357: .IP -
        !           358: A negated character class such as the example "[^A-Z]"
        !           359: above
        !           360: .I will match a newline
        !           361: unless "\\n" (or an equivalent escape sequence) is one of the
        !           362: characters explicitly present in the negated character class
        !           363: (e.g., "[^A-Z\\n]").  This is unlike how many other regular
        !           364: expression tools treat negated character classes, but unfortunately
        !           365: the inconsistency is historically entrenched.
        !           366: Matching newlines means that a pattern like [^"]* can match an entire
        !           367: input (overflowing the scanner's input buffer) unless there's another
        !           368: quote in the input.
        !           369: .IP -
        !           370: A rule can have at most one instance of trailing context (the '/' operator
        !           371: or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
        !           372: can only occur at the beginning of a pattern and, like '/' and '$',
        !           373: cannot be grouped inside parentheses.  A '^' which does not occur at
        !           374: the beginning of a rule or a '$' which does not occur at the end of
        !           375: a rule loses its special properties and is treated as a normal character.
        !           376: .IP
        !           377: The following are illegal:
        !           378: .nf
        !           379: 
        !           380:     foo/bar$
        !           381:     <sc1>foo<sc2>bar
        !           382: 
        !           383: .fi
        !           384: Note that the first of these can be written "foo/bar\\n".
        !           385: .IP
        !           386: The following will result in '$' or '^' being treated as a normal character:
        !           387: .nf
        !           388: 
        !           389:     foo|(bar$)
        !           390:     foo|^bar
        !           391: 
        !           392: .fi
        !           393: If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
        !           394: could be used (the special '|' action is explained below):
        !           395: .nf
        !           396: 
        !           397:     foo      |
        !           398:     bar$     /* action goes here */
        !           399: 
        !           400: .fi
        !           401: A similar trick will work for matching a foo or a
        !           402: bar-at-the-beginning-of-a-line.
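.LP
The first note above, on negated character classes matching a newline,
can be checked with a small C sketch.  It uses the POSIX regcomp()
routine rather than flex itself, on the assumption that the two agree
for this case when REG_NEWLINE is not given.  The helper name
class_matches_char is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the single character `c` matches the bracket expression
 * `cls` (for example "[^A-Z]"), 0 if it does not, and -1 if `cls`
 * cannot be compiled.  REG_NEWLINE is deliberately not used, so a
 * newline is treated like any other character.
 */
int class_matches_char(const char *cls, char c)
{
    regex_t re;
    char text[2];
    int rc;

    text[0] = c;
    text[1] = 0;
    if (regcomp(&re, cls, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Calling class_matches_char("[^A-Z]", 10), where 10 is the ASCII code
for a newline, should report a match.  Adding a newline character to
the class should make the same call report no match.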
        !           403: .SH HOW THE INPUT IS MATCHED
        !           404: When the generated scanner is run, it analyzes its input looking
        !           405: for strings which match any of its patterns.  If it finds more than
        !           406: one match, it takes the one matching the most text (for trailing
        !           407: context rules, this includes the length of the trailing part, even
        !           408: though it will then be returned to the input).  If it finds two
        !           409: or more matches of the same length, the
        !           410: rule listed first in the
        !           411: .I flex
        !           412: input file is chosen.
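.LP
The selection rule just described (longest match first, then the
earliest rule on a tie) can be sketched as follows.  This is not how
the generated scanner is implemented internally; it only restates the
rule.  The function name choose_rule is hypothetical, and a length of 0
is assumed to mean that the rule did not match.

```c
#include <stddef.h>

/*
 * match_len[i] is the number of characters rule i matches at the
 * current input position, or 0 if it does not match.  Rules are
 * indexed in the order in which they appear in the input file.
 * Returns the index of the chosen rule, or -1 if no rule matches.
 */
int choose_rule(const size_t *match_len, size_t nrules)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; ++i) {
        /* Strictly greater: on a tie the earlier rule is kept. */
        if (match_len[i] > best_len) {
            best = (int)i;
            best_len = match_len[i];
        }
    }
    return best;
}
```

In the toy Pascal scanner shown earlier, the input "if" is matched with
length 2 by both the keyword rule and the {ID} rule, so the keyword
rule, which is listed first, is chosen.  The input "iffy" is matched
with length 4 by {ID} only, so {ID} is chosen.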
        !           413: .LP
        !           414: Once the match is determined, the text corresponding to the match
        !           415: (called the
        !           416: .I token)
        !           417: is made available in the global character pointer
        !           418: .B yytext,
        !           419: and its length in the global integer
        !           420: .B yyleng.
        !           421: The
        !           422: .I action
        !           423: corresponding to the matched pattern is then executed (a more
        !           424: detailed description of actions follows), and then the remaining
        !           425: input is scanned for another match.
        !           426: .LP
        !           427: If no match is found, then the
        !           428: .I default rule
        !           429: is executed: the next character in the input is considered matched and
        !           430: copied to the standard output.  Thus, the simplest legal
        !           431: .I flex
        !           432: input is:
        !           433: .nf
        !           434: 
        !           435:     %%
        !           436: 
        !           437: .fi
        !           438: which generates a scanner that simply copies its input (one character
        !           439: at a time) to its output.
        !           440: .SH ACTIONS
        !           441: Each pattern in a rule has a corresponding action, which can be any
        !           442: arbitrary C statement.  The pattern ends at the first non-escaped
        !           443: whitespace character; the remainder of the line is its action.  If the
        !           444: action is empty, then when the pattern is matched the input token
        !           445: is simply discarded.  For example, here is the specification for a program
        !           446: which deletes all occurrences of "zap me" from its input:
        !           447: .nf
        !           448: 
        !           449:     %%
        !           450:     "zap me"
        !           451: 
        !           452: .fi
        !           453: (It will copy all other characters in the input to the output since
        !           454: they will be matched by the default rule.)
        !           455: .LP
        !           456: Here is a program which compresses multiple blanks and tabs down to
        !           457: a single blank, and throws away whitespace found at the end of a line:
        !           458: .nf
        !           459: 
        !           460:     %%
        !           461:     [ \\t]+        putchar( ' ' );
        !           462:     [ \\t]+$       /* ignore this token */
        !           463: 
        !           464: .fi
        !           465: .LP
        !           466: If the action contains a '{', then the action extends until the balancing '}'
        !           467: is found, and the action may cross multiple lines.
        !           468: .I flex 
        !           469: knows about C strings and comments and won't be fooled by braces found
        !           470: within them, but also allows actions to begin with
        !           471: .B %{
        !           472: and will consider the action to be all the text up to the next
        !           473: .B %}
        !           474: (regardless of ordinary braces inside the action).
        !           475: .LP
        !           476: An action consisting solely of a vertical bar ('|') means "same as
        !           477: the action for the next rule."  See below for an illustration.
        !           478: .LP
        !           479: Actions can include arbitrary C code, including
        !           480: .B return
        !           481: statements to return a value to whatever routine called
        !           482: .B yylex().
        !           483: Each time
        !           484: .B yylex()
        !           485: is called it continues processing tokens from where it last left
        !           486: off until it either reaches
        !           487: the end of the file or executes a return.  Once it reaches an end-of-file,
        !           488: however, then any subsequent call to
        !           489: .B yylex()
        !           490: will simply return immediately, unless
        !           491: .B yyrestart()
        !           492: is first called (see below).
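.LP
The calling convention described above can be illustrated with a
hand-written stand-in for a generated scanner.  The routine toy_yylex
below is hypothetical and is not produced by flex; it only mimics the
behavior of resuming where the previous call left off, returning a
value from some "actions", and returning immediately on every call
after the end of the input.  The token codes, set_input, and the use of
0 to signal end of input are assumptions made for this sketch.

```c
#include <ctype.h>

/* Hypothetical token codes; 0 signals end of input. */
enum { TOK_EOF = 0, TOK_NUMBER = 1, TOK_WORD = 2 };

/* Position at which the next call resumes scanning. */
static const char *scan_pos = "";

/* Point the stand-in scanner at a new input string. */
void set_input(const char *text)
{
    scan_pos = text;
}

/* Stand-in for yylex(): returns the next token code, or TOK_EOF. */
int toy_yylex(void)
{
    while (*scan_pos) {
        unsigned char c = (unsigned char)*scan_pos;

        if (isdigit(c)) {
            while (isdigit((unsigned char)*scan_pos))
                ++scan_pos;
            return TOK_NUMBER;      /* action with a return */
        }
        if (isalpha(c)) {
            while (isalpha((unsigned char)*scan_pos))
                ++scan_pos;
            return TOK_WORD;        /* action with a return */
        }
        ++scan_pos;                 /* action without a return: keep going */
    }
    return TOK_EOF;                 /* every later call lands here at once */
}
```

A caller would typically loop until the end-of-input code comes back.
After set_input("abc 42 x"), successive calls should return TOK_WORD,
TOK_NUMBER, TOK_WORD, and then TOK_EOF on every further call.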
        !           493: .LP
        !           494: Actions are not allowed to modify yytext or yyleng.
        !           495: .LP
        !           496: There are a number of special directives which can be included within
        !           497: an action:
        !           498: .IP -
        !           499: .B ECHO
        !           500: copies yytext to the scanner's output.
        !           501: .IP -
        !           502: .B BEGIN
        !           503: followed by the name of a start condition places the scanner in the
        !           504: corresponding start condition (see below).
        !           505: .IP -
        !           506: .B REJECT
        !           507: directs the scanner to proceed on to the "second best" rule which matched the
        !           508: input (or a prefix of the input).  The rule is chosen as described
        !           509: above in "How the Input is Matched", and
        !           510: .B yytext
        !           511: and
        !           512: .B yyleng
        !           513: are set up appropriately.
        !           514: It may either be one which matched as much text
        !           515: as the originally chosen rule but came later in the
        !           516: .I flex
        !           517: input file, or one which matched less text.
        !           518: For example, the following will both count the
        !           519: words in the input and call the routine special() whenever "frob" is seen:
        !           520: .nf
        !           521: 
        !           522:             int word_count = 0;
        !           523:     %%
        !           524: 
        !           525:     frob        special(); REJECT;
        !           526:     [^ \\t\\n]+   ++word_count;
        !           527: 
        !           528: .fi
        !           529: Without the
        !           530: .B REJECT,
        !           531: any "frob"'s in the input would not be counted as words, since the
        !           532: scanner normally executes only one action per token.
        !           533: Multiple
        !           534: .B REJECT's
        !           535: are allowed, each one finding the next best choice to the currently
        !           536: active rule.  For example, when the following scanner scans the token
        !           537: "abcd", it will write "abcdabcaba" to the output:
        !           538: .nf
        !           539: 
        !           540:     %%
        !           541:     a        |
        !           542:     ab       |
        !           543:     abc      |
        !           544:     abcd     ECHO; REJECT;
        !           545:     .|\\n     /* eat up any unmatched character */
        !           546: 
        !           547: .fi
        !           548: (The first three rules share the fourth's action since they use
        !           549: the special '|' action.)
        !           550: .B REJECT
        !           551: is a particularly expensive feature in terms of scanner performance;
        !           552: if it is used in
        !           553: .I any
        !           554: of the scanner's actions it will slow down
        !           555: .I all
        !           556: of the scanner's matching.  Furthermore,
        !           557: .B REJECT
        !           558: cannot be used with the
        !           559: .I -f
        !           560: or
        !           561: .I -F
        !           562: options (see below).
        !           563: .IP
        !           564: Note also that unlike the other special actions,
        !           565: .B REJECT
        !           566: is a
        !           567: .I branch;
        !           568: code immediately following it in the action will
        !           569: .I not
        !           570: be executed.
        !           571: .IP -
        !           572: .B yymore()
        !           573: tells the scanner that the next time it matches a rule, the corresponding
        !           574: token should be
        !           575: .I appended
        !           576: onto the current value of
        !           577: .B yytext
        !           578: rather than replacing it.  For example, given the input "mega-kludge"
        !           579: the following will write "mega-mega-kludge" to the output:
        !           580: .nf
        !           581: 
        !           582:     %%
        !           583:     mega-    ECHO; yymore();
        !           584:     kludge   ECHO;
        !           585: 
        !           586: .fi
        !           587: First "mega-" is matched and echoed to the output.  Then "kludge"
        !           588: is matched, but the previous "mega-" is still hanging around at the
        !           589: beginning of
        !           590: .B yytext
        !           591: so the
        !           592: .B ECHO
        !           593: for the "kludge" rule will actually write "mega-kludge".
        !           594: The presence of
        !           595: .B yymore()
        !           596: in the scanner's action entails a minor performance penalty in the
        !           597: scanner's matching speed.
        !           598: .IP -
        !           599: .B yyless(n)
        !           600: returns all but the first
        !           601: .I n
        !           602: characters of the current token back to the input stream, where they
        !           603: will be rescanned when the scanner looks for the next match.
        !           604: .B yytext
        !           605: and
        !           606: .B yyleng
        !           607: are adjusted appropriately (e.g.,
        !           608: .B yyleng
        !           609: will now be equal to
        !           610: .I n
        !           611: ).  For example, on the input "foobar" the following will write out
        !           612: "foobarbar":
        !           613: .nf
        !           614: 
        !           615:     %%
        !           616:     foobar    ECHO; yyless(3);
        !           617:     [a-z]+    ECHO;
        !           618: 
        !           619: .fi
        !           620: An argument of 0 to
        !           621: .B yyless
        !           622: will cause the entire current input string to be scanned again.  Unless you've
        !           623: changed how the scanner will subsequently process its input (using
        !           624: .B BEGIN,
        !           625: for example), this will result in an endless loop.
        !           626: .IP -
        !           627: .B unput(c)
        !           628: puts the character
        !           629: .I c
        !           630: back onto the input stream.  It will be the next character scanned.
        !           631: The following action will take the current token and cause it
        !           632: to be rescanned enclosed in parentheses.
        !           633: .nf
        !           634: 
        !           635:     {
        !           636:     int i;
        !           637:     unput( ')' );
        !           638:     for ( i = yyleng - 1; i >= 0; --i )
        !           639:         unput( yytext[i] );
        !           640:     unput( '(' );
        !           641:     }
        !           642: 
        !           643: .fi
        !           644: Note that since each
        !           645: .B unput()
        !           646: puts the given character back at the
        !           647: .I beginning
        !           648: of the input stream, pushing back strings must be done back-to-front.
        !           649: .IP -
        !           650: .B input()
        !           651: reads the next character from the input stream.  For example,
        !           652: the following is one way to eat up C comments:
        !           653: .nf
        !           654: 
        !           655:     %%
        !           656:     "/*"        {
        !           657:                 register int c;
        !           658: 
        !           659:                 for ( ; ; )
        !           660:                     {
        !           661:                     while ( (c = input()) != '*' &&
        !           662:                             c != EOF )
        !           663:                         ;    /* eat up text of comment */
        !           664: 
        !           665:                     if ( c == '*' )
        !           666:                         {
        !           667:                         while ( (c = input()) == '*' )
        !           668:                             ;
        !           669:                         if ( c == '/' )
        !           670:                             break;    /* found the end */
        !           671:                         }
        !           672: 
        !           673:                     if ( c == EOF )
        !           674:                         {
        !           675:                         error( "EOF in comment" );
        !           676:                         break;
        !           677:                         }
        !           678:                     }
        !           679:                 }
        !           680: 
        !           681: .fi
        !           682: (Note that if the scanner is compiled using
        !           683: .B C++,
        !           684: then
        !           685: .B input()
        !           686: is instead referred to as
        !           687: .B yyinput(),
        !           688: in order to avoid a name clash with the
        !           689: .B C++
        !           690: stream by the name of
        !           691: .I input.)
        !           692: .IP -
        !           693: .B yyterminate()
        !           694: can be used in lieu of a return statement in an action.  It terminates
        !           695: the scanner and returns a 0 to the scanner's caller, indicating "all done".
        !           696: Subsequent calls to the scanner will immediately return unless preceded
        !           697: by a call to
        !           698: .B yyrestart()
        !           699: (see below).
        !           700: By default,
        !           701: .B yyterminate()
        !           702: is also called when an end-of-file is encountered.  It is a macro and
        !           703: may be redefined.
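The back-to-front rule for unput() described in the list above can be seen in a toy model of a pushback buffer. This is only a sketch: toy_unput(), toy_input(), toy_unput_string() and PUSHBACK_MAX are hypothetical names invented for the illustration, and the code does not reflect how the generated scanner actually stores its input.

```c
#include <stdio.h>
#include <string.h>

#define PUSHBACK_MAX 64     /* hypothetical capacity for this sketch */

static char pushback[PUSHBACK_MAX];
static int pushback_len = 0;

/* Toy stand-in for unput(): c becomes the next character to be read.
 * Returns 0 on success, -1 if the buffer is full.
 */
int toy_unput( int c )
    {
    if ( pushback_len >= PUSHBACK_MAX )
        return -1;
    pushback[pushback_len++] = (char) c;
    return 0;
    }

/* Toy stand-in for input(): the most recently pushed character comes
 * out first; EOF when nothing is pending.
 */
int toy_input( void )
    {
    if ( pushback_len > 0 )
        return (unsigned char) pushback[--pushback_len];
    return EOF;
    }

/* Push a whole string back so that it is re-read in its original order:
 * the last character goes first, the first character goes last.
 */
void toy_unput_string( const char *s )
    {
    int i;

    for ( i = (int) strlen( s ) - 1; i >= 0; --i )
        toy_unput( s[i] );
    }
```

After toy_unput_string( "abc" ), successive toy_input() calls yield 'a', 'b', 'c'. Pushing the characters front-to-back instead would make them come out reversed.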
        !           704: .SH THE GENERATED SCANNER
        !           705: The output of
        !           706: .I flex
        !           707: is the file
        !           708: .B lex.yy.c,
        !           709: which contains the scanning routine
        !           710: .B yylex(),
        !           711: a number of tables used by it for matching tokens, and a number
        !           712: of auxiliary routines and macros.  By default,
        !           713: .B yylex()
        !           714: is declared as follows:
        !           715: .nf
        !           716: 
        !           717:     int yylex()
        !           718:         {
        !           719:         ... various definitions and the actions in here ...
        !           720:         }
        !           721: 
        !           722: .fi
        !           723: (If your environment supports function prototypes, then it will
        !           724: be "int yylex( void )".)  This definition may be changed by redefining
        !           725: the "YY_DECL" macro.  For example, you could use:
        !           726: .nf
        !           727: 
        !           728:     #undef YY_DECL
        !           729:     #define YY_DECL float lexscan( a, b ) float a, b;
        !           730: 
        !           731: .fi
        !           732: to give the scanning routine the name
        !           733: .I lexscan,
        !           734: returning a float, and taking two floats as arguments.  Note that
        !           735: if you give arguments to the scanning routine using a
        !           736: K&R-style/non-prototyped function declaration, you must terminate
        !           737: the definition with a semi-colon (;).
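For compilers that accept function prototypes, the same redefinition could plausibly be written in prototype style, in which case no trailing semi-colon is wanted. The sketch below is an assumption-laden illustration, not text from the flex distribution: the function body is a stand-in (in a real scanner flex supplies it), and it is there only to show that YY_DECL expands to the function header.

```c
/* Prototype-style variant of the lexscan example; assumes a compiler
 * that supports function prototypes.
 */
#undef YY_DECL
#define YY_DECL float lexscan( float a, float b )

/* Stand-in body for illustration only.  In generated code, flex emits
 * YY_DECL followed by the real scanner body.
 */
YY_DECL
    {
    return a + b;
    }
```

A caller would then invoke the routine as lexscan( x, y ) rather than yylex().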
        !           738: .LP
        !           739: Whenever
        !           740: .B yylex()
        !           741: is called, it scans tokens from the global input file
        !           742: .I yyin
        !           743: (which defaults to stdin).  It continues until it either reaches
        !           744: an end-of-file (at which point it returns the value 0) or
        !           745: one of its actions executes a
        !           746: .I return
        !           747: statement.
        !           748: In the former case, when called again the scanner will immediately
        !           749: return unless
        !           750: .B yyrestart()
        !           751: is called to point
        !           752: .I yyin
        !           753: at the new input file.  (
        !           754: .B yyrestart()
        !           755: takes one argument, a
        !           756: .B FILE *
        !           757: pointer.)
        !           758: In the latter case (i.e., when an action
        !           759: executes a return), the scanner may then be called again and it
        !           760: will resume scanning where it left off.
        !           761: .LP
        !           762: By default (and for purposes of efficiency), the scanner uses
        !           763: block-reads rather than simple
        !           764: .I getc()
        !           765: calls to read characters from
        !           766: .I yyin.
        !           767: The nature of how it gets its input can be controlled by redefining the
        !           768: .B YY_INPUT
        !           769: macro.
        !           770: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
        !           771: action is to place up to
        !           772: .I max_size
        !           773: characters in the character array
        !           774: .I buf
        !           775: and return in the integer variable
        !           776: .I result
        !           777: either the
        !           778: number of characters read or the constant YY_NULL (0 on Unix systems)
        !           779: to indicate EOF.  The default YY_INPUT reads from the
        !           780: global file-pointer "yyin".
        !           781: .LP
        !           782: A sample redefinition of YY_INPUT (in the definitions
        !           783: section of the input file):
        !           784: .nf
        !           785: 
        !           786:     %{
        !           787:     #undef YY_INPUT
        !           788:     #define YY_INPUT(buf,result,max_size) \\
        !           789:         result = ((buf[0] = getchar()) == EOF) ? YY_NULL : 1;
        !           790:     %}
        !           791: 
        !           792: .fi
        !           793: This definition will change the input processing to occur
        !           794: one character at a time.
        !           795: .LP
        !           796: You also can add in things like keeping track of the
        !           797: input line number this way; but don't expect your scanner to
        !           798: go very fast.
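As a rough sketch of the line-counting idea, the one-character-at-a-time definition can be extended to bump a counter whenever a newline is read. The counter name input_line_num is hypothetical, and the declarations of yyin and YY_NULL below are stand-ins so the fragment compiles on its own; in a real scanner they come from the generated code and should not be redeclared.

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles outside a generated scanner. */
#ifndef YY_NULL
#define YY_NULL 0
#endif
FILE *yyin;

int input_line_num = 1;     /* hypothetical counter, not part of flex */

#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
    { \
    int yy_c = getc( yyin ); \
    if ( yy_c == EOF ) \
        result = YY_NULL; \
    else \
        { \
        buf[0] = (char) yy_c; \
        if ( yy_c == '\n' ) \
            ++input_line_num;   /* count lines as they are read */ \
        result = 1; \
        } \
    }
```

Note that the counter tracks characters as they are read into the buffer, which may run slightly ahead of the token currently being matched.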
        !           799: .LP
        !           800: When the scanner receives an end-of-file indication from YY_INPUT,
        !           801: it then checks the
        !           802: .B yywrap()
        !           803: function.  If
        !           804: .B yywrap()
        !           805: returns false (zero), then it is assumed that the
        !           806: function has gone ahead and set up
        !           807: .I yyin
        !           808: to point to another input file, and scanning continues.  If it returns
        !           809: true (non-zero), then the scanner terminates, returning 0 to its
        !           810: caller.
        !           811: .LP
        !           812: The default
        !           813: .B yywrap()
        !           814: always returns 1.  Presently, to redefine it you must first
        !           815: "#undef yywrap", as it is currently implemented as a macro.  As indicated
        !           816: by the hedging in the previous sentence, it may be changed to
        !           817: a true function in the near future.
        !           818: .LP
        !           819: The scanner writes its
        !           820: .B ECHO
        !           821: output to the
        !           822: .I yyout
        !           823: global (default, stdout), which may be redefined by the user simply
        !           824: by assigning it to some other
        !           825: .B FILE
        !           826: pointer.
        !           827: .SH START CONDITIONS
        !           828: .I flex
        !           829: provides a mechanism for conditionally activating rules.  Any rule
        !           830: whose pattern is prefixed with "<sc>" will only be active when
        !           831: the scanner is in the start condition named "sc".  For example,
        !           832: .nf
        !           833: 
        !           834:     <STRING>[^"]*        { /* eat up the string body ... */
        !           835:                 ...
        !           836:                 }
        !           837: 
        !           838: .fi
        !           839: will be active only when the scanner is in the "STRING" start
        !           840: condition, and
        !           841: .nf
        !           842: 
        !           843:     <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
        !           844:                 ...
        !           845:                 }
        !           846: 
        !           847: .fi
        !           848: will be active only when the current start condition is
        !           849: either "INITIAL", "STRING", or "QUOTE".
        !           850: .LP
        !           851: Start conditions
        !           852: are declared in the definitions (first) section of the input
        !           853: using unindented lines beginning with either
        !           854: .B %s
        !           855: or
        !           856: .B %x
        !           857: followed by a list of names.
        !           858: The former declares
        !           859: .I inclusive
        !           860: start conditions, the latter
        !           861: .I exclusive
        !           862: start conditions.  A start condition is activated using the
        !           863: .B BEGIN
        !           864: action.  Until the next
        !           865: .B BEGIN
        !           866: action is executed, rules with the given start
        !           867: condition will be active and
        !           868: rules with other start conditions will be inactive.
        !           869: If the start condition is
        !           870: .I inclusive,
        !           871: then rules with no start conditions at all will also be active.
        !           872: If it is
        !           873: .I exclusive,
        !           874: then
        !           875: .I only
        !           876: rules qualified with the start condition will be active.
        !           877: A set of rules contingent on the same exclusive start condition
        !           878: describe a scanner which is independent of any of the other rules in the
        !           879: .I flex
        !           880: input.  Because of this,
        !           881: exclusive start conditions make it easy to specify "mini-scanners"
        !           882: which scan portions of the input that are syntactically different
        !           883: from the rest (e.g., comments).
        !           884: .LP
        !           885: If the distinction between inclusive and exclusive start conditions
        !           886: is still a little vague, here's a simple example illustrating the
        !           887: connection between the two.  The set of rules:
        !           888: .nf
        !           889: 
        !           890:     %s example
        !           891:     %%
        !           892:     <example>foo           /* do something */
        !           893: 
        !           894: .fi
        !           895: is equivalent to
        !           896: .nf
        !           897: 
        !           898:     %x example
        !           899:     %%
        !           900:     <INITIAL,example>foo   /* do something */
        !           901: 
        !           902: .fi
        !           903: .LP
        !           904: The default rule (to
        !           905: .B ECHO
        !           906: any unmatched character) remains active in start conditions.
        !           907: .LP
        !           908: .B BEGIN(0)
        !           909: returns to the original state where only the rules with
        !           910: no start conditions are active.  This state can also be
        !           911: referred to as the start-condition "INITIAL", so
        !           912: .B BEGIN(INITIAL)
        !           913: is equivalent to
        !           914: .B BEGIN(0).
        !           915: (The parentheses around the start condition name are not required but
        !           916: are considered good style.)
        !           917: .LP
        !           918: .B BEGIN
        !           919: actions can also be given as indented code at the beginning
        !           920: of the rules section.  For example, the following will cause
        !           921: the scanner to enter the "SPECIAL" start condition whenever
        !           922: .I yylex()
        !           923: is called and the global variable
        !           924: .I enter_special
        !           925: is true:
        !           926: .nf
        !           927: 
        !           928:             int enter_special;
        !           929: 
        !           930:     %x SPECIAL
        !           931:     %%
        !           932:             if ( enter_special )
        !           933:                 BEGIN(SPECIAL);
        !           934: 
        !           935:     <SPECIAL>blahblahblah
        !           936:     ...more rules follow...
        !           937: 
        !           938: .fi
        !           939: .LP
        !           940: To illustrate the uses of start conditions,
        !           941: here is a scanner which provides two different interpretations
        !           942: of a string like "123.456".  By default it will treat it as
        !           943: three tokens, the integer "123", a dot ('.'), and the integer "456".
        !           944: But if the string is preceded earlier in the line by the string
        !           945: "expect-floats"
        !           946: it will treat it as a single token, the floating-point number
        !           947: 123.456:
        !           948: .nf
        !           949: 
        !           950:     %{
        !           951:     #include <math.h>
        !           952:     %}
        !           953:     %s expect
        !           954: 
        !           955:     %%
        !           956:     expect-floats        BEGIN(expect);
        !           957: 
        !           958:     <expect>[0-9]+"."[0-9]+      {
        !           959:                 printf( "found a float, = %f\\n",
        !           960:                         atof( yytext ) );
        !           961:                 }
        !           962:     <expect>\\n           {
        !           963:                 /* that's the end of the line, so
        !           964:                  * we need another "expect-floats"
        !           965:                  * before we'll recognize any more
        !           966:                  * numbers
        !           967:                  */
        !           968:                 BEGIN(INITIAL);
        !           969:                 }
        !           970: 
        !           971:     [0-9]+      {
        !           972:                 printf( "found an integer, = %d\\n",
        !           973:                         atoi( yytext ) );
        !           974:                 }
        !           975: 
        !           976:     "."         printf( "found a dot\\n" );
        !           977: 
        !           978: .fi
        !           979: Here is a scanner which recognizes (and discards) C comments while
        !           980: maintaining a count of the current input line.
        !           981: .nf
        !           982: 
        !           983:     %x comment
        !           984:     %%
        !           985:             int line_num = 1;
        !           986: 
        !           987:     "/*"         BEGIN(comment);
        !           988: 
        !           989:     <comment>[^*\\n]*        /* eat anything that's not a '*' */
        !           990:     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
        !           991:     <comment>\\n             ++line_num;
        !           992:     <comment>"*"+"/"        BEGIN(INITIAL);
        !           993: 
        !           994: .fi
        !           995: Note that start condition names are really integer values and
        !           996: can be stored as such.  Thus, the above could be extended in the
        !           997: following fashion:
        !           998: .nf
        !           999: 
        !          1000:     %x comment foo
        !          1001:     %%
        !          1002:             int line_num = 1;
        !          1003:             int comment_caller;
        !          1004: 
        !          1005:     "/*"         {
        !          1006:                  comment_caller = INITIAL;
        !          1007:                  BEGIN(comment);
        !          1008:                  }
        !          1009: 
        !          1010:     ...
        !          1011: 
        !          1012:     <foo>"/*"    {
        !          1013:                  comment_caller = foo;
        !          1014:                  BEGIN(comment);
        !          1015:                  }
        !          1016: 
        !          1017:     <comment>[^*\\n]*        /* eat anything that's not a '*' */
        !          1018:     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
        !          1019:     <comment>\\n             ++line_num;
        !          1020:     <comment>"*"+"/"        BEGIN(comment_caller);
        !          1021: 
        !          1022: .fi
        !          1023: One can then implement a "stack" of start conditions using an
        !          1024: array of integers.  (It is likely that such stacks will become
        !          1025: a full-fledged
        !          1026: .I flex
        !          1027: feature in the future.)  Note, though, that
        !          1028: start conditions do not have their own name-space; %s's and %x's
        !          1029: declare names in the same fashion as #define's.
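A minimal sketch of such a stack, built on an array of integers, might look as follows. The names sc_push(), sc_pop() and SC_STACK_DEPTH are hypothetical; the helpers only store and return integers and leave the actual switch of start condition to BEGIN in the rule's action.

```c
#define SC_STACK_DEPTH 16   /* hypothetical limit for this sketch */

static int sc_stack[SC_STACK_DEPTH];
static int sc_stack_ptr = 0;

/* Remember the start condition to return to later.
 * Returns 0 on success, -1 if the stack is full.
 */
int sc_push( int caller )
    {
    if ( sc_stack_ptr >= SC_STACK_DEPTH )
        return -1;
    sc_stack[sc_stack_ptr++] = caller;
    return 0;
    }

/* Return the most recently saved start condition, or fallback if the
 * stack is empty.
 */
int sc_pop( int fallback )
    {
    if ( sc_stack_ptr == 0 )
        return fallback;
    return sc_stack[--sc_stack_ptr];
    }
```

In the comment example above, the actions would then read "sc_push( foo ); BEGIN(comment);" in place of the assignment to comment_caller, and "BEGIN(sc_pop( INITIAL ));" at the end of the comment.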
        !          1030: .SH MULTIPLE INPUT BUFFERS
        !          1031: Some scanners (such as those which support "include" files)
        !          1032: require reading from several input streams.  As
        !          1033: .I flex
        !          1034: scanners do a large amount of buffering, one cannot control
        !          1035: where the next input will be read from by simply writing a
        !          1036: .B YY_INPUT
        !          1037: which is sensitive to the scanning context.
        !          1038: .B YY_INPUT
        !          1039: is only called when the scanner reaches the end of its buffer, which
        !          1040: may be a long time after scanning a statement such as an "include"
        !          1041: which requires switching the input source.
        !          1042: .LP
        !          1043: To negotiate these sorts of problems,
        !          1044: .I flex
        !          1045: provides a mechanism for creating and switching between multiple
        !          1046: input buffers.  An input buffer is created by using:
        !          1047: .nf
        !          1048: 
        !          1049:     YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
        !          1050: 
        !          1051: .fi
        !          1052: which takes a
        !          1053: .I FILE
        !          1054: pointer and a size and creates a buffer associated with the given
        !          1055: file and large enough to hold
        !          1056: .I size
        !          1057: characters (when in doubt, use
        !          1058: .B YY_BUF_SIZE
        !          1059: for the size).  It returns a
        !          1060: .B YY_BUFFER_STATE
        !          1061: handle, which may then be passed to other routines:
        !          1062: .nf
        !          1063: 
        !          1064:     void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
        !          1065: 
        !          1066: .fi
        !          1067: switches the scanner's input buffer so subsequent tokens will
        !          1068: come from
        !          1069: .I new_buffer.
        !          1070: Note that
        !          1071: .B yy_switch_to_buffer()
        !          1072: may be used by yywrap() to set things up for continued scanning, instead
        !          1073: of opening a new file and pointing
        !          1074: .I yyin
        !          1075: at it.
        !          1076: .nf
        !          1077: 
        !          1078:     void yy_delete_buffer( YY_BUFFER_STATE buffer )
        !          1079: 
        !          1080: .fi
        !          1081: is used to reclaim the storage associated with a buffer.
        !          1082: .LP
        !          1083: .B yy_new_buffer()
        !          1084: is an alias for
        !          1085: .B yy_create_buffer(),
        !          1086: provided for compatibility with the C++ use of
        !          1087: .I new
        !          1088: and
        !          1089: .I delete
        !          1090: for creating and destroying dynamic objects.
        !          1091: .LP
        !          1092: Finally, the
        !          1093: .B YY_CURRENT_BUFFER
        !          1094: macro returns a
        !          1095: .B YY_BUFFER_STATE
        !          1096: handle to the current buffer.
        !          1097: .LP
        !          1098: Here is an example of using these features for writing a scanner
        !          1099: which expands include files (the
        !          1100: .B <<EOF>>
        !          1101: feature is discussed below):
        !          1102: .nf
        !          1103: 
        !          1104:     /* the "incl" state is used for picking up the name
        !          1105:      * of an include file
        !          1106:      */
        !          1107:     %x incl
        !          1108: 
        !          1109:     %{
        !          1110:     #define MAX_INCLUDE_DEPTH 10
        !          1111:     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
        !          1112:     int include_stack_ptr = 0;
        !          1113:     %}
        !          1114: 
        !          1115:     %%
        !          1116:     include             BEGIN(incl);
        !          1117: 
        !          1118:     [a-z]+              ECHO;
        !          1119:     [^a-z\\n]*\\n?        ECHO;
        !          1120: 
        !          1121:     <incl>[ \\t]*      /* eat the whitespace */
        !          1122:     <incl>[^ \\t\\n]+   { /* got the include file name */
        !          1123:             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
        !          1124:                 {
        !          1125:                 fprintf( stderr, "Includes nested too deeply" );
        !          1126:                 exit( 1 );
        !          1127:                 }
        !          1128: 
        !          1129:             include_stack[include_stack_ptr++] =
        !          1130:                 YY_CURRENT_BUFFER;
        !          1131: 
        !          1132:             yyin = fopen( yytext, "r" );
        !          1133: 
        !          1134:             if ( ! yyin )
        !          1135:                 error( ... );
        !          1136: 
        !          1137:             yy_switch_to_buffer(
        !          1138:                 yy_create_buffer( yyin, YY_BUF_SIZE ) );
        !          1139: 
        !          1140:             BEGIN(INITIAL);
        !          1141:             }
        !          1142: 
        !          1143:     <<EOF>> {
        !          1144:             if ( --include_stack_ptr < 0 )
        !          1145:                 {
        !          1146:                 yyterminate();
        !          1147:                 }
        !          1148: 
        !          1149:             else
        !          1150:                 yy_switch_to_buffer(
        !          1151:                      include_stack[include_stack_ptr] );
        !          1152:             }
        !          1153: 
        !          1154: .fi
        !          1155: .SH END-OF-FILE RULES
        !          1156: The special rule "<<EOF>>" indicates
        !          1157: actions which are to be taken when an end-of-file is
        !          1158: encountered and yywrap() returns non-zero (i.e., indicates
        !          1159: no further files to process).  The action must finish
        !          1160: by doing one of four things:
        !          1161: .IP -
        !          1162: the special
        !          1163: .B YY_NEW_FILE
        !          1164: action, if
        !          1165: .I yyin
        !          1166: has been pointed at a new file to process;
        !          1167: .IP -
        !          1168: a
        !          1169: .I return
        !          1170: statement;
        !          1171: .IP -
        !          1172: the special
        !          1173: .B yyterminate()
        !          1174: action;
        !          1175: .IP -
        !          1176: or, switching to a new buffer using
        !          1177: .B yy_switch_to_buffer()
        !          1178: as shown in the example above.
        !          1179: .LP
        !          1180: <<EOF>> rules may not be used with other
        !          1181: patterns; they may only be qualified with a list of start
        !          1182: conditions.  If an unqualified <<EOF>> rule is given, it
        !          1183: applies to
        !          1184: .I all
        !          1185: start conditions which do not already have <<EOF>> actions.  To
        !          1186: specify an <<EOF>> rule for only the initial start condition, use
        !          1187: .nf
        !          1188: 
        !          1189:     <INITIAL><<EOF>>
        !          1190: 
        !          1191: .fi
        !          1192: .LP
        !          1193: These rules are useful for catching things like unclosed comments.
        !          1194: An example:
        !          1195: .nf
        !          1196: 
        !          1197:     %x quote
        !          1198:     %%
        !          1199: 
        !          1200:     ...other rules for dealing with quotes...
        !          1201: 
        !          1202:     <quote><<EOF>>   {
        !          1203:              error( "unterminated quote" );
        !          1204:              yyterminate();
        !          1205:              }
        !          1206:     <<EOF>>  {
        !          1207:              if ( *++filelist )
        !          1208:                  {
        !          1209:                  yyin = fopen( *filelist, "r" );
        !          1210:                  YY_NEW_FILE;
        !          1211:                  }
        !          1212:              else
        !          1213:                 yyterminate();
        !          1214:              }
        !          1215: 
        !          1216: .fi
        !          1217: .SH MISCELLANEOUS MACROS
        !          1218: The macro
        !          1219: .B
        !          1220: YY_USER_ACTION
        !          1221: can be redefined to provide an action
        !          1222: which is always executed prior to the matched rule's action.  For example,
        !          1223: it could be #define'd to call a routine to convert yytext to lower-case.
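One way the lower-casing idea might be written is sketched below. The helper name lower_yytext() is hypothetical, and yytext and yyleng are declared here only as stand-ins so the fragment compiles by itself; inside a real scanner the generated code provides them.

```c
#include <ctype.h>

/* Stand-ins for the scanner's globals, for this sketch only. */
char *yytext;
int yyleng;

/* Hypothetical helper: convert the current token to lower-case in place. */
void lower_yytext( void )
    {
    int i;

    for ( i = 0; i < yyleng; ++i )
        yytext[i] = (char) tolower( (unsigned char) yytext[i] );
    }

/* Run the helper before every matched rule's action. */
#undef YY_USER_ACTION
#define YY_USER_ACTION lower_yytext();
```

The cast to unsigned char guards against passing a negative value to tolower() on systems where plain char is signed.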
        !          1224: .LP
        !          1225: The macro
        !          1226: .B YY_USER_INIT
        !          1227: may be redefined to provide an action which is always executed before
        !          1228: the first scan (and before the scanner's internal initializations are done).
        !          1229: For example, it could be used to call a routine to read
        !          1230: in a data table or open a logging file.
        !          1231: .LP
        !          1232: In the generated scanner, the actions are all gathered in one large
        !          1233: switch statement and separated using
        !          1234: .B YY_BREAK,
        !          1235: which may be redefined.  By default, it is simply a "break", to separate
        !          1236: each rule's action from the following rule's.
        !          1237: Redefining
        !          1238: .B YY_BREAK
        !          1239: allows, for example, C++ users to
        !          1240: #define YY_BREAK to do nothing (while being very careful that every
        !          1241: rule ends with a "break" or a "return"!) to avoid suffering from
        !          1242: unreachable statement warnings where, because a rule's action ends with
        !          1243: "return", the
        !          1244: .B YY_BREAK
        !          1245: is inaccessible.
        !          1246: .SH INTERFACING WITH YACC
        !          1247: One of the main uses of
        !          1248: .I flex
        !          1249: is as a companion to the
        !          1250: .I yacc
        !          1251: parser-generator.
        !          1252: .I yacc
        !          1253: parsers expect to call a routine named
        !          1254: .B yylex()
        !          1255: to find the next input token.  The routine is supposed to
        !          1256: return the type of the next token as well as putting any associated
        !          1257: value in the global
        !          1258: .B yylval.
        !          1259: To use
        !          1260: .I flex
        !          1261: with
        !          1262: .I yacc,
        !          1263: one specifies the
        !          1264: .B -d
        !          1265: option to
        !          1266: .I yacc
        !          1267: to instruct it to generate the file
        !          1268: .B y.tab.h
        !          1269: containing definitions of all the
        !          1270: .B %tokens
        !          1271: appearing in the
        !          1272: .I yacc
        !          1273: input.  This file is then included in the
        !          1274: .I flex
        !          1275: scanner.  For example, if one of the tokens is "TOK_NUMBER",
        !          1276: part of the scanner might look like:
        !          1277: .nf
        !          1278: 
        !          1279:     %{
        !          1280:     #include "y.tab.h"
        !          1281:     %}
        !          1282: 
        !          1283:     %%
        !          1284: 
        !          1285:     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
        !          1286: 
        !          1287: .fi
        !          1288: .SH TRANSLATION TABLE
        !          1289: In the name of POSIX compliance,
        !          1290: .I flex
        !          1291: supports a
        !          1292: .I translation table
        !          1293: for mapping input characters into groups.
        !          1294: The table is specified in the first section, and its format looks like:
        !          1295: .nf
        !          1296: 
        !          1297:     %t
        !          1298:     1        abcd
        !          1299:     2        ABCDEFGHIJKLMNOPQRSTUVWXYZ
        !          1300:     52       0123456789
        !          1301:     6        \\t\\ \\n
        !          1302:     %t
        !          1303: 
        !          1304: .fi
        !          1305: This example specifies that the characters 'a', 'b', 'c', and 'd'
        !          1306: are to all be lumped into group #1, upper-case letters
        !          1307: in group #2, digits in group #52, tabs, blanks, and newlines into
        !          1308: group #6, and
        !          1309: .I
        !          1310: no other characters will appear in the patterns.
        !          1311: The group numbers are actually disregarded by
        !          1312: .I flex;
        !          1313: .B %t
        !          1314: serves, though, to lump characters together.  Given the above
        !          1315: table, for example, the pattern "a(AA)*5" is equivalent to "d(ZQ)*0".
        !          1316: They both say, "match any character in group #1, followed by
        !          1317: zero-or-more pairs of characters
        !          1318: from group #2, followed by a character from group #52."  Thus
        !          1319: .B %t
        !          1320: provides a crude way for introducing equivalence classes into
        !          1321: the scanner specification.
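.LP
The grouping can be pictured with a small C sketch.  It is not how
flex implements
.B %t;
the names group_of, add_group, build_example_table and same_groups are
hypothetical, and a 7-bit character set is assumed.  It only shows that
two strings whose characters fall in the same groups, position by
position, are indistinguishable once the table is applied.

```c
#include <string.h>

#define NO_GROUP 0               /* character not listed in the table */

static unsigned char group_of[128];   /* assumes 7-bit characters */

/* Assign every character in members to the given group number. */
static void add_group(int group, const char *members)
{
    for (; *members; ++members)
        group_of[(unsigned char) *members & 0x7f] = (unsigned char) group;
}

/* Load the example table from the text above. */
static void build_example_table(void)
{
    memset(group_of, NO_GROUP, sizeof group_of);
    add_group(1, "abcd");
    add_group(2, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    add_group(52, "0123456789");
    add_group(6, "\t \n");
}

/* Return 1 if a and b have equal length and agree group by group. */
static int same_groups(const char *a, const char *b)
{
    while (*a && *b) {
        if (group_of[(unsigned char) *a & 0x7f] !=
            group_of[(unsigned char) *b & 0x7f])
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;             /* true only if both strings ended */
}
```

With the example table loaded, "aAA5" and "dZQ0" compare as the same
sequence of groups, while "aAA5" and "AAA5" do not.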
        !          1322: .LP
        !          1323: Note that the
        !          1324: .B -i
        !          1325: option (see below) coupled with the equivalence classes which
        !          1326: .I flex
        !          1327: automatically generates take care of virtually all the instances
        !          1328: when one might consider using
        !          1329: .B %t.
        !          1330: But what the hell, it's there if you want it.
        !          1331: .SH OPTIONS
        !          1332: .I flex
        !          1333: has the following options:
        !          1334: .TP
        !          1335: .B -b
        !          1336: Generate backtracking information to
        !          1337: .I lex.backtrack.
        !          1338: This is a list of scanner states which require backtracking
        !          1339: and the input characters on which they do so.  By adding rules one
        !          1340: can remove backtracking states.  If all backtracking states
        !          1341: are eliminated and
        !          1342: .B -f
        !          1343: or
        !          1344: .B -F
        !          1345: is used, the generated scanner will run faster (see the
        !          1346: .B -p
        !          1347: flag).  Only users who wish to squeeze every last cycle out of their
        !          1348: scanners need worry about this option.  (See the section on PERFORMANCE
        !          1349: CONSIDERATIONS below.)
        !          1350: .TP
        !          1351: .B -c
        !          1352: is a do-nothing, deprecated option included for POSIX compliance.
        !          1353: .IP
        !          1354: .B NOTE:
        !          1355: in previous releases of
        !          1356: .I flex
        !          1357: .B -c
        !          1358: specified table-compression options.  This functionality is
        !          1359: now given by the
        !          1360: .B -C
        !          1361: flag.  To ease the impact of this change, when
        !          1362: .I flex
        !          1363: encounters
        !          1364: .B -c,
        !          1365: it currently issues a warning message and assumes that
        !          1366: .B -C
        !          1367: was desired instead.  In the future this "promotion" of
        !          1368: .B -c
        !          1369: to
        !          1370: .B -C
        !          1371: will go away in the name of full POSIX compliance (unless
        !          1372: the POSIX meaning is removed first).
        !          1373: .TP
        !          1374: .B -d
        !          1375: makes the generated scanner run in
        !          1376: .I debug
        !          1377: mode.  Whenever a pattern is recognized and the global
        !          1378: .B yy_flex_debug
        !          1379: is non-zero (which is the default),
        !          1380: the scanner will write to
        !          1381: .I stderr
        !          1382: a line of the form:
        !          1383: .nf
        !          1384: 
        !          1385:     --accepting rule at line 53 ("the matched text")
        !          1386: 
        !          1387: .fi
        !          1388: The line number refers to the location of the rule in the file
        !          1389: defining the scanner (i.e., the file that was fed to flex).  Messages
        !          1390: are also generated when the scanner backtracks, accepts the
        !          1391: default rule, reaches the end of its input buffer (or encounters
        !          1392: a NUL; at this point, the two look the same as far as the scanner's concerned),
        !          1393: or reaches an end-of-file.
        !          1394: .TP
        !          1395: .B -f
        !          1396: specifies (take your pick)
        !          1397: .I full table
        !          1398: or
        !          1399: .I fast scanner.
        !          1400: No table compression is done.  The result is large but fast.
        !          1401: This option is equivalent to
        !          1402: .B -Cf
        !          1403: (see below).
        !          1404: .TP
        !          1405: .B -i
        !          1406: instructs
        !          1407: .I flex
        !          1408: to generate a
        !          1409: .I case-insensitive
        !          1410: scanner.  The case of letters given in the
        !          1411: .I flex
        !          1412: input patterns will
        !          1413: be ignored, and tokens in the input will be matched regardless of case.  The
        !          1414: matched text given in
        !          1415: .I yytext
        !          1416: will have the preserved case (i.e., it will not be folded).
        !          1417: .TP
        !          1418: .B -n
        !          1419: is another do-nothing, deprecated option included only for
        !          1420: POSIX compliance.
        !          1421: .TP
        !          1422: .B -p
        !          1423: generates a performance report to stderr.  The report
        !          1424: consists of comments regarding features of the
        !          1425: .I flex
        !          1426: input file which will cause a loss of performance in the resulting scanner.
        !          1427: Note that the use of
        !          1428: .I REJECT
        !          1429: and variable trailing context (see the BUGS section in flex(1))
        !          1430: entails a substantial performance penalty; use of
        !          1431: .I yymore(),
        !          1432: the
        !          1433: .B ^
        !          1434: operator,
        !          1435: and the
        !          1436: .B -I
        !          1437: flag entail minor performance penalties.
        !          1438: .TP
        !          1439: .B -s
        !          1440: causes the
        !          1441: .I default rule
        !          1442: (that unmatched scanner input is echoed to
        !          1443: .I stdout)
        !          1444: to be suppressed.  If the scanner encounters input that does not
        !          1445: match any of its rules, it aborts with an error.  This option is
        !          1446: useful for finding holes in a scanner's rule set.
        !          1447: .TP
        !          1448: .B -t
        !          1449: instructs
        !          1450: .I flex
        !          1451: to write the scanner it generates to standard output instead
        !          1452: of
        !          1453: .B lex.yy.c.
        !          1454: .TP
        !          1455: .B -v
        !          1456: specifies that
        !          1457: .I flex
        !          1458: should write to
        !          1459: .I stderr
        !          1460: a summary of statistics regarding the scanner it generates.
        !          1461: Most of the statistics are meaningless to the casual
        !          1462: .I flex
        !          1463: user, but the
        !          1464: first line identifies the version of
        !          1465: .I flex,
        !          1466: which is useful for figuring
        !          1467: out where you stand with respect to patches and new releases,
        !          1468: and the next two lines give the date when the scanner was created
        !          1469: and a summary of the flags which were in effect.
        !          1470: .TP
        !          1471: .B -F
        !          1472: specifies that the
        !          1473: .ul
        !          1474: fast
        !          1475: scanner table representation should be used.  This representation is
        !          1476: about as fast as the full table representation
        !          1477: .ul
        !          1478: (-f),
        !          1479: and for some sets of patterns will be considerably smaller (and for
        !          1480: others, larger).  In general, if the pattern set contains both "keywords"
        !          1481: and a catch-all, "identifier" rule, such as in the set:
        !          1482: .nf
        !          1483: 
        !          1484:     "case"    return TOK_CASE;
        !          1485:     "switch"  return TOK_SWITCH;
        !          1486:     ...
        !          1487:     "default" return TOK_DEFAULT;
        !          1488:     [a-z]+    return TOK_ID;
        !          1489: 
        !          1490: .fi
        !          1491: then you're better off using the full table representation.  If only
        !          1492: the "identifier" rule is present and you then use a hash table or some such
        !          1493: to detect the keywords, you're better off using
        !          1494: .ul
        !          1495: -F.
        !          1496: .IP
        !          1497: This option is equivalent to
        !          1498: .B -CF
        !          1499: (see below).
        !          1500: .TP
        !          1501: .B -I
        !          1502: instructs
        !          1503: .I flex
        !          1504: to generate an
        !          1505: .I interactive
        !          1506: scanner.  Normally, scanners generated by
        !          1507: .I flex
        !          1508: always look ahead one
        !          1509: character before deciding that a rule has been matched.  At the cost of
        !          1510: some scanning overhead,
        !          1511: .I flex
        !          1512: will generate a scanner which only looks ahead
        !          1513: when needed.  Such scanners are called
        !          1514: .I interactive
        !          1515: because if you want to write a scanner for an interactive system such as a
        !          1516: command shell, you will probably want the user's input to be terminated
        !          1517: with a newline, and without
        !          1518: .B -I
        !          1519: the user will have to type a character in addition to the newline in order
        !          1520: to have the newline recognized.  This leads to dreadful interactive
        !          1521: performance.
        !          1522: .IP
        !          1523: If all this seems too confusing, here's the general rule: if a human will
        !          1524: be typing in input to your scanner, use
        !          1525: .B -I,
        !          1526: otherwise don't; if you don't care about squeezing the utmost performance
        !          1527: from your scanner and you
        !          1528: don't want to make any assumptions about the input to your scanner,
        !          1529: use
        !          1530: .B -I.
        !          1531: .IP
        !          1532: Note,
        !          1533: .B -I
        !          1534: cannot be used in conjunction with
        !          1535: .I full
        !          1536: or
        !          1537: .I fast tables,
        !          1538: i.e., the
        !          1539: .B -f, -F, -Cf,
        !          1540: or
        !          1541: .B -CF
        !          1542: flags.
        !          1543: .TP
        !          1544: .B -L
        !          1545: instructs
        !          1546: .I flex
        !          1547: not to generate
        !          1548: .B #line
        !          1549: directives.  Without this option,
        !          1550: .I flex
        !          1551: peppers the generated scanner
        !          1552: with #line directives so error messages in the actions will be correctly
        !          1553: located with respect to the original
        !          1554: .I flex
        !          1555: input file, and not to
        !          1556: the fairly meaningless line numbers of
        !          1557: .B lex.yy.c.
        !          1558: (Unfortunately
        !          1559: .I flex
        !          1560: does not presently generate the necessary directives
        !          1561: to "retarget" the line numbers for those parts of
        !          1562: .B lex.yy.c
        !          1563: which it generated.  So if there is an error in the generated code,
        !          1564: a meaningless line number is reported.)
        !          1565: .TP
        !          1566: .B -T
        !          1567: makes
        !          1568: .I flex
        !          1569: run in
        !          1570: .I trace
        !          1571: mode.  It will generate a lot of messages to
        !          1572: .I stdout
        !          1573: concerning
        !          1574: the form of the input and the resultant non-deterministic and deterministic
        !          1575: finite automata.  This option is mostly for use in maintaining
        !          1576: .I flex.
        !          1577: .TP
        !          1578: .B -8
        !          1579: instructs
        !          1580: .I flex
        !          1581: to generate an 8-bit scanner, i.e., one which can recognize 8-bit
        !          1582: characters.  On some sites,
        !          1583: .I flex
        !          1584: is installed with this option as the default.  On others, the default
        !          1585: is 7-bit characters.  To see which is the case, check the verbose
        !          1586: .B (-v)
        !          1587: output for "equivalence classes created".  If the denominator of
        !          1588: the number shown is 128, then by default
        !          1589: .I flex
        !          1590: is generating 7-bit characters.  If it is 256, then the default is
        !          1591: 8-bit characters and the
        !          1592: .B -8
        !          1593: flag is not required (but may be a good idea to keep the scanner
        !          1594: specification portable).  Feeding a 7-bit scanner 8-bit characters
        !          1595: will result in infinite loops, bus errors, or other such fireworks,
        !          1596: so when in doubt, use the flag.  Note that if equivalence classes
        !          1597: are used, 8-bit scanners take only slightly more table space than
        !          1598: 7-bit scanners (128 bytes, to be exact); if equivalence classes are
        !          1599: not used, however, then the tables may grow up to twice their
        !          1600: 7-bit size.
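.IP
When it is unclear whether a 7-bit scanner is safe for some input, one
can check the data beforehand.  The following is a minimal sketch; the
function name first_8bit_byte is hypothetical and not part of flex.

```c
#include <stddef.h>

/* Return the offset of the first byte with its high bit set, or n if
   all n bytes fit in 7 bits. */
static size_t first_8bit_byte(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        if (buf[i] & 0x80)
            break;
    return i;
}
```

A return value smaller than n means the input contains at least one
8-bit character, so the scanner should be built with the flag.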
        !          1601: .TP 
        !          1602: .B -C[efmF]
        !          1603: controls the degree of table compression.
        !          1604: .IP
        !          1605: .B -Ce
        !          1606: directs
        !          1607: .I flex
        !          1608: to construct
        !          1609: .I equivalence classes,
        !          1610: i.e., sets of characters
        !          1611: which have identical lexical properties (for example, if the only
        !          1612: appearance of digits in the
        !          1613: .I flex
        !          1614: input is in the character class
        !          1615: "[0-9]" then the digits '0', '1', ..., '9' will all be put
        !          1616: in the same equivalence class).  Equivalence classes usually give
        !          1617: dramatic reductions in the final table/object file sizes (typically
        !          1618: a factor of 2-5) and are pretty cheap performance-wise (one array
        !          1619: look-up per character scanned).
        !          1620: .IP
        !          1621: .B -Cf
        !          1622: specifies that the
        !          1623: .I full
        !          1624: scanner tables should be generated -
        !          1625: .I flex
        !          1626: should not compress the
        !          1627: tables by taking advantage of similar transition functions for
        !          1628: different states.
        !          1629: .IP
        !          1630: .B -CF
        !          1631: specifies that the alternate fast scanner representation (described
        !          1632: above under the
        !          1633: .B -F
        !          1634: flag)
        !          1635: should be used.
        !          1636: .IP
        !          1637: .B -Cm
        !          1638: directs
        !          1639: .I flex
        !          1640: to construct
        !          1641: .I meta-equivalence classes,
        !          1642: which are sets of equivalence classes (or characters, if equivalence
        !          1643: classes are not being used) that are commonly used together.  Meta-equivalence
        !          1644: classes are often a big win when using compressed tables, but they
        !          1645: have a moderate performance impact (one or two "if" tests and one
        !          1646: array look-up per character scanned).
        !          1647: .IP
        !          1648: A lone
        !          1649: .B -C
        !          1650: specifies that the scanner tables should be compressed but neither
        !          1651: equivalence classes nor meta-equivalence classes should be used.
        !          1652: .IP
        !          1653: The options
        !          1654: .B -Cf
        !          1655: or
        !          1656: .B -CF
        !          1657: and
        !          1658: .B -Cm
        !          1659: do not make sense together - there is no opportunity for meta-equivalence
        !          1660: classes if the table is not being compressed.  Otherwise the options
        !          1661: may be freely mixed.
        !          1662: .IP
        !          1663: The default setting is
        !          1664: .B -Cem,
        !          1665: which specifies that
        !          1666: .I flex
        !          1667: should generate equivalence classes
        !          1668: and meta-equivalence classes.  This setting provides the highest
        !          1669: degree of table compression.  You can trade off
        !          1670: faster-executing scanners at the cost of larger tables with
        !          1671: the following generally being true:
        !          1672: .nf
        !          1673: 
        !          1674:     slowest & smallest
        !          1675:           -Cem
        !          1676:           -Cm
        !          1677:           -Ce
        !          1678:           -C
        !          1679:           -C{f,F}e
        !          1680:           -C{f,F}
        !          1681:     fastest & largest
        !          1682: 
        !          1683: .fi
        !          1684: Note that scanners with the smallest tables are usually generated and
        !          1685: compiled the quickest, so
        !          1686: during development you will usually want to use the default, maximal
        !          1687: compression.
        !          1688: .IP
        !          1689: .B -Cfe
        !          1690: is often a good compromise between speed and size for production
        !          1691: scanners.
        !          1692: .IP
        !          1693: .B -C
        !          1694: options are not cumulative; whenever the flag is encountered, the
        !          1695: previous -C settings are forgotten.
        !          1696: .TP
        !          1697: .B -Sskeleton_file
        !          1698: overrides the default skeleton file from which
        !          1699: .I flex
        !          1700: constructs its scanners.  You'll never need this option unless you are doing
        !          1701: .I flex
        !          1702: maintenance or development.
        !          1703: .SH PERFORMANCE CONSIDERATIONS
        !          1704: The main design goal of
        !          1705: .I flex
        !          1706: is that it generate high-performance scanners.  It has been optimized
        !          1707: for dealing well with large sets of rules.  Aside from the effects
        !          1708: of table compression on scanner speed outlined above,
        !          1709: there are a number of options/actions which degrade performance.  These
        !          1710: are, from most expensive to least:
        !          1711: .nf
        !          1712: 
        !          1713:     REJECT
        !          1714: 
        !          1715:     pattern sets that require backtracking
        !          1716:     arbitrary trailing context
        !          1717: 
        !          1718:     '^' beginning-of-line operator
        !          1719:     yymore()
        !          1720: 
        !          1721: .fi
        !          1722: with the first three all being quite expensive and the last two
        !          1723: being quite cheap.
        !          1724: .LP
        !          1725: .B REJECT
        !          1726: should be avoided at all costs when performance is important.
        !          1727: It is a particularly expensive option.
        !          1728: .LP
        !          1729: Getting rid of backtracking is messy and often may be an enormous
        !          1730: amount of work for a complicated scanner.  In principle, one begins
        !          1731: by using the
        !          1732: .B -b 
        !          1733: flag to generate a
        !          1734: .I lex.backtrack
        !          1735: file.  For example, on the input
        !          1736: .nf
        !          1737: 
        !          1738:     %%
        !          1739:     foo        return TOK_KEYWORD;
        !          1740:     foobar     return TOK_KEYWORD;
        !          1741: 
        !          1742: .fi
        !          1743: the file looks like:
        !          1744: .nf
        !          1745: 
        !          1746:     State #6 is non-accepting -
        !          1747:      associated rule line numbers:
        !          1748:            2       3
        !          1749:      out-transitions: [ o ]
        !          1750:      jam-transitions: EOF [ \\001-n  p-\\177 ]
        !          1751: 
        !          1752:     State #8 is non-accepting -
        !          1753:      associated rule line numbers:
        !          1754:            3
        !          1755:      out-transitions: [ a ]
        !          1756:      jam-transitions: EOF [ \\001-`  b-\\177 ]
        !          1757: 
        !          1758:     State #9 is non-accepting -
        !          1759:      associated rule line numbers:
        !          1760:            3
        !          1761:      out-transitions: [ r ]
        !          1762:      jam-transitions: EOF [ \\001-q  s-\\177 ]
        !          1763: 
        !          1764:     Compressed tables always backtrack.
        !          1765: 
        !          1766: .fi
        !          1767: The first few lines tell us that there's a scanner state in
        !          1768: which it can make a transition on an 'o' but not on any other
        !          1769: character, and that in that state the currently scanned text does not match
        !          1770: any rule.  The state occurs when trying to match the rules found
        !          1771: at lines 2 and 3 in the input file.
        !          1772: If the scanner is in that state and then reads
        !          1773: something other than an 'o', it will have to backtrack to find
        !          1774: a rule which is matched.  With
        !          1775: a bit of headscratching one can see that this must be the
        !          1776: state it's in when it has seen "fo".  When this has happened,
        !          1777: if anything other than another 'o' is seen, the scanner will
        !          1778: have to back up to simply match the 'f' (by the default rule).
        !          1779: .LP
        !          1780: The comment regarding State #8 indicates there's a problem
        !          1781: when "foob" has been scanned.  Indeed, on any character other
        !          1782: than an 'a', the scanner will have to back up to accept "foo".
        !          1783: Similarly, the comment for State #9 concerns when "fooba" has
        !          1784: been scanned.
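.LP
The backing up described above can be imitated by hand.  The sketch
below is a simplified stand-in for the generated scanner, not code that
flex produces; the name match_foo_foobar is hypothetical.  It remembers
the most recent accepting position and reports how many characters
scanned beyond it must be given back.

```c
#include <stddef.h>

/* Hand-built recognizer for the two rules "foo" and "foobar".
   Returns the length of the longest match at the start of s (0 if
   neither rule matches) and stores in *backed_up the number of
   characters scanned past that match which must be returned. */
static size_t match_foo_foobar(const char *s, size_t *backed_up)
{
    static const char longest[] = "foobar";
    size_t pos = 0;              /* characters matched so far */
    size_t last_accept = 0;      /* length of the last accepted match */

    while (pos < 6 && s[pos] == longest[pos]) {
        ++pos;
        if (pos == 3 || pos == 6)    /* "foo" and "foobar" accept */
            last_accept = pos;
    }
    *backed_up = pos - last_accept;
    return last_accept;
}
```

On the input "fooba!" the routine matches "foo" and backs up over "ba";
on "foobar" nothing is given back.  On "fox" it returns 0 after backing
up over "fo", the point at which a real scanner would fall through to
the default rule.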
        !          1785: .LP
        !          1786: The final comment reminds us that there's no point going to
        !          1787: all the trouble of removing backtracking from the rules unless
        !          1788: we're using
        !          1789: .B -f
        !          1790: or
        !          1791: .B -F,
        !          1792: since there's no performance gain doing so with compressed scanners.
        !          1793: .LP
        !          1794: The way to remove the backtracking is to add "error" rules:
        !          1795: .nf
        !          1796: 
        !          1797:     %%
        !          1798:     foo         return TOK_KEYWORD;
        !          1799:     foobar      return TOK_KEYWORD;
        !          1800: 
        !          1801:     fooba       |
        !          1802:     foob        |
        !          1803:     fo          {
        !          1804:                 /* false alarm, not really a keyword */
        !          1805:                 return TOK_ID;
        !          1806:                 }
        !          1807: 
        !          1808: .fi
        !          1809: .LP
        !          1810: Eliminating backtracking among a list of keywords can also be
        !          1811: done using a "catch-all" rule:
        !          1812: .nf
        !          1813: 
        !          1814:     %%
        !          1815:     foo         return TOK_KEYWORD;
        !          1816:     foobar      return TOK_KEYWORD;
        !          1817: 
        !          1818:     [a-z]+      return TOK_ID;
        !          1819: 
        !          1820: .fi
        !          1821: This is usually the best solution when appropriate.
        !          1822: .LP
        !          1823: Backtracking messages tend to cascade.
        !          1824: With a complicated set of rules it's not uncommon to get hundreds
        !          1825: of messages.  If one can decipher them, though, it often
        !          1826: only takes a dozen or so rules to eliminate the backtracking (though
        !          1827: it's easy to make a mistake and have an error rule accidentally match
        !          1828: a valid token.  A possible future
        !          1829: .I flex
        !          1830: feature will be to automatically add rules to eliminate backtracking).
        !          1831: .LP
        !          1832: .I Variable
        !          1833: trailing context (where both the leading and trailing parts do not have
        !          1834: a fixed length) entails almost the same performance loss as
        !          1835: .I REJECT
        !          1836: (i.e., substantial).  So when possible a rule like:
        !          1837: .nf
        !          1838: 
        !          1839:     %%
        !          1840:     mouse|rat/(cat|dog)   run();
        !          1841: 
        !          1842: .fi
        !          1843: is better written:
        !          1844: .nf
        !          1845: 
        !          1846:     %%
        !          1847:     mouse/cat|dog         run();
        !          1848:     rat/cat|dog           run();
        !          1849: 
        !          1850: .fi
        !          1851: or as
        !          1852: .nf
        !          1853: 
        !          1854:     %%
        !          1855:     mouse|rat/cat         run();
        !          1856:     mouse|rat/dog         run();
        !          1857: 
        !          1858: .fi
        !          1859: Note that here the special '|' action does
        !          1860: .I not
        !          1861: provide any savings, and can even make things worse (see
        !          1862: .B BUGS
        !          1863: in flex(1)).
        !          1864: .LP
        !          1865: Another area where the user can increase a scanner's performance
        !          1866: (and one that's easier to implement) arises from the fact that
        !          1867: the longer the tokens matched, the faster the scanner will run.
        !          1868: This is because with long tokens the processing of most input
        !          1869: characters takes place in the (short) inner scanning loop, and
        !          1870: does not often have to go through the additional work of setting up
        !          1871: the scanning environment (e.g.,
        !          1872: .B yytext)
        !          1873: for the action.  Recall the scanner for C comments:
        !          1874: .nf
        !          1875: 
        !          1876:     %x comment
        !          1877:     %%
        !          1878:             int line_num = 1;
        !          1879: 
        !          1880:     "/*"         BEGIN(comment);
        !          1881: 
        !          1882:     <comment>[^*\\n]*
        !          1883:     <comment>"*"+[^*/\\n]*
        !          1884:     <comment>\\n             ++line_num;
        !          1885:     <comment>"*"+"/"        BEGIN(INITIAL);
        !          1886: 
        !          1887: .fi
        !          1888: This could be sped up by writing it as:
        !          1889: .nf
        !          1890: 
        !          1891:     %x comment
        !          1892:     %%
        !          1893:             int line_num = 1;
        !          1894: 
        !          1895:     "/*"         BEGIN(comment);
        !          1896: 
        !          1897:     <comment>[^*\\n]*
        !          1898:     <comment>[^*\\n]*\\n      ++line_num;
        !          1899:     <comment>"*"+[^*/\\n]*
        !          1900:     <comment>"*"+[^*/\\n]*\\n ++line_num;
        !          1901:     <comment>"*"+"/"        BEGIN(INITIAL);
        !          1902: 
        !          1903: .fi
        !          1904: Now instead of each newline requiring the processing of another
        !          1905: action, recognizing the newlines is "distributed" over the other rules
        !          1906: to keep the matched text as long as possible.  Note that
        !          1907: .I adding
        !          1908: rules does
        !          1909: .I not
        !          1910: slow down the scanner!  The speed of the scanner is independent
        !          1911: of the number of rules or (modulo the considerations given at the
        !          1912: beginning of this section) how complicated the rules are with
        !          1913: regard to operators such as '*' and '|'.
        !          1914: .LP
        !          1915: A final example in speeding up a scanner: suppose you want to scan
        !          1916: through a file containing identifiers and keywords, one per line
        !          1917: and with no other extraneous characters, and recognize all the
        !          1918: keywords.  A natural first approach is:
        !          1919: .nf
        !          1920: 
        !          1921:     %%
        !          1922:     asm      |
        !          1923:     auto     |
        !          1924:     break    |
        !          1925:     ... etc ...
        !          1926:     volatile |
        !          1927:     while    /* it's a keyword */
        !          1928: 
        !          1929:     .|\\n     /* it's not a keyword */
        !          1930: 
        !          1931: .fi
        !          1932: To eliminate the backtracking, introduce a catch-all rule:
        !          1933: .nf
        !          1934: 
        !          1935:     %%
        !          1936:     asm      |
        !          1937:     auto     |
        !          1938:     break    |
        !          1939:     ... etc ...
        !          1940:     volatile |
        !          1941:     while    /* it's a keyword */
        !          1942: 
        !          1943:     [a-z]+   |
        !          1944:     .|\\n     /* it's not a keyword */
        !          1945: 
        !          1946: .fi
        !          1947: Now, if it's guaranteed that there's exactly one word per line,
        !          1948: then we can reduce the total number of matches by a half by
        !          1949: merging in the recognition of newlines with that of the other
        !          1950: tokens:
        !          1951: .nf
        !          1952: 
        !          1953:     %%
        !          1954:     asm\\n    |
        !          1955:     auto\\n   |
        !          1956:     break\\n  |
        !          1957:     ... etc ...
        !          1958:     volatile\\n |
        !          1959:     while\\n  /* it's a keyword */
        !          1960: 
        !          1961:     [a-z]+\\n |
        !          1962:     .|\\n     /* it's not a keyword */
        !          1963: 
        !          1964: .fi
        !          1965: One has to be careful here, as we have now reintroduced backtracking
        !          1966: into the scanner.  In particular, while
        !          1967: .I we
        !          1968: know that there will never be any characters in the input stream
        !          1969: other than letters or newlines,
        !          1970: .I flex
        !          1971: can't figure this out, and it will plan for possibly needing backtracking
        !          1972: when it has scanned a token like "auto" and then the next character
        !          1973: is something other than a newline or a letter.  Previously it would
        !          1974: then just match the "auto" rule and be done, but now it has no "auto"
        !          1975: rule, only an "auto\\n" rule.  To eliminate the possibility of backtracking,
        !          1976: we could either duplicate all rules but without final newlines, or,
        !          1977: since we never expect to encounter such an input and therefore don't
        !          1978: care how it's classified, we can introduce one more catch-all rule, this
        !          1979: one which doesn't include a newline:
        !          1980: .nf
        !          1981: 
        !          1982:     %%
        !          1983:     asm\\n    |
        !          1984:     auto\\n   |
        !          1985:     break\\n  |
        !          1986:     ... etc ...
        !          1987:     volatile\\n |
        !          1988:     while\\n  /* it's a keyword */
        !          1989: 
        !          1990:     [a-z]+\\n |
        !          1991:     [a-z]+   |
        !          1992:     .|\\n     /* it's not a keyword */
        !          1993: 
        !          1994: .fi
        !          1995: Compiled with
        !          1996: .B -Cf,
        !          1997: this is about as fast as one can get a
        !          1998: .I flex 
        !          1999: scanner to go for this particular problem.
        !          2000: .LP
        !          2001: A final note:
        !          2002: .I flex
        !          2003: is slow when matching NUL's, particularly when a token contains
        !          2004: multiple NUL's.
        !          2005: It's best to write rules which match
        !          2006: .I short
        !          2007: amounts of text if it's anticipated that the text will often include NUL's.
        !          2008: .SH INCOMPATIBILITIES WITH LEX AND POSIX
        !          2009: .I flex
        !          2010: is a rewrite of the Unix
        !          2011: .I lex
        !          2012: tool (the two implementations do not share any code, though),
        !          2013: with some extensions and incompatibilities, both of which
        !          2014: are of concern to those who wish to write scanners acceptable
        !          2015: to either implementation.  At present, the POSIX
        !          2016: .I lex
        !          2017: draft is
        !          2018: very close to the original
        !          2019: .I lex
        !          2020: implementation, so some of these
        !          2021: incompatibilities are also in conflict with the POSIX draft.  But
        !          2022: the intent is that except as noted below,
        !          2023: .I flex
        !          2024: as it presently stands will
        !          2025: ultimately be POSIX conformant (i.e., that those areas of conflict with
        !          2026: the POSIX draft will be resolved in
        !          2027: .I flex's
        !          2028: favor).  Please bear in
        !          2029: mind that all the comments which follow are with regard to the POSIX
        !          2030: .I draft
        !          2031: standard of Summer 1989, and not the final document (or subsequent
        !          2032: drafts); they are included so
        !          2033: .I flex
        !          2034: users can be aware of the standardization issues and those areas where
        !          2035: .I flex
        !          2036: may in the near future undergo changes incompatible with
        !          2037: its current definition.
        !          2038: .LP
        !          2039: .I flex
        !          2040: is fully compatible with
        !          2041: .I lex
        !          2042: with the following exceptions:
        !          2043: .IP -
        !          2044: .I lex
        !          2045: does not support exclusive start conditions (%x), though they
        !          2046: are in the current POSIX draft.
        !          2047: .IP -
        !          2048: When definitions are expanded,
        !          2049: .I flex
        !          2050: encloses them in parentheses.
        !          2051: With lex, the following:
        !          2052: .nf
        !          2053: 
        !          2054:     NAME    [A-Z][A-Z0-9]*
        !          2055:     %%
        !          2056:     foo{NAME}?      printf( "Found it\\n" );
        !          2057:     %%
        !          2058: 
        !          2059: .fi
        !          2060: will not match the string "foo" because when the macro
        !          2061: is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
        !          2062: and the precedence is such that the '?' is associated with
        !          2063: "[A-Z0-9]*".  With
        !          2064: .I flex,
        !          2065: the rule will be expanded to
        !          2066: "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
        !          2067: Note that because of this, the
        !          2068: .B ^, $, <s>, /,
        !          2069: and
        !          2070: .B <<EOF>>
        !          2071: operators cannot be used in a
        !          2072: .I flex
        !          2073: definition.
        !          2074: .IP
        !          2075: The POSIX draft interpretation is the same as
        !          2076: .I flex's.
        !          2077: .IP -
        !          2078: To specify a character class which matches anything but a right bracket (']'),
        !          2079: in
        !          2080: .I lex
        !          2081: one can use "[^]]" but with
        !          2082: .I flex
        !          2083: one must use "[^\\]]".  The latter works with
        !          2084: .I lex,
        !          2085: too.
        !          2086: .IP -
        !          2087: The undocumented
        !          2088: .I lex
        !          2089: scanner internal variable
        !          2090: .B yylineno
        !          2091: is not supported.  (The variable is not part of the POSIX draft.)
        !          2092: .IP -
        !          2093: The
        !          2094: .B input()
        !          2095: routine is not redefinable, though it may be called to read characters
        !          2096: following whatever has been matched by a rule.  If
        !          2097: .B input()
        !          2098: encounters an end-of-file the normal
        !          2099: .B yywrap()
        !          2100: processing is done.  A ``real'' end-of-file is returned by
        !          2101: .B input()
        !          2102: as
        !          2103: .I EOF.
        !          2104: .IP
        !          2105: Input is instead controlled by redefining the
        !          2106: .B YY_INPUT
        !          2107: macro.
        !          2108: .IP
        !          2109: The
        !          2110: .I flex
        !          2111: restriction that
        !          2112: .B input()
        !          2113: cannot be redefined is in accordance with the POSIX draft, but
        !          2114: .B YY_INPUT
        !          2115: has not yet been accepted into the draft.
        !          2116: .IP -
        !          2117: .B output()
        !          2118: is not supported.
        !          2119: Output from the
        !          2120: .B ECHO
        !          2121: macro is done to the file-pointer
        !          2122: .I yyout
        !          2123: (default
        !          2124: .I stdout).
        !          2125: .IP
        !          2126: The POSIX draft mentions that an
        !          2127: .B output()
        !          2128: routine exists but currently gives no details as to what it does.
        !          2129: .IP -
        !          2130: The
        !          2131: .I lex
        !          2132: .B %r
        !          2133: (generate a Ratfor scanner) option is not supported.  It is not part
        !          2134: of the POSIX draft.
        !          2135: .IP -
        !          2136: If you are providing your own yywrap() routine, you must include a
        !          2137: "#undef yywrap" in the definitions section (section 1).  Note that
        !          2138: the "#undef" will have to be enclosed in %{}'s.
        !          2139: .IP
        !          2140: The POSIX draft
        !          2141: specifies that yywrap() is a function and this is unlikely to change; so
        !          2142: .I flex
        !          2143: users are warned that
        !          2144: .B yywrap()
        !          2145: is likely to be changed to a function in the near future.
        !          2146: .IP -
        !          2147: After a call to
        !          2148: .B unput(),
        !          2149: .I yytext
        !          2150: and
        !          2151: .I yyleng
        !          2152: are undefined until the next token is matched.  This is not the case with
        !          2153: .I lex
        !          2154: or the present POSIX draft.
        !          2155: .IP -
        !          2156: The precedence of the
        !          2157: .B {}
        !          2158: (numeric range) operator is different.
        !          2159: .I lex
        !          2160: interprets "abc{1,3}" as "match one, two, or
        !          2161: three occurrences of 'abc'", whereas
        !          2162: .I flex
        !          2163: interprets it as "match 'ab'
        !          2164: followed by one, two, or three occurrences of 'c'".  The latter is
        !          2165: in agreement with the current POSIX draft.
        !          2166: .IP -
        !          2167: The precedence of the
        !          2168: .B ^
        !          2169: operator is different.
        !          2170: .I lex
        !          2171: interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
        !          2172: or 'bar' anywhere", whereas
        !          2173: .I flex
        !          2174: interprets it as "match either 'foo' or 'bar' if they come at the beginning
        !          2175: of a line".  The latter is in agreement with the current POSIX draft.
        !          2176: .IP -
        !          2177: To refer to yytext outside of the scanner source file,
        !          2178: the correct definition with
        !          2179: .I flex
        !          2180: is "extern char *yytext" rather than "extern char yytext[]".
        !          2181: This is contrary to the current POSIX draft but a point on which
        !          2182: .I flex
        !          2183: will not be changing, as the array representation entails a
        !          2184: serious performance penalty.  It is hoped that the POSIX draft will
        !          2185: be emended to support the
        !          2186: .I flex
        !          2187: variety of declaration (as this is a fairly painless change to
        !          2188: require of
        !          2189: .I lex
        !          2190: users).
        !          2191: .IP -
        !          2192: .I yyin
        !          2193: is
        !          2194: .I initialized
        !          2195: by
        !          2196: .I lex
        !          2197: to be
        !          2198: .I stdin;
        !          2199: .I flex,
        !          2200: on the other hand,
        !          2201: initializes
        !          2202: .I yyin
        !          2203: to NULL
        !          2204: and then
        !          2205: .I assigns
        !          2206: it to
        !          2207: .I stdin
        !          2208: the first time the scanner is called, providing
        !          2209: .I yyin
        !          2210: has not already been assigned to a non-NULL value.  The difference is
        !          2211: subtle, but the net effect is that with
        !          2212: .I flex
        !          2213: scanners,
        !          2214: .I yyin
        !          2215: does not have a valid value until the scanner has been called.
        !          2216: .IP -
        !          2217: The special table-size declarations such as
        !          2218: .B %a
        !          2219: supported by
        !          2220: .I lex
        !          2221: are not required by
        !          2222: .I flex
        !          2223: scanners;
        !          2224: .I flex
        !          2225: ignores them.
        !          2226: .IP -
        !          2227: The name
        !          2228: .B
        !          2229: FLEX_SCANNER
        !          2230: is #define'd so scanners may be written for use with either
        !          2231: .I flex
        !          2232: or
        !          2233: .I lex.
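.LP
Several of the points above can be accommodated in one specification.
The following is only a sketch, assuming a scanner that is meant to build
with either
.I flex
or
.I lex;
the rules and messages are hypothetical and serve purely as illustration:
.nf

    %{
    #include <stdio.h>

    #ifdef FLEX_SCANNER
    #undef yywrap       /* needed before a user-supplied yywrap() */
    #endif
    %}
    %%
    ^(foo|bar)      printf( "foo or bar at start of line\\n" );
    ab(c){1,3}      printf( "ab, then one to three c's\\n" );
    [^\\]]           ;   /* anything but a right bracket */
    %%
    int yywrap()
        {
        return 1;       /* no further input files */
        }

    main( argc, argv )
    int argc;
    char **argv;
        {
        /* set yyin explicitly; do not rely on its initial value */
        if ( argc > 1 )
            yyin = fopen( argv[1], "r" );
        else
            yyin = stdin;

        if ( yyin == NULL )
            {
            fprintf( stderr, "cannot open %s\\n", argv[1] );
            return 1;
            }

        yylex();
        return 0;
        }

.fi
The explicit parentheses in the first two rules are intended to remove any
dependence on the differing precedence of
.B ^
and
.B {},
and the escaped bracket is the form described above as working with both tools.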
        !          2234: .LP
        !          2235: The following
        !          2236: .I flex
        !          2237: features are not included in
        !          2238: .I lex
        !          2239: or the POSIX draft standard:
        !          2240: .nf
        !          2241: 
        !          2242:     yyterminate()
        !          2243:     <<EOF>>
        !          2244:     YY_DECL
        !          2245:     #line directives
        !          2246:     %{}'s around actions
        !          2247:     yyrestart()
        !          2248:     comments beginning with '#' (deprecated)
        !          2249:     multiple actions on a line
        !          2250: 
        !          2251: .fi
        !          2252: This last feature refers to the fact that with
        !          2253: .I flex
        !          2254: you can put multiple actions on the same line, separated with
        !          2255: semi-colons, while with
        !          2256: .I lex,
        !          2257: the following
        !          2258: .nf
        !          2259: 
        !          2260:     foo    handle_foo(); ++num_foos_seen;
        !          2261: 
        !          2262: .fi
        !          2263: is (rather surprisingly) truncated to
        !          2264: .nf
        !          2265: 
        !          2266:     foo    handle_foo();
        !          2267: 
        !          2268: .fi
        !          2269: .I flex
        !          2270: does not truncate the action.  Actions that are not enclosed in
        !          2271: braces are simply terminated at the end of the line.
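.LP
One way to sidestep the difference, given here only as a sketch that reuses
the names from the example above, is to enclose the actions in braces:
.nf

    foo    { handle_foo(); ++num_foos_seen; }

.fi
A braced action extends to its closing brace, so it is not cut off at the
first semi-colon.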
        !          2272: .SH DIAGNOSTICS
        !          2273: .I reject_used_but_not_detected undefined
        !          2274: or
        !          2275: .I yymore_used_but_not_detected undefined -
        !          2276: These errors can occur at compile time.  They indicate that the
        !          2277: scanner uses
        !          2278: .B REJECT
        !          2279: or
        !          2280: .B yymore()
        !          2281: but that
        !          2282: .I flex
        !          2283: failed to notice the fact, meaning that
        !          2284: .I flex
        !          2285: scanned the first two sections looking for occurrences of these actions
        !          2286: and failed to find any, but somehow you snuck some in (via a #include
        !          2287: file, for example).  Make an explicit reference to the action in your
        !          2288: .I flex
        !          2289: input file.  (Note that previously
        !          2290: .I flex
        !          2291: supported a
        !          2292: .B %used/%unused
        !          2293: mechanism for dealing with this problem; this feature is still supported
        !          2294: but now deprecated, and will go away soon unless the author hears from
        !          2295: people who can argue compellingly that they need it.)
        !          2296: .LP
        !          2297: .I flex scanner jammed -
        !          2298: a scanner compiled with
        !          2299: .B -s
        !          2300: has encountered an input string which wasn't matched by
        !          2301: any of its rules.
        !          2302: .LP
        !          2303: .I flex input buffer overflowed -
        !          2304: a scanner rule matched a string long enough to overflow the
        !          2305: scanner's internal input buffer (16K bytes by default - controlled by
        !          2306: .B YY_BUF_SIZE
        !          2307: in "flex.skel".  Note that to redefine this macro, you must first
        !          2308: .B #undef
        !          2309: it).
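.LP
As a sketch of such a redefinition, assuming it is placed in the definitions
section of the
.I flex
input and that 32768 (a value chosen here purely for illustration) is large
enough for the tokens in question:
.nf

    %{
    #undef YY_BUF_SIZE
    #define YY_BUF_SIZE 32768
    %}

.fi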
        !          2310: .LP
        !          2311: .I scanner requires -8 flag -
        !          2312: Your scanner specification includes recognizing 8-bit characters and
        !          2313: you did not specify the -8 flag (and your site has not installed flex
        !          2314: with -8 as the default).
        !          2315: .LP
        !          2316: .I too many %t classes! -
        !          2317: You managed to put every single character into its own %t class.
        !          2318: .I flex
        !          2319: requires that at least one of the classes share characters.
        !          2320: .SH DEFICIENCIES / BUGS
        !          2321: See flex(1).
        !          2322: .SH "SEE ALSO"
        !          2323: .LP
        !          2324: flex(1), lex(1), yacc(1), sed(1), awk(1).
        !          2325: .LP
        !          2326: M. E. Lesk and E. Schmidt,
        !          2327: .I LEX - Lexical Analyzer Generator
        !          2328: .SH AUTHOR
        !          2329: Vern Paxson, with the help of many ideas and much inspiration from
        !          2330: Van Jacobson.  Original version by Jef Poskanzer.  The fast table
        !          2331: representation is a partial implementation of a design done by Van
        !          2332: Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
        !          2333: .LP
        !          2334: Thanks to the many
        !          2335: .I flex
        !          2336: beta-testers, feedbackers, and contributors, especially Casey
        !          2337: Leedom, [email protected],
        !          2338: Frederic Brehm, Nick Christopher, Jason Coughlin,
        !          2339: Scott David Daniels, Leo Eskin,
        !          2340: Chris Faylor, Eric Goldman, Eric
        !          2341: Hughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht,
        !          2342: Greg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt,
        !          2343: Jef Poskanzer, Jim Roskind,
        !          2344: Dave Tallman, Frank Whaley, Ken Yap, and those whose names
        !          2345: have slipped my marginal mail-archiving skills but whose contributions
        !          2346: are appreciated all the same.
        !          2347: .LP
        !          2348: Thanks to Keith Bostic, John Gilmore, Craig Leres, Bob
        !          2349: Mulcahy, Rich Salz, and Richard Stallman for help with various distribution
        !          2350: headaches.
        !          2351: .LP
        !          2352: Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
        !          2353: to Benson Margulies and Fred
        !          2354: Burke for C++ support; to Ove Ewerlid for the basics of support for
        !          2355: NUL's; and to Eric Hughes for the basics of support for multiple buffers.
        !          2356: .LP
        !          2357: Work is being done on extending
        !          2358: .I flex
        !          2359: to generate scanners in which the
        !          2360: state machine is directly represented in C code rather than tables.
        !          2361: These scanners may well be substantially faster than those generated
        !          2362: using -f or -F.  If you are working in this area and are interested
        !          2363: in comparing notes and seeing whether redundant work can be avoided,
        !          2364: contact Ove Ewerlid ([email protected]).
        !          2365: .LP
        !          2366: This work was primarily done when I was at the Real Time Systems Group
        !          2367: at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks to all there
        !          2368: for the support I received.
        !          2369: .LP
        !          2370: Send comments to:
        !          2371: .nf
        !          2372: 
        !          2373:      Vern Paxson
        !          2374:      Computer Science Department
        !          2375:      4126 Upson Hall
        !          2376:      Cornell University
        !          2377:      Ithaca, NY 14853-7501
        !          2378: 
        !          2379:      [email protected]
        !          2380:      decvax!cornell!vern
        !          2381: 
        !          2382: .fi