.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms are permitted provided
.\" that: (1) source distributions retain this entire copyright notice and
.\" comment, and (2) distributions including binaries display the following
.\" acknowledgement: ``This product includes software developed by the
.\" University of California, Berkeley and its contributors'' in the
.\" documentation or other materials provided with the distribution and in
.\" all advertising materials mentioning features or use of this software.
.\" Neither the name of the University nor the names of its contributors may
.\" be used to endorse or promote products derived from this software without
.\" specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" @(#)lex.1 5.10 (Berkeley) 7/24/90
.\"
.Dd July 24, 1990
.Dt LEX 1
.Sh NAME
.Nm lex
.Nd fast lexical analyzer generator
.Sh SYNOPSIS
.Nm lex
.Ob
.Op Fl bcdfinpstvFILT8
.Cx Fl C
.Op efmF
.Cx
.Cx Fl S
.Ar skeleton
.Cx
.Oe
.Nm lex
.Ar
.Sh DESCRIPTION
.Nm Lex
is a tool for generating
.Ar scanners :
programs which recognize lexical patterns in text.
.Nm Lex
reads
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate. The description is in
the form of pairs
of regular expressions and C code, called
.Em rules .
.Nm Lex
generates as output a C source file,
.Pa lex.yy.c ,
which defines a routine
.Fn yylex .
This file is compiled and linked with the
.Fl lfl
library to produce an executable. When the executable is run,
it analyzes its input for occurrences
of the regular expressions. Whenever it finds one, it executes
the corresponding C code.
.Pp
For full documentation, see
.Em Lexdoc .
This manual entry is intended for use as a quick reference.
.Sh OPTIONS
.Nm Lex
has the following options:
.Tw Ds
.Tp Fl b
Generate backtracking information to
.Va lex.backtrack .
This is a list of scanner states which require backtracking
and the input characters on which they do so. By adding rules one
can remove backtracking states. If all backtracking states
are eliminated and
.Fl f
or
.Fl F
is used, the generated scanner will run faster.
.Tp Fl c
is a do-nothing, deprecated option included for POSIX compliance.
.Pp
.Ar NOTE :
in previous releases of
.Nm Lex
.Op Fl c
specified table-compression options. This functionality is
now given by the
.Fl C
flag. To ease the impact of this change, when
.Nm lex
encounters
.Fl c ,
it currently issues a warning message and assumes that
.Fl C
was desired instead. In the future this "promotion" of
.Fl c
to
.Fl C
will go away in the name of full POSIX compliance (unless
the POSIX meaning is removed first).
.Tp Fl d
makes the generated scanner run in
.Ar debug
mode. Whenever a pattern is recognized and the global
.Va yy_Lex_debug
is non-zero (which is the default), the scanner will
write to
.Li stderr
a line of the form:
.Pp
.Dl --accepting rule at line 53 ("the matched text")
.Pp
The line number refers to the location of the rule in the file
defining the scanner (i.e., the file that was fed to lex). Messages
are also generated when the scanner backtracks, accepts the
default rule, reaches the end of its input buffer (or encounters
a NUL; the two look the same as far as the scanner's concerned),
or reaches an end-of-file.
.Tp Fl f
specifies (take your pick)
.Em full table
or
.Em fast scanner .
No table compression is done. The result is large but fast.
This option is equivalent to
.Fl Cf
(see below).
.Tp Fl i
instructs
.Nm lex
to generate a
.Em case-insensitive
scanner. The case of letters given in the
.Nm lex
input patterns will
be ignored, and tokens in the input will be matched regardless of case. The
matched text given in
.Va yytext
will have its case preserved (i.e., it will not be folded).
.Tp Fl n
is another do-nothing, deprecated option included only for
POSIX compliance.
.Tp Fl p
generates a performance report to stderr. The report
consists of comments regarding features of the
.Nm lex
input file which will cause a loss of performance in the resulting scanner.
.Tp Fl s
causes the
.Ar default rule
(that unmatched scanner input is echoed to
.Ar stdout )
to be suppressed. If the scanner encounters input that does not
match any of its rules, it aborts with an error.
.Tp Fl t
instructs
.Nm lex
to write the scanner it generates to standard output instead
of
.Pa lex.yy.c .
.Tp Fl v
specifies that
.Nm lex
should write to
.Li stderr
a summary of statistics regarding the scanner it generates.
.Tp Fl F
specifies that the
.Em fast
scanner table representation should be used. This representation is
about as fast as the full table representation
.Pq Fl f ,
and for some sets of patterns will be considerably smaller (and for
others, larger). See
.Em Lexdoc
for details.
.Pp
This option is equivalent to
.Fl CF
(see below).
.Tp Fl I
instructs
.Nm lex
to generate an
.Em interactive
scanner, that is, a scanner which stops immediately rather than
looking ahead if it knows
that the currently scanned text cannot be part of a longer rule's match.
Again, see
.Em Lexdoc
for details.
.Pp
Note,
.Fl I
cannot be used in conjunction with
.Em full
or
.Em fast tables ,
i.e., the
.Fl f , F , Cf ,
or
.Fl CF
flags.
.Tp Fl L
instructs
.Nm lex
not to generate
.Li #line
directives in
.Pa lex.yy.c .
The default is to generate such directives so error
messages in the actions will be correctly
located with respect to the original
.Nm lex
input file, and not to
the fairly meaningless line numbers of
.Pa lex.yy.c .
.Tp Fl T
makes
.Nm lex
run in
.Em trace
mode. It will generate a lot of messages to
.Li stdout
concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata. This option is mostly for use in maintaining
.Nm lex .
.Tp Fl 8
instructs
.Nm lex
to generate an 8-bit scanner.
On some sites, this is the default. On others, the default
is 7-bit characters. To see which is the case, check the verbose
.Pq Fl v
output for "equivalence classes created". If the denominator of
the number shown is 128, then by default
.Nm lex
is generating 7-bit characters.
If it is 256, then the default is
8-bit characters.
.Tc Fl C
.Op Cm efmF
.Cx
controls the degree of table compression. The default setting is
.Fl Cem .
.Pp
.Tw Ds
.Tp Fl C
A lone
.Fl C
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
.Tp Fl \&Ce
directs
.Nm lex
to construct
.Em equivalence classes ,
i.e., sets of characters
which have identical lexical properties.
Equivalence classes usually give
dramatic reductions in the final table/object file sizes (typically
a factor of 2-5) and are pretty cheap performance-wise (one array
look-up per character scanned).
.Tp Fl \&Cf
specifies that the
.Em full
scanner tables should be generated -
.Nm lex
should not compress the
tables by taking advantage of similar transition functions for
different states.
.Tp Fl \&CF
specifies that the alternate fast scanner representation (described in
.Em Lexdoc )
should be used.
.Tp Fl \&Cm
directs
.Nm lex
to construct
.Em meta-equivalence classes ,
which are sets of equivalence classes (or characters, if equivalence
classes are not being used) that are commonly used together. Meta-equivalence
classes are often a big win when using compressed tables, but they
have a moderate performance impact (one or two "if" tests and one
array look-up per character scanned).
.Tp Fl Cem
(default)
Generate both equivalence classes
and meta-equivalence classes. This setting provides the highest
degree of table compression.
.Tp
.Pp
Scanner speed can be traded off against table size, with
the following generally being true:
.Pp
.Ds C
slowest & smallest
-Cem
-Cm
-Ce
-C
-C{f,F}e
-C{f,F}
fastest & largest
.De
.Pp
.Fl C
options are not cumulative; whenever the flag is encountered, the
previous -C settings are forgotten.
.Pp
The options
.Fl \&Cf
or
.Fl \&CF
and
.Fl \&Cm
do not make sense together - there is no opportunity for meta-equivalence
classes if the table is not being compressed. Otherwise the options
may be freely mixed.
.Tc Fl S
.Ar skeleton_file
.Cx
overrides the default skeleton file from which
.Nm lex
constructs its scanners. Useful for
.Nm lex
maintenance or development.
.Sh SUMMARY OF Lex REGULAR EXPRESSIONS
The patterns in the input are written using an extended set of regular
expressions. These are:
.Pp
.Dw 8n
.Di L
.Dp Li x
match the character 'x'
.Dp Li \&.
any character except newline
.Dp Op Li xyz
a "character class"; in this case, the pattern
matches either an 'x', a 'y', or a 'z'
.Dp Op Li abj-oZ
a "character class" with a range in it; matches
an 'a', a 'b', any letter from 'j' through 'o',
or a 'Z'
.Dp Op \&Li ^A-Z
a "negated character class", i.e., any character
but those in the class. In this case, any
character EXCEPT an uppercase letter.
.Dp Op \&Li ^A-Z\en
any character EXCEPT an uppercase letter or
a newline
.Dp Li r*
zero or more r's, where r is any regular expression
.Dp Li r+
one or more r's
.Dp Li r?
zero or one r's (that is, "an optional r")
.Dp Li r{2,5}
anywhere from two to five r's
.Dp Li r{2,}
two or more r's
.Dp Li r{4}
exactly 4 r's
.Dp Li {name}
the expansion of the "name" definition
(see above)
.Dc Op Li xyz
.Li \&\e"foo"
.Cx
the literal string:
[xyz]"foo
.Dp Li \&\eX
if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
then the ANSI-C interpretation of \ex.
Otherwise, a literal 'X' (used to escape
operators such as '*')
.Dp Li \&\e123
the character with octal value 123
.Dp Li \&\ex2a
the character with hexadecimal value 2a
.Dp Li (r)
match an r; parentheses are used to override
precedence (see below)
.Dp Li rs
the regular expression r followed by the
regular expression s; called "concatenation"
.Dp Li r\&|s
either an r or an s
.Dp Li r/s
an r but only if it is followed by an s. The
s is not part of the matched text. This type
of pattern is called "trailing context".
.Dp Li \&^r
an r, but only at the beginning of a line
.Dp Li r$
an r, but only at the end of a line. Equivalent
to "r/\en".
.Dp Li <s>r
an r, but only in start condition s (see
below for discussion of start conditions)
.Dp Li <s1,s2,s3>r
same, but in any of start conditions s1,
s2, or s3
.Dp Li <<EOF>>
an end-of-file
.Dp Li <s1,s2><<EOF>>
an end-of-file when in start condition s1 or s2
.Dp
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
.Pp
Some notes on patterns:
.Pp
Negated character classes
.Ar match newlines
unless "\en" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
(e.g., " [^A-Z\en] ").
.Pp
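The note on negated character classes can be illustrated outside of
lex with a minimal C sketch. This is not code that lex generates:
the helper names in_not_upper, in_not_upper_or_newline and match_len
are hypothetical, and the range test assumes an ASCII-like character set.
The sketch only shows that a class such as [^A-Z] accepts a newline,
so a repetition of it runs across line boundaries, while [^A-Z\en]
stops at the newline.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: emulates membership in the class [^A-Z].
   Assumes 'A'..'Z' are contiguous, as in ASCII. */
bool in_not_upper(int c)
{
    return !(c >= 'A' && c <= 'Z');
}

/* Hypothetical helper: emulates [^A-Z\n], which also excludes newline. */
bool in_not_upper_or_newline(int c)
{
    return in_not_upper(c) && c != '\n';
}

/* Length of the longest prefix of s whose characters all belong to the
   class cls, i.e. the text a pattern of the form "class*" would consume. */
size_t match_len(const char *s, bool (*cls)(int))
{
    size_t n = 0;

    while (s[n] != '\0' && cls((unsigned char)s[n]))
        n++;
    return n;
}
```

For the five-character input "ab", newline, "cd", match_len with
in_not_upper consumes all five characters, while match_len with
in_not_upper_or_newline stops after the first two.
.Pp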
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern and, like '/' and '$',
cannot be grouped inside parentheses. The following are all illegal:
.Pp
.Ds C
foo/bar$
foo(bar$)
foo^bar
<sc1>foo<sc2>bar
.De
.Sh SUMMARY OF SPECIAL ACTIONS
In addition to arbitrary C code, the following can appear in actions:
.Tw Fl
.Tp Ic ECHO
Copies
.Va yytext
to the scanner's output.
.Tp Ic BEGIN
Followed by the name of a start condition, places the scanner in the
corresponding start condition.
.Tp Ic REJECT
Directs the scanner to proceed on to the "second best" rule which matched the
input (or a prefix of the input).
.Va yytext
and
.Va yyleng
are set up appropriately. Note that
.Ic REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in
.Em any
of the scanner's actions it will slow down
.Em all
of the scanner's matching. Furthermore,
.Ic REJECT
cannot be used with the
.Fl f
or
.Fl F
options.
.Pp
Note also that unlike the other special actions,
.Ic REJECT
is a
.Em branch ;
code immediately following it in the action will
.Em not
be executed.
.Tp Fn yymore
tells the scanner that the next time it matches a rule, the corresponding
token should be
.Em appended
onto the current value of
.Va yytext
rather than replacing it.
.Tp Fn yyless \&n
returns all but the first
.Ar n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.Va yytext
and
.Va yyleng
are adjusted appropriately (e.g.,
.Va yyleng
will now be equal to
.Ar n ) .
.Tp Fn unput c
puts the character
.Ar c
back onto the input stream. It will be the next character scanned.
.Tp Fn input
reads the next character from the input stream (this routine is called
.Fn yyinput
if the scanner is compiled using
.Em C \&+\&+ ) .
.Tp Fn yyterminate
can be used in lieu of a return statement in an action. It terminates
the scanner and returns a 0 to the scanner's caller, indicating "all done".
.Pp
By default,
.Fn yyterminate
is also called when an end-of-file is encountered. It is a macro and
may be redefined.
.Tp Ic YY_NEW_FILE
is an action available only in <<EOF>> rules. It means "Okay, I've
set up a new input file, continue scanning".
.Tp Fn yy_create_buffer file size
takes a
.Ic FILE
pointer and an integer
.Ar size .
It returns a YY_BUFFER_STATE
handle to a new input buffer large enough to accommodate
.Ar size
characters and associated with the given file. When in doubt, use
.Ar YY_BUF_SIZE
for the size.
.Tp Fn yy_switch_to_buffer new_buffer
switches the scanner's processing to scan for tokens from
the given buffer, which must be a YY_BUFFER_STATE.
.Tp Fn yy_delete_buffer buffer
deletes the given buffer.
.Tp
.Sh \&VALUES\ AVAILABLE\ TO THE USER
.Tw Fl
.Tp Va \&char \&*yytext
holds the text of the current token. It may not be modified.
.Tp Va \&int yyleng
holds the length of the current token. It may not be modified.
.Tp Va FILE \&*yyin
is the file which by default
.Nm lex
reads from.
It may be redefined but doing so only makes sense before
scanning begins. Changing it in the middle of scanning will have
unexpected results since
.Nm lex
buffers its input. Once scanning terminates because an end-of-file
has been seen,
.Fn void\ yyrestart FILE\ *new_file
may be called to point
.Va yyin
at the new input file.
.Tp Va FILE \&*yyout
is the file to which
.Ar ECHO
actions are done. It can be reassigned by the user.
.Tp Va YY_CURRENT_BUFFER
returns a
YY_BUFFER_STATE
handle to the current buffer.
.Tp
.Sh MACROS THE USER CAN REDEFINE
.Tw Fl
.Tp Va YY_DECL
controls how the scanning routine is declared.
By default, it is "int yylex()", or, if prototypes are being
used, "int yylex(void)". This definition may be changed by redefining
the "YY_DECL" macro. Note that
if you give arguments to the scanning routine using a
K&R-style/non-prototyped function declaration, you must terminate
the definition with a semi-colon (;).
.Tp Va YY_INPUT
How the scanner
gets its input can be controlled by redefining the
YY_INPUT
macro.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
action is to place up to
.Ar max_size
characters in the character array
.Ar buf
and return in the integer variable
.Ar result
either the
number of characters read or the constant YY_NULL (0 on Unix systems)
to indicate EOF. The default YY_INPUT reads from the
global file-pointer "yyin".
A sample redefinition of YY_INPUT (in the definitions
section of the input file), which reads one character at a time and
tests the int returned by getchar() against EOF before storing it in buf:
.Pp
.Ds I
%{
#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \\
    { \\
    int c = getchar(); \\
    result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
    }
%}
.De
.Tp Va YY_INPUT
When the scanner receives an end-of-file indication from YY_INPUT,
it then checks the
.Fn yywrap
function. If
.Fn yywrap
returns false (zero), then it is assumed that the
function has gone ahead and set up
.Va yyin
to point to another input file, and scanning continues. If it returns
true (non-zero), then the scanner terminates, returning 0 to its
caller.
.Tp Va yywrap
The default
.Fn yywrap
always returns 1. Presently, to redefine it you must first
"#undef yywrap", as it is currently implemented as a macro. It is
likely that
.Fn yywrap
will soon be defined to be a function rather than a macro.
.Tp Va YY_USER_ACTION
can be redefined to provide an action
which is always executed prior to the matched rule's action.
.Tp Va YY_USER_INIT
The macro
.Va YY_USER_INIT
may be redefined to provide an action which is always executed before
the first scan.
.Tp Va YY_BREAK
In the generated scanner, the actions are all gathered in one large
switch statement and separated using
.Va YY_BREAK ,
which may be redefined. By default, it is simply a "break", to separate
each rule's action from the following rule's.
.Tp
.Sh FILES
.Dw lex.backtrack
.Di L
.Dp Pa lex.skel
skeleton scanner.
.Dp Pa lex.yy.c
generated scanner
(called
.Pa lexyy.c
on some systems).
.Dp Pa lex.backtrack
backtracking information for the
.Fl b
flag
(called
.Pa lex.bck
on some systems).
.Dp
.Sh SEE ALSO
.Xr lex 1 ,
.Xr yacc 1 ,
.Xr sed 1 ,
.Xr awk 1 .
.br
.Em lexdoc
.br
M. E. Lesk and E. Schmidt,
.Em LEX \- Lexical Analyzer Generator
.Sh DIAGNOSTICS
.Tw Fl
.Tp Li reject_used_but_not_detected undefined
or
.Tp Li yymore_used_but_not_detected undefined
These errors can occur at compile time.
They indicate that the
scanner uses
.Ic REJECT
or
.Fn yymore
but that
.Nm lex
failed to notice the fact,
meaning that
.Nm lex
scanned the first two sections looking for occurrences of these actions
and failed to find any,
but somehow you snuck some in via a #include
file,
for example.
Make an explicit reference to the action in your
.Nm lex
input file.
Note that previously
.Nm lex
supported a
.Li %used/%unused
mechanism for dealing with this problem;
this feature is still supported
but now deprecated,
and will go away soon unless the author hears from
people who can argue compellingly that they need it.
.Tp Li lex scanner jammed
a scanner compiled with
.Fl s
has encountered an input string which wasn't matched by
any of its rules.
.Tp Li lex input buffer overflowed
a scanner rule matched a string long enough to overflow the
scanner's internal input buffer (16K bytes, controlled by
.Va YY_BUF_MAX
in
.Pa lex.skel ) .
.Tp Li scanner requires \&\-8 flag
Your scanner specification includes recognizing 8-bit characters, but
you did not specify the -8 flag and your site has not installed lex
with -8 as the default.
.Tp Li too many \&%t classes!
You managed to put every single character into its own %t class.
.Nm Lex
requires that at least one of the classes share characters.
.Tp
.Sh HISTORY
A
.Nm lex
appeared in Version 6 AT&T Unix.
The version this man page describes is
derived from code contributed by Vern Paxson.
.Sh AUTHOR
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson. Original version by Jef Poskanzer.
.Pp
See
.Em Lexdoc
for additional credits and the address to send comments to.
.Sh BUGS
.Pp
Some trailing context
patterns cannot be properly matched and generate
warning messages ("Dangerous trailing context"). These are
patterns where the ending of the
first part of the rule matches the beginning of the second
part, such as "zx*/xy*", where the 'x*' matches the 'x' at
the beginning of the trailing context. (Note that the POSIX draft
states that the text matched by such patterns is undefined.)
.Pp
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the performance loss associated with
variable trailing context.
In particular, parts using '\&|' or {n} (such as "foo{3}") are always
considered variable-length.
.Pp
Combining trailing context with the special '\&|' action can result in
.Em fixed
trailing context being turned into the more expensive
.Em variable
trailing context. This happens in the following example:
.Pp
.Ds C
%%
abc \&|
xyz/def
.De
.Pp
Use of
.Fn unput
invalidates yytext and yyleng.
.Pp
Use of
.Fn unput
to push back more text than was matched can
result in the pushed-back text matching a beginning-of-line ('^')
rule even though it didn't come at the beginning of the line
(though this is rare!).
.Pp
Pattern-matching of NUL's is substantially slower than matching other
characters.
.Pp
.Nm Lex
does not generate correct #line directives for code internal
to the scanner; thus, bugs in
.Pa lex.skel
yield bogus line numbers.
.Pp
Due to both buffering of input and read-ahead, you cannot intermix
calls to <stdio.h> routines, such as
.Fn getchar ,
with
.Nm lex
rules and expect it to work. Call
.Fn input
instead.
.Pp
The total number of table entries listed by the
.Fl v
flag excludes the number of table entries needed to determine
what rule has been matched. The number of entries is equal
to the number of DFA states if the scanner does not use
.Ic REJECT ,
and somewhat greater than the number of states if it does.
.Pp
.Ic REJECT
cannot be used with the
.Fl f
or
.Fl F
options.
.Pp
Some of the macros, such as
.Fn yywrap ,
may in the future become functions which live in the
.Fl lfl
library. This will doubtless break a lot of code, but may be
required for POSIX-compliance.
.Pp
The
.Nm lex
internal algorithms need documentation.