lex Command lex
Lexical analyzer generator
lex [-t][-v][file]
cc lex.yy.c -ll
Many programs, e.g., compilers, process highly structured input
according to rules. Two of the most complicated parts of such
programs are lexical analysis and parsing (also called syntax
analysis). The COHERENT system includes two powerful tools
called lex and yacc to help you construct these parts of a
program. lex converts a set of lexical rules into a lexical
analyzer, and yacc converts a set of parsing rules into a parser.
The output of lex may be used directly, or may be used by a
parser generated by yacc.
lex reads a specification from the given file (or from the
standard input if none), and generates a C function called yylex().
lex writes the generated function in the file lex.yy.c, or on
standard output if you use the -t option. The -v option prints
some statistics about the generated tables.
The tutorial on lex that appears in this manual describes lex in
detail. In brief, the generated function yylex() matches portions
of its input against the patterns (sometimes called regular
expressions) in a set of rules, or context, and executes the C
commands associated with the pattern that matches. Unmatched
portions of the input are copied to the output stream. yylex()
returns EOF when input has been exhausted.
lex uses the following macros, which you may replace by removing
them with the preprocessor directive #undef and defining your
own: input() (read the standard input stream), and output(c)
(write the character c to the standard output stream). You may
also replace the following functions: main() (main function),
error(...) (print error messages; takes the same arguments as
printf), and yywrap() (handle events at the end of a file). If
you want an action at end of file, such as arranging for more
input, yywrap() should perform it and return zero to keep going.
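As an illustration of the end-of-file hook described above, here is a minimal sketch of a user-supplied yywrap() that moves on to another input file. The variables next_file and files_left are hypothetical; a real program would probably fill them in from its own main(). The nonzero "stop" value is also an assumption that follows the usual lex convention, since the text above states only that zero means keep going.

```c
#include <stdio.h>

/* Hypothetical bookkeeping: names of input files not yet scanned. */
static char **next_file = NULL;   /* next name to open */
static int files_left = 0;        /* how many names remain */

/*
 * Sketch of a replacement yywrap(), called when input is exhausted.
 * Returns 0 if more input was arranged, nonzero if scanning should stop.
 */
int yywrap(void)
{
    while (files_left > 0) {
        const char *name = *next_file++;

        files_left--;
        /* input() reads the standard input, so reattach stdin. */
        if (freopen(name, "r", stdin) != NULL)
            return 0;             /* keep going with the new file */
        perror(name);             /* report and skip unreadable files */
    }
    return 1;                     /* nothing left: stop */
}
```

Reattaching stdin keeps the default input() macro usable; a program that replaces input() could switch its own stream instead.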
A full lex specification has the following format:
* Macro definitions, of the form:
        name pattern
* Start condition declarations:
        %S NAME ...
* Context declarations:
        %C NAME ...
* Code to be included in the header section:
        %{
        anything
        %}
        <tab or space> anything
* Rules section delimiter (must always be present):
        %%
* Code to appear at the start of yylex():
        <tab or space> anything
* Rules for initial context, in any of the forms:
        rule    action;
        rule    |       (means use next action)
        rule    {
        <tab or space>  action;
        <tab or space>  }
* For each additional context:
        %C NAME
        ...rules for this context...
* End of rules section delimiter:
        %%
* Code to be copied verbatim, such as user-provided input(),
  output(), yywrap(), or other.
lex matches the longest string possible; if two rules match the
same length string, the rule specified first takes precedence.
lex puts the matched string, or token, in the char array
yytext[], and sets the variable yyleng to its length.
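The selection rule above (longest match, ties going to the earlier rule) can be sketched in plain C. This is only a model of the behavior, not the code that lex generates; the two "rules" and the function pick_rule() are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/*
 * Each hypothetical "rule" reports how many leading characters of s
 * it matches; 0 means no match.
 */
static size_t match_if(const char *s)       /* rule 0: the keyword if */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s)    /* rule 1: [a-z]+ */
{
    size_t n = 0;

    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

typedef size_t (*rule_fn)(const char *);
static const rule_fn rules[] = { match_if, match_ident };

/*
 * Return the index of the winning rule, or -1 if none matches.
 * The token length (what lex would store in yyleng) goes in *len.
 */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t i;

    *len = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);

        /* Strictly longer only: on a tie the earlier rule stays. */
        if (n > *len) {
            *len = n;
            best = (int)i;
        }
    }
    return best;
}
```

With these two rules, the input if(x) selects rule 0 because both rules match two characters and rule 0 comes first, whereas ifx selects rule 1 because the identifier rule matches three characters.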
Actions may use the following:
ECHO              Output the token
REJECT            Perform action for lower precedence match
BEGIN NAME        Set start condition to NAME
BEGIN 0           Clear start condition
yyswitch(NAME)    Switch to context NAME, return current
yyswitch(0)       Switch to initial context
yynext()          Steal next character from input
yyback(c)         Put character c back into input
yyless(n)         Reduce token length to n, put rest back
yymore()          Append next token to this one
yylook()          Returns number of chars in input buffer
lex rules are contiguous strings of the form
[ <NAME,...> ][ ^ ] token [ /lookahead ][ $ ]
where brackets `[]' indicate optional items.
<NAME,...>        Match only under given start conditions
^                 Match the beginning of a line
$                 Match the end of a line
token             Pattern that a given token is to match
/lookahead        Pattern that given trailing text is to match
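To make the role of the trailing /lookahead part concrete, the following sketch models a rule such as [0-9]+/px in plain C. It is an illustration only: the function name is hypothetical, and it assumes, as is usual for lex, that the lookahead text must be present for the rule to match but is not counted as part of the token.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/*
 * Model of the rule  [0-9]+/px :
 * return the length of the leading run of digits if it is followed
 * by "px", otherwise 0. The lookahead "px" is checked but not
 * included in the returned token length, so it stays in the input.
 */
size_t match_digits_before_px(const char *s)
{
    size_t n = 0;

    while (isdigit((unsigned char)s[n]))
        n++;
    if (n == 0)
        return 0;                 /* the token part did not match */
    return strncmp(s + n, "px", 2) == 0 ? n : 0;
}
```

For the input 12px the token is 12 and scanning resumes at px; for 12pt the rule does not match at all.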
Pattern elements:
a                 The character a
\a                The character a, even if special
.                 Any character except newline
[abx-z]           Any of a, b, or x through z
[^abx-z]          Any except a, b, or x through z
"abc"             The string abc, even if any are special
{name}            The macro definition name
(exp)             The pattern exp (grouping operator)
Optional operators on elements:
e?                Zero or one occurrence of e
e*                Zero or more consecutive e's
e+                One or more consecutive e's
e{n}              n (a decimal number) consecutive e's
e{m,n}            m through n consecutive e's
Patterns may be of the form:
e1e2              Matches the sequence e1 e2
e1|e2             Matches either e1 or e2
lex recognizes the standard C escapes: \n, \t, \r, \b, \f, and
\ooo (octal representation). The special characters
\ ( ) < > { } % * + ? [ - ] ^ / $ . |
must be prefixed with \ or enclosed within quotation marks
(except for " and \) to be treated as ordinary characters. Within
classes, only the characters . ^ - \ and ] are special.
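The backslash rule above can be applied mechanically when a literal string has to be turned into a pattern, for example by a program that writes lex specifications. The helper below is a hypothetical sketch, not part of lex or its library. It escapes the special characters listed above and also the quotation mark, on the assumption that a quotation mark needs the same treatment because it opens a quoted string. White space is not handled.

```c
#include <stddef.h>
#include <string.h>

/*
 * The special characters listed above, plus '"' (an assumption:
 * it delimits quoted strings, so it is escaped as well).
 */
static const char lex_special[] = "\\()<>{}%*+?[-]^/$.|\"";

/*
 * Copy src to dst, prefixing each special character with a backslash.
 * Returns the number of characters written (not counting the NUL),
 * or (size_t)-1 if dst is too small.
 */
size_t lex_escape(const char *src, char *dst, size_t dstsize)
{
    size_t out = 0;

    if (dstsize == 0)
        return (size_t)-1;
    for (; *src != '\0'; src++) {
        int special = strchr(lex_special, *src) != NULL;

        /* Leave room for this character and the final NUL. */
        if (out + (special ? 2 : 1) >= dstsize)
            return (size_t)-1;
        if (special)
            dst[out++] = '\\';
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return out;
}
```

For instance, the literal a+b becomes a\+b, which matches the three characters a, +, b rather than one or more a's followed by b.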
***** Files *****
/usr/lib/libl.a
***** See Also *****
commands, yacc
Introduction to lex, the Lexical Analyzer
COHERENT Lexicon Page 3