lex                          Command                          lex

Lexical analyzer generator

lex [-t][-v][file]
cc lex.yy.c -ll

Many programs, e.g., compilers, process highly structured input
according to rules. Two of the most complicated parts of such
programs are lexical analysis and parsing (also called syntax
analysis). The COHERENT system includes two powerful tools
called lex and yacc to help you construct these parts of a
program. lex converts a set of lexical rules into a lexical
analyzer, and yacc converts a set of parsing rules into a parser.

The output of lex may be used directly, or may be used by a
parser generated by yacc.

lex reads a specification from the given file (or from the
standard input if no file is given), and generates a C function
called yylex(). lex writes the generated function in the file
lex.yy.c, or on standard output if you use the -t option. The -v
option prints some statistics about the generated tables.

The tutorial on lex that appears in this manual describes lex in
detail. In brief, the generated function yylex() matches
portions of its input to one pattern (sometimes called a regular
expression) from a set of rules, or context, and executes the
associated C commands. Unmatched portions of the input are copied
to the output stream. yylex() returns EOF when input has been
exhausted.

lex uses the following macros, which you may redefine after
removing them with the preprocessor directive #undef if you wish:
input() (read the standard input stream), and output(c) (write
the character c to the standard output stream). You may also
replace the following functions if you wish: main() (main
function), error(...) (print error messages; takes same arguments
as printf), and yywrap() (handle events at the end of a file). If
an action is desired on end of file, such as arranging for more
input, yywrap() should perform it, returning zero to keep going.
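
As an illustration of replacing input(), the following C fragment is
a minimal sketch that reads characters from an in-memory string
instead of the standard input. The names set_source() and
string_input() are hypothetical, and returning EOF at the end of the
string is an assumption; consult the tutorial for the value that the
generated yylex() actually expects. In a specification, the #undef
and #define lines would normally sit in the header section.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory source that stands in for standard input. */
static const char *src = "";

/* Point the replacement reader at a new string. */
void set_source(const char *s)
{
    assert(s != NULL);
    src = s;
}

/* Return the next character of the string.
   Assumption: EOF signals that the string is exhausted. */
static int string_input(void)
{
    return *src ? (unsigned char)*src++ : EOF;
}

/* Remove any earlier definition of input(), then supply our own.
   #undef is harmless when the macro is not defined. */
#undef input
#define input() string_input()
```

After set_source("ab"), successive calls to input() yield 'a', then
'b', then EOF.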

A full lex specification has the following format:

*  Macro definitions, of the form:
        name pattern

*  Start condition declarations:
        %S NAME ...

*  Context declarations:
        %C NAME ...

*  Code to be included in the header section:
        %{
        anything
        %}
        <tab or space> anything

*  Rules section delimiter (must always be present):
        %%

*  Code to appear at the start of yylex():
        <tab or space> anything

*  Rules for initial context, in any of the forms:
        rule action;
        rule |                  (means use next action)
        rule {
        <tab or space> action;
        <tab or space> }

*  For each additional context:
        %C NAME
        ...rules for this context...

*  End of rules section delimiter:
        %%

*  Code to be copied verbatim, such as user-provided input(),
   output(), yywrap(), or other.

lex matches the longest string possible; if two rules match the
same length string, the rule specified first takes precedence.
lex puts the matched string, or token, in the char array
yytext[], and sets the variable yyleng to its length.
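
The matching rule just described can be sketched in plain C. This
illustrates the selection logic only and is not the code that lex
generates; the two matcher functions and pick_rule() are hypothetical
stand-ins for a rule that matches the keyword if and a later rule
that matches a run of letters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical matcher returns the length of the prefix of s
   that it matches, or 0 if it does not match at all. */
typedef size_t (*matcher_fn)(const char *s);

static size_t match_keyword_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_letters(const char *s)
{
    size_t n = 0;

    while (isalpha((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order in which they would appear in a specification. */
static const matcher_fn rules[] = { match_keyword_if, match_letters };

/* Return the index of the winning rule, or -1 if none matches.
   The matched length (the analogue of yyleng) is stored in *len. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    assert(s != NULL && len != NULL);
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);

        /* Only a strictly longer match displaces the current
           winner, so the earlier rule wins a tie. */
        if (n > best_len) {
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With these two rules, the input if is claimed by the first rule
because both rules match two characters, while iffy goes to the
second rule because it matches four.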

Actions may use the following:

ECHO            Output the token
REJECT          Perform action for lower precedence match
BEGIN NAME      Set start condition to NAME
BEGIN 0         Clear start condition
yyswitch(NAME)  Switch to context NAME, return current
yyswitch(0)     Switch to initial context
yynext()        Steal next character from input
yyback(c)       Put character c back into input
yyless(n)       Reduce token length to n, put rest back
yymore()        Append next token to this one
yylook()        Returns number of chars in input buffer

lex rules are contiguous strings of the form

     [ <NAME,...> ][ ^ ] token [ /lookahead ][ $ ]

where brackets `[]' indicate optional items.

<NAME,...>      Match only under given start conditions
^               Match the beginning of a line
$               Match the end of a line
token           Pattern that a given token is to match
/lookahead      Pattern that given trailing text is to match

Pattern elements:

a               The character a
\a              The character a, even if special
.               Any character except newline
[abx-z]         Any of a, b, or x through z
[^abx-z]        Any except a, b, or x through z
"abc"           The string abc, even if any are special
{name}          The macro definition name
(exp)           The pattern exp (grouping operator)

Optional operators on elements:

e?              Zero or one occurrence of e
e*              Zero or more consecutive occurrences of e
e+              One or more consecutive occurrences of e
e{n}            n (a decimal number) consecutive occurrences of e
e{m,n}          m through n consecutive occurrences of e

Patterns may be of the form:

e1e2            Matches the sequence e1 e2
e1|e2           Matches either e1 or e2
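
To experiment with the repetition, sequence, and alternation
operators without running lex, the following sketch uses the POSIX
extended regular expression routines regcomp() and regexec(). This is
a different matcher from the one lex builds, and its syntax differs
from lex patterns in several respects (it has no macros, quoted
strings, start conditions, or lookahead), so treat it only as a way
to check the meaning of ?, *, +, {n}, {m,n}, and |. The helper name
matches_whole() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches pattern, 0 if it does not,
   and -1 if the pattern is too long or fails to compile. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Wrap the pattern as ^(pattern)$ so that a match on only part
       of the text does not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match */
    regfree(&re);
    return rc == 0;
}
```

For example, matches_whole("a{2,3}", "aa") returns 1, while
matches_whole("a{2,3}", "aaaa") returns 0.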

lex recognizes the standard C escapes: \n, \t, \r, \b, \f, and
\ooo (octal representation). The special characters

     \ ( ) < > { } % * + ? [ - ] ^ / $ . |

must be prefixed with \ or enclosed within quotation marks
(excepting " and \) to be normal. Within classes, only the
characters . ^ - \ and ] are special.

***** Files *****

/usr/lib/libl.a

***** See Also *****

commands, yacc
Introduction to lex, the Lexical Analyzer

COHERENT Lexicon