.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI -F fs
]
[
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in a file
specified as
.B -f
.IR file .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename.
.PP
An input line is made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&...;
.B $0
refers to the entire line.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f5delete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \f5"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + - * / % ^
(exponentiation), and concatenation (indicated by a blank).
The operators
.B
! ++ -- += -= *= /= %= ^= **= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
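.PP
As a minimal sketch of associative arrays, the following program totals
the second field for each distinct value of the first and then tests
membership with
.BR in .
The two-column input layout and the name
.L carol
are assumptions made only for illustration;
the order in which
.B for
visits the subscripts should not be relied on.
.EX
{ total[$1] += $2 }
END {
	for (name in total) print name, total[name]
	if (!("carol" in total)) print "no entry for carol"
}
.EE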
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
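.PP
A sketch of output redirection under stated assumptions: the file name
.L names.out
and the command
.L sort
are hypothetical choices for illustration.
Every record writes to the same open file and feeds the same
.L sort
process, and both are closed explicitly at the end.
.EX
{
	print $1 > "names.out"
	print $2 | "sort"
}
END {
	close("names.out")
	close("sort")
}
.EE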
.PP
The customary functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
.TP
.B int
truncates to an integer value
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.PD
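.PP
A brief sketch of several of these string functions applied to one
literal string; the sample text and the variable names are made up
for illustration.
.EX
BEGIN {
	s = "2024-01-15 error: disk full"
	n = split(s, part)		# default FS: four fields
	print n, part[1]
	print index(s, "error")		# position of the substring
	if (match(s, /[a-z]+:/))	# sets RSTART and RLENGTH
		print substr(s, RSTART, RLENGTH)
	c = gsub(/-/, "/", part[1])	# c is the number of replacements
	print c, part[1]
}
.EE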
.PP
The ``function''
.B getline
sets
.B $0
to
the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
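.PP
A sketch of
.B getline
reading one line from a command and every line of a file.
The command
.L "echo hello"
and the file name
.L notes.txt
are hypothetical.
Comparing the result with 0 ends the loop on an error (\-1)
as well as at end of file.
.EX
BEGIN {
	"echo hello" | getline greeting
	close("echo hello")
	while ((getline line < "notes.txt") > 0)
		n++
	close("notes.txt")
	print greeting ": " n " lines"
}
.EE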
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.I expression in array-name
.br
.I (expr,expr,...) in array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF SUBSEP
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI -F fs\fR.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
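.PP
A sketch of user-defined functions; the names
.LR fact ,
.L fill
and
.L tab
are hypothetical.
Because only parameters are local, a common idiom (an assumption here,
not something stated above) is to declare an extra parameter such as
.L i
and leave it unset in the call so that it serves as a local variable.
.EX
function fact(n) {		# scalar, passed by value; recursive
	return n <= 1 ? 1 : n * fact(n - 1)
}
function fill(a, n,   i) {	# a is an array, passed by reference
	for (i = 1; i <= n; i++) a[i] = fact(i)
}
BEGIN {
	fill(tab, 5)
	print tab[5], fact(0)
}
.EE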
.SH EXAMPLES
.TP
.L
length > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
Awk \- a Pattern Scanning and Processing Language (Programmer's Manual),
CSTR 118, 1985
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\f5""\fP to it.
.br
The scope rules for variables in functions are a botch.
.br
.L -S
and
.L -R
are flaky.