Annotation of researchv10no/cmd/awk/awk.1, revision 1.1.1.1

1.1       root        1: .TH AWK 1
                      2: .CT 1 files prog_other
                      3: .SH NAME
                      4: awk \- pattern-directed scanning and processing language
                      5: .SH SYNOPSIS
                      6: .B awk
                      7: [
                      8: .BI -F fs
                      9: ]
                     10: [
                     11: .I prog
                     12: ]
                     13: [
                     14: .I file ...
                     15: ]
                     16: .SH DESCRIPTION
                     17: .I Awk
                     18: scans each input
                     19: .I file
                     20: for lines that match any of a set of patterns specified literally in
                     21: .IR prog
                     22: or in a file
                     23: specified as
                     24: .B -f
                     25: .IR file .
                     26: With each pattern
                     27: there can be an associated action that will be performed
                     28: when a line of a
                     29: .I file
                     30: matches the pattern.
                     31: Each line is matched against the
                     32: pattern portion of every pattern-action statement;
                     33: the associated action is performed for each matched pattern.
                     34: The file name 
                     35: .L -
                     36: means the standard input.
                     37: Any
                     38: .IR file
                     39: of the form
                     40: .I var=value
                     41: is treated as an assignment, not a filename.
                     42: .PP
                     43: An input line is made up of fields separated by white space,
                     44: or by regular expression
                     45: .BR FS .
                     46: The fields are denoted
                     47: .BR $1 ,
                     48: .BR $2 ,
                     49: \&...;
                     50: .B $0
                     51: refers to the entire line.
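The field rules above can be tried directly from a shell. This is a minimal sketch, assuming a POSIX-style awk on the PATH and a Bourne-compatible shell; the sample input and the variable names default_out and colon_out are invented for illustration.

```shell
# Default splitting: runs of blanks and tabs separate fields.
default_out=$(printf '  alpha   beta\tgamma \n' | awk '{ print $2, NF }')

# -F sets FS; here each colon separates fields.
colon_out=$(printf 'alice:42:staff\n' | awk -F: '{ print $2, NF }')

printf '%s\n%s\n' "$default_out" "$colon_out"
```

Leading and trailing white space is ignored under the default separator, so the first line still has three fields.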
                     52: .PP
                     53: A pattern-action statement has the form
                     54: .IP
                     55: .IB pattern " { " action " }
                     56: .PP
                     57: A missing 
                     58: .BI { " action " }
                     59: means print the line;
                     60: a missing pattern always matches.
                     61: Pattern-action statements are separated by newlines or semicolons.
                     62: .PP
                     63: An action is a sequence of statements.
                     64: A statement can be one of the following:
                     65: .PP
                     66: .EX
                     67: .ta \w'\f5delete array[expression]'u
                     68: if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
                     69: while(\fI expression \fP)\fI statement\fP
                     70: for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
                     71: for(\fI var \fPin\fI array \fP)\fI statement\fP
                     72: do\fI statement \fPwhile(\fI expression \fP)
                     73: break
                     74: continue
                     75: {\fR [\fP\fI statement ... \fP\fR] \fP}
                     76: \fIexpression\fP       #\fR commonly\fP\fI var = expression\fP
                     77: print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
                     78: printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
                     79: return\fR [ \fP\fIexpression \fP\fR]\fP
                     80: next   #\fR skip remaining patterns on this input line\fP
                     81: delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
                     82: exit\fR [ \fP\fIexpression \fP\fR]\fP  #\fR exit immediately; status is \fP\fIexpression\fP
                     83: .EE
                     84: .DT
                     85: .PP
                     86: Statements are terminated by
                     87: semicolons, newlines or right braces.
                     88: An empty
                     89: .I expression-list
                     90: stands for
                     91: .BR $0 .
                     92: String constants are quoted \f5"\ "\fR,
                     93: with the usual C escapes recognized within.
                     94: Expressions take on string or numeric values as appropriate,
                     95: and are built using the operators
                     96: .B + - * / % ^
                     97: (exponentiation), and concatenation (indicated by a blank).
                     98: The operators
                     99: .B
                    100: ! ++ -- += -= *= /= %= ^= **= > >= < <= == != ?:
                    101: are also available in expressions.
                    102: Variables may be scalars, array elements
                    103: (denoted
                    104: .IB x  [ i ] )
                    105: or fields.
                    106: Variables are initialized to the null string.
                    107: Array subscripts may be any string,
                    108: not necessarily numeric;
                    109: this allows for a form of associative memory.
                    110: Multiple subscripts such as
                    111: .B [i,j,k]
                    112: are permitted; the constituents are concatenated,
                    113: separated by the value of
                    114: .BR SUBSEP .
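A short sketch of how a multiple subscript is stored, assuming a POSIX-style awk on the PATH; the array name a and the keys are hypothetical. The key is recovered by splitting on SUBSEP, and the parenthesized form tests membership.

```shell
out=$(awk 'BEGIN {
    a["x", "y"] = 1                    # stored under the single key "x" SUBSEP "y"
    for (k in a) {
        n = split(k, parts, SUBSEP)    # take the combined key apart again
        print n, parts[1], parts[2]
    }
    if (("x", "y") in a)               # membership test with a multiple subscript
        print "found"
}')
printf '%s\n' "$out"
```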
                    115: .PP
                    116: The
                    117: .B print
                    118: statement prints its arguments on the standard output
                    119: (or on a file if
                    120: .BI > file
                    121: or
                    122: .BI >> file
                    123: is present or on a pipe if
                    124: .BI | cmd
                    125: is present), separated by the current output field separator,
                    126: and terminated by the output record separator.
                    127: .I file
                    128: and
                    129: .I cmd
                    130: may be literal names or parenthesized expressions;
                    131: identical string values in different statements denote
                    132: the same open file.
                    133: The
                    134: .B printf
                    135: statement formats its expression list according to the format
                    136: (see
                    137: .IR printf (3)).
                    138: The built-in function
                    139: .BI close( expr )
                    140: closes the file or pipe
                    141: .IR expr .
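The redirection rules above might be exercised as follows. This is only a sketch, assuming a POSIX-style awk plus the mktemp and sort utilities; the file name fruit.txt and the shell variables are made up for the example.

```shell
dir=$(mktemp -d)    # scratch directory for the output file
sorted=$(cd "$dir" && awk 'BEGIN {
    print "pear"  > "fruit.txt"
    print "apple" > "fruit.txt"   # same name, same open file: written after "pear"
    close("fruit.txt")
    print "b\na" | "sort"         # send output through a pipe to sort
    close("sort")                 # wait for sort to finish
}')
saved=$(cat "$dir/fruit.txt")
printf '%s\n' "$sorted" "$saved"
rm -r "$dir"
```

Because both print statements name the same string, the second one continues the already open file instead of truncating it again.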
                    142: .PP
                    143: The customary functions
                    144: .BR exp ,
                    145: .BR log ,
                    146: .BR sqrt ,
                    147: .BR sin ,
                    148: .BR cos ,
                    149: .BR atan2 
                    150: are built in.
                    151: Other built-in functions:
                    152: .TF length
                    153: .TP
                    154: .B length
                    155: the length of its argument
                    156: taken as a string,
                    157: or of
                    158: .B $0
                    159: if no argument.
                    160: .TP
                    161: .B rand
                    162: random number on (0,1)
                    163: .TP
                    164: .B srand
                    165: sets seed for
                    166: .B rand
                    167: .TP
                    168: .B int
                    169: truncates to an integer value
                    170: .TP
                    171: .BI substr( s , " m" , " n\fB)
                    172: the
                    173: .IR n -character
                    174: substring of
                    175: .I s
                    176: that begins at position
                    177: .IR m 
                    178: counted from 1.
                    179: .TP
                    180: .BI index( s , " t" )
                    181: the position in
                    182: .I s
                    183: where the string
                    184: .I t
                    185: occurs, or 0 if it does not.
                    186: .TP
                    187: .BI match( s , " r" )
                    188: the position in
                    189: .I s
                    190: where the regular expression
                    191: .I r
                    192: occurs, or 0 if it does not.
                    193: The variables
                    194: .B RSTART
                    195: and
                    196: .B RLENGTH
                    197: are set to the position and length of the matched string.
                    198: .TP
                    199: .BI split( s , " a" , " fs\fB)
                    200: splits the string
                    201: .I s
                    202: into array elements
                    203: .IB a [1] ,
                    204: .IB a [2] ,
                    205: \&...,
                    206: .IB a [ n ] ,
                    207: and returns
                    208: .IR n .
                    209: The separation is done with the regular expression
                    210: .I fs
                    211: or with the field separator
                    212: .B FS
                    213: if
                    214: .I fs
                    215: is not given.
                    216: .TP
                    217: .BI sub( r , " t" , " s\fB)
                    218: substitutes
                    219: .I t
                    220: for the first occurrence of the regular expression
                    221: .I r
                    222: in the string
                    223: .IR s .
                    224: If
                    225: .I s
                    226: is not given,
                    227: .B $0
                    228: is used.
                    229: .TP
                    230: .B gsub
                    231: same as
                    232: .B sub
                    233: except that all occurrences of the regular expression
                    234: are replaced;
                    235: .B sub
                    236: and
                    237: .B gsub
                    238: return the number of replacements.
                    239: .TP
                    240: .BI sprintf( fmt , " expr" , " ...\fB )
                    241: the string resulting from formatting
                    242: .I expr ...
                    243: according to the
                    244: .IR printf (3)
                    245: format
                    246: .I fmt
                    247: .TP
                    248: .BI system( cmd )
                    249: executes
                    250: .I cmd
                    251: and returns its exit status
                    252: .PD
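Several of the string functions listed above, shown together in one hedged sketch. It assumes a POSIX-style awk on the PATH; the sample string and variable names are invented, and the calls that set variables are kept in separate statements so the order of evaluation is not in question.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)
    print substr(s, 1, 5)          # 5 characters starting at position 1
    print index(s, "wor")          # position of a literal string
    pos = match(s, /o+/)           # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = split("a:b:c", parts, ":")
    print n, parts[3]
    t = s
    changed = gsub(/o/, "0", t)    # replace every match in t, return the count
    print changed, t
}')
printf '%s\n' "$out"
```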
                    253: .PP
                    254: The ``function''
                    255: .B getline
                    256: sets
                    257: .B $0
                    258: to the next input record from the current input file;
                    259: .B getline
                    260: .BI < file
                    261: sets
                    262: .B $0
                    263: to the next record from
                    264: .IR file .
                    265: .B getline
                    266: .I x
                    267: sets variable
                    268: .I x
                    269: instead.
                    270: Finally,
                    271: .IB cmd " | getline
                    272: pipes the output of
                    273: .I cmd
                    274: into
                    275: .BR getline ;
                    276: each call of
                    277: .B getline
                    278: returns the next line of output from
                    279: .IR cmd .
                    280: In all cases,
                    281: .B getline
                    282: returns 1 for a successful input,
                    283: 0 for end of file, and \-1 for an error.
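The return values of getline lend themselves to a loop test. The following is a sketch under the assumption of a POSIX-style awk on the PATH; the command string and the path /no/such/file are placeholders chosen for the example.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0) {     # 1 for each line read, 0 at end of output
        count++
        last = line
    }
    close(cmd)
    print count, last
    status = (getline x < "/no/such/file") # a file that cannot be opened
    print status
}')
printf '%s\n' "$out"
```

Comparing the result with 0 rather than using it as a bare condition matters here, because the error value \-1 would otherwise count as true and the loop would not terminate.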
                    284: .PP
                    285: Patterns are arbitrary Boolean combinations
                    286: (with
                    287: .BR "! || &&" )
                    288: of regular expressions and
                    289: relational expressions.
                    290: Regular expressions are as in
                    291: .IR egrep ; 
                    292: see
                    293: .IR grep (1).
                    294: Isolated regular expressions
                    295: in a pattern apply to the entire line.
                    296: Regular expressions may also occur in
                    297: relational expressions, using the operators
                    298: .BR ~
                    299: and
                    300: .BR !~ .
                    301: .BI / re /
                    302: is a constant regular expression;
                    303: any string (constant or variable) may be used
                    304: as a regular expression, except in the position of an isolated regular expression
                    305: in a pattern.
                    306: .PP
                    307: A pattern may consist of two patterns separated by a comma;
                    308: in this case, the action is performed for all lines
                    309: from an occurrence of the first pattern
                    310: through an occurrence of the second.
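A small illustration of a two-pattern range, assuming a POSIX-style awk on the PATH; the five input lines are invented. The action runs for the line matching the first pattern, the line matching the second, and everything in between.

```shell
out=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/, /stop/ { print NR ": " $0 }')
printf '%s\n' "$out"
```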
                    311: .PP
                    312: A relational expression is one of the following:
                    313: .IP
                    314: .I expression matchop regular-expression
                    315: .br
                    316: .I expression relop expression
                    317: .br
                    318: .I expression in array-name
                    319: .br
                    320: .I (expr,expr,...) in array-name
                    321: .PP
                    322: where a relop is any of the six relational operators in C,
                    323: and a matchop is either
                    324: .B ~ 
                    325: (matches)
                    326: or
                    327: .B !~
                    328: (does not match).
                    329: A conditional is an arithmetic expression,
                    330: a relational expression,
                    331: or a Boolean combination
                    332: of these.
                    333: .PP
                    334: The special patterns
                    335: .B BEGIN
                    336: and
                    337: .B END
                    338: may be used to capture control before the first input line is read
                    339: and after the last.
                    340: .B BEGIN
                    341: and
                    342: .B END
                    343: do not combine with other patterns.
                    344: .PP
                    345: Variable names with special meanings:
                    346: .TF SUBSEP
                    347: .TP
                    348: .B FS
                    349: regular expression used to separate fields; also settable
                    350: by option
                    351: .BI -F fs .
                    352: .TP
                    353: .BR NF
                    354: number of fields in the current record
                    355: .TP
                    356: .B NR
                    357: ordinal number of the current record
                    358: .TP
                    359: .B FNR
                    360: ordinal number of the current record in the current file
                    361: .TP
                    362: .B FILENAME
                    363: the name of the current input file
                    364: .TP
                    365: .B RS
                    366: input record separator (default newline)
                    367: .TP
                    368: .B OFS
                    369: output field separator (default blank)
                    370: .TP
                    371: .B ORS
                    372: output record separator (default newline)
                    373: .TP
                    374: .B OFMT
                    375: output format for numbers (default
                    376: .BR "%.6g" )
                    377: .TP
                    378: .B SUBSEP
                    379: separates multiple subscripts (default 034)
                    380: .TP
                    381: .B ARGC
                    382: argument count, assignable
                    383: .TP
                    384: .B ARGV
                    385: argument array, assignable;
                    386: non-null members are taken as filenames
                    387: .PD
                    388: .PP
                    389: Functions may be defined (at the position of a pattern-action statement) thus:
                    390: .IP
                    391: .L
                    392: function foo(a, b, c) { ...; return x }
                    393: .PP
                    394: Parameters are passed by value if scalar and by reference if array name;
                    395: functions may be called recursively.
                    396: Parameters are local to the function; all other variables are global.
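The parameter-passing rule can be seen in a few lines. This is a sketch assuming a POSIX-style awk on the PATH; the function name bump and the variables count and tally are hypothetical.

```shell
out=$(awk '
function bump(n, arr) {
    n++              # scalar parameter: only the local copy changes
    arr["hits"]++    # array parameter: the array of the caller changes
}
BEGIN {
    count = 1
    tally["hits"] = 0
    bump(count, tally)
    print count, tally["hits"]
}')
printf '%s\n' "$out"
```

After the call, count is still 1 while the array element has been incremented.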
                    397: .SH EXAMPLES
                    398: .TP
                    399: .L
                    400: length > 72
                    401: Print lines longer than 72 characters.
                    402: .TP
                    403: .L
                    404: { print $2, $1 }
                    405: Print first two fields in opposite order.
                    406: .PP
                    407: .EX
                    408: BEGIN { FS = ",[ \et]*|[ \et]+" }
                    409:       { print $2, $1 }
                    410: .EE
                    411: .ns
                    412: .IP
                    413: Same, with input fields separated by comma and/or blanks and tabs.
                    414: .PP
                    415: .EX
                    416:        { s += $1 }
                    417: END    { print "sum is", s, " average is", s/NR }
                    418: .EE
                    419: .ns
                    420: .IP
                    421: Add up first column, print sum and average.
                    422: .TP
                    423: .L
                    424: /start/, /stop/
                    425: Print all lines between start/stop pairs.
                    426: .PP
                    427: .EX
                    428: BEGIN  {       # Simulate echo(1)
                    429:        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                    430:        printf "\en"
                    431:        exit }
                    432: .EE
                    433: .SH SEE ALSO
                    434: .IR lex (1), 
                    435: .IR sed (1)
                    436: .br
                    437: A. V. Aho, B. W. Kernighan, P. J. Weinberger,
                    438: .I
                    439: Awk \- a Pattern Scanning and Processing Language (Programmer's Manual),
                    440: CSTR 118, 1985
                    441: .SH BUGS
                    442: There are no explicit conversions between numbers and strings.
                    443: To force an expression to be treated as a number add 0 to it;
                    444: to force it to be treated as a string concatenate
                    445: \f5""\fP to it.
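The two forcing idioms can be compared side by side. This is a sketch assuming a POSIX-style awk on the PATH; the variables a and b are made up, and behavior in the historical version described here may differ in detail.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"
    print (a < b)            # both are strings: compared character by character
    print (a + 0 < b + 0)    # adding 0 forces a numeric comparison
    print (a + 0) ""         # concatenating "" turns the number back into a string
}')
printf '%s\n' "$out"
```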
                    446: .br
                    447: The scope rules for variables in functions are a botch.
                    448: .br
                    449: .L -S
                    450: and
                    451: .L -R
                    452: are flaky.
