43BSDReno/lib/libc/gen/regexp.3 - annotate

Annotation of 43BSDReno/lib/libc/gen/regexp.3, revision 1.1

1.1     ! root        1: .\"
        !             2: .\"    @(#)regexp.3    5.1 (Berkeley) 5/19/88
        !             3: .\"
        !             4: .TH REGEXP 3 "May 19, 1988"
        !             5: .UC
        !             6: .SH NAME
        !             7: regcomp, regexec, regsub, regerror \- regular expression handlers
        !             8: .SH SYNOPSIS
        !             9: .nf
        !            10: .B #include <regexp.h>
        !            11: .PP
        !            12: .B regexp *regcomp(exp)
        !            13: .B char *exp;
        !            14: .PP
        !            15: .B int regexec(prog, string)
        !            16: .B regexp *prog;
        !            17: .B char *string;
        !            18: .PP
        !            19: .B regsub(prog, source, dest)
        !            20: .B regexp *prog;
        !            21: .B char *source;
        !            22: .B char *dest;
        !            23: .PP
        !            24: .B regerror(msg)
        !            25: .B char *msg;
        !            26: .fi
        !            27: .SH DESCRIPTION
        !            28: \fIRegcomp\fP, \fIregexec\fP, \fIregsub\fP, and \fIregerror\fP implement
        !            29: .IR egrep (1)-style
        !            30: regular expressions and supporting facilities.
        !            31: .PP
        !            32: .I Regcomp
        !            33: compiles a regular expression into a structure of type
        !            34: .IR regexp ,
        !            35: and returns a pointer to it.
        !            36: The space has been allocated using
        !            37: .IR malloc (3)
        !            38: and may be released by
        !            39: .IR free .
        !            40: .PP
        !            41: .I Regexec
        !            42: matches a NUL-terminated \fIstring\fR against the compiled regular expression
        !            43: in \fIprog\fR.
        !            44: It returns 1 for success and 0 for failure, and adjusts the contents of
        !            45: \fIprog\fR's \fIstartp\fR and \fIendp\fR (see below) accordingly.
        !            46: .PP
        !            47: The members of a
        !            48: .I regexp
        !            49: structure include at least the following (not necessarily in order):
        !            50: .PP
        !            51: .RS
        !            52: char *startp[NSUBEXP];
        !            53: .br
        !            54: char *endp[NSUBEXP];
        !            55: .RE
        !            56: .PP
        !            57: where
        !            58: .I NSUBEXP
        !            59: is defined (as 10) in the header file.
        !            60: Once a successful \fIregexec\fR has been done using the \fIregexp\fR,
        !            61: each \fIstartp\fR-\fIendp\fR pair describes one substring
        !            62: within the \fIstring\fR,
        !            63: with the \fIstartp\fR pointing to the first character of the substring and
        !            64: the \fIendp\fR pointing to the first character following the substring.
        !            65: The 0th substring is the substring of \fIstring\fR that matched the whole
        !            66: regular expression.
        !            67: The others are those substrings that matched parenthesized expressions
        !            68: within the regular expression, with parenthesized expressions numbered
        !            69: in left-to-right order of their opening parentheses.
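As a rough illustration of how a startp/endp pair delimits a substring, the following C sketch copies substring n out of a match record. The type match_spans and the function copy_substring are hypothetical names invented for this example; they model only the two documented members and are not part of the library. Treating a NULL pointer as "this substring did not take part in the match" is an assumption, since the text above does not say how unused pairs are marked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10  /* value stated in the manual page */

/* Hypothetical stand-in for the real regexp structure: only the two
 * documented members are modeled here. */
struct match_spans {
    const char *startp[NSUBEXP];
    const char *endp[NSUBEXP];
};

/* Copy substring n into buf (capacity bufsize, including the NUL).
 * Returns the substring length, or -1 if n is out of range, the pair
 * is unset (assumed to be NULL), or the substring does not fit. */
static long copy_substring(const struct match_spans *m, int n,
                           char *buf, size_t bufsize)
{
    size_t len;

    assert(m != NULL && buf != NULL);
    if (n < 0 || n >= NSUBEXP || m->startp[n] == NULL || m->endp[n] == NULL)
        return -1;
    /* endp points one past the last character, so the difference is the length. */
    len = (size_t)(m->endp[n] - m->startp[n]);
    if (len + 1 > bufsize)
        return -1;
    memcpy(buf, m->startp[n], len);
    buf[len] = '\0';
    return (long)len;
}
```

With the real library, the same pointer arithmetic would be applied to the startp and endp members of the regexp returned by regcomp, after a successful regexec.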
        !            70: .PP
        !            71: .I Regsub
        !            72: copies \fIsource\fR to \fIdest\fR, making substitutions according to the
        !            73: most recent \fIregexec\fR performed using \fIprog\fR.
        !            74: Each instance of `&' in \fIsource\fR is replaced by the substring
        !            75: indicated by \fIstartp\fR[\fI0\fR] and
        !            76: \fIendp\fR[\fI0\fR].
        !            77: Each instance of `\e\fIn\fR', where \fIn\fR is a digit, is replaced by
        !            78: the substring indicated by
        !            79: \fIstartp\fR[\fIn\fR] and
        !            80: \fIendp\fR[\fIn\fR].
        !            81: To get a literal `&' or `\e\fIn\fR' into \fIdest\fR, prefix it with `\e';
        !            82: to get a literal `\e' preceding `&' or `\e\fIn\fR', prefix it with
        !            83: another `\e'.
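The substitution rules above can be sketched in C as follows. This is an illustration of the documented behavior, not the library's own code: match_spans and substitute_sketch are hypothetical names, the destination size argument is an addition (the documented regsub takes no size), and skipping substrings whose pointers are NULL is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10  /* value stated in the manual page */

/* Hypothetical stand-in for the documented startp/endp members. */
struct match_spans {
    const char *startp[NSUBEXP];
    const char *endp[NSUBEXP];
};

/* Copy source to dest, expanding `&' and `\n' from the recorded spans.
 * Returns 0 on success, -1 if dest (capacity destsize) is too small. */
static int substitute_sketch(const struct match_spans *m, const char *source,
                             char *dest, size_t destsize)
{
    size_t used = 0;
    char c;

    assert(m != NULL && source != NULL && dest != NULL);
    while ((c = *source++) != '\0') {
        int n = -1;                     /* -1 means "ordinary character" */

        if (c == '&') {
            n = 0;                      /* whole match */
        } else if (c == '\\' && *source >= '0' && *source <= '9') {
            n = *source++ - '0';        /* parenthesized substring n */
        } else if (c == '\\' && (*source == '\\' || *source == '&')) {
            c = *source++;              /* escaped: emit next character literally */
        }

        if (n < 0) {
            if (used + 1 >= destsize)
                return -1;
            dest[used++] = c;
        } else if (m->startp[n] != NULL && m->endp[n] != NULL) {
            size_t len = (size_t)(m->endp[n] - m->startp[n]);
            if (used + len >= destsize)
                return -1;
            memcpy(dest + used, m->startp[n], len);
            used += len;
        }
    }
    if (used >= destsize)
        return -1;
    dest[used] = '\0';
    return 0;
}
```

The explicit capacity check is a design choice of this sketch; with the documented three-argument interface the caller has to make sure dest is large enough.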
        !            84: .PP
        !            85: .I Regerror
        !            86: is called whenever an error is detected in \fIregcomp\fR, \fIregexec\fR,
        !            87: or \fIregsub\fR.
        !            88: The default \fIregerror\fR writes the string \fImsg\fR,
        !            89: with a suitable indicator of origin,
        !            90: on the standard
        !            91: error output
        !            92: and invokes \fIexit\fR(2).
        !            93: .I Regerror
        !            94: can be replaced by the user if other actions are desirable.
        !            95: .SH "REGULAR EXPRESSION SYNTAX"
        !            96: A regular expression is zero or more \fIbranches\fR, separated by `|'.
        !            97: It matches anything that matches one of the branches.
        !            98: .PP
        !            99: A branch is zero or more \fIpieces\fR, concatenated.
        !           100: It matches a match for the first piece, followed by a match for the second, etc.
        !           101: .PP
        !           102: A piece is an \fIatom\fR possibly followed by `*', `+', or `?'.
        !           103: An atom followed by `*' matches a sequence of 0 or more matches of the atom.
        !           104: An atom followed by `+' matches a sequence of 1 or more matches of the atom.
        !           105: An atom followed by `?' matches a match of the atom, or the null string.
        !           106: .PP
        !           107: An atom is a regular expression in parentheses (matching a match for the
        !           108: regular expression), a \fIrange\fR (see below), `.'
        !           109: (matching any single character), `^' (matching the null string at the
        !           110: beginning of the input string), `$' (matching the null string at the
        !           111: end of the input string), a `\e' followed by a single character (matching
        !           112: that character), or a single character with no other significance
        !           113: (matching that character).
        !           114: .PP
        !           115: A \fIrange\fR is a sequence of characters enclosed in `[]'.
        !           116: It normally matches any single character from the sequence.
        !           117: If the sequence begins with `^',
        !           118: it matches any single character \fInot\fR from the rest of the sequence.
        !           119: If two characters in the sequence are separated by `\-', this is shorthand
        !           120: for the full list of ASCII characters between them
        !           121: (e.g. `[0-9]' matches any decimal digit).
        !           122: To include a literal `]' in the sequence, make it the first character
        !           123: (following a possible `^').
        !           124: To include a literal `\-', make it the first or last character.
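The range rules can be illustrated with a small C function that tests one character against a bracketed range. The name range_matches is hypothetical, and the function is a sketch written from the description above rather than the library's matcher. One simplification is assumed: a reversed range such as `[z-a]' simply matches nothing here, and the sketch does not diagnose it.

```c
#include <assert.h>
#include <stddef.h>

/* Test character c against a range such as "[0-9]" or "[^]a-z-]".
 * pat must point at the opening `['.
 * Returns 1 on match, 0 on no match, -1 if the closing `]' is missing. */
static int range_matches(const char *pat, unsigned char c)
{
    const char *p;
    int negate = 0, found = 0, first = 1;

    assert(pat != NULL && pat[0] == '[');
    p = pat + 1;
    if (*p == '^') {                    /* leading `^' inverts the set */
        negate = 1;
        p++;
    }
    /* A `]' in first position is a literal, not the terminator. */
    while (*p != '\0' && (first || *p != ']')) {
        unsigned char lo = (unsigned char)*p, hi = lo;

        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            hi = (unsigned char)p[2];   /* `lo-hi' shorthand */
            p += 3;
        } else {
            p++;                        /* single character, incl. first/last `-' */
        }
        if (lo <= c && c <= hi)
            found = 1;
        first = 0;
    }
    if (*p != ']')
        return -1;                      /* unterminated range */
    return negate ? !found : found;
}
```

For instance, under these rules `[a-]' contains the two characters `a' and `-', while `[]a]' contains `]' and `a'.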
        !           125: .SH AMBIGUITY
        !           126: If a regular expression could match two different parts of the input string,
        !           127: it will match the one which begins earliest.
        !           128: If both begin in the same place but match different lengths, or match
        !           129: the same length in different ways, life gets messier, as follows.
        !           130: .PP
        !           131: In general, the possibilities in a list of branches are considered in
        !           132: left-to-right order, the possibilities for `*', `+', and `?' are
        !           133: considered longest-first, nested constructs are considered from the
        !           134: outermost in, and concatenated constructs are considered leftmost-first.
        !           135: The match that will be chosen is the one that uses the earliest
        !           136: possibility in the first choice that has to be made.
        !           137: If there is more than one choice, the next will be made in the same manner
        !           138: (earliest possibility) subject to the decision on the first choice.
        !           139: And so forth.
        !           140: .PP
        !           141: For example, `(ab|a)b*c' could match `abc' in one of two ways.
        !           142: The first choice is between `ab' and `a'; since `ab' is earlier, and does
        !           143: lead to a successful overall match, it is chosen.
        !           144: Since the `b' is already spoken for,
        !           145: the `b*' must match its last possibility\(emthe empty string\(emsince
        !           146: it must respect the earlier choice.
        !           147: .PP
        !           148: In the particular case where no `|'s are present and there is only one
        !           149: `*', `+', or `?', the net effect is that the longest possible
        !           150: match will be chosen.
        !           151: So `ab*', presented with `xabbbby', will match `abbbb'.
        !           152: Note that if `ab*' is tried against `xabyabbbz', it
        !           153: will match `ab' just after `x', due to the begins-earliest rule.
        !           154: (In effect, the decision on where to start the match is the first choice
        !           155: to be made, hence subsequent choices must respect it even if this leads them
        !           156: to less-preferred alternatives.)
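The begins-earliest and longest-match examples above can be tried out with the following C sketch. Note the substitution: it uses the POSIX <regex.h> interface with REG_EXTENDED, not the functions documented here. The POSIX regcomp and regexec share their names with these routines but have different signatures and a different (leftmost-longest) disambiguation rule, so agreement is only assumed for simple patterns like the ones in this section. The helper name first_match is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Find the first match of pattern in text using POSIX extended syntax
 * (a stand-in for the routines documented in this page).
 * On a match, stores byte offsets in *start and *end (end is one past
 * the last matched character). Returns 1 on match, 0 on no match,
 * -1 if the pattern fails to compile. */
static int first_match(const char *pattern, const char *text,
                       int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    assert(pattern != NULL && text != NULL && start != NULL && end != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, m, 0);
    if (rc == 0) {
        *start = (int)m[0].rm_so;
        *end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Called with `ab*' and `xabbbby', the helper should report offsets 1 and 6 (the substring `abbbb'); with `xabyabbbz' it should report 1 and 3 (the `ab' just after `x').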
        !           157: .SH DIAGNOSTICS
        !           158: \fIRegcomp\fR returns NULL for a failure
        !           159: (\fIregerror\fR permitting),
        !           160: where failures are syntax errors, exceeding implementation limits,
        !           161: or applying `+' or `*' to a possibly-null operand.
        !           162: .SH HISTORY
        !           163: Both code and manual page for \fIregcomp\fP, \fIregexec\fP, \fIregsub\fP,
        !           164: and \fIregerror\fP were written at the University of Toronto.
        !           165: They are intended to be compatible with the Bell V8 \fIregexp\fR(3),
        !           166: but are not derived from Bell code.
        !           167: .SH BUGS
        !           168: Empty branches and empty regular expressions are not portable to V8.
        !           169: .PP
        !           170: The restriction against
        !           171: applying `*' or `+' to a possibly-null operand is an artifact of the
        !           172: simplistic implementation.
        !           173: .PP
        !           174: Does not support \fIegrep\fR's newline-separated branches;
        !           175: neither does the V8 \fIregexp\fR(3), though.
        !           176: .PP
        !           177: Due to emphasis on
        !           178: compactness and simplicity,
        !           179: it's not strikingly fast.
        !           180: It does give special attention to handling simple cases quickly.
        !           181: .SH "SEE ALSO"
        !           182: ed(1), ex(1), expr(1), egrep(1), fgrep(1), grep(1), regex(3)