\input texinfo
@comment -*- Mode: texinfo -*-
@comment This documents the GNU regex library
@comment Source file: GNUtools/emacs/man/regex.texinfo, revision 1.1.1.1
@setfilename regex

@comment >> @code{"foo"} for literal strings vs @b{"foo"} vs @code{foo}
@comment >>  (this file is presently using the last  --- it looks ok in
@comment >>   info; wait to see what it looks like under botex)


@comment >> superior of (dir) is temporary
@node top, syntax, , (dir)
@comment  node-name,  next,  previous,  up
@chapter The @code{regex} Regular Expression Matching Library

@section Overview

Regular expression matching allows you to test whether a string fits
into a specific syntactic shape.  You can also search a string for a
substring that fits a pattern.

A regular expression describes a set of strings.  The simplest case is
one that describes a particular string; for example, the string @samp{foo}
when regarded as a regular expression matches @samp{foo} and nothing else.
Nontrivial regular expressions use certain special constructs so that
they can match more than one string.  For example, the regular expression
@samp{foo\|bar} matches either the string @samp{foo} or the string @samp{bar}; the
regular expression @samp{c[ad]*r} matches any of the strings @samp{cr}, @samp{car},
@samp{cdr}, @samp{caar}, @samp{cadddar}, and all other such strings with any number of
@samp{a}'s and @samp{d}'s.

The first step in matching a regular expression is to compile it.
You must supply the pattern string and also a pattern buffer to hold
the compiled result.  That result contains the pattern in an internal
format that is easier to use in matching.

Having compiled a pattern, you can match it against strings.  You can
match the compiled pattern any number of times against different
strings.

@menu
* syntax::     Syntax of regular expressions
* directives:: Meaning of characters as regex string directives.
* emacs::      Additional character directives available
                 only for use within Emacs.
* programming:: Using the regex library from C programs
* unix::       Unix-compatible entry-points to regex library
@end menu

@node syntax, directives, top, top
@comment  node-name,  next,  previous,  up
@section Syntax of Regular Expressions

Regular expressions have a syntax in which a few characters are special
constructs and the rest are @dfn{ordinary}.  An ordinary character is a
simple regular expression which matches that character and nothing else.
The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
@samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}.  Any other character
appearing in a regular expression is ordinary, unless a @samp{\} precedes
it.@refill

For example, @samp{f} is not a special character, so it is ordinary,
and therefore @samp{f} is a regular expression that matches the string @samp{f}
and no other string.  (It does @emph{not} match the string @samp{ff}.)  Likewise,
@samp{o} is a regular expression that matches only @samp{o}.

Any two regular expressions @var{a} and @var{b} can be concatenated.
The result is a regular expression which matches a string if @var{a}
matches some amount of the beginning of that string and @var{b}
matches the rest of the string.

As a simple example, we can concatenate the regular expressions
@samp{f} and @samp{o} to get the regular expression @samp{fo},
which matches only the string @samp{fo}.  Still trivial.

Note: for Unix compatibility, special characters are treated as
ordinary ones if they are in contexts where their special meanings
make no sense.  For example, @samp{*foo} treats @samp{*} as ordinary since
there is no preceding expression on which the @samp{*} can act.
It is poor practice to depend on this behavior; it is better to quote
the special character anyway, regardless of where it appears.


@node directives, emacs, syntax, top
@comment  node-name,  next,  previous,  up

@ifinfo
The following are the characters and character sequences which have
special meaning within regular expressions.
Any character not mentioned here is not special; it stands for exactly
itself for the purposes of searching and matching.  @xref{syntax}.
@end ifinfo

@table @samp
@item .
is a special character that matches anything except a newline.
Using concatenation, we can make regular expressions like @samp{a.b} which
matches any three-character string which begins with @samp{a} and ends with @samp{b}.@refill

@item *
is not a construct by itself; it is a suffix, which means the preceding
regular expression is to be repeated as many times as possible.  In @samp{fo*},
the @samp{*} applies to the @samp{o}, so @samp{fo*} matches @samp{f} followed by any number of @samp{o}'s.@refill

The case of zero @samp{o}'s is allowed: @samp{fo*} does match @samp{f}.@refill

@samp{*} always applies to the @emph{smallest} possible preceding expression.
Thus, @samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}.@refill

The matcher processes a @samp{*} construct by matching, immediately, as many
repetitions as can be found.  Then it continues with the rest of the
pattern.  If that fails, backtracking occurs, discarding some of
the matches of the @samp{*}'d construct in case that makes it possible
to match the rest of the pattern.  For example, when @samp{c[ad]*ar} is
matched against the string @samp{caddaar}, the @samp{[ad]*} first matches
@samp{addaa}, but this does not allow the next @samp{a} in the pattern to match.
So the last of the matches of @samp{[ad]} is undone and the following
@samp{a} is tried again.  Now it succeeds.@refill

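The greedy-then-backtrack strategy can be sketched as a small recursive
matcher in C.  This is a hypothetical illustration, not the library's code
(which first compiles the pattern into an internal form); the names
@code{toy_match}, @code{item_len} and @code{item_matches} are invented here.
It assumes a well-formed pattern built only from ordinary characters,
@samp{.}, simple sets such as @samp{[ad]} (no ranges and no @samp{^}), and a
postfix @samp{*}, and it returns the number of characters matched at the
start of the text, or @code{-1}.

@example
#include <stddef.h>
#include <string.h>

/* Length of the item at P: 1 for an ordinary character or ".",
   or the whole of a simple set such as "[ad]" (assumes the ']' exists). */
static int item_len (const char *p)
@{
  if (*p == '[')
    return (int) (strchr (p, ']') - p) + 1;
  return 1;
@}

/* Does the item at P (LEN characters long) match the character C? */
static int item_matches (const char *p, int len, char c)
@{
  if (*p == '.')
    return c != '\n';                   /* . matches anything but newline */
  if (*p == '[')
    return memchr (p + 1, c, (size_t) (len - 2)) != NULL;
  return *p == c;
@}

/* Number of characters of TEXT matched by PAT at the start of TEXT,
   or -1 if PAT does not match there.  */
int toy_match (const char *pat, const char *text)
@{
  int len, n, rest;

  if (*pat == '\0')
    return 0;                           /* empty pattern: empty match */
  len = item_len (pat);
  if (pat[len] == '*')
    @{
      /* First take as many repetitions as can be found ...  */
      for (n = 0; text[n] != '\0' && item_matches (pat, len, text[n]); n++)
        ;
      /* ... then undo them one at a time until the rest matches.  */
      for (; n >= 0; n--)
        if ((rest = toy_match (pat + len + 1, text + n)) >= 0)
          return n + rest;
      return -1;
    @}
  if (*text != '\0' && item_matches (pat, len, *text))
    @{
      rest = toy_match (pat + len, text + 1);
      return rest < 0 ? -1 : 1 + rest;
    @}
  return -1;
@}
@end example

For @samp{c[ad]*ar} against @samp{caddaar}, the first loop takes five
repetitions, the rest of the pattern then fails on @samp{r}, and the next
try, with four repetitions, succeeds, so @code{toy_match} returns 7.
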
@item +
@samp{+} is like @samp{*} except that it requires at least one match for
the preceding pattern.  Thus, @samp{c[ad]+r} does not match
@samp{cr} but does match anything else that @samp{c[ad]*r} would match.

@item ?
@samp{?} is like @samp{*} except that it allows either zero or one match
for the preceding pattern.  Thus, @samp{c[ad]?r} matches @samp{cr} or
@samp{car} or @samp{cdr}, and nothing else.

@item [ @dots{} ]
@samp{[} begins a @dfn{character set}, which is terminated by a @samp{]}.
In the simplest case, the characters between the two form the set.
Thus, @samp{[ad]} matches either @samp{a} or @samp{d},
and @samp{[ad]*} matches any string of @samp{a}'s and @samp{d}'s
(including the empty string), from which it follows that
@samp{c[ad]*r} matches @samp{car}, etc.@refill

Character ranges can also be included in a character set, by writing two
characters with a @samp{-} between them.  Thus, @samp{[a-z]} matches
any lower-case letter.  Ranges may be intermixed freely with
individual characters, as in @samp{[a-z$%.]}, which matches any
lower-case letter or @samp{$}, @samp{%} or period.@refill

Note that the usual special characters are not special any more inside a
character set.  A completely different set of special characters exists
inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill

To include a @samp{]} in a character set, you must make it
the first character.  For example, @samp{[]a]} matches @samp{]} or @samp{a}.
To include a @samp{-}, you must use it in a context where it cannot possibly
indicate a range: that is, as the first character, or immediately
after a range.@refill

@item [^ @dots{} ]
@samp{[^} begins a @dfn{complement character set}, which matches any
character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]}
matches all characters @emph{except} letters and digits.@refill

@samp{^} is not special in a character set unless it is the first character.
The character following the @samp{^} is treated as if it were first
(it may be a @samp{-} or a @samp{]}).@refill

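The rules above for @samp{]}, @samp{-} and @samp{^} can be collected into a
small membership test.  This is a hypothetical C sketch (@code{set_matches}
is not a library function); it assumes single-byte characters and that a
range covers the character codes from its first character to its last.

@example
/* Is the character CH in the set whose opening '[' is at SET?
   Returns 1 if it is, 0 if not, -1 if the set has no closing ']'.  */
int set_matches (const char *set, int ch)
@{
  const char *p = set + 1;            /* skip the opening '[' */
  const char *first;
  int negate = 0, found = 0;

  if (*p == '^')                      /* special only in first position */
    @{
      negate = 1;
      p++;
    @}
  first = p;                          /* a ']' here is an ordinary member */
  while (*p != '\0' && (*p != ']' || p == first))
    @{
      unsigned char lo = (unsigned char) *p;

      if (p[1] == '-' && p[2] != ']' && p[2] != '\0')
        @{
          /* A range such as a-z.  Scanning resumes after it, so a '-'
             that follows (as in [a-c-x]) is read as an ordinary member. */
          if (lo <= ch && ch <= (unsigned char) p[2])
            found = 1;
          p += 3;
        @}
      else
        @{
          if (ch == lo)               /* includes a leading '-' or ']' */
            found = 1;
          p++;
        @}
    @}
  if (*p != ']')
    return -1;
  return negate ? !found : found;
@}
@end example

Under these rules @samp{[a-c-x]} contains @samp{a}, @samp{b}, @samp{c},
@samp{-} and @samp{x}, because the second @samp{-} comes immediately after
a range.
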
@item ^
is a special character that matches the empty string, but only
at the beginning of a line in the text being matched.  Otherwise
it fails to match anything.  Thus, @samp{^foo} matches a @samp{foo}
which occurs at the beginning of a line.@refill

@item $
is similar to @samp{^} but matches only at the end of a line.
Thus, @samp{xx*$} matches a string of one or more @samp{x}'s
at the end of a line.@refill

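Because @samp{^} and @samp{$} match the empty string, they test a position
in the text rather than a character.  Here is a hypothetical C sketch of
those two tests, assuming lines are separated by newline characters;
@code{find_foo_at_line_start} is an invented helper that hard-codes the
pattern @samp{^foo} for brevity.

@example
#include <string.h>

/* Can ^ match at position POS (0 <= POS <= size of TEXT)?  */
int at_line_start (const char *text, int pos)
@{
  return pos == 0 || text[pos - 1] == '\n';
@}

/* Can $ match at position POS of TEXT, which is SIZE characters long?  */
int at_line_end (const char *text, int size, int pos)
@{
  return pos == size || text[pos] == '\n';
@}

/* Position of the first "foo" that begins a line (the pattern ^foo),
   or -1 if there is none.  */
int find_foo_at_line_start (const char *text, int size)
@{
  int pos;

  for (pos = 0; pos + 3 <= size; pos++)
    if (at_line_start (text, pos) && memcmp (text + pos, "foo", 3) == 0)
      return pos;
  return -1;
@}
@end example

In the text @samp{a foo} followed by a newline and @samp{foo bar}, the first
@samp{foo} (at position 2) does not begin a line, so the search returns 6.
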
@item \
has two functions: it quotes the above special characters
(including @samp{\}), and it introduces additional special constructs.@refill

Because @samp{\} quotes special characters, @samp{\$} is a regular
expression which matches only @samp{$}, and @samp{\[} is a regular
expression which matches only @samp{[}, and so on.@refill

For the most part, @samp{\} followed by any character matches only that
character.  However, there are several exceptions: characters which, when
preceded by @samp{\}, are special constructs.  Such characters are always
ordinary when encountered on their own.@refill

No new special characters will ever be defined.  All extensions to
the regular expression syntax are made by defining new two-character
constructs that begin with @samp{\}.@refill

@item \|
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
@var{b} will match.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
but no other string.@refill

@samp{\|} applies to the largest possible surrounding expressions.  Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists when multiple @samp{\|}'s are used.@refill

@item \( @dots{} \)
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations.
Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.

@item
To enclose a complicated expression for the postfix @samp{*} to operate on.
Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any (zero or
more) number of @samp{na}'s.@refill

@item
To mark a matched substring for future reference.

@end enumerate

This last application is not a consequence of the idea of a parenthetical
grouping; it is a separate feature which happens to be assigned as a
second meaning to the same @samp{\( @dots{} \)} construct because there is no
conflict in practice between the two meanings.  Here is an explanation
of this feature:@refill

@item \@var{digit}
After the end of a @samp{\( @dots{} \)} construct, the matcher remembers the
beginning and end of the text matched by that construct.  Then, later on
in the regular expression, you can use @samp{\} followed by @var{digit}
to mean ``match the same text that was matched by the @var{digit}th
@samp{\( @dots{} \)} construct.''  The @samp{\( @dots{} \)} constructs
are numbered in the order in which they begin in the regular expression.@refill

The strings matching the first nine @samp{\( @dots{} \)} constructs appearing
in a regular expression are assigned numbers 1 through 9 in order of their
beginnings.
@samp{\1} through @samp{\9} may be used to refer to the text matched by
the corresponding @samp{\( @dots{} \)} construct.@refill

For example, @samp{\(.*\)\1} matches any string that is composed of two
identical halves.  The @samp{\(.*\)} matches the first half, which may be
anything, but the @samp{\1} that follows must match the same exact text.@refill

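For the single pattern @samp{\(.*\)\1}, the work of a backtracking matcher
can be written out directly.  In this hypothetical C sketch
(@code{match_doubled} is an invented name), the variable @code{k} stands
for the length of the text remembered by @samp{\(.*\)}: it starts as large
as @samp{.*} allows and shrinks until @samp{\1} finds the same text again.
Like @code{re_match}, it matches at the start of the text without requiring
the whole text to be used.

@example
#include <stddef.h>
#include <string.h>

/* Match \(.*\)\1 at the start of S, which is N characters long.
   Returns the length of the whole match and stores the length of
   the group's text in *GROUP_LEN.  */
size_t match_doubled (const char *s, size_t n, size_t *group_len)
@{
  size_t k = 0;

  /* \(.*\) is greedy: it first takes everything up to a newline.  */
  while (k < n && s[k] != '\n')
    k++;
  /* Give back one character at a time until \1 matches the same text.
     This stops at k == 0 at the latest, where both halves are empty.  */
  while (2 * k > n || memcmp (s, s + k, k) != 0)
    k--;
  *group_len = k;
  return 2 * k;
@}
@end example

The whole text consists of two identical halves exactly when the result
equals its length; for @samp{abcabcx} the result is 6, with a
three-character group.
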
@item \b
matches the empty string, but only if it is at the beginning or
end of a word.  Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word.  @samp{\bball\(s\|\)\b} matches
@samp{ball} or @samp{balls} as a separate word.@refill

@item \B
matches the empty string, provided it is @emph{not} at the beginning or
end of a word.@refill

@item \<
matches the empty string, but only if it is at the beginning
of a word.

@item \>
matches the empty string, but only if it is at the end of a word.

@item \w
matches any word-constituent character.

@item \W
matches any character that is not a word-constituent.
@end table

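Like @samp{^} and @samp{$}, the directives @samp{\b}, @samp{\B}, @samp{\<}
and @samp{\>} test a position rather than a character.  Here is a
hypothetical C sketch, assuming that letters and digits (as classified by
@code{isalnum}) are the word constituents; the library's own definition may
differ, and within Emacs it comes from the syntax table.  @samp{\B} is the
negation of @code{at_word_boundary}, and @code{find_word} is an invented
helper.

@example
#include <ctype.h>
#include <string.h>

/* Is the character at POS a word constituent?  Positions outside
   TEXT (before its start or at its end) count as non-word.  */
static int is_word_at (const char *text, int size, int pos)
@{
  return pos >= 0 && pos < size && isalnum ((unsigned char) text[pos]);
@}

int at_word_boundary (const char *text, int size, int pos)      /* \b */
@{
  return is_word_at (text, size, pos - 1) != is_word_at (text, size, pos);
@}

int at_word_start (const char *text, int size, int pos)         /* \< */
@{
  return !is_word_at (text, size, pos - 1) && is_word_at (text, size, pos);
@}

int at_word_end (const char *text, int size, int pos)           /* \> */
@{
  return is_word_at (text, size, pos - 1) && !is_word_at (text, size, pos);
@}

/* Position of the first occurrence of WORD as a separate word, as the
   pattern \bWORD\b would find it (WORD made of word constituents), or -1.  */
int find_word (const char *text, const char *word)
@{
  int size = (int) strlen (text), len = (int) strlen (word), pos;

  for (pos = 0; pos + len <= size; pos++)
    if (memcmp (text + pos, word, (size_t) len) == 0
        && at_word_boundary (text, size, pos)
        && at_word_boundary (text, size, pos + len))
      return pos;
  return -1;
@}
@end example

For example, @code{find_word} skips the @samp{foo} inside @samp{food} and
returns 5 for the text @samp{food foo}.
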
There are a number of additional @samp{\} regexp directives available for use
within Emacs only.
@ifinfo
(@pxref{emacs}).
@comment no need to make a tex xref to something one line down!
@end ifinfo

@node emacs, programming, directives, top
@comment  node-name,  next,  previous,  up
@subsection Constructs Available in Emacs Only

@table @samp
@item \`
matches the empty string, but only if it is at the beginning
of the buffer.@refill

@item \'
matches the empty string, but only if it is at the end of
the buffer.@refill

@item \s@var{code}
matches any character whose syntax is @var{code}.
@var{code} is a character which represents a syntax code:
thus, @samp{w} for word constituent, @samp{-} for
whitespace, @samp{(} for open-parenthesis, etc.
See the documentation for the Emacs function @code{modify-syntax-entry}
for further details.@refill

Thus, @samp{\s(} matches any character with open-parenthesis syntax.

@item \S@var{code}
matches any character whose syntax is not @var{code}.
@end table

@node programming, compiling, emacs, top
@comment  node-name,  next,  previous,  up
@section Programming using the @code{regex} library

@ifinfo
The subnodes accessible from this menu give information on entry
points and data structures which C programs need to interface to the
@code{regex} library.
@end ifinfo

@menu
* compiling::  How to compile regular expressions
* matching::   Matching compiled regular expressions
* searching::  Searching for compiled regular expressions
* translation::        Translating characters into other characters
                 (for both compilation and matching)
* registers::  Determining what was matched
* split::      Matching data which is split into two pieces
* unix::       Unix-compatible entry-points to regex library
@end menu

@node compiling, matching, programming, programming
@comment  node-name,  next,  previous,  up
@subsection Compiling a Regular Expression

To compile a regular expression, you must supply a pattern buffer.
This is a structure defined, in the include file @file{regex.h}, as follows:

@example
struct re_pattern_buffer
  @{
    char *buffer;  /* Space holding the compiled pattern commands. */
    int allocated; /* Size of space that  buffer  points to */
    int used;      /* Length of portion of buffer actually occupied */
    char *fastmap; /* Pointer to fastmap, if any, or zero if none. */
                   /* re_search uses the fastmap, if there is one,
                      to skip quickly over totally implausible
                      characters */
    char *translate;
                   /* Translate table to apply to characters before
                      comparing, or zero for no translation.
                      The translation is applied to a pattern when
                      it is compiled and to data when it is matched. */
    char fastmap_accurate;
                   /* Set to zero when a new pattern is stored,
                      set to one when the fastmap is updated from it. */
  @};
@end example

Before compiling a pattern, you must initialize the @code{buffer} field to
point to a block of memory obtained with @code{malloc},
and the @code{allocated} field to the size of that block, in bytes.
The pattern compiler will replace this block with a larger one if necessary.

You must also initialize the @code{translate} field to point to the translate
table that you will use when you match the compiled pattern, or to zero
if you will use no translate table when you match.  @xref{translation}.

Then call @code{re_compile_pattern} to compile a regular expression
into the buffer:
@example
re_compile_pattern (@var{regex}, @var{regex_size}, @var{buf})
@end example

@var{regex} is the address of the regular expression (@code{char *}),
@var{regex_size} is its length (@code{int}),
@var{buf} is the address of the buffer (@code{struct re_pattern_buffer *}).

@code{re_compile_pattern} returns zero (a null pointer) if it succeeds in
compiling the regular expression.  In that case, @code{*@var{buf}} now
contains the results.  Otherwise, @code{re_compile_pattern} returns a string
which serves as an error message.

After compiling, if you wish to search for the pattern, you must
initialize the @code{fastmap} component of the pattern buffer.
@xref{searching}.

@node matching, searching, compiling, programming
@comment  node-name,  next,  previous,  up
@subsection Matching a Compiled Pattern

Once a regular expression has been compiled into a pattern buffer,
you can match the pattern buffer against a string with @code{re_match}.

@example
re_match (@var{buf}, @var{string}, @var{size}, @var{pos}, @var{regs})
@end example

@var{buf} is, once again, the address of the buffer (@code{struct re_pattern_buffer *}).
@var{string} is the string to be matched (@code{char *}).
@var{size} is the length of that string (@code{int}).
@var{pos} is the position within the string at which to begin matching (@code{int}).
The beginning of the string is position 0.
@var{regs} is normally zero (@pxref{registers}).

@code{re_match} returns @code{-1} if the pattern does not match; otherwise,
it returns the length of the portion of @var{string} which was matched.

For example, suppose that @var{buf} points to a buffer containing the result
of compiling @code{x*}, @var{string} points to @code{xxxxxy}, and @var{size} is @code{6}.
Suppose that @var{pos} is @code{2}.  Then the last three @code{x}'s will be matched,
so @code{re_match} will return @code{3}.
If @var{pos} is zero, the value will be @code{5}.
If @var{pos} is @code{5} or @code{6}, the value will be zero, meaning that the empty string
was successfully matched.
Note that since @code{x*} matches the empty string, it will never entirely fail.

It is up to the caller to avoid passing a value of @var{pos} that results in
matching outside the specified string.  @var{pos} must not be negative and
must not be greater than @var{size}.

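The return convention can be checked against the example above with a
hand-written stand-in for a buffer compiled from @code{x*}.  This is a
hypothetical C sketch (@code{match_x_star} is an invented name), not a call
to @code{re_match}; like @code{re_match}, it leaves checking @var{pos} to the
caller.

@example
/* Stand-in for re_match on a buffer compiled from x*: the number of
   characters matched starting at POS (0 <= POS <= SIZE).  */
int match_x_star (const char *string, int size, int pos)
@{
  int end = pos;

  while (end < size && string[end] == 'x')
    end++;                  /* x* takes every x it can find */
  return end - pos;         /* never -1: the empty string always matches */
@}
@end example

A caller should therefore test the result against @code{-1} rather than
zero: a zero result, as here with @var{pos} equal to 5 or 6, means that the
empty string was matched successfully.
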
@node searching, translation, matching, programming
@comment  node-name,  next,  previous,  up
@subsection Searching for a Match

Searching means trying successive starting positions for a match until a
match is found.  To search, you supply a compiled pattern buffer.  Before
searching you must initialize the @code{fastmap} field of the pattern
buffer (see below).

@example
re_search (@var{buf}, @var{string}, @var{size}, @var{startpos}, @var{range}, @var{regs})
@end example

@noindent
is called like @code{re_match} except that the @var{pos} argument is
replaced by two arguments @var{startpos} and @var{range}.  @code{re_search}
tests for a match starting at index @var{startpos}, then at
@code{@var{startpos} + 1}, and so on.  It tries @var{range} consecutive
positions before giving up and returning @code{-1}.  If a match is found,
@code{re_search} returns the index at which the match was found.@refill

If @var{range} is negative, @code{re_search} tries starting positions
@var{startpos}, @code{@var{startpos} - 1}, @dots{} in that order.
@code{|@var{range}|} is the number of tries made.@refill

It is up to the caller to avoid passing values of @var{startpos} and
@var{range} that result in matching outside the specified string.
@var{startpos} must be between zero and @var{size}, inclusive, and so must
@code{@var{startpos} + @var{range} - 1} (if @var{range} is positive) or
@code{@var{startpos} + @var{range} + 1} (if @var{range} is negative).@refill

If you may be searching over a long distance (that is, trying many
different match starting points) with a compiled pattern, you should use a
@dfn{fastmap} in it.  This is a block of 256 bytes, whose address is
placed in the @code{fastmap} component of the pattern buffer.  The first
time you search for a particular compiled pattern, the fastmap is set so
that @code{@var{fastmap}[@var{ch}]} is nonzero if the character @var{ch}
might possibly start a match for this pattern.  @code{re_search} checks
each character against the fastmap so that it can skip more quickly over
non-matches.

If you do not want a fastmap, store zero in the @code{fastmap} component of the
pattern buffer before calling @code{re_search}.

In either case, you must initialize this component in a pattern buffer
before you can use that buffer in a search; but you can choose as an
initial value either zero or the address of a suitable block of memory.

If you compile a new pattern in an existing pattern buffer, it is not
necessary to reinitialize the @code{fastmap} component (unless you
wish to override your previous choice).

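The search loop and the use of the fastmap can be sketched as follows.
This hypothetical C code is not @code{re_search} itself: to stay
self-contained it treats the pattern as a nonempty literal string, with
@code{literal_match} and @code{literal_fastmap} as invented stand-ins for
the compiled pattern and for the library's fastmap computation, and it
tries exactly @code{|@var{range}|} starting positions as described above.

@example
#include <string.h>

/* Stand-in for matching a compiled pattern at POS: here the pattern is
   a literal string.  Returns the match length, or -1.  */
static int literal_match (const char *pat, const char *string,
                          int size, int pos)
@{
  int len = (int) strlen (pat);

  if (pos + len > size || memcmp (string + pos, pat, (size_t) len) != 0)
    return -1;
  return len;
@}

/* Fill a 256-byte fastmap: nonzero for each character that can start
   a match.  For a nonempty literal only its first character can.  */
void literal_fastmap (const char *pat, char *fastmap)
@{
  memset (fastmap, 0, 256);
  fastmap[(unsigned char) pat[0]] = 1;
@}

/* Try |RANGE| starting positions beginning at STARTPOS, moving forward
   if RANGE > 0 and backward if RANGE < 0.  FASTMAP may be NULL.
   Returns the position of the first match found, or -1.  */
int toy_search (const char *pat, const char *fastmap,
                const char *string, int size, int startpos, int range)
@{
  int step = range > 0 ? 1 : -1;
  int tries = range > 0 ? range : -range;
  int pos;

  for (pos = startpos; tries > 0; pos += step, tries--)
    @{
      /* Skip positions whose character cannot start a match.  */
      if (fastmap != NULL && pos < size
          && !fastmap[(unsigned char) string[pos]])
        continue;
      if (literal_match (pat, string, size, pos) >= 0)
        return pos;
    @}
  return -1;
@}
@end example

Searching backward from the last position finds the last occurrence first:
in @samp{bad bad}, a forward search for @samp{bad} from position 0 returns 0,
while a backward search from position 6 returns 4.
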
                    469: @node translation, registers, searching, programming
                    470: @comment  node-name,  next,  previous,  up
                    471: @subsection Translate Tables
                    472: 
                    473: With a translate table, you can apply a transformation to all characters
                    474: before they are compared.  For example, a table that maps lower case letters
                    475: into upper case (or vice versa) causes differences in case to be ignored
                    476: by matching.

A translate table is a block of 256 bytes.  Each character of raw data is
used as an index in the translate table.  The value found there is used
instead of the original character.  Each character in a regular
expression, except for the syntactic constructs, is translated when the
expression is compiled.  Each character of a string being matched is
translated whenever it is compared or tested.

A suitable translate table to ignore differences in case maps all
characters into themselves, except for lower case letters, which are
mapped into the corresponding upper case letters.
It could be initialized by:
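
@example
char table[0400];
int i;

/* Map every character (0400 octal = 256 codes) to itself...  */
for (i = 0; i < 0400; i++)
  table[i] = i;
/* ...then map each lower case letter to upper case (ASCII only).  */
for (i = 'a'; i <= 'z'; i++)
  table[i] = i - 040;
@end example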

You specify the use of a translate table by putting its address in the
@code{translate} component of the compiled pattern buffer.  If this component
is zero, no translation is done.  Since both compilation and matching use
the translate table, you must use the same table contents for both
operations; otherwise the characters of the compiled pattern and the
characters of the data are translated differently, and matches fail or
succeed unexpectedly.
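
The sketch below illustrates why the two uses of the table must agree.  @code{init_upcase_table}, @code{translate_literal}, and @code{translated_literal_match} are hypothetical helpers, not library functions: the first builds the case-folding table described above (assuming ASCII), the second translates a literal the way compilation translates a pattern's ordinary characters, and the third translates each data byte at comparison time.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of the regex library.  */

/* Build the case-folding table described above (assumes ASCII).  */
void init_upcase_table(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char) i;
    for (i = 'a'; i <= 'z'; i++)
        table[i] = (unsigned char) (i - 'a' + 'A');
}

/* "Compile" a literal: translate each of its N bytes, as compilation
   does with the ordinary characters of a pattern.  */
void translate_literal(const unsigned char *table, const char *in,
                       char *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (char) table[(unsigned char) in[i]];
}

/* Compare an already-translated literal against N bytes of data,
   translating each data byte at comparison time.
   Returns 1 on a match, 0 otherwise.  */
int translated_literal_match(const unsigned char *table, const char *literal,
                             const char *data, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if ((unsigned char) literal[i] != table[(unsigned char) data[i]])
            return 0;
    return 1;
}
```

With this table the literal @samp{foo} compiles to @samp{FOO}, which then matches the data @samp{fOo}.  An untranslated @samp{foo} would never match, since every data letter is translated to upper case before the comparison; that is the kind of mismatch that arises when compilation and matching use different tables.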

@node registers, split, translation, programming
@comment  node-name,  next,  previous,  up
@subsection Registers: or ``What Did the @samp{\( @dots{} \)} Groupings Actually Match?''

If you want to find out, after the match, what each of the first nine
@samp{\( @dots{} \)} groupings actually matched, you can pass the @var{regs} argument
to the match or search function.  Pass the address of a structure of this type:

@example
struct re_registers
  @{
    int start[RE_NREGS];
    int end[RE_NREGS];
  @};
@end example

@code{re_match} and @code{re_search} will store into this structure the
data you want.  @code{@var{regs}->start[@var{reg}]} will be the index in
@var{string} of the beginning of the data matched by the @var{reg}'th
@samp{\( @dots{} \)} grouping, and @code{@var{regs}->end[@var{reg}]} will
be the index of the end of that data (the index of the first character
beyond those matched).  The values in the start and end arrays at
indexes greater than the number of @samp{\( @dots{} \)} groupings
present in the regular expression will be set to the value -1.  Register
numbers start at 1 and run to @code{RE_NREGS - 1} (normally @code{9}).
@code{@var{regs}->start[0]} and @code{@var{regs}->end[0]} are similar but
describe the extent of the substring matched by the entire pattern.@refill
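
Extracting the text of a grouping is then a matter of copying from @code{start[@var{reg}]} up to, but not including, @code{end[@var{reg}]}.  The sketch below repeats the structure layout shown above, with @code{RE_NREGS} assumed to be 10, so that it compiles without @file{regex.h}; @code{register_text} is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Local copy of the documented layout so this sketch compiles on its
   own; a real program gets it from regex.h.  RE_NREGS assumed 10.  */
#define RE_NREGS 10
struct re_registers
{
    int start[RE_NREGS];
    int end[RE_NREGS];
};

/* Hypothetical helper: copy the text matched by grouping REG (0 means
   the whole match) from STRING into OUT, a buffer of OUTSIZE bytes,
   and null-terminate it.  Returns the length of the matched text, or
   -1 if REG is out of range, the register is unset (-1), or the text
   does not fit.  */
int register_text(const struct re_registers *regs, const char *string,
                  int reg, char *out, size_t outsize)
{
    int len;

    if (reg < 0 || reg >= RE_NREGS || regs->start[reg] < 0)
        return -1;                          /* no such grouping */
    len = regs->end[reg] - regs->start[reg]; /* end is one past the last char */
    if (len < 0 || (size_t) len >= outsize)
        return -1;                          /* leave room for the terminator */
    memcpy(out, string + regs->start[reg], (size_t) len);
    out[len] = '\0';
    return len;
}
```

For example, if @samp{c\([ad]*\)r} matched within @samp{xcadr}, register 0 would span indices 1 to 5 (@samp{cadr}) and register 1 indices 2 to 4 (@samp{ad}), and @code{register_text} would return those substrings.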

Both @code{struct re_registers} and @code{RE_NREGS} are defined in @file{regex.h}.

@node split, unix, registers, programming
@comment  node-name,  next,  previous,  up
@subsection Matching against Split Data

The functions @code{re_match_2} and @code{re_search_2} let you match
against, or search in, data that is divided into two strings.

@code{re_match_2} works like @code{re_match} except that you must give
two data strings and their sizes:

@example
re_match_2 (@var{buf}, @var{string1}, @var{size1}, @var{string2}, @var{size2}, @var{pos}, @var{regs})
@end example

The matcher regards the contents of @var{string1} as effectively followed by
the contents of @var{string2}, and matches the combined string against the
pattern in @var{buf}.
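
The ``effectively followed by'' model can be pictured with a small accessor that reads the combined data without copying either string.  @code{split_char_at} and @code{split_literal_at} are hypothetical helpers for illustration, not the library's internals; the second shows how a literal can match across the boundary between the two strings.

```c
/* Hypothetical helper: the character at index POS of the virtual
   concatenation of STRING1 (SIZE1 bytes) and STRING2 (SIZE2 bytes),
   or -1 if POS is out of range.  Neither string is copied.  */
int split_char_at(const char *string1, int size1,
                  const char *string2, int size2, int pos)
{
    if (pos < 0 || pos >= size1 + size2)
        return -1;
    if (pos < size1)
        return (unsigned char) string1[pos];
    return (unsigned char) string2[pos - size1];
}

/* Hypothetical: does the literal LIT (LEN bytes) occur at index POS of
   the combined data?  The occurrence may straddle the boundary.
   Returns 1 if it does, 0 otherwise.  */
int split_literal_at(const char *string1, int size1,
                     const char *string2, int size2,
                     int pos, const char *lit, int len)
{
    int i;

    for (i = 0; i < len; i++)
        if (split_char_at(string1, size1, string2, size2, pos + i)
            != (unsigned char) lit[i])
            return 0;
    return 1;
}
```

With @var{string1} @samp{xfo} and @var{string2} @samp{obar}, the combined data is @samp{xfoobar}, and @samp{foo} is found at index 1 even though its last character comes from @var{string2}.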

@code{re_search_2} is likewise similar to @code{re_search}:

@example
re_search_2 (@var{buf}, @var{string1}, @var{size1}, @var{string2}, @var{size2}, @var{startpos}, @var{range}, @var{regs})
@end example

The value returned by @code{re_search_2} is an index into the combined data
made up of @var{string1} and @var{string2}.  It never exceeds @code{@var{size1} + @var{size2}}.
The values stored in the @var{regs} structure (if there is one) are likewise
indices in the combined data.
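
Because these indices refer to the combined data, an index below @var{size1} points into @var{string1}, and any other index @var{i} refers to @code{@var{string2}[@var{i} - @var{size1}]}.  The hypothetical helper @code{split_copy} below (not a library function) uses that rule to copy a matched span, which may straddle the two strings, into a contiguous buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy bytes [START, END) of the combined data
   (STRING1 followed by STRING2) into OUT and null-terminate it.
   START and END are combined-data indices, such as a value returned by
   re_search_2 or stored in its registers.  Returns the length copied,
   or -1 if the span is invalid or does not fit in OUTSIZE bytes.  */
int split_copy(const char *string1, int size1,
               const char *string2, int size2,
               int start, int end, char *out, size_t outsize)
{
    int len = end - start;
    int n1 = 0;                     /* bytes that come from string1 */

    if (start < 0 || end > size1 + size2 || len < 0
        || (size_t) len >= outsize)
        return -1;
    if (start < size1) {
        n1 = (end < size1 ? end : size1) - start;
        memcpy(out, string1 + start, (size_t) n1);
    }
    if (len > n1)                   /* the rest lies in string2 */
        memcpy(out + n1, string2 + (start + n1 - size1), (size_t) (len - n1));
    out[len] = '\0';
    return len;
}
```

For the span from 1 to 4 of @samp{xfo} followed by @samp{obar}, two bytes come from @var{string1} and one from @var{string2}, giving @samp{foo}.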

@node unix, , split, programming
@comment  node-name,  next,  previous,  up
@subsection Unix-Compatible Entry Points

The standard Berkeley Unix way to compile a regular expression is to call
@code{re_comp}.  This function takes a single argument, the address of the
regular expression, which is assumed to be terminated by a null character.

@code{re_comp} does not ask you to specify a pattern buffer because it has its
own pattern buffer, and only one.  Using @code{re_comp}, you can match only the
most recently compiled regular expression.

The value of @code{re_comp} is zero for success or else an error message string,
as for @code{re_compile_pattern}.

Calling @code{re_comp} with the null string as argument has no effect;
the contents of the buffer remain unchanged.

The standard Berkeley Unix way to match the last regular expression compiled
is to call @code{re_exec}.  This takes a single argument, the address of
the string to be matched.  This string is assumed to be terminated by
a null character.  Matching is tried starting at each position in the
string.  @code{re_exec} returns @code{1} for success or @code{0} for failure.
There is no way to find out how long the matched substring was, nor what the
@samp{\( @dots{} \)} groupings matched.
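
The one-buffer behavior described above can be emulated on a system that has only the POSIX @code{regcomp} and @code{regexec} functions.  This is a sketch under stated assumptions, not the GNU implementation: @code{my_re_comp} and @code{my_re_exec} are hypothetical names, patterns use POSIX basic syntax rather than the syntax described in this manual, a failed compilation leaves no pattern compiled, and both a null pointer and an empty string are treated as ``the null string''.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical emulation of re_comp/re_exec on top of POSIX regex;
   not the GNU library's own code.  Patterns use POSIX basic syntax.  */

static regex_t the_buffer;      /* the one and only pattern buffer */
static int have_pattern = 0;    /* nonzero once a compile has succeeded */
static char error_message[128];

/* Compile S into the single buffer.  Returns NULL on success or an
   error message string on failure.  A null pointer or empty string
   leaves the current pattern unchanged.  */
const char *my_re_comp(const char *s)
{
    int rc;

    if (s == NULL || *s == '\0')
        return NULL;

    if (have_pattern) {
        regfree(&the_buffer);   /* discard the previous pattern */
        have_pattern = 0;
    }
    /* REG_NOSUB: like re_exec, we never report what matched.  */
    rc = regcomp(&the_buffer, s, REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &the_buffer, error_message, sizeof error_message);
        return error_message;   /* assumption: no pattern remains compiled */
    }
    have_pattern = 1;
    return NULL;
}

/* Match the null-terminated string S against the last compiled pattern.
   regexec already tries every starting position.  Returns 1 for a
   match and 0 otherwise, including (by assumption) when no pattern
   has been compiled.  */
int my_re_exec(const char *s)
{
    if (!have_pattern)
        return 0;
    return regexec(&the_buffer, s, 0, NULL, 0) == 0;
}
```

After @code{my_re_comp ("c[ad]*r")}, @code{my_re_exec ("xcadr")} returns 1 and @code{my_re_exec ("cat")} returns 0; a following @code{my_re_comp ("")} keeps that pattern in place.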

@bye

unix.superglobalmegacorp.com
