Annotation of GNUtools/emacs/man/regex.texinfo, revision 1.1

1.1     ! root        1: \input texinfo
        !             2: @comment -*- Mode: texinfo -*-
        !             3: @comment This documents the GNU regex library
        !             4: @setfilename regex
        !             5: 
        !             6: @comment >> @code{"foo"} for literal strings vs @b{"foo"} vs @code{foo}
        !             7: @comment >>  (this file is presently using the last  --- it looks ok in
        !             8: @comment >>   info; wait to see what it looks like under botex)
        !             9: 
        !            10: 
        !            11: @comment >> superior of (dir) is temporary
        !            12: @node top, syntax, , (dir)
        !            13: @comment  node-name,  next,  previous,  up
        !            14: @chapter @dfn{regex} regular expression matching library.
        !            15: 
        !            16: @section Overview
        !            17: 
        !            18: Regular expression matching allows you to test whether a string fits
        !            19: into a specific syntactic shape.  You can also search a string for a
        !            20: substring that fits a pattern.
        !            21: 
        !            22: A regular expression describes a set of strings.  The simplest case is
        !            23: one that describes a particular string; for example, the string @samp{foo}
        !            24: when regarded as a regular expression matches @samp{foo} and nothing else.
        !            25: Nontrivial regular expressions use certain special constructs so that
        !            26: they can match more than one string.  For example, the regular expression
        !            27: @samp{foo\|bar} matches either the string @samp{foo} or the string @samp{bar}; the
        !            28: regular expression @samp{c[ad]*r} matches any of the strings @samp{cr}, @samp{car},
        !            29: @samp{cdr}, @samp{caar}, @samp{cadddar} and all other such strings with any number of
        !            30: @samp{a}'s and @samp{d}'s.
        !            31: 
        !            32: The first step in matching a regular expression is to compile it.
        !            33: You must supply the pattern string and also a pattern buffer to hold
        !            34: the compiled result.  That result contains the pattern in an internal
        !            35: format that is easier to use in matching.
        !            36: 
        !            37: Having compiled a pattern, you can match it against strings.  You can
        !            38: match the compiled pattern any number of times against different
        !            39: strings.
        !            40: 
        !            41: @menu
        !            42: * syntax::     Syntax of regular expressions
        !            43: * directives:: Meaning of characters as regex string directives.
        !            44: * emacs::      Additional character directives available
        !            45:                  only for use within Emacs.
        !            46: * programming:: Using the regex library from C programs
        !            47: * unix::       Unix-compatible entry-points to regex library
        !            48: @end menu
        !            49: 
        !            50: @node syntax, directives, top, top
        !            51: @comment  node-name,  next,  previous,  up
        !            52: @section Syntax of Regular Expressions
        !            53: 
        !            54: Regular expressions have a syntax in which a few characters are special
        !            55: constructs and the rest are @dfn{ordinary}.  An ordinary character is a
        !            56: simple regular expression which matches that character and nothing else.
        !            57: The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
        !            58: @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}.  Any other character
        !            59: appearing in a regular expression is ordinary, unless a @samp{\} precedes
        !            60: it.@refill
        !            61: 
        !            62: For example, @samp{f} is not a special character, so it is ordinary,
        !            63: and therefore @samp{f} is a regular expression that matches the string @samp{f}
        !            64: and no other string.  (It does @emph{not} match the string @samp{ff}.)  Likewise,
        !            65: @samp{o} is a regular expression that matches only @samp{o}.
        !            66: 
        !            67: Any two regular expressions @var{a} and @var{b} can be concatenated.
        !            68: The result is a regular expression which matches a string if @var{a}
        !            69: matches some amount of the beginning of that string and @var{b}
        !            70: matches the rest of the string.
        !            71: 
        !            72: As a simple example, we can concatenate the regular expressions
        !            73: @samp{f} and @samp{o} to get the regular expression @samp{fo},
        !            74: which matches only the string @samp{fo}.  Still trivial.
        !            75: 
        !            76: Note: for Unix compatibility, special characters are treated as
        !            77: ordinary ones if they are in contexts where their special meanings
        !            78: make no sense.  For example, @samp{*foo} treats @samp{*} as ordinary since
        !            79: there is no preceding expression on which the @samp{*} can act.
        !            80: It is poor practice to depend on this behavior; better to quote
        !            81: the special character anyway, regardless of where it appears.
        !            82: 
        !            83: 
        !            84: @node directives, emacs , syntax, top
        !            85: @comment  node-name,  next,  previous,  up
        !            86: 
        !            87: @ifinfo
        !            88: The following are the characters and character sequences which have
        !            89: special meaning within regular expressions.
        !            90: Any character not mentioned here is not special; it stands for exactly
        !            91: itself for the purposes of searching and matching.  @xref{syntax}.
        !            92: @end ifinfo
        !            93: 
        !            94: @table @samp
        !            95: @item .
        !            96: is a special character that matches anything except a newline.
        !            97: Using concatenation, we can make regular expressions like @samp{a.b} which
        !            98: matches any three-character string which begins with @samp{a} and ends with @samp{b}.@refill
        !            99: 
        !           100: @item *
        !           101: is not a construct by itself; it is a suffix, which means the preceding
        !           102: regular expression is to be repeated as many times as possible.  In @samp{fo*},
        !           103: the @samp{*} applies to the @samp{o}, so @samp{fo*} matches @samp{f} followed by any number of @samp{o}'s.@refill
        !           104: 
        !           105: The case of zero @samp{o}'s is allowed: @samp{fo*} does match @samp{f}.@refill
        !           106: 
        !           107: @samp{*} always applies to the @emph{smallest} possible preceding expression.
        !           108: Thus, @samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}.@refill
        !           109: 
        !           110: The matcher processes a @samp{*} construct by matching, immediately, as many
        !           111: repetitions as can be found.  Then it continues with the rest of the
        !           112: pattern.  If that fails, backtracking occurs, discarding some of
        !           113: the matches of the @samp{*}'d construct in case that makes it possible
        !           114: to match the rest of the pattern.  For example, matching @samp{c[ad]*ar}
        !           115: against the string @samp{caddaar}, the @samp{[ad]*} first matches @samp{addaa},
        !           116: but this does not allow the next @samp{a} in the pattern to match.
        !           117: So the last of the matches of @samp{[ad]} is undone and the following
        !           118: @samp{a} is tried again.  Now it succeeds.@refill
        !           119: 
        !           120: @item +
        !           121: @samp{+} is like @samp{*} except that at least one match for the preceding
        !           122: pattern is required for @samp{+}.  Thus, @samp{c[ad]+r} does not match
        !           123: @samp{cr} but does match anything else that @samp{c[ad]*r} would match.
        !           124: 
        !           125: @item ?
        !           126: @samp{?} is like @samp{*} except that it allows either zero or one match
        !           127: for the preceding pattern.  Thus, @samp{c[ad]?r} matches @samp{cr} or
        !           128: @samp{car} or @samp{cdr}, and nothing else.
        !           129: 
        !           130: @item [ @dots{} ]
        !           131: @samp{[} begins a @dfn{character set}, which is terminated by a @samp{]}.
        !           132: In the simplest case, the characters between the two form the set.
        !           133: Thus, @samp{[ad]} matches either @samp{a} or @samp{d},
        !           134: and @samp{[ad]*} matches any string of @samp{a}'s and @samp{d}'s
        !           135: (including the empty string), from which it follows that
        !           136: @samp{c[ad]*r} matches @samp{car}, etc.@refill
        !           137: 
        !           138: Character ranges can also be included in a character set, by writing two
        !           139: characters with a @samp{-} between them.  Thus, @samp{[a-z]} matches
        !           140: any lower-case letter.  Ranges may be intermixed freely with
        !           141: individual characters, as in @samp{[a-z$%.]}, which matches any
        !           142: lower case letter or @samp{$}, @samp{%} or period.@refill
        !           143: 
        !           144: Note that the usual special characters are not special any more inside a
        !           145: character set.  A completely different set of special characters exists
        !           146: inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
        !           147: 
        !           148: To include a @samp{]} in a character set, you must make it
        !           149: the first character.  For example, @samp{[]a]} matches @samp{]} or @samp{a}.
        !           150: To include a @samp{-}, you must use it in a context where it cannot possibly
        !           151: indicate a range: that is, as the first character, or immediately
        !           152: after a range.@refill
        !           153: 
        !           154: @item [^ @dots{} ]
        !           155: @samp{[^} begins a @dfn{complement character set}, which matches any
        !           156: character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]}
        !           157: matches all characters @emph{except} letters and digits.@refill
        !           158: 
        !           159: @samp{^} is not special in a character set unless it is the first character.
        !           160: The character following the @samp{^} is treated as if it were first
        !           161: (it may be a @samp{-} or a @samp{]}).@refill
        !           162: 
        !           163: @item ^
        !           164: is a special character that matches the empty string -- but only
        !           165: if at the beginning of a line in the text being matched.  Otherwise
        !           166: it fails to match anything.  Thus, @samp{^foo} matches a @samp{foo}
        !           167: which occurs at the beginning of a line.@refill
        !           168: 
        !           169: @item $
        !           170: is similar to @samp{^} but matches only at the end of a line.
        !           171: Thus, @samp{xx*$} matches a string of one or more @samp{x}'s
        !           172: at the end of a line.@refill
        !           173: 
        !           174: @item \
        !           175: has two functions: it quotes the above special characters
        !           176: (including @samp{\}), and it introduces additional special constructs.@refill
        !           177: 
        !           178: Because @samp{\} quotes special characters, @samp{\$} is a regular
        !           179: expression which matches only @samp{$}, and @samp{\[} is a regular
        !           180: expression which matches only @samp{[}, and so on.@refill
        !           181: 
        !           182: For the most part, @samp{\} followed by any character matches only that
        !           183: character.  However, there are several exceptions: characters which, when
        !           184: preceded by @samp{\}, are special constructs.  Such characters are always
        !           185: ordinary when encountered on their own.@refill
        !           186: 
        !           187: No new special characters will ever be defined.  All extensions to
        !           188: the regular expression syntax are made by defining new two-character
        !           189: constructs that begin with @samp{\}.@refill
        !           190: 
        !           191: @item \|
        !           192: specifies an alternative.
        !           193: Two regular expressions @var{a} and @var{b} with @samp{\|} in
        !           194: between form an expression that matches anything that either @var{a} or
        !           195: @var{b} will match.@refill
        !           196: 
        !           197: Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
        !           198: but no other string.@refill
        !           199: 
        !           200: @samp{\|} applies to the largest possible surrounding expressions.  Only a
        !           201: surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
        !           202: @samp{\|}.@refill
        !           203: 
        !           204: Full backtracking capability exists when multiple @samp{\|}'s are used.@refill
        !           205: 
        !           206: @item \( @dots{} \)
        !           207: is a grouping construct that serves three purposes:
        !           208: 
        !           209: @enumerate
        !           210: @item
        !           211: To enclose a set of @samp{\|} alternatives for other operations.
        !           212: Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
        !           213: 
        !           214: @item
        !           215: To enclose a complicated expression for the postfix @samp{*} to operate on.
        !           216: Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any (zero or
        !           217: more) number of @samp{na}'s.@refill
        !           218: 
        !           219: @item
        !           220: To mark a matched substring for future reference.
        !           221: 
        !           222: @end enumerate
        !           223: 
        !           224: This last application is not a consequence of the idea of a parenthetical
        !           225: grouping; it is a separate feature which happens to be assigned as a
        !           226: second meaning to the same @samp{\( @dots{} \)} construct because there is no
        !           227: conflict in practice between the two meanings.  Here is an explanation
        !           228: of this feature:@refill
        !           229: 
        !           230: @item \@var{digit}
        !           231: After the end of a @samp{\( @dots{} \)} construct, the matcher remembers the
        !           232: beginning and end of the text matched by that construct.  Then, later on
        !           233: in the regular expression, you can use @samp{\} followed by @var{digit}
        !           234: to mean ``match the same text matched the @var{digit}'th time by the
        !           235: @samp{\( @dots{} \)} construct.''  The @samp{\( @dots{} \)} constructs
        !           236: are numbered in order of commencement in the regexp.@refill
        !           237: 
        !           238: The strings matching the first nine @samp{\( @dots{} \)} constructs appearing
        !           239: in a regular expression are assigned numbers 1 through 9 in order of their
        !           240: beginnings.
        !           241: @samp{\1} through @samp{\9} may be used to refer to the text matched by
        !           242: the corresponding @samp{\( @dots{} \)} construct.@refill
        !           243: 
        !           244: For example, @samp{\(.*\)\1} matches any string that is composed of two
        !           245: identical halves.  The @samp{\(.*\)} matches the first half, which may be
        !           246: anything, but the @samp{\1} that follows must match the same exact text.@refill
        !           247: 
        !           248: @item \b
        !           249: matches the empty string, but only if it is at the beginning or
        !           250: end of a word.  Thus, @samp{\bfoo\b} matches any occurrence of
        !           251: @samp{foo} as a separate word.  @samp{\bball\(s\|\)\b} matches
        !           252: @samp{ball} or @samp{balls} as a separate word.@refill
        !           253: 
        !           254: @item \B
        !           255: matches the empty string, provided it is @emph{not} at the beginning or
        !           256: end of a word.@refill
        !           257: 
        !           258: @item \<
        !           259: matches the empty string, but only if it is at the beginning
        !           260: of a word.
        !           261: 
        !           262: @item \>
        !           263: matches the empty string, but only if it is at the end of a word.
        !           264: 
        !           265: @item \w
        !           266: matches any word-constituent character.
        !           267: 
        !           268: @item \W
        !           269: matches any character that is not a word-constituent.
        !           270: @end table
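
The backtracking described under @samp{*} above can be sketched in a few
lines of C.  This is a hand-written illustration for the single pattern
@samp{c[ad]*ar}, not the library's matcher; the function name
@code{match_c_ad_star_ar} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the match of the pattern
   c[ad]*ar at the start of s, or -1 if there is no match. */
int match_c_ad_star_ar(const char *s)
{
    size_t n;

    assert(s != NULL);
    if (s[0] != 'c')
        return -1;

    /* Greedy step: let [ad]* take as many characters as it can. */
    n = 1;
    while (s[n] == 'a' || s[n] == 'd')
        n++;

    /* Backtracking step: give back one repetition at a time until the
       rest of the pattern, "ar", matches at the current position. */
    for (;;) {
        if (strncmp(s + n, "ar", 2) == 0)
            return (int)(n + 2);
        if (n == 1)
            return -1;          /* nothing left to give back */
        n--;
    }
}
```

Called on @samp{caddaar}, the greedy step first consumes @samp{addaa};
one repetition is then given back so that the final @samp{ar} can match.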
        !           271: 
        !           272: There are a number of additional @samp{\} regexp directives available for use
        !           273: within Emacs only.
        !           274: @ifinfo 
        !           275: (@pxref{emacs}).
        !           276: @comment no need to make a tex xref to something one line down!
        !           277: @end ifinfo
        !           278: 
        !           279: @node emacs, programming, directives, top
        !           280: @comment  node-name,  next,  previous,  up
        !           281: @subsection Constructs Available in Emacs Only
        !           282: 
        !           283: @table @samp
        !           284: @item \`
        !           285: matches the empty string, but only if it is at the beginning
        !           286: of the buffer.@refill
        !           287: 
        !           288: @item \'
        !           289: matches the empty string, but only if it is at the end of
        !           290: the buffer.@refill
        !           291: 
        !           292: @item \s@var{code}
        !           293: matches any character whose syntax is @var{code}.
        !           294: @var{code} is a character which represents a syntax code:
        !           295: thus, @samp{w} for word constituent, @samp{-} for
        !           296: whitespace, @samp{(} for open-parenthesis, etc.
        !           297: See the documentation for the Emacs function @samp{modify-syntax-entry}
        !           298: for further details.@refill
        !           299: 
        !           300: Thus, @samp{\s(} matches any character with open-parenthesis syntax.
        !           301: 
        !           302: @item \S@var{code}
        !           303: matches any character whose syntax is not @var{code}.
        !           304: @end table
        !           305: 
        !           306: @node programming, compiling, emacs, top
        !           307: @comment  node-name,  next,  previous,  up
        !           308: @section Programming using the @code{regex} library
        !           309: 
        !           310: @ifinfo
        !           311: The subnodes accessible from this menu give information on entry
        !           312: points and data structures which C programs need to interface to the
        !           313: @code{regex} library.
        !           314: @end ifinfo
        !           315: 
        !           316: @menu
        !           317: * compiling::  How to compile regular expressions
        !           318: * matching::   Matching compiled regular expressions
        !           319: * searching::  Searching for compiled regular expressions
        !           320: * translation:: Translating characters into other characters
        !           321:                  (for both compilation and matching)
        !           322: * registers::  Determining what was matched
        !           323: * split::      Matching data which is split into two pieces
        !           324: * unix::       Unix-compatible entry-points to regex library
        !           325: @end menu
        !           326: 
        !           327: @node compiling, matching, programming , programming
        !           328: @comment  node-name,  next,  previous,  up
        !           329: @subsection Compiling a Regular Expression
        !           330: 
        !           331: To compile a regular expression, you must supply a pattern buffer.
        !           332: This is a structure defined, in the include file @file{regex.h}, as follows:
        !           333:     
        !           334: @example
        !           335: struct re_pattern_buffer
        !           336:   @{
        !           337:     char *buffer;  /* Space holding the compiled pattern commands. */
        !           338:     int allocated; /* Size of space that  buffer  points to */
        !           339:     int used;      /* Length of portion of buffer actually occupied */
        !           340:     char *fastmap; /* Pointer to fastmap, if any, or zero if none. */
        !           341:                    /* re_search uses the fastmap, if there is one,
        !           342:                       to skip quickly over totally implausible
        !           343:                       characters */
        !           344:     char *translate;
        !           345:                    /* Translate table to apply to characters before
        !           346:                       comparing, or zero for no translation.
        !           347:                       The translation is applied to a pattern when
        !           348:                       it is compiled and to data when it is matched. */
        !           349:     char fastmap_accurate;
        !           350:                    /* Set to zero when a new pattern is stored,
        !           351:                       set to one when the fastmap is updated from it. */
        !           352:   @};
        !           353: @end example
        !           354: 
        !           355: Before compiling a pattern, you must initialize the @code{buffer} field to
        !           356: point to a block of memory obtained with @code{malloc},
        !           357: and the @code{allocated} field to the size of that block, in bytes.
        !           358: The pattern compiler will replace this block with a larger one if necessary.
        !           359: 
        !           360: You must also initialize the @code{translate} field to point to the translate
        !           361: table that you will use when you match the compiled pattern, or to zero
        !           362: if you will use no translate table when you match.  @xref{translation}.
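
The initialization described above might look roughly like the following
sketch.  The structure is a local copy of the listing above so that the
sketch compiles on its own; a real program would include @file{regex.h}
instead.  The helper name @code{init_pattern_buffer} and the choice of
starting size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Local copy of the structure as listed above, so the sketch compiles
   without regex.h. */
struct re_pattern_buffer {
    char *buffer;
    int allocated;
    int used;
    char *fastmap;
    char *translate;
    char fastmap_accurate;
};

/* Hypothetical helper: prepare buf for a later re_compile_pattern call.
   initial_size is an arbitrary starting size in bytes; translate may be
   zero for no translation.  Returns 0 on success, -1 if malloc fails. */
int init_pattern_buffer(struct re_pattern_buffer *buf, int initial_size,
                        char *translate)
{
    assert(buf != NULL && initial_size > 0);
    buf->buffer = malloc((size_t)initial_size);
    if (buf->buffer == NULL)
        return -1;
    buf->allocated = initial_size;   /* size of the malloc'd block */
    buf->used = 0;
    buf->fastmap = 0;                /* no fastmap chosen yet */
    buf->translate = translate;      /* zero means no translation */
    buf->fastmap_accurate = 0;
    return 0;
}
```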
        !           363: 
        !           364: Then call @code{re_compile_pattern} to compile a regular expression
        !           365: into the buffer:
        !           366: @example
        !           367: re_compile_pattern (@var{regex}, @var{regex_size}, @var{buf})
        !           368: @end example
        !           369: 
        !           370: @var{regex} is the address of the regular expression (@code{char *}),
        !           371: @var{regex_size} is its length (@code{int}),
        !           372: @var{buf} is the address of the buffer (@code{struct re_pattern_buffer *}).
        !           373: 
        !           374: @code{re_compile_pattern} returns zero if it succeeds in compiling the regular
        !           375: expression.  In that case, @code{*buf} now contains the results.
        !           376: Otherwise, @code{re_compile_pattern} returns a string which serves as
        !           377: an error message.
        !           378: 
        !           379: After compiling, if you wish to search for the pattern, you must
        !           380: initialize the @code{fastmap} component of the pattern buffer.
        !           381: @xref{searching}.
        !           382: 
        !           383: @node matching, searching, compiling, programming
        !           384: @comment  node-name,  next,  previous,  up
        !           385: @subsection Matching a Compiled Pattern
        !           386: 
        !           387: Once a regular expression has been compiled into a pattern buffer,
        !           388: you can match the pattern buffer against a string with @code{re_match}.
        !           389: 
        !           390: @example
        !           391: re_match (@var{buf}, @var{string}, @var{size}, @var{pos}, @var{regs})
        !           392: @end example
        !           393: 
        !           394: @var{buf} is, once again, the address of the buffer (@code{struct re_pattern_buffer *}).
        !           395: @var{string} is the string to be matched (@code{char *}).
        !           396: @var{size} is the length of that string (@code{int}).
        !           397: @var{pos} is the position within the string at which to begin matching (@code{int}).
        !           398: The beginning of the string is position 0.
        !           399: @var{regs} is described below.  Normally it is zero.  @xref{registers}.
        !           400: 
        !           401: @code{re_match} returns @code{-1} if the pattern does not match; otherwise,
        !           402: it returns the length of the portion of @var{string} which was matched.
        !           403: 
        !           404: For example, suppose that @var{buf} points to a buffer containing the result
        !           405: of compiling @code{x*}, @var{string} points to @code{xxxxxy}, and @var{size} is @code{6}.
        !           406: Suppose that @var{pos} is @code{2}.  Then the last three @code{x}'s will be matched,
        !           407: so @code{re_match} will return @code{3}.
        !           408: If @var{pos} is zero, the value will be @code{5}.
        !           409: If @var{pos} is @code{5} or @code{6}, the value will be zero, meaning that the null string
        !           410: was successfully matched.
        !           411: Note that since @code{x*} matches the empty string, it will never entirely fail.
        !           412: 
        !           413: It is up to the caller to avoid passing a value of @var{pos} that results in
        !           414: matching outside the specified string.  @var{pos} must not be negative and
        !           415: must not be greater than @var{size}.
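
The @code{x*} example above can be tried with the following sketch.  It
assumes the GNU C library, whose @file{regex.h} still declares
@code{re_compile_pattern} and @code{re_match} when @code{_GNU_SOURCE} is
defined; there the pattern buffer is zero-filled rather than given a
@code{malloc}'d block, which differs from the structure documented here.
The helper name @code{match_at} is hypothetical.

```c
#define _GNU_SOURCE   /* exposes the GNU entry points in glibc's regex.h */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile pattern, match it against text starting
   at pos, and return what re_match returns (-1 for no match).
   Returns -2 if the pattern does not compile (glibc's re_match also
   uses -2 for internal errors). */
int match_at(const char *pattern, const char *text, int pos)
{
    struct re_pattern_buffer buf;
    int result;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);     /* glibc allocates the space itself */
    re_set_syntax(RE_SYNTAX_EMACS);  /* the syntax this manual describes */

    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;
    /* The last argument is the regs pointer; zero means none wanted. */
    result = (int)re_match(&buf, text, (int)strlen(text), pos, NULL);
    regfree(&buf);
    return result;
}
```

Under these assumptions, @code{match_at ("x*", "xxxxxy", 2)} corresponds
to the case where @var{pos} is @code{2} in the example above.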
        !           416: 
        !           417: @node searching, translation, matching, programming
        !           418: @comment  node-name,  next,  previous,  up
        !           419: @subsection Searching for a Match
        !           420: 
        !           421: Searching means trying successive starting positions for a match until a
        !           422: match is found.  To search, you supply a compiled pattern buffer.  Before
        !           423: searching you must initialize the @code{fastmap} field of the pattern
        !           424: buffer (see below).
        !           425: 
        !           426: @example
        !           427: re_search (@var{buf}, @var{string}, @var{size}, @var{startpos}, @var{range}, @var{regs})
        !           428: @end example
        !           429: 
        !           430: @noindent
        !           431: is called like @code{re_match} except that the @var{pos} argument is
        !           432: replaced by two arguments @var{startpos} and @var{range}.  @code{re_search}
        !           433: tests for a match starting at index @var{startpos}, then at
        !           434: @code{@var{startpos} + 1}, and so on.  It tries @var{range} consecutive
        !           435: positions before giving up and returning @code{-1}.  If a match is found,
        !           436: @code{re_search} returns the index at which the match was found.@refill
        !           437: 
        !           438: If @var{range} is negative, @code{re_search} tries starting positions
        !           439: @var{startpos}, @code{@var{startpos} - 1}, @dots{} in that order.
        !           440: @code{|@var{range}|} is the number of tries made.@refill
        !           441: 
        !           442: It is up to the caller to avoid passing values of @var{startpos} and
        !           443: @var{range} that result in matching outside the specified string.
        !           444: @var{startpos} must be between zero and @var{size}, inclusive, and so must
        !           445: @code{@var{startpos} + @var{range} - 1} (if @var{range} is positive) or
        !           446: @code{@var{startpos} + @var{range} + 1} (if @var{range} is negative).@refill
        !           447: 
        !           448: If you may be searching over a long distance (that is, trying many
        !           449: different match starting points) with a compiled pattern, you should use a
        !           450: @dfn{fastmap} in it.  This is a block of 256 bytes, whose address is
        !           451: placed in the @code{fastmap} component of the pattern buffer.  The first
        !           452: time you search for a particular compiled pattern, the fastmap is set so
        !           453: that @code{@var{fastmap}[@var{ch}]} is nonzero if the character @var{ch}
        !           454: might possibly start a match for this pattern.  @code{re_search} checks
        !           455: each character against the fastmap so that it can skip more quickly over
        !           456: non-matches.
        !           457: 
        !           458: If you do not want a fastmap, store zero in the @code{fastmap} component of the
        !           459: pattern buffer before calling @code{re_search}.
        !           460: 
        !           461: In either case, you must initialize this component in a pattern buffer
        !           462: before you can use that buffer in a search; but you can choose as an
        !           463: initial value either zero or the address of a suitable block of memory.
        !           464: 
        !           465: If you compile a new pattern in an existing pattern buffer, it is not
        !           466: necessary to reinitialize the @code{fastmap} component (unless you
        !           467: wish to override your previous choice).
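
To illustrate why a fastmap helps, here is a sketch of the skipping step
only.  It assumes the 256-byte table has already been filled in; in the
library that is done for you on the first search.  The helper name
@code{skip_with_fastmap} is hypothetical and the loop is not the
library's own code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: starting at pos, return the first position in
   text[0..size) whose character is marked in fastmap, or size if there
   is none.  Positions skipped over cannot start a match. */
int skip_with_fastmap(const char *fastmap, const char *text,
                      int size, int pos)
{
    assert(fastmap != NULL && text != NULL);
    assert(pos >= 0 && pos <= size);
    /* The cast avoids a negative index when char is signed. */
    while (pos < size && !fastmap[(unsigned char)text[pos]])
        pos++;
    return pos;
}
```

For a pattern such as @samp{foo\|bar}, only @samp{f} and @samp{b} would
be marked, so a full match attempt is needed only at those characters.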
        !           468: 
        !           469: @node translation, registers, searching, programming
        !           470: @comment  node-name,  next,  previous,  up
        !           471: @subsection Translate Tables
        !           472: 
        !           473: With a translate table, you can apply a transformation to all characters
        !           474: before they are compared.  For example, a table that maps lower case letters
        !           475: into upper case (or vice versa) causes differences in case to be ignored
        !           476: by matching.
        !           477: 
        !           478: A translate table is a block of 256 bytes.  Each character of raw data is
        !           479: used as an index in the translate table.  The value found there is used
        !           480: instead of the original character.  Each character in a regular
        !           481: expression, except for the syntactic constructs, is translated when the
        !           482: expression is compiled.  Each character of a string being matched is
        !           483: translated whenever it is compared or tested.
        !           484: 
        !           485: A suitable translate table to ignore differences in case maps all
        !           486: characters into themselves, except for lower case letters, which are
        !           487: mapped into the corresponding upper case letters.
        !           488: It could be initialized by:
        !           489: 
        !           490: @example
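        !           491: for (i = 0; i < 0400; i++)      /* 0400 octal = 256 entries */
        !           492:   table[i] = i;                 /* start with the identity mapping */
        !           493: for (i = 'a'; i <= 'z'; i++)
        !           494:   table[i] = i - 040;           /* 040 octal = 32 = 'a' - 'A' in ASCII */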
        !           495: @end example
        !           496: 
        !           497: You specify the use of a translate table by putting its address in the
        !           498: @code{translate} component of the compiled pattern buffer.  If this component
        !           499: is zero, no translation is done.  Since both compilation and matching use
        !           500: the translate table, you must use the same table contents for both operations;
        !           501: otherwise pattern and data are translated differently and matches become unreliable.
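
The effect of a translate table on comparison can be sketched as follows.
This is an illustration under stated assumptions, not the library's code:
@code{init_fold_table} and @code{translated_equal} are hypothetical names, the
character set is assumed to be ASCII, and both operands are translated at
comparison time here, whereas the library translates the pattern once when it
is compiled.

```c
#include <stddef.h>

/* Hypothetical helper: fill TABLE so that every character maps to itself,
   except lower case ASCII letters, which map to upper case.  */
static void
init_fold_table (unsigned char table[256])
{
  int i;

  for (i = 0; i < 256; i++)
    table[i] = (unsigned char) i;
  for (i = 'a'; i <= 'z'; i++)
    table[i] = (unsigned char) (i - ('a' - 'A'));
}

/* Hypothetical helper: compare N characters of A and B after passing
   each one through TABLE.  Return nonzero if they are all equal.  */
static int
translated_equal (const unsigned char table[256],
                  const char *a, const char *b, size_t n)
{
  size_t k;

  for (k = 0; k < n; k++)
    if (table[(unsigned char) a[k]] != table[(unsigned char) b[k]])
      return 0;
  return 1;
}
```

With this table, @samp{Foo} and @samp{fOO} compare equal, because every
character is replaced by its upper case form before the comparison.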
        !           502: 
        !           503: @node registers, split, translation, programming
        !           504: @comment  node-name,  next,  previous,  up
        !           505: @subsection Registers: or ``What Did the @samp{\( @dots{} \)} Groupings Actually Match?''
        !           506: 
        !           507: If you want to find out, after the match, what each of the first nine
        !           508: @samp{\( @dots{} \)} groupings actually matched, you can pass the @var{regs} argument
        !           509: to the match or search function.  Pass the address of a structure of this type:
        !           510: 
        !           511: @example
        !           512: struct re_registers
        !           513:   @{
        !           514:     int start[RE_NREGS];
        !           515:     int end[RE_NREGS];
        !           516:   @};
        !           517: @end example
        !           518: 
        !           519:   @code{re_match} and @code{re_search} store the match positions in this
        !           520: structure.  @code{@var{regs}->start[@var{reg}]} will be the index in
        !           521: @var{string} of the beginning of the data matched by the @var{reg}'th
        !           522: @samp{\( @dots{} \)} grouping, and @code{@var{regs}->end[@var{reg}]} will
        !           523: be the index of the end of that data (the index of the first character
        !           524: beyond those matched).  The values in the start and end arrays at
        !           525: indexes greater than the number of @samp{\( @dots{} \)} groupings
        !           526: present in the regular expression will be set to the value -1.  Register
        !           527: numbers start at 1 and run to @code{RE_NREGS - 1} (normally @code{9}).
        !           528: @code{@var{regs}->start[0]} and @code{@var{regs}->end[0]} are similar but
        !           529: describe the extent of the substring matched by the entire pattern.@refill
        !           530: 
        !           531:   Both @code{struct re_registers} and @code{RE_NREGS} are defined in @file{regex.h}.
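
Once the registers are filled in, the text of a grouping can be copied out
using the start and end indexes.  The sketch below does not call the library.
@code{struct demo_registers} and @code{DEMO_NREGS} are stand-ins for
@code{struct re_registers} and @code{RE_NREGS} (assumed here to be 10), and
@code{copy_register} is a hypothetical helper.

```c
#include <string.h>

#define DEMO_NREGS 10           /* stands in for RE_NREGS, assumed to be 10 */

struct demo_registers           /* same layout as struct re_registers */
{
  int start[DEMO_NREGS];
  int end[DEMO_NREGS];
};

/* Hypothetical helper: copy the text matched by register REG into OUT,
   which holds OUTSIZE bytes, and terminate it with a null character.
   Return the length copied, or -1 if REG is out of range, holds -1,
   or the text does not fit.  */
static int
copy_register (const struct demo_registers *regs, const char *string,
               int reg, char *out, int outsize)
{
  int len;

  if (reg < 0 || reg >= DEMO_NREGS || regs->start[reg] < 0)
    return -1;
  len = regs->end[reg] - regs->start[reg];   /* end is one past the match */
  if (len < 0 || len >= outsize)
    return -1;
  memcpy (out, string + regs->start[reg], (size_t) len);
  out[len] = '\0';
  return len;
}
```

Because the end index names the first character beyond the match, the length
of the matched text is simply the end index minus the start index.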
        !           532: 
        !           533: @node split, unix, registers, programming
        !           534: @comment  node-name,  next,  previous,  up
        !           535: @subsection Matching against Split Data
        !           536: 
        !           537: The functions @code{re_match_2} and @code{re_search_2} let you match or search
        !           538: data that is divided into two strings.
        !           539: 
        !           540: @code{re_match_2} works like @code{re_match} except that two data strings and
        !           541: sizes must be given.
        !           542: 
        !           543: @example
        !           544: re_match_2 (@var{buf}, @var{string1}, @var{size1}, @var{string2}, @var{size2}, @var{pos}, @var{regs})
        !           545: @end example
        !           546: 
        !           547: The matcher regards the contents of @var{string1} as effectively followed by
        !           548: the contents of @var{string2}, and matches the combined string against the
        !           549: pattern in @var{buf}.
        !           550: 
        !           551: @code{re_search_2} is likewise similar to @code{re_search}:
        !           552: 
        !           553: @example
        !           554: re_search_2 (@var{buf}, @var{string1}, @var{size1}, @var{string2}, @var{size2}, @var{startpos}, @var{range}, @var{regs})
        !           555: @end example
        !           556: 
        !           557: The value returned by @code{re_search_2} is an index into the combined data
        !           558: made up of @var{string1} and @var{string2}.  It never exceeds @code{@var{size1} + @var{size2}}.
        !           559: The values returned in the @var{regs} structure (if there is one) are likewise
        !           560: indices in the combined data.
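
An index into the combined data can be turned back into a position in one of
the two strings by comparing it with @var{size1}.  The following sketch shows
the idea; @code{combined_char} is a hypothetical helper, not a library function.

```c
/* Hypothetical helper: return the character at INDEX in the combined
   data formed by STRING1 followed by STRING2, or -1 if INDEX lies
   outside the combined data.  */
static int
combined_char (const char *string1, int size1,
               const char *string2, int size2, int index)
{
  if (index < 0 || index >= size1 + size2)
    return -1;
  if (index < size1)
    return (unsigned char) string1[index];          /* falls in string1 */
  return (unsigned char) string2[index - size1];    /* falls in string2 */
}
```

For example, with @samp{foo} and @samp{bar} as the two strings, index 2 refers
to the last character of the first string and index 3 to the first character
of the second.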
        !           561: 
        !           562: @node unix, , split, programming
        !           563: @comment  node-name,  next,  previous,  up
        !           564: @subsection Unix-Compatible Entry Points
        !           565: 
        !           566: The standard Berkeley Unix way to compile a regular expression is to call
        !           567: @code{re_comp}.  This function takes a single argument, the address of the
        !           568: regular expression, which is assumed to be terminated by a null character.
        !           569: 
        !           570: @code{re_comp} does not ask you to specify a pattern buffer because it has its
        !           571: own pattern buffer --- just one.  Using @code{re_comp}, one may match only the
        !           572: most recently compiled regular expression.
        !           573: 
        !           574: The value of @code{re_comp} is zero for success or else an error message string,
        !           575: as for @code{re_compile_pattern}.
        !           576: 
        !           577: Calling @code{re_comp} with the null string as argument has no effect;
        !           578: the contents of the buffer remain unchanged.
        !           579: 
        !           580: The standard Berkeley Unix way to match the last regular expression compiled
        !           581: is to call @code{re_exec}.  This takes a single argument, the address of
        !           582: the string to be matched.  This string is assumed to be terminated by
        !           583: a null character.  Matching is tried starting at each position in the
        !           584: string.  @code{re_exec} returns @code{1} for success or @code{0} for failure.
        !           585: One cannot find out how long a substring was matched, nor what the
        !           586: @samp{\( @dots{} \)} groupings matched.
        !           587: 
        !           588: @bye
