\input texinfo
@comment -*- Mode: texinfo -*-
@comment This documents the GNU regex library
@setfilename regex

@comment >> @code{"foo"} for literal strings vs @b{"foo"} vs @code{foo}
@comment >> (this file is presently using the last: it looks ok in
@comment >> info; wait to see what it looks like under botex)


@comment >> superior of (dir) is temporary
@node top, syntax, , (dir)
@comment node-name, next, previous, up
@chapter @dfn{regex} regular expression matching library.

@section Overview

Regular expression matching allows you to test whether a string fits
into a specific syntactic shape. You can also search a string for a
substring that fits a pattern.

A regular expression describes a set of strings. The simplest case is
one that describes a particular string; for example, the string @samp{foo}
when regarded as a regular expression matches @samp{foo} and nothing else.
Nontrivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regular expression
@samp{foo\|bar} matches either the string @samp{foo} or the string @samp{bar}; the
regular expression @samp{c[ad]*r} matches any of the strings @samp{cr}, @samp{car},
@samp{cdr}, @samp{caar}, @samp{cadddar} and all other such strings with any number of
@samp{a}'s and @samp{d}'s.

The first step in matching a regular expression is to compile it.
You must supply the pattern string and also a pattern buffer to hold
the compiled result. That result contains the pattern in an internal
format that is easier to use in matching.

Having compiled a pattern, you can match it against strings. You can
match the compiled pattern any number of times against different
strings.

@menu
* syntax::       Syntax of regular expressions
* directives::   Meaning of characters as regex string directives.
* emacs::        Additional character directives available
                   only for use within Emacs.
* programming::  Using the regex library from C programs
* unix::         Unix-compatible entry-points to regex library
@end menu

@node syntax, directives, top, top
@comment node-name, next, previous, up
@section Syntax of Regular Expressions

Regular expressions have a syntax in which a few characters are special
constructs and the rest are @dfn{ordinary}. An ordinary character is a
simple regular expression which matches that character and nothing else.
The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*},
@samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}. Any other character
appearing in a regular expression is ordinary, unless a @samp{\} precedes
it.@refill

For example, @samp{f} is not a special character, so it is ordinary,
and therefore @samp{f} is a regular expression that matches the string @samp{f}
and no other string. (It does @emph{not} match the string @samp{ff}.) Likewise,
@samp{o} is a regular expression that matches only @samp{o}.

Any two regular expressions @var{a} and @var{b} can be concatenated.
The result is a regular expression which matches a string if @var{a}
matches some amount of the beginning of that string and @var{b}
matches the rest of the string.

As a simple example, we can concatenate the regular expressions
@samp{f} and @samp{o} to get the regular expression @samp{fo},
which matches only the string @samp{fo}. Still trivial.

Note: for Unix compatibility, special characters are treated as
ordinary ones if they are in contexts where their special meanings
make no sense. For example, @samp{*foo} treats @samp{*} as ordinary since
there is no preceding expression on which the @samp{*} can act.
It is poor practice to depend on this behavior; it is better to quote
the special character anyway, regardless of where it appears.


@node directives, emacs, syntax, top
@comment node-name, next, previous, up

@ifinfo
The following are the characters and character sequences which have
special meaning within regular expressions.
Any character not mentioned here is not special; it stands for exactly
itself for the purposes of searching and matching. @xref{syntax}.
@end ifinfo

@table @samp
@item .
is a special character that matches anything except a newline.
Using concatenation, we can make regular expressions like @samp{a.b} which
matches any three-character string which begins with @samp{a} and ends with @samp{b}.@refill

@item *
is not a construct by itself; it is a suffix, which means the preceding
regular expression is to be repeated as many times as possible. In @samp{fo*},
the @samp{*} applies to the @samp{o}, so @samp{fo*} matches @samp{f} followed by any number of @samp{o}'s.@refill

The case of zero @samp{o}'s is allowed: @samp{fo*} does match @samp{f}.@refill

@samp{*} always applies to the @emph{smallest} possible preceding expression.
Thus, @samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}.@refill

The matcher processes a @samp{*} construct by matching, immediately, as many
repetitions as can be found. Then it continues with the rest of the
pattern. If that fails, backtracking occurs, discarding some of
the matches of the @samp{*}'d construct in case that makes it possible
to match the rest of the pattern. For example, matching @samp{c[ad]*ar}
against the string @samp{caddaar}, the @samp{[ad]*} first matches @samp{addaa},
but this does not allow the next @samp{a} in the pattern to match.
So the last of the matches of @samp{[ad]} is undone and the following
@samp{a} is tried again. Now it succeeds.@refill

@item +
@samp{+} is like @samp{*} except that at least one match for the preceding
pattern is required for @samp{+}. Thus, @samp{c[ad]+r} does not match
@samp{cr} but does match anything else that @samp{c[ad]*r} would match.

@item ?
@samp{?} is like @samp{*} except that it allows either zero or one match
for the preceding pattern. Thus, @samp{c[ad]?r} matches @samp{cr} or
@samp{car} or @samp{cdr}, and nothing else.

@item [ @dots{} ]
@samp{[} begins a @dfn{character set}, which is terminated by a @samp{]}.
In the simplest case, the characters between the two form the set.
Thus, @samp{[ad]} matches either @samp{a} or @samp{d},
and @samp{[ad]*} matches any string of @samp{a}'s and @samp{d}'s
(including the empty string), from which it follows that
@samp{c[ad]*r} matches @samp{car}, etc.@refill

Character ranges can also be included in a character set, by writing two
characters with a @samp{-} between them. Thus, @samp{[a-z]} matches
any lower-case letter. Ranges may be intermixed freely with
individual characters, as in @samp{[a-z$%.]}, which matches any
lower case letter or @samp{$}, @samp{%} or period.@refill

Note that the usual special characters are not special any more inside a
character set. A completely different set of special characters exists
inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill

To include a @samp{]} in a character set, you must make it
the first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
To include a @samp{-}, you must use it in a context where it cannot possibly
indicate a range: that is, as the first character, or immediately
after a range.@refill

@item [^ @dots{} ]
@samp{[^} begins a @dfn{complement character set}, which matches any
character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
matches all characters @emph{except} letters and digits.@refill

@samp{^} is not special in a character set unless it is the first character.
The character following the @samp{^} is treated as if it were first
(it may be a @samp{-} or a @samp{]}).@refill

@item ^
is a special character that matches the empty string -- but only
if at the beginning of a line in the text being matched. Otherwise
it fails to match anything. Thus, @samp{^foo} matches a @samp{foo}
which occurs at the beginning of a line.@refill

@item $
is similar to @samp{^} but matches only at the end of a line.
Thus, @samp{xx*$} matches a string of one or more @samp{x}'s
at the end of a line.@refill

@item \
has two functions: it quotes the above special characters
(including @samp{\}), and it introduces additional special constructs.@refill

Because @samp{\} quotes special characters, @samp{\$} is a regular
expression which matches only @samp{$}, and @samp{\[} is a regular
expression which matches only @samp{[}, and so on.@refill

For the most part, @samp{\} followed by any character matches only that
character. However, there are several exceptions: characters which, when
preceded by @samp{\}, are special constructs. Such characters are always
ordinary when encountered on their own.@refill

No new special characters will ever be defined. All extensions to
the regular expression syntax are made by defining new two-character
constructs that begin with @samp{\}.@refill

@item \|
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
@var{b} will match.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
but no other string.@refill

@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
@samp{\|}.@refill

Full backtracking capability exists when multiple @samp{\|}'s are used.@refill

@item \( @dots{} \)
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations.
Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.

@item
To enclose a complicated expression for the postfix @samp{*} to operate on.
Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any (zero or
more) number of @samp{na}'s.@refill

@item
To mark a matched substring for future reference.

@end enumerate

This last application is not a consequence of the idea of a parenthetical
grouping; it is a separate feature which happens to be assigned as a
second meaning to the same @samp{\( @dots{} \)} construct because there is no
conflict in practice between the two meanings. Here is an explanation
of this feature:@refill

@item \@var{digit}
After the end of a @samp{\( @dots{} \)} construct, the matcher remembers the
beginning and end of the text matched by that construct. Then, later on
in the regular expression, you can use @samp{\} followed by @var{digit}
to mean ``match the same text that was matched by the @var{digit}'th
@samp{\( @dots{} \)} construct.'' The @samp{\( @dots{} \)} constructs
are numbered in the order in which they begin in the regexp.@refill

The strings matching the first nine @samp{\( @dots{} \)} constructs appearing
in a regular expression are assigned numbers 1 through 9 in order of their
beginnings.
@samp{\1} through @samp{\9} may be used to refer to the text matched by
the corresponding @samp{\( @dots{} \)} construct.@refill

For example, @samp{\(.*\)\1} matches any string that is composed of two
identical halves. The @samp{\(.*\)} matches the first half, which may be
anything, but the @samp{\1} that follows must match the same exact text.@refill

@item \b
matches the empty string, but only if it is at the beginning or
end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word. @samp{\bball\(s\|\)\b} matches
@samp{ball} or @samp{balls} as a separate word.@refill

@item \B
matches the empty string, provided it is @emph{not} at the beginning or
end of a word.@refill

@item \<
matches the empty string, but only if it is at the beginning
of a word.

@item \>
matches the empty string, but only if it is at the end of a word.

@item \w
matches any word-constituent character.

@item \W
matches any character that is not a word-constituent.
@end table

There are a number of additional @samp{\} regexp directives available for use
within Emacs only.
@ifinfo
(@pxref{emacs}).
@comment no need to make a tex xref to something one line down!
@end ifinfo

@node emacs, programming, directives, top
@comment node-name, next, previous, up
@subsection Constructs Available in Emacs Only

@table @samp
@item \`
matches the empty string, but only if it is at the beginning
of the buffer.@refill

@item \'
matches the empty string, but only if it is at the end of
the buffer.@refill

@item \s@var{code}
matches any character whose syntax is @var{code}.
@var{code} is a character which represents a syntax code:
thus, @samp{w} for word constituent, @samp{-} for
whitespace, @samp{(} for open-parenthesis, etc.
See the documentation for the Emacs function @code{modify-syntax-entry}
for further details.@refill

Thus, @samp{\s(} matches any character with open-parenthesis syntax.

@item \S@var{code}
matches any character whose syntax is not @var{code}.
@end table

@node programming, compiling, emacs, top
@comment node-name, next, previous, up
@section Programming using the @code{regex} library

@ifinfo
The subnodes accessible from this menu give information on entry
points and data structures which C programs need to interface to the
@code{regex} library.
@end ifinfo

@menu
* compiling::    How to compile regular expressions
* matching::     Matching compiled regular expressions
* searching::    Searching for compiled regular expressions
* translation::  Translating characters into other characters
                   (for both compilation and matching)
* registers::    Determining what was matched
* split::        Matching data which is split into two pieces
* unix::         Unix-compatible entry-points to regex library
@end menu

@node compiling, matching, programming, programming
@comment node-name, next, previous, up
@subsection Compiling a Regular Expression

To compile a regular expression, you must supply a pattern buffer.
This is a structure, defined in the include file @file{regex.h} as follows:

@example
struct re_pattern_buffer
  @{
    char *buffer;     /* Space holding the compiled pattern commands. */
    int allocated;    /* Size of space that buffer points to */
    int used;         /* Length of portion of buffer actually occupied */
    char *fastmap;    /* Pointer to fastmap, if any, or zero if none. */
                      /* re_search uses the fastmap, if there is one,
                         to skip quickly over totally implausible
                         characters */
    char *translate;
                      /* Translate table to apply to characters before
                         comparing, or zero for no translation.
                         The translation is applied to a pattern when
                         it is compiled and to data when it is matched. */
    char fastmap_accurate;
                      /* Set to zero when a new pattern is stored,
                         set to one when the fastmap is updated from it. */
  @};
@end example

Before compiling a pattern, you must initialize the @code{buffer} field to
point to a block of memory obtained with @code{malloc},
and the @code{allocated} field to the size of that block, in bytes.
The pattern compiler will replace this block with a larger one if necessary.

You must also initialize the @code{translate} field to point to the translate
table that you will use when you match the compiled pattern, or to zero
if you will use no translate table when you match. @xref{translation}.

Then call @code{re_compile_pattern} to compile a regular expression
into the buffer:
@example
re_compile_pattern (@var{regex}, @var{regex_size}, @var{buf})
@end example

@var{regex} is the address of the regular expression (@code{char *}),
@var{regex_size} is its length (@code{int}),
@var{buf} is the address of the buffer (@code{struct re_pattern_buffer *}).

@code{re_compile_pattern} returns zero if it succeeds in compiling the regular
expression.
In that case, @code{*buf} now contains the results.
Otherwise, @code{re_compile_pattern} returns a string which serves as
an error message.

After compiling, if you wish to search for the pattern, you must
initialize the @code{fastmap} component of the pattern buffer.
@xref{searching}.

@node matching, searching, compiling, programming
@comment node-name, next, previous, up
@subsection Matching a Compiled Pattern

Once a regular expression has been compiled into a pattern buffer,
you can match the pattern buffer against a string with @code{re_match}.

@example
re_match (@var{buf}, @var{string}, @var{size}, @var{pos}, @var{regs})
@end example

@var{buf} is, once again, the address of the buffer (@code{struct re_pattern_buffer *}).
@var{string} is the string to be matched (@code{char *}).
@var{size} is the length of that string (@code{int}).
@var{pos} is the position within the string at which to begin matching (@code{int}).
The beginning of the string is position 0.
@var{regs} is described below. Normally it is zero. @xref{registers}.

@code{re_match} returns @code{-1} if the pattern does not match; otherwise,
it returns the length of the portion of @var{string} which was matched.

For example, suppose that @var{buf} points to a buffer containing the result
of compiling @code{x*}, @var{string} points to @code{xxxxxy}, and @var{size} is @code{6}.
Suppose that @var{pos} is @code{2}. Then the last three @code{x}'s will be matched,
so @code{re_match} will return @code{3}.
If @var{pos} is zero, the value will be @code{5}.
If @var{pos} is @code{5} or @code{6}, the value will be zero, meaning that the null string
was successfully matched.
Note that since @code{x*} matches the empty string, it will never entirely fail.

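The return-value rules above can be checked against a small standalone sketch. It is a hedged illustration only: it does not call the @code{regex} library and is not the library's algorithm. The names @code{sketch_match}, @code{match_here} and @code{char_matches} are hypothetical, and the pattern language is assumed to be just ordinary characters, @samp{.} and a @samp{*} suffix (no quoting, character sets, @samp{\|} or groups). Like @code{re_match}, it matches anchored at @var{pos} and returns either the length matched or @code{-1}; like the @samp{*} matcher described earlier, it takes as many repetitions as it can and then backtracks.

```c
#include <assert.h>

/* Hypothetical illustration only: this is NOT the regex library.
   Supported pattern syntax: ordinary characters, '.' (any character
   except newline) and a '*' suffix on the preceding character.
   There is no quoting, no character sets, and no \| or \( \). */

/* Does pattern character p match data character c? */
static int char_matches(char p, char c)
{
    return p == '.' ? c != '\n' : p == c;
}

/* Match pat against string[pos..size), anchored at pos.
   Returns the number of characters matched, or -1 for no match. */
static int match_here(const char *pat, const char *string, int size, int pos)
{
    if (*pat == '\0')
        return 0;                         /* empty pattern: empty match */

    if (pat[1] == '*') {
        /* Greedy: first take as many repetitions as are available... */
        int n = 0;
        while (pos + n < size && char_matches(pat[0], string[pos + n]))
            n++;
        /* ...then give them back one at a time (backtracking) until
           the rest of the pattern matches. */
        for (; n >= 0; n--) {
            int rest = match_here(pat + 2, string, size, pos + n);
            if (rest >= 0)
                return n + rest;
        }
        return -1;
    }

    if (pos < size && char_matches(pat[0], string[pos])) {
        int rest = match_here(pat + 1, string, size, pos + 1);
        return rest >= 0 ? rest + 1 : -1;
    }
    return -1;
}

/* Hypothetical entry point with the same size/pos contract as re_match. */
int sketch_match(const char *pat, const char *string, int size, int pos)
{
    /* As with re_match, keeping pos inside the string is the caller's job. */
    assert(pos >= 0 && pos <= size);
    return match_here(pat, string, size, pos);
}
```

Under these assumptions, @code{sketch_match ("x*", "xxxxxy", 6, 2)} returns 3, a @var{pos} of 0 gives 5, and a @var{pos} of 5 or 6 gives 0, as in the example above. @code{sketch_match ("c.*ar", "caddaar", 7, 0)} returns 7: the @samp{.*} first takes all six remaining characters and then gives back two so that @samp{ar} can match.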
It is up to the caller to avoid passing a value of @var{pos} that results in
matching outside the specified string. @var{pos} must not be negative and
must not be greater than @var{size}.

@node searching, translation, matching, programming
@comment node-name, next, previous, up
@subsection Searching for a Match

Searching means trying successive starting positions for a match until a
match is found. To search, you supply a compiled pattern buffer. Before
searching you must initialize the @code{fastmap} field of the pattern
buffer (see below).

@example
re_search (@var{buf}, @var{string}, @var{size}, @var{startpos}, @var{range}, @var{regs})
@end example

@noindent
is called like @code{re_match} except that the @var{pos} argument is
replaced by two arguments @var{startpos} and @var{range}. @code{re_search}
tests for a match starting at index @var{startpos}, then at
@code{@var{startpos} + 1}, and so on. It tries @var{range} consecutive
positions before giving up and returning @code{-1}. If a match is found,
@code{re_search} returns the index at which the match starts.@refill

If @var{range} is negative, @code{re_search} tries starting positions
@var{startpos}, @code{@var{startpos} - 1}, @dots{} in that order.
@code{|@var{range}|} is the number of tries made.@refill

It is up to the caller to avoid passing values of @var{startpos} and
@var{range} that result in matching outside the specified string.
@var{startpos} must be between zero and @var{size}, inclusive, and so must
@code{@var{startpos} + @var{range} - 1} (if @var{range} is positive) or
@code{@var{startpos} + @var{range} + 1} (if @var{range} is negative).@refill

If you may be searching over a long distance (that is, trying many
different match starting points) with a compiled pattern, you should use a
@dfn{fastmap} in it. This is a block of 256 bytes, whose address is
placed in the @code{fastmap} component of the pattern buffer. The first
time you search for a particular compiled pattern, the fastmap is set so
that @code{@var{fastmap}[@var{ch}]} is nonzero if the character @var{ch}
might possibly start a match for this pattern. @code{re_search} checks
each character against the fastmap so that it can skip more quickly over
non-matches.

If you do not want a fastmap, store zero in the @code{fastmap} component of the
pattern buffer before calling @code{re_search}.

In either case, you must initialize this component in a pattern buffer
before you can use that buffer in a search; but you can choose as an
initial value either zero or the address of a suitable block of memory.

If you compile a new pattern in an existing pattern buffer, it is not
necessary to reinitialize the @code{fastmap} component (unless you
wish to override your previous choice).

@node translation, registers, searching, programming
@comment node-name, next, previous, up
@subsection Translate Tables

With a translate table, you can apply a transformation to all characters
before they are compared. For example, a table that maps lower case letters
into upper case (or vice versa) causes differences in case to be ignored
by matching.

A translate table is a block of 256 bytes. Each character of raw data is
used as an index in the translate table. The value found there is used
instead of the original character. Each character in a regular
expression, except for the syntactic constructs, is translated when the
expression is compiled. Each character of a string being matched is
translated whenever it is compared or tested.

A suitable translate table to ignore differences in case maps all
characters into themselves, except for lower case letters, which are
mapped into the corresponding upper case letters.
It could be initialized by:

@example
char table[0400];             /* one entry per 8-bit character code */
int i;

for (i = 0; i < 0400; i++)
  table[i] = i;               /* every character maps to itself, */
for (i = 'a'; i <= 'z'; i++)
  table[i] = i - 'a' + 'A';   /* except lower case, mapped to upper case */
@end example

You specify the use of a translate table by putting its address in the
@code{translate} component of the pattern buffer before you compile the
pattern. If this component is zero, no translation is done. Since both
compilation and matching use the translate table, you must use the same
table contents for both operations or confusing things will happen.

@node registers, split, translation, programming
@comment node-name, next, previous, up
@subsection Registers: or ``What Did the @samp{\( @dots{} \)} Groupings Actually Match?''

If you want to find out, after the match, what each of the first nine
@samp{\( @dots{} \)} groupings actually matched, you can pass the @var{regs} argument
to the match or search function. Pass the address of a structure of this type:

@example
struct re_registers
  @{
    int start[RE_NREGS];
    int end[RE_NREGS];
  @};
@end example

@code{re_match} and @code{re_search} will store into this structure the
data you want. @code{@var{regs}->start[@var{reg}]} will be the index in
@var{string} of the beginning of the data matched by the @var{reg}'th
@samp{\( @dots{} \)} grouping, and @code{@var{regs}->end[@var{reg}]} will
be the index of the end of that data (the index of the first character
beyond those matched). The values in the start and end arrays at
indexes greater than the number of @samp{\( @dots{} \)} groupings
present in the regular expression will be set to the value -1. Register
numbers start at 1 and run to @code{RE_NREGS - 1} (normally @code{9}).
@code{@var{regs}->start[0]} and @code{@var{regs}->end[0]} are similar but
describe the extent of the substring matched by the entire pattern.@refill

Both @code{struct re_registers} and @code{RE_NREGS} are defined in @file{regex.h}.

@node split, unix, registers, programming
@comment node-name, next, previous, up
@subsection Matching against Split Data

The functions @code{re_match_2} and @code{re_search_2} allow one to match or search
data which is divided into two strings.

@code{re_match_2} works like @code{re_match} except that two data strings and
sizes must be given.

@example
re_match_2 (@var{buf}, @var{string1}, @var{size1}, @var{string2}, @var{size2}, @var{pos}, @var{regs})
@end example

The matcher regards the contents of @var{string1} as effectively followed by
the contents of @var{string2}, and matches the combined string against the
pattern in @var{buf}.

@code{re_search_2} is likewise similar to @code{re_search}:

@example
re_search_2 (@var{buf}, @var{string1}, @var{size1}, @var{string2}, @var{size2}, @var{startpos}, @var{range}, @var{regs})
@end example

The value returned by @code{re_search_2} is an index into the combined data
made up of @var{string1} and @var{string2}. It never exceeds @code{@var{size1} + @var{size2}}.
The values returned in the @var{regs} structure (if there is one) are likewise
indices in the combined data.

@node unix, , split, programming
@comment node-name, next, previous, up
@subsection Unix-Compatible Entry Points

The standard Berkeley Unix way to compile a regular expression is to call
@code{re_comp}.
This function takes a single argument, the address of the
regular expression, which is assumed to be terminated by a null character.

@code{re_comp} does not ask you to specify a pattern buffer because it has its
own pattern buffer, and only one. Using @code{re_comp}, one may match only the
most recently compiled regular expression.

The value of @code{re_comp} is zero for success or else an error message string,
as for @code{re_compile_pattern}.

Calling @code{re_comp} with the null string as argument has no effect;
the contents of the buffer remain unchanged.

The standard Berkeley Unix way to match the last regular expression compiled
is to call @code{re_exec}. This takes a single argument, the address of
the string to be matched. This string is assumed to be terminated by
a null character. Matching is tried starting at each position in the
string. @code{re_exec} returns @code{1} for success or @code{0} for failure.
One cannot find out how long a substring was matched, nor what the
@samp{\( @dots{} \)} groupings matched.

@bye