This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

    Copyright (c) 1986 by University of Toronto.
    Written by Henry Spencer.  Not derived from licensed software.

    Permission is granted to anyone to use this software for any
    purpose on any computer system, and to redistribute it freely,
    subject to the following restrictions:

    1. The author is not responsible for the consequences of use of
       this software, no matter how awful, even if they arise
       from defects in it.

    2. The origin of this software must not be misrepresented, either
       by explicit claim or by omission.

    3. Altered versions must be plainly marked as such, and must not
       be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software.  Even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile        instructions to make everything
regexp.3        manual page
regexp.h        header file, for /usr/include
regexp.c        source for regcomp() and regexec()
regsub.c        source for regsub()
regerror.c      source for default regerror()
regmagic.h      internal header file
try.c           source for test program
timer.c         source for timing program
tests           test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

This stuff should be pretty portable, given appropriate option settings.
If your chars have less than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
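
To make the nondeterministic-versus-deterministic trade-off above a little
more concrete, here is a minimal sketch of a backtracking matcher.  It is
NOT code from this package and does not use regcomp() or regexec(); it is a
toy in the style of the small matcher in Kernighan and Pike's "The Practice
of Programming", handling only literals, '.', 'x*', '^' and '$'.  Every name
in it (toy_match, match_here, match_star, toy_steps, toy_selfcheck) is
hypothetical.  It needs no compile step at all, which is the "simple and
small" side of the bargain; the step counter hints at the "slower at
executing" side.

```c
#include <assert.h>

/* Hypothetical toy matcher for illustration; NOT this package's code.
 * Supported syntax: literal chars, '.', 'x*', and the anchors '^' and '$'. */

static unsigned long toy_steps;  /* number of match attempts made so far */

static int match_here(const char *re, const char *text);

/* Match "c*" followed by re: try zero copies of c first, then one more
 * copy on each pass.  Every retry is a backtracking step. */
static int match_star(int c, const char *re, const char *text)
{
    do {
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match re at exactly the current position of text. */
static int match_here(const char *re, const char *text)
{
    toy_steps++;
    if (re[0] == '\0')
        return 1;                       /* pattern used up: success */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* '$' matches only at end of text */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int toy_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);  /* anchored: one attempt only */
    do {                                  /* unanchored: try each start */
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* Small usage example; call it from your own main(). */
void toy_selfcheck(void)
{
    assert(toy_match("^ab*c$", "abbbc") == 1);
    assert(toy_match("x.z", "abc") == 0);
    assert(toy_match("b.d", "abcde") == 1);

    /* A failing match with several stars must exhaust every way of
     * splitting the a's among them before giving up. */
    toy_steps = 0;
    assert(toy_match("a*a*a*b", "aaaaaaaaaaaa") == 0);
    assert(toy_steps > 12);
}
```

The anchored case makes a single attempt, while the unanchored case retries
at every starting position.  Cheap checks that rule out hopeless attempts
early are one plausible kind of special-case optimization for such a
matcher; whether they correspond to what this package actually does is an
assumption, not something this sketch demonstrates.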