Annotation of 43BSDTahoe/lib/libc/gen/regexp/README, revision 1.1.1.1

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

        Copyright (c) 1986 by University of Toronto.
        Written by Henry Spencer.  Not derived from licensed software.

        Permission is granted to anyone to use this software for any
        purpose on any computer system, and to redistribute it freely,
        subject to the following restrictions:

        1. The author is not responsible for the consequences of use of
                this software, no matter how awful, even if they arise
                from defects in it.

        2. The origin of this software must not be misrepresented, either
                by explicit claim or by omission.

        3. Altered versions must be plainly marked as such, and must not
                be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS.

The files are:

Makefile       instructions to make everything
regexp.3       manual page
regexp.h       header file, for /usr/include
regexp.c       source for regcomp() and regexec()
regsub.c       source for regsub()
regerror.c     source for default regerror()
regmagic.h     internal header file
try.c          source for test program
timer.c        source for timing program
tests          test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in-line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
