The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank leo de witt particularly and others
for clearing up misconceptions and pointing out (correctly) that
existing tools like sed already do (or at least nearly do) what some people
asked for. The following points are in no particular order and no slight is
intended by my presentation.

1) named character classes, e.g. \alpha, \digit.
    i think this is a hokey idea and dismissed it as unnecessary crud
    but then found out it is part of the proposed regular expression
    stuff for posix. it may creep in but i hope not.

2) matching multi-line patterns (\n as part of pattern)
    this actually requires a lot of infrastructure support and thought.
    i prefer to leave that to other more powerful programs such as sam.

3) print lines with context.
    the second most requested feature but i'm not doing it. this is
    just the job for sed. to be consistent, we just took the context
    crap out of diff too. this is actually reasonable; showing context
    is the job for a separate tool (pipeline difficulties apart).

4) print one (first matching) line and go onto the next file.
    most of the justification for this seemed to be scanning
    mail and/or netnews articles for the subject line; neither
    of which gets any sympathy from me. but it is easy to do
    and doesn't add an option; we add a new option (say -1)
    and remove -s. -1 is just like -s except it prints the matching line.
    then the old grep -s pattern is now grep -1 pattern > /dev/null
    and within epsilon of being as efficient.

5) divert matching lines onto one fd, nonmatching onto another.
    sorry, run grep twice.

6) print the Nth occurrence of the pattern (N is number or list).
    it may be possible to think of a real reason for this (i couldn't)
    but the answer is no.

7) -w (pattern matches only words)
    the most requested feature. well, it turns out that -x (exact)
    is there because doug mcilroy wanted to match words against a dictionary.
    it seems to have no other use. Therefore, -x is being dropped
    (after all, it only costs a quick edit to do it yourself) and is
    replaced by -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).

8) grep should work on binary files and kanji.
    that it should work on kanji or any character set is a given
    (at least, any character set supported by the system V international
    character set stuff). binary files will work too modulo the
    following constraint: lines (between \n's) have to fit in a
    buffer (current size 64K). violations are an error (exit 2).

9) -b has bogus units.
    agreed. -b now is in bytes.

10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
    -x (and -w) is enough. sorry.

11) recursively descend through argument lists
    no. find | xargs is going to have to do.

12) read filenames on standard input
    no. xargs will have to do.

13) should be as fast as bm.
    no worries. in fact, our egrep is 3x faster than bm. i intend to be
    competitive with woods' egrep. it should also be as fast as fgrep for
    multiple keywords. the new grep incorporates boyer-moore
    as a degenerate case of Commentz-Walter, a faster replacement
    for the fgrep algorithm.

14) -lv (files that don't have any matching lines)
    -lv means print names of files that have any nonmatching lines
    (useful, say, for checking input syntax). -L will mean print
    names of files without selected lines.

15) print the part of the line that matched.
    no. that is available at the subroutine level.

16) compatibility with old grep/fgrep/egrep.
    the current name for the new command is gre (aho chose it).
    after a while, it will become our grep. there will be a -G
    flag to take patterns a la old grep and a -F to take
    patterns a la fgrep (that is, no metacharacters except \n == |).
    gre is close enough to egrep to not matter.

17) fewer limits.
    so far, gre will have only one limit, a line length of 64K.
    (NO, i am not supporting arbitrary length lines (yet)!)
    we foresee no need for any other limit. for example, the
    current gre acts like fgrep. it is 4 times faster than
    fgrep and has no limits; we can gre -f /usr/dict/words
    (72K words, 600KB).

18) recognise file types (ignore binaries, unpack packed files etc).
    get real. go back to your macintosh or pyramid. gre will just grep
    files, not understand them.

19) handle patterns occurring multiple times per line
    this is ill-defined (how many times does aaaa occur in a line of 20 'a's?
    in order of decreasing correctness, the answers are >=1, 17, 5).
    For the cases people mentioned (words), pipe it thru
    tr to put the words one per line.

20) why use \{\} instead of \(\)?
    this is not yet resolved (mcilroy&ritchie vs aho&pike&me).
    grouping is an orthogonal issue to subexpressions so why
    use the same parentheses? the latest suggestion (by ritchie)
    is to allow both \(\) and \{\} as grouping operators but
    the \3 would only count one type (say \(\)). this would be much
    better for complicated patterns with much grouping.

21) subroutine versions of the pattern matching stuff.
    in a deep sense, the new grep will have no pattern matching code in it.
    all the pattern matching code will be in libc with a uniform
    interface. the boyer-moore and commentz-walter routines have been
    done. the other two are egrep and back-referencing egrep.
    lastly, regexp will be reimplemented.

22) support a filename of - to mean standard input.
    a unix without /dev/stdin is largely bogus but as a sop to the poor
    barstards having to work on BSD, gre will support -
    as stdin (at least for a while).

Thus, the current proposal is the following flags. it would take a GOOD
argument to change my mind on this list (unless it is to get rid of a flag).

    -f file   pattern is (`cat file`)
    -v        nonmatching lines are 'selected'
    -i        ignore alphabetic case
    -n        print line number
    -c        print count of selected lines only
    -l        print filenames which have a selected line
    -L        print filenames which do not have a selected line
    -b        print byte offset of line begin
    -h        do not print filenames in front of matching lines
    -H        always print filenames in front of matching lines
    -w        pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
    -1        print only first selected line per file
    -e expr   use expr as the pattern

research!andrew
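
As a rough illustration of two points above (the -w expansion in item 7 and the counting ambiguity in item 19), here is a minimal Python sketch. It is not gre's implementation: the function names are hypothetical, Python's re module stands in for the egrep-style matcher, and the non-capturing group wrapped around the pattern is an added assumption so that alternation inside the pattern stays between the two boundary tests.

```python
import re

# Characters treated as part of a word, taken from the -w expansion in the post.
WORD_CHARS = "_a-zA-Z0-9"


def word_pattern(pattern: str) -> str:
    """Expand pattern the way the post describes -w.

    The (?:...) group is an assumption added here, not part of the post.
    """
    return f"(^|[^{WORD_CHARS}])(?:{pattern})($|[^{WORD_CHARS}])"


def matches_word(pattern: str, line: str) -> bool:
    """True if pattern occurs in line delimited by non-word characters or line ends."""
    return re.search(word_pattern(pattern), line) is not None


def count_overlapping(needle: str, line: str) -> int:
    """Count every starting position at which needle occurs (overlaps allowed)."""
    last_start = len(line) - len(needle)
    return sum(1 for i in range(last_start + 1) if line.startswith(needle, i))


def count_nonoverlapping(needle: str, line: str) -> int:
    """Count occurrences scanning left to right without reusing characters."""
    return line.count(needle)


if __name__ == "__main__":
    print(matches_word("cat", "the cat sat"))    # a whole word
    print(matches_word("cat", "concatenate"))    # embedded in a longer word
    line = "a" * 20
    print(count_overlapping("aaaa", line), count_nonoverlapping("aaaa", line))
```

The two counting functions correspond to the 17 and 5 answers in item 19; the ">=1" answer is simply whether the line matches at all, which is all grep itself reports.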