Annotation of researchv10no/cmd/gre/gre.reply, revision 1.1

        The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank leo de witt particularly, and others,
for clearing up misconceptions and pointing out (correctly) that
existing tools like sed already do (or at least nearly do) what some people
asked for. The following points are in no particular order and no slight is
intended by my presentation.

1) named character classes, e.g. \alpha, \digit.
        i think this is a hokey idea and dismissed it as unnecessary crud,
        but then found out it is part of the proposed regular expression
        stuff for posix. it may creep in, but i hope not.

2) matching multi-line patterns (\n as part of pattern)
        this actually requires a lot of infrastructure support and thought.
        i prefer to leave that to other, more powerful programs such as sam.

3) print lines with context.
        the second most requested feature, but i'm not doing it. this is
        just the job for sed. to be consistent, we just took the context
        crap out of diff too. this is actually reasonable; showing context
        is the job for a separate tool (pipeline difficulties apart).

4) print one (the first matching) line and go on to the next file.
        most of the justification for this seemed to be scanning
        mail and/or netnews articles for the subject line, neither
        of which gets any sympathy from me. but it is easy to do
        and doesn't increase the number of options: we add a new option
        (say -1) and remove -s. -1 is just like -s except that it prints
        the matching line. then the old grep -s pattern is now
        grep -1 pattern > /dev/null, and within epsilon of being as efficient.
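The point of -1 is that the scan can stop at the first hit instead of reading the rest of the file. A minimal sketch of that early exit, assuming fixed-string matching with strstr in place of gre's regular expressions; first_match is a hypothetical name, not part of gre.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first line of fp that contains the fixed
 * string pat into buf and return 1; return 0 if no line matches.
 * Like the proposed -1, it stops at the first match, so the remainder of
 * the file is never read.  Lines longer than bufsz-1 bytes are split by
 * fgets; a real implementation would report that as an error. */
int first_match(FILE *fp, const char *pat, char *buf, size_t bufsz)
{
    while (fgets(buf, (int)bufsz, fp) != NULL) {
        if (strstr(buf, pat) != NULL)
            return 1;   /* early exit: this is the whole saving of -1 */
    }
    return 0;
}
```

Used only for its result (as with grep -1 pattern > /dev/null), the return value plays the role of the old -s exit status.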

5) divert matching lines onto one fd, nonmatching onto another.
        sorry, run grep twice.

6) print the Nth occurrence of the pattern (N is a number or a list).
        it may be possible to think of a real reason for this (i couldn't),
        but the answer is no.

7) -w (pattern matches only words)
        the most requested feature. well, it turns out that -x (exact)
        is there because doug mcilroy wanted to match words against a dictionary.
        it seems to have no other use. therefore, -x is being dropped
        (after all, it only costs a quick edit to do it yourself) and is
        replaced by -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).
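The -w expansion above can be tried directly with the POSIX regcomp/regexec interface. A sketch, assuming the user's pattern is a POSIX extended regular expression; word_match is a hypothetical name. One deliberate deviation: the pattern is wrapped in its own parentheses so that an alternation such as a|b stays inside the word-boundary context.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build (^|[^_a-zA-Z0-9])(pat)($|[^_a-zA-Z0-9]) and test one line.
 * Returns 1 on a match, 0 on no match, -1 if allocation or compilation
 * fails.  Word characters are taken to be _, letters and digits, as in
 * the expansion quoted in the text. */
int word_match(const char *pat, const char *line)
{
    static const char pre[] = "(^|[^_a-zA-Z0-9])";
    static const char post[] = "($|[^_a-zA-Z0-9])";
    size_t n = strlen(pre) + strlen(pat) + strlen(post) + 3; /* "()" + NUL */
    char *re = malloc(n);
    if (re == NULL)
        return -1;
    snprintf(re, n, "%s(%s)%s", pre, pat, post);

    regex_t rx;
    int rc = regcomp(&rx, re, REG_EXTENDED | REG_NOSUB);
    free(re);
    if (rc != 0)
        return -1;
    rc = regexec(&rx, line, 0, NULL, 0);
    regfree(&rx);
    return rc == 0;
}
```

With pattern cat, "the cat sat" matches while "concatenate" and "a_cat" do not, since _ counts as a word character.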

8) grep should work on binary files and kanji.
        that it should work on kanji or any character set is a given
        (at least, any character set supported by the system V international
        character set stuff). binary files will work too, modulo the
        following restriction: lines (between \n's) have to fit in a
        buffer (current size 64K). violations are an error (exit status 2).
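The restriction on binary input amounts to this: only newline is special, and a line that does not fit the buffer is an error rather than being silently split. A sketch under that reading; read_line and LINEMAX are hypothetical names, and the 64K figure is the one quoted above.

```c
#include <stdio.h>
#include <string.h>

#define LINEMAX 65536   /* the 64K line buffer mentioned in the text */

/* Read one line (without its '\n') into buf.
 * Returns 1 if a line was read, 0 at end of file, or -1 if the line
 * does not fit in bufsz-1 bytes; the caller should then report the
 * error and exit with status 2.  Any byte other than '\n' is ordinary
 * data, although a NUL inside a line would truncate it for str* calls. */
int read_line(FILE *fp, char *buf, size_t bufsz)
{
    size_t n = 0;
    int c;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (n + 1 >= bufsz)
            return -1;              /* line overflows the buffer */
        buf[n++] = (char)c;
    }
    if (c == EOF && n == 0)
        return 0;                   /* clean end of file */
    buf[n] = '\0';
    return 1;
}
```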

9) -b has bogus units.
        agreed. -b is now in bytes.

10) -B (add a ^ to the front of the given pattern, analogous to -x and -w)
        -x (and -w) is enough. sorry.

11) recursively descend through argument lists
        no. find | xargs is going to have to do.

12) read filenames on standard input
        no. xargs will have to do.

13) should be as fast as bm.
        no worries. in fact, our egrep is 3x faster than bm. i intend to be
        competitive with woods' egrep. it should also be as fast as fgrep for
        multiple keywords. the new grep incorporates boyer-moore
        as a degenerate case of Commentz-Walter, a faster replacement
        for the fgrep algorithm.
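For the single-keyword case, the core Boyer-Moore idea is to compare right to left and skip ahead on a mismatch. A sketch using the Horspool simplification (bad-character shifts only); this is a swapped-in textbook variant, not gre's code and not Commentz-Walter, and bmh_search is a hypothetical name.

```c
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search for pat (length m) in text (length n).
 * Returns the offset of the first occurrence, or -1 if there is none. */
long bmh_search(const char *text, size_t n, const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    if (m == 0)
        return 0;
    if (m > n)
        return -1;
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;               /* bytes absent from pat allow a full skip */
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos + m <= n) {
        size_t j = m - 1;
        while (text[pos + j] == pat[j]) {   /* compare right to left */
            if (j == 0)
                return (long)pos;
            j--;
        }
        /* shift by the table entry for the last byte of the window */
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return -1;
}
```

Longer keywords permit longer skips, which is why this family of algorithms can beat a byte-at-a-time scan.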

14) -lv (files that don't have any matching lines)
        -lv means print names of files that have any nonmatching lines
        (useful, say, for checking input syntax). -L will mean print
        names of files without any selected lines.
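The two rules differ in quantifier: -lv selects a file if some line fails to match, while -L selects it only if no line matches. A sketch of both predicates over one file's lines, assuming fixed-string matching with strstr; the function names are hypothetical.

```c
#include <string.h>

/* -lv: the file has at least one line that does not contain pat. */
int any_nonmatching(const char *const lines[], size_t nlines, const char *pat)
{
    for (size_t i = 0; i < nlines; i++)
        if (strstr(lines[i], pat) == NULL)
            return 1;
    return 0;
}

/* -L: no line of the file contains pat. */
int none_matching(const char *const lines[], size_t nlines, const char *pat)
{
    for (size_t i = 0; i < nlines; i++)
        if (strstr(lines[i], pat) != NULL)
            return 0;
    return 1;
}
```

A file of key=value lines with one stray line is reported by -lv for the pattern = but not by -L, which is the syntax-checking use mentioned above.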

15) print the part of the line that matched.
        no. that is available at the subroutine level.

16) compatibility with old grep/fgrep/egrep.
        the current name for the new command is gre (aho chose it).
        after a while, it will become our grep. there will be a -G
        flag to take patterns a la old grep and a -F to take
        patterns a la fgrep (that is, no metacharacters except \n, which
        acts as |). gre is close enough to egrep to not matter.
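Under -F as described, a pattern like "foo\nbar" means the keyword foo or the keyword bar, with no other metacharacters. A naive sketch of that semantics, checking each keyword in turn; a real fgrep would use a multi-keyword matcher such as Commentz-Walter instead, and fixed_alt_match is a hypothetical name.

```c
#include <string.h>

/* Does line contain the first len bytes of kw (kw need not end there)? */
static int contains_n(const char *line, const char *kw, size_t len)
{
    for (const char *p = line; ; p++) {
        if (strncmp(p, kw, len) == 0)
            return 1;
        if (*p == '\0')
            return 0;
    }
}

/* Return 1 if any newline-separated keyword of pattern occurs in line.
 * An empty keyword (e.g. from a trailing newline) matches every line. */
int fixed_alt_match(const char *pattern, const char *line)
{
    const char *start = pattern;
    for (;;) {
        const char *nl = strchr(start, '\n');
        size_t len = nl ? (size_t)(nl - start) : strlen(start);
        if (contains_n(line, start, len))
            return 1;
        if (nl == NULL)
            return 0;
        start = nl + 1;             /* next alternative */
    }
}
```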

17) fewer limits.
        so far, gre will have only one limit, a line length of 64K.
        (NO, i am not supporting arbitrary length lines (yet)!)
        we foresee no need for any other limit. for example, the
        current gre acts like fgrep. it is 4 times faster than
        fgrep and has no limits: we can gre -f /usr/dict/words
        (72K words, 600KB).

18) recognise file types (ignore binaries, unpack packed files etc).
        get real. go back to your macintosh or pyramid. gre will just grep
        files, not understand them.

19) handle patterns occurring multiple times per line
        this is ill-defined (how many times does aaaa occur in a line of 20 a's?
        in order of decreasing correctness, the answers are >=1, 17, 5).
        for the cases people mentioned (words), pipe it thru
        tr to put the words one per line.
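The numbers 17 and 5 come from two different counting rules: overlapping matches may start one byte after the previous start (20 - 4 + 1 = 17 starts), while non-overlapping matches resume after the end of the previous one (20 / 4 = 5). A small sketch of both rules; the function names are hypothetical.

```c
#include <string.h>

/* Overlapping: every start position where pat occurs. */
size_t count_overlapping(const char *s, const char *pat)
{
    size_t n = 0;
    if (*pat == '\0')
        return 0;
    for (const char *p = strstr(s, pat); p != NULL; p = strstr(p + 1, pat))
        n++;
    return n;
}

/* Non-overlapping: resume the search just past each match. */
size_t count_nonoverlapping(const char *s, const char *pat)
{
    size_t n = 0, m = strlen(pat);
    if (m == 0)
        return 0;
    for (const char *p = strstr(s, pat); p != NULL; p = strstr(p + m, pat))
        n++;
    return n;
}
```

For a line of 20 a's and the pattern aaaa these give 17 and 5; the ">=1" answer is just whether the line matches at all.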

20) why use \{\} instead of \(\)?
        this is not yet resolved (mcilroy & ritchie vs aho & pike & me).
        grouping is an issue orthogonal to subexpressions, so why
        use the same parentheses? the latest suggestion (by ritchie)
        is to allow both \(\) and \{\} as grouping operators, but
        \3 would count only one type (say \(\)). this would be much
        better for complicated patterns with a lot of grouping.

21) subroutine versions of the pattern matching stuff.
        in a deep sense, the new grep will have no pattern matching code in it.
        all the pattern matching code will be in libc with a uniform
        interface. the boyer-moore and commentz-walter routines have been
        done. the other two are egrep and back-referencing egrep.
        lastly, regexp will be reimplemented.

22) support a filename of - to mean standard input.
        a unix without /dev/stdin is largely bogus, but as a sop to the poor
        barstards having to work on BSD, gre will support -
        as stdin (at least for a while).
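Supporting - costs only a special case when opening each file operand. A sketch with hypothetical names open_operand and close_operand; on a system with /dev/stdin, fopen("/dev/stdin", "r") would make the special case unnecessary.

```c
#include <stdio.h>
#include <string.h>

/* Open a file operand, treating "-" as standard input.
 * Returns NULL (with errno set by fopen) if the file cannot be opened. */
FILE *open_operand(const char *name)
{
    if (strcmp(name, "-") == 0)
        return stdin;
    return fopen(name, "r");
}

/* Close only streams that open_operand actually opened. */
void close_operand(FILE *fp)
{
    if (fp != NULL && fp != stdin)
        fclose(fp);
}
```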

Thus, the current proposal is the following set of flags. it would take a GOOD
argument to change my mind on this list (unless it is to get rid of a flag).

-f file   pattern is (`cat file`)
-v        nonmatching lines are 'selected'
-i        ignore alphabetic case
-n        print line number
-c        print count of selected lines only
-l        print filenames that have a selected line
-L        print filenames that do not have a selected line
-b        print byte offset of the start of the line
-h        do not print filenames in front of matching lines
-H        always print filenames in front of matching lines
-w        pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1        print only first selected line per file
-e expr   use expr as the pattern
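The proposed flag set maps directly onto a POSIX getopt option string, with -f and -e taking arguments. A sketch of the parsing step only; struct opts and parse_opts are hypothetical names, not taken from gre's source, and conflicting flags such as -h with -H are not checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical record of the proposed flags. */
struct opts {
    const char *pattern;    /* -e expr, or NULL */
    const char *patfile;    /* -f file, or NULL */
    int v, i, n, c, l, L, b, h, H, w, one;
};

/* Parse argv with getopt(3).  Returns the index of the first file
 * operand, or -1 on an unknown flag (getopt prints the diagnostic). */
int parse_opts(int argc, char *argv[], struct opts *o)
{
    int ch;
    *o = (struct opts){0};
    while ((ch = getopt(argc, argv, "f:vinclLbhHw1e:")) != -1) {
        switch (ch) {
        case 'f': o->patfile = optarg; break;
        case 'e': o->pattern = optarg; break;
        case 'v': o->v = 1; break;
        case 'i': o->i = 1; break;
        case 'n': o->n = 1; break;
        case 'c': o->c = 1; break;
        case 'l': o->l = 1; break;
        case 'L': o->L = 1; break;
        case 'b': o->b = 1; break;
        case 'h': o->h = 1; break;
        case 'H': o->H = 1; break;
        case 'w': o->w = 1; break;
        case '1': o->one = 1; break;
        default:  return -1;
        }
    }
    return optind;
}
```

For gre -n -w -e foo file1, this sets n, w and pattern and returns 5, the index of file1.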

research!andrew