43BSDReno/share/doc/usd/30.invert/refer - annotate

Annotation of 43BSDReno/share/doc/usd/30.invert/refer, revision 1.1

1.1     ! root        1: .\"    @(#)refer       6.1 (Berkeley) 5/22/86
        !             2: .\"
        !             3: .... refer | tbl | nroff -ms
        !             4: .EH 'USD:30-%''Some Applications of Inverted Indexes on the UNIX System'
        !             5: .OH 'Some Applications of Inverted Indexes on the UNIX System''USD:30-%'
        !             6: .nr LL 6.5i
        !             7: .nr LT 6.5i
        !             8: .de UC
        !             9: \\s-2\\$1\\s0\\$2
        !            10: ..
        !            11: .ds . \&\s+2.\s0
        !            12: .if t .ds -- \(em
        !            13: .if n .ds -- --
        !            14: .TR 69
        !            15: \".TM 77-1274-17 39199 39199-11
        !            16: .ND October 27, 1977
        !            17: .ND June 21, 1978
        !            18: .TL
        !            19: Some Applications of Inverted Indexes on the UNIX System
        !            20: .AU "MH 2C-572" 6377
        !            21: M. E. Lesk
        !            22: .AI
        !            23: .MH
        !            24: .\".AB
        !            25: .\".LP
        !            26: .\".ft B
        !            27: .\"I. Some Applications of Inverted Indexes \- Overview
        !            28: .\".ft R
        !            29: .\".PP
        !            30: .\"This memorandum describes a set of programs which
        !            31: .\"make inverted indexes to
        !            32: .\"UNIX*
        !            33: .\"text files, and their
        !            34: .\"application to
        !            35: .\"retrieving and formatting citations for documents prepared using
        !            36: .\".I troff.
        !            37: .\".PP
        !            38: .\"The indexing and searching programs make keyword
        !            39: .\"indexes to volumes of material too large for linear searching.
        !            40: .\"Searches for combinations of single words can be performed quickly.
        !            41: .\"The programs for general searching are divided into
        !            42: .\"two phases.  The first makes an index from the original
        !            43: .\"data; the second searches the index and retrieves
        !            44: .\"items.
        !            45: .\"Both of these phases are further divided into two parts
        !            46: .\"to separate the data-dependent and algorithm-dependent
        !            47: .\"code.
        !            48: .\".PP
        !            49: .\"The major current application of these programs is
        !            50: .\"the
        !            51: .\".I troff
        !            52: .\"preprocessor
        !            53: .\".I refer.
        !            54: .\"A list of 4300 references is maintained on line,
        !            55: .\"containing primarily papers written and cited by
        !            56: .\"local authors.
        !            57: .\"Whenever one of these references is required
        !            58: .\"in a paper, a few words from the title or author list
        !            59: .\"will retrieve it, and the user need not bother to re-enter
        !            60: .\"the exact citation.
        !            61: .\"Alternatively, authors can use their own lists of papers.
        !            62: .\".PP
        !            63: .\"This memorandum is of interest to
        !            64: .\"those who are interested in facilities for searching large
        !            65: .\"but relatively unchanging text files on
        !            66: .\"the
        !            67: .\"UNIX
        !            68: .\"system,
        !            69: .\"and those who are interested in handling bibliographic
        !            70: .\"citations with
        !            71: .\"UNIX
        !            72: .\".I troff.
        !            73: .\".LP
        !            74: .\".ft B
        !            75: .\"II. Updating Publication Lists
        !            76: .\".PP
        !            77: .\"This section is a brief note describing the
        !            78: .\"auxiliary programs for managing the updating
        !            79: .\"process.
        !            80: .\"It is written to aid clerical users in
        !            81: .\"maintaining lists of references.
        !            82: .\"Primarily, the programs described permit a large
        !            83: .\"amount of individual control over the content
        !            84: .\"of publication lists while retaining the
        !            85: .\"usefulness of the files to other users.
        !            86: .\".LP
        !            87: .\".ft B
        !            88: .\"III. Manual Pages
        !            89: .\".PP
        !            90: .\"This section contains the pages from the
        !            91: .\"UNIX programmer's manual
        !            92: .\"dealing with these commands.
        !            93: .\"It is useful for reference.
        !            94: .\".sp
        !            95: .\"\l'3i'
        !            96: .\".br
        !            97: .\"* UNIX is a trademark of Bell Laboratories.
        !            98: .\".AE
        !            99: .CS 10 4 14 0 0 4
        !           100: .NH
        !           101: Introduction.
        !           102: .PP
        !           103: The
        !           104: .UX
        !           105: system
        !           106: has many utilities
        !           107: (e.g. \fIgrep, awk, lex, egrep, fgrep, ...\fR)
        !           108: to search through files of text,
        !           109: but most of them are based on a linear scan through the
        !           110: entire file, using some deterministic automaton.
        !           111: .ev 1
        !           112: .ps 8
        !           113: .vs 10p
        !           114: .ev
        !           115: This memorandum discusses a program which uses inverted
        !           116: indexes
        !           117: .[
        !           118: %A D. Knuth
        !           119: %T The Art of Computer Programming: Vol. 3, Sorting and Searching
        !           120: %I Addison-Wesley
        !           121: %C Reading, Mass.
        !           122: %D 1977
        !           123: %O See section 6.5.
        !           124: .]
        !           125: and can thus be used on much larger data bases.
        !           126: .PP
        !           127: As with any indexing system, of course, there are some disadvantages;
        !           128: once an index is made, the files that have been indexed cannot be changed
        !           129: without remaking the index.
        !           130: Thus applications are restricted
        !           131: to those making many searches
        !           132: of relatively stable data.
        !           133: Furthermore, these programs depend on hashing, and can only
        !           134: search for exact matches of whole keywords.
        !           135: It is not possible to look for
        !           136: arithmetic or logical expressions (e.g. ``date greater than 1970'') or
        !           137: to perform regular expression searching such as that in
        !           138: .I lex.
        !           139: .[
        !           140: lex lesk cstr
        !           141: .]
        !           142: .PP
        !           143: Currently there are two uses of this software,
        !           144: the
        !           145: .I refer
        !           146: preprocessor to format references,
        !           147: and the
        !           148: .I lookall
        !           149: command to search through all text files on
        !           150: the
        !           151: .UX
        !           152: system.\(dd
        !           153: .FS
        !           154: \(dd \fIlookall\fP is not part of the Berkeley UNIX distribution.
        !           155: .FE
        !           156: .PP
        !           157: The remaining sections of this memorandum discuss
        !           158: the searching programs and their uses.
        !           159: Section 2 explains the operation of the searching algorithm and describes
        !           160: the data collected for use with the
        !           161: .I lookall
        !           162: command.
        !           163: The more important application,
        !           164: .I refer ,
        !           165: is described for users in section 3.
        !           166: Section 4 goes into more detail on
        !           167: reference files
        !           168: for the benefit of those who
        !           169: wish to add references to data bases or
        !           170: write new
        !           171: .I troff
        !           172: macros for use with
        !           173: .I refer.
        !           174: The options to make
        !           175: .I refer
        !           176: collect identical citations, or otherwise relocate and adjust references,
        !           177: are described in section 5.
        !           178: .NH
        !           179: Searching.
        !           180: .PP
        !           181: The indexing and searching process is divided into two phases,
        !           182: each made of two parts.
        !           183: These are
        !           184: shown below.
        !           185: .IP A.
        !           186: Construct the index.
        !           187: .RS
        !           188: .IP (1)
        !           189: Find keys \*(-- turn the input files into a sequence of tags and keys,
        !           190: where each tag identifies a distinct item in the input
        !           191: and the keys for each such item are the strings under which it is
        !           192: to be indexed.
        !           193: .IP (2)
        !           194: Hash and sort \*(--
        !           195: prepare a set of inverted indexes from which, given a set of keys,
        !           196: the appropriate item tags can be found quickly.
        !           197: .RE
        !           198: .IP B.
        !           199: Retrieve an item in response to a query.
        !           200: .RS
        !           201: .IP (3)
        !           202: Search \*(--
        !           203: Given some keys, look through the files prepared by the hashing
        !           204: and sorting facility and derive the appropriate tags.
        !           205: .IP (4)
        !           206: Deliver \*(--
        !           207: Given the tags, find the original items.  This completes the
        !           208: searching process.
        !           209: .RE
        !           210: .LP
        !           211: The first phase, making the index, is presumably done relatively infrequently.
        !           212: It should, of course, be done whenever the data being
        !           213: indexed change.
        !           214: In contrast, the second phase, retrieving items,
        !           215: is presumably done often, and must be rapid.
        !           216: .PP
        !           217: An effort is made to separate code which depends on the data
        !           218: being handled from code which depends on the searching procedure.
        !           219: The search algorithm is involved only in programs
        !           220: (2) and (3), while knowledge of the actual data files is
        !           221: needed only by programs (1) and (4).
        !           222: Thus it is easy to adapt to different data files or different
        !           223: search algorithms.
        !           224: .PP
        !           225: To start with, it is necessary to have some way of selecting
        !           226: or generating keys from input files.
        !           227: For dealing with files that are basically English, we have
        !           228: a key-making program which automatically selects words
        !           229: and passes them to the hashing and sorting program (step 2).
        !           230: The format used has one line for each input item,
        !           231: arranged
        !           232: as follows:
        !           233: .DS
        !           234: name:start,length (tab) key1 key2 key3 ...
        !           235: .DE
        !           236: where
        !           237: .I name
        !           238: is the file name,
        !           239: .I start
        !           240: is the starting byte number,
        !           241: and
        !           242: .I length
        !           243: is the number of bytes in the entry.
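The line format just described can be read and written mechanically.
The following is a minimal sketch in Python, not the code used by the
programs themselves; the names Tag, parse_key_line and format_key_line
are hypothetical, and the sample file name is invented for illustration.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """The first field of a key line: where the item lives."""
    name: str    # file name
    start: int   # starting byte number within the file
    length: int  # number of bytes in the entry


def parse_key_line(line: str) -> tuple[Tag, list[str]]:
    """Split 'name:start,length<TAB>key1 key2 ...' into a tag and its keys."""
    tag_field, _, key_field = line.rstrip("\n").partition("\t")
    # Split on the last colon so a file name containing ':' still parses.
    name, _, span = tag_field.rpartition(":")
    start, length = span.split(",")
    return Tag(name, int(start), int(length)), key_field.split()


def format_key_line(tag: Tag, keys: list[str]) -> str:
    """Inverse of parse_key_line."""
    return f"{tag.name}:{tag.start},{tag.length}\t{' '.join(keys)}"


if __name__ == "__main__":
    tag, keys = parse_key_line("refs/papers:120,85\tlesk invert indexe unix")
    print(tag, keys)
```

Because the tag carries the file name, byte offset and byte count, a
delivery step can seek directly to the item without scanning the file.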
        !           244: .PP
        !           245: These lines are the only input used to make the
        !           246: index.
        !           247: The first field (the file name, byte position, and byte count)
        !           248: is the tag of the item
        !           249: and can be used to retrieve it quickly.
        !           250: Normally, an item is either a whole file or a section of a file
        !           251: delimited by blank lines.
        !           252: After the tab, the second field contains the keys.
        !           253: The keys, if selected by the automatic program, are
        !           254: any alphanumeric strings which
        !           255: are not among the 100 most frequent words in English
        !           256: and which are not entirely numeric (except for four-digit
        !           257: numbers beginning 19, which are accepted as dates).
        !           258: Keys are truncated to six characters and converted to lower case.
        !           259: Some selection is needed if the original items are very large.
        !           260: We normally just take the first
        !           261: .I n
        !           262: keys, with
        !           263: .I n
        !           264: less than 100 or so; this replaces any attempt at intelligent selection.
        !           265: One file in our system is
        !           266: a complete English dictionary; if every word in it were used as a key, it would presumably be retrieved for all queries.
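The key selection rules in this paragraph can be sketched as follows.
This is an illustration in Python under stated assumptions, not the
actual key-making program: the function name select_keys is
hypothetical, and COMMON_WORDS is a ten-word stand-in for the real
list of the 100 most frequent English words.

```python
import re

# Stand-in only: the real list holds the 100 most frequent English words.
COMMON_WORDS = frozenset(
    {"the", "of", "and", "to", "a", "in", "is", "that", "for", "on"}
)


def select_keys(text, common=COMMON_WORDS, max_keys=100):
    """Return the first max_keys keys found in text, in order of appearance."""
    keys = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        lower = word.lower()
        if lower in common:
            continue  # too frequent to be a useful key
        is_date = len(lower) == 4 and lower.startswith("19")
        if lower.isdigit() and not is_date:
            continue  # entirely numeric and not a four-digit 19xx date
        keys.append(lower[:6])  # truncate to six characters
        if len(keys) >= max_keys:
            break  # take only the first n keys of a long item
    return keys


if __name__ == "__main__":
    print(select_keys("Some Applications of Inverted Indexes on the UNIX System, 1978"))
```

Taking the first n keys rather than trying to rank them keeps the
selection step cheap, at the cost of ignoring everything after the
opening of a very long item.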
        !           267: .PP
        !           268: To generate an inverted index to the list of record tags and keys,
        !           269: the keys
        !           270: are hashed
        !           271: and sorted to produce an index.
        !           272: What is wanted, ideally, is a series of lists showing the tags associated
        !           273: with each key.
        !           274: To condense this,
        !           275: what is actually produced is a list showing the tags associated
        !           276: with each hash code, and thus with some set of keys.
        !           277: To speed up access and further save space,
        !           278: a set of three or possibly four files is produced.
        !           279: These files are:
        !           280: .KS
        !           281: .bd 2 2
        !           282: .TS
        !           283: center;
        !           284: c c
        !           285: lI l.
        !           286: File   Contents
        !           287: entry  Pointers to posting file
        !           288:        for each hash code
        !           289: posting        Lists of tag pointers for
        !           290:        each hash code
        !           291: tag    Tags for each item
        !           292: key    Keys for each item
        !           293:        (optional)
        !           294: .TE
        !           295: .bd 2
        !           296: .KE
        !           297: The posting file comprises the real data: it contains a sequence of lists
        !           298: of items posted under each hash code.  To speed up searching,
        !           299: the entry file is an array of pointers into the posting file, one per potential
        !           300: hash code.
        !           301: Furthermore, the items in the lists in the posting file are not referred to by their
        !           302: complete tag, but just by an address in the tag file, which
        !           303: gives the complete tags.
        !           304: The key file is optional and contains a copy of the keys
        !           305: used in the indexing.
        !           306: .PP
        !           307: The searching process starts with a query, containing several keys.
        !           308: The goal is to obtain all items which were indexed under these keys.
        !           309: The query keys are hashed, and the pointers in the entry file used
        !           310: to access the lists in the posting file.  These lists
        !           311: are addresses in the tag file of documents posted under the
        !           312: hash codes derived from the query.
        !           313: The common items from all lists are determined;
        !           314: this must include the items indexed by every key, but may also
        !           315: contain some items which are false drops, since items referenced by
        !           316: the correct hash codes need not actually have contained the correct keys.
        !           317: Normally, if there are several keys in the query, there are not
        !           318: likely to be many false drops in the final combined list even though
        !           319: each hash code is somewhat ambiguous.
        !           320: The actual tags are then obtained from the tag file, and to guard against
        !           321: the possibility that an item has false-dropped on some hash code
        !           322: in the query, the original items are normally obtained from the delivery
        !           323: program (4) and the query keys checked against them
        !           324: by string comparison.
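The search procedure above can be sketched in a few lines of Python.
This is a simplified in-memory model, not the real file layout: the
entry and posting files are collapsed into one list of posting lists
indexed by hash code, the hash function is an arbitrary stand-in, and
the names hash_key, build_index, search and fetch_item are all
hypothetical.

```python
def hash_key(key, table_size):
    """Stand-in hash function; the real one is not described here."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h


def build_index(items, table_size=997):
    """items is a list of (tag, keys) pairs.

    Returns (postings, tags): postings[code] lists positions in tags,
    so items are referred to by address rather than by complete tag.
    """
    tags = [tag for tag, _ in items]
    postings = [[] for _ in range(table_size)]
    for tag_no, (_, keys) in enumerate(items):
        for code in {hash_key(k, table_size) for k in keys}:
            postings[code].append(tag_no)
    return postings, tags


def search(query_keys, postings, tags, fetch_item, table_size=997):
    """Intersect the posting lists, then discard false drops."""
    candidates = None
    for key in query_keys:
        drops = set(postings[hash_key(key, table_size)])
        candidates = drops if candidates is None else candidates & drops
    hits = []
    for tag_no in sorted(candidates or ()):
        # Check the query keys against the original item by string comparison.
        text = fetch_item(tags[tag_no]).lower()
        if all(key in text for key in query_keys):
            hits.append(tags[tag_no])
    return hits


if __name__ == "__main__":
    store = {
        "a:0,21": "Lesk inverted indexes",
        "b:0,23": "Knuth sorting searching",
    }
    items = [("a:0,21", ["lesk", "invert", "indexe"]),
             ("b:0,23", ["knuth", "sortin", "search"])]
    postings, tags = build_index(items)
    print(search(["lesk", "invert"], postings, tags, store.__getitem__))
```

Setting table_size to 1 in this sketch makes every key collide, so
every item becomes a candidate; the final string comparison is then
the only thing that removes the false drops.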
        !           325: .PP
        !           326: Usually, therefore, the check for bad drops is made against the original file.
        !           327: However, if the key derivation procedure is complex, it may be preferable
        !           328: to check against the keys fed to program (2).
        !           329: In this case the optional key file which contains the
        !           330: keys associated with each item is generated, and the item tag is supplemented
        !           331: by a string
        !           332: .DS
        !           333: ;start,length
        !           334: .DE
        !           335: which indicates the starting byte number in the key file and the length of
        !           336: the string of keys for each item.
        !           337: This file is not usually necessary with the present
        !           338: key-selection program, since the keys
        !           339: always appear in the original document.
        !           340: .PP
        !           341: There is also an option
        !           342: (\f3-C\f2n\|\f1)
        !           343: for coordination level searching.
        !           344: This retrieves items which match all but
        !           345: .I n
        !           346: of the query keys.
        !           347: The items are retrieved in the order of the number
        !           348: of keys that they match.
        !           349: Of course,
        !           350: .I n
        !           351: must be less than the number of query keys (nothing is
        !           352: retrieved unless it matches at least one key).
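Coordination level searching can be illustrated with a short Python
sketch. It works on the keys of each item directly rather than on hash
codes, and it assumes that the items matching the most keys are
delivered first; the text does not say which way the ordering runs.
The name coordination_search and the sample data are hypothetical.

```python
def coordination_search(query_keys, keys_by_tag, n):
    """Return tags of items matching all but at most n of the query keys."""
    wanted = set(query_keys)
    if not 0 <= n < len(wanted):
        raise ValueError("n must be less than the number of query keys")
    threshold = len(wanted) - n  # always at least one key must match
    counts = {tag: len(wanted & set(keys)) for tag, keys in keys_by_tag.items()}
    matching = [tag for tag, count in counts.items() if count >= threshold]
    # Assumed ordering: most matched keys first; ties keep their input order.
    return sorted(matching, key=lambda tag: -counts[tag])


if __name__ == "__main__":
    keys_by_tag = {
        "a": ["lesk", "invert", "indexe"],
        "b": ["lesk", "unix"],
        "c": ["knuth"],
    }
    print(coordination_search(["lesk", "invert", "indexe"], keys_by_tag, 2))
```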
        !           353: .PP
        !           354: As an example, consider one set of 4377 references, comprising
        !           355: 660,000 bytes.
        !           356: This included 51,000 keys, of which 5,900 were distinct
        !           357: keys.
        !           358: The hash table is kept full to save space (at the expense of time);
        !           359: 995 of 997 possible hash codes were used.
        !           360: The total set of index files (no key file) occupied 171,000 bytes,
        !           361: about 26% of the original file size.
        !           362: It took 8 minutes of processor time to
        !           363: hash, sort, and write the index.
        !           364: To search for a single query with the resulting index took 1.9 seconds
        !           365: of processor time,
        !           366: while to find the same paper
        !           367: with a sequential linear search
        !           368: using
        !           369: .I grep
        !           370: (reading all of the tags and keys)
        !           371: took 12.3 seconds of processor time.
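The ratios implied by these figures can be checked with a few lines of
Python. All of the input numbers are taken from the paragraph above;
the derived quantities (keys per hash code, occurrences per key, and
the speed ratio) are not stated in the text and are computed here only
as an illustration.

```python
total_bytes = 660_000     # size of the reference file
index_bytes = 171_000     # size of the index files, no key file
total_keys = 51_000       # keys extracted from the references
distinct_keys = 5_900     # distinct keys among them
codes_used = 995          # hash codes in use, of 997 possible
index_seconds = 1.9       # processor time for one indexed query
grep_seconds = 12.3       # processor time for the same search with grep

index_ratio = index_bytes / total_bytes
keys_per_code = distinct_keys / codes_used
occurrences_per_key = total_keys / distinct_keys
speedup = grep_seconds / index_seconds

print(f"index size: {index_ratio:.0%} of the original file")
print(f"distinct keys per hash code in use: {keys_per_code:.1f}")
print(f"average occurrences of each distinct key: {occurrences_per_key:.1f}")
print(f"indexed search is {speedup:.1f} times faster than grep")
```

Several distinct keys share each hash code in this example, which is
why a single-key query can produce false drops and why the final check
against the original items matters.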
        !           372: .PP
        !           373: We have also used this software to index all of the English stored on our
        !           374: .UX
        !           375: system.
        !           376: This is the index searched by the
        !           377: .I lookall
        !           378: command.
        !           379: On a typical day there were
        !           380: 29,000 files in our user file system, containing about 152,000,000
        !           381: bytes.
        !           382: Of these
        !           383: 5,300 files, containing 32,000,000 bytes (about 21%)
        !           384: were English text.
        !           385: The total number of `words' (determined mechanically)
        !           386: was 5,100,000.
        !           387: Of these 227,000 were selected as keys;
        !           388: 19,000 were distinct, hashing to 4,900 (of 5,000 possible) different hash codes.
        !           389: The
        !           390: resulting inverted file indexes used 845,000 bytes, or about
        !           391: 2.6% of the size of the original files.
        !           392: The particularly small indexes are caused by the
        !           393: fact that keys are taken from only the first 50 non-common words of
        !           394: some very long input files.
        !           395: .PP
        !           396: Even this large \f2lookall\f1 index can be searched quickly.
        !           397: For example, to find this document
        !           398: by looking for the keys
        !           399: ``lesk inverted indexes''
        !           400: required
        !           401: 1.7 seconds of processor time
        !           402: and system time.
        !           403: By comparison, just to search the 800,000 byte dictionary (smaller than even
        !           404: the inverted indexes, let alone the 27,000,000 bytes of text files) with
        !           405: .I grep
        !           406: takes 29 seconds of processor time.
        !           407: The
        !           408: .I lookall
        !           409: program is thus useful when looking for a document which you believe
        !           410: is stored on-line, but do not know where.  For example, many memos
        !           411: from our center are in the file system, but it is often
        !           412: difficult to guess where a particular memo might be (it might have several
        !           413: authors, each with many directories, and have been worked on by
        !           414: a secretary with yet more directories).
        !           415: Instructions for the use of the
        !           416: .I lookall
        !           417: command are given in the manual section, shown
        !           418: in the appendix to this memorandum.
        !           419: .PP
        !           420: The only indexes maintained routinely are those of publication lists and
        !           421: all English files.
        !           422: To make other indexes, the programs for making keys, sorting them,
        !           423: searching the indexes, and delivering answers must be used.
        !           424: Since they are usually invoked as parts of higher-level commands,
        !           425: they are not in the default command
        !           426: directory, but are available to any user in the directory
        !           427: .I /usr/lib/refer .
        !           428: Three programs are of interest:
        !           429: .I mkey ,
        !           430: which isolates keys from input files;
        !           431: .I inv ,
        !           432: which makes an index from a set of keys;
        !           433: and
        !           434: .I hunt ,
        !           435: which searches the index and delivers the items.
        !           436: Note that the two parts of the retrieval phase are combined into
        !           437: one program, to avoid the excessive system work and delay which
        !           438: would result from running these as separate processes.
        !           439: .PP
        !           440: These three commands have a large number of options to adapt to different
        !           441: kinds of input.
        !           442: The user not interested in the detailed description that now follows may
        !           443: skip to section 3, which describes the
        !           444: .I refer
        !           445: program, a packaged-up version of these tools specifically
        !           446: oriented towards formatting references.
        !           447: .PP
        !           448: .B
        !           449: Make Keys.
        !           450: .R
        !           451: The program
        !           452: .I mkey
        !           453: is the key-making program corresponding to step (1) in phase A.
        !           454: Normally, it reads its input from the file names given as arguments,
        !           455: and if there are no arguments it reads from the standard input.
        !           456: It assumes that blank lines in the input delimit
        !           457: separate items, for each of which a different line of
        !           458: keys should be generated.
        !           459: The lines of keys are written on the standard output.
        !           460: Keys are any alphanumeric string in the input not
        !           461: among the most frequent words in English and not entirely numeric
        !           462: (except that all-numeric strings are acceptable if they
        !           463: are between 1900 and 1999).
        !           464: In the output, keys are translated to lower case, and truncated
        !           465: to six characters in length; any associated punctuation is removed.
        !           466: The following flag arguments are recognized by
        !           467: .I mkey:
        !           468: .TS
        !           469: center;
        !           470: lB lw(4i).
        !           471: \-c \f2name    T{
        !           472: Name of file of common words;
        !           473: default is
        !           474: .I /usr/lib/eign.
        !           475: T}
        !           476: \-f \f2name    T{
        !           477: Read a list of files from
        !           478: .I name
        !           479: and take each as an input argument.
        !           480: T}
        !           481: \-i \f2chars   T{
        !           482: Ignore all lines which begin with `%' followed by any character
        !           483: in
        !           484: .I chars .
        !           485: T}
        !           486: \-k\f2n        T{
        !           487: Use at most
        !           488: .I n
        !           489: keys per input item.
        !           490: T}
        !           491: \-l\f2n        T{
        !           492: Ignore items shorter than
        !           493: .I n
        !           494: letters.
        !           495: T}
        !           496: \-n\f2m        T{
        !           497: Ignore as a key any word in the first
        !           498: .I m
        !           499: words of the list of common English words.
        !           500: The default is 100.
        !           501: T}
        !           502: \-s    T{
        !           503: Remove the labels
        !           504: .I (file:start,length)
        !           505: from the output; just give the keys.
        !           506: Used when searching rather than indexing.
        !           507: T}
        !           508: \-w    T{
        !           509: Each whole file is a separate item;
        !           510: blank lines in files are irrelevant.
        !           511: T}
        !           512: .TE
        !           513: .PP
        !           514: The normal arguments for indexing references are
        !           515: the defaults, which are
        !           516: .I "\-c /usr/lib/eign" ,
        !           517: .I \-n100 ,
        !           518: and
        !           519: .I \-l3 .
        !           520: For searching, the
        !           521: .I \-s
        !           522: option is also needed.
        !           523: When the big
        !           524: .I lookall
        !           525: index of all English files is run,
        !           526: the options are
        !           527: .I \-w ,
        !           528: .I \-k50 ,
        !           529: and
        !           530: .I "\-f (filelist)" .
        !           531: When running on textual input,
        !           532: the
        !           533: .I mkey
        !           534: program processes about 1000 English words per processor second.
        !           535: Unless the
        !           536: .I \-k
        !           537: option is used (and the input files are long enough for
        !           538: it to take effect)
        !           539: the output of
        !           540: .I mkey 
        !           541: is comparable in size to its input.
        !           542: .PP
        !           543: .B
        !           544: Hash and invert.
        !           545: .R
        !           546: The
        !           547: .I inv
        !           548: program computes the hash codes and writes
        !           549: the inverted files.
        !           550: It reads the output of
        !           551: .I mkey
        !           552: and writes the set of files described earlier
        !           553: in this section.
        !           554: It expects one argument, which is used as the base name for
        !           555: the three (or four) files to be written.
        !           556: Assuming an argument of
        !           557: .I Index
        !           558: (the default)
        !           559: the entry file is named
        !           560: .I Index.ia ,
        !           561: the posting file
        !           562: .I Index.ib ,
        !           563: the tag file
        !           564: .I Index.ic ,
        !           565: and the key file (if present)
        !           566: .I Index.id .
        !           567: The
        !           568: .I inv
        !           569: program recognizes the following options:
        !           570: .TS
        !           571: center;
        !           572: lB lw(4i).
        !           573: \-a    T{
        !           574: Append the new keys to a previous set of inverted files,
        !           575: making new files if there is no old set using the same base name.
        !           576: T}
        !           577: \-d    T{
        !           578: Write the optional key file.
        !           579: This is needed when you cannot check for false drops by looking
        !           580: for the keys in the original inputs, i.e. when the key derivation
        !           581: procedure is complicated and
        !           582: the output keys are not words from the input files.
        !           583: T}
        !           584: \-h\f2n        T{
        !           585: The hash table size is
        !           586: .I n
        !           587: (default 997);
        !           588: .I n
        !           589: should be prime.
        !           590: Making \f2n\f1 bigger saves search time and spends disk space.
        !           591: T}
        !           592: \-i[u] \f2name T{
        !           593: Take input from file
        !           594: .I name ,
        !           595: instead of the standard input;
        !           596: if
        !           597: .B u
        !           598: is present
        !           599: .I name
        !           600: is unlinked when the sort is started.
        !           601: Using this option permits the sort scratch space
        !           602: to overlap the disk space used for input keys.
        !           603: T}
        !           604: \-n    T{
        !           605: Make a completely new set of inverted files, ignoring
        !           606: previous files.
        !           607: T}
        !           608: \-p    T{
        !           609: Pipe into the sort program, rather than writing a temporary
        !           610: input file.
        !           611: This saves disk space and spends processor time.
        !           612: T}
        !           613: \-v    T{
        !           614: Verbose mode; print a summary of the number of keys that
        !           615: were indexed.
        !           616: T}
        !           617: .TE
        !           618: .PP
        !           619: About half the time used in
        !           620: .I inv
        !           621: is in the contained sort.
        !           622: Assuming the sort is roughly linear, however,
        !           623: a guess at the total timing for
        !           624: .I inv
        !           625: is 250 keys per second.
        !           626: The space used is usually of more importance:
        !           627: the entry file uses four bytes per possible hash (note
        !           628: the
        !           629: .B \-h
        !           630: option),
        !           631: and the tag file around 15-20 bytes per item indexed.
        !           632: Roughly, the posting file contains one item for each key instance
        !           633: and one item for each possible hash code; the items are two bytes
        !           634: long if the tag file is less than 65536 bytes long, and the
        !           635: items are four bytes wide if the tag file is greater than
        !           636: 65536 bytes long.
        !           637: Note that to minimize storage, the hash tables should be
        !           638: over-full;
        !           639: for most of the files indexed in this way, there is no
        !           640: other real choice, since the
        !           641: .I entry
        !           642: file must fit in memory.
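The space figures above can be turned into a rough estimate. The following is a minimal sketch, not part of inv; the function name, the default of 18 bytes per tag (a midpoint of the 15-20 byte range quoted above), and the example key count are assumptions made for illustration. The text does not say which item width applies to a tag file of exactly 65536 bytes, so the sketch arbitrarily uses four bytes in that case.

```python
def estimate_index_bytes(hash_size, items, key_instances, tag_bytes_per_item=18):
    """Rough size estimate for the entry, tag and posting files."""
    # Entry file: four bytes per possible hash code (see the -h option).
    entry = 4 * hash_size
    # Tag file: around 15-20 bytes per item indexed; 18 is an assumed midpoint.
    tag = tag_bytes_per_item * items
    # Posting items are two bytes wide for a small tag file, four otherwise.
    width = 2 if tag < 65536 else 4
    # Posting file: one item per key instance plus one per possible hash code.
    posting = width * (key_instances + hash_size)
    return {"entry": entry, "tag": tag, "posting": posting,
            "total": entry + tag + posting}


if __name__ == "__main__":
    # 997 is the default hash table size; the other figures are hypothetical.
    print(estimate_index_bytes(hash_size=997, items=4300, key_instances=60000))
```

With the default table size the entry file is small; most of the space goes to the posting and tag files, which grow with the number of key instances and items.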
        !           643: .PP
        !           644: .B
        !           645: Searching and Retrieving.
        !           646: .R
        !           647: The
        !           648: .I hunt
        !           649: program retrieves items from an index.
        !           650: It combines, as mentioned above, the two parts of phase (B):
        !           651: search and delivery.
        !           652: Combining delivery and search is efficient
        !           653: partly because it avoids starting unnecessary processes, and partly
        !           654: because the delivery operation must be a part of the search
        !           655: operation in any case.
        !           656: Because of the hashing, the search part takes place in two stages:
        !           657: first items are retrieved which have the right hash codes associated with them,
        !           658: and then the actual items are inspected to determine false drops, i.e.
        !           659: to determine if anything with the right hash codes doesn't really have the right
        !           660: keys.
        !           661: Since the original item is retrieved to check on false drops,
        !           662: it is efficient to present it immediately, rather than only
        !           663: giving the tag as output and later retrieving the
        !           664: item again.
        !           665: If there were a separate key file, this argument would not apply,
        !           666: but separate key files are not common.
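The two-stage search can be illustrated with a small model. This is only a sketch under stated assumptions: the hash function, the tiny table size, and the names build_index and hunt_sketch are invented here, and whole lower-case words stand in for the keys that mkey would produce. Stage one intersects the postings for each hash code; stage two inspects the original items to discard false drops.

```python
from collections import defaultdict

HASH_SIZE = 7  # deliberately tiny so that hash collisions (false drops) can occur


def key_hash(key, size=HASH_SIZE):
    # Placeholder hash; the function actually used by inv is not described here.
    return sum(key.encode()) % size


def build_index(items):
    """Map each hash code to the set of item numbers (tags) posted under it."""
    postings = defaultdict(set)
    for tag, text in enumerate(items):
        for word in text.lower().split():
            postings[key_hash(word)].add(tag)
    return postings


def hunt_sketch(query, items, postings):
    """Return (candidates, hits): all drops by hash code, then the real answers."""
    keys = query.lower().split()
    # Stage 1: keep only items posted under every query key's hash code.
    candidates = None
    for key in keys:
        tags = postings.get(key_hash(key), set())
        candidates = tags if candidates is None else candidates & tags
    candidates = candidates or set()
    # Stage 2: look at the original item to weed out false drops.
    hits = [tag for tag in sorted(candidates)
            if all(key in items[tag].lower().split() for key in keys)]
    return sorted(candidates), hits


if __name__ == "__main__":
    items = ["kernighan cherry typesetting mathematics",
             "aho hirschberg ullman subsequence",
             "lesk inverted indexes unix"]
    print(hunt_sketch("lesk unix", items, build_index(items)))
```

Because stage two already has the item text in hand, returning it at once costs nothing extra, which is the argument made above for combining search and delivery.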
        !           667: .PP
        !           668: Input to
        !           669: .I hunt
        !           670: is taken from the standard input,
        !           671: one query per line.
        !           672: Each query should be in
        !           673: .I "mkey \-s"
        !           674: output format;
        !           675: all lower case, no punctuation.
        !           676: The
        !           677: .I hunt
        !           678: program takes one argument which specifies the base name of the index
        !           679: files to be searched.
        !           680: Only one set of index files can be searched at a time,
        !           681: although many text files may be indexed as a group, of course.
        !           682: If one of the text files has been changed since the index was made, that file
        !           683: is searched with
        !           684: .I fgrep;
        !           685: this may occasionally slow down the searching, and care should be taken to
        !           686: avoid having many out of date files.
        !           687: The following option arguments are recognized by
        !           688: .I hunt:
        !           689: .TS
        !           690: center;
        !           691: lB lw(4i).
        !           692: \-a    T{
        !           693: Give all output; skip the check for false drops.
        !           694: T}
        !           695: \-C\f2n        T{
        !           696: Coordination level
        !           697: .I n;
        !           698: retrieve items with not more than
        !           699: .I n
        !           700: terms of the input missing;
        !           701: default
        !           702: .I C0 ,
        !           703: implying that each search term must be in the output items.
        !           704: T}
        !           705: \-F[yn\f2d\f3\|]       T{
        !           706: ``\-Fy'' gives the text of all the items found;
        !           707: ``\-Fn'' suppresses them.
        !           708: ``\-F\f2d\|\f1'' where \f2d\f1\| is an integer
        !           709: gives the text of the first \f2d\f1 items.
        !           710: The default is
        !           711: .I \-Fy.
        !           712: T}
        !           713: \-g    T{
        !           714: Do not use
        !           715: .I fgrep
        !           716: to search files changed since the index was made;
        !           717: print an error comment instead.
        !           718: T}
        !           719: \-i \f2string  T{
        !           720: Take
        !           721: .I string
        !           722: as input, instead of reading the standard input.
        !           723: T}
        !           724: \-l \f2n       T{
        !           725: The maximum length of internal lists of candidate
        !           726: items is
        !           727: .I n;
        !           728: default 1000.
        !           729: T}
        !           730: \-o \f2string  T{
        !           731: Put text output (``\-Fy'') in
        !           732: .I string;
        !           733: of use
        !           734: .I only
        !           735: when
        !           736: invoked from another program.
        !           737: T}
        !           738: \-p    T{
        !           739: Print hash code frequencies; mostly
        !           740: for use in optimizing hash table sizes.
        !           741: T}
        !           742: \-T[yn\f2d\|\f3]       T{
        !           743: ``\-Ty'' gives the tags of the items found;
        !           744: ``\-Tn'' suppresses them.
        !           745: ``\-T\f2d\f1\|'' where \f2d\f1\| is an integer
        !           746: gives the first \f2d\f1 tags.
        !           747: The default is
        !           748: .I \-Tn .
        !           749: T}
        !           750: \-t \f2string  T{
        !           751: Put tag output (``\-Ty'') in
        !           752: .I string;
        !           753: of use
        !           754: .I only
        !           755: when invoked from another program.
        !           756: T}
        !           757: .TE
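The coordination level set by \-C can be read as a simple counting rule. The sketch below shows that rule only; the function name and the use of Python sets are assumptions, and it says nothing about how hunt implements the test internally.

```python
def coordinate(query_keys, item_keys, level=0):
    """True if no more than `level` query terms are missing from the item."""
    missing = sum(1 for key in query_keys if key not in item_keys)
    return missing <= level


if __name__ == "__main__":
    item = {"kernighan", "cherry", "typesetting", "mathematics"}
    query = ["kernighan", "cherry", "acm"]
    print(coordinate(query, item, level=0))  # default C0: every term must be present
    print(coordinate(query, item, level=1))  # C1: one missing term is tolerated
```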
        !           758: .PP
        !           759: The timing of
        !           760: .I hunt
        !           761: is complex.
        !           762: Normally the hash table is overfull, so that there will
        !           763: be many false drops on any single term;
        !           764: but a multi-term query will have few false drops on
        !           765: all terms.
        !           766: Thus if a query is underspecified (one search term)
        !           767: many potential items will be examined and discarded as false
        !           768: drops, wasting time.
        !           769: If the query is overspecified (a dozen search terms)
        !           770: many keys will be examined only to verify that
        !           771: the single item under consideration has that key posted.
        !           772: The variation of search time with number of keys is
        !           773: shown in the table below.
        !           774: Queries of varying length were constructed to retrieve
        !           775: a particular document from the file of references.
        !           776: In the sequence on the left, search terms were chosen so as
        !           777: to select the desired paper as quickly as possible.
        !           778: In the sequence on the right, terms were chosen inefficiently,
        !           779: so that the query did not uniquely select the desired document
        !           780: until four keys had been used.
        !           781: The same document was the target in each case,
        !           782: and the final sets of eight keys are also identical; the differences
        !           783: at five, six and seven keys are produced by measurement error, not
        !           784: by the slightly different key lists.
        !           785: .TS
        !           786: center;
        !           787: c   s   s   s5  | c   s   s   s
        !           788: cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8
        !           789: cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8
        !           790: n   n   n   n   | n   n   n   n  .
        !           791: Efficient Keys Inefficient Keys
        !           792: No. keys       Total drops     Retrieved       Search time     No. keys        Total drops     Retrieved       Search time
        !           793:        (incl. false)   Documents       (seconds)               (incl. false)   Documents       (seconds)
        !           794: 1      15      3       1.27    1       68      55      5.96
        !           795: 2      1       1       0.11    2       29      29      2.72
        !           796: 3      1       1       0.14    3       8       8       0.95
        !           797: 4      1       1       0.17    4       1       1       0.18
        !           798: 5      1       1       0.19    5       1       1       0.21
        !           799: 6      1       1       0.23    6       1       1       0.22
        !           800: 7      1       1       0.27    7       1       1       0.26
        !           801: 8      1       1       0.29    8       1       1       0.29
        !           802: .TE
        !           803: As would be expected, the optimal search is achieved
        !           804: when the query just specifies the answer; however,
        !           805: overspecification is quite cheap.
        !           806: Roughly, the time required by
        !           807: .I hunt
        !           808: can be approximated as
        !           809: 30 milliseconds per search key plus 75 milliseconds
        !           810: per dropped document (whether it is a false drop or
        !           811: a real answer).
        !           812: In general, overspecification can be recommended;
        !           813: it protects the user against additions to the data base
        !           814: which turn previously uniquely-answered queries
        !           815: into ambiguous queries.
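The approximation just given is easy to check against the table. The following sketch, with an invented function name, applies 30 milliseconds per search key and 75 milliseconds per dropped document to a few rows copied from the table; it is a rough model and is not expected to reproduce the measurements exactly.

```python
def hunt_seconds(search_keys, drops, per_key=0.030, per_drop=0.075):
    """Approximate processor time for one search, in seconds."""
    return per_key * search_keys + per_drop * drops


if __name__ == "__main__":
    # (keys, total drops, measured seconds), rows taken from the table above.
    rows = [(1, 15, 1.27), (2, 1, 0.11), (8, 1, 0.29),
            (1, 68, 5.96), (3, 8, 0.95)]
    for keys, drops, measured in rows:
        model = hunt_seconds(keys, drops)
        print(f"{keys} keys, {drops:2d} drops: model {model:.2f}s, measured {measured:.2f}s")
```

The model makes the point of the paragraph concrete: each extra key costs far less than a single extra dropped document.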
        !           816: .PP
        !           817: The careful reader will have noted an enormous discrepancy between these times
        !           818: and the earlier quoted time of around 1.9 seconds for a search.  The times
        !           819: here are purely for the search and retrieval: they are measured by
        !           820: running many searches through a single invocation of the
        !           821: .I hunt
        !           822: program alone.
        !           823: The normal retrieval operation involves using the shell to
        !           824: set up a pipeline through
        !           825: .I mkey
        !           826: to
        !           827: .I hunt
        !           828: and starting both processes; this adds a fixed overhead of about 1.7 seconds
        !           829: of processor time
        !           830: to any single search.
        !           831: Furthermore, remember that all these times are processor times:
        !           832: on a typical morning on our \s-2PDP\s0 11/70 system, with about one dozen
        !           833: people logged on,
        !           834: to obtain 1 second of processor time for the search program
        !           835: took between 2 and 12 seconds of real time, with a median of
        !           836: 3.9 seconds and a mean of 4.8 seconds.
        !           837: Thus, although the work involved in a single search may be only
        !           838: 200 milliseconds, after you add the 1.7 seconds of startup processor
        !           839: time
        !           840: and then assume a 4:1 elapsed/processor time
        !           841: ratio, it will be 8 seconds before any response is printed.
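The arithmetic behind the 8 second figure can be written out as follows. The function name and parameter names are invented for this sketch; the default values are the ones quoted in the paragraph above.

```python
def response_seconds(search_cpu=0.2, startup_cpu=1.7, elapsed_ratio=4.0):
    """Estimated real time before a response appears, in seconds."""
    # Processor time for the search plus fixed startup, scaled by system load.
    return (search_cpu + startup_cpu) * elapsed_ratio


if __name__ == "__main__":
    print(f"{response_seconds():.1f} seconds")
```

With the defaults this gives 7.6 seconds, which the text rounds to 8; the startup overhead, not the search itself, accounts for most of the wait.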
        !           842: .NH
        !           843: Selecting and Formatting References for T\s-2ROFF\s0
        !           844: .PP
        !           845: The major application of the retrieval software
        !           846: is
        !           847: .I refer,
        !           848: which is a
        !           849: .I troff
        !           850: preprocessor
        !           851: like
        !           852: .I eqn .
        !           853: .[
        !           854: kernighan cherry acm 1975
        !           855: .]
        !           856: It scans its input looking for items of the form
        !           857: .DS
        !           858: \*.[
        !           859: imprecise citation
        !           860: \*.\^]
        !           861: .DE
        !           862: where an imprecise citation is merely a string
        !           863: of words found in the relevant bibliographic citation.
        !           864: This is translated into a properly formatted reference.
        !           865: If the imprecise citation does not correctly identify
        !           866: a single paper
        !           867: (either
        !           868: selecting no papers or too many) a message is given.
        !           869: The data base of citations searched may be tailored to each
        !           870: system, and individual users may specify their own
        !           871: citation
        !           872: files.
        !           873: On our system, the default data base is accumulated from
        !           874: the publication lists of the members of our organization, plus
        !           875: about half a dozen personal bibliographies that were collected.
        !           876: The present total is about 4300 citations, but this increases steadily.
        !           877: Even now,
        !           878: the data base covers a large fraction of local citations.
        !           879: .PP
        !           880: For example, the reference for the
        !           881: .I eqn
        !           882: paper above was specified as
        !           883: .DS
        !           884: \&\*.\*.\*.
        !           885: \&preprocessor like
        !           886: \&.I eqn.
        !           887: \&.[
        !           888: \&kernighan cherry acm 1975
        !           889: \&.]
        !           890: \&It scans its input looking for items
        !           891: \&\*.\*.\*.
        !           892: .DE
        !           893: This paper was itself printed using
        !           894: .I refer.
        !           895: The above input text was processed by
        !           896: .I refer
        !           897: as well as
        !           898: .I tbl
        !           899: and
        !           900: .I troff
        !           901: by the command
        !           902: .DS
        !           903: .ft I
        !           904: refer memo-file | tbl | troff \-ms
        !           905: .ft R
        !           906: .DE
        !           907: and the reference was automatically translated into a correct
        !           908: citation to the ACM paper on mathematical typesetting.
        !           909: .PP
        !           910: The procedure to use to place a reference in a paper
        !           911: using
        !           912: .I refer
        !           913: is as follows.
        !           914: First, use the
        !           915: .I lookbib
        !           916: command to check that the paper is in the data base
        !           917: and to find out what keys are necessary to retrieve it.
        !           918: This is done by typing
        !           919: .I lookbib
        !           920: and then typing some potential queries until
        !           921: a suitable query is found.
        !           922: For example, had one started to find
        !           923: the
        !           924: .I eqn
        !           925: paper shown above by presenting the query
        !           926: .DS
        !           927:        $ lookbib
        !           928:        kernighan cherry
        !           929:        (EOT)
        !           930: .DE
        !           931: .I lookbib
        !           932: would have found several items; experimentation would quickly
        !           933: have shown that the query given above is adequate.
        !           934: Overspecifying the query is of course harmless.
        !           935: A particularly careful reader may have noticed that ``acm'' does not
        !           936: appear in the printed citation;
        !           937: we have supplemented some of the data base items with common
        !           938: extra keywords, such as common abbreviations for journals
        !           939: or other sources, to aid in searching.
        !           940: .PP
        !           941: If the reference is in the data base, the query
        !           942: that retrieved it can be inserted in the text,
        !           943: between
        !           944: .B \*.[
        !           945: and 
        !           946: .B \*.\^]
        !           947: brackets.
        !           948: If it is not in the data base, it can be typed
        !           949: into a private file of references, using the format
        !           950: discussed in the next section, and then
        !           951: the
        !           952: .B \-p
        !           953: option
        !           954: used to search this private file.
        !           955: Such a command might read
        !           956: (if the private references are called
        !           957: .I myfile )
        !           958: .DS
        !           959: .ft 2
        !           960: refer \-p myfile document | tbl | eqn | troff \-ms \*. \*. \*.
        !           961: .ft 1
        !           962: .DE
        !           963: where
        !           964: .I tbl
        !           965: and/or
        !           966: .I eqn
        !           967: could be omitted if not needed.
        !           968: The use
        !           969: of the
        !           970: .I \-ms
        !           971: macros
        !           972: .[
        !           973: lesk typing documents unix gcos
        !           974: .]
        !           975: or some other macro package, however,
        !           976: is essential.
        !           977: .I Refer
        !           978: only generates the data for the references; exact formatting
        !           979: is done by some macro package, and if none is supplied the
        !           980: references will not be printed.
        !           981: .PP
        !           982: By default,
        !           983: the references are numbered sequentially,
        !           984: and
        !           985: the
        !           986: .I \-ms
        !           987: macros format references as footnotes at the bottom of the page.
        !           988: This memorandum is an example of that style.
        !           989: Other possibilities are discussed in section 5 below.
        !           990: .NH
        !           991: Reference Files
        !           992: .PP
        !           993: A reference file is a set of bibliographic references usable with
        !           994: .I refer.
        !           995: It can be indexed using the software described in section 2
        !           996: for fast searching.
        !           997: What
        !           998: .I refer
        !           999: does is to read the input document stream,
        !          1000: looking for imprecise citation references.
        !          1001: It then searches through reference files to find
        !          1002: the full citations, and inserts them into the
        !          1003: document.
        !          1004: The format of the full citation is arranged to make it
        !          1005: convenient for a macro package, such as the
        !          1006: .I \-ms
        !          1007: macros, to format the reference
        !          1008: for printing.
        !          1009: Since
        !          1010: the format of the final reference is determined
        !          1011: by the desired style of output,
        !          1012: which is determined by the macros used,
        !          1013: .I refer
        !          1014: avoids forcing any kind of reference appearance.
        !          1015: All it does is define a set of string registers which
        !          1016: contain the basic information about the reference;
        !          1017: and provide a macro call which is expanded by the macro
        !          1018: package to format the reference.
        !          1019: It is the responsibility of the final macro package
        !          1020: to see that the reference is actually printed; if no
        !          1021: macros are used, and the output of
        !          1022: .I refer
        !          1023: is fed untranslated to
        !          1024: .I troff,
        !          1025: nothing at all will be printed.
        !          1026: .PP
        !          1027: The strings defined by
        !          1028: .I refer
        !          1029: are taken directly from the files of references, which
        !          1030: are in the following format.
        !          1031: The references should be separated
        !          1032: by blank lines.
        !          1033: Each reference is a sequence of lines beginning with
        !          1034: .B %
        !          1035: and followed
        !          1036: by a key-letter.
        !          1037: The remainder of that line, and successive lines until the next line beginning
        !          1038: with
        !          1039: .B % ,
        !          1040: contain the information specified by the key-letter.
        !          1041: In general,
        !          1042: .I refer
        !          1043: does not interpret the information, but merely presents
        !          1044: it to the macro package for final formatting.
        !          1045: A user with a separate macro package, for example,
        !          1046: can add new key-letters or use the existing ones for other purposes
        !          1047: without bothering
        !          1048: .I refer.
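The file format just described is simple enough to read with a few lines of code. The following is a minimal sketch, not the parser used by refer; the function name is an assumption. Each reference is kept as a list of (key-letter, value) pairs so that repeated fields, such as several authors, keep their order, and no key-letter is given any special meaning.

```python
def parse_references(text):
    """Split a reference file into records of (key-letter, value) pairs."""
    refs, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates references.
            if current:
                refs.append(current)
                current = []
        elif line.startswith("%") and len(line) > 1:
            # A % followed by a key-letter starts a new field.
            current.append((line[1], line[2:].strip()))
        elif current:
            # Any other line continues the preceding field.
            key, value = current[-1]
            current[-1] = (key, (value + " " + line.strip()).strip())
        # A continuation line with no field before it is ignored in this sketch.
    if current:
        refs.append(current)
    return refs


SAMPLE = """%T Bounds on the Complexity of the Maximal
Common Subsequence Problem
%A A. V. Aho
%A D. S. Hirschberg
%A J. D. Ullman
%J J. ACM
%V 23
%D Jan. 1976
"""

if __name__ == "__main__":
    for key, value in parse_references(SAMPLE)[0]:
        print(key, value)
```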
        !          1049: .PP
        !          1050: The meaning of the key-letters given below, in particular,
        !          1051: is that assigned by the
        !          1052: .I \-ms
        !          1053: macros.
        !          1054: Not all information, obviously, is used with each citation.
        !          1055: For example, if a document is both an internal memorandum and a journal article,
        !          1056: the macros ignore the memorandum version and cite only the journal article.
        !          1057: Some kinds of information are not used at all in printing the reference;
        !          1058: if a user does not like finding references by specifying title
        !          1059: or author keywords, and prefers to add specific keywords to the
        !          1060: citation, a field is available which is searched but not
        !          1061: printed (\f3K\f1).
        !          1062: .PP
        !          1063: The key letters currently recognized by
        !          1064: .I refer
        !          1065: and
        !          1066: .I \-ms,
        !          1067: with the kind of information implied, are:
        !          1068: .KS
        !          1069: .TS
        !          1070: center;
        !          1071: c c6 c c
        !          1072: c l c l.
        !          1073: Key    Information specified   Key     Information specified
        !          1074: A      Author's name   N       Issue number
        !          1075: B      Title of book containing item   O       Other information
        !          1076: C      City of publication     P       Page(s) of article
        !          1077: D      Date    R       Technical report reference
        !          1078: E      Editor of book containing item  T       Title
        !          1079: G      Government (NTIS) ordering number       V       Volume number
        !          1080: I      Issuer (publisher)
        !          1081: J      Journal name
        !          1082: K      Keys (for searching)    X       or
        !          1083: L      Label   Y       or
        !          1084: M      Memorandum label        Z       Information not used by \f2refer\f1
        !          1085: .TE
        !          1086: .KE
        !          1087: For example, a sample reference could be
        !          1088: typed as:
        !          1089: .DS
        !          1090: %T Bounds on the Complexity of the Maximal
        !          1091: Common Subsequence Problem
        !          1092: %Z ctr127
        !          1093: %A A. V. Aho
        !          1094: %A D. S. Hirschberg
        !          1095: %A J. D. Ullman
        !          1096: %J J. ACM
        !          1097: %V 23
        !          1098: %N 1
        !          1099: %P 1-12
        !          1100: .\"%M TM 75-1271-7
        !          1101: %M abcd-78
        !          1102: %D Jan. 1976
        !          1103: .DE
        !          1104: Order is irrelevant, except that authors are shown in the order
        !          1105: given.  The output of
        !          1106: .I refer
        !          1107: is a stream of string definitions, one
        !          1108: for each of the fields of each reference, as
        !          1109: shown below.
        !          1110: .DS
        !          1111: \*.]-
        !          1112: \*.ds [A authors' names \*.\*.\*.
        !          1113: \*.ds [T title \*.\*.\*.
        !          1114: \*.ds [J journal \*.\*.\*.
        !          1115: \*.\*.\*.
        !          1116: \*.]\|[ type-number
        !          1117: .DE
        !          1118: The special macro
        !          1119: .B \&\*.]\-
        !          1120: precedes the string definitions
        !          1121: and the special macro
        !          1122: .B \*.]\|[
        !          1123: follows.
        !          1124: These are changed from the input
        !          1125: .B \*.[
        !          1126: and 
        !          1127: .B \*.\^]
        !          1128: so that running the same file through
        !          1129: .I refer
        !          1130: again is harmless.
        !          1131: The 
        !          1132: .B \*.]\-
        !          1133: macro can be used by the macro package to
        !          1134: initialize.
        !          1135: The 
        !          1136: .B \*.]\|[
        !          1137: macro, which should be used
        !          1138: to print the reference, is given an
        !          1139: argument
        !          1140: .I type-number
        !          1141: to indicate the kind of reference, as follows:
        !          1142: .KS
        !          1143: .TS
        !          1144: center;
        !          1145: c c
        !          1146: n l.
        !          1147: Value  Kind of reference
        !          1148: 1      Journal article
        !          1149: 2      Book
        !          1150: 3      Article within book
        !          1151: 4      Technical report
        !          1152: 5      Bell Labs technical memorandum
        !          1153: 0      Other
        !          1154: .TE
        !          1155: .KE
        !          1156: The reference is flagged in the text
        !          1157: with the sequence
        !          1158: .DS
        !          1159: \e*\|([\*.number\e*\|(\*.\^]
        !          1160: .DE
        !          1161: where
        !          1162: .I number
        !          1163: is the footnote number.
        !          1164: The strings
        !          1165: .B [\*.
        !          1166: and 
        !          1167: .B \*.\^]
        !          1168: should be used by the macro package
        !          1169: to format the reference flag in the text.
        !          1170: These strings can be replaced for a particular
        !          1171: footnote, as described in section 5.
        !          1172: The footnote number (or other signal) is available
        !          1173: to the reference macro
        !          1174: .B \*.]\|[
        !          1175: as the
        !          1176: string register
        !          1177: .B [F .
        !          1178: .PP
        !          1179: In some cases users wish to suspend the searching, and merely
        !          1180: use the reference macro formatting.
        !          1181: That is, the user doesn't want to provide a search key
        !          1182: between
        !          1183: .B \*.[
        !          1184: and 
        !          1185: .B \*.\^]
        !          1186: brackets, but merely
        !          1187: the reference lines for the appropriate document.
        !          1188: Alternatively, the user
        !          1189: may wish
        !          1190: to add a few fields to those in the reference
        !          1191: as in the standard file, or
        !          1192: override some fields.
        !          1193: Altering or replacing fields, or supplying whole references, is easily done
        !          1194: by inserting lines beginning
        !          1195: with
        !          1196: .B % ;
        !          1197: any such line is taken as
        !          1198: direct input to the reference
        !          1199: processor rather than keys to be searched.
        !          1200: Thus
        !          1201: .DS
        !          1202: \*.[
        !          1203: key1 key2 key3 \*.\*.\*.
        !          1204: %Q New format item
        !          1205: %R Override report name
        !          1206: \*.\^]
        !          1207: .DE
        !          1208: makes the indicated changes to the result of searching for
        !          1209: the keys.
        !          1210: All of the search keys must be given before the first
        !          1211: \f3%\f1 line.
        !          1212: .PP
        !          1213: If no search keys are provided, an entire citation can
        !          1214: be provided in-line in the text.
        !          1215: For example, if the
        !          1216: .I eqn
        !          1217: paper citation were to be inserted in
        !          1218: this way, rather than by searching for it in the data base,
        !          1219: the input would read
        !          1220: .DS
        !          1221: \&\*.\*.\*.
        !          1222: \&preprocessor like
        !          1223: \&.I eqn.
        !          1224: \&.[
        !          1225: \&%A B. W. Kernighan
        !          1226: \&%A L. L. Cherry
        !          1227: \&%T A System for Typesetting Mathematics
        !          1228: \&%J Comm. ACM
        !          1229: \&%V 18
        !          1230: \&%N 3
        !          1231: \&%P 151-157
        !          1232: \&%D March 1975
        !          1233: \&.]
        !          1234: \&It scans its input looking for items
        !          1235: \&\*.\*.\*.
        !          1236: .DE
        !          1237: This would produce a citation of the same appearance as that
        !          1238: resulting from the file search.
        !          1239: .PP
        !          1240: As shown, fields are normally turned into
        !          1241: .I troff
        !          1242: strings.
        !          1243: Sometimes users would rather have them defined as macros,
        !          1244: so that other
        !          1245: .I troff
        !          1246: commands can be placed into the data.
        !          1247: When this is necessary, simply double the control character
        !          1248: .B %
        !          1249: in the data.
        !          1250: Thus the input
        !          1251: .DS
        !          1252: \&.[
        !          1253: %V 23
        !          1254: %%M
        !          1255: Bell Laboratories,
        !          1256: Murray Hill, N.J. 07974
        !          1257: \&.]
        !          1258: .DE
        !          1259: is processed by
        !          1260: .I refer
        !          1261: into
        !          1262: .DS
        !          1263: \&.ds [V 23
        !          1264: \&.de [M
        !          1265: Bell Laboratories,
        !          1266: Murray Hill, N.J. 07974
        !          1267: \&..
        !          1268: .DE
        !          1269: The information after
        !          1270: .B %%M
        !          1271: is defined as a macro to be invoked by
        !          1272: .B .[M
        !          1273: while the information after
        !          1274: .B %V
        !          1275: is turned into a string to be invoked by
        !          1276: .B \e\(**([V .
        !          1277: At present
        !          1278: .I \-ms
        !          1279: expects all information as strings.
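The translation just described can be sketched in a few lines of Python. This is only an illustration of the rule stated above (a single % yields a string definition, a doubled %% yields a macro definition); the function name to_troff is hypothetical, and joining continuation lines onto the preceding string is an assumption rather than documented refer behavior.

```python
def to_troff(lines):
    """Turn refer-style field lines into troff .ds / .de definitions (sketch)."""
    out = []
    in_macro = False
    for line in lines:
        if line.startswith("%%"):
            # Doubled control character: define the field as a macro.
            if in_macro:
                out.append("..")
            out.append(".de [" + line[2])
            in_macro = True
            rest = line[3:].strip()
            if rest:
                out.append(rest)
        elif line.startswith("%"):
            # Single control character: define the field as a string.
            if in_macro:
                out.append("..")
                in_macro = False
            out.append(".ds [" + line[1] + " " + line[3:])
        elif in_macro:
            # Body line of the macro currently being defined.
            out.append(line)
        elif out:
            # Assumption: a continuation line extends the previous string.
            out[-1] += " " + line
    if in_macro:
        out.append("..")
    return out


example = ["%V 23", "%%M", "Bell Laboratories,", "Murray Hill, N.J. 07974"]
for text in to_troff(example):
    print(text)
```

Run on the four input lines of the example above, the sketch prints the same five lines shown as the output of refer.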
        !          1280: .NH
        !          1281: Collecting References and other Refer Options
        !          1282: .PP
        !          1283: Normally, the combination of
        !          1284: .I refer
        !          1285: and
        !          1286: .I \-ms
        !          1287: formats output as 
        !          1288: .I troff
        !          1289: footnotes which are consecutively numbered and placed
        !          1290: at the bottom of the page.  However,
        !          1291: options exist to
        !          1292: place the references at the end; to arrange references alphabetically
        !          1293: by senior author; and to indicate references by strings in the text of the form
        !          1294: [Name1975a]
        !          1295: rather than by number.
        !          1296: Whenever references are not placed at the bottom of a page
        !          1297: identical references are coalesced.
        !          1298: .PP
        !          1299: For example, the
        !          1300: .B \-e
        !          1301: option to
        !          1302: .I refer
        !          1303: specifies that references are to be collected; in this case
        !          1304: they are output whenever the sequence
        !          1305: .DS
        !          1306: \*.[
        !          1307: $LIST$
        !          1308: \*.\^]
        !          1309: .DE
        !          1310: is encountered.
        !          1311: Thus, to place references at the end of a paper, the user would run
        !          1312: .I refer
        !          1313: with the
        !          1314: .B \-e
        !          1315: option and place the above $LIST$ commands after the last
        !          1316: line of the text.
        !          1317: .I Refer
        !          1318: will then move all the references to that point.
        !          1319: To aid in formatting the collected references,
        !          1320: .I refer
        !          1321: writes the references preceded by the line
        !          1322: .DS
        !          1323: .B .]<
        !          1324: .DE
        !          1325: and
        !          1326: followed by the line
        !          1327: .DS
        !          1328: .B .]>
        !          1329: .DE
        !          1330: to invoke special macros before and after the references.
        !          1331: .PP
        !          1332: Another possible option to
        !          1333: .I refer
        !          1334: is the
        !          1335: .B \-s
        !          1336: option to specify
        !          1337: sorting of references.  The default,
        !          1338: of course, is to list references in the order presented.
        !          1339: The
        !          1340: .B \-s
        !          1341: option implies the
        !          1342: .B \-e
        !          1343: option, and thus requires
        !          1344: a
        !          1345: .DS
        !          1346: \*.[
        !          1347: $LIST$
        !          1348: \*.\^]
        !          1349: .DE
        !          1350: entry to call out the reference list.
        !          1351: The
        !          1352: .B \-s
        !          1353: option may be followed by a string of letters, numbers, and `+' signs indicating how
        !          1354: the references are to be sorted.
        !          1355: The sort is done using the fields whose key-letters are
        !          1356: in the string as sorting keys; the numbers indicate how many
        !          1357: of the fields are to be considered, with `+'
        !          1358: taken as a large number.
        !          1359: Thus the default sort is
        !          1360: .B \-sAD
        !          1361: meaning ``Sort on senior author, then date.''  To
        !          1362: sort on all authors and then title, specify
        !          1363: .B \-sA+T .
        !          1364: And to sort on two authors and then the journal,
        !          1365: write
        !          1366: .B \-sA2J .
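One way to read the sort string is as a list of (key-letter, count) pairs. The Python sketch below parses such a string and builds a comparison key from it. The names parse_sort_spec and sort_key are hypothetical, the value chosen for `+' is an arbitrary large number, and comparing field text exactly as stored is an assumption; the real program may normalize names or dates before comparing.

```python
import re

LARGE = 10**6  # stands in for '+', which is "taken as a large number"


def parse_sort_spec(spec):
    """Split a sort string such as 'A2J' into (key_letter, count) pairs."""
    pairs = []
    for letter, count in re.findall(r"([A-Z])(\d+|\+)?", spec):
        if count == "+":
            n = LARGE
        elif count:
            n = int(count)
        else:
            n = 1  # a bare letter means "the first such field only"
        pairs.append((letter, n))
    return pairs


def sort_key(ref, spec="AD"):
    """ref maps a key-letter to the list of that field's values, in order."""
    return tuple(tuple(ref.get(letter, [])[:n])
                 for letter, n in parse_sort_spec(spec))


refs = [
    {"A": ["Lesk"], "D": ["1978"]},
    {"A": ["Kernighan", "Cherry"], "D": ["1975"]},
]
refs.sort(key=lambda r: sort_key(r, "AD"))
print([r["A"][0] for r in refs])
```

Here author fields hold only last names so that plain string comparison gives a sensible order.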
        !          1367: .PP
        !          1368: Other options to
        !          1369: .I refer
        !          1370: change the signal or label inserted in the text for each reference.
        !          1371: Normally these are just sequential numbers,
        !          1372: and their exact placement (within brackets, as superscripts, etc.) is determined
        !          1373: by the macro package.
        !          1374: The
        !          1375: .B \-l
        !          1376: option replaces reference numbers by
        !          1377: strings composed of the senior author's last name, the date,
        !          1378: and a disambiguating letter.
        !          1379: If a number follows the
        !          1380: .B l
        !          1381: as in
        !          1382: .B \-l3
        !          1383: only that many letters of the last name are used
        !          1384: in the label string.
        !          1385: To abbreviate the date as well the form
        !          1386: \f3-l\f2m,n\f1
        !          1387: shortens the last name to the
        !          1388: first
        !          1389: .I m
        !          1390: letters and the date to the
        !          1391: last
        !          1392: .I n
        !          1393: digits.
        !          1394: For example, the option
        !          1395: .B \-l3,2
        !          1396: would refer to the
        !          1397: .I eqn
        !          1398: paper (reference 3) by the signal
        !          1399: .I Ker75a ,
        !          1400: since it is the first cited reference by Kernighan in 1975.
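The label rule can be illustrated with a short Python sketch. The function make_labels is hypothetical; it assumes that each entry is a distinct reference (identical ones having already been coalesced), that a sequence letter is always appended, and that no label base occurs more than 26 times.

```python
def make_labels(refs, m=None, n=None):
    """refs: (last_name, year) pairs in citation order; returns text labels.

    m limits the name to its first m letters, n limits the date to its
    last n digits; None leaves the item unabbreviated.
    """
    seen = {}
    labels = []
    for last_name, year in refs:
        name = last_name if m is None else last_name[:m]
        date = str(year) if n is None else str(year)[-n:]
        base = name + date
        seq = seen.get(base, 0)  # how many earlier labels share this base
        seen[base] = seq + 1
        labels.append(base + chr(ord("a") + seq))
    return labels


print(make_labels([("Kernighan", 1975)], m=3, n=2))
```

With m=3 and n=2 the single entry for Kernighan, 1975 yields the label Ker75a, matching the example in the text.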
        !          1401: .PP
        !          1402: A user wishing to specify particular labels for
        !          1403: a private bibliography may use the
        !          1404: .B \-k
        !          1405: option.
        !          1406: Specifying
        !          1407: \f3\-k\f2x\f1
        !          1408: causes the field \f2x\f1 to be used as a label.
        !          1409: The default is \f3L\f1.
        !          1410: If this field ends in \f3\-\f1, that character
        !          1411: is replaced by a sequence letter; otherwise the field
        !          1412: is used exactly as given.
        !          1413: .PP
        !          1414: If none of the
        !          1415: .I refer -produced
        !          1416: signals are desired,
        !          1417: the
        !          1418: .B \-b
        !          1419: option entirely suppresses automatic text signals.
        !          1420: .PP
        !          1421: If the user wishes to override the
        !          1422: .I \-ms
        !          1423: treatment of the reference signal (which is normally to
        !          1424: enclose the number in brackets in
        !          1425: .I nroff
        !          1426: and make it a superscript in
        !          1427: .I troff\\| )
        !          1428: this can be done easily.
        !          1429: If the lines
        !          1430: .B \&.[
        !          1431: or
        !          1432: .B \&.]
        !          1433: contain anything following these characters,
        !          1434: the remainders of these lines are used to surround
        !          1435: the reference signal, instead of the default.
        !          1436: Thus, for example, to say ``See reference (2).''
        !          1437: and avoid
        !          1438: ``See reference.\s-3\u2\d\s+3'' the
        !          1439: input might appear
        !          1440: .DS
        !          1441: \&See reference
        !          1442: \&\*.[ (
        !          1443: imprecise citation ...
        !          1444: \&\*.\^])\*.
        !          1445: .DE
        !          1446: Note that blanks are significant in this construction.
        !          1447: If a permanent change is desired in the style of reference
        !          1448: signals, however, it is probably easier to redefine the strings
        !          1449: .B \&[.
        !          1450: and
        !          1451: .B \&.]
        !          1452: (which are used to bracket each signal)
        !          1453: than to change each citation.
        !          1454: .PP
        !          1455: Although normally
        !          1456: .I refer
        !          1457: limits itself to retrieving the data for the reference,
        !          1458: and leaves to a macro package the job of arranging that
        !          1459: data as required by the local format, there are two
        !          1460: special options for rearrangements that cannot be
        !          1461: done by macro packages.
        !          1462: The
        !          1463: .B \-c
        !          1464: option puts fields into all upper case
        !          1465: (C\s-2APS\s+2-S\s-2MALL\s+2 C\s-2APS\s+2
        !          1466: in
        !          1467: .I troff
        !          1468: output).
        !          1469: The key-letters indicating what information is to be translated
        !          1470: to upper case follow the
        !          1471: .B c ,
        !          1472: so that
        !          1473: .B \-cAJ
        !          1474: means that authors' names and journals are to be in caps.
        !          1475: The
        !          1476: .B \-a
        !          1477: option writes the names of authors last name first, that is
        !          1478: .I "A. D. Hall, Jr."
        !          1479: is written as
        !          1480: .I "Hall, A. D. Jr" .
        !          1481: The citation form of
        !          1482: the
        !          1483: .I "Journal of the ACM" ,
        !          1484: for example, would require
        !          1485: both
        !          1486: .B \-cA
        !          1487: and
        !          1488: .B \-a
        !          1489: options.
        !          1490: This produces authors' names in the style
        !          1491: .I
        !          1492: K\s-2ERNIGHAN\s0, B. W. \s-2AND\s0 C\s-2HERRY\s0, L. L.\&
        !          1493: .R
        !          1494: for the previous example.
        !          1495: The
        !          1496: .B \-a
        !          1497: option may be followed by a number to indicate how many
        !          1498: author names should be reversed;
        !          1499: .B \-a1
        !          1500: (without any
        !          1501: .B \-c
        !          1502: option)
        !          1503: would produce
        !          1504: .I
        !          1505: Kernighan, B. W. and L. L. Cherry,
        !          1506: .R
        !          1507: for example.
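The two rearrangements can be sketched together in Python. The helper names reverse_name and format_authors are hypothetical, the joining of more than two authors with commas is an assumption, and plain upper case stands in for the caps-small-caps effect, which only troff can produce.

```python
def reverse_name(name):
    """'B. W. Kernighan' -> 'Kernighan, B. W.' (a suffix such as Jr. goes last)."""
    head, _, suffix = name.partition(",")
    parts = head.split()
    if len(parts) < 2:
        return name  # nothing to reverse
    out = parts[-1] + ", " + " ".join(parts[:-1])
    suffix = suffix.strip()
    if suffix:
        out += " " + suffix
    return out


def format_authors(names, reverse_count=None, caps=False):
    """Reverse the first reverse_count names (all if None), then join them."""
    k = len(names) if reverse_count is None else reverse_count
    shown = [reverse_name(n) if i < k else n for i, n in enumerate(names)]
    if len(shown) > 1:
        text = ", ".join(shown[:-1]) + " and " + shown[-1]
    else:
        text = "".join(shown)
    return text.upper() if caps else text


authors = ["B. W. Kernighan", "L. L. Cherry"]
print(format_authors(authors, reverse_count=1))
print(format_authors(authors, caps=True))
```

The first call corresponds to the option with a count of 1 and no capitalization; the second corresponds to reversing all names and capitalizing the author field.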
        !          1508: .PP
        !          1509: Finally, there is also the previously-mentioned
        !          1510: .B \-p
        !          1511: option to let the user specify
        !          1512: a private file of references to be searched before the public files.
        !          1513: Note that
        !          1514: .I refer
        !          1515: does not insist on a previously made index for these files.
        !          1516: If a file is named which contains reference
        !          1517: data but is not indexed, it will be searched
        !          1518: (more slowly)
        !          1519: by
        !          1520: .I refer
        !          1521: using
        !          1522: .I fgrep.
        !          1523: In this way
        !          1524: it is easy for users to keep small files of
        !          1525: new references, which can later be added to the
        !          1526: public data bases.
        !          1527: .SG MH-1274-MEL-\s8UNIX\s0