Annotation of 43BSDReno/share/doc/usd/30.invert/refer, revision 1.1.1.1

1.1       root        1: .\"    @(#)refer       6.1 (Berkeley) 5/22/86
                      2: .\"
                      3: .... refer | tbl | nroff -ms
                      4: .EH 'USD:30-%''Some Applications of Inverted Indexes on the UNIX System'
                      5: .OH 'Some Applications of Inverted Indexes on the UNIX System''USD:30-%'
                      6: .nr LL 6.5i
                      7: .nr LT 6.5i
                      8: .de UC
                      9: \\s-2\\$1\\s0\\$2
                     10: ..
                     11: .ds . \&\s+2.\s0
                     12: .if t .ds -- \(em
                     13: .if n .ds -- --
                     14: .TR 69
                     15: \".TM 77-1274-17 39199 39199-11
                     16: .ND October 27, 1977
                     17: .ND June 21, 1978
                     18: .TL
                     19: Some Applications of Inverted Indexes on the UNIX System
                     20: .AU "MH 2C-572" 6377
                     21: M. E. Lesk
                     22: .AI
                     23: .MH
                     24: .\".AB
                     25: .\".LP
                     26: .\".ft B
                     27: .\"I. Some Applications of Inverted Indexes \- Overview
                     28: .\".ft R
                     29: .\".PP
                     30: .\"This memorandum describes a set of programs which
                     31: .\"make inverted indexes to
                     32: .\"UNIX*
                     33: .\"text files, and their
                     34: .\"application to
                     35: .\"retrieving and formatting citations for documents prepared using
                     36: .\".I troff.
                     37: .\".PP
                     38: .\"The indexing and searching programs make keyword
                     39: .\"indexes to volumes of material too large for linear searching.
                     40: .\"Searches for combinations of single words can be performed quickly.
                     41: .\"The programs for general searching are divided into
                     42: .\"two phases.  The first makes an index from the original
                     43: .\"data; the second searches the index and retrieves
                     44: .\"items.
                     45: .\"Both of these phases are further divided into two parts
                     46: .\"to separate the data-dependent and algorithm dependent
                     47: .\"code.
                     48: .\".PP
                     49: .\"The major current application of these programs is
                     50: .\"the
                     51: .\".I troff
                     52: .\"preprocessor
                     53: .\".I refer.
                     54: .\"A list of 4300 references is maintained on line,
                     55: .\"containing primarily papers written and cited by
                     56: .\"local authors.
                     57: .\"Whenever one of these references is required
                     58: .\"in a paper, a few words from the title or author list
                     59: .\"will retrieve it, and the user need not bother to re-enter
                     60: .\"the exact citation.
                     61: .\"Alternatively, authors can use their own lists of papers.
                     62: .\".PP
                     63: .\"This memorandum is of interest to
                     64: .\"those who are interested in facilities for searching large
                     65: .\"but relatively unchanging text files on
                     66: .\"the
                     67: .\"UNIX
                     68: .\"system,
                     69: .\"and those who are interested in handling bibliographic
                     70: .\"citations with
                     71: .\"UNIX
                     72: .\".I troff.
                     73: .\".LP
                     74: .\".ft B
                     75: .\"II. Updating Publication Lists
                     76: .\".PP
                     77: .\"This section is a brief note describing the
                     78: .\"auxiliary programs for managing the updating
                     79: .\"processing.
                     80: .\"It is written to aid clerical users in
                     81: .\"maintaining lists of references.
                     82: .\"Primarily, the programs described permit a large
                     83: .\"amount of individual control over the content
                     84: .\"of publication lists while retaining the
                     85: .\"usefulness of the files to other users.
                     86: .\".LP
                     87: .\".ft B
                     88: .\"III. Manual Pages
                     89: .\".PP
                     90: .\"This section contains the pages from the
                     91: .\"UNIX programmer's manual
                     92: .\"dealing with these commands.
                     93: .\"It is useful for reference.
                     94: .\".sp
                     95: .\"\l'3i'
                     96: .\".br
                     97: .\"* UNIX is a trademark of Bell Laboratories.
                     98: .\".AE
                     99: .CS 10 4 14 0 0 4
                    100: .NH
                    101: Introduction.
                    102: .PP
                    103: The
                    104: .UX
                    105: system
                    106: has many utilities
                    107: (e.g. \fIgrep, awk, lex, egrep, fgrep, ...\fR)
                    108: to search through files of text,
                    109: but most of them are based on a linear scan through the
                    110: entire file, using some deterministic automaton.
                    111: .ev 1
                    112: .ps 8
                    113: .vs 10p
                    114: .ev
                    115: This memorandum discusses a program which uses inverted
                    116: indexes
                    117: .[
                    118: %A D. Knuth
                    119: %T The Art of Computer Programming: Vol. 3, Sorting and Searching
                    120: %I Addison-Wesley
                    121: %C Reading, Mass.
                    122: %D 1977
                    123: %O See section 6.5.
                    124: .]
                    125: and can thus be used on much larger data bases.
                    126: .PP
                    127: As with any indexing system, of course, there are some disadvantages;
                    128: once an index is made, the files that have been indexed cannot be changed
                    129: without remaking the index.
                    130: Thus applications are restricted
                    131: to those making many searches
                    132: of relatively stable data.
                    133: Furthermore, these programs depend on hashing, and can only
                    134: search for exact matches of whole keywords.
                    135: It is not possible to look for
                    136: arithmetic or logical expressions (e.g. ``date greater than 1970'') or
                    137: for regular expression searching such as that in
                    138: .I lex.
                    139: .[
                    140: lex lesk cstr
                    141: .]
                    142: .PP
                    143: Currently there are two uses of this software:
                    144: the
                    145: .I refer
                    146: preprocessor to format references,
                    147: and the
                    148: .I lookall
                    149: command to search through all text files on
                    150: the
                    151: .UX
                    152: system.\(dd
                    153: .FS
                    154: \(dd \fIlookall\fP is not part of the Berkeley UNIX distribution.
                    155: .FE
                    156: .PP
                    157: The remaining sections of this memorandum discuss
                    158: the searching programs and their uses.
                    159: Section 2 explains the operation of the searching algorithm and describes
                    160: the data collected for use with the
                    161: .I lookall
                    162: command.
                    163: The more important application,
                    164: .I refer ,
                    165: has a user's description in section 3.
                    166: Section 4 goes into more detail on
                    167: reference files
                    168: for the benefit of those who
                    169: wish to add references to data bases or
                    170: write new
                    171: .I troff
                    172: macros for use with
                    173: .I refer.
                    174: The options to make
                    175: .I refer
                    176: collect identical citations, or otherwise relocate and adjust references,
                    177: are described in section 5.
                    178: .NH
                    179: Searching.
                    180: .PP
                    181: The indexing and searching process is divided into two phases,
                    182: each made of two parts.
                    183: These are
                    184: shown below.
                    185: .IP A.
                    186: Construct the index.
                    187: .RS
                    188: .IP (1)
                    189: Find keys \*(-- turn the input files into a sequence of tags and keys,
                    190: where each tag identifies a distinct item in the input
                    191: and the keys for each such item are the strings under which it is
                    192: to be indexed.
                    193: .IP (2)
                    194: Hash and sort \*(--
                    195: prepare a set of inverted indexes from which, given a set of keys,
                    196: the appropriate item tags can be found quickly.
                    197: .RE
                    198: .IP B.
                    199: Retrieve an item in response to a query.
                    200: .RS
                    201: .IP (3)
                    202: Search \*(--
                    203: Given some keys, look through the files prepared by the hashing
                    204: and sorting facility and derive the appropriate tags.
                    205: .IP (4)
                    206: Deliver \*(--
                    207: Given the tags, find the original items.  This completes the
                    208: searching process.
                    209: .RE
                    210: .LP
                    211: The first phase, making the index, is presumably done relatively infrequently.
                    212: It should, of course, be done whenever the data being
                    213: indexed change.
                    214: In contrast, the second phase, retrieving items,
                    215: is presumably done often, and must be rapid.
                    216: .PP
                    217: An effort is made to separate code which depends on the data
                    218: being handled from code which depends on the searching procedure.
                    219: The search algorithm is involved only in programs
                    220: (2) and (3), while knowledge of the actual data files is
                    221: needed only by programs (1) and (4).
                    222: Thus it is easy to adapt to different data files or different
                    223: search algorithms.
                    224: .PP
                    225: To start with, it is necessary to have some way of selecting
                    226: or generating keys from input files.
                    227: For dealing with files that are basically English, we have
                    228: a key-making program which automatically selects words
                    229: and passes them to the hashing and sorting program (step 2).
                    230: The format used has one line for each input item,
                    231: arranged
                    232: as follows:
                    233: .DS
                    234: name:start,length (tab) key1 key2 key3 ...
                    235: .DE
                    236: where
                    237: .I name
                    238: is the file name,
                    239: .I start
                    240: is the starting byte number,
                    241: and
                    242: .I length
                    243: is the number of bytes in the entry.
                    244: .PP
                    245: These lines are the only input used to make the
                    246: index.
                    247: The first field (the file name, byte position, and byte count)
                    248: is the tag of the item
                    249: and can be used to retrieve it quickly.
                    250: Normally, an item is either a whole file or a section of a file
                    251: delimited by blank lines.
                    252: After the tab, the second field contains the keys.
                    253: The keys, if selected by the automatic program, are
                    254: any alphanumeric strings which
                    255: are not among the 100 most frequent words in English
                    256: and which are not entirely numeric (except for four-digit
                    257: numbers beginning 19, which are accepted as dates).
                    258: Keys are truncated to six characters and converted to lower case.
                    259: Some selection is needed if the original items are very large.
                    260: We normally just take the first
                    261: .I n
                    262: keys, with
                    263: .I n
                    264: less than 100 or so; this replaces any attempt at intelligent selection.
                    265: One file in our system is
                    266: a complete English dictionary; it would presumably be retrieved for all queries.
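The key-selection rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual key-making program: the names make_keys, key_line and COMMON_WORDS are hypothetical, the short word set stands in for the real list of the 100 most frequent English words, and testing a word against that list before truncation is a guess about the order of operations.

```python
import re

# Hypothetical stand-in for the list of most frequent English words.
COMMON_WORDS = {"the", "of", "and", "to", "a", "in", "is", "for", "on", "with"}


def make_keys(text, max_keys=100, common=COMMON_WORDS):
    """Return the keys for one item, following the rules given in the text."""
    keys = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if len(keys) >= max_keys:
            break  # just take the first n keys; no intelligent selection
        word = word.lower()
        if word in common:
            continue
        # Entirely numeric strings are rejected unless they look like a
        # date: four digits beginning with 19.
        if word.isdigit() and not (len(word) == 4 and word.startswith("19")):
            continue
        keys.append(word[:6])  # truncate to six characters
    return keys


def key_line(name, start, text, max_keys=100):
    """Format one item as: name:start,length (tab) key1 key2 key3 ..."""
    length = len(text.encode())
    return f"{name}:{start},{length}\t{' '.join(make_keys(text, max_keys))}"


if __name__ == "__main__":
    print(key_line("refs", 0, "Lesk 1978 inverted indexes"))
```

Each call to key_line yields one line in the tag-and-keys format described above, which is the only input the indexing step needs.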
                    267: .PP
                    268: To generate an inverted index to the list of record tags and keys,
                    269: the keys
                    270: are hashed
                    271: and sorted to produce an index.
                    272: What is wanted, ideally, is a series of lists showing the tags associated
                    273: with each key.
                    274: To condense this,
                    275: what is actually produced is a list showing the tags associated
                    276: with each hash code, and thus with some set of keys.
                    277: To speed up access and further save space,
                    278: a set of three or possibly four files is produced.
                    279: These files are:
                    280: .KS
                    281: .bd 2 2
                    282: .TS
                    283: center;
                    284: c c
                    285: lI l.
                    286: File   Contents
                    287: entry  Pointers to posting file
                    288:        for each hash code
                    289: posting        Lists of tag pointers for
                    290:        each hash code
                    291: tag    Tags for each item
                    292: key    Keys for each item
                    293:        (optional)
                    294: .TE
                    295: .bd 2
                    296: .KE
                    297: The posting file comprises the real data: it contains a sequence of lists
                    298: of items posted under each hash code.  To speed up searching,
                    299: the entry file is an array of pointers into the posting file, one per potential
                    300: hash code.
                    301: Furthermore, the items in the lists in the posting file are not referred to by their
                    302: complete tag, but just by an address in the tag file, which
                    303: gives the complete tags.
                    304: The key file is optional and contains a copy of the keys
                    305: used in the indexing.
                    306: .PP
                    307: The searching process starts with a query, containing several keys.
                    308: The goal is to obtain all items which were indexed under these keys.
                    309: The query keys are hashed, and the pointers in the entry file used
                    310: to access the lists in the posting file.  These lists
                    311: are addresses in the tag file of documents posted under the
                    312: hash codes derived from the query.
                    313: The common items from all lists are determined;
                    314: this must include the items indexed by every key, but may also
                    315: contain some items which are false drops, since items referenced by
                    316: the correct hash codes need not actually have contained the correct keys.
                    317: Normally, if there are several keys in the query, there are not
                    318: likely to be many false drops in the final combined list even though
                    319: each hash code is somewhat ambiguous.
                    320: The actual tags are then obtained from the tag file, and to guard against
                    321: the possibility that an item has false-dropped on some hash code
                    322: in the query, the original items are normally obtained from the delivery
                    323: program (4) and the query keys checked against them
                    324: by string comparison.
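The search procedure just described (hash the query keys, intersect the posting lists, then check the surviving items for false drops) can be sketched as follows. This is an illustration only: the function names are hypothetical, the hash function is an arbitrary placeholder because the text does not say which one is used, a dictionary stands in for the entry and posting files, and the false-drop check is a plain substring test.

```python
def hash_code(key, table_size=997):
    """Placeholder hash; the real hash function is not given in the text."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h


def build_index(items, table_size=997):
    """items is a list of (tag, keys) pairs.

    Returns (postings, tags): postings maps a hash code to a sorted list
    of positions in tags, mirroring posting lists that hold addresses in
    the tag file rather than complete tags.
    """
    tags = [tag for tag, _ in items]
    postings = {}
    for pos, (_, keys) in enumerate(items):
        for key in keys:
            postings.setdefault(hash_code(key, table_size), set()).add(pos)
    return {h: sorted(p) for h, p in postings.items()}, tags


def search(query_keys, postings, tags, fetch, table_size=997):
    """Intersect the posting lists, then remove false drops.

    fetch(tag) must return the original text of the item (the delivery step).
    """
    candidates = None
    for key in query_keys:
        found = set(postings.get(hash_code(key, table_size), []))
        candidates = found if candidates is None else candidates & found
    hits = []
    for pos in sorted(candidates or []):
        text = fetch(tags[pos]).lower()
        # String comparison against the original item guards against
        # items that merely share a hash code with a query key.
        if all(key in text for key in query_keys):
            hits.append(tags[pos])
    return hits


if __name__ == "__main__":
    docs = {"refs:0,22": "Lesk inverted indexes", "refs:22,14": "Knuth sorting"}
    items = [("refs:0,22", ["lesk", "invert", "indexe"]),
             ("refs:22,14", ["knuth", "sortin"])]
    # A table of size 1 forces every key onto the same hash code, so the
    # posting lists alone cannot tell the two items apart.
    postings, tags = build_index(items, table_size=1)
    print(search(["lesk"], postings, tags, docs.get, table_size=1))
```

With a table size of 1 both items are candidates for every query, and only the final comparison against the original text separates them, which is the false-drop check in miniature.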
                    325: .PP
                    326: Usually, therefore, the check for false drops is made against the original file.
                    327: However, if the key derivation procedure is complex, it may be preferable
                    328: to check against the keys fed to program (2).
                    329: In this case the optional key file which contains the
                    330: keys associated with each item is generated, and the item tag is supplemented
                    331: by a string
                    332: .DS
                    333: ;start,length
                    334: .DE
                    335: which indicates the starting byte number in the key file and the length of
                    336: the string of keys for each item.
                    337: This file is not usually necessary with the present
                    338: key-selection program, since the keys
                    339: always appear in the original document.
                    340: .PP
                    341: There is also an option
                    342: (\f3-C\f2n\|\f1)
                    343: for coordination level searching.
                    344: This retrieves items which match all but
                    345: .I n
                    346: of the query keys.
                    347: The items are retrieved in the order of the number
                    348: of keys that they match.
                    349: Of course,
                    350: .I n
                    351: must be less than the number of query keys (nothing is
                    352: retrieved unless it matches at least one key).
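Coordination level searching can be illustrated with a short sketch. It assumes one reading of the description above: an item qualifies if it misses at most n of the query keys, and it must always match at least one key. The function name coordination_search is hypothetical, and the sketch works directly on keys rather than on hash codes to stay short.

```python
def coordination_search(query_keys, item_keys, n):
    """Return tags of items that match all but at most n query keys.

    item_keys maps each tag to the set of keys the item was indexed under.
    Results are ordered by the number of query keys matched, best first.
    """
    query = set(query_keys)
    # Nothing is retrieved unless it matches at least one key.
    needed = max(len(query) - n, 1)
    matched = {}
    for tag, keys in item_keys.items():
        count = len(query & keys)
        if count >= needed:
            matched[tag] = count
    return sorted(matched, key=lambda tag: -matched[tag])


if __name__ == "__main__":
    items = {
        "r1": {"lesk", "invert", "indexe"},
        "r2": {"lesk", "unix"},
        "r3": {"knuth"},
    }
    print(coordination_search(["lesk", "invert", "indexe"], items, 2))
```

With n set to 0 this reduces to the ordinary search in which every query key must match.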
                    353: .PP
                    354: As an example, consider one set of 4377 references, comprising
                    355: 660,000 bytes.
                    356: This included 51,000 keys, of which 5,900 were distinct
                    357: keys.
                    358: The hash table is kept full to save space (at the expense of time);
                    359: 995 of 997 possible hash codes were used.
                    360: The total set of index files (no key file) included 171,000 bytes,
                    361: about 26% of the original file size.
                    362: It took 8 minutes of processor time to
                    363: hash, sort, and write the index.
                    364: To search for a single query with the resulting index took 1.9 seconds
                    365: of processor time,
                    366: while to find the same paper
                    367: with a sequential linear search
                    368: using
                    369: .I grep
                    370: (reading all of the tags and keys)
                    371: took 12.3 seconds of processor time.
                    372: .PP
                    373: We have also used this software to index all of the English stored on our
                    374: .UX
                    375: system.
                    376: This is the index searched by the
                    377: .I lookall
                    378: command.
                    379: On a typical day there were
                    380: 29,000 files in our user file system, containing about 152,000,000
                    381: bytes.
                    382: Of these
                    383: 5,300 files, containing 32,000,000 bytes (about 21%)
                    384: were English text.
                    385: The total number of `words' (determined mechanically)
                    386: was 5,100,000.
                    387: Of these 227,000 were selected as keys;
                    388: 19,000 were distinct, hashing to 4,900 (of 5,000 possible) different hash codes.
                    389: The
                    390: resulting inverted file indexes used 845,000 bytes, or about
                    391: 2.6% of the size of the original files.
                    392: The particularly small indexes are caused by the
                    393: fact that keys are taken from only the first 50 non-common words of
                    394: some very long input files.
                    395: .PP
                    396: Even this large \f2lookall\f1 index can be searched quickly.
                    397: For example, to find this document
                    398: by looking for the keys
                    399: ``lesk inverted indexes''
                    400: required
                    401: 1.7 seconds of processor time
                    402: and system time.
                    403: By comparison, just to search the 800,000 byte dictionary (smaller than even
                    404: the inverted indexes, let alone the 32,000,000 bytes of text files) with
                    405: .I grep
                    406: takes 29 seconds of processor time.
                    407: The
                    408: .I lookall
                    409: program is thus useful when looking for a document which you believe
                    410: is stored on-line, but do not know where.  For example, many memos
                    411: from our center are in the file system, but it is often
                    412: difficult to guess where a particular memo might be (it might have several
                    413: authors, each with many directories, and have been worked on by
                    414: a secretary with yet more directories).
                    415: Instructions for the use of the
                    416: .I lookall
                    417: command are given in the manual section, shown
                    418: in the appendix to this memorandum.
                    419: .PP
                    420: The only indexes maintained routinely are those of publication lists and
                    421: all English files.
                    422: To make other indexes, the programs for making keys, sorting them,
                    423: searching the indexes, and delivering answers must be used.
                    424: Since they are usually invoked as parts of higher-level commands,
                    425: they are not in the default command
                    426: directory, but are available to any user in the directory
                    427: .I /usr/lib/refer .
                    428: Three programs are of interest:
                    429: .I mkey ,
                    430: which isolates keys from input files;
                    431: .I inv ,
                    432: which makes an index from a set of keys;
                    433: and
                    434: .I hunt ,
                    435: which searches the index and delivers the items.
                    436: Note that the two parts of the retrieval phase are combined into
                    437: one program, to avoid the excessive system work and delay which
                    438: would result from running these as separate processes.
                    439: .PP
                    440: These three commands have a large number of options to adapt to different
                    441: kinds of input.
                    442: The user not interested in the detailed description that now follows may
                    443: skip to section 3, which describes the
                    444: .I refer
                    445: program, a packaged-up version of these tools specifically
                    446: oriented towards formatting references.
                    447: .PP
                    448: .B
                    449: Make Keys.
                    450: .R
                    451: The program
                    452: .I mkey
                    453: is the key-making program corresponding to step (1) in phase A.
                    454: Normally, it reads its input from the file names given as arguments,
                    455: and if there are no arguments it reads from the standard input.
                    456: It assumes that blank lines in the input delimit
                    457: separate items, for each of which a different line of
                    458: keys should be generated.
                    459: The lines of keys are written on the standard output.
                    460: A key is any alphanumeric string in the input that is not
                    461: among the most frequent words in English and not entirely numeric
                    462: (except that all-numeric strings are acceptable if they
                    463: are between 1900 and 1999).
                    464: In the output, keys are translated to lower case, and truncated
                    465: to six characters in length; any associated punctuation is removed.
                    466: The following flag arguments are recognized by
                    467: .I mkey:
                    468: .TS
                    469: center;
                    470: lB lw(4i).
                    471: \-c \f2name    T{
                    472: Name of file of common words;
                    473: default is
                    474: .I /usr/lib/eign.
                    475: T}
                    476: \-f \f2name    T{
                    477: Read a list of files from
                    478: .I name
                    479: and take each as an input argument.
                    480: T}
                    481: \-i \f2chars   T{
                    482: Ignore all lines which begin with `%' followed by any character
                    483: in
                    484: .I chars .
                    485: T}
                    486: \-k\f2n        T{
                    487: Use at most
                    488: .I n
                    489: keys per input item.
                    490: T}
                    491: \-l\f2n        T{
                    492: Ignore items shorter than
                    493: .I n
                    494: letters.
                    495: T}
                    496: \-n\f2m        T{
                    497: Ignore as a key any word in the first
                    498: .I m
                    499: words of the list of common English words.
                    500: The default is 100.
                    501: T}
                    502: \-s    T{
                    503: Remove the labels
                    504: .I (file:start,length)
                    505: from the output; just give the keys.
                    506: Used when searching rather than indexing.
                    507: T}
                    508: \-w    T{
                    509: Each whole file is a separate item;
                    510: blank lines in files are irrelevant.
                    511: T}
                    512: .TE
                    513: .PP
                    514: The normal arguments for indexing references are
                    515: the defaults, which are
                    516: .I "\-c /usr/lib/eign" ,
                    517: .I \-n100 ,
                    518: and
                    519: .I \-l3 .
                    520: For searching, the
                    521: .I \-s
                    522: option is also needed.
                    523: When the big
                    524: .I lookall
                    525: index of all English files is run,
                    526: the options are
                    527: .I \-w ,
                    528: .I \-k50 ,
                    529: and
                    530: .I "\-f (filelist)" .
                    531: When running on textual input,
                    532: the
                    533: .I mkey
                    534: program processes about 1000 English words per processor second.
                    535: Unless the
                    536: .I \-k
                    537: option is used (and the input files are long enough for
                    538: it to take effect)
                    539: the output of
                    540: .I mkey 
                    541: is comparable in size to its input.
                    542: .PP
                    543: .B
                    544: Hash and invert.
                    545: .R
                    546: The
                    547: .I inv
                    548: program computes the hash codes and writes
                    549: the inverted files.
                    550: It reads the output of
                    551: .I mkey
                    552: and writes the set of files described earlier
                    553: in this section.
                    554: It expects one argument, which is used as the base name for
                    555: the three (or four) files to be written.
                    556: Assuming an argument of
                    557: .I Index
                    558: (the default)
                    559: the entry file is named
                    560: .I Index.ia ,
                    561: the posting file
                    562: .I Index.ib ,
                    563: the tag file
                    564: .I Index.ic ,
                    565: and the key file (if present)
                    566: .I Index.id .
                    567: The
                    568: .I inv
                    569: program recognizes the following options:
                    570: .TS
                    571: center;
                    572: lB lw(4i).
                    573: \-a    T{
                    574: Append the new keys to a previous set of inverted files,
                    575: making new files if there is no old set using the same base name.
                    576: T}
                    577: \-d    T{
                    578: Write the optional key file.
                    579: This is needed when you cannot check for false drops by looking
                    580: for the keys in the original inputs, i.e. when the key derivation
                    581: procedure is complicated and
                    582: the output keys are not words from the input files.
                    583: T}
                    584: \-h\f2n        T{
                    585: The hash table size is
                    586: .I n
                    587: (default 997);
                    588: .I n
                    589: should be prime.
                    590: Making \f2n\f1 bigger saves search time and spends disk space.
                    591: T}
                    592: \-i[u] \f2name T{
                    593: Take input from file
                    594: .I name ,
                    595: instead of the standard input;
                    596: if
                    597: .B u
                    598: is present
                    599: .I name
                    600: is unlinked when the sort is started.
                    601: Using this option permits the sort scratch space
                    602: to overlap the disk space used for input keys.
                    603: T}
                    604: \-n    T{
                    605: Make a completely new set of inverted files, ignoring
                    606: previous files.
                    607: T}
                    608: \-p    T{
                    609: Pipe into the sort program, rather than writing a temporary
                    610: input file.
                    611: This saves disk space and spends processor time.
                    612: T}
                    613: \-v    T{
                    614: Verbose mode; print a summary of the number of keys which
                    615: finished indexing.
                    616: T}
                    617: .TE
                    618: .PP
                    619: About half the time used in
                    620: .I inv
                    621: is in the contained sort.
                    622: Assuming the sort is roughly linear, however,
                    623: a guess at the total timing for
                    624: .I inv
                    625: is 250 keys per second.
                    626: The space used is usually of more importance:
                    627: the entry file uses four bytes per possible hash (note
                    628: the
                    629: .B \-h
                    630: option),
                    631: and the tag file around 15-20 bytes per item indexed.
                    632: Roughly, the posting file contains one item for each key instance
                    633: and one item for each possible hash code; the items are two bytes
                    634: long if the tag file is less than 65536 bytes long, and the
                    635: items are four bytes long if the tag file is greater than
                    636: 65536 bytes long.
                    637: Note that to minimize storage, the hash tables should be
                    638: over-full;
                    639: for most of the files indexed in this way, there is no
                    640: other real choice, since the
                    641: .I entry
                    642: file must fit in memory.
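.PP
As a rough illustration of the space figures above, the following Python sketch adds up the sizes of the entry, tag and posting files.
The function name, its parameters, and the default of 18 bytes per tag entry (a midpoint of the 15-20 byte range) are assumptions made for this example and are not part of
.I inv ;
the text also does not say which item width applies when the tag file is exactly 65536 bytes long, so the sketch arbitrarily uses four bytes in that case.
.DS
```python
def estimate_index_bytes(n_hash, n_items, n_key_instances, tag_bytes_per_item=18):
    """Rough sizes, in bytes, of the entry, tag and posting files."""
    entry = 4 * n_hash                   # four bytes per possible hash code
    tag = tag_bytes_per_item * n_items   # assumed midpoint of 15-20 bytes per item
    # Posting items are two bytes wide while the tag file is under 65536 bytes,
    # four bytes wide otherwise (the exact-equality case is an assumption).
    width = 2 if tag < 65536 else 4
    posting = width * (n_key_instances + n_hash)
    return {"entry": entry, "tag": tag, "posting": posting,
            "total": entry + tag + posting}


# Illustrative figures only: 86000 key instances is a made-up number.
print(estimate_index_bytes(997, 4300, 86000))
```
.DE
With these illustrative inputs the estimate is about 429000 bytes, most of it in the posting file, which is why a small (over-full) hash table costs little extra space.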
                    643: .PP
                    644: .B
                    645: Searching and Retrieving.
                    646: .R
                    647: The
                    648: .I hunt
                    649: program retrieves items from an index.
                    650: It combines, as mentioned above, the two parts of phase (B):
                    651: search and delivery.
                    652: Combining delivery and search is efficient
                    653: partly because it avoids starting unnecessary processes, and partly
                    654: because the delivery operation must be a part of the search
                    655: operation in any case.
                    656: Because of the hashing, the search part takes place in two stages:
                    657: first items are retrieved which have the right hash codes associated with them,
                    658: and then the actual items are inspected to determine false drops, i.e.
                    659: to determine if anything with the right hash codes doesn't really have the right
                    660: keys.
                    661: Since the original item is retrieved to check on false drops,
                    662: it is efficient to present it immediately, rather than only
                    663: giving the tag as output and later retrieving the
                    664: item again.
                    665: If there were a separate key file, this argument would not apply,
                    666: but separate key files are not common.
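.PP
The two-stage search can be sketched in Python as follows.
This is only an illustration of the idea, not the code of
.I hunt :
the hash function is a stand-in (the real one is not described here), the postings are held in an in-memory dictionary rather than in the entry and posting files, items are matched on whole lower-case words, and all function and parameter names are invented for the example.
The parameter missing_allowed plays the role of a coordination level, that is, the number of query terms an item may lack.
.DS
```python
from collections import Counter


def key_hash(word, n_hash):
    """Stand-in hash function; an assumption, not the one used by inv."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % n_hash
    return h


def build_postings(items, n_hash):
    """Map each hash code to the set of item numbers that post a key under it."""
    postings = {}
    for tag, text in enumerate(items):
        for word in set(text.lower().split()):
            postings.setdefault(key_hash(word, n_hash), set()).add(tag)
    return postings


def hunt(query, items, postings, n_hash, missing_allowed=0, check_false_drops=True):
    terms = query.lower().split()
    needed = len(terms) - missing_allowed
    # Stage 1: count, per item, how many query terms have a matching hash code.
    hits = Counter()
    for term in terms:
        for tag in postings.get(key_hash(term, n_hash), ()):
            hits[tag] += 1
    candidates = sorted(tag for tag, n in hits.items() if n >= needed)
    if not check_false_drops:
        return candidates
    # Stage 2: inspect each candidate item and discard the false drops.
    found = []
    for tag in candidates:
        words = set(items[tag].lower().split())
        if sum(term in words for term in terms) >= needed:
            found.append(tag)
    return found


items = [
    "kernighan cherry system typesetting mathematics",
    "aho hirschberg ullman bounds complexity",
    "lesk typing documents unix gcos",
]
# A table with a single hash code makes every key collide, to force false drops.
postings = build_postings(items, 1)
print(hunt("kernighan cherry", items, postings, 1, check_false_drops=False))
print(hunt("kernighan cherry", items, postings, 1))
```
.DE
With the deliberately tiny hash table, the first call returns all three items, since every key shares one hash code; the second call inspects the items themselves and keeps only the first.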
                    667: .PP
                    668: Input to
                    669: .I hunt
                    670: is taken from the standard input,
                    671: one query per line.
                    672: Each query should be in
                    673: .I "mkey \-s"
                    674: output format;
                    675: all lower case, no punctuation.
                    676: The
                    677: .I hunt
                    678: program takes one argument which specifies the base name of the index
                    679: files to be searched.
                    680: Only one set of index files can be searched at a time,
                    681: although many text files may be indexed as a group, of course.
                    682: If one of the text files has been changed since the index was made,
                    683: that file is searched with
                    684: .I fgrep ;
                    685: this may occasionally slow down the searching, and care should be taken to
                    686: avoid having many out of date files.
                    687: The following option arguments are recognized by
                    688: .I hunt:
                    689: .TS
                    690: center;
                    691: lB lw(4i).
                    692: \-a    T{
                    693: Give all output; ignore checking for false drops.
                    694: T}
                    695: \-C\f2n        T{
                    696: Coordination level
                    697: .I n;
                    698: retrieve items with not more than
                    699: .I n
                    700: terms of the input missing;
                    701: default
                    702: .I C0 ,
                    703: implying that each search term must be in the output items.
                    704: T}
                    705: \-F[yn\f2d\f3\|]       T{
                    706: ``\-Fy'' gives the text of all the items found;
                    707: ``\-Fn'' suppresses them.
                    708: ``\-F\f2d\|\f1'' where \f2d\f1\| is an integer
                    709: gives the text of the first \f2d\f1 items.
                    710: The default is
                    711: .I \-Fy.
                    712: T}
                    713: \-g    T{
                    714: Do not use
                    715: .I fgrep
                    716: to search files changed since the index was made;
                    717: print an error comment instead.
                    718: T}
                    719: \-i \f2string  T{
                    720: Take
                    721: .I string
                    722: as input, instead of reading the standard input.
                    723: T}
                    724: \-l \f2n       T{
                    725: The maximum length of internal lists of candidate
                    726: items is
                    727: .I n;
                    728: default 1000.
                    729: T}
                    730: \-o \f2string  T{
                    731: Put text output (``\-Fy'') in
                    732: .I string;
                    733: of use
                    734: .I only
                    735: when
                    736: invoked from another program.
                    737: T}
                    738: \-p    T{
                    739: Print hash code frequencies; mostly
                    740: for use in optimizing hash table sizes.
                    741: T}
                    742: \-T[yn\f2d\|\f3]       T{
                    743: ``\-Ty'' gives the tags of the items found;
                    744: ``\-Tn'' suppresses them.
                    745: ``\-T\f2d\f1\|'' where \f2d\f1\| is an integer
                    746: gives the first \f2d\f1 tags.
                    747: The default is
                    748: .I \-Tn .
                    749: T}
                    750: \-t \f2string  T{
                    751: Put tag output (``\-Ty'') in
                    752: .I string;
                    753: of use
                    754: .I only
                    755: when invoked from another program.
                    756: T}
                    757: .TE
                    758: .PP
                    759: The timing of
                    760: .I hunt
                    761: is complex.
                    762: Normally the hash table is overfull, so that there will
                    763: be many false drops on any single term;
                    764: but a multi-term query will have few false drops on
                    765: all terms.
                    766: Thus if a query is underspecified (one search term)
                    767: many potential items will be examined and discarded as false
                    768: drops, wasting time.
                    769: If the query is overspecified (a dozen search terms)
                    770: many keys will be examined only to verify that
                    771: the single item under consideration has that key posted.
                    772: The variation of search time with number of keys is
                    773: shown in the table below.
                    774: Queries of varying length were constructed to retrieve
                    775: a particular document from the file of references.
                    776: In the sequence on the left, search terms were chosen so as
                    777: to select the desired paper as quickly as possible.
                    778: In the sequence on the right, terms were chosen inefficiently,
                    779: so that the query did not uniquely select the desired document
                    780: until four keys had been used.
                    781: The same document was the target in each case,
                    782: and the final set of eight keys is also identical; the differences
                    783: at five, six and seven keys are produced by measurement error, not
                    784: by the slightly different key lists.
                    785: .TS
                    786: center;
                    787: c   s   s   s5  | c   s   s   s
                    788: cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8
                    789: cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8
                    790: n   n   n   n   | n   n   n   n  .
                    791: Efficient Keys	Inefficient Keys
                    792: No. keys       Total drops     Retrieved       Search time     No. keys        Total drops     Retrieved       Search time
                    793:        (incl. false)   Documents       (seconds)               (incl. false)   Documents       (seconds)
                    794: 1      15      3       1.27    1       68      55      5.96
                    795: 2      1       1       0.11    2       29      29      2.72
                    796: 3      1       1       0.14    3       8       8       0.95
                    797: 4      1       1       0.17    4       1       1       0.18
                    798: 5      1       1       0.19    5       1       1       0.21
                    799: 6      1       1       0.23    6       1       1       0.22
                    800: 7      1       1       0.27    7       1       1       0.26
                    801: 8      1       1       0.29    8       1       1       0.29
                    802: .TE
                    803: As would be expected, the optimal search is achieved
                    804: when the query just specifies the answer; however,
                    805: overspecification is quite cheap.
                    806: Roughly, the time required by
                    807: .I hunt
                    808: can be approximated as
                    809: 30 milliseconds per search key plus 75 milliseconds
                    810: per dropped document (whether it is a false drop or
                    811: a real answer).
                    812: In general, overspecification can be recommended;
                    813: it protects the user against additions to the data base
                    814: which turn previously uniquely-answered queries
                    815: into ambiguous queries.
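.PP
The rule of thumb above can be checked against a few rows of the table with a short Python sketch.
The function name and its parameter names are invented for this illustration; the two constants are the 30 and 75 millisecond figures just quoted, and the measured times are copied from the table.
.DS
```python
def hunt_seconds(n_keys, n_drops, per_key=0.030, per_drop=0.075):
    """Approximate hunt processor time using the rule of thumb in the text."""
    return per_key * n_keys + per_drop * n_drops


# (number of keys, total drops, measured seconds), copied from the table
rows = [(1, 15, 1.27), (8, 1, 0.29), (1, 68, 5.96), (4, 1, 0.18)]
for n_keys, n_drops, measured in rows:
    estimate = hunt_seconds(n_keys, n_drops)
    print(n_keys, n_drops, measured, round(estimate, 2))
```
.DE
For these four rows the estimates fall within roughly 15 percent of the measured times, which is about what can be expected of so rough an approximation.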
                    816: .PP
                    817: The careful reader will have noted an enormous discrepancy between these times
                    818: and the earlier quoted time of around 1.9 seconds for a search.  The times
                    819: here are purely for the search and retrieval: they are measured by
                    820: running many searches through a single invocation of the
                    821: .I hunt
                    822: program alone.
                    823: The normal retrieval operation involves using the shell to
                    824: set up a pipeline through
                    825: .I mkey
                    826: to
                    827: .I hunt
                    828: and starting both processes; this adds a fixed overhead of about 1.7 seconds
                    829: of processor time
                    830: to any single search.
                    831: Furthermore, remember that all these times are processor times:
                    832: on a typical morning on our \s-2PDP\s0 11/70 system, with about one dozen
                    833: people logged on,
                    834: to obtain 1 second of processor time for the search program
                    835: took between 2 and 12 seconds of real time, with a median of
                    836: 3.9 seconds and a mean of 4.8 seconds.
                    837: Thus, although the work involved in a single search may be only
                    838: 200 milliseconds, after you add the 1.7 seconds of startup processor
                    839: time
                    840: and then assume a 4:1 elapsed/processor time
                    841: ratio, it will be about 8 seconds ((0.2 + 1.7) \(mu 4 = 7.6) before any response is printed.
                    842: .NH
                    843: Selecting and Formatting References for T\s-2ROFF\s0
                    844: .PP
                    845: The major application of the retrieval software
                    846: is
                    847: .I refer,
                    848: which is a
                    849: .I troff
                    850: preprocessor
                    851: like
                    852: .I eqn .
                    853: .[
                    854: kernighan cherry acm 1975
                    855: .]
                    856: It scans its input looking for items of the form
                    857: .DS
                    858: \*.[
                    859: imprecise citation
                    860: \*.\^]
                    861: .DE
                    862: where an imprecise citation is merely a string
                    863: of words found in the relevant bibliographic citation.
                    864: This is translated into a properly formatted reference.
                    865: If the imprecise citation does not correctly identify
                    866: a single paper
                    867: (either
                    868: selecting no papers or too many) a message is given.
                    869: The data base of citations searched may be tailored to each
                    870: system, and individual users may specify their own
                    871: citation
                    872: files.
                    873: On our system, the default data base is accumulated from
                    874: the publication lists of the members of our organization, plus
                    875: about half a dozen personal bibliographies that were collected.
                    876: The present total is about 4300 citations, but this increases steadily.
                    877: Even now,
                    878: the data base covers a large fraction of local citations.
                    879: .PP
                    880: For example, the reference for the
                    881: .I eqn
                    882: paper above was specified as
                    883: .DS
                    884: \&\*.\*.\*.
                    885: \&preprocessor like
                    886: \&.I eqn.
                    887: \&.[
                    888: \&kernighan cherry acm 1975
                    889: \&.]
                    890: \&It scans its input looking for items
                    891: \&\*.\*.\*.
                    892: .DE
                    893: This paper was itself printed using
                    894: .I refer.
                    895: The above input text was processed by
                    896: .I refer
                    897: as well as
                    898: .I tbl
                    899: and
                    900: .I troff
                    901: by the command
                    902: .DS
                    903: .ft I
                    904: refer memo-file | tbl | troff \-ms
                    905: .ft R
                    906: .DE
                    907: and the reference was automatically translated into a correct
                    908: citation to the ACM paper on mathematical typesetting.
                    909: .PP
                    910: The procedure to use to place a reference in a paper
                    911: using
                    912: .I refer
                    913: is as follows.
                    914: First, use the
                    915: .I lookbib
                    916: command to check that the paper is in the data base
                    917: and to find out what keys are necessary to retrieve it.
                    918: This is done by typing
                    919: .I lookbib
                    920: and then typing some potential queries until
                    921: a suitable query is found.
                    922: For example, had one started to find
                    923: the
                    924: .I eqn
                    925: paper shown above by presenting the query
                    926: .DS
                    927:        $ lookbib
                    928:        kernighan cherry
                    929:        (EOT)
                    930: .DE
                    931: .I lookbib
                    932: would have found several items; experimentation would quickly
                    933: have shown that the query given above is adequate.
                    934: Overspecifying the query is of course harmless.
                    935: A particularly careful reader may have noticed that ``acm'' does not
                    936: appear in the printed citation;
                    937: we have supplemented some of the data base items with common
                    938: extra keywords, such as common abbreviations for journals
                    939: or other sources, to aid in searching.
                    940: .PP
                    941: If the reference is in the data base, the query
                    942: that retrieved it can be inserted in the text,
                    943: between
                    944: .B \*.[
                    945: and 
                    946: .B \*.\^]
                    947: brackets.
                    948: If it is not in the data base, it can be typed
                    949: into a private file of references, using the format
                    950: discussed in the next section, and then
                    951: the
                    952: .B \-p
                    953: option
                    954: used to search this private file.
                    955: Such a command might read
                    956: (if the private references are called
                    957: .I myfile )
                    958: .DS
                    959: .ft 2
                    960: refer \-p myfile document | tbl | eqn | troff \-ms \*. \*. \*.
                    961: .ft 1
                    962: .DE
                    963: where
                    964: .I tbl
                    965: and/or
                    966: .I eqn
                    967: could be omitted if not needed.
                    968: The use
                    969: of the
                    970: .I \-ms
                    971: macros
                    972: .[
                    973: lesk typing documents unix gcos
                    974: .]
                    975: or some other macro package, however,
                    976: is essential.
                    977: .I Refer
                    978: only generates the data for the references; exact formatting
                    979: is done by some macro package, and if none is supplied the
                    980: references will not be printed.
                    981: .PP
                    982: By default,
                    983: the references are numbered sequentially,
                    984: and
                    985: the
                    986: .I \-ms
                    987: macros format references as footnotes at the bottom of the page.
                    988: This memorandum is an example of that style.
                    989: Other possibilities are discussed in section 5 below.
                    990: .NH
                    991: Reference Files.
                    992: .PP
                    993: A reference file is a set of bibliographic references usable with
                    994: .I refer.
                    995: It can be indexed using the software described in section 2
                    996: for fast searching.
                    997: What
                    998: .I refer
                    999: does is to read the input document stream,
                   1000: looking for imprecise citation references.
                   1001: It then searches through reference files to find
                   1002: the full citations, and inserts them into the
                   1003: document.
                   1004: The format of the full citation is arranged to make it
                   1005: convenient for a macro package, such as the
                   1006: .I \-ms
                   1007: macros, to format the reference
                   1008: for printing.
                   1009: Since
                   1010: the format of the final reference is determined
                   1011: by the desired style of output,
                   1012: which is determined by the macros used,
                   1013: .I refer
                   1014: avoids forcing any kind of reference appearance.
                   1015: All it does is define a set of string registers which
                   1016: contain the basic information about the reference;
                   1017: and provide a macro call which is expanded by the macro
                   1018: package to format the reference.
                   1019: It is the responsibility of the final macro package
                   1020: to see that the reference is actually printed; if no
                   1021: macros are used, and the output of
                   1022: .I refer
                   1023: is fed untranslated to
                   1024: .I troff,
                   1025: nothing at all will be printed.
                   1026: .PP
                   1027: The strings defined by
                   1028: .I refer
                   1029: are taken directly from the files of references, which
                   1030: are in the following format.
                   1031: The references should be separated
                   1032: by blank lines.
                   1033: Each reference is a sequence of lines beginning with
                   1034: .B %
                   1035: and followed
                   1036: by a key-letter.
                   1037: The remainder of that line, and successive lines until the next line beginning
                   1038: with
                   1039: .B % ,
                   1040: contain the information specified by the key-letter.
                   1041: In general,
                   1042: .I refer
                   1043: does not interpret the information, but merely presents
                   1044: it to the macro package for final formatting.
                   1045: A user with a separate macro package, for example,
                   1046: can add new key-letters or use the existing ones for other purposes
                   1047: without bothering
                   1048: .I refer.
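.PP
A minimal Python sketch of reading a file in this format follows.
It is not how
.I refer
itself is implemented; the function name is invented, and keeping each reference as a list of (key-letter, value) pairs is simply one convenient way to preserve repeated fields, such as several authors, in the order given.
The sample lines are made up for the illustration.
.DS
```python
def parse_references(lines):
    """Group reference-file lines into records of (key_letter, value) pairs."""
    records, fields = [], []
    for line in lines:
        line = line.strip()
        if not line:                      # a blank line separates references
            if fields:
                records.append(fields)
            fields = []
        elif line.startswith("%") and len(line) > 1:
            # The character after % is the key-letter; the rest is its value.
            fields.append((line[1], line[2:].strip()))
        elif fields:                      # continuation of the previous field
            key, value = fields[-1]
            fields[-1] = (key, value + " " + line)
    if fields:
        records.append(fields)
    return records


sample = [
    "%A B. W. Kernighan",
    "%A L. L. Cherry",
    "%T A System for",
    "Typesetting Mathematics",
    "%J Comm. ACM",
    "",
    "%A M. E. Lesk",
]
records = parse_references(sample)
authors = [value for key, value in records[0] if key == "A"]
print(len(records), authors)
```
.DE
Because the key-letter is kept as data and not interpreted, unknown key-letters pass through unchanged, in the same spirit as the behaviour described above.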
                   1049: .PP
                   1050: The meaning of the key-letters given below, in particular,
                   1051: is that assigned by the
                   1052: .I \-ms
                   1053: macros.
                   1054: Not all information, obviously, is used with each citation.
                   1055: For example, if a document is both an internal memorandum and a journal article,
                   1056: the macros ignore the memorandum version and cite only the journal article.
                   1057: Some kinds of information are not used at all in printing the reference;
                   1058: if a user does not like finding references by specifying title
                   1059: or author keywords, and prefers to add specific keywords to the
                   1060: citation, a field is available which is searched but not
                   1061: printed (\f3K\f1).
                   1062: .PP
                   1063: The key letters currently recognized by
                   1064: .I refer
                   1065: and
                   1066: .I \-ms,
                   1067: with the kind of information implied, are:
                   1068: .KS
                   1069: .TS
                   1070: center;
                   1071: c c6 c c
                   1072: c l c l.
                   1073: Key    Information specified   Key     Information specified
                   1074: A      Author's name   N       Issue number
                   1075: B      Title of book containing item   O       Other information
                   1076: C      City of publication     P       Page(s) of article
                   1077: D      Date    R       Technical report reference
                   1078: E      Editor of book containing item  T       Title
                   1079: G      Government (NTIS) ordering number       V       Volume number
                   1080: I      Issuer (publisher)
                   1081: J      Journal name
                   1082: K      Keys (for searching)    X       or
                   1083: L      Label   Y       or
                   1084: M      Memorandum label        Z       Information not used by \f2refer\f1
                   1085: .TE
                   1086: .KE
                   1087: For example, a sample reference could be
                   1088: typed as:
                   1089: .DS
                   1090: %T Bounds on the Complexity of the Maximal
                   1091: Common Subsequence Problem
                   1092: %Z ctr127
                   1093: %A A. V. Aho
                   1094: %A D. S. Hirschberg
                   1095: %A J. D. Ullman
                   1096: %J J. ACM
                   1097: %V 23
                   1098: %N 1
                   1099: %P 1-12
                   1100: .\"%M TM 75-1271-7
                   1101: %M abcd-78
                   1102: %D Jan. 1976
                   1103: .DE
                   1104: Order is irrelevant, except that authors are shown in the order
                   1105: given.  The output of
                   1106: .I refer
                   1107: is a stream of string definitions, one
                   1108: for each of the fields of each reference, as
                   1109: shown below.
                   1110: .DS
                   1111: \*.]-
                   1112: \*.ds [A authors' names \*.\*.\*.
                   1113: \*.ds [T title \*.\*.\*.
                   1114: \*.ds [J journal \*.\*.\*.
                   1115: \*.\*.\*.
                   1116: \*.]\|[ type-number
                   1117: .DE
                   1118: The special macro
                   1119: .B \&\*.]\-
                   1120: precedes the string definitions
                   1121: and the special macro
                   1122: .B \*.]\|[
                   1123: follows.
                   1124: These are changed from the input
                   1125: .B \*.[
                   1126: and 
                   1127: .B \*.\^]
                   1128: so that running the same file through
                   1129: .I refer
                   1130: again is harmless.
                   1131: The 
                   1132: .B \*.]\-
                   1133: macro can be used by the macro package to
                   1134: initialize.
                   1135: The 
                   1136: .B \*.]\|[
                   1137: macro, which should be used
                   1138: to print the reference, is given an
                   1139: argument
                   1140: .I type-number
                   1141: to indicate the kind of reference, as follows:
                   1142: .KS
                   1143: .TS
                   1144: center;
                   1145: c c
                   1146: n l.
                   1147: Value  Kind of reference
                   1148: 1      Journal article
                   1149: 2      Book
                   1150: 3      Article within book
                   1151: 4      Technical report
                   1152: 5      Bell Labs technical memorandum
                   1153: 0      Other
                   1154: .TE
                   1155: .KE
                   1156: The reference is flagged in the text
                   1157: with the sequence
                   1158: .DS
                   1159: \e*\|([\*.number\e*\|(\*.\^]
                   1160: .DE
                   1161: where
                   1162: .I number
                   1163: is the footnote number.
                   1164: The strings
                   1165: .B [\*.
                   1166: and 
                   1167: .B \*.\^]
                   1168: should be used by the macro package
                   1169: to format the reference flag in the text.
                   1170: These strings can be replaced for a particular
                   1171: footnote, as described in section 5.
                   1172: The footnote number (or other signal) is available
                   1173: to the reference macro
                   1174: .B \*.]\|[
                   1175: as the
                   1176: string register
                   1177: .B [F .
                   1178: .PP
                   1179: In some cases users wish to suspend the searching, and merely
                   1180: use the reference macro formatting.
                   1181: That is, the user doesn't want to provide a search key
                   1182: between
                   1183: .B \*.[
                   1184: and 
                   1185: .B \*.\^]
                   1186: brackets, but merely
                   1187: the reference lines for the appropriate document.
                   1188: Alternatively, the user
                   1189: may wish
                   1190: to add a few fields to those in the reference
                   1191: as in the standard file, or
                   1192: override some fields.
                   1193: Altering or replacing fields, or supplying whole references, is easily done
                   1194: by inserting lines beginning
                   1195: with
                   1196: .B % ;
                   1197: any such line is taken as
                   1198: direct input to the reference
                   1199: processor rather than keys to be searched.
                   1200: Thus
                   1201: .DS
                   1202: \*.[
                   1203: key1 key2 key3 \*.\*.\*.
                   1204: %Q New format item
                   1205: %R Override report name
                   1206: \*.\^]
                   1207: .DE
                   1208: makes the indicated changes to the result of searching for
                   1209: the keys.
                   1210: All of the search keys must be given before the first
                   1211: \f3%\f1 line.
                   1212: .PP
                   1213: If no search keys are provided, an entire citation can
                   1214: be provided in-line in the text.
                   1215: For example, if the
                   1216: .I eqn
                   1217: paper citation were to be inserted in
                   1218: this way, rather than by searching for it in the data base,
                   1219: the input would read
                   1220: .DS
                   1221: \&\*.\*.\*.
                   1222: \&preprocessor like
                   1223: \&.I eqn.
                   1224: \&.[
                   1225: \&%A B. W. Kernighan
                   1226: \&%A L. L. Cherry
                   1227: \&%T A System for Typesetting Mathematics
                   1228: \&%J Comm. ACM
                   1229: \&%V 18
                   1230: \&%N 3
                   1231: \&%P 151-157
                   1232: \&%D March 1975
                   1233: \&.]
                   1234: \&It scans its input looking for items
                   1235: \&\*.\*.\*.
                   1236: .DE
                   1237: This would produce a citation of the same appearance as that
                   1238: resulting from the file search.
                   1239: .PP
                   1240: As shown, fields are normally turned into
                   1241: .I troff
                   1242: strings.
                   1243: Sometimes users would rather have them defined as macros,
                   1244: so that other
                   1245: .I troff
                   1246: commands can be placed into the data.
                   1247: When this is necessary, simply double the control character
                   1248: .B %
                   1249: in the data.
                   1250: Thus the input
                   1251: .DS
                   1252: \&.[
                   1253: %V 23
                   1254: %%M
                   1255: Bell Laboratories,
                   1256: Murray Hill, N.J. 07974
                   1257: \&.]
                   1258: .DE
                   1259: is processed by
                   1260: .I refer
                   1261: into
                   1262: .DS
                   1263: \&.ds [V 23
                   1264: \&.de [M
                   1265: Bell Laboratories,
                   1266: Murray Hill, N.J. 07974
                   1267: \&..
                   1268: .DE
                   1269: The information after
                   1270: .B %%M
                   1271: is defined as a macro to be invoked by
                   1272: .B .[M
                   1273: while the information after
                   1274: .B %V
                   1275: is turned into a string to be invoked by
                   1276: .B \e\(**([V .
                   1277: At present
                   1278: .I \-ms
                   1279: expects all information as strings.
                   1280: .NH
                   1281: Collecting References and other Refer Options
                   1282: .PP
                   1283: Normally, the combination of
                   1284: .I refer
                   1285: and
                   1286: .I \-ms
                   1287: formats output as 
                   1288: .I troff
                   1289: footnotes which are consecutively numbered and placed
                   1290: at the bottom of the page.  However,
                   1291: options exist to
                   1292: place the references at the end; to arrange references alphabetically
                   1293: by senior author; and to indicate references by strings in the text of the form
                   1294: [Name1975a]
                   1295: rather than by number.
                   1296: Whenever references are not placed at the bottom of a page,
                   1297: identical references are coalesced.
                   1298: .PP
                   1299: For example, the
                   1300: .B \-e
                   1301: option to
                   1302: .I refer
                   1303: specifies that references are to be collected; in this case
                   1304: they are output whenever the sequence
                   1305: .DS
                   1306: \*.[
                   1307: $LIST$
                   1308: \*.\^]
                   1309: .DE
                   1310: is encountered.
                   1311: Thus, to place references at the end of a paper, the user would run
                   1312: .I refer
                   1313: with the
                   1314: .B \-e
                   1315: option and place the above $LIST$ commands after the last
                   1316: line of the text.
                   1317: .I Refer
                   1318: will then move all the references to that point.
                   1319: To aid in formatting the collected references,
                   1320: .I refer
                   1321: writes the references preceded by the line
                   1322: .DS
                   1323: .B .]<
                   1324: .DE
                   1325: and
                   1326: followed by the line
                   1327: .DS
                   1328: .B .]>
                   1329: .DE
                   1330: to invoke special macros before and after the references.
                   1331: .PP
                   1332: Another possible option to
                   1333: .I refer
                   1334: is the
                   1335: .B \-s
                   1336: option to specify
                   1337: sorting of references.  The default,
                   1338: of course, is to list references in the order presented.
                   1339: The
                   1340: .B \-s
                   1341: option implies the
                   1342: .B \-e
                   1343: option, and thus requires
                   1344: a
                   1345: .DS
                   1346: \*.[
                   1347: $LIST$
                   1348: \*.\^]
                   1349: .DE
                   1350: entry to call out the reference list.
                   1351: The
                   1352: .B \-s
                   1353: option may be followed by a string of letters, numbers, and `+' signs indicating how
                   1354: the references are to be sorted.
                   1355: The sort is done using the fields whose key-letters are
                   1356: in the string as sorting keys; the numbers indicate how many
                   1357: of the fields are to be considered, with `+'
                   1358: taken as a large number.
                   1359: Thus the default is
                   1360: .B \-sAD
                   1361: meaning ``Sort on senior author, then date.''  To
                   1362: sort on all authors and then title, specify
                   1363: .B \-sA+T .
                   1364: And to sort on two authors and then the journal,
                   1365: write
                   1366: .B \-sA2J .
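                         .PP
                         As a sketch of how these options combine
                         (the file name
                         .I paper
                         is only a placeholder),
                         a run that sorts the collected references on all authors and then title
                         might be invoked as
                         .DS
                         refer \-sA+T paper | tbl | nroff \-ms
                         .DE
                         with the $LIST$ entry placed in
                         .I paper
                         where the sorted list should appear.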
                   1367: .PP
                   1368: Other options to
                   1369: .I refer
                   1370: change the signal or label inserted in the text for each reference.
                   1371: Normally these are just sequential numbers,
                   1372: and their exact placement (within brackets, as superscripts, etc.) is determined
                   1373: by the macro package.
                   1374: The
                   1375: .B \-l
                   1376: option replaces reference numbers by
                   1377: strings composed of the senior author's last name, the date,
                   1378: and a disambiguating letter.
                   1379: If a number follows the
                   1380: .B l
                   1381: as in
                   1382: .B \-l3
                   1383: only that many letters of the last name are used
                   1384: in the label string.
                   1385: To abbreviate the date as well, the form
                   1386: \f3\-l\f2m,n\f1
                   1387: shortens the last name to the
                   1388: first
                   1389: .I m
                   1390: letters and the date to the
                   1391: last
                   1392: .I n
                   1393: digits.
                   1394: For example, the option
                   1395: .B \-l3,2
                   1396: would refer to the
                   1397: .I eqn
                   1398: paper (reference 3) by the signal
                   1399: .I Ker75a ,
                   1400: since it is the first cited reference by Kernighan in 1975.
                   1401: .PP
                   1402: A user wishing to specify particular labels for
                   1403: a private bibliography may use the
                   1404: .B \-k
                   1405: option.
                   1406: Specifying
                   1407: \f3\-k\f2x\f1
                   1408: causes the field \f2x\f1 to be used as a label.
                   1409: The default is \f3L\f1.
                   1410: If this field ends in \f3\-\f1, that character
                   1411: is replaced by a sequence letter; otherwise the field
                   1412: is used exactly as given.
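                         .PP
                         As an illustration only
                         (the label text is invented for this example),
                         a private bibliography entry carrying its own label
                         in the default
                         .B L
                         field might read
                         .DS
                         %A B. W. Kernighan
                         %A L. L. Cherry
                         %T A System for Typesetting Mathematics
                         %L KC75\-
                         .DE
                         Because the label ends in \f3\-\f1, the rule above would replace
                         that character by a sequence letter when
                         .B \-k
                         is given.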
                   1413: .PP
                   1414: If none of the
                   1415: .I refer -produced
                   1416: signals are desired,
                   1417: the
                   1418: .B \-b
                   1419: option entirely suppresses automatic text signals.
                   1420: .PP
                   1421: If the user wishes to override the
                   1422: .I \-ms
                   1423: treatment of the reference signal (which is normally to
                   1424: enclose the number in brackets in
                   1425: .I nroff
                   1426: and make it a superscript in
                   1427: .I troff\\| )
                   1428: this can be done easily.
                   1429: If the lines
                   1430: .B \&.[
                   1431: or
                   1432: .B \&.]
                   1433: contain anything following these characters,
                   1434: the remainders of these lines are used to surround
                   1435: the reference signal, instead of the default.
                   1436: Thus, for example, to say ``See reference (2).''
                   1437: and avoid
                   1438: ``See reference.\s-3\u2\d\s+3'' the
                   1439: input might appear
                   1440: .DS
                   1441: \&See reference
                   1442: \&\*.[ (
                   1443: imprecise citation ...
                   1444: \&\*.\^])\*.
                   1445: .DE
                   1446: Note that blanks are significant in this construction.
                   1447: If a permanent change is desired in the style of reference
                   1448: signals, however, it is probably easier to redefine the strings
                   1449: .B \&[.
                   1450: and
                   1451: .B \&.]
                   1452: (which are used to bracket each signal)
                   1453: than to change each citation.
                   1454: .PP
                   1455: Although normally
                   1456: .I refer
                   1457: limits itself to retrieving the data for the reference,
                   1458: and leaves to a macro package the job of arranging that
                   1459: data as required by the local format, there are two
                   1460: special options for rearrangements that cannot be
                   1461: done by macro packages.
                   1462: The
                   1463: .B \-c
                   1464: option puts fields into all upper case
                   1465: (C\s-2APS\s+2-S\s-2MALL\s+2 C\s-2APS\s+2
                   1466: in
                   1467: .I troff
                   1468: output).
                   1469: The key-letters indicating what information is to be translated
                   1470: to upper case follow the
                   1471: .B c ,
                   1472: so that
                   1473: .B \-cAJ
                   1474: means that authors' names and journals are to be in caps.
                   1475: The
                   1476: .B \-a
                   1477: option writes the names of authors last name first, that is
                   1478: .I "A. D. Hall, Jr."
                   1479: is written as
                   1480: .I "Hall, A. D. Jr" .
                   1481: The citation form of
                   1482: the
                   1483: .I "Journal of the ACM" ,
                   1484: for example, would require
                   1485: both
                   1486: .B \-cA
                   1487: and
                   1488: .B \-a
                   1489: options.
                   1490: This produces authors' names in the style
                   1491: .I
                   1492: K\s-2ERNIGHAN\s0, B. W. \s-2AND\s0 C\s-2HERRY\s0, L. L.\&
                   1493: .R
                   1494: for the previous example.
                   1495: The
                   1496: .B \-a
                   1497: option may be followed by a number to indicate how many
                   1498: author names should be reversed;
                   1499: .B \-a1
                   1500: (without any
                   1501: .B \-c
                   1502: option)
                   1503: would produce
                   1504: .I
                   1505: Kernighan, B. W. and L. L. Cherry,
                   1506: .R
                   1507: for example.
                   1508: .PP
                   1509: Finally, there is also the previously-mentioned
                   1510: .B \-p
                   1511: option to let the user specify
                   1512: a private file of references to be searched before the public files.
                   1513: Note that
                   1514: .I refer
                   1515: does not insist on a previously made index for these files.
                   1516: If a file is named which contains reference
                   1517: data but is not indexed, it will be searched
                   1518: (more slowly)
                   1519: by
                   1520: .I refer
                   1521: using
                   1522: .I fgrep .
                   1523: In this way
                   1524: it is easy for users to keep small files of
                   1525: new references, which can later be added to the
                   1526: public data bases.
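                         .PP
                         For instance, assuming a private reference file called
                         .I myrefs
                         (a hypothetical name),
                         the command
                         .DS
                         refer \-p myrefs paper | nroff \-ms
                         .DE
                         would consult
                         .I myrefs
                         before the public files, whether or not an index has been made for it.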
                   1527: .SG MH-1274-MEL-\s8UNIX\s0
