|
|
1.1 ! root 1: .\" @(#)refer 6.1 (Berkeley) 5/22/86 ! 2: .\" ! 3: .... refer | tbl | nroff -ms ! 4: .EH 'USD:30-%''Some Applications of Inverted Indexes on the UNIX System' ! 5: .OH 'Some Applications of Inverted Indexes on the UNIX System''USD:30-%' ! 6: .nr LL 6.5i ! 7: .nr LT 6.5i ! 8: .de UC ! 9: \\s-2\\$1\\s0\\$2 ! 10: .. ! 11: .ds . \&\s+2.\s0 ! 12: .if t .ds -- \(em ! 13: .if n .ds -- -- ! 14: .TR 69 ! 15: \".TM 77-1274-17 39199 39199-11 ! 16: .ND October 27, 1977 ! 17: .ND June 21, 1978 ! 18: .TL ! 19: Some Applications of Inverted Indexes on the UNIX System ! 20: .AU "MH 2C-572" 6377 ! 21: M. E. Lesk ! 22: .AI ! 23: .MH ! 24: .\".AB ! 25: .\".LP ! 26: .\".ft B ! 27: .\"I. Some Applications of Inverted Indexes \- Overview ! 28: .\".ft R ! 29: .\".PP ! 30: .\"This memorandum describes a set of programs which ! 31: .\"make inverted indexes to ! 32: .\"UNIX* ! 33: .\"text files, and their ! 34: .\"application to ! 35: .\"retrieving and formatting citations for documents prepared using ! 36: .\".I troff. ! 37: .\".PP ! 38: .\"The indexing and searching programs make keyword ! 39: .\"indexes to volumes of material too large for linear searching. ! 40: .\"Searches for combinations of single words can be performed quickly. ! 41: .\"The programs for general searching are divided into ! 42: .\"two phases. The first makes an index from the original ! 43: .\"data; the second searches the index and retrieves ! 44: .\"items. ! 45: .\"Both of these phases are further divided into two parts ! 46: .\"to separate the data-dependent and algorithm dependent ! 47: .\"code. ! 48: .\".PP ! 49: .\"The major current application of these programs is ! 50: .\"the ! 51: .\".I troff ! 52: .\"preprocessor ! 53: .\".I refer. ! 54: .\"A list of 4300 references is maintained on line, ! 55: .\"containing primarily papers written and cited by ! 56: .\"local authors. ! 57: .\"Whenever one of these references is required ! 58: .\"in a paper, a few words from the title or author list ! 59: .\"will retrieve it, and the user need not bother to re-enter ! 60: .\"the exact citation. ! 61: .\"Alternatively, authors can use their own lists of papers. ! 62: .\".PP ! 63: .\"This memorandum is of interest to ! 64: .\"those who are interested in facilities for searching large ! 65: .\"but relatively unchanging text files on ! 66: .\"the ! 67: .\"UNIX ! 68: .\"system, ! 69: .\"and those who are interested in handling bibliographic ! 70: .\"citations with ! 71: .\"UNIX ! 72: .\".I troff. ! 73: .\".LP ! 74: .\".ft B ! 75: .\"II. Updating Publication Lists ! 76: .\".PP ! 77: .\"This section is a brief note describing the ! 78: .\"auxiliary programs for managing the updating ! 79: .\"processing. ! 80: .\"It is written to aid clerical users in ! 81: .\"maintaining lists of references. ! 82: .\"Primarily, the programs described permit a large ! 83: .\"amount of individual control over the content ! 84: .\"of publication lists while retaining the ! 85: .\"usefulness of the files to other users. ! 86: .\".LP ! 87: .\".ft B ! 88: .\"III. Manual Pages ! 89: .\".PP ! 90: .\"This section contains the pages from the ! 91: .\"UNIX programmer's manual ! 92: .\"dealing with these commands. ! 93: .\"It is useful for reference. ! 94: .\".sp ! 95: .\"\l'3i' ! 96: .\".br ! 97: .\"* UNIX is a trademark of Bell Laboratories. ! 98: .\".AE ! 99: .CS 10 4 14 0 0 4 ! 100: .NH ! 101: Introduction. ! 102: .PP ! 103: The ! 104: .UX ! 105: system ! 106: has many utilities ! 107: (e.g. \fIgrep, awk, lex, egrep, fgrep, ...\fR) ! 108: to search through files of text, ! 109: but most of them are based on a linear scan through the ! 110: entire file, using some deterministic automaton. ! 111: .ev 1 ! 112: .ps 8 ! 113: .vs 10p ! 114: .ev ! 115: This memorandum discusses a program which uses inverted ! 116: indexes ! 117: .[ ! 118: %A D. Knuth ! 119: %T The Art of Computer Programming: Vol. 3, Sorting and Searching ! 120: %I Addison-Wesley ! 121: %C Reading, Mass. ! 122: %D 1977 ! 123: %O See section 6.5. ! 124: .] ! 125: and can thus be used on much larger data bases. ! 126: .PP ! 127: As with any indexing system, of course, there are some disadvantages; ! 128: once an index is made, the files that have been indexed can not be changed ! 129: without remaking the index. ! 130: Thus applications are restricted ! 131: to those making many searches ! 132: of relatively stable data. ! 133: Furthermore, these programs depend on hashing, and can only ! 134: search for exact matches of whole keywords. ! 135: It is not possible to look for ! 136: arithmetic or logical expressions (e.g. ``date greater than 1970'') or ! 137: for regular expression searching such as that in ! 138: .I lex. ! 139: .[ ! 140: lex lesk cstr ! 141: .] ! 142: .PP ! 143: Currently there are two uses of this software, ! 144: the ! 145: .I refer ! 146: preprocessor to format references, ! 147: and the ! 148: .I lookall ! 149: command to search through all text files on ! 150: the ! 151: .UX ! 152: system.\(dd ! 153: .FS ! 154: \(dd \fIlookall\fP is not part of the Berkeley UNIX distribution. ! 155: .FE ! 156: .PP ! 157: The remaining sections of this memorandum discuss ! 158: the searching programs and their uses. ! 159: Section 2 explains the operation of the searching algorithm and describes ! 160: the data collected for use with the ! 161: .I lookall ! 162: command. ! 163: The more important application, ! 164: .I refer ! 165: has a user's description in section 3. ! 166: Section 4 goes into more detail on ! 167: reference files ! 168: for the benefit of those who ! 169: wish to add references to data bases or ! 170: write new ! 171: .I troff ! 172: macros for use with ! 173: .I refer. ! 174: The options to make ! 175: .I refer ! 176: collect identical citations, or otherwise relocate and adjust references, ! 177: are described in section 5. ! 178: .NH ! 179: Searching. ! 180: .PP ! 181: The indexing and searching process is divided into two phases, ! 182: each made of two parts. ! 183: These are ! 184: shown below. ! 185: .IP A. ! 186: Construct the index. ! 187: .RS ! 188: .IP (1) ! 189: Find keys \*(-- turn the input files into a sequence of tags and keys, ! 190: where each tag identifies a distinct item in the input ! 191: and the keys for each such item are the strings under which it is ! 192: to be indexed. ! 193: .IP (2) ! 194: Hash and sort \*(-- ! 195: prepare a set of inverted indexes from which, given a set of keys, ! 196: the appropriate item tags can be found quickly. ! 197: .RE ! 198: .IP B. ! 199: Retrieve an item in response to a query. ! 200: .RS ! 201: .IP (3) ! 202: Search \*(-- ! 203: Given some keys, look through the files prepared by the hashing ! 204: and sorting facility and derive the appropriate tags. ! 205: .IP (4) ! 206: Deliver \*(-- ! 207: Given the tags, find the original items. This completes the ! 208: searching process. ! 209: .RE ! 210: .LP ! 211: The first phase, making the index, is presumably done relatively infrequently. ! 212: It should, of course, be done whenever the data being ! 213: indexed change. ! 214: In contrast, the second phase, retrieving items, ! 215: is presumably done often, and must be rapid. ! 216: .PP ! 217: An effort is made to separate code which depends on the data ! 218: being handled from code which depends on the searching procedure. ! 219: The search algorithm is involved only in programs ! 220: (2) and (3), while knowledge of the actual data files is ! 221: needed only by programs (1) and (4). ! 222: Thus it is easy to adapt to different data files or different ! 223: search algorithms. ! 224: .PP ! 225: To start with, it is necessary to have some way of selecting ! 226: or generating keys from input files. ! 227: For dealing with files that are basically English, we have ! 228: a key-making program which automatically selects words ! 229: and passes them to the hashing and sorting program (step 2). ! 230: The format used has one line for each input item, ! 231: arranged ! 232: as follows: ! 233: .DS ! 234: name:start,length (tab) key1 key2 key3 ... ! 235: .DE ! 236: where ! 237: .I name ! 238: is the file name, ! 239: .I start ! 240: is the starting byte number, ! 241: and ! 242: .I length ! 243: is the number of bytes in the entry. ! 244: .PP ! 245: These lines are the only input used to make the ! 246: index. ! 247: The first field (the file name, byte position, and byte count) ! 248: is the tag of the item ! 249: and can be used to retrieve it quickly. ! 250: Normally, an item is either a whole file or a section of a file ! 251: delimited by blank lines. ! 252: After the tab, the second field contains the keys. ! 253: The keys, if selected by the automatic program, are ! 254: any alphanumeric strings which ! 255: are not among the 100 most frequent words in English ! 256: and which are not entirely numeric (except for four-digit ! 257: numbers beginning 19, which are accepted as dates). ! 258: Keys are truncated to six characters and converted to lower case. ! 259: Some selection is needed if the original items are very large. ! 260: We normally just take the first ! 261: .I n ! 262: keys, with ! 263: .I n ! 264: less than 100 or so; this replaces any attempt at intelligent selection. ! 265: One file in our system is ! 266: a complete English dictionary; it would presumably be retrieved for all queries. ! 267: .PP ! 268: To generate an inverted index to the list of record tags and keys, ! 269: the keys ! 270: are hashed ! 271: and sorted to produce an index. ! 272: What is wanted, ideally, is a series of lists showing the tags associated ! 273: with each key. ! 274: To condense this, ! 275: what is actually produced is a list showing the tags associated ! 276: with each hash code, and thus with some set of keys. ! 277: To speed up access and further save space, ! 278: a set of three or possibly four files is produced. ! 279: These files are: ! 280: .KS ! 281: .bd 2 2 ! 282: .TS ! 283: center; ! 284: c c ! 285: lI l. ! 286: File Contents ! 287: entry Pointers to posting file ! 288: for each hash code ! 289: posting Lists of tag pointers for ! 290: each hash code ! 291: tag Tags for each item ! 292: key Keys for each item ! 293: (optional) ! 294: .TE ! 295: .bd 2 ! 296: .KE ! 297: The posting file comprises the real data: it contains a sequence of lists ! 298: of items posted under each hash code. To speed up searching, ! 299: the entry file is an array of pointers into the posting file, one per potential ! 300: hash code. ! 301: Furthermore, the items in the lists in the posting file are not referred to by their ! 302: complete tag, but just by an address in the tag file, which ! 303: gives the complete tags. ! 304: The key file is optional and contains a copy of the keys ! 305: used in the indexing. ! 306: .PP ! 307: The searching process starts with a query, containing several keys. ! 308: The goal is to obtain all items which were indexed under these keys. ! 309: The query keys are hashed, and the pointers in the entry file used ! 310: to access the lists in the posting file. These lists ! 311: are addresses in the tag file of documents posted under the ! 312: hash codes derived from the query. ! 313: The common items from all lists are determined; ! 314: this must include the items indexed by every key, but may also ! 315: contain some items which are false drops, since items referenced by ! 316: the correct hash codes need not actually have contained the correct keys. ! 317: Normally, if there are several keys in the query, there are not ! 318: likely to be many false drops in the final combined list even though ! 319: each hash code is somewhat ambiguous. ! 320: The actual tags are then obtained from the tag file, and to guard against ! 321: the possibility that an item has false-dropped on some hash code ! 322: in the query, the original items are normally obtained from the delivery ! 323: program (4) and the query keys checked against them ! 324: by string comparison. ! 325: .PP ! 326: Usually, therefore, the check for bad drops is made against the original file. ! 327: However, if the key derivation procedure is complex, it may be preferable ! 328: to check against the keys fed to program (2). ! 329: In this case the optional key file which contains the ! 330: keys associated with each item is generated, and the item tag is supplemented ! 331: by a string ! 332: .DS ! 333: ;start,length ! 334: .DE ! 335: which indicates the starting byte number in the key file and the length of ! 336: the string of keys for each item. ! 337: This file is not usually necessary with the present ! 338: key-selection program, since the keys ! 339: always appear in the original document. ! 340: .PP ! 341: There is also an option ! 342: (\f3-C\f2n\|\f1) ! 343: for coordination level searching. ! 344: This retrieves items which match all but ! 345: .I n ! 346: of the query keys. ! 347: The items are retrieved in the order of the number ! 348: of keys that they match. ! 349: Of course, ! 350: .I n ! 351: must be less than the number of query keys (nothing is ! 352: retrieved unless it matches at least one key). ! 353: .PP ! 354: As an example, consider one set of 4377 references, comprising ! 355: 660,000 bytes. ! 356: This included 51,000 keys, of which 5,900 were distinct ! 357: keys. ! 358: The hash table is kept full to save space (at the expense of time); ! 359: 995 of 997 possible hash codes were used. ! 360: The total set of index files (no key file) included 171,000 bytes, ! 361: about 26% of the original file size. ! 362: It took 8 minutes of processor time to ! 363: hash, sort, and write the index. ! 364: To search for a single query with the resulting index took 1.9 seconds ! 365: of processor time, ! 366: while to find the same paper ! 367: with a sequential linear search ! 368: using ! 369: .I grep ! 370: (reading all of the tags and keys) ! 371: took 12.3 seconds of processor time. ! 372: .PP ! 373: We have also used this software to index all of the English stored on our ! 374: .UX ! 375: system. ! 376: This is the index searched by the ! 377: .I lookall ! 378: command. ! 379: On a typical day there were ! 380: 29,000 files in our user file system, containing about 152,000,000 ! 381: bytes. ! 382: Of these ! 383: 5,300 files, containing 32,000,000 bytes (about 21%) ! 384: were English text. ! 385: The total number of `words' (determined mechanically) ! 386: was 5,100,000. ! 387: Of these 227,000 were selected as keys; ! 388: 19,000 were distinct, hashing to 4,900 (of 5,000 possible) different hash codes. ! 389: The ! 390: resulting inverted file indexes used 845,000 bytes, or about ! 391: 2.6% of the size of the original files. ! 392: The particularly small indexes are caused by the ! 393: fact that keys are taken from only the first 50 non-common words of ! 394: some very long input files. ! 395: .PP ! 396: Even this large \f2lookall\f1 index can be searched quickly. ! 397: For example, to find this document ! 398: by looking for the keys ! 399: ``lesk inverted indexes'' ! 400: required ! 401: 1.7 seconds of processor time ! 402: and system time. ! 403: By comparison, just to search the 800,000 byte dictionary (smaller than even ! 404: the inverted indexes, let alone the 27,000,000 bytes of text files) with ! 405: .I grep ! 406: takes 29 seconds of processor time. ! 407: The ! 408: .I lookall ! 409: program is thus useful when looking for a document which you believe ! 410: is stored on-line, but do not know where. For example, many memos ! 411: from our center are in the file system, but it is often ! 412: difficult to guess where a particular memo might be (it might have several ! 413: authors, each with many directories, and have been worked on by ! 414: a secretary with yet more directories). ! 415: Instructions for the use of the ! 416: .I lookall ! 417: command are given in the manual section, shown ! 418: in the appendix to this memorandum. ! 419: .PP ! 420: The only indexes maintained routinely are those of publication lists and ! 421: all English files. ! 422: To make other indexes, the programs for making keys, sorting them, ! 423: searching the indexes, and delivering answers must be used. ! 424: Since they are usually invoked as parts of higher-level commands, ! 425: they are not in the default command ! 426: directory, but are available to any user in the directory ! 427: .I /usr/lib/refer . ! 428: Three programs are of interest: ! 429: .I mkey , ! 430: which isolates keys from input files; ! 431: .I inv , ! 432: which makes an index from a set of keys; ! 433: and ! 434: .I hunt , ! 435: which searches the index and delivers the items. ! 436: Note that the two parts of the retrieval phase are combined into ! 437: one program, to avoid the excessive system work and delay which ! 438: would result from running these as separate processes. ! 439: .PP ! 440: These three commands have a large number of options to adapt to different ! 441: kinds of input. ! 442: The user not interested in the detailed description that now follows may ! 443: skip to section 3, which describes the ! 444: .I refer ! 445: program, a packaged-up version of these tools specifically ! 446: oriented towards formatting references. ! 447: .PP ! 448: .B ! 449: Make Keys. ! 450: .R ! 451: The program ! 452: .I mkey ! 453: is the key-making program corresponding to step (1) in phase A. ! 454: Normally, it reads its input from the file names given as arguments, ! 455: and if there are no arguments it reads from the standard input. ! 456: It assumes that blank lines in the input delimit ! 457: separate items, for each of which a different line of ! 458: keys should be generated. ! 459: The lines of keys are written on the standard output. ! 460: Keys are any alphanumeric string in the input not ! 461: among the most frequent words in English and not entirely numeric ! 462: (except that all-numeric strings are acceptable if they ! 463: are between 1900 and 1999). ! 464: In the output, keys are translated to lower case, and truncated ! 465: to six characters in length; any associated punctuation is removed. ! 466: The following flag arguments are recognized by ! 467: .I mkey: ! 468: .TS ! 469: center; ! 470: lB lw(4i). ! 471: \-c \f2name T{ ! 472: Name of file of common words; ! 473: default is ! 474: .I /usr/lib/eign. ! 475: T} ! 476: \-f \f2name T{ ! 477: Read a list of files from ! 478: .I name ! 479: and take each as an input argument. ! 480: T} ! 481: \-i \f2chars T{ ! 482: Ignore all lines which begin with `%' followed by any character ! 483: in ! 484: .I chars . ! 485: T} ! 486: \-k\f2n T{ ! 487: Use at most ! 488: .I n ! 489: keys per input item. ! 490: T} ! 491: \-l\f2n T{ ! 492: Ignore items shorter than ! 493: .I n ! 494: letters long. ! 495: T} ! 496: \-n\f2m T{ ! 497: Ignore as a key any word in the first ! 498: .I m ! 499: words of the list of common English words. ! 500: The default is 100. ! 501: T} ! 502: \-s T{ ! 503: Remove the labels ! 504: .I (file:start,length) ! 505: from the output; just give the keys. ! 506: Used when searching rather than indexing. ! 507: T} ! 508: \-w T{ ! 509: Each whole file is a separate item; ! 510: blank lines in files are irrelevant. ! 511: T} ! 512: .TE ! 513: .PP ! 514: The normal arguments for indexing references are ! 515: the defaults, which are ! 516: .I "\-c /usr/lib/eign" , ! 517: .I \-n100 , ! 518: and ! 519: .I \-l3 . ! 520: For searching, the ! 521: .I \-s ! 522: option is also needed. ! 523: When the big ! 524: .I lookall ! 525: index of all English files is run, ! 526: the options are ! 527: .I \-w , ! 528: .I \-k50 , ! 529: and ! 530: .I "\-f (filelist)" . ! 531: When running on textual input, ! 532: the ! 533: .I mkey ! 534: program processes about 1000 English words per processor second. ! 535: Unless the ! 536: .I \-k ! 537: option is used (and the input files are long enough for ! 538: it to take effect) ! 539: the output of ! 540: .I mkey ! 541: is comparable in size to its input. ! 542: .PP ! 543: .B ! 544: Hash and invert. ! 545: .R ! 546: The ! 547: .I inv ! 548: program computes the hash codes and writes ! 549: the inverted files. ! 550: It reads the output of ! 551: .I mkey ! 552: and writes the set of files described earlier ! 553: in this section. ! 554: It expects one argument, which is used as the base name for ! 555: the three (or four) files to be written. ! 556: Assuming an argument of ! 557: .I Index ! 558: (the default) ! 559: the entry file is named ! 560: .I Index.ia , ! 561: the posting file ! 562: .I Index.ib , ! 563: the tag file ! 564: .I Index.ic , ! 565: and the key file (if present) ! 566: .I Index.id . ! 567: The ! 568: .I inv ! 569: program recognizes the following options: ! 570: .TS ! 571: center; ! 572: lB lw(4i). ! 573: \-a T{ ! 574: Append the new keys to a previous set of inverted files, ! 575: making new files if there is no old set using the same base name. ! 576: T} ! 577: \-d T{ ! 578: Write the optional key file. ! 579: This is needed when you can not check for false drops by looking ! 580: for the keys in the original inputs, i.e. when the key derivation ! 581: procedure is complicated and ! 582: the output keys are not words from the input files. ! 583: T} ! 584: \-h\f2n T{ ! 585: The hash table size is ! 586: .I n ! 587: (default 997); ! 588: .I n ! 589: should be prime. ! 590: Making \f2n\f1 bigger saves search time and spends disk space. ! 591: T} ! 592: \-i[u] \f2name T{ ! 593: Take input from file ! 594: .I name , ! 595: instead of the standard input; ! 596: if ! 597: .B u ! 598: is present ! 599: .I name ! 600: is unlinked when the sort is started. ! 601: Using this option permits the sort scratch space ! 602: to overlap the disk space used for input keys. ! 603: T} ! 604: \-n T{ ! 605: Make a completely new set of inverted files, ignoring ! 606: previous files. ! 607: T} ! 608: \-p T{ ! 609: Pipe into the sort program, rather than writing a temporary ! 610: input file. ! 611: This saves disk space and spends processor time. ! 612: T} ! 613: \-v T{ ! 614: Verbose mode; print a summary of the number of keys which ! 615: finished indexing. ! 616: T} ! 617: .TE ! 618: .PP ! 619: About half the time used in ! 620: .I inv ! 621: is in the contained sort. ! 622: Assuming the sort is roughly linear, however, ! 623: a guess at the total timing for ! 624: .I inv ! 625: is 250 keys per second. ! 626: The space used is usually of more importance: ! 627: the entry file uses four bytes per possible hash (note ! 628: the ! 629: .B \-h ! 630: option), ! 631: and the tag file around 15-20 bytes per item indexed. ! 632: Roughly, the posting file contains one item for each key instance ! 633: and one item for each possible hash code; the items are two bytes ! 634: long if the tag file is less than 65336 bytes long, and the ! 635: items are four bytes wide if the tag file is greater than ! 636: 65536 bytes long. ! 637: Note that to minimize storage, the hash tables should be ! 638: over-full; ! 639: for most of the files indexed in this way, there is no ! 640: other real choice, since the ! 641: .I entry ! 642: file must fit in memory. ! 643: .PP ! 644: .B ! 645: Searching and Retrieving. ! 646: .R ! 647: The ! 648: .I hunt ! 649: program retrieves items from an index. ! 650: It combines, as mentioned above, the two parts of phase (B): ! 651: search and delivery. ! 652: The reason why it is efficient to combine delivery and search ! 653: is partly to avoid starting unnecessary processes, and partly ! 654: because the delivery operation must be a part of the search ! 655: operation in any case. ! 656: Because of the hashing, the search part takes place in two stages: ! 657: first items are retrieved which have the right hash codes associated with them, ! 658: and then the actual items are inspected to determine false drops, i.e. ! 659: to determine if anything with the right hash codes doesn't really have the right ! 660: keys. ! 661: Since the original item is retrieved to check on false drops, ! 662: it is efficient to present it immediately, rather than only ! 663: giving the tag as output and later retrieving the ! 664: item again. ! 665: If there were a separate key file, this argument would not apply, ! 666: but separate key files are not common. ! 667: .PP ! 668: Input to ! 669: .I hunt ! 670: is taken from the standard input, ! 671: one query per line. ! 672: Each query should be in ! 673: .I "mkey \-s" ! 674: output format; ! 675: all lower case, no punctuation. ! 676: The ! 677: .I hunt ! 678: program takes one argument which specifies the base name of the index ! 679: files to be searched. ! 680: Only one set of index files can be searched at a time, ! 681: although many text files may be indexed as a group, of course. ! 682: If one of the text files has been changed since the index, that file ! 683: is searched with ! 684: .I fgrep; ! 685: this may occasionally slow down the searching, and care should be taken to ! 686: avoid having many out of date files. ! 687: The following option arguments are recognized by ! 688: .I hunt: ! 689: .TS ! 690: center; ! 691: lB lw(4i). ! 692: \-a T{ ! 693: Give all output; ignore checking for false drops. ! 694: T} ! 695: \-C\f2n T{ ! 696: Coordination level ! 697: .I n; ! 698: retrieve items with not more than ! 699: .I n ! 700: terms of the input missing; ! 701: default ! 702: .I C0 , ! 703: implying that each search term must be in the output items. ! 704: T} ! 705: \-F[yn\f2d\f3\|] T{ ! 706: ``\-Fy'' gives the text of all the items found; ! 707: ``\-Fn'' suppresses them. ! 708: ``\-F\f2d\|\f1'' where \f2d\f1\| is an integer ! 709: gives the text of the first \f2d\f1 items. ! 710: The default is ! 711: .I \-Fy. ! 712: T} ! 713: \-g T{ ! 714: Do not use ! 715: .I fgrep ! 716: to search files changed since the index was made; ! 717: print an error comment instead. ! 718: T} ! 719: \-i \f2string T{ ! 720: Take ! 721: .I string ! 722: as input, instead of reading the standard input. ! 723: T} ! 724: \-l \f2n T{ ! 725: The maximum length of internal lists of candidate ! 726: items is ! 727: .I n; ! 728: default 1000. ! 729: T} ! 730: \-o \f2string T{ ! 731: Put text output (``\-Fy'') in ! 732: .I string; ! 733: of use ! 734: .I only ! 735: when ! 736: invoked from another program. ! 737: T} ! 738: \-p T{ ! 739: Print hash code frequencies; mostly ! 740: for use in optimizing hash table sizes. ! 741: T} ! 742: \-T[yn\f2d\|\f3] T{ ! 743: ``\-Ty'' gives the tags of the items found; ! 744: ``\-Tn'' suppresses them. ! 745: ``\-T\f2d\f1\|'' where \f2d\f1\| is an integer ! 746: gives the first \f2d\f1 tags. ! 747: The default is ! 748: .I \-Tn . ! 749: T} ! 750: \-t \f2string T{ ! 751: Put tag output (``\-Ty'') in ! 752: .I string; ! 753: of use ! 754: .I only ! 755: when invoked from another program. ! 756: T} ! 757: .TE ! 758: .PP ! 759: The timing of ! 760: .I hunt ! 761: is complex. ! 762: Normally the hash table is overfull, so that there will ! 763: be many false drops on any single term; ! 764: but a multi-term query will have few false drops on ! 765: all terms. ! 766: Thus if a query is underspecified (one search term) ! 767: many potential items will be examined and discarded as false ! 768: drops, wasting time. ! 769: If the query is overspecified (a dozen search terms) ! 770: many keys will be examined only to verify that ! 771: the single item under consideration has that key posted. ! 772: The variation of search time with number of keys is ! 773: shown in the table below. ! 774: Queries of varying length were constructed to retrieve ! 775: a particular document from the file of references. ! 776: In the sequence to the left, search terms were chosen so as ! 777: to select the desired paper as quickly as possible. ! 778: In the sequence on the right, terms were chosen inefficiently, ! 779: so that the query did not uniquely select the desired document ! 780: until four keys had been used. ! 781: The same document was the target in each case, ! 782: and the final set of eight keys are also identical; the differences ! 783: at five, six and seven keys are produced by measurement error, not ! 784: by the slightly different key lists. ! 785: .TS ! 786: center; ! 787: c s s s5 | c s s s ! 788: cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8 ! 789: cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8 ! 790: n n n n | n n n n . ! 791: Efficient Keys Inefficient Keys ! 792: No. keys Total drops Retrieved Search time No. keys Total drops Retrieved Search time ! 793: (incl. false) Documents (seconds) (incl. false) Documents (seconds) ! 794: 1 15 3 1.27 1 68 55 5.96 ! 795: 2 1 1 0.11 2 29 29 2.72 ! 796: 3 1 1 0.14 3 8 8 0.95 ! 797: 4 1 1 0.17 4 1 1 0.18 ! 798: 5 1 1 0.19 5 1 1 0.21 ! 799: 6 1 1 0.23 6 1 1 0.22 ! 800: 7 1 1 0.27 7 1 1 0.26 ! 801: 8 1 1 0.29 8 1 1 0.29 ! 802: .TE ! 803: As would be expected, the optimal search is achieved ! 804: when the query just specifies the answer; however, ! 805: overspecification is quite cheap. ! 806: Roughly, the time required by ! 807: .I hunt ! 808: can be approximated as ! 809: 30 milliseconds per search key plus 75 milliseconds ! 810: per dropped document (whether it is a false drop or ! 811: a real answer). ! 812: In general, overspecification can be recommended; ! 813: it protects the user against additions to the data base ! 814: which turn previously uniquely-answered queries ! 815: into ambiguous queries. ! 816: .PP ! 817: The careful reader will have noted an enormous discrepancy between these times ! 818: and the earlier quoted time of around 1.9 seconds for a search. The times ! 819: here are purely for the search and retrieval: they are measured by ! 820: running many searches through a single invocation of the ! 821: .I hunt ! 822: program alone. ! 823: The normal retrieval operation involves using the shell to ! 824: set up a pipeline through ! 825: .I mkey ! 826: to ! 827: .I hunt ! 828: and starting both processes; this adds a fixed overhead of about 1.7 seconds ! 829: of processor time ! 830: to any single search. ! 831: Furthermore, remember that all these times are processor times: ! 832: on a typical morning on our \s-2PDP\s0 11/70 system, with about one dozen ! 833: people logged on, ! 834: to obtain 1 second of processor time for the search program ! 835: took between 2 and 12 seconds of real time, with a median of ! 836: 3.9 seconds and a mean of 4.8 seconds. ! 837: Thus, although the work involved in a single search may be only ! 838: 200 milliseconds, after you add the 1.7 seconds of startup processor ! 839: time ! 840: and then assume a 4:1 elapsed/processor time ! 841: ratio, it will be 8 seconds before any response is printed. ! 842: .NH ! 843: Selecting and Formatting References for T\s-2ROFF\s0 ! 844: .PP ! 845: The major application of the retrieval software ! 846: is ! 847: .I refer, ! 848: which is a ! 849: .I troff ! 850: preprocessor ! 851: like ! 852: .I eqn . ! 853: .[ ! 854: kernighan cherry acm 1975 ! 855: .] ! 856: It scans its input looking for items of the form ! 857: .DS ! 858: \*.[ ! 859: imprecise citation ! 860: \*.\^] ! 861: .DE ! 862: where an imprecise citation is merely a string ! 863: of words found in the relevant bibliographic citation. ! 864: This is translated into a properly formatted reference. ! 865: If the imprecise citation does not correctly identify ! 866: a single paper ! 867: (either ! 868: selecting no papers or too many) a message is given. ! 869: The data base of citations searched may be tailored to each ! 870: system, and individual users may specify their own ! 871: citation ! 872: files. ! 873: On our system, the default data base is accumulated from ! 874: the publication lists of the members of our organization, plus ! 875: about half a dozen personal bibliographies that were collected. ! 876: The present total is about 4300 citations, but this increases steadily. ! 877: Even now, ! 878: the data base covers a large fraction of local citations. ! 879: .PP ! 880: For example, the reference for the ! 881: .I eqn ! 882: paper above was specified as ! 883: .DS ! 884: \&\*.\*.\*. ! 885: \&preprocessor like ! 886: \&.I eqn. ! 887: \&.[ ! 888: \&kernighan cherry acm 1975 ! 889: \&.] ! 890: \&It scans its input looking for items ! 891: \&\*.\*.\*. ! 892: .DE ! 893: This paper was itself printed using ! 894: .I refer. ! 895: The above input text was processed by ! 896: .I refer ! 897: as well as ! 898: .I tbl ! 899: and ! 900: .I troff ! 901: by the command ! 902: .DS ! 903: .ft I ! 904: refer memo-file | tbl | troff \-ms ! 905: .ft R ! 906: .DE ! 907: and the reference was automatically translated into a correct ! 908: citation to the ACM paper on mathematical typesetting. ! 909: .PP ! 910: The procedure to use to place a reference in a paper ! 911: using ! 912: .I refer ! 913: is as follows. ! 914: First, use the ! 915: .I lookbib ! 916: command to check that the paper is in the data base ! 917: and to find out what keys are necessary to retrieve it. ! 918: This is done by typing ! 919: .I lookbib ! 920: and then typing some potential queries until ! 921: a suitable query is found. ! 922: For example, had one started to find ! 923: the ! 924: .I eqn ! 925: paper shown above by presenting the query ! 926: .DS ! 927: $ lookbib ! 928: kernighan cherry ! 929: (EOT) ! 930: .DE ! 931: .I lookbib ! 932: would have found several items; experimentation would quickly ! 933: have shown that the query given above is adequate. ! 934: Overspecifying the query is of course harmless. ! 935: A particularly careful reader may have noticed that ``acm'' does not ! 936: appear in the printed citation; ! 937: we have supplemented some of the data base items with common ! 938: extra keywords, such as common abbreviations for journals ! 939: or other sources, to aid in searching. ! 940: .PP ! 941: If the reference is in the data base, the query ! 942: that retrieved it can be inserted in the text, ! 943: between ! 944: .B \*.[ ! 945: and ! 946: .B \*.\^] ! 947: brackets. ! 948: If it is not in the data base, it can be typed ! 949: into a private file of references, using the format ! 950: discussed in the next section, and then ! 951: the ! 952: .B \-p ! 953: option ! 954: used to search this private file. ! 955: Such a command might read ! 956: (if the private references are called ! 957: .I myfile ) ! 958: .DS ! 959: .ft 2 ! 960: refer \-p myfile document | tbl | eqn | troff \-ms \*. \*. \*. ! 961: .ft 1 ! 962: .DE ! 963: where ! 964: .I tbl ! 965: and/or ! 966: .I eqn ! 967: could be omitted if not needed. ! 968: The use ! 969: of the ! 970: .I \-ms ! 971: macros ! 972: .[ ! 973: lesk typing documents unix gcos ! 974: .] ! 975: or some other macro package, however, ! 976: is essential. ! 977: .I Refer ! 978: only generates the data for the references; exact formatting ! 979: is done by some macro package, and if none is supplied the ! 980: references will not be printed. ! 981: .PP ! 982: By default, ! 983: the references are numbered sequentially, ! 984: and ! 985: the ! 986: .I \-ms ! 987: macros format references as footnotes at the bottom of the page. ! 988: This memorandum is an example of that style. ! 989: Other possibilities are discussed in section 5 below. ! 990: .NH ! 991: Reference Files. ! 992: .PP ! 993: A reference file is a set of bibliographic references usable with ! 994: .I refer. ! 995: It can be indexed using the software described in section 2 ! 996: for fast searching. ! 997: What ! 998: .I refer ! 999: does is to read the input document stream, ! 1000: looking for imprecise citation references. ! 1001: It then searches through reference files to find ! 1002: the full citations, and inserts them into the ! 1003: document. ! 1004: The format of the full citation is arranged to make it ! 1005: convenient for a macro package, such as the ! 1006: .I \-ms ! 1007: macros, to format the reference ! 1008: for printing. ! 1009: Since ! 1010: the format of the final reference is determined ! 1011: by the desired style of output, ! 1012: which is determined by the macros used, ! 1013: .I refer ! 1014: avoids forcing any kind of reference appearance. ! 1015: All it does is define a set of string registers which ! 1016: contain the basic information about the reference; ! 1017: and provide a macro call which is expanded by the macro ! 1018: package to format the reference. ! 1019: It is the responsibility of the final macro package ! 1020: to see that the reference is actually printed; if no ! 1021: macros are used, and the output of ! 1022: .I refer ! 1023: fed untranslated to ! 1024: .I troff, ! 1025: nothing at all will be printed. ! 1026: .PP ! 1027: The strings defined by ! 1028: .I refer ! 1029: are taken directly from the files of references, which ! 1030: are in the following format. ! 1031: The references should be separated ! 1032: by blank lines. ! 1033: Each reference is a sequence of lines beginning with ! 1034: .B % ! 1035: and followed ! 1036: by a key-letter. ! 1037: The remainder of that line, and successive lines until the next line beginning ! 1038: with ! 1039: .B % , ! 1040: contain the information specified by the key-letter. ! 1041: In general, ! 1042: .I refer ! 1043: does not interpret the information, but merely presents ! 1044: it to the macro package for final formatting. ! 1045: A user with a separate macro package, for example, ! 1046: can add new key-letters or use the existing ones for other purposes ! 1047: without bothering ! 1048: .I refer. ! 1049: .PP ! 1050: The meaning of the key-letters given below, in particular, ! 1051: is that assigned by the ! 1052: .I \-ms ! 1053: macros. ! 1054: Not all information, obviously, is used with each citation. ! 1055: For example, if a document is both an internal memorandum and a journal article, ! 1056: the macros ignore the memorandum version and cite only the journal article. ! 1057: Some kinds of information are not used at all in printing the reference; ! 1058: if a user does not like finding references by specifying title ! 1059: or author keywords, and prefers to add specific keywords to the ! 1060: citation, a field is available which is searched but not ! 1061: printed (\f3K\f1). ! 1062: .PP ! 1063: The key letters currently recognized by ! 1064: .I refer ! 1065: and ! 1066: .I \-ms, ! 1067: with the kind of information implied, are: ! 1068: .KS ! 1069: .TS ! 1070: center; ! 1071: c c6 c c ! 1072: c l c l. ! 1073: Key Information specified Key Information specified ! 1074: A Author's name N Issue number ! 1075: B Title of book containing item O Other information ! 1076: C City of publication P Page(s) of article ! 1077: D Date R Technical report reference ! 1078: E Editor of book containing item T Title ! 1079: G Government (NTIS) ordering number V Volume number ! 1080: I Issuer (publisher) ! 1081: J Journal name ! 1082: K Keys (for searching) X or ! 1083: L Label Y or ! 1084: M Memorandum label Z Information not used by \f2refer\f1 ! 1085: .TE ! 1086: .KE ! 1087: For example, a sample reference could be ! 1088: typed as: ! 1089: .DS ! 1090: %T Bounds on the Complexity of the Maximal ! 1091: Common Subsequence Problem ! 1092: %Z ctr127 ! 1093: %A A. V. Aho ! 1094: %A D. S. Hirschberg ! 1095: %A J. D. Ullman ! 1096: %J J. ACM ! 1097: %V 23 ! 1098: %N 1 ! 1099: %P 1-12 ! 1100: .\"%M TM 75-1271-7 ! 1101: %M abcd-78 ! 1102: %D Jan. 1976 ! 1103: .DE ! 1104: Order is irrelevant, except that authors are shown in the order ! 1105: given. The output of ! 1106: .I refer ! 1107: is a stream of string definitions, one ! 1108: for each of the fields of each reference, as ! 1109: shown below. ! 1110: .DS ! 1111: \*.]- ! 1112: \*.ds [A authors' names \*.\*.\*. ! 1113: \*.ds [T title \*.\*.\*. ! 1114: \*.ds [J journal \*.\*.\*. ! 1115: \*.\*.\*. ! 1116: \*.]\|[ type-number ! 1117: .DE ! 1118: The special macro ! 1119: .B \&\*.]\- ! 1120: precedes the string definitions ! 1121: and the special macro ! 1122: .B \*.]\|[ ! 1123: follows. ! 1124: These are changed from the input ! 1125: .B \*.[ ! 1126: and ! 1127: .B \*.\^] ! 1128: so that running the same file through ! 1129: .I refer ! 1130: again is harmless. ! 1131: The ! 1132: .B \*.]\- ! 1133: macro can be used by the macro package to ! 1134: initialize. ! 1135: The ! 1136: .B \*.]\|[ ! 1137: macro, which should be used ! 1138: to print the reference, is given an ! 1139: argument ! 1140: .I type-number ! 1141: to indicate the kind of reference, as follows: ! 1142: .KS ! 1143: .TS ! 1144: center; ! 1145: c c ! 1146: n l. ! 1147: Value Kind of reference ! 1148: 1 Journal article ! 1149: 2 Book ! 1150: 3 Article within book ! 1151: 4 Technical report ! 1152: 5 Bell Labs technical memorandum ! 1153: 0 Other ! 1154: .TE ! 1155: .KE ! 1156: The reference is flagged in the text ! 1157: with the sequence ! 1158: .DS ! 1159: \e*\|([\*.number\e*\|(\*.\^] ! 1160: .DE ! 1161: where ! 1162: .I number ! 1163: is the footnote number. ! 1164: The strings ! 1165: .B [\*. ! 1166: and ! 1167: .B \*.\^] ! 1168: should be used by the macro package ! 1169: to format the reference flag in the text. ! 1170: These strings can be replaced for a particular ! 1171: footnote, as described in section 5. ! 1172: The footnote number (or other signal) is available ! 1173: to the reference macro ! 1174: .B \*.]\|[ ! 1175: as the ! 1176: string register ! 1177: .B [F . ! 1178: .PP ! 1179: In some cases users wish to suspend the searching, and merely ! 1180: use the reference macro formatting. ! 1181: That is, the user doesn't want to provide a search key ! 1182: between ! 1183: .B \*.[ ! 1184: and ! 1185: .B \*.\^] ! 1186: brackets, but merely ! 1187: the reference lines for the appropriate document. ! 1188: Alternatively, the user ! 1189: can wish ! 1190: to add a few fields to those in the reference ! 1191: as in the standard file, or ! 1192: override some fields. ! 1193: Altering or replacing fields, or supplying whole references, is easily done ! 1194: by inserting lines beginning ! 1195: with ! 1196: .B % ; ! 1197: any such line is taken as ! 1198: direct input to the reference ! 1199: processor rather than keys to be searched. ! 1200: Thus ! 1201: .DS ! 1202: \*.[ ! 1203: key1 key2 key3 \*.\*.\*. ! 1204: %Q New format item ! 1205: %R Override report name ! 1206: \*.\^] ! 1207: .DE ! 1208: makes the indicated changes to the result of searching for ! 1209: the keys. ! 1210: All of the search keys must be given before the first ! 1211: \f3%\f1 line. ! 1212: .PP ! 1213: If no search keys are provided, an entire citation can ! 1214: be provided in-line in the text. ! 1215: For example, if the ! 1216: .I eqn ! 1217: paper citation were to be inserted in ! 1218: this way, rather than by searching for it in the data base, ! 1219: the input would read ! 1220: .DS ! 1221: \&\*.\*.\*. ! 1222: \&preprocessor like ! 1223: \&.I eqn. ! 1224: \&.[ ! 1225: \&%A B. W. Kernighan ! 1226: \&%A L. L. Cherry ! 1227: \&%T A System for Typesetting Mathematics ! 1228: \&%J Comm. ACM ! 1229: \&%V 18 ! 1230: \&%N 3 ! 1231: \&%P 151-157 ! 1232: \&%D March 1975 ! 1233: \&.] ! 1234: \&It scans its input looking for items ! 1235: \&\*.\*.\*. ! 1236: .DE ! 1237: This would produce a citation of the same appearance as that ! 1238: resulting from the file search. ! 1239: .PP ! 1240: As shown, fields are normally turned into ! 1241: .I troff ! 1242: strings. ! 1243: Sometimes users would rather have them defined as macros, ! 1244: so that other ! 1245: .I troff ! 1246: commands can be placed into the data. ! 1247: When this is necessary, simply double the control character ! 1248: .B % ! 1249: in the data. ! 1250: Thus the input ! 1251: .DS ! 1252: \&.[ ! 1253: %V 23 ! 1254: %%M ! 1255: Bell Laboratories, ! 1256: Murray Hill, N.J. 07974 ! 1257: \&.] ! 1258: .DE ! 1259: is processed by ! 1260: .I refer ! 1261: into ! 1262: .DS ! 1263: \&.ds [V 23 ! 1264: \&.de [M ! 1265: Bell Laboratories, ! 1266: Murray Hill, N.J. 07974 ! 1267: \&.. ! 1268: .DE ! 1269: The information after ! 1270: .B %%M ! 1271: is defined as a macro to be invoked by ! 1272: .B .[M ! 1273: while the information after ! 1274: .B %V ! 1275: is turned into a string to be invoked by ! 1276: .B \e\(**([V . ! 1277: At present ! 1278: .I \-ms ! 1279: expects all information as strings. ! 1280: .NH ! 1281: Collecting References and other Refer Options ! 1282: .PP ! 1283: Normally, the combination of ! 1284: .I refer ! 1285: and ! 1286: .I \-ms ! 1287: formats output as ! 1288: .I troff ! 1289: footnotes which are consecutively numbered and placed ! 1290: at the bottom of the page. However, ! 1291: options exist to ! 1292: place the references at the end; to arrange references alphabetically ! 1293: by senior author; and to indicate references by strings in the text of the form ! 1294: [Name1975a] ! 1295: rather than by number. ! 1296: Whenever references are not placed at the bottom of a page ! 1297: identical references are coalesced. ! 1298: .PP ! 1299: For example, the ! 1300: .B \-e ! 1301: option to ! 1302: .I refer ! 1303: specifies that references are to be collected; in this case ! 1304: they are output whenever the sequence ! 1305: .DS ! 1306: \*.[ ! 1307: $LIST$ ! 1308: \*.\^] ! 1309: .DE ! 1310: is encountered. ! 1311: Thus, to place references at the end of a paper, the user would run ! 1312: .I refer ! 1313: with the ! 1314: .I \-e ! 1315: option and place the above $LIST$ commands after the last ! 1316: line of the text. ! 1317: .I Refer ! 1318: will then move all the references to that point. ! 1319: To aid in formatting the collected references, ! 1320: .I refer ! 1321: writes the references preceded by the line ! 1322: .DS ! 1323: .B .]< ! 1324: .DE ! 1325: and ! 1326: followed by the line ! 1327: .DS ! 1328: .B .]> ! 1329: .DE ! 1330: to invoke special macros before and after the references. ! 1331: .PP ! 1332: Another possible option to ! 1333: .I refer ! 1334: is the ! 1335: .B \-s ! 1336: option to specify ! 1337: sorting of references. The default, ! 1338: of course, is to list references in the order presented. ! 1339: The ! 1340: .B \-s ! 1341: option implies the ! 1342: .B \-e ! 1343: option, and thus requires ! 1344: a ! 1345: .DS ! 1346: \*.[ ! 1347: $LIST$ ! 1348: \*.\^] ! 1349: .DE ! 1350: entry to call out the reference list. ! 1351: The ! 1352: .B \-s ! 1353: option may be followed by a string of letters, numbers, and `+' signs indicating how ! 1354: the references are to be sorted. ! 1355: The sort is done using the fields whose key-letters are ! 1356: in the string as sorting keys; the numbers indicate how many ! 1357: of the fields are to be considered, with `+' ! 1358: taken as a large number. ! 1359: Thus the default is ! 1360: .B \-sAD ! 1361: meaning ``Sort on senior author, then date.'' To ! 1362: sort on all authors and then title, specify ! 1363: .B \-sA+T . ! 1364: And to sort on two authors and then the journal, ! 1365: write ! 1366: .B \-sA2J . ! 1367: .PP ! 1368: Other options to ! 1369: .I refer ! 1370: change the signal or label inserted in the text for each reference. ! 1371: Normally these are just sequential numbers, ! 1372: and their exact placement (within brackets, as superscripts, etc.) is determined ! 1373: by the macro package. ! 1374: The ! 1375: .B \-l ! 1376: option replaces reference numbers by ! 1377: strings composed of the senior author's last name, the date, ! 1378: and a disambiguating letter. ! 1379: If a number follows the ! 1380: .B l ! 1381: as in ! 1382: .B \-l3 ! 1383: only that many letters of the last name are used ! 1384: in the label string. ! 1385: To abbreviate the date as well the form ! 1386: \f3-l\f2m,n\f1 ! 1387: shortens the last name to the ! 1388: first ! 1389: .I m ! 1390: letters and the date to the ! 1391: last ! 1392: .I n ! 1393: digits. ! 1394: For example, the option ! 1395: .B \-l3,2 ! 1396: would refer to the ! 1397: .I eqn ! 1398: paper (reference 3) by the signal ! 1399: .I Ker75a , ! 1400: since it is the first cited reference by Kernighan in 1975. ! 1401: .PP ! 1402: A user wishing to specify particular labels for ! 1403: a private bibliography may use the ! 1404: .B \-k ! 1405: option. ! 1406: Specifying ! 1407: \f3\-k\f2x\f1 ! 1408: causes the field \f2x\f1 to be used as a label. ! 1409: The default is \f3L\f1. ! 1410: If this field ends in \f3\-\f1, that character ! 1411: is replaced by a sequence letter; otherwise the field ! 1412: is used exactly as given. ! 1413: .PP ! 1414: If none of the ! 1415: .I refer -produced ! 1416: signals are desired, ! 1417: the ! 1418: .B \-b ! 1419: option entirely suppresses automatic text signals. ! 1420: .PP ! 1421: If the user wishes to override the ! 1422: .I \-ms ! 1423: treatment of the reference signal (which is normally to ! 1424: enclose the number in brackets in ! 1425: .I nroff ! 1426: and make it a superscript in ! 1427: .I troff\\| ) ! 1428: this can be done easily. ! 1429: If the lines ! 1430: .B \&.[ ! 1431: or ! 1432: .B \&.] ! 1433: contain anything following these characters, ! 1434: the remainders of these lines are used to surround ! 1435: the reference signal, instead of the default. ! 1436: Thus, for example, to say ``See reference (2).'' ! 1437: and avoid ! 1438: ``See reference.\s-3\u2\d\s+3'' the ! 1439: input might appear ! 1440: .DS ! 1441: \&See reference ! 1442: \&\*.[ ( ! 1443: imprecise citation ... ! 1444: \&\*.\^])\*. ! 1445: .DE ! 1446: Note that blanks are significant in this construction. ! 1447: If a permanent change is desired in the style of reference ! 1448: signals, however, it is probably easier to redefine the strings ! 1449: .B \&[. ! 1450: and ! 1451: .B \&.] ! 1452: (which are used to bracket each signal) ! 1453: than to change each citation. ! 1454: .PP ! 1455: Although normally ! 1456: .I refer ! 1457: limits itself to retrieving the data for the reference, ! 1458: and leaves to a macro package the job of arranging that ! 1459: data as required by the local format, there are two ! 1460: special options for rearrangements that can not be ! 1461: done by macro packages. ! 1462: The ! 1463: .B \-c ! 1464: option puts fields into all upper case ! 1465: (C\s-2APS\s+2-S\s-2MALL\s+2 C\s-2APS\s+2 ! 1466: in ! 1467: .I troff ! 1468: output). ! 1469: The key-letters indicated what information is to be translated ! 1470: to upper case follow the ! 1471: .B c , ! 1472: so that ! 1473: .B \-cAJ ! 1474: means that authors' names and journals are to be in caps. ! 1475: The ! 1476: .B \-a ! 1477: option writes the names of authors last name first, that is ! 1478: .I "A. D. Hall, Jr." ! 1479: is written as ! 1480: .I "Hall, A. D. Jr" . ! 1481: The citation form of ! 1482: the ! 1483: .I "Journal of the ACM" , ! 1484: for example, would require ! 1485: both ! 1486: .B \-cA ! 1487: and ! 1488: .B \-a ! 1489: options. ! 1490: This produces authors' names in the style ! 1491: .I ! 1492: K\s-2ERNIGHAN\s0, B. W. \s-2AND\s0 C\s-2HERRY\s0, L. L.\& ! 1493: .R ! 1494: for the previous example. ! 1495: The ! 1496: .B \-a ! 1497: option may be followed by a number to indicate how many ! 1498: author names should be reversed; ! 1499: .B \-a1 ! 1500: (without any ! 1501: .B \-c ! 1502: option) ! 1503: would produce ! 1504: .I ! 1505: Kernighan, B. W. and L. L. Cherry, ! 1506: .R ! 1507: for example. ! 1508: .PP ! 1509: Finally, there is also the previously-mentioned ! 1510: .B \-p ! 1511: option to let the user specify ! 1512: a private file of references to be searched before the public files. ! 1513: Note that ! 1514: .I refer ! 1515: does not insist on a previously made index for these files. ! 1516: If a file is named which contains reference ! 1517: data but is not indexed, it will be searched ! 1518: (more slowly) ! 1519: by ! 1520: .I refer ! 1521: using ! 1522: .I fgrep. ! 1523: In this way ! 1524: it is easy for users to keep small files of ! 1525: new references, which can later be added to the ! 1526: public data bases. ! 1527: .SG MH-1274-MEL-\s8UNIX\s0
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.