INVERT(1)              UNIX Programmer's Manual              INVERT(1)

NAME
     invert, lookup - create and access an inverted index

SYNOPSIS
     invert [option ... ] file ...
     lookup [option ... ]

DESCRIPTION
     Invert creates an inverted index to one or more files.  Lookup
     retrieves records from files for which an inverted index exists.
     The inverted indices are intended for use with bib(1).

     Invert creates one inverted index to all of its input files.  The
     index must be stored in the current directory and may not be
     moved.  Input files may be absolute path names or paths relative
     to the current directory.  Each input file is viewed as a set of
     records; each record consists of non-blank lines; records are
     separated by blank lines.

     Lookup retrieves records based on its input (stdin).  Each line
     of input is a retrieval request.  All records that contain all of
     the keywords in the retrieval request are sent to stdout.  If
     there are no matching references, ``No references found.'' is
     sent to stdout.  Lookup first searches in the user's private
     index (default INDEX) and then, if no references are found, in
     the system index (/usr/dict/papers/INDEX).  The system index was
     produced using invert with the default options; in general, the
     user is advised to use the defaults.

     Keywords are a sequence of non-white space characters with
     non-alphanumeric characters removed.  Keywords must be at least
     two characters and are truncated (default length is 6).  Some
     common words are ignored.  Some lines of input are ignored for
     the purpose of collecting keywords.

     The following options are available for invert:

     -c file
     -cfile    File contains common words, one per line.  Common words
               are not used as keys.  (Default
               /usr/new/lib/bmac/common.)

     -k i
     -ki       Maximum number of keys kept per record.  (Default 100)

     -l i
     -li       Maximum length of keys.  (Default 6)

     -p file
     -pfile    File is the name of the private index file (output of
               invert).  (Default is INDEX.)  The index must be stored
               in the current directory.  (Be careful of the second
               form.  The shell will not know to expand the file name.
               E.g. -p~/index won't work; use -p ~/index.)

     -s        Silent.  Suppress statistics.

     -%str     Ignore lines that begin with %x where x is in str.
               (Default is CNOPVX.  See bib(1) for explanation of
               field names.)

     Lookup has only the options c, l, and p with the same meanings as
     bib.  In particular, the p option can be followed by a list of
     comma separated index files.  These are searched in order from
     left to right until at least one reference is found.

FILES
     INDEX                       inverted index
     /usr/tmp/invertxxxxxx       scratch file for invert
     /usr/new/lib/bmac/common    default list of common words
     /usr/dict/papers/INDEX      default system index

SEE ALSO
     A UNIX Bibliographic Database Facility, Timothy A. Budd and Gary
     M. Levin, University of Arizona Technical Report 82-1, 1982.
     bib(1)

DIAGNOSTICS
     Messages indicating trouble accessing files are sent on stderr.
     There is an explicit message on stdout from lookup if no
     references are found.

     Invert produces a one line message of the form,

          %D documents   %D distinct keys   %D key occurrences

     This can be suppressed with the -s option.

     The message

          locate: first key (%s) matched too many refs

     indicates that the first key matched more references than could
     be stored in memory.  The simple solution is to use a less
     frequently occurring key as the first key in the citation.

BUGS
     No attempt is made to check the compatibility between an index
     and the files indexed.  The user must create a new index whenever
     the files that are indexed are modified.

     Attempting to invert a file containing unprintable characters can
     cause chaos.
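
EXAMPLES
     The record, keyword, and search rules described above can be
     sketched in Python.  This is an illustration under stated
     assumptions, not the code of invert or lookup.  The function
     names, the in-memory dictionary used as the index, and the short
     COMMON set are hypothetical.  Folding keys to lower case, testing
     for common words before truncation, and keeping the first 100
     distinct keys per record are guesses; this page does not specify
     them.  The second sample record is made up.

```python
import re

# Defaults taken from the option list above.
MAX_KEY_LEN = 6      # -l
MAX_KEYS = 100       # -k
IGNORE = "CNOPVX"    # -%
# Hypothetical stand-in for the common-word file (-c).
COMMON = {"the", "and", "of", "for"}


def split_records(text):
    """Group non-blank lines into records; blank lines separate records."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def keywords(line):
    """Return the keys found on one line, in order of appearance."""
    keys = []
    for word in line.split():
        # Strip non-alphanumerics; lower-casing is an assumption.
        key = re.sub(r"[^A-Za-z0-9]", "", word).lower()
        if len(key) >= 2 and key not in COMMON:
            keys.append(key[:MAX_KEY_LEN])
    return keys


def record_keys(lines):
    """Collect distinct keys for one record, skipping ignored %x lines."""
    keys = []
    for line in lines:
        if len(line) >= 2 and line[0] == "%" and line[1] in IGNORE:
            continue
        for key in keywords(line):
            if key not in keys:
                keys.append(key)
    return keys[:MAX_KEYS]  # which keys survive the limit is an assumption


def build_index(text):
    """Return (records, index); index maps key -> set of record numbers."""
    records = split_records(text)
    index = {}
    for number, lines in enumerate(records):
        for key in record_keys(lines):
            index.setdefault(key, set()).add(number)
    return records, index


def lookup(request, indices):
    """Search each (records, index) pair in order; stop at the first match."""
    keys = keywords(request)
    if not keys:
        return []
    for records, index in indices:
        # A record matches only if it contains every key in the request.
        hits = set.intersection(*(index.get(key, set()) for key in keys))
        if hits:
            return ["\n".join(records[n]) for n in sorted(hits)]
    return []


PRIVATE = build_index("""%A Timothy A. Budd
%A Gary M. Levin
%T A UNIX Bibliographic Database Facility
%R Technical Report 82-1
%D 1982
""")
SYSTEM = build_index("""%A A. N. Author
%T Inverted Files for Text Retrieval
%O made-up note that mentions UNIX
%D 1983
""")

for request in ("budd database", "retrieval", "unix 1983"):
    found = lookup(request, [PRIVATE, SYSTEM])
    print("\n\n".join(found) if found else "No references found.")
```

     The first request is answered from the private index, the second
     falls through to the second index, and the third matches nothing
     because the %O line is skipped when keys are collected.  Passing
     the indices as an ordered list mirrors the left to right search
     described for the p option of lookup.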
Printed 8/22/89                                        28 July 1983