Annotation of GNUtools/libg++/libio/dbz/dbz.3z, revision 1.1.1.1

1.1       root        1: .TH DBZ 3Z "3 Feb 1991"
                      2: .BY "C News"
                      3: .SH NAME
                      4: dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines
                      5: .br
                      6: dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines
                      7: .br
                      8: dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines
                      9: .SH SYNOPSIS
                     10: .nf
                     11: .B #include <dbz.h>
                     12: .PP
                     13: .B dbminit(base)
                     14: .B char *base;
                     15: .PP
                     16: .B datum
                     17: .B fetch(key)
                     18: .B datum key;
                     19: .PP
                     20: .B store(key, value)
                     21: .B datum key;
                     22: .B datum value;
                     23: .PP
                     24: .B dbmclose()
                     25: .PP
                     26: .B dbzfresh(base, size, fieldsep, cmap, tagmask)
                     27: .B char *base;
                     28: .B long size;
                     29: .B int fieldsep;
                     30: .B int cmap;
                     31: .B long tagmask;
                     32: .PP
                     33: .B dbzagain(base, oldbase)
                     34: .B char *base;
                     35: .B char *oldbase;
                     36: .PP
                     37: .B datum
                     38: .B dbzfetch(key)
                     39: .B datum key;
                     40: .PP
                     41: .B dbzstore(key, value)
                     42: .B datum key;
                     43: .B datum value;
                     44: .PP
                     45: .B dbzsync()
                     46: .PP
                     47: .B long
                     48: .B dbzsize(nentries)
                     49: .B long nentries;
                     50: .PP
                     51: .B dbzincore(newvalue)
                     52: .PP
                     53: .B dbzcancel()
                     54: .PP
                     55: .B dbzdebug(newvalue)
                     56: .SH DESCRIPTION
                     57: These functions provide an indexing system for rapid random access to a
                     58: text file (the
                     59: .I base 
                     60: .IR file ).
                     61: Subject to certain constraints, they are call-compatible with
                     62: .IR dbm (3),
                     63: although they also provide some extensions.
                     64: (Note that they are
                     65: .I not
                     66: file-compatible with
                     67: .I dbm
                     68: or any variant thereof.)
                     69: .PP
                     70: In principle,
                     71: .I dbz
                     72: stores key-value pairs, where both key and value are arbitrary sequences
                     73: of bytes, specified to the functions by
                     74: values of type
                     75: .IR datum ,
                     76: typedefed in the header file to be a structure with members
                     77: .I dptr
                     78: (a value of type
                     79: .I char *
                     80: pointing to the bytes)
                     81: and
                     82: .I dsize
                     83: (a value of type
                     84: .I int
                     85: indicating how long the byte sequence is).
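As a rough illustration of the datum layout just described, the following C sketch declares a stand-in structure and two helpers. The typedef here only mirrors what the text says the header provides; a real program would include dbz.h instead. The helper names make_key and datum_failed are hypothetical and are not part of dbz.

```c
#include <string.h>

/* Stand-in for the datum type that dbz.h is described as declaring.
 * A real program would include <dbz.h> rather than define this. */
typedef struct {
    char *dptr;   /* points to the bytes */
    int dsize;    /* how long the byte sequence is */
} datum;

/* Hypothetical helper: wrap a C string as a key. The terminating NUL
 * is counted, so the key carries its own terminator. */
datum make_key(char *s)
{
    datum d;

    d.dptr = s;
    d.dsize = (int)strlen(s) + 1;
    return d;
}

/* Hypothetical helper: functions returning datum signal failure
 * with a NULL dptr. */
int datum_failed(datum d)
{
    return d.dptr == NULL;
}
```

Counting the NUL in dsize is one way to satisfy the rule that no key may be an initial subsequence of another.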
                     86: .PP
                     87: In practice,
                     88: .I dbz
                     89: is more restricted than
                     90: .IR dbm .
                     91: A
                     92: .I dbz
                     93: database
                     94: must be an index into a base file,
                     95: with the database
                     96: .IR value s
                     97: being
                     98: .IR fseek (3)
                     99: offsets into the base file.
                    100: Each such
                    101: .I value
                    102: must ``point to'' a place in the base file where the corresponding
                    103: .I key
                    104: sequence is found.
                    105: A key can be no longer than
                    106: .SM DBZMAXKEY
                    107: (a constant defined in the header file) bytes.
                    108: No key can be an initial subsequence of another,
                    109: which in most applications requires that keys be
                    110: either bracketed or terminated in some way (see the
                    111: discussion of the
                    112: .I fieldsep
                    113: parameter of
                    114: .IR dbzfresh ,
                    115: below,
                    116: for a fine point on terminators).
                    117: .PP
                    118: .I Dbminit
                    119: opens a database,
                    120: an index into the base file
                    121: .IR base ,
                    122: consisting of files
                    123: .IB base .dir
                    124: and
                    125: .IB base .pag
                    126: which must already exist.
                    127: (If the database is new, they should be zero-length files.)
                    128: Subsequent accesses go to that database until
                    129: .I dbmclose
                    130: is called to close the database.
                    131: The base file need not exist at the time of the
                    132: .IR dbminit ,
                    133: but it must exist before accesses are attempted.
                    134: .PP
                    135: .I Fetch
                    136: searches the database for the specified
                    137: .IR key ,
                    138: returning the corresponding
                    139: .IR value
                    140: if any.
                    141: .I Store
                    142: stores the
                    143: .IR key - value
                    144: pair in the database.
                    145: .I Store
                    146: will fail unless the database files are writeable.
                    147: See below for a complication arising from case mapping.
                    148: .PP
                    149: .I Dbzfresh
                    150: is a variant of
                    151: .I dbminit
                    152: for creating a new database with more control over details.
                    153: Unlike for
                    154: .IR dbminit ,
                    155: the database files need not exist:
                    156: they will be created if necessary,
                    157: and truncated in any case.
                    158: .PP
                    159: .IR Dbzfresh 's
                    160: .I size
                    161: parameter specifies the size of the first hash table within the database,
                    162: in key-value pairs.
                    163: Performance will be best if
                    164: .I size
                    165: is a prime number and
                    166: the number of key-value pairs stored in the database does not exceed
                    167: about 2/3 of
                    168: .IR size .
                    169: (The
                    170: .I dbzsize
                    171: function, given the expected number of key-value pairs,
                    172: will suggest a database size that meets these criteria.)
                    173: Assuming that an
                    174: .I fseek
                    175: offset is 4 bytes,
                    176: the
                    177: .B .pag
                    178: file will be
                    179: .RI 4* size
                    180: bytes
                    181: (the
                    182: .B .dir
                    183: file is tiny and roughly constant in size)
                    184: until
                    185: the number of key-value pairs exceeds about 80% of
                    186: .IR size .
                    187: (Nothing awful will happen if the database grows beyond 100% of
                    188: .IR size ,
                    189: but accesses will slow down somewhat and the
                    190: .B .pag
                    191: file will grow somewhat.)
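The sizing guidance above can be sketched in C as follows. This is not the algorithm dbzsize actually uses, which the text does not describe; it is only an assumed approach that picks the smallest prime large enough that the expected entries fill no more than about 2/3 of the table. The names suggest_size and pag_bytes are hypothetical.

```c
/* Trial-division primality test; adequate for a sketch. */
static int is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (d = 3; d <= n / d; d += 2)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical stand-in for dbzsize: the smallest prime p such that
 * nentries is at most 2/3 of p. */
long suggest_size(long nentries)
{
    long p = (3 * nentries + 1) / 2;   /* ceiling of 1.5 * nentries */

    if (p < 2)
        p = 2;
    while (!is_prime(p))
        p++;
    return p;
}

/* Initial .pag size in bytes, assuming a 4-byte fseek offset. */
long pag_bytes(long size)
{
    return 4 * size;
}
```

For 100 expected entries this sketch suggests a size of 151, for which the initial .pag file would be 604 bytes under the 4-byte offset assumption.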
                    192: .PP
                    193: .IR Dbzfresh 's
                    194: .I fieldsep
                    195: parameter specifies the field separator in the base file.
                    196: If this is not
                    197: NUL (0), and the last character of a
                    198: .I key
                    199: argument is NUL, that NUL compares equal to either a NUL or a
                    200: .I fieldsep
                    201: in the base file.
                    202: This permits use of NUL to terminate key strings without requiring that
                    203: NULs appear in the base file.
                    204: The
                    205: .I fieldsep
                    206: of a database created with
                    207: .I dbminit
                    208: is the horizontal-tab character.
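The terminator rule above might be sketched in C like this. The function key_matches is hypothetical and only illustrates the comparison as described; it assumes the caller has at least ksize bytes available at base.

```c
/* Hypothetical sketch: compare a key against bytes read from the base
 * file. Returns 1 on a match, 0 otherwise. If fieldsep is not NUL and
 * the key's last byte is NUL, that NUL also matches fieldsep. */
int key_matches(const char *key, int ksize, const char *base, int fieldsep)
{
    int i;

    for (i = 0; i < ksize; i++) {
        if (key[i] == base[i])
            continue;
        if (i == ksize - 1 && key[i] == '\0' &&
            fieldsep != '\0' && base[i] == (char)fieldsep)
            continue;   /* trailing NUL stands in for the separator */
        return 0;
    }
    return 1;
}
```

With a tab as fieldsep, a six-byte key consisting of <a@b> and a NUL matches a base-file line that begins with <a@b> followed by a tab.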
                    209: .PP
                    210: For use in news systems, various forms of case mapping (e.g. uppercase to
                    211: lowercase) in keys are available.
                    212: The
                    213: .I cmap
                    214: parameter to
                    215: .I dbzfresh
                    216: is a single character specifying which of several mapping algorithms to use.
                    217: Available algorithms are:
                    218: .RS
                    219: .TP
                    220: .B 0
                    221: case-sensitive:  no case mapping
                    222: .TP
                    223: .B B
                    224: same as
                    225: .B 0
                    226: .TP
                    227: .B NUL
                    228: same as
                    229: .B 0
                    230: .TP
                    231: .B =
                    232: case-insensitive:  uppercase and lowercase equivalent
                    233: .TP
                    234: .B b
                    235: same as
                    236: .B =
                    237: .TP
                    238: .B C
                    239: RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
                    240: and case-insensitive after
                    241: .TP
                    242: .B ?
                    243: whatever the local default is, normally
                    244: .B C
                    245: .RE
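A simplified C sketch of the mapping choices listed above follows. It is not the dbz implementation: map_key is a hypothetical name, the exceptions mentioned for algorithm C are ignored, the first `@' is assumed to be the dividing point, a key without `@' is assumed to be left unmapped, and ? is simply treated as C.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: copy n bytes from src to dst, applying the
 * mapping selected by cmap. Returns 0, or -1 for an unknown cmap. */
int map_key(int cmap, const char *src, char *dst, size_t n)
{
    size_t i, start;   /* bytes at index >= start are lowercased */
    const char *at;

    switch (cmap) {
    case '\0': case '0': case 'B':
        start = n;                      /* no case mapping */
        break;
    case '=': case 'b':
        start = 0;                      /* map the whole key */
        break;
    case '?':                           /* assumed local default: C */
    case 'C':
        at = memchr(src, '@', n);       /* assumption: first '@' */
        start = at ? (size_t)(at - src) : n;
        break;
    default:
        return -1;
    }
    for (i = 0; i < n; i++)
        dst[i] = (i >= start) ? (char)tolower((unsigned char)src[i])
                              : src[i];
    return 0;
}
```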
                    246: .PP
                    247: Mapping algorithm
                    248: .B 0
                    249: (no mapping) is faster than the others and is overwhelmingly the correct
                    250: choice for most applications.
                    251: Unless compatibility constraints interfere, it is more efficient to pre-map
                    252: the keys, storing mapped keys in the base file, than to have
                    253: .I dbz
                    254: do the mapping on every search.
                    255: .PP
                    256: For historical reasons,
                    257: .I fetch
                    258: and
                    259: .I store
                    260: expect their
                    261: .I key
                    262: arguments to be pre-mapped, but expect unmapped keys in the base file.
                    263: .I Dbzfetch
                    264: and
                    265: .I dbzstore
                    266: do the same jobs but handle all case mapping internally,
                    267: so the customer need not worry about it.
                    268: .PP
                    269: .I Dbz
                    270: stores only the database
                    271: .IR value s
                    272: in its files, relying on reference to the base file to confirm a hit on a key.
                    273: References to the base file can be minimized, greatly speeding up searches,
                    274: if a little bit of information about the keys can be stored in the
                    275: .I dbz
                    276: files.
                    277: This is ``free'' if there are some unused bits in an
                    278: .I fseek
                    279: offset,
                    280: so that the offset can be
                    281: .I tagged
                    282: with some information about the key.
                    283: The
                    284: .I tagmask
                    285: parameter of
                    286: .I dbzfresh
                    287: allows specifying the location of unused bits.
                    288: .I Tagmask
                    289: should be a mask with
                    290: one group of
                    291: contiguous
                    292: .B 1
                    293: bits.
                    294: The bits in the mask should
                    295: be unused (0) in
                    296: .I most
                    297: offsets.
                    298: The bit immediately above the mask (the
                    299: .I flag
                    300: bit) should be unused (0) in
                    301: .I all
                    302: offsets;
                    303: .I (dbz)store
                    304: will reject attempts to store a key-value pair in which the
                    305: .I value
                    306: has the flag bit on.
                    307: Apart from this restriction, tagging is invisible to the user.
                    308: As a special case, a
                    309: .I tagmask
                    310: of 1 means ``no tagging'', for use with enormous base files or
                    311: on systems with unusual offset representations.
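The relationship between the tag mask, the flag bit, and an offset can be sketched in C as below. The function names are hypothetical, the code assumes a mask with a single contiguous group of 1 bits, and the special tagmask of 1 (``no tagging'') is not modeled.

```c
/* Hypothetical sketch: the flag bit is the bit immediately above a
 * contiguous tag mask. Adding the mask's lowest set bit to the mask
 * carries through the group of 1 bits and leaves only that bit set. */
unsigned long flag_bit(unsigned long tagmask)
{
    unsigned long lowest = tagmask & (~tagmask + 1UL);

    return tagmask + lowest;
}

/* An offset can carry a tag only if the mask bits are unused in it. */
int can_tag(unsigned long offset, unsigned long tagmask)
{
    return (offset & tagmask) == 0;
}

/* A store is rejected when the offset has the flag bit on. */
int store_allowed(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) == 0;
}
```

For a mask of 0x7f000000 the flag bit works out to 0x80000000, and offsets below 0x01000000 leave all the mask bits free for tagging.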
                    312: .PP
                    313: A
                    314: .I size
                    315: of 0
                    316: given to
                    317: .I dbzfresh
                    318: is synonymous with the local default;
                    319: the normal default is suitable for tables of 90,000-100,000
                    320: key-value pairs.
                    321: A
                    322: .I cmap
                    323: of 0 (NUL) is synonymous with the character
                    324: .BR 0 ,
                    325: signifying no case mapping
                    326: (note that the character
                    327: .B ?
                    328: specifies the local default mapping,
                    329: normally
                    330: .BR C ).
                    331: A
                    332: .I tagmask
                    333: of 0 is synonymous with the local default tag mask,
                    334: normally 0x7f000000 (specifying the top bit in a 32-bit offset
                    335: as the flag bit, and the next 7 bits as the mask,
                    336: which is suitable for base files up to circa 24MB).
                    337: Calling
                    338: .I dbminit(name)
                    339: with the database files empty is equivalent to calling
                    340: .IR dbzfresh(name,0,'\et','?',0) .
                    341: .PP
                    342: When databases are regenerated periodically, as in news,
                    343: it is simplest to pick the parameters for a new database based on the old one.
                    344: This also permits some memory of past sizes of the old database, so that
                    345: a new database size can be chosen to cover expected fluctuations.
                    346: .I Dbzagain
                    347: is a variant of
                    348: .I dbminit
                    349: for creating a new database as a new generation of an old database.
                    350: The database files for
                    351: .I oldbase
                    352: must exist.
                    353: .I Dbzagain
                    354: is equivalent to calling
                    355: .I dbzfresh
                    356: with the same field separator, case mapping, and tag mask as the old database,
                    357: and a
                    358: .I size
                    359: equal to the result of applying
                    360: .I dbzsize
                    361: to the largest number of entries in the
                    362: .I oldbase
                    363: database and its previous 10 generations.
                    364: .PP
                    365: When many accesses are being done by the same program,
                    366: .I dbz
                    367: is massively faster if its first hash table is in memory.
                    368: If an internal flag is 1,
                    369: an attempt is made to read the table in when
                    370: the database is opened, and
                    371: .I dbmclose
                    372: writes it out to disk again (if it was read successfully and
                    373: has been modified).
                    374: .I Dbzincore
                    375: sets the flag to
                    376: .I newvalue
                    377: (which should be 0 or 1)
                    378: and returns the previous value;
                    379: this does not affect the status of a database that has already been opened.
                    380: The default is 0.
                    381: The attempt to read the table in may fail due to memory shortage;
                    382: in this case
                    383: .I dbz
                    384: quietly falls back on its default behavior.
                    385: .IR Store s
                    386: to an in-memory database are not (in general) written out to the file
                    387: until
                    388: .IR dbmclose
                    389: or
                    390: .IR dbzsync ,
                    391: so if robustness in the presence of crashes
                    392: or concurrent accesses
                    393: is crucial, in-memory databases
                    394: should probably be avoided.
                    395: .PP
                    396: .I Dbzsync
                    397: causes all buffers etc. to be flushed out to the files.
                    398: It is typically used as a precaution against crashes or concurrent accesses
                    399: when a
                    400: .IR dbz -using
                    401: process will be running for a long time.
                    402: It is a somewhat expensive operation,
                    403: especially
                    404: for an in-memory database.
                    405: .PP
                    406: .I Dbzcancel
                    407: cancels any pending writes from buffers.
                    408: This is typically useful only for in-core databases, since writes are
                    409: otherwise done immediately.
                    410: Its main purpose is to let a child process, in the wake of a
                    411: .IR fork ,
                    412: do a
                    413: .I dbmclose
                    414: without writing its parent's data to disk.
                    415: .PP
                    416: If
                    417: .I dbz
                    418: has been compiled with debugging facilities available (which makes it
                    419: bigger and a bit slower),
                    420: .I dbzdebug
                    421: alters the value (and returns the previous value) of an internal flag
                    422: which (when 1; default is 0) causes
                    423: verbose and cryptic debugging output on standard output.
                    424: .PP
                    425: Concurrent reading of databases is fairly safe,
                    426: but there is no (inter)locking,
                    427: so concurrent updating is not.
                    428: .PP
                    429: The database files include a record of the byte order of the processor
                    430: creating the database, and accesses by processors with different byte
                    431: order will work, although they will be slightly slower.
                    432: Byte order is preserved by
                    433: .IR dbzagain .
                    434: However,
                    435: agreement on the size and internal structure of an
                    436: .I fseek
                    437: offset is necessary, as is consensus on
                    438: the character set.
                    439: .PP
                    440: An open database occupies three
                    441: .I stdio
                    442: streams and their corresponding file descriptors;
                    443: a fourth is needed for an in-memory database.
                    444: Memory consumption is negligible (except for
                    445: .I stdio
                    446: buffers) except for in-memory databases.
                    447: .SH SEE ALSO
                    448: dbz(1), dbm(3)
                    449: .SH DIAGNOSTICS
                    450: Functions returning
                    451: .I int
                    452: values return 0 for success, \-1 for failure.
                    453: Functions returning
                    454: .I datum
                    455: values return a value with
                    456: .I dptr
                    457: set to NULL for failure.
                    458: .I Dbminit
                    459: attempts to have
                    460: .I errno
                    461: set plausibly on return, but otherwise this is not guaranteed.
                    462: An
                    463: .I errno
                    464: of
                    465: .B EDOM
                    466: from
                    467: .I dbminit
                    468: indicates that the database did not appear to be in
                    469: .I dbz
                    470: format.
                    471: .SH HISTORY
                    472: The original
                    473: .I dbz
                    474: was written by
                    475: Jon Zeeff ([email protected]).
                    476: Later contributions by David Butler and Mark Moraes.
                    477: Extensive reworking,
                    478: including this documentation,
                    479: by Henry Spencer ([email protected]) as
                    480: part of the C News project.
                    481: Hashing function by Peter Honeyman.
                    482: .SH BUGS
                    483: The
                    484: .I dptr
                    485: members of returned
                    486: .I datum
                    487: values point to static storage which is overwritten by later calls.
                    488: .PP
                    489: Unlike
                    490: .IR dbm ,
                    491: .I dbz
                    492: will misbehave if an existing key-value pair is `overwritten' by
                    493: a new
                    494: .I (dbz)store
                    495: with the same key.
                    496: The user is responsible for avoiding this by using
                    497: .I (dbz)fetch
                    498: first to check for duplicates;
                    499: an internal optimization remembers the result of the
                    500: first search so there is minimal overhead in this.
                    501: .PP
                    502: Waiting until after
                    503: .I dbminit
                    504: to bring the base file into existence
                    505: will fail if
                    506: .IR chdir (2)
                    507: has been used meanwhile.
                    508: .PP
                    509: The RFC822 case mapper implements only a first approximation to the
                    510: hideously-complex RFC822 case rules.
                    511: .PP
                    512: The prime finder in
                    513: .I dbzsize
                    514: is not particularly quick.
                    515: .PP
                    516: Should implement the
                    517: .I dbm
                    518: functions
                    519: .IR delete ,
                    520: .IR firstkey ,
                    521: and
                    522: .IR nextkey .
                    523: .PP
                    524: On C implementations which trap integer overflow,
                    525: .I dbz
                    526: will refuse to
                    527: .I (dbz)store
                    528: an
                    529: .I fseek
                    530: offset equal to the greatest
                    531: representable
                    532: positive number,
                    533: as this would cause overflow in the biased representation used.
                    534: .PP
                    535: .I Dbzagain
                    536: perhaps ought to notice when many offsets
                    537: in the old database were
                    538: too big for
                    539: tagging, and shrink the tag mask to match.
                    540: .PP
                    541: Marking
                    542: .IR dbz 's
                    543: file descriptors
                    544: .RI close-on- exec
                    545: would be a better approach to the problem
                    546: .I dbzcancel
                    547: tries to address, but that's harder to do portably.
