.TH DBZ 3Z "3 Feb 1991"
.BY "C News"
.SH NAME
dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines
.br
dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines
.br
dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines
.SH SYNOPSIS
.nf
.B #include <dbz.h>
.PP
.B dbminit(base)
.B char *base;
.PP
.B datum
.B fetch(key)
.B datum key;
.PP
.B store(key, value)
.B datum key;
.B datum value;
.PP
.B dbmclose()
.PP
.B dbzfresh(base, size, fieldsep, cmap, tagmask)
.B char *base;
.B long size;
.B int fieldsep;
.B int cmap;
.B long tagmask;
.PP
.B dbzagain(base, oldbase)
.B char *base;
.B char *oldbase;
.PP
.B datum
.B dbzfetch(key)
.B datum key;
.PP
.B dbzstore(key, value)
.B datum key;
.B datum value;
.PP
.B dbzsync()
.PP
.B long
.B dbzsize(nentries)
.B long nentries;
.PP
.B dbzincore(newvalue)
.PP
.B dbzcancel()
.PP
.B dbzdebug(newvalue)
.SH DESCRIPTION
These functions provide an indexing system for rapid random access to a
text file (the
.I base
.IR file ).
Subject to certain constraints, they are call-compatible with
.IR dbm (3),
although they also provide some extensions.
(Note that they are
.I not
file-compatible with
.I dbm
or any variant thereof.)
.PP
In principle,
.I dbz
stores key-value pairs, where both key and value are arbitrary sequences
of bytes, specified to the functions by
values of type
.IR datum ,
typedefed in the header file to be a structure with members
.I dptr
(a value of type
.I char *
pointing to the bytes)
and
.I dsize
(a value of type
.I int
indicating how long the byte sequence is).
.PP
In practice,
.I dbz
is more restricted than
.IR dbm .
A
.I dbz
database
must be an index into a base file,
with the database
.IR value s
being
.IR fseek (3)
offsets into the base file.
Each such
.I value
must ``point to'' a place in the base file where the corresponding
.I key
sequence is found.
A key can be no longer than
.SM DBZMAXKEY
(a constant defined in the header file) bytes.
No key can be an initial subsequence of another,
which in most applications requires that keys be
either bracketed or terminated in some way (see the
discussion of the
.I fieldsep
parameter of
.IR dbzfresh ,
below,
for a fine point on terminators).
.PP
.I Dbminit
opens a database,
an index into the base file
.IR base ,
consisting of files
.IB base .dir
and
.IB base .pag
which must already exist.
(If the database is new, they should be zero-length files.)
Subsequent accesses go to that database until
.I dbmclose
is called to close the database.
The base file need not exist at the time of the
.IR dbminit ,
but it must exist before accesses are attempted.
.PP
.I Fetch
searches the database for the specified
.IR key ,
returning the corresponding
.IR value
if any.
.I Store
stores the
.IR key - value
pair in the database.
.I Store
will fail unless the database files are writeable.
See below for a complication arising from case mapping.
.PP
.I Dbzfresh
is a variant of
.I dbminit
for creating a new database with more control over details.
Unlike for
.IR dbminit ,
the database files need not exist:
they will be created if necessary,
and truncated in any case.
.PP
.IR Dbzfresh 's
.I size
parameter specifies the size of the first hash table within the database,
in key-value pairs.
Performance will be best if
.I size
is a prime number and
the number of key-value pairs stored in the database does not exceed
about 2/3 of
.IR size .
(The
.I dbzsize
function, given the expected number of key-value pairs,
will suggest a database size that meets these criteria.)
Assuming that an
.I fseek
offset is 4 bytes,
the
.B .pag
file will be
.RI 4* size
bytes
(the
.B .dir
file is tiny and roughly constant in size)
until
the number of key-value pairs exceeds about 80% of
.IR size .
(Nothing awful will happen if the database grows beyond 100% of
.IR size ,
but accesses will slow down somewhat and the
.B .pag
file will grow somewhat.)
.PP
.IR Dbzfresh 's
.I fieldsep
parameter specifies the field separator in the base file.
If this is not
NUL (0), and the last character of a
.I key
argument is NUL, that NUL compares equal to either a NUL or a
.I fieldsep
in the base file.
This permits use of NUL to terminate key strings without requiring that
NULs appear in the base file.
The
.I fieldsep
of a database created with
.I dbminit
is the horizontal-tab character.
.PP
For use in news systems, various forms of case mapping (e.g. uppercase to
lowercase) in keys are available.
The
.I cmap
parameter to
.I dbzfresh
is a single character specifying which of several mapping algorithms to use.
Available algorithms are:
.RS
.TP
.B 0
case-sensitive: no case mapping
.TP
.B B
same as
.B 0
.TP
.B NUL
same as
.B 0
.TP
.B =
case-insensitive: uppercase and lowercase equivalent
.TP
.B b
same as
.B =
.TP
.B C
RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
and case-insensitive after
.TP
.B ?
whatever the local default is, normally
.B C
.RE
.PP
Mapping algorithm
.B 0
(no mapping) is faster than the others and is overwhelmingly the correct
choice for most applications.
Unless compatibility constraints interfere, it is more efficient to pre-map
the keys, storing mapped keys in the base file, than to have
.I dbz
do the mapping on every search.
.PP
For historical reasons,
.I fetch
and
.I store
expect their
.I key
arguments to be pre-mapped, but expect unmapped keys in the base file.
.I Dbzfetch
and
.I dbzstore
do the same jobs but handle all case mapping internally,
so the customer need not worry about it.
.PP
.I Dbz
stores only the database
.IR value s
in its files, relying on reference to the base file to confirm a hit on a key.
References to the base file can be minimized, greatly speeding up searches,
if a little bit of information about the keys can be stored in the
.I dbz
files.
This is ``free'' if there are some unused bits in an
.I fseek
offset,
so that the offset can be
.I tagged
with some information about the key.
The
.I tagmask
parameter of
.I dbzfresh
allows specifying the location of unused bits.
.I Tagmask
should be a mask with
one group of
contiguous
.B 1
bits.
The bits in the mask should
be unused (0) in
.I most
offsets.
The bit immediately above the mask (the
.I flag
bit) should be unused (0) in
.I all
offsets;
.I (dbz)store
will reject attempts to store a key-value pair in which the
.I value
has the flag bit on.
Apart from this restriction, tagging is invisible to the user.
As a special case, a
.I tagmask
of 1 means ``no tagging'', for use with enormous base files or
on systems with unusual offset representations.
.PP
A
.I size
of 0
given to
.I dbzfresh
is synonymous with the local default;
the normal default is suitable for tables of 90-100,000
key-value pairs.
A
.I cmap
of 0 (NUL) is synonymous with the character
.BR 0 ,
signifying no case mapping
(note that the character
.B ?
specifies the local default mapping,
normally
.BR C ).
A
.I tagmask
of 0 is synonymous with the local default tag mask,
normally 0x7f000000 (specifying the top bit in a 32-bit offset
as the flag bit, and the next 7 bits as the mask,
which is suitable for base files up to circa 24MB).
Calling
.I dbminit(name)
with the database files empty is equivalent to calling
.IR dbzfresh(name,0,'\et','?',0) .
.PP
When databases are regenerated periodically, as in news,
it is simplest to pick the parameters for a new database based on the old one.
This also permits some memory of past sizes of the old database, so that
a new database size can be chosen to cover expected fluctuations.
.I Dbzagain
is a variant of
.I dbminit
for creating a new database as a new generation of an old database.
The database files for
.I oldbase
must exist.
.I Dbzagain
is equivalent to calling
.I dbzfresh
with the same field separator, case mapping, and tag mask as the old database,
and a
.I size
equal to the result of applying
.I dbzsize
to the largest number of entries in the
.I oldbase
database and its previous 10 generations.
.PP
When many accesses are being done by the same program,
.I dbz
is massively faster if its first hash table is in memory.
If an internal flag is 1,
an attempt is made to read the table in when
the database is opened, and
.I dbmclose
writes it out to disk again (if it was read successfully and
has been modified).
.I Dbzincore
sets the flag to
.I newvalue
(which should be 0 or 1)
and returns the previous value;
this does not affect the status of a database that has already been opened.
The default is 0.
The attempt to read the table in may fail due to memory shortage;
in this case
.I dbz
quietly falls back on its default behavior.
.IR Store s
to an in-memory database are not (in general) written out to the file
until
.IR dbmclose
or
.IR dbzsync ,
so if robustness in the presence of crashes
or concurrent accesses
is crucial, in-memory databases
should probably be avoided.
.PP
.I Dbzsync
causes all buffers etc. to be flushed out to the files.
It is typically used as a precaution against crashes or concurrent accesses
when a
.IR dbz -using
process will be running for a long time.
It is a somewhat expensive operation,
especially
for an in-memory database.
.PP
.I Dbzcancel
cancels any pending writes from buffers.
This is typically useful only for in-core databases, since writes are
otherwise done immediately.
Its main purpose is to let a child process, in the wake of a
.IR fork ,
do a
.I dbmclose
without writing its parent's data to disk.
.PP
If
.I dbz
has been compiled with debugging facilities available (which makes it
bigger and a bit slower),
.I dbzdebug
alters the value (and returns the previous value) of an internal flag
which (when 1; default is 0) causes
verbose and cryptic debugging output on standard output.
.PP
Concurrent reading of databases is fairly safe,
but there is no (inter)locking,
so concurrent updating is not.
.PP
The database files include a record of the byte order of the processor
creating the database, and accesses by processors with different byte
order will work, although they will be slightly slower.
Byte order is preserved by
.IR dbzagain .
However,
agreement on the size and internal structure of an
.I fseek
offset is necessary, as is consensus on
the character set.
.PP
An open database occupies three
.I stdio
streams and their corresponding file descriptors;
a fourth is needed for an in-memory database.
Memory consumption is negligible (except for
.I stdio
buffers) except for in-memory databases.
.SH SEE ALSO
dbz(1), dbm(3)
.SH DIAGNOSTICS
Functions returning
.I int
values return 0 for success, \-1 for failure.
Functions returning
.I datum
values return a value with
.I dptr
set to NULL for failure.
.I Dbminit
attempts to have
.I errno
set plausibly on return, but otherwise this is not guaranteed.
An
.I errno
of
.B EDOM
from
.I dbminit
indicates that the database did not appear to be in
.I dbz
format.
.SH HISTORY
The original
.I dbz
was written by
Jon Zeeff ([email protected]).
Later contributions by David Butler and Mark Moraes.
Extensive reworking,
including this documentation,
by Henry Spencer ([email protected]) as
part of the C News project.
Hashing function by Peter Honeyman.
.SH BUGS
The
.I dptr
members of returned
.I datum
values point to static storage which is overwritten by later calls.
.PP
Unlike
.IR dbm ,
.I dbz
will misbehave if an existing key-value pair is `overwritten' by
a new
.I (dbz)store
with the same key.
The user is responsible for avoiding this by using
.I (dbz)fetch
first to check for duplicates;
an internal optimization remembers the result of the
first search so there is minimal overhead in this.
.PP
Waiting until after
.I dbminit
to bring the base file into existence
will fail if
.IR chdir (2)
has been used meanwhile.
.PP
The RFC822 case mapper implements only a first approximation to the
hideously-complex RFC822 case rules.
.PP
The prime finder in
.I dbzsize
is not particularly quick.
.PP
Should implement the
.I dbm
functions
.IR delete ,
.IR firstkey ,
and
.IR nextkey .
.PP
On C implementations which trap integer overflow,
.I dbz
will refuse to
.I (dbz)store
an
.I fseek
offset equal to the greatest
representable
positive number,
as this would cause overflow in the biased representation used.
.PP
.I Dbzagain
perhaps ought to notice when many offsets
in the old database were
too big for
tagging, and shrink the tag mask to match.
.PP
Marking
.IR dbz 's
file descriptors
.RI close-on- exec
would be a better approach to the problem
.I dbzcancel
tries to address, but that's harder to do portably.
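.SH EXAMPLE
The sizing advice given for
.IR dbzfresh 's
.I size
parameter (a prime table size, filled to no more than about 2/3,
with a
.B .pag
file of
.RI 4* size
bytes when an
.I fseek
offset is 4 bytes)
can be illustrated with a minimal sketch.
The names
.IR is_prime ,
.IR sketch_size ,
and
.I sketch_pag_bytes
are hypothetical and are not part of
.IR dbz ;
the real
.I dbzsize
may well choose a different size.
.PP
.nf
```c
/* Hypothetical helper, not part of dbz: trial-division primality test. */
static int is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/*
 * Hypothetical stand-in for dbzsize(): the smallest prime table size
 * such that nentries is no more than about 2/3 of it.  This only
 * illustrates the stated criteria; the real dbzsize() may differ.
 */
long sketch_size(long nentries)
{
    long size;

    if (nentries < 1)
        nentries = 1;
    size = nentries + (nentries + 1) / 2;   /* ceil(1.5 * nentries) */
    while (!is_prime(size))
        size++;
    return size;
}

/* Expected size of the .pag file in bytes, assuming 4-byte offsets. */
long sketch_pag_bytes(long size)
{
    return 4 * size;
}
```
.fi
.PP
Under these assumptions, 100 expected key-value pairs give a table size
of 151 and a
.B .pag
file of 604 bytes.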