.TH DBZ 3Z "3 Feb 1991"
.BY "C News"
.SH NAME
dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines
.br
dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines
.br
dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines
.SH SYNOPSIS
.nf
.B #include <dbz.h>
.PP
.B dbminit(base)
.B char *base;
.PP
.B datum
.B fetch(key)
.B datum key;
.PP
.B store(key, value)
.B datum key;
.B datum value;
.PP
.B dbmclose()
.PP
.B dbzfresh(base, size, fieldsep, cmap, tagmask)
.B char *base;
.B long size;
.B int fieldsep;
.B int cmap;
.B long tagmask;
.PP
.B dbzagain(base, oldbase)
.B char *base;
.B char *oldbase;
.PP
.B datum
.B dbzfetch(key)
.B datum key;
.PP
.B dbzstore(key, value)
.B datum key;
.B datum value;
.PP
.B dbzsync()
.PP
.B long
.B dbzsize(nentries)
.B long nentries;
.PP
.B dbzincore(newvalue)
.PP
.B dbzcancel()
.PP
.B dbzdebug(newvalue)
.SH DESCRIPTION
These functions provide an indexing system for rapid random access to a
text file (the
.I base
.IR file ).
Subject to certain constraints, they are call-compatible with
.IR dbm (3),
although they also provide some extensions.
(Note that they are
.I not
file-compatible with
.I dbm
or any variant thereof.)
.PP
In principle,
.I dbz
stores key-value pairs, where both key and value are arbitrary sequences
of bytes, specified to the functions by
values of type
.IR datum ,
typedefed in the header file to be a structure with members
.I dptr
(a value of type
.I char *
pointing to the bytes)
and
.I dsize
(a value of type
.I int
indicating how long the byte sequence is).
.PP
In practice,
.I dbz
is more restricted than
.IR dbm .
A
.I dbz
database
must be an index into a base file,
with the database
.IR value s
being
.IR fseek (3)
offsets into the base file.
Each such
.I value
must ``point to'' a place in the base file where the corresponding
.I key
sequence is found.
A key can be no longer than
.SM DBZMAXKEY
(a constant defined in the header file) bytes.
No key can be an initial subsequence of another,
which in most applications requires that keys be
either bracketed or terminated in some way (see the
discussion of the
.I fieldsep
parameter of
.IR dbzfresh ,
below,
for a fine point on terminators).
.PP
.I Dbminit
opens a database,
an index into the base file
.IR base ,
consisting of files
.IB base .dir
and
.IB base .pag
which must already exist.
(If the database is new, they should be zero-length files.)
Subsequent accesses go to that database until
.I dbmclose
is called to close the database.
The base file need not exist at the time of the
.IR dbminit ,
but it must exist before accesses are attempted.
.PP
.I Fetch
searches the database for the specified
.IR key ,
returning the corresponding
.I value
if any.
.I Store
stores the
.IR key - value
pair in the database.
.I Store
will fail unless the database files are writable.
See below for a complication arising from case mapping.
.PP
.I Dbzfresh
is a variant of
.I dbminit
for creating a new database with more control over details.
Unlike for
.IR dbminit ,
the database files need not exist:
they will be created if necessary,
and truncated in any case.
.PP
.IR Dbzfresh 's
.I size
parameter specifies the size of the first hash table within the database,
in key-value pairs.
Performance will be best if
.I size
is a prime number and
the number of key-value pairs stored in the database does not exceed
about 2/3 of
.IR size .
(The
.I dbzsize
function, given the expected number of key-value pairs,
will suggest a database size that meets these criteria.)
Assuming that an
.I fseek
offset is 4 bytes,
the
.B .pag
file will be
.RI 4* size
bytes
(the
.B .dir
file is tiny and roughly constant in size)
until
the number of key-value pairs exceeds about 80% of
.IR size .
(Nothing awful will happen if the database grows beyond 100% of
.IR size ,
but accesses will slow down somewhat and the
.B .pag
file will grow somewhat.)
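.PP
As a rough illustration of the sizing rule above, the following C sketch
picks the smallest prime that keeps a table no more than about 2/3 full
and estimates the resulting
.B .pag
size.
The names
.IR is_prime ,
.IR suggest_size ,
and
.I pag_bytes
are hypothetical and are not part of
.IR dbz ;
the real
.I dbzsize
may well choose differently.
.PP
.nf
```c
#include <stdbool.h>

/* Hypothetical helper: trial-division primality test. */
static bool is_prime(long n)
{
    if (n < 2)
        return false;
    for (long d = 2; d <= n / d; d++)
        if (n % d == 0)
            return false;
    return true;
}

/*
 * Hypothetical stand-in for dbzsize(): the smallest prime that leaves
 * the table at most about 2/3 full for the expected number of entries.
 */
long suggest_size(long nentries)
{
    long n;

    if (nentries < 1)
        nentries = 1;
    n = nentries + nentries / 2 + 1;    /* roughly 3/2 of the entries */
    while (!is_prime(n))
        n++;
    return n;
}

/* Size of the .pag file before it starts to grow, assuming 4-byte offsets. */
long pag_bytes(long size)
{
    return 4 * size;
}
```
.fi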
.PP
.IR Dbzfresh 's
.I fieldsep
parameter specifies the field separator in the base file.
If this is not
NUL (0), and the last character of a
.I key
argument is NUL, that NUL compares equal to either a NUL or a
.I fieldsep
in the base file.
This permits use of NUL to terminate key strings without requiring that
NULs appear in the base file.
The
.I fieldsep
of a database created with
.I dbminit
is the horizontal-tab character.
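.PP
The terminator rule above can be sketched as a comparison routine.
This is only an illustration under stated assumptions:
.I key_matches
is a hypothetical name, not a
.I dbz
function, and the caller is assumed to supply at least
.I keylen
bytes read from the base file.
.PP
.nf
```c
#include <stddef.h>

/*
 * Hypothetical comparison of a lookup key against the bytes found at an
 * offset in the base file.  A NUL in the last position of the key matches
 * either a NUL or the field separator in the base file, provided the
 * separator is not itself NUL.  Returns 1 on a match, 0 otherwise.
 * filebytes must hold at least keylen bytes.
 */
int key_matches(const char *key, size_t keylen,
                const char *filebytes, int fieldsep)
{
    size_t i;

    for (i = 0; i < keylen; i++) {
        char k = key[i];
        char f = filebytes[i];

        if (k == f)
            continue;
        /* Only the final key byte gets the relaxed comparison. */
        if (i == keylen - 1 && k == 0 && fieldsep != 0
            && f == (char)fieldsep)
            continue;
        return 0;
    }
    return 1;
}
```
.fi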
.PP
For use in news systems, various forms of case mapping (e.g. uppercase to
lowercase) in keys are available.
The
.I cmap
parameter to
.I dbzfresh
is a single character specifying which of several mapping algorithms to use.
Available algorithms are:
.RS
.TP
.B 0
case-sensitive: no case mapping
.TP
.B B
same as
.B 0
.TP
.B NUL
same as
.B 0
.TP
.B =
case-insensitive: uppercase and lowercase equivalent
.TP
.B b
same as
.B =
.TP
.B C
RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
and case-insensitive after
.TP
.B ?
whatever the local default is, normally
.B C
.RE
.PP
Mapping algorithm
.B 0
(no mapping) is faster than the others and is overwhelmingly the correct
choice for most applications.
Unless compatibility constraints interfere, it is more efficient to pre-map
the keys, storing mapped keys in the base file, than to have
.I dbz
do the mapping on every search.
.PP
For historical reasons,
.I fetch
and
.I store
expect their
.I key
arguments to be pre-mapped, but expect unmapped keys in the base file.
.I Dbzfetch
and
.I dbzstore
do the same jobs but handle all case mapping internally,
so the customer need not worry about it.
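.PP
A caller that pre-maps keys itself, as
.I fetch
and
.I store
expect, might do something like the following.
This is a minimal sketch, not the
.I dbz
mapper:
.I premap_key
is a hypothetical name, splitting at the first `@' is an assumption,
and the exceptions that the
.B C
rules allow before the `@' are deliberately left out.
.PP
.nf
```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical in-place key pre-mapper.
 * cmap '=' or 'b': fold the whole key to lowercase.
 * cmap 'C': fold only the part after the first '@' (assumption; the
 *           exceptions before the '@' are not handled).
 * anything else: leave the key unchanged.
 */
void premap_key(char *key, size_t len, int cmap)
{
    size_t start = 0;
    size_t i;

    if (cmap == 'C') {
        char *at = memchr(key, '@', len);

        if (at == NULL)
            return;                 /* no domain part: nothing to fold */
        start = (size_t)(at - key) + 1;
    } else if (cmap != '=' && cmap != 'b') {
        return;                     /* no case mapping */
    }
    for (i = start; i < len; i++)
        key[i] = (char)tolower((unsigned char)key[i]);
}
```
.fi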
.PP
.I Dbz
stores only the database
.IR value s
in its files, relying on reference to the base file to confirm a hit on a key.
References to the base file can be minimized, greatly speeding up searches,
if a little bit of information about the keys can be stored in the
.I dbz
files.
This is ``free'' if there are some unused bits in an
.I fseek
offset,
so that the offset can be
.I tagged
with some information about the key.
The
.I tagmask
parameter of
.I dbzfresh
allows specifying the location of unused bits.
.I Tagmask
should be a mask with
one group of
contiguous
.B 1
bits.
The bits in the mask should
be unused (0) in
.I most
offsets.
The bit immediately above the mask (the
.I flag
bit) should be unused (0) in
.I all
offsets;
.I (dbz)store
will reject attempts to store a key-value pair in which the
.I value
has the flag bit on.
Apart from this restriction, tagging is invisible to the user.
As a special case, a
.I tagmask
of 1 means ``no tagging'', for use with enormous base files or
on systems with unusual offset representations.
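.PP
The bit layout described above can be illustrated with a few helpers.
These are hypothetical and are not taken from the
.I dbz
source; in particular, how
.I dbz
derives a tag from a key and how it uses the flag bit internally
are not shown.
The mask is assumed to be one contiguous group of 1 bits.
.PP
.nf
```c
/* The flag bit is the bit immediately above the mask. */
unsigned long flag_bit(unsigned long tagmask)
{
    /* Fill in every bit below the top of the mask, then carry upward. */
    return (tagmask | (tagmask - 1)) + 1;
}

/* An offset may be stored only if its flag bit is off. */
int storable(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) == 0;
}

/* An offset has room for a tag only if none of its mask bits are in use. */
int taggable(unsigned long offset, unsigned long tagmask)
{
    return storable(offset, tagmask) && (offset & tagmask) == 0;
}

/* Place a small tag value into the mask bits of a taggable offset. */
unsigned long tag_offset(unsigned long offset, unsigned long tag,
                         unsigned long tagmask)
{
    unsigned long lowbit = tagmask & (~tagmask + 1);  /* lowest mask bit */

    return offset | ((tag * lowbit) & tagmask);
}
```
.fi
.PP
With a mask of 0x7f000000, for example, the flag bit works out to
0x80000000 and any offset below 0x01000000 has all of its mask bits free.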
.PP
A
.I size
of 0
given to
.I dbzfresh
is synonymous with the local default;
the normal default is suitable for tables of 90,000-100,000
key-value pairs.
A
.I cmap
of 0 (NUL) is synonymous with the character
.BR 0 ,
signifying no case mapping
(note that the character
.B ?
specifies the local default mapping,
normally
.BR C ).
A
.I tagmask
of 0 is synonymous with the local default tag mask,
normally 0x7f000000 (specifying the top bit in a 32-bit offset
as the flag bit, and the next 7 bits as the mask,
which is suitable for base files up to circa 24MB).
Calling
.I dbminit(name)
with the database files empty is equivalent to calling
.IR dbzfresh(name,0,'\et','?',0) .
.PP
When databases are regenerated periodically, as in news,
it is simplest to pick the parameters for a new database based on the old one.
This also permits some memory of past sizes of the old database, so that
a new database size can be chosen to cover expected fluctuations.
.I Dbzagain
is a variant of
.I dbminit
for creating a new database as a new generation of an old database.
The database files for
.I oldbase
must exist.
.I Dbzagain
is equivalent to calling
.I dbzfresh
with the same field separator, case mapping, and tag mask as the old database,
and a
.I size
equal to the result of applying
.I dbzsize
to the largest number of entries in the
.I oldbase
database and its previous 10 generations.
.PP
When many accesses are being done by the same program,
.I dbz
is massively faster if its first hash table is in memory.
If an internal flag is 1,
an attempt is made to read the table in when
the database is opened, and
.I dbmclose
writes it out to disk again (if it was read successfully and
has been modified).
.I Dbzincore
sets the flag to
.I newvalue
(which should be 0 or 1)
and returns the previous value;
this does not affect the status of a database that has already been opened.
The default is 0.
The attempt to read the table in may fail due to memory shortage;
in this case
.I dbz
quietly falls back on its default behavior.
.IR Store s
to an in-memory database are not (in general) written out to the file
until
.I dbmclose
or
.IR dbzsync ,
so if robustness in the presence of crashes
or concurrent accesses
is crucial, in-memory databases
should probably be avoided.
.PP
.I Dbzsync
causes all buffers etc. to be flushed out to the files.
It is typically used as a precaution against crashes or concurrent accesses
when a
.IR dbz -using
process will be running for a long time.
It is a somewhat expensive operation,
especially
for an in-memory database.
.PP
.I Dbzcancel
cancels any pending writes from buffers.
This is typically useful only for in-core databases, since writes are
otherwise done immediately.
Its main purpose is to let a child process, in the wake of a
.IR fork ,
do a
.I dbmclose
without writing its parent's data to disk.
.PP
If
.I dbz
has been compiled with debugging facilities available (which makes it
bigger and a bit slower),
.I dbzdebug
alters the value (and returns the previous value) of an internal flag
which (when 1; default is 0) causes
verbose and cryptic debugging output on standard output.
.PP
Concurrent reading of databases is fairly safe,
but there is no (inter)locking,
so concurrent updating is not.
.PP
The database files include a record of the byte order of the processor
creating the database, and accesses by processors with different byte
order will work, although they will be slightly slower.
Byte order is preserved by
.IR dbzagain .
However,
agreement on the size and internal structure of an
.I fseek
offset is necessary, as is consensus on
the character set.
.PP
An open database occupies three
.I stdio
streams and their corresponding file descriptors;
a fourth is needed for an in-memory database.
Memory consumption is negligible (except for
.I stdio
buffers) except for in-memory databases.
.SH SEE ALSO
dbz(1), dbm(3)
.SH DIAGNOSTICS
Functions returning
.I int
values return 0 for success, \-1 for failure.
Functions returning
.I datum
values return a value with
.I dptr
set to NULL for failure.
.I Dbminit
attempts to have
.I errno
set plausibly on return, but otherwise this is not guaranteed.
An
.I errno
of
.B EDOM
from
.I dbminit
indicates that the database did not appear to be in
.I dbz
format.
.SH HISTORY
The original
.I dbz
was written by
Jon Zeeff.
Later contributions by David Butler and Mark Moraes.
Extensive reworking,
including this documentation,
by Henry Spencer as
part of the C News project.
Hashing function by Peter Honeyman.
.SH BUGS
The
.I dptr
members of returned
.I datum
values point to static storage which is overwritten by later calls.
.PP
Unlike
.IR dbm ,
.I dbz
will misbehave if an existing key-value pair is `overwritten' by
a new
.I (dbz)store
with the same key.
The user is responsible for avoiding this by using
.I (dbz)fetch
first to check for duplicates;
an internal optimization remembers the result of the
first search so there is minimal overhead in this.
.PP
Waiting until after
.I dbminit
to bring the base file into existence
will fail if
.IR chdir (2)
has been used meanwhile.
.PP
The RFC822 case mapper implements only a first approximation to the
hideously-complex RFC822 case rules.
.PP
The prime finder in
.I dbzsize
is not particularly quick.
.PP
Should implement the
.I dbm
functions
.IR delete ,
.IR firstkey ,
and
.IR nextkey .
.PP
On C implementations which trap integer overflow,
.I dbz
will refuse to
.I (dbz)store
an
.I fseek
offset equal to the greatest
representable
positive number,
as this would cause overflow in the biased representation used.
.PP
.I Dbzagain
perhaps ought to notice when many offsets
in the old database were
too big for
tagging, and shrink the tag mask to match.
.PP
Marking
.IR dbz 's
file descriptors
.RI close-on- exec
would be a better approach to the problem
.I dbzcancel
tries to address, but that's harder to do portably.