1:
2:
3: Description of Dynamic Update and T_UNSPEC Code
4:
5:
6:
7:
8: Added by Mike Schwartz
9: University of Washington Computer Science Department
10: 11/86
11: [email protected]
12:
13:
14:
15:
16: I have incorporated 2 new features into BIND:
17: 1. Code to allow (unauthenticated) dynamic updates: surrounded by
18: #ifdef ALLOW_UPDATES
19: 2. Code to allow data of unspecified type: surrounded by
20: #ifdef ALLOW_T_UNSPEC
21:
22: Note that you can have one or the other or both (or neither) of these
23: modifications running, by appropriately modifying the makefiles. Also,
24: the external interface isn't changed (other than being extended), i.e.,
25: a BIND server that allows dynamic updates and/or T_UNSPEC data can
26: still talk to a 'vanilla' server using the 'vanilla' operations.
27:
28: The description that follows is broken into 3 parts: a functional
29: description of the dynamic update facility, a functional description of
30: the T_UNSPEC facility, and a discussion of the implementation of
31: dynamic updates. The implementation description is mostly intended for
32: those who want to make future enhancements (especially the addition of
33: a good authentication mechanism). If you make enhancements, I would be
34: interested in hearing about them.
35:
36:
37:
38:
39:
40: 1. Dynamic Update Facility
41:
42: I added this code in conjunction with my research into naming in large
43: heterogeneous systems. For the purposes of this research, I ignored
44: security issues. In other words, no authentication/authorization
45: mechanism exists to control updates. Authentication will hopefully be
46: addressed at some future point (although probably not by me). In the
47: meantime, BIND Internet name servers (as opposed to "private" name
48: server networks operating with their own port numbers, as I use in my
49: research) should be compiled *without* -DALLOW_UPDATES, so that the
50: integrity of the Internet name database won't be compromised by this
51: code.
52:
53:
54: There are 5 different dynamic update interfaces:
55: UPDATEA - add a resource record
56: UPDATED - delete a specific resource record
57: UPDATEDA - delete all named resource records
58: UPDATEM - modify a specific resource record
59: UPDATEMA - modify all named resource records
60:
61: These all work through the normal resolver interface, i.e., these
62: interfaces are opcodes, and the data in the buffers passed to
63: res_mkquery must conform to what is expected for the particular
64: operation (see the #ifdef ALLOW_UPDATES extensions to nstest.c for
65: example usage).
66:
67: UPDATEM is logically equivalent to an UPDATED followed by an UPDATEA,
68: except that the updates occur atomically at the primary server (as
69: usual with Domain servers, secondaries may become temporarily
70: inconsistent). The difference between UPDATED and UPDATEDA is that the
71: latter allows you to delete all RRs associated with a name; similarly
72: for UPDATEM and UPDATEMA. The reason for the UPDATE{D,M}A interfaces
73: is two-fold:
74:
75: 1. Sometimes you want to delete/modify some data, but you know you'll
76: only have a single RR for that data; in such a case, it's more
77: convenient to delete/modify the RR by just giving the name;
78: otherwise, you would have to first look it up, and then
79: delete/modify it.
80:
81: 2. It is sometimes useful to be able to delete/modify multiple RRs
82: this way, since one can then perform the operation atomically.
83: Otherwise, one would have to delete/modify the RRs one-by-one.
84:
85: One additional point to note about UPDATEMA is that it will return a
86: success status if there were *zero* or more RRs associated with the given
87: name (and the RR add succeeds), whereas UPDATEM, UPDATED, and UPDATEDA
88: will return a success status if there were *one* or more RRs associated
89: with the given name. The reason for the difference is to handle the
90: (probably common) case where what you want to do is set a particular
91: name to contain a single RR, irrespective of whether or not it was
92: already set.
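The success rules described above can be modelled with a toy RR set. This is only a sketch: the enum values, struct, and function names are hypothetical stand-ins rather than BIND's own definitions, and it assumes that UPDATED and UPDATEM fail when the specific RR is not present under the name.

```c
#include <string.h>

#define MAX_RRS 8

/* Hypothetical stand-ins for the five update opcodes. */
enum upd_op { UPD_A, UPD_D, UPD_DA, UPD_M, UPD_MA };

/* Toy model of the RRs stored under a single name. */
struct rrset {
    const char *rr[MAX_RRS];
    int count;
};

/* Return the index of the RR equal to data, or -1 if absent. */
static int find_rr(const struct rrset *s, const char *data)
{
    int i;
    if (data == NULL)
        return -1;
    for (i = 0; i < s->count; i++)
        if (strcmp(s->rr[i], data) == 0)
            return i;
    return -1;
}

/* Append an RR; returns 1 on success, 0 if the toy set is full. */
static int add_rr(struct rrset *s, const char *data)
{
    if (data == NULL || s->count == MAX_RRS)
        return 0;
    s->rr[s->count++] = data;
    return 1;
}

/* Apply one update; returns 1 for a success status, 0 otherwise. */
int apply_update(struct rrset *s, enum upd_op op,
                 const char *old_rr, const char *new_rr)
{
    int i;

    switch (op) {
    case UPD_A:                     /* add a resource record */
        return add_rr(s, new_rr);
    case UPD_D:                     /* delete a specific RR */
        if ((i = find_rr(s, old_rr)) < 0)
            return 0;
        s->rr[i] = s->rr[--s->count];
        return 1;
    case UPD_DA:                    /* delete all; needs one or more RRs */
        if (s->count == 0)
            return 0;
        s->count = 0;
        return 1;
    case UPD_M:                     /* delete + add as a single step */
        if ((i = find_rr(s, old_rr)) < 0 || new_rr == NULL)
            return 0;
        s->rr[i] = new_rr;
        return 1;
    case UPD_MA:                    /* succeeds with zero or more RRs */
        s->count = 0;
        return add_rr(s, new_rr);
    }
    return 0;
}
```

In this model, UPD_MA on an empty set succeeds and leaves exactly one RR, while UPD_DA on an empty set reports failure, which mirrors the "set this name to a single RR" case discussed above.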
93:
94:
95:
96:
97: 2. T_UNSPEC Facility
98:
99: Type T_UNSPEC allows you to store data whose layout BIND doesn't
100: understand. Data of this type is not marshalled (i.e., converted
101: between host and network representation, as is done, for example, with
102: Internet addresses) by BIND, so it is up to the client to make sure
103: things work out ok w.r.t. heterogeneous data representations. The way
104: I use this type is to have the client marshal data, store it, retrieve
105: it, and demarshal it. This way I can store arbitrary data in BIND
106: without having to add new code for each specific type.
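As a rough illustration of the client-side marshal/demarshal step, the sketch below packs a made-up record into a byte buffer in network byte order. The struct printer_info and both function names are hypothetical; the only point is that the client, not BIND, owns the data layout.

```c
#include <arpa/inet.h>  /* htonl, ntohl, htons, ntohs */
#include <stdint.h>
#include <string.h>

/* Hypothetical client record; BIND never interprets this layout. */
struct printer_info {
    uint32_t pages_per_min;
    uint16_t tray_count;
};

#define PRINTER_INFO_WIRE_LEN 6

/* Marshal into buf in network byte order; returns bytes written, 0 on error. */
size_t printer_info_marshal(const struct printer_info *p,
                            unsigned char *buf, size_t buflen)
{
    uint32_t ppm = htonl(p->pages_per_min);
    uint16_t trays = htons(p->tray_count);

    if (buflen < PRINTER_INFO_WIRE_LEN)
        return 0;
    memcpy(buf, &ppm, 4);
    memcpy(buf + 4, &trays, 2);
    return PRINTER_INFO_WIRE_LEN;
}

/* Demarshal from buf back into host representation; returns 0 on success. */
int printer_info_demarshal(const unsigned char *buf, size_t len,
                           struct printer_info *out)
{
    uint32_t ppm;
    uint16_t trays;

    if (len < PRINTER_INFO_WIRE_LEN)
        return -1;
    memcpy(&ppm, buf, 4);
    memcpy(&trays, buf + 4, 2);
    out->pages_per_min = ntohl(ppm);
    out->tray_count = ntohs(trays);
    return 0;
}
```

The marshalled buffer is what a client would hand to the server as T_UNSPEC data; a client on a machine with a different byte order can demarshal it with the same routine.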
107:
108: T_UNSPEC data is dumped in an ASCII-encoded, checksummed format so
109: that, although it's not human-readable, it at least doesn't fill the
110: dump file with unprintable characters.
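To make the idea of a printable, checksummed dump concrete, here is a small sketch. It is NOT the encoding BIND uses; it assumes a simple scheme (hex digits, a colon, then a one-byte additive checksum) chosen only for illustration, and the function names are hypothetical.

```c
#include <string.h>

/* Encode len bytes as hex, then ':' and a 2-digit checksum (sum mod 256).
   Returns the number of characters written, or -1 if out is too small. */
int unspec_encode(const unsigned char *data, size_t len,
                  char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    unsigned sum = 0;
    size_t i;

    if (outlen < 2 * len + 4)
        return -1;
    for (i = 0; i < len; i++) {
        out[2 * i] = hex[data[i] >> 4];
        out[2 * i + 1] = hex[data[i] & 0x0f];
        sum = (sum + data[i]) & 0xff;
    }
    out[2 * len] = ':';
    out[2 * len + 1] = hex[sum >> 4];
    out[2 * len + 2] = hex[sum & 0x0f];
    out[2 * len + 3] = '\0';
    return (int)(2 * len + 3);
}

/* Value of one lowercase hex digit, or -1 if c is not one. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode text produced by unspec_encode. Returns the number of bytes
   decoded, or -1 if the text is malformed or the checksum does not match. */
int unspec_decode(const char *text, unsigned char *out, size_t outlen)
{
    size_t tlen = strlen(text), n, i;
    unsigned sum = 0;
    int hi, lo;

    if (tlen < 3 || (tlen - 3) % 2 != 0 || text[tlen - 3] != ':')
        return -1;
    n = (tlen - 3) / 2;
    if (n > outlen)
        return -1;
    for (i = 0; i < n; i++) {
        hi = hexval(text[2 * i]);
        lo = hexval(text[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)(hi * 16 + lo);
        sum = (sum + out[i]) & 0xff;
    }
    hi = hexval(text[tlen - 2]);
    lo = hexval(text[tlen - 1]);
    if (hi < 0 || lo < 0 || (unsigned)(hi * 16 + lo) != sum)
        return -1;
    return (int)n;
}
```

The checksum lets a reader of the dump file reject a record that was truncated or edited by hand, which matters because the data is opaque and cannot be sanity-checked any other way.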
111:
112: Type T_UNSPEC is important for my research environment, where
113: potentially lots of people want to store data in the name service, and
114: each person's data looks different. Instead of having BIND understand
115: the format of each of their data types, the clients define marshalling
116: routines and pass buffers of marshalled data to BIND; BIND never tries
117: to demarshal the data...it just holds on to it, and gives it back to
118: the client when the client requests it, and the client must then
119: demarshal it.
120:
121: The Xerox Network System's name service (the Clearinghouse) works this
122: way. The reason 'vanilla' BIND understands the format of all the data
123: it holds is probably that BIND is tailored for a very specific
124: application, and wants to make sure the data it holds makes sense (and,
125: for some types, BIND needs to take additional action depending on the
126: data's semantics). For more general purpose name services (like the
127: Clearinghouse and my usage of BIND), this approach is less tractable.
128:
129: See the #ifdef ALLOW_T_UNSPEC extensions to nstest.c for example usage of
130: this type.
131:
132:
133:
134:
135:
136:
137: 3. Dynamic Update Implementation Description
138:
139: This section is divided into 3 subsections: General Discussion,
140: Miscellaneous Points, and Known Defects.
141:
142:
143:
144:
145: 3.1 General Discussion
146:
147: The basic scheme is this: When an update message arrives, a call is
148: made to InitDynUpdate, which first looks up the SOA record for the zone
149: the update affects. If this is the primary server for that zone, we do
150: the update and then update the zone serial number (so that secondaries
151: will refresh later). If this is a secondary server, we forward the
152: update to the primary, and if that's successful, we update our copy
153: afterwards. If it's neither, we refuse the update. (One might
154: instead propagate the update to an authoritative server; I figured
155: that updates will most likely originate within an administrative
156: domain anyway. This could be changed if someone has strong feelings
157: about it.)
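The primary/secondary/neither decision described above can be sketched as follows. This is not the code in InitDynUpdate; the types, the forward callback (a stand-in for forwarding to the primary), and the choice to leave the secondary's serial untouched are all assumptions made for illustration.

```c
#include <stddef.h>

enum zone_role { ZONE_PRIMARY, ZONE_SECONDARY, ZONE_NONE };
enum dyn_result { DYN_OK, DYN_REFUSED, DYN_FORWARD_FAILED };

/* Hypothetical per-zone state; real BIND keeps much more than this. */
struct zone_state {
    enum zone_role role;
    unsigned long serial;
    int updates_applied;
};

/* Stand-in for forwarding an update to the primary; 0 means success. */
typedef int (*forward_fn)(void *ctx);

int forward_always_ok(void *ctx)    { (void)ctx; return 0; }
int forward_always_fails(void *ctx) { (void)ctx; return -1; }

enum dyn_result handle_update(struct zone_state *z,
                              forward_fn forward, void *ctx)
{
    switch (z->role) {
    case ZONE_PRIMARY:
        z->updates_applied++;   /* do the update locally */
        z->serial++;            /* so secondaries refresh later */
        return DYN_OK;
    case ZONE_SECONDARY:
        /* Only touch our copy once the primary has accepted it. */
        if (forward == NULL || forward(ctx) != 0)
            return DYN_FORWARD_FAILED;
        z->updates_applied++;
        return DYN_OK;
    case ZONE_NONE:
        break;
    }
    return DYN_REFUSED;         /* neither primary nor secondary */
}
```

Because a secondary applies nothing when the forward fails, an update cannot succeed while the primary is down.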
158:
159: Note that this mechanism disallows updates when the primary is
160: down, preserving the Domain scheme's consistency requirements,
161: but making the primary a critical point for updates. This seemed
162: reasonable to me because
163: 1. Alternative schemes must deal with potentially complex
164: situations involving merging of inconsistent secondary
165: updates
166: 2. Updates are presumed to be rare relative to read accesses,
167: so this increased restrictiveness for updates over reads is
168: probably not critical
169:
170: I have placed comments throughout the code, so it shouldn't be
171: too hard to see what I did. The majority of the processing is in
172: doupdate() and InitDynUpdate(). Also, I added a field to the zone
173: struct, to keep track of when zones get updated, so that only changed
174: zones get checkpointed.
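A minimal sketch of the "only changed zones get checkpointed" idea follows. It assumes a simple changed flag; the actual field in the zone struct may differ (for example, it could be a timestamp), and the struct and function names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical slice of a zone entry. */
struct zone {
    const char *origin;
    int changed;            /* set when a dynamic update touches the zone */
};

void zone_mark_updated(struct zone *z)
{
    z->changed = 1;
}

/* Dump every changed zone via the supplied callback (may be NULL) and
   clear its flag. Returns the number of zones checkpointed. */
size_t checkpoint_changed(struct zone *zones, size_t nzones,
                          void (*dump)(const struct zone *))
{
    size_t i, done = 0;

    for (i = 0; i < nzones; i++) {
        if (!zones[i].changed)
            continue;       /* untouched zones are skipped */
        if (dump != NULL)
            dump(&zones[i]);
        zones[i].changed = 0;
        done++;
    }
    return done;
}
```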
175:
176:
177:
178:
179:
180: 3.2 Miscellaneous Points
181:
182: I use ns_maint to call zonedump() if the database changes, to
183: provide a checkpointing mechanism. I use the zone refresh times to
184: set up ns_maint interrupts if there are either secondaries or
185: primaries. Hence, if there is a secondary, this interrupt can cause
186: zoneref (as before), and if there is a primary, this interrupt can
187: cause doadump. I also checkpoint if needed before shutting down.
188:
189: You can force a server to checkpoint any changed zones by sending the
190: maint signal (SIGALRM) to the process. Otherwise it just checkpoints
191: during maint interrupts, or when being shut down (with SIGTERM).
192: Sending it the dump signal causes the database to be dumped into the
193: (single) dump file, but doesn't checkpoint (i.e., update the boot
194: files). Note that the boot files will be overwritten with checkpoint
195: files, so if you want to preserve the comments, you should keep copies
196: of the original boot files separate from the versions that are actually
197: used.
198:
199: I disallow T_SOA updates, for several reasons:
200: - T_SOA deletes at the primary won't be discovered by the secondaries
201: until they try to request them at maint time, which will cause
202: a failure
203: - the corresponding NS record would have to be deleted at the same
204: time (atomically) to avoid various problems
205: - T_SOA updates would have to be done in the right order, or else
206: the primary and secondaries will be out-of-sync for that zone.
207: My feeling is that changing the zone topology is a weighty enough thing
208: to do that it should involve changing the load file and reloading all
209: affected servers.
210:
211: There are a lot of places where BIND exits due to catastrophic failures
212: (mainly malloc failures). I don't try to dump the database in these
213: places because it's probably inconsistent anyway. It's probably better
214: to depend on the most recent dump.
215:
216:
217:
218:
219:
220: 3.3 Known Defects
221:
222: 1. I put the following comment in nlookup (db_lookup.c):
223:
224: Note: at this point, if np->n_data is NULL, we could be in one
225: of two situations: Either we have come across a name for which
226: all the RRs have been (dynamically) deleted, or else we have
227: come across a name which has no RRs associated with it because
228: it is just a place holder (e.g., EDU). In the former case, we
229: would like to delete the namebuf, since it is no longer of use,
230: but in the latter case we need to hold on to it, so future
231: lookups that depend on it don't fail. The only way I can see
232: of doing this is to always leave the namebufs around (although
233: then the memory usage continues to grow whenever names are
234: added, and can never shrink back down completely when all their
235: associated RRs are deleted).
236:
237: Thus, there is a problem that the memory usage will keep growing for
238: the situation described. You might just choose to ignore this
239: problem (since I don't see any good way out), since things probably
240: won't grow fast anyway (how many names are created and then deleted
241: during a single server incarnation, after all?)
242:
243: The problem is that one can't delete old namebufs because one would
244: want to do it from db_update, but db_update calls nlookup to do the
245: actual work, and can't do it there, since we need to maintain place
246: holders. One could make db_update not call nlookup, so we know it's
247: ok to delete the namebuf (since we know the call is part of a delete
248: call); but then there is code with a lot of overlapping functionality
249: in the 2 routines.
250:
251: This also causes another problem: If you create a name and then do
252: UPDATEDA, all its RRs get deleted, but the name remains; then, if you
253: do a lookup on that name later, the name is found in the hash table,
254: but no RRs are found for it. The server then forwards the query to
255: itself (for some reason), somehow decides there is no such domain, and
256: returns (with the correct answer, but after going through extra work).
257: But the name remains, and each time it is looked up, we go through
258: these same steps. This should be fixed, but I don't have time right
259: now (and the right answer seems to come back anyway, so it's good
260: enough for now).
261:
262: 2. There are 2 problems that crop up when you store data (other than
263: T_SOA and T_NS records) in the root:
264: a. I can't get the primary to transfer (doaxfr) RRs other than SOA
265: and NS to the secondary.
266: b. Upon checkpoint (zonedump), the SOA and NS records sometimes come
267: out after other data in the root, so that (since the SOA and NS
268: records have null names) they get interpreted as being records
269: under the other names upon the next boot up. For example, if you
270: have a T_A record called ABC, the checkpoint may look like:
271:     $ORIGIN .
272:     ABC             IN      A       128.95.1.3
273:             99999999 IN     NS      UW-BORNEO.
274:                     IN      SOA     UW-BORNEO. SCHWARTZ.CS.WASHINGTON.EDU.
275:                     ( 50 3600 300 3600000 3600 )
276: Then when booting up the next time, the SOA and NS records get
277: interpreted as being called "ABC" rather than the null root
278: name.
279:
280: 3. The secondary server caches the T_A RR for the primary, and hence when
281: it tries to ns_forw an update, it won't find the address of the primary
282: using nslookup unless that T_A RR is *also* stored in the main hashtable
283: (by putting it in a named.db file as well as the named.ca file).
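Defect 2b above follows from the master-file rule that a line beginning with whitespace inherits the owner of the previous record. The sketch below shows that rule in isolation; line_owner is a hypothetical helper, not a function from BIND's loader.

```c
#include <string.h>

/* Work out the owner name for one master-file line.
   - If the line starts with a blank or tab, the owner is inherited from
     the previous record (left unchanged in current).
   - Otherwise the first token becomes the new owner.
   current must hold the owner in effect before this line. */
const char *line_owner(const char *line, char *current, size_t cap)
{
    size_t n;

    if (line[0] == ' ' || line[0] == '\t')
        return current;             /* blank owner field: inherit */

    n = strcspn(line, " \t");       /* length of the first token */
    if (n >= cap)
        n = cap - 1;
    memcpy(current, line, n);
    current[n] = '\0';
    return current;
}
```

Fed the checkpoint from the example, the owner is the root after "$ORIGIN .", becomes "ABC" on the A record line, and stays "ABC" for the NS and SOA lines that follow, which is exactly the misinterpretation described in 2b.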
284: