1: .so ../ADM/mac
2: .XX backup 593 "The File Motel: An Owner's Manual"
3: .nr dP 2
4: .nr dV 3p
5: .TL
6: The File Motel:
7: .br
8: An Owner's Manual
9: .AU
10: Andrew G. Hume
11: .AI
12: .MH
13: .AB
14: .PP
15: The File Motel is an incremental user-level file backup system for
16: .UX
17: systems.
18: The first version of the File Motel has been in successful operation
19: for over two years with three sites supporting about 50 systems.
20: The first version supported only Ninth Edition
21: .UX
22: systems, although with only modest inconvenience
23: files could be saved from Sun 3 clients.
24: The second version of the File Motel is a complete reworking
25: of the original system, emphasizing easy portability to most
26: .UX
27: systems.
28: The files are stored in a machine-independent form;
29: as an example, I have recovered a directory onto a Sun 3
30: from a server on a MIPS M120/5 that had been originally
31: saved from a Cray X\-MP/24.
32: The user and administrative interfaces have been streamlined,
33: based on experience in the field.
34: .PP
35: The system has been restructured to look like a kit.
36: Most of the modules, such as the database, networking and media code,
37: have been isolated via simple interface routines.
38: As with a kit, you may not find exactly what you want,
39: but it should be easy to roll your own.
40: .AE
41: .2C
42: .NH
43: An Overview
44: .PP
45: This is a manual for the File Motel |reference(file motel usenix),
46: a backup system for
47: .UX
48: systems.
49: The File Motel consists of a central server system
50: servicing many client systems.
51: The server system is almost always also a client system.
52: The File Motel saves only the files that change on any given client system,
53: using a database to record what versions
54: have been saved for any particular file.
55: Under normal usage patterns, this is on the order of 1\-5% of the user files
56: on the client.
57: This makes backup practical over slow networks to slow backup media.
58: .PP
59: The daily routine in the File Motel starts around midnight when
60: the clients send copies of any new or recently modified files to the server machine.
61: After receiving the files from all the clients,
62: a separate processing step transforms the received files into
63: backup copies, which are then written to the backup media
64: of your choice (typically, WORM disks).
65: Backup and recovery can be performed by anyone with the appropriate permissions;
66: in general, there is no administrative overhead other than
67: fussing with backup media.
68: .PP
69: This description may be a little clearer with some details.
70: The following description includes
71: some sample numbers from our Center's File Motel;
72: other sites will differ.
73: The first step is a client sending files to the server.
74: A shell script configured for the client generates a list of (say 5000)
75: candidates for backup (say, all the files changed in the last week).
76: This list is sent to the server which returns a list of the files (say 900)
77: that really need to be backed up.
78: Each of these files is then transmitted to the server together with a header
79: (which includes a checksum).
80: (On average, about 15MB is sent taking about 20 minutes.)
81: There is an acknowledgement from the server after every file;
82: this allows graceful termination when the server has problems,
83: such as running out of space.
84: The redundancy in the candidate list allows non-critical clients
85: to cope with transient faults (such as a broken network) without
86: administrative intervention by ignoring the fault
87: and getting the files the following night.
88: The client process is normally initiated either by the server machine
89: or by
90: .I cron (8).
91: Exactly the same mechanism is used for user-initiated backups;
92: the only difference is that the system backup is executed by the super-user
93: (and thus has access to all the files on the client system).
94: .PP
95: The files sent by the clients are kept in receiving areas, each consisting of
96: 32 subdirectories that are processed in turn.
97: First, the program
98: .I sweep
99: deletes any unnecessary files, assigns each backup copy a name
100: (which is stored in the file's header) and recalculates the file's checksum.
101: The remaining files are fed to
102: .I dbupdate
103: which deletes any unnecessary files and stores the version information
104: in the database.
105: Finally, the surviving files are moved to a staging area for writing to the
106: backup media.
107: .PP
108: The last step is writing the backup copies to the backup media.
109: The only medium currently supported is a WORM disk.
110: In our environment they are preferred because of their large capacity
111: and because you can get reliable jukeboxes (automatic disk changers).
112: Optical jukeboxes come in all sorts of sizes; our center has a
113: SONY WDA 3000-10 with a total capacity of 164GB (328GB after October 1989).
114: .PP
115: There are many other programs in the File Motel,
116: some intended for the user (for example, recovering files),
117: and others for the administrator (usage statistics, backing up the database).
118: .PP
119: The rest of this manual is intended for the caretaker of the File Motel.
120: Section 2 details some of the
121: peculiar aspects of the File Motel that have caused problems in the past;
122: if you can survive these,
123: then installing and running the File Motel
124: ought to be easy.
125: If there are incompatibilities, then installing the File Motel
126: will require (perhaps substantial) work.
127: The File Motel uses many small single-purpose tools;
128: if you need to figure out what is going on (or wrong, as the case may be),
129: these tools are described in Section 3.
130: Sections 4 and 5 are step by step instructions for installing
131: a client and a server respectively.
132: Finally, Section 6 elaborates on media management.
133: .NH
134: Some Things You Should Know
135: .PP
136: .de BL
137: .IP \ \ \ \ \s+3\(bu\s-3
138: ..
139: This section describes some of the assumptions underlying the
140: construction of the File Motel software.
141: Most of these assumptions have caused problems in porting to
142: systems less hospitable than 10th Edition
143: .UX
144: (or V10 for short).
145: .BL
146: Each server has one global name space for all the files saved from all the clients.
147: The file
148: .I z
149: from machine
150: .I mach
151: is stored under the name
152: .I /n/mach/z .
153: It so happens that this is how V10 networked file systems are normally mounted
154: and in fact, all file references actually go through this network name.
155: For other systems, you should define the
156: .CW -DNO_NETNAME
157: switch as described in section 5.
158: .BL
159: All client\-service communication uses a uniform networking interface.
160: That is, a system invokes a service on the remote machine and gets a
161: pair of file descriptors attached to the input and output of that service.
162: Both Berkeley-style sockets and V10-style IPC are supported.
163: For the case of a single system that is both a server and client
164: and has no networking, you will have to write an execution service that
165: constructs pipes to the desired services.
166: Note that it is possible to provide this interface even if all
167: the networking you have is a user program (such as
168: .I rx
169: or
170: .I rsh )
171: that executes a program on another machine.
172: .BL
173: It must be possible to nominate the user that the remote service runs as.
174: Most run as a regular user, say
175: .I fmdaemon ,
176: but some must run as superuser and on V10 systems, one must run as
177: .I bin .
178: .BL
179: The code is reasonably portable; with the canned configuration files
180: it runs on a Cray X-MP/24 (UNICOS),
181: VAX 11/750, Microvax II, 8550, 8600 and 11/780 (V10, Ultrix, 4.3BSD),
182: Sun 3 (SunOS 4.0),
183: MIPS M120/5, M2000 (UMIPS 3.0, 3.10, RISC/os 4.0).
184: Some of the code, notably Ken Thompson's new version of
185: .CW doprint ,
186: makes assumptions about variable argument lists.
187: So far, the code has continued to work on all the systems we have tried
188: (although we can't optimise on the MIPS)
189: but in this world of perverse hardware and compilers, you may not be so lucky.
190: .BL
191: The code assumes no particular byte-ordering but does assume that there
192: is an integer type of at least 32 bits.
193: By and large, the programs allocate all data areas dynamically;
194: whenever there is a choice, programs trade space for smaller runtime,
195: so there must be at least 24 bits of data space.
196: If you have a 32 bit machine but have 16 bit ints,
197: you will have trouble (perhaps
198: .I lint
199: will help).
200: .BL
201: It is assumed that the backup medium can hold at least one backup copy
202: and practically, it should hold at least one volume.
203: This is about 20MB by default; if you have smaller backup media
204: and cannot arrange better, change the volume size \(em it is
205: a constant defined in
206: .CW fm/sweep.c .
207: .BL
208: The File Motel depends on each file having a unique name.
209: This continues to cause problems, particularly in the presence of symbolic links.
210: For example, on a system I use
211: .CW /usr/andrew
212: is a symbolic link to
213: .CW /usr2/guest/andrew .
214: The right thing to do is to save files under
215: .CW /usr/andrew
216: (so that you can move them from file system to file system and keep their name).
217: Yet, the user may not be aware of this name; if they do a
218: .I pwd
219: to find out, they will get the wrong answer.
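.PP
The mismatch between the two names can be seen directly from the shell.
The sketch below is an illustration only: it assumes a shell whose \f(CWpwd\fP
accepts the POSIX \f(CW-L\fP and \f(CW-P\fP options (which postdate several of
the systems listed above), and \f(CWshow_names\fP is a hypothetical helper,
not part of the File Motel.

```shell
# show_names DIR: print the name the user typed (logical) and the name
# obtained by resolving every symbolic link (physical).
# Hypothetical helper for illustration; not part of the File Motel.
show_names() {
    (
        cd "$1" || exit 1
        printf 'logical:  %s\n' "$(pwd -L)"
        printf 'physical: %s\n' "$(pwd -P)"
    )
}
```

With the link described above and no other symbolic links on the path,
show_names /usr/andrew should report /usr/andrew as the logical name and
/usr2/guest/andrew as the physical one; the logical name is the one the
File Motel wants files saved under.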
220: .NH
221: A Detailed Description
222: .PP
223: The action in the File Motel can be functionally divided into four areas:
224: client selecting and sending files to the server,
225: the server processing the client files onto backup media,
226: client recovering files back from the server,
227: and an assortment of administrative functions.
228: .PP
229: Programs and scripts used by the File Motel live in three places:
230: .CW /usr/bin/backup
231: is the user interface,
232: .CW /usr/lib/filemotel
233: holds all the programs and scripts used by clients,
234: and
235: .CW /usr/filemotel/bin
236: holds the server-specific programs.
237: These are the conventional names \(em they can be reconfigured to taste.
238: Because of this and their length,
239: these abbreviations will be used in the following text:
240: .TS
241: center;
242: lFCW lFCW.
243: $FM /usr/filemotel
244: $FB /usr/filemotel/bin
245: $FL /usr/lib/filemotel
246: .TE
247: .NH 2
248: Client Sends Files to the Server
249: .PP
250: The controlling script here is
251: .CW $FL/doclient :
252: .P1
253: #!/bin/sh
254: $FL/sel | $FL/act
255: .P2
256: The selection script
257: .I sel
258: has to generate a list of absolute filenames.
259: You can use any tools available to you; the File Motel supplies
260: the program
261: .I fcheck
262: which is rather more efficient than
263: .I find (1)
264: and follows symbolic links that are arguments.
265: This is to help clients save files as
266: .CW /usr/andrew/...
267: rather than the less than informative
268: .CW /usr2/guest/andrew/... .
269: A small
270: .I sel
271: file is shown below.
272: .KF
273: .P1 0
274: /usr/lib/filemotel/fcheck 512 7 /etc /usr/* |
275: sed -e '/\e.o$/d
276: /\e/a\e.out$/d
277: /\e/core$/d
278: /\e/foo$/d
279: /^\e/usr\e/tmp\e//d
280: /^\e/usr\e/spool\e//d'
281: cat <<EOF
282: /unix
283: EOF
284: .P2
285: .KE
286: .PP
287: The script
288: .I act
289: works in a straightforward way.
290: First, the filenames are transformed into the input format for
291: .I missing
292: by the program
293: .I iprint .
294: This prepends
295: .CW /n/\fImachine
296: to the filename (unless this is already there) and appends the
297: inode change time and size.
298: There is a convention that an input filename starting with a
299: .CW //
300: is a symbolic link to be followed (that is, use
301: .I stat (2)
302: rather than
303: .I lstat (2)
304: to get the time and size).
305: The size is carried around so that if you choose a file because it is small
306: and it grows dramatically while you are asking about it, you can reject it
307: later on (although this is not done now because no one cares yet).
308: .I Missing
309: takes these names and ships them to the corresponding server
310: .I missing_
311: on the server machine.
312: (Servers for a service
313: .CW abc
314: are called
315: .CW abc_ ).
316: .I Missing_
317: checks the name,time tuples against the database and sends back
318: the lines that are newer than the entry in the database.
319: Transmissions in both directions are checksummed; any errors
320: are reported to standard error and are also logged in the log file
321: on the server machine.
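.PP
The name transformation and the comparison against the database can be
sketched in a few lines of shell.
This is not the real \fIiprint\fP or \fImissing_\fP: the function names
\f(CWnet_name\fP and \f(CWfilter_newer\fP, the tab-separated record format,
and the flat file standing in for the database are all assumptions made
for illustration.

```shell
# Hypothetical stand-ins for part of what iprint and missing_ do.
# Assumed record format: name<TAB>time<TAB>size, with integer times.

# net_name MACHINE PATH: prepend /n/MACHINE unless PATH already has it.
net_name() {
    case $2 in
        "/n/$1"/*) printf '%s\n' "$2" ;;
        *)         printf '%s\n' "/n/$1$2" ;;
    esac
}

# filter_newer DB: read candidate records on standard input and print
# those whose name is absent from DB, or whose time is newer than the
# time recorded there.  DB is a flat file of name<TAB>time lines.
filter_newer() {
    awk -F '\t' '
        # First operand (the DB file): remember the saved time per name.
        FILENAME == ARGV[1] { saved[$1] = $2; next }
        # Standard input: keep records that are unknown or newer.
        !($1 in saved) || $2 + 0 > saved[$1] + 0
    ' "$1" -
}
```

A record whose time equals the saved time is dropped, so sending the same
candidate list twice costs only the exchange of names.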
322: .PP
323: The results from
324: .I missing
325: are stored in
326: .CW $FL/files.\fIday\fP .
327: They are given to
328: .I fmpush
329: which actually pushes them to the backup system.
330: .I Fmpush
331: also takes a system name argument for logging purposes.
332: If there are any errors,
333: .I fmpush
334: reports the error and the number of files transmitted.
335: This allows the push to be restarted efficiently:
336: .P1
337: $ pwd
338: $FL
339: $ fmpush wild < files.Tue
340: EOF after 2713 files sent.
341: $ sed 1,2713d files.Tue | fmpush wild
342: .P2
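.PP
The restart shown above can be wrapped in a small helper so that the count
is checked before it reaches \fIsed\fP.
The function \f(CWremaining_files\fP is hypothetical and not part of the
File Motel; it assumes only that the list has one filename entry per line,
as in the example.

```shell
# remaining_files COUNT LIST: print the entries of LIST that were not
# sent, given that the first COUNT entries were transmitted.
# Hypothetical helper; not part of the File Motel.
remaining_files() {
    count=$1
    list=$2
    case $count in
        '' | *[!0-9]*)
            echo "remaining_files: bad count: $count" >&2
            return 1 ;;
    esac
    if [ "$count" -eq 0 ]; then
        cat "$list"                  # nothing was sent; resend everything
    else
        sed "1,${count}d" "$list"    # drop the entries already sent
    fi
}
```

The restart in the example would then read
remaining_files 2713 files.Tue | fmpush wild.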
343: Any diagnostics are mailed to the user
344: .I backup
345: and also kept in
346: .CW $FL/files.\fIday\fP.sho .
347: It is not necessary to keep these files around after they have been used
348: but they are relatively small and often useful;
349: for example, a client who normally saves 100 or so files suddenly sends
350: you 10,000 files \(em you can quickly go to that client and check
351: what the files were.
352: When possible, diagnostics are also logged on the backup system.
353: .PP
354: Only regular files, symbolic links and directories have their contents saved;
355: all other files (such as devices) just have their
356: .I stat (2)
357: buffers saved.
358: To preserve machine independence, the content of a directory is saved
359: as a list of null-terminated element names.
360: This removes the need for the server to be able to guess a
361: client's directory structure, although it does lose a small
362: amount of subtle information contained in the freed slots of the
363: directory.
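.PP
As a rough illustration of the directory representation, the following
shell function writes the element names of a directory, each terminated
by a null byte.
It is a sketch under stated assumptions, not the File Motel's own code:
\f(CWdir_elements\fP is a hypothetical name, the names come out in the
shell's pattern-matching order rather than directory order, and nothing
beyond the list of names (such as any header) is shown.

```shell
# dir_elements DIR: print each element name of DIR followed by a NUL.
# Hypothetical helper; "." and ".." are not listed.
dir_elements() {
    for path in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        # An unmatched pattern is left unexpanded by the shell; skip it.
        [ -e "$path" ] || [ -L "$path" ] || continue
        printf '%s\0' "${path##*/}"
    done
}
```

Piping the output through tr '\\0' '\\n' makes the list readable on a
terminal.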
364: .........
365: .NH 2
366: Server Processes Client's Files
367: .PP
368: Received files are processed by the script
369: .CW $FB/munge .
370: This processing is decoupled from either receiving or restoring
371: client files; for example,
372: it is safe to process files while receiving them.
373: Munging is typically started by
374: .I cron ,
375: but you can also cause
376: .I rcv
377: to invoke
378: .I munge
379: automatically,
380: and you can invoke
381: .I munge
382: manually by executing
383: .CW $FL/callmunge .
384: .PP
385: Regardless of how it is called,
386: .I munge
387: scans the 32 receiving subdirectories in each of the receiving areas in
388: .CW $FM/adm/rcvdirs
389: looking for files to process.
390: If it finds any, it calls a program
391: whose name is supplied as
392: .CW $PROCPERM
393: to copy the final copies
394: to the media of your choice.
395: It repeats this scan until a scan finds nothing to do.
396: .PP
397: The action within a subdirectory is simple.
398: .CW $FB/sweep
399: looks for files with mode
400: .CW 0 ,
401: .CW 0400 ,
402: or
403: .CW 0600 .
404: Mode
405: .CW 0
406: files are files that are being received
407: (\fIrcv\fP
408: marks a file as done by changing its mode to
409: .CW 0600 )
410: and are ignored unless they have not been modified within the
411: last 12 hours.
412: In this latter case, the file is regarded as stale (almost always a network
413: connection was dropped) and unlinked.
414: Mode
415: .CW 0400
416: files have been already processed by
417: .I sweep
418: but for some reason (most often, running out of space)
419: weren't copied to the backup area.
420: Mode
421: .CW 0600
422: files are assigned a backup copy name and after recalculating the
423: checksum are changed to mode
424: .CW 0400 .
425: .I Sweep
426: emits the names of all the files ready to be copied to the backup area
427: and this is saved in a file.
428: .I Munge
429: then makes any needed directories in the backup area that don't exist.
430: .CW $FB/fmmv
431: then moves all the files to be copied to the backup area.
432: We then update the database with information from the files we just copied.
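.PP
The mode convention can be summarised by a small classifier.
This is a sketch, not the real \fIsweep\fP: the function name
\f(CWsweep_action\fP and the labels it prints are hypothetical, and the
12 hour staleness test for mode 0 files is left out.

```shell
# sweep_action FILE: report how a file in a receiving subdirectory
# would be treated, judging only by its permission bits.
# Hypothetical helper; the labels are for illustration.
sweep_action() {
    # The first ten characters of "ls -ld" give the type and mode bits.
    perms=$(ls -ld "$1" | cut -c1-10)
    case $perms in
        ----------) echo receiving ;;          # mode 0: rcv not finished
        -r--------) echo copy-again ;;         # mode 0400: swept, not yet copied
        -rw-------) echo name-and-checksum ;;  # mode 0600: complete, not yet swept
        *)          echo not-swept ;;          # any other mode is left alone
    esac
}
```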
433: .PP
434: Updating the database is a two part process.
435: Run
436: .CW $FB/updatef
437: on the files (use the program
438: .I updatew
439: for files on the WORM)
440: and then feed the output to
441: .CW $FB/dbupdate .
442: In this way, we guarantee that the database is purely a function of
443: the backed up files
444: (assuming none get lost between the backup area and the backup media).
445: The input to
446: .I dbupdate
447: is (roughly) a sequence of backup file headers and contents of
448: backed up directories.
449: .I Dbupdate
450: updates the various databases (described below) and sometimes tries to unlink
451: the backup copies.
452: (This happens when two copies of the same file are
453: in the same receiving subdirectory.
454: .I Sweep
455: happily copies both to the backup area but when
456: .I dbupdate
457: goes to update the main database for the second copy, it discovers
458: it already has this copy and so unlinks the second copy.
459: It doesn't care if the unlink fails because this is just an attempt
460: to be space efficient and in any case, the unlink can fail only if the file
461: has already been committed to the backup media.)
462: .I Dbupdate
463: also appends accounting statistics records for each file, containing the time
464: the file was saved, the size, the owner and the system name, to the file
465: .CW $FM/stat.log .
466: .PP
467: After
468: .I munge
469: is finished scanning the receive areas,
470: it processes the statistics records generated by
471: .I dbupdate
472: by
473: calling
474: .CW $FB/procstats .
475: This reads (and then truncates)
476: .CW $FM/stat.log
477: and adds new records to the files
478: .CW $FM/stat/\fIsystem\f(CW .
479: These records are in machine independent format and have been collapsed to
480: refer to all the files per user/day combination.
481: Even in this compressed format,
482: the statistics records would grow without bound.
483: Accordingly,
484: .I munge
485: calls
486: .CW "procstats -c"
487: to further collapse together all the records older than 30 days for each user.
488: (The number 30 comes from the only program that looks at these statistics,
489: .CW "backup stats" .)
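.PP
The idea of collapsing records to one per user/day combination can be
illustrated with a short pipeline.
The real record format is not described here, so the sketch assumes plain
text input lines of the form "day owner size"; \f(CWcollapse_stats\fP is a
hypothetical name and not the interface of \fIprocstats\fP.

```shell
# collapse_stats: read "day owner size" lines on standard input and
# print one "day owner files bytes" line per day/owner combination.
# Hypothetical helper working on an assumed text format.
collapse_stats() {
    awk '
        { key = $1 " " $2; files[key]++; bytes[key] += $3 }
        END { for (key in files) print key, files[key], bytes[key] }
    ' | LC_ALL=C sort
}
```

The final sort is there only to make the output order predictable, since
awk visits array elements in an unspecified order.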
490: .PP
491: Throughout its work,
492: .I munge
493: checks to see if it should exit by checking the existence of a guard file;
494: this is created by
495: .CW $FB/stopmunge .
496: .NH 2
497: The Databases on the Server
498: .PP
499: There are three databases kept on the server, all conventionally kept in
500: .CW $FM/db .
501: The first,
502: .CW filemap ,
503: is the main and only required database;
504: it contains the mappings from filename to last modify date and
505: from (filename, modify date) tuple to backup copy name.
506: The second,
507: .CW dir ,
508: is optional and maps (directory, modify date) tuples to their contents.
509: It is used to make recovery of file trees go (much) faster.
510: The third,
511: .CW fs ,
512: is optional and maps (filename, modify date) tuples to their
513: .I stat
514: buffers.
515: It is used to implement the backup file system.
516: .PP
517: The default implementation of these databases is Peter Weinberger's
518: compressed B-trees (see
519: .I cbt (1)).
520: (The compression refers to eliding common prefixes
521: of successive keys; it does very well on the pathnames used by the File Motel.)
522: The
523: .I cbt
524: database
525: .I db
526: consists of two files,
527: .I db\f(CW.T\fR
528: (the tree part) and
529: .I db\f(CW.F\fR
530: (the data part).
531: As the
532: .I cbt
533: routines do not reclaim space, the
534: .CW .T
535: file can start growing at a very fast rate when the tree is large
536: (say four levels).
537: This has proved to be a real nuisance so there is considerable support
538: for periodic squashing of the database
539: (which reclaims space by rebuilding the database) and
540: for supporting the
541: .I filemap
542: database as a collection of separate databases.
543: .PP
544: The latter is intended to be used in the following way.
545: The file
546: .CW $FM/db/filemap
547: is always the current
548: .I filemap
549: database.
550: If the file
551: .CW $FM/db/filemaplist
552: exists, it is taken as a list of database names,
553: one per line in oldest to newest order, to be used in addition to
554: .CW $FM/db/filemap .
555: These are searched only, never updated.
556: At our site, we produce one of these databases for about every 15\-16GB
557: of backup files.
558: .NH 2
559: Server Sends Files to the Client
560: .PP
561: All requests for files go through a central server
562: .CW $FB/fetch_ .
563: This program simply farms out work to other programs.
564: .I Fetchf
565: attempts to find files that are still under
566: .CW $FM/v .
567: Systems with plenty of mass storage can leave the backup copies
568: online and things will go quite fast.
569: For the files that
570: .I fetchf
571: can't find,
572: .I fetch_
573: looks in the configuration file
574: (\f(CW$FL/conf\fP)
575: to determine the backup media (say
576: .CW j
577: for jukebox).
578: It then calls
579: .CW $FB/fetchj
580: with the appropriate filenames.
581: If you just have a WORM drive, you should use
582: .CW $FB/fetchw
583: instead.
584: These two programs purport to be generic drivers for jukeboxes
585: and single drives; if you have different media (say an Exabyte tape),
586: you should be able simply to load the drivers with your media library
587: to generate the appropriate fetch program.
588: More details are given below in the section on media management.
589: .PP
590: Users can generally access any files they have read permission for,
591: regardless of what system they are on or the system from which the files were
592: stored.
593: In addition, we trust our users (or more importantly, our network)
594: and so we do no checking of a user's right to retrieve files.
595: Such checking, such as a password, can easily be added to the startup
596: protocol between the program the user calls (\f(CW$FL/fetch\fP)
597: and the server (\f(CW$FB/fetch_\fP).
598: .NH 2
599: Administrivia
600: .PP
601: This section is a bunch of administrative odds and ends for the way we organise
602: the File Motel in our Center.
603: Your details may be different, and indeed ours change over time,
604: but the examples are probably helpful.
605: .NH 3
606: File Layout
607: .PP
608: We store the File Motel under the directory
609: .CW /usr/backup ,
610: which is a file system large enough to hold comfortably the current
611: databases and a squashed version (more on this later).
612: The receiving area is another smallish file system (about 120MB) mounted on
613: .CW /usr/backup/rcv
614: and the holding area
615: .CW /usr/backup/v
616: is another file system of the same size.
617: This is done to isolate the effects of client excesses;
618: the sending processes all know how to deal with running out of space
619: (we practice often).
620: I regard running out of space once a month as tolerable;
621: once a week is too much.
622: To aid searches for files, we keep a file
623: .CW /usr/backup/filenames ), (
624: which is a sorted list of all filenames.
625: This is maintained by the database squasher.
626: .PP
627: The main drawback to Weinberger's B-tree software is that it does
628: not reclaim space in the tree.
629: Thus, over time the tree file gets huge
630: (the growth rate increases with the depth of the tree).
631: The fix is to periodically squash the tree.
632: We combine this with dumping the database to WORM disk in the script
633: .CW $FB/backupdb .
634: .NH 3
635: Talking to the Clients
636: .PP
637: We have found it best to call the clients rather than have them call us.
638: The load seems more balanced and things get done sooner.
639: We use
640: .I mk
641: as it handles parallel processing; a typical mkfile is
642: .P1 0
643: CLIENTS=Cwild C3k Ctcp!tempel
644: NPROC=3
645:
646: clients:VQ: $CLIENTS
647: PROCPERM=$FB/toworm $FB/munge
648:
649: C%:VQ:
650: set +e; $FB/callclient $stem; exit 0
651: .P2
652: Understanding this completely requires familiarity with
653: .I mk
654: but the intent is clear.
655: We first get the files from the clients by the
656: .CW C%
657: rule and then process them by
658: .CW munge
659: and then put them out on WORM disk by
660: .CW toworm .
661: As we use Datakit, most clients are called using Datakit but some
662: (like
663: .CW tempel
664: in the mkfile) are called using TCP/IP.
665: This convenient piece of magic works on V9 because of Dave Presotto's
666: clever design of the IPC system; you may have to work harder.
667: The
668: .CW set
669: stuff in the
670: .CW C%
671: rule means to keep on processing even if a client gets an error.
672: The entry in
673: .CW /etc/crontab
674: is more or less
675: (this is one physical line folded at the \*(cr because of the column width)
676: .P1 0
677: eval "cd /usr/backup/adm; mk clients 2>&1" |\*(cr
678: mail backup
679: .P2
680: We use the
681: .CW backup
682: mailbox to redirect mail to someone appropriate.
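.PP
For sites without \fImk\fP, the intent of the mkfile above can be
approximated in plain shell.
This is a sketch under assumptions, not a replacement we have run in
production: \f(CWrun_backup\fP is a hypothetical name, the two commands
are passed in as arguments rather than hard-wired, and unlike
\f(CWNPROC=3\fP it places no limit on the number of clients called at once.

```shell
# run_backup CALL PROCESS CLIENT...
# Run CALL once per client in parallel, ignore individual failures,
# wait for all of them, then run PROCESS.
# Hypothetical helper; CALL and PROCESS are supplied by the caller.
run_backup() {
    call=$1
    process=$2
    shift 2
    for client in "$@"; do
        # As with "set +e; ...; exit 0" in the C% rule, a failing
        # client must not stop the others.
        { "$call" "$client" || true; } &
    done
    wait          # every client has finished or failed
    "$process"    # now process whatever was received
}
```

A call corresponding to the mkfile might look like
run_backup $FB/callclient $FB/munge wild 3k 'tcp!tempel',
with PROCPERM set in the environment as before.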
683: .NH 2
684: Disasters
685: .PP
686: Currently, the only disaster we have had that was not the result of a kernel bug
687: is running out of space;
688: this is either inconvenient or quite bad.
689: Running out of space in the receiving or safe areas is just inconvenient.
690: By default, the client's
691: .I fmpush
692: stops when the receiving area runs out of space after saying how many files got
693: transmitted.
694: This is enough information to resend the rest when convenient.
695: Alternatively, you can change
696: .CW $FL/act
697: to give
698: .I fmpush
699: the
700: .CW -r
701: flag; this means that it will retry sending files every hour or so
702: until it succeeds.
703: Running out of space in the holding area is also not too bad;
704: eventually
705: .I munge
706: will put the holding area onto the backup media and then cycle through
707: the receiving area again.
708: .PP
709: The worst effect of running out of space is ruining your database.
710: (This happens rarely for us as
711: we keep our databases on a file system apart from the receiving/holding areas.)
712: Rebuilding the database is not too hard.
713: First, find out the next backup name to be assigned
714: (by a
715: .CW "sweep -n"
716: or by examining the backup media and holding areas).
717: Then, get the most recent backup copy of your database and install it.
718: Set the next backup name you found in the first step with
719: .CW "sweep -s" .
720: You then need to extract the database information for each file added
721: to the database since the backup copy of the database was made.
722: The starting file name is stored in the
723: .CW .N
724: file by
725: .I backupdb .
726: The program
727: .I updatew
728: will extract this from files on a WORM, and
729: .I updatef
730: from regular disk files.
731: The result is fed to
732: .I dbupdate
733: as done in
734: .I munge .
735: .NH
736: Installing the File Motel on a Client System
737: .PP
738: The following instructions assume you have the source
739: .CW fm.cpio
740: somewhere, say
741: .CW /tmp/fm.cpio .
742: Note also that these instructions will change over time;
743: you must follow the online copy of this document included with the source.
744: .IP [1]
745: You will need version 3 of
746: .I mk
747: (or any version
748: dated later than Mar 11, 1989).
749: .IP [2]
750: Select the root directory for the source, set the
751: environment variable
752: .CW FMSRC
753: to its name
754: and export
755: .CW FMSRC .
756: For example,
757: .P1
758: FMSRC=/usr/filemotel/src
759: export FMSRC
760: .P2
761: .IP [3]
762: Install the source tree by
763: .P1
764: cd $FMSRC/..; cpio -iudc < /tmp/fm.cpio
765: .P2
766: .IP [4]
767: Create a
768: .CW CONF
769: file.
770: This describes your installation environment
771: and is included in lower-level mkfiles.
772: The various switches are described in detail below;
773: however, the easiest way is to start with one of the
774: sample configuration files in the directory
775: .CW $FMSRC/conf .
776: .IP [5]
777: If necessary, create the repository for client files:
778: .P1
779: mkdir /usr/lib/filemotel
780: .P2
781: (this is configurable, see
782: .CW FMLIB
783: below)
784: and if you have not defined
785: .CW NO_NETNAME
786: in
787: .CW CONF ,
788: ensure that
789: .CW /n/clientname
790: is a link to
791: .CW / .
792: .IP [6]
793: Initialise the source tree for compiling by
794: .P1
795: mk depend
796: .P2
797: This only needs to be done once.
798: If you have to repeat, you can undo this by
799: .P1
800: mk undepend
801: .P2
802: .IP [7]
803: Compile and install the client software
804: by
805: .P1
806: mk client
807: .P2
808: This can be repeated as often as you like.
809: Only files in
810: .CW $FMLIB
811: and the file
812: .CW $FMBIN/backup
813: (these are configurable, see
814: .CW FMBIN
815: below)
816: are affected.
817: .IP [8]
818: Set up the dialstring of the server system by
819: .P1
820: echo server-machine-name > $FMLIB/conf
821: .P2
822: The name should match the type of IPC you selected in
823: .CW CONF .
824: For example,
825: .TS
826: center;
827: c c
828: l lFCW.
829: IPC Example
830: Datakit nj/astro/wild
831: Datakit wild
832: IP wild.astro.nj.att.com
833: .TE
834: .IP [9]
835: In theory, you are now operational.
836: A couple of small tests are described in the file
837: .CW SANITY .
838: Some common problems and their cures are described below.
839: .IP [10]
840: You need to construct the script
841: .CW $FMLIB/sel
842: which prints the names of files that you want backed up.
843: There is a sample script (\f(CWsample.sel\fP) in that directory.
844: Be careful not to back up networked file systems by mistake.
845: .IP [11]
846: If you are initiating backup via
847: .I cron (8),
848: add the following command to
849: .CW crontab :
850: .P1
851: eval "/usr/lib/filemotel/sel | \*(cr
852: /usr/lib/filemotel/act 2>&1" | \*(cr
853: mail backup
854: .P2
855: The exact format varies from system to system;
856: the File Motel administrator should tell you what time to set it off.
857: .IP
858: If your client's backup is initiated from the server system,
859: you will have to add the line for
860: .CW fmclient
861: to your flavour of IPC services file.
862: If you communicate to the server by TCP/IP
863: (that is, your
864: .CW CONF
865: file has
866: .CW IPC=socket ),
867: get the
868: .I fmclient
869: line from the file
870: .CW tcp.inetd
871: and add it to
872: .CW /etc/inetd.conf
873: (some systems use
874: .CW /usr/etc/inetd.conf )
875: and get the
876: .I fmclient
877: line from the file
878: .CW tcp.services
879: and add it to
880: .CW /etc/services .
881: You then need to prod
882: .I inetd
883: to look at the new files (commonly by sending it a hangup signal).
884: On some systems, like SunOS, you may need to prod name servers
885: such as the Yellow Pages as well.
886: .IP
887: If you use V10 IPC,
888: add the corresponding line for
889: .CW fmclient
890: from
891: .CW ipc.V10
892: to
893: .CW /usr/ipc/lib/serv.local .
894: The files
895: .CW tcp.inetd
896: and
897: .CW ipc.V10
898: are made by
899: .P1
900: cd $FMSRC; mk ipc.list
901: .P2
902: .NH 2
903: Some Common Installation Problems
904: .PP
905: As a general rule, keep an eye on the log file (on the server)
906: when setting up the File Motel.
907: The most convenient way is a window with a
908: .P1
909: tail -f $FM/log
910: .P2
911: (\f(CW$FM\fP is the root directory of the File Motel.)
912: .PP
913: The most common problem is that the basic IPC software doesn't work.
914: This affects most programs because they involve calling a service on the
915: backup machine.
916: That is why the first thing you try to get working is
917: .I logger
918: which sends messages to the logger process on the backup machine.
919: The kinds of bugs I have seen here are typically bugs in the networking code,
920: particularly TCP/IP.
921: For example, the
922: .I logprint
923: function expects an acknowledgement from the logger server
924: to indicate that everything went okay.
925: On at least two of the systems I use, this sometimes doesn't happen because
926: the closing of the socket by the logger server after sending the ack
927: seems to speed past the ack and get to
928: .I logprint
929: first.
930: Naturally,
931: .I logprint
932: complains, as might we all.
933: The best solution is to fix the TCP/IP implementation; failing that,
934: you might try a judicious sleep between the ack and the close in
935: the logger server.
936: This is only one example of a general class of timing problems.
937: .PP
938: Another fertile field of failed implementations has to do
939: with user and system names.
940: The File Motel tries to check the validity of system and user names
941: and denies service if there appears to be something sleazy going on.
942: Regrettably, some otherwise sound TCP/IP implementations resemble sleaze.
943: For example, a user
944: .CW mary
945: on a client may appear on the server as the user
946: .CW bill .
947: Or a system may have a system name that is unrelated to the name
948: the networking code uses.
949: An attempt is made to cope with these cases, but it may fail with
950: unexpectedly bizarre implementations.
951: If worst comes to worst, simply turn off all the checking
952: and hope no one does anything naughty.
953: (Even if you do this, think hard about allowing remote users
954: to claim they are
955: .CW root ;
956: they will be able to look at all sorts of things.)
957: Unlike X,
958: I implement function and policy.
959: However, all the checking is done in one place
960: .CW serv_$IPC.c ); (
961: feel free to do whatever you like,
962: it's your Motel now.
963: .de XX
964: .IP \\f(CW\\$1\\fP
965: .br
966: ..
967: .NH 2
968: Configuration and Compiling Options
969: .PP
970: The File Motel software is designed for easy installation in heterogeneous
971: environments.
972: The configuration details described below are stored in the file
973: .CW $FMSRC/CONF .
974: .CW $FMSRC
975: contains a number of files
976: containing settings for various systems; you may want to use
977: one of these as a starting point.
978: (Remember that the following information has a small half-life;
979: the truth should be in the online copy of this manual.)
980: The most obvious aspect of configuring the File Motel is choosing
981: the three directories where files live.
982: The source directory,
983: .CW FMSRC ,
984: has been described above. The other two are
985: .XX FMLIB=/usr/lib/filemotel
986: Change this to wherever you want to put the subprograms.
987: .XX FMBIN=/usr/bin
988: The directory for the (only) user-called command,
989: .CW backup .
990: .LP
991: Configuring the source to your environment is mostly done with
992: .I mk
993: variables and an interface library in
994: .CW src/sys/\f2system .
995: The
996: .I mk
997: variables are
998: .XX RANLIB=ranlib
999: Some systems, mostly BSD-based, whine incessantly unless archive libraries
1000: are processed with some program typically called
1001: .I ranlib .
1002: In this case, set
1003: .CW RANLIB=ranlib ;
1004: otherwise, say if you are on a System V machine, use a harmless program
1005: such as
1006: .CW RANLIB=: .
1007: .XX IPC=socket
1008: Select your favorite type of IPC.
1009: Different clients can use different types and the client's type
1010: need not match the backup system.
1011: (For example, in our Center,
1012: the Cray talks to us via TCP/IP but we talk to it via Datakit.)
1013: The only choices are
1014: .CW socket
1015: and
1016: .CW v10 .
1017: .XX IPCLIB=
1018: Set this if you need a special library in order to use your flavor of IPC.
1019: For example, on V10 systems set
1020: .CW IPCLIB=-lipc .
1021: .XX LIBTYPE=a
1022: This should be set to
1023: .CW a
1024: unless you are on the Cray (which doesn't have archives yet!)
1025: in which case it should be
1026: .CW o .
1027: .XX COMPAT=
1028: Set
1029: .CW COMPAT=.compat
1030: if you want to be able to process older File Motel files.
1031: (You may have to work hard to get this to work on some systems;
1032: I gave up on the Cray.)
1033: .XX SECTYPE=
1034: Set
1035: .CW SECTYPE=v9
1036: if you are running a McIlroy-Reeds compatible security kernel.
1037: .XX WORMFACE=uda
1038: If you are running the WORM software, you need to say what kind of interface
1039: the WORM is attached to.
1040: The other option
1041: (and the best if you just want to compile without thinking too hard) is
1042: .CW scsi .
1043: The latter may need customizing at your site.
1044: .LP
1045: Currently the system-dependent interface library includes the following routines:
1046: .TS
1047: center;
1048: lFCW l.
1049: dirtoents	convert directory to element names
1050: ftw	traverse file tree
1051: nofile	number of fd's available
1052: sysname	system name
1053: username	user's login name
1054: rx_$IPC	call a remote service
1055: serv_$IPC	receive calls
1056: service	service/socket mapping details
1057: dateadjust	do daylight savings/timezone
1058: .TE
1059: .LP
1060: There are a small number of
1061: .CW #define 's
1062: inside
1063: .CW .c
1064: files.
1065: .XX -DSTRINGH="'<string.h>'"
1066: Define the value to be the string function header file.
1067: .XX -DNO_NETNAME
1068: Define this to disable saving and restoring files through
1069: .CW /n/machine-name
1070: although they will still be stored with that prefix.
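.LP
To pull the settings above together, here is a sketch of what a
.CW CONF
for a BSD-style client using sockets might contain.
The variable names and values are the ones listed in this section;
the particular combination and the one-assignment-per-line layout are
assumptions, so start from one of the sample files in
.CW $FMSRC
rather than from this fragment.

```shell
# Hypothetical CONF fragment for a BSD-style client using sockets.
# Names and values come from the option list above; the combination
# chosen here is only an illustration.
FMLIB=/usr/lib/filemotel    # where the subprograms live
FMBIN=/usr/bin              # where the backup command lives
RANLIB=ranlib               # use RANLIB=: on System V machines
IPC=socket                  # the other choice is v10
IPCLIB=                     # -lipc on V10 systems
LIBTYPE=a                   # o on the Cray
COMPAT=                     # .compat to read older File Motel files
SECTYPE=                    # v9 for a McIlroy-Reeds security kernel
WORMFACE=scsi               # or uda, depending on the WORM interface
```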
1071: .NH
1072: Installing the File Motel on a Server System
1073: .PP
1074: The source comes in both
1075: .I cpio
1076: and
1077: .I tar
1078: formats.
1079: As with the client source installation, note that the following description
1080: is dated and the online copy may be significantly different in detail.
1081: .IP [1]
1082: Follow the client installation process steps 1\-8.
1083: You also need to set the place where the administrative binaries are kept.
1084: I do it this way:
1085: .P1
1086: FMAB=$FM/bin
1087: export FMAB
1088: .P2
1089: .IP [2]
1090: Complete
1091: .CW $FMLIB/conf .
1092: You have to specify the default media type and the root of the administrative
1093: file tree (denoted by
1094: .CW $FM
1095: below).
1096: Details are in
1097: .I backup (5);
1098: my File Motel has this configuration:
1099: .P1
1100: wild
1101: j
1102: /usr/backup
1103: .P2
1104: .IP [3]
1105: Everything that doesn't need to run as
1106: .CW root
1107: should run as an otherwise unused id.
1108: By default, this is
1109: .CW fmdaemon ;
1110: if you don't like this, change the define in
1111: .CW libfm/server.c .
1112: Whatever you choose, set up an account for it;
1113: the File Motel requires nothing but its name/uid
1114: (not even a login directory).
1115: By default, all the shell scripts send mail to the mailbox
1116: .CW backup .
1117: This should be set to an alias for the File Motel caretaker.
1118: .IP [4]
1119: Inform your IPC system of the many services the File Motel offers.
1120: See the notes under step 11 in the client installation above but
1121: install everything, not just
1122: .CW fmclient .
1123: (See step 10 below as well.)
1124: You also need to set up the periodic (normally nightly)
1125: calling of clients and/or
1126: the processing of their files by
1127: .CW $FB/munge .
1128: .I Munge
1129: needs the name of a program to copy the files to your backup media;
1130: set the variable
1131: .CW PROCPERM
1132: to that program's name.
1133: As described previously, you also need to periodically back up the databases
1134: with
1135: .CW backupdb ;
1136: it also needs the name of the program to copy files to your media.
1137: .IP [5]
1138: Initialise the log file:
1139: .P1
1140: > $FM/log; chown bin $FM/log
1141: chmod 644 $FM/log
1142: .P2
1143: .IP [6]
1144: Install the server programs:
1145: .P1
1146: mk server
1147: .P2
1148: .IP [7]
1149: Set up the receiving areas.
1150: List their names in
1151: .CW $FM/adm/rcvdirs
1152: and initialise each area by running
1153: .P1
1154: $FB/rcvdirs
1155: .P2
1156: We use one 120MB file system mounted on
1157: .CW $FM/rcv .
1158: .IP [8]
1159: Allocate the safe area for backup copies.
1160: It must have the name
1161: .CW $FM/v
1162: but may be a symbolic link if there is not enough space in
1163: .CW $FM .
1164: We use a file system the same size as the receive area, mounted on
1165: .CW $FM/v .
1166: .IP [9]
1167: After deciding which databases you want maintained,
1168: initialise the databases with
1169: .P1
1170: src/dbinit.sh
1171: .P2
1172: You may want to start off with all three and remove any you don't want later on
1173: (like when they get to be too big).
1174: .IP [10]
1175: Choose how the receiving process
1176: .I rcv
1177: works.
1178: By default, it simply accepts files.
1179: If it is invoked by the name
1180: .CW mrcv ,
1181: it initiates processing of the received files by
1182: .I munge
1183: (or more accurately,
1184: .CW $FL/callmunge )
1185: after the first and last files have been received
1186: (you need both in case any one file took longer to receive than
1187: .I munge 's
1188: cycle time).
1189: The advantage is that you will almost never run out of space, as you will be
1190: processing files at the same time as receiving them.
1191: The disadvantage is that everything will run slower.
1192: I use the default behavior; we rarely run out of space and I like to
1193: investigate why some client is sending much more than normal
1194: before accepting it all.
1195: .IP [11]
1196: Finish the client installation starting with step 10, making sure you
1197: do not back up the receiving areas or
1198: .CW $FM/v .
1199: .IP [12]
1200: Add the command
1201: .P1
1202: $FB/rmlocks
1203: .P2
1204: to
1205: .CW /etc/rc
1206: (or whatever passes for system startup on your system).
1207: This simply removes any lockfiles in
1208: .CW $FM/locks .
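.LP
Before letting clients loose on a new server, it can help to confirm that
the pieces set up in steps 5, 7 and 8 are really there.
The following is a sketch of such a check and is not part of the distribution;
the function name \f(CWcheck_fm\fP is hypothetical, and it assumes that
.CW $FM/adm/rcvdirs
holds one absolute directory name per line.

```shell
#!/bin/sh
# Hypothetical sanity check for a File Motel server tree.
# Argument: the root of the administrative tree ($FM in the text).
# Assumes adm/rcvdirs lists one absolute directory name per line.

check_fm() {
    fm=$1
    status=0
    [ -f "$fm/log" ] || { echo "missing log file: $fm/log"; status=1; }
    # -d follows symbolic links, so a linked safe area passes as well.
    [ -d "$fm/v" ] || { echo "missing safe area: $fm/v"; status=1; }
    if [ -r "$fm/adm/rcvdirs" ]
    then
        while read -r dir
        do
            [ -z "$dir" ] && continue
            [ -d "$dir" ] || { echo "missing receiving area: $dir"; status=1; }
        done < "$fm/adm/rcvdirs"
    else
        echo "missing list: $fm/adm/rcvdirs"
        status=1
    fi
    return $status
}
```

A typical call would be \f(CWcheck_fm $FM\fP; it prints one line per problem
found and returns a non-zero status if there were any.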
1209: .NH
1210: Media Management
1211: .PP
1212: An attempt has been made to provide generic media management programs.
1213: For example, the recovery servers
1214: .I fetchw_
1215: and
1216: .I fetchj_
1217: are instances of a single device server and a jukebox server respectively.
1218: To make this work, a media library is used.
1219: To use a new medium, such as Exabyte tapes,
1220: implement the routines in the library, and link the library with
1221: .CW fm/media_.o
1222: or
1223: .CW fm/mmedia_.o .
1224: An informal description of the routines follows.
1225: .LP
1226: .CW "mediainit(char *device, char *vol_id)"
1227: .ti +5n
1228: Initialise the media on the device specified by
1229: .I device .
1230: The latter may be a full name or any recognizable abbreviation.
1231: If
1232: .I vol_id
1233: is given, it is checked against the media present.
1234: .LP
1235: .CW "char *mediamount(char *vol_id)"
1236: .ti +5n
1237: Mount the media named
1238: .I vol_id
1239: and return the appropriate device
1240: (suitable for use by
1241: .I mediainit ).
1242: Currently, the values given as
1243: .I vol_id
1244: are those returned by
1245: .I medianame
1246: (below).
1247: .LP
1248: .CW "medianame(char *volume)"
1249: .ti +5n
1250: Return the media name containing
1251: .I volume .
1252: .LP
1253: .CW "mediaopen(char *name, Media *m)"
1254: .ti +5n
1255: Set up
1256: .I m
1257: to point at the backup copy
1258: .I name .
1259: The fields in a
1260: .CW Media
1261: include a file descriptor, preferred read block size, and copy size.
1262: .LP
1263: .CW "void mediafiles(int32 v, int32 n, Media *m, Tb **bp)"
1264: .ti +5n
1265: Return a Media and a list (in
1266: .CW *bp )
1267: of backup copy pointers for all backup copies more recent than
1268: file
1269: .I n
1270: in volume
1271: .I v .
1272: A
1273: .CW Tb
1274: has the creation time and initial (1K) block number for a backup copy.
1275: (It is used by
1276: .I dbupdate ).
1277: The size returned in
1278: .I m
1279: is not actually a size but the number of records in
1280: .CW *bp .
1281: .PP
1282: This is not a complete description;
1283: if you have to write new versions of these routines,
1284: look at the existing implementations (in
1285: .CW $FMSRC/media )
1286: and the programs that use them
1287: (all in
1288: .CW $FMSRC/fm ).
1289: .NH
1290: References
1291: .LP
1292: |reference_placement