.so ../ADM/mac
.XX backup 593 "The File Motel: An Owner's Manual"
.nr dP 2
.nr dV 3p
.TL
The File Motel:
.br
An Owner's Manual
.AU
Andrew G. Hume
.AI
.MH
.AB
.PP
The File Motel is an incremental user-level file backup system for
.UX
systems.
The first version of the File Motel has been in successful operation
for over two years with three sites supporting about 50 systems.
The first version supported only Ninth Edition
.UX
systems, although with only modest inconvenience
files could be saved from Sun 3 clients.
The second version of the File Motel is a complete reworking
of the original system, emphasizing easy portability to most
.UX
systems.
The files are stored in a machine-independent form;
as an example, I have recovered a directory onto a Sun 3
from a server on a MIPS 120/5 that had been originally
saved from a Cray X\-MP/24.
The user and administrative interfaces have been streamlined,
based on experience in the field.
.PP
The system has been restructured to look like a kit.
Most of the modules, such as the database, networking and media code,
have been isolated via simple interface routines.
As with a kit, you may not find exactly what you want,
but it should be easy to roll your own.
.AE
.2C
.NH
An Overview
.PP
This is a manual for the File Motel |reference(file motel usenix),
a backup system for
.UX
systems.
The File Motel consists of a central server system
servicing many client systems.
The server system is almost always also a client system.
The File Motel saves only the files that change on any given client system,
using a database to record what versions
have been saved for any particular file.
Under normal usage patterns, this is on the order of 1\-5% of the user files
on the client.
This makes backup practical over slow networks to slow backup media.
.PP
The daily routine in the File Motel starts around midnight when
the clients send copies of any new or recently modified files to the server machine.
After receiving the files from all the clients,
a separate processing step transforms the received files into
backup copies, which are then written to the backup media
of your choice (typically, WORM disks).
Backup and recovery can be performed by anyone with the appropriate permissions;
in general, there is no administrative overhead other than
fussing with backup media.
.PP
This description may be a little clearer with some details.
The following description includes
some sample numbers from our Center's File Motel;
other sites will differ.
The first step is a client sending files to the server.
A shell script configured for the client generates a list of (say 5000)
candidates for backup (say, all the files changed in the last week).
This list is sent to the server which returns a list of the files (say 900)
that really need to be backed up.
Each of these files is then transmitted to the server together with a header
(which includes a checksum).
(On average, about 15MB is sent taking about 20 minutes.)
There is an acknowledgement from the server after every file;
this allows graceful termination when the server has problems,
such as running out of space.
The redundancy in the candidate list allows non-critical clients
to cope with transient faults (such as a broken network) without
administrative intervention by ignoring the fault
and getting the files the following night.
The client process is normally initiated either by the server machine
or by
.I cron (8).
Exactly the same mechanism is used for user-initiated backups;
the only difference is that the system backup is executed by the super-user
(and thus has access to all the files on the client system).
.PP
The files sent by the clients are kept in receiving areas, each of which has
32 subdirectories that are processed in turn.
First, the program
.I sweep
deletes any unnecessary files, assigns each backup copy a name
(which is stored in the file's header) and recalculates the file's checksum.
The remaining files are fed to
.I dbupdate
which deletes any unnecessary files and stores the version information
in the database.
Finally, the surviving files are moved to a staging area for writing to the
backup media.
.PP
The last step is writing the backup copies to the backup media.
The only medium currently supported is a WORM disk.
In our environment they are preferred because of their large capacity
and because you can get reliable jukeboxes (automatic disk changers).
Optical jukeboxes come in all sorts of sizes; our center has a
SONY WDA 3000-10 with a total capacity of 164GB (328GB after October 1989).
.PP
There are many other programs in the File Motel,
some intended for the user (for example, recovering files),
and others for the administrator (usage statistics, backing up the database).
.PP
The rest of this manual is intended for the caretaker of the File Motel.
Section 2 details some of the
peculiar aspects of the File Motel that have caused problems in the past;
if you can survive these,
then installing and running the File Motel
ought to be easy.
If there are incompatibilities, then installing the File Motel
will require (perhaps substantial) work.
The File Motel uses many small single-purpose tools;
if you need to figure out what is going on (or wrong, as the case may be),
these tools are described in Section 3.
Sections 4 and 5 are step by step instructions for installing
a client and a server respectively.
Finally, Section 6 elaborates on media management.
.NH
Some Things You Should Know
.PP
.de BL
.IP \ \ \ \ \s+3\(bu\s-3
..
This section describes some of the assumptions underlying the
construction of the File Motel software.
Most of these assumptions have caused problems in porting to
systems less hospitable than 10th Edition
.UX
(or V10 for short).
.BL
Each server has one global name space for all the files saved from all the clients.
The file
.I z
from machine
.I mach
is stored under the name
.I /n/mach/z .
It so happens that this is how V10 networked file systems are normally mounted
and in fact, all file references actually go through this network name.
For other systems, you should define the
.CW -DNO_NETNAME
switch as described in section 5.
.BL
All client\-service communication uses a uniform networking interface.
That is, a system invokes a service on the remote machine and gets a
pair of file descriptors attached to the input and output of that service.
Both Berkeley-style sockets and V10-style IPC are supported.
For the case of a single system that is both a server and client
and has no networking, you will have to write an execution service that
constructs pipes to the desired services.
Note that it is possible to provide this interface even if all
the networking you have is a user program (such as
.I rx
or
.I rsh )
that executes a program on another machine.
.BL
It must be possible to nominate the user that the remote service runs as.
Most run as a regular user, say
.I fmdaemon ,
but some must run as superuser and on V10 systems, one must run as
.I bin .
.BL
The code is reasonably portable; with the canned configuration files
it runs on a Cray X-MP/24 (UNICOS),
VAX 11/750, Microvax II, 8550, 8600 and 11/780 (V10, Ultrix, 4.3BSD),
Sun 3 (SunOS 4.0),
MIPS M120/5, M2000 (UMIPS 3.0, 3.10, RISC/os 4.0).
Some of the code, notably Ken Thompson's new version of
.CW doprint ,
makes assumptions about variable argument lists.
So far, the code has continued to work on all the systems we have tried
(although we can't optimise on the MIPS)
but in this world of perverse hardware and compilers, you may not be so lucky.
.BL
The code assumes no particular byte-ordering but does assume that there
is an integer type of at least 32 bits.
By and large, the programs allocate all data areas dynamically;
whenever there is a choice, programs trade space for smaller runtime,
so there must be at least 24 bits of data space.
If you have a 32 bit machine but have 16 bit ints,
you will have trouble (perhaps
.I lint
will help).
.BL
It is assumed that the backup medium can hold at least one backup copy
and practically, it should hold at least one volume.
This is about 20MB by default; if you have smaller backup media
and cannot arrange better, change the volume size \(em it is
a constant defined in
.CW fm/sweep.c .
.BL
The File Motel depends on each file having a unique name.
This continues to cause problems, particularly in the presence of symbolic links.
For example, on a system I use
.CW /usr/andrew
is a symbolic link to
.CW /usr2/guest/andrew .
The right thing to do is to save files under
.CW /usr/andrew
(so that you can move them from file system to file system and keep their name).
Yet, the user may not be aware of this name; if they do a
.I pwd
to find out, they will get the wrong answer.
.NH
A Detailed Description
.PP
The action in the File Motel can be functionally divided into four areas:
client selecting and sending files to the server,
the server processing the client files onto backup media,
client recovering files back from the server,
and an assortment of administrative functions.
.PP
Programs and scripts used by the File Motel live in three places:
.CW /usr/bin/backup
is the user interface,
.CW /usr/lib/filemotel
holds all the programs and scripts used by clients,
and
.CW /usr/filemotel/bin
holds the server-specific programs.
These are the conventional names \(em they can be reconfigured to taste.
Because of this and their length,
these abbreviations will be used in the following text:
.TS
center;
lFCW lFCW.
$FM	/usr/filemotel
$FB	/usr/filemotel/bin
$FL	/usr/lib/filemotel
.TE
.NH 2
Client Sends Files to the Server
.PP
The controlling script here is
.CW $FL/doclient :
.P1
#!/bin/sh
$FL/sel | $FL/act
.P2
The selection script
.I sel
has to generate a list of absolute filenames.
You can use any tools available to you; the File Motel supplies
the program
.I fcheck
which is rather more efficient than
.I find (1)
and follows symbolic links that are arguments.
This is to help clients save files as
.CW /usr/andrew/...
rather than the less than informative
.CW /usr2/guest/andrew/... .
A small
.I sel
file is shown below.
.KF
.P1 0
/usr/lib/filemotel/fcheck 512 7 /etc /usr/* |
sed -e '/\e.o$/d
/\e/a\e.out$/d
/\e/core$/d
/\e/foo$/d
/^\e/usr\e/tmp\e//d
/^\e/usr\e/spool\e//d'
cat <<EOF
/unix
EOF
.P2
.KE
.PP
The script
.I act
works in a straightforward way.
First, the filenames are transformed into the input format for
.I missing
by the program
.I iprint .
This prepends
.CW /n/\fImachine
to the filename (unless this is already there) and appends the
inode change time and size.
There is a convention that an input filename starting with a
.CW //
is a symbolic link to be followed (that is, use
.I stat (2)
rather than
.I lstat (2)
to get the time and size).
The size is carried around so that if you choose a file because it is small
and it grows dramatically while you are asking about it, you can reject it
later on (although this is not done now because no one cares yet).
.I Missing
takes these names and ships them to the corresponding server
.I missing_
on the server machine.
(Servers for a service
.CW abc
are called
.CW abc_ ).
.I Missing_
checks the name,time tuples against the database and sends back
the lines that are newer than the entry in the database.
Transmissions in both directions are checksummed; any errors
are reported to standard error and are also logged in the log file
on the server machine.
.PP
The results from
.I missing
are stored in
.CW $FL/files.\fIday\fP .
They are given to
.I fmpush
which actually pushes them to the backup system.
.I Fmpush
also takes a system name argument for logging purposes.
If there are any errors,
.I fmpush
reports the error and the number of files transmitted.
This allows the push to be restarted efficiently:
.P1
$ pwd
$FL
$ fmpush wild < files.Tue
EOF after 2713 files sent.
$ sed 1,2713d files.Tue | fmpush wild
.P2
Any diagnostics are mailed to the user
.I backup
and also kept in
.CW $FL/files.\fIday\fP.sho .
It is not necessary to keep these files around after they have been used
but they are relatively small and often useful;
for example, a client who normally saves 100 or so files suddenly sends
you 10,000 files \(em you can quickly go to that client and check
what the files were.
When possible, diagnostics are also logged on the backup system.
.PP
Only regular files, symbolic links and directories have their contents saved;
all other files (such as devices) just have their
.I stat (2)
buffers saved.
To preserve machine independence, the content of a directory is saved
as a list of null-terminated element names.
This removes the need for the server to be able to guess a
client's directory structure, although it does lose a small
amount of subtle information contained in the freed slots of the
directory.
.........
.NH 2
Server Processes Client's Files
.PP
Received files are processed by the script
.CW $FL/munge .
This processing is decoupled from either receiving or restoring
client files; for example,
it is safe to process files while receiving them.
Munging is typically started by
.I cron ,
but you can also cause
.I rcv
to invoke
.I munge
automatically,
and you can invoke
.I munge
manually by executing
.CW $FL/callmunge .
.PP
Regardless of how it is called,
.I munge
scans the 32 receiving subdirectories in each of the receiving areas in
.CW $FM/adm/rcvdirs
looking for files to process.
If it finds any, it calls a program
whose name is supplied as
.CW $PROCPERM
to copy the final copies
to the media of your choice.
It repeats this scan until a scan finds nothing to do.
.PP
The action within a subdirectory is simple.
.CW $FB/sweep
looks for files with mode
.CW 0 ,
.CW 0400 ,
or
.CW 0600 .
Mode
.CW 0
files are files that are being received
(\fIrcv\fP
marks a file as done by changing its mode to
.CW 0600 )
and are ignored unless they haven't been modified within the
last 12 hours.
In this latter case, the file is regarded as stale (almost always a network
connection was dropped) and unlinked.
Mode
.CW 0400
files have been already processed by
.I sweep
but for some reason (most often, running out of space)
weren't copied to the backup area.
Mode
.CW 0600
files are assigned a backup copy name and after recalculating the
checksum are changed to mode
.CW 0400 .
.I Sweep
emits the names of all the files ready to be copied to the backup area
and this is saved in a file.
.I Munge
then makes any needed directories in the backup area that don't exist.
.CW $FB/fmmv
then moves all the files to be copied to the backup area.
We then update the database with information from the files we just copied.
.PP
Updating the database is a two part process.
Run
.CW $FB/updatef
on the files (use the program
.I updatew
for files on the WORM)
and then feed the output to
.CW $FB/dbupdate .
In this way, we guarantee that the database is purely a function of
the backed up files
(assuming none get lost between the backup area and the backup media).
The input to
.I dbupdate
is (roughly) a sequence of backup file headers and contents of
backed up directories.
.I Dbupdate
updates the various databases (described below) and sometimes tries to unlink
the backup copies.
(This happens when two copies of the same file are
in the same receiving subdirectory.
.I Sweep
happily copies both to the backup area but when
.I dbupdate
goes to update the main database for the second copy, it discovers
it already has this copy and so unlinks the second copy.
It doesn't care if the unlink fails because this is just an attempt
to be space efficient and in any case, the unlink can fail only if the file
has already been committed to the backup media.)
.I Dbupdate
also appends accounting statistics records for each file, containing the time
the file was saved, the size, the owner and the system name, to the file
.CW $FM/stat.log .
.PP
After
.I munge
is finished scanning the receive areas,
it processes the statistics records generated by
.I dbupdate
by
calling
.CW $FB/procstats .
This reads (and then truncates)
.CW $FM/stat.log
and adds new records to the files
.CW $FM/stat/\fIsystem\f(CW .
These records are in machine independent format and have been collapsed to
refer to all the files per user/day combination.
Even in this compressed format,
the statistics records would grow without bound.
Accordingly,
.I munge
calls
.CW "procstats -c"
to further collapse together all the records older than 30 days for each user.
(The number 30 comes from the only program that looks at these statistics,
.CW "backup stats" .)
.PP
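The two collapsing passes on the statistics records can be illustrated
with a minimal Python sketch.
It is not the
.I procstats
code: the record layout (day, system, owner, size), the function names
and the use of an empty day to mark the folded records are all assumptions
made for the example, since this manual does not describe the real file format.
.P1 0
```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical record layout: (day, system, owner, size_in_bytes).
# The real stat.log format is not described in this manual.


def collapse_per_day(records):
    """Merge per-file records into one (files, bytes) total per system/owner/day."""
    totals = defaultdict(lambda: [0, 0])
    for day, system, owner, size in records:
        entry = totals[(system, owner, day)]
        entry[0] += 1
        entry[1] += size
    return {key: tuple(value) for key, value in totals.items()}


def collapse_old(daily, today, keep_days=30):
    """Fold daily totals older than keep_days into one record per system/owner."""
    cutoff = today - timedelta(days=keep_days)
    result = {}
    old = defaultdict(lambda: [0, 0])
    for (system, owner, day), (nfiles, nbytes) in daily.items():
        if day < cutoff:
            old[(system, owner)][0] += nfiles
            old[(system, owner)][1] += nbytes
        else:
            result[(system, owner, day)] = (nfiles, nbytes)
    for (system, owner), (nfiles, nbytes) in old.items():
        # A day of None marks the folded "older than the cutoff" record.
        result[(system, owner, None)] = (nfiles, nbytes)
    return result
```
.P2
Recent days stay visible individually, while everything older than the
cutoff costs one record per user, which is why the files stop growing
without bound.
.PP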
Throughout its work,
.I munge
checks to see if it should exit by checking the existence of a guard file;
this is created by
.CW $FB/stopmunge .
.NH 2
The Databases on the Server
.PP
There are three databases kept on the server, all conventionally kept in
.CW $FM/db .
The first,
.CW filemap ,
is the main and only required database;
it contains the mappings from filename to last modify date and
from (filename, modify date) tuple to backup copy name.
The second,
.CW dir ,
is optional and maps (directory, modify date) tuples to their contents.
It is used to make recovery of file trees go (much) faster.
The third,
.CW fs ,
is optional and maps (filename, modify date) tuples to their
.I stat
buffers.
It is used to implement the backup file system.
.PP
The default implementation of these databases is Peter Weinberger's
compressed B-trees (see
.I cbt (1)).
(The compression refers to eliding common prefixes
of successive keys; it does very well on the pathnames used by the File Motel.)
The
.I cbt
database
.I db
consists of two files,
.I db\f(CW.T\fR
(the tree part) and
.I db\f(CW.F\fR
(the data part).
As the
.I cbt
routines do not reclaim space, the
.CW .T
file can start growing at a very fast rate when the tree is large
(say four levels).
This has proved to be a real nuisance so there is considerable support
for periodic squashing of the database
(which reclaims space by rebuilding the database) and
for supporting the
.I filemap
database as a collection of separate databases.
.PP
The latter is intended to be used in the following way.
The file
.CW $FM/db/filemap
is always the current
.I filemap
database.
If the file
.CW $FM/db/filemaplist
exists, it is taken as a list of database names,
one per line in oldest to newest order, to be used in addition to
.CW $FM/db/filemap .
These are searched only, never updated.
At our site, we produce one of these databases for about every 15\-16GB
of backup files.
.NH 2
Server Sends Files to the Client
.PP
All requests for files go through a central server
.CW $FB/fetch_ .
This program simply farms out work to other programs.
.I Fetchf
attempts to find files that are still under
.CW $FM/v .
Systems with plenty of mass storage can leave the backup copies
online and things will go quite fast.
For the files that
.I fetchf
can't find,
.I fetch_
looks in the configuration file
(\f(CW$FL/conf\fP)
to determine the backup media (say
.CW j
for jukebox).
It then calls
.CW $FB/fetchj
with the appropriate filenames.
If you just have a WORM drive, you should use
.CW $FB/fetchw
instead.
These two programs purport to be generic drivers for jukeboxes
and single drives; if you have different media (say an Exabyte tape),
you should be able simply to load the drivers with your media library
to generate the appropriate fetch program.
More details are given below in the section on media management.
.PP
Users can generally access any files they have read permission for,
regardless of what system they are on or the system from which the files were
stored.
In addition, we trust our users (or more importantly, our network)
and so we do no checking of a user's right to retrieve files.
Such checking, such as a password, can easily be added to the startup
protocol between the program the user calls (\f(CW$FL/fetch\fP)
and the server (\f(CW$FB/fetch_\fP).
.NH 2
Administrivia
.PP
This section is a bunch of administrative odds and ends for the way we organise
the File Motel in our Center.
Your details may be different, and indeed ours change over time,
but the examples are probably helpful.
.NH 3
File Layout
.PP
We store the File Motel under the directory
.CW /usr/backup ,
which is a file system large enough to hold comfortably the current
databases and a squashed version (more on this later).
The receiving area is another smallish file system (about 120MB) mounted on
.CW /usr/backup/rcv
and the holding area
.CW /usr/backup/v
is another file system of the same size.
This is done to isolate the effects of client excesses;
the sending processes all know how to deal with running out of space
(we practice often).
I regard running out of space once a month as tolerable;
once a week is too much.
To aid searches for files, we keep a file
.CW /usr/backup/filenames ), (
which is a sorted list of all filenames.
This is maintained by the database squasher.
.PP
The main drawback to Weinberger's B-tree software is that it does
not reclaim space in the tree.
Thus, over time the tree file gets huge
(the rate grows as the depth of the tree).
The fix is to periodically squash the tree.
We combine this with dumping the database to WORM disk in the script
.CW $FB/backupdb .
.NH 3
Talking to the Clients
.PP
We have found it best to call the clients rather than have them call us.
The load seems more balanced and things get done sooner.
We use
.I mk
as it handles parallel processing; a typical mkfile is
.P1 0
CLIENTS=Cwild C3k Ctcp!tempel
NPROC=3

clients:VQ: $CLIENTS
	PROCPERM=$FB/toworm $FB/munge

C%:VQ:
	set +e; $FB/callclient $stem; exit 0
.P2
Understanding this completely requires familiarity with
.I mk
but the intent is clear.
We first get the files from the clients by the
.CW C%
rule and then process them by
.CW munge
and then put them out on WORM disk by
.CW toworm .
As we use Datakit, most clients are called using Datakit but some
(like
.CW tempel
in the mkfile) are called using TCP/IP.
This convenient piece of magic works on V9 because of Dave Presotto's
clever design of the IPC system; you may have to work harder.
The
.CW set
stuff in the
.CW C%
rule means to keep on processing even if a client gets an error.
The entry in
.CW /etc/crontab
is more or less
(this is one physical line folded at the \*(cr because of the column width)
.P1 0
eval "cd /usr/backup/adm; mk clients 2>&1" |\*(cr
mail backup
.P2
We use the
.CW backup
mailbox to redirect mail to someone appropriate.
.NH 2
Disasters
.PP
Currently, the only disaster we have had that was not the result of a kernel bug
is running out of space;
this is either inconvenient or quite bad.
Running out of space in the receiving or safe areas is just inconvenient.
By default, the client's
.I fmpush
stops when the receiving area runs out of space after saying how many files got
transmitted.
This is enough information to resend the rest when convenient.
Alternatively, you can change
.CW $FL/act
to give
.I fmpush
the
.CW -r
flag; this means that it will retry sending files every hour or so
until it succeeds.
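.PP
The manual resend amounts to dropping the names that were already
transmitted from the saved file list.
The Python sketch below shows that arithmetic.
It assumes the diagnostic has the shape of the one sample message shown
earlier in this manual ("EOF after 2713 files sent."); other messages may
well differ, and the function names are invented for the example.
.P1 0
```python
import re


def files_sent(diagnostic):
    """Return the count from a message like 'EOF after 2713 files sent.'.

    Assumes the message shape of the sample in this manual; returns 0
    when no count is found, so that everything is simply sent again.
    """
    match = re.search("after ([0-9]+) files sent", diagnostic)
    return int(match.group(1)) if match else 0


def remaining(names, diagnostic):
    """Names still to be pushed, in their original order."""
    return names[files_sent(diagnostic):]
```
.P2
Falling back to a count of zero resends the whole list; that costs
network time, but the server discards copies it already holds.
.PP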
Running out of space in the holding area is also not too bad;
eventually
.I munge
will put the holding area onto the backup media and then cycle through
the receiving area again.
.PP
The worst effect of running out of space is ruining your database.
(This happens rarely for us as
we keep our databases on a file system apart from the receiving/holding areas.)
Rebuilding the database is not too hard.
First, find out the next backup name to be assigned
(by a
.CW "sweep -n"
or by examining the backup media and holding areas).
Then, get the most recent backup copy of your database and install it.
Set the next backup name you found in the first step with
.CW "sweep -s" .
You then need to extract the database information for each file added
to the database since the backup copy of the database was made.
The starting file name is stored in the
.CW .N
file by
.I backupdb .
The program
.I updatew
will extract this from files on a WORM, and
.I updatef
from regular disk files.
The result is fed to
.I dbupdate
as done in
.I munge .
.NH
Installing the File Motel on a Client System
.PP
The following instructions assume you have the source
.CW fm.cpio
somewhere, say
.CW /tmp/fm.cpio .
Note also that these instructions will change over time;
you must follow the online copy of this document included with the source.
.IP [1]
You will need version 3 of
.I mk
(or any version
dated later than Mar 11, 1989).
.IP [2]
Select the root directory for the source, set the
environment variable
.CW FMSRC
to its name
and export
.CW FMSRC .
For example,
.P1
FMSRC=/usr/filemotel/src
export FMSRC
.P2
.IP [3]
Install the source tree by
.P1
cd $FMSRC/..; cpio -iudc < /tmp/fm.cpio
.P2
.IP [4]
Create a
.CW CONF
file.
This describes your installation environment
and is included in lower-level mkfiles.
The various switches are described in detail below;
however, the easiest way is to start with one of the
sample configuration files in the directory
.CW $FMSRC/conf .
.IP [5]
If necessary, create the repository for client files:
.P1
mkdir /usr/lib/filemotel
.P2
(this is configurable, see
.CW FMLIB
below)
and if you have not defined
.CW NO_NETNAME
in
.CW CONF ,
ensure that
.CW /n/clientname
is a link to
.CW / .
.IP [6]
Initialise the source tree for compiling by
.P1
mk depend
.P2
This only needs to be done once.
If you have to repeat, you can undo this by
.P1
mk undepend
.P2
.IP [7]
Compile and install the client software
by
.P1
mk client
.P2
This can be repeated as often as you like.
Only files in
.CW $FMLIB
and the file
.CW $FMBIN/backup
(these are configurable, see
.CW FMBIN
below)
are affected.
.IP [8]
Set up the dialstring of the server system by
.P1
echo server-machine-name > $FMLIB/conf
.P2
The name should match the type of IPC you selected in
.CW CONF .
For example,
.TS
center;
c c
l lFCW.
IPC	Example
Datakit	nj/astro/wild
Datakit	wild
IP	wild.astro.nj.att.com
.TE
.IP [9]
In theory, you are now operational.
A couple of small tests are described in the file
.CW SANITY .
Some common problems and their cures are described below.
.IP [10]
You need to construct the script
.CW $FMLIB/sel
which prints the names of files that you want backed up.
There is a sample script (\f(CWsample.sel\fP) in that directory.
Be careful not to back up networked file systems by mistake.
.IP [11]
If you are initiating backup via
.I cron (8),
add the following command to
.CW crontab :
.P1
eval "/usr/lib/filemotel/sel | \*(cr
/usr/lib/filemotel/act 2>&1" | \*(cr
mail backup
.P2
The exact format varies from system to system;
the File Motel administrator should tell you what time to set it off.
.IP
If your client's backup is initiated from the server system,
you will have to add the line for
.CW fmclient
to your flavour of IPC services file.
If you communicate to the server by TCP/IP
(that is, your
.CW CONF
file has
.CW IPC=socket ),
get the
.I fmclient
line from the file
.CW tcp.inetd
and add it to
.CW /etc/inetd.conf
(some systems use
.CW /usr/etc/inetd.conf )
and get the
.I fmclient
line from the file
.CW tcp.services
and add it to
.CW /etc/services .
You then need to prod
.I inetd
to look at the new files (commonly by sending it a hangup signal).
On some systems, like SunOS, you may need to prod name servers
such as the Yellow Pages as well.
.IP
If you use V10 IPC,
add the corresponding line for
.CW fmclient
from
.CW ipc.V10
to
.CW /usr/ipc/lib/serv.local .
The files
.CW tcp.inetd
and
.CW ipc.V10
are made by
.P1
cd $FMSRC; mk ipc.list
.P2
.NH 2
Some Common Installation Problems
.PP
As a general rule, keep an eye on the log file (on the server)
906: when setting up the File Motel. ! 907: The most convenient way is a window with a ! 908: .P1 ! 909: tail -f $FM/log ! 910: .P2 ! 911: (\f(CW$FM\fP is the root directory of the File Motel.) ! 912: .PP ! 913: The most common problem is that the basic IPC software doesn't work. ! 914: This affects most programs because they involve calling a service on the ! 915: backup machine. ! 916: That is why the first thing you try to get working is ! 917: .I logger ! 918: which sends messages to the logger process on the backup machine. ! 919: The kinds of bugs I have seen here are typically bugs in the networking code, ! 920: particularly TCP/IP. ! 921: For example, the ! 922: .I logprint ! 923: function expects an acknowledgement from the logger server ! 924: to indicate that everything went okay. ! 925: On at least two of the systems I use, this sometimes doesn't happen because ! 926: the closing of the socket by the logger server after sending the ack ! 927: seems to speed past the ack and get to ! 928: .I logprint ! 929: first. ! 930: Naturally, ! 931: .I logprint ! 932: complains, as might we all. ! 933: The best solution is to fix the TCP/IP implementation; failing that, ! 934: you might try a judicious sleep between the ack and the close in ! 935: the logger server. ! 936: This is only one example of a general class of timing problems. ! 937: .PP ! 938: Another fertile field of failed implementations has to do ! 939: with user and system names. ! 940: The File Motel tries to check the validity of system and user names ! 941: and denies service if there appears to be something sleazy going on. ! 942: Regrettably, some otherwise sound TCP/IP implementations resemble sleaze. ! 943: For example, a user ! 944: .CW mary ! 945: on a client may appear on the server as the user ! 946: .CW bill . ! 947: Or a system may have a system name that is unrelated to the name ! 948: the networking code uses. ! 949: An attempt is made to cope with these cases, but may fail with !
950: unexpectedly bizarre implementations. ! 951: If worst comes to worst, simply turn off all the checking ! 952: and hope no one does anything naughty. ! 953: (Even if you do this, think hard about allowing remote users ! 954: to claim they are ! 955: .CW root ; ! 956: they will be able to look at all sorts of things.) ! 957: Unlike X, ! 958: I implement function and policy. ! 959: However, all the checking is done in one place ! 960: .CW serv_$IPC.c ); ( ! 961: feel free to do whatever you like, ! 962: it's your Motel now. ! 963: .de XX ! 964: .IP \\f(CW\\$1\\fP ! 965: .br ! 966: .. ! 967: .NH 2 ! 968: Configuration and Compiling Options ! 969: .PP ! 970: The File Motel software is designed for easy installation in heterogeneous ! 971: environments. ! 972: The configuration details described below are stored in the file ! 973: .CW $FMSRC/CONF . ! 974: .CW $FMSRC ! 975: contains a number of files ! 976: containing settings for various systems; you may want to use ! 977: one of these as a starting point. ! 978: (Remember that the following information has a small half-life; ! 979: the truth should be in the online copy of this manual.) ! 980: The most obvious aspect of configuring the File Motel is choosing ! 981: the three directories where files live. ! 982: The source directory, ! 983: .CW FMSRC , ! 984: has been described above. The other two are ! 985: .XX FMLIB=/usr/lib/filemotel ! 986: Change this to wherever you want to put the subprograms. ! 987: .XX FMBIN=/usr/bin ! 988: The directory for the (only) user-called command, ! 989: .CW backup . ! 990: .LP ! 991: Configuring the source to your environment is mostly done with ! 992: .I mk ! 993: variables and an interface library in ! 994: .CW src/sys/\f2system . ! 995: The ! 996: .I mk ! 997: variables are ! 998: .XX RANLIB=ranlib ! 999: Some systems, mostly BSD-based, whine incessantly unless archive libraries ! 1000: are processed with some program typically called ! 1001: .I ranlib . !
1002: In this case, set ! 1003: .CW RANLIB=ranlib ; ! 1004: otherwise, say if you are on a System V machine, use a harmless program ! 1005: such as ! 1006: .CW RANLIB=: . ! 1007: .XX IPC=socket ! 1008: Select your favorite type of IPC. ! 1009: Different clients can use different types and the client's type ! 1010: need not match the backup system. ! 1011: (For example, in our Center, ! 1012: the Cray talks to us via TCP/IP but we talk to it via Datakit.) ! 1013: The only choices are ! 1014: .CW socket ! 1015: and ! 1016: .CW v10 . ! 1017: .XX IPCLIB= ! 1018: Set this if you need a special library in order to use your flavor of IPC. ! 1019: For example, on V10 systems set ! 1020: .CW IPCLIB=-lipc . ! 1021: .XX LIBTYPE=a ! 1022: This should be set to ! 1023: .CW a ! 1024: unless you are on the Cray (which doesn't have archives yet!) ! 1025: when it should be ! 1026: .CW o . ! 1027: .XX COMPAT= ! 1028: Set ! 1029: .CW COMPAT=.compat ! 1030: if you want to be able to process older File Motel files. ! 1031: (You may have to work hard to get this to work on some systems; ! 1032: I gave up on the Cray.) ! 1033: .XX SECTYPE= ! 1034: Set ! 1035: .CW SECTYPE=v9 ! 1036: if you are running a McIlroy-Reeds compatible security kernel. ! 1037: .XX WORMFACE=uda ! 1038: If you are running the WORM software, you need to say what kind of interface ! 1039: the WORM is attached to. ! 1040: The other option ! 1041: (and the best if you just want to compile without thinking too hard) is ! 1042: .CW scsi . ! 1043: The latter may need customizing at your site. ! 1044: .LP ! 1045: Currently the system dependent interface library includes the following routines: ! 1046: .TS ! 1047: center; ! 1048: lFCW l. ! 1049: dirtoents convert directory to element names ! 1050: ftw traverse file tree ! 1051: nofile number of fd's available ! 1052: sysname system name ! 1053: username user's login name ! 1054: rx_$IPC call a remote service ! 1055: serv_$IPC receive calls ! 
1056: service service/socket mapping details ! 1057: dateadjust do daylight savings/timezone ! 1058: .TE ! 1059: .LP ! 1060: There are a small number of ! 1061: .CW #define 's ! 1062: inside ! 1063: .CW .c ! 1064: files. ! 1065: .XX -DSTRINGH="'<string.h>'" ! 1066: Define the value to be the string function header file. ! 1067: .XX -DNO_NETNAME ! 1068: Define this to disable saving and restoring files through ! 1069: .CW /n/machine-name ! 1070: although they will still be stored with that prefix. ! 1071: .NH ! 1072: Installing the File Motel on a Server System ! 1073: .PP ! 1074: The source comes in both ! 1075: .I cpio ! 1076: and ! 1077: .I tar ! 1078: formats. ! 1079: As with the client source installation, note that the following description ! 1080: is dated and the online copy may be significantly different in detail. ! 1081: .IP [1] ! 1082: Follow the client installation process steps 1\-8. ! 1083: You also need to set the place where the administrative binaries are kept. ! 1084: I do it this way: ! 1085: .P1 ! 1086: FMAB=$FM/bin ! 1087: export FMAB ! 1088: .P2 ! 1089: .IP [2] ! 1090: Complete ! 1091: .CW $FMLIB/conf . ! 1092: You have to specify the default media type and the root of the administrative ! 1093: file tree (denoted by ! 1094: .CW $FM ! 1095: below). ! 1096: Details are in ! 1097: .I backup (5); ! 1098: my File Motel has this configuration: ! 1099: .P1 ! 1100: wild ! 1101: j ! 1102: /usr/backup ! 1103: .P2 ! 1104: .IP [3] ! 1105: Everything that doesn't need to run as ! 1106: .CW root ! 1107: should run as an otherwise unused id. ! 1108: By default, this is ! 1109: .CW fmdaemon ; ! 1110: if you don't like this, change the define in ! 1111: .CW libfm/server.c . ! 1112: Whatever you choose, set up an account for them; ! 1113: the File Motel requires nothing but their name/uid ! 1114: (not even a login directory). ! 1115: By default, all the shell scripts send mail to the mailbox ! 1116: .CW backup . ! 
1117: This should be set to an alias for the File Motel caretaker. ! 1118: .IP [4] ! 1119: Inform your IPC system of the many services the File Motel offers. ! 1120: See the notes under step 11 in the client installation above but ! 1121: install everything, not just ! 1122: .CW fmclient . ! 1123: (See step 10 below as well.) ! 1124: You also need to set up the periodic (normally nightly) ! 1125: calling of clients and/or ! 1126: the processing of their files by ! 1127: .CW $FB/munge . ! 1128: .I Munge ! 1129: needs the name of a program to copy the files to your backup media; ! 1130: set the variable ! 1131: .CW PROCPERM ! 1132: to that program's name. ! 1133: As described previously, you also need to periodically back up the databases ! 1134: with ! 1135: .CW backupdb ; ! 1136: it also needs the name of the program to copy files to your media. ! 1137: .IP [5] ! 1138: Initialise the log file: ! 1139: .P1 ! 1140: > $FM/log; chown bin $FM/log ! 1141: chmod 644 $FM/log ! 1142: .P2 ! 1143: .IP [6] ! 1144: Install the server programs: ! 1145: .P1 ! 1146: mk server ! 1147: .P2 ! 1148: .IP [7] ! 1149: Set up the receiving areas. ! 1150: List their names in ! 1151: .CW $FM/adm/rcvdirs ! 1152: and initialise each area by running ! 1153: .P1 ! 1154: $FB/rcvdirs ! 1155: .P2 ! 1156: We use one 120MB file system mounted on ! 1157: .CW $FM/rcv . ! 1158: .IP [8] ! 1159: Allocate the safe area for backup copies. ! 1160: It must have the name ! 1161: .CW $FM/v ! 1162: but may be a symbolic link if there is not enough space in ! 1163: .CW $FM . ! 1164: We use an identically sized file system to the receive area mounted on ! 1165: .CW $FM/v . ! 1166: .IP [9] ! 1167: After deciding which databases you want maintained, ! 1168: initialise the databases with ! 1169: .P1 ! 1170: src/dbinit.sh ! 1171: .P2 ! 1172: You may want to start off with all three and remove any you don't want later on ! 1173: (like when they get to be too big). ! 1174: .IP [10] !
1175: Choose how the receiving process ! 1176: .I rcv ! 1177: works. ! 1178: By default, it simply accepts files. ! 1179: If it is invoked by the name ! 1180: .CW mrcv , ! 1181: it initiates processing of the received files by ! 1182: .I munge ! 1183: (or more accurately, ! 1184: .CW $FL/callmunge ) ! 1185: after the first and last files have been received ! 1186: (you need both in case any one file took longer to receive than ! 1187: .I munge 's ! 1188: cycle time). ! 1189: The advantage is that you will almost never run out of space, as you will be ! 1190: processing files at the same time as receiving them. ! 1191: The disadvantage is that everything will run slower. ! 1192: I use the default behavior; we rarely run out of space and I like to ! 1193: investigate why some client is sending much more than normal ! 1194: before accepting it all. ! 1195: .IP [11] ! 1196: Finish the client installation starting with step 10, making sure you ! 1197: do not back up the receiving areas or ! 1198: .CW $FM/v . ! 1199: .IP [12] ! 1200: Add the command ! 1201: .P1 ! 1202: $FB/rmlocks ! 1203: .P2 ! 1204: to ! 1205: .CW /etc/rc ! 1206: (or whatever passes for system startup on your system). ! 1207: This simply removes any lockfiles in ! 1208: .CW $FM/locks . ! 1209: .NH ! 1210: Media Management ! 1211: .PP ! 1212: An attempt has been made to provide generic media management programs. ! 1213: For example, the recovery servers ! 1214: .I fetchw_ ! 1215: and ! 1216: .I fetchj_ ! 1217: are instances of a single device server and a jukebox server respectively. ! 1218: To make this work, a media library is used. ! 1219: To use a new media type, such as Exabyte tapes, ! 1220: implement the routines in the library, and link the library with ! 1221: .CW fm/media_.o ! 1222: or ! 1223: .CW fm/mmedia_.o . ! 1224: An informal description of the routines follows. ! 1225: .LP ! 1226: .CW "mediainit(char *device, char *vol_id)" ! 1227: .ti +5n ! 1228: Initialise the media on the device specified by !
1229: .I device . ! 1230: The latter may be a full name or any recognizable abbreviation. ! 1231: If ! 1232: .I vol_id ! 1233: is given, it is checked against the media present. ! 1234: .LP ! 1235: .CW "char *mediamount(char *vol_id)" ! 1236: .ti +5n ! 1237: Mount the media named ! 1238: .I vol_id ! 1239: and return the appropriate device ! 1240: (suitable for use by ! 1241: .I mediainit ). ! 1242: Currently, the values given as ! 1243: .I vol_id ! 1244: are those returned by ! 1245: .I medianame ! 1246: (below). ! 1247: .LP ! 1248: .CW "medianame(char *volume)" ! 1249: .ti +5n ! 1250: Return the media name containing ! 1251: .I volume . ! 1252: .LP ! 1253: .CW "mediaopen(char *name, Media *m)" ! 1254: .ti +5n ! 1255: Set up ! 1256: .I m ! 1257: to point at the backup copy ! 1258: .I name . ! 1259: The fields in a ! 1260: .CW Media ! 1261: include a file descriptor, preferred read block size, and copy size. ! 1262: .LP ! 1263: .CW "void mediafiles(int32 v, int32 n, Media *m, Tb **bp)" ! 1264: .ti +5n ! 1265: Return a Media and a list (in ! 1266: .CW *bp ) ! 1267: of backup copy pointers for all backup copies more recent than ! 1268: file ! 1269: .I n ! 1270: in volume ! 1271: .I v . ! 1272: A ! 1273: .CW Tb ! 1274: has the creation time and initial (1K) block number for a backup copy. ! 1275: (It is used by ! 1276: .I dbupdate ). ! 1277: The size returned in ! 1278: .I m ! 1279: is not actually a size but the number of records in ! 1280: .CW *bp . ! 1281: .PP ! 1282: This is not a complete description; ! 1283: if you have to write new versions of these routines, ! 1284: look at the existing implementations (in ! 1285: .CW $FMSRC/media ) ! 1286: and the programs that use them ! 1287: (all in ! 1288: .CW $FMSRC/fm ). ! 1289: .NH ! 1290: References ! 1291: .LP ! 1292: |reference_placement