Annotation of researchv10dc/vol2/fm/fm.ms, revision 1.1.1.1

1.1       root        1: .so ../ADM/mac
                      2: .XX backup 593 "The File Motel: An Owner's Manual"
                      3: .nr dP 2
                      4: .nr dV 3p
                      5: .TL
                      6: The File Motel:
                      7: .br
                      8: An Owner's Manual
                      9: .AU
                     10: Andrew G. Hume
                     11: .AI
                     12: .MH
                     13: .AB
                     14: .PP
                     15: The File Motel is an incremental user-level file backup system for
                     16: .UX
                     17: systems.
                     18: The first version of the File Motel has been in successful operation
                     19: for over two years with three sites supporting about 50 systems.
                     20: The first version supported only Ninth Edition
                     21: .UX
                     22: systems, although with only modest inconvenience
                     23: files could be saved from Sun 3 clients.
                     24: The second version of the File Motel is a complete reworking
                     25: of the original system, emphasizing easy portability to most
                     26: .UX
                     27: systems.
                     28: The files are stored in a machine-independent form;
                     29: as an example, I have recovered a directory onto a Sun 3
                     30: from a server on a MIPS 120/5 that had been originally
                     31: saved from a Cray X-MP/24.
                     32: The user and administrative interfaces have been streamlined,
                     33: based on experience in the field.
                     34: .PP
                     35: The system has been restructured to look like a kit.
                     36: Most of the modules, such as the database, networking and media code,
                     37: have been isolated via simple interface routines.
                     38: As with a kit, you may not find exactly what you want,
                     39: but it should be easy to roll your own.
                     40: .AE
                     41: .2C
                     42: .NH
                     43: An Overview
                     44: .PP
                     45: This is a manual for the File Motel |reference(file motel usenix),
                     46: a backup system for
                     47: .UX
                     48: systems.
                     49: The File Motel consists of a central server system
                     50: servicing many client systems.
                     51: The server system is almost always also a client system.
                     52: The File Motel saves only the files that change on any given client system,
                     53: using a database to record what versions
                     54: have been saved for any particular file.
                     55: Under normal usage patterns, this is on the order of 1\-5% of the user files
                     56: on the client.
                     57: This makes backup practical over slow networks to slow backup media.
                     58: .PP
                     59: The daily routine in the File Motel starts around midnight when
                     60: the clients send copies of any new or recently modified files to the server machine.
                     61: After receiving the files from all the clients,
                     62: a separate processing step transforms the received files into
                     63: backup copies, which are then written to the backup media
                     64: of your choice (typically, WORM disks).
                     65: Backup and recovery can be performed by anyone with the appropriate permissions;
                     66: in general, there is no administrative overhead other than
                     67: fussing with backup media.
                     68: .PP
                     69: This description may be a little clearer with some details.
                     70: The following description includes
                     71: some sample numbers from our Center's File Motel;
                     72: other sites will differ.
                     73: The first step is a client sending files to the server.
                     74: A shell script configured for the client generates a list of (say 5000)
                     75: candidates for backup (say, all the files changed in the last week).
                     76: This list is sent to the server which returns a list of the files (say 900)
                     77: that really need to be backed up.
                     78: Each of these files is then transmitted to the server together with a header
                     79: (which includes a checksum).
                     80: (On average, about 15MB is sent taking about 20 minutes.)
                     81: There is an acknowledgement from the server after every file;
                     82: this allows graceful termination when the server has problems,
                     83: such as running out of space.
                     84: The redundancy in the candidate list allows non-critical clients
                     85: to cope with transient faults (such as a broken network) without
                     86: administrative intervention by ignoring the fault
                     87: and getting the files the following night.
                     88: The client process is normally initiated either by the server machine
                     89: or by
                     90: .I cron (8).
                     91: Exactly the same mechanism is used for user-initiated backups;
                     92: the only difference is that the system backup is executed by the super-user
                     93: (and thus has access to all the files on the client system).
                     94: .PP
                     95: The files sent by the clients are kept in receiving areas, each made up of
                     96: 32 subdirectories that are processed in turn.
                     97: First, the program
                     98: .I sweep
                     99: deletes any unnecessary files, assigns each backup copy a name
                    100: (which is stored in the file's header) and recalculates the file's checksum.
                    101: The remaining files are fed to
                    102: .I dbupdate
                    103: which deletes any unnecessary files and stores the version information
                    104: in the database.
                    105: Finally, the surviving files are moved to a staging area for writing to the
                    106: backup media.
                    107: .PP
                    108: The last step is writing the backup copies to the backup media.
                    109: The only medium currently supported is a WORM disk.
                    110: In our environment they are preferred because of their large capacity
                    111: and because you can get reliable jukeboxes (automatic disk changers).
                    112: Optical jukeboxes come in all sorts of sizes; our center has a
                    113: SONY WDA 3000-10 with a total capacity of 164GB (328GB after October 1989).
                    114: .PP
                    115: There are many other programs in the File Motel,
                    116: some intended for the user (for example, recovering files),
                    117: and others for the administrator (usage statistics, backing up the database).
                    118: .PP
                    119: The rest of this manual is intended for the caretaker of the File Motel.
                    120: Section 2 details some of the
                    121: peculiar aspects of the File Motel that have caused problems in the past;
                    122: if you can survive these,
                    123: then installing and running the File Motel
                    124: ought to be easy.
                    125: If there are incompatibilities, then installing the File Motel
                    126: will require (perhaps substantial) work.
                    127: The File Motel uses many small single-purpose tools;
                    128: if you need to figure out what is going on (or wrong, as the case may be),
                    129: these tools are described in Section 3.
                    130: Sections 4 and 5 are step by step instructions for installing
                    131: a client and a server respectively.
                    132: Finally, Section 6 elaborates on media management.
                    133: .NH
                    134: Some Things You Should Know
                    135: .PP
                    136: .de BL
                    137: .IP \ \ \ \ \s+3\(bu\s-3
                    138: ..
                    139: This section describes some of the assumptions underlying the
                    140: construction of the File Motel software.
                    141: Most of these assumptions have caused problems in porting to
                    142: systems less hospitable than 10th Edition
                    143: .UX
                    144: (or V10 for short).
                    145: .BL
                    146: Each server has one global name space for all the files saved from all the clients.
                    147: The file
                    148: .I z
                    149: from machine
                    150: .I mach
                    151: is stored under the name
                    152: .I /n/mach/z .
                    153: It so happens that this is how V10 networked file systems are normally mounted
                    154: and in fact, all file references actually go through this network name.
                    155: For other systems, you should define the
                    156: .CW -DNO_NETNAME
                    157: switch as described in Section 5.
                    158: .BL
                    159: All client\-service communication uses a uniform networking interface.
                    160: That is, a system invokes a service on the remote machine and gets a
                    161: pair of file descriptors attached to the input and output of that service.
                    162: Both Berkeley-style sockets and V10-style IPC are supported.
                    163: For the case of a single system that is both a server and client
                    164: and has no networking, you will have to write an execution service that
                    165: constructs pipes to the desired services.
                    166: Note that it is possible to provide this interface even if all
                    167: the networking you have is a user program (such as
                    168: .I rx
                    169: or
                    170: .I rsh )
                    171: that executes a program on another machine.
                    172: .BL
                    173: It must be possible to nominate the user that the remote service runs as.
                    174: Most run as a regular user, say
                    175: .I fmdaemon ,
                    176: but some must run as superuser and on V10 systems, one must run as
                    177: .I bin .
                    178: .BL
                    179: The code is reasonably portable; with the canned configuration files
                    180: it runs on a Cray X-MP/24 (UNICOS),
                    181: VAX 11/750, Microvax II, 8550, 8600 and 11/780 (V10, Ultrix, 4.3BSD),
                    182: Sun 3 (SunOs 4.0),
                    183: MIPS M120/5, M2000 (UMIPS 3.0, 3.10, RISC/os 4.0).
                    184: Some of the code, notably Ken Thompson's new version of
                    185: .CW doprint ,
                    186: makes assumptions about variable argument lists.
                    187: So far, the code has continued to work on all the systems we have tried
                    188: (although we can't optimise on the MIPS)
                    189: but in this world of perverse hardware and compilers, you may not be so lucky.
                    190: .BL
                    191: The code assumes no particular byte-ordering but does assume that there
                    192: is an integer type of at least 32 bits.
                    193: By and large, the programs allocate all data areas dynamically;
                    194: whenever there is a choice, programs trade space for smaller runtime,
                    195: so there must be at least 24 bits of data space.
                    196: If you have a 32 bit machine but have 16 bit ints,
                    197: you will have trouble (perhaps
                    198: .I lint
                    199: will help).
                    200: .BL
                    201: It is assumed that the backup medium can hold at least one backup copy
                    202: and practically, it should hold at least one volume.
                    203: This is about 20MB by default; if you have smaller backup media
                    204: and cannot arrange better, change the volume size \(em it is
                    205: a constant defined in
                    206: .CW fm/sweep.c .
                    207: .BL
                    208: The File Motel depends on each file having a unique name.
                    209: This continues to cause problems, particularly in the presence of symbolic links.
                    210: For example, on a system I use
                    211: .CW /usr/andrew
                    212: is a symbolic link to
                    213: .CW /usr2/guest/andrew .
                    214: The right thing to do is to save files under
                    215: .CW /usr/andrew
                    216: (so that you can move them from file system to file system and keep their name).
                    217: Yet, the user may not be aware of this name; if they do a
                    218: .I pwd
                    219: to find out, they will get the wrong answer.
                    220: .NH
                    221: A Detailed Description
                    222: .PP
                    223: The action in the File Motel can be functionally divided into four areas:
                    224: client selecting and sending files to the server,
                    225: the server processing the client files onto backup media,
                    226: client recovering files back from the server,
                    227: and an assortment of administrative functions.
                    228: .PP
                    229: Programs and scripts used by the File Motel live in three places:
                    230: .CW /usr/bin/backup
                    231: is the user interface,
                    232: .CW /usr/lib/filemotel
                    233: holds all the programs and scripts used by clients,
                    234: and
                    235: .CW /usr/filemotel/bin
                    236: holds the server-specific programs.
                    237: These are the conventional names \(em they can be reconfigured to taste.
                    238: Because of this and their length,
                    239: these abbreviations will be used in the following text:
                    240: .TS
                    241: center;
                    242: lFCW lFCW.
                    243: $FM    /usr/filemotel
                    244: $FB    /usr/filemotel/bin
                    245: $FL    /usr/lib/filemotel
                    246: .TE
                    247: .NH 2
                    248: Client Sends Files to the Server
                    249: .PP
                    250: The controlling script here is
                    251: .CW $FL/doclient :
                    252: .P1
                    253: #!/bin/sh
                    254: $FL/sel | $FL/act
                    255: .P2
                    256: The selection script
                    257: .I sel
                    258: has to generate a list of absolute filenames.
                    259: You can use any tools available to you; the File Motel supplies
                    260: the program
                    261: .I fcheck
                    262: which is rather more efficient than
                    263: .I find (1)
                    264: and follows symbolic links that are arguments.
                    265: This is to help clients save files as
                    266: .CW /usr/andrew/...
                    267: rather than the less than informative
                    268: .CW /usr2/guest/andrew/... .
                    269: A small
                    270: .I sel
                    271: file is shown below.
                    272: .KF
                    273: .P1 0
                    274: /usr/lib/filemotel/fcheck 512 7 /etc /usr/* |
                    275: sed -e '/\e.o$/d
                    276: /\e/a\e.out$/d
                    277: /\e/core$/d
                    278: /\e/foo$/d
                    279: /^\e/usr\e/tmp\e//d
                    280: /^\e/usr\e/spool\e//d'
                    281: cat <<EOF
                    282: /unix
                    283: EOF
                    284: .P2
                    285: .KE
                    286: .PP
                    287: The script
                    288: .I act
                    289: works in a straightforward way.
                    290: First, the filenames are transformed into the input format for
                    291: .I missing
                    292: by the program
                    293: .I iprint .
                    294: This prepends
                    295: .CW /n/\fImachine
                    296: to the filename (unless this is already there) and appends the
                    297: inode change time and size.
                    298: There is a convention that an input filename starting with a
                    299: .CW //
                    300: is a symbolic link to be followed (that is, use
                    301: .I stat (2)
                    302: rather than
                    303: .I lstat (2)
                    304: to get the time and size).
                    305: The size is carried around so that if you choose a file because it is small
                    306: and it grows dramatically while you are asking about it, you can reject it
                    307: later on (although this is not done now because no one cares yet).
                    308: .I Missing
                    309: takes these names and ships them to the corresponding server
                    310: .I missing_
                    311: on the server machine.
                    312: (Servers for a service
                    313: .CW abc
                    314: are called
                    315: .CW abc_ ).
                    316: .I Missing_
                    317: checks the (name, time) tuples against the database and sends back
                    318: the lines that are newer than the entry in the database.
                    319: Transmissions in both directions are checksummed; any errors
                    320: are reported to standard error and are also logged in the log file
                    321: on the server machine.
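The comparison made by missing_ can be sketched in a few lines of shell and awk. This is an illustration only, not the File Motel code. It assumes a hypothetical flat file of "name time" pairs standing in for the real database, names that contain no white space, and candidate lines of the form "name time size". The function name newer_than_db is invented for the sketch.

```shell
#!/bin/sh
# Hypothetical stand-in for the database check made by missing_.
# $1:     flat "database" file, one "name time" pair per line
#         (an assumption; the real system keeps this in its database).
# stdin:  candidate lines of the form "name time size".
# stdout: the candidate lines that have no database entry, or whose
#         time is newer than the recorded one.
newer_than_db() {
	awk -v db="$1" '
	FILENAME == db { saved[$1] = $2; next }      # load the database file
	!($1 in saved) || $2 + 0 > saved[$1] + 0     # print new or newer candidates
	' "$1" -
}
```

A candidate whose time equals the recorded time is dropped, which matches the description that only lines newer than the database entry are sent back.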
                    322: .PP
                    323: The results from
                    324: .I missing
                    325: are stored in
                    326: .CW $FL/files.\fIday\fP .
                    327: They are given to
                    328: .I fmpush
                    329: which actually pushes them to the backup system.
                    330: .I Fmpush
                    331: also takes a system name argument for logging purposes.
                    332: If there are any errors,
                    333: .I fmpush
                    334: reports the error and the number of files transmitted.
                    335: This allows the push to be restarted efficiently:
                    336: .P1
                    337: $ pwd
                    338: $FL
                    339: $ fmpush wild < files.Tue
                    340: EOF after 2713 files sent.
                    341: $ sed 1,2713d files.Tue | fmpush wild
                    342: .P2
                    343: Any diagnostics are mailed to the user
                    344: .I backup
                    345: and also kept in
                    346: .CW $FL/files.\fIday\fP.sho .
                    347: It is not necessary to keep these files around after they have been used
                    348: but they are relatively small and often useful;
                    349: for example, a client who normally saves 100 or so files suddenly sends
                    350: you 10,000 files \(em you can quickly go to that client and check
                    351: what the files were.
                    352: When possible, diagnostics are also logged on the backup system.
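The manual restart shown earlier in this section can be wrapped in a small helper. The sketch below is hypothetical: the name unsent is invented, and it relies only on the one diagnostic format quoted in the example ("EOF after 2713 files sent."). Other fmpush failures may print something different.

```shell
#!/bin/sh
# Print the part of a file list that fmpush has not yet sent.
# $1: the diagnostic printed by fmpush, e.g. "EOF after 2713 files sent."
# $2: the list file that was being pushed, e.g. files.Tue
# The message format is an assumption based on the single example in the text.
unsent() {
	n=$(echo "$1" | sed -n 's/.*after \([0-9][0-9]*\) files sent.*/\1/p')
	case "$n" in
	'')	echo "unsent: no file count in: $1" >&2; return 1 ;;
	0)	cat "$2" ;;             # nothing was sent, so resend the whole list
	*)	sed "1,${n}d" "$2" ;;   # drop the names that were already sent
	esac
}
```

With this in place, the restart above could be written as: unsent "EOF after 2713 files sent." files.Tue | fmpush wild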
                    353: .PP
                    354: Only regular files, symbolic links and directories have their contents saved;
                    355: all other files (such as devices) just have their
                    356: .I stat (2)
                    357: buffers saved.
                    358: To preserve machine independence, the content of a directory is saved
                    359: as a list of null-terminated element names.
                    360: This removes the need for the server to be able to guess a
                    361: client's directory structure, although it does lose a small
                    362: amount of subtle information contained in the freed slots of the
                    363: directory.
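As a rough picture of the saved form of a directory, the following sketch writes the entry names of a directory as a sequence of null-terminated strings. It is not the File Motel's own code. It assumes that no entry name contains a newline, and it leaves out "." and "..", since the text does not say whether the real format keeps them. The function name dirnames is invented.

```shell
#!/bin/sh
# Emit the entries of directory $1 as NUL-terminated names.
# Assumptions: no name contains a newline; "." and ".." are omitted.
dirnames() {
	ls -A "$1" | tr '\n' '\0'
}
```

Because only names are written, nothing in the output depends on the client's on-disk directory layout.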
                    364: .........
                    365: .NH 2
                    366: Server Processes Client's Files
                    367: .PP
                    368: Received files are processed by the script
                    369: .CW $FL/munge .
                    370: This processing is decoupled from either receiving or restoring
                    371: client files; for example,
                    372: it is safe to process files while receiving them.
                    373: Munging is typically started by
                    374: .I cron ,
                    375: but you can also cause
                    376: .I rcv
                    377: to invoke
                    378: .I munge
                    379: automatically,
                    380: and you can invoke
                    381: .I munge
                    382: manually by executing
                    383: .CW $FL/callmunge .
                    384: .PP
                    385: Regardless of how it is called,
                    386: .I munge
                    387: scans the 32 receiving subdirectories in each of the receiving areas in
                    388: .CW $FM/adm/rcvdirs
                    389: looking for files to process.
                    390: If it finds any, it calls a program
                    391: whose name is supplied as
                    392: .CW $PROCPERM
                    393: to copy the final copies
                    394: to the media of your choice.
                    395: It repeats this scan until a scan finds nothing to do.
                    396: .PP
                    397: The action within a subdirectory is simple.
                    398: .CW $FB/sweep
                    399: looks for files with mode
                    400: .CW 0 ,
                    401: .CW 0400 ,
                    402: or
                    403: .CW 0600 .
                    404: Mode
                    405: .CW 0
                    406: files are files that are being received
                    407: (\fIrcv\fP
                    408: marks a file as done by changing its mode to
                    409: .CW 0600 )
                    410: and are ignored unless they haven't been modified within the
                    411: last 12 hours.
                    412: In this latter case, the file is regarded as stale (almost always a network
                    413: connection was dropped) and unlinked.
                    414: Mode
                    415: .CW 0400
                    416: files have been already processed by
                    417: .I sweep
                    418: but for some reason (most often, running out of space)
                    419: weren't copied to the backup area.
                    420: Mode
                    421: .CW 0600
                    422: files are assigned a backup copy name and after recalculating the
                    423: checksum are changed to mode
                    424: .CW 0400 .
                    425: .I Sweep
                    426: emits the names of all the files ready to be copied to the backup area
                    427: and this is saved in a file.
                    428: .I Munge
                    429: then makes any needed directories in the backup area that don't exist.
                    430: .CW $FB/fmmv
                    431: then moves all the files to be copied to the backup area.
                    432: We then update the database with information from the files we just copied.
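The mode conventions used by sweep amount to a small decision table. The sketch below restates it as a shell function; it is an illustration, not the sweep source. The function name and the labels it prints are invented, and treating a file of exactly 12 hours as stale is an assumption about the boundary.

```shell
#!/bin/sh
# Decide what sweep would do with a file in a receiving subdirectory.
# $1: permission bits in octal (0, 0400 or 0600 are the interesting ones)
# $2: whole hours since the file was last modified
# Prints one of: receiving, stale, retry, ready, ignore (invented labels).
sweep_action() {
	mode=$1 age_hours=$2
	case "$mode" in
	0|000|0000)
		# still being received, unless untouched for 12 hours
		if [ "$age_hours" -ge 12 ]; then echo stale; else echo receiving; fi ;;
	400|0400)	echo retry ;;   # swept earlier but not yet copied to the backup area
	600|0600)	echo ready ;;   # complete: assign a name, recompute checksum, set 0400
	*)		echo ignore ;;  # any other mode is not looked at
	esac
}
```

Stale files are the ones that sweep unlinks; retry and ready files are the ones whose names it emits for copying.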
                    433: .PP
                    434: Updating the database is a two part process.
                    435: Run
                    436: .CW $FB/updatef
                    437: on the files (use the program
                    438: .I updatew
                    439: for files on the WORM)
                    440: and then feed the output to
                    441: .CW $FB/dbupdate .
                    442: In this way, we guarantee that the database is purely a function of
                    443: the backed up files
                    444: (assuming none get lost between the backup area and the backup media).
                    445: The input to
                    446: .I dbupdate
                    447: is (roughly) a sequence of backup file headers and contents of
                    448: backed up directories.
                    449: .I Dbupdate
                    450: updates the various databases (described below) and sometimes tries to unlink
                    451: the backup copies.
                    452: (This happens when two copies of the same file are
                    453: in the same receiving subdirectory.
                    454: .I Sweep
                    455: happily copies both to the backup area but when
                    456: .I dbupdate
                    457: goes to update the main database for the second copy, it discovers
                    458: it already has this copy and so unlinks the second copy.
                    459: It doesn't care if the unlink fails because this is just an attempt
                    460: to be space efficient and in any case, the unlink can fail only if the file
                    461: has already been committed to the backup media.)
                    462: .I Dbupdate
                    463: also appends accounting statistics records for each file, containing the time
                    464: the file was saved, the size, the owner and the system name, to the file
                    465: .CW $FM/stat.log .
                    466: .PP
                    467: After
                    468: .I munge
                    469: is finished scanning the receive areas,
                    470: it processes the statistics records generated by
                    471: .I dbupdate
                    472: by
                    473: calling
                    474: .CW $FB/procstats .
                    475: This reads (and then truncates)
                    476: .CW $FM/stat.log
                    477: and adds new records to the files
                    478: .CW $FM/stat/\fIsystem\f(CW .
                    479: These records are in machine independent format and have been collapsed to
                    480: refer to all the files per user/day combination.
                    481: Even in this compressed format,
                    482: the statistics records would grow without bound.
                    483: Accordingly,
                    484: .I munge
                    485: calls
                    486: .CW "procstats -c"
                    487: to further collapse together all the records older than 30 days for each user.
                    488: (The number 30 comes from the only program that looks at these statistics,
                    489: .CW "backup stats" .)
                    490: .PP
                    491: Throughout its work,
                    492: .I munge
                    493: checks to see if it should exit by checking the existence of a guard file;
                    494: this is created by
                    495: .CW $FB/stopmunge .
                    496: .NH 2
                    497: The Databases on the Server
                    498: .PP
                    499: There are three databases kept on the server, all conventionally kept in
                    500: .CW $FM/db .
                    501: The first,
                    502: .CW filemap ,
                    503: is the main and only required database;
                    504: it contains the mappings from filename to last modify date and
                    505: from (filename, modify date) tuple to backup copy name.
                    506: The second,
                    507: .CW dir ,
                    508: is optional and maps (directory, modify date) tuples to their contents.
                    509: It is used to make recovery of file trees go (much) faster.
                    510: The third,
                    511: .CW fs ,
                    512: is optional and maps (filename, modify date) tuples to their
                    513: .I stat
                    514: buffers.
                    515: It is used to implement the backup file system.
                    516: .PP
                    517: The default implementation of these databases is Peter Weinberger's
                    518: compressed B-trees (see
                    519: .I cbt (1)).
                    520: (The compression refers to eliding common prefixes
                    521: of successive keys; it does very well on the pathnames used by the File Motel.)
                    522: The
                    523: .I cbt
                    524: database
                    525: .I db
                    526: consists of two files,
                    527: .I db\f(CW.T\fR
                    528: (the tree part) and
                    529: .I db\f(CW.F\fR
                    530: (the data part).
                    531: As the
                    532: .I cbt
                    533: routines do not reclaim space, the
                    534: .CW .T
                    535: file can start growing at a very fast rate when the tree is large
                    536: (say four levels).
                    537: This has proved to be a real nuisance so there is considerable support
                    538: for periodic squashing of the database
                    539: (which reclaims space by rebuilding the database) and
                    540: for supporting the
                    541: .I filemap
                    542: database as a collection of separate databases.
                    543: .PP
                    544: The latter is intended to be used in the following way.
                    545: The file
                    546: .CW $FM/db/filemap
                    547: is always the current
                    548: .I filemap
                    549: database.
                    550: If the file
                    551: .CW $FM/db/filemaplist
                    552: exists, it is taken as a list of database names,
                    553: one per line in oldest to newest order, to be used in addition to
                    554: .CW $FM/db/filemap .
                    555: These are searched only, never updated.
                    556: At our site, we produce one of these databases for about every 15\-16GB
                    557: of backup files.
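.PP
The layout described above can be sketched in shell.
This is only an illustration, not part of the File Motel: the function name
.CW list_filemaps
is hypothetical, and the text does not say in what order the real software
searches the databases, so the sketch simply prints them oldest first with the
current database last.

```shell
# Hypothetical helper: print every filemap database that would be consulted,
# oldest first, ending with the current database.
# FM is the root of the File Motel; /usr/backup is the site default in the text.
FM=${FM:-/usr/backup}

list_filemaps() {
	# Older, search-only databases, one name per line (the file is optional).
	if [ -f "$FM/db/filemaplist" ]; then
		cat "$FM/db/filemaplist"
	fi
	# The current database always exists and is the only one updated.
	echo "$FM/db/filemap"
}
```

With no
.CW filemaplist
file the function prints just the current database, which matches the
single-database case.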
                    558: .NH 2
                    559: Server Sends Files to the Client
                    560: .PP
                    561: All requests for files go through a central server
                    562: .CW $FB/fetch_ .
                    563: This program simply farms out work to other programs.
                    564: .I Fetchf
                    565: attempts to find files that are still under
                    566: .CW $FM/v .
                    567: Systems with plenty of mass storage can leave the backup copies
                    568: online and things will go quite fast.
                    569: For the files that
                    570: .I fetchf
                    571: can't find,
                    572: .I fetch_
                    573: looks in the configuration file
                    574: (\f(CW$FL/conf\fP)
                    575: to determine the backup media (say
                    576: .CW j
                    577: for jukebox).
                    578: It then calls
                    579: .CW $FB/fetchj
                    580: with the appropriate filenames.
                    581: If you just have a WORM drive, you should use
                    582: .CW $FB/fetchw
                    583: instead.
                    584: These two programs purport to be generic drivers for jukeboxes
                    585: and single drives; if you have different media (say an Exabyte tape),
                    586: you should be able simply to load the drivers with your media library
                    587: to generate the appropriate fetch program.
                    588: More details are given below in the section on media management.
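.PP
The choice of media-specific fetch program can be sketched in shell.
This is a guess at the logic, not the real
.I fetch_ .
It assumes the media letter is the second line of the configuration file
(server name, media type, root directory), as in the sample server
configuration given in the server installation notes;
the function name
.CW fetch_driver
is hypothetical.

```shell
# Hypothetical sketch: choose the media-specific fetch program.
# Assumes conf holds the media letter on its second line,
# e.g. "j" for jukebox.
fetch_driver() {
	conf=$1		# path to the configuration file, e.g. $FL/conf
	bindir=$2	# directory of server binaries, e.g. $FB
	media=$(sed -n 2p "$conf")
	if [ -z "$media" ]; then
		echo "fetch_driver: no media type in $conf" >&2
		return 1
	fi
	echo "$bindir/fetch$media"
}
```

Called as
.CW "fetch_driver $FL/conf $FB"
with the sample configuration, this prints the path of
.CW fetchj .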
                    589: .PP
                    590: Users can generally access any files they have read permission for,
                    591: regardless of what system they are on or the system from which the files were
                    592: stored.
                    593: In addition, we trust our users (or more importantly, our network)
                    594: and so we do no checking of a user's right to retrieve files.
                    595: Such checking, such as a password, can easily be added to the startup
                    596: protocol between the program the user calls (\f(CW$FL/fetch\fP)
                    597: and the server (\f(CW$FB/fetch_\fP).
                    598: .NH 2
                    599: Administrivia
                    600: .PP
                    601: This section is a bunch of administrative odds and ends for the way we organise
                    602: the File Motel in our Center.
                    603: Your details may be different, and indeed ours change over time,
                    604: but the examples are probably helpful.
                    605: .NH 3
                    606: File Layout
                    607: .PP
                    608: We store the File Motel under the directory
                    609: .CW /usr/backup ,
                    610: which is a file system large enough to hold comfortably the current
                    611: databases and a squashed version (more on this later).
                    612: The receiving area is another smallish file system (about 120MB) mounted on
                    613: .CW /usr/backup/rcv
                    614: and the holding area
                    615: .CW /usr/backup/v
                    616: is another file system of the same size.
                    617: This is done to isolate the effects of client excesses;
                    618: the sending processes all know how to deal with running out of space
                    619: (we practice often).
                    620: I regard running out of space once a month as tolerable;
                    621: once a week is too much.
                    622: To aid searches for files, we keep a file
                    623: .CW /usr/backup/filenames ), (
                    624: which is a sorted list of all filenames.
                    625: This is maintained by the database squasher.
                    626: .PP
                    627: The main drawback to Weinberger's B-tree software is that it does
                    628: not reclaim space in the tree.
                    629: Thus, over time the tree file gets huge
                    630: (the rate grows as the depth of the tree).
                    631: The fix is to periodically squash the tree.
                    632: We combine this with dumping the database to WORM disk in the script
                    633: .CW $FB/backupdb .
                    634: .NH 3
                    635: Talking to the Clients
                    636: .PP
                    637: We have found it best to call the clients rather than have them call us.
                    638: The load seems more balanced and things get done sooner.
                    639: We use
                    640: .I mk
                    641: as it handles parallel processing; a typical mkfile is
                    642: .P1 0
                    643: CLIENTS=Cwild C3k Ctcp!tempel
                    644: NPROC=3
                    645: 
                    646: clients:VQ:    $CLIENTS
                    647:        PROCPERM=$FB/toworm $FB/munge
                    648: 
                    649: C%:VQ:
                    650:        set +e; $FB/callclient $stem; exit 0
                    651: .P2
                    652: Understanding this completely requires familiarity with
                    653: .I mk
                    654: but the intent is clear.
                    655: We first get the files from the clients by the
                    656: .CW C%
                    657: rule and then process them by
                    658: .CW munge
                    659: and then put them out on WORM disk by
                    660: .CW toworm .
                    661: As we use Datakit, most clients are called using Datakit but some
                    662: (like
                    663: .CW tempel
                    664: in the mkfile) are called using TCP/IP.
                    665: This convenient piece of magic works on V9 because of Dave Presotto's
                    666: clever design of the IPC system; you may have to work harder.
                    667: The
                    668: .CW set
                    669: stuff in the
                    670: .CW C%
                    671: rule means to keep on processing even if a client gets an error.
                    672: The entry in
                    673: .CW /etc/crontab
                    674: is more or less
                    675: (this is one physical line folded at the \*(cr because of the column width)
                    676: .P1 0
                    677: eval "cd /usr/backup/adm; mk clients 2>&1" |\*(cr
                    678: mail backup
                    679: .P2
                    680: We use the
                    681: .CW backup
                    682: mailbox to redirect mail to someone appropriate.
                    683: .NH 2
                    684: Disasters
                    685: .PP
                    686: Currently, the only disaster we have had that was not the result of a kernel bug
                    687: is running out of space;
                    688: this is either inconvenient or quite bad.
                    689: Running out of space in the receiving or safe areas is just inconvenient.
                    690: By default, when the receiving area runs out of space, the client's
                    691: .I fmpush
                    692: stops after saying how many files got
                    693: transmitted.
                    694: This is enough information to resend the rest when convenient.
                    695: Alternatively, you can change
                    696: .CW $FL/act
                    697: to give
                    698: .I fmpush
                    699: the
                    700: .CW -r
                    701: flag; this means that it will retry sending files every hour or so
                    702: until it succeeds.
                    703: Running out of space in the holding area is also not too bad;
                    704: eventually
                    705: .I munge
                    706: will put the holding area onto the backup media and then cycle through
                    707: the receiving area again.
                    708: .PP
                    709: The worst effect of running out of space is ruining your database.
                    710: (This happens rarely for us as
                    711: we keep our databases on a file system apart from the receiving/holding areas.)
                    712: Rebuilding the database is not too hard.
                    713: First, find out the next backup name to be assigned
                    714: (by a
                    715: .CW "sweep -n"
                    716: or by examining the backup media and holding areas).
                    717: Then, get the most recent backup copy of your database and install it.
                    718: Set the next backup name you found in the first step with
                    719: .CW "sweep -s" .
                    720: You then need to extract the database information for each file added
                    721: to the database since the backup copy of the database was made.
                    722: The starting file name is stored in the
                    723: .CW .N
                    724: file by
                    725: .I backupdb .
                    726: The program
                    727: .I updatew
                    728: will extract this from files on a WORM, and
                    729: .I updatef
                    730: from regular disk files.
                    731: The result is fed to
                    732: .I dbupdate
                    733: as done in
                    734: .I munge .
                    735: .NH
                    736: Installing the File Motel on a Client System
                    737: .PP
                    738: The following instructions assume you have the source
                    739: .CW fm.cpio
                    740: somewhere, say
                    741: .CW /tmp/fm.cpio .
                    742: Note also that these instructions will change over time;
                    743: you must follow the online copy of this document included with the source.
                    744: .IP [1]
                    745: You will need version 3 of
                    746: .I mk
                    747: (or any version
                    748: dated later than Mar 11, 1989).
                    749: .IP [2]
                    750: Select the root directory for the source, set the
                    751: environment variable
                    752: .CW FMSRC
                    753: to its name
                    754: and export
                    755: .CW FMSRC .
                    756: For example,
                    757: .P1
                    758: FMSRC=/usr/filemotel/src
                    759: export FMSRC
                    760: .P2
                    761: .IP [3]
                    762: Install the source tree by
                    763: .P1
                    764: cd $FMSRC/..; cpio -iudc < /tmp/fm.cpio
                    765: .P2
                    766: .IP [4]
                    767: Create a
                    768: .CW CONF
                    769: file.
                    770: This describes your installation environment
                    771: and is included in lower-level mkfiles.
                    772: The various switches are described in detail below;
                    773: however, the easiest way is to start with one of the
                    774: sample configuration files in the directory
                    775: .CW $FMSRC/conf .
                    776: .IP [5]
                    777: If necessary, create the repository for client files:
                    778: .P1
                    779: mkdir /usr/lib/filemotel
                    780: .P2
                    781: (this is configurable, see
                    782: .CW FMLIB
                    783: below)
                    784: and if you have not defined
                    785: .CW NO_NETNAME
                    786: in
                    787: .CW CONF ,
                    788: ensure that
                    789: .CW /n/clientname
                    790: is a link to
                    791: .CW / .
                    792: .IP [6]
                    793: Initialise the source tree for compiling by
                    794: .P1
                    795: mk depend
                    796: .P2
                    797: This only needs to be done once.
                    798: If you have to repeat, you can undo this by
                    799: .P1
                    800: mk undepend
                    801: .P2
                    802: .IP [7]
                    803: Compile and install the client software
                    804: by
                    805: .P1
                    806: mk client
                    807: .P2
                    808: This can be repeated as often as you like.
                    809: Only files in
                    810: .CW $FMLIB
                    811: and the file
                    812: .CW $FMBIN/backup
                    813: (these are configurable, see
                    814: .CW FMBIN
                    815: below)
                    816: are affected.
                    817: .IP [8]
                    818: Set up the dialstring of the server system by
                    819: .P1
                    820: echo server-machine-name > $FMLIB/conf
                    821: .P2
                    822: The name should match the type of IPC you selected in
                    823: .CW CONF .
                    824: For example, 
                    825: .TS
                    826: center;
                    827: c c
                    828: l lFCW.
                    829: IPC    Example
                    830: Datakit        nj/astro/wild
                    831: Datakit        wild
                    832: IP     wild.astro.nj.att.com
                    833: .TE
                    834: .IP [9]
                    835: In theory, you are now operational.
                    836: A couple of small tests are described in the file
                    837: .CW SANITY .
                    838: Some common problems and their cures are described below.
                    839: .IP [10]
                    840: You need to construct the script
                    841: .CW $FMLIB/sel
                    842: which prints the names of files that you want backed up.
                    843: There is a sample script (\f(CWsample.sel\fP) in that directory.
                    844: Be careful not to back up networked file systems by mistake.
                    845: .IP [11]
                    846: If you are initiating backup via
                    847: .I cron (8),
                    848: add the following command to
                    849: .CW crontab :
                    850: .P1
                    851: eval "/usr/lib/filemotel/sel | \*(cr
                    852: /usr/lib/filemotel/act 2>&1" | \*(cr
                    853: mail backup
                    854: .P2
                    855: The exact format varies from system to system;
                    856: the File Motel administrator should tell you what time to set it off.
                    857: .IP
                    858: If your client's backup is initiated from the server system,
                    859: you will have to add the line for
                    860: .CW fmclient
                    861: to your flavour of IPC services file.
                    862: If you communicate to the server by TCP/IP
                    863: (that is, your
                    864: .CW CONF
                    865: file has
                    866: .CW IPC=socket ),
                    867: get the
                    868: .I fmclient
                    869: line from the file
                    870: .CW tcp.inetd
                    871: and add it to
                    872: .CW /etc/inetd.conf
                    873: (some systems use
                    874: .CW /usr/etc/inetd.conf )
                    875: and get the
                    876: .I fmclient
                    877: line from the file
                    878: .CW tcp.services
                    879: and add it to
                    880: .CW /etc/services .
                    881: You then need to prod
                    882: .I inetd
                    883: to look at the new files (commonly by sending it a hangup signal).
                    884: On some systems, like SunOS, you may need to prod name servers
                    885: such as the Yellow Pages as well.
                    886: .IP
                    887: If you use V10 IPC,
                    888: add the corresponding line for
                    889: .CW fmclient
                    890: from
                    891: .CW ipc.V10
                    892: to
                    893: .CW /usr/ipc/lib/serv.local .
                    894: The files
                    895: .CW tcp.inetd
                    896: and
                    897: .CW ipc.V10
                    898: are made by
                    899: .P1
                    900: cd $FMSRC; mk ipc.list
                    901: .P2
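.PP
Before running
.CW "mk client" ,
the prerequisites from steps 2, 4 and 5 can be checked with a small script.
This is a minimal sketch, not part of the distribution: the function name
.CW fm_preflight
is hypothetical, and it assumes that
.CW CONF
sits at the top of
.CW $FMSRC
and that
.CW FMLIB
defaults to
.CW /usr/lib/filemotel .

```shell
# Hypothetical preflight check for a client installation.
# Prints one line per missing prerequisite; returns nonzero if any is missing.
fm_preflight() {
	status=0
	# Step 2: FMSRC must name the source root.
	if [ -z "$FMSRC" ] || [ ! -d "$FMSRC" ]; then
		echo "FMSRC is unset or not a directory"
		status=1
	# Step 4: the CONF file is expected at the top of the source tree.
	elif [ ! -f "$FMSRC/CONF" ]; then
		echo "missing $FMSRC/CONF"
		status=1
	fi
	# Step 5: the repository for client files.
	lib=${FMLIB:-/usr/lib/filemotel}
	if [ ! -d "$lib" ]; then
		echo "missing directory $lib"
		status=1
	fi
	return $status
}
```

The function only reads the environment and the file system, so it is safe to
run repeatedly while working through the steps.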
                    902: .NH 2
                    903: Some Common Installation Problems
                    904: .PP
                    905: As a general rule, keep an eye on the log file (on the server)
                    906: when setting up the File Motel.
                    907: The most convenient way is a window with a
                    908: .P1
                    909: tail -f $FM/log
                    910: .P2
                    911: (\f(CW$FM\fP is the root directory of the File Motel.)
                    912: .PP
                    913: The most common problem is that the basic IPC software doesn't work.
                    914: This affects most programs because they involve calling a service on the
                    915: backup machine.
                    916: That is why the first thing you try to get working is
                    917: .I logger
                    918: which sends messages to the logger process on the backup machine.
                    919: The kinds of bugs I have seen here are typically bugs in the networking code,
                    920: particularly TCP/IP.
                    921: For example, the
                    922: .I logprint
                    923: function expects an acknowledgement from the logger server
                    924: to indicate that everything went okay.
                    925: On at least two of the systems I use, this sometimes doesn't happen because
                    926: the closing of the socket by the logger server after sending the ack
                    927: seems to speed pass the ack and get to
                    928: .I logprint
                    929: first.
                    930: Naturally,
                    931: .I logprint
                    932: complains, as might we all.
                    933: The best solution is to fix the TCP/IP implementation; failing that,
                    934: you might try a judicious sleep between the ack and the close in
                    935: the logger server.
                    936: This is only one example of a general class of timing problems.
                    937: .PP
                    938: Another fertile field of failed implementations has to do
                    939: with user and system names.
                    940: The File Motel tries to check the validity of system and user names
                    941: and denies service if there appears to be something sleazy going on.
                    942: Regrettably, some otherwise sound TCP/IP implementations resemble sleaze.
                    943: For example, a user
                    944: .CW mary
                    945: on a client may appear on the server as the user
                    946: .CW bill .
                    947: Or a system may have a system name that is unrelated to the name
                    948: the networking code uses.
                    949: An attempt is made to cope with these cases, but it may fail with
                    950: unexpectedly bizarre implementations.
                    951: If worst comes to worst, simply turn off all the checking
                    952: and hope no one does anything naughty.
                    953: (Even if you do this, think hard about allowing remote users
                    954: to claim they are
                    955: .CW root ;
                    956: they will be able to look at all sorts of things.)
                    957: Unlike X,
                    958: I implement function and policy.
                    959: However, all the checking is done in one place
                    960: .CW serv_$IPC.c ); (
                    961: feel free to do whatever you like,
                    962: it's your Motel now.
                    963: .de XX
                    964: .IP \\f(CW\\$1\\fP
                    965: .br
                    966: ..
                    967: .NH 2
                    968: Configuration and Compiling Options
                    969: .PP
                    970: The File Motel software is designed for easy installation in heterogeneous
                    971: environments.
                    972: The configuration details described below are stored in the file
                    973: .CW $FMSRC/CONF .
                    974: .CW $FMSRC
                    975: contains a number of files
                    976: containing settings for various systems; you may want to use
                    977: one of these as a starting point.
                    978: (Remember that the following information has a small half-life;
                    979: the truth should be in the online copy of this manual.)
                    980: The most obvious aspect of configuring the File Motel is choosing
                    981: the three directories where files live.
                    982: The source directory,
                    983: .CW FMSRC ,
                    984: has been described above. The other two are
                    985: .XX FMLIB=/usr/lib/filemotel
                    986: Change this to wherever you want to put the subprograms.
                    987: .XX FMBIN=/usr/bin
                    988: The directory for the (only) user-called command,
                    989: .CW backup .
                    990: .LP
                    991: Configuring the source to your environment is mostly done with
                    992: .I mk
                    993: variables and an interface library in
                    994: .CW src/sys/\f2system .
                    995: The
                    996: .I mk
                    997: variables are
                    998: .XX RANLIB=ranlib
                    999: Some systems, mostly BSD-based, whine incessantly unless archive libraries
                   1000: are processed with some program typically called
                   1001: .I ranlib .
                   1002: In this case, set
                   1003: .CW RANLIB=ranlib ;
                   1004: otherwise, say if you are on a System V machine, use a harmless program
                   1005: such as
                   1006: .CW RANLIB=: .
                   1007: .XX IPC=socket
                   1008: Select your favorite type of IPC.
                   1009: Different clients can use different types and the client's type
                   1010: need not match the backup system.
                   1011: (For example, in our Center,
                   1012: the Cray talks to us via TCP/IP but we talk to it via Datakit.)
                   1013: The only choices are
                   1014: .CW socket
                   1015: and
                   1016: .CW v10 .
                   1017: .XX IPCLIB=
                   1018: Set this if you need a special library in order to use your flavor of IPC.
                   1019: For example, on V10 systems set
                   1020: .CW IPCLIB=-lipc .
                   1021: .XX LIBTYPE=a
                   1022: This should be set to
                   1023: .CW a
                   1024: unless you are on the Cray (which doesn't have archives yet!)
                   1025: when it should be
                   1026: .CW o .
                   1027: .XX COMPAT=
                   1028: Set
                   1029: .CW COMPAT=.compat
                   1030: if you want to be able to process older File Motel files.
                   1031: (You may have to work hard to get this to work on some systems;
                   1032: I gave up on the Cray.)
                   1033: .XX SECTYPE=
                   1034: Set
                   1035: .CW SECTYPE=v9
                   1036: if you are running a McIlroy-Reeds compatible security kernel.
                   1037: .XX WORMFACE=uda
                   1038: If you are running the WORM software, you need to say what kind of interface
                   1039: the WORM is attached to.
                   1040: The other option
                   1041: (and the best if you just want to compile without thinking too hard) is
                   1042: .CW scsi .
                   1043: The latter may need customizing at your site.
                   1044: .LP
                   1045: Currently the system dependent interface library includes the following routines:
                   1046: .TS
                   1047: center;
                   1048: lFCW l.
                   1049: dirtoents      convert directory to element names
                   1050: ftw    traverse file tree
                   1051: nofile number of fd's available
                   1052: sysname        system name
                   1053: username       user's login name
                   1054: rx_$IPC        call a remote service
                   1055: serv_$IPC      receive calls
                   1056: service        service/socket mapping details
                   1057: dateadjust     do daylight savings/timezone
                   1058: .TE
                   1059: .LP
                   1060: There are a small number of
                   1061: .CW #define 's
                   1062: inside
                   1063: .CW .c
                   1064: files. 
                   1065: .XX -DSTRINGH="'<string.h>'"
                   1066: Define the value to be the string function header file.
                   1067: .XX -DNO_NETNAME
                   1068: Define this to disable saving and restoring files through
                   1069: .CW /n/machine-name
                   1070: although they will still be stored with that prefix.
                   1071: .NH
                   1072: Installing the File Motel on a Server System
                   1073: .PP
                   1074: The source comes in both
                   1075: .I cpio
                   1076: and
                   1077: .I tar
                   1078: formats.
                   1079: As with the client source installation, note that the following description
                   1080: is dated and the online copy may be significantly different in detail.
                   1081: .IP [1]
                   1082: Follow the client installation process steps 1\-8.
                   1083: You also need to set the place where the administrative binaries are kept.
                   1084: I do it this way:
                   1085: .P1
                   1086: FMAB=$FM/bin
                   1087: export FMAB
                   1088: .P2
                   1089: .IP [2]
                   1090: Complete
                   1091: .CW $FMLIB/conf .
                   1092: You have to specify the default media type and the root of the administrative
                   1093: file tree (denoted by
                   1094: .CW $FM
                   1095: below).
                   1096: Details are in
                   1097: .I backup (5);
                   1098: my File Motel has this configuration:
                   1099: .P1
                   1100: wild
                   1101: j
                   1102: /usr/backup
                   1103: .P2
                   1104: .IP [3]
                   1105: Everything that doesn't need to run as
                   1106: .CW root
                   1107: should run as an otherwise unused id.
                   1108: By default, this is
                   1109: .CW fmdaemon ;
                   1110: if you don't like this, change the define in
                   1111: .CW libfm/server.c .
                   1112: Whatever you choose, set up an account for it;
                   1113: the File Motel requires nothing but its name/uid
                   1114: (not even a login directory).
                   1115: By default, all the shell scripts send mail to the mailbox
                   1116: .CW backup .
                   1117: This should be set to an alias for the File Motel caretaker.
                   1118: .IP [4]
                   1119: Inform your IPC system of the many services the File Motel offers.
                   1120: See the notes under step 11 in the client installation above but
                   1121: install everything, not just
                   1122: .CW fmclient .
                   1123: (See step 10 below as well.)
                   1124: You also need to set up the periodic (normally nightly)
                   1125: calling of clients and/or
                   1126: the processing of their files by
                   1127: .CW $FB/munge .
                   1128: .I Munge
                   1129: needs the name of a program to copy the files to your backup media;
                   1130: set the variable
                   1131: .CW PROCPERM
                   1132: to that program's name.
                   1133: As described previously, you also need to periodically back up the databases
                   1134: with
                   1135: .CW backupdb ;
                   1136: it also needs the name of the program to copy files to your media.
                   1137: .IP [5]
                   1138: Initialise the log file:
                   1139: .P1
                   1140: > $FM/log; chown bin $FM/log
                   1141: chmod 644 $FM/log
                   1142: .P2
                   1143: .IP [6]
                   1144: Install the server programs:
                   1145: .P1
                   1146: mk server
                   1147: .P2
                   1148: .IP [7]
                   1149: Set up the receiving areas.
                   1150: List their names in
                   1151: .CW $FM/adm/rcvdirs
                   1152: and initialise each area by running
                   1153: .P1
                   1154: $FB/rcvdirs
                   1155: .P2
                   1156: We use one 120MB file system mounted on
                   1157: .CW $FM/rcv .
                   1158: .IP [8]
                   1159: Allocate the safe area for backup copies.
                   1160: It must have the name
                   1161: .CW $FM/v
                   1162: but may be a symbolic link if there is not enough space in
                   1163: .CW $FM .
                   1164: We use a file system the same size as the receive area, mounted on
                   1165: .CW $FM/v .
                   1166: .IP [9]
                   1167: After deciding which databases you want maintained,
                   1168: initialise the databases with
                   1169: .P1
                   1170: src/dbinit.sh
                   1171: .P2
                   1172: You may want to start off with all three and remove any you don't want later on
                   1173: (like when they get to be too big).
                   1174: .IP [10]
                   1175: Choose how the receiving process
                   1176: .I rcv
                   1177: works.
                   1178: By default, it simply accepts files.
                   1179: If it is invoked by the name
                   1180: .CW mrcv ,
                   1181: it initiates processing of the received files by
                   1182: .I munge
                   1183: (or more accurately,
                   1184: .CW $FL/callmunge )
                   1185: after the first and last files have been received
                   1186: (you need both in case any one file took longer to receive than
                   1187: .I munge 's
                   1188: cycle time).
                   1189: The advantage is that you will almost never run out of space, as you will be
                   1190: processing files at the same time as receiving them.
                   1191: The disadvantage is that everything will run slower.
                   1192: I use the default behavior; we rarely run out of space and I like to
                   1193: investigate why some client is sending much more than normal
                   1194: before accepting it all.
                   1195: .IP [11]
                   1196: Finish the client installation starting with step 10, making sure you
                   1197: do not back up the receiving areas or
                   1198: .CW $FM/v .
                   1199: .IP [12]
                   1200: Add the command
                   1201: .P1
                   1202: $FB/rmlocks
                   1203: .P2
                   1204: to
                   1205: .CW /etc/rc
                   1206: (or whatever passes for system startup on your system).
                   1207: This simply removes any lockfiles in
                   1208: .CW $FM/locks .
                   1209: .NH
                   1210: Media Management
                   1211: .PP
                   1212: An attempt has been made to provide generic media management programs.
                   1213: For example, the recovery servers
                   1214: .I fetchw_
                   1215: and
                   1216: .I fetchj_
                   1217: are instances of a single device server and a jukebox server respectively.
                   1218: To make this work, a media library is used.
                   1219: To use a new medium, such as Exabyte tapes,
                   1220: implement the routines in the library, and link the library with
                   1221: .CW fm/media_.o
                   1222: or
                   1223: .CW fm/mmedia_.o .
                   1224: An informal description of the routines follows.
                   1225: .LP
                   1226: .CW "mediainit(char *device, char *vol_id)"
                   1227: .ti +5n
                   1228: Initialise the media on the device specified by
                   1229: .I device .
                   1230: The latter may be a full name or any recognizable abbreviation.
                   1231: If
                   1232: .I vol_id
                   1233: is given, it is checked against the media present.
                   1234: .LP
                   1235: .CW "char *mediamount(char *vol_id)"
                   1236: .ti +5n
                   1237: Mount the media named
                   1238: .I vol_id
                   1239: and return the appropriate device
                   1240: (suitable for use by
                   1241: .I mediainit ).
                   1242: Currently, the values given as
                   1243: .I vol_id
                   1244: are those returned by
                   1245: .I medianame
                   1246: (below).
                   1247: .LP
                   1248: .CW "medianame(char *volume)"
                   1249: .ti +5n
                   1250: Return the media name containing
                   1251: .I volume .
                   1252: .LP
                   1253: .CW "mediaopen(char *name, Media *m)"
                   1254: .ti +5n
                   1255: Set up
                   1256: .I m
                   1257: to point at the backup copy
                   1258: .I name .
                   1259: The fields in a
                   1260: .CW Media
                   1261: include a file descriptor, preferred read block size, and copy size.
                   1262: .LP
                   1263: .CW "void mediafiles(int32 v, int32 n, Media *m, Tb **bp)"
                   1264: .ti +5n
                   1265: Return a Media and a list (in
                   1266: .CW *bp )
                   1267: of backup copy pointers for all backup copies more recent than
                   1268: file
                   1269: .I n
                   1270: in volume
                   1271: .I v .
                   1272: A
                   1273: .CW Tb
                   1274: has the creation time and initial (1K) block number for a backup copy.
                   1275: (It is used by
                   1276: .I dbupdate ).
                   1277: The size returned in
                   1278: .I m
                   1279: is not actually a size but the number of records in
                   1280: .CW *bp .
                   1281: .PP
                   1282: This is not a complete description;
                   1283: if you have to write new versions of these routines,
                   1284: look at the existing implementations (in
                   1285: .CW $FMSRC/media )
                   1286: and the programs that use them
                   1287: (all in
                   1288: .CW $FMSRC/fm ).
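As a rough illustration of the data shapes described above, the following C sketch shows one way a caller might consume the list returned through the last argument of a mediafiles-style routine. It is not taken from the File Motel sources. The field names in Media and Tb, the int32 typedef, and the functions fake_mediafiles and latest_offset are all hypothetical, and "more recent than file n" is assumed here to mean "file index greater than n". The only details drawn from the text are that a Media carries a file descriptor, a preferred read block size and a size, that a Tb carries a creation time and an initial 1K block number, and that the size field holds a record count on return.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t int32;          /* assumption: the real typedef is not shown */

/* Hypothetical layouts; field names are illustrative only. */
typedef struct Media {
    int   fd;                   /* file descriptor for the backup copy */
    int32 blksize;              /* preferred read block size, in bytes */
    int32 size;                 /* copy size, or the record count in *bp */
} Media;

typedef struct Tb {
    int32 ctime;                /* creation time of the backup copy */
    int32 block;                /* initial block number, in 1K units */
} Tb;

/*
 * Hypothetical stand-in for a mediafiles implementation.
 * It pretends every volume holds files 0..4, where file i was
 * created at time 1000+i and starts at 1K block 10*i, and it
 * returns the entries whose index is greater than n.
 */
static void fake_mediafiles(int32 v, int32 n, Media *m, Tb **bp)
{
    enum { NFILES = 5 };
    int32 first = n + 1, count, i;

    assert(m != NULL && bp != NULL);
    (void)v;                    /* the volume is ignored in this sketch */
    if (first < 0)
        first = 0;
    count = first < NFILES ? NFILES - first : 0;
    *bp = count ? malloc(count * sizeof **bp) : NULL;
    if (count && *bp == NULL)
        count = 0;              /* allocation failed: report no records */
    for (i = 0; i < count; i++) {
        (*bp)[i].ctime = 1000 + first + i;
        (*bp)[i].block = 10 * (first + i);
    }
    m->fd = -1;                 /* no real device is opened here */
    m->blksize = 1024;
    m->size = count;            /* a record count, not a byte size */
}

/* Byte offset of the most recently created copy in b, or -1 if none. */
static long latest_offset(const Media *m, const Tb *b)
{
    long best = -1;
    int32 besttime = 0, i;

    for (i = 0; i < m->size; i++) {
        if (best < 0 || b[i].ctime > besttime) {
            besttime = b[i].ctime;
            best = (long)b[i].block * 1024L;
        }
    }
    return best;
}
```

A caller would pass the address of a Media and of a Tb pointer, loop over the first m.size entries, and free the list when done. The existing implementations remain the authority on the real layouts.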
                   1289: .NH
                   1290: References
                   1291: .LP
                   1292: |reference_placement
