Annotation of 43BSDReno/usr.sbin/amd/doc/overview.tex, revision 1.1.1.1

1.1       root        1: % $Id: overview.tex,v 5.2 90/06/23 22:21:50 jsp Rel $
                      2: %
                      3: % Copyright (c) 1989 Jan-Simon Pendry
                      4: % Copyright (c) 1989 Imperial College of Science, Technology & Medicine
                      5: % Copyright (c) 1989 The Regents of the University of California.
                      6: % All rights reserved.
                      7: %
                      8: % This code is derived from software contributed to Berkeley by
                      9: % Jan-Simon Pendry at Imperial College, London.
                     10: %
                     11: % Redistribution and use in source and binary forms are permitted provided
                     12: % that: (1) source distributions retain this entire copyright notice and
                     13: % comment, and (2) distributions including binaries display the following
                     14: % acknowledgement:  ``This product includes software developed by the
                     15: % University of California, Berkeley and its contributors'' in the
                     16: % documentation or other materials provided with the distribution and in
                     17: % all advertising materials mentioning features or use of this software.
                     18: % Neither the name of the University nor the names of its contributors may
                     19: % be used to endorse or promote products derived from this software without
                     20: % specific prior written permission.
                     21: % THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
                     22: % WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
                     23: % MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
                     24: %
                     25: %      @(#)overview.tex        5.1 (Berkeley) 7/19/90
                     26: 
                     27: 
                     28: \Chapter{Overview}
                     29: \pagenumbering{arabic}
                     30: 
                     31: \Amd\ maintains a cache of mounted filesystems.  Filesystems are {\em demand-mounted}
                     32: when they are first referenced, and unmounted after a period of inactivity.
                     33: \Amd\ may be used as a replacement for Sun's {\bf automount}(8)
                     34: \cite{usenix:automounter,sun:automount} program.
                     35: It contains no proprietary source code and has been ported
                     36: to numerous flavours of \Unix\ (see table \ref{table:os},~p\pageref{table:os}).
                     37: 
                     38: \Amd\ was designed as the basis for experimenting with filesystem
                     39: layout and management.  Although \amd\ has many direct applications, it
                     40: is loaded with additional features which have little practical use.
                     41: At some point the infrequently used components may be removed to
                     42: streamline the production system.
                     43: 
                     44: %\Amd\ supports the notion of {\em replicated} filesystems by evaluating
                     45: %each member of a list of possible filesystem locations in parallel.
                     46: %\Amd\ checks that each cached mapping remains valid.  Should a mapping be
                     47: %lost -- such as happens when a fileserver goes down -- \amd\ automatically
                     48: %selects a replacement should one be available.
                     49: 
                     50: The fundamental concept behind \amd\ is the ability to separate the name used to refer to
                     51: a file from the name used to refer to its physical storage location.
                     52: This allows the same files to be accessed with the same name regardless of where
                     53: in the network the name is used.  This is very different from placing
                     54: {\tt /n/hostname} in front of the pathname since that includes location
                     55: dependent information which may change if files are moved to another
                     56: machine.
                     57: By placing the required mappings in a centrally administered database,
                     58: filesystems can be re-organised without requiring changes to password
                     59: files, shell scripts and so on.
                     60: 
                     61: \Section{Filesystems and Volumes}
                     62: \Amd\ views the world as a set of fileservers, each containing one or more filesystems
                     63: where each filesystem contains one or more {\em volumes}.
                     64: Here the term volume is used to refer to a coherent set of files such as a user's home directory or
                     65: a \TeX\ distribution.
                     66: 
                     67: In order to access the contents of a volume, \amd\ must be told in which filesystem
                     68: the volume resides and which host owns the filesystem.
                     69: By default the host is assumed to be local and the volume is
                     70: assumed to be the entire filesystem.
                     71: If a filesystem contains more than one volume, then a {\em sublink} is used to
                     72: refer to the sub-directory within the filesystem where the volume can be found.
                     73: 
                     74: \Section{Volume Naming}
                     75: 
                     76: Volume names are assumed to be unique across the entire network.
                     77: A volume name is the pathname to the volume's root as known by the
                     78: users of that volume.  Since this name uniquely identifies the volume contents,
                     79: all volumes can be named and accessed from each host, subject to
                     80: administrative controls.
                     81: 
                     82: Volumes may be replicated or duplicated.  Replicated volumes contain identical
                     83: copies of the same data and reside at two or more locations in the network.
                     84: Each of the replicated volumes can be used interchangeably.
                     85: Duplicated volumes each have the same name but contain different, though
                     86: functionally identical, data.  For example, {\tt /vol/tex} might be the
                     87: name of a \TeX\ distribution which varied for each machine architecture.
                     88: 
                     89: \Amd\ provides facilities to take advantage of both replicated and
                     90: duplicated volumes.  Configuration options allow a single set of configuration
                     91: data to be shared across an entire network by taking advantage of replicated
                     92: and duplicated volumes.
                     93: 
                     94: \Amd\ can take advantage of replacement volumes by mounting
                     95: them as required should an active fileserver become unavailable.
                     96: 
                     97: \Section{Volume Binding}
                     98: 
                     99: \Unix\ implements a namespace of hierarchically mounted filesystems.
                    100: Two forms of binding between names and files are provided.
                    101: A {\em hard link} completes the binding when the name is added to the filesystem.
                    102: A {\em soft link} delays the binding until the name is accessed.
                    103: An {\em automounter} adds a further form in which the binding of name to
                    104: filesystem is delayed until the name is accessed.
                    105: 
                    106: The target volume, in its general form, is a tuple (host, filesystem, sublink)
                    107: which can be used to name the physical location of any volume in
                    108: the network.
                    109: 
                    110: When a target is referenced, \amd\ ignores the sublink element and determines
                    111: whether the required filesystem is already mounted.  This is done by computing
                    112: the local mount point for the filesystem and checking for an existing filesystem
                    113: mounted at the same place.  If such a filesystem already exists then it is
                    114: assumed to be functionally identical to the target filesystem.  By default
                    115: there is a one-to-one mapping between the pair (host, filesystem) and the local
                    116: mount point so this assumption is valid.
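As an illustration of the check described above, the following Python sketch
computes a local mount point from the (host, filesystem) pair and tests whether
something is already mounted there.  The {\tt /a} prefix, the path layout and
every name in the code are assumptions made for this example; none of them is
taken from \amd's source.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Target(NamedTuple):
    """A target volume: the (host, filesystem, sublink) tuple."""
    host: str
    filesystem: str
    sublink: Optional[str] = None


def local_mount_point(target: Target, prefix: str = "/a") -> str:
    # Hypothetical layout: one directory per (host, filesystem) pair,
    # which keeps the mapping to local mount points one-to-one.
    return f"{prefix}/{target.host}{target.filesystem}"


def resolve(target: Target, mounted: Set[str]) -> Tuple[str, bool]:
    """Return (path to the volume root, whether a mount is still needed)."""
    # The sublink is ignored while deciding whether the filesystem is mounted.
    mount_point = local_mount_point(target)
    needs_mount = mount_point not in mounted
    # The sublink only selects a sub-directory inside the filesystem.
    path = mount_point
    if target.sublink:
        path = f"{mount_point}/{target.sublink}"
    return path, needs_mount
```

Because the sublink plays no part in the mounted-or-not decision, two volumes
that share a filesystem resolve to the same mount point and trigger one mount.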
                    117: 
                    118: \Section{Operational Principles}
                    119: 
                    120: \Amd\ operates by introducing new mount points into the namespace.
                    121: The kernel sees these mount points as \NFS\ \cite{sun:nfs} filesystems being served by \amd.
                    122: Having attached itself to the namespace, \amd\ is now able to control
                    123: the view the rest of the system has of those mount points.
                    124: RPC \cite{sun:rpc} calls are received from the kernel one at a time.
                    125: 
                    126: When a {\em lookup} call is received, \amd\ checks whether the
                    127: name is already known.  If it is not, the required volume is mounted.
                    128: A symbolic link pointing to the volume root is then returned.
                    129: Once the symbolic link is returned, the kernel will send all
                    130: other requests direct to the mounted filesystem.
                    131: 
                    132: If a volume is not yet mounted, \amd\ consults a configuration
                    133: {\em mount-map} corresponding to the automount point.
                    134: \Amd\ then makes a runtime decision on what and where to mount
                    135: a filesystem based on the information obtained from the map.
                    136: 
                    137: \Amd\ does not implement all the \NFS\ requests; only those
                    138: relevant to name binding such as {\em lookup}, {\em readlink}
                    139: and {\em readdir}.  Some other calls are also implemented
                    140: but most simply return an error code; for example {\em mkdir}
                    141: always returns ``Read-only filesystem''.
                    142: 
                    143: \Section{Mounting a Volume}
                    144: 
                    145: Each automount point has a mount map.  The mount map contains
                    146: a list of key--value pairs.  The key is the name of the volume to
                    147: be mounted.  The value is a list of locations describing where the
                    148: filesystem is stored in the network.
                    149: In the source for the map the value would look like
                    150: \begin{quote}
                    151: ${\em location}_1\ \ {\em location}_2\ \ \ldots\ \ {\em location}_n$
                    152: \end{quote}
                    153: 
                    154: \Amd\ examines each location in turn.  Each location may contain {\em selectors}
                    155: which control whether \amd\ can use that location.  For example, the location
                    156: may be restricted to use by certain hosts.  Those locations which cannot be used
                    157: are ignored.
                    158: 
                    159: \Amd\ attempts to mount the filesystem described by each remaining location
                    160: until a mount succeeds or \amd\ can no longer proceed.
                    161: The latter can occur in three ways:
                    162: \begin{itemize}
                    163: \item
                    164: If none of
                    165: the locations could be used, or if all of the locations caused an error,
                    166: then the last error is returned.
                    167: 
                    168: \item
                    169: If a location could be used but was being mounted in the background then \amd\ marks
                    170: that mount as being ``in progress'' and continues with the next request; no reply
                    171: is sent to the kernel.
                    172: 
                    173: \item
                    174: Lastly, one or more of the mounts may have been {\em deferred}.
                    175: A mount is deferred if extra information is required before the mount
                    176: can proceed.  When the information becomes available the mount will
                    177: take place, but in the meantime no reply is sent to the kernel.
                    178: If the mount is deferred, \amd\ continues to try any remaining locations.
                    179: \end{itemize}
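
The selection procedure above can be sketched in Python as follows.  This is a
minimal model under stated assumptions, not \amd's implementation: the two
callbacks, the outcome names and the ``reply''/``drop''/``error'' result
strings are hypothetical, and the error reported when no location is usable is
left as a placeholder because the text does not specify it.

```python
from enum import Enum, auto


class Outcome(Enum):
    MOUNTED = auto()      # the mount succeeded
    ERROR = auto()        # the mount failed with an error
    IN_PROGRESS = auto()  # the mount is running in the background
    DEFERRED = auto()     # extra information is needed first


def try_locations(locations, usable, attempt):
    """Walk the locations of one map entry.

    usable(loc)  -> bool, models the selectors (hypothetical callback)
    attempt(loc) -> (Outcome, error), models one mount attempt (hypothetical)
    Returns ("reply", loc), ("drop", loc or None) or ("error", last_error).
    """
    last_error = None  # placeholder: no error has been seen yet
    waiting = False
    for loc in locations:
        if not usable(loc):
            continue  # locations ruled out by selectors are ignored
        outcome, error = attempt(loc)
        if outcome is Outcome.MOUNTED:
            return ("reply", loc)
        if outcome is Outcome.IN_PROGRESS:
            return ("drop", loc)  # no reply is sent to the kernel
        if outcome is Outcome.DEFERRED:
            waiting = True  # keep trying the remaining locations
            continue
        last_error = error
    if waiting:
        return ("drop", None)  # a deferred mount may still complete
    return ("error", last_error)
```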
                    180: 
                    181: %\Section{Task Scheduling}\label{task scheduler}
                    182: %
                    183: %\Amd\ provides a task scheduler to support its non-blocking semantics.
                    184: %The basic operation of the scheduler is to call a procedure when
                    185: %a particular event occurs.  A general sleep/wakeup mechanism is used
                    186: %and sub-process support is built on that.  The scheduler maintains
                    187: %two queues: one of blocked calls and one of callbacks waiting to
                    188: %be made.
                    189: %When a child process exits, its exit status is picked up by a signal
                    190: %handler and a wakeup is issued on the internal job descriptor for that sub-process.
                    191: %A timeout/untimeout mechanism provides for time dependent processing.
                    192: 
                    193: \Section{Automatic Unmounting}
                    194: 
                    195: To avoid an ever increasing number of filesystem mounts, \amd\ removes
                    196: volume mappings which have not been used recently.  A time-to-live interval
                    197: is associated with each mapping and when that expires the mapping is removed.
                    198: When the last reference to a filesystem is removed, that filesystem is unmounted.
                    199: If the unmount fails, for example because the filesystem is still busy, the mapping
                    200: is re-instated and its time-to-live interval is extended.
                    201: The global default for this grace period is controlled by the ``-w'' command-line
                    202: option (\see \Ref{opt:wait}).  It is also possible to set this value on a per-mount basis
                    203: (\see \Ref{opt:utimeout}).
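
A minimal Python sketch of this expiry cycle is given below.  It assumes, for
simplicity, that each mapping is the only reference to its filesystem; the
dictionary layout, the callback and the function name are hypothetical and do
not come from \amd's source.

```python
def expire_mappings(mappings, now, try_unmount, grace):
    """Expire mappings whose time-to-live has run out.

    mappings:    dict of volume name -> expiry time, updated in place
    try_unmount: hypothetical callback, True if the unmount succeeded
    grace:       extension applied when an unmount fails
    Returns the names of the mappings that were removed.
    """
    removed = []
    for name, expires in list(mappings.items()):
        if expires > now:
            continue  # still within its time-to-live
        if try_unmount(name):
            del mappings[name]
            removed.append(name)
        else:
            # The filesystem is still busy: re-instate the mapping
            # and extend its time-to-live by the grace period.
            mappings[name] = now + grace
    return removed
```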
                    204: 
                    205: \Section{Keep-alives}\label{keepalives}
                    206: 
                    207: Use of some filesystem types requires the presence of a server on another machine.
                    208: If a machine crashes then it is of no concern to processes on that machine
                    209: that the filesystem is unavailable.  However, to processes on a remote host using
                    210: that machine as a fileserver this event is important.  This situation is
                    211: most widely recognised when an \NFS\ server crashes and the behaviour observed
                    212: on client machines is that more and more processes hang.
                    213: In order to provide the possibility of recovery, \amd\ implements a {\em keep-alive}
                    214: interval timer for some filesystem types.
                    215: Currently only \NFS\ makes use of this service.
                    216: 
                    217: The basis of the \NFS\ keep-alive implementation is the observation that
                    218: most sites maintain replicated copies of common system data such as manual
                    219: pages, most or all programs, system source code and so on.
                    220: If one of those servers goes down it would be reasonable to mount one of
                    221: the others as a replacement.
                    222: 
                    223: The first part of the process is to keep track of which fileservers are up and
                    224: which are down.  \Amd\ does this by sending RPC requests to the servers'
                    225: \NFS\ {\sc NullProc} and checking whether a reply is returned.
                    226: While the server state is uncertain the requests are re-transmitted
                    227: at three-second intervals and if no reply is received after four attempts
                    228: the server is marked down.  If a reply is received the fileserver is marked
                    229: up and stays in that state for 30 seconds at which time another \NFS\ ping is sent.
                    230: 
                    231: Once a fileserver is marked down, requests continue to be sent every 30 seconds
                    232: in order to determine when the fileserver comes back up.  During this time
                    233: any reference through \amd\ to the filesystems on that server fails with the
                    234: error ``Operation would block''.
                    235: If a replacement volume is available then it will be mounted, otherwise
                    236: the error is returned to the user.
                    237: 
                    238: %\Amd\ keeps track of which servers are up and which are down.
                    239: %It does this by sending RPC requests to the servers' \NFS\ {\sc NullProc} and
                    240: %checking whether a reply is returned.  If no replies are received after a
                    241: %short period, \amd\ marks the fileserver {\em down}.
                    242: %RPC requests continue to be sent so that it will notice when a fileserver
                    243: %comes back up.
                    244: %ICMP echo packets \cite{rfc:icmp} are not used because it is the availability
                    245: %of the \NFS\ service that is important, not the existence of a base kernel.
                    246: 
                    247: %Whenever a reference to a fileserver which is down is made via \amd\, an alternate
                    248: %filesystem is mounted if one is available.
                    249: Although this action does not protect
                    250: user files, which are unique on the network, or processes which do not access files
                    251: via \amd\ or already have open files on the hung filesystem, it can prevent most new
                    252: processes from hanging.
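
The server-state tracking described in this section can be modelled as a small
state machine.  The Python sketch below uses the intervals quoted above (three
seconds between retries, four attempts, 30 seconds between pings); the class,
its method names and the rule that a missed ping makes an ``up'' server
uncertain again are assumptions made for the example, not details taken from
\amd's source.

```python
RETRY_INTERVAL = 3   # seconds between retries while the state is uncertain
MAX_ATTEMPTS = 4     # unanswered attempts before a server is marked down
PING_INTERVAL = 30   # seconds between pings once the state is known


class ServerState:
    """Tracks one fileserver; each handler returns the delay to the next ping."""

    def __init__(self):
        self.status = "unknown"
        self.attempts = 0

    def on_reply(self):
        # Any reply marks the server up for the next PING_INTERVAL seconds.
        self.status = "up"
        self.attempts = 0
        return PING_INTERVAL

    def on_timeout(self):
        if self.status == "down":
            return PING_INTERVAL  # keep probing for the server's return
        # Assumption: a missed ping makes an "up" server uncertain again.
        self.status = "unknown"
        self.attempts += 1
        if self.attempts >= MAX_ATTEMPTS:
            self.status = "down"
            self.attempts = 0
            return PING_INTERVAL
        return RETRY_INTERVAL
```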
                    253: 
                    254: %With a suitable combination of filesystem management and mount-maps,
                    255: %machines can be protected against most server downtime.  This can be
                    256: %enhanced by allocating boot-servers dynamically which allows a diskless
                    257: %workstation to be quickly restarted if necessary.  Once the root filesystem
                    258: %is mounted, \amd\ can be started and allowed to mount the remainder of
                    259: %the filesystem from whichever fileservers are available.
                    260: 
                    261: \Section{Non-blocking Operation}
                    262: 
                    263: Since there is only one instance of \amd\ for each automount point,
                    264: and usually only one instance on each machine, it is important
                    265: that it is always available to service kernel calls.
                    266: \Amd\ goes to great lengths to ensure that it does not block in a system call.
                    267: As a last resort \amd\ will fork before it attempts a system call that may block
                    268: indefinitely, such as mounting an \NFS\ filesystem.
                    269: Other tasks, such as obtaining filehandle information for an \NFS\ filesystem,
                    270: are done using a purpose-built non-blocking RPC library which is integrated
                    271: with \amd's task scheduler.% (\see \Ref{task scheduler}).
                    272: This library is also used to implement \NFS\ keep-alives (\see \Ref{keepalives}).
                    273: 
                    274: Whenever a mount is deferred or backgrounded, \amd\ must wait for it to complete
                    275: before replying to the kernel.  However, this would cause \amd\ to block waiting
                    276: for a reply to be constructed.  Rather than do this, \amd\ simply {\em drops}
                    277: the call under the assumption that the kernel RPC mechanism will automatically
                    278: retry the request.
