|
|
BSD 4.3reno
% $Id: overview.tex,v 1.1.1.1 2018/04/24 16:12:57 root Exp $
%
% Copyright (c) 1989 Jan-Simon Pendry
% Copyright (c) 1989 Imperial College of Science, Technology & Medicine
% Copyright (c) 1989 The Regents of the University of California.
% All rights reserved.
%
% This code is derived from software contributed to Berkeley by
% Jan-Simon Pendry at Imperial College, London.
%
% Redistribution and use in source and binary forms are permitted provided
% that: (1) source distributions retain this entire copyright notice and
% comment, and (2) distributions including binaries display the following
% acknowledgement: ``This product includes software developed by the
% University of California, Berkeley and its contributors'' in the
% documentation or other materials provided with the distribution and in
% all advertising materials mentioning features or use of this software.
% Neither the name of the University nor the names of its contributors may
% be used to endorse or promote products derived from this software without
% specific prior written permission.
% THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
% WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
%
% @(#)overview.tex 5.1 (Berkeley) 7/19/90
\Chapter{Overview}
\pagenumbering{arabic}
\Amd\ maintains a cache of mounted filesystems. Filesystems are {\em demand-mounted}
when they are first referenced, and unmounted after a period of inactivity.
\Amd\ may be used as a replacement for Sun's {\bf automount}(8)
\cite{usenix:automounter,sun:automount} program.
It contains no proprietary source code and has been ported
to numerous flavours of \Unix\ (see table \ref{table:os},~p\pageref{table:os}).
\Amd\ was designed as the basis for experimenting with filesystem
layout and management. Although \amd\ has many direct applications it
is loaded with additional features which have little practical use.
At some point the infrequently used components may be removed to
streamline the production system.
%\Amd\ supports the notion of {\em replicated} filesystems by evaluating
%each member of a list of possible filesystem locations in parallel.
%\Amd\ checks that each cached mapping remains valid. Should a mapping be
%lost -- such as happens when a fileserver goes down -- \amd\ automatically
%selects a replacement should one be available.
The fundamental concept behind \amd\ is the ability to separate the name used to refer to
a file from the name used to refer to its physical storage location.
This allows the same files to be accessed with the same name regardless of where
in the network the name is used. This is very different from placing
{\tt /n/hostname} in front of the pathname since that includes location
dependent information which may change if files are moved to another
machine.
By placing the required mappings in a centrally administered database,
filesystems can be re-organised without requiring changes to password
files, shell scripts and so on.
\Section{Filesystems and Volumes}
\Amd\ views the world as a set of fileservers, each containg one or more filesystems
where each filesystem contains one or more {\em volumes}.
Here the term volume is used to refer to a coherent set of files such as a user's home directory or
a \TeX\ distribution.
In order to access the contents of a volume, \amd\ must be told in which filesystem
the volume resides and which host owns the filesystem.
By default the host is assumed to be local and the volume is
assumed to be the entire filesystem.
If a filesystem contains more than one volume, then a {\em sublink} is used to
refer to the sub-directory within the filesystem where the volume can be found.
\Section{Volume Naming}
Volume names are assumed to be unique across the entire network.
A volume name is the pathname to the volume's root as known by the
users of that volume. Since this name uniquely identifies the volume contents,
all volumes can be named and accessed from each host, subject to
administrative controls.
Volumes may be replicated or duplicated. Replicated volumes contain identical
copies of the same data and reside at two or more locations in the network.
Each of the replicated volumes can be used interchangeably.
Duplicated volumes each have the same name but contain different, though
functionally identical, data. For example, {\tt /vol/tex} might be the
name of a \TeX\ distribution which varied for each machine architecture.
\Amd\ provides facilities to take advantage of both replicated and
duplicated volumes. Configuration options allow a single set of configuration
data to be shared across an entire network by taking advantage of replicated
and duplicated volumes.
\Amd\ can take advantage of replacement volumes by mounting
them as required should an active fileserver become unavailable.
\Section{Volume Binding}
\Unix\ implements a namespace of hierarchically mounted filesystems.
Two forms of binding between names and files are provided.
A {\em hard link} completes the binding when the name is added to the filesystem.
A {\em soft link} delays the binding until the name is accessed.
An {\em automounter} adds a further form in which the binding of name to
filesystem is delayed until the name is accessed.
The target volume, in its general form, is a tuple (host, filesystem, sublink)
which can be used to name the physical location of any volume in
the network.
When a target is referenced, \amd\ ignores the sublink element and determines
whether the required filesystem is already mounted. This is done by computing
the local mount point for the filesystem and checking for an existing filesystem
mounted at the same place. If such a filesystem already exists then it is
assumed to be functionally identical to the target filesystem. By default
there is a one-to-one mapping between the pair (host, filesystem) and the local
mount point so this assumption is valid.
\Section{Operational Principles}
\Amd\ operates by introducing new mount points into the namespace.
The kernel sees these mount points as \NFS\ \cite{sun:nfs} filesystems being served by \amd.
Having attached itself to the namespace, \amd\ is now able to control
the view the rest of the system has of those mount points.
RPC \cite{sun:rpc} calls are received from the kernel one at a time.
When a {\em lookup} call is received \amd\ checks whether the
name is already known. If it is not, the required volume is mounted.
A symbolic link pointing to the volume root is then returned.
Once the symbolic link is returned, the kernel will send all
other requests direct to the mounted filesystem.
If a volume is not yet mounted, \amd\ consults a configuration
{\em mount-map} corresponding to the automount point.
\Amd\ then makes a runtime decision on what and where to mount
a filesystem based on the information obtained from the map.
\Amd\ does not implement all the \NFS\ requests; only those
relevant to name binding such as {\em lookup}, {\em readlink}
and {\em readdir}. Some other calls are also implemented
but most simply return an error code; for example {\em mkdir}
always returns ``Read-only filesystem''.
\Section{Mounting a Volume}
Each automount point has a mount map. The mount map contains
a list of key--value pairs. The key is the name of the volume to
be mounted. The value is a list of locations describing where the
filesystem is stored in the network.
In the source for the map the value would look like
\begin{quote}
${\em location}_1\ \ {\em location}_2\ \ \ldots\ \ {\em location}_n$
\end{quote}
\Amd\ examines each location in turn. Each location may contain {\em selectors}
which control whether \amd\ can use that location. For example, the location
may be restricted to use by certain hosts. Those locations which cannot be used
are ignored.
\Amd\ attempts to mount the filesystem described by each remaining location
until a mount succeeds or \amd\ can no longer proceed.
The latter can occur in three ways:
\begin{itemize}
\item
If none of
the locations could be used, or if all of the locations caused an error,
then the last error is returned.
\item
If a location could be used but was being mounted in the background then \amd\ marks
that mount as being ``in progress'' and continues with the next request; no reply
is sent to the kernel.
\item
Lastly, one or more of the mounts may have been {\em deferred}.
A mount is deferred if extra information is required before the mount
can proceed. When the information becomes available the mount will
take place, but in the mean time no reply is sent to the kernel.
If the mount is deferred, \amd\ continues to try any remaining locations.
\end{itemize}
%\Section{Task Scheduling}\label{task scheduler}
%
%\Amd\ provides a task scheduler to support its non-blocking semantics.
%The basic operation of the scheduler is to call a procedure when
%a particular event occurs. A general sleep/wakeup mechanism is used
%and sub-process support is built on that. The scheduler maintains
%two queues: one of blocked calls and one of callbacks waiting to
%be made.
%When a child process exits, its exit status is picked up by a signal
%handler and a wakeup is issued on the internal job descriptor for that sub-process.
%A timeout/untimeout mechanism provides for time dependent processing.
\Section{Automatic Unmounting}
To avoid an ever increasing number of filesystem mounts, \amd\ removes
volume mappings which have not been used recently. A time-to-live interval
is associated with each mapping and when that expires the mapping is removed.
When the last reference to a filesystem is removed, that filesystem is unmounted.
If the unmount fails, for example the filesystem is still busy, the mapping
is re-instated and its time-to-live interval is extended.
The global default for this grace period is controlled by the ``-w'' command-line
option (\see \Ref{opt:wait}). It is also possible to set this value on a per-mount basis
(\see \Ref{opt:utimeout}).
\Section{Keep-alives}\label{keepalives}
Use of some filesystem types requires the presence of a server on another machine.
If a machine crashes then it is of no concern to processes on that machine
that the filesystem is unavailable. However, to processes on a remote host using
that machine as a fileserver this event is important. This situation is
most widely recognised when an \NFS\ server crashes and the behaviour observed
on client machines is that more and more processes hang.
In order to provide the possibility of recovery, \amd\ implements a {\em keep-alive}
interval timer for some filesystem types.
Currently only \NFS\ makes use of this service.
The basis of the \NFS\ keep-alive implementation is the observation that
most sites maintain replicated copies of common system data such as manual
pages, most or all programs, system source code and so on.
If one of those servers goes down it would be reasonable to mount one of
the others as a replacement.
The first part of the process is to keep track of which fileservers are up and
which are down. \Amd\ does this by sending RPC requests to the servers'
\NFS\ {\sc NullProc} and checking whether a reply is returned.
While the server state is uncertain the requests are re-transmitted
at three second intervals and if no reply is received after four attempts
the server is marked down. If a reply is received the fileserver is marked
up and stays in that state for 30 seconds at which time another \NFS\ ping is sent.
Once a fileserver is marked down, requests continue to be sent every 30 seconds
in order to determine when the fileserver comes back up. During this time
any reference through \amd\ to the filesystems on that server fail with the
error ``Operation would block''.
If a replacement volume is available then it will be mounted, otherwise
the error is returned to the user.
%\Amd\ keeps track of which servers are up and which are down.
%It does this by sending RPC requests to the servers' \NFS\ {\sc NullProc} and
%checking whether a reply is returned. If no replies are received after a
%short period, \amd\ marks the fileserver {\em down}.
%RPC requests continue to be sent so that it will notice when a fileserver
%comes back up.
%ICMP echo packets \cite{rfc:icmp} are not used because it is the availability
%of the \NFS\ service that is important, not the existence of a base kernel.
%Whenever a reference to a fileserver which is down is made via \amd\, an alternate
%filesystem is mounted if one is available.
Although this action does not protect
user files, which are unique on the network, or processes which do not access files
via \amd\ or already have open files on the hung filesystem, it can prevent most new
processes from hanging.
%With a suitable combination of filesystem management and mount-maps,
%machines can be protected against most server downtime. This can be
%enhanced by allocating boot-servers dynamically which allows a diskless
%workstation to be quickly restarted if necessary. Once the root filesystem
%is mounted, \amd\ can be started and allowed to mount the remainder of
%the filesystem from whichever fileservers are available.
\Section{Non-blocking Operation}
Since there is only one instance of \amd\ for each automount point,
and usually only one instance on each machine, it is important
that it is always available to service kernel calls.
\Amd\ goes to great lengths to ensure that it does not block in a system call.
As a last resort \amd\ will fork before it attempts a system call that may block
indefinitely, such as mounting an \NFS\ filesystem.
Other tasks such as obtaining filehandle information for an \NFS\ filesystem,
are done using a purpose built non-blocking RPC library which is integrated
with \amd's task scheduler.% (\see \Ref{task scheduler}).
This library is also used to implement \NFS\ keep-alives (\see \Ref{keepalives}).
Whenever a mount is deferred or backgrounded, \amd\ must wait for it to complete
before replying to the kernel. However, this would cause \amd\ to block waiting
for a reply to be constructed. Rather than do this, \amd\ simply {\em drops}
the call under the assumption that the kernel RPC mechanism will automatically
retry the request.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.