1: % $Id: overview.tex,v 5.2 90/06/23 22:21:50 jsp Rel $
2: %
3: % Copyright (c) 1989 Jan-Simon Pendry
4: % Copyright (c) 1989 Imperial College of Science, Technology & Medicine
5: % Copyright (c) 1989 The Regents of the University of California.
6: % All rights reserved.
7: %
8: % This code is derived from software contributed to Berkeley by
9: % Jan-Simon Pendry at Imperial College, London.
10: %
11: % Redistribution and use in source and binary forms are permitted provided
12: % that: (1) source distributions retain this entire copyright notice and
13: % comment, and (2) distributions including binaries display the following
14: % acknowledgement: ``This product includes software developed by the
15: % University of California, Berkeley and its contributors'' in the
16: % documentation or other materials provided with the distribution and in
17: % all advertising materials mentioning features or use of this software.
18: % Neither the name of the University nor the names of its contributors may
19: % be used to endorse or promote products derived from this software without
20: % specific prior written permission.
21: % THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
22: % WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
23: % MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
24: %
25: % @(#)overview.tex 5.1 (Berkeley) 7/19/90
26:
27:
28: \Chapter{Overview}
29: \pagenumbering{arabic}
30:
31: \Amd\ maintains a cache of mounted filesystems. Filesystems are {\em demand-mounted}
32: when they are first referenced, and unmounted after a period of inactivity.
33: \Amd\ may be used as a replacement for Sun's {\bf automount}(8)
34: \cite{usenix:automounter,sun:automount} program.
35: It contains no proprietary source code and has been ported
36: to numerous flavours of \Unix\ (see table \ref{table:os},~p\pageref{table:os}).
37:
38: \Amd\ was designed as the basis for experimenting with filesystem
39: layout and management. Although \amd\ has many direct applications, it
40: is loaded with additional features which have little practical use.
41: At some point the infrequently used components may be removed to
42: streamline the production system.
43:
44: %\Amd\ supports the notion of {\em replicated} filesystems by evaluating
45: %each member of a list of possible filesystem locations in parallel.
46: %\Amd\ checks that each cached mapping remains valid. Should a mapping be
47: %lost -- such as happens when a fileserver goes down -- \amd\ automatically
48: %selects a replacement should one be available.
49:
50: The fundamental concept behind \amd\ is the ability to separate the name used to refer to
51: a file from the name used to refer to its physical storage location.
52: This allows the same files to be accessed with the same name regardless of where
53: in the network the name is used. This is very different from placing
54: {\tt /n/hostname} in front of the pathname since that includes location
55: dependent information which may change if files are moved to another
56: machine.
57: By placing the required mappings in a centrally administered database,
58: filesystems can be re-organised without requiring changes to password
59: files, shell scripts and so on.
60:
61: \Section{Filesystems and Volumes}
62: \Amd\ views the world as a set of fileservers, each containing one or more filesystems
63: where each filesystem contains one or more {\em volumes}.
64: Here the term volume is used to refer to a coherent set of files such as a user's home directory or
65: a \TeX\ distribution.
66:
67: In order to access the contents of a volume, \amd\ must be told in which filesystem
68: the volume resides and which host owns the filesystem.
69: By default the host is assumed to be local and the volume is
70: assumed to be the entire filesystem.
71: If a filesystem contains more than one volume, then a {\em sublink} is used to
72: refer to the sub-directory within the filesystem where the volume can be found.
73:
74: \Section{Volume Naming}
75:
76: Volume names are assumed to be unique across the entire network.
77: A volume name is the pathname to the volume's root as known by the
78: users of that volume. Since this name uniquely identifies the volume contents,
79: all volumes can be named and accessed from each host, subject to
80: administrative controls.
81:
82: Volumes may be replicated or duplicated. Replicated volumes contain identical
83: copies of the same data and reside at two or more locations in the network.
84: Each of the replicated volumes can be used interchangeably.
85: Duplicated volumes each have the same name but contain different, though
86: functionally identical, data. For example, {\tt /vol/tex} might be the
87: name of a \TeX\ distribution which varied for each machine architecture.
88:
89: \Amd\ provides facilities to take advantage of both replicated and
90: duplicated volumes. Configuration options allow a single set of configuration
91: data to be shared across an entire network by taking advantage of replicated
92: and duplicated volumes.
93:
94: \Amd\ can take advantage of replacement volumes by mounting
95: them as required should an active fileserver become unavailable.
96:
97: \Section{Volume Binding}
98:
99: \Unix\ implements a namespace of hierarchically mounted filesystems.
100: Two forms of binding between names and files are provided.
101: A {\em hard link} completes the binding when the name is added to the filesystem.
102: A {\em soft link} delays the binding until the name is accessed.
103: An {\em automounter} adds a further form in which the binding of name to
104: filesystem is delayed until the name is accessed.
105:
106: The target volume, in its general form, is a tuple (host, filesystem, sublink)
107: which can be used to name the physical location of any volume in
108: the network.
109:
110: When a target is referenced, \amd\ ignores the sublink element and determines
111: whether the required filesystem is already mounted. This is done by computing
112: the local mount point for the filesystem and checking for an existing filesystem
113: mounted at the same place. If such a filesystem already exists then it is
114: assumed to be functionally identical to the target filesystem. By default
115: there is a one-to-one mapping between the pair (host, filesystem) and the local
116: mount point so this assumption is valid.
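
The check described above can be illustrated with a minimal sketch.
It is not taken from the \amd\ sources: the names {\tt Target}, {\tt MOUNT\_ROOT},
{\tt local\_mount\_point}, {\tt is\_already\_mounted} and {\tt volume\_root} are
hypothetical, and the layout of local mount points under {\tt /a} is an assumption
made only for illustration.

```python
from typing import NamedTuple, Set


class Target(NamedTuple):
    """Physical location of a volume: the (host, filesystem, sublink) tuple."""
    host: str
    filesystem: str
    sublink: str = ""


# Hypothetical prefix under which filesystems are mounted locally.
MOUNT_ROOT = "/a"


def local_mount_point(target: Target) -> str:
    """Map the pair (host, filesystem) to one local mount point.

    The sublink is deliberately ignored, so every volume in the same
    filesystem maps to the same mount point.
    """
    return f"{MOUNT_ROOT}/{target.host}{target.filesystem}"


def is_already_mounted(target: Target, mounted: Set[str]) -> bool:
    """A filesystem already mounted at the computed point is assumed
    to be functionally identical to the target filesystem."""
    return local_mount_point(target) in mounted


def volume_root(target: Target) -> str:
    """The sublink is only applied afterwards, to find the volume root."""
    base = local_mount_point(target)
    return f"{base}/{target.sublink}" if target.sublink else base
```

Because the mapping from (host, filesystem) to the mount point is one-to-one in this
sketch, two volumes that differ only in their sublink share a single mount.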
117:
118: \Section{Operational Principles}
119:
120: \Amd\ operates by introducing new mount points into the namespace.
121: The kernel sees these mount points as \NFS\ \cite{sun:nfs} filesystems being served by \amd.
122: Having attached itself to the namespace, \amd\ is now able to control
123: the view the rest of the system has of those mount points.
124: RPC \cite{sun:rpc} calls are received from the kernel one at a time.
125:
126: When a {\em lookup} call is received \amd\ checks whether the
127: name is already known. If it is not, the required volume is mounted.
128: A symbolic link pointing to the volume root is then returned.
129: Once the symbolic link is returned, the kernel will send all
130: other requests direct to the mounted filesystem.
131:
132: If a volume is not yet mounted, \amd\ consults a configuration
133: {\em mount-map} corresponding to the automount point.
134: \Amd\ then makes a runtime decision on what and where to mount
135: a filesystem based on the information obtained from the map.
136:
137: \Amd\ does not implement all the \NFS\ requests; only those
138: relevant to name binding such as {\em lookup}, {\em readlink}
139: and {\em readdir}. Some other calls are also implemented
140: but most simply return an error code; for example {\em mkdir}
141: always returns ``Read-only filesystem''.
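
As a rough illustration of the request handling described in this section, the
following sketch dispatches two of the calls. It is an assumption-laden model, not
the \amd\ implementation: {\tt handle\_request}, the {\tt known} table and the
{\tt mount\_volume} callback are hypothetical names, and replies are modelled as
simple tuples rather than \NFS\ reply structures.

```python
import errno
from typing import Callable, Dict, Tuple, Union

# A reply is modelled as ("symlink", path) or ("error", errno value).
Reply = Tuple[str, Union[str, int]]


def handle_request(proc: str,
                   name: str,
                   known: Dict[str, str],
                   mount_volume: Callable[[str], str]) -> Reply:
    """Handle one call; only lookup and mkdir are covered by this sketch."""
    if proc == "lookup":
        if name not in known:
            # Name not yet known: mount the volume and remember its root.
            known[name] = mount_volume(name)
        # Reply with a symbolic link pointing to the volume root.
        return ("symlink", known[name])
    if proc == "mkdir":
        # mkdir always fails with "Read-only filesystem".
        return ("error", errno.EROFS)
    raise NotImplementedError(f"{proc} is outside this sketch")
```

Once the symbolic link has been returned, later requests for files below it do not
pass through this handler at all, since the kernel sends them direct to the mounted
filesystem.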
142:
143: \Section{Mounting a Volume}
144:
145: Each automount point has a mount map. The mount map contains
146: a list of key--value pairs. The key is the name of the volume to
147: be mounted. The value is a list of locations describing where the
148: filesystem is stored in the network.
149: In the source for the map the value would look like
150: \begin{quote}
151: ${\em location}_1\ \ {\em location}_2\ \ \ldots\ \ {\em location}_n$
152: \end{quote}
153:
154: \Amd\ examines each location in turn. Each location may contain {\em selectors}
155: which control whether \amd\ can use that location. For example, the location
156: may be restricted to use by certain hosts. Those locations which cannot be used
157: are ignored.
158:
159: \Amd\ attempts to mount the filesystem described by each remaining location
160: until a mount succeeds or \amd\ can no longer proceed.
161: The latter can occur in three ways:
162: \begin{itemize}
163: \item
164: If none of
165: the locations could be used, or if all of the locations caused an error,
166: then the last error is returned.
167:
168: \item
169: If a location could be used but was being mounted in the background then \amd\ marks
170: that mount as being ``in progress'' and continues with the next request; no reply
171: is sent to the kernel.
172:
173: \item
174: Lastly, one or more of the mounts may have been {\em deferred}.
175: A mount is deferred if extra information is required before the mount
176: can proceed. When the information becomes available the mount will
177: take place, but in the meantime no reply is sent to the kernel.
178: If the mount is deferred, \amd\ continues to try any remaining locations.
179: \end{itemize}
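
The selection procedure above can be sketched as follows. This is one reading of
the text, not the \amd\ code: {\tt Location}, {\tt Outcome} and {\tt try\_locations}
are hypothetical names, the use of {\tt ENOENT} when no location is usable is an
assumption, and so is the choice to send no reply when a mount was deferred and no
later location succeeded.

```python
import errno
from enum import Enum, auto
from typing import Callable, List, Tuple


class Outcome(Enum):
    MOUNTED = auto()      # a mount succeeded
    ERROR = auto()        # the attempt failed with an error code
    IN_PROGRESS = auto()  # the mount is running in the background
    DEFERRED = auto()     # extra information is needed first


class Location:
    """Hypothetical location record: a selector plus a mount attempt."""

    def __init__(self,
                 selector: Callable[[str], bool],
                 attempt: Callable[[], Tuple[Outcome, int]]) -> None:
        self.selector = selector
        self.attempt = attempt


def try_locations(locations: List[Location], host: str) -> Tuple[Outcome, int]:
    """Return (outcome, errno); IN_PROGRESS and DEFERRED mean no reply is sent."""
    last_error = errno.ENOENT  # assumed error when no location can be used
    deferred = False
    for loc in locations:
        if not loc.selector(host):
            continue  # selectors rule this location out; ignore it
        outcome, err = loc.attempt()
        if outcome is Outcome.MOUNTED:
            return Outcome.MOUNTED, 0
        if outcome is Outcome.IN_PROGRESS:
            return Outcome.IN_PROGRESS, 0  # move on to the next request
        if outcome is Outcome.DEFERRED:
            deferred = True  # keep trying the remaining locations
            continue
        last_error = err
    if deferred:
        return Outcome.DEFERRED, 0
    return Outcome.ERROR, last_error
```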
180:
181: %\Section{Task Scheduling}\label{task scheduler}
182: %
183: %\Amd\ provides a task scheduler to support its non-blocking semantics.
184: %The basic operation of the scheduler is to call a procedure when
185: %a particular event occurs. A general sleep/wakeup mechanism is used
186: %and sub-process support is built on that. The scheduler maintains
187: %two queues: one of blocked calls and one of callbacks waiting to
188: %be made.
189: %When a child process exits, its exit status is picked up by a signal
190: %handler and a wakeup is issued on the internal job descriptor for that sub-process.
191: %A timeout/untimeout mechanism provides for time dependent processing.
192:
193: \Section{Automatic Unmounting}
194:
195: To avoid an ever increasing number of filesystem mounts, \amd\ removes
196: volume mappings which have not been used recently. A time-to-live interval
197: is associated with each mapping and when that expires the mapping is removed.
198: When the last reference to a filesystem is removed, that filesystem is unmounted.
199: If the unmount fails, for example because the filesystem is still busy, the mapping
200: is re-instated and its time-to-live interval is extended.
201: The global default for this grace period is controlled by the ``-w'' command-line
202: option (\see \Ref{opt:wait}). It is also possible to set this value on a per-mount basis
203: (\see \Ref{opt:utimeout}).
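
The expiry logic can be modelled with a short sketch. It simplifies the description
above by assuming exactly one mapping per filesystem, so removing a mapping always
triggers an unmount. The names {\tt Mapping} and {\tt expire\_mappings}, and the
values of {\tt DEFAULT\_TTL} and {\tt DEFAULT\_GRACE}, are hypothetical and chosen
only for illustration.

```python
from typing import Callable, Dict, List

DEFAULT_TTL = 300.0    # hypothetical time-to-live, in seconds
DEFAULT_GRACE = 120.0  # hypothetical extension after a failed unmount


class Mapping:
    """A cached volume mapping with a time-to-live."""

    def __init__(self, name: str, now: float, ttl: float = DEFAULT_TTL) -> None:
        self.name = name
        self.ttl = ttl
        self.expires = now + ttl

    def touch(self, now: float) -> None:
        """A reference to the volume restarts its time-to-live."""
        self.expires = now + self.ttl


def expire_mappings(cache: Dict[str, Mapping],
                    now: float,
                    unmount: Callable[[str], bool],
                    grace: float = DEFAULT_GRACE) -> List[str]:
    """Remove expired mappings; return the names that were unmounted."""
    removed = []
    for name, mapping in list(cache.items()):
        if mapping.expires > now:
            continue  # still within its time-to-live
        if unmount(name):
            del cache[name]
            removed.append(name)
        else:
            # Unmount failed (for example, busy): keep the mapping
            # and extend its time-to-live by the grace period.
            mapping.expires = now + grace
    return removed
```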
204:
205: \Section{Keep-alives}\label{keepalives}
206:
207: Use of some filesystem types requires the presence of a server on another machine.
208: If a machine crashes then it is of no concern to processes on that machine
209: that the filesystem is unavailable. However, to processes on a remote host using
210: that machine as a fileserver this event is important. This situation is
211: most widely recognised when an \NFS\ server crashes and the behaviour observed
212: on client machines is that more and more processes hang.
213: In order to provide the possibility of recovery, \amd\ implements a {\em keep-alive}
214: interval timer for some filesystem types.
215: Currently only \NFS\ makes use of this service.
216:
217: The basis of the \NFS\ keep-alive implementation is the observation that
218: most sites maintain replicated copies of common system data such as manual
219: pages, most or all programs, system source code and so on.
220: If one of those servers goes down it would be reasonable to mount one of
221: the others as a replacement.
222:
223: The first part of the process is to keep track of which fileservers are up and
224: which are down. \Amd\ does this by sending RPC requests to the servers'
225: \NFS\ {\sc NullProc} and checking whether a reply is returned.
226: While the server state is uncertain the requests are re-transmitted
227: at three-second intervals and if no reply is received after four attempts
228: the server is marked down. If a reply is received the fileserver is marked
229: up and stays in that state for 30 seconds at which time another \NFS\ ping is sent.
230:
231: Once a fileserver is marked down, requests continue to be sent every 30 seconds
232: in order to determine when the fileserver comes back up. During this time
233: any reference through \amd\ to the filesystems on that server fails with the
234: error ``Operation would block''.
235: If a replacement volume is available then it will be mounted, otherwise
236: the error is returned to the user.
237:
238: %\Amd\ keeps track of which servers are up and which are down.
239: %It does this by sending RPC requests to the servers' \NFS\ {\sc NullProc} and
240: %checking whether a reply is returned. If no replies are received after a
241: %short period, \amd\ marks the fileserver {\em down}.
242: %RPC requests continue to be sent so that it will notice when a fileserver
243: %comes back up.
244: %ICMP echo packets \cite{rfc:icmp} are not used because it is the availability
245: %of the \NFS\ service that is important, not the existence of a base kernel.
246:
247: %Whenever a reference to a fileserver which is down is made via \amd\, an alternate
248: %filesystem is mounted if one is available.
249: Although this action does not protect
250: user files, which are unique on the network, or processes which do not access files
251: via \amd\ or already have open files on the hung filesystem, it can prevent most new
252: processes from hanging.
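
The keep-alive timing described in this section can be summarised as a small state
machine. The sketch below is an interpretation of the text rather than the \amd\
implementation: {\tt ServerState} and its methods are hypothetical names, and it
assumes that ``four attempts'' means four consecutive unanswered pings.

```python
RETRY_INTERVAL = 3    # seconds between pings while the state is uncertain
MAX_ATTEMPTS = 4      # unanswered pings before the server is marked down
STEADY_INTERVAL = 30  # seconds between pings once the server is up or down


class ServerState:
    """Tracks one fileserver; each method returns the delay to the next ping."""

    def __init__(self) -> None:
        self.status = "unknown"  # one of "unknown", "up", "down"
        self.unanswered = 0

    def on_reply(self) -> int:
        """A NullProc reply arrived: mark the server up."""
        self.status = "up"
        self.unanswered = 0
        return STEADY_INTERVAL

    def on_timeout(self) -> int:
        """A ping went unanswered."""
        if self.status == "down":
            # Already down: keep probing at the slow rate.
            return STEADY_INTERVAL
        self.unanswered += 1
        if self.unanswered >= MAX_ATTEMPTS:
            self.status = "down"
            return STEADY_INTERVAL
        # State is uncertain: retransmit quickly.
        self.status = "unknown"
        return RETRY_INTERVAL
```

Under these assumptions a server that stops answering is marked down after four
unanswered pings, and a single reply is enough to mark it up again.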
253:
254: %With a suitable combination of filesystem management and mount-maps,
255: %machines can be protected against most server downtime. This can be
256: %enhanced by allocating boot-servers dynamically which allows a diskless
257: %workstation to be quickly restarted if necessary. Once the root filesystem
258: %is mounted, \amd\ can be started and allowed to mount the remainder of
259: %the filesystem from whichever fileservers are available.
260:
261: \Section{Non-blocking Operation}
262:
263: Since there is only one instance of \amd\ for each automount point,
264: and usually only one instance on each machine, it is important
265: that it is always available to service kernel calls.
266: \Amd\ goes to great lengths to ensure that it does not block in a system call.
267: As a last resort \amd\ will fork before it attempts a system call that may block
268: indefinitely, such as mounting an \NFS\ filesystem.
269: Other tasks, such as obtaining filehandle information for an \NFS\ filesystem,
270: are done using a purpose-built non-blocking RPC library which is integrated
271: with \amd's task scheduler.% (\see \Ref{task scheduler}).
272: This library is also used to implement \NFS\ keep-alives (\see \Ref{keepalives}).
273:
274: Whenever a mount is deferred or backgrounded, \amd\ must wait for it to complete
275: before replying to the kernel. However, this would cause \amd\ to block waiting
276: for a reply to be constructed. Rather than do this, \amd\ simply {\em drops}
277: the call under the assumption that the kernel RPC mechanism will automatically
278: retry the request.