|
|
1.1 root 1: .\" @(#)p2 6.1 (Berkeley) 4/24/86
2: .\"
3: .SH
4: 3.6 I/O calls
5: .PP
6: The system calls to do I/O are designed to eliminate
7: the differences between the various devices and styles of
8: access.
9: There is no distinction between ``random''
10: and ``sequential'' I/O, nor is any logical record size imposed
11: by the system.
12: The size of an ordinary file is determined
13: by the number of bytes written on it;
14: no predetermination of the size of a file is necessary
15: or possible.
16: .PP
17: To illustrate the essentials of I/O,
18: some of the basic calls are
19: summarized below
20: in an anonymous language that will
21: indicate the required parameters without getting into the
22: underlying
23: complexities.
24: Each call to the system may potentially result in an error
25: return, which for simplicity is not represented
26: in the calling sequence.
27: .PP
28: To read or write a file assumed to exist already, it must
29: be opened by the following call:
30: .P1
31: filep = open\|(\|name, flag\|)
32: .P2
33: where
34: .UL name
35: indicates the name of the file.
36: An arbitrary path name may be given.
37: The
38: .UL flag
39: argument indicates whether the file is to be read, written,
40: or ``updated,'' that is, read and written simultaneously.
41: .PP
42: The returned value
43: .UL filep
44: is called a
45: .IT "file descriptor" .
46: It is a small integer used to identify the file
47: in subsequent calls to read, write,
48: or otherwise manipulate the file.
49: .PP
50: To create a new file or completely rewrite an old one,
51: there is a
52: .UL create
53: system call that
54: creates the given file if it does not exist,
55: or truncates it to zero length
56: if it does exist;
57: .UL create
58: also opens the new file for writing
59: and, like
60: .UL open ,
61: returns a file descriptor.
62: .PP
63: The file system maintains no locks visible to the user, nor is there any
64: restriction on the number of users who may have a file
65: open for reading or writing.
66: Although it is possible for the contents of a file
67: to become scrambled when two users write on it simultaneously,
68: in practice difficulties do not arise.
69: We take the view that locks are neither
70: necessary nor sufficient, in our environment,
71: to prevent interference between users of the same file.
72: They are unnecessary because we are not
73: faced with large, single-file data bases
74: maintained by independent processes.
75: They are insufficient because
76: locks in the ordinary sense, whereby
77: one user is prevented from writing on a file that another
78: user is reading,
79: cannot prevent confusion
80: when, for example, both users are editing
81: a file with an editor that makes
82: a copy of the file being edited.
83: .PP
84: There are, however,
85: sufficient internal interlocks to maintain
86: the logical consistency of the file system
87: when two users engage simultaneously in
88: activities such as writing on
89: the same file,
90: creating files in the same directory,
91: or deleting each other's open files.
92: .PP
93: Except as indicated below, reading and writing
94: are sequential.
95: This means that if a particular
96: byte in the file was the last byte written (or read),
97: the next I/O call implicitly refers to the
98: immediately following byte.
99: For each open file there is a pointer, maintained
100: inside the system,
101: that indicates the next byte to be read
102: or written.
103: If
104: .IT n
105: bytes are read or written, the pointer advances
106: by
107: .IT n
108: bytes.
109: .PP
110: Once a file is open, the following calls
111: may be used:
112: .P1
113: n = read\|(\|filep, buffer, count\|)
114: n = write\|(\|filep, buffer, count\|)
115: .P2
116: Up to
117: .UL count
118: bytes are transmitted between the file specified
119: by
120: .UL filep
121: and the byte array
122: specified by
123: .UL buffer .
124: The returned value
125: .UL n
126: is the number of bytes actually transmitted.
127: In the
128: .UL write
129: case,
130: .UL n
131: is the same as
132: .UL count
133: except under exceptional conditions, such as I/O errors or
134: end of physical medium on special files;
135: in a
136: .UL read ,
137: however,
138: .UL n
139: may without error be less than
140: .UL count .
141: If the read pointer is so near the end of the
142: file that reading
143: .UL count
144: characters
145: would cause reading beyond the end, only sufficient
146: bytes are transmitted to reach the end of the
147: file;
148: also, typewriter-like terminals
149: never return more than one line of input.
150: When a
151: .UL read
152: call returns with
153: .UL n
154: equal
155: to zero, the end of the file has been reached.
156: For disk files this occurs when the read pointer
157: becomes equal to the current
158: size of the file.
159: It is possible to generate an end-of-file
160: from a terminal by use of an escape
161: sequence that depends on the device used.
162: .PP
163: Bytes written affect only those parts of a file implied by
164: the position of the write pointer and the
165: count; no other part of the file
166: is changed.
167: If the last byte lies beyond the end of the file, the
168: file is made to grow as needed.
169: .PP
170: To do random (direct-access) I/O
171: it is only necessary to move the read or write pointer
172: to the appropriate location in the file.
173: .P1
174: location = lseek\|(\|filep, offset, base\|)
175: .P2
176: The pointer
177: associated with
178: .UL filep
179: is moved to a position
180: .UL offset
181: bytes from the beginning of the file, from the current position
182: of the pointer, or from the end of the file,
183: depending on
184: .UL base.
185: .UL \&offset
186: may be negative.
187: For some devices (e.g., paper
188: tape and
189: terminals) seek calls are
190: ignored.
191: The actual offset from the beginning of the file
192: to which the pointer was moved is returned
193: in
194: .UL location .
195: .PP
196: There are several additional system entries
197: having to do with I/O and with the file
198: system that will not be discussed.
199: For example:
200: close a file,
201: get the status of a file,
202: change the protection mode or the owner
203: of a file,
204: create a directory,
205: make a link to an existing file,
206: delete a file.
207: .SH
208: IV. IMPLEMENTATION OF THE FILE SYSTEM
209: .PP
210: As mentioned in Section 3.2 above, a directory entry contains
211: only a name for the associated file and a pointer to the
212: file itself.
213: This pointer is an integer called the
214: .IT i-number
215: (for index number)
216: of the file.
217: When the file is accessed,
218: its i-number is used as an index into
219: a system table (the
220: .IT i-list \|)
221: stored in a known
222: part of the device on which
223: the directory resides.
224: The entry found thereby (the file's
225: .IT i-node \|)
226: contains
227: the description of the file:
228: .IP i
229: the user and group-\*sID\*n of its owner
230: .IP ii
231: its protection bits
232: .IP iii
233: the physical disk or tape addresses for the file contents
234: .IP iv
235: its size
236: .IP v
237: time of creation, last use, and last modification
238: .IP vi
239: the number of links to the file, that is, the number of times it appears in a directory
240: .IP vii
241: a code indicating whether the file is a directory, an ordinary file, or a special file.
242: .LP
243: The purpose of an
244: .UL open
245: or
246: .UL create
247: system call is to turn the path name given by the user
248: into an i-number
249: by searching the explicitly or implicitly named directories.
250: Once a file is open,
251: its device, i-number, and read/write pointer are stored in a system table
252: indexed by the file descriptor returned by the
253: .UL open
254: or
255: .UL create .
256: Thus, during a subsequent
257: call to read or write the
258: file,
259: the descriptor
260: may be easily related to the information necessary to access the file.
261: .PP
262: When a new file is created,
263: an i-node is allocated for it and a directory entry is made
264: that contains the name of the file and the i-node
265: number.
266: Making a link to an existing file involves
267: creating a directory entry with the new name,
268: copying the i-number from the original file entry,
269: and incrementing the link-count field of the i-node.
270: Removing (deleting) a file is done by
271: decrementing the
272: link-count of the i-node specified by its directory entry
273: and erasing the directory entry.
274: If the link-count drops to 0,
275: any disk blocks in the file
276: are freed and the i-node is de-allocated.
277: .PP
278: The space on all disks that
279: contain a file system is divided into a number of
280: 512-byte
281: blocks logically addressed from 0 up to a limit that
282: depends on the device.
283: There is space in the i-node of each file for 13 device addresses.
284: For nonspecial files,
285: the first 10 device addresses point at the first
286: 10 blocks of the file.
287: If the file is larger than 10 blocks,
288: the 11 device address points to an
289: indirect block containing up to 128 addresses
290: of additional blocks in the file.
291: Still larger files use the twelfth device address
292: of the i-node to point to
293: a double-indirect block naming
294: 128 indirect blocks,
295: each
296: pointing to 128 blocks of the file.
297: If required,
298: the thirteenth device address is
299: a triple-indirect block.
300: Thus files may conceptually grow to
301: [\|(10+128+128\u\s62\s0\d+128\u\s63\s0\d)\*m512\|] bytes.
302: Once opened,
303: bytes numbered below 5120 can be read with a single
304: disk access;
305: bytes in the range 5120 to 70,656
306: require two accesses;
307: bytes in the range 70,656
308: to 8,459,264
309: require three accesses;
310: bytes from there to the
311: largest file
312: (1,082,201,088)
313: require four accesses.
314: In practice,
315: a device cache mechanism
316: (see below)
317: proves effective in eliminating
318: most of the indirect fetches.
319: .PP
320: The foregoing discussion applies to ordinary files.
321: When an I/O request is made to a file whose i-node indicates that it
322: is special,
323: the last 12 device address words are immaterial,
324: and the first specifies
325: an internal
326: .IT "device name" ,
327: which is interpreted as a pair of numbers
328: representing,
329: respectively, a device type
330: and subdevice number.
331: The device type indicates which
332: system routine will deal with I/O on that device;
333: the subdevice number selects, for example, a disk drive
334: attached to a particular controller or one of several
335: similar terminal interfaces.
336: .PP
337: In this environment, the implementation of the
338: .UL mount
339: system call (Section 3.4) is quite straightforward.
340: .UL \&mount
341: maintains a system table whose
342: argument is the i-number and device name of the
343: ordinary file specified
344: during the
345: .UL mount ,
346: and whose corresponding value is the
347: device name of the indicated special file.
348: This table is searched for each i-number/device pair
349: that turns up while a path name is being scanned
350: during an
351: .UL open
352: or
353: .UL create ;
354: if a match is found,
355: the i-number is replaced by the i-number of the root
356: directory
357: and the device name is replaced by the table value.
358: .PP
359: To the user, both reading and writing of files appear to
360: be synchronous and unbuffered.
361: That is, immediately after
362: return from a
363: .UL read
364: call the data are available; conversely,
365: after a
366: .UL write
367: the user's workspace may be reused.
368: In fact, the system maintains a rather complicated
369: buffering mechanism that reduces greatly the number
370: of I/O operations required to access a file.
371: Suppose a
372: .UL write
373: call is made specifying transmission
374: of a single byte.
375: The system
376: will search its buffers to see
377: whether the affected disk block currently resides in main memory;
378: if not, it will be read in from the device.
379: Then the affected byte is replaced in the buffer and an
380: entry is made in a list of blocks to be written.
381: The return from the
382: .UL write
383: call may then take place,
384: although the actual I/O may not be completed until a later time.
385: Conversely, if a single byte is read, the system determines
386: whether the secondary storage block in which the byte is located is already
387: in one of the system's buffers; if so, the byte can be returned immediately.
388: If not, the block is read into a buffer and the byte picked out.
389: .PP
390: The system recognizes when
391: a program has
392: made accesses to
393: sequential blocks of a file,
394: and asynchronously
395: pre-reads the next block.
396: This significantly reduces
397: the running time of most programs
398: while adding little to
399: system overhead.
400: .PP
401: A program that reads or writes files in units of 512 bytes
402: has an advantage over a program that reads or writes
403: a single byte at a time, but the gain is not immense;
404: it comes mainly from the avoidance of system overhead.
405: If a program is used rarely or does
406: no great volume of I/O, it may quite reasonably
407: read and write in units as small as it wishes.
408: .PP
409: The notion of the i-list is an unusual feature
410: of
411: .UX .
412: In practice, this method of organizing the file system
413: has proved quite reliable and easy to deal with.
414: To the system itself, one of its strengths is
415: the fact that each file has a short, unambiguous name
416: related in a simple way to the protection, addressing,
417: and other information needed to access the file.
418: It also permits a quite simple and rapid
419: algorithm for checking the consistency of a file system,
420: for example, verification
421: that the portions of each device containing useful information
422: and those free to be allocated are disjoint and together
423: exhaust the space on the device.
424: This algorithm is independent
425: of the directory hierarchy, because it need only scan
426: the linearly organized i-list.
427: At the same time the notion of the i-list induces certain
428: peculiarities not found in other file system organizations.
429: For example, there is the question of who is to be charged
430: for the space a file occupies,
431: because all directory entries for a file have equal status.
432: Charging the owner of a file is unfair in general,
433: for one user may create a file, another may link to
434: it, and the first user may delete the file.
435: The first user is still the owner of the
436: file, but it should be charged
437: to the second user.
438: The simplest reasonably fair algorithm
439: seems to be to spread the charges
440: equally among users who have links to a file.
441: Many installations
442: avoid the
443: issue by not charging any fees at all.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.