.\" @(#)p4 6.2 (Berkeley) 5/9/86
.\"
.NH
LOW-LEVEL I/O
.PP
This section describes the
bottom level of I/O on the
.UC UNIX
system.
The lowest level of I/O in
.UC UNIX
provides no buffering or any other services;
it is in fact a direct entry into the operating system.
You are entirely on your own,
but on the other hand,
you have the most control over what happens.
And since the calls and usage are quite simple,
this isn't as bad as it sounds.
.NH 2
File Descriptors
.PP
In the
.UC UNIX
operating system,
all input and output is done
by reading or writing files,
because all peripheral devices, even the user's terminal,
are files in the file system.
This means that a single, homogeneous interface
handles all communication between a program and peripheral devices.
.PP
In the most general case,
before reading or writing a file,
it is necessary to inform the system
of your intent to do so,
a process called
``opening'' the file.
If you are going to write on a file,
it may also be necessary to create it.
The system checks your right to do so
(Does the file exist?
Do you have permission to access it?),
and if all is well,
returns a small non-negative integer
called a
.ul
file descriptor.
Whenever I/O is to be done on the file,
the file descriptor is used instead of the name to identify the file.
(This is roughly analogous to the use of
.UC READ(5,...)
and
.UC WRITE(6,...)
in Fortran.)
All
information about an open file is maintained by the system;
the user program refers to the file
only
by the file descriptor.
.PP
The file pointers discussed in section 3
are similar in spirit to file descriptors,
but file descriptors are more fundamental.
A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
.PP
Since input and output involving the user's terminal
are so common,
special arrangements exist to make this convenient.
When the command interpreter (the
``shell'')
runs a program,
it opens
three files, with file descriptors 0, 1, and 2,
called the standard input,
the standard output, and the standard error output.
All of these are normally connected to the terminal,
so if a program reads file descriptor 0
and writes file descriptors 1 and 2,
it can do terminal I/O
without worrying about opening the files.
.PP
If I/O is redirected
to and from files with
.UL <
and
.UL > ,
as in
.P1
prog <infile >outfile
.P2
the shell changes the default assignments for file descriptors
0 and 1
from the terminal to the named files.
Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal,
so error messages can go there.
In all cases,
the file assignments are changed by the shell,
not by the program.
The program does not need to know where its input
comes from nor where its output goes,
so long as it uses file 0 for input and 1 and 2 for output.
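.PP
As a minimal sketch of this convention, the helper below sends a string
to whichever descriptor it is given.
The name put_string is hypothetical, and the header names assume a
later POSIX-style system rather than the one described here.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the string s to descriptor fd.
   Returns the number of bytes written, or -1 on error. */
long put_string(int fd, const char *s)
{
    return (long)write(fd, s, strlen(s));
}
```

Calling put_string(1, "result\n") sends ordinary output to wherever the
shell attached descriptor 1, while put_string(2, "warning\n") still
reaches the terminal when only the standard output has been redirected.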
.NH 2
Read and Write
.PP
All input and output is done by
two functions called
.UL read
and
.UL write .
For both, the first argument is a file descriptor.
The second argument is a buffer in your program where the data is to
come from or go to.
The third argument is the number of bytes to be transferred.
The calls are
.P1
n_read = read(fd, buf, n);

n_written = write(fd, buf, n);
.P2
Each call returns a byte count
which is the number of bytes actually transferred.
On reading,
the number of bytes returned may be less than
the number asked for,
because fewer than
.UL n
bytes remained to be read.
(When the file is a terminal,
.UL read
normally reads only up to the next newline,
which is generally less than what was requested.)
A return value of zero bytes implies end of file,
and
.UL -1
indicates an error of some sort.
For writing, the returned value is the number of bytes
actually written;
it is generally an error if this isn't equal
to the number supposed to be written.
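.PP
On later POSIX-style systems a
.UL write
to a pipe or terminal may legitimately transfer fewer bytes than
requested, so a cautious program retries with the remainder.
The sketch below shows one way to do that; the name write_all and the
retry on EINTR are assumptions that go beyond the text above.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all n bytes are out.
   Returns n on success, -1 on error. */
long write_all(int fd, const char *buf, long n)
{
    long done = 0;

    while (done < n) {
        ssize_t w = write(fd, buf + done, (size_t)(n - done));

        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved */
            return -1;
        }
        if (w == 0)
            return -1;          /* no progress; give up rather than spin */
        done += w;
    }
    return done;
}
```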
.PP
The number of bytes to be read or written is quite arbitrary.
The two most common values are
1,
which means one character at a time
(``unbuffered''),
and
512,
which corresponds to a physical block size on many peripheral devices.
This latter size will be most efficient,
but even character-at-a-time I/O
is not inordinately expensive.
.PP
Putting these facts together,
we can write a simple program to copy
its input to its output.
This program will copy anything to anything,
since the input and output can be redirected to any file or device.
.P1
#define BUFSIZE 512 /* best size for PDP-11 UNIX */

main() /* copy input to output */
{
	char buf[BUFSIZE];
	int n;

	while ((n = read(0, buf, BUFSIZE)) > 0)
		write(1, buf, n);
	exit(0);
}
.P2
If the file size is not a multiple of
.UL BUFSIZE ,
some
.UL read
will return a smaller number of bytes
to be written by
.UL write ;
the next call to
.UL read
after that
will return zero.
.PP
It is instructive to see how
.UL read
and
.UL write
can be used to construct
higher level routines like
.UL getchar ,
.UL putchar ,
etc.
For example,
here is a version of
.UL getchar
which does unbuffered input.
.P1
#define CMASK 0377 /* for making char's > 0 */

getchar() /* unbuffered single character input */
{
	char c;

	return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}
.P2
.UL c
.ul
must
be declared
.UL char ,
because
.UL read
accepts a character pointer.
The character being returned must be masked with
.UL 0377
to ensure that it is positive;
otherwise sign extension may make it negative.
(The constant
.UL 0377
is appropriate for the
.UC PDP -11
but not necessarily for other machines.)
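.PP
Where the width of a character is not known in advance,
one alternative to a fixed mask is to convert through
unsigned char, which yields a non-negative value whatever the width.
This is a sketch in later standard C, not part of the original
routines; the name char_to_int is hypothetical.

```c
/* Hypothetical helper: turn a stored char into a non-negative int
   without assuming that a char is exactly 8 bits wide. */
int char_to_int(char c)
{
    return (unsigned char)c;
}
```

With this helper the return statement of the unbuffered
.UL getchar
could read char_to_int(c) in place of c & CMASK.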
.PP
The second version of
.UL getchar
does input in big chunks,
and hands out the characters one at a time.
.P1
#define CMASK 0377 /* for making char's > 0 */
#define BUFSIZE 512

getchar() /* buffered version */
{
	static char buf[BUFSIZE];
	static char *bufp = buf;
	static int n = 0;

	if (n == 0) { /* buffer is empty */
		n = read(0, buf, BUFSIZE);
		bufp = buf;
	}
	return((--n >= 0) ? *bufp++ & CMASK : EOF);
}
.P2
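.PP
The same buffering idea works in the output direction.
The following is a minimal sketch of a buffered counterpart, written in
later standard C; the names my_putchar and my_flush, and the extra
descriptor argument, are assumptions made for illustration.

```c
#include <unistd.h>

#define BUFSIZE 512

static char outbuf[BUFSIZE];
static int  outn = 0;           /* bytes waiting in outbuf */

/* Write out whatever is buffered; returns 0 on success, -1 on error. */
int my_flush(int fd)
{
    int n = outn;

    outn = 0;
    if (n > 0 && write(fd, outbuf, (size_t)n) != n)
        return -1;
    return 0;
}

/* Buffered single character output to descriptor fd. */
int my_putchar(int fd, int c)
{
    if (outn == BUFSIZE && my_flush(fd) == -1)
        return -1;
    outbuf[outn++] = (char)c;
    return c & 0377;
}
```

A program using this sketch must call my_flush once more before it
exits, since the last partial buffer is otherwise never written.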
.NH 2
Open, Creat, Close, Unlink
.PP
Other than the default
standard input, output and error files,
you must explicitly open files in order to
read or write them.
There are two system entry points for this,
.UL open
and
.UL creat
[sic].
.PP
.UL open
is rather like the
.UL fopen
discussed in the previous section,
except that instead of returning a file pointer,
it returns a file descriptor,
which is just an
.UL int .
.P1
int fd;

fd = open(name, rwmode);
.P2
As with
.UL fopen ,
the
.UL name
argument
is a character string corresponding to the external file name.
The access mode argument
is different, however:
.UL rwmode
is 0 for read, 1 for write, and 2 for read and write access.
.UL open
returns
.UL -1
if any error occurs;
otherwise it returns a valid file descriptor.
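.PP
Later POSIX-style systems give the three access modes symbolic names,
O_RDONLY, O_WRONLY and O_RDWR, in the header fcntl.h.
The sketch below assumes such a system; the name open_for_reading is
hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: open name for reading only; -1 on failure. */
int open_for_reading(const char *name)
{
    return open(name, O_RDONLY);    /* symbolic name instead of a bare 0 */
}
```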
.PP
It is an error to
try to
.UL open
a file that does not exist.
The entry point
.UL creat
is provided to create new files,
or to re-write old ones.
.P1
fd = creat(name, pmode);
.P2
returns a file descriptor
if it was able to create the file
called
.UL name ,
and
.UL -1
if not.
If the file
already exists,
.UL creat
will truncate it to zero length;
it is not an error to
.UL creat
a file that already exists.
.PP
If the file is brand new,
.UL creat
creates it with the
.ul
protection mode
specified by
the
.UL pmode
argument.
In the
.UC UNIX
file system,
there are nine bits of protection information
associated with a file,
controlling read, write and execute permission for
the owner of the file,
for the owner's group,
and for all others.
Thus a three-digit octal number
is most convenient for specifying the permissions.
For example,
0755
specifies read, write and execute permission for the owner,
and read and execute permission for the group and everyone else.
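.PP
To make the correspondence between octal digits and permissions
concrete, the sketch below spells out the nine bits as letters,
in the order owner, group, others.
The name mode_string is hypothetical.

```c
/* Hypothetical helper: render the nine protection bits of mode as
   text such as "rwxr-xr-x".  out must have room for 10 bytes. */
void mode_string(int mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)     /* bit 0400 selects the first letter */
        out[i] = (mode & (0400 >> i)) ? letters[i] : '-';
    out[9] = '\0';
}
```

Given 0755 the helper fills in rwxr-xr-x, and given 0644 it fills in
rw-r--r--.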
.PP
To illustrate,
here is a simplified version of
the
.UC UNIX
utility
.IT cp ,
a program which copies one file to another.
(The main simplification is that our version
copies only one file,
and does not permit the second argument
to be a directory.)
.P1
#define NULL 0
#define BUFSIZE 512
#define PMODE 0644 /* RW for owner, R for group, others */

main(argc, argv) /* cp: copy f1 to f2 */
int argc;
char *argv[];
{
	int f1, f2, n;
	char buf[BUFSIZE];

	if (argc != 3)
		error("Usage: cp from to", NULL);
	if ((f1 = open(argv[1], 0)) == -1)
		error("cp: can't open %s", argv[1]);
	if ((f2 = creat(argv[2], PMODE)) == -1)
		error("cp: can't create %s", argv[2]);

	while ((n = read(f1, buf, BUFSIZE)) > 0)
		if (write(f2, buf, n) != n)
			error("cp: write error", NULL);
	exit(0);
}
.P2
.P1
error(s1, s2) /* print error message and die */
char *s1, *s2;
{
	printf(s1, s2);
	printf("\en");
	exit(1);
}
.P2
.PP
As we said earlier,
there is a limit (typically 15-25)
on the number of files which a program
may have open simultaneously.
Accordingly, any program which intends to process
many files must be prepared to re-use
file descriptors.
The routine
.UL close
breaks the connection between a file descriptor
and an open file,
and frees the
file descriptor for use with some other file.
Termination of a program
via
.UL exit
or return from the main program closes all open files.
.PP
The function
.UL unlink(filename)
removes the file
.UL filename
from the file system.
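.PP
The following sketch strings
.UL creat ,
.UL write ,
.UL close
and
.UL unlink
together for a scratch file.
It assumes a later POSIX-style system for the headers, and the name
scratch_round_trip is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create name, write n bytes, close, then remove it.
   Returns 0 if every step succeeded, -1 otherwise. */
int scratch_round_trip(const char *name, const char *data, int n)
{
    int fd, ok;

    if ((fd = creat(name, 0600)) == -1)
        return -1;
    ok = (write(fd, data, (size_t)n) == n);
    if (close(fd) == -1)        /* frees the descriptor for re-use */
        ok = 0;
    if (unlink(name) == -1)     /* removes the name from the file system */
        ok = 0;
    return ok ? 0 : -1;
}
```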
.NH 2
Random Access \(em Seek and Lseek
.PP
File I/O is normally sequential:
each
.UL read
or
.UL write
takes place at a position in the file
right after the previous one.
When necessary, however,
a file can be read or written in any arbitrary order.
The
system call
.UL lseek
provides a way to move around in
a file without actually reading
or writing:
.P1
lseek(fd, offset, origin);
.P2
forces the current position in the file
whose descriptor is
.UL fd
to move to position
.UL offset ,
which is taken relative to the location
specified by
.UL origin .
Subsequent reading or writing will begin at that position.
.UL offset
is
a
.UL long ;
.UL fd
and
.UL origin
are
.UL int 's.
.UL origin
can be 0, 1, or 2 to specify that
.UL offset
is to be
measured from
the beginning, from the current position, or from the
end of the file respectively.
For example,
to append to a file,
seek to the end before writing:
.P1
lseek(fd, 0L, 2);
.P2
To get back to the beginning (``rewind''),
.P1
lseek(fd, 0L, 0);
.P2
Notice the
.UL 0L
argument;
it could also be written as
.UL (long)\ 0 .
.PP
With
.UL lseek ,
it is possible to treat files more or less like large arrays,
at the price of slower access.
For example, the following simple function reads any number of bytes
from any arbitrary place in a file.
.P1
get(fd, pos, buf, n) /* read n bytes from position pos */
int fd, n;
long pos;
char *buf;
{
	lseek(fd, pos, 0); /* get to pos */
	return(read(fd, buf, n));
}
.P2
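.PP
Writing at an arbitrary position follows the same pattern.
The sketch below overwrites bytes in the middle of an existing file.
It assumes a later POSIX-style system, where the three origin values
have the symbolic names SEEK_SET, SEEK_CUR and SEEK_END;
the name patch_file is hypothetical.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: overwrite n bytes of the file called name,
   starting at byte position pos.
   Returns the number of bytes written, or -1 on error. */
long patch_file(const char *name, long pos, const char *buf, int n)
{
    long w = -1;
    int fd = open(name, O_WRONLY);

    if (fd == -1)
        return -1;
    if (lseek(fd, (off_t)pos, SEEK_SET) != (off_t)-1)
        w = (long)write(fd, buf, (size_t)n);
    if (close(fd) == -1)
        w = -1;
    return w;
}
```

Unlike the function above, this version checks the result of
.UL lseek
before writing, since seeking fails on descriptors such as pipes.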
.PP
In pre-version 7
.UC UNIX ,
the basic entry point to the I/O system
is called
.UL seek .
.UL seek
is identical to
.UL lseek ,
except that its
.UL offset
argument is an
.UL int
rather than a
.UL long .
Accordingly,
since
.UC PDP -11
integers have only 16 bits,
the
.UL offset
specified
for
.UL seek
is limited to 65,535;
for this reason,
.UL origin
values of 3, 4, 5 cause
.UL seek
to multiply the given offset by 512
(the number of bytes in one physical block)
and then interpret
.UL origin
as if it were 0, 1, or 2 respectively.
Thus to get to an arbitrary place in a large file
requires two seeks, first one which selects
the block, then one which
has
.UL origin
equal to 1 and moves to the desired byte within the block.
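.PP
The arithmetic behind the two seeks can be sketched as follows.
The function only computes the two offsets; the calls to
.UL seek
itself appear in a comment, since that entry point is not available on
later systems.
The name split_position is hypothetical.

```c
#define BLOCK 512L      /* bytes in one physical block */

/* Hypothetical helper: split a byte position into the two arguments
   the old seek needs: a block number and a byte within that block.
   The caller would then do
       seek(fd, block, 3);     select the block from the beginning
       seek(fd, byte, 1);      move forward within the block       */
void split_position(long pos, int *block, int *byte)
{
    *block = (int)(pos / BLOCK);
    *byte  = (int)(pos % BLOCK);
}
```

For position 100000 this gives block 195 and byte 160,
since 195 times 512 is 99840.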
.NH 2
Error Processing
.PP
The routines discussed in this section,
and in fact all the routines which are direct entries into the system,
can incur errors.
Usually they indicate an error by returning a value of \-1.
Sometimes it is nice to know what sort of error occurred;
for this purpose all these routines, when appropriate,
leave an error number in the external cell
.UL errno .
The meanings of the various error numbers are
listed
in the introduction to Section II
of the
.I
.UC UNIX
Programmer's Manual,
.R
so your program can, for example, determine if
an attempt to open a file failed because it did not exist
or because the user lacked permission to read it.
Perhaps more commonly,
you may want to print out the
reason for failure.
The routine
.UL perror
will print a message associated with the value
of
.UL errno ;
more generally,
.UL sys\_errlist
is an array of character strings which can be indexed
by
.UL errno
and printed by your program.
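.PP
As a sketch of the distinction mentioned above, the helper below
reports why an attempt to open a file failed.
It assumes a later POSIX-style system, where the error numbers have
symbolic names such as ENOENT and EACCES in the header errno.h;
the name classify_open and its return codes are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open name for reading and classify the
   result: 0 = opened, 1 = no such file, 2 = permission denied,
   3 = some other error. */
int classify_open(const char *name)
{
    int fd = open(name, O_RDONLY);

    if (fd != -1) {
        close(fd);
        return 0;
    }
    if (errno == ENOENT)
        return 1;
    if (errno == EACCES)
        return 2;
    return 3;
}
```

A program that only wants to report the failure can call
perror(name) right after the failed
.UL open ,
or on later systems format the text itself with strerror(errno).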