.\" @(#)p4 6.2 (Berkeley) 5/9/86
.\"
.NH
LOW-LEVEL I/O
.PP
This section describes the
bottom level of I/O on the
.UC UNIX
system.
The lowest level of I/O in
.UC UNIX
provides no buffering or any other services;
it is in fact a direct entry into the operating system.
You are entirely on your own,
but on the other hand,
you have the most control over what happens.
And since the calls and usage are quite simple,
this isn't as bad as it sounds.
.NH 2
File Descriptors
.PP
In the
.UC UNIX
operating system,
all input and output is done
by reading or writing files,
because all peripheral devices, even the user's terminal,
are files in the file system.
This means that a single, homogeneous interface
handles all communication between a program and peripheral devices.
.PP
In the most general case,
before reading or writing a file,
it is necessary to inform the system
of your intent to do so,
a process called
``opening'' the file.
If you are going to write on a file,
it may also be necessary to create it.
The system checks your right to do so
(Does the file exist?
Do you have permission to access it?),
and if all is well,
returns a small non-negative integer
called a
.ul
file descriptor.
Whenever I/O is to be done on the file,
the file descriptor is used instead of the name to identify the file.
(This is roughly analogous to the use of
.UC READ(5,...)
and
.UC WRITE(6,...)
in Fortran.)
All
information about an open file is maintained by the system;
the user program refers to the file
only
by the file descriptor.
.PP
The file pointers discussed in section 3
are similar in spirit to file descriptors,
but file descriptors are more fundamental.
A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
.PP
Since input and output involving the user's terminal
are so common,
special arrangements exist to make this convenient.
When the command interpreter (the
``shell'')
runs a program,
it opens
three files, with file descriptors 0, 1, and 2,
called the standard input,
the standard output, and the standard error output.
All of these are normally connected to the terminal,
so if a program reads file descriptor 0
and writes file descriptors 1 and 2,
it can do terminal I/O
without worrying about opening the files.
.PP
If I/O is redirected
to and from files with
.UL <
and
.UL > ,
as in
.P1
prog <infile >outfile
.P2
the shell changes the default assignments for file descriptors
0 and 1
from the terminal to the named files.
Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal,
so error messages can go there.
In all cases,
the file assignments are changed by the shell,
not by the program.
The program does not need to know where its input
comes from nor where its output goes,
so long as it uses file 0 for input and 1 and 2 for output.
.NH 2
Read and Write
.PP
All input and output is done by
two functions called
.UL read
and
.UL write .
For both, the first argument is a file descriptor.
The second argument is a buffer in your program where the data is to
come from or go to.
The third argument is the number of bytes to be transferred.
The calls are
.P1
n_read = read(fd, buf, n);

n_written = write(fd, buf, n);
.P2
Each call returns a byte count
which is the number of bytes actually transferred.
On reading,
the number of bytes returned may be less than
the number asked for,
because fewer than
.UL n
bytes remained to be read.
(When the file is a terminal,
.UL read
normally reads only up to the next newline,
which is generally less than what was requested.)
A return value of zero bytes implies end of file,
and
.UL -1
indicates an error of some sort.
For writing, the returned value is the number of bytes
actually written;
it is generally an error if this isn't equal
to the number supposed to be written.
.PP
The number of bytes to be read or written is quite arbitrary.
The two most common values are
1,
which means one character at a time
(``unbuffered''),
and
512,
which corresponds to a physical block size on many peripheral devices.
This latter size will be most efficient,
but even character-at-a-time I/O
is not inordinately expensive.
.PP
Putting these facts together,
we can write a simple program to copy
its input to its output.
This program will copy anything to anything,
since the input and output can be redirected to any file or device.
.P1
#define BUFSIZE 512 /* best size for PDP-11 UNIX */

main() /* copy input to output */
{
    char buf[BUFSIZE];
    int n;

    while ((n = read(0, buf, BUFSIZE)) > 0)
        write(1, buf, n);
    exit(0);
}
.P2
If the file size is not a multiple of
.UL BUFSIZE ,
some
.UL read
will return a smaller number of bytes
to be written by
.UL write ;
the next call to
.UL read
after that
will return zero.
.PP
It is instructive to see how
.UL read
and
.UL write
can be used to construct
higher-level routines like
.UL getchar ,
.UL putchar ,
etc.
For example,
here is a version of
.UL getchar
which does unbuffered input.
.P1
#define CMASK 0377 /* for making char's > 0 */

getchar() /* unbuffered single character input */
{
    char c;

    return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}
.P2
.UL c
.ul
must
be declared
.UL char ,
because
.UL read
accepts a character pointer.
The character being returned must be masked with
.UL 0377
to ensure that it is non-negative;
otherwise sign extension may make it negative.
(The constant
.UL 0377
is appropriate for the
.UC PDP -11
but not necessarily for other machines.)
.PP
The second version of
.UL getchar
does input in big chunks,
and hands out the characters one at a time.
.P1
#define CMASK 0377 /* for making char's > 0 */
#define BUFSIZE 512

getchar() /* buffered version */
{
    static char buf[BUFSIZE];
    static char *bufp = buf;
    static int n = 0;

    if (n == 0) { /* buffer is empty */
        n = read(0, buf, BUFSIZE);
        bufp = buf;
    }
    return((--n >= 0) ? *bufp++ & CMASK : EOF);
}
.P2
.NH 2
Open, Creat, Close, Unlink
.PP
Other than the default
standard input, output and error files,
you must explicitly open files in order to
read or write them.
There are two system entry points for this,
.UL open
and
.UL creat
[sic].
.PP
.UL open
is rather like the
.UL fopen
discussed in the previous section,
except that instead of returning a file pointer,
it returns a file descriptor,
which is just an
.UL int .
.P1
int fd;

fd = open(name, rwmode);
.P2
As with
.UL fopen ,
the
.UL name
argument
is a character string corresponding to the external file name.
The access mode argument
is different, however:
.UL rwmode
is 0 for read, 1 for write, and 2 for read and write access.
.UL open
returns
.UL -1
if any error occurs;
otherwise it returns a valid file descriptor.
.PP
It is an error to
.UL open
a file that does not exist.
The entry point
.UL creat
is provided to create new files,
or to rewrite old ones.
.P1
fd = creat(name, pmode);
.P2
returns a file descriptor
if it was able to create the file
called
.UL name ,
and
.UL -1
if not.
If the file
already exists,
.UL creat
will truncate it to zero length;
it is not an error to
.UL creat
a file that already exists.
.PP
If the file is brand new,
.UL creat
creates it with the
.ul
protection mode
specified by
the
.UL pmode
argument.
In the
.UC UNIX
file system,
there are nine bits of protection information
associated with a file,
controlling read, write and execute permission for
the owner of the file,
for the owner's group,
and for all others.
Thus a three-digit octal number
is most convenient for specifying the permissions:
each digit covers one of the three classes,
and within a digit 4 grants read, 2 grants write and 1 grants execute permission.
For example,
0755
specifies read, write and execute permission for the owner,
and read and execute permission for the group and everyone else.
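.PP
As an illustration of that packing,
here is a small sketch in modern C.
The helpers
.UL make_pmode
and
.UL pmode_string
are hypothetical names invented for this example, not library routines;
the first assembles a mode from three octal digits,
and the second spells the nine bits out in the familiar
.UL rwxr-xr-x
form.
```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pack one octal digit each for owner, group and
   others into the nine protection bits; 4 = read, 2 = write, 1 = execute. */
int make_pmode(int owner, int group, int other)
{
    assert(owner >= 0 && owner <= 7);
    assert(group >= 0 && group <= 7);
    assert(other >= 0 && other <= 7);
    return (owner << 6) | (group << 3) | other;
}

/* Hypothetical helper: spell the nine protection bits of pmode as
   "rwxrwxrwx", with '-' for each permission not granted.
   out must have room for at least 10 chars (9 letters plus '\0'). */
char *pmode_string(int pmode, char *out)
{
    static const char letters[] = "rwx";
    int i;

    memset(out, '-', 9);                /* start with nothing granted */
    for (i = 0; i < 9; i++)
        if (pmode & (0400 >> i))        /* 0400 = owner read ... 01 = others execute */
            out[i] = letters[i % 3];
    out[9] = '\0';
    return out;
}
```
For instance,
.UL make_pmode(6,\ 4,\ 4)
yields 0644,
which gives the owner read and write permission
and everyone else read permission.
On current systems the mode that
.UL creat
actually applies to a new file is further reduced by the process's
.UL umask ,
so the requested bits are an upper bound.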
.PP
To illustrate,
here is a simplified version of
the
.UC UNIX
utility
.IT cp ,
a program which copies one file to another.
(The main simplification is that our version
copies only one file,
and does not permit the second argument
to be a directory.)
.P1
#define NULL 0
#define BUFSIZE 512
#define PMODE 0644 /* RW for owner, R for group, others */

main(argc, argv) /* cp: copy f1 to f2 */
int argc;
char *argv[];
{
    int f1, f2, n;
    char buf[BUFSIZE];

    if (argc != 3)
        error("Usage: cp from to", NULL);
    if ((f1 = open(argv[1], 0)) == -1)
        error("cp: can't open %s", argv[1]);
    if ((f2 = creat(argv[2], PMODE)) == -1)
        error("cp: can't create %s", argv[2]);

    while ((n = read(f1, buf, BUFSIZE)) > 0)
        if (write(f2, buf, n) != n)
            error("cp: write error", NULL);
    exit(0);
}
.P2
.P1
error(s1, s2) /* print error message and die */
char *s1, *s2;
{
    printf(s1, s2);
    printf("\en");
    exit(1);
}
.P2
.PP
As we said earlier,
there is a limit (typically 15-25)
on the number of files which a program
may have open simultaneously.
Accordingly, any program which intends to process
many files must be prepared to re-use
file descriptors.
The routine
.UL close
breaks the connection between a file descriptor
and an open file,
and frees the
file descriptor for use with some other file.
Termination of a program
via
.UL exit
or return from the main program closes all open files.
.PP
The function
.UL unlink(filename)
removes the file
.UL filename
from the file system.
.NH 2
Random Access \(em Seek and Lseek
.PP
File I/O is normally sequential:
each
.UL read
or
.UL write
takes place at a position in the file
right after the previous one.
When necessary, however,
a file can be read or written in any order.
The
system call
.UL lseek
provides a way to move around in
a file without actually reading
or writing:
.P1
lseek(fd, offset, origin);
.P2
forces the current position in the file
whose descriptor is
.UL fd
to move to position
.UL offset ,
which is taken relative to the location
specified by
.UL origin .
Subsequent reading or writing will begin at that position.
.UL offset
is
a
.UL long ;
.UL fd
and
.UL origin
are
.UL int 's.
.UL origin
can be 0, 1, or 2 to specify that
.UL offset
is to be
measured from
the beginning, from the current position, or from the
end of the file, respectively.
For example,
to append to a file,
seek to the end before writing:
.P1
lseek(fd, 0L, 2);
.P2
To get back to the beginning (``rewind''),
.P1
lseek(fd, 0L, 0);
.P2
Notice the
.UL 0L
argument;
it could also be written as
.UL (long)\ 0 .
.PP
With
.UL lseek ,
it is possible to treat files more or less like large arrays,
at the price of slower access.
For example, the following simple function reads any number of bytes
from an arbitrary place in a file.
.P1
get(fd, pos, buf, n) /* read n bytes from position pos */
int fd, n;
long pos;
char *buf;
{
    lseek(fd, pos, 0); /* get to pos */
    return(read(fd, buf, n));
}
.P2
.PP
In pre-version 7
.UC UNIX ,
the basic entry point to the I/O system
is called
.UL seek .
.UL seek
is identical to
.UL lseek ,
except that its
.UL offset
argument is an
.UL int
rather than a
.UL long .
Accordingly,
since
.UC PDP -11
integers have only 16 bits,
the
.UL offset
specified
for
.UL seek
is limited to 65,535;
for this reason,
.UL origin
values of 3, 4, and 5 cause
.UL seek
to multiply the given offset by 512
(the number of bytes in one physical block)
and then interpret
.UL origin
as if it were 0, 1, or 2, respectively.
Thus getting to an arbitrary place in a large file
requires two seeks: first one that selects
the block, then one that
has
.UL origin
equal to 1 and moves to the desired byte within the block.
.NH 2
Error Processing
.PP
The routines discussed in this section,
and in fact all the routines which are direct entries into the system,
can incur errors.
Usually they indicate an error by returning a value of \-1.
Sometimes it is nice to know what sort of error occurred;
for this purpose all these routines, when appropriate,
leave an error number in the external cell
.UL errno .
The meanings of the various error numbers are
listed
in the introduction to Section II
of the
.I
.UC UNIX
Programmer's Manual,
.R
so your program can, for example, determine whether
an attempt to open a file failed because it did not exist
or because the user lacked permission to read it.
Perhaps more commonly,
you may want to print out the
reason for failure.
The routine
.UL perror
will print a message associated with the value
of
.UL errno ;
more generally,
.UL sys\_errlist
is an array of character strings which can be indexed
by
.UL errno
and printed by your program.
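.PP
Here is a sketch in modern C of both uses of
.UL errno :
it tries to open a file for reading
and, on failure, tells the two common cases apart.
The helper name
.UL open_or_explain
is hypothetical;
.UL O_RDONLY
is the symbolic name current systems use for read-only access,
and the standard function
.UL strerror
is the portable way to obtain the message text
that older programs fetched by indexing
.UL sys\_errlist .
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open name read-only.  On failure, print why
   on the standard error output and return -1 with errno preserved. */
int open_or_explain(const char *name)
{
    int fd = open(name, O_RDONLY);      /* read-only, like rwmode 0 above */

    if (fd == -1) {
        int err = errno;                /* save it: fprintf may change errno */

        if (err == ENOENT)
            fprintf(stderr, "%s: file does not exist\n", name);
        else if (err == EACCES)
            fprintf(stderr, "%s: permission denied\n", name);
        else
            fprintf(stderr, "%s: %s\n", name, strerror(err));
        errno = err;                    /* let the caller inspect it too */
    }
    return fd;
}
```
Calling it on a path that does not exist returns \-1 and leaves
.UL errno
equal to
.UL ENOENT .
Where the standard wording is good enough,
.UL perror(name)
prints the name, a colon, and the same message
on the standard error output.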