Annotation of 43BSDReno/share/doc/ps2/03.uprog/p4, revision 1.1.1.1

.\"    @(#)p4  6.2 (Berkeley) 5/9/86
.\"
.NH
LOW-LEVEL I/O
.PP
This section describes the
bottom level of I/O on the
.UC UNIX
system.
The lowest level of I/O in
.UC UNIX
provides no buffering or any other services;
it is in fact a direct entry into the operating system.
You are entirely on your own,
but on the other hand,
you have the most control over what happens.
And since the calls and usage are quite simple,
this isn't as bad as it sounds.
.NH 2
File Descriptors
.PP
In the
.UC UNIX
operating system,
all input and output is done
by reading or writing files,
because all peripheral devices, even the user's terminal,
are files in the file system.
This means that a single, homogeneous interface
handles all communication between a program and peripheral devices.
.PP
In the most general case,
before reading or writing a file,
it is necessary to inform the system
of your intent to do so,
a process called
``opening'' the file.
If you are going to write on a file,
it may also be necessary to create it.
The system checks your right to do so
(Does the file exist?
Do you have permission to access it?),
and if all is well,
returns a small non-negative integer
called a
.ul
file descriptor.
Whenever I/O is to be done on the file,
the file descriptor is used instead of the name to identify the file.
(This is roughly analogous to the use of
.UC READ(5,...)
and
.UC WRITE(6,...)
in Fortran.)
All
information about an open file is maintained by the system;
the user program refers to the file
only
by the file descriptor.
.PP
The file pointers discussed in section 3
are similar in spirit to file descriptors,
but file descriptors are more fundamental.
A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
.PP
Since input and output involving the user's terminal
are so common,
special arrangements exist to make this convenient.
When the command interpreter (the
``shell'')
runs a program,
it opens
three files, with file descriptors 0, 1, and 2,
called the standard input,
the standard output, and the standard error output.
All of these are normally connected to the terminal,
so if a program reads file descriptor 0
and writes file descriptors 1 and 2,
it can do terminal I/O
without worrying about opening the files.
.PP
If I/O is redirected
to and from files with
.UL <
and
.UL > ,
as in
.P1
prog <infile >outfile
.P2
the shell changes the default assignments for file descriptors
0 and 1
from the terminal to the named files.
Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal,
so error messages can go there.
In all cases,
the file assignments are changed by the shell,
not by the program.
The program does not need to know where its input
comes from nor where its output goes,
so long as it uses file 0 for input and 1 and 2 for output.
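.PP
As a small illustration of these conventions,
here is a sketch of a routine that sends a message to whichever descriptor it is given,
so a program can put results on 1 and complaints on 2 without opening anything.
The name
.UL say
is invented for this example,
and the sketch assumes a modern POSIX system,
where
.UL write
is declared in
.UL <unistd.h> .
.P1
#include <string.h>
#include <unistd.h>

/* say: write the string s to file descriptor fd (a hypothetical
   helper).  Returns 0 if every byte was written, -1 otherwise. */
int say(int fd, const char *s)
{
	size_t len = strlen(s);

	return write(fd, s, len) == (ssize_t) len ? 0 : -1;
}
.P2
A message written with descriptor 2 still reaches the terminal
even when the standard output has been redirected to a file.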
.NH 2
Read and Write
.PP
All input and output is done by
two functions called
.UL read
and
.UL write .
For both, the first argument is a file descriptor.
The second argument is a buffer in your program where the data is to
come from or go to.
The third argument is the number of bytes to be transferred.
The calls are
.P1
n_read = read(fd, buf, n);

n_written = write(fd, buf, n);
.P2
Each call returns a byte count
which is the number of bytes actually transferred.
On reading,
the number of bytes returned may be less than
the number asked for,
because fewer than
.UL n
bytes remained to be read.
(When the file is a terminal,
.UL read
normally reads only up to the next newline,
which is generally less than what was requested.)
A return value of zero bytes implies end of file,
and
.UL -1
indicates an error of some sort.
For writing, the returned value is the number of bytes
actually written;
it is generally an error if this isn't equal
to the number supposed to be written.
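.PP
Because
.UL read
may return fewer bytes than were asked for even before end of file
(from a terminal or a pipe, for instance),
a program that needs exactly
.UL n
bytes must call it in a loop.
The following is a minimal sketch, assuming a POSIX system;
.UL read_full
is a name invented here, not a library routine.
.P1
#include <unistd.h>

/* read_full: call read until n bytes have arrived or end of file
   is reached (a hypothetical helper).  Returns the number of bytes
   read, which is less than n only at end of file, or -1 on error. */
long read_full(int fd, char *buf, long n)
{
	long total = 0;
	ssize_t r;

	while (total < n) {
		r = read(fd, buf + total, n - total);
		if (r == -1)
			return -1;	/* error */
		if (r == 0)
			break;		/* end of file */
		total += r;
	}
	return total;
}
.P2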
.PP
The number of bytes to be read or written is quite arbitrary.
The two most common values are
1,
which means one character at a time
(``unbuffered''),
and
512,
which corresponds to a physical blocksize on many peripheral devices.
This latter size will be most efficient,
but even character-at-a-time I/O
is not inordinately expensive.
.PP
Putting these facts together,
we can write a simple program to copy
its input to its output.
This program will copy anything to anything,
since the input and output can be redirected to any file or device.
.P1
#include <stdlib.h>
#include <unistd.h>

#define	BUFSIZE	512	/* best size for PDP-11 UNIX */

int main(void)	/* copy input to output */
{
	char	buf[BUFSIZE];
	int	n;

	while ((n = read(0, buf, BUFSIZE)) > 0)
		if (write(1, buf, n) != n)	/* short write: give up */
			exit(1);
	exit(n < 0 ? 1 : 0);	/* n is -1 if read failed */
}
.P2
If the file size is not a multiple of
.UL BUFSIZE ,
some
.UL read
will return a smaller number of bytes
to be written by
.UL write ;
the next call to
.UL read
after that
will return zero.
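.PP
The sequence of counts can be observed directly.
The sketch below, which assumes a POSIX system
and uses the invented names
.UL read_counts
and
.UL MAXBUF ,
records what each call to
.UL read
returns.
For a 1000-byte input read with a 512-byte buffer,
the calls return 512, then 488, then 0.
.P1
#include <unistd.h>

#define	MAXBUF	4096	/* largest bufsize this sketch accepts */

/* read_counts: call read(fd, buf, bufsize) until it returns 0 or -1,
   saving up to max of the return values in counts[].
   Returns the number of calls made. */
int read_counts(int fd, int bufsize, long counts[], int max)
{
	char buf[MAXBUF];
	long n;
	int calls = 0;

	if (bufsize > MAXBUF)
		bufsize = MAXBUF;
	do {
		n = read(fd, buf, bufsize);
		if (calls < max)
			counts[calls] = n;
		calls++;
	} while (n > 0);
	return calls;
}
.P2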
.PP
It is instructive to see how
.UL read
and
.UL write
can be used to construct
higher level routines like
.UL getchar ,
.UL putchar ,
etc.
For example,
here is a version of
.UL getchar
which does unbuffered input.
.P1
#include <unistd.h>

#define	EOF	(-1)	/* as in <stdio.h>, not included here */
#define	CMASK	0377	/* for making char's > 0 */

int getchar(void)	/* unbuffered single character input */
{
	char c;

	return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}
.P2
.UL c
.ul
must
be declared
.UL char ,
because
.UL read
stores exactly one byte at the address it is given;
if
.UL c
were an
.UL int ,
only one of its bytes would be filled in.
The character being returned must be masked with
.UL 0377
to ensure that it is non-negative;
otherwise sign extension may make it negative.
(The constant
.UL 0377
is appropriate for the
.UC PDP -11
but not necessarily for other machines.)
.PP
The second version of
.UL getchar
does input in big chunks,
and hands out the characters one at a time.
.P1
#include <unistd.h>

#define	EOF	(-1)	/* as in <stdio.h>, not included here */
#define	CMASK	0377	/* for making char's > 0 */
#define	BUFSIZE	512

int getchar(void)	/* buffered version */
{
	static char	buf[BUFSIZE];
	static char	*bufp = buf;
	static int	n = 0;	/* characters left in buf */

	if (n <= 0) {	/* buffer is empty: refill it */
		n = read(0, buf, BUFSIZE);
		bufp = buf;
	}
	return((--n >= 0) ? *bufp++ & CMASK : EOF);
}
.P2
Testing
.UL n
with
.UL <=
rather than
.UL ==
matters:
once
.UL read
has returned 0 (end of file) or
.UL -1
(an error), the decrement leaves
.UL n
negative, and a test for exactly zero would never call
.UL read
again.
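.PP
Because its state lives in static variables,
a buffered
.UL getchar
like this one can serve only descriptor 0.
Gathering that state into a structure lets the same idea serve any descriptor,
which is roughly what the structure behind a file pointer provides.
The following is a sketch under that assumption;
.UL fdbuf ,
.UL fdinit ,
and
.UL fdgetc
are invented names, not part of any library.
.P1
#include <stdio.h>	/* for EOF only */
#include <unistd.h>

#define	CMASK	0377
#define	BUFSIZE	512

struct fdbuf {		/* hypothetical per-descriptor buffer */
	int	fd;
	char	buf[BUFSIZE];
	char	*bufp;	/* next character to hand out */
	int	n;	/* characters left in buf */
};

void fdinit(struct fdbuf *f, int fd)
{
	f->fd = fd;
	f->bufp = f->buf;
	f->n = 0;
}

int fdgetc(struct fdbuf *f)	/* buffered input from f->fd */
{
	if (f->n <= 0) {	/* buffer empty: refill */
		f->n = read(f->fd, f->buf, BUFSIZE);
		f->bufp = f->buf;
	}
	return (--f->n >= 0) ? *f->bufp++ & CMASK : EOF;
}
.P2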
.NH 2
Open, Creat, Close, Unlink
.PP
Other than the default
standard input, output and error files,
you must explicitly open files in order to
read or write them.
There are two system entry points for this,
.UL open
and
.UL creat
[sic].
.PP
.UL open
is rather like the
.UL fopen
discussed in the previous section,
except that instead of returning a file pointer,
it returns a file descriptor,
which is just an
.UL int .
.P1
int fd;

fd = open(name, rwmode);
.P2
As with
.UL fopen ,
the
.UL name
argument
is a character string corresponding to the external file name.
The access mode argument
is different, however:
.UL rwmode
is 0 for read, 1 for write, and 2 for read and write access.
(Modern systems spell these
.UL O_RDONLY ,
.UL O_WRONLY ,
and
.UL O_RDWR ,
defined in
.UL <fcntl.h> .)
.UL open
returns
.UL -1
if any error occurs;
otherwise it returns a valid file descriptor.
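.PP
Putting
.UL open ,
.UL read ,
and
.UL close
together, here is a minimal sketch of a function that counts the bytes in a named file.
It assumes a modern POSIX system, where the read-only mode 0 is written
.UL O_RDONLY ;
the name
.UL count_bytes
is invented for illustration.
.P1
#include <fcntl.h>
#include <unistd.h>

/* count_bytes: number of bytes in the file called name, or -1 if
   it cannot be opened or read (a hypothetical helper). */
long count_bytes(const char *name)
{
	char buf[512];
	long total = 0;
	long n;
	int fd;

	if ((fd = open(name, O_RDONLY)) == -1)
		return -1;
	while ((n = read(fd, buf, sizeof buf)) > 0)
		total += n;
	close(fd);	/* release the descriptor */
	return n == -1 ? -1 : total;
}
.P2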
.PP
It is an error to
try to
.UL open
a file that does not exist.
The entry point
.UL creat
is provided to create new files,
or to re-write old ones.
.P1
fd = creat(name, pmode);
.P2
returns a file descriptor
if it was able to create the file
called
.UL name ,
and
.UL -1
if not.
If the file
already exists,
.UL creat
will truncate it to zero length;
it is not an error to
.UL creat
a file that already exists.
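.PP
As a sketch of the truncating behavior,
the function below empties a file, creating it if necessary.
It assumes a POSIX system, where
.UL creat
is declared in
.UL <fcntl.h>
and behaves like
.UL open
with the flags
.UL O_WRONLY|O_CREAT|O_TRUNC ;
the name
.UL reset_file
is hypothetical.
.P1
#include <fcntl.h>
#include <unistd.h>

/* reset_file: leave name as an empty file, creating it with
   protection pmode if it does not exist (a hypothetical helper).
   Returns 0 on success, -1 on failure. */
int reset_file(const char *name, mode_t pmode)
{
	int fd;

	if ((fd = creat(name, pmode)) == -1)
		return -1;
	return close(fd);	/* 0 on success */
}
.P2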
.PP
If the file is brand new,
.UL creat
creates it with the
.ul
protection mode
specified by
the
.UL pmode
argument,
with any permission bits that are set in the process's file creation mask
(see
.UL umask )
turned off.
In the
.UC UNIX
file system,
there are nine bits of protection information
associated with a file,
controlling read, write and execute permission for
the owner of the file,
for the owner's group,
and for all others.
Thus a three-digit octal number
is most convenient for specifying the permissions.
For example,
0755
specifies read, write and execute permission for the owner,
and read and execute permission for the group and everyone else.
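.PP
The correspondence between the octal digits and the nine bits can be made concrete
with a short sketch that renders a mode in the familiar
.UL rwxr-xr-x
form used by
.IT ls .
The function name
.UL mode_string
is invented;
the sketch assumes a POSIX
.UL <sys/stat.h> ,
whose symbolic constants give the same values, for example
.UL S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH
for 0755.
.P1
#include <sys/types.h>
#include <sys/stat.h>

/* mode_string: render the nine protection bits of mode as
   rwxrwxrwx, owner first; out must hold 10 characters. */
void mode_string(mode_t mode, char out[10])
{
	static const char letters[] = "rwxrwxrwx";
	int i;

	/* bit 0400 is owner read; shifting right walks down to 0001,
	   execute permission for others */
	for (i = 0; i < 9; i++)
		out[i] = (mode & (0400 >> i)) ? letters[i] : '-';
	out[9] = 0;
}
.P2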
.PP
To illustrate,
here is a simplified version of
the
.UC UNIX
utility
.IT cp ,
a program which copies one file to another.
(The main simplification is that our version
copies only one file,
and does not permit the second argument
to be a directory.)
.P1
#include <stdio.h>	/* for NULL, fprintf, stderr */
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define BUFSIZE 512
#define PMODE 0644 /* RW for owner, R for group, others */

void error(const char *, const char *);

int main(int argc, char *argv[])	/* cp: copy f1 to f2 */
{
	int	f1, f2, n;
	char	buf[BUFSIZE];

	if (argc != 3)
		error("Usage: cp from to", NULL);
	if ((f1 = open(argv[1], 0)) == -1)	/* 0: read only */
		error("cp: can't open %s", argv[1]);
	if ((f2 = creat(argv[2], PMODE)) == -1)
		error("cp: can't create %s", argv[2]);

	while ((n = read(f1, buf, BUFSIZE)) > 0)
		if (write(f2, buf, n) != n)
			error("cp: write error", NULL);
	if (n == -1)
		error("cp: read error on %s", argv[1]);
	exit(0);
}
.P2
.P1
void error(const char *s1, const char *s2)	/* print error message and die */
{
	fprintf(stderr, s1, s2);	/* errors go to file descriptor 2 */
	fprintf(stderr, "\en");
	exit(1);
}
.P2
.PP
As we said earlier,
there is a limit (typically 15-25 on older systems; the exact figure depends on the system)
on the number of files which a program
may have open simultaneously.
Accordingly, any program which intends to process
many files must be prepared to re-use
file descriptors.
The routine
.UL close
breaks the connection between a file descriptor
and an open file,
and frees the
file descriptor for use with some other file.
Termination of a program
via
.UL exit
or return from the main program closes all open files.
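.PP
To see descriptors being re-used,
the sketch below opens and closes one file many times.
Because
.UL close
frees the descriptor and
.UL open
returns the lowest-numbered free one,
the same number comes back on every iteration,
so the loop succeeds even when it runs far more times than the open-file limit.
It assumes a POSIX system;
.UL reopen_many
is a hypothetical name.
.P1
#include <fcntl.h>
#include <unistd.h>

/* reopen_many: open and close name the given number of times,
   checking that the same descriptor is handed back each time.
   Returns 0 on success, -1 if an open fails or a different
   descriptor appears. */
int reopen_many(const char *name, int times)
{
	int i, fd, first = -1;

	for (i = 0; i < times; i++) {
		if ((fd = open(name, O_RDONLY)) == -1)
			return -1;
		if (first == -1)
			first = fd;
		else if (fd != first) {
			close(fd);
			return -1;
		}
		close(fd);	/* frees fd for the next open */
	}
	return 0;
}
.P2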
.PP
The function
.UL unlink(filename)
removes the name
.UL filename
from the file system;
the file itself is deleted once no other names refer to it
and no process still has it open.
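.PP
Strictly,
.UL unlink
removes a name;
the contents of the file survive as long as some process still has it open.
The sketch below, assuming a POSIX system and using the invented name
.UL unlink_demo ,
checks that after
.UL unlink
the name can no longer be opened,
yet a descriptor obtained earlier still reads the data.
.P1
#include <fcntl.h>
#include <unistd.h>

/* unlink_demo: returns 1 if, after unlinking name while it is
   open, the name is gone but the open descriptor still reads
   the data; 0 if not; -1 if the setup fails. */
int unlink_demo(const char *name)
{
	char c;
	int fd, gone, readable;

	if ((fd = creat(name, 0644)) == -1)
		return -1;
	if (write(fd, "x", 1) != 1) {
		close(fd);
		return -1;
	}
	close(fd);
	if ((fd = open(name, O_RDONLY)) == -1)
		return -1;
	unlink(name);	/* remove the only name */
	gone = (open(name, O_RDONLY) == -1);	/* name no longer found */
	readable = (read(fd, &c, 1) == 1 && c == 'x');	/* data still there */
	close(fd);	/* now the storage can be reclaimed */
	return gone && readable;
}
.P2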
.NH 2
Random Access \(em Seek and Lseek
.PP
File I/O is normally sequential:
each
.UL read
or
.UL write
takes place at a position in the file
right after the previous one.
When necessary, however,
a file can be read or written in any arbitrary order.
The
system call
.UL lseek
provides a way to move around in
a file without actually reading
or writing:
.P1
lseek(fd, offset, origin);
.P2
forces the current position in the file
whose descriptor is
.UL fd
to move to position
.UL offset ,
which is taken relative to the location
specified by
.UL origin .
Subsequent reading or writing will begin at that position.
.UL offset
is
a
.UL long ;
.UL fd
and
.UL origin
are
.UL int 's.
.UL origin
can be 0, 1, or 2 to specify that
.UL offset
is to be
measured from
the beginning, from the current position, or from the
end of the file respectively.
(On modern systems
.UL offset
has type
.UL off_t ,
and the three origins are named
.UL SEEK_SET ,
.UL SEEK_CUR ,
and
.UL SEEK_END
in
.UL <unistd.h> .)
.UL lseek
returns the new position, measured in bytes from the beginning of the file,
or
.UL -1
on error.
For example,
to append to a file,
seek to the end before writing:
.P1
lseek(fd, 0L, 2);
.P2
To get back to the beginning (``rewind''),
.P1
lseek(fd, 0L, 0);
.P2
Notice the
.UL 0L
argument;
it could also be written as
.UL (long)\ 0 .
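.PP
.UL lseek
also returns the resulting position, measured from the beginning of the file
(or
.UL -1
on error),
so seeking to offset 0 from the end yields the size of the file.
A minimal sketch, assuming a POSIX system;
.UL size_of
is an invented name.
.P1
#include <fcntl.h>
#include <unistd.h>

/* size_of: size in bytes of the file called name, or -1 on error.
   Seeking 0L from origin 2 (the end) leaves the position at the
   size, and lseek returns that position. */
long size_of(const char *name)
{
	long size;
	int fd;

	if ((fd = open(name, O_RDONLY)) == -1)
		return -1;
	size = lseek(fd, 0L, 2);	/* 2 is SEEK_END */
	close(fd);
	return size;
}
.P2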
.PP
With
.UL lseek ,
it is possible to treat files more or less like large arrays,
at the price of slower access.
For example, the following simple function reads any number of bytes
from any arbitrary place in a file.
.P1
#include <unistd.h>

int get(int fd, long pos, char *buf, int n)	/* read n bytes from position pos */
{
	if (lseek(fd, pos, 0) == -1)	/* get to pos */
		return(-1);
	return(read(fd, buf, n));
}
.P2
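.PP
Carrying the array analogy one step further,
a file can hold fixed-length records that are read and written in place by index.
The sketch below assumes a POSIX system;
.UL RECSIZE ,
.UL put_rec ,
and
.UL get_rec
are invented for illustration, and each function checks the result of
.UL lseek
before transferring data.
.P1
#include <fcntl.h>
#include <unistd.h>

#define	RECSIZE	16	/* hypothetical fixed record length */

/* put_rec, get_rec: treat the file open on fd as an array of
   RECSIZE-byte records, writing or reading record i in place.
   Each returns 0 on success, -1 on failure. */
int put_rec(int fd, long i, const char *rec)
{
	if (lseek(fd, i * RECSIZE, 0) == -1)
		return -1;
	return write(fd, rec, RECSIZE) == RECSIZE ? 0 : -1;
}

int get_rec(int fd, long i, char *rec)
{
	if (lseek(fd, i * RECSIZE, 0) == -1)
		return -1;
	return read(fd, rec, RECSIZE) == RECSIZE ? 0 : -1;
}
.P2
Reading a record beyond the end of the file fails here,
because
.UL read
returns fewer than
.UL RECSIZE
bytes.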
.PP
In pre-version 7
.UC UNIX ,
the corresponding entry point to the I/O system
is called
.UL seek .
.UL seek
is identical to
.UL lseek ,
except that its
.UL offset
argument is an
.UL int
rather than a
.UL long .
Accordingly,
since
.UC PDP -11
integers have only 16 bits,
the
.UL offset
specified
for
.UL seek
is limited to 65,535;
for this reason,
.UL origin
values of 3, 4, 5 cause
.UL seek
to multiply the given offset by 512
(the number of bytes in one physical block)
and then interpret
.UL origin
as if it were 0, 1, or 2 respectively.
Thus getting to an arbitrary place in a large file
requires two seeks: first one that selects
the block, then one that
has
.UL origin
equal to 1 and moves to the desired byte within the block.
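.PP
The arithmetic for the two seeks is division and remainder.
For example, to reach byte 100,000,
the first call is
.UL seek(fd,\ 195,\ 3) ,
which moves to byte 195 \(mu 512 = 99,840,
and the second is
.UL seek(fd,\ 160,\ 1) .
The sketch below only computes these arguments;
.UL seek
itself is not available on modern systems,
and the name
.UL seek_args
is invented.
.P1
/* seek_args: split a byte offset into the arguments of the two
   pre-V7 calls seek(fd, blk, 3) and seek(fd, byte, 1).
   This only computes the numbers; it performs no I/O. */
void seek_args(long offset, int *blk, int *byte)
{
	*blk = offset / 512;	/* whole 512-byte blocks */
	*byte = offset % 512;	/* remainder within that block */
}
.P2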
.NH 2
Error Processing
.PP
The routines discussed in this section,
and in fact all the routines which are direct entries into the system,
can incur errors.
Usually they indicate an error by returning a value of \-1.
Sometimes it is nice to know what sort of error occurred;
for this purpose all these routines, when appropriate,
leave an error number in the external variable
.UL errno .
Because a successful call need not reset
.UL errno ,
its value is meaningful only after a call has indicated an error.
The meanings of the various error numbers are
listed
in the introduction to Section II
of the
.I
.UC UNIX
Programmer's Manual,
.R
so your program can, for example, determine if
an attempt to open a file failed because it did not exist
or because the user lacked permission to read it.
Perhaps more commonly,
you may want to print out the
reason for failure.
The routine
.UL perror
will print a message associated with the value
of
.UL errno ;
more generally,
.UL sys\_errlist
is an array of character strings which can be indexed
by
.UL errno
and printed by your program.
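.PP
A minimal sketch of the usual pattern, assuming a modern POSIX system:
.UL try_open
is an invented name,
the error number is saved before anything else can change it,
and
.UL perror
prints the file name followed by the system's message.
Current C libraries also provide
.UL strerror ,
declared in
.UL <string.h> ,
which returns the message for an error number as a string;
it is the portable replacement for indexing
.UL sys_errlist
directly.
.P1
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* try_open: open name for reading, storing the descriptor in *fdp.
   Returns 0 on success; on failure, prints the reason on the
   standard error output and returns the error number, e.g.
   ENOENT if the file does not exist or EACCES if permission
   is lacking. */
int try_open(const char *name, int *fdp)
{
	int err;

	if ((*fdp = open(name, O_RDONLY)) != -1)
		return 0;
	err = errno;	/* save it: perror might change errno */
	perror(name);	/* name, then the system's message */
	return err;
}
.P2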
