43BSDReno/share/doc/ps2/01.cacm/p2 - annotate

Return to p2 CVS log
Up to [CSRG BSD Unix] / 43BSDReno / share / doc / ps2 / 01.cacm
Annotation of 43BSDReno/share/doc/ps2/01.cacm/p2, revision 1.1.1.1

1.1       root        1: .\"    @(#)p2  6.1 (Berkeley) 4/24/86
                      2: .\"
                      3: .SH
                      4: 3.6 I/O calls
                      5: .PP
                      6: The system calls to do I/O are designed to eliminate
                      7: the differences between the various devices and styles of
                      8: access.
                      9: There is no distinction between ``random''
                     10: and ``sequential'' I/O, nor is any logical record size imposed
                     11: by the system.
                     12: The size of an ordinary file is determined
                     13: by the number of bytes written on it;
                     14: no predetermination of the size of a file is necessary
                     15: or possible.
                     16: .PP
                     17: To illustrate the essentials of I/O,
                     18: some of the basic calls are
                     19: summarized below
                     20: in an anonymous language that will
                     21: indicate the required parameters without getting into the
                     22: underlying
                     23: complexities.
                     24: Each call to the system may potentially result in an error
                     25: return, which for simplicity is not represented
                     26: in the calling sequence.
                     27: .PP
                     28: To read or write a file assumed to exist already, it must
                     29: be opened by the following call:
                     30: .P1
                     31: filep = open\|(\|name, flag\|)
                     32: .P2
                     33: where
                     34: .UL name
                     35: indicates the name of the file.
                     36: An arbitrary path name may be given.
                     37: The
                     38: .UL flag
                     39: argument indicates whether the file is to be read, written,
                     40: or ``updated,'' that is, read and written simultaneously.
                     41: .PP
                     42: The returned value
                     43: .UL filep
                     44: is called a
                     45: .IT "file descriptor" .
                     46: It is a small integer used to identify the file
                     47: in subsequent calls to read, write,
                     48: or otherwise manipulate the file.
                     49: .PP
                     50: To create a new file or completely rewrite an old one,
                     51: there is a
                     52: .UL create
                     53: system call that
                     54: creates the given file if it does not exist,
                     55: or truncates it to zero length
                     56: if it does exist;
                     57: .UL create
                     58: also opens the new file for writing
                     59: and, like
                     60: .UL open ,
                     61: returns a file descriptor.
                     62: .PP
                     63: The file system maintains no locks visible to the user, nor is there any
                     64: restriction on the number of users who may have a file
                     65: open for reading or writing.
                     66: Although it is possible for the contents of a file
                     67: to become scrambled when two users write on it simultaneously,
                     68: in practice difficulties do not arise.
                     69: We take the view that locks are neither
                     70: necessary nor sufficient, in our environment,
                     71: to prevent interference between users of the same file.
                     72: They are unnecessary because we are not
                     73: faced with large, single-file data bases
                     74: maintained by independent processes.
                     75: They are insufficient because
                     76: locks in the ordinary sense, whereby
                     77: one user is prevented from writing on a file that another
                     78: user is reading,
                     79: cannot prevent confusion
                     80: when, for example, both users are editing
                     81: a file with an editor that makes
                     82: a copy of the file being edited.
                     83: .PP
                     84: There are, however,
                     85: sufficient internal interlocks to maintain
                     86: the logical consistency of the file system
                     87: when two users engage simultaneously in
                     88: activities such as writing on
                     89: the same file,
                     90: creating files in the same directory,
                     91: or deleting each other's open files.
                     92: .PP
                     93: Except as indicated below, reading and writing
                     94: are sequential.
                     95: This means that if a particular
                     96: byte in the file was the last byte written (or read),
                     97: the next I/O call implicitly refers to the
                     98: immediately following byte.
                     99: For each open file there is a pointer, maintained
                    100: inside the system,
                    101: that indicates the next byte to be read
                    102: or written.
                    103: If
                    104: .IT n
                    105: bytes are read or written, the pointer advances
                    106: by
                    107: .IT n
                    108: bytes.
                    109: .PP
                    110: Once a file is open, the following calls
                    111: may be used:
                    112: .P1
                    113: n = read\|(\|filep, buffer, count\|)
                    114: n = write\|(\|filep, buffer, count\|)
                    115: .P2
                    116: Up to
                    117: .UL count
                    118: bytes are transmitted between the file specified
                    119: by
                    120: .UL filep
                    121: and the byte array
                    122: specified by
                    123: .UL buffer .
                    124: The returned value
                    125: .UL n
                    126: is the number of bytes actually transmitted.
                    127: In the
                    128: .UL write
                    129: case,
                    130: .UL n
                    131: is the same as
                    132: .UL count
                    133: except under exceptional conditions, such as I/O errors or
                    134: end of physical medium on special files;
                    135: in a
                    136: .UL read ,
                    137: however,
                    138: .UL n
                    139: may without error be less than
                    140: .UL count .
                    141: If the read pointer is so near the end of the
                    142: file that reading
                    143: .UL count
                    144: characters
                    145: would cause reading beyond the end, only sufficient
                    146: bytes are transmitted to reach the end of the
                    147: file;
                    148: also, typewriter-like terminals
                    149: never return more than one line of input.
                    150: When a
                    151: .UL read
                    152: call returns with
                    153: .UL n
                    154: equal
                    155: to zero, the end of the file has been reached.
                    156: For disk files this occurs when the read pointer
                    157: becomes equal to the current
                    158: size of the file.
                    159: It is possible to generate an end-of-file
                    160: from a terminal by use of an escape
                    161: sequence that depends on the device used.
                    162: .PP
                    163: Bytes written affect only those parts of a file implied by
                    164: the position of the write pointer and the
                    165: count; no other part of the file
                    166: is changed.
                    167: If the last byte lies beyond the end of the file, the
                    168: file is made to grow as needed.
                    169: .PP
                    170: To do random (direct-access) I/O
                    171: it is only necessary to move the read or write pointer
                    172: to the appropriate location in the file.
                    173: .P1
                    174: location = lseek\|(\|filep, offset, base\|)
                    175: .P2
                    176: The pointer
                    177: associated with
                    178: .UL filep
                    179: is moved to a position
                    180: .UL offset
                    181: bytes from the beginning of the file, from the current position
                    182: of the pointer, or from the end of the file,
                    183: depending on
                    184: .UL base.
                    185: .UL \&offset
                    186: may be negative.
                    187: For some devices (e.g., paper
                    188: tape and
                    189: terminals) seek calls are
                    190: ignored.
                    191: The actual offset from the beginning of the file
                    192: to which the pointer was moved is returned
                    193: in
                    194: .UL location .
                    195: .PP
                    196: There are several additional system entries
                    197: having to do with I/O and with the file
                    198: system that will not be discussed.
                    199: For example:
                    200: close a file,
                    201: get the status of a file,
                    202: change the protection mode or the owner
                    203: of a file,
                    204: create a directory,
                    205: make a link to an existing file,
                    206: delete a file.
                    207: .SH
                    208: IV. IMPLEMENTATION OF THE FILE SYSTEM
                    209: .PP
                    210: As mentioned in Section 3.2 above, a directory entry contains
                    211: only a name for the associated file and a pointer to the
                    212: file itself.
                    213: This pointer is an integer called the
                    214: .IT i-number
                    215: (for index number)
                    216: of the file.
                    217: When the file is accessed,
                    218: its i-number is used as an index into
                    219: a system table (the
                    220: .IT i-list \|)
                    221: stored in a known
                    222: part of the device on which
                    223: the directory resides.
                    224: The entry found thereby (the file's
                    225: .IT i-node \|)
                    226: contains
                    227: the description of the file:
                    228: .IP i
                    229: the user and group-\*sID\*n of its owner
                    230: .IP ii
                    231: its protection bits
                    232: .IP iii
                    233: the physical disk or tape addresses for the file contents
                    234: .IP iv
                    235: its size
                    236: .IP v
                    237: time of creation, last use, and last modification
                    238: .IP vi
                    239: the number of links to the file, that is, the number of times it appears in a directory
                    240: .IP vii
                    241: a code indicating whether the file is a directory, an ordinary file, or a special file.
                    242: .LP
                    243: The purpose of an
                    244: .UL open
                    245: or
                    246: .UL create
                    247: system call is to turn the path name given by the user
                    248: into an i-number
                    249: by searching the explicitly or implicitly named directories.
                    250: Once a file is open,
                    251: its device, i-number, and read/write pointer are stored in a system table
                    252: indexed by the file descriptor returned by the
                    253: .UL open
                    254: or
                    255: .UL create .
                    256: Thus, during a subsequent
                    257: call to read or write the
                    258: file,
                    259: the descriptor
                    260: may be easily related to the information necessary to access the file.
                    261: .PP
                    262: When a new file is created,
                    263: an i-node is allocated for it and a directory entry is made
                    264: that contains the name of the file and the i-node
                    265: number.
                    266: Making a link to an existing file involves
                    267: creating a directory entry with the new name,
                    268: copying the i-number from the original file entry,
                    269: and incrementing the link-count field of the i-node.
                    270: Removing (deleting) a file is done by
                    271: decrementing the
                    272: link-count of the i-node specified by its directory entry
                    273: and erasing the directory entry.
                    274: If the link-count drops to 0,
                    275: any disk blocks in the file
                    276: are freed and the i-node is de-allocated.
                    277: .PP
                    278: The space on all disks that
                    279: contain a file system is divided into a number of
                    280: 512-byte
                    281: blocks logically addressed from 0 up to a limit that
                    282: depends on the device.
                    283: There is space in the i-node of each file for 13 device addresses.
                    284: For nonspecial files,
                    285: the first 10 device addresses point at the first
                    286: 10 blocks of the file.
                    287: If the file is larger than 10 blocks,
                    288: the 11 device address points to an
                    289: indirect block containing up to 128 addresses
                    290: of additional blocks in the file.
                    291: Still larger files use the twelfth device address
                    292: of the i-node to point to
                    293: a double-indirect block naming
                    294: 128 indirect blocks,
                    295: each
                    296: pointing to 128 blocks of the file.
                    297: If required,
                    298: the thirteenth device address is
                    299: a triple-indirect block.
                    300: Thus files may conceptually grow to
                    301: [\|(10+128+128\u\s62\s0\d+128\u\s63\s0\d)\*m512\|] bytes.
                    302: Once opened,
                    303: bytes numbered below 5120 can be read with a single
                    304: disk access;
                    305: bytes in the range 5120 to 70,656
                    306: require two accesses;
                    307: bytes in the range 70,656
                    308: to 8,459,264
                    309: require three accesses;
                    310: bytes from there to the
                    311: largest file
                    312: (1,082,201,088)
                    313: require four accesses.
                    314: In practice,
                    315: a device cache mechanism
                    316: (see below)
                    317: proves effective in eliminating
                    318: most of the indirect fetches.
                    319: .PP
                    320: The foregoing discussion applies to ordinary files.
                    321: When an I/O request is made to a file whose i-node indicates that it
                    322: is special,
                    323: the last 12 device address words are immaterial,
                    324: and the first specifies
                    325: an internal
                    326: .IT "device name" ,
                    327: which is interpreted as a pair of numbers
                    328: representing,
                    329: respectively, a device type
                    330: and subdevice number.
                    331: The device type indicates which
                    332: system routine will deal with I/O on that device;
                    333: the subdevice number selects, for example, a disk drive
                    334: attached to a particular controller or one of several
                    335: similar terminal interfaces.
                    336: .PP
                    337: In this environment, the implementation of the
                    338: .UL mount
                    339: system call (Section 3.4) is quite straightforward.
                    340: .UL \&mount
                    341: maintains a system table whose
                    342: argument is the i-number and device name of the
                    343: ordinary file specified
                    344: during the
                    345: .UL mount ,
                    346: and whose corresponding value is the
                    347: device name of the indicated special file.
                    348: This table is searched for each i-number/device pair
                    349: that turns up while a path name is being scanned
                    350: during an
                    351: .UL open
                    352: or
                    353: .UL create ;
                    354: if a match is found,
                    355: the i-number is replaced by the i-number of the root
                    356: directory
                    357: and the device name is replaced by the table value.
                    358: .PP
                    359: To the user, both reading and writing of files appear to
                    360: be synchronous and unbuffered.
                    361: That is, immediately after
                    362: return from a
                    363: .UL read
                    364: call the data are available; conversely,
                    365: after a
                    366: .UL write
                    367: the user's workspace may be reused.
                    368: In fact, the system maintains a rather complicated
                    369: buffering mechanism that reduces greatly the number
                    370: of I/O operations required to access a file.
                    371: Suppose a
                    372: .UL write
                    373: call is made specifying transmission
                    374: of a single byte.
                    375: The system
                    376: will search its buffers to see
                    377: whether the affected disk block currently resides in main memory;
                    378: if not, it will be read in from the device.
                    379: Then the affected byte is replaced in the buffer and an
                    380: entry is made in a list of blocks to be written.
                    381: The return from the
                    382: .UL write
                    383: call may then take place,
                    384: although the actual I/O may not be completed until a later time.
                    385: Conversely, if a single byte is read, the system determines
                    386: whether the secondary storage block in which the byte is located is already
                    387: in one of the system's buffers; if so, the byte can be returned immediately.
                    388: If not, the block is read into a buffer and the byte picked out.
                    389: .PP
                    390: The system recognizes when
                    391: a program has
                    392: made accesses to
                    393: sequential blocks of a file,
                    394: and asynchronously
                    395: pre-reads the next block.
                    396: This significantly reduces
                    397: the running time of most programs
                    398: while adding little to
                    399: system overhead.
                    400: .PP
                    401: A program that reads or writes files in units of 512 bytes
                    402: has an advantage over a program that reads or writes
                    403: a single byte at a time, but the gain is not immense;
                    404: it comes mainly from the avoidance of system overhead.
                    405: If a program is used rarely or does
                    406: no great volume of I/O, it may quite reasonably
                    407: read and write in units as small as it wishes.
                    408: .PP
                    409: The notion of the i-list is an unusual feature
                    410: of
                    411: .UX .
                    412: In practice, this method of organizing the file system
                    413: has proved quite reliable and easy to deal with.
                    414: To the system itself, one of its strengths is
                    415: the fact that each file has a short, unambiguous name
                    416: related in a simple way to the protection, addressing,
                    417: and other information needed to access the file.
                    418: It also permits a quite simple and rapid
                    419: algorithm for checking the consistency of a file system,
                    420: for example, verification
                    421: that the portions of each device containing useful information
                    422: and those free to be allocated are disjoint and together
                    423: exhaust the space on the device.
                    424: This algorithm is independent
                    425: of the directory hierarchy, because it need only scan
                    426: the linearly organized i-list.
                    427: At the same time the notion of the i-list induces certain
                    428: peculiarities not found in other file system organizations.
                    429: For example, there is the question of who is to be charged
                    430: for the space a file occupies,
                    431: because all directory entries for a file have equal status.
                    432: Charging the owner of a file is unfair in general,
                    433: for one user may create a file, another may link to
                    434: it, and the first user may delete the file.
                    435: The first user is still the owner of the
                    436: file, but it should be charged
                    437: to the second user.
                    438: The simplest reasonably fair algorithm
                    439: seems to be to spread the charges
                    440: equally among users who have links to a file.
                    441: Many installations
                    442: avoid the
                    443: issue by not charging any fees at all.
unix.superglobalmegacorp.com
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.