.\"    @(#)iosys       6.1 (Berkeley) 4/29/86
.\"
.EH 'PS2:5-%''The UNIX I/O System'
.OH 'The UNIX I/O System''PS2:5-%'
.TL
The UNIX I/O System
.AU
Dennis M. Ritchie
.AI
.MH
.PP
This paper gives an overview of the workings of the UNIX\(dg
.FS
\(dgUNIX is a Trademark of Bell Laboratories.
.FE
I/O system.
It was written with an eye toward providing
guidance to writers of device driver routines,
and is oriented more toward describing the environment
and nature of device drivers than the implementation
of that part of the file system which deals with
ordinary files.
.PP
It is assumed that the reader has a good knowledge
of the overall structure of the file system as discussed
in the paper ``The UNIX Time-sharing System.''
A more detailed discussion
appears in
``UNIX Implementation;''
the current document restates parts of that one,
but is still more detailed.
It is most useful in
conjunction with a copy of the system code,
since it is basically an exegesis of that code.
.SH
Device Classes
.PP
There are two classes of device:
.I block
and
.I character.
The block interface is suitable for devices
like disks, magnetic tapes, and DECtape,
which work, or can work, with addressable 512-byte blocks.
Ordinary magnetic tape just barely fits in this category,
since by use of forward
and
backward spacing any block can be read, even though
blocks can be written only at the end of the tape.
Block devices can at least potentially contain a mounted
file system.
The interface to block devices is very highly structured;
the drivers for these devices share a great many routines
as well as a pool of buffers.
.PP
Character-type devices have a much
more straightforward interface, although
more work must be done by the driver itself.
.PP
Devices of both types are named by a
.I major
and a
.I minor
device number.
These numbers are generally stored as an integer
with the minor device number
in the low-order 8 bits and the major device number
in the next-higher 8 bits;
macros
.I major
and
.I minor
are available to access these numbers.
The major device number selects which driver will deal with
the device; the minor device number is not used
by the rest of the system but is passed to the
driver at appropriate times.
Typically the minor number
selects a subdevice attached to
a given controller, or one of
several similar hardware interfaces.
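.PP
As a minimal sketch, the layout just described can be expressed in C as follows.
The names MAJOR, MINOR, and makedev16 are hypothetical stand-ins, chosen so as not to be
confused with the system's own major and minor macros;
the sketch assumes a 16-bit device number with exactly the layout given above.
.DS
#include <assert.h>

/* Hypothetical stand-ins for the system's major and minor macros,
   assuming the minor number is in the low-order 8 bits and the
   major number in the next-higher 8 bits. */
#define MAJOR(d)	(((d) >> 8) & 0377)
#define MINOR(d)	((d) & 0377)

/* Hypothetical: build a device number from its two parts. */
unsigned short makedev16(int ma, int mi)
{
	assert(ma >= 0 && ma < 256 && mi >= 0 && mi < 256);
	return (unsigned short)((ma << 8) | mi);
}
.DE
.PP
With this layout, minor device 7 of major device 3 is stored as the integer 0x0307 (decimal 775).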
.PP
The major device numbers for block and character devices
are used as indices into separate tables;
both start at 0 and therefore overlap,
so a major number has no meaning until it is known
whether a block or a character device is meant.
.SH
Overview of I/O
.PP
The purpose of
the
.I open
and
.I creat
system calls is to set up entries in three separate
system tables.
The first of these is the
.I u_ofile
table,
which is stored in the system's per-process
data area
.I u.
This table is indexed by
the file descriptor returned by the
.I open
or
.I creat,
and is accessed during
a
.I read,
.I write,
or other operation on the open file.
An entry contains only
a pointer to the corresponding
entry of the
.I file
table,
which is a per-system data base.
There is one entry in the
.I file
table for each
instance of
.I open
or
.I creat.
This table is per-system because the same instance
of an open file must be shared among the several processes
which can result from
.I forks
after the file is opened.
A
.I file
table entry contains
flags which indicate whether the file
was open for reading or writing or is a pipe, and
a count which is used to decide when all processes
using the entry have terminated or closed the file
(so the entry can be abandoned).
There is also a 32-bit file offset
which is used to indicate where in the file the next read
or write will take place.
Finally, there is a pointer to the
entry for the file in the
.I inode
table,
which contains a copy of the file's i-node.
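.PP
The chain of references just described, from file descriptor to
.I file
table entry to
.I inode
table entry, can be pictured with the reduced C declarations below.
This is a sketch rather than the system's own declarations:
apart from u_ofile, the field names, the table sizes NOFILE and NFILE,
and the helper fd_inode are hypothetical, and only the fields mentioned above are shown.
.DS
#include <assert.h>
#include <stddef.h>

#define NOFILE	15		/* hypothetical per-process limit */
#define NFILE	100		/* hypothetical system-wide limit */

struct inode {			/* in-core i-node, heavily abridged */
	int	i_count;	/* references: open files, current directories, mounts */
	int	i_dev;		/* device the i-node came from */
	int	i_number;	/* i-number on that device */
};

struct file {			/* one entry per open or creat */
	int	f_flag;		/* read, write, pipe */
	int	f_count;	/* users of this entry */
	long	f_offset;	/* where the next read or write happens */
	struct inode *f_inode;	/* the single in-core i-node for the file */
};

struct file file[NFILE];	/* system-wide file table */

struct user {			/* per-process data area, reduced to one field */
	struct file *u_ofile[NOFILE];	/* indexed by file descriptor */
} u;

/* Follow a file descriptor to its in-core i-node, or NULL if unused. */
struct inode *fd_inode(int fd)
{
	struct file *fp;

	assert(fd >= 0 && fd < NOFILE);
	fp = u.u_ofile[fd];
	return fp != NULL ? fp->f_inode : NULL;
}
.DE
.PP
After a fork, the child's u_ofile entries point at the same file table entries as the parent's,
which is why the offset is kept in the file table rather than in the per-process table.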
.PP
Certain open files can be designated ``multiplexed''
files, and several other flags apply to such
channels.
In such a case, instead of an offset,
there is a pointer to an associated multiplex channel table.
Multiplex channels will not be discussed here.
.PP
An entry in the
.I file
table corresponds precisely to an instance of
.I open
or
.I creat;
if the same file is opened several times,
it will have several
entries in this table.
However,
there is at most one entry
in the
.I inode
table for a given file.
Also, a file may enter the
.I inode
table not only because it is open,
but also because it is the current directory
of some process or because it
is a special file containing a currently-mounted
file system.
.PP
An entry in the
.I inode
table differs somewhat from the
corresponding i-node as stored on the disk;
the modified and accessed times are not stored,
and the entry is augmented
by a flag word containing information about the entry,
a count used to determine when it may be
allowed to disappear,
and the device and i-number
whence the entry came.
Also, the several block numbers that give addressing
information for the file are expanded from
the 3-byte, compressed format used on the disk to full
.I long
quantities.
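.PP
The expansion of 3-byte block numbers into
.I long
quantities amounts to a few shifts.
In this sketch the function name expand3 is hypothetical, and the byte order
(most significant byte first) is an assumption made for readability;
the order actually used on the disk is machine-dependent and may differ.
.DS
#include <assert.h>

/* Hypothetical: expand n packed 3-byte block numbers at cp into
   the long array lp, assuming the most significant byte comes first. */
void expand3(long *lp, const unsigned char *cp, int n)
{
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++, cp += 3)
		lp[i] = ((long)cp[0] << 16) | ((long)cp[1] << 8) | cp[2];
}
.DE
.PP
Three bytes can address block numbers up to 16,777,215;
keeping the expanded form in core presumably spares the system from unpacking it on every access.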
.PP
During the processing of an
.I open
or
.I creat
call for a special file,
the system always calls the device's
.I open
routine to allow for any special processing
required (rewinding a tape, turning on
the data-terminal-ready lead of a modem, etc.).
However,
the
.I close
routine is called only when the last
process closes a file,
that is, when the i-node table entry
is being deallocated.
Thus it is not feasible
for a device to maintain, or depend on,
a count of its users, although it is quite
possible to
implement an exclusive-use device which cannot
be reopened until it has been closed.
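.PP
Although a driver cannot count its users, it can refuse a second open of a unit that is already in use.
A minimal sketch, assuming a hypothetical driver xx with a per-unit busy flag;
u_error stands in for the system's error variable, and the choice of EBUSY and ENXIO
as the reported errors is an assumption for illustration.
.DS
#include <assert.h>
#include <errno.h>

#define NXX	4		/* hypothetical number of units */

int u_error;			/* stand-in for the system's error variable */
char xx_busy[NXX];		/* one in-use flag per unit */

/* Hypothetical open: refuse a second open of a unit already in use. */
void xxopen(int dev, int flag)
{
	int unit = dev & 0377;	/* minor device number */

	(void)flag;
	if (unit >= NXX)
		u_error = ENXIO;
	else if (xx_busy[unit])
		u_error = EBUSY;
	else
		xx_busy[unit] = 1;
}

/* Hypothetical close: called only on the last close, so the unit is free. */
void xxclose(int dev, int flag)
{
	(void)flag;
	assert((dev & 0377) < NXX);
	xx_busy[dev & 0377] = 0;
}
.DE
.PP
Because close is called only when the last process closes the file,
clearing the flag there is safe even if the open file has been shared by a fork.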
.PP
When a
.I read
or
.I write
takes place,
the user's arguments
and the
.I file
table entry are used to set up the
variables
.I u.u_base,
.I u.u_count,
and
.I u.u_offset
which respectively contain the (user) address
of the I/O target area, the byte-count for the transfer,
and the current location in the file.
If the file referred to is
a character-type special file, the appropriate read
or write routine is called; it is responsible
for transferring data and updating the
count and current location appropriately
as discussed below.
Otherwise, the current location is used to calculate
a logical block number in the file.
If the file is an ordinary file, the logical block
number must be mapped (possibly using indirect blocks)
to a physical block number; a block-type
special file need not be mapped.
This mapping is performed by the
.I bmap
routine.
In any event, the resulting physical block number
is used, as discussed below, to
read or write the appropriate device.
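.PP
With 512-byte blocks, finding the logical block number is simple arithmetic on the current offset.
The sketch below shows it, together with the limit that keeps one step of a transfer within a single block;
the function name split_offset is hypothetical, and the mapping done by
.I bmap
for ordinary files is omitted.
.DS
#include <assert.h>

#define BSIZE	512		/* bytes per block */
#define BSHIFT	9		/* log2(BSIZE) */
#define BMASK	(BSIZE - 1)

/* Split a file offset into a logical block number (returned) and the
   byte offset within that block (*on), and limit the transfer (*n)
   to what fits in the rest of the block. */
long split_offset(long offset, unsigned count, unsigned *on, unsigned *n)
{
	assert(offset >= 0);
	*on = (unsigned)(offset & BMASK);	/* byte within block */
	*n = BSIZE - *on;			/* room left in this block */
	if (*n > count)
		*n = count;
	return offset >> BSHIFT;		/* logical block number */
}
.DE
.PP
For example, offset 1300 lies 276 bytes into logical block 2, so at most 236 bytes can be moved before the next block is needed.
A transfer that spans block boundaries would then proceed as a loop over such pieces.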
.SH
Character Device Drivers
.PP
The
.I cdevsw
table specifies the interface routines present for
character devices.
It has room for five routines per device:
open, close, read, write, and special-function
(to implement the
.I ioctl
system call).
Any of these may be missing.
If a call on the routine
should be ignored
(e.g.,
.I open
on non-exclusive devices that require no setup),
the
.I cdevsw
entry can be given as
.I nulldev;
if it should be considered an error
(e.g.,
.I write
on read-only devices),
.I nodev
is used.
For terminals,
the
.I cdevsw
structure also contains a pointer to the
.I tty
structure associated with the terminal.
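.PP
A
.I cdevsw
entry is, in effect, a row of function pointers selected by the major device number.
The sketch below is a simplification under stated assumptions:
all five routines are given one hypothetical signature so that
.I nulldev
and
.I nodev
can fill any slot, here nodev reports ENODEV through u_error (a stand-in for the system's error variable),
the printer routine lpwrite and the table contents are invented, and the
.I tty
pointer is omitted.
.DS
#include <assert.h>
#include <errno.h>

int u_error;			/* stand-in for the system's error variable */

typedef void (*devfn)(int dev, int arg);

struct cdevsw {			/* one row per major device number */
	devfn	d_open;
	devfn	d_close;
	devfn	d_read;
	devfn	d_write;
	devfn	d_sgtty;	/* special functions */
};

void nulldev(int dev, int arg)	/* the call is simply ignored */
{
	(void)dev; (void)arg;
}

void nodev(int dev, int arg)	/* the call is an error */
{
	(void)dev; (void)arg;
	u_error = ENODEV;
}

int lp_written;			/* hypothetical printer state */

void lpwrite(int dev, int arg)
{
	(void)dev; (void)arg;
	lp_written++;
}

struct cdevsw cdevsw[] = {
	/* open    close    read     write    special */
	{ nulldev, nulldev, nodev,   lpwrite, nodev },	/* 0: printer */
	{ nulldev, nulldev, nulldev, nulldev, nodev },	/* 1: null device */
};
#define NCHRDEV	((int)(sizeof cdevsw / sizeof cdevsw[0]))

/* Dispatch a write through the table, using the major number. */
void devwrite(int dev)
{
	int maj = (dev >> 8) & 0377;

	if (maj >= NCHRDEV) {
		u_error = ENODEV;
		return;
	}
	assert(cdevsw[maj].d_write != 0);	/* never empty: nulldev or nodev */
	(*cdevsw[maj].d_write)(dev, 0);
}
.DE
.PP
Filling unused slots with nulldev or nodev rather than a null pointer lets the dispatch code call through the table without checking.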
.PP
The
.I open
routine is called each time the file
is opened, with the full device number as its first argument.
The second argument is a flag which is
non-zero only if the device is to be written upon.
.PP
The
.I close
routine is called only when the file
is closed for the last time,
that is, when the very last process in
which the file is open closes it.
This means it is not possible for the driver to
maintain its own count of its users.
The first argument is the device number;
the second is a flag which is non-zero
if the file was open for writing in the process which
performs the final
.I close.
.PP
When
.I write
is called, it is supplied the device
as argument.
The per-user variable
.I u.u_count
has been set to
the number of characters indicated by the user;
for character devices, this number may be 0
initially.
.I u.u_base
is the address supplied by the user from which to start
taking characters.
Because the system may also call the
routine internally, the
flag
.I u.u_segflg
is provided; when it is
.I on,
.I u.u_base
refers to the system address space instead of
the user's.
.PP
The
.I write
routine
should copy up to
.I u.u_count
characters from the user's buffer to the device,
decrementing
.I u.u_count
for each character passed.
For most drivers, which work one character at a time,
the routine
.I "cpass( )"
is used to pick up characters
from the user's buffer.
Successive calls on it return
the characters to be written until
.I u.u_count
goes to 0 or an error occurs,
when it returns \(mi1.
.I Cpass
takes care of interrogating
.I u.u_segflg
and updating
.I u.u_count.
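.PP
The usual shape of a character-at-a-time write routine can be sketched in ordinary user-level C.
Here cpass is simulated on a plain array and ignores u.u_segflg,
and xxwrite, xxq, and XXHIWAT are hypothetical names;
a real driver would typically also start the device and wait when too many characters are queued.
.DS
#include <assert.h>

struct {			/* stand-ins for the per-user variables */
	const char *u_base;	/* next user character */
	unsigned u_count;	/* characters still to transfer */
} u;

/* Simulated cpass: next character, or -1 once u.u_count reaches 0. */
int cpass(void)
{
	assert(u.u_count == 0 || u.u_base != 0);
	if (u.u_count == 0)
		return -1;
	u.u_count--;
	return *u.u_base++ & 0377;
}

#define XXHIWAT	64		/* hypothetical queue limit */
char xxq[XXHIWAT];		/* hypothetical output queue */
int xxqlen;

/* Sketch of a write routine: copy characters until cpass says stop. */
void xxwrite(int dev)
{
	int c;

	(void)dev;
	while (xxqlen < XXHIWAT && (c = cpass()) >= 0)
		xxq[xxqlen++] = (char)c;
}
.DE
.PP
The loop tests the queue limit before calling cpass, so no character is taken from the user and then dropped.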
.PP
Write routines which want to transfer
a potentially large number of characters into an internal
buffer may also use the routine
.I "iomove(buffer, offset, count, flag)" ,
which is faster when many characters must be moved.
.I Iomove
transfers up to
.I count
characters into the
.I buffer
starting
.I offset
bytes from the start of the buffer;
.I flag
should be
.I B_WRITE
(which is 0) in the write case.
Caution:
the caller is responsible for making sure
the count is not too large and is non-zero.
As an efficiency note,
.I iomove
is much slower if any of
.I "buffer+offset, count"
or
.I u.u_base
is odd.
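.PP
A caller of iomove usually computes the count itself in order to respect the caution above.
In this sketch iomove_sim is a hypothetical stand-in for
.I iomove
in the write case, copying with memcpy from a simulated user area;
the point is the clamping done in fillbuf, also a hypothetical name.
.DS
#include <assert.h>
#include <string.h>

#define BUFSIZE	512		/* hypothetical internal buffer size */

struct {			/* stand-ins for the per-user variables */
	const char *u_base;
	unsigned u_count;
} u;

/* Stand-in for iomove(buffer, offset, count, B_WRITE): copy count
   characters from u.u_base to buffer+offset and update u. */
void iomove_sim(char *buffer, unsigned offset, unsigned count)
{
	assert(count != 0 && offset + count <= BUFSIZE);
	memcpy(buffer + offset, u.u_base, count);
	u.u_base += count;
	u.u_count -= count;
}

/* Fill buffer from offset on, never passing a zero or too-large count. */
unsigned fillbuf(char *buffer, unsigned offset)
{
	unsigned n;

	assert(offset <= BUFSIZE);
	n = BUFSIZE - offset;
	if (n > u.u_count)
		n = u.u_count;
	if (n != 0)
		iomove_sim(buffer, offset, n);
	return n;
}
.DE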
.PP
The device's
.I read
routine is called under conditions similar to
.I write,
except that
.I u.u_count
is guaranteed to be non-zero.
To return characters to the user, the routine
.I "passc(c)"
is available; it takes care of housekeeping
like
.I cpass
and returns \(mi1 when the last character
specified by
.I u.u_count
has been returned to the user;
before that time, 0 is returned.
.I Iomove
is also usable as with
.I write;
the flag should be
.I B_READ,
but the same cautions apply.
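.PP
A read routine that drains a queue typically stops when either the queue is empty or
.I passc
reports that the request is satisfied.
In this sketch passc is simulated on a plain array, while xxread and xxgetc are hypothetical,
xxgetc standing in for
.I getc
on the driver's input queue.
The return convention (0, then \(mi1 for the last character) follows the description above.
.DS
#include <assert.h>

struct {			/* stand-ins for the per-user variables */
	char	*u_base;	/* where the next character goes */
	unsigned u_count;	/* characters still wanted */
} u;

/* Simulated passc: deliver c; return -1 once u.u_count reaches 0. */
int passc(int c)
{
	assert(u.u_count != 0);
	*u.u_base++ = (char)c;
	return --u.u_count == 0 ? -1 : 0;
}

const char *xxsrc;		/* hypothetical pending input */

/* Hypothetical getc stand-in: next pending character, or -1. */
int xxgetc(void)
{
	return *xxsrc ? *xxsrc++ & 0377 : -1;
}

/* Sketch of a read routine. */
void xxread(int dev)
{
	int c;

	(void)dev;
	while ((c = xxgetc()) >= 0 && passc(c) >= 0)
		;
}
.DE
.PP
Testing the queue first means no character is removed from it unless the user still wants one.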
.PP
The ``special-functions'' routine
is invoked by the
.I stty
and
.I gtty
system calls as follows:
.I "(*p) (dev, v)"
where
.I p
is a pointer to the device's routine,
.I dev
is the device number,
and
.I v
is a vector.
In the
.I gtty
case,
the device is supposed to place up to 3 words of status information
into the vector; this will be returned to the caller.
In the
.I stty
case,
.I v
is 0;
the device should take up to 3 words of
control information from
the array
.I "u.u_arg[0...2]."
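.PP
The two directions of the special-functions routine are thus selected by whether v is 0.
A sketch for a hypothetical device xx with three words of state,
with u.u_arg simulated by a plain array u_arg:
.DS
#include <assert.h>

int u_arg[3];			/* stand-in for u.u_arg */
int xxstate[3];			/* hypothetical device status words */

/* Hypothetical special-function routine, called as (*p)(dev, v). */
void xxsgtty(int dev, int *v)
{
	int i;

	assert(dev >= 0);
	if (v != 0) {			/* gtty: report status */
		for (i = 0; i < 3; i++)
			v[i] = xxstate[i];
	} else {			/* stty: take control words */
		for (i = 0; i < 3; i++)
			xxstate[i] = u_arg[i];
	}
}
.DE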
.PP
Finally, each device should have appropriate interrupt-time
routines.
When an interrupt occurs, it is turned into a C-compatible call
on the device's interrupt routine.
The interrupt-catching mechanism makes
the low-order four bits of the ``new PS'' word in the
trap vector for the interrupt available
to the interrupt handler.
This is conventionally used by drivers
which deal with multiple similar devices
to encode the minor device number.
After the interrupt has been processed,
a return from the interrupt handler will
return from the interrupt itself.
.PP
A number of subroutines are available which are useful
to character device drivers.
Most character device handlers, for example, need a place
to buffer characters in the internal interface
between their ``top half'' (read/write)
and ``bottom half'' (interrupt) routines.
For relatively low data-rate devices, the best mechanism
is the character queue maintained by the
routines
.I getc
and
.I putc.
A queue header has the structure
.DS
struct {
	int	c_cc;	/* character count */
	char	*c_cf;	/* first character */
	char	*c_cl;	/* last character */
} queue;
.DE
A character is placed on the end of a queue by
.I "putc(c, &queue)"
where
.I c
is the character and
.I queue
is the queue header.
The routine returns \(mi1 if there is no space
to put the character, 0 otherwise.
The first character on the queue may be retrieved
by
.I "getc(&queue)"
which returns either the (non-negative) character
or \(mi1 if the queue is empty.
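.PP
The behavior of
.I putc
and
.I getc
can be illustrated with a fixed-size ring buffer.
This sketch imitates only the interface: the real routines draw their space from a pool shared by all queues,
whereas here each queue has private storage.
The size NQCHAR, the field c_buf, and the names qinit, qputc, and qgetc
(chosen to avoid the standard I/O putc and getc) are hypothetical,
and c_cl is taken to point just past the last character.
.DS
#include <assert.h>

#define NQCHAR	8		/* hypothetical capacity */

struct queue {
	int	c_cc;		/* character count */
	char	*c_cf;		/* first character */
	char	*c_cl;		/* here: slot just past the last character */
	char	c_buf[NQCHAR];	/* hypothetical private storage */
};

void qinit(struct queue *q)
{
	q->c_cc = 0;
	q->c_cf = q->c_cl = q->c_buf;
}

/* Append c; return -1 if there is no space, 0 otherwise. */
int qputc(int c, struct queue *q)
{
	if (q->c_cc >= NQCHAR)
		return -1;
	*q->c_cl++ = (char)c;
	if (q->c_cl == q->c_buf + NQCHAR)
		q->c_cl = q->c_buf;	/* wrap around */
	q->c_cc++;
	return 0;
}

/* Remove the first character; return it (non-negative) or -1 if empty. */
int qgetc(struct queue *q)
{
	int c;

	if (q->c_cc <= 0)
		return -1;
	c = *q->c_cf++ & 0377;
	if (q->c_cf == q->c_buf + NQCHAR)
		q->c_cf = q->c_buf;	/* wrap around */
	q->c_cc--;
	assert(q->c_cc >= 0);
	return c;
}
.DE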
.PP
Notice that the space for characters in queues is
shared among all devices in the system,
and in the standard system there are only some 600
character slots available.
Thus device handlers,
especially write routines, must take
care to avoid gobbling up excessive numbers of characters.
.PP
The other major help available
to device handlers is the sleep-wakeup mechanism.
The call
.I "sleep(event, priority)"
causes the process to wait (allowing other processes to run)
until the
.I event
occurs;
at that time, the process is marked ready-to-run
and the call will return when there is no
process with higher
.I priority.
.PP
The call
.I "wakeup(event)"
indicates that the
.I event
has happened, that is, causes processes sleeping
on the event to be awakened.
The
.I event
is an arbitrary quantity agreed upon
by the sleeper and the waker-up.
By convention, it is the address of some data area used
by the driver, which guarantees that events
are unique.
.PP
Processes sleeping on an event should not assume
that the event has really happened;
they should check that the conditions which
caused them to sleep no longer hold.
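.PP
The rule that a sleeper must recheck its condition carries over directly to condition variables
in threaded programs, which make a convenient user-level analogue.
This sketch uses POSIX threads rather than the kernel mechanism; take_one, give_one, and nready are hypothetical,
one condition variable plays the part of one event, and priorities are not modeled at all.
.DS
#include <assert.h>
#include <pthread.h>

/* One mutex and condition variable stand in for a single event. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t event = PTHREAD_COND_INITIALIZER;
int nready;			/* hypothetical condition: items available */

/* Like a sleeper: wait, and recheck the condition after each wakeup. */
void take_one(void)
{
	pthread_mutex_lock(&lock);
	while (nready == 0)		/* never assume the event happened */
		pthread_cond_wait(&event, &lock);
	assert(nready > 0);
	nready--;
	pthread_mutex_unlock(&lock);
}

/* Like wakeup: make the condition true, then wake every waiter. */
void give_one(void)
{
	pthread_mutex_lock(&lock);
	nready++;
	pthread_cond_broadcast(&event);
	pthread_mutex_unlock(&lock);
}
.DE
.PP
Like
.I wakeup ,
pthread_cond_broadcast wakes every waiter; each one reacquires the lock and retests nready,
and one that finds the condition already consumed simply waits again.
POSIX also permits spurious wakeups, which the same loop handles.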
.PP
Priorities can range from 0 to 127;
a higher numerical value indicates a less-favored
scheduling situation.
A distinction is made between processes sleeping
at priority less than the parameter
.I PZERO
and those at numerically larger priorities.
Processes in the former class cannot
be interrupted by signals, although they
may conceivably be swapped out.
Thus it is a bad idea to sleep with
priority less than PZERO on an event which might never occur.
On the other hand, calls to
.I sleep
with larger priority
may never return if the process is terminated by
some signal in the meantime.
Incidentally, it is a gross error to call
.I sleep
in a routine called at interrupt time, since the process
which is running is almost certainly not the
process which should go to sleep.
Likewise, none of the variables in the user area
``\fIu\fB.\fR''
should be touched, let alone changed, by an interrupt routine.
.PP
If a device driver
wishes to wait for some event for which it is inconvenient
or impossible to supply a
.I wakeup
(for example, a device going on-line, which does not
generally cause an interrupt),
the call
.I "sleep(&lbolt, priority)"
may be given.
.I Lbolt
is an external cell whose address is awakened once every 4 seconds
by the clock interrupt routine.
.PP
The routines
.I "spl4( ), spl5( ), spl6( ), spl7( )"
are available to
set the processor priority level as indicated to avoid
inconvenient interrupts from the device.
.PP
If a device needs to know about real-time intervals,
then
.I "timeout(func, arg, interval)"
will be useful.
This routine arranges that after
.I interval
sixtieths of a second, the
.I func
will be called with
.I arg
as argument, in the style
.I "(*func)(arg)."
Timeouts are used, for example,
to provide real-time delays after function characters
like new-line and tab in typewriter output,
and to terminate an attempt to
read the 201 Dataphone
.I dp
if there is no response within a specified number
of seconds.
Notice that the number of sixtieths of a second is limited to 32767,
since it must appear to be a positive (16-bit) integer,
and that only a bounded number of timeouts
can be going on at once.
Also, the specified
.I func
is called at clock-interrupt time, so it should
conform to the requirements of interrupt routines
in general.
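.PP
Since the interval must fit in a positive 16-bit integer, the longest timeout is 32767/60,
or about 546 seconds (a little over 9 minutes).
A driver converting a delay in seconds might check the bound as in this sketch;
the function secs_to_ticks and the names HZ and MAXTICKS are hypothetical.
.DS
#include <assert.h>

#define HZ	60		/* clock ticks per second */
#define MAXTICKS 32767		/* largest positive 16-bit interval */

/* Hypothetical: convert seconds to a timeout interval in sixtieths
   of a second, or return -1 if the delay cannot be expressed. */
int secs_to_ticks(int secs)
{
	assert(secs >= 0);
	if (secs > MAXTICKS / HZ)
		return -1;
	return secs * HZ;
}
.DE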
        !           604: .SH
        !           605: The Block-device Interface
        !           606: .PP
        !           607: Handling of block devices is mediated by a collection
        !           608: of routines that manage a set of buffers containing
        !           609: the images of blocks of data on the various devices.
        !           610: The most important purpose of these routines is to assure
        !           611: that several processes that access the same block of the same
        !           612: device in multiprogrammed fashion maintain a consistent
        !           613: view of the data in the block.
        !           614: A secondary but still important purpose is to increase
        !           615: the efficiency of the system by
        !           616: keeping in-core copies of blocks that are being
        !           617: accessed frequently.
        !           618: The main data base for this mechanism is the
        !           619: table of buffers
        !           620: .I buf.
        !           621: Each buffer header contains a pair of pointers
        !           622: .I "(b_forw, b_back)"
        !           623: which maintain a doubly-linked list
        !           624: of the buffers associated with a particular
        !           625: block device, and a
        !           626: pair of pointers
        !           627: .I "(av_forw, av_back)"
        !           628: which generally maintain a doubly-linked list of blocks
        !           629: which are ``free,'' that is,
        !           630: eligible to be reallocated for another transaction.
        !           631: Buffers that have I/O in progress
        !           632: or are busy for other purposes do not appear in this list.
        !           633: The buffer header
        !           634: also contains the device and block number to which the
        !           635: buffer refers, and a pointer to the actual storage associated with
        !           636: the buffer.
        !           637: There is a word count
        !           638: which is the negative of the number of words
        !           639: to be transferred to or from the buffer;
        !           640: there is also an error byte and a residual word
        !           641: count used to communicate information
        !           642: from an I/O routine to its caller.
        !           643: Finally, there is a flag word
        !           644: with bits indicating the status of the buffer.
        !           645: These flags will be discussed below.
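.PP
As an illustration only, the header just described might be modeled in modern C as follows.
The field names come from the text; the types, the ordering, and the helper buf_bytes are assumptions, and the real declaration differs in detail.
The helper shows how the negative word count converts to a byte count, assuming 2-byte words.

```c
#include <stddef.h>

/* Illustrative model of a buffer header; field names follow the text,
 * types and layout are assumptions. */
struct buf {
    int         b_flags;           /* status bits, described below */
    struct buf *b_forw, *b_back;   /* list of buffers for this device */
    struct buf *av_forw, *av_back; /* free ("available") list links */
    int         b_dev;             /* device number */
    long        b_blkno;           /* block number on the device */
    char       *b_addr;            /* core address of the block's storage */
    int         b_wcount;          /* NEGATIVE number of words to move */
    char        b_error;           /* error byte */
    int         b_resid;           /* residual word count after I/O */
};

/* Hypothetical helper: bytes a transfer will move, assuming 2-byte words. */
size_t buf_bytes(const struct buf *bp)
{
    return (size_t)(-bp->b_wcount) * 2;
}
```

For example, a header whose b_wcount is -256 describes a 512-byte transfer.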
        !           646: .PP
        !           647: Seven routines constitute
        !           648: the most important part of the interface with the
        !           649: rest of the system.
        !           650: Given a device and block number,
        !           651: both
        !           652: .I bread
        !           653: and
        !           654: .I getblk
        !           655: return a pointer to a buffer header for the block;
        !           656: the difference is that
        !           657: .I bread
        !           658: is guaranteed to return a buffer actually containing the
        !           659: current data for the block,
        !           660: while
        !           661: .I getblk
        !           662: returns a buffer which contains the data in the
        !           663: block only if it is already in core (whether it is
        !           664: or not is indicated by the
        !           665: .I B_DONE
        !           666: bit; see below).
        !           667: In either case the buffer, and the corresponding
        !           668: device block, is made ``busy,''
        !           669: so that other processes referring to it
        !           670: are obliged to wait until it becomes free.
        !           671: .I Getblk
        !           672: is used, for example,
        !           673: when a block is about to be totally rewritten,
        !           674: so that its previous contents are
        !           675: not useful;
        !           676: still, no other process can be allowed to refer to the block
        !           677: until the new data is placed into it.
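.PP
The division of labor between bread and getblk can be sketched as a small user-space model.
Everything here is hypothetical: a fixed table of four headers with no reclamation, a stand-in strategy routine that completes instantly, illustrative flag values, and no sleeping (the text's getblk sleeps when it finds the block busy, and bread must wait for the read to finish).
Only the protocol follows the text: getblk claims the header and reports validity through B_DONE, and bread starts a read only when that bit is clear.

```c
#include <stddef.h>

/* Illustrative flag values (assumed, not given in the text). */
#define B_READ 01
#define B_DONE 02
#define B_BUSY 010

#define NBUF 4

struct buf {
    int  b_flags;
    int  b_dev;
    long b_blkno;
    char b_data[8];            /* tiny stand-in for the block's storage */
};

static struct buf buftab[NBUF];
static int nused;              /* headers handed out so far */
int strategy_calls;            /* simulated transfers started */

/* Stand-in strategy routine: "reads" the block at once. */
static void strategy(struct buf *bp)
{
    strategy_calls++;
    bp->b_data[0] = (char)bp->b_blkno;
    bp->b_flags |= B_DONE;
}

/* Find or assign a header for (dev, blkno) and mark it busy.
 * B_DONE on return says whether the data are already valid. */
struct buf *getblk(int dev, long blkno)
{
    struct buf *bp;
    for (int i = 0; i < nused; i++) {
        bp = &buftab[i];
        if (bp->b_dev == dev && bp->b_blkno == blkno) {
            bp->b_flags |= B_BUSY; /* real kernel sleeps if already busy */
            return bp;
        }
    }
    if (nused == NBUF)
        return NULL;               /* toy model: no buffer reuse */
    bp = &buftab[nused++];
    bp->b_dev = dev;
    bp->b_blkno = blkno;
    bp->b_flags = B_BUSY;          /* B_DONE clear: contents not valid */
    return bp;
}

/* Like getblk, but guarantee the buffer holds the block's current data. */
struct buf *bread(int dev, long blkno)
{
    struct buf *bp = getblk(dev, blkno);
    if (bp == NULL || (bp->b_flags & B_DONE))
        return bp;                 /* already in core: no I/O */
    bp->b_flags |= B_READ;
    strategy(bp);                  /* real kernel then waits for B_DONE */
    return bp;
}

/* Make the buffer available to others again. */
void brelse(struct buf *bp)
{
    bp->b_flags &= ~(B_BUSY | B_READ);
}
```

A second bread of the same block, after brelse, finds B_DONE set and starts no transfer; a getblk of a new block returns with B_DONE clear.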
        !           678: .PP
        !           679: The
        !           680: .I breada
        !           681: routine is used to implement read-ahead.
        !           682: It is logically similar to
        !           683: .I bread,
        !           684: but takes as an additional argument the number of
        !           685: a block (on the same device) to be read asynchronously
        !           686: after the specifically requested block is available.
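.PP
Read-ahead can be added to the same kind of toy model.
In this hypothetical sketch, which is not the system's code, the requested block is read as in bread; then, if the read-ahead block is not already in core, a header is claimed for it and marked B_READ and B_ASYNC, so that the completion step releases it with nobody waiting.
The simulated device completes instantly, incore is an assumed lookup helper, flag values are illustrative, and a negative rablkno means no read-ahead.

```c
#include <stddef.h>

/* Illustrative flag values (assumed). */
#define B_READ  01
#define B_DONE  02
#define B_BUSY  010
#define B_ASYNC 0400

#define NBUF 8

struct buf { int b_flags; int b_dev; long b_blkno; };

static struct buf buftab[NBUF];
static int nused;
int reads_started;             /* simulated transfers started */

/* Hypothetical helper: look a block up without claiming it. */
struct buf *incore(int dev, long blkno)
{
    for (int i = 0; i < nused; i++)
        if (buftab[i].b_dev == dev && buftab[i].b_blkno == blkno)
            return &buftab[i];
    return NULL;
}

struct buf *getblk(int dev, long blkno)
{
    struct buf *bp = incore(dev, blkno);
    if (bp == NULL) {
        if (nused == NBUF)
            return NULL;       /* toy model: no buffer reuse */
        bp = &buftab[nused++];
        bp->b_dev = dev;
        bp->b_blkno = blkno;
        bp->b_flags = 0;
    }
    bp->b_flags |= B_BUSY;     /* real kernel sleeps if already busy */
    return bp;
}

void brelse(struct buf *bp)
{
    bp->b_flags &= ~(B_BUSY | B_ASYNC | B_READ);
}

/* Simulated device: finishes at once; an asynchronous buffer is
 * released here, as an interrupt routine would do. */
static void strategy(struct buf *bp)
{
    reads_started++;
    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_ASYNC)
        brelse(bp);
}

/* Read blkno (waiting for it) and start an asynchronous read of rablkno. */
struct buf *breada(int dev, long blkno, long rablkno)
{
    struct buf *bp = getblk(dev, blkno);
    if (bp == NULL)
        return NULL;
    if (!(bp->b_flags & B_DONE)) {
        bp->b_flags |= B_READ;
        strategy(bp);          /* real kernel waits for B_DONE */
    }
    if (rablkno >= 0 && incore(dev, rablkno) == NULL) {
        struct buf *rp = getblk(dev, rablkno);
        if (rp != NULL) {
            rp->b_flags |= B_READ | B_ASYNC;
            strategy(rp);      /* nobody waits for this one */
        }
    }
    return bp;
}
```

After breada(0, 1, 2), a later request for block 2 finds it valid and only block 3 needs to be fetched.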
        !           687: .PP
        !           688: Given a pointer to a buffer,
        !           689: the
        !           690: .I brelse
        !           691: routine
        !           692: makes the buffer again available to other processes.
        !           693: It is called, for example, after
        !           694: data has been extracted following a
        !           695: .I bread.
        !           696: There are three subtly different write routines,
        !           697: all of which take a buffer pointer as argument,
        !           698: and all of which logically release the buffer for
        !           699: use by others and place it on the free list.
        !           700: .I Bwrite
        !           701: puts the
        !           702: buffer on the appropriate device queue,
        !           703: waits for the write to be done,
        !           704: and sets the user's error flag if required.
        !           705: .I Bawrite
        !           706: places the buffer on the device's queue, but does not wait
        !           707: for completion, so that errors cannot be reflected directly to
        !           708: the user.
        !           709: .I Bdwrite
        !           710: does not start any I/O operation at all,
        !           711: but merely marks
        !           712: the buffer so that if it happens
        !           713: to be grabbed from the free list to contain
        !           714: data from some other block, the data in it will
        !           715: first be written
        !           716: out.
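.PP
The three write routines differ mainly in which flags they set and whether they wait.
The sketch below is a hypothetical user-space model, not the system's code: the device finishes instantly, user_error stands in for the user's error flag, list handling is omitted, and the flag values are illustrative.
As the B_ASYNC description later in this section puts it, bawrite sets that bit and starts I/O, leaving release to the completion step.

```c
/* Illustrative flag values (assumed). */
#define B_DONE   02
#define B_ERROR  04
#define B_BUSY   010
#define B_ASYNC  0400
#define B_DELWRI 01000

struct buf { int b_flags; };

int io_started;   /* writes handed to the simulated device */
int user_error;   /* stand-in for the user's error flag */

/* Return the buffer to the free pool (list handling omitted). */
void brelse(struct buf *bp)
{
    bp->b_flags &= ~(B_BUSY | B_ASYNC);
}

/* Simulated device: completes at once; asynchronous buffers are
 * released here, as the interrupt routine would do. */
static void strategy(struct buf *bp)
{
    io_started++;
    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_ASYNC)
        brelse(bp);
}

/* Start the write; unless B_ASYNC is set, wait, report errors, release. */
void bwrite(struct buf *bp)
{
    int async = bp->b_flags & B_ASYNC;
    bp->b_flags &= ~(B_DONE | B_ERROR | B_DELWRI);
    strategy(bp);
    if (!async) {
        /* real kernel sleeps here until B_DONE is set */
        if (bp->b_flags & B_ERROR)
            user_error = 1;
        brelse(bp);
    }
}

/* Start the write but do not wait: errors cannot reach the user. */
void bawrite(struct buf *bp)
{
    bp->b_flags |= B_ASYNC;
    bwrite(bp);
}

/* No I/O now: mark the data to be written before the buffer is reused. */
void bdwrite(struct buf *bp)
{
    bp->b_flags |= B_DELWRI | B_DONE;
    brelse(bp);
}
```

All three leave the buffer no longer busy; only bdwrite leaves it with no transfer started and B_DELWRI set.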
        !           717: .PP
        !           718: .I Bwrite
        !           719: is used when one wants to be sure that
        !           720: I/O takes place correctly, and that
        !           721: errors are reflected to the proper user;
        !           722: it is used, for example, when updating i-nodes.
        !           723: .I Bawrite
        !           724: is useful when more overlap is desired
        !           725: (because no wait is required for I/O to finish)
        !           726: but when it is reasonably certain that the
        !           727: write is really required.
        !           728: .I Bdwrite
        !           729: is used when there is doubt that the write is
        !           730: needed at the moment.
        !           731: For example,
        !           732: .I bdwrite
        !           733: is called when the last byte of a
        !           734: .I write
        !           735: system call falls short of the end of a
        !           736: block, on the assumption that
        !           737: another
        !           738: .I write
        !           739: will be given soon which will re-use the same block.
        !           740: On the other hand,
        !           741: as the end of a block is passed,
        !           742: .I bawrite
        !           743: is called, since probably the block will
        !           744: not be accessed again soon and one might as
        !           745: well start the writing process as soon as possible.
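.PP
The policy in the preceding paragraph amounts to a small decision rule.
The function below is a hypothetical illustration whose name, parameters, and enum are invented here: a write whose errors must reach the user, such as an i-node update, uses bwrite; a write that fills the block to its end uses bawrite; a write that stops short of the end uses bdwrite.

```c
/* Which write routine to use; names are hypothetical. */
enum wmode { W_SYNC, W_ASYNC, W_DELAYED };

/* end:        byte offset just past the last byte written in this block
 * bsize:      block size in bytes
 * must_check: nonzero when errors must reach the user (e.g. i-nodes) */
enum wmode choose_write(unsigned end, unsigned bsize, int must_check)
{
    if (must_check)
        return W_SYNC;      /* bwrite: wait and report errors */
    if (end >= bsize)
        return W_ASYNC;     /* bawrite: block full, unlikely to be reused soon */
    return W_DELAYED;       /* bdwrite: another write will probably follow */
}
```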
        !           746: .PP
        !           747: In any event, notice that the routines
        !           748: .I "getblk"
        !           749: and
        !           750: .I bread
        !           751: dedicate the given block exclusively to the
        !           752: use of the caller, and make others wait,
        !           753: while one of
        !           754: .I "brelse, bwrite, bawrite,"
        !           755: or
        !           756: .I bdwrite
        !           757: must eventually be called to free the block for use by others.
        !           758: .PP
        !           759: As mentioned, each buffer header contains a flag
        !           760: word which indicates the status of the buffer.
        !           761: Since they provide
        !           762: one important channel for information between the drivers and the
        !           763: block I/O system, it is important to understand these flags.
        !           764: The following names are manifest constants which
        !           765: select the associated flag bits.
        !           766: .IP B_READ 10
        !           767: This bit is set when the buffer is handed to the device strategy routine
        !           768: (see below) to indicate a read operation.
        !           769: The symbol
        !           770: .I B_WRITE
        !           771: is defined as 0 and does not define a flag; it is provided
        !           772: as a mnemonic convenience to callers of routines like
        !           773: .I swap
        !           774: which have a separate argument
        !           775: which indicates read or write.
        !           776: .IP B_DONE 10
        !           777: This bit is set
        !           778: to 0 when a block is handed to the device strategy
        !           779: routine and is turned on when the operation completes,
        !           780: whether normally or as the result of an error.
        !           781: It is also used as part of the return argument of
        !           782: .I getblk
        !           783: to indicate, if 1, that the returned
        !           784: buffer actually contains the data in the requested block.
        !           785: .IP B_ERROR 10
        !           786: This bit may be set to 1 when
        !           787: .I B_DONE
        !           788: is set to indicate that an I/O or other error occurred.
        !           789: If it is set, the
        !           790: .I b_error
        !           791: byte of the buffer header may contain a non-zero
        !           792: code identifying the error.
        !           793: If
        !           794: .I b_error
        !           795: is 0, the nature of the error is not specified.
        !           796: Actually no driver at present sets
        !           797: .I b_error;
        !           798: the latter is provided for a future improvement
        !           799: whereby a more detailed error-reporting
        !           800: scheme may be implemented.
        !           801: .IP B_BUSY 10
        !           802: This bit indicates that the buffer header is not on
        !           803: the free list, that is, it is
        !           804: dedicated to someone's exclusive use.
        !           805: The buffer still remains attached to the list of
        !           806: blocks associated with its device, however.
        !           807: When
        !           808: .I getblk
        !           809: (or
        !           810: .I bread,
        !           811: which calls it) searches the buffer list
        !           812: for a given device and finds the requested
        !           813: block with this bit on, it sleeps until the bit
        !           814: clears.
        !           815: .IP B_PHYS 10
        !           816: This bit is set for raw I/O transactions that
        !           817: need to allocate the Unibus map on an 11/70.
        !           818: .IP B_MAP 10
        !           819: This bit is set on buffers that have the Unibus map allocated,
        !           820: so that the
        !           821: .I iodone
        !           822: routine knows to deallocate the map.
        !           823: .IP B_WANTED 10
        !           824: This flag is used in conjunction with the
        !           825: .I B_BUSY
        !           826: bit.
        !           827: Before sleeping as described
        !           828: just above,
        !           829: .I getblk
        !           830: sets this flag.
        !           831: Conversely, when the block is freed and the busy bit
        !           832: goes down (in
        !           833: .I brelse)
        !           834: a
        !           835: .I wakeup
        !           836: is given for the block header whenever
        !           837: .I B_WANTED
        !           838: is on.
        !           839: This stratagem avoids the overhead
        !           840: of having to call
        !           841: .I wakeup
        !           842: every time a buffer is freed on the chance that someone
        !           843: might want it.
        !           844: .IP B_AGE 10
        !           845: This bit may be set on buffers just before releasing them; if it
        !           846: is on,
        !           847: the buffer is placed at the head of the free list, rather than at the
        !           848: tail.
        !           849: It is a performance heuristic
        !           850: used when the caller judges that the same block will not soon be used again.
        !           851: .IP B_ASYNC 10
        !           852: This bit is set by
        !           853: .I bawrite
        !           854: to indicate to the appropriate device driver
        !           855: that the buffer should be released when the
        !           856: write has been finished, usually at interrupt time.
        !           857: The difference between
        !           858: .I bwrite
        !           859: and
        !           860: .I bawrite
        !           861: is that the former starts I/O, waits until it is done, and
        !           862: frees the buffer.
        !           863: The latter merely sets this bit and starts I/O.
        !           864: The bit indicates that
        !           865: .I brelse
        !           866: should be called for the buffer on completion.
        !           867: .IP B_DELWRI 10
        !           868: This bit is set by
        !           869: .I bdwrite
        !           870: before releasing the buffer.
        !           871: When
        !           872: .I getblk,
        !           873: while searching for a free block,
        !           874: discovers the bit is 1 in a buffer it would otherwise grab,
        !           875: it causes the block to be written out before reusing it.
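.PP
How B_BUSY, B_WANTED, B_AGE, and B_DELWRI interact with the free list can be sketched as follows.
This is a hypothetical user-space model: the octal values are illustrative (the text fixes only B_WRITE as 0), wakeup merely counts calls, getfree is an invented name for the part of getblk that takes a buffer off the free list, and a delayed-write buffer is treated as written on the spot rather than being handed to the device.
The free list is a circular doubly-linked list through av_forw and av_back with a dummy head.

```c
#include <stddef.h>

/* Illustrative flag values; only B_WRITE == 0 comes from the text. */
#define B_WRITE  0
#define B_READ   01
#define B_DONE   02
#define B_ERROR  04
#define B_BUSY   010
#define B_PHYS   020
#define B_MAP    040
#define B_WANTED 0100
#define B_AGE    0200
#define B_ASYNC  0400
#define B_DELWRI 01000

struct buf {
    int         b_flags;
    long        b_blkno;
    struct buf *av_forw, *av_back;   /* free-list links */
};

/* Dummy head of the circular free list; empty when it points to itself. */
static struct buf bfreelist = { 0, -1, &bfreelist, &bfreelist };
int wakeups;     /* calls to the stand-in wakeup */
int flushes;     /* delayed writes forced out */

static void wakeup(struct buf *bp) { (void)bp; wakeups++; }

/* Free a busy buffer: wake sleepers only if one asked (B_WANTED);
 * B_AGE buffers go to the head (reused first), others to the tail. */
void brelse(struct buf *bp)
{
    struct buf *head = &bfreelist;
    if (bp->b_flags & B_WANTED)
        wakeup(bp);
    if (bp->b_flags & B_AGE) {
        bp->av_forw = head->av_forw;
        bp->av_back = head;
    } else {
        bp->av_forw = head;
        bp->av_back = head->av_back;
    }
    bp->av_back->av_forw = bp;
    bp->av_forw->av_back = bp;
    bp->b_flags &= ~(B_WANTED | B_BUSY | B_ASYNC | B_AGE);
}

/* Hypothetical piece of getblk: claim the first free buffer. A buffer
 * holding delayed-write data is "written" first (simplified). */
struct buf *getfree(void)
{
    struct buf *bp = bfreelist.av_forw;
    if (bp == &bfreelist)
        return NULL;                 /* real kernel would sleep */
    bp->av_back->av_forw = bp->av_forw;
    bp->av_forw->av_back = bp->av_back;
    if (bp->b_flags & B_DELWRI) {
        flushes++;                   /* stand-in for writing it out */
        bp->b_flags &= ~B_DELWRI;
    }
    bp->b_flags |= B_BUSY;
    return bp;
}
```

Releasing an ordinary buffer, then an aged one, then a wanted delayed-write one yields the reuse order aged, ordinary, delayed-write, with exactly one wakeup and one forced write.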
        !           876: .SH
        !           877: Block Device Drivers
        !           878: .PP
        !           879: The
        !           880: .I bdevsw
        !           881: table contains the names of the interface routines
        !           882: and that of a table for each block device.
        !           883: .PP
        !           884: Just as for character devices, block device drivers may supply
        !           885: an
        !           886: .I open
        !           887: and a
        !           888: .I close
        !           889: routine
        !           890: called respectively on each open and on the final close
        !           891: of the device.
        !           892: Instead of separate read and write routines,
        !           893: each block device driver has a
        !           894: .I strategy
        !           895: routine which is called with a pointer to a buffer
        !           896: header as argument.
        !           897: As discussed, the buffer header contains
        !           898: a read/write flag, the core address,
        !           899: the block number, a (negative) word count,
        !           900: and the major and minor device number.
        !           901: The role of the strategy routine
        !           902: is to carry out the operation as requested by the
        !           903: information in the buffer header.
        !           904: When the transaction is complete the
        !           905: .I B_DONE
        !           906: (and possibly the
        !           907: .I B_ERROR)
        !           908: bits should be set.
        !           909: Then if the
        !           910: .I B_ASYNC
        !           911: bit is set,
        !           912: .I brelse
        !           913: should be called;
        !           914: otherwise,
        !           915: .I wakeup.
        !           916: In cases where the device
        !           917: is capable, under error-free operation,
        !           918: of transferring fewer words than requested,
        !           919: the device's word-count register should be placed
        !           920: in the residual count slot of
        !           921: the buffer header;
        !           922: otherwise, the residual count should be set to 0.
        !           923: This particular mechanism is really for the benefit
        !           924: of the magtape driver;
        !           925: when reading this device
        !           926: records shorter than requested are quite normal,
        !           927: and the user should be told the actual length of the record.
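.PP
The completion duties just listed can be sketched as a hypothetical routine named xx_done.
It assumes that the device's word-count register counts up from the negative request toward zero, so that after a short record it still holds the negative number of words not transferred; under that assumption the words actually moved are b_resid minus b_wcount.
The brelse and wakeup stubs only count calls, and the flag values are illustrative.

```c
/* Illustrative flag values (assumed). */
#define B_DONE  02
#define B_ERROR 04
#define B_ASYNC 0400

struct buf {
    int b_flags;
    int b_wcount;   /* negative word count requested */
    int b_resid;    /* residual count, 0 if not meaningful */
};

int released, woken;   /* stub call counters */
static void brelse(struct buf *bp) { (void)bp; released++; }
static void wakeup(struct buf *bp) { (void)bp; woken++; }

/* Hypothetical completion step. wc_reg: the device's word-count register
 * when the transfer stopped (assumed to count up toward 0). short_ok:
 * nonzero if short transfers are normal for this device, e.g. magtape. */
void xx_done(struct buf *bp, int wc_reg, int failed, int short_ok)
{
    bp->b_resid = short_ok ? wc_reg : 0;
    if (failed)
        bp->b_flags |= B_ERROR;
    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_ASYNC)
        brelse(bp);         /* nobody is waiting */
    else
        wakeup(bp);         /* the requesting process resumes */
}

/* Words actually transferred under the counting assumption above. */
int words_moved(const struct buf *bp)
{
    return bp->b_resid - bp->b_wcount;
}
```

For a 256-word request that stops with the register at -156, the caller can report a 100-word record.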
        !           928: .PP
        !           929: Although the most usual argument
        !           930: to the strategy routines
        !           931: is a genuine buffer header allocated as discussed above,
        !           932: all that is actually required
        !           933: is that the argument be a pointer to a place containing the
        !           934: appropriate information.
        !           935: For example, the
        !           936: .I swap
        !           937: routine, which manages movement
        !           938: of core images to and from the swapping device,
        !           939: uses the strategy routine
        !           940: for this device.
        !           941: Care has to be taken that
        !           942: no extraneous bits get turned on in the
        !           943: flag word.
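.PP
A private header of this kind might be prepared as in the hypothetical swap_io below, which is not the system's swap routine.
The header lives on the stack rather than in the buffer pool, and it is cleared before use so that only B_BUSY and the direction bit reach the strategy routine.
The stand-in strategy records what it receives; flag values other than B_WRITE are illustrative.

```c
#include <string.h>

/* Illustrative flag values (B_WRITE == 0 is from the text). */
#define B_WRITE 0
#define B_READ  01
#define B_DONE  02
#define B_BUSY  010

struct buf {
    int   b_flags;
    int   b_dev;
    long  b_blkno;
    char *b_addr;
    int   b_wcount;   /* negative word count */
    int   b_resid;
};

int last_flags, last_wcount;   /* what the strategy routine was handed */

/* Stand-in strategy routine: records its argument, completes at once. */
static void strategy(struct buf *bp)
{
    last_flags = bp->b_flags;
    last_wcount = bp->b_wcount;
    bp->b_flags |= B_DONE;
}

/* Hypothetical swap-style transfer using a private header. */
void swap_io(int dev, long blkno, char *coreaddr, int nwords, int rw)
{
    struct buf hdr;
    memset(&hdr, 0, sizeof hdr);       /* no stale bits in b_flags */
    hdr.b_flags = B_BUSY | rw;
    hdr.b_dev = dev;
    hdr.b_blkno = blkno;
    hdr.b_addr = coreaddr;
    hdr.b_wcount = -nwords;
    strategy(&hdr);                    /* real kernel: sleep until B_DONE */
}
```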
        !           944: .PP
        !           945: The device's table specified by
        !           946: .I bdevsw
        !           947: has a
        !           948: byte to contain an active flag and an error count,
        !           949: a pair of links which constitute the
        !           950: head of the chain of buffers for the device
        !           951: .I "(b_forw, b_back),"
        !           952: and a first and last pointer for a device queue.
        !           953: Of these things, all are used solely by the device driver
        !           954: itself
        !           955: except for the buffer-chain pointers.
        !           956: Typically the flag encodes the state of the
        !           957: device, and is used at a minimum to
        !           958: indicate that the device is currently engaged in
        !           959: transferring information and no new command should be issued.
        !           960: The error count is useful for counting retries
        !           961: when errors occur.
        !           962: The device queue is used to remember stacked requests;
        !           963: in the simplest case it may be maintained as a first-in
        !           964: first-out list.
        !           965: Since buffers which have been handed over to
        !           966: the strategy routines are never
        !           967: on the list of free buffers,
        !           968: the pointers in the buffer which maintain the free list
        !           969: .I "(av_forw, av_back)"
        !           970: are also used to contain the pointers
        !           971: which maintain the device queues.
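.PP
A first-in first-out device queue threaded through av_forw might look like the following sketch.
The structure and field names (devtab, d_active, d_errcnt, d_actf, d_actl) are hypothetical, and the buffer-chain links that the table also holds are omitted because they belong to the buffer system rather than to the driver.

```c
#include <stddef.h>

struct buf {
    long        b_blkno;
    struct buf *av_forw;   /* free-list link, reused as the queue link */
};

/* Hypothetical per-device table: state flag, retry count, queue ends. */
struct devtab {
    char        d_active;  /* nonzero while a transfer is in progress */
    char        d_errcnt;  /* retries on the current request */
    struct buf *d_actf;    /* first request in the queue */
    struct buf *d_actl;    /* last request in the queue */
};

/* Append a request at the tail. */
void dq_enqueue(struct devtab *dp, struct buf *bp)
{
    bp->av_forw = NULL;
    if (dp->d_actf == NULL)
        dp->d_actf = bp;
    else
        dp->d_actl->av_forw = bp;
    dp->d_actl = bp;
}

/* Remove and return the oldest request, or NULL if none. */
struct buf *dq_dequeue(struct devtab *dp)
{
    struct buf *bp = dp->d_actf;
    if (bp != NULL)
        dp->d_actf = bp->av_forw;
    return bp;
}
```

A driver would typically dequeue the next request from its interrupt routine when d_active shows the previous transfer has finished.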
        !           972: .PP
        !           973: A couple of routines
        !           974: are provided which are useful to block device drivers.
        !           975: .I "iodone(bp)"
        !           976: arranges that the buffer to which
        !           977: .I bp
        !           978: points be released or awakened,
        !           979: as appropriate,
        !           980: when the
        !           981: strategy module has finished with the buffer,
        !           982: either normally or after an error.
        !           983: (In the latter case the
        !           984: .I B_ERROR
        !           985: bit has presumably been set.)
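.PP
What iodone does, following this section's descriptions of B_MAP, B_ASYNC, and B_DONE, can be sketched as below.
It is a user-space model, not the system source: mapfree, brelse, and wakeup are stubs that count calls, and the flag values are illustrative.

```c
/* Illustrative flag values (assumed). */
#define B_DONE  02
#define B_ERROR 04
#define B_MAP   040
#define B_ASYNC 0400

struct buf { int b_flags; };

int maps_freed, released, woken;   /* stub call counters */

static void mapfree(struct buf *bp) { bp->b_flags &= ~B_MAP; maps_freed++; }
static void brelse(struct buf *bp)  { (void)bp; released++; }
static void wakeup(struct buf *bp)  { (void)bp; woken++; }

/* Called by the driver when a transfer ends, normally or with
 * B_ERROR already set. */
void iodone(struct buf *bp)
{
    if (bp->b_flags & B_MAP)    /* raw I/O holding the Unibus map */
        mapfree(bp);
    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_ASYNC)
        brelse(bp);             /* no one waits: free the buffer */
    else
        wakeup(bp);             /* resume the process that started it */
}
```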
        !           986: .PP
        !           987: The routine
        !           988: .I "geterror(bp)"
        !           989: can be used to examine the error bit in a buffer header
        !           990: and arrange that any error indication found therein is
        !           991: reflected to the user.
        !           992: It may be called only in the non-interrupt
        !           993: part of a driver when I/O has completed
        !           994: .I (B_DONE
        !           995: has been set).
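.PP
A minimal sketch of geterror follows.
In this hypothetical model u_error stands in for the user's error slot; a non-zero b_error is passed through, and otherwise the generic EIO from <errno.h> is used.
That fallback is an assumption, consistent with the remark above that drivers do not yet fill in b_error.

```c
#include <errno.h>

#define B_ERROR 04   /* illustrative value */

struct buf { int b_flags; char b_error; };

int u_error;         /* stand-in for the user's error slot */

/* Call only from the non-interrupt part of a driver, after B_DONE. */
void geterror(struct buf *bp)
{
    if (bp->b_flags & B_ERROR)
        u_error = bp->b_error ? bp->b_error : EIO;
}
```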
        !           996: .SH
        !           997: Raw Block-device I/O
        !           998: .PP
        !           999: A scheme has been set up whereby block device drivers may
        !          1000: provide the ability to transfer information
        !          1001: directly between the user's core image and the device
        !          1002: without the use of buffers and in blocks as large as
        !          1003: the caller requests.
        !          1004: The method involves setting up a character-type special file
        !          1005: corresponding to the raw device
        !          1006: and providing
        !          1007: .I read
        !          1008: and
        !          1009: .I write
        !          1010: routines which set up what is usually a private,
        !          1011: non-shared buffer header with the appropriate information
        !          1012: and call the device's strategy routine.
        !          1013: If desired, separate
        !          1014: .I open
        !          1015: and
        !          1016: .I close
        !          1017: routines may be provided but this is usually unnecessary.
        !          1018: A special-function routine might come in handy, especially for
        !          1019: magtape.
        !          1020: .PP
        !          1021: A great deal of work has to be done to generate the
        !          1022: ``appropriate information''
        !          1023: to put in the argument buffer for
        !          1024: the strategy module;
        !          1025: the worst part is to map relocated user addresses to physical addresses.
        !          1026: Most of this work is done by
        !          1027: .I "physio(strat, bp, dev, rw)"
        !          1028: whose arguments are the name of the
        !          1029: strategy routine
        !          1030: .I strat,
        !          1031: the buffer pointer
        !          1032: .I bp,
        !          1033: the device number
        !          1034: .I dev,
        !          1035: and a read-write flag
        !          1036: .I rw
        !          1037: whose value is either
        !          1038: .I B_READ
        !          1039: or
        !          1040: .I B_WRITE.
        !          1041: .I Physio
        !          1042: makes sure that the user's base address and count are
        !          1043: even (because most devices work in words)
        !          1044: and that the core area affected is contiguous
        !          1045: in physical space;
        !          1046: it delays until the buffer is not busy, and makes it
        !          1047: busy while the operation is in progress;
        !          1048: and it sets up user error return information.
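.PP
The checks and setup performed by physio, together with a raw read routine that uses it, are sketched below.
This is a hypothetical user-space model: the structure u stands in for the user's base address, count, offset, and error slot; xxread, rxbuf, and fake_strategy are invented names; EFAULT for odd addresses or counts, EIO for device errors, and 512-byte blocks are assumptions; and the contiguity check, address relocation, and the waits for a busy buffer and for completion are omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values (B_WRITE == 0 is from the text). */
#define B_WRITE 0
#define B_READ  01
#define B_DONE  02
#define B_ERROR 04
#define B_BUSY  010
#define B_PHYS  020

struct buf {
    int   b_flags;
    int   b_dev;
    char *b_addr;
    int   b_wcount;     /* negative word count */
    long  b_blkno;
};

/* Stand-in for the user's request and error slot. */
struct { char *base; unsigned count; long offset; int error; } u;

/* Toy strategy routine; fake_fail forces an error. */
int fake_fail;
void fake_strategy(struct buf *bp)
{
    if (fake_fail)
        bp->b_flags |= B_ERROR;
    bp->b_flags |= B_DONE;
}

void physio(void (*strat)(struct buf *), struct buf *bp, int dev, int rw)
{
    if (((uintptr_t)u.base & 1) || (u.count & 1)) {
        u.error = EFAULT;           /* devices move whole words */
        return;
    }
    /* real kernel: wait here while bp is B_BUSY */
    bp->b_flags = B_BUSY | B_PHYS | rw;
    bp->b_dev = dev;
    bp->b_addr = u.base;
    bp->b_wcount = -(int)(u.count >> 1);
    bp->b_blkno = u.offset >> 9;    /* assumed 512-byte blocks */
    strat(bp);                      /* real kernel: sleep until B_DONE */
    if (bp->b_flags & B_ERROR)
        u.error = EIO;
    bp->b_flags &= ~B_BUSY;
}

/* Hypothetical raw read routine with a private, non-shared header. */
struct buf rxbuf;
void xxread(int dev)
{
    physio(fake_strategy, &rxbuf, dev, B_READ);
}
```

A raw write routine would differ only in passing B_WRITE.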
