.\" @(#)iosys 6.1 (Berkeley) 4/29/86
.\"
.EH 'PS2:5-%''The UNIX I/O System'
.OH 'The UNIX I/O System''PS2:5-%'
.TL
The UNIX I/O System
.AU
Dennis M. Ritchie
.AI
.MH
.PP
This paper gives an overview of the workings of the UNIX\(dg
.FS
\(dgUNIX is a Trademark of Bell Laboratories.
.FE
I/O system.
It was written with an eye toward providing
guidance to writers of device driver routines,
and is oriented more toward describing the environment
and nature of device drivers than the implementation
of that part of the file system which deals with
ordinary files.
.PP
It is assumed that the reader has a good knowledge
of the overall structure of the file system as discussed
in the paper ``The UNIX Time-sharing System.''
A more detailed discussion
appears in
``UNIX Implementation;''
the current document restates parts of that one,
but is still more detailed.
It is most useful in
conjunction with a copy of the system code,
since it is basically an exegesis of that code.
.SH
Device Classes
.PP
There are two classes of device:
.I block
and
.I character.
The block interface is suitable for devices
like disks, tapes, and DECtape
which work, or can work, with addressable 512-byte blocks.
Ordinary magnetic tape just barely fits in this category,
since by use of forward
and
backward spacing any block can be read, even though
blocks can be written only at the end of the tape.
Block devices can at least potentially contain a mounted
file system.
The interface to block devices is very highly structured;
the drivers for these devices share a great many routines
as well as a pool of buffers.
.PP
Character-type devices have a much
more straightforward interface, although
more work must be done by the driver itself.
.PP
Devices of both types are named by a
.I major
and a
.I minor
device number.
These numbers are generally stored as an integer
with the minor device number
in the low-order 8 bits and the major device number
in the next-higher 8 bits;
macros
.I major
and
.I minor
are available to access these numbers.
The major device number selects which driver will deal with
the device; the minor device number is not used
by the rest of the system but is passed to the
driver at appropriate times.
Typically the minor number
selects a subdevice attached to
a given controller, or one of
several similar hardware interfaces.
.PP
The major device numbers for block and character devices
are used as indices in separate tables;
they both start at 0 and therefore overlap.
.SH
Overview of I/O
.PP
The purpose of
the
.I open
and
.I creat
system calls is to set up entries in three separate
system tables.
The first of these is the
.I u_ofile
table,
which is stored in the system's per-process
data area
.I u.
This table is indexed by
the file descriptor returned by the
.I open
or
.I creat,
and is accessed during
a
.I read,
.I write,
or other operation on the open file.
An entry contains only
a pointer to the corresponding
entry of the
.I file
table,
which is a per-system data base.
There is one entry in the
.I file
table for each
instance of
.I open
or
.I creat.
This table is per-system because the same instance
of an open file must be shared among the several processes
which can result from
.I forks
after the file is opened.
A
.I file
table entry contains
flags which indicate whether the file
was open for reading or writing or is a pipe, and
a count which is used to decide when all processes
using the entry have terminated or closed the file
(so the entry can be abandoned).
There is also a 32-bit file offset
which is used to indicate where in the file the next read
or write will take place.
Finally, there is a pointer to the
entry for the file in the
.I inode
table,
which contains a copy of the file's i-node.
.PP
Certain open files can be designated ``multiplexed''
files, and several other flags apply to such
channels.
In such a case, instead of an offset,
there is a pointer to an associated multiplex channel table.
Multiplex channels will not be discussed here.
.PP
An entry in the
.I file
table corresponds precisely to an instance of
.I open
or
.I creat;
if the same file is opened several times,
it will have several
entries in this table.
However,
there is at most one entry
in the
.I inode
table for a given file.
Also, a file may enter the
.I inode
table not only because it is open,
but also because it is the current directory
of some process or because it
is a special file containing a currently-mounted
file system.
.PP
An entry in the
.I inode
table differs somewhat from the
corresponding i-node as stored on the disk;
the modified and accessed times are not stored,
and the entry is augmented
by a flag word containing information about the entry,
a count used to determine when it may be
allowed to disappear,
and the device and i-number
whence the entry came.
Also, the several block numbers that give addressing
information for the file are expanded from
the 3-byte, compressed format used on the disk to full
.I long
quantities.
.PP
During the processing of an
.I open
or
.I creat
call for a special file,
the system always calls the device's
.I open
routine to allow for any special processing
required (rewinding a tape, turning on
the data-terminal-ready lead of a modem, etc.).
However,
the
.I close
routine is called only when the last
process closes a file,
that is, when the i-node table entry
is being deallocated.
Thus it is not feasible
for a device to maintain, or depend on,
a count of its users, although it is quite
possible to
implement an exclusive-use device which cannot
be reopened until it has been closed.
.PP
When a
.I read
or
.I write
takes place,
the user's arguments
and the
.I file
table entry are used to set up the
variables
.I u.u_base,
.I u.u_count,
and
.I u.u_offset,
which respectively contain the (user) address
of the I/O target area, the byte-count for the transfer,
and the current location in the file.
If the file referred to is
a character-type special file, the appropriate read
or write routine is called; it is responsible
for transferring data and updating the
count and current location appropriately
as discussed below.
Otherwise, the current location is used to calculate
a logical block number in the file.
If the file is an ordinary file the logical block
number must be mapped (possibly using indirect blocks)
to a physical block number; a block-type
special file need not be mapped.
This mapping is performed by the
.I bmap
routine.
In any event, the resulting physical block number
is used, as discussed below, to
read or write the appropriate device.
.SH
Character Device Drivers
.PP
The
.I cdevsw
table specifies the interface routines present for
character devices.
Each device may provide five routines:
open, close, read, write, and special-function
(to implement the
.I ioctl
system call).
Any of these may be missing.
If a call on the routine
should be ignored
(e.g.
.I open
on non-exclusive devices that require no setup),
the
.I cdevsw
entry can be given as
.I nulldev;
if it should be considered an error
(e.g.
.I write
on read-only devices),
.I nodev
is used.
For terminals,
the
.I cdevsw
structure also contains a pointer to the
.I tty
structure associated with the terminal.
.PP
The
.I open
routine is called, with the full device number as argument,
each time the file is opened.
The second argument is a flag which is
non-zero only if the device is to be written upon.
.PP
The
.I close
routine is called only when the file
is closed for the last time,
that is, when the very last process in
which the file is open closes it.
This means it is not possible for the driver to
maintain its own count of its users.
The first argument is the device number;
the second is a flag which is non-zero
if the file was open for writing in the process which
performs the final
.I close.
.PP
When
.I write
is called, it is supplied the device
as argument.
The per-user variable
.I u.u_count
has been set to
the number of characters indicated by the user;
for character devices, this number may be 0
initially.
.I u.u_base
is the address supplied by the user from which to start
taking characters.
The system may call the
routine internally, so the
flag
.I u.u_segflg
is supplied;
if it is on,
.I u.u_base
refers to the system address space instead of
the user's.
.PP
The
.I write
routine
should copy up to
.I u.u_count
characters from the user's buffer to the device,
decrementing
.I u.u_count
for each character passed.
For most drivers, which work one character at a time,
the routine
.I "cpass( )"
is used to pick up characters
from the user's buffer.
Successive calls on it return
the characters to be written until
.I u.u_count
goes to 0 or an error occurs,
when it returns \(mi1.
.I Cpass
takes care of interrogating
.I u.u_segflg
and updating
.I u.u_count.
.PP
Write routines which want to transfer
a potentially large number of characters into an internal
buffer may also use the routine
.I "iomove(buffer, offset, count, flag),"
which is faster when many characters must be moved.
.I Iomove
transfers up to
.I count
characters into the
.I buffer
starting
.I offset
bytes from the start of the buffer;
.I flag
should be
.I B_WRITE
(which is 0) in the write case.
Caution:
the caller is responsible for making sure
the count is not too large and is non-zero.
As an efficiency note,
.I iomove
is much slower if any of
.I "buffer+offset, count"
or
.I u.u_base
is odd.
.PP
The device's
.I read
routine is called under conditions similar to
.I write,
except that
.I u.u_count
is guaranteed to be non-zero.
To return characters to the user, the routine
.I "passc(c)"
is available; it takes care of housekeeping
like
.I cpass
and returns \(mi1 once the last character
specified by
.I u.u_count
has been returned to the user;
before that time, 0 is returned.
.I Iomove
is also usable as with
.I write;
the flag should be
.I B_READ,
but the same cautions apply.
.PP
The ``special-functions'' routine
is invoked by the
.I stty
and
.I gtty
system calls as follows:
.I "(*p) (dev, v)"
where
.I p
is a pointer to the device's routine,
.I dev
is the device number,
and
.I v
is a vector.
In the
.I gtty
case,
the device is supposed to place up to 3 words of status information
into the vector; this will be returned to the caller.
In the
.I stty
case,
.I v
is 0;
the device should take up to 3 words of
control information from
the array
.I "u.u_arg[0...2]."
.PP
Finally, each device should have appropriate interrupt-time
routines.
When an interrupt occurs, it is turned into a C-compatible call
on the device's interrupt routine.
The interrupt-catching mechanism makes
the low-order four bits of the ``new PS'' word in the
trap vector for the interrupt available
to the interrupt handler.
This is conventionally used by drivers
which deal with multiple similar devices
to encode the minor device number.
After the interrupt has been processed,
a return from the interrupt handler will
return from the interrupt itself.
.PP
A number of subroutines are available which are useful
to character device drivers.
Most of these handlers, for example, need a place
to buffer characters in the internal interface
between their ``top half'' (read/write)
and ``bottom half'' (interrupt) routines.
For relatively low data-rate devices, the best mechanism
is the character queue maintained by the
routines
.I getc
and
.I putc.
A queue header has the structure
.DS
struct {
    int   c_cc;   /* character count */
    char *c_cf;   /* first character */
    char *c_cl;   /* last character */
} queue;
.DE
A character is placed on the end of a queue by
.I "putc(c, &queue)"
where
.I c
is the character and
.I queue
is the queue header.
The routine returns \(mi1 if there is no space
to put the character, 0 otherwise.
The first character on the queue may be retrieved
by
.I "getc(&queue)"
which returns either the (non-negative) character
or \(mi1 if the queue is empty.
.PP
Notice that the space for characters in queues is
shared among all devices in the system
and in the standard system there are only some 600
character slots available.
Thus device handlers,
especially write routines, must take
care to avoid gobbling up excessive numbers of characters.
.PP
The other major help available
to device handlers is the sleep-wakeup mechanism.
The call
.I "sleep(event, priority)"
causes the process to wait (allowing other processes to run)
until the
.I event
occurs;
at that time, the process is marked ready-to-run
and the call will return when there is no
process with higher
.I priority.
.PP
The call
.I "wakeup(event)"
indicates that the
.I event
has happened, that is, causes processes sleeping
on the event to be awakened.
The
.I event
is an arbitrary quantity agreed upon
by the sleeper and the waker-up.
By convention, it is the address of some data area used
by the driver, which guarantees that events
are unique.
.PP
Processes sleeping on an event should not assume
that the event has really happened;
they should check that the conditions which
caused them to sleep no longer hold.
.PP
Priorities can range from 0 to 127;
a higher numerical value indicates a less-favored
scheduling situation.
A distinction is made between processes sleeping
at priority less than the parameter
.I PZERO
and those at numerically larger priorities.
Processes in the former group cannot
be interrupted by signals, although
they may conceivably be swapped out.
Thus it is a bad idea to sleep with
priority less than PZERO on an event which might never occur.
On the other hand, calls to
.I sleep
with larger priority
may never return if the process is terminated by
some signal in the meantime.
Incidentally, it is a gross error to call
.I sleep
in a routine called at interrupt time, since the process
which is running is almost certainly not the
process which should go to sleep.
Likewise, none of the variables in the user area
``\fIu\fB.\fR''
should be touched, let alone changed, by an interrupt routine.
.PP
If a device driver
wishes to wait for some event for which it is inconvenient
or impossible to supply a
.I wakeup,
(for example, a device going on-line, which does not
generally cause an interrupt),
the call
.I "sleep(&lbolt, priority)"
may be given.
.I Lbolt
is an external cell on whose address the clock interrupt routine
issues a
.I wakeup
once every 4 seconds.
.PP
The routines
.I "spl4( ), spl5( ), spl6( ), spl7( )"
are available to
set the processor priority level as indicated to avoid
inconvenient interrupts from the device.
.PP
If a device needs to know about real-time intervals,
then
.I "timeout(func, arg, interval)"
will be useful.
This routine arranges that after
.I interval
sixtieths of a second, the
.I func
will be called with
.I arg
as argument, in the style
.I "(*func)(arg)."
Timeouts are used, for example,
to provide real-time delays after function characters
like new-line and tab in typewriter output,
and to terminate an attempt to
read the 201 Dataphone
.I dp
if there is no response within a specified number
of seconds.
Notice that the number of sixtieths of a second is limited to 32767,
since it must appear to be positive,
and that only a bounded number of timeouts
can be going on at once.
Also, the specified
.I func
is called at clock-interrupt time, so it should
conform to the requirements of interrupt routines
in general.
.SH
The Block-device Interface
.PP
Handling of block devices is mediated by a collection
of routines that manage a set of buffers containing
the images of blocks of data on the various devices.
The most important purpose of these routines is to assure
that several processes that access the same block of the same
device in multiprogrammed fashion maintain a consistent
view of the data in the block.
A secondary but still important purpose is to increase
the efficiency of the system by
keeping in-core copies of blocks that are being
accessed frequently.
The main data base for this mechanism is the
table of buffers
.I buf.
Each buffer header contains a pair of pointers
.I "(b_forw, b_back)"
which maintain a doubly-linked list
of the buffers associated with a particular
block device, and a
pair of pointers
.I "(av_forw, av_back)"
which generally maintain a doubly-linked list of blocks
which are ``free,'' that is,
eligible to be reallocated for another transaction.
Buffers that have I/O in progress
or are busy for other purposes do not appear in this list.
The buffer header
also contains the device and block number to which the
buffer refers, and a pointer to the actual storage associated with
the buffer.
There is a word count
which is the negative of the number of words
to be transferred to or from the buffer;
there is also an error byte and a residual word
count used to communicate information
from an I/O routine to its caller.
Finally, there is a flag word
with bits indicating the status of the buffer.
These flags will be discussed below.
.PP
Seven routines constitute
the most important part of the interface with the
rest of the system.
Given a device and block number,
both
.I bread
and
.I getblk
return a pointer to a buffer header for the block;
the difference is that
.I bread
is guaranteed to return a buffer actually containing the
current data for the block,
while
.I getblk
returns a buffer which contains the data in the
block only if it is already in core (whether it is
or not is indicated by the
.I B_DONE
bit; see below).
In either case the buffer, and the corresponding
device block, is made ``busy,''
so that other processes referring to it
are obliged to wait until it becomes free.
.I Getblk
is used, for example,
when a block is about to be totally rewritten,
so that its previous contents are
not useful;
still, no other process can be allowed to refer to the block
until the new data is placed into it.
.PP
The
.I breada
routine is used to implement read-ahead.
It is logically similar to
.I bread,
but takes as an additional argument the number of
a block (on the same device) to be read asynchronously
after the specifically requested block is available.
.PP
Given a pointer to a buffer,
the
.I brelse
routine
makes the buffer again available to other processes.
It is called, for example, after
data has been extracted following a
.I bread.
There are three subtly-different write routines,
all of which take a buffer pointer as argument,
and all of which logically release the buffer for
use by others and place it on the free list.
.I Bwrite
puts the
buffer on the appropriate device queue,
waits for the write to be done,
and sets the user's error flag if required.
.I Bawrite
places the buffer on the device's queue, but does not wait
for completion, so that errors cannot be reflected directly to
the user.
.I Bdwrite
does not start any I/O operation at all,
but merely marks
the buffer so that if it happens
to be grabbed from the free list to contain
data from some other block, the data in it will
first be written
out.
.PP
.I Bwrite
is used when one wants to be sure that
I/O takes place correctly, and that
errors are reflected to the proper user;
it is used, for example, when updating i-nodes.
.I Bawrite
is useful when more overlap is desired
(because no wait is required for I/O to finish)
but when it is reasonably certain that the
write is really required.
.I Bdwrite
is used when there is doubt that the write is
needed at the moment.
For example,
.I bdwrite
is called when the last byte of a
.I write
system call falls short of the end of a
block, on the assumption that
another
.I write
will be given soon which will re-use the same block.
On the other hand,
as the end of a block is passed,
.I bawrite
is called, since probably the block will
not be accessed again soon and one might as
well start the writing process as soon as possible.
.PP
In any event, notice that the routines
.I "getblk"
and
.I bread
dedicate the given block exclusively to the
use of the caller, and make others wait,
while one of
.I "brelse, bwrite, bawrite,"
or
.I bdwrite
must eventually be called to free the block for use by others.
.PP
As mentioned, each buffer header contains a flag
word which indicates the status of the buffer.
Since they provide
one important channel for information between the drivers and the
block I/O system, it is important to understand these flags.
The following names are manifest constants which
select the associated flag bits.
.IP B_READ 10
This bit is set when the buffer is handed to the device strategy routine
(see below) to indicate a read operation.
The symbol
.I B_WRITE
is defined as 0 and does not define a flag; it is provided
as a mnemonic convenience to callers of routines like
.I swap
which have a separate argument
which indicates read or write.
.IP B_DONE 10
This bit is set
to 0 when a block is handed to the device strategy
routine and is turned on when the operation completes,
whether normally or as the result of an error.
It is also used as part of the return argument of
.I getblk
to indicate, if 1, that the returned
buffer actually contains the data in the requested block.
.IP B_ERROR 10
This bit may be set to 1 when
.I B_DONE
is set to indicate that an I/O or other error occurred.
If it is set, the
.I b_error
byte of the buffer header may contain a non-zero error code.
If
.I b_error
is 0, the nature of the error is not specified.
Actually no driver at present sets
.I b_error;
the latter is provided for a future improvement
whereby a more detailed error-reporting
scheme may be implemented.
.IP B_BUSY 10
This bit indicates that the buffer header is not on
the free list, i.e., is
dedicated to someone's exclusive use.
The buffer still remains attached to the list of
blocks associated with its device, however.
When
.I getblk
(or
.I bread,
which calls it) searches the buffer list
for a given device and finds the requested
block with this bit on, it sleeps until the bit
clears.
.IP B_PHYS 10
This bit is set for raw I/O transactions that
need to allocate the Unibus map on an 11/70.
.IP B_MAP 10
This bit is set on buffers that have the Unibus map allocated,
so that the
.I iodone
routine knows to deallocate the map.
.IP B_WANTED 10
This flag is used in conjunction with the
.I B_BUSY
bit.
Before sleeping as described
just above,
.I getblk
sets this flag.
Conversely, when the block is freed and the busy bit
goes down (in
.I brelse),
a
.I wakeup
is given for the block header whenever
.I B_WANTED
is on.
This stratagem avoids the overhead
of having to call
.I wakeup
every time a buffer is freed on the chance that someone
might want it.
.IP B_AGE 10
This bit may be set on buffers just before releasing them; if it
is on,
the buffer is placed at the head of the free list, rather than at the
tail.
It is a performance heuristic
used when the caller judges that the same block will not soon be used again.
.IP B_ASYNC 10
This bit is set by
.I bawrite
to indicate to the appropriate device driver
that the buffer should be released when the
write has been finished, usually at interrupt time.
The difference between
.I bwrite
and
.I bawrite
is that the former starts I/O, waits until it is done, and
frees the buffer.
The latter merely sets this bit and starts I/O.
The bit indicates that
.I brelse
should be called for the buffer on completion.
.IP B_DELWRI 10
This bit is set by
.I bdwrite
before releasing the buffer.
When
.I getblk,
while searching for a free block,
discovers the bit is 1 in a buffer it would otherwise grab,
it causes the block to be written out before reusing it.
.SH
Block Device Drivers
.PP
The
.I bdevsw
table contains the names of the interface routines
and that of a table for each block device.
.PP
Just as for character devices, block device drivers may supply
an
.I open
and a
.I close
routine
called respectively on each open and on the final close
of the device.
Instead of separate read and write routines,
each block device driver has a
.I strategy
routine which is called with a pointer to a buffer
header as argument.
As discussed, the buffer header contains
a read/write flag, the core address,
the block number, a (negative) word count,
and the major and minor device numbers.
The role of the strategy routine
is to carry out the operation as requested by the
information in the buffer header.
When the transaction is complete, the
.I B_DONE
(and possibly the
.I B_ERROR)
bits should be set.
Then, if the
.I B_ASYNC
bit is set,
.I brelse
should be called;
otherwise,
.I wakeup.
In cases where the device
is capable, under error-free operation,
of transferring fewer words than requested,
the device's word-count register should be placed
in the residual count slot of
the buffer header;
otherwise, the residual count should be set to 0.
This particular mechanism is really for the benefit
of the magtape driver;
when reading this device,
records shorter than requested are quite normal,
and the user should be told the actual length of the record.
.PP
Although the most usual argument
to the strategy routines
is a genuine buffer header allocated as discussed above,
all that is actually required
is that the argument be a pointer to a place containing the
appropriate information.
For example, the
.I swap
routine, which manages movement
of core images to and from the swapping device,
uses the strategy routine
for this device.
Care has to be taken that
no extraneous bits get turned on in the
flag word.
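.PP
As an illustration only, the following C sketch simulates the strategy-routine contract in user space: it performs the transfer described by a buffer header, sets \fIB_DONE\fR (and \fIB_ERROR\fR for an impossible request), fills in the residual count, and then calls \fIbrelse\fR if \fIB_ASYNC\fR is set and \fIwakeup\fR otherwise. The driver name \fIxxstrategy\fR, the trimmed \fIbuf\fR layout, the flag values, the in-memory ``disk'', and the counting stubs for \fIbrelse\fR and \fIwakeup\fR are all assumptions made for the sketch, not the system's actual code.
.DS
```c
#include <assert.h>
#include <string.h>

/* Flag values chosen for this sketch; B_WRITE is 0 as the text says. */
#define B_WRITE  0
#define B_READ   01
#define B_DONE   02
#define B_ERROR  04
#define B_ASYNC  0400

#define WSIZE 256   /* 16-bit words in a 512-byte block */
#define NBLK  8     /* blocks on the simulated device */

/* Only the buffer-header fields this sketch needs. */
struct buf {
    int    b_flags;
    int    b_dev;     /* major and minor device number */
    long   b_blkno;
    short *b_addr;    /* core address of the data */
    int    b_wcount;  /* negative word count, as in the text */
    int    b_resid;   /* residual words not transferred */
};

static short disk[NBLK][WSIZE];  /* contents of the simulated device */
static int released, awakened;   /* how often each stub was called */

/* Stubs standing in for the kernel's brelse and wakeup. */
static void brelse(struct buf *bp) { (void)bp; released++; }
static void wakeup(void *chan) { (void)chan; awakened++; }

/* Hypothetical strategy routine for a device that finishes every
   request at once, so no device queue or interrupt is modelled. */
void xxstrategy(struct buf *bp)
{
    int nwords = -bp->b_wcount;   /* stored negative; make it positive */

    assert(nwords >= 0);
    /* A disk never stops short, so the residual is 0; a tape driver
       would store its remaining word count here instead. */
    bp->b_resid = 0;
    if (bp->b_blkno < 0 || bp->b_blkno >= NBLK || nwords > WSIZE)
        bp->b_flags |= B_ERROR;   /* request cannot be satisfied */
    else if (bp->b_flags & B_READ)
        memcpy(bp->b_addr, disk[bp->b_blkno], nwords * sizeof(short));
    else
        memcpy(disk[bp->b_blkno], bp->b_addr, nwords * sizeof(short));

    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_ASYNC)
        brelse(bp);               /* nobody is waiting: release it */
    else
        wakeup(bp);               /* wake the process sleeping on bp */
}
```
.DE
.PP
Because the routine looks only at the header it is given, that header need not come from the buffer pool; one declared on the caller's stack serves equally well, which is the property the \fIswap\fR routine relies on. A real driver would instead link the header onto its device queue and start the hardware, performing the completion steps in its interrupt routine.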
.PP
The device's table specified by
.I bdevsw
has a
byte to contain an active flag and an error count,
a pair of links which constitute the
head of the chain of buffers for the device
.I "(b_forw, b_back),"
and a first and last pointer for a device queue.
All of these except the buffer-chain pointers
are used solely by the device driver itself.
Typically the flag encodes the state of the
device, and is used at a minimum to
indicate that the device is currently engaged in
transferring information and no new command should be issued.
The error count is useful for counting retries
when errors occur.
The device queue is used to remember stacked requests;
in the simplest case it may be maintained as a first-in,
first-out list.
Since buffers which have been handed over to
the strategy routines are never
on the list of free buffers,
the pointers in the buffer which maintain the free list
.I "(av_forw, av_back)"
are also used to contain the pointers
which maintain the device queues.
.PP
A couple of routines
are provided which are useful to block device drivers.
.I "iodone(bp)"
arranges that the buffer to which
.I bp
points be released or awakened,
as appropriate,
when the
strategy module has finished with the buffer,
either normally or after an error.
(In the latter case the
.I B_ERROR
bit has presumably been set.)
.PP
The routine
.I "geterror(bp)"
can be used to examine the error bit in a buffer header
and arrange that any error indication found therein is
reflected to the user.
It may be called only in the non-interrupt
part of a driver when I/O has completed
.I (B_DONE
has been set).
.SH
Raw Block-device I/O
.PP
A scheme has been set up whereby block device drivers may
provide the ability to transfer information
directly between the user's core image and the device
without the use of buffers and in blocks as large as
the caller requests.
The method involves setting up a character-type special file
corresponding to the raw device
and providing
.I read
and
.I write
routines which set up what is usually a private,
non-shared buffer header with the appropriate information
and call the device's strategy routine.
If desired, separate
.I open
and
.I close
routines may be provided, but this is usually unnecessary.
A special-function routine might come in handy, especially for
magtape.
.PP
A great deal of work has to be done to generate the
``appropriate information''
to put in the argument buffer for
the strategy module;
the worst part is to map relocated user addresses to physical addresses.
Most of this work is done by
.I "physio(strat, bp, dev, rw)"
whose arguments are the name of the
strategy routine
.I strat,
the buffer pointer
.I bp,
the device number
.I dev,
and a read-write flag
.I rw
whose value is either
.I B_READ
or
.I B_WRITE.
.I Physio
makes sure that the user's base address and count are
even (because most devices work in words)
and that the core area affected is contiguous
in physical space;
it delays until the buffer is not busy, and makes it
busy while the operation is in progress;
and it sets up user error return information.
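.PP
A minimal sketch of the checks attributed above to \fIphysio\fR follows, again as a user-space C simulation rather than the real routine. The helper \fIphysio_sketch\fR and the stand-in \fIdemo_strategy\fR are hypothetical; because there is no \fIu\fR area, the user's address, byte count, and block number are passed as explicit arguments, errors are returned as \fIerrno\fR-style codes instead of being stored for the user, and address relocation and the physical-contiguity check are not modelled at all. It does show the even-address and even-count test, the busy bit guarding the private header, the conversion of the byte count into the negative word count the strategy routine expects, and the mapping of \fIB_ERROR\fR onto an error return.
.DS
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values chosen for this sketch; B_WRITE is 0 as the text says. */
#define B_WRITE 0
#define B_READ  01
#define B_DONE  02
#define B_ERROR 04
#define B_BUSY  010

struct buf {
    int   b_flags;
    int   b_dev;
    long  b_blkno;
    char *b_addr;
    int   b_wcount;   /* negative word count */
    int   b_resid;
};

/* Hypothetical stand-in strategy routine: completes immediately and
   reports an error for negative block numbers. */
static int last_wcount;
static void demo_strategy(struct buf *bp)
{
    last_wcount = bp->b_wcount;
    if (bp->b_blkno < 0)
        bp->b_flags |= B_ERROR;
    bp->b_flags |= B_DONE;
}

/* Hypothetical physio-like helper.  The real routine takes the user's
   address and count from the u area; here they are explicit arguments. */
int physio_sketch(void (*strat)(struct buf *), struct buf *bp, int dev,
                  int rw, char *base, unsigned count, long blkno)
{
    int err;

    /* Most devices work in words: refuse odd addresses or counts. */
    if (((uintptr_t)base & 1) != 0 || (count & 1) != 0)
        return EFAULT;
    /* The real routine sleeps until the header is free; this
       single-threaded sketch reports the conflict instead. */
    if (bp->b_flags & B_BUSY)
        return EBUSY;
    bp->b_flags = B_BUSY | rw;    /* start clean: no extraneous bits */
    bp->b_dev = dev;
    bp->b_blkno = blkno;
    bp->b_addr = base;
    bp->b_wcount = -(int)(count / 2);
    bp->b_resid = 0;
    strat(bp);
    /* demo_strategy finishes at once; a real caller would sleep
       here until B_DONE appeared. */
    assert(bp->b_flags & B_DONE);
    err = (bp->b_flags & B_ERROR) ? EIO : 0;
    bp->b_flags &= ~B_BUSY;       /* let the next request through */
    return err;
}
```
.DE
.PP
Assigning the flag word outright, rather than or-ing bits into whatever was left there, is how the sketch honors the earlier warning that no extraneous bits may be turned on in a header handed to a strategy routine.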