Annotation of researchv10dc/vol2/streams/streams.ms, revision 1.1.1.1

1.1       root        1: .so ../ADM/mac
                      2: .XX streams 503 "A Stream Input-Output System"
                      3: .TL
                      4: A Stream Input-Output System\(dg
                      5: .AU
                      6: Dennis M. Ritchie
                      7: .AI
                      8: .MH
                      9: .AB
                     10: In a new version of the
                     11: .UX
                     12: operating system,
                     13: a flexible coroutine-based design replaces the traditional
                     14: rigid connection between processes and terminals or networks.
                     15: Processing modules may be inserted dynamically into the stream
                     16: that connects a user's program to a device.
                     17: Programs may also connect directly to programs, providing
                     18: inter-process communication.
                     19: .AE
                     20: .2C
                     21: .FS
                     22: \(dgThis paper is a reproduction of |reference(bltj streams).
                     23: .FE
                     24: .NH
                     25: Introduction
                     26: .PP
                     27: The part of the
                     28: .UX
                     29: operating system that deals with terminals
                     30: and other character devices
                     31: has always been complicated.
                     32: In recent versions of the system it has become even more so, for
                     33: two reasons.
                     34: .IP 1)
                     35: Network connections require protocols more ornate than are
                     36: easily accommodated in the existing structure.
                     37: A notion of ``line disciplines'' was only partially successful,
                     38: mostly because in the traditional system only one line discipline
                     39: can be active at a time.
                     40: .IP 2)
                     41: The fundamental data structure of the traditional character I/O system,
                     42: a queue of individual characters (the ``clist''),
                     43: is costly because it accepts and dispenses characters one at a time.
                     44: Attempts to avoid overhead by bypassing the mechanism entirely
                     45: or by introducing
                     46: .I "ad hoc
                     47: routines succeeded in speeding up the
                     48: code at the expense of regularity.
                     49: .LP
                     50: Patchwork solutions to specific problems were
                     51: destroying the modularity of this part of the system.
                     52: The time was ripe to redo the whole thing.
                     53: This paper describes the new organization.
                     54: .PP
                     55: The system described here runs on about 20 machines
                     56: in the Information Sciences Research Division of Bell Laboratories.
                     57: Although it is being investigated by other parts of Bell Labs,
                     58: it is not generally available.*
                     59: .FS
                     60: *AT&T's UNIX System V Release 3 introduced a reimplementation
                     61: (in a commercially available system)
                     62: of the streams mechanisms described here.
                     63: This version was incomplete; for example, it did not contain a new
                     64: terminal-handling scheme.
                     65: UNIX System V Release 4 is intended to complete the streams
                     66: implementation.
                     67: The System V version of streams is called STREAMS, and though
                     68: considerably more ornate, is based on the same ideas that are
                     69: described here.
                     70: .FE
                     71: .NH
                     72: Overview
                     73: .PP
                     74: This section summarizes the nomenclature, components, and mechanisms
                     75: of the new I/O system.
                     76: .NH 2
                     77: Streams
                     78: .PP
                     79: A
                     80: .I "stream
                     81: is a full-duplex connection between a user's process and a device
                     82: or pseudo-device.
                     83: It consists of several linearly connected processing modules,
                     84: and is analogous to a Shell pipeline, except that
                     85: data flows in both directions.
                     86: The modules in a stream communicate almost exclusively
                     87: by passing messages to their neighbors.
                     88: Except for some conventional variables used for flow control, modules do not
                     89: require access to the storage of their neighbors.
                     90: Moreover, a module provides only one entry point to each neighbor, namely
                     91: a routine that accepts messages.
                     92: .PP
                     93: At the end of the stream closest to the process
                     94: is a set of routines that provide the interface to the rest of the system.
                     95: A user's
                     96: .I "write
                     97: and
                     98: I/O control requests are turned into messages sent to the stream,
                     99: and
                    100: .I "read
                    101: requests take data from the stream and pass it to the user.
                    102: At the other end of the stream is a 
                    103: device driver module.
                    104: Here, data arriving from the stream is sent to the device;
                    105: characters and state transitions detected by the device are
                    106: composed into messages and sent into the stream towards the user program.
                    107: Intermediate modules process the messages in various ways.
                    108: .PP
                    109: The two end modules in a stream become connected automatically when
                    110: the device is opened;
                    111: intermediate modules are attached dynamically by request of the user's program.
                    112: Stream processing modules are
                    113: symmetrical; their read and write interfaces are identical.
                    114: .NH 2
                    115: Queues
                    116: .PP
                    117: Each stream processing module consists of a pair of
                    118: .I "queues,
                    119: one for each direction.
                    120: A queue comprises not only a data queue proper, but also two routines
                    121: and some status information.
                    122: One routine is the
                    123: .I "put procedure,
                    124: which is called by its neighbor
                    125: to place messages on the data queue.
                    126: The other, the
                    127: .I "service procedure,
                    128: is scheduled to execute whenever there is work for it to do.
                    129: The status information includes a pointer to the next queue downstream,
                    130: various flags, and a pointer to additional state information required
                    131: by the instantiation of the queue.
                    132: Queues are allocated in such a way that the routines associated with
                    133: one half of a stream module may find the queue associated with the other half.
                    134: (This is used, for example, in generating echos for terminal input.)
                    135: .NH 2
                    136: Message blocks
                    137: .PP
                    138: The objects passed between queues are blocks obtained from an allocator.
                    139: Each contains a
                    140: .I "read pointer,
                    141: a
                    142: .I "write pointer,
                    143: and a
                    144: .I "limit pointer,
                    145: which specify respectively the beginning of information being passed, its end,
                    146: and a bound on the extent to which the write pointer may be increased.
                    147: .PP
                    148: The header of a block specifies its type; the most common blocks contain
                    149: data.
                    150: There are also control blocks of various kinds,
                    151: all with the same form as data blocks and obtained from the same
                    152: allocator.
                    153: For example, there are control blocks to introduce delimiters
                    154: into the data stream, to pass user I/O control requests, and to announce
                    155: special conditions such as line break and carrier loss on terminal
                    156: devices.
                    157: .PP
                    158: Although data blocks arrive in discrete units
                    159: at the processing modules,
                    160: boundaries between them are semantically insignificant;
                    161: standard subroutines may try to coalesce adjacent
                    162: data blocks in the same queue.
                    163: Control blocks, however, are never coalesced.
                    164: .NH 2
                    165: Scheduling
                    166: .PP
                    167: Although each queue module behaves in some ways like a separate process,
                    168: it is not a real process; the system saves no state information
                    169: for a queue module that is not running.
                    170: In particular queue processing routines do not block when they cannot proceed,
                    171: but must explicitly return control.
                    172: A queue may be
                    173: .I "enabled
                    174: by mechanisms described below.
                    175: When a queue becomes enabled, the system will, as soon as convenient,
                    176: call its service procedure entry,
                    177: which removes successive blocks
                    178: from the associated data queue, processes them, and places them on the
                    179: next queue by calling its put procedure.
                    180: When there are no more blocks to process, or when the next queue becomes
                    181: full, the service procedure returns to the system.
                    182: Any special state information must be saved explicitly.
                    183: .PP
                    184: Standard routines make enabling of queue modules largely automatic.
                    185: For example, the routine that puts a block on a queue
                    186: enables the queue service routine if the queue was empty.
                    187: .NH 2
                    188: Flow Control
                    189: .PP
                    190: Associated with each queue is a pair of numbers used for flow control.
                    191: A high-water mark limits the amount of data that may be outstanding
                    192: in the queue; by convention, modules do not place data on a queue
                    193: above its limit.
                    194: A low-water mark is used for scheduling in this way:
                    195: when a queue has exceeded its high-water mark, a flag is set.
                    196: Then, when the routine that takes blocks from a data queue notices
                    197: that this flag is set and that the queue has dropped below the low-water
                    198: mark, the queue upstream of this one is enabled.
                    199: .NH
                    200: Simple Examples
                    201: .PP
                    202: Figure 1 depicts a stream device that has just
                    203: been opened.
                    204: The top-level routines, drawn as a pair of half-open rectangles
                    205: on the left, are invoked by users'
                    206: .I read
                    207: and
                    208: .I write
                    209: calls.
                    210: The
                    211: writer
                    212: routine sends messages to the device driver shown on the right.
                    213: Data arriving from the device is composed
                    214: into messages sent to the top-level
                    215: reader
                    216: routine, which returns the data to the user process
                    217: when it executes
                    218: .I read .
                    219: .1C
                    220: .KF
                    221: .PS
                    222: boxht = boxht/1.5
                    223: arrow "user" "write"
                    224: move up boxht/2
                    225: line right boxwid/2 then down boxht then left boxwid/2
                    226: move by (boxwid/2, boxht/2)
                    227: arrow right
                    228: line up boxht/2
                    229: line right boxwid/2 
                    230: move down boxht
                    231: line left boxwid/2 then up boxht/2
                    232: move right boxwid/2
                    233: arrow "device" "out"
                    234: move down boxht
                    235: arrow left "device" "in"
                    236: move down boxht/2
                    237: line left boxwid/2 then up boxht
                    238: move down boxht/2
                    239: arrow left
                    240: move up boxht/2
                    241: line down boxht then left boxwid/2
                    242: move up boxht/2
                    243: arrow left "user" "read"
                    244: .PE
                    245: .sp .5
                    246: .ce
                    247: Figure 1.  Configuration after device open.
                    248: .sp
                    249: .KE
                    250: .2C
                    251: .PP
                    252: Figure 2 shows an ordinary terminal connected by an RS-232 line.
                    253: Here a processing module (the pair of rectangles in the middle) is interposed;
                    254: it performs the services necessary to make terminals usable,
                    255: for example echoing, character-erase and line-kill, tab expansion
                    256: as required, and translation between carriage-return and new-line.
                    257: It is possible to use one of several terminal handling modules.
                    258: The standard one provides services like those of the Seventh
                    259: Edition system;
                    260: |reference(v7manual)
                    261: another resembles the Berkeley ``new tty'' driver.
                    262: |reference(bsdmanual)
                    263: .1C
                    264: .KF
                    265: .PS
                    266: arrow "user" "write"
                    267: move up boxht/2
                    268: line right boxwid/2 then down boxht then left boxwid/2
                    269: move by (boxwid/2, boxht/2)
                    270: arrow right
                    271: box "tty out"; arrow
                    272: line up boxht/2
                    273: line right boxwid/2 
                    274: move down boxht
                    275: line left boxwid/2 then up boxht/2
                    276: move right boxwid/2
                    277: arrow "device" "out"
                    278: move down boxht
                    279: arrow left "device" "in"
                    280: move down boxht/2
                    281: line left boxwid/2 then up boxht
                    282: move down boxht/2
                    283: arrow left
                    284: box "tty in"; arrow
                    285: move up boxht/2
                    286: line down boxht then left boxwid/2
                    287: move up boxht/2
                    288: arrow left "user" "read"
                    289: .PE
                    290: .sp .5
                    291: .ce
                    292: Figure 2.  Configuration for normal terminal attachment.
                    293: .sp
                    294: .KE
                    295: .2C
                    296: .PP
                    297: The processing modules in a stream are thought of as a stack whose
                    298: top (shown here on the left) is next to the user program.
                    299: Thus, to install the terminal processing module after opening
                    300: a terminal device, the program that makes
                    301: such connections executes a ``push'' I/O control call
                    302: naming the relevant stream and the desired processing module.
                    303: Other primitives pop a module from the stack
                    304: and determine the name of the topmost module.
                    305: .PP
                    306: Most of the machines using the version of the operating system 
                    307: described here are connected to a network based on the
                    308: Datakit\(tm packet switch.
                    309: |reference(datakit asynchronous)
                    310: Although there is a variety of host interfaces to the network,
                    311: most of ours are primitive, and require network protocols
                    312: to be conducted by the host machine, rather than by a front-end
                    313: processor.
                    314: Therefore, when terminals are connected to a host through the network,
                    315: a setup like that shown in Fig. 3 is used; the terminal processing
                    316: module is stacked on the network protocol module.
                    317: Again, there is a choice of protocol modules,
                    318: both a current standard and an older protocol that is being phased
                    319: out.
                    320: .1C
                    321: .KF
                    322: .PS
                    323: arrow "user" "write"
                    324: move up boxht/2
                    325: line right boxwid/2 then down boxht then left boxwid/2
                    326: move by (boxwid/2, boxht/2)
                    327: arrow right
                    328: box "tty out"; arrow
                    329: box "proto out"; arrow
                    330: line up boxht/2
                    331: line right boxwid/2 
                    332: move down boxht
                    333: line left boxwid/2 then up boxht/2
                    334: move right boxwid/2
                    335: arrow "device" "out"
                    336: move down boxht
                    337: arrow left "device" "in"
                    338: move down boxht/2
                    339: line left boxwid/2 then up boxht
                    340: move down boxht/2
                    341: arrow left
                    342: box "proto in"; arrow
                    343: box "tty in"; arrow
                    344: move up boxht/2
                    345: line down boxht then left boxwid/2
                    346: move up boxht/2
                    347: arrow left "user" "read"
                    348: .PE
                    349: .sp .5
                    350: .ce
                    351: Figure 3.  Configuration for network terminals.
                    352: .sp
                    353: .KE
                    354: .2C
                    355: .PP
                    356: A common fourth configuration (not illustrated) is used when
                    357: the network is used for file transfers or other purposes
                    358: when terminal processing is not needed.
                    359: It simply omits the ``tty'' module and uses only the protocol module.
                    360: Some of our machines, on the other hand, have front-end processors programmed
                    361: to conduct standard network protocol.
                    362: Here a connection for remote file transfer will resemble
                    363: that of Fig. 1, because the protocol is handled outside the operating
                    364: system; likewise network terminal connections via the front end will be
                    365: handled as shown in Fig. 2.
                    366: .NH
                    367: Messages
                    368: .PP
                    369: Most of the messages between modules contain data.
                    370: The allocator that dispenses message blocks
                    371: takes an argument specifying the smallest block its caller
                    372: is willing to accept.
                    373: The current allocator maintains an inventory of blocks
                    374: 4, 16, 64, and 1024 characters long.
                    375: Modules that allocate blocks choose a size by balancing
                    376: space loss in block linkage overhead against unused space in the block.
                    377: For example, the top-level
                    378: .I write
                    379: routine requests either 64- or 1024-character blocks,
                    380: because such calls
                    381: usually transmit many characters;
                    382: the network input routine allocates 16-byte blocks because
                    383: data arrives in packets of that size.
                    384: The smallest blocks are used only to carry arguments to the
                    385: control messages discussed below.
                    386: .PP
                    387: Besides data blocks, there are also several kinds of control messages.
                    388: The following messages are queued along with data messages,
                    389: in order to ensure that their effect occurs at the appropriate time.
                    390: .IP BREAK 10
                    391: is generated by a terminal device on detection of a line
                    392: break signal.
                    393: The standard terminal input processor turns this message into an interrupt
                    394: request.
                    395: It may also be sent to 
                    396: a terminal device driver to cause it to generate a break on the output line.
                    397: .IP HANGUP
                    398: is generated by a device when its remote connection drops.
                    399: When the message arrives at the top level it is turned into an interrupt
                    400: to the process, and it also marks the stream so that
                    401: further attempts to use it return errors.
                    402: .IP DELIM
                    403: is a delimiter in the data.
                    404: Most of the stream I/O system is prepared to provide true streams,
                    405: in which record boundaries are insignificant, but there are
                    406: various situations in which it is desirable to delimit the data.
                    407: For example, terminal input is read a line at a time;
                    408: .SM DELIM
                    409: is generated by the terminal input processor to demarcate lines.
                    410: .IP DELAY
                    411: tells terminal drivers to generate a real-time delay
                    412: on output; it allows time for slow terminals react to characters previously
                    413: sent.
                    414: .IP IOCTL
                    415: messages are generated by users'
                    416: .I ioctl
                    417: system calls.
                    418: The relevant parameters are gathered at the top level,
                    419: and if the request is not understood there, it and its parameters
                    420: are composed into a message and sent down the stream.
                    421: The first module that understands the particular
                    422: request acts on it and returns a positive acknowledgement.
                    423: Intermediate modules that do not recognize a particular
                    424: .SM IOCTL
                    425: request pass it on;
                    426: stream-end modules return a negative acknowledgement.
                    427: The top-level routine waits for the acknowledgement, and returns
                    428: any information it carries to the user.
                    429: .LP
                    430: Other control messages are asynchronous and jump over
                    431: queued data and non-priority control messages.
                    432: .IP IOCACK 10
                    433: .IP IOCNAK
                    434: acknowledge
                    435: .SM IOCTL
                    436: messages.
                    437: The device end of a stream must respond with one of these
                    438: messages;
                    439: the top level will eventually time out if no response is received.
                    440: .IP SIGNAL
                    441: messages are generated by the terminal processing module
                    442: and cause the top level to generate process signals such
                    443: as
                    444: .I quit
                    445: and
                    446: .I interrupt.
                    447: .IP FLUSH
                    448: messages are used to throw away data from input and output queues
                    449: after a signal or on request of the user.
                    450: .IP STOP
                    451: .IP START
                    452: messages are used by the terminal processor to halt and restart
                    453: output by a device, for example to implement the traditional
                    454: control-S/control-Q (X-on/X-off) flow control mechanism.
                    455: .1C
                    456: .KF top
                    457: .P1
                    458: struct queue {
                    459:        int     flag;           /* flag bits */
                    460:        void    (*putp)();      /* put procedure */
                    461:        void    (*servp)();     /* service procedure */
                    462:        struct  queue *next;    /* next queue downstream */
                    463:        struct  block *first;   /* first data block on queue */
                    464:        struct  block *last;    /* last data block on queue */
                    465:        int     hiwater;        /* max characters on queue */
                    466:        int     lowater;        /* wakeup point as queue drains */
                    467:        int     count;          /* characters now on queue */
                    468:        void    *ptr;           /* pointer to private storage */
                    469: };
                    470: .P2
                    471: .ce
                    472: Figure 4. The queue data structure
                    473: .KE
                    474: .2C
                    475: .NH
                    476: Queue Mechanisms and Interfaces
                    477: .PP
                    478: Associated with each direction of a full-duplex stream module
                    479: is a queue data structure (somewhat simplified
                    480: for exposition) shown in Figure 4.
                    481: The
                    482: .CW flag
                    483: word contains several bits used by low-level routines to control
                    484: scheduling: they show whether the downstream module wishes to read data,
                    485: or the upstream module wishes to write, or the queue is already
                    486: enabled.
                    487: One bit is examined by the upstream module; it tells whether this
                    488: queue is full.
                    489: .PP
                    490: The
                    491: .CW first
                    492: and
                    493: .CW last
                    494: members point to the head and tail of a singly-linked list of
                    495: data and control blocks that form the queue proper;
                    496: .CW hiwater
                    497: and
                    498: .CW lowater
                    499: are initialized when the queue is created, and when compared against
                    500: .CW count,
                    501: the current size of the queue, determine whether the queue is full
                    502: and whether it has emptied sufficiently to enable a blocked writer.
                    503: .PP
                    504: The
                    505: .CW ptr
                    506: member stores an untyped pointer that may be used by the queue
                    507: module to keep track of the location of storage private
                    508: to itself.
                    509: For example,
                    510: each instantiation of the terminal processing module
                    511: maintains
                    512: a structure containing various mode bits and special characters;
                    513: it stores a pointer to this structure here.
                    514: The type of
                    515: .CW ptr
                    516: is artificial.
                    517: It should be a union of pointers to each possible module
                    518: state structure.
                    519: .1C
                    520: .KF bottom
                    521: .P1
                    522: service(q)
                    523: struct queue *q
                    524: .sp .5
                    525:        while (q is not empty and q->next is not full) {
                    526:                get a block from q
                    527:                process message block
                    528:                call q->next->putp to dispose of
                    529:                     new or transformed block
                    530:        }
                    531: .P2
                    532: .ce
                    533: Figure 5. Typical service routine
                    534: .KE
                    535: .2C
                    536: .PP
                    537: Stream processing modules are written in one of two general
                    538: styles.
                    539: In the simpler kind, the queue module acts nearly as a classical
                    540: coroutine.
                    541: When it is instantiated, it sets its put procedure
                    542: .CW putp
                    543: to a system-supplied default routine, and supplies a service
                    544: procedure
                    545: .CW servp .
                    546: Its upstream module disposes of blocks by calling this
                    547: module's
                    548: .CW putp
                    549: routine, which places the block on this module's queue (by
                    550: manipulating the
                    551: .CW first
                    552: and
                    553: .CW last
                    554: pointers.)
                    555: The standard put procedure also
                    556: enables the current module;
                    557: a short time later the current module's
                    558: service procedure
                    559: .CW servp
                    560: is called by the scheduler.
                    561: Figure 5 shows the pseudo-code,
                    562: for a typical service routine.
                    563: This mechanism is appropriate in cases in which messages can be
                    564: processed independently of each other.
                    565: For example, it is used by the terminal output module.
                    566: All the scheduling details are taken care of by standard routines.
                    567: .PP
                    568: More complicated modules need finer control over scheduling.
                    569: A good example is terminal input.
                    570: Here the device module upstream produces characters, usually one at a time,
                    571: that must be gathered into a line to allow for character erase and kill
                    572: processing.
                    573: Therefore the stream input module provides a put procedure to be called
                    574: by the device driver or other module downstream from it;
                    575: Figure 6 shows an outline
                    576: of this routine and its accompanying service procedure.
                    577: The put procedure generates the echo characters
                    578: as promptly as possible;
                    579: when the terminal module is attached to a
                    580: device handler, they are created during the input interrupt
                    581: from the device, because the put procedure is called as a subroutine
                    582: of the handler.
                    583: On the other hand, line-gathering and erase and kill processing,
                    584: which can be lengthy, are done during the service procedure
                    585: at lower priority.
                    586: .1C
                    587: .KF top
                    588: .P1
                    589: putproc(q, bp)
                    590: struct queue *q; struct block *bp
                    591: .sp .5
                    592:        put bp on q
                    593:        echo characters in bp's data
                    594:        if (bp's data contains new-line or carriage return)
                    595:                enable q
                    596: .sp
                    597: service(q)
                    598: struct queue *q
                    599: .sp .5
                    600:        take data from q until new-line or carriage return,
                    601:           processing erase and kill characters
                    602:        call q->next->putp to hand line to upstream queue
                    603:        call q->next->putp with DELIM message
                    604: .P2
                    605: .ce
                    606: Figure 6. A typical put routine
                    607: .SP
                    608: .KE
                    609: .2C
                    610: .NH
                    611: Connection with the Rest of the System
                    612: .PP
                    613: Although all the drivers for terminal and network devices, and all
                    614: protocol handlers, were rewritten,
                    615: only minor changes were required elsewhere in the system.
                    616: Character devices and a character device switch,
                    617: as described by Thompson|reference(thompson implementation bstj),
                    618: are still present.
                    619: A pointer in the character device switch structure, if null,
                    620: causes the system to treat the device as always;
                    621: this is used for raw disk and tape, for example.
                    622: If not null, it
                    623: points to initialization information for the stream device;
                    624: when a stream device is opened, the queue structure shown in Fig. 1
                    625: is created, using this information,
                    626: and a pointer to the structure naming the stream is saved (in
                    627: the ``inode table;''|reference(thompson implementation  bstj).
                    628: .PP
                    629: Subsequently, when the user process makes
                    630: .I read ,
                    631: .I write ,
                    632: .I ioctl ,
                    633: or
                    634: .I close
                    635: calls, presence of a non-null stream pointer directs the
                    636: system to use a set of stream routines to generate and receive
                    637: queue messages; these are the ``top-level routines'' referred
                    638: to previously.
                    639: .PP
                    640: Only a few changes in user-level code are necessary,
                    641: most because opening a terminal puts it in the
                    642: ``very raw'' mode shown in Fig. 1.
                    643: In order to install the terminal-processing handler,
                    644: it is necessary for programs such as
                    645: .I init
                    646: to execute the appropriate
                    647: .I ioctl
                    648: call.
                    649: .NH
                    650: Interprocess Communication
                    651: .PP
                    652: As previously described, the stream I/O system constitutes a flexible
                    653: communication path between user processes and devices.
                    654: With a small addition, it also provides a mechanism for interprocess
                    655: communication.
                    656: A special device, the ``pseudo-terminal'' or
                    657: .SM PT ,
                    658: connects processes.
                    659: .SM PT
                    660: files come in even-odd pairs;
                    661: data written on the odd member of the pair
                    662: appears as input for the even member, and vice versa.
                    663: The idea is not new;
                    664: it appears in Tenex|reference(bobrow burchfiel tenex)
                    665: and its successors, for example.
                    666: It is analogous to pipes,
                    667: and especially to named pipes.|reference(system iii manual)
                    668: .SM PT
                    669: files differ from traditional
                    670: pipes in two ways:
                    671: they are full-duplex,
                    672: and control information passes through them as well as data.
                    673: They differ from the usual pseudo-terminal files
                    674: |reference(bsdmanual)
                    675: by not having any of the usual
                    676: terminal processing mechanisms inherently attached to them;
                    677: they are pure transmitters of control and data messages.
                    678: .SM PT
                    679: files are adequate for setting up
                    680: a reasonably general mechanism for explicit process communication,
                    681: but by themselves are not especially interesting.*
                    682: .FS
                    683: *More recent research systems (Ninth and Tenth editions) drop
                    684: .SM PT
                    685: files altogether.
                    686: They use pipes where the name of the connection
                    687: is not needed, and `mounted streams' where a name is needed.
                    688: See |reference(latest presotto ipc).
                    689: .FE
                    690: .1C
                    691: .KF bottom
                    692: .PS
                    693: box invis height 2*boxht "device" "process"
                    694: move up boxht/2
                    695: arrow right linewid/2
                    696: move up boxht/2
                    697: line right boxwid/2 then down boxht then left boxwid/2
                    698: move by (boxwid/2, boxht/2)
                    699: arrow right
                    700: box "mesg"; arrow
                    701: line up boxht/2
                    702: line right boxwid/2 
                    703: move down boxht
                    704: line invis left boxwid/4 "pt"
                    705: line left boxwid/4 then up boxht/2
                    706: move right boxwid/2
                    707: spline -> right boxwid then down 4*boxht then left boxwid
                    708: move down boxht/2
                    709: line left boxwid/2 then up boxht
                    710: line right boxwid/4
                    711: line invis right boxwid/4 "pt"
                    712: move by (-boxwid/2, -boxht/2)
                    713: arrow left
                    714: box "tty in"; arrow
                    715: move up boxht/2
                    716: line down boxht then left boxwid/2
                    717: move up boxht/2
                    718: arrow left linewid/2
                    719: move up boxht/2
                    720: left
                    721: box invis height boxht*2 "user" "process"
                    722: move up boxht/2
                    723: move right boxwid
                    724: arrow right linewid/2
                    725: move up boxht/2
                    726: line right boxwid/2 then down boxht then left boxwid/2
                    727: move by (boxwid/2, boxht/2)
                    728: arrow right
                    729: box "tty out"
                    730: arrow right
                    731: move down boxht/2
                    732: line up boxht then right boxwid/2
                    733: move down boxht/2
                    734: spline ->  right boxwid/2 then up 2*boxht then left boxwid/2
                    735: move down boxht/2
                    736: line left boxwid/2 then up boxht
                    737: move down boxht/2
                    738: arrow left
                    739: box "mesg"
                    740: arrow left
                    741: move up boxht/2
                    742: line down boxht then left boxwid/2
                    743: move up boxht/2
                    744: arrow left linewid/2
                    745: .PE
                    746: .sp .5
                    747: .ce
                    748: Figure 7.  Configuration for device simulator.
                    749: .sp
                    750: .KE
                    751: .2C
                    752: .PP
                    753: A special
                    754: .I message
                    755: module provides more intriguing possibilities.
                    756: In one direction,
                    757: the message processor takes control and data messages, such as those discussed
                    758: above, and transforms them into data blocks starting with a header giving
                    759: the message type, and followed by the message content.
                    760: In the other direction, it parses similarly-structured data messages
                    761: and creates the corresponding control blocks.
                    762: Figure 7 shows a configuration in which
                    763: a user process communicates through the terminal module, a
                    764: .SM PT
                    765: file pair, and the message module with another user-level process
                    766: that simulates a device driver.
                    767: Because
                    768: .SM PT
                    769: files are transparent, and the message module
                    770: maps bijectively between device-process data and stream control messages,
                    771: the device simulator may be completely faithful up to details
                    772: of timing.
                    773: In particular, user's
                    774: .I ioctl
                    775: requests are sent to the device process and are handled by it,
                    776: even if they are not understood by the operating system.
                    777: .PP
                    778: The usefulness of this setup is not so much to simulate new
                    779: devices, but to provide
                    780: ways for one program to control the environment
                    781: of another.
                    782: Pike|reference(blit bstj)
                    783: shows how these mechanisms are used to create
                    784: multiple virtual terminals on one physical terminal.
                    785: In another application, inter-machine connections in which
                    786: a user on one computer logs into another make use of the message module.
                    787: Here the
                    788: .I ioctl
                    789: requests generated by programs on the remote machine
                    790: are translated by this module into data messages that can be sent over
                    791: the network.
                    792: The local callout program translates them back into terminal control commands.
                    793: ......
                    794: .NH
                    795: Evaluation
                    796: .PP
                    797: My intent in rewriting the character I/O system
                    798: was to improve its structure by separating functions
                    799: that had been intertwined, and by allowing independent modules
                    800: to be connected dynamically across well-defined interfaces.
                    801: I also wanted to make the system faster and smaller.
                    802: The most difficult part of the project was the design of the interface.
                    803: It was guided by these decisions:
                    804: .IP 1)
                    805: It seemed to be necessary for efficiency
                    806: that the objects passed between modules be references to blocks of data.
                    807: The most important consequences of this principle, and those
                    808: that proved deciding, are that data need not be copied as it passes
                    809: across a module interface, and that many characters can be handled
                    810: during a single intermodule transmission.
                    811: Another effect, undesirable but accepted, is that each module
                    812: must be prepared to handle discrete chunks of data of unpredictable size.
                    813: For example, a protocol that expects records containing (say)
                    814: an 8-byte header must be prepared to paste together smaller data blocks
                    815: and split a block containing both a header and following data.
                    816: A related, although not necessarily consequent, decision was to make
                    817: the code assume that the data is addressable. 
                    818: .IP 2)
                    819: I decided, with regret, that each processing module
                    820: could not act as an independent process with its own call record.
                    821: The numbers seemed against it: on large systems it is necessary
                    822: to allow for as many as 1000 queues, and I saw no good way
                    823: to run this many processes without consuming inordinate amounts
                    824: of storage.
                    825: As a result,
                    826: stream server procedures are not allowed to block awaiting data,
                    827: but instead must return after saving necessary status information
                    828: explicitly.
                    829: The contortions required in the code are seldom serious in practice,
                    830: but the beauty of the scheme would increase if servers
                    831: could be written as a simple read-write loop in the true coroutine style.
                    832: .IP 3)
                    833: The characteristic feature of the design\-the server and put
                    834: procedures\-was the most difficult to work out.
                    835: I began with a belief that the intermodule interface
                    836: should be identical in the read and write directions.
                    837: Next, I observed that a pure call model (put procedure only)
                    838: would not work;
                    839: queueing would be necessary at some point.
                    840: For example, if the
                    841: .I write
                    842: system entry called through the terminal processing module to the device
                    843: driver, the driver would need to queue characters internally lest output
                    844: be completely synchronous.
                    845: On the other hand, a pure queueing model (service procedure only;
                    846: upstream modules always place their data in an input queue)
                    847: also appeared impractical.
                    848: As discussed above, a module (for example terminal input)
                    849: must often be activated at times that depend on its input data.
                    850: .PP
                    851: After considerable churning of details, the model presented here emerged.
                    852: In general its performance by various measures lives up to hopes.
                    853: .PP
                    854: The improvement in modularity is hard to measure,
                    855: but seems real;
                    856: for example, the number of
                    857: included header
                    858: files in stream modules drops to about one half of those required
                    859: by similar routines in the base system
                    860: (4.1 BSD).
                    861: Certainly stream modules may be composed more freely
                    862: than were the ``line disciplines'' of older systems.
                    863: .PP
                    864: The program text size of the version of the operating 
                    865: system described here is about 106 kilobytes on the VAX;
                    866: the base system was about 130KB.
                    867: The reduction was achieved by rewriting the various device drivers
                    868: and protocols and eliminating the Seventh Edition
                    869: multiplexed files
                    870: |reference(v7manual)
                    871: , most (though not all)
                    872: of whose functions are subsumed by other mechanisms.
                    873: On the other hand, the data space has increased.
                    874: On a VAX 11/750 configured for 32 users
                    875: about 32KB are used for storage of the structures for streams,
                    876: queues, and blocks.
                    877: The traditional character lists seem to require less;
                    878: similar systems from Berkeley and AT&T use between 14 and 19KB.
                    879: The tradeoff of program for data seems desirable.
                    880: .PP
                    881: Proper time comparisons
                    882: have not been made,
                    883: because of the difficulty of finding a comparable configuration.
                    884: On a VAX 11/750,
                    885: printing a large file on a directly-connected terminal
                    886: consumes 346 microseconds per character using the system described here;
                    887: this is about 10 per cent slower than the base system.
                    888: On the other hand, that system's per-character interrupt routine is
                    889: coded in assembly language, and the rest of its terminal handler
                    890: is replete with nonportable
                    891: interpolated assembly code;
                    892: the current system is written completely in C.
                    893: Printing the same file on a terminal connected through a primitive
                    894: network interface requires 136 microseconds per character,
                    895: half as much as the older network routines.
                    896: Pike
                    897: |reference(blit bstj)
                    898: observes that among the three implementations of Blit connection software,
                    899: the one based on the stream system is the only one that can download programs
                    900: at anything approaching line speed through a 19.2 Kbps connection.
                    901: In general I conclude that the new organization
                    902: never slows comparable tasks much, and that considerable speed improvements
                    903: are sometimes possible.
                    904: .PP
                    905: Although the new organization performs well, it has several peculiarities
                    906: and limitations.
                    907: Some of them seem inherent, some are fixable,
                    908: and some are the subject of current work.
                    909: .PP
                    910: I/O control calls turn into messages that require answers
                    911: before a result can be returned to the user.
                    912: Sometimes the message ultimately goes to another user-level process
                    913: that may reply tardily or never.
                    914: The stream is write-locked until the reply returns, in order
                    915: to eliminate the need to determine which process gets which reply.
                    916: A timeout breaks the lock,
                    917: so there is an unjustified error return if a reply
                    918: is late, and a long lockup period if one is lost.
                    919: The problem can be ameliorated by working harder on it, but
                    920: it typifies the difficulties that turn up when direct calls are replaced
                    921: by message-passing schemes.
                    922: .PP
                    923: Several oddities appear because time spent in server routines
                    924: cannot be assigned to any particular user or process.
                    925: It is impossible, for example, for devices to support privileged
                    926: .I ioctl
                    927: calls, because the device has no idea who generated the message.
                    928: Accounting and scheduling become less accurate;
                    929: a short census of several systems showed that between 4 and 8 per cent
                    930: of non-idle CPU time was being spent in server routines.
                    931: Finally, the anonymity of server processing most certainly makes
                    932: it more difficult to measure the performance of the new I/O system.
                    933: .PP
                    934: In its current form the stream I/O system is purely data-driven.
                    935: That is, data is presented by a user's
                    936: .I write
                    937: call, and passes through to the device; conversely, data appears unbidden
                    938: from a device and passes to the top level, where it is picked up by
                    939: .I read
                    940: calls.
                    941: Wherever possible flow control throttles down fast
                    942: generators of data, but nowhere except at the consumer end of a stream
                    943: is there knowledge of precisely how much data is desired.
                    944: Consider a command to execute possibly interactive program on another
                    945: machine connected by a stream.
                    946: The simplest such command sets up the connection
                    947: and invokes the remote program, and then copies characters from its
                    948: own standard input to the stream, and from the stream to its standard
                    949: output.
                    950: The scheme is adequate in practice,
                    951: but breaks when the user types more than the remote program expects.
                    952: For example, if the remote program reads no input at all,
                    953: any typed-ahead characters are sent to the remote system and lost.
                    954: This demonstrates a problem,
                    955: but I know of no solution inside the stream I/O mechanism itself;
                    956: other ideas will have to be applied.
                    957: .PP
                    958: Streams are linear connections; by themselves, they support no notion
                    959: of multiplexing, fan-in or fan-out.
                    960: Except at the ends of a stream, each invocation of a module has
                    961: a unique ``next'' and ``previous'' module.
                    962: Two locally-important applications
                    963: of streams testify to the importance of multiplexing:
                    964: Blit terminal connections
                    965: |reference(blit bstj)
                    966: , where the multiplexing is done well, though at some performance cost,
                    967: by a user program,
                    968: and remote execution of commands over a network,
                    969: where it is desired, but not now easy,
                    970: to separate the standard output from error output.
                    971: It seems likely that a general multiplexing mechanism
                    972: could help in both cases, but again, I do not yet know how to
                    973: design it.*
                    974: .FS
                    975: *Subsequent Research versions of the system still do not
                    976: have a general stream-multiplexing scheme built into the
                    977: kernel, while the System V streams implementations do.
                    978: We have not found any applications that suffer from the
                    979: lack of stream-multiplexing.
                    980: .FE
                    981: .PP
                    982: Although the current design provides elegant means for controlling
                    983: the semantics of communication channels already opened, it lacks
                    984: general ways of establishing channels between processes.
                    985: The
                    986: .SM PT
                    987: files described above are just fine for
                    988: Blit layers,
                    989: and work adequately for handling a few administrator-controlled
                    990: client-server relationships.
                    991: (Yes, we have multi-machine mazewar.)
                    992: Nevertheless, better naming mechanisms are called for.\(dg
                    993: .FS
                    994: \(dgSee |reference(latest presotto ipc)
                    995: for our current ideas.
                    996: .FE
                    997: .PP
                    998: In spite of these limitations, the stream I/O system works well.
                    999: Its aim was to improve design rather than to add features,
                   1000: in the belief that with proper design, the features come cheaply.
                   1001: This approach is arduous, but continues to succeed.
                   1002: .NH
                   1003: References
                   1004: .LP
                   1005: |reference_placement

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.