|
|
1.1 root 1: .so ../ADM/mac
2: .XX streams 503 "A Stream Input-Output System"
3: .TL
4: A Stream Input-Output System\(dg
5: .AU
6: Dennis M. Ritchie
7: .AI
8: .MH
9: .AB
10: In a new version of the
11: .UX
12: operating system,
13: a flexible coroutine-based design replaces the traditional
14: rigid connection between processes and terminals or networks.
15: Processing modules may be inserted dynamically into the stream
16: that connects a user's program to a device.
17: Programs may also connect directly to programs, providing
18: inter-process communication.
19: .AE
20: .2C
21: .FS
22: \(dgThis paper is a reproduction of |reference(bltj streams).
23: .FE
24: .NH
25: Introduction
26: .PP
27: The part of the
28: .UX
29: operating system that deals with terminals
30: and other character devices
31: has always been complicated.
32: In recent versions of the system it has become even more so, for
33: two reasons.
34: .IP 1)
35: Network connections require protocols more ornate than are
36: easily accommodated in the existing structure.
37: A notion of ``line disciplines'' was only partially successful,
38: mostly because in the traditional system only one line discipline
39: can be active at a time.
40: .IP 2)
41: The fundamental data structure of the traditional character I/O system,
42: a queue of individual characters (the ``clist''),
43: is costly because it accepts and dispenses characters one at a time.
44: Attempts to avoid overhead by bypassing the mechanism entirely
45: or by introducing
46: .I "ad hoc
47: routines succeeded in speeding up the
48: code at the expense of regularity.
49: .LP
50: Patchwork solutions to specific problems were
51: destroying the modularity of this part of the system.
52: The time was ripe to redo the whole thing.
53: This paper describes the new organization.
54: .PP
55: The system described here runs on about 20 machines
56: in the Information Sciences Research Division of Bell Laboratories.
57: Although it is being investigated by other parts of Bell Labs,
58: it is not generally available.*
59: .FS
60: *AT&T's UNIX System V Release 3 introduced a reimplementation
61: (in a commercially available system)
62: of the streams mechanisms described here.
63: This version was incomplete; for example, it did not contain a new
64: terminal-handling scheme.
65: UNIX System V Release 4 is intended to complete the streams
66: implementation.
67: The System V version of streams is called STREAMS, and though
68: considerably more ornate, is based on the same ideas that are
69: described here.
70: .FE
71: .NH
72: Overview
73: .PP
74: This section summarizes the nomenclature, components, and mechanisms
75: of the new I/O system.
76: .NH 2
77: Streams
78: .PP
79: A
80: .I "stream
81: is a full-duplex connection between a user's process and a device
82: or pseudo-device.
83: It consists of several linearly connected processing modules,
84: and is analogous to a Shell pipeline, except that
85: data flows in both directions.
86: The modules in a stream communicate almost exclusively
87: by passing messages to their neighbors.
88: Except for some conventional variables used for flow control, modules do not
89: require access to the storage of their neighbors.
90: Moreover, a module provides only one entry point to each neighbor, namely
91: a routine that accepts messages.
92: .PP
93: At the end of the stream closest to the process
94: is a set of routines that provide the interface to the rest of the system.
95: A user's
96: .I "write
97: and
98: I/O control requests are turned into messages sent to the stream,
99: and
100: .I "read
101: requests take data from the stream and pass it to the user.
102: At the other end of the stream is a
103: device driver module.
104: Here, data arriving from the stream is sent to the device;
105: characters and state transitions detected by the device are
106: composed into messages and sent into the stream towards the user program.
107: Intermediate modules process the messages in various ways.
108: .PP
109: The two end modules in a stream become connected automatically when
110: the device is opened;
111: intermediate modules are attached dynamically by request of the user's program.
112: Stream processing modules are
113: symmetrical; their read and write interfaces are identical.
114: .NH 2
115: Queues
116: .PP
117: Each stream processing module consists of a pair of
118: .I "queues,
119: one for each direction.
120: A queue comprises not only a data queue proper, but also two routines
121: and some status information.
122: One routine is the
123: .I "put procedure,
124: which is called by its neighbor
125: to place messages on the data queue.
126: The other, the
127: .I "service procedure,
128: is scheduled to execute whenever there is work for it to do.
129: The status information includes a pointer to the next queue downstream,
130: various flags, and a pointer to additional state information required
131: by the instantiation of the queue.
132: Queues are allocated in such a way that the routines associated with
133: one half of a stream module may find the queue associated with the other half.
134: (This is used, for example, in generating echos for terminal input.)
135: .NH 2
136: Message blocks
137: .PP
138: The objects passed between queues are blocks obtained from an allocator.
139: Each contains a
140: .I "read pointer,
141: a
142: .I "write pointer,
143: and a
144: .I "limit pointer,
145: which specify respectively the beginning of information being passed, its end,
146: and a bound on the extent to which the write pointer may be increased.
147: .PP
148: The header of a block specifies its type; the most common blocks contain
149: data.
150: There are also control blocks of various kinds,
151: all with the same form as data blocks and obtained from the same
152: allocator.
153: For example, there are control blocks to introduce delimiters
154: into the data stream, to pass user I/O control requests, and to announce
155: special conditions such as line break and carrier loss on terminal
156: devices.
157: .PP
158: Although data blocks arrive in discrete units
159: at the processing modules,
160: boundaries between them are semantically insignificant;
161: standard subroutines may try to coalesce adjacent
162: data blocks in the same queue.
163: Control blocks, however, are never coalesced.
164: .NH 2
165: Scheduling
166: .PP
167: Although each queue module behaves in some ways like a separate process,
168: it is not a real process; the system saves no state information
169: for a queue module that is not running.
170: In particular queue processing routines do not block when they cannot proceed,
171: but must explicitly return control.
172: A queue may be
173: .I "enabled
174: by mechanisms described below.
175: When a queue becomes enabled, the system will, as soon as convenient,
176: call its service procedure entry,
177: which removes successive blocks
178: from the associated data queue, processes them, and places them on the
179: next queue by calling its put procedure.
180: When there are no more blocks to process, or when the next queue becomes
181: full, the service procedure returns to the system.
182: Any special state information must be saved explicitly.
183: .PP
184: Standard routines make enabling of queue modules largely automatic.
185: For example, the routine that puts a block on a queue
186: enables the queue service routine if the queue was empty.
187: .NH 2
188: Flow Control
189: .PP
190: Associated with each queue is a pair of numbers used for flow control.
191: A high-water mark limits the amount of data that may be outstanding
192: in the queue; by convention, modules do not place data on a queue
193: above its limit.
194: A low-water mark is used for scheduling in this way:
195: when a queue has exceeded its high-water mark, a flag is set.
196: Then, when the routine that takes blocks from a data queue notices
197: that this flag is set and that the queue has dropped below the low-water
198: mark, the queue upstream of this one is enabled.
199: .NH
200: Simple Examples
201: .PP
202: Figure 1 depicts a stream device that has just
203: been opened.
204: The top-level routines, drawn as a pair of half-open rectangles
205: on the left, are invoked by users'
206: .I read
207: and
208: .I write
209: calls.
210: The
211: writer
212: routine sends messages to the device driver shown on the right.
213: Data arriving from the device is composed
214: into messages sent to the top-level
215: reader
216: routine, which returns the data to the user process
217: when it executes
218: .I read .
219: .1C
220: .KF
221: .PS
222: boxht = boxht/1.5
223: arrow "user" "write"
224: move up boxht/2
225: line right boxwid/2 then down boxht then left boxwid/2
226: move by (boxwid/2, boxht/2)
227: arrow right
228: line up boxht/2
229: line right boxwid/2
230: move down boxht
231: line left boxwid/2 then up boxht/2
232: move right boxwid/2
233: arrow "device" "out"
234: move down boxht
235: arrow left "device" "in"
236: move down boxht/2
237: line left boxwid/2 then up boxht
238: move down boxht/2
239: arrow left
240: move up boxht/2
241: line down boxht then left boxwid/2
242: move up boxht/2
243: arrow left "user" "read"
244: .PE
245: .sp .5
246: .ce
247: Figure 1. Configuration after device open.
248: .sp
249: .KE
250: .2C
251: .PP
252: Figure 2 shows an ordinary terminal connected by an RS-232 line.
253: Here a processing module (the pair of rectangles in the middle) is interposed;
254: it performs the services necessary to make terminals usable,
255: for example echoing, character-erase and line-kill, tab expansion
256: as required, and translation between carriage-return and new-line.
257: It is possible to use one of several terminal handling modules.
258: The standard one provides services like those of the Seventh
259: Edition system;
260: |reference(v7manual)
261: another resembles the Berkeley ``new tty'' driver.
262: |reference(bsdmanual)
263: .1C
264: .KF
265: .PS
266: arrow "user" "write"
267: move up boxht/2
268: line right boxwid/2 then down boxht then left boxwid/2
269: move by (boxwid/2, boxht/2)
270: arrow right
271: box "tty out"; arrow
272: line up boxht/2
273: line right boxwid/2
274: move down boxht
275: line left boxwid/2 then up boxht/2
276: move right boxwid/2
277: arrow "device" "out"
278: move down boxht
279: arrow left "device" "in"
280: move down boxht/2
281: line left boxwid/2 then up boxht
282: move down boxht/2
283: arrow left
284: box "tty in"; arrow
285: move up boxht/2
286: line down boxht then left boxwid/2
287: move up boxht/2
288: arrow left "user" "read"
289: .PE
290: .sp .5
291: .ce
292: Figure 2. Configuration for normal terminal attachment.
293: .sp
294: .KE
295: .2C
296: .PP
297: The processing modules in a stream are thought of as a stack whose
298: top (shown here on the left) is next to the user program.
299: Thus, to install the terminal processing module after opening
300: a terminal device, the program that makes
301: such connections executes a ``push'' I/O control call
302: naming the relevant stream and the desired processing module.
303: Other primitives pop a module from the stack
304: and determine the name of the topmost module.
305: .PP
306: Most of the machines using the version of the operating system
307: described here are connected to a network based on the
308: Datakit\(tm packet switch.
309: |reference(datakit asynchronous)
310: Although there is a variety of host interfaces to the network,
311: most of ours are primitive, and require network protocols
312: to be conducted by the host machine, rather than by a front-end
313: processor.
314: Therefore, when terminals are connected to a host through the network,
315: a setup like that shown in Fig. 3 is used; the terminal processing
316: module is stacked on the network protocol module.
317: Again, there is a choice of protocol modules,
318: both a current standard and an older protocol that is being phased
319: out.
320: .1C
321: .KF
322: .PS
323: arrow "user" "write"
324: move up boxht/2
325: line right boxwid/2 then down boxht then left boxwid/2
326: move by (boxwid/2, boxht/2)
327: arrow right
328: box "tty out"; arrow
329: box "proto out"; arrow
330: line up boxht/2
331: line right boxwid/2
332: move down boxht
333: line left boxwid/2 then up boxht/2
334: move right boxwid/2
335: arrow "device" "out"
336: move down boxht
337: arrow left "device" "in"
338: move down boxht/2
339: line left boxwid/2 then up boxht
340: move down boxht/2
341: arrow left
342: box "proto in"; arrow
343: box "tty in"; arrow
344: move up boxht/2
345: line down boxht then left boxwid/2
346: move up boxht/2
347: arrow left "user" "read"
348: .PE
349: .sp .5
350: .ce
351: Figure 3. Configuration for network terminals.
352: .sp
353: .KE
354: .2C
355: .PP
356: A common fourth configuration (not illustrated) is used when
357: the network is used for file transfers or other purposes
358: when terminal processing is not needed.
359: It simply omits the ``tty'' module and uses only the protocol module.
360: Some of our machines, on the other hand, have front-end processors programmed
361: to conduct standard network protocol.
362: Here a connection for remote file transfer will resemble
363: that of Fig. 1, because the protocol is handled outside the operating
364: system; likewise network terminal connections via the front end will be
365: handled as shown in Fig. 2.
366: .NH
367: Messages
368: .PP
369: Most of the messages between modules contain data.
370: The allocator that dispenses message blocks
371: takes an argument specifying the smallest block its caller
372: is willing to accept.
373: The current allocator maintains an inventory of blocks
374: 4, 16, 64, and 1024 characters long.
375: Modules that allocate blocks choose a size by balancing
376: space loss in block linkage overhead against unused space in the block.
377: For example, the top-level
378: .I write
379: routine requests either 64- or 1024-character blocks,
380: because such calls
381: usually transmit many characters;
382: the network input routine allocates 16-byte blocks because
383: data arrives in packets of that size.
384: The smallest blocks are used only to carry arguments to the
385: control messages discussed below.
386: .PP
387: Besides data blocks, there are also several kinds of control messages.
388: The following messages are queued along with data messages,
389: in order to ensure that their effect occurs at the appropriate time.
390: .IP BREAK 10
391: is generated by a terminal device on detection of a line
392: break signal.
393: The standard terminal input processor turns this message into an interrupt
394: request.
395: It may also be sent to
396: a terminal device driver to cause it to generate a break on the output line.
397: .IP HANGUP
398: is generated by a device when its remote connection drops.
399: When the message arrives at the top level it is turned into an interrupt
400: to the process, and it also marks the stream so that
401: further attempts to use it return errors.
402: .IP DELIM
403: is a delimiter in the data.
404: Most of the stream I/O system is prepared to provide true streams,
405: in which record boundaries are insignificant, but there are
406: various situations in which it is desirable to delimit the data.
407: For example, terminal input is read a line at a time;
408: .SM DELIM
409: is generated by the terminal input processor to demarcate lines.
410: .IP DELAY
411: tells terminal drivers to generate a real-time delay
412: on output; it allows time for slow terminals react to characters previously
413: sent.
414: .IP IOCTL
415: messages are generated by users'
416: .I ioctl
417: system calls.
418: The relevant parameters are gathered at the top level,
419: and if the request is not understood there, it and its parameters
420: are composed into a message and sent down the stream.
421: The first module that understands the particular
422: request acts on it and returns a positive acknowledgement.
423: Intermediate modules that do not recognize a particular
424: .SM IOCTL
425: request pass it on;
426: stream-end modules return a negative acknowledgement.
427: The top-level routine waits for the acknowledgement, and returns
428: any information it carries to the user.
429: .LP
430: Other control messages are asynchronous and jump over
431: queued data and non-priority control messages.
432: .IP IOCACK 10
433: .IP IOCNAK
434: acknowledge
435: .SM IOCTL
436: messages.
437: The device end of a stream must respond with one of these
438: messages;
439: the top level will eventually time out if no response is received.
440: .IP SIGNAL
441: messages are generated by the terminal processing module
442: and cause the top level to generate process signals such
443: as
444: .I quit
445: and
446: .I interrupt.
447: .IP FLUSH
448: messages are used to throw away data from input and output queues
449: after a signal or on request of the user.
450: .IP STOP
451: .IP START
452: messages are used by the terminal processor to halt and restart
453: output by a device, for example to implement the traditional
454: control-S/control-Q (X-on/X-off) flow control mechanism.
455: .1C
456: .KF top
457: .P1
458: struct queue {
459: int flag; /* flag bits */
460: void (*putp)(); /* put procedure */
461: void (*servp)(); /* service procedure */
462: struct queue *next; /* next queue downstream */
463: struct block *first; /* first data block on queue */
464: struct block *last; /* last data block on queue */
465: int hiwater; /* max characters on queue */
466: int lowater; /* wakeup point as queue drains */
467: int count; /* characters now on queue */
468: void *ptr; /* pointer to private storage */
469: };
470: .P2
471: .ce
472: Figure 4. The queue data structure
473: .KE
474: .2C
475: .NH
476: Queue Mechanisms and Interfaces
477: .PP
478: Associated with each direction of a full-duplex stream module
479: is a queue data structure (somewhat simplified
480: for exposition) shown in Figure 4.
481: The
482: .CW flag
483: word contains several bits used by low-level routines to control
484: scheduling: they show whether the downstream module wishes to read data,
485: or the upstream module wishes to write, or the queue is already
486: enabled.
487: One bit is examined by the upstream module; it tells whether this
488: queue is full.
489: .PP
490: The
491: .CW first
492: and
493: .CW last
494: members point to the head and tail of a singly-linked list of
495: data and control blocks that form the queue proper;
496: .CW hiwater
497: and
498: .CW lowater
499: are initialized when the queue is created, and when compared against
500: .CW count,
501: the current size of the queue, determine whether the queue is full
502: and whether it has emptied sufficiently to enable a blocked writer.
503: .PP
504: The
505: .CW ptr
506: member stores an untyped pointer that may be used by the queue
507: module to keep track of the location of storage private
508: to itself.
509: For example,
510: each instantiation of the terminal processing module
511: maintains
512: a structure containing various mode bits and special characters;
513: it stores a pointer to this structure here.
514: The type of
515: .CW ptr
516: is artificial.
517: It should be a union of pointers to each possible module
518: state structure.
519: .1C
520: .KF bottom
521: .P1
522: service(q)
523: struct queue *q
524: .sp .5
525: while (q is not empty and q->next is not full) {
526: get a block from q
527: process message block
528: call q->next->putp to dispose of
529: new or transformed block
530: }
531: .P2
532: .ce
533: Figure 5. Typical service routine
534: .KE
535: .2C
536: .PP
537: Stream processing modules are written in one of two general
538: styles.
539: In the simpler kind, the queue module acts nearly as a classical
540: coroutine.
541: When it is instantiated, it sets its put procedure
542: .CW putp
543: to a system-supplied default routine, and supplies a service
544: procedure
545: .CW servp .
546: Its upstream module disposes of blocks by calling this
547: module's
548: .CW putp
549: routine, which places the block on this module's queue (by
550: manipulating the
551: .CW first
552: and
553: .CW last
554: pointers.)
555: The standard put procedure also
556: enables the current module;
557: a short time later the current module's
558: service procedure
559: .CW servp
560: is called by the scheduler.
561: Figure 5 shows the pseudo-code,
562: for a typical service routine.
563: This mechanism is appropriate in cases in which messages can be
564: processed independently of each other.
565: For example, it is used by the terminal output module.
566: All the scheduling details are taken care of by standard routines.
567: .PP
568: More complicated modules need finer control over scheduling.
569: A good example is terminal input.
570: Here the device module upstream produces characters, usually one at a time,
571: that must be gathered into a line to allow for character erase and kill
572: processing.
573: Therefore the stream input module provides a put procedure to be called
574: by the device driver or other module downstream from it;
575: Figure 6 shows an outline
576: of this routine and its accompanying service procedure.
577: The put procedure generates the echo characters
578: as promptly as possible;
579: when the terminal module is attached to a
580: device handler, they are created during the input interrupt
581: from the device, because the put procedure is called as a subroutine
582: of the handler.
583: On the other hand, line-gathering and erase and kill processing,
584: which can be lengthy, are done during the service procedure
585: at lower priority.
586: .1C
587: .KF top
588: .P1
589: putproc(q, bp)
590: struct queue *q; struct block *bp
591: .sp .5
592: put bp on q
593: echo characters in bp's data
594: if (bp's data contains new-line or carriage return)
595: enable q
596: .sp
597: service(q)
598: struct queue *q
599: .sp .5
600: take data from q until new-line or carriage return,
601: processing erase and kill characters
602: call q->next->putp to hand line to upstream queue
603: call q->next->putp with DELIM message
604: .P2
605: .ce
606: Figure 6. A typical put routine
607: .SP
608: .KE
609: .2C
610: .NH
611: Connection with the Rest of the System
612: .PP
613: Although all the drivers for terminal and network devices, and all
614: protocol handlers, were rewritten,
615: only minor changes were required elsewhere in the system.
616: Character devices and a character device switch,
617: as described by Thompson|reference(thompson implementation bstj),
618: are still present.
619: A pointer in the character device switch structure, if null,
620: causes the system to treat the device as always;
621: this is used for raw disk and tape, for example.
622: If not null, it
623: points to initialization information for the stream device;
624: when a stream device is opened, the queue structure shown in Fig. 1
625: is created, using this information,
626: and a pointer to the structure naming the stream is saved (in
627: the ``inode table;''|reference(thompson implementation bstj).
628: .PP
629: Subsequently, when the user process makes
630: .I read ,
631: .I write ,
632: .I ioctl ,
633: or
634: .I close
635: calls, presence of a non-null stream pointer directs the
636: system to use a set of stream routines to generate and receive
637: queue messages; these are the ``top-level routines'' referred
638: to previously.
639: .PP
640: Only a few changes in user-level code are necessary,
641: most because opening a terminal puts it in the
642: ``very raw'' mode shown in Fig. 1.
643: In order to install the terminal-processing handler,
644: it is necessary for programs such as
645: .I init
646: to execute the appropriate
647: .I ioctl
648: call.
649: .NH
650: Interprocess Communication
651: .PP
652: As previously described, the stream I/O system constitutes a flexible
653: communication path between user processes and devices.
654: With a small addition, it also provides a mechanism for interprocess
655: communication.
656: A special device, the ``pseudo-terminal'' or
657: .SM PT ,
658: connects processes.
659: .SM PT
660: files come in even-odd pairs;
661: data written on the odd member of the pair
662: appears as input for the even member, and vice versa.
663: The idea is not new;
664: it appears in Tenex|reference(bobrow burchfiel tenex)
665: and its successors, for example.
666: It is analogous to pipes,
667: and especially to named pipes.|reference(system iii manual)
668: .SM PT
669: files differ from traditional
670: pipes in two ways:
671: they are full-duplex,
672: and control information passes through them as well as data.
673: They differ from the usual pseudo-terminal files
674: |reference(bsdmanual)
675: by not having any of the usual
676: terminal processing mechanisms inherently attached to them;
677: they are pure transmitters of control and data messages.
678: .SM PT
679: files are adequate for setting up
680: a reasonably general mechanism for explicit process communication,
681: but by themselves are not especially interesting.*
682: .FS
683: *More recent research systems (Ninth and Tenth editions) drop
684: .SM PT
685: files altogether.
686: They use pipes where the name of the connection
687: is not needed, and `mounted streams' where a name is needed.
688: See |reference(latest presotto ipc).
689: .FE
690: .1C
691: .KF bottom
692: .PS
693: box invis height 2*boxht "device" "process"
694: move up boxht/2
695: arrow right linewid/2
696: move up boxht/2
697: line right boxwid/2 then down boxht then left boxwid/2
698: move by (boxwid/2, boxht/2)
699: arrow right
700: box "mesg"; arrow
701: line up boxht/2
702: line right boxwid/2
703: move down boxht
704: line invis left boxwid/4 "pt"
705: line left boxwid/4 then up boxht/2
706: move right boxwid/2
707: spline -> right boxwid then down 4*boxht then left boxwid
708: move down boxht/2
709: line left boxwid/2 then up boxht
710: line right boxwid/4
711: line invis right boxwid/4 "pt"
712: move by (-boxwid/2, -boxht/2)
713: arrow left
714: box "tty in"; arrow
715: move up boxht/2
716: line down boxht then left boxwid/2
717: move up boxht/2
718: arrow left linewid/2
719: move up boxht/2
720: left
721: box invis height boxht*2 "user" "process"
722: move up boxht/2
723: move right boxwid
724: arrow right linewid/2
725: move up boxht/2
726: line right boxwid/2 then down boxht then left boxwid/2
727: move by (boxwid/2, boxht/2)
728: arrow right
729: box "tty out"
730: arrow right
731: move down boxht/2
732: line up boxht then right boxwid/2
733: move down boxht/2
734: spline -> right boxwid/2 then up 2*boxht then left boxwid/2
735: move down boxht/2
736: line left boxwid/2 then up boxht
737: move down boxht/2
738: arrow left
739: box "mesg"
740: arrow left
741: move up boxht/2
742: line down boxht then left boxwid/2
743: move up boxht/2
744: arrow left linewid/2
745: .PE
746: .sp .5
747: .ce
748: Figure 7. Configuration for device simulator.
749: .sp
750: .KE
751: .2C
752: .PP
753: A special
754: .I message
755: module provides more intriguing possibilities.
756: In one direction,
757: the message processor takes control and data messages, such as those discussed
758: above, and transforms them into data blocks starting with a header giving
759: the message type, and followed by the message content.
760: In the other direction, it parses similarly-structured data messages
761: and creates the corresponding control blocks.
762: Figure 7 shows a configuration in which
763: a user process communicates through the terminal module, a
764: .SM PT
765: file pair, and the message module with another user-level process
766: that simulates a device driver.
767: Because
768: .SM PT
769: files are transparent, and the message module
770: maps bijectively between device-process data and stream control messages,
771: the device simulator may be completely faithful up to details
772: of timing.
773: In particular, user's
774: .I ioctl
775: requests are sent to the device process and are handled by it,
776: even if they are not understood by the operating system.
777: .PP
778: The usefulness of this setup is not so much to simulate new
779: devices, but to provide
780: ways for one program to control the environment
781: of another.
782: Pike|reference(blit bstj)
783: shows how these mechanisms are used to create
784: multiple virtual terminals on one physical terminal.
785: In another application, inter-machine connections in which
786: a user on one computer logs into another make use of the message module.
787: Here the
788: .I ioctl
789: requests generated by programs on the remote machine
790: are translated by this module into data messages that can be sent over
791: the network.
792: The local callout program translates them back into terminal control commands.
793: ......
794: .NH
795: Evaluation
796: .PP
797: My intent in rewriting the character I/O system
798: was to improve its structure by separating functions
799: that had been intertwined, and by allowing independent modules
800: to be connected dynamically across well-defined interfaces.
801: I also wanted to make the system faster and smaller.
802: The most difficult part of the project was the design of the interface.
803: It was guided by these decisions:
804: .IP 1)
805: It seemed to be necessary for efficiency
806: that the objects passed between modules be references to blocks of data.
807: The most important consequences of this principle, and those
808: that proved deciding, are that data need not be copied as it passes
809: across a module interface, and that many characters can be handled
810: during a single intermodule transmission.
811: Another effect, undesirable but accepted, is that each module
812: must be prepared to handle discrete chunks of data of unpredictable size.
813: For example, a protocol that expects records containing (say)
814: an 8-byte header must be prepared to paste together smaller data blocks
815: and split a block containing both a header and following data.
816: A related, although not necessarily consequent, decision was to make
817: the code assume that the data is addressable.
818: .IP 2)
819: I decided, with regret, that each processing module
820: could not act as an independent process with its own call record.
821: The numbers seemed against it: on large systems it is necessary
822: to allow for as many as 1000 queues, and I saw no good way
823: to run this many processes without consuming inordinate amounts
824: of storage.
825: As a result,
826: stream server procedures are not allowed to block awaiting data,
827: but instead must return after saving necessary status information
828: explicitly.
829: The contortions required in the code are seldom serious in practice,
830: but the beauty of the scheme would increase if servers
831: could be written as a simple read-write loop in the true coroutine style.
832: .IP 3)
833: The characteristic feature of the design\-the server and put
834: procedures\-was the most difficult to work out.
835: I began with a belief that the intermodule interface
836: should be identical in the read and write directions.
837: Next, I observed that a pure call model (put procedure only)
838: would not work;
839: queueing would be necessary at some point.
840: For example, if the
841: .I write
842: system entry called through the terminal processing module to the device
843: driver, the driver would need to queue characters internally lest output
844: be completely synchronous.
845: On the other hand, a pure queueing model (service procedure only;
846: upstream modules always place their data in an input queue)
847: also appeared impractical.
848: As discussed above, a module (for example terminal input)
849: must often be activated at times that depend on its input data.
850: .PP
851: After considerable churning of details, the model presented here emerged.
852: In general its performance by various measures lives up to hopes.
853: .PP
854: The improvement in modularity is hard to measure,
855: but seems real;
856: for example, the number of
857: included header
858: files in stream modules drops to about one half of those required
859: by similar routines in the base system
860: (4.1 BSD).
861: Certainly stream modules may be composed more freely
862: than were the ``line disciplines'' of older systems.
863: .PP
864: The program text size of the version of the operating
865: system described here is about 106 kilobytes on the VAX;
866: the base system was about 130KB.
867: The reduction was achieved by rewriting the various device drivers
868: and protocols and eliminating the Seventh Edition
869: multiplexed files
870: |reference(v7manual)
871: , most (though not all)
872: of whose functions are subsumed by other mechanisms.
873: On the other hand, the data space has increased.
874: On a VAX 11/750 configured for 32 users
875: about 32KB are used for storage of the structures for streams,
876: queues, and blocks.
877: The traditional character lists seem to require less;
878: similar systems from Berkeley and AT&T use between 14 and 19KB.
879: The tradeoff of program for data seems desirable.
880: .PP
881: Proper time comparisons
882: have not been made,
883: because of the difficulty of finding a comparable configuration.
884: On a VAX 11/750,
885: printing a large file on a directly-connected terminal
886: consumes 346 microseconds per character using the system described here;
887: this is about 10 per cent slower than the base system.
888: On the other hand, that system's per-character interrupt routine is
889: coded in assembly language, and the rest of its terminal handler
890: is replete with nonportable
891: interpolated assembly code;
892: the current system is written completely in C.
893: Printing the same file on a terminal connected through a primitive
894: network interface requires 136 microseconds per character,
895: half as much as the older network routines.
896: Pike
897: |reference(blit bstj)
898: observes that among the three implementations of Blit connection software,
899: the one based on the stream system is the only one that can download programs
900: at anything approaching line speed through a 19.2 Kbps connection.
901: In general I conclude that the new organization
902: never slows comparable tasks much, and that considerable speed improvements
903: are sometimes possible.
904: .PP
905: Although the new organization performs well, it has several peculiarities
906: and limitations.
907: Some of them seem inherent, some are fixable,
908: and some are the subject of current work.
909: .PP
910: I/O control calls turn into messages that require answers
911: before a result can be returned to the user.
912: Sometimes the message ultimately goes to another user-level process
913: that may reply tardily or never.
914: The stream is write-locked until the reply returns, in order
915: to eliminate the need to determine which process gets which reply.
916: A timeout breaks the lock,
917: so there is an unjustified error return if a reply
918: is late, and a long lockup period if one is lost.
919: The problem can be ameliorated by working harder on it, but
920: it typifies the difficulties that turn up when direct calls are replaced
921: by message-passing schemes.
922: .PP
923: Several oddities appear because time spent in server routines
924: cannot be assigned to any particular user or process.
925: It is impossible, for example, for devices to support privileged
926: .I ioctl
927: calls, because the device has no idea who generated the message.
928: Accounting and scheduling become less accurate;
929: a short census of several systems showed that between 4 and 8 per cent
930: of non-idle CPU time was being spent in server routines.
931: Finally, the anonymity of server processing most certainly makes
932: it more difficult to measure the performance of the new I/O system.
933: .PP
934: In its current form the stream I/O system is purely data-driven.
935: That is, data is presented by a user's
936: .I write
937: call, and passes through to the device; conversely, data appears unbidden
938: from a device and passes to the top level, where it is picked up by
939: .I read
940: calls.
941: Wherever possible flow control throttles down fast
942: generators of data, but nowhere except at the consumer end of a stream
943: is there knowledge of precisely how much data is desired.
944: Consider a command to execute possibly interactive program on another
945: machine connected by a stream.
946: The simplest such command sets up the connection
947: and invokes the remote program, and then copies characters from its
948: own standard input to the stream, and from the stream to its standard
949: output.
950: The scheme is adequate in practice,
951: but breaks when the user types more than the remote program expects.
952: For example, if the remote program reads no input at all,
953: any typed-ahead characters are sent to the remote system and lost.
954: This demonstrates a problem,
955: but I know of no solution inside the stream I/O mechanism itself;
956: other ideas will have to be applied.
957: .PP
958: Streams are linear connections; by themselves, they support no notion
959: of multiplexing, fan-in or fan-out.
960: Except at the ends of a stream, each invocation of a module has
961: a unique ``next'' and ``previous'' module.
962: Two locally-important applications
963: of streams testify to the importance of multiplexing:
964: Blit terminal connections
965: |reference(blit bstj)
966: , where the multiplexing is done well, though at some performance cost,
967: by a user program,
968: and remote execution of commands over a network,
969: where it is desired, but not now easy,
970: to separate the standard output from error output.
971: It seems likely that a general multiplexing mechanism
972: could help in both cases, but again, I do not yet know how to
973: design it.*
974: .FS
975: *Subsequent Research versions of the system still do not
976: have a general stream-multiplexing scheme built into the
977: kernel, while the System V streams implementations do.
978: We have not found any applications that suffer from the
979: lack of stream-multiplexing.
980: .FE
981: .PP
982: Although the current design provides elegant means for controlling
983: the semantics of communication channels already opened, it lacks
984: general ways of establishing channels between processes.
985: The
986: .SM PT
987: files described above are just fine for
988: Blit layers,
989: and work adequately for handling a few administrator-controlled
990: client-server relationships.
991: (Yes, we have multi-machine mazewar.)
992: Nevertheless, better naming mechanisms are called for.\(dg
993: .FS
994: \(dgSee |reference(latest presotto ipc)
995: for our current ideas.
996: .FE
997: .PP
998: In spite of these limitations, the stream I/O system works well.
999: Its aim was to improve design rather than to add features,
1000: in the belief that with proper design, the features come cheaply.
1001: This approach is arduous, but continues to succeed.
1002: .NH
1003: References
1004: .LP
1005: |reference_placement
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.