.\" Copyright (c) 1986 The Regents of the University of California.
2: .\" All rights reserved.
3: .\"
4: .\" Redistribution and use in source and binary forms are permitted
5: .\" provided that the above copyright notice and this paragraph are
6: .\" duplicated in all such forms and that any documentation,
7: .\" advertising materials, and other materials related to such
8: .\" distribution and use acknowledge that the software was developed
9: .\" by the University of California, Berkeley. The name of the
10: .\" University may not be used to endorse or promote products derived
11: .\" from this software without specific prior written permission.
12: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
13: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
14: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
15: .\"
16: .\" @(#)5.t 1.6 (Berkeley) 3/7/89
17: .\"
18: .\".ds RH "Advanced Topics
19: .bp
20: .nr H1 5
21: .nr H2 0
22: .LG
23: .B
24: .ce
25: 5. ADVANCED TOPICS
26: .sp 2
27: .R
28: .NL
29: .PP
30: A number of facilities have yet to be discussed. For most users
31: of the IPC the mechanisms already
32: described will suffice in constructing distributed
33: applications. However, others will find the need to utilize some
34: of the features which we consider in this section.
35: .NH 2
36: Out of band data
37: .PP
38: The stream socket abstraction includes the notion of \*(lqout
39: of band\*(rq data. Out of band data is a logically independent
40: transmission channel associated with each pair of connected
41: stream sockets. Out of band data is delivered to the user
42: independently of normal data.
43: The abstraction defines that the out of band data facilities
44: must support the reliable delivery of at least one
45: out of band message at a time. This message may contain at least one
46: byte of data, and at least one message may be pending delivery
47: to the user at any one time. For communications protocols which
48: support only in-band signaling (i.e. the urgent data is
49: delivered in sequence with the normal data), the system normally extracts
50: the data from the normal data stream and stores it separately.
51: This allows users to choose between receiving the urgent data
52: in order and receiving it out of sequence without having to
53: buffer all the intervening data. It is possible
54: to ``peek'' (via MSG_PEEK) at out of band data.
If the socket has a process group, a SIGURG signal is generated
when the protocol is notified of the existence of out of band data.
57: A process can set the process group
58: or process id to be informed by the SIGURG signal via the
59: appropriate \fIfcntl\fP call, as described below for
60: SIGIO.
61: If multiple sockets may have out of band data awaiting
62: delivery, a \fIselect\fP call for exceptional conditions
63: may be used to determine those sockets with such data pending.
Neither the signal nor the select indicates the actual arrival
of the out-of-band data, but only notification that it is pending.
66: .PP
67: In addition to the information passed, a logical mark is placed in
68: the data stream to indicate the point at which the out
69: of band data was sent. The remote login and remote shell
70: applications use this facility to propagate signals between
71: client and server processes. When a signal
flushes any pending output from the remote process(es), all
73: data up to the mark in the data stream is discarded.
74: .PP
75: To send an out of band message the MSG_OOB flag is supplied to
a \fIsend\fP or \fIsendto\fP call,
77: while to receive out of band data MSG_OOB should be indicated
78: when performing a \fIrecvfrom\fP or \fIrecv\fP call.
79: To find out if the read pointer is currently pointing at
80: the mark in the data stream, the SIOCATMARK ioctl is provided:
81: .DS
82: ioctl(s, SIOCATMARK, &yes);
83: .DE
84: If \fIyes\fP is a 1 on return, the next read will return data
85: after the mark. Otherwise (assuming out of band data has arrived),
86: the next read will provide data sent by the client prior
87: to transmission of the out of band signal. The routine used
88: in the remote login process to flush output on receipt of an
89: interrupt or quit signal is shown in Figure 5.
90: It reads the normal data up to the mark (to discard it),
91: then reads the out-of-band byte.
92: .KF
93: .DS
#include <sys/ioctl.h>
#include <sys/file.h>
 ...
oob()
{
	int out = FWRITE, mark;
	char waste[BUFSIZ];

	/* flush local terminal output */
	ioctl(1, TIOCFLUSH, (char *)&out);
	for (;;) {
		if (ioctl(rem, SIOCATMARK, &mark) < 0) {
			perror("ioctl");
			break;
		}
		if (mark)
			break;
		(void) read(rem, waste, sizeof (waste));
	}
	if (recv(rem, &mark, 1, MSG_OOB) < 0) {
		perror("recv");
		...
	}
	...
}
119: .DE
120: .ce
121: Figure 5. Flushing terminal I/O on receipt of out of band data.
122: .sp
123: .KE
124: .PP
125: A process may also read or peek at the out-of-band data
126: without first reading up to the mark.
127: This is more difficult when the underlying protocol delivers
128: the urgent data in-band with the normal data, and only sends
129: notification of its presence ahead of time (e.g., the TCP protocol
130: used to implement streams in the Internet domain).
131: With such protocols, the out-of-band byte may not yet have arrived
132: when a \fIrecv\fP is done with the MSG_OOB flag.
133: In that case, the call will return an error of EWOULDBLOCK.
134: Worse, there may be enough in-band data in the input buffer
135: that normal flow control prevents the peer from sending the urgent data
136: until the buffer is cleared.
137: The process must then read enough of the queued data
138: that the urgent data may be delivered.
139: .PP
140: Certain programs that use multiple bytes of urgent data and must
141: handle multiple urgent signals (e.g., \fItelnet\fP\|(1C))
142: need to retain the position of urgent data within the stream.
143: This treatment is available as a socket-level option, SO_OOBINLINE;
144: see \fIsetsockopt\fP\|(2) for usage.
145: With this option, the position of urgent data (the \*(lqmark\*(rq)
146: is retained, but the urgent data immediately follows the mark
147: within the normal data stream returned without the MSG_OOB flag.
148: Reception of multiple urgent indications causes the mark to move,
149: but no out-of-band data are lost.
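.PP
Enabling the option is a single \fIsetsockopt\fP call.
The sketch below shows one way to write it;
the routine name \fIkeep_oob_inline\fP is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask that urgent data be left in the normal data stream.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
int
keep_oob_inline(int s)
{
	int on = 1;

	return (setsockopt(s, SOL_SOCKET, SO_OOBINLINE,
	    (char *)&on, sizeof (on)));
}
```

Once the option is set, the SIOCATMARK \fIioctl\fP shown earlier can
still be used to learn when the read pointer has reached the mark.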
150: .NH 2
151: Non-Blocking Sockets
152: .PP
153: It is occasionally convenient to make use of sockets
154: which do not block; that is, I/O requests which
155: cannot complete immediately and
156: would therefore cause the process to be suspended awaiting completion are
157: not executed, and an error code is returned.
158: Once a socket has been created via
159: the \fIsocket\fP call, it may be marked as non-blocking
160: by \fIfcntl\fP as follows:
161: .DS
#include <fcntl.h>
 ...
int	s;
 ...
s = socket(AF_INET, SOCK_STREAM, 0);
 ...
if (fcntl(s, F_SETFL, FNDELAY) < 0) {
	perror("fcntl F_SETFL, FNDELAY");
	exit(1);
}
 ...
173: .DE
174: .PP
175: When performing non-blocking I/O on sockets, one must be
176: careful to check for the error EWOULDBLOCK (stored in the
177: global variable \fIerrno\fP), which occurs when
178: an operation would normally block, but the socket it
179: was performed on is marked as non-blocking.
180: In particular, \fIaccept\fP, \fIconnect\fP, \fIsend\fP, \fIrecv\fP,
181: \fIread\fP, and \fIwrite\fP can
182: all return EWOULDBLOCK, and processes should be prepared
183: to deal with such return codes.
184: If an operation such as a \fIsend\fP cannot be done in its entirety,
185: but partial writes are sensible (for example, when using a stream socket),
186: the data that can be sent immediately will be processed,
187: and the return value will indicate the amount actually sent.
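.PP
The checks described above might be packaged as in the following sketch.
It is not taken from any existing program: the routine names are
hypothetical, and it assumes a system that provides the POSIX
O_NONBLOCK flag, which it uses in place of FNDELAY.
It also fetches the current flags first so that other status flags
are preserved.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Mark a descriptor non-blocking, preserving its other status flags. */
int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0);
}

/*
 * Try to receive without blocking.
 * Returns the byte count (0 means the peer closed the connection),
 * -1 with *would_block set to 1 if no data is available yet, or
 * -1 with *would_block set to 0 on any other error.
 */
ssize_t
try_recv(int s, char *buf, size_t len, int *would_block)
{
	ssize_t n = recv(s, buf, len, 0);

	/* Some systems report EAGAIN; testing both is harmless. */
	*would_block = (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN));
	return (n);
}
```

A caller that sees \fIwould_block\fP set would normally return to its
\fIselect\fP loop and retry once the socket is reported readable.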
188: .NH 2
189: Interrupt driven socket I/O
190: .PP
191: The SIGIO signal allows a process to be notified
192: via a signal when a socket (or more generally, a file
193: descriptor) has data waiting to be read. Use of
194: the SIGIO facility requires three steps: First,
195: the process must set up a SIGIO signal handler
196: by use of the \fIsignal\fP or \fIsigvec\fP calls. Second,
197: it must set the process id or process group id which is to receive
198: notification of pending input to its own process id,
199: or the process group id of its process group (note that
200: the default process group of a socket is group zero).
201: This is accomplished by use of an \fIfcntl\fP call.
202: Third, it must enable asynchronous notification of pending I/O requests
203: with another \fIfcntl\fP call. Sample code to
204: allow a given process to receive information on
205: pending I/O requests as they occur for a socket \fIs\fP
206: is given in Figure 6. With the addition of a handler for SIGURG,
207: this code can also be used to prepare for receipt of SIGURG signals.
208: .KF
209: .DS
#include <signal.h>
#include <fcntl.h>
 ...
int	io_handler();
 ...
signal(SIGIO, io_handler);

/* Set the process receiving SIGIO/SIGURG signals to us */

if (fcntl(s, F_SETOWN, getpid()) < 0) {
	perror("fcntl F_SETOWN");
	exit(1);
}

/* Allow receipt of asynchronous I/O signals */

if (fcntl(s, F_SETFL, FASYNC) < 0) {
	perror("fcntl F_SETFL, FASYNC");
	exit(1);
}
229: .DE
230: .ce
231: Figure 6. Use of asynchronous notification of I/O requests.
232: .sp
233: .KE
234: .NH 2
235: Signals and process groups
236: .PP
237: Due to the existence of the SIGURG and SIGIO signals each socket has an
238: associated process number, just as is done for terminals.
239: This value is initialized to zero,
240: but may be redefined at a later time with the F_SETOWN
241: \fIfcntl\fP, such as was done in the code above for SIGIO.
242: To set the socket's process id for signals, positive arguments
243: should be given to the \fIfcntl\fP call. To set the socket's
244: process group for signals, negative arguments should be
245: passed to \fIfcntl\fP. Note that the process number indicates
246: either the associated process id or the associated process
247: group; it is impossible to specify both at the same time.
248: A similar \fIfcntl\fP, F_GETOWN, is available for determining the
249: current process number of a socket.
250: .PP
251: Another signal which is useful when constructing server processes
252: is SIGCHLD. This signal is delivered to a process when any
253: child processes have changed state. Normally servers use
254: the signal to \*(lqreap\*(rq child processes that have exited
without explicitly awaiting their termination
or periodically polling for exit status.
257: For example, the remote login server loop shown in Figure 2
258: may be augmented as shown in Figure 7.
259: .KF
260: .DS
int reaper();
 ...
signal(SIGCHLD, reaper);
listen(f, 5);
for (;;) {
	int g, len = sizeof (from);

	g = accept(f, (struct sockaddr *)&from, &len);
	if (g < 0) {
		if (errno != EINTR)
			syslog(LOG_ERR, "rlogind: accept: %m");
		continue;
	}
	...
}
 ...
#include <sys/wait.h>
reaper()
{
	union wait status;

	while (wait3(&status, WNOHANG, 0) > 0)
		;
}
285: .DE
286: .sp
287: .ce
288: Figure 7. Use of the SIGCHLD signal.
289: .sp
290: .KE
291: .PP
292: If the parent server process fails to reap its children,
293: a large number of \*(lqzombie\*(rq processes may be created.
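.PP
On systems that provide the POSIX \fIwaitpid\fP call (an assumption;
it is not part of the interface described in this document),
the same reaping loop might be sketched as follows.
The routine name \fIreap_children\fP is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Collect every child that has already exited, without blocking.
 * Returns the number of children reaped by this call.
 */
int
reap_children(void)
{
	int status, count = 0;

	/* WNOHANG makes waitpid return 0 when no exited child remains. */
	while (waitpid((pid_t)-1, &status, WNOHANG) > 0)
		count++;
	return (count);
}
```

The loop matters because several children may exit before the
handler runs, and the pending signals need not be delivered one per child.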
294: .NH 2
295: Pseudo terminals
296: .PP
297: Many programs will not function properly without a terminal
298: for standard input and output. Since sockets do not provide
299: the semantics of terminals,
300: it is often necessary to have a process communicating over
the network do so through a \fIpseudo-terminal\fP. A
pseudo-terminal is actually a pair of devices, master and slave,
303: which allow a process to serve as an active agent in communication
304: between processes and users. Data written on the slave side
305: of a pseudo-terminal is supplied as input to a process reading
from the master side, while data written on the master side is
307: processed as terminal input for the slave.
308: In this way, the process manipulating
309: the master side of the pseudo-terminal has control over the
310: information read and written on the slave side
311: as if it were manipulating the keyboard and reading the screen
312: on a real terminal.
313: The purpose of this abstraction is to
314: preserve terminal semantics over a network connection\(em
315: that is, the slave side appears as a normal terminal to
316: any process reading from or writing to it.
317: .PP
318: For example, the remote
319: login server uses pseudo-terminals for remote login sessions.
320: A user logging in to a machine across the network is provided
321: a shell with a slave pseudo-terminal as standard input, output,
322: and error. The server process then handles the communication
323: between the programs invoked by the remote shell and the user's
324: local client process.
325: When a user sends a character that generates an interrupt
326: on the remote machine that flushes terminal output,
327: the pseudo-terminal generates a control message for the server process.
328: The server then sends an out of band message
to the client process to signal a flush of data at the real terminal
and of the intervening data buffered in the network.
331: .PP
332: Under 4.3BSD, the name of the slave side of a pseudo-terminal is of the form
333: \fI/dev/ttyxy\fP, where \fIx\fP is a single letter
334: starting at `p' and continuing to `t'.
335: \fIy\fP is a hexadecimal digit (i.e., a single
336: character in the range 0 through 9 or `a' through `f').
337: The master side of a pseudo-terminal is \fI/dev/ptyxy\fP,
338: where \fIx\fP and \fIy\fP correspond to the
339: slave side of the pseudo-terminal.
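.PP
The naming rule can be illustrated with a small helper that builds
both names of a pair. This is only a sketch: the routine name
\fIpty_names\fP and its argument conventions are hypothetical.

```c
#include <stdio.h>

/*
 * Build the master and slave device names for letter x and unit y
 * (0 through 15).  Each buffer must hold at least 11 bytes.
 * Returns 0 on success, -1 if y is out of range.
 */
int
pty_names(int x, int y, char *master, char *slave)
{
	static const char hexdigits[] = "0123456789abcdef";

	if (y < 0 || y > 15)
		return (-1);
	snprintf(master, 11, "/dev/pty%c%c", x, hexdigits[y]);
	snprintf(slave, 11, "/dev/tty%c%c", x, hexdigits[y]);
	return (0);
}
```

For example, letter `p' and unit 10 give the master
\fI/dev/ptypa\fP and the slave \fI/dev/ttypa\fP.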
340: .PP
341: In general, the method of obtaining a pair of master and
342: slave pseudo-terminals is to
343: find a pseudo-terminal which
344: is not currently in use.
345: The master half of a pseudo-terminal is a single-open device;
346: thus, each master may be opened in turn until an open succeeds.
347: The slave side of the pseudo-terminal is then opened,
348: and is set to the proper terminal modes if necessary.
349: The process then \fIfork\fPs; the child closes
350: the master side of the pseudo-terminal, and \fIexec\fPs the
351: appropriate program. Meanwhile, the parent closes the
352: slave side of the pseudo-terminal and begins reading and
353: writing from the master side. Sample code making use of
354: pseudo-terminals is given in Figure 8; this code assumes
355: that a connection on a socket \fIs\fP exists, connected
356: to a peer who wants a service of some kind, and that the
357: process has disassociated itself from any previous controlling terminal.
358: .KF
359: .DS
gotpty = 0;
for (c = 'p'; !gotpty && c <= 's'; c++) {
	line = "/dev/ptyXX";
	line[sizeof("/dev/pty")-1] = c;
	line[sizeof("/dev/ptyp")-1] = '0';
	if (stat(line, &statbuf) < 0)
		break;
	for (i = 0; i < 16; i++) {
		line[sizeof("/dev/ptyp")-1] = "0123456789abcdef"[i];
		master = open(line, O_RDWR);
		if (master >= 0) {
			gotpty = 1;
			break;
		}
	}
}
if (!gotpty) {
	syslog(LOG_ERR, "All network ports in use");
	exit(1);
}

line[sizeof("/dev/")-1] = 't';
slave = open(line, O_RDWR);	/* \fIslave\fP is now slave side */
if (slave < 0) {
	syslog(LOG_ERR, "Cannot open slave pty %s", line);
	exit(1);
}

ioctl(slave, TIOCGETP, &b);	/* Set slave tty modes */
b.sg_flags = CRMOD|XTABS|ANYP;
ioctl(slave, TIOCSETP, &b);

i = fork();
if (i < 0) {
	syslog(LOG_ERR, "fork: %m");
	exit(1);
} else if (i) {		/* Parent */
	close(slave);
	...
} else {		/* Child */
	(void) close(s);
	(void) close(master);
	dup2(slave, 0);
	dup2(slave, 1);
	dup2(slave, 2);
	if (slave > 2)
		(void) close(slave);
	...
}
409: .DE
410: .ce
411: Figure 8. Creation and use of a pseudo terminal
412: .sp
413: .KE
414: .NH 2
415: Selecting specific protocols
416: .PP
417: If the third argument to the \fIsocket\fP call is 0,
418: \fIsocket\fP will select a default protocol to use with
419: the returned socket of the type requested.
420: The default protocol is usually correct, and alternate choices are not
421: usually available.
422: However, when using ``raw'' sockets to communicate directly with
423: lower-level protocols or hardware interfaces,
424: the protocol argument may be important for setting up demultiplexing.
425: For example, raw sockets in the Internet family may be used to implement
426: a new protocol above IP, and the socket will receive packets
427: only for the protocol specified.
428: To obtain a particular protocol one determines the protocol number
429: as defined within the communication domain. For the Internet
430: domain one may use one of the library routines
431: discussed in section 3, such as \fIgetprotobyname\fP:
432: .DS
433: #include <sys/types.h>
434: #include <sys/socket.h>
435: #include <netinet/in.h>
436: #include <netdb.h>
437: ...
438: pp = getprotobyname("newtcp");
439: s = socket(AF_INET, SOCK_STREAM, pp->p_proto);
440: .DE
441: This would result in a socket \fIs\fP using a stream
442: based connection, but with protocol type of ``newtcp''
443: instead of the default ``tcp.''
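.PP
Because \fIgetprotobyname\fP returns a null pointer when the name is
not in the protocol database, a careful program checks the result
before using it. A minimal sketch follows; the routine name
\fIprotocol_number\fP is hypothetical.

```c
#include <stddef.h>
#include <netdb.h>

/*
 * Look up a protocol number by name, for use as the third argument
 * to socket().  Returns -1 if the name is not known.
 */
int
protocol_number(const char *name)
{
	struct protoent *pp = getprotobyname(name);

	return (pp == NULL ? -1 : pp->p_proto);
}
```

A caller would refuse to create the socket when the result is \-1
rather than pass an arbitrary number to \fIsocket\fP.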
444: .PP
445: In the NS domain, the available socket protocols are defined in
446: <\fInetns/ns.h\fP>. To create a raw socket for Xerox Error Protocol
447: messages, one might use:
448: .DS
449: #include <sys/types.h>
450: #include <sys/socket.h>
451: #include <netns/ns.h>
452: ...
453: s = socket(AF_NS, SOCK_RAW, NSPROTO_ERROR);
454: .DE
455: .NH 2
456: Address binding
457: .PP
458: As was mentioned in section 2,
459: binding addresses to sockets in the Internet and NS domains can be
460: fairly complex. As a brief reminder, these associations
461: are composed of local and foreign
462: addresses, and local and foreign ports. Port numbers are
463: allocated out of separate spaces, one for each system and one
464: for each domain on that system.
465: Through the \fIbind\fP system call, a
466: process may specify half of an association, the
467: <local address, local port> part, while the
468: \fIconnect\fP
469: and \fIaccept\fP
470: primitives are used to complete a socket's association by
471: specifying the <foreign address, foreign port> part.
472: Since the association is created in two steps the association
473: uniqueness requirement indicated previously could be violated unless
474: care is taken. Further, it is unrealistic to expect user
475: programs to always know proper values to use for the local address
476: and local port since a host may reside on multiple networks and
477: the set of allocated port numbers is not directly accessible
478: to a user.
479: .PP
480: To simplify local address binding in the Internet domain the notion of a
481: \*(lqwildcard\*(rq address has been provided. When an address
482: is specified as INADDR_ANY (a manifest constant defined in
483: <netinet/in.h>), the system interprets the address as
484: \*(lqany valid address\*(rq. For example, to bind a specific
485: port number to a socket, but leave the local address unspecified,
486: the following code might be used:
487: .DS
488: #include <sys/types.h>
489: #include <netinet/in.h>
490: ...
491: struct sockaddr_in sin;
492: ...
493: s = socket(AF_INET, SOCK_STREAM, 0);
494: sin.sin_family = AF_INET;
495: sin.sin_addr.s_addr = htonl(INADDR_ANY);
496: sin.sin_port = htons(MYPORT);
497: bind(s, (struct sockaddr *) &sin, sizeof (sin));
498: .DE
499: Sockets with wildcarded local addresses may receive messages
500: directed to the specified port number, and sent to any
501: of the possible addresses assigned to a host. For example,
502: if a host has addresses 128.32.0.4 and 10.0.0.78, and a socket is bound as
503: above, the process will be
504: able to accept connection requests which are addressed to
505: 128.32.0.4 or 10.0.0.78.
If a server process wished to allow only hosts on a
given network to connect to it, it would bind
508: the address of the host on the appropriate network.
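.PP
Binding one specific address differs from the wildcard case only in
how the address field is filled in. The sketch below prepares such an
address from dotted-decimal text. It assumes the \fIinet_pton\fP
library routine is available (it is not described in this document),
and the routine name \fIlocal_address\fP is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Fill in sin with one specific local address (dotted-decimal
 * text) and a port given in host byte order.
 * Returns 0 on success, -1 if the text is not a valid address.
 */
int
local_address(const char *dotted, unsigned short port,
    struct sockaddr_in *sin)
{
	memset(sin, 0, sizeof (*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(port);
	return (inet_pton(AF_INET, dotted, &sin->sin_addr) == 1 ? 0 : -1);
}
```

With the example host above, filling in 128.32.0.4 and passing the
result to \fIbind\fP would leave connection requests addressed to
10.0.0.78 unanswered by this socket.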
509: .PP
510: In a similar fashion, a local port may be left unspecified
511: (specified as zero), in which case the system will select an
512: appropriate port number for it. This shortcut will work
513: both in the Internet and NS domains. For example, to
514: bind a specific local address to a socket, but to leave the
515: local port number unspecified:
516: .DS
hp = gethostbyname(hostname);
if (hp == NULL) {
	...
}
bcopy(hp->h_addr, (char *) &sin.sin_addr, hp->h_length);
sin.sin_port = htons(0);
bind(s, (struct sockaddr *) &sin, sizeof (sin));
524: .DE
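.PP
A process that leaves the port to the system can learn which port was
chosen with the \fIgetsockname\fP call. The following sketch binds the
wildcard address with port zero and reports the result; the routine
name \fIbind_any_port\fP is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Bind s to the wildcard address with port zero so that the system
 * picks the port, then report the choice.
 * Returns the port in host byte order, or -1 on error.
 */
int
bind_any_port(int s)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(0);
	if (bind(s, (struct sockaddr *)&sin, sizeof (sin)) < 0)
		return (-1);
	if (getsockname(s, (struct sockaddr *)&sin, &len) < 0)
		return (-1);
	return (ntohs(sin.sin_port));
}
```
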
525: The system selects the local port number based on two criteria.
526: The first is that on 4BSD systems,
527: Internet ports below IPPORT_RESERVED (1024) (for the Xerox domain,
528: 0 through 3000) are reserved
529: for privileged users (i.e., the super user);
Internet ports above IPPORT_USERRESERVED (5000) are reserved
531: for non-privileged servers. The second is
532: that the port number is not currently bound to some other
533: socket. In order to find a free Internet port number in the privileged
534: range the \fIrresvport\fP library routine may be used as follows
to return a stream socket with a privileged port number:
536: .DS
int lport = IPPORT_RESERVED \- 1;
int s;
 ...
s = rresvport(&lport);
if (s < 0) {
	if (errno == EAGAIN)
		fprintf(stderr, "socket: all ports in use\en");
	else
		perror("rresvport: socket");
	...
}
548: .DE
The restriction on allocating ports was imposed to allow processes
550: executing in a \*(lqsecure\*(rq environment to perform authentication
551: based on the originating address and port number. For example,
552: the \fIrlogin\fP(1) command allows users to log in across a network
553: without being asked for a password, if two conditions hold:
554: First, the name of the system the user
555: is logging in from is in the file
556: \fI/etc/hosts.equiv\fP on the system he is logging
557: in to (or the system name and the user name are in
558: the user's \fI.rhosts\fP file in the user's home
559: directory), and second, that the user's rlogin
process is coming from a privileged port on the machine from which he is
logging in. The port number and network address of the
562: machine from which the user is logging in can be determined either
563: by the \fIfrom\fP result of the \fIaccept\fP call, or
564: from the \fIgetpeername\fP call.
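.PP
The second condition amounts to a range test on the peer's port.
The sketch below shows such a test in isolation; it is an illustration
only, not the check used by any particular server, and the routine
name \fIfrom_reserved_port\fP is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Report whether a peer address, as filled in by accept() or
 * getpeername(), carries a port in the privileged range.
 * Returns 1 if it does, 0 otherwise.
 */
int
from_reserved_port(const struct sockaddr_in *from)
{
	return (from->sin_family == AF_INET &&
	    ntohs(from->sin_port) < IPPORT_RESERVED);
}
```

The port is stored in network byte order, so it must be converted
with \fIntohs\fP before the comparison.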
565: .PP
566: In certain cases the algorithm used by the system in selecting
567: port numbers is unsuitable for an application. This is because
568: associations are created in a two step process. For example,
569: the Internet file transfer protocol, FTP, specifies that data
570: connections must always originate from the same local port. However,
571: duplicate associations are avoided by connecting to different foreign
572: ports. In this situation the system would disallow binding the
573: same local address and port number to a socket if a previous data
574: connection's socket still existed. To override the default port
575: selection algorithm, an option call must be performed prior
576: to address binding:
577: .DS
578: ...
579: int on = 1;
580: ...
581: setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
582: bind(s, (struct sockaddr *) &sin, sizeof (sin));
583: .DE
584: With the above call, local addresses may be bound which
585: are already in use. This does not violate the uniqueness
586: requirement as the system still checks at connect time to
587: be sure any other sockets with the same local address and
588: port do not have the same foreign address and port.
589: If the association already exists, the error EADDRINUSE is returned.
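.PP
The fragment above ignores errors for brevity. A version that checks
both calls might look like the following sketch, in which the routine
name \fIbind_reusable\fP is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Allow reuse of a local address that is already in use, then bind.
 * Returns 0 on success, -1 if either call fails (errno is set).
 */
int
bind_reusable(int s, const struct sockaddr_in *sin)
{
	int on = 1;

	/* The option must be set before the bind, not after. */
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR,
	    (char *)&on, sizeof (on)) < 0)
		return (-1);
	return (bind(s, (const struct sockaddr *)sin, sizeof (*sin)));
}
```
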
590: .NH 2
591: Broadcasting and determining network configuration
592: .PP
593: By using a datagram socket, it is possible to send broadcast
594: packets on many networks supported by the system.
595: The network itself must support broadcast; the system
596: provides no simulation of broadcast in software.
597: Broadcast messages can place a high load on a network since they force
598: every host on the network to service them. Consequently,
599: the ability to send broadcast packets has been limited
600: to sockets which are explicitly marked as allowing broadcasting.
601: Broadcast is typically used for one of two reasons:
602: it is desired to find a resource on a local network without prior
603: knowledge of its address,
604: or important functions such as routing require that information
605: be sent to all accessible neighbors.
606: .PP
607: To send a broadcast message, a datagram socket
608: should be created:
609: .DS
610: s = socket(AF_INET, SOCK_DGRAM, 0);
611: .DE
612: or
613: .DS
614: s = socket(AF_NS, SOCK_DGRAM, 0);
615: .DE
616: The socket is marked as allowing broadcasting,
617: .DS
618: int on = 1;
619:
620: setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on));
621: .DE
622: and at least a port number should be bound to the socket:
623: .DS
624: sin.sin_family = AF_INET;
625: sin.sin_addr.s_addr = htonl(INADDR_ANY);
626: sin.sin_port = htons(MYPORT);
627: bind(s, (struct sockaddr *) &sin, sizeof (sin));
628: .DE
629: or, for the NS domain,
630: .DS
631: sns.sns_family = AF_NS;
632: netnum = htonl(net);
633: sns.sns_addr.x_net = *(union ns_net *) &netnum; /* insert net number */
634: sns.sns_addr.x_port = htons(MYPORT);
635: bind(s, (struct sockaddr *) &sns, sizeof (sns));
636: .DE
637: The destination address of the message to be broadcast
638: depends on the network(s) on which the message is to be broadcast.
639: The Internet domain supports a shorthand notation for broadcast
640: on the local network, the address INADDR_BROADCAST (defined in
<\fInetinet/in.h\fP>).
642: To determine the list of addresses for all reachable neighbors
643: requires knowledge of the networks to which the host is connected.
644: Since this information should
645: be obtained in a host-independent fashion and may be impossible
646: to derive, 4.3BSD provides a method of
647: retrieving this information from the system data structures.
648: The SIOCGIFCONF \fIioctl\fP call returns the interface
649: configuration of a host in the form of a
650: single \fIifconf\fP structure; this structure contains
a ``data area'' which is made up of an array
of \fIifreq\fP structures, one for each network interface
653: to which the host is connected.
654: These structures are defined in
655: \fI<net/if.h>\fP as follows:
656: .DS
.if t .ta .5i 1.0i 1.5i 3.5i
.if n .ta .7i 1.4i 2.1i 3.4i
struct ifconf {
	int	ifc_len;		/* size of associated buffer */
	union {
		caddr_t	ifcu_buf;
		struct	ifreq *ifcu_req;
	} ifc_ifcu;
};

#define	ifc_buf	ifc_ifcu.ifcu_buf	/* buffer address */
#define	ifc_req	ifc_ifcu.ifcu_req	/* array of structures returned */

#define	IFNAMSIZ	16

struct ifreq {
	char	ifr_name[IFNAMSIZ];		/* if name, e.g. "en0" */
	union {
		struct	sockaddr ifru_addr;
		struct	sockaddr ifru_dstaddr;
		struct	sockaddr ifru_broadaddr;
		short	ifru_flags;
		caddr_t	ifru_data;
	} ifr_ifru;
};

.if t .ta \w' #define'u +\w' ifr_broadaddr'u +\w' ifr_ifru.ifru_broadaddr'u
#define	ifr_addr	ifr_ifru.ifru_addr	/* address */
#define	ifr_dstaddr	ifr_ifru.ifru_dstaddr	/* other end of p-to-p link */
#define	ifr_broadaddr	ifr_ifru.ifru_broadaddr	/* broadcast address */
#define	ifr_flags	ifr_ifru.ifru_flags	/* flags */
#define	ifr_data	ifr_ifru.ifru_data	/* for use by interface */
689: .DE
690: The actual call which obtains the
691: interface configuration is
692: .DS
struct ifconf ifc;
char buf[BUFSIZ];

ifc.ifc_len = sizeof (buf);
ifc.ifc_buf = buf;
if (ioctl(s, SIOCGIFCONF, (char *) &ifc) < 0) {
	...
}
701: .DE
702: After this call \fIbuf\fP will contain one \fIifreq\fP structure for
703: each network to which the host is connected, and
704: \fIifc.ifc_len\fP will have been modified to reflect the number
705: of bytes used by the \fIifreq\fP structures.
706: .PP
707: For each structure
708: there exists a set of ``interface flags'' which tell
709: whether the network corresponding to that interface is
710: up or down, point to point or broadcast, etc. The
711: SIOCGIFFLAGS \fIioctl\fP retrieves these
712: flags for an interface specified by an \fIifreq\fP
713: structure as follows:
714: .DS
struct ifreq *ifr;

ifr = ifc.ifc_req;

for (n = ifc.ifc_len / sizeof (struct ifreq); --n >= 0; ifr++) {
	/*
	 * We must be careful that we don't use an interface
	 * devoted to an address family other than those intended;
	 * if we were interested in NS interfaces, the
	 * AF_INET would be AF_NS.
	 */
	if (ifr->ifr_addr.sa_family != AF_INET)
		continue;
	if (ioctl(s, SIOCGIFFLAGS, (char *) ifr) < 0) {
		...
	}
	/*
	 * Skip boring cases.
	 */
	if ((ifr->ifr_flags & IFF_UP) == 0 ||
	    (ifr->ifr_flags & IFF_LOOPBACK) ||
	    (ifr->ifr_flags & (IFF_BROADCAST | IFF_POINTTOPOINT)) == 0)
		continue;
738: .DE
739: .PP
740: Once the flags have been obtained, the broadcast address
741: must be obtained. In the case of broadcast networks this is
742: done via the SIOCGIFBRDADDR \fIioctl\fP, while for point-to-point networks
743: the address of the destination host is obtained with SIOCGIFDSTADDR.
744: .DS
	struct sockaddr dst;

	if (ifr->ifr_flags & IFF_POINTTOPOINT) {
		if (ioctl(s, SIOCGIFDSTADDR, (char *) ifr) < 0) {
			...
		}
		bcopy((char *) &ifr->ifr_dstaddr, (char *) &dst, sizeof (ifr->ifr_dstaddr));
	} else if (ifr->ifr_flags & IFF_BROADCAST) {
		if (ioctl(s, SIOCGIFBRDADDR, (char *) ifr) < 0) {
			...
		}
		bcopy((char *) &ifr->ifr_broadaddr, (char *) &dst, sizeof (ifr->ifr_broadaddr));
	}
758: .DE
759: .PP
After the appropriate \fIioctl\fPs have obtained the broadcast
761: or destination address (now in \fIdst\fP), the \fIsendto\fP call may be
762: used:
763: .DS
	sendto(s, buf, buflen, 0, (struct sockaddr *)&dst, sizeof (dst));
}
766: .DE
767: In the above loop one \fIsendto\fP occurs for every
768: interface to which the host is connected that supports the notion of
769: broadcast or point-to-point addressing.
770: If a process only wished to send broadcast
771: messages on a given network, code similar to that outlined above
772: would be used, but the loop would need to find the
773: correct destination address.
774: .PP
Received broadcast messages contain the sender's address
776: and port, as datagram sockets are bound before
777: a message is allowed to go out.
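.PP
When only the local network is of interest, the INADDR_BROADCAST
shorthand mentioned earlier avoids the walk over the interface list.
The following sketch prepares a socket and a destination address for
that case; the routine names are hypothetical, and the caller is
assumed to supply its own port number.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Permit broadcasts on datagram socket s.  Returns 0 or -1. */
int
allow_broadcast(int s)
{
	int on = 1;

	return (setsockopt(s, SOL_SOCKET, SO_BROADCAST,
	    (char *)&on, sizeof (on)));
}

/* Fill in dst with the local-network broadcast address and a port. */
void
broadcast_address(unsigned short port, struct sockaddr_in *dst)
{
	memset(dst, 0, sizeof (*dst));
	dst->sin_family = AF_INET;
	dst->sin_addr.s_addr = htonl(INADDR_BROADCAST);
	dst->sin_port = htons(port);
}
```

The filled-in \fIdst\fP is then passed to \fIsendto\fP exactly as in
the loop above.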
778: .NH 2
779: Socket Options
780: .PP
781: It is possible to set and get a number of options on sockets
782: via the \fIsetsockopt\fP and \fIgetsockopt\fP system calls.
783: These options include such things as marking a socket for
784: broadcasting, not to route, to linger on close, etc.
785: The general forms of the calls are:
786: .DS
787: setsockopt(s, level, optname, optval, optlen);
788: .DE
789: and
790: .DS
791: getsockopt(s, level, optname, optval, optlen);
792: .DE
793: .PP
794: The parameters to the calls are as follows: \fIs\fP
795: is the socket on which the option is to be applied.
796: \fILevel\fP specifies the protocol layer on which the
797: option is to be applied; in most cases this is
798: the ``socket level'', indicated by the symbolic constant
SOL_SOCKET, defined in \fI<sys/socket.h>\fP.
800: The actual option is specified in \fIoptname\fP, and is
801: a symbolic constant also defined in \fI<sys/socket.h>\fP.
\fIOptval\fP points to the value of the
option (in most cases, whether the option is to be turned
on or off), and \fIoptlen\fP gives the length of that value.
806: For \fIgetsockopt\fP, \fIoptlen\fP is
807: a value-result parameter, initially set to the size of
808: the storage area pointed to by \fIoptval\fP, and modified
809: upon return to indicate the actual amount of storage used.
810: .PP
811: An example should help clarify things. It is sometimes
812: useful to determine the type (e.g., stream, datagram, etc.)
813: of an existing socket; programs
814: under \fIinetd\fP (described below) may need to perform this
815: task. This can be accomplished as follows via the
816: SO_TYPE socket option and the \fIgetsockopt\fP call:
817: .DS
818: #include <sys/types.h>
819: #include <sys/socket.h>
820:
821: int type, size;
822:
823: size = sizeof (int);
824:
825: if (getsockopt(s, SOL_SOCKET, SO_TYPE, (char *) &type, &size) < 0) {
826: ...
827: }
828: .DE
829: After the \fIgetsockopt\fP call, \fItype\fP will be set
830: to the value of the socket type, as defined in
831: \fI<sys/socket.h>\fP. If, for example, the socket were
832: a datagram socket, \fItype\fP would have the value
833: corresponding to SOCK_DGRAM.
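.PP
The value-result handling of \fIoptlen\fP and the common on/off
options can be exercised together in a small sketch.
The helper names \fIsocket_type\fP, \fIset_flag_option\fP and
\fIget_flag_option\fP are illustrative only and are not part of any
system library. The sketch also assumes a modern POSIX system, where
the length argument is declared as \fIsocklen_t\fP rather than \fIint\fP.
.DS
```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return the type of socket s (SOCK_STREAM, SOCK_DGRAM, ...), or -1 on error. */
int
socket_type(int s)
{
	int type;
	socklen_t size = sizeof (type);	/* in: buffer size; out: bytes used */

	if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &size) < 0)
		return (-1);
	assert(size == sizeof (type));
	return (type);
}

/* Turn a socket-level on/off option on (on != 0) or off (on == 0). */
int
set_flag_option(int s, int optname, int on)
{
	return (setsockopt(s, SOL_SOCKET, optname, &on, sizeof (on)));
}

/* Return 1 if the option is on, 0 if it is off, or -1 on error. */
int
get_flag_option(int s, int optname)
{
	int value;
	socklen_t size = sizeof (value);

	if (getsockopt(s, SOL_SOCKET, optname, &value, &size) < 0)
		return (-1);
	return (value != 0);
}
```
.DE
A program wishing to broadcast on a datagram socket might call
\fIset_flag_option\fP with SO_BROADCAST before its first \fIsendto\fP.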
834: .NH 2
835: NS Packet Sequences
836: .PP
837: The semantics of NS connections demand that
838: the user both be able to look inside the network header associated
839: with any incoming packet and be able to specify what should go
840: in certain fields of an outgoing packet.
841: Using different calls to \fIsetsockopt\fP, it is possible
842: to indicate whether prototype headers will be associated by
843: the user with each outgoing packet (SO_HEADERS_ON_OUTPUT),
844: to indicate whether the headers received by the system should be
845: delivered to the user (SO_HEADERS_ON_INPUT), or to indicate
846: default information that should be associated with all
847: outgoing packets on a given socket (SO_DEFAULT_HEADERS).
848: .PP
The contents of an SPP header (minus the IDP header) are:
850: .DS
851: .if t .ta \w" #define"u +\w" u_short"u +2.0i
852: struct sphdr {
853: u_char sp_cc; /* connection control */
854: #define SP_SP 0x80 /* system packet */
855: #define SP_SA 0x40 /* send acknowledgement */
856: #define SP_OB 0x20 /* attention (out of band data) */
857: #define SP_EM 0x10 /* end of message */
858: u_char sp_dt; /* datastream type */
859: u_short sp_sid; /* source connection identifier */
860: u_short sp_did; /* destination connection identifier */
861: u_short sp_seq; /* sequence number */
862: u_short sp_ack; /* acknowledge number */
863: u_short sp_alo; /* allocation number */
864: };
865: .DE
866: Here, the items of interest are the \fIdatastream type\fP and
867: the \fIconnection control\fP fields. The semantics of the
868: datastream type are defined by the application(s) in question;
869: the value of this field is, by default, zero, but it can be
870: used to indicate things such as Xerox's Bulk Data Transfer
871: Protocol (in which case it is set to one). The connection control
872: field is a mask of the flags defined just below it. The user may
873: set or clear the end-of-message bit to indicate
874: that a given message is the last of a given substream type,
875: or may set/clear the attention bit as an alternate way to
876: indicate that a packet should be sent out-of-band.
877: As an example, to associate prototype headers with outgoing
878: SPP packets, consider:
879: .DS
#include <sys/types.h>
#include <sys/socket.h>
#include <netns/ns.h>
#include <netns/sp.h>
 ...
struct sockaddr_ns sns, to;
int s, on = 1;
struct databuf {
	struct sphdr proto_spp;	/* prototype header */
	char buf[534];		/* max. possible data by Xerox std. */
} buf;
 ...
s = socket(AF_NS, SOCK_SEQPACKET, 0);
 ...
bind(s, (struct sockaddr *) &sns, sizeof (sns));
setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &on, sizeof(on));
 ...
buf.proto_spp.sp_dt = 1;	/* bulk data */
buf.proto_spp.sp_cc = SP_EM;	/* end-of-message */
strcpy(buf.buf, "hello world\en");
/* the header and its data go out in one call; the fourth argument is the flags word */
sendto(s, (char *) &buf, sizeof(struct sphdr) + strlen("hello world\en"), 0,
    (struct sockaddr *) &to, sizeof(to));
 ...
903: .DE
904: Note that one must be careful when writing headers; if the prototype
905: header is not written with the data with which it is to be associated,
906: the kernel will treat the first few bytes of the data as the
907: header, with unpredictable results.
908: To turn off the above association, and to indicate that packet
909: headers received by the system should be passed up to the user,
910: one might use:
911: .DS
#include <sys/types.h>
#include <sys/socket.h>
#include <netns/ns.h>
#include <netns/sp.h>
 ...
struct sockaddr_ns sns;
int s, on = 1, off = 0;
 ...
s = socket(AF_NS, SOCK_SEQPACKET, 0);
 ...
bind(s, (struct sockaddr *) &sns, sizeof (sns));
/* stop supplying prototype headers on output */
setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &off, sizeof(off));
/* ask for received headers to be passed up with the data */
setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_INPUT, &on, sizeof(on));
 ...
926: .DE
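.PP
Because the connection control field is an ordinary bit mask, the
end-of-message and attention bits can be set, cleared and tested
independently of one another. The following is a minimal sketch of that
manipulation. It declares its own copy of the header shown earlier so that
it can be compiled on a system without \fI<netns/sp.h>\fP, and the helper
names \fIsp_set\fP, \fIsp_clear\fP and \fIsp_test\fP are hypothetical.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the SPP header, for systems without <netns/sp.h>. */
struct sphdr {
	unsigned char	sp_cc;		/* connection control */
#define	SP_SP	0x80			/* system packet */
#define	SP_SA	0x40			/* send acknowledgement */
#define	SP_OB	0x20			/* attention (out of band data) */
#define	SP_EM	0x10			/* end of message */
	unsigned char	sp_dt;		/* datastream type */
	unsigned short	sp_sid;		/* source connection identifier */
	unsigned short	sp_did;		/* destination connection identifier */
	unsigned short	sp_seq;		/* sequence number */
	unsigned short	sp_ack;		/* acknowledge number */
	unsigned short	sp_alo;		/* allocation number */
};

/* Turn on the given connection control bits, leaving the others alone. */
void
sp_set(struct sphdr *h, unsigned char flags)
{
	assert(h != NULL);
	h->sp_cc |= flags;
}

/* Turn off the given connection control bits, leaving the others alone. */
void
sp_clear(struct sphdr *h, unsigned char flags)
{
	assert(h != NULL);
	h->sp_cc &= (unsigned char)~flags;
}

/* Return nonzero only if every one of the given bits is set. */
int
sp_test(const struct sphdr *h, unsigned char flags)
{
	assert(h != NULL);
	return ((h->sp_cc & flags) == flags);
}
```
.DE
A receiver that has asked for headers on input could, for instance, use
\fIsp_test\fP with SP_EM to decide whether the packet just read ends a message.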
927: .PP
928: Output is handled somewhat differently in the IDP world.
929: The header of an IDP-level packet looks like:
930: .DS
931: .if t .ta \w'struct 'u +\w" struct ns_addr"u +2.0i
932: struct idp {
933: u_short idp_sum; /* Checksum */
934: u_short idp_len; /* Length, in bytes, including header */
935: u_char idp_tc; /* Transport Control (i.e., hop count) */
936: u_char idp_pt; /* Packet Type (i.e., level 2 protocol) */
937: struct ns_addr idp_dna; /* Destination Network Address */
938: struct ns_addr idp_sna; /* Source Network Address */
939: };
940: .DE
941: The primary field of interest in an IDP header is the \fIpacket type\fP
942: field. The standard values for this field are (as defined
943: in <\fInetns/ns.h\fP>):
944: .DS
945: .if t .ta \w" #define"u +\w" NSPROTO_ERROR"u +1.0i
946: #define NSPROTO_RI 1 /* Routing Information */
947: #define NSPROTO_ECHO 2 /* Echo Protocol */
948: #define NSPROTO_ERROR 3 /* Error Protocol */
949: #define NSPROTO_PE 4 /* Packet Exchange */
950: #define NSPROTO_SPP 5 /* Sequenced Packet */
951: .DE
952: For SPP connections, the contents of this field are
953: automatically set to NSPROTO_SPP; for IDP packets,
954: this value defaults to zero, which means ``unknown''.
955: .PP
956: Setting the value of that field with SO_DEFAULT_HEADERS is
957: easy:
958: .DS
#include <sys/types.h>
#include <sys/socket.h>
#include <netns/ns.h>
#include <netns/idp.h>
 ...
struct sockaddr_ns sns;
struct idp proto_idp;		/* prototype header */
int s, on = 1;
 ...
s = socket(AF_NS, SOCK_DGRAM, 0);
 ...
bind(s, (struct sockaddr *) &sns, sizeof (sns));
proto_idp.idp_pt = NSPROTO_PE;	/* packet exchange */
setsockopt(s, NSPROTO_IDP, SO_DEFAULT_HEADERS, (char *) &proto_idp,
    sizeof(proto_idp));
 ...
975: .DE
976: .PP
977: Using SO_HEADERS_ON_OUTPUT is somewhat more difficult. When
978: SO_HEADERS_ON_OUTPUT is turned on for an IDP socket, the socket
979: becomes (for all intents and purposes) a raw socket. In this
980: case, all the fields of the prototype header (except the
981: length and checksum fields, which are computed by the kernel)
982: must be filled in correctly in order for the socket to send and
983: receive data in a sensible manner. To be more specific, the
984: source address must be set to that of the host sending the
985: data; the destination address must be set to that of the
986: host for whom the data is intended; the packet type must be
987: set to whatever value is desired; and the hopcount must be
988: set to some reasonable value (almost always zero). It should
989: also be noted that simply sending data using \fIwrite\fP
990: will not work unless a \fIconnect\fP or \fIsendto\fP call
991: is used, in spite of the fact that it is the destination
992: address in the prototype header that is used, not the one
given in either of those calls. For almost
all IDP applications, using SO_DEFAULT_HEADERS is easier and
more desirable than writing headers.
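.PP
The requirements just listed for a prototype header can be gathered into
one routine. The following is only a sketch: the structures are simplified
local stand-ins for the declarations in \fI<netns/ns.h>\fP and
\fI<netns/idp.h>\fP (the real \fIns_addr\fP is built from unions), and
\fIidp_fill_proto\fP is a hypothetical helper, not a library routine.
.DS
```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the <netns/ns.h> and <netns/idp.h> types. */
struct ns_addr {
	unsigned char	x_net[4];	/* network number */
	unsigned char	x_host[6];	/* host address */
	unsigned short	x_port;		/* socket (port) number */
};

struct idp {
	unsigned short	idp_sum;	/* Checksum */
	unsigned short	idp_len;	/* Length, in bytes, including header */
	unsigned char	idp_tc;		/* Transport Control (i.e., hop count) */
	unsigned char	idp_pt;		/* Packet Type (i.e., level 2 protocol) */
	struct ns_addr	idp_dna;	/* Destination Network Address */
	struct ns_addr	idp_sna;	/* Source Network Address */
};

#define	NSPROTO_PE	4		/* Packet Exchange */

/*
 * Fill in every field the user is responsible for.  The checksum and
 * length are zeroed here because the kernel computes them.
 */
void
idp_fill_proto(struct idp *hdr, const struct ns_addr *src,
    const struct ns_addr *dst, unsigned char type)
{
	assert(hdr != NULL && src != NULL && dst != NULL);
	memset(hdr, 0, sizeof (*hdr));
	hdr->idp_tc = 0;		/* hop count starts at zero */
	hdr->idp_pt = type;		/* packet type chosen by the caller */
	hdr->idp_sna = *src;		/* address of the sending host */
	hdr->idp_dna = *dst;		/* address of the intended receiver */
}
```
.DE
The filled-in header would then be written in front of the data on each
send, in the same manner as the SPP prototype header shown earlier.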
996: .NH 2
997: Three-way Handshake
998: .PP
The semantics of SPP connections indicate that a three-way
handshake, involving changes in the datastream type, should
(but is not absolutely required to) take place before an SPP
connection is closed. Almost all SPP connections are
``well-behaved'' in this manner; when communicating with
any process, it is best to assume that the three-way handshake
is required unless it is known for certain that it is not.
In a three-way close, the closing process
indicates that it wishes to close the connection by sending
a zero-length packet with end-of-message set and with
datastream type 254. The other side of the connection
indicates that it is OK to close by sending a zero-length
packet with end-of-message set and datastream type 255. Finally,
the closing process replies with a zero-length packet with
datastream type 255; at this point, the connection is considered
closed. The following code fragments are simplified examples
of how one might handle this three-way handshake at the user
level; in the future, support for this type of close will
probably be provided as part of the C library or as part of
the kernel. The first code fragment below illustrates how a process
might handle the three-way handshake if it sees that the process it
is communicating with wants to close the connection:
1021: .DS
#include <sys/types.h>
#include <sys/socket.h>
#include <netns/ns.h>
#include <netns/sp.h>
 ...
#ifndef SPPSST_END
#define SPPSST_END	254
#define SPPSST_ENDREPLY	255
#endif
struct sphdr proto_sp;
int s;
 ...
/* SO_HEADERS_ON_INPUT must be on for buf to begin with the SPP header */
read(s, buf, BUFSIZE);
if (((struct sphdr *)buf)->sp_dt == SPPSST_END) {
	/*
	 * SPPSST_END indicates that the other side wants to
	 * close.
	 */
	proto_sp.sp_dt = SPPSST_ENDREPLY;
	proto_sp.sp_cc = SP_EM;
	setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp,
	    sizeof(proto_sp));
	write(s, buf, 0);
	/*
	 * Write a zero-length packet with datastream type = SPPSST_ENDREPLY
	 * to indicate that the close is OK with us.  The packet that we
	 * don't see (because we don't look for it) is another packet
	 * from the other side of the connection, with SPPSST_ENDREPLY
	 * on it, too.  Once that packet is sent, the connection is
	 * considered closed; note that we really ought to retransmit
	 * the close for some time if we do not get a reply.
	 */
	close(s);
}
 ...
1057: .DE
1058: To indicate to another process that we would like to close the
1059: connection, the following code would suffice:
1060: .DS
1061: #include <sys/types.h>
1062: #include <sys/socket.h>
1063: #include <netns/ns.h>
1064: #include <netns/sp.h>
1065: ...
1066: #ifndef SPPSST_END
1067: #define SPPSST_END 254
1068: #define SPPSST_ENDREPLY 255
1069: #endif
1070: struct sphdr proto_sp;
1071: int s;
1072: ...
1073: proto_sp.sp_dt = SPPSST_END;
1074: proto_sp.sp_cc = SP_EM;
1075: setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp,
1076: sizeof(proto_sp));
1077: write(s, buf, 0); /* send the end request */
1078: proto_sp.sp_dt = SPPSST_ENDREPLY;
1079: setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp,
1080: sizeof(proto_sp));
1081: /*
1082: * We assume (perhaps unwisely)
1083: * that the other side will send the
1084: * ENDREPLY, so we'll just send our final ENDREPLY
1085: * as if we'd seen theirs already.
1086: */
1087: write(s, buf, 0);
1088: close(s);
1089: ...
1090: .DE
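.PP
Both fragments above skip waiting for the packet the other side is
expected to send. The sequencing itself can be written down as a small
state machine that is independent of any socket calls. The following
sketch does so; the state names and the routines \fIspp_close_begin\fP
and \fIspp_close_input\fP are hypothetical, and the sketch does not
attempt retransmission or the case in which both sides begin a close at once.
.DS
```c
#include <assert.h>
#include <stddef.h>

#define	SPPSST_END	254
#define	SPPSST_ENDREPLY	255
#define	SPP_NO_REPLY	(-1)	/* nothing needs to be sent */

enum spp_close_state {
	SPP_OPEN,		/* normal data transfer */
	SPP_END_SENT,		/* we sent END and await an ENDREPLY */
	SPP_ENDREPLY_SENT,	/* we answered END and await the final ENDREPLY */
	SPP_CLOSED		/* handshake complete */
};

/* Start a close from our side; returns the datastream type to send. */
int
spp_close_begin(enum spp_close_state *st)
{
	assert(st != NULL);
	if (*st != SPP_OPEN)
		return (SPP_NO_REPLY);
	*st = SPP_END_SENT;
	return (SPPSST_END);
}

/*
 * Advance the handshake on receipt of a packet with datastream type dt.
 * Returns the datastream type of the zero-length packet to send in
 * response, or SPP_NO_REPLY if nothing should be sent.
 */
int
spp_close_input(enum spp_close_state *st, unsigned char dt)
{
	assert(st != NULL);
	switch (*st) {
	case SPP_OPEN:
		if (dt == SPPSST_END) {
			*st = SPP_ENDREPLY_SENT;
			return (SPPSST_ENDREPLY);
		}
		break;
	case SPP_END_SENT:
		if (dt == SPPSST_ENDREPLY) {
			*st = SPP_CLOSED;
			return (SPPSST_ENDREPLY);
		}
		break;
	case SPP_ENDREPLY_SENT:
		if (dt == SPPSST_ENDREPLY)
			*st = SPP_CLOSED;
		break;
	case SPP_CLOSED:
		break;
	}
	return (SPP_NO_REPLY);
}
```
.DE
A caller would invoke \fIclose\fP on the socket only once the state
reaches SPP_CLOSED, rather than immediately after its own write.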
1091: .NH 2
1092: Packet Exchange
1093: .PP
1094: The Xerox standard protocols include a protocol that is both
1095: reliable and datagram-oriented. This protocol is known as
1096: Packet Exchange (PEX or PE) and, like SPP, is layered on top
1097: of IDP. PEX is important for a number of things: Courier
1098: remote procedure calls may be expedited through the use
1099: of PEX, and many Xerox servers are located by doing a PEX
1100: ``BroadcastForServers'' operation. Although there is no
1101: implementation of PEX in the kernel,
1102: it may be simulated at the user level with some clever coding
1103: and the use of one peculiar \fIgetsockopt\fP. A PEX packet
1104: looks like:
1105: .DS
1106: .if t .ta \w'struct 'u +\w" struct idp"u +2.0i
1107: /*
1108: * The packet-exchange header shown here is not defined
1109: * as part of any of the system include files.
1110: */
1111: struct pex {
1112: struct idp p_idp; /* idp header */
1113: u_short ph_id[2]; /* unique transaction ID for pex */
1114: u_short ph_client; /* client type field for pex */
1115: };
1116: .DE
1117: The \fIph_id\fP field is used to hold a ``unique id'' that
1118: is used in duplicate suppression; the \fIph_client\fP
1119: field indicates the PEX client type (similar to the packet
1120: type field in the IDP header). PEX reliability stems from the
1121: fact that it is an idempotent (``I send a packet to you, you
1122: send a packet to me'') protocol. Processes on each side of
1123: the connection may use the unique id to determine if they have
1124: seen a given packet before (the unique id field differs on each
1125: packet sent) so that duplicates may be detected, and to indicate
1126: which message a given packet is in response to. If a packet with
1127: a given unique id is sent and no response is received in a given
1128: amount of time, the packet is retransmitted until it is decided
1129: that no response will ever be received. To simulate PEX, one
1130: must be able to generate unique ids -- something that is hard to
1131: do at the user level with any real guarantee that the id is really
1132: unique. Therefore, a means (via \fIgetsockopt\fP) has been provided
1133: for getting unique ids from the kernel. The following code fragment
1134: indicates how to get a unique id:
1135: .DS
1136: long uniqueid;
1137: int s, idsize = sizeof(uniqueid);
1138: ...
1139: s = socket(AF_NS, SOCK_DGRAM, 0);
1140: ...
1141: /* get id from the kernel -- only on IDP sockets */
1142: getsockopt(s, NSPROTO_PE, SO_SEQNO, (char *)&uniqueid, &idsize);
1143: ...
1144: .DE
1145: The retransmission and duplicate suppression code required to
1146: simulate PEX fully is left as an exercise for the reader.
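.PP
As a starting point for that exercise, duplicate suppression can be
approximated by remembering the most recently seen ids. The following is
a sketch under stated assumptions rather than a full PEX implementation:
the history size is arbitrary, the id is treated as a single 32-bit value
(the two halves of \fIph_id\fP taken together), and the names
\fIpex_history\fP and \fIpex_is_duplicate\fP are hypothetical.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	PEX_HISTORY	8	/* number of recent ids remembered (arbitrary) */

struct pex_history {
	uint32_t ids[PEX_HISTORY];	/* recently seen ids */
	size_t	 count;			/* number of valid entries */
	size_t	 next;			/* slot to overwrite next */
};

/*
 * Return 1 if id was seen recently (the packet is a duplicate).
 * Otherwise record it, displacing the oldest entry once the
 * history is full, and return 0.
 */
int
pex_is_duplicate(struct pex_history *h, uint32_t id)
{
	size_t i;

	assert(h != NULL);
	for (i = 0; i < h->count; i++)
		if (h->ids[i] == id)
			return (1);
	h->ids[h->next] = id;
	h->next = (h->next + 1) % PEX_HISTORY;
	if (h->count < PEX_HISTORY)
		h->count++;
	return (0);
}
```
.DE
A fixed-size history keeps the cost of each check bounded, at the price
of forgetting ids older than the last PEX_HISTORY packets.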
1147: .NH 2
1148: Inetd
1149: .PP
1150: One of the daemons provided with 4.3BSD is \fIinetd\fP, the
so-called ``internet super-server.'' \fIInetd\fP is invoked at boot
1152: time, and determines from the file \fI/etc/inetd.conf\fP the
1153: servers for which it is to listen. Once this information has been
1154: read and a pristine environment created, \fIinetd\fP proceeds
1155: to create one socket for each service it is to listen for,
1156: binding the appropriate port number to each socket.
1157: .PP
\fIInetd\fP then performs a \fIselect\fP on all these
sockets for read availability, waiting for somebody wishing
a connection to the service corresponding to
that socket. When a request arrives, \fIinetd\fP performs an \fIaccept\fP on
the socket in question, \fIfork\fPs, \fIdup\fPs the new
socket to file descriptors 0 and 1 (stdin and
stdout), closes other open file
descriptors, and \fIexec\fPs the appropriate server.
1166: .PP
1167: Servers making use of \fIinetd\fP are considerably simplified,
1168: as \fIinetd\fP takes care of the majority of the IPC work
1169: required in establishing a connection. The server invoked
1170: by \fIinetd\fP expects the socket connected to its client
1171: on file descriptors 0 and 1, and may immediately perform
1172: any operations such as \fIread\fP, \fIwrite\fP, \fIsend\fP,
1173: or \fIrecv\fP. Indeed, servers may use
1174: buffered I/O as provided by the ``stdio'' conventions, as
long as they remember to use \fIfflush\fP when appropriate.
1176: .PP
1177: One call which may be of interest to individuals writing
1178: servers under \fIinetd\fP is the \fIgetpeername\fP call,
1179: which returns the address of the peer (process) connected
1180: on the other end of the socket. For example, to log the
1181: Internet address in ``dot notation'' (e.g., ``128.32.0.4'')
1182: of a client connected to a server under
1183: \fIinetd\fP, the following code might be used:
1184: .DS
1185: struct sockaddr_in name;
1186: int namelen = sizeof (name);
1187: ...
1188: if (getpeername(0, (struct sockaddr *)&name, &namelen) < 0) {
1189: syslog(LOG_ERR, "getpeername: %m");
1190: exit(1);
1191: } else
1192: syslog(LOG_INFO, "Connection from %s", inet_ntoa(name.sin_addr));
1193: ...
1194: .DE
1195: While the \fIgetpeername\fP call is especially useful when
1196: writing programs to run with \fIinetd\fP, it can be used
1197: under other circumstances. Be warned, however, that \fIgetpeername\fP will
1198: fail on UNIX domain sockets.
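.PP
On a modern POSIX system the same lookup might be packaged as follows.
This is a sketch only: it substitutes \fIinet_ntop\fP for \fIinet_ntoa\fP
so that the caller supplies the buffer, it handles only Internet (AF_INET)
peers, and the names \fIformat_inet_peer\fP and \fIpeer_name\fP are
hypothetical.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format an Internet address in dot notation into buf.
 * Returns 0 on success, or -1 if sa is not an AF_INET address
 * or buf is too small.
 */
int
format_inet_peer(const struct sockaddr *sa, char *buf, size_t len)
{
	const struct sockaddr_in *sin;

	assert(sa != NULL && buf != NULL);
	if (sa->sa_family != AF_INET)
		return (-1);
	sin = (const struct sockaddr_in *)sa;
	if (inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)len) == NULL)
		return (-1);
	return (0);
}

/*
 * Look up the peer connected to fd and format its address into buf.
 * Returns 0 on success, or -1 if getpeername fails or the peer is
 * not an Internet address.
 */
int
peer_name(int fd, char *buf, size_t len)
{
	struct sockaddr_storage ss;
	socklen_t sslen = sizeof (ss);

	memset(&ss, 0, sizeof (ss));
	if (getpeername(fd, (struct sockaddr *)&ss, &sslen) < 0)
		return (-1);
	return (format_inet_peer((struct sockaddr *)&ss, buf, len));
}
```
.DE
A server started by \fIinetd\fP would pass descriptor 0 to
\fIpeer_name\fP and hand the resulting string to \fIsyslog\fP.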