.\" Copyright (c) 1986 Regents of the University of California.
.\" All rights reserved. The Berkeley software License Agreement
.\" specifies the terms and conditions for redistribution.
.\"
.\" @(#)netinet.t 1.8 (Berkeley) 4/11/86
.\"
.hw SUBNETSARELOCAL
.NH
Internet network protocols
.PP
There are numerous bug fixes and extensions in the Internet
protocol support (\fB/sys/netinet\fP).
This section describes some of the more important changes
with very little detail.
As many of the changes span several source files,
and as it is very difficult to merge this code with earlier versions
of these protocols,
it is strongly recommended that the 4.3BSD network be adopted
intact, with local hacks merged into it only if necessary.
.NH 2
Internet common code
.PP
By far, the most important change in IP and the shared Internet support
layer is the addition of subnetwork addressing.
This facility is used (and required) by a number of large university
and other networks that include multiple physical networks
as well as connections with the DARPA Internet.
Subnet support allows a collection of interconnected local networks
to share a single network number,
hiding the complexity of the local environment and routing
from external hosts and gateways.
The subnet support in 4.3BSD conforms with the Internet standard
for subnet addressing, RFC-950.
For each network interface, a network mask is set along with the address.
This mask determines which portion of the address is the network number,
including the subnet, and by default is set according to the network
class (A, B, or C, with 8, 16, or 24 bits of network part, respectively).
Within a subnetted network each subnet appears as a distinct network;
externally, the entire network appears to be a single entity.
.PP
Another important change in IP addressing
is a change to the default IP broadcast address.
The default broadcast address is the address with a host part of all ones
(using the definition INADDR_BROADCAST),
in conformance with RFC-919.
In 4.2BSD, the broadcast address was the address with a host part
of all zeros (INADDR_ANY).
To facilitate the conversion process,
and to help avoid breaking networks with forwarded broadcasts,
4.3BSD allows the broadcast address to be set for each interface.
IP recognizes and accepts network broadcasts
as well as subnet broadcasts when subnets are enabled.
Such broadcasts normally originate from hosts that do not know about subnets.
IP also accepts old-style (4.2) broadcasts using a host part of all
zeros, either as a network or subnet broadcast.
An address of all ones
is recognized as ``broadcast on this network,'' and an address of all
zeros is accepted as well.
The latter two are sometimes used in
broadcast information requests or network mask requests in the course
of starting a diskless workstation.
ICMP includes support for the Network Mask Request and Response.
A new routine, \fIin_broadcast\fP,
was added for the use of link layer output routines
to determine whether an IP packet should be broadcast.
.PP
Network numbers are now stored and used unshifted to
minimize conversions and reduce the overhead associated with comparisons.
4.2BSD shifted network numbers to the low-order part of the word.
The structure defining Internet addresses no longer includes
the old IMP-host fields, but only a featureless 32-bit address.
.XP in.h
The definitions of Internet port numbers in this file
were deleted, as they have been superseded by the \fIgetservbyname\fP
interface.
A definition was added for the single
option at the IP level accessible through \fIsetsockopt\fP,
IP_OPTIONS.
.XP in_pcb.h
The Internet protocol control block includes a pointer to an optional
mbuf containing IP options.
.XP in_var.h
This new header file contains the declaration of the Internet
variety of the per-interface address information.
The \fIin_ifaddr\fP structure includes the network, subnet, network mask
and broadcast information.
.XP in.c
The \fIif_*\fP routines which manipulate Internet addresses
were renamed to \fIin_*\fP.
\fIin_netof\fP and \fIin_lnaof\fP check whether the address
is for a directly-connected network, and if so they use the local
network mask to return the subnet/net and host portions, respectively.
\fIin_localaddr\fP determines whether an address corresponds
to a directly-connected network.
By default, this includes any subnet of a local network;
a configuration option, SUBNETSARELOCAL=0, changes this to return
true only for a directly-connected subnet or non-subnetted network.
Interface \fIioctl\fPs that get or set addresses or related status information
are forwarded to \fIin_control\fP, which implements them.
\fIin_iaonnetof\fP replaces \fIif_ifonnetof\fP for Internet addresses only.
.XP in_pcb.c
The destination address of a \fIconnect\fP may be given as INADDR_ANY (0)
as a shorthand notation for ``this host.''
This simplifies the process of connecting to local servers
such as the name-domain server that translates host names to addresses.
Also, the short-hand address INADDR_BROADCAST is converted to the broadcast
address for the primary local network; it fails if that network
is incapable of broadcast.
The source address for a connection or datagram
is selected according to the outgoing interface;
the initial route is allocated at this time and stored
in the protocol control block, so that it may be used again
when actually sending the packet(s).
The \fIin_pcbnotify\fP routine was generalized to apply any function
and/or report an error to all connections to a destination;
it is used to notify connections of routing changes and other
non-error situations as well as errors.
New entries have been added to this level to invalidate cached
routes when routing changes occur,
as well as to report possible routing failures detected by
higher levels.
.XP in_proto.c
The protocol switch table for Internet protocols includes entries
for the \fIctloutput\fP routines.
ICMP may be used with raw sockets.
A raw wildcard entry allows raw sockets to use any protocol
not already implemented in the kernel (e.g., EGP).
.NH 2
IP
.PP
Support was added for IP source routing and other IP options
(partly derived from BBN's implementation).
On output, IP options such as strict or loose source route and record
may be set by a client process using TCP, UDP or raw IP sockets.
IP properly updates source-route and record-route options
when forwarding (and leaves them in the packet, unlike 4.2 which
stripped them out after updating).
IP input preserves any source-routing information in an incoming packet
and passes it up to the receiving protocol upon request,
reversing it and arranging it in the same way as user-supplied options.
Both TCP and ICMP retrieve incoming source routes for use in replies.
Most of the option-handling code has been converted to use
\fIbcopy\fP instead of structure assignments when copying addresses,
as the alignment in the incoming packet may not be correct for the host.
This is not required on the VAX, but is needed on most other machines
running 4.2BSD.
.XP ip.h
The IP time-to-live field is decremented by one when forwarding;
in 4.2BSD this value was five.
.XP ip_var.h
Data structures and definitions were added for storing
IP options.
New fields have been added to the structure containing IP statistics.
.XP ip_input.c
The changes to save and present incoming IP source-routing information
to higher level protocols are in this file.
The identity of the interface that received the packet is also
determined by \fIip_input\fP and passed to the next protocol
receiving the packet.
To avoid using uninitialized data structures,
IP must not begin receiving packets until at least one Internet address
has been set.
A bug in the reassembly of IP packets with options has been corrected.
Machines with only a single network interface (in addition to the loopback
interface) no longer attempt to forward received IP packets that are
not destined for them;
they also do not respond with ICMP errors unless configured with
the GATEWAY option.
This change prevents large increases in network activity which used to result
when an IP packet that was broadcast was not understood as a broadcast.
A one-element route cache was added to the IP forwarding routine.
When a packet is forwarded using the same interface on which it arrived,
if the source host is on the directly-attached network,
an ICMP redirect is sent to the source.
If the route used for forwarding was a route to a host
or a route to a subnet,
a host redirect is used, otherwise a network redirect is sent.
The generation of redirects may be disabled by a configuration option,
IPSENDREDIRECTS=0.
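The redirect rules just described amount to a short decision procedure,
sketched here in C.
This is an illustration only: the type, field and function names are
hypothetical and are not drawn from the kernel sources,
and the \fIsend_redirects\fP argument merely stands in for the
IPSENDREDIRECTS configuration option.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none are taken from the 4.3BSD sources. */
enum redirect_kind { REDIRECT_NONE, REDIRECT_HOST, REDIRECT_NET };

struct fwd_info {
	bool same_interface;	/* packet leaves on the interface it arrived on */
	bool source_is_local;	/* source host is on the directly-attached network */
	bool route_to_host;	/* forwarding route was a host route */
	bool route_to_subnet;	/* forwarding route was a subnet route */
};

/*
 * Decide which ICMP redirect, if any, the text above calls for.
 * send_redirects models the IPSENDREDIRECTS option (false means =0).
 */
static enum redirect_kind
choose_redirect(const struct fwd_info *f, bool send_redirects)
{
	if (!send_redirects)
		return REDIRECT_NONE;
	if (!f->same_interface || !f->source_is_local)
		return REDIRECT_NONE;
	if (f->route_to_host || f->route_to_subnet)
		return REDIRECT_HOST;
	return REDIRECT_NET;
}
```

A packet forwarded back out its arrival interface along an ordinary
network route yields REDIRECT_NET in this sketch;
the same packet routed by a subnet route yields REDIRECT_HOST.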
More statistics are collected, in particular on traffic and fragmentation.
The \fIip_ctlinput\fP routine was moved to each of the upper-level
protocols, as they each have somewhat different requirements.
.XP ip_output.c
The IP output routine manages a cached route in the protocol
control block for each TCP, UDP or raw IP socket.
If the destination has changed, the route has been marked down,
or the route was freed because of a routing change, a new route
is obtained.
The route is not used if the IP_ROUTETOIF (aka SO_DONTROUTE or MSG_DONTROUTE)
option is present.
Preformed IP options passed to \fIip_output\fP are inserted,
changing the destination address as required.
The \fIip_ctloutput\fP routine allows options to be set for an individual
socket, validating and internalizing them as appropriate.
.XP raw_ip.c
The type-of-service and offset fields in the IP header
are set to zero on output.
The SO_DONTROUTE flag is handled properly.
.NH 2
ICMP
.PP
There have been numerous fixes and corrections to ICMP.
Length calculations have been corrected, allowing
most ICMP packet lengths to be received and allowing errors
to be sent about smaller input packets.
ICMP now uses information about the interface on which a message
was received to determine the
correct source address on returned error packets
and replies to information requests.
Support was added for the Network Mask Request.
Responses to source-routed requests use the reversed source route
for the return trip.
Timestamps are created with \fImicrotime\fP, allowing 1-millisecond
resolution.
The \fIicmp_error\fP routine is capable of sending ICMP redirects.
When processing network redirects, the returned source address is converted
to a network address before passing it to the routing redirect handler.
The translation of ICMP errors to Unix error returns was updated.
.NH 2
TCP
.PP
In addition to bug fixes, several performance changes have been
made to TCP.
Several of these address overall network performance and congestion
avoidance, while others address performance of an individual connection.
The most important changes concern the TCP send policy.
First, the sender silly-window syndrome avoidance strategy was fixed.
In 4.2BSD, the amount that could be sent was compared to the offered window,
and thus small amounts could still be sent if the receiver offered
a silly window.
Once this was fixed, there were problems with peers that never offered
windows large enough for a maximum segment, or at least 512 bytes
(e.g., the peer is a TAC or an IBM PC).
Code was then added to maintain estimates of the peer's receive and send
buffer sizes.
The send policy will now send if the offered
window is at least one-half of the receiver's buffer, as well as when
the window is at least a full-sized segment.
(When the window is large enough for all data that is queued,
the data will also be sent.)
The send buffer size estimate is not yet used, but is desired for a new
delayed-acknowledgement scheme that has yet to be tested.
Another problem that was exposed when the silly-window avoidance was fixed
was that the persist code didn't expect to be used with a non-zero window.
The persist now lasts only until the first timeout, at which time
a packet is sent of the largest size allowed by the window.
If this packet is not acknowledged, the output routine must begin retransmission
rather than returning to the persist state.
.PP
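The window tests in the send policy described above can be sketched in C
as follows.
This is only an illustration under stated assumptions:
the structure, field and function names are hypothetical and are not
taken from the 4.3BSD sources,
and the persist handling is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state; not the kernel's own structure. */
struct send_state {
	size_t queued;		/* bytes waiting in the send buffer */
	size_t window;		/* window currently offered by the peer */
	size_t maxseg;		/* maximum segment size for the connection */
	size_t peer_buf_est;	/* estimated peer receive buffer, 0 if unknown */
};

/*
 * Return true if the silly-window avoidance rules in the text
 * would allow data to be sent now.
 */
static bool
should_send(const struct send_state *s)
{
	if (s->queued == 0 || s->window == 0)
		return false;
	/* The window holds at least a full-sized segment. */
	if (s->window >= s->maxseg)
		return true;
	/* The window is large enough for all data that is queued. */
	if (s->window >= s->queued)
		return true;
	/* The window is at least one-half of the peer's buffer. */
	if (s->peer_buf_est > 0 && s->window * 2 >= s->peer_buf_est)
		return true;
	return false;
}
```

With a 1024-byte segment size, 2000 bytes queued and a peer buffer estimated
at 4096 bytes, an offered window of 100 bytes is refused by this sketch,
while a peer whose buffer is estimated at 512 bytes is sent data
as soon as it offers 256 bytes.
.PP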
Another change related to the send policy is a strategy designed to minimize
the number of small packets outstanding on slow links.
This is an implementation of an algorithm proposed by John Nagle
in RFC-896.
The algorithm is very simple:
when there is outstanding, unacknowledged data pending
on a connection, new data are not sent unless they fill a maximum-sized
segment.
This allows bulk data transfers to proceed,
but causes small-packet traffic such as remote login to bundle together
data received during a single round-trip time.
On high-bandwidth, low-delay networks such as a local Ethernet,
this change seldom causes delay, but over slow links or across the Internet,
the number of small packets can be reduced considerably.
This algorithm does interact poorly with one type of usage, however,
as demonstrated by the X window system.
When small packets are sent in a stream, such as when doing rubber-banding
to position a new window, and when no echo or other acknowledgement
is being received from the other end of the connection,
the round-trip delay becomes as large as the delayed-acknowledgement timer
on the remote end.
For such clients, a TCP option may be set with \fIsetsockopt\fP
to defeat this part of the send policy.
.PP
For bulk-data transfers, the largest single change to improve performance
is to increase the size of the send and receive buffers.
The default buffer size in 4.3BSD is 4096 bytes, double the value in 4.2BSD.
These values allow more outstanding data and reduce the amount of time
waiting for a window update from the receiver.
They also improve the utility of the delayed-acknowledgement strategy.
The delayed-acknowledgement strategy withholds acknowledgements
until a window update would uncover at least 35% of the window;
in 4.2BSD, with 1024-byte packets on an Ethernet and 2048-byte windows,
this took only a single packet.
With 4096-byte windows, up to 50% of the acknowledgements may be avoided.
.PP
The use of larger buffers might cause problems when bulk-data transfers
must traverse several networks and gateways with limited buffering capacity.
The source-quench ICMP message was provided to allow gateways in such
circumstances to cause source hosts to slow their rate of packet injection
into the network.
While 4.2BSD ignored such messages, the 4.3BSD TCP includes a mechanism
for throttling back the sender when a source quench is received.
This is done by creating an artificially small window (one which is 80%
of the outstanding data at the time the quench is received, but no less than
one segment).
This artificial congestion window is slowly opened as acknowledgements
are received.
The result under most circumstances is a slow fluctuation around the buffering
limit of the intermediate gateways, depending on the other traffic flowing
at the same time.
.PP
A final set of changes designed to improve network throughput
concerns the retransmission policy.
The retransmission timer is set according to the current round-trip
time estimate.
Unfortunately, the round-trip timing code in 4.2BSD had several bugs
which caused retransmissions to begin much too early.
These bugs in round-trip timing have been corrected.
Also, the retransmission code has been tuned, using a faster
backoff after the first retransmission.
On an initial connection request where there is no round-trip time estimate,
a much more conservative policy is used.
When a slow link intervenes between the sender and the destination,
this policy avoids queuing large numbers of retransmitted connection requests
before a reply can be received. It also avoids saturation when
the destination host
is down or nonexistent.
During a connection, when the retransmission timer expires,
only a single packet is sent.
When only a single packet has been lost, this avoids resending
data that was successfully received;
when a host has gone down or become unreachable, it avoids sending
multiple packets at each timeout.
Once another acknowledgement is received, the transmission policy
returns to normal.
.PP
4.2BSD offered a maximum receive segment size of 1024 for all connections,
and accepted such offers whenever made.
However, that size was especially poor for the Arpanet
and other 1822-based IMP networks (sorry, make that PSN networks)
where the maximum packet size is 1007 bytes.
This was compounded by a bug in the LH/DH driver that did not allow
space for an end-of-packet bit in the receive buffer,
and thus maximum size packets that were received were split across buffers.
This, in turn, aggravated a hardware
problem causing small packets following a segmented packet to be concatenated
with the previous packet.
The result of this set of conditions was that performance across
the Arpanet was sometimes abominably slow.
The maximum size segment selected by 4.3BSD is chosen according
to the destination and the interface to be used.
The segment size chosen is somewhat less than the maximum transmission unit
of the outgoing interface.
If the destination is not local,
the segment size is a convenient small size near
the default maximum size (512 bytes).
This value is both the maximum segment size
offered to the sender by the receive side,
and the maximum size segment that will be sent.
Of course, the send size is also limited
to be no more than the receiver has indicated it is willing to receive.
.PP
The initial sequence number prototype for TCP is now
incremented much more quickly; this has exposed two bugs.
Both the window-update receiving code and the urgent data receiving
code compared sequence numbers to 0 the first time they were called
on a connection. This fails if the initial sequence number has
wrapped around to negative numbers. Both are now initialized
when the connection is set up. This still remains a problem
in maintaining compatibility with 4.2BSD systems;
thus an option, TCP_COMPAT_42, was added to avoid using such sequence numbers
until 4.2 systems have been upgraded.
.PP
Additional changes in TCP are listed by source file:
.XP tcp_input.c
The common case of TCP data input, the arrival of the next
expected data segment with an empty reassembly queue, was made
into a simplified macro for efficiency.
\fITcp_input\fP was modified to know when it needed to call the output side,
reducing unnecessary tests for most acknowledgement-only packets.
The receive window size calculation on input was modified
to avoid shrinking the offered window;
this change was needed due to a change in input data
packaging by the link layer.
A bug in handling TCP packets received with both data and options
(that are not supposed to be used) has been corrected.
If data is received on a connection after the process has closed,
the other end is sent a reset, preventing connections from
hanging in CLOSE_WAIT on one end and FIN_WAIT_2 on the other.
(4.2BSD contained code to do this, but it was never executed
because such input packets had already been dropped
as being outside of the receive window.)
A timer is now started upon entering
FIN_WAIT_2 state if the local user has closed, closing the connection
if the final FIN is not received within a reasonable time.
Half-open connections are now reset more reliably; there were circumstances
under which one end could be rebooted, and new connection requests
that used the same port number might not receive a reset.
The urgent-data code was modified to remember which data had
already been read by the user, avoiding possible confusion if two
urgent-data signals were received close together.
Another change was made specifically for connections with a TAC.
The TAC doesn't fill in the window field on its initial packet (SYN),
and the apparent window is random.
There is some question as to the validity of the window field
if the packet does not have ACK set,
and therefore TCP was changed to ignore the window information
on those packets.
.XP tcp_output.c
The advertised window is never allowed to shrink,
in correspondence with the earlier change in the input handler.
The retransmit code was changed to check for shrinking windows,
updating the connection state rather than timing out
while waiting for acknowledgement.
The modifications to the send policy described above are largely
within this file.
.XP tcp_timer.c
The timer routines were changed to allow a longer wait for acknowledgements.
(TCP would generally time out before the routing protocol
had changed routes.)
.NH 2
UDP
.PP
An error in the checksumming of output UDP packets was corrected.
Checksums are now checked by default, unless the COMPAT_42 configuration
option is specified; it is provided to allow communication with the 4.2BSD UDP
implementation, which generates incorrect checksums.
When UDP datagrams are received for a port at which no process is listening,
ICMP unreachable messages are sent in response unless the input packet
was a broadcast.
The size of the receive buffer was increased, as several large datagrams
and their attached addresses could otherwise fill the buffer.
The time-to-live of output datagrams was reduced from 255
to 30.
UDP uses its own \fIctlinput\fP routine for handling of ICMP errors,
so that errors may be reported to the sender without closing the socket.
.NH 2
Address Resolution Protocol
.PP
The address resolution protocol has been generalized somewhat.
It was specific for IP on 10\ Mb/s Ethernet; it now handles multiple
protocols on 10 Mb/s Ethernet and could easily be adapted to other
hardware as well.
This change was made while adding ARP resolution
of trailer protocol addresses.
Hosts desiring to receive trailer
encapsulations must now indicate that by the use of ARP. This allows
trailers to be used between cooperating 4.3 machines while using
non-trailer encapsulations with other hosts.
The negotiation need not be symmetrical: a VAX may request trailers,
for example, and a SUN may note this and send trailer packets
to the VAX without itself requesting trailers.
This change requires modifications to the 10 Mb/s Ethernet drivers,
which must provide an additional argument to \fIarpresolve\fP,
a pointer for the additional return value indicating whether trailer
encapsulations may be sent.
With this change, the IFF_NOTRAILERS flag on each interface is interpreted
to mean that trailers should not be requested.
Modifications to ARP from SUN Microsystems add \fIioctl\fP operations
to examine and modify entries in the ARP address translation table,
and to allow ARP translations to be ``published.''
When future requests are received for Ethernet address translations,
if the translation is in the table and is marked as published,
they will be answered for that host.
Those modifications superseded the ``oldmap'' algorithmic translation
from IP addresses, which has been removed.
Packets are not forwarded to the loopback interface if it is not marked
up, and a bug causing an mbuf to be freed twice
if the loopback output fails was corrected.
ARP complains if a host lists the broadcast address as its Ethernet address.
The ARP tables were enlarged to reflect larger network configurations
now in use.
A new function for use in driver messages, \fIether_sprintf\fP,
formats a 48-bit Ethernet address and returns a pointer to the resulting string.
.NH 2
IMP support
.PP
The support facilities for connections to an 1822 (or X.25) IMP port
(\fB/sys/netimp\fP)
have had several bug fixes and one extension.
Unit numbers are now checked more carefully during autoconfiguration.
Code from BRL was installed to support class B and C networks.
Error packets received from the IMP such as Host Dead are queued
in the interrupt handler for reprocessing from a software interrupt,
avoiding state transitions in the protocols at priorities above \fIsplnet\fP.
The host-dead timer is no longer restarted when attempting new output,
as a persistent sender could otherwise prevent new output from being attempted
once a host was reported down.
The network number is always taken from the address
configured for the interface at boot time;
network 10 is no longer assumed.
A timer is used to prevent blocking if RFNM messages from the IMP are lost.
A race was fixed when freeing mbufs containing host table entries,
as the mbuf had been used after it was freed.