43BSDTahoe/new/X/doc/Paper/x.mss - annotate

Annotation of 43BSDTahoe/new/X/doc/Paper/x.mss, revision 1.1.1.1

1.1       root        1: @device(postscript)
                      2: @make(article)
                      3: @style(references=cacm)
                      4: @set(page=+1)
                      5: 
                      6: @majorheading(The X Window System)
                      7: @center(Robert W. Scheifler@footnote( 545 Technology Square, Cambridge, MA 02139.)
                      8: MIT Laboratory for Computer Science
                      9: 
                     10: Jim Gettys@footnote( Project Athena, MIT, Cambridge, MA 02139.)
                     11: Digital Equipment Corporation
                     12: MIT Project Athena
                     13: 
                     14: July 1986
                     15: Revised October 1986@footnote( To appear in Transactions on Graphics #63, 
                     16: Special Issue on User Interface Software, Copyright 1986, 
                     17: Association for Computing Machinery. Permission to copy without fee all or
                     18: part of this material is granted provided that the copies are not made or 
                     19: distributed for direct commercial advantage, the ACM copyright notice and the 
                     20: title of the publication and its date appear, 
                     21: and notice is given that copying is by permission of the Association for
                     22: Computing Machinery.
                     23: To copy otherwise, or to republish, requires a fee and/or specific permission.))
                     24: 
                     25: @blankspace(2 lines)
                     26: 
                     27: @begin(abstract)
                     28: 
                     29: An overview of the X Window System is presented, focusing on the system
                     30: substrate and the low-level facilities provided to build applications and to
                     31: manage the desktop.  The system provides high-performance, high-level,
                     32: device-independent graphics.  A hierarchy of resizable, overlapping windows
                     33: allows a wide variety of application and user interfaces to be built easily.
                     34: Network-transparent access to the display provides an important degree of
                     35: functional separation, without significantly affecting performance, that is
                     36: crucial to building applications for a distributed environment.  To a
                     37: reasonable extent, desktop management can be custom tailored to individual
                     38: environments, without modifying the base system and typically without affecting
                     39: applications.
                     40: 
                     41: Categories and Subject Descriptors:  C.2.2 [@b(Computer-Communication Networks)]:
                     42: Network Protocols - @i(protocol architecture); C.2.4 [@b(Computer-Communication
                     43: Networks)]: Distributed Systems - @i(distributed applications); D.4.4 [@b(Operating
                     44: Systems)]: Communication Management - @i(network communication, terminal management);
                     45: H.1.2 [@b(Information Systems)]: User/Machine Systems - @i(human factors); I.3.2
                     46: [@b(Computer Graphics)]: Graphic Systems - @i(distributed/network graphics);
                     47: I.3.4 [@b(Computer Graphics)]: Graphics Utilities - @i(graphics packages, software
                     48: support); I.3.6 [@b(Computer Graphics)]: Methodology and Techniques - @i(device
                     49: independence, interaction techniques)
                     50: 
                     51: General terms:  Design, Experimentation, Human Factors, Standardization
                     52: 
                     53: Additional Key Words and Phrases:  window systems, window managers, virtual terminals
                     54: 
                     55: @end(abstract)
                     56: 
                     57: @section(Introduction)
                     58: 
                     59: The X Window System (or simply X) developed at MIT has achieved fairly
                     60: widespread popularity recently, particularly in the Unix@footnote( Unix is a
                     61: trademark of AT&T Bell Laboratories.) community.  In this paper, we present an
                     62: overview of X, focusing on the system substrate and the low-level facilities
                     63: provided to build applications and to manage the desktop.  In X, this base
                     64: window system provides high-performance graphics to a hierarchy of resizable
                     65: windows.  Rather than mandating a particular user interface, X provides
                     66: primitives to support several policies and styles.  Unlike most window systems,
                     67: the base system in X is defined by a @i(network protocol):  asynchronous
                     68: stream-based inter-process communication replaces the traditional procedure
                     69: call or kernel call interface.  An application can utilize windows on any
                     70: display in a network in a device-independent, network-transparent fashion.
                     71: Interposing a network connection greatly enhances the utility of the window
                     72: system, without significantly affecting performance.  The performance of
                     73: existing X implementations is comparable to contemporary window systems, and in
                     74: general is limited by display hardware rather than network communication.  For
                     75: example, 19500 characters per second and 3500 short vectors per second are
                     76: possible on Digital Equipment Corporation's VAXStation-II/GPX, both locally and
                     77: over a local area network, and these figures are very close to the limits of
                     78: the display hardware.
                     79: 
                     80: X is the result of the simultaneous need for a window system from two separate
                     81: groups at MIT.  In the summer of 1984, the Argus system@cite(argus) at the
                     82: Laboratory for Computer Science needed a debugging environment for
                     83: multi-process distributed applications, and a window system seemed the only
                     84: viable solution.  Project Athena@cite(athena) was faced with dozens, and
                     85: eventually thousands, of workstations with bitmap displays, and needed a window
                     86: system to make the displays useful.  Both groups were starting with the Digital
                     87: VS100 display@cite(vs100) and VAX hardware, but it was clear at the outset that
                     88: other architectures and displays had to be supported.  In particular, equal
                     89: numbers of IBM workstations with bitmap displays of unknown type were expected
                     90: eventually within Project Athena.  Portability was therefore a goal from the
                     91: start.  Although all of the initial implementation work was for Berkeley Unix,
                     92: it was clear that the network protocol should not depend on aspects of the
                     93: operating system.
                     94: 
                     95: The name X derives from the lineage of the system.  At Stanford University,
                     96: Paul Asente and Brian Reid had begun work on the W window system@cite(w), as an
                     97: alternative to VGTS@cite(vgts1,vgts2) for the V system@cite(v).  Both VGTS and
                     98: W allow network-transparent access to the display, using the synchronous V
                     99: communication mechanism.  Both systems provide "text" windows for ASCII
                    100: terminal emulation.  VGTS provides graphics windows driven by fairly high-level
                    101: object definitions from a structured display file; W provides graphics windows
                    102: based on a simple display-list mechanism, with limited functionality.  We
                    103: acquired a Unix-based version of W for the VS100 (with synchronous
                    104: communication over TCP@cite(tcp)) done by Asente and Chris Kent at Digital's
                    105: Western Research Laboratory.  From just a few days of experimentation, it was
                    106: clear that a network-transparent hierarchical window system was desirable, but
                    107: that restricting the system to any fixed set of application-specific modes was
                    108: completely inadequate.  It was also clear that, although synchronous
                    109: communication was perhaps acceptable in the V system (due to very fast
                    110: networking primitives), it was completely inadequate in most other operating
                    111: environments.  X is our "reaction" to W.  The X window hierarchy comes directly
                    112: from W, although numerous systems have been built with hierarchy in at least
                    113: some form@cite(lucasfilm,star1,lispm,sunwin,mg1,genera,cedar,metheus,tajo).
                    114: The asynchronous communication protocol used in X is a significant improvement
                    115: over the synchronous protocol used in W, but is very similar to that used in
                    116: Andrew@cite(wm,andrew).  X differs from all of these systems in the degree to
                    117: which both graphics functions and "system" functions are pushed back (across
                    118: the network) as application functions, and in the ability to transparently
                    119: tailor desktop management.
                    120: 
                    121: The next section presents several high-level requirements that we believe a
                    122: window system must satisfy to be a viable standard in a network environment,
                    123: and indicates where the design of X fails to meet some of these requirements.
                    124: In Section 3 we describe the overall X system model, and the effect of
                    125: network-based communication on that model.  Section 4 describes the structure
                    126: of windows, and the primitives for manipulating that structure.  Section 5
                    127: explains the color model used in X, and Section 6 presents the text and
                    128: graphics facilities.  Section 7 discusses the issues of window exposure and
                    129: refresh, and their resolution in X.  Section 8 deals with input event handling.
                    130: In Section 9, we describe the mechanisms for desktop management.
                    131: 
                    132: This paper describes the version@footnote( Version 10.) of X that is currently
                    133: in widespread use.  The design of this version is inadequate in several
                    134: respects.  With our experience to date, and encouraged by the number of
                    135: universities and manufacturers taking a serious interest in X, we have designed
                    136: a new version that should satisfy a significantly wider community.  Section 10
                    137: discusses a number of problems with the current X design, and gives a general
                    138: idea of what changes are contemplated.
                    139: 
                    140: @section(Requirements)
                    141: 
                    142: A window system contains many interfaces.  A @i(programming) interface is a
                    143: library of routines and types provided in a programming language for
                    144: interacting with the window system.  Both low-level (e.g., line drawing) and
                    145: high-level (e.g., menus) interfaces are typically provided.  An @i(application)
                    146: interface is the mechanical interaction with the user and the visual appearance
                    147: that is specific to the application.  A @i(management) interface is the
                    148: mechanical interaction with the user dealing with overall control of the
                    149: desktop and the input devices.  The management interface defines how
                    150: applications are arranged and rearranged on the screen, and how the user
                    151: switches between applications; an individual application interface defines how
                    152: information is presented and manipulated within that application.  The @i(user)
                    153: interface is the sum total of all application and management interfaces.
                    154: 
                    155: Besides applications, we distinguish three major components of a window system.
                    156: The @i(window manager)@footnote( Some people use this term for what we call the
                    157: base window system; that is not the meaning here.) implements the desktop
                    158: portion of the management interface; it controls the size and placement of
                    159: application windows, and also may control application window attributes such as
                    160: titles and borders.  The @i(input manager) implements the remainder of the
                    161: management interface; it controls which applications see input from which
                    162: devices (e.g., keyboard and mouse).  The @i(base window system) is the
                    163: substrate on which applications, window managers, and input managers are built.
                    164: 
                    165: In this paper we are concerned with the base window system of X, and with the
                    166: facilities it provides to build applications and managers.  The following
                    167: requirements on the base window system crystallized during the design of X (a
                    168: few were not formulated until late in the design process):
                    169: 
                    170: @begin(enumerate)
                    171: 
                    172: @begin(multiple)
                    173: 
                    174: The system should be implementable on a variety of displays.
                    175: 
                    176: The system should work with nearly any bitmap display, and a variety of input
                    177: devices.  Our design focused on workstation-class display technology likely to
                    178: be available in a university environment over the next few years.  At one end
                    179: of the spectrum is a simple frame buffer and monochrome monitor, driven
                    180: directly by the host CPU with no additional hardware support.  At the other end
                    181: of the spectrum is a multi-plane display with color monitor, driven by a
                    182: high-performance graphics co-processor.  Input devices such as keyboards, mice,
                    183: tablets, joysticks, light pens, and touch screens should be supported.
                    184: 
                    185: @end(multiple)
                    186: @begin(multiple)
                    187: 
                    188: Applications must be device independent.
                    189: 
                    190: There are several aspects to device independence.  Most importantly, it must
                    191: not be necessary to rewrite, recompile, or even relink an application for each
                    192: new hardware display.  Nearly as important, every graphics function defined by
                    193: the system should work on virtually every supported display; the alternative,
                    194: which is to use GKS-style inquire operations@cite(gks) to determine the set of
                    195: implemented functions at run-time, leads to tedious case analysis in every
                    196: application, and to inconsistent user interfaces.  A third aspect of device
                    197: independence is that, as far as possible, applications should not need dual
                    198: control paths to work on both monochrome and color displays.
                    199: 
                    200: @end(multiple)
                    201: @begin(multiple)
                    202: 
                    203: The system must be network transparent:  an application running on one
                    204: machine must be able to utilize a display on some other machine.  The two
                    205: machines should not have to have the same architecture or operating system.
                    206: 
                    207: There are numerous examples of why this is important:  a compute-intensive VLSI
                    208: design program executing on a mainframe, but displaying results on a
                    209: workstation; an application distributed over several stand-alone processors,
                    210: but interacting with a user at a workstation; a professor running a program on
                    211: one workstation, presenting results simultaneously on all student workstations.
                    212: 
                    213: In a network environment, there are certain to be applications that must run on
                    214: particular machines or architectures.  Examples include proprietary software,
                    215: applications depending on specific architectural properties, and programs
                    216: manipulating large databases.  Such applications still should be accessible to
                    217: all users.  In a truly heterogeneous environment, not all programming languages
                    218: and programming systems are supported on all machines, and it is very
                    219: undesirable to have to write an interactive front end in multiple languages in
                    220: order to make the application generally available.  With network-transparent
                    221: access, this is not necessary; a single front end written in the same language
                    222: as the application suffices.
                    223: 
                    224: One might think that remote display will be extremely infrequent, and that
                    225: performance therefore is much less important than for local display.
                    226: Experience at MIT, however, indicates that many users routinely make use of the
                    227: remote display capabilities in X, and that the performance of remote display is
                    228: quite important.  The desktop display, although physically connected to a
                    229: single computer, is used as a true @i(network virtual terminal); indeed, the
                    230: idea of an X server (see the next section) built into a Blit-like
                    231: terminal@cite(blit) is an intriguing one.
                    232: 
                    233: @end(multiple)
                    234: @begin(multiple)
                    235: 
                    236: The system must support multiple applications displaying concurrently.
                    237: 
                    238: For example, it should be possible to display a clock with a sweep second hand
                    239: in one window, while simultaneously editing a file in another window.
                    240: 
                    241: @end(multiple)
                    242: @begin(multiple)
                    243: 
                    244: The system should be capable of supporting many different application and
                    245: management interfaces.
                    246: 
                    247: No single user interface is "best"; different communities have radically
                    248: different ideas about user interfaces.  Even within a single community,
                    249: "experts" and "novices" place different demands on an interface.  Rather than
                    250: mandating a particular user interface, the base window system should support a
                    251: wide range of interfaces.
                    252: 
                    253: To achieve this, the system must provide @i(hooks) (mechanism) rather than
                    254: @i(religion) (policy).  For example, since menu styles and semantics vary
                    255: dramatically among different user interfaces, the base window system must
                    256: provide primitives from which menus can be built, rather than just providing a
                    257: fixed menu facility.
                    258: 
                    259: The system should be designed in such a way that it is possible to implement
                    260: management policy both external to the base window system and external to
                    261: applications.  Applications should be largely independent of management policy
                    262: and mechanism; applications should @i(react to) management decisions, rather
                    263: than @i(directing) those decisions.  For example, an application needs to be
                    264: informed when one of its windows is resized, and should react by reformatting
                    265: the information displayed, but involvement of the application should not be
                    266: required in order for the user to change the size.  Making applications
                    267: management-independent, as well as device-independent, facilitates the sharing
                    268: of applications between diverse cultures.
                    269: 
                    270: @end(multiple)
                    271: @begin(multiple)
                    272: 
                    273: The system must support overlapping windows, including output to partially
                    274: obscured windows.
                    275: 
                    276: This is in some sense a by-product of the previous requirement, but is
                    277: important enough to merit explicit statement.  Not all user interfaces allow
                    278: windows to overlap arbitrarily.  However, even interfaces that do not allow
                    279: application windows to overlap typically provide some form of pop-up menu that
                    280: overlaps application windows.  If such menus are built from windows, then
                    281: support for overlapping windows must exist.
                    282: 
                    283: @end(multiple)
                    284: @begin(multiple)
                    285: 
                    286: The system should support a hierarchy of resizable windows, and an application
                    287: should be able to use many windows at once.
                    288: 
                    289: Subwindows provide a clean, powerful mechanism for exporting much of the basic
                    290: system machinery back to the application for direct use.  Many applications
                    291: make use of their own window-like abstractions; some even implement what is
                    292: essentially another window system, nested within the "real" window system.  It
                    293: is important to support arbitrary levels of nesting.  What is viewed as a
                    294: single window at one abstraction level may well require multiple subwindows at
                    295: a lower level.  By providing a true window hierarchy, application windows can
                    296: be implemented as true windows within the system, freeing the application from
                    297: duplicating machinery such as clipping and input control.
                    298: 
                    299: @end(multiple)
                    300: @begin(multiple)
                    301: 
                    302: The system should provide high-performance, high-quality support for text,
                    303: 2-D synthetic graphics, and imaging.
                    304: 
                    305: The base window system must provide "immediate" or "transparent" graphics:  the
                    306: application describes the image precisely, and the system does not attempt to
                    307: second-guess the application.  The use of high-level models, whereby the
                    308: application describes @i(what) it wants in terms of fairly abstract objects and
                    309: the system determines @i(how) best to render the image, cannot be imposed as
                    310: the only form of graphics interface.  Such models generally fail to provide
                    311: adequate support for some important class of applications, and different user
                    312: communities tend to have strong opinions about which model is "best".
                    313: High-level models are extremely important to provide, but they should be built
                    314: in layers on top of the base window system.
                    315: 
                    316: Support for 3-D graphics is not listed as a requirement, but this is not to say
                    317: it is unimportant.  We simply have not considered 3-D graphics, due to lack of
                    318: expertise and lack of time.
                    319: 
                    320: @end(multiple)
                    321: @begin(multiple)
                    322: The system should be extensible.
                    323: 
                    324: For example, the core system may not support 3-D graphics, but it should be
                    325: possible to extend the system with such support.  The extension mechanism
                    326: should allow communities to extend the system non-cooperatively, yet allow such
                    327: independent extensions to be merged gracefully.
                    328: 
                    329: @end(multiple)
                    330: @end(enumerate)
                    331: 
                    332: We believe that a window system must satisfy these requirements to be a viable
                    333: standard in an environment of high-performance workstations and mainframes
                    334: connected via high-performance local area networks.  X satisfies most of these
                    335: requirements, but currently fails to satisfy a few due to practical
                    336: considerations of staffing and time constraints:  the design and much of the
                    337: implementation of the base window system were to be handled solely by the first
                    338: author; it was important to get a working system up fairly quickly; and the
                    339: immediate applications only required relatively simple text and graphics
                    340: support.  As a result, X is not designed to handle high-end color displays or
                    341: to deal with input devices other than a keyboard and mouse; some support for
                    342: high-quality text and graphics is missing; X only provides support for one
                    343: class of management policy; and no provision has been made for extensions.  As
                    344: discussed in Section 10, these and other problems are being addressed in a
                    345: redesign of X.
                    346: 
                    347: @begin(fullpagefigure)
                    348: @blankspace(7 inches)
                    349: @caption(System Structure)
                    350: @end(fullpagefigure)
                    351: 
                    352: @section(System Model)
                    353: 
                    354: The X Window System is based on a client-server model; this model follows
                    355: naturally from requirements two and three in the previous section.  For each
                    356: physical display, there is a controlling server.  A client application and a
                    357: server communicate over a reliable duplex (8-bit) byte stream.  A simple block
                    358: stream protocol is layered on top of the byte stream.  If the client and server
                    359: are on the same machine, the stream is typically based on a local inter-process
                    360: communication (IPC) mechanism, and otherwise a network connection is
                    361: established between the pair.  Requiring nothing more than a reliable duplex
                    362: byte stream (without urgent data) for communication makes X usable in many
                    363: environments.  For example, the X protocol can be used over TCP@cite(tcp),
                    364: DECnet@cite(decnet), and Chaos@cite(chaos).
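As an illustration of layering a block stream on a plain byte stream, the following Python sketch reassembles fixed-size blocks from chunks that arrive split at arbitrary points.  The block length of 8 bytes and the names BlockReader and feed are assumptions made for this example; they are not taken from the X protocol.

```python
BLOCK_SIZE = 8  # hypothetical block length, chosen only for illustration


class BlockReader:
    """Reassembles fixed-size blocks from arbitrarily split byte-stream chunks."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.pending = bytearray()  # bytes received that do not yet fill a block

    def feed(self, chunk):
        """Accept one chunk from the stream; return the complete blocks now available."""
        self.pending.extend(chunk)
        blocks = []
        while len(self.pending) >= self.block_size:
            blocks.append(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]
        return blocks


reader = BlockReader()
first = reader.feed(b"abcde")            # too short for a block: nothing is returned
second = reader.feed(b"fghijklmnopqrs")  # completes two blocks; three bytes stay pending
```

Because the reader needs only ordered, reliable delivery of bytes, the same logic would apply whether the chunks come from a local IPC channel or from a network connection.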
                    365: 
                    366: Multiple clients can have connections open to a server simultaneously, and a
                    367: client can have connections open to multiple servers simultaneously.  The
                    368: essential tasks of the server are to multiplex requests from clients to the
                    369: display, and demultiplex keyboard and mouse input back to the appropriate
                    370: clients.  Typically, the server is implemented as a single sequential process,
                    371: using round-robin scheduling among the clients, and this centralized control
                    372: trivially solves many synchronization problems; however, a multi-process server
                    373: has also been implemented.  Although one might place the server in the kernel
                    374: of the operating system in an attempt to increase performance, a user-level
                    375: server process is vastly easier to debug and maintain, and performance under
                    376: Unix in fact does not seem to suffer.  Similar performance results have been
                    377: obtained in Andrew@cite(wm).  Various tricks are used in both clients and
                    378: server to optimize performance, principally by minimizing the number of
                    379: operating system calls@cite(hacks).
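The round-robin scheduling described above can be sketched in Python as follows.  This is a minimal model that assumes each client has a queue of pending requests and that the server takes one request per client per pass; the function name round_robin and the request strings are hypothetical, and an actual server may well service more than one request per turn.

```python
from collections import deque


def round_robin(client_queues):
    """Yield (client, request) pairs, taking one pending request per client per pass."""
    queues = {name: deque(reqs) for name, reqs in client_queues.items()}
    while any(queues.values()):          # continue while any client has work left
        for name, pending in queues.items():
            if pending:
                yield name, pending.popleft()


order = list(round_robin({
    "clock": ["draw-hand-1", "draw-hand-2"],
    "editor": ["draw-text-1", "draw-text-2", "draw-text-3"],
}))
```

Since a single sequential loop applies every request to the display, no two requests are ever processed at the same time, which is the sense in which centralized control avoids synchronization problems.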
                    380: 
                    381: The server encapsulates the base window system.  It provides the fundamental
                    382: resources and mechanisms, and the hooks required to implement various user
                    383: interfaces.  All device dependencies are encapsulated by the server; the
                    384: communication protocol between clients and the server is device independent.
                    385: By placing all device dependencies on one end of a network connection,
                    386: applications are truly device independent.  The addition of a new display type
                    387: simply requires the addition of a new server implementation; no application
                    388: changes are required.  Of course, the server itself is designed as device
                    389: independent code layered on top of a device dependent core, so only the "back
                    390: end" of the server need be reimplemented for each new display.@footnote( A back
                    391: end has been implemented using a programming interface to X itself, such that a
                    392: complete "recursive" X server executes inside a window of another X server.)
                    393: 
                    394: @subsection(Network Considerations)
                    395: 
                    396: It is extremely important for the server to be robust with respect to client
                    397: failures.  The server, and the network protocol, must be designed so that the
                    398: server never trusts clients to provide correct data.  As a corollary, the
                     399: protocol must be designed so that, if the server ever has to wait
                     400: for a response from a client, it can continue servicing other
                    401: clients.  Without this property, a buggy client or a network failure could
                    402: easily cause the entire display to freeze up.
                    403: 
                    404: Byte ordering is a standard problem in network communication:  when a 16-bit or
                    405: 32-bit quantity is transmitted over an 8-bit byte stream, is the most
                    406: significant byte transmitted first (big-endian byte order) or is the least
                    407: significant byte transmitted first (little-endian byte order)?  Some machines
                    408: with byte-addressable memory use big-endian order internally, and others use
                    409: little-endian order.  If a single order is chosen for network communication,
                    410: some machines will suffer the overhead of swapping bytes, even when
                    411: communicating with a machine using the same internal byte order.  Such an
                    412: approach also means that both parties in the communication must worry about
                    413: byte order.
                    414: 
                    415: The X protocol uses a different approach.  The server is designed to accept
                    416: both big-endian and little-endian connections.  For example, using TCP this is
                    417: accomplished by having the server listen on two distinct ports; little-endian
                    418: clients connect to the server on one port, and big-endian clients connect on
                    419: the other.  Clients always transmit and receive in their native byte order.
                    420: The server alone is responsible for byte swapping, and byte swapping only
                    421: occurs between dissimilar architectures.  This eliminates the byte swapping
                    422: overhead in the most common situations, and greatly simplifies the building of
                    423: client-side interface libraries in various programming languages.  X is not
                     424: unique in this respect; the current VGTS implementation uses the same
                    425: trick, and similar protocol optimizations have been used in various
                    426: network-based applications.
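
The division of labor can be sketched in a few lines.  This is an illustration
only, not the X implementation: the three-field request layout and the names
client_encode and server_decode are hypothetical, and the sketch assumes the
server already knows each client's byte order from the port on which the
client connected.

```python
import struct
import sys

# Hypothetical request layout, for illustration only (not the real X
# encoding): a 16-bit opcode, a 16-bit length, a 32-bit resource identifier.
LAYOUT = "HHI"

def client_encode(opcode, length, resource_id):
    """A client always writes its native byte order and never swaps."""
    return struct.pack("=" + LAYOUT, opcode, length, resource_id)

def server_decode(data, client_is_little_endian):
    """The server decodes in the order implied by the client's port.

    Bytes are effectively swapped only when that order differs from the
    server's own native order.
    """
    prefix = "<" if client_is_little_endian else ">"
    return struct.unpack(prefix + LAYOUT, data)

# A client on the same architecture as the server: no swapping is needed.
native_is_little = sys.byteorder == "little"
native_block = client_encode(7, 2, 0x1234ABCD)
native_result = server_decode(native_block, native_is_little)

# A client of the opposite architecture, simulated by packing explicitly
# in the other byte order.
foreign_prefix = ">" if native_is_little else "<"
foreign_block = struct.pack(foreign_prefix + LAYOUT, 7, 2, 0x1234ABCD)
foreign_result = server_decode(foreign_block, not native_is_little)
```

In both cases the server recovers the same three values, and the client code
contains no byte-order logic at all.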
                    427: 
                    428: Another potential problem in protocol design is word alignment.  In particular,
                    429: some architectures require 16-bit quantities to be aligned on 16-bit boundaries
                    430: and 32-bit quantities to be aligned on 32-bit boundaries in memory.  To allow
                    431: efficient implementations of the protocol across a spectrum of 16-bit and
                    432: 32-bit architectures, the protocol is defined to consist of blocks that are
                    433: always multiples of 32 bits, and each 16-bit and 32-bit quantity within a block
                    434: is aligned on 16-bit and 32-bit boundaries, respectively.
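
One consequence is that any variable-length data must be padded out to the
next 32-bit boundary.  A minimal sketch of that rule follows; the helper names
padded_length and pad_block are hypothetical, and the use of zero bytes as
padding is an assumption made for the example.

```python
def padded_length(nbytes):
    """Round a byte count up to the next multiple of 4 bytes (32 bits)."""
    return (nbytes + 3) & ~3

def pad_block(payload):
    """Append filler bytes so the block is a whole number of 32-bit units.

    Zero filler is an assumption of this sketch.
    """
    return payload + b"\0" * (padded_length(len(payload)) - len(payload))

text = b"hello"           # 5 bytes of string data
block = pad_block(text)   # 8 bytes once padded
```

Because every block ends on a 32-bit boundary, the block that follows it in
the stream starts on one as well.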
                    435: 
                    436: X is designed to operate in an environment where the inter-process
                    437: communication round-trip time is between 5 and 50 milliseconds, both for local
                    438: and for network communication.  We also assume that data transmission rates are
                    439: comparable to display rates; for example, to transmit and display 5000
                    440: characters per second, a data rate of approximately 50Kb (kilobits per second)
                    441: will be needed, and to transmit and display 20000 characters per second, a data
                    442: rate of approximately 200Kb will be needed.  Networks and protocol
                    443: implementations with these characteristics are now quite commonplace.  For
                    444: example, workstations running Berkeley Unix, connected via 10Mb (megabits per
                    445: second) local area networks, typically have round-trip times of 15 to 30
                    446: milliseconds, and data rates of 500Kb to 1Mb.
                    447: 
                    448: The round-trip time is important in determining the form of the communication
                    449: protocol.  The most common communication will be text and graphics requests
                    450: sent from a client to the server.  Examples of individual requests might be to
                    451: draw a string of text or to draw a line.  Such requests could be sent either
                    452: synchronously, in which case the client sends a request only after receiving a
                    453: reply from the server to the previous request, or they could be sent
                    454: asynchronously, without the server generating any replies.  However, since the
                    455: requests are sent over a reliable stream, they are guaranteed to arrive, and
                    456: arrive in order, so replies from the server to graphics requests serve no
                    457: useful purpose.  Moreover, with round-trip times over 5 milliseconds, output to
                    458: the display must be asynchronous, or it will be impossible to drive high-speed
                    459: displays adequately.  For example, at 80 characters per request and a 25
                    460: millisecond round-trip time, only 3200 characters per second can be drawn
                    461: synchronously, whereas many hardware devices are capable of displaying between
                    462: 5000 and 30000 characters per second.
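
The arithmetic behind this example is simple enough to state as code.  The
function name below is hypothetical; the model assumes exactly one request is
outstanding at a time and ignores all costs other than the round trip.

```python
def synchronous_chars_per_second(chars_per_request, round_trip_ms):
    """Upper bound on text output when each request waits for a reply."""
    requests_per_second = 1000 / round_trip_ms
    return chars_per_request * requests_per_second

# The case from the text: 80 characters per request, 25 ms round trip.
typical = synchronous_chars_per_second(80, 25)

# The two ends of the 5 to 50 millisecond design range.
best_case = synchronous_chars_per_second(80, 5)
worst_case = synchronous_chars_per_second(80, 50)
```

Under this model, only round-trip times near the low end of the design range
let synchronous output approach the display speeds quoted above.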
                    463: 
                    464: Similarly, polling the server for keyboard and mouse input would be
                    465: unacceptable in many applications, particularly those written in sequential
                    466: languages.  For example, an application attempting to provide real-time
                    467: response to input has to poll periodically for input during screen updates.
                    468: For an application with a single thread of control, this effectively results in
                    469: synchronous output, and consequent performance loss.  Hence, input must be
                    470: generated asynchronously by the server, so that applications need at most
                    471: perform local polling.
                    472: 
                    473: The round-trip time is also important in determining what user interfaces can
                    474: be supported without embedding them directly in the server.  The most important
                    475: concern is whether remote, application-level mouse tracking is feasible.  By
                    476: @i(tracking), we do not mean maintaining the cursor image on the screen as the
                    477: user moves the mouse; that function is performed autonomously by the X server,
                    478: often directly in hardware.  Rather, applications track the mouse by animating
                    479: some other image on the screen in real time as the mouse moves.  For round-trip
                    480: times under 50 milliseconds, tracking is perfectly reasonable, driven either by
                    481: motion events generated by the server or by continuous polling from the
                    482: application.  With a refresh occurring up to 30 times every second, remote
                    483: tracking is demonstrably "instantaneous" with mouse motion.
                    484: 
                    485: For tracking to be effective, however, relatively little time can be spent
                    486: updating the display at each movement, so typically only relatively small
                    487: changes can be made to the screen while tracking.  This is certainly the case
                    488: for common operations, such as rubber banding window outlines and highlighting
                    489: menu items.  It might be argued that the ability to run application-specific
                    490: code in the server is required for acceptable hand-eye coordination during
                    491: complex tracking.  For example, NeWS@cite(news) provides such a mechanism in a
                    492: novel way.  However, we are not convinced there are sufficient benefits to
                    493: justify such complexity.  Complex tracking typically is bound up intimately
                    494: with application-specific data structures and knowledge representations, and
                    495: such information is used by the "back end" of the application as well as the
                    496: "front end".  In a distributed system it is folly to believe that applications
                    497: will download large front ends into a server; communication round-trip times
                    498: are a reality that cannot be escaped.
                    499: 
                    500: @subsection(Resources)
                    501: 
                    502: The basic resources provided by the server are windows, fonts, mouse cursors,
                    503: and off-screen images; later sections describe each of these.  Clients request
                    504: creation of a resource by supplying appropriate parameters (such as the name of
                    505: the font); the server allocates the resource and returns a 31-bit unique
                    506: identifier used to represent it.  The use and interpretation of a resource
                    507: identifier is independent of any network connection.  Any client that knows (or
                    508: guesses) the identifier for a resource can use and manipulate the resource
                    509: freely, even if it was created by another client.  This capability is required
                    510: to allow window managers to be written independently of applications, and to
                    511: allow multi-process applications to manipulate shared resources.  However, to
                    512: avoid problems associated with clients that fail to clean up their resources at
                    513: termination (which is all too common in operating systems where users can
                    514: unilaterally abort processes), the maximum lifetime of a resource is always
                    515: tied to the connection over which it was created.  Thus, when a client
                    516: terminates, all of the resources it created are destroyed automatically.
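
The two rules (identifiers are usable by any client, but lifetimes are bound
to the creating connection) can be sketched as a small table.  This is an
illustration under stated assumptions, not the server's data structure: the
class ResourceTable, its method names, and the sequential identifier
allocation are all hypothetical.

```python
import itertools

class ResourceTable:
    """Hypothetical server-side table of live resources."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._owner = {}  # resource identifier -> creating connection

    def create(self, connection):
        """Allocate a resource and remember which connection created it."""
        rid = next(self._next_id)
        self._owner[rid] = connection
        return rid

    def is_valid(self, rid):
        """Any client may use a live identifier; ownership is not checked."""
        return rid in self._owner

    def close_connection(self, connection):
        """Destroy every resource created over the closed connection."""
        doomed = [rid for rid, owner in self._owner.items()
                  if owner == connection]
        for rid in doomed:
            del self._owner[rid]
        return doomed

table = ResourceTable()
window_a = table.create("client A")
window_b = table.create("client B")
destroyed = table.close_connection("client A")
```

After the close, window_a is no longer valid for any client, while window_b is
unaffected.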
                    517: 
                    518: Access control is performed only when a client attempts to establish a
                    519: connection to the server; once the connection is established the client can
                    520: freely manipulate any resource.  Since accidental manipulation of some other
                    521: client's resource is extremely unlikely (both in theory and in practice), we
                    522: believe introducing access control on a per-resource basis would only serve to
                    523: decrease performance, not to significantly increase security or robustness.
                    524: The current access control mechanism is based simply on host network addresses,
                    525: as this information is provided by most network stream protocols, and there
                    526: seems to be no widely used or even widely available user-level authentication
                    527: mechanism.  Host-based access control has proven to be marginally acceptable in
                    528: a workstation environment, but is rather unacceptable for time-shared
                    529: machines.@footnote( It is interesting that @i(professors) at MIT have argued
                    530: vociferously to disable all access control.)
                    531: 
                    532: Each client-generated protocol request is a simple data block consisting of an
                    533: opcode, some number of fixed-length parameters, and possibly a variable-length
                    534: parameter.  For example, to display text in a window, the fixed-length
                    535: parameters include the drawing color and the identifiers for the window and the
                    536: font, and the variable-length parameter is the string of characters.  All
                    537: operations on a resource explicitly contain the identifier of the resource as a
                    538: parameter.  In this way, an application can multiplex use of many windows over
                    539: a single network connection.  This multiplexing makes it easy for the client to
                    540: control the time-order of updates to multiple windows.  Similarly, each input
                    541: event generated by the server contains the identifier of the window in which
                    542: the event occurred.  Multiplexing over a single stream allows the client to act
                    543: on events from multiple windows in correct time order; timestamps alone are
                    544: inadequate without strong guarantees from the stream mechanism.
                    545: 
                    546: Numerous Unix-based window
                    547: systems@cite(masscomp,andrew,sapphire,pnx,sunwin,mg1,metheus) use file or
                    548: channel descriptors to represent windows; window creation involves an
                    549: interaction with the operating system, which results in the creation of such a
                    550: descriptor.  Typically, this means the window cannot be named (and hence cannot
                    551: be shared) by programs running on different machines, and perhaps not even by
                     552: programs running on the same machine.  More seriously, there is often a severe
                    553: restriction on the number of active descriptors a process may have:  20 on
                    554: older systems and usually 64 on newer systems.  The use of 50 or more windows
                    555: (albeit nested inside a single top-level window) is quite common in X
                    556: applications.  The use of a single connection, over which an arbitrary number
                    557: of windows can be multiplexed, is clearly a better approach.
                    558: 
                    559: @section(Window Hierarchy)
                    560: 
                    561: The server supports an arbitrarily branching hierarchy of rectangular windows.
                    562: At the top is the @i(root) window, which covers the entire screen.  The
                    563: @i(top-level) windows of applications are created as subwindows of the root
                    564: window.  The window hierarchy models the now-familiar "stacks of papers"
                    565: desktop.  For a given window, its subwindows can be stacked in any order, with
                    566: arbitrary overlaps.  When window W1 partially or completely covers window W2,
                    567: we say that W1 @i(obscures) W2.  This relationship is not restricted to
                    568: siblings; if W1 obscures W2, then W1 may also obscure subwindows of W2.  A
                    569: window also obscures its parent.  Window hierarchies never interleave; if
                    570: window W1 obscures sibling window W2, then subwindows of W2 never obscure W1 or
                    571: subwindows of W1.  A window is not restricted in size or placement by the
                    572: boundaries of its parent, but a window is always visibly clipped by its parent:
                    573: portions of the window that extend outside the boundaries of the parent are
                    574: never displayed, and do not obscure other windows.  Finally, a window can be
                    575: either @i(mapped) or @i(unmapped).  An unmapped window is never visible on the
                    576: screen; a mapped window can only be visible if all of its ancestors are also
                    577: mapped.
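
The clipping and mapping rules can be illustrated with a short sketch that
computes the area of the screen a window is allowed to occupy.  It is a
simplification: the Window class and the function clipped_rect are
hypothetical, and obscuring by siblings is deliberately ignored, so only
clipping by ancestors and the mapped state are modeled.

```python
class Window:
    """Hypothetical window record; x and y are relative to the parent."""

    def __init__(self, x, y, width, height, parent=None, mapped=True):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.parent = parent
        self.mapped = mapped

def clipped_rect(win):
    """Return (left, top, right, bottom) in root coordinates, or None.

    None means nothing can be displayed: either the window or an ancestor
    is unmapped, or the window lies wholly outside some ancestor.
    """
    chain = []
    while win is not None:
        chain.append(win)
        win = win.parent
    chain.reverse()  # root first

    origin_x = origin_y = 0
    clip = None
    for w in chain:
        if not w.mapped:
            return None
        origin_x += w.x
        origin_y += w.y
        rect = (origin_x, origin_y, origin_x + w.width, origin_y + w.height)
        if clip is None:
            clip = rect
        else:
            clip = (max(clip[0], rect[0]), max(clip[1], rect[1]),
                    min(clip[2], rect[2]), min(clip[3], rect[3]))
            if clip[0] >= clip[2] or clip[1] >= clip[3]:
                return None
    return clip

root = Window(0, 0, 1000, 800)
top = Window(900, 100, 300, 200, parent=root)   # hangs off the right edge
child = Window(50, 50, 100, 100, parent=top)
```

Here the top-level window extends 200 units past the right edge of the root,
so both it and its child are cut off at x = 1000.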
                    578: 
                    579: Output to a leaf window (one with no subwindows) is always clipped to the
                    580: visible portions of the window; drawing on such a window never draws into
                    581: obscuring windows.  Output to a window that contains subwindows can be
                    582: performed in two modes.  In @i(clipped) mode the output is clipped normally by
                    583: all obscuring windows (including subwindows), but in @i(draw-through) mode the
                    584: output is not clipped by subwindows.  For example, draw-through mode is used on
                    585: the root window during window management, tracking the mouse with the outline
                    586: of a window to indicate how the window is to be moved or resized.  If clipped
                     587: mode were used instead, parts of the outline would be hidden by subwindows.
                    588: 
                    589: The coordinate system is defined with the X axis horizontal and the Y axis
                    590: vertical.  Each window has its own coordinate system, with the origin at the
                    591: upper left corner of the window.  Having per-window coordinate systems is
                    592: crucial, particularly for top-level windows; applications are almost always
                    593: designed to be insensitive to their position on the screen, and having to worry
                    594: about race conditions when moving windows would be a disaster.  The coordinate
                    595: system is discrete: each pixel in the window corresponds to a single unit in
                    596: the coordinate system, with coordinates centered on the pixels, and all
                    597: coordinates are expressed as integers in the protocol.  We believe fractional
                    598: coordinates are not required at the protocol level for the raster graphics
                    599: provided in X (see section 6), although they may be required for high-end color
                    600: graphics, such as anti-aliasing.  The aspect ratio of the screen is not masked
                     601: by the protocol, since we believe that most displays have a one-to-one aspect
                    602: ratio; in this regard X is arguably device dependent.
                    603: 
                    604: Although the coordinate system is discrete at the protocol level, continuous or
                    605: alternate-origin coordinate systems certainly can be used at the application
                    606: level, but client-side libraries must eventually translate to the discrete
                    607: coordinates defined by the protocol.  In this way, we can ignore the many
                    608: variations in floating-point (or even fixed-point) formats among architectures.
                    609: Further, the coordinates can be expressed in the protocol as 16-bit quantities,
                    610: which can be manipulated efficiently in virtually every machine/display
                     611: architecture, and which minimize the number of data bytes transmitted over the
                    612: network.  The use of 16-bit quantities does have a drawback, in that some
                    613: applications (particularly CAD tools) like to perform zoom operations simply by
                    614: scaling coordinates and redrawing, relying on the window system to clip
                    615: appropriately.  Since scaling quickly overflows 16 bits, additional clipping
                    616: must be performed explicitly by such applications.
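
The overflow is easy to demonstrate.  The sketch below assumes signed 16-bit
coordinates, which the text does not state explicitly; the function names are
hypothetical, and the sketch only detects the problem rather than performing
the clipping itself.

```python
# Assumed range of a signed 16-bit protocol coordinate.
INT16_MIN, INT16_MAX = -32768, 32767

def fits_protocol(value):
    """True if the value can be sent as a 16-bit coordinate."""
    return INT16_MIN <= value <= INT16_MAX

def zoom_segment(x1, y1, x2, y2, factor):
    """Zoom a line segment by scaling its end-points about the origin."""
    return tuple(v * factor for v in (x1, y1, x2, y2))

def safe_to_send(segment):
    """True only if every coordinate of the segment is representable."""
    return all(fits_protocol(v) for v in segment)

modest = zoom_segment(100, 200, 900, 700, 8)    # still representable
extreme = zoom_segment(100, 200, 900, 700, 64)  # 900 * 64 = 57600 overflows
```

When safe_to_send fails, the application has to clip the segment against its
own view of the window before issuing the request.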
                    617: 
                    618: A window can optionally have a @i(border), a shaded outer frame maintained
                    619: explicitly by the X server.  The origin of the window's coordinate system is
                    620: inside the border, and output to the window is clipped automatically so as not
                    621: to extend into the border.  The presence of borders slightly complicates the
                    622: semantics of the window system; for simplicity we will ignore them in the
                    623: remainder of this paper.
                    624: 
                    625: The basic operations on window structure are straightforward.  An unmapped
                    626: window is created by specifying the parent window, the position within the
                    627: parent of the upper left corner of the new window, and the width and height (in
                    628: coordinate units) of the new window.  A window can be destroyed, in which case
                    629: all windows below it in the hierarchy are also destroyed.  A window can be
                    630: mapped and unmapped, without changing its position.  A window can be moved and
                    631: resized, including being moved and resized simultaneously.  A window can also
                     632: be "depthwise" raised to the top or lowered to the bottom of the stack with
                     633: respect to its siblings, without changing its coordinate position.  Currently,
                    634: mapping or configuring a window forces the window to be raised.  This
                    635: restriction appeared to simplify the server implementation, but also happened
                    636: to match the basic management interface we expected to build.  This restriction
                    637: will be eliminated in the next version.
                    638: 
                    639: The windows described above are the usual @i(opaque) windows.  X also provides
                    640: @i(transparent) windows.  A transparent window is always invisible on the
                    641: screen, and does not obscure output to, or visibility of, other windows.
                    642: Output to a transparent window is clipped to that window, but is actually drawn
                    643: on the parent window.  Thus, for output, a transparent window is simply a
                    644: clipping rectangle that can be applied to restrict output within a (parent)
                    645: window.  Input processing for transparent and opaque windows is identical, as
                    646: described in Section 8.  In Section 10 we will argue that most uses of
                    647: transparent windows are better satisfied with other mechanisms.  Therefore, for
                    648: simplicity, we will ignore transparent windows in the rest of this paper.
                    649: 
                    650: The X server is designed explicitly to make windows inexpensive.  Our goal was
                    651: to make it reasonable to use windows for such things as individual menu items,
                     652: buttons, and even individual items in forms and spreadsheets.  As such, the server
                    653: must deal efficiently with hundreds (though not necessarily thousands) of
                    654: windows on the screen simultaneously.  Experience with X has shown that many
                    655: implementors find this capability extremely useful.
                    656: 
                    657: @section(Color)
                    658: 
                    659: The screen is viewed as two dimensional, with an N-bit @i(pixel) value stored
                    660: at each coordinate.  The number of bits in a pixel value, and how a value
                    661: translates into a color, depends on the hardware.  X is designed to support two
                    662: types of hardware:  monochrome and pseudo-color.  A monochrome display has one
                    663: bit per pixel, and the two values translate into black and white.  Pseudo-color
                    664: displays typically have between four and twelve bits per pixel; the pixel value
                    665: is used as an index into a color map, yielding red, green, and blue
                    666: intensities.  The color map can be changed dynamically, so that a given pixel
                    667: value can represent different colors over time.  Gray-scale is viewed as a
                    668: degenerate case of pseudo-color.
                    669: 
                    670: We desire a design matching most display hardware, while abstracting
                     671: differences in such a way that programmers do not have to double- or triple-code
                    672: their applications to cover the spectrum.  We also want multiple applications
                    673: to coexist within a single color map, so that applications always show true
                    674: color on the screen.  To allow this, and to keep applications device
                    675: independent, pixel values should not be coded explicitly into applications.
                    676: Instead, the server must be responsible for managing the color map, and color
                    677: map allocation must be expressed in hardware-independent terms.
                    678: 
                    679: All graphics operations in X are expressed in terms of pixel values.  For
                    680: example, to draw a line, one specifies not only the coordinates of the
                     681: end-points but also the pixel value with which to draw the line.  (Logic functions
                    682: and plane-select masks are also specified, as described in Section 6.)  On a
                    683: monochrome display, the only two pixel values are zero and one, which are
                    684: (somewhat arbitrarily) defined to be black and white, respectively.  On a
                    685: pseudo-color display, pixel values zero and one are pre-allocated by the
                    686: server, for use as "black" and "white", so that monochrome applications display
                    687: correctly on color displays.  Of course, the actual colors need not be black
                    688: and white, but can be set by the user.
                    689: 
                    690: There are two ways for a client to obtain pixel values.  In the simplest
                    691: request, the client specifies red, green, and blue color values, and the server
                    692: allocates an arbitrary pixel value and sets the color map so the pixel value
                    693: represents the closest color the hardware can provide.  The color map entry for
                    694: this pixel value cannot be changed by the client, so if some other client
                    695: requests an equivalent color, the server is free to respond with the same pixel
                    696: value.  Such sharing is important in maximizing use of the color map.  To
                    697: isolate applications from variations in color representation among displays
                    698: (due, for example, to the standard of illumination used for calibration), the
                    699: server provides a color database which clients can use to translate string
                    700: names of colors into red, green, and blue values tailored for the particular
                    701: display.
                    702: 
                    703: The second request allocates writable map entries.  This mechanism was designed
                    704: explicitly for X; we are not aware of a comparable mechanism in any other
                    705: window system.  The client specifies two numbers, @i(C) and @i(P), with @i(C)
                    706: positive and @i(P) non-negative; the request can be expressed as "allocate
                    707: @i(C) colors and @i(P) planes".  The total number of pixel values allocated by
                    708: the server is @i(C*2@+(P)).  The values passed back to the client consist of
                    709: @i(C) base pixel values, and a plane mask containing @i(P) bits.  None of the
                    710: base pixel values have any one bits in common with the plane mask, and the
                     711: complete set of allocated pixel values is obtained by OR-ing each possible
                     712: subset of the one bits in the plane mask with each of the base pixel
                    713: values.  The client can optionally require the @i(P) planes to be contiguous,
                    714: in which case all @i(P) bits in the plane mask will be contiguous.
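
The relationship between the reply and the full set of allocated pixel values
can be made concrete.  The following is a client-side sketch with hypothetical
function names; it expands a list of base pixel values and a plane mask into
all of the pixel values they denote.

```python
def plane_bits(plane_mask):
    """Split a plane mask into its individual one bits."""
    return [1 << i for i in range(plane_mask.bit_length())
            if (plane_mask >> i) & 1]

def allocated_pixels(base_pixels, plane_mask):
    """Expand C base pixels and a P-bit mask into C * 2**P pixel values."""
    bits = plane_bits(plane_mask)
    result = []
    for base in base_pixels:
        if (base & plane_mask) != 0:
            raise ValueError("base pixel overlaps the plane mask")
        # Each subset of the mask bits yields one allocated value.
        for subset in range(1 << len(bits)):
            value = base
            for i, bit in enumerate(bits):
                if (subset >> i) & 1:
                    value |= bit
            result.append(value)
    return result

# Example reply for C = 2 and P = 2 (the values are made up).
pixels = allocated_pixels([0, 1], 0b110)
```

With two base values and a two-bit mask the expansion yields eight distinct
pixel values, in agreement with the formula above.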
                    715: 
                    716: There are three common uses of this second request.  One is simply to allocate
                    717: a number of "unrelated" pixel values; in this case, @i(P) will be zero.  A
                    718: second use is in imaging applications, where it is convenient to be able to
                    719: perform simple arithmetic on pixel values.  In this case, a contiguous block of
                    720: pixel values is allocated by setting @i(C) to one and @i(P) to the log (base 2)
                    721: of the number of pixel values required, and requesting contiguous allocation.
                    722: Arithmetic on the pixel values then requires at most some additional shift and
                    723: mask operations.
                    724: 
                    725: A third form of allocation arises in applications that want some form of
                    726: overlay graphics, such as highlighting or outlining regions.  Here the
                    727: requirement is to be able to draw and then erase graphics without disturbing
                    728: existing window contents.  For example, suppose an application typically uses
                    729: four colors, but needs to be able to overlay a rectangle outline in a fifth
                    730: color.  An allocation request with C set to four and P set to one results in
                    731: two groups of four pixel values.  The four base pixel values are assigned the
                    732: four normal colors, and the four alternate pixel values are all assigned the
                    733: fifth color.  Overlay graphics can then be drawn by restricting output (see the
                    734: next section) to the single bit plane specified in the mask returned by the
                    735: color allocation.  Turning bits in this plane on (to ones) changes the image to
                    736: the fifth color, and turning them off reverts the image to its original color.
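
The overlay example can be traced through in a few lines.  The base pixel
values, the plane mask, and the color names below are invented for the
illustration, and the color map is modeled as a plain dictionary rather than
as server state.

```python
# Hypothetical reply to an allocation with C = 4 and P = 1.
base_pixels = [8, 9, 10, 11]
plane_mask = 0b0100

normal_colors = ["red", "green", "blue", "yellow"]
overlay_color = "white"  # the "fifth color"

# Each base value shows its normal color; each alternate value (the base
# value with the overlay bit set) shows the overlay color.
color_map = {}
for pixel, color in zip(base_pixels, normal_colors):
    color_map[pixel] = color
    color_map[pixel | plane_mask] = overlay_color

def draw_overlay(pixel):
    """Turn the overlay bit on; no other plane is touched."""
    return pixel | plane_mask

def erase_overlay(pixel):
    """Turn the overlay bit off, restoring the original pixel value."""
    return pixel & ~plane_mask

under = base_pixels[2]              # a pixel currently showing "blue"
highlighted = draw_overlay(under)   # now shows the overlay color
restored = erase_overlay(highlighted)
```

Because the overlay bit is disjoint from the base pixel values, erasing
recovers the original contents without the application having to save them.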
                    737: 
                    738: @section(Graphics and Text)
                    739: 
                    740: Graphics operations are often the most complex part of any window system,
                    741: simply because so many different effects and variations are required to satisfy
                    742: a wide range of applications.  In this section we sketch the operations
                    743: provided in X, so that the basic level of graphics support can be understood.
                    744: The operations are essentially a subset of the Digital Workstation Graphics
                    745: Architecture; the VS100 display@cite(vs100) implements this architecture for
                    746: 1-bit pixel values.  The set of operations was purposely kept simple, in order
                    747: to maximize portability.
                    748: 
                    749: Graphics operations in X are expressed in terms of relatively high-level
                    750: concepts, such as lines, rectangles, curves, and fonts.  This is in contrast to
                    751: systems in which the basic primitives are to read and write individual pixels.
                    752: Basing applications on pixel-level primitives works well when display memory
                    753: can be mapped into the application's address space for direct manipulation.
                    754: However, both display hardware and operating systems exist for which such
                    755: direct access is not possible, and emulating pixel-level manipulations in such
                    756: an environment results in extremely poor performance.  Expressing operations at
                    757: a higher level avoids such device dependencies, and also avoids potential
                    758: problems with network bandwidth.  With high-level operations, a protocol
                    759: request transmitted as a small number of bits over the network typically
                    760: affects ten to one hundred times as many pixels on the screen.
                    761: 
                    762: @subsection(Images)
                    763: 
                    764: Two forms of off-screen images are supported in X:  bitmaps and pixmaps.  A
                    765: bitmap is a single-plane (bit) rectangle.  A pixmap is an N-plane (pixel)
                    766: rectangle, where @i(N) is the number of bits per pixel used by the particular
                    767: display.  A bitmap or pixmap can be created by transmitting all of the bits to
                    768: the server; a pixmap can also be created by copying a rectangular region of a
                    769: window.  Bitmaps and pixmaps of arbitrary size can be created.  Transmitting
                    770: very large (or deep) images over a network connection can be quite slow;
                    771: however, the ability to make use of shared memory in conjunction with the IPC
                    772: mechanism would help enormously when the client and server are on the same
                    773: machine.
                    774: 
                    775: The primary use of bitmaps is as masks (clipping regions).  Several graphics
                    776: requests allow a bitmap to be used as a clipping region@cite(warnock).  Bitmaps
                    777: are also used to construct cursors, as described in Section 8.  Pixmaps are
                    778: used to store frequently drawn images, and as temporary backing-store for
                    779: pop-up menus (as described in Section 8).  However, the principal use of
                    780: pixmaps is as tiles, that is, as patterns which are replicated in two
                    781: dimensions to cover a region.  Since there are often hardware restrictions as
                    782: to what tile shapes can be replicated efficiently, guaranteed shapes are not
                    783: defined by the X protocol.  An application can query the server to determine
                    784: what shapes are supported, although to date most applications simply assume 16
                    785: by 16 tiles are supported.  A better semantics is to support arbitrary shapes,
                    786: but allow applications to query as to which shapes are most efficient.
                    787: 
                    788: The tiling origin used in X is almost always the origin of the destination
                    789: window.  That is, if enough tiles were laid out, one tile would have its upper
                    790: left corner at the upper left corner of the window.  In this way, the contents
                    791: of the window are independent of the window's position on the screen, and the
                    792: window can be moved transparently to the application.
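
The effect of anchoring the tile at the window origin can be sketched as follows.  This is an illustration only: the function names and the representation of a tile as a list of rows are assumptions made for the example.

```python
def tile_pixel(tile, x, y):
    """Tile value at window-relative (x, y), with the tile anchored at (0, 0)."""
    return tile[y % len(tile)][x % len(tile[0])]


def screen_pixel(tile, window_origin, screen_x, screen_y):
    """The same lookup, starting from screen coordinates."""
    origin_x, origin_y = window_origin
    return tile_pixel(tile, screen_x - origin_x, screen_y - origin_y)


tile = [[0, 1],
        [2, 3]]

# The window-relative point (3, 2), seen with the window at two screen positions.
before = screen_pixel(tile, (10, 10), 13, 12)
after = screen_pixel(tile, (25, 7), 28, 9)
```

Both lookups reduce to the same window-relative coordinates, so the pattern does not shift when the window moves.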
                    793: 
                    794: Servers vary widely in the amount of off-screen memory provided.  For example,
                    795: some servers limit off-screen memory to that accessible directly to the
                    796: graphics processor (typically one to three times the size of screen memory),
                    797: and fonts and other resources are allocated from this same pool.  Other servers
                    798: utilize their entire virtual address space for off-screen memory.  Since
                    799: off-screen memory for images is finite, an explicit part of the X protocol is
                    800: the possibility that bitmap or pixmap creation can fail.  Depending on the
                    801: intended use of the image, the application may or may not be able to cope with
                    802: the failure.  For example, if the image was being stored simply to speed up
                    803: redisplay, the application can always transmit the image directly each time
                    804: (see below).  If the image was to be a temporary backing-store for a window,
                    805: the application can fall back on normal exposure processing (as described in
                    806: Section 7).  Servers should be constructed in such a way as to virtually
                    807: guarantee sufficient memory (e.g., by caching images) for creating at least
                    808: small tiles and cursors, although this is not true in current implementations.
                    809: 
                    810: @subsection(Graphics)
                    811: 
                    812: All graphics and text requests include a logic function and a plane-select mask
                    813: (an integer with the same number of bits as a pixel value) to modify the
                    814: operation.  All sixteen logic functions are provided.  Given a source and
                    815: destination pixel, the function is computed bitwise on corresponding bits of
                    816: the pixels, but only on bits specified in the plane-select mask.  Thus the
                    817: result pixel is computed as
                    818: @begin(format, leftmargin +5)
                    819: ((source FUNC destination) AND mask) OR (destination AND (NOT mask))
                    820: @end(format)
                    821: The most common operation is simply replacing the destination with the source in
                    822: all planes.
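
A minimal sketch of this computation follows.  The encoding of the sixteen functions as a 4-bit truth table, the 8-plane depth, and all names are assumptions made for the example; they are not the encoding used by the X protocol.

```python
def apply_function(func, source, destination, depth):
    """Apply a two-input boolean function to each bit position.

    func is a 4-bit truth table (an encoding assumed for this sketch):
    bit number (2 * s + d) holds the result for source bit s and
    destination bit d.
    """
    result = 0
    for i in range(depth):
        s = (source >> i) & 1
        d = (destination >> i) & 1
        result |= ((func >> (2 * s + d)) & 1) << i
    return result


def combine(func, source, destination, mask, depth=8):
    """((source FUNC destination) AND mask) OR (destination AND (NOT mask))"""
    all_planes = (1 << depth) - 1
    computed = apply_function(func, source, destination, depth)
    return (computed & mask) | (destination & ~mask & all_planes)


COPY = 0b1100   # result is the source bit
XOR = 0b0110    # result is source XOR destination

replaced = combine(COPY, 0xA5, 0x3C, mask=0xFF)         # all planes
low_planes_only = combine(COPY, 0xA5, 0x3C, mask=0x0F)  # low four planes
```

With a full mask the destination is simply replaced by the source; with a partial mask the unselected planes of the destination pass through unchanged.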
                    823: 
                    824: The simplest graphics request takes a single source pixel value and combines it
                    825: with every pixel in a rectangular region of a window.  Typically this is used
                    826: to fill a region with a color, but by varying the logic function or masks,
                    827: other effects can be achieved.  A second request takes a tile, effectively
                    828: constructs a tiled rectangular source with it, and then combines the source
                    829: with a rectangular region of a window.
                    830: 
                    831: An arbitrary image can be displayed directly, without first being stored
                    832: off-screen.  For monochrome images, the full contents of a bitmap are
                    833: transmitted, along with a pair of pixel values; the image is displayed in a
                    834: region of a window with those two colors.  For color images, the full contents
                    835: of a pixmap can be transmitted and displayed.  In order to avoid inordinate
                    836: buffer space in the server, very large images must be broken into sections on
                    837: the client side and displayed in separate requests.
                    838: 
                    839: The CopyArea request allows one region of a window to be moved to (or combined
                    840: with) another region of the same window.  This is the usual @i(bitblt), or "bit
                    841: block transfer" operation.  The source and destination are given as rectangular
                    842: regions of the window; the two regions have the same dimensions.  The operation
                    843: is such that overlap of the source and destination does not affect the result.
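
One way to obtain overlap-independent results is to choose the copy direction from the relative position of the two regions.  The sketch below illustrates that idea on a window modeled as a list of rows; it is not a description of how any particular server implements CopyArea, and it ignores obscured regions.

```python
def copy_area(window, src_x, src_y, dst_x, dst_y, width, height):
    """Copy a rectangle within one window so that overlap does not matter."""
    # Walk away from the destination so no source pixel is overwritten
    # before it has been read.
    rows = range(height - 1, -1, -1) if dst_y > src_y else range(height)
    cols = range(width - 1, -1, -1) if dst_x > src_x else range(width)
    for r in rows:
        for c in cols:
            window[dst_y + r][dst_x + c] = window[src_y + r][src_x + c]


window = [[0, 1, 2, 3, 4, 5]]
copy_area(window, 0, 0, 2, 0, 4, 1)   # shift four pixels right by two
```

A naive left-to-right copy of the same request would read pixels it had already overwritten and smear the first two values across the row.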
                    844: 
                    845: X provides a complex primitive for line drawing.  It provides for arbitrary
                    846: combinations of straight and curved segments, defining both open and closed
                    847: shapes.  Lines can be @i(solid), by drawing with a single source pixel value,
                    848: @i(dashed), by alternately drawing with a single source pixel value and not
                    849: drawing, and @i(patterned), by alternately drawing with two source pixel
                    850: values.  Lines are drawn with a rectangular brush.  Clients can query the
                    851: server to determine what brush shapes are supported; a better semantics would
                    852: be to support arbitrary shapes, but allow applications to query as to which
                    853: shapes are most efficient.
                    854: 
                    855: A final request allows an arbitrary closed shape (such as could be specified in
                    856: the line drawing request) to be filled with either a single source pixel value
                    857: or a tile.  For self-intersecting shapes, the even-odd rule is used: a point is
                    858: inside the shape if an infinite ray with the point as origin crosses the path
                    859: an odd number of times.
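
The even-odd rule can be sketched with a standard ray-casting test, shown below for straight-segment paths only.  The function name and the five-pointed star used as the self-intersecting example are assumptions for illustration; curved segments are not handled.

```python
def inside_even_odd(point, path):
    """Even-odd test: count crossings of a ray running in the +x direction."""
    x, y = point
    inside = False
    n = len(path)
    for i in range(n):
        x1, y1 = path[i]
        x2, y2 = path[(i + 1) % n]     # the path is implicitly closed
        if (y1 > y) != (y2 > y):       # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:            # crossing lies on the ray
                inside = not inside
    return inside


# A five-pointed star drawn as a single self-intersecting path.
star = [(0, 100), (-59, -81), (95, 31), (-95, 31), (59, -81)]

in_tip = inside_even_odd((0, 70), star)
in_center = inside_even_odd((0, 0), star)
```

Under this rule the points of the star are filled while its central pentagon is not, since a ray from the center crosses the path twice.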
                    860: 
                    861: @subsection(Text)
                    862: 
                    863: For high-performance text, X provides direct support for bitmap fonts.  A font
                    864: consists of up to 256 bitmaps; each bitmap in a font has the same height but
                    865: can vary in width.  To allow server-specific font representations, clients
                    866: "create" fonts by specifying a name rather than by downloading bitmap images
                    867: into the server.  An application can use an arbitrary number of fonts, but (as
                    868: with all resources) font allocation can fail for lack of memory.  A reasonably
                    869: implemented server should support an essentially unbounded number of fonts
                    870: (e.g., by caching), but some existing server implementations are deficient in
                    871: this respect.  Unlike Andrew@cite(wm), no heuristics are applied by the server
                    872: when resolving a name to a font; specific communities or applications may
                    873: demand a variety of heuristics, and as such they belong outside the base window
                    874: system.  Also unlike Andrew, the X server is not free to dynamically substitute
                    875: one font for another; we do not believe such behavior is necessary or
                    876: appropriate.
                    877: 
                    878: A string of text can be displayed using a font either as a mask or as a source.
                    879: Using a font as a mask, the foreground (the one bits in the bitmap) of each
                    880: character is drawn with a single source pixel value.  Using a font as a source,
                    881: the entire image of each character is drawn, using a pair of pixel values.
                    882: Source font output is provided specifically for applications using fixed-width
                    883: fonts in emulating traditional terminals.
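
The difference between the two modes can be sketched on a single character bitmap.  The representation of the destination and the glyph as lists of rows, and the function names, are assumptions made for this example.

```python
def draw_as_mask(dest, glyph, x, y, foreground):
    """Draw only the one bits of the glyph; zero bits leave dest untouched."""
    for row, bits in enumerate(glyph):
        for col, bit in enumerate(bits):
            if bit:
                dest[y + row][x + col] = foreground


def draw_as_source(dest, glyph, x, y, foreground, background):
    """Draw the whole character cell, using a pair of pixel values."""
    for row, bits in enumerate(glyph):
        for col, bit in enumerate(bits):
            dest[y + row][x + col] = foreground if bit else background


glyph = [[0, 1, 0],
         [1, 1, 1]]

masked = [[7] * 3 for _ in range(2)]      # 7 stands for existing contents
draw_as_mask(masked, glyph, 0, 0, 1)

sourced = [[7] * 3 for _ in range(2)]
draw_as_source(sourced, glyph, 0, 0, 1, 0)
```

Source output overwrites the entire cell, which is why it suits terminal emulation: redrawing a character also erases whatever occupied the cell before.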
                    884: 
                    885: To support "cut and paste" operations between applications, the server provides
                    886: a number of buffers into which a client can read and write an arbitrary string
                    887: of bytes.  (This mechanism was adopted from Andrew.)  Although these buffers
                    888: are used principally for text strings, the server imposes no interpretation on
                    889: the data, so cooperating applications can use the buffers to exchange such
                    890: things as resource identifiers and images.
                    891: 
                    892: @section(Exposures)
                    893: 
                    894: Given that output to obscured windows is possible, the issue of @i(exposure)
                    895: must be addressed.  When all (or a piece) of an obscured window again becomes
                    896: visible (for example, as the result of the window being raised), is the client
                    897: or the server responsible for restoring the contents of the window?  In X, it
                    898: is the responsibility of the client.  When a region of a window becomes
                    899: exposed, the server sends an asynchronous event to the client, specifying the
                    900: window and the region that has been exposed; the rest is up to the application.
                    901: A trivial application might simply redraw the entire window; a more
                    902: sophisticated application would only redraw the exposed region.
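
As a sketch of the more sophisticated strategy, an application that knows the bounding rectangle of each item it draws can redraw only the items touching the exposed region.  The rectangle representation (x, y, width, height), the item names, and the function names are assumptions made for this example.

```python
def intersects(a, b):
    """True if two (x, y, width, height) rectangles overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def items_to_redraw(items, exposed):
    """Names of the items whose bounding rectangle touches the exposed region."""
    return [name for name, rect in items if intersects(rect, exposed)]


items = [("title", (0, 0, 200, 20)),
         ("body", (0, 20, 200, 150)),
         ("status", (0, 170, 200, 20))]

exposed = (50, 10, 40, 30)     # region reported in a hypothetical exposure event
redraw = items_to_redraw(items, exposed)
```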
                    903: 
                    904: Why is the client responsible?  Because X imposes no structure on, or
                    905: relationships between, graphics operations from a client, there are only two
                    906: basic mechanisms by which the server might restore window contents:  by
                    907: maintaining display lists, and by maintaining off-screen images.  In the first
                    908: approach, the server essentially retains a list of all output requests
                    909: performed on the window.  When a region of the window becomes exposed, the
                    910: server either re-executes all requests to the entire window, or only
                    911: re-executes requests that affect the region while clipping the output to that
                    912: region.  In the alternative approach, when a window becomes obscured the server
                    913: saves the obscured region (or perhaps the entire window) in off-screen memory.
                    914: All subsequent output requests are executed not only to the visible regions of
                    915: the window, but to the off-screen image as well.  When an obscured region
                    916: becomes visible again, the off-screen copy is simply restored.
                    917: 
                    918: We believe neither server-based approach is acceptable.  With display lists,
                    919: the server is unlikely to have any reasonable notion of when later output
                    920: requests nullify earlier ones.  Either the display list becomes unmanageably
                    921: long, and a refresh that should appear nearly instantaneous instead appears as
                    922: a slow-motion replay, or the server spends a significant length of time pruning
                    923: the display list, and normal-case performance is considerably reduced.  One
                    924: problem with the off-screen image approach is (virtual) memory consumption:  on
                    925: a 1024 by 1024 8-plane display, just one full-screen image requires one
                    926: megabyte of storage, and multiple overlapping windows could easily require many
                    927: times that amount.  Another problem is that the cost of the implementation can
                    928: be prohibitive.  Consider, for example, the QDSS display@cite(qdss), which has
                    929: a graphics co-processor.  In the QDSS, display memory is inaccessible to the
                    930: host processor.  In addition, the co-processor cannot perform operations in
                    931: host memory, and has relatively little off-screen memory of its own.  The only
                    932: viable way to maintain off-screen images for displays like the QDSS may be to
                    933: emulate the co-processor in software.  It can easily take tens of thousands of
                    934: lines of code to emulate a co-processor, and such emulation may execute orders
                    935: of magnitude slower than the co-processor.
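
The storage figure quoted above for off-screen images follows from simple arithmetic, sketched here under the assumption that images are stored tightly packed with no padding; the function name is illustrative.

```python
def image_bytes(width, height, planes):
    """Bytes needed to hold one image at the given depth, with no padding."""
    return width * height * planes // 8


full_screen = image_bytes(1024, 1024, 8)     # one full-screen 8-plane image
five_windows = 5 * full_screen               # five full-screen backing images
```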
                    936: 
                    937: Our belief is that many applications can take advantage of their own
                    938: information structures to facilitate rapid redisplay, without the expense of
                    939: maintaining a distinct display structure or backing-store in the client or the
                    940: server, and often with even better performance.  (Sapphire@cite(sapphire)
                    941: permits client refresh for this reason.)  For example, a text editor can
                    942: redisplay directly from the source, and a VLSI editor can redisplay directly
                    943: from the layout and component definitions.  Many applications will be built on
                    944: top of high-level graphics libraries that automatically maintain the data
                    945: structures necessary to implement rapid redisplay.  For example, the structured
                    946: display file mechanism in VGTS could be supported in a client library.  Of
                    947: course, pushing the responsibility back on the application may not simplify
                    948: matters, particularly when retrofitting old systems to a new environment.  For
                    949: example, the current GKS design does not provide adequate hooks for automatic,
                    950: system-generated refresh of application windows, nor does it provide an
                    951: adequate mechanism for forcing refresh back on the application.
                    952: 
                    953: Relying on client-controlled refresh also derives from window management
                    954: philosophy.  Our belief is that applications cannot be written with fixed
                    955: top-level window sizes built in.  Rather, they must function correctly with
                    956: almost any size, and continue to function correctly as windows are dynamically
                    957: resized.  This is necessary if applications are to be usable on a variety of
                    958: displays under a variety of window management policies.  (Of course, an
                    959: application may need a minimum size to function reasonably, and may prefer the
                    960: width or height to be a multiple of some number; X allows the client to attach
                    961: a resize hint to each window to inform window managers of this.)  Our belief is
                    962: that most applications, for one reason or another, will already have code for
                    963: performing a complete redisplay of the window, and that it is usually
                    964: straightforward to modify this code to deal with partial exposures.  Similar
                    965: arguments were used in the design of both Andrew and Mex, and experience has
                    966: confirmed their decision@cite(wm,mex).
                    967: 
                    968: This is not to argue that the server should never maintain window contents,
                    969: only that it should not be @i(required) to maintain contents.  For complex
                    970: imaging and graphics applications, efficient maintenance by the server may be
                    971: critical for acceptable performance of window management functions.  There is
                    972: nothing inherent in the X protocol that precludes the server from maintaining
                    973: window contents and not generating exposure events.  In the next version of X,
                    974: windows will have several attributes to advise the server as to when and how
                    975: contents should be maintained.
                    976: 
                    977: In X, clients are never informed of what regions are obscured, only of what
                    978: regions have become visible.  Thus, clients have insufficient information to
                    979: try to optimize output by drawing only to visible regions.  However, we feel
                    980: this is justified on two grounds.  First, realistically, users seldom stack
                    981: windows such that the active ones are obscured, so there is little point in
                    982: complicating applications to optimize this case.  More importantly, allowing
                    983: applications to restrict output to only visible regions would conflict with the
                    984: desire to have the server maintain obscured regions automatically when
                    985: possible.
                    986: 
                    987: The decision to rely on client refresh introduces an interesting complication with
                    988: the CopyArea request (described in Section 6).  If part of the source region of the
                    989: CopyArea is obscured, then not all of the destination region can be updated
                    990: properly, and the client must be notified (with an exposure event) so that it
                    991: can correct the problem.  Since output requests are asynchronous, care must be
                    992: taken by the application to handle exposure events when using CopyArea.  In
                    993: particular, if a region is exposed and an event sent by the server, a
                    994: subsequent CopyArea may move all or part of the region before the event is
                    995: actually received by the application.  Several simple algorithms have been
                    996: designed to deal with this situation, but we will not present them here.
                    997: 
                    998: Client refresh raises a visual problem in a network environment.  When a region
                    999: of a window becomes exposed, what contents should the server initially place in
                   1000: that window?  In a local, tightly-coupled environment, it might be perfectly
                   1001: reasonable to leave the contents unaltered, because the client can almost
                   1002: instantaneously begin to refresh the region.  In a network environment, however
                   1003: (and even in a local system where processes can get "swapped out" and take
                   1004: considerable time to swap back in), inevitable delays can lead to visually
                   1005: confusing results.  For example, the user may move a window, and see two images
                   1006: of the window on the screen for a significant length of time, or resize a
                   1007: window and see no immediate change in the appearance of the screen.
                   1008: 
                   1009: To avoid such anomalies in X, clients must define a @i(background) for every
                   1010: window.  The background can be a single color, or it can be a tiling pattern.
                   1011: Whenever a region of a window is exposed, the server immediately paints the
                   1012: region with the background.  Users therefore see window shapes immediately,
                   1013: even if the "contents" are slow to arrive.  Of course, many application windows
                   1014: have some notion of a background anyway, so having the server initialize with a
                   1015: background seldom results in extraneous redisplay.  In fact, many non-leaf
                   1016: windows typically contain nothing but a background, and having the server paint
                   1017: that background frees the applications from performing any redisplay at all to
                   1018: those windows.
                   1019: 
                   1020: Although we believe client-generated refresh is acceptable most of the time, it
                   1021: does not always perform well with momentary pop-up menus, where speed is at a
                   1022: premium.  To avoid potentially expensive refresh when a menu is removed from
                   1023: the screen, a client can explicitly copy the region to be covered by the menu
                   1024: into off-screen memory (within the server) before mapping the menu window.  A
                   1025: special unmap request is used to remove the menu:  it unmaps the window without
                   1026: affecting the contents of the screen or generating exposure events.  The
                   1027: original contents are then copied back onto the screen.  In addition, the
                   1028: client usually @i(grabs) the server for the entire sequence, using a request
                   1029: which freezes all other clients until a corresponding ungrab request is issued
                   1030: (or the grabbing client terminates).  Without this, concurrent output from
                   1031: other clients to regions obscured by the menu would be lost.  Although freezing
                   1032: other clients is in general a poor idea, it seems acceptable for momentary
                   1033: menus.
                   1034: 
                   1035: @section(Input)
                   1036: 
                   1037: We now turn to a discussion of input events, but first we briefly describe the
                   1038: support for mouse cursors.  Clients can define arbitrary shapes for use as
                   1039: mouse cursors.  A cursor is defined by a source bitmap, a pair of pixel values
                   1040: with which to display the bitmap, a mask bitmap which defines the precise shape
                   1041: of the image, and a coordinate within the source bitmap which defines the
                   1042: "center" or "hot spot" of the cursor.  Cursors of arbitrary size can be
                   1043: constructed, although only a portion of the cursor may be displayed on some
                   1044: hardware.  Clients can query the server to determine what cursor sizes are
                   1045: supported, but existing applications typically just assume a 16 by 16 image can
                   1046: always be displayed.  Cursors can also be constructed from character images in
                   1047: fonts; this provides a simple form of named indirection, allowing custom
                   1048: tailoring to each display without having to modify the applications.
                   1049: 
                   1050: A window is said to @i(contain) the mouse if the hot spot of the cursor is
                   1051: within a visible portion of the window or one of its subwindows.  The mouse is
                   1052: said to be @i(in) a window if the window contains the mouse but no subwindow
                   1053: contains the mouse.  Every window can have a mouse cursor defined for it.  The
                   1054: server automatically displays the cursor of whatever window the mouse is
                   1055: currently in; if the window has no cursor defined, the server displays the
                   1056: cursor of the closest ancestor with a cursor defined.
                   1057: 
                   1058: Input is associated with windows.  Input to a given window is controlled by a
                   1059: single client, which need not be the client that created the window.  Events
                   1060: are classified into various types, and the controlling client selects which
                   1061: types are of interest to it.  Only events matching in type with this selection
                   1062: are sent to the client.  When an input event is generated for a window and the
                   1063: controlling client has not selected that type, the server @i(propagates) the
                   1064: event to the closest ancestor window for which some client has selected the
                   1065: type, and sends the event to that client instead.  Every event includes the
                   1066: window that had the event type selected; this window is called the @i(event
                   1067: window).  If the event has been propagated, the event also includes the next
                   1068: window down in the hierarchy between the event window and the original window
                   1069: on which the event was generated.
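
The propagation rule can be sketched as a walk up the window hierarchy.  The Window class, the event type names, and the decision to return None when no window has selected the type are assumptions made for this example rather than protocol definitions.

```python
class Window:
    def __init__(self, name, parent=None, client=None, selected=()):
        self.name = name
        self.parent = parent
        self.client = client             # controlling client, if any
        self.selected = set(selected)    # event types that client selected


def deliver(window, event_type):
    """Return (client, event_window, child), or None if nobody selected it."""
    child = None                         # next window down from the event window
    current = window
    while current is not None:
        if event_type in current.selected:
            return current.client, current, child
        child = current
        current = current.parent
    return None


root = Window("root", client="manager", selected={"ButtonPress"})
frame = Window("frame", parent=root)
button = Window("button", parent=frame, client="app", selected={"KeyPress"})

direct = deliver(button, "KeyPress")          # selected on the window itself
propagated = deliver(button, "ButtonPress")   # climbs to the root
```

In the propagated case the third element is the frame window, the next window down from the event window on the path to the window where the event was generated.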
                   1070: 
                   1071: @subsection(The Keyboard)
                   1072: 
                   1073: For the keyboard, a client can selectively receive events on the press or
                   1074: release of a key.  Keyboard events are not reported in terms of ASCII character
                   1075: codes; instead, each key is assigned a unique code, and client software must
                   1076: translate these codes into the appropriate characters.  The mapping from
                   1077: keycaps to keycodes is intended to be "universal" and predefined; a given
                   1078: keycap has the same keycode on all keyboards.  Applications generally have been
                   1079: written to read a "keymap file" from the user's home directory, so that users
                   1080: can remap the keyboard as they see fit.
                   1081: 
                   1082: The use of coded keys is secondary to the ability to detect both up and down
                   1083: transitions on the keyboard.  For example, a common trick in window systems is
                   1084: for mouse button operations to be affected by keyboard @i(modifiers) such as
                   1085: the Shift, Control, and Meta keys.  A useful feature of the Genera@cite(genera)
                   1086: system is the use of a "mouse documentation line", which changes dynamically as
                   1087: modifiers are pressed and released, indicating the function of the mouse
                   1088: buttons.  A base window system must provide this capability.  Transitions are
                   1089: useful not only on modifiers; various applications for systems other than X
                   1090: have been designed to use "chords" (groups of keys pressed simultaneously), and
                   1091: again the window system should support them.
                   1092: 
                   1093: The keyboard is always @i(attached) to some window (typically the root window
                   1094: or a top-level window); we call this window the @i(focus) window.  A request
                   1095: can be used (usually by the input manager) to attach the keyboard to any
                   1096: window.  The window that receives keyboard input depends on both the mouse
                   1097: position and the focus window.  If the mouse is in some descendant of the focus
                   1098: window, that descendant receives the input.  If the mouse is not in a
                   1099: descendant of the focus window, then the focus window receives the input, even
                   1100: if the mouse is outside the focus window.  For applications that wish to have
                   1101: the mouse state modify the effect of keyboard input, a keyboard event contains
                   1102: the mouse coordinates, both relative to the event window and global to the
                   1103: screen, as well as the state of the mouse buttons.
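
The rule for choosing the window that receives keyboard input can be sketched as follows.  The Window class and the function names are assumptions made for this example.

```python
class Window:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def is_descendant(window, ancestor):
    """True if ancestor is a proper ancestor of window."""
    current = window.parent
    while current is not None:
        if current is ancestor:
            return True
        current = current.parent
    return False


def keyboard_target(mouse_window, focus_window):
    """Window that receives keyboard input, given where the mouse is."""
    if is_descendant(mouse_window, focus_window):
        return mouse_window
    return focus_window


root = Window("root")
editor = Window("editor", parent=root)
text = Window("text", parent=editor)
clock = Window("clock", parent=root)
```

With the focus on the editor window, input goes to the text subwindow while the mouse is in it, and to the editor itself when the mouse is over the clock.  With the focus on the root window, every other window is a descendant, so input simply follows the mouse.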
                   1104: 
                   1105: To provide a reasonable user interface, keyboard events also contain the state
                   1106: of the most common modifier keys:  Shift, ShiftLock, Control, and Meta.
                   1107: Without this information, anomalous behavior can result.  If the user switches
                   1108: windows while modifier keys are down, the new client must somehow determine
                   1109: which modifiers are down.  Placing the modifier state in the keyboard events
                   1110: solves such problems, and also has another benefit:  most clients do not have
                   1111: to maintain their own shadow of the modifier state, and so often can completely
                   1112: ignore key release events.  However, there is a conflict between this
                   1113: server-maintained state and client-maintained keyboard mappings.  In
                   1114: particular, clients cannot use non-standard keys as modifiers, or use chords
                   1115: without the possibility of anomalies such as described above.  We believe the
                   1116: correct solution (not yet supported in X) is for the server to maintain a bit
                   1117: mask reflecting the full state of the keyboard, and to allow clients to read
                   1118: this mask.  An application using chords or non-standard modifiers would request
                   1119: the server to send this mask automatically whenever the mouse entered the
                   1120: application's window.
                   1121: 
                   1122: @subsection(The Mouse)
                   1123: 
                   1124: The X protocol is (somewhat arbitrarily) designed for mice with up to three
                   1125: buttons.  An application can selectively receive events on the press or release
                   1126: of each button.  Each event contains the current mouse coordinates (both local
                   1127: to the window and global to the screen), the current state of all buttons and
                   1128: modifier keys, and a timestamp which can be used, for example, to decide when a
                   1129: succession of clicks constitutes a double or triple click.  An application can
                   1130: also choose to receive mouse motion events, either whenever the mouse is in the
                   1131: window, or only when particular buttons have also been pressed.  The
                   1132: application cannot control the granularity of the reporting, nor is any minimum
                   1133: granularity guaranteed.  In fact, typical server implementations make an effort
                   1134: to compact motion events, to minimize system overhead and wired memory in
                   1135: device drivers.  As such, X may not serve adequately for fine-grained tracking,
                   1136: such as in fast-moving free-hand drawing applications.
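
As an example of using the timestamp carried in button events, a client might group presses into multi-click runs as follows. This is a minimal sketch: the function name count_clicks and the 300 millisecond threshold are assumptions chosen for illustration, not values defined by X.

```python
def count_clicks(press_times, max_gap_ms=300):
    """Group button-press timestamps (milliseconds) into click runs.

    Returns the length of each run: 1 for a single click, 2 for a
    double click, 3 for a triple click, and so on.  The threshold
    max_gap_ms is an arbitrary illustrative value.
    """
    runs = []
    previous = None
    for t in press_times:
        if previous is not None and t - previous <= max_gap_ms:
            runs[-1] += 1      # close enough in time: extend the current run
        else:
            runs.append(1)     # too far apart: start a new run
        previous = t
    return runs
```

Because the timestamps are generated by the server, the grouping does not depend on how long the events took to reach the client.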
                   1137: 
                   1138: Even with motion compaction, servers can generate considerable numbers of
                   1139: motion events.  If an application attempts to respond in real time to every
                   1140: event, it can easily get far behind relative to the actual position of the
                   1141: mouse.  Instead, many applications simply treat motion events as hints.  When a
                   1142: motion event is received, the event is simply discarded, and the client then
                   1143: explicitly queries the server for the current mouse position.  In waiting for
                   1144: the reply, more motion events may be received; these are also discarded.  The
                   1145: client then reacts based on the queried mouse position.  The advantage of this
                   1146: scheme over continuously polling the mouse position is that no CPU time is
                   1147: consumed while the mouse is stationary.
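
The hint scheme can be sketched against a simulated connection. Everything here is hypothetical: FakeServer stands in for a real server connection, and its in_flight list simulates motion events that arrive while the client waits for the query reply.

```python
from collections import deque


class FakeServer:
    """Stand-in for a server connection (hypothetical, for illustration)."""

    def __init__(self, pointer, in_flight=()):
        self.pointer = pointer            # true current mouse position
        self.events = deque()             # events already delivered to the client
        self.in_flight = list(in_flight)  # motion events arriving during a query
        self.queries = 0

    def query_pointer(self):
        self.queries += 1
        self.events.extend(self.in_flight)
        self.in_flight = []
        return self.pointer


def drop_motion(events):
    """Discard leading motion events; return how many were dropped."""
    dropped = 0
    while events and events[0][0] == "motion":
        events.popleft()
        dropped += 1
    return dropped


def track(server, react):
    """Handle a burst of motion events with a single reaction."""
    if drop_motion(server.events) == 0:
        return None                       # mouse is stationary: no query is sent
    position = server.query_pointer()
    drop_motion(server.events)            # events received while waiting for the reply
    react(position)
    return position
```

However many motion events are queued, the client issues one query and reacts once, using the position the server reports at the time of the query.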
                   1148: 
                   1149: Clients can also receive an event each time the mouse enters or leaves a
                   1150: window.  This can be particularly useful in implementing menus.  For example,
                   1151: each menu item can be placed in a separate subwindow of the overall menu
                   1152: window.  When the mouse enters a subwindow, the item is highlighted in some
                   1153: fashion (e.g., by inverting the video sense), and when the mouse leaves the
                   1154: window the item is restored to normal.  Implementing a menu in this manner
                   1155: requires considerably less CPU overhead than continuous polling of the mouse,
                   1156: and also less overhead than using motion events, since most motion events would
                   1157: stay within a single window and thus be uninteresting.
                   1158: 
                   1159: Due to the nature of overlapping windows, and because continuous tracking by
                   1160: the server is not guaranteed, the mouse may appear to move instantaneously
                   1161: between any pair of windows on the screen.  Certainly the window the mouse was
                   1162: in should be notified of the mouse leaving, and the window the mouse is now in
                   1163: should be notified of the mouse entering.  However, all of the windows "in
                   1164: between" in the hierarchy may also be interested in the transition.  This is
                   1165: useful in simplifying the structure of some applications, and is necessary in
                   1166: implementing certain kinds of window managers and input managers.  Thus, when
                   1167: the mouse moves from window A to window B, with window W as their closest
                   1168: (least) common ancestor, all ancestors of A below W also receive leave events,
                   1169: and all ancestors of B below W receive enter events.
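
The set of windows notified can be computed from the window hierarchy, as in the following sketch. It covers only the case described above, where W is a proper ancestor of both A and B. The child-to-parent map and the order in which events are listed are assumptions made for illustration.

```python
def ancestors(window, parent):
    """Return [window, parent, grandparent, ..., root] from a child->parent map."""
    chain = [window]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def crossing_events(a, b, parent):
    """List the leave and enter events when the mouse moves from a to b."""
    chain_a = ancestors(a, parent)
    chain_b = ancestors(b, parent)
    on_a_path = set(chain_a)
    # Closest (least) common ancestor: first window on b's path also on a's path.
    w = next(win for win in chain_b if win in on_a_path)
    leaves = chain_a[:chain_a.index(w)]   # a and its ancestors below w
    enters = chain_b[:chain_b.index(w)]   # b and its ancestors below w
    enters.reverse()                      # report outermost window first (assumed order)
    return [("leave", x) for x in leaves] + [("enter", x) for x in enters]


# Hypothetical hierarchy: root -> W -> P1 -> A, and W -> P2 -> B.
parent = {"A": "P1", "P1": "W", "B": "P2", "P2": "W", "W": "root"}
```

For this hierarchy, crossing_events("A", "B", parent) yields leave events for A and P1 and enter events for P2 and B; W and the root window receive nothing.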
                   1170: 
                   1171: Except for mouse motion events, it might be argued that events are infrequent
                   1172: enough that the server should always send all events to the client, and
                   1173: eliminate the complexity of selecting events.  However, some applications are
                   1174: written with interrupt-driven input; events are received asynchronously, and
                   1175: cause the current computation to be suspended so that the input can be
                   1176: processed.  For example, a text editor might use interrupt-driven input, with
                   1177: the normal computation being redisplay of the window.  The receipt of
                   1178: extraneous input events (for example, key release events) can cause noticeable
                   1179: "hiccups" in such redisplay.
                   1180: 
                   1181: @section(Input and Window Management)
                   1182: 
                   1183: There are two basic modes of keyboard management:  @i(real-estate) and
                   1184: @i(listener).  In real-estate mode, the keyboard "follows" the mouse; keyboard
                   1185: input is directed to whatever window the mouse is in.  In listener mode,
                   1186: keyboard input is directed to a specific window, independent of the mouse
                   1187: position.  Some systems provide only real-estate mode@cite(apollo,sunwin), some
                   1188: only listener mode@cite(lucasfilm,sapphire,pnx,mex,mg1,genera), and
                   1189: Andrew@cite(wm) provides both, although the mode cannot be changed during a
                   1190: session.  Both modes are supported in X, and the mode can be changed
                   1191: dynamically.  Real-estate mode is the default behavior, with the root window as
                   1192: the focus window, as described in the previous section.  An input manager can
                   1193: also make some other (typically top-level) window the focus window, yielding
                   1194: listener mode.  Note, however, that in listener mode in X, the client
                   1195: controlling the focus window can still get real-estate behavior for subwindows,
                   1196: if desired; this capability has proven useful in several applications.
                   1197: 
                   1198: The primary function of a window manager is reconfiguration:  restacking,
                   1199: resizing, and repositioning top-level windows.  The configuration of nested
                   1200: windows is assumed to be application-specific, and under control of the
                   1201: applications.  There are two broad categories of window managers:  manual and
                   1202: automatic.  A manual window manager is "passive", and simply provides an
                   1203: interface to allow the user to manipulate the desktop; windows can be resized
                   1204: and reorganized at will.  The initial size and position of a window typically
                   1205: (but not always) is under user or application control.  Automatic window
                   1206: managers are "active", and operate for the most part without human interaction;
                   1207: size and position at window creation, and reconfiguration at window
                   1208: destruction, are chosen by the system.  Automatic managers typically tile the
                   1209: screen with windows, such that no two windows overlap, automatically adjusting
                   1210: the layout as windows are created and destroyed.  Andrew@cite(wm),
                   1211: Star@cite(star2), and Cedar@cite(cedar) provide automatic management, plus
                   1212: limited manual reconfiguration capability.
                   1213: 
                   1214: Existing window managers for X are manual.  Automatic management that is
                   1215: transparent to applications cannot be accomplished reasonably in X; future
                   1216: support for automatic management is discussed in Section 10.  In the current X
                   1217: design, clients are responsible for initially sizing and placing their
                   1218: top-level windows, not window managers.  In this way, applications continue to
                   1219: work when no window manager is present.  Typically, the user either specifies
                   1220: geometry information in the application command line, or uses the mouse to
                   1221: sweep out a rectangle on the screen.  (For the latter, the application grabs
                   1222: the mouse, as described below.)
                   1223: 
                   1224: @subsection(Mouse-Driven Management)
                   1225: 
                   1226: Existing managers are primarily mouse-driven, and are based on the ability to
                   1227: "steal" events.  Specifically, a manager (or any other client) can @i(grab) a
                   1228: mouse button in combination with a set of modifier keys, with the following
                   1229: effect.  Whenever the modifier keys are down and the button is pressed, the
                   1230: event is reported to the grabbing client, regardless of what window the mouse
                   1231: is in.  All mouse-related events continue to be sent to that client until the
                   1232: button is released.  As part of the grab, the client also specifies a mouse
                   1233: cursor to be used for the duration of the grab, and a window to be used as the
                   1234: event window.  A manager specifies the root window as the event window when
                   1235: grabbing buttons; with the event propagation semantics described in Section 8,
                   1236: the grabbed events contain not only the global mouse coordinates, but also the
                   1237: top-level application window (if any) containing the mouse.  This is sufficient
                   1238: information to manipulate top-level windows.
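
The effect of a button-grab on event delivery can be modeled roughly as follows. This is a sketch of the semantics described above, not of any actual server: the GrabTable class and the event dictionaries are hypothetical, and requiring the modifier set to match exactly is an assumption.

```python
class GrabTable:
    """Hypothetical model of button-grab bookkeeping inside a server."""

    def __init__(self):
        self.registered = {}  # (button, modifiers) -> (client, event_window, cursor)
        self.active = None    # (client, event_window, cursor, button) during a grab

    def grab_button(self, client, button, modifiers, event_window, cursor):
        self.registered[(button, frozenset(modifiers))] = (client, event_window, cursor)

    def deliver(self, event, normal_client):
        """Return the (client, event_window) that receives a mouse event.

        event is a dict with 'type', 'modifiers', 'window', and, for
        press and release events, 'button'.
        """
        if self.active is None and event["type"] == "press":
            key = (event["button"], frozenset(event["modifiers"]))
            if key in self.registered:
                client, window, cursor = self.registered[key]
                self.active = (client, window, cursor, event["button"])
        if self.active is not None:
            client, window, cursor, button = self.active
            if event["type"] == "release" and event.get("button") == button:
                self.active = None        # the grab ends with this release
            return client, window
        return normal_client, event["window"]
```

A manager that calls grab_button("wm", 1, {"Meta"}, "root", "cross") then receives the Meta-button-1 press, all motion events, and the matching release, each with the root window as the event window, whatever window the mouse is in. A press of the same button without the Meta key is delivered normally.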
                   1239: 
                   1240: Using this button-grab mechanism, several different management interfaces have
                   1241: been built, including a "programmable" interface@cite(uwm) allowing the user to
                   1242: assign individual commands or user-defined menus of commands to any number of
                   1243: button/modifier combinations.  For example, a button click (press and release
                   1244: without intervening motion) might be interpreted as a command to raise or lower
                   1245: a window, or to attach the keyboard; a press/motion/release sequence might be
                   1246: interpreted as a command to move a window to a new position; or a button press
                   1247: might cause a menu to pop up, with the selection indicated by the mouse
                   1248: position at the release of the button.  By allowing both specific commands and
                   1249: menus to be bound to buttons, a range of interfaces can be constructed to
                   1250: satisfy both "expert" and "novice" users.
                   1251: 
                   1252: Another form of manager simply displays a static menu bar along the top of the
                   1253: screen, with items for such operations as moving a window and attaching the
                   1254: keyboard.  The menu is used in combination with a mouse-grab primitive, with
                   1255: which a client can unilaterally grab the mouse and then later explicitly
                   1256: release it; during such a mouse-grab, events are redirected to the grabbing
                   1257: client, just as for button-grabs.  When the user clicks on a menu bar item with
                   1258: any button, the manager unilaterally grabs the mouse.  The user then uses the
                   1259: mouse to execute the specific command.  For example, having clicked on the
                   1260: "move" item, the user indicates the window to move by placing the mouse in the
                   1261: window and pressing a button, then indicates the new position by moving the
                   1262: mouse and releasing the button.  The manager then releases the mouse.
                   1263: 
                   1264: @subsection(Icons)
                   1265: 
                   1266: One important "resizing" operation performed by a window manager is
                   1267: transforming a window into a small icon and back again.  In X, icons are merely
                   1268: windows.  Transforming a window into an icon simply involves unmapping the
                   1269: window and mapping its associated icon.  The association between a window and
                   1270: its icon is maintained in the server, rather than the window manager, and
                   1271: either the application or the manager can provide the icon.  In this way, the
                   1272: manager can provide a default icon form for most clients, but clients can
                   1273: provide their own if desired, possibly with dynamic rather than static
                   1274: contents.  The client is still insulated from management policy, even if it
                   1275: provides the icon:  the manager is responsible for positioning, mapping, and
                   1276: unmapping the icon, and the client is responsible only for displaying the
                   1277: contents.
                   1278: 
                   1279: The icon state is maintained in the server not only to allow clients to provide
                   1280: icons, but to avoid the loss of state if the window manager should terminate
                   1281: abnormally.  When a window manager terminates, any windows it has created are
                   1282: destroyed, including icon windows.  With knowledge of icons, the server can
                   1283: detect when an icon is destroyed, and automatically remap the associated client
                   1284: window.  Without this, abnormal termination of the window manager would result
                   1285: in "lost" windows.
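
The recovery behavior can be illustrated with a toy model of the server's bookkeeping. The Server class and its method names are hypothetical; only the remapping rule comes from the description above.

```python
class Server:
    """Toy model of the window and icon state kept by a server (hypothetical)."""

    def __init__(self):
        self.mapped = set()   # windows currently mapped
        self.icon_of = {}     # client window -> its icon window
        self.owner = {}       # window -> client that created it

    def create_window(self, window, client):
        self.owner[window] = client

    def set_icon(self, window, icon):
        self.icon_of[window] = icon

    def iconify(self, window):
        """Unmap the window and map its associated icon."""
        self.mapped.discard(window)
        self.mapped.add(self.icon_of[window])

    def destroy_window(self, window):
        self.mapped.discard(window)
        self.owner.pop(window, None)
        self.icon_of.pop(window, None)
        # If the destroyed window was an icon, remap the window it stood for.
        for client_window, icon in list(self.icon_of.items()):
            if icon == window:
                del self.icon_of[client_window]
                self.mapped.add(client_window)

    def client_exit(self, client):
        """Destroy every window created by a terminating client."""
        for window in [w for w, c in self.owner.items() if c == client]:
            self.destroy_window(window)
```

If a manager creates the icon for an application window, iconifies the window, and then terminates, client_exit destroys the icon and the application window becomes mapped again rather than being lost.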
                   1286: 
                   1287: @subsection(Race Conditions)
                   1288: 
                   1289: There are many race conditions that must be dealt with in input and window
                   1290: management, due to the asynchronous nature of event handling.  For example, if
                   1291: a manager attempts to grab the mouse in response to a press of a button, the
                   1292: mouse-grab request might not reach the server until after the button is
                   1293: released, and intervening mouse events would be missed.  Or, if the user clicks
                   1294: on a window to attach the keyboard there, and then immediately begins typing,
                   1295: the first few keystrokes might occur before the manager actually responds to
                   1296: the click and the server actually moves the keyboard focus.  A final example is
                   1297: a simple interface in which clicking on a window lowers it.  Given a stack of
                   1298: three windows, the user might rapidly click twice in the same spot, expecting
                   1299: the top two windows to be lowered.  Unless the first click is sent to the
                   1300: manager and the resulting request to lower is processed by the server before
                   1301: the second click takes place, the event window for the second click will be the
                   1302: same as for the first click, and the manager will lower the first window twice.
                   1303: 
                   1304: A work-around for the last example, used by existing managers, is to ignore the
                   1305: event window reported in most events.  Instead, the global mouse coordinates
                   1306: reported in the event are used in a follow-up query request to determine which
                   1307: top-level window now contains that coordinate.  However, not all race
                   1308: conditions have acceptable solutions within the current X design.  For a
                   1309: general solution, it must be possible for the manager to synchronize operations
                   1310: explicitly with event processing in the server.  For example, a manager might
                   1311: specify that, at the press of a button, event processing in the server should
                   1312: cease until an explicit acknowledgment is received from the manager.
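
The double-click example and its work-around can be reproduced in a small simulation. The stack representation and function names are hypothetical. The simulation assumes that both clicks are generated before the manager handles the first one, and models the follow-up query as a lookup in the current stack.

```python
def top_window_at(stack, point):
    """Topmost window containing point; stack is ordered top first."""
    x, y = point
    for name, (left, top, width, height) in stack:
        if left <= x < left + width and top <= y < top + height:
            return name
    return None


def lower(stack, name):
    """Move the named window to the bottom of the stack."""
    entry = next(e for e in stack if e[0] == name)
    stack.remove(entry)
    stack.append(entry)


def run(use_query):
    """Click twice on a stack of three windows; return the final order."""
    box = (0, 0, 100, 100)
    stack = [("A", box), ("B", box), ("C", box)]
    point = (10, 10)
    # Both events are generated before any lowering takes place, so both
    # carry the same event window.
    events = [(top_window_at(stack, point), point) for _ in range(2)]
    for event_window, coords in events:
        if use_query:
            target = top_window_at(stack, coords)  # follow-up query on coordinates
        else:
            target = event_window                  # trust the (stale) event window
        lower(stack, target)
    return [name for name, _ in stack]
```

Trusting the event window lowers A twice and leaves the order B, C, A. Using the coordinates in a follow-up query lowers A and then B, giving C, A, B as the user intended.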
                   1313: 
                   1314: @section(Future)
                   1315: 
                   1316: Based on critiques from numerous universities and commercial firms, a fairly
                   1317: extensive evaluation and redesign of the X protocol has been underway since May
                   1318: 1986.  Our desire is to define a "core" protocol that can serve as a standard
                   1319: for window system construction over the next several years.  We expect to
                   1320: present the rationale for this new design in the very near future, once it has
                   1321: been validated by at least a preliminary implementation.  In this section, we
                   1322: highlight the major protocol changes.
                   1323: 
                   1324: @subsection(Resource Allocation)
                   1325: 
                   1326: Since the server is responsible for assigning identifiers to resources, each
                   1327: resource allocation currently requires a round-trip time to perform.  For
                   1328: applications that allocate many resources, this causes a considerable start-up
                   1329: delay.  For example, a multi-pane menu might consist of dozens of windows,
                   1330: numerous fonts, and several different mouse cursors, leading to a delay of one
                   1331: second or longer.
                   1332: 
                   1333: In retrospect, this is the most significant defect in the design of X.  To get
                   1334: around these delays, programming interfaces have been augmented to provide
                   1335: "batch mode" operations.  If several resources must be created, but there are
                   1336: no inter-dependencies among the allocation requests, all of the requests are
                   1337: sent in a batch, and then all of the replies are received.  This effectively
                   1338: reduces the delay to a single round-trip time.
                   1339: 
                   1340: A better solution to this problem is to make clients generate the identifiers.
                   1341: When the client establishes a connection to the server, it is given a specific
                   1342: subrange from which it can allocate.  This change will significantly improve
                   1343: start-up times without affecting applications, as identifiers can be generated
                   1344: inside low-level libraries without changing programming interfaces.
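
The three schemes can be compared by counting round trips. The classes below are hypothetical models, and the layout of the subrange (a base value and a size) is an assumption; the text above says only that the client is given a specific subrange.

```python
class ServerAssigned:
    """Current scheme: the server chooses each identifier."""

    def __init__(self):
        self.next_id = 1
        self.round_trips = 0

    def allocate(self):
        """One request and one reply per resource."""
        self.round_trips += 1
        ident = self.next_id
        self.next_id += 1
        return ident

    def allocate_batch(self, count):
        """Batch mode: send all requests, then read all replies."""
        self.round_trips += 1
        idents = list(range(self.next_id, self.next_id + count))
        self.next_id += count
        return idents


class ClientAllocator:
    """Proposed scheme: identifiers come from a subrange granted at connection time."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.used = 0

    def allocate(self):
        """Purely local: no request is sent to obtain the identifier."""
        if self.used == self.size:
            raise RuntimeError("identifier subrange exhausted")
        ident = self.base + self.used
        self.used += 1
        return ident
```

Allocating thirty resources costs thirty round trips with allocate, one with allocate_batch, and none with ClientAllocator. Because the allocator can live inside a low-level library, the programming interface seen by applications need not change.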
                   1345: 
                   1346: @subsection(Transparent Windows)
                   1347: 
                   1348: One use of transparent windows is as clipping regions.  However, they are
                   1349: unsatisfactory for this purpose because every coordinate in a graphics request
                   1350: must be translated by the client from the "real" window's origin to the
                   1351: transparent window's origin.  A better approach to clipping regions is to allow
                   1352: clients to create clipping regions and attach them to all graphics requests.
                   1353: As noted in Section 6, X currently allows a clipping region in the form of a
                   1354: bitmap to be attached to a few graphics requests.  Allowing a clipping region,
                   1355: specified either as a bitmap or a list of rectangles, to be attached to all
                   1356: graphics requests provides a more uniform mechanism.
                   1357: 
                   1358: The major use of transparent windows to date is actually as inexpensive opaque
                   1359: windows.  In the current server implementation, transparent windows can be
                   1360: created and transformed significantly faster than opaque windows.  Because of
                   1361: this, transparent windows are often used when opaque windows would otherwise be
                   1362: adequate.  We believe a new implementation of the server will improve the
                   1363: performance of opaque windows to the point that this will no longer be
                   1364: necessary.
                   1365: 
                   1366: With explicit clipping regions added for graphics, and the performance
                   1367: advantages of transparent windows reduced, the only remaining use of
                   1368: transparent windows is for input (and cursor) control.  Various applications
                   1369: want relatively fine-grained input control, and such control must not affect
                   1370: graphics output.  Close control of cursor images and mouse motion events seems
                   1371: particularly important.  However, the vast majority of the time, control is
                   1372: naturally associated with normal window boundaries, so it would be unwise to
                   1373: divorce input control completely from windows.  As such, the new protocol
                   1374: provides "input-only" windows, which act like normal windows for the purposes
                   1375: of input and cursor control, but which cannot be used as a source or
                   1376: destination in graphics requests, and which are completely invisible as far as
                   1377: output is concerned.
                   1378: 
                   1379: @subsection(Color)
                   1380: 
                   1381: X originally was not designed to deal with direct-color displays.  Direct-color
                   1382: displays typically have between 12 and 36 bits per pixel; the pixel value
                   1383: consists of three subfields, which are used as indexes into three independent
                   1384: color maps: one for red intensities, one for green, and one for blue.  Some
                   1385: direct-color displays also have a fourth subfield, sometimes referred to as
                   1386: "z-channel" information, used to control attributes such as blending or chroma
                   1387: keying.  We now understand how to incorporate direct-color displays without
                   1388: z-channel information into X, in such a way that the differences between
                   1389: direct-color and pseudo-color color maps need not be apparent to the
                   1390: application, yet still allowing all of the usual color map tricks to be played.
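
The difference between the two kinds of lookup can be shown with a short sketch. The function names are hypothetical, the subfields are assumed to be of equal width, and placing red in the high-order bits is an assumption; real hardware varies. Z-channel information is not modeled.

```python
def split_pixel(pixel, bits_per_field):
    """Split a direct-color pixel into (red, green, blue) subfield indexes.

    Assumes three equal-width subfields with red in the high-order bits.
    """
    mask = (1 << bits_per_field) - 1
    red = (pixel >> (2 * bits_per_field)) & mask
    green = (pixel >> bits_per_field) & mask
    blue = pixel & mask
    return red, green, blue


def direct_color_lookup(pixel, red_map, green_map, blue_map, bits_per_field):
    """Each subfield indexes its own independent color map."""
    r, g, b = split_pixel(pixel, bits_per_field)
    return red_map[r], green_map[g], blue_map[b]


def pseudo_color_lookup(pixel, color_map):
    """The whole pixel value indexes a single map of (r, g, b) entries."""
    return color_map[pixel]


# A 12-bit direct-color display: 4 bits per subfield, linear intensity ramps.
ramp = [i * 17 for i in range(16)]
```

For the 12-bit example, the pixel value 0xA3F splits into the indexes 10, 3, and 15, and direct_color_lookup(0xA3F, ramp, ramp, ramp, 4) gives the intensities (170, 51, 255).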
                   1391: 
                   1392: At present there is only one color map for all applications, and color
                   1393: applications fail when this map gets full.  Although dozens of applications
                   1394: typically can be run under X within a single 8-bit pseudo-color map, a single
                   1395: map is clearly unacceptable when dealing with small color maps, or with
                   1396: multiple applications (e.g., CAD tools) that need large portions of the color
                   1397: map.  The solution is to support multiple virtual color maps, still permitting
                   1398: applications to coexist within any map, but allowing the possibility that not
                   1399: all applications show true color simultaneously.  This also matches
                   1400: next-generation displays, which actually support multiple color maps in
                   1401: hardware@cite(rainbow).
                   1402: 
                   1403: @subsection(Graphics)
                   1404: 
                   1405: Perhaps the biggest mistake in the graphics area was failing to support fonts
                   1406: with kerning (side bearings).  For example, a relatively complete emulation of
                   1407: the Andrew programming interface was built for X, but Andrew applications
                   1408: depend heavily on kerned fonts.  There are other deficiencies that will be
                   1409: corrected.  For example, large glyph-sets (e.g., Japanese) will be supported,
                   1410: as well as stippling (using a clip mask constructed by tiling a region with a
                   1411: bitmap).  The notions of line width, join style, and end style found in
                   1412: PostScript@cite(postscript) are usually preferred to brush shapes for line
                   1413: drawing, and will be supported.
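
Stippling as characterized above can be sketched on small in-memory rasters. The function names, the list-of-rows representation, and the tiling origin parameter are assumptions made for illustration, not part of the protocol.

```python
def stipple_mask(bitmap, width, height, origin=(0, 0)):
    """Build a clip mask by tiling `bitmap` over a width x height region.

    bitmap is a list of rows of 0/1 values; origin is the (x, y) position
    at which a tile boundary falls.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    ox, oy = origin
    return [
        [bitmap[(y - oy) % rows][(x - ox) % cols] for x in range(width)]
        for y in range(height)
    ]


def fill_stippled(dest, mask, pixel):
    """Write `pixel` into dest wherever the mask bit is set."""
    for y, row in enumerate(mask):
        for x, bit in enumerate(row):
            if bit:
                dest[y][x] = pixel
```

Tiling the 2 x 2 bitmap [[1, 0], [0, 1]] over a 4 x 3 region produces a checkerboard mask, and filling through it changes only the pixels where the mask is set.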
                   1414: 
                   1415: In an attempt to support a wide range of devices, the exact path followed for
                   1416: lines and filled shapes was originally left undefined in X (the class of curve
                   1417: was not even specified).  Different devices use slightly different algorithms
                   1418: to draw straight lines, and it seemed better to have high performance with
                   1419: minor variation than to have uniformity with poor performance.  Relatively few
                   1420: devices support curve drawing in hardware, but some support it in firmware, and
                   1421: again performance seemed more important than accuracy.  In retrospect, however,
                   1422: allowing such device-dependent behavior was a poor decision.  The vast majority
                   1423: of applications draw lines aligned on an axis, and speed and precision are not
                   1424: an issue.  The applications that do require complex shapes also require
                   1425: predictable results, so precise specifications are important.
                   1426: 
                   1427: A notable feature missing in X is the ability to perform graphics operations
                   1428: off screen.  The reasons for this are essentially the same as those presented
                   1429: when discussing exposures in Section 7.  In particular, not all graphics
                   1430: co-processors can operate on host memory, and emulating such processors can be
                   1431: expensive.  However, application builders have demanded this capability, and
                   1432: the demand appears to be sufficient leverage to convince server implementors to
                   1433: provide the capability.  Off-screen graphics will be possible in the new
                   1434: protocol, although the amount of off-screen memory and its performance
                   1435: characteristics may vary widely.  In addition, the protocol is being extended
                   1436: to allow the manipulation of both images and windows of varying depths.  For
                   1437: example, a server might support depths of 1, 4, 8, 12, and 24 bits.  This
                   1438: allows imaging applications to transmit data more compactly, allows for more
                   1439: efficient memory utilization in the server, and provides a match with
                   1440: next-generation display hardware.
                   1441: 
                   1442: A common debate in graphics systems is whether and where to have state.  Should
                   1443: parameters such as logic function, plane mask, source pixel value or tile,
                   1444: tiling origin, font, line width and style, and clipping region be explicit in
                   1445: every request or collected into a state object?  The current X protocol is
                   1446: stateless, for the following reasons:  both state-based and stateless programming
                   1447: interfaces can be built easily on top of the protocol; the currently supported
                   1448: graphics requests have just few enough parameters that they can be represented
                   1449: compactly; and the initial set of displays we were interested in (and the
                   1450: implementations we had in mind for them) would not benefit from the addition of
                   1451: state.  However, we now believe that a state-based protocol is generally
                   1452: superior, as it handles complex graphics gracefully and allows significantly
                   1453: faster implementations on some displays.
                   1454: 
                   1455: @subsection(Management)
                   1456: 
                   1457: An obvious interface style presently not supported in X is the ability to use
                   1458: the keyboard for management commands.  To allow this, a key-grab mechanism,
                   1459: akin to the button-grab mechanism described in Section 9, will be provided.  To
                   1460: allow such styles as using the first button click in a window to attach the
                   1461: keyboard, both button-grabs and key-grabs have been extended to apply to
                   1462: specific sub-hierarchies, rather than always to the entire screen.  To handle
                   1463: the kinds of race conditions described in Section 9, a general event
                   1464: synchronization mechanism has been incorporated into the grab mechanisms.
                   1465: 
                   1466: To support automatic window management, a manager must be able to intercept
                   1467: certain management requests from clients (such as mapping or moving a window)
                   1468: before they are executed by the server, and to be notified about others (such
                   1469: as unmapping a window) after they are executed.  In addition, some managers
                   1470: want to provide uniform title bars and border decorations automatically.  To
                   1471: allow this, it is useful to be able to "splice" hierarchies:  to move a window
                   1472: from one parent to another.  To allow input managers and window managers to be
                   1473: implemented as separate applications, the ability for multiple clients to
                   1474: select events on the same window is being added.  For example, both a window
                   1475: manager and an input manager might be interested in the unmapping or
                   1476: destruction of a window.
                   1477: 
                   1478: @subsection(Extensibility)
                   1479: 
                   1480: The information that input and window managers might desire from applications
                   1481: is quite varied, and it would be a mistake to try to define a fixed set.
                   1482: Similarly, the information paths between applications (e.g., in support of "cut
                   1483: and paste") need to be flexible.  To this end, we are adding a Lisp-ish
                   1484: property list@cite(CLtL) mechanism to windows, and the event mechanism is being
                   1485: augmented to provide a simple form of inter-client communication.
                   1486: 
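The property-list idea can be pictured with a small sketch. This is not the X interface: the `Window` class, its method names, and the property name `EXAMPLE_TITLE` are hypothetical, and notifying interested clients when a property changes is an assumption made for the example rather than something stated in the text.

```python
from typing import Any, Callable, Dict, List, Optional


class Window:
    """Hypothetical window carrying an open-ended property list."""

    def __init__(self, window_id: int) -> None:
        self.window_id = window_id
        self._properties: Dict[str, Any] = {}
        self._listeners: List[Callable[[int, str], None]] = []

    def select_property_events(self, callback: Callable[[int, str], None]) -> None:
        # Assumption: several clients may register interest in one window.
        self._listeners.append(callback)

    def change_property(self, name: str, value: Any) -> None:
        # Any name is allowed; no fixed set of properties is defined.
        self._properties[name] = value
        for callback in list(self._listeners):
            callback(self.window_id, name)

    def get_property(self, name: str) -> Optional[Any]:
        return self._properties.get(name)


def demo() -> List[str]:
    log: List[str] = []
    win = Window(42)
    # Two independent "managers" watch the same window.
    win.select_property_events(
        lambda wid, name: log.append(f"window manager saw {name} on {wid}"))
    win.select_property_events(
        lambda wid, name: log.append(f"input manager saw {name} on {wid}"))
    # The application publishes information under a name it chooses.
    win.change_property("EXAMPLE_TITLE", "editor: x.mss")
    log.append(str(win.get_property("EXAMPLE_TITLE")))
    return log
```

Because the property names are uninterpreted by the window itself, applications and managers can agree on new conventions without any change to the underlying mechanism.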
                   1487: The new X protocol explicitly continues to avoid certain areas, such as 3-D
                   1488: graphics and anti-aliasing.  However, a general mechanism has been designed to
                   1489: allow extension libraries to be included in a server.  The intention is that
                   1490: all servers implement the "core" protocol, but each server can provide
                   1491: arbitrary extensions.  If an extension becomes widely accepted by the X
                   1492: community, it can be adopted as part of the core.  Each extension library is
                   1493: assigned a global name, and an application can query the server at run-time to
                   1494: determine if a particular extension is present.  Request opcodes and event
                   1495: types are allocated dynamically, so that applications need not be modified to
                   1496: execute in each new environment.
                   1497: 
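The run-time query and dynamic allocation described above can be sketched as follows. This is an illustrative model, not the protocol itself: the `Server` class, its method names, the extension names, and the numeric boundaries between core and extension opcodes and event types are all assumptions made for the example.

```python
from typing import Dict, NamedTuple, Optional


class ExtensionInfo(NamedTuple):
    major_opcode: int   # request opcode assigned to the extension
    first_event: int    # first event type assigned to the extension


class Server:
    # Assumed boundaries: values below these belong to the core protocol.
    FIRST_EXTENSION_OPCODE = 128
    FIRST_EXTENSION_EVENT = 64

    def __init__(self) -> None:
        self._extensions: Dict[str, ExtensionInfo] = {}
        self._next_opcode = self.FIRST_EXTENSION_OPCODE
        self._next_event = self.FIRST_EXTENSION_EVENT

    def load_extension(self, name: str, num_events: int) -> ExtensionInfo:
        """Register an extension under its global name, allocating codes."""
        if name in self._extensions:
            return self._extensions[name]
        info = ExtensionInfo(self._next_opcode, self._next_event)
        self._next_opcode += 1
        self._next_event += num_events
        self._extensions[name] = info
        return info

    def query_extension(self, name: str) -> Optional[ExtensionInfo]:
        """Return the allocated codes, or None if the extension is absent."""
        return self._extensions.get(name)


def demo() -> Dict[str, Optional[ExtensionInfo]]:
    # Two servers offer different sets of extensions.
    a = Server()
    a.load_extension("EXAMPLE-3D", num_events=2)
    a.load_extension("EXAMPLE-ANTIALIAS", num_events=1)
    b = Server()
    b.load_extension("EXAMPLE-ANTIALIAS", num_events=1)
    # The same client code asks by name and uses whatever codes it is given.
    return {
        "a": a.query_extension("EXAMPLE-ANTIALIAS"),
        "b": b.query_extension("EXAMPLE-ANTIALIAS"),
        "b_3d": b.query_extension("EXAMPLE-3D"),
    }
```

The same extension receives different opcodes on the two servers, so a client that looks the codes up by name at run-time needs no modification to run against either one.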
                   1498: @section(Summary)
                   1499: 
                   1500: The X Window System provides high-performance, high-level, device-independent
                   1501: graphics.  A hierarchy of resizable, overlapping windows allows a wide variety
                   1502: of application and user interfaces to be built easily.  Network-transparent
                   1503: access to the display provides an important degree of functional separation,
                   1504: without significantly affecting performance, that is crucial to building
                   1505: applications for a distributed environment.  To a reasonable extent, desktop
                   1506: management can be custom tailored to individual environments, without modifying
                   1507: the base system and typically without affecting applications.
                   1508: 
                   1509: To date, the X design and implementation effort has focused on the base window
                   1510: system, as described in this paper, and on essential applications and
                   1511: programming interfaces.  The design of the network protocol, the design and
                   1512: implementation of the device-independent layer of the server, and the implementation of
                   1513: several applications and a prototype window manager, were carried out by the
                   1514: first author.  The design and implementation of the C programming interface,
                   1515: the implementation of major portions of several applications, and the
                   1516: coordination of efforts within Project Athena and Digital, were carried out by
                   1517: the second author.  In addition, many other persons from Project Athena, the
                   1518: Laboratory for Computer Science, and institutions outside MIT have contributed
                   1519: software.
                   1520: 
                   1521: Necessary applications such as window managers and VT100 and Tektronix 4014
                   1522: terminal emulators have been created, and numerous existing applications, such
                   1523: as text editors and VLSI layout systems, have been ported to the X environment.
                   1524: Although several different menu packages have been implemented, we are only now
                   1525: beginning to see a rich library of tools (scroll bars, frames, panels, more
                   1526: menus, etc.) to facilitate the rapid construction of high-quality user
                   1527: interfaces.  Tool building is taking place at many sites, and several
                   1528: universities are now attempting to unify window systems work with X as a base,
                   1529: so that such tools can be shared.
                   1530: 
                   1531: The use of X has grown far beyond anything we had imagined.  Digital has
                   1532: incorporated X into a commercial product, and other manufacturers are following
                   1533: suit.  With the appearance of such products, and the release of complete X
                   1534: sources on the Berkeley 4.3 Unix distribution tapes, it is no longer feasible
                   1535: to track all X use and development.  Existing applications written in C are
                   1536: known to have been ported to seven machine architectures of more than twelve
                   1537: manufacturers, and the C server to six machine architectures and more than
                   1538: sixteen display architectures.  In most cases the code is running under Unix,
                   1539: but other operating systems are also involved.  In addition, relatively
                   1540: complete server implementations exist in two Lisp dialects.  Apart from
                   1541: designing the system to be portable, a large part of this success is due to
                   1542: MIT's decision to distribute X sources without any licensing restrictions, and
                   1543: the willingness of people in both educational and commercial institutions to
                   1544: contribute code without restrictions.
                   1545: 
                   1546: @b(Acknowledgments)
                   1547: 
                   1548: Our thanks go to the many people who have contributed to the success of X.
                   1549: Particular thanks go to those who have made significant contributions to the
                   1550: non-proprietary implementation:  Paul Asente (Stanford University), Scott Bates
                   1551: (Brown University), Mike Braca (Brown), Dave Bundy (Brown), Dave Carver
                   1552: (Digital), Tony Della Fera (Digital), Mike Gancarz (Digital), James Gosling
                   1553: (Sun Microsystems), Doug Mink (Smithsonian Astrophysical Observatory), Bob
                   1554: McNamara (Digital), Ron Newman (MIT), Ram Rao (Digital), Dave Rosenthal (Sun),
                   1555: Dan Stone (Brown), Stephen Sutphen (University of Alberta), and Mark
                   1556: Vandevoorde (MIT).
                   1557: 
                   1558: Special thanks go to Digital Equipment Corporation.  A redesign of the protocol
                   1559: and a reimplementation of the server to deal with color and to increase
                   1560: performance were made possible with funding (in the form of hardware) from
                   1561: Digital.  To their credit, all of the resulting device-independent code
                   1562: remained the property of MIT.