.SH
Shared Memory
.PP
On a fast display and processor, X may be performing more than
one thousand operations (X requests) per second.
If every access to the device requires a system call, the overhead
rapidly dominates all other costs.
X uses a memory structure shared with the device driver for two purposes:
1) to get mouse and keyboard input
and
2) to access the device or write into a memory bitmap.
.PP
As pointed out before, X is a single-threaded server.
Since client programs should be able to overlap with
the window system as much as possible (remember that you may be
running applications on other machines), it is particularly
important to send input events to the correct client as soon
as possible.
It is therefore desirable to test if there is input after each
graphic output operation.
This test can be performed in only a couple of instructions given shared
memory, and would otherwise require either one system call per output
operation (to check for new input) or a compromise in how quickly
input would be handled.
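.PP
As an illustration only, the check after each output operation might look
like the following C sketch.
The structure layout and the names shared_input, input_pending, output_op
and do_output are assumptions made for this example and are not taken from
the X sources; output_op merely stands in for a real graphics operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the region shared with the driver. */
struct shared_input {
    volatile unsigned head;   /* next event the server will read */
    volatile unsigned tail;   /* next slot the driver will fill  */
};

/* The test itself: two loads and a compare, no system call. */
static int input_pending(const struct shared_input *in)
{
    return in->head != in->tail;
}

/* Stand-in for one graphic output operation; it only counts calls. */
static unsigned ops_done;
static void output_op(void) { ops_done++; }

/* Perform up to n output operations, stopping as soon as the driver
 * has queued input. Returns the number of operations performed. */
static size_t do_output(const struct shared_input *in, size_t n)
{
    size_t done = 0;
    assert(in != NULL);
    while (done < n) {
        output_op();
        done++;
        if (input_pending(in))
            break;            /* dispatch input before doing more output */
    }
    return done;
}
```

.PP
With no input queued, do_output(&in, 5) performs all five operations;
if the driver has already advanced tail, it returns after the first one.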
.PP
All input events are put into a shared memory circular buffer; since
the driver only inserts into the buffer, and X only removes from the
buffer, synchronization is easy to provide with separate head and tail
indices (presuming a write to shared memory is atomic).
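.PP
A minimal C sketch of such a buffer follows, under the same assumption
that a single index store is atomic.
The event record, the capacity QSIZE and the function names are
hypothetical; the actual driver interface is not shown in this paper.

```c
#include <assert.h>
#include <stddef.h>

#define QSIZE 64u                 /* hypothetical capacity */

struct event { int type; int x, y; };   /* hypothetical event record */

struct event_queue {
    volatile unsigned head;       /* written only by the consumer (X)      */
    volatile unsigned tail;       /* written only by the producer (driver) */
    struct event buf[QSIZE];
};

/* Producer side: returns 0 if the buffer is full (event not stored). */
static int queue_put(struct event_queue *q, struct event ev)
{
    unsigned next;
    assert(q != NULL);
    next = (q->tail + 1) % QSIZE;
    if (next == q->head)
        return 0;
    q->buf[q->tail] = ev;         /* fill the slot first ...             */
    q->tail = next;               /* ... then publish it with one store  */
    return 1;
}

/* Consumer side: returns 0 if the buffer is empty. */
static int queue_get(struct event_queue *q, struct event *out)
{
    assert(q != NULL && out != NULL);
    if (q->head == q->tail)
        return 0;
    *out = q->buf[q->head];       /* copy the event out ...              */
    q->head = (q->head + 1) % QSIZE;   /* ... then release the slot      */
    return 1;
}
```

.PP
Because each index has exactly one writer, no lock is needed; one slot is
left unused so that a full buffer can be told apart from an empty one.
On a modern multiprocessor the index stores would also need memory
barriers or C11 atomics, which this sketch omits.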
.PP
Output on the QVSS is directly to a mapped bitmap.
In the case of the Vs100, a piece of the UNIBUS\(dg and a shared DMA buffer
are statically mapped where both the driver and the X server can access
them.
.FS \(dg
UNIBUS is a trademark of Digital Equipment Corporation.
.sp
.FE
Output requests to the Vs100 are formatted directly into this buffer,
minimizing copying of data.\(dd
.FS \(dd
Our thanks go to Phil Karlton, of Digital's Western Research Lab, for
the first implementation of this mechanism.
.FE
This permits the device-dependent routines to start I/O transfers without
system call overhead (by directly accessing device CSR registers),
and avoids the UNIBUS map setup overhead that DMA from user space requires.
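.PP
The idea of formatting a request in place can be sketched in C as follows.
This is not the Vs100 packet format: the buffer size, the five-word fill
packet, the opcode value and the names dma_reserve and format_fill are all
assumptions made for illustration, and an ordinary array stands in for the
statically mapped DMA buffer.

```c
#include <assert.h>
#include <stddef.h>

#define DMA_WORDS 2048u           /* hypothetical buffer size in 16-bit words */
#define FILL_OPCODE 1u            /* hypothetical opcode */

/* Stand-in for the shared, statically mapped DMA buffer. */
struct dma_buffer {
    size_t used;                  /* words already formatted */
    unsigned short words[DMA_WORDS];
};

/* Reserve nwords in the buffer and return a pointer to them,
 * or NULL if the buffer must be flushed to the device first. */
static unsigned short *dma_reserve(struct dma_buffer *b, size_t nwords)
{
    unsigned short *p;
    assert(b != NULL);
    if (nwords > DMA_WORDS - b->used)
        return NULL;
    p = &b->words[b->used];
    b->used += nwords;
    return p;
}

/* Build a fill packet where the device will read it: the request is
 * written once, with no intermediate copy. Returns 0 if out of space. */
static int format_fill(struct dma_buffer *b, unsigned short x,
                       unsigned short y, unsigned short w, unsigned short h)
{
    unsigned short *p = dma_reserve(b, 5);
    if (p == NULL)
        return 0;
    p[0] = FILL_OPCODE;
    p[1] = x;
    p[2] = y;
    p[3] = w;
    p[4] = h;
    return 1;
}
```

.PP
The caller never assembles the packet in private memory; it asks for space
in the shared buffer and writes the fields there directly.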
.PP
When implemented, these changes dramatically increased performance and improved
interactive feel, while greatly reducing CPU overhead.
Since proper memory sharing primitives are lacking in 4.2BSD,
sharing was implemented by making pages readable and writable in system space,
where they are accessible to any process.
In theory, any program on the machine could cause a Vs100 implementation to
machine check (odd byte access in the UNIBUS space), though in practice this
has never happened.
Nonetheless, it is the ugliest piece of the current X implementation.
We are more willing to let a server process access hardware
directly than to put that code in the kernel,
as it is much easier to debug user processes than kernel code.
.PP
The current X implementation uses a TCP stream both locally and
remotely, though one could easily use
.UX
domain sockets for the local
case at the cost of a file descriptor.
For current applications, the bandwidth limitation (approximately
1 million bits/second on a 780-class processor) is not a major problem,
though faster devices (and image processing applications) would probably
benefit from implementation of a shared memory path between the X server
and client applications.
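.PP
To give a feel for that figure, the small C helper below computes how long
a transfer takes at a given rate.
The helper name and the image size used in the note after it are
hypothetical, chosen only to make the arithmetic concrete.

```c
#include <assert.h>

/* Seconds needed to move `bits` of data at `bits_per_sec`. */
static double transfer_seconds(double bits, double bits_per_sec)
{
    assert(bits_per_sec > 0.0);
    return bits / bits_per_sec;
}
```

.PP
For a hypothetical 1024 by 1024 image one bit deep (1,048,576 bits),
transfer_seconds(1024.0 * 1024.0, 1.0e6) gives about 1.05 seconds;
the same image at eight bits per pixel would take about 8.4 seconds.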
.PP
Current shared memory implementations in variants of
.UX
are not sufficient.
Memory sharing primitives should allow appropriately
privileged programs to both share memory with other processes and map to
both kernel space and I/O space.
Shared libraries (available in some versions of
.UX )
would also increase the options available to window system
designers (see below).