1: .\" Copyright (c) 1986 The Regents of the University of California.
2: .\" All rights reserved.
3: .\"
4: .\" Redistribution and use in source and binary forms are permitted
5: .\" provided that the above copyright notice and this paragraph are
6: .\" duplicated in all such forms and that any documentation,
7: .\" advertising materials, and other materials related to such
8: .\" distribution and use acknowledge that the software was developed
9: .\" by the University of California, Berkeley. The name of the
10: .\" University may not be used to endorse or promote products derived
11: .\" from this software without specific prior written permission.
12: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
13: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
14: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
15: .\"
16: .\" @(#)4.t 6.2 (Berkeley) 3/7/89
17: .\"
18: .ds RH Performance
19: .NH
20: Performance
21: .PP
22: Ultimately, the proof of the effectiveness of the
23: algorithms described in the previous section
24: is the long-term performance of the new file system.
25: .PP
26: Our empirical studies have shown that the inode layout policy has
27: been effective.
28: When running the ``list directory'' command on a large directory
29: that itself contains many directories (to force the system
30: to access inodes in multiple cylinder groups),
31: the number of disk accesses for inodes is cut by a factor of two.
32: The improvements are even more dramatic for large directories
33: containing only files,
34: disk accesses for inodes being cut by a factor of eight.
35: This is most encouraging for programs such as spooling daemons that
36: access many small files,
37: since these programs tend to flood the
38: disk request queue on the old file system.
39: .PP
40: Table 2 summarizes the measured throughput of the new file system.
41: Several comments need to be made about the conditions under which these
42: tests were run.
43: The test programs measure the rate at which user programs can transfer
44: data to or from a file without performing any processing on it.
45: These programs must read and write enough data to
46: ensure that buffering in the
47: operating system does not affect the results.
48: They are also run at least three times in succession:
49: the first run gets the system into a known state,
50: and the next two ensure that the
51: experiment has stabilized and is repeatable.
52: The tests used and their results are
53: discussed in detail in [Kridle83]\(dg.
54: .FS
55: \(dg A UNIX command that is similar to the reading test that we used is
56: ``cp file /dev/null'', where ``file'' is eight megabytes long.
57: .FE
58: The systems were running multi-user but were otherwise quiescent.
59: There was no contention for either the CPU or the disk arm.
60: The only difference between the UNIBUS and MASSBUS tests
61: was the controller.
62: All tests used an AMPEX Capricorn 330 megabyte Winchester disk.
63: As Table 2 shows, all file system test runs were on a VAX 11/750.
64: All file systems had been in production use for at least
65: a month before being measured.
66: The same number of system calls was performed in all tests;
67: the basic system call overhead was a negligible portion of
68: the total running time of the tests.
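.PP
The reading test described above can be approximated as follows.
This is a minimal sketch, not the test program used for Table 2;
the names ``measure_read_rate'' and ``make_test_file'' and the default
block size are assumptions made for illustration.

```python
import os
import tempfile
import time


def measure_read_rate(path, block_size=8192):
    """Read a file sequentially and return (bytes_read, bytes_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering, so each read() call
    # below maps onto one read system call of at most block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)  # the data is discarded, as with /dev/null
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate


def make_test_file(size_bytes):
    """Create a scratch file of the given size and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path
```

.PP
Following the procedure in the text, such a test would be run at least
three times, with the first run discarded.
On a current system the operating system's cache is likely to dominate
the result unless the file is much larger than main memory.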
69: .KF
70: .DS B
71: .TS
72: box tab(@);
73: c c|c s s
74: c c|c c c.
75: Type of@Processor and@Read
76: File System@Bus Measured@Speed@Bandwidth@% CPU
77: _
78: old 1024@750/UNIBUS@29 Kbytes/sec@29/983 3%@11%
79: new 4096/1024@750/UNIBUS@221 Kbytes/sec@221/983 22%@43%
80: new 8192/1024@750/UNIBUS@233 Kbytes/sec@233/983 24%@29%
81: new 4096/1024@750/MASSBUS@466 Kbytes/sec@466/983 47%@73%
82: new 8192/1024@750/MASSBUS@466 Kbytes/sec@466/983 47%@54%
83: .TE
84: .ce 1
85: Table 2a \- Reading rates of the old and new UNIX file systems.
86: .TS
87: box tab(@);
88: c c|c s s
89: c c|c c c.
90: Type of@Processor and@Write
91: File System@Bus Measured@Speed@Bandwidth@% CPU
92: _
93: old 1024@750/UNIBUS@48 Kbytes/sec@48/983 5%@29%
94: new 4096/1024@750/UNIBUS@142 Kbytes/sec@142/983 14%@43%
95: new 8192/1024@750/UNIBUS@215 Kbytes/sec@215/983 22%@46%
96: new 4096/1024@750/MASSBUS@323 Kbytes/sec@323/983 33%@94%
97: new 8192/1024@750/MASSBUS@466 Kbytes/sec@466/983 47%@95%
98: .TE
99: .ce 1
100: Table 2b \- Writing rates of the old and new UNIX file systems.
101: .DE
102: .KE
103: .PP
104: Unlike the old file system,
105: the transfer rates for the new file system do not
106: appear to change over time.
107: The throughput rate is tied much more strongly to the
108: amount of free space that is maintained.
109: The measurements in Table 2 were based on a file system
110: with a 10% free space reserve.
111: Synthetic work loads suggest that throughput deteriorates
112: to about half the rates given in Table 2 when the file
113: systems are full.
114: .PP
115: The percentage of bandwidth given in Table 2 is a measure
116: of the effective utilization of the disk by the file system.
117: An upper bound on the transfer rate from the disk is calculated
118: by multiplying the number of bytes on a track by the number
119: of revolutions of the disk per second.
120: The bandwidth is calculated by expressing the data rate
121: the file system is able to achieve as a percentage of this upper bound.
122: Using this metric, the old file system is only
123: able to use about 3\-5% of the disk bandwidth,
124: while the new file system uses up to 47%
125: of the bandwidth.
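.PP
The bandwidth calculation can be worked through in a few lines.
This is a sketch under stated assumptions: the text gives the 983 Kbytes/sec
upper bound and the 512 byte sector size, but the figures of 32 sectors per
track and 60 revolutions per second are assumptions chosen because they
reproduce that bound (983,040 bytes per second, taking a Kbyte as 1000 bytes).

```python
SECTOR_BYTES = 512        # sector size stated in the text
SECTORS_PER_TRACK = 32    # assumption: chosen to reproduce the 983 figure
REVS_PER_SECOND = 60      # assumption: a 3600 rpm drive


def max_transfer_rate(sectors_per_track=SECTORS_PER_TRACK,
                      sector_bytes=SECTOR_BYTES,
                      revs_per_second=REVS_PER_SECOND):
    """Upper bound in bytes/sec: bytes on a track times revolutions/sec."""
    return sectors_per_track * sector_bytes * revs_per_second


def bandwidth_percent(measured_kbytes, upper_kbytes=983):
    """Measured rate as a whole-number percentage of the upper bound."""
    return round(100 * measured_kbytes / upper_kbytes)
```

.PP
With these definitions, the measured rates of 29 and 466 Kbytes/sec
give 3% and 47%, the figures shown in Table 2.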
126: .PP
127: Both reads and writes are faster in the new system than in the old system.
128: The biggest factor in this speedup is the larger
129: block size used by the new file system.
130: The overhead of allocating blocks in the new system is greater
131: than the overhead of allocating blocks in the old system;
132: however, fewer blocks need to be allocated in the new system
133: because they are bigger.
134: The net effect is that the cost per byte allocated is about
135: the same for both systems.
136: .PP
137: In the new file system, the reading rate is always at least
138: as fast as the writing rate.
139: This is to be expected since the kernel must do more work when
140: allocating blocks than when simply reading them.
141: Note that the write rates are about the same
142: as the read rates in the 8192 byte block file system;
143: the write rates are slower than the read rates in the 4096 byte block
144: file system.
145: The slower write rates occur because
146: the kernel has to do twice as many disk allocations per second,
147: making the processor unable to keep up with the disk transfer rate.
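.PP
The factor of two follows directly from the block sizes.
The sketch below is illustrative only: it ignores fragments, takes a Kbyte
as 1000 bytes, and the function name is hypothetical.

```python
def allocations_per_second(rate_kbytes_per_sec, block_size):
    """Full-block allocations needed per second to sustain a write rate."""
    return rate_kbytes_per_sec * 1000 / block_size


# To sustain the same data rate, 4096 byte blocks need twice as many
# allocations per second as 8192 byte blocks.
small = allocations_per_second(466, 4096)
large = allocations_per_second(466, 8192)
```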
148: .PP
149: In contrast, the old file system is about 50%
150: faster at writing files than reading them.
151: This is because the write system call is asynchronous and
152: the kernel can generate disk transfer
153: requests much faster than they can be serviced;
154: hence disk transfers queue up in the disk buffer cache.
155: Because the disk buffer cache is sorted by minimum seek distance,
156: the average seek between the scheduled disk writes is much
157: less than it would be if the data blocks were written out
158: in the random disk order in which they are generated.
159: However, when the file is read,
160: the read system call is processed synchronously so
161: the disk blocks must be retrieved from the disk in the
162: non-optimal seek order in which they are requested.
163: This forces the disk scheduler to do long
164: seeks, resulting in a lower throughput rate.
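.PP
The effect of sorting queued requests can be illustrated with a small model.
This is a sketch, not the kernel's algorithm: sorting by cylinder number
stands in for the minimum seek ordering described above, and the cylinder
count, request count, and function names are hypothetical.

```python
import random


def total_seek_distance(cylinders, start=0):
    """Sum of head movements when requests are serviced in the given order."""
    distance, position = 0, start
    for cyl in cylinders:
        distance += abs(cyl - position)
        position = cyl
    return distance


def compare_orders(n_requests=50, n_cylinders=800, seed=1):
    """Return (seek distance as generated, seek distance after sorting)."""
    rng = random.Random(seed)
    requests = [rng.randrange(n_cylinders) for _ in range(n_requests)]
    as_generated = total_seek_distance(requests)
    sorted_order = total_seek_distance(sorted(requests))
    return as_generated, sorted_order
```

.PP
Starting from cylinder zero, the sorted order never moves the head farther
than the highest cylinder requested, so it can never do worse than the
order in which the requests were generated.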
165: .PP
166: In the new system, the blocks of a file are better
167: ordered on the disk.
168: Even though reads are still synchronous,
169: the requests are presented to the disk in a much better order.
170: Even though the writes are still asynchronous,
171: they are already presented to the disk in minimum seek
172: order so there is no gain to be had by reordering them.
173: Hence the disk seek latencies that limited the old file system
174: have little effect in the new file system.
175: The cost of allocation is the factor in the new system that
176: causes writes to be slower than reads.
177: .PP
178: The performance of the new file system is currently
179: limited by memory-to-memory copy operations
180: required to move data from disk buffers in the
181: system's address space to data buffers in the user's
182: address space. These copy operations account for
183: about 40% of the time spent performing an input/output operation.
184: If the buffers in both address spaces were properly aligned,
185: this transfer could be performed without copying by
186: using the VAX virtual memory management hardware.
187: This would be especially desirable when transferring
188: large amounts of data.
189: We did not implement this because it would change the
190: user interface to the file system in two major ways:
191: user programs would be required to allocate buffers on page boundaries,
192: and data would disappear from buffers after being written.
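.PP
A rough bound on what eliminating the copy could gain follows from the 40%
figure above.
This is a sketch that assumes the page remapping itself costs nothing,
so the result is an upper bound rather than a prediction;
the function names are hypothetical.
The 512 byte page size is that of the VAX hardware.

```python
VAX_PAGE_BYTES = 512  # VAX hardware page size


def speedup_without_copy(copy_fraction=0.40):
    """Upper bound on I/O speedup if the copy cost dropped to zero."""
    return 1.0 / (1.0 - copy_fraction)


def is_page_aligned(address, page_bytes=VAX_PAGE_BYTES):
    """True if a buffer address falls on a page boundary."""
    return address % page_bytes == 0
```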
193: .PP
194: Greater disk throughput could be achieved by rewriting the disk drivers
195: to chain together kernel buffers.
196: This would allow contiguous disk blocks to be read
197: in a single disk transaction.
198: Many disks used with UNIX systems contain either
199: 32 or 48 sectors of 512 bytes per track.
200: Each track holds exactly two or three 8192 byte file system blocks,
201: or four or six 4096 byte file system blocks.
202: The inability to use contiguous disk blocks
203: effectively limits the performance
204: on these disks to less than 50% of the available bandwidth.
205: If the next block for a file cannot be laid out contiguously,
206: then the minimum spacing to the next allocatable
207: block on any platter is between a sixth and a half of a revolution.
208: The implication of this is that the best possible layout without
209: contiguous blocks uses only half of the bandwidth of any given track.
210: If each track contains an odd number of sectors,
211: then it is possible to resolve the rotational delay to any number of sectors
212: by finding a block that begins at the desired
213: rotational position on another track.
214: Block chaining has not been implemented because it
215: would require rewriting all the disk drivers in the system,
216: and the current throughput rates are already limited by the
217: speed of the available processors.
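.PP
The track geometry figures above can be checked with a short calculation.
This sketch assumes that the minimum spacing equals the rotation time of
one file system block, which is one reading of the sixth to a half of a
revolution quoted above; the function names are hypothetical.

```python
from fractions import Fraction

SECTOR_BYTES = 512  # sector size stated in the text


def blocks_per_track(sectors_per_track, block_bytes):
    """Whole file system blocks that fit on one track."""
    return (sectors_per_track * SECTOR_BYTES) // block_bytes


def block_rotation_fraction(sectors_per_track, block_bytes):
    """Fraction of a revolution that one block occupies."""
    return Fraction(block_bytes, sectors_per_track * SECTOR_BYTES)
```

.PP
A 4096 byte block on a 48 sector track occupies a sixth of a revolution,
and an 8192 byte block on a 32 sector track occupies half of one.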
218: .PP
219: Currently only one block is allocated to a file at a time.
220: A technique used by the DEMOS file system,
221: when it finds that a file is growing rapidly,
222: is to preallocate several blocks at once,
223: releasing them when the file is closed if they remain unused.
224: By batching up allocations, the system can reduce the
225: overhead of allocating at each write,
226: and it can cut down on the number of disk writes needed to
227: keep the block pointers on the disk
228: synchronized with the block allocation [Powell79].
229: This technique was not included because block allocation
230: currently accounts for less than 10% of the time spent in
231: a write system call and, once again, the
232: current throughput rates are already limited by the speed
233: of the available processors.
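.PP
The preallocation technique can be modelled in a few lines.
This is a toy sketch, not the DEMOS implementation:
the class name, the batch size of four, and the free list represented
as a Python list are all assumptions made for illustration.

```python
class PreallocatingFile:
    """Toy model of a file that reserves several blocks per allocator call."""

    def __init__(self, free_blocks, batch=4):
        self.free_blocks = free_blocks  # shared list of free block numbers
        self.batch = batch              # hypothetical batch size
        self.blocks = []                # blocks that hold written data
        self.reserved = []              # preallocated but not yet written
        self.allocator_calls = 0

    def write_block(self):
        """Consume a reserved block, reserving a new batch when none remain."""
        if not self.reserved:
            take = min(self.batch, len(self.free_blocks))
            if take == 0:
                raise OSError("no free blocks")
            # One allocator call reserves up to `batch` blocks at once.
            self.allocator_calls += 1
            self.reserved = [self.free_blocks.pop(0) for _ in range(take)]
        self.blocks.append(self.reserved.pop(0))

    def close(self):
        """Return any reserved blocks that were never written."""
        self.free_blocks.extend(self.reserved)
        self.reserved = []
```

.PP
In this model, writing five blocks with a batch size of four costs two
allocator calls instead of five, and the three unused blocks go back to
the free list when the file is closed.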
234: .ds RH Functional enhancements
235: .sp 2
236: .ne 1i