Annotation of 43BSDReno/share/doc/smm/14.fastfs/4.t, revision 1.1.1.1

1.1       root        1: .\" Copyright (c) 1986 The Regents of the University of California.
                      2: .\" All rights reserved.
                      3: .\"
                      4: .\" Redistribution and use in source and binary forms are permitted
                      5: .\" provided that the above copyright notice and this paragraph are
                      6: .\" duplicated in all such forms and that any documentation,
                      7: .\" advertising materials, and other materials related to such
                      8: .\" distribution and use acknowledge that the software was developed
                      9: .\" by the University of California, Berkeley.  The name of the
                     10: .\" University may not be used to endorse or promote products derived
                     11: .\" from this software without specific prior written permission.
                     12: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
                     13: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
                     14: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
                     15: .\"
                     16: .\"    @(#)4.t 6.2 (Berkeley) 3/7/89
                     17: .\"
                     18: .ds RH Performance
                     19: .NH 
                     20: Performance
                     21: .PP
                     22: Ultimately, the proof of the effectiveness of the
                     23: algorithms described in the previous section
                     24: is the long-term performance of the new file system.
                     25: .PP
                     26: Our empirical studies have shown that the inode layout policy has
                     27: been effective.
                     28: When running the ``list directory'' command on a large directory
                     29: that itself contains many directories (to force the system
                     30: to access inodes in multiple cylinder groups),
                     31: the number of disk accesses for inodes is cut by a factor of two.
                     32: The improvements are even more dramatic for large directories
                     33: containing only files,
                     34: disk accesses for inodes being cut by a factor of eight.
                     35: This is most encouraging for programs such as spooling daemons that
                     36: access many small files,
                     37: since these programs tend to flood the
                     38: disk request queue on the old file system.
                     39: .PP
                     40: Table 2 summarizes the measured throughput of the new file system.
                     41: Several comments need to be made about the conditions under which these
                     42: tests were run.
                     43: The test programs measure the rate at which user programs can transfer
                     44: data to or from a file without performing any processing on it.
                     45: These programs must read and write enough data to
                     46: ensure that buffering in the
                     47: operating system does not affect the results.
                     48: They are also run at least three times in succession:
                     49: the first run gets the system into a known state,
                     50: and the next two ensure that the
                     51: experiment has stabilized and is repeatable.
                     52: The tests used and their results are
                     53: discussed in detail in [Kridle83]\(dg.
                     54: .FS
                     55: \(dg A UNIX command that is similar to the reading test that we used is
                     56: ``cp file /dev/null'', where ``file'' is eight megabytes long.
                     57: .FE
                     58: The systems were running multi-user but were otherwise quiescent.
                     59: There was no contention for either the CPU or the disk arm.
                     60: The only difference between the UNIBUS and MASSBUS tests
                     61: was the controller.
                     62: All tests used an AMPEX Capricorn 330 megabyte Winchester disk.
                     63: As Table 2 shows, all file system test runs were on a VAX 11/750.
                     64: All file systems had been in production use for at least
                     65: a month before being measured.
                     66: The same number of system calls was performed in all tests;
                     67: the basic system call overhead was a negligible portion of
                     68: the total running time of the tests.
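The footnote's ``cp file /dev/null'' reading test can be approximated by a short user-level program. The following is a minimal Python sketch, not the test program described in [Kridle83]; the function name read_throughput, the 8192-byte buffer, and the one-megabyte scratch file are assumptions made for illustration. A file this small will normally be served from the buffer cache, which is exactly the effect the original tests avoided by transferring far more data.

```python
import os
import tempfile
import time


def read_throughput(path, buffer_bytes=8192):
    """Read a file to the end, discarding the data; return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 keeps Python's own buffering out of the measurement.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buffer_bytes)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


# Demonstration on a small scratch file (size chosen only for illustration).
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\0" * (1 << 20))
    scratch_path = scratch.name
try:
    nbytes, seconds = read_throughput(scratch_path)
finally:
    os.remove(scratch_path)

if seconds > 0:
    print(f"{nbytes} bytes at {nbytes / seconds / 1000:.0f} Kbytes/sec")
```

For a measurement comparable to Table 2, the file would need to be large enough that operating system buffering cannot satisfy the reads, and the run would be repeated as described above.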
                     69: .KF
                     70: .DS B
                     71: .TS
                     72: box;
                     73: c c|c s s
                     74: c c|c c c.
                     75: Type of        Processor and   Read
                     76: File System    Bus Measured    Speed   Bandwidth       % CPU
                     77: _
                     78: old 1024       750/UNIBUS      29 Kbytes/sec   29/983 3%       11%
                     79: new 4096/1024  750/UNIBUS      221 Kbytes/sec  221/983 22%     43%
                     80: new 8192/1024  750/UNIBUS      233 Kbytes/sec  233/983 24%     29%
                     81: new 4096/1024  750/MASSBUS     466 Kbytes/sec  466/983 47%     73%
                     82: new 8192/1024  750/MASSBUS     466 Kbytes/sec  466/983 47%     54%
                     83: .TE
                     84: .ce 1
                     85: Table 2a \- Reading rates of the old and new UNIX file systems.
                     86: .TS
                     87: box;
                     88: c c|c s s
                     89: c c|c c c.
                     90: Type of        Processor and   Write
                     91: File System    Bus Measured    Speed   Bandwidth       % CPU
                     92: _
                     93: old 1024       750/UNIBUS      48 Kbytes/sec   48/983 5%       29%
                     94: new 4096/1024  750/UNIBUS      142 Kbytes/sec  142/983 14%     43%
                     95: new 8192/1024  750/UNIBUS      215 Kbytes/sec  215/983 22%     46%
                     96: new 4096/1024  750/MASSBUS     323 Kbytes/sec  323/983 33%     94%
                     97: new 8192/1024  750/MASSBUS     466 Kbytes/sec  466/983 47%     95%
                     98: .TE
                     99: .ce 1
                    100: Table 2b \- Writing rates of the old and new UNIX file systems.
                    101: .DE
                    102: .KE
                    103: .PP
                    104: Unlike the old file system,
                    105: the transfer rates for the new file system do not
                    106: appear to change over time.
                    107: The throughput rate is tied much more strongly to the
                    108: amount of free space that is maintained.
                    109: The measurements in Table 2 were based on a file system
                    110: with a 10% free space reserve.
                    111: Synthetic work loads suggest that throughput deteriorates
                    112: to about half the rates given in Table 2 when the file
                    113: systems are full.
                    114: .PP
                    115: The percentage of bandwidth given in Table 2 is a measure
                    116: of the effective utilization of the disk by the file system.
                    117: An upper bound on the transfer rate from the disk is calculated 
                    118: by multiplying the number of bytes on a track by the number
                    119: of revolutions of the disk per second.
                    120: The bandwidth is calculated by comparing the data rates
                    121: the file system is able to achieve as a percentage of this rate.
                    122: Using this metric, the old file system is only
                    123: able to use about 3\-5% of the disk bandwidth,
                    124: while the new file system uses up to 47%
                    125: of the bandwidth.
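As a rough illustration of this calculation, the Python sketch below recomputes the bandwidth column for the read rates in Table 2a. The 983 Kbytes/sec denominator comes from the table itself. The geometry used to reproduce it (32 sectors of 512 bytes per track, 60 revolutions per second) is an assumption that happens to yield that figure; the text does not state it. All names are illustrative.

```python
# Sketch of the bandwidth metric: achieved rate as a percentage of
# (bytes per track) x (revolutions per second).
# ASSUMPTION: 32 sectors of 512 bytes per track at 3600 RPM. These
# values are not given in the text; they are chosen because they
# reproduce the 983 Kbytes/sec denominator shown in Table 2.
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 32   # assumed
REVS_PER_SECOND = 60     # assumed (3600 RPM)


def max_transfer_rate(bytes_per_track, revs_per_second):
    """Upper bound on the disk transfer rate, in bytes per second."""
    return bytes_per_track * revs_per_second


def bandwidth_percent(kbytes_per_sec, peak_kbytes_per_sec=983):
    """Achieved rate as a whole-number percentage of the peak rate."""
    return round(100 * kbytes_per_sec / peak_kbytes_per_sec)


peak = max_transfer_rate(SECTOR_BYTES * SECTORS_PER_TRACK, REVS_PER_SECOND)

# Read rates from Table 2a, in Kbytes/sec.
read_rates = {
    "old 1024, UNIBUS": 29,
    "new 4096/1024, UNIBUS": 221,
    "new 8192/1024, UNIBUS": 233,
    "new 4096/1024, MASSBUS": 466,
}
for name, rate in read_rates.items():
    print(f"{name}: {bandwidth_percent(rate)}% of {peak // 1000} Kbytes/sec")
```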
                    126: .PP
                    127: Both reads and writes are faster in the new system than in the old system.
                    128: The biggest factor in this speedup is the larger
                    129: block size used by the new file system.
                    130: The overhead of allocating blocks in the new system is greater
                    131: than the overhead of allocating blocks in the old system;
                    132: however, fewer blocks need to be allocated in the new system
                    133: because they are bigger.
                    134: The net effect is that the cost per byte allocated is about
                    135: the same for both systems.
                    136: .PP
                    137: In the new file system, the reading rate is always at least
                    138: as fast as the writing rate.
                    139: This is to be expected since the kernel must do more work when
                    140: allocating blocks than when simply reading them.
                    141: Note that the write rates are about the same 
                    142: as the read rates in the 8192 byte block file system;
                    143: the write rates are slower than the read rates in the 4096 byte block
                    144: file system.
                    145: The slower write rates occur because
                    146: the kernel has to do twice as many disk allocations per second,
                    147: making the processor unable to keep up with the disk transfer rate.
                    148: .PP
                    149: In contrast, the old file system is about 50%
                    150: faster at writing files than reading them.
                    151: This is because the write system call is asynchronous and
                    152: the kernel can generate disk transfer
                    153: requests much faster than they can be serviced,
                    154: hence disk transfers queue up in the disk buffer cache.
                    155: Because the disk buffer cache is sorted by minimum seek distance,
                    156: the average seek between the scheduled disk writes is much
                    157: less than it would be if the data blocks were written out
                    158: in the random disk order in which they are generated.
                    159: However, when the file is read,
                    160: the read system call is processed synchronously so
                    161: the disk blocks must be retrieved from the disk in the
                    162: non-optimal seek order in which they are requested.
                    163: This forces the disk scheduler to do long
                    164: seeks, resulting in a lower throughput rate.
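The effect of seek ordering described above can be illustrated with a toy model. The Python sketch below compares the total head movement for a set of queued requests serviced in the order they were generated with the same requests serviced in sorted order. The cylinder numbers are hypothetical, and a plain ascending sort is a simplification standing in for the minimum-seek ordering of the real disk queue.

```python
def total_seek_distance(cylinders, start=0):
    """Sum of head movements, in cylinders, visiting requests in order."""
    position = start
    total = 0
    for cyl in cylinders:
        total += abs(cyl - position)
        position = cyl
    return total


# Hypothetical cylinder numbers for successive blocks of one file on
# the old file system, where blocks end up scattered across the disk.
requests = [310, 12, 455, 87, 260, 33, 498, 140]

# Synchronous reads: each block is fetched in the order requested.
in_request_order = total_seek_distance(requests)

# Asynchronous writes: queued requests can be reordered before service.
in_sorted_order = total_seek_distance(sorted(requests))

print("request order:", in_request_order, "cylinders")
print("sorted order: ", in_sorted_order, "cylinders")
```

In this model the reordered queue moves the head far less than the request-order sequence, which is the advantage the old file system's writes had over its reads.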
                    165: .PP
                    166: In the new system, the blocks of a file are laid out in
                    167: a better order on the disk.
                    168: Even though reads are still synchronous, 
                    169: the requests are presented to the disk in a much better order.
                    170: Even though the writes are still asynchronous,
                    171: they are already presented to the disk in minimum seek
                    172: order so there is no gain to be had by reordering them.
                    173: Hence the disk seek latencies that limited the old file system
                    174: have little effect in the new file system.
                    175: The cost of allocation is the factor in the new system that 
                    176: causes writes to be slower than reads.
                    177: .PP
                    178: The performance of the new file system is currently
                    179: limited by memory-to-memory copy operations
                    180: required to move data from disk buffers in the
                    181: system's address space to data buffers in the user's
                    182: address space.  These copy operations account for
                    183: about 40% of the time spent performing an input/output operation.
                    184: If the buffers in both address spaces were properly aligned, 
                    185: this transfer could be performed without copying by
                    186: using the VAX virtual memory management hardware.
                    187: This would be especially desirable when transferring
                    188: large amounts of data.
                    189: We did not implement this because it would change the
                    190: user interface to the file system in two major ways:
                    191: user programs would be required to allocate buffers on page boundaries, 
                    192: and data would disappear from buffers after being written.
                    193: .PP
                    194: Greater disk throughput could be achieved by rewriting the disk drivers
                    195: to chain together kernel buffers.
                    196: This would allow contiguous disk blocks to be read
                    197: in a single disk transaction.
                    198: Many disks used with UNIX systems contain either
                    199: 32 or 48 512 byte sectors per track.
                    200: Each track holds exactly two or three 8192 byte file system blocks,
                    201: or four or six 4096 byte file system blocks.
                    202: The inability to use contiguous disk blocks
                    203: effectively limits the performance
                    204: on these disks to less than 50% of the available bandwidth.
                    205: If the next block for a file cannot be laid out contiguously,
                    206: then the minimum spacing to the next allocatable
                    207: block on any platter is between a sixth and a half of a revolution.
                    208: The implication of this is that the best possible layout without
                    209: contiguous blocks uses only half of the bandwidth of any given track.
                    210: If each track contains an odd number of sectors, 
                    211: then it is possible to resolve the rotational delay to any number of sectors
                    212: by finding a block that begins at the desired 
                    213: rotational position on another track.
                    214: Block chaining has not been implemented because it
                    215: would require rewriting all the disk drivers in the system,
                    216: and the current throughput rates are already limited by the
                    217: speed of the available processors.
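The track arithmetic in this paragraph can be checked with a few lines of Python. The sketch below computes how many file system blocks fit on a track for the sector counts mentioned above, and the fraction of a revolution that one block occupies. Treating that fraction as the gap left when a block is skipped is one reading of the "between a sixth and a half of a revolution" figure, not something the text spells out; the function names are illustrative.

```python
from fractions import Fraction

SECTOR_BYTES = 512


def blocks_per_track(sectors_per_track, block_bytes):
    """Whole file system blocks that fit on one track."""
    return (sectors_per_track * SECTOR_BYTES) // block_bytes


def one_block_gap(sectors_per_track, block_bytes):
    """Fraction of a revolution taken up by a single block.

    ASSUMPTION: this is used here as the rotational gap left when the
    next block of a file is placed one block further around the track.
    """
    return Fraction(1, blocks_per_track(sectors_per_track, block_bytes))


for sectors in (32, 48):
    for block in (8192, 4096):
        n = blocks_per_track(sectors, block)
        gap = one_block_gap(sectors, block)
        print(f"{sectors} sectors, {block}-byte blocks: "
              f"{n} blocks per track, gap {gap} revolution")
```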
                    218: .PP
                    219: Currently only one block is allocated to a file at a time.
                    220: A technique used by the DEMOS file system,
                    221: when it finds that a file is growing rapidly,
                    222: is to preallocate several blocks at once,
                    223: releasing them when the file is closed if they remain unused.
                    224: By batching up allocations, the system can reduce the
                    225: overhead of allocating at each write,
                    226: and it can cut down on the number of disk writes needed to
                    227: keep the block pointers on the disk
                    228: synchronized with the block allocation [Powell79].
                    229: This technique was not included because block allocation 
                    230: currently accounts for less than 10% of the time spent in
                    231: a write system call and, once again, the
                    232: current throughput rates are already limited by the speed
                    233: of the available processors.
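To make the trade-off of batched allocation concrete, the Python sketch below counts allocator invocations for a growing file with and without preallocation, and the number of unused blocks that would be released at close. It is a simple counting model under assumed batch sizes, not a description of the DEMOS policy or of any kernel code; the function names and the batch size of 8 are hypothetical.

```python
import math


def allocation_calls(blocks_written, batch_size=1):
    """Number of allocator invocations needed to cover the blocks written."""
    return math.ceil(blocks_written / batch_size)


def released_on_close(blocks_written, batch_size):
    """Preallocated blocks still unused when the file is closed."""
    return allocation_calls(blocks_written, batch_size) * batch_size - blocks_written


# A file that grows to 100 blocks (illustrative figure).
blocks = 100
for batch in (1, 8):  # 1 = current behavior; 8 = hypothetical batch size
    print(f"batch of {batch}: {allocation_calls(blocks, batch)} allocator calls, "
          f"{released_on_close(blocks, batch)} blocks released at close")
```

Since block allocation is reported above to account for less than 10% of the time in a write system call, even a large reduction in allocator calls would have only a limited effect on total write time.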
                    234: .ds RH Functional enhancements
                    235: .sp 2
                    236: .ne 1i
