Annotation of 43BSDReno/share/doc/smm/14.fastfs/4.t, revision 1.1

1.1     ! root        1: .\" Copyright (c) 1986 The Regents of the University of California.
        !             2: .\" All rights reserved.
        !             3: .\"
        !             4: .\" Redistribution and use in source and binary forms are permitted
        !             5: .\" provided that the above copyright notice and this paragraph are
        !             6: .\" duplicated in all such forms and that any documentation,
        !             7: .\" advertising materials, and other materials related to such
        !             8: .\" distribution and use acknowledge that the software was developed
        !             9: .\" by the University of California, Berkeley.  The name of the
        !            10: .\" University may not be used to endorse or promote products derived
        !            11: .\" from this software without specific prior written permission.
        !            12: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
        !            13: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
        !            14: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
        !            15: .\"
        !            16: .\"    @(#)4.t 6.2 (Berkeley) 3/7/89
        !            17: .\"
        !            18: .ds RH Performance
        !            19: .NH 
        !            20: Performance
        !            21: .PP
        !            22: Ultimately, the proof of the effectiveness of the
        !            23: algorithms described in the previous section
        !            24: is the long-term performance of the new file system.
        !            25: .PP
        !            26: Our empirical studies have shown that the inode layout policy has
        !            27: been effective.
        !            28: When running the ``list directory'' command on a large directory
        !            29: that itself contains many directories (to force the system
        !            30: to access inodes in multiple cylinder groups),
        !            31: the number of disk accesses for inodes is cut by a factor of two.
        !            32: The improvements are even more dramatic for large directories
        !            33: containing only files,
        !            34: disk accesses for inodes being cut by a factor of eight.
        !            35: This is most encouraging for programs such as spooling daemons that
        !            36: access many small files,
        !            37: since these programs tend to flood the
        !            38: disk request queue on the old file system.
        !            39: .PP
        !            40: Table 2 summarizes the measured throughput of the new file system.
        !            41: Several comments need to be made about the conditions under which these
        !            42: tests were run.
        !            43: The test programs measure the rate at which user programs can transfer
        !            44: data to or from a file without performing any processing on it.
        !            45: These programs must read and write enough data to
        !            46: ensure that buffering in the
        !            47: operating system does not affect the results.
        !            48: They are also run at least three times in succession;
        !            49: the first run gets the system into a known state
        !            50: and the next two ensure that the
        !            51: experiment has stabilized and is repeatable.
        !            52: The tests used and their results are
        !            53: discussed in detail in [Kridle83]\(dg.
        !            54: .FS
        !            55: \(dg A UNIX command that is similar to the reading test that we used is
        !            56: ``cp file /dev/null'', where ``file'' is eight megabytes long.
        !            57: .FE
        !            58: The systems were running multi-user but were otherwise quiescent.
        !            59: There was no contention for either the CPU or the disk arm.
        !            60: The only difference between the UNIBUS and MASSBUS tests
        !            61: was the controller.
        !            62: All tests used an AMPEX Capricorn 330 megabyte Winchester disk.
        !            63: As Table 2 shows, all file system test runs were on a VAX 11/750.
        !            64: All file systems had been in production use for at least
        !            65: a month before being measured.
        !            66: The same number of system calls were performed in all tests;
        !            67: the basic system call overhead was a negligible portion of
        !            68: the total running time of the tests.
        !            69: .KF
        !            70: .DS B
        !            71: .TS
        !            72: box;
        !            73: c c|c s s
        !            74: c c|c c c.
        !            75: Type of        Processor and   Read
        !            76: File System    Bus Measured    Speed   Bandwidth       % CPU
        !            77: _
        !            78: old 1024       750/UNIBUS      29 Kbytes/sec   29/983 3%       11%
        !            79: new 4096/1024  750/UNIBUS      221 Kbytes/sec  221/983 22%     43%
        !            80: new 8192/1024  750/UNIBUS      233 Kbytes/sec  233/983 24%     29%
        !            81: new 4096/1024  750/MASSBUS     466 Kbytes/sec  466/983 47%     73%
        !            82: new 8192/1024  750/MASSBUS     466 Kbytes/sec  466/983 47%     54%
        !            83: .TE
        !            84: .ce 1
        !            85: Table 2a \- Reading rates of the old and new UNIX file systems.
        !            86: .TS
        !            87: box;
        !            88: c c|c s s
        !            89: c c|c c c.
        !            90: Type of        Processor and   Write
        !            91: File System    Bus Measured    Speed   Bandwidth       % CPU
        !            92: _
        !            93: old 1024       750/UNIBUS      48 Kbytes/sec   48/983 5%       29%
        !            94: new 4096/1024  750/UNIBUS      142 Kbytes/sec  142/983 14%     43%
        !            95: new 8192/1024  750/UNIBUS      215 Kbytes/sec  215/983 22%     46%
        !            96: new 4096/1024  750/MASSBUS     323 Kbytes/sec  323/983 33%     94%
        !            97: new 8192/1024  750/MASSBUS     466 Kbytes/sec  466/983 47%     95%
        !            98: .TE
        !            99: .ce 1
        !           100: Table 2b \- Writing rates of the old and new UNIX file systems.
        !           101: .DE
        !           102: .KE
        !           103: .PP
        !           104: Unlike the old file system,
        !           105: the transfer rates for the new file system do not
        !           106: appear to change over time.
        !           107: The throughput rate is tied much more strongly to the
        !           108: amount of free space that is maintained.
        !           109: The measurements in Table 2 were based on a file system
        !           110: with a 10% free space reserve.
        !           111: Synthetic work loads suggest that throughput deteriorates
        !           112: to about half the rates given in Table 2 when the file
        !           113: systems are full.
        !           114: .PP
        !           115: The percentage of bandwidth given in Table 2 is a measure
        !           116: of the effective utilization of the disk by the file system.
        !           117: An upper bound on the transfer rate from the disk is calculated 
        !           118: by multiplying the number of bytes on a track by the number
        !           119: of revolutions of the disk per second.
        !           120: The bandwidth is calculated by expressing the data rate
        !           121: the file system is able to achieve as a percentage of this rate.
        !           122: Using this metric, the old file system is only
        !           123: able to use about 3\-5% of the disk bandwidth,
        !           124: while the new file system uses up to 47%
        !           125: of the bandwidth.
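As a rough illustration of this metric, the sketch below recomputes some of the percentages in Table 2 from the measured rates. The 983 Kbytes/sec bound is taken from the table. The track geometry used to reproduce it (32 sectors of 512 bytes at 60 revolutions per second) is an assumption that happens to match that figure; it is not stated for the test disk.

```python
# Sketch of the bandwidth metric. The geometry below is an assumption chosen
# because it reproduces the 983 Kbytes/sec bound used in Table 2.
SECTORS_PER_TRACK = 32      # assumed
BYTES_PER_SECTOR = 512      # assumed
REVS_PER_SECOND = 60        # assumed (3600 rpm)


def max_transfer_rate(bytes_per_track, revs_per_second):
    """Upper bound on the disk transfer rate, in bytes per second."""
    return bytes_per_track * revs_per_second


def bandwidth_percent(measured_kbytes, bound_kbytes=983):
    """File system throughput as a whole-number percentage of the bound."""
    return round(100 * measured_kbytes / bound_kbytes)


bound = max_transfer_rate(SECTORS_PER_TRACK * BYTES_PER_SECTOR, REVS_PER_SECOND)
print(bound)                      # 983040 bytes/sec, about 983 Kbytes/sec
print(bandwidth_percent(29))      # old file system, read: 3
print(bandwidth_percent(466))     # new file system, MASSBUS read: 47
```

Rounding to a whole percent gives the same values as the Bandwidth column for the rows checked here.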
        !           126: .PP
        !           127: Both reads and writes are faster in the new system than in the old system.
        !           128: The biggest factor in this speedup is the larger
        !           129: block size used by the new file system.
        !           130: The overhead of allocating blocks in the new system is greater
        !           131: than the overhead of allocating blocks in the old system;
        !           132: however, fewer blocks need to be allocated in the new system
        !           133: because they are bigger.
        !           134: The net effect is that the cost per byte allocated is about
        !           135: the same for both systems.
        !           136: .PP
        !           137: In the new file system, the reading rate is always at least
        !           138: as fast as the writing rate.
        !           139: This is to be expected since the kernel must do more work when
        !           140: allocating blocks than when simply reading them.
        !           141: Note that the write rates are about the same 
        !           142: as the read rates in the 8192 byte block file system;
        !           143: the write rates are slower than the read rates in the 4096 byte block
        !           144: file system.
        !           145: The slower write rates occur because
        !           146: the kernel has to do twice as many disk allocations per second,
        !           147: making the processor unable to keep up with the disk transfer rate.
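The factor of two follows directly from the block sizes. A minimal sketch, using the 466 Kbytes/sec rate from Table 2 as the target write rate and assuming one allocation per full block written (fragments are ignored):

```python
def allocations_per_second(bytes_per_second, block_size):
    """Block allocations needed each second to sustain a given write rate.

    Assumes one allocation per full block; fragment handling is ignored.
    """
    return bytes_per_second / block_size


rate = 466 * 1000  # bytes/sec, the best rate reported in Table 2
small = allocations_per_second(rate, 4096)
large = allocations_per_second(rate, 8192)
print(round(small, 1), round(large, 1), small / large)
```

To keep up with the same disk transfer rate, the 4096 byte block file system must complete each allocation in half the time available to the 8192 byte block file system.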
        !           148: .PP
        !           149: In contrast, the old file system is about 50%
        !           150: faster at writing files than reading them.
        !           151: This is because the write system call is asynchronous and
        !           152: the kernel can generate disk transfer
        !           153: requests much faster than they can be serviced,
        !           154: hence disk transfers queue up in the disk buffer cache.
        !           155: Because the disk buffer cache is sorted by minimum seek distance,
        !           156: the average seek between the scheduled disk writes is much
        !           157: less than it would be if the data blocks were written out
        !           158: in the random disk order in which they are generated.
        !           159: However, when the file is read,
        !           160: the read system call is processed synchronously so
        !           161: the disk blocks must be retrieved from the disk in the
        !           162: non-optimal seek order in which they are requested.
        !           163: This forces the disk scheduler to do long
        !           164: seeks resulting in a lower throughput rate.
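The effect of servicing queued requests in seek order can be seen in a toy model. The sketch below is only an illustration: the cylinder count, the number of requests, and the simple one-direction sweep are all assumptions, and the sweep stands in for whatever ordering the real disk queue applies.

```python
import random


def total_seek(start, cylinders):
    """Total head travel, in cylinders, when servicing requests in order."""
    travel, position = 0, start
    for cyl in cylinders:
        travel += abs(cyl - position)
        position = cyl
    return travel


rng = random.Random(1)                                # fixed seed for repeatability
requests = [rng.randrange(800) for _ in range(50)]    # hypothetical cylinders

as_generated = total_seek(0, requests)           # order seen by synchronous reads
sorted_sweep = total_seek(0, sorted(requests))   # order after sorting the queue
print(as_generated, sorted_sweep)
```

With the head starting at cylinder 0, the sorted sweep travels only as far as the highest requested cylinder, while the unsorted order crosses the disk repeatedly.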
        !           165: .PP
        !           166: In the new system, the blocks of a file are laid out in a
        !           167: more nearly optimal order on the disk.
        !           168: Even though reads are still synchronous, 
        !           169: the requests are presented to the disk in a much better order.
        !           170: Even though the writes are still asynchronous,
        !           171: they are already presented to the disk in minimum seek
        !           172: order so there is no gain to be had by reordering them.
        !           173: Hence the disk seek latencies that limited the old file system
        !           174: have little effect in the new file system.
        !           175: The cost of allocation is the factor in the new system that 
        !           176: causes writes to be slower than reads.
        !           177: .PP
        !           178: The performance of the new file system is currently
        !           179: limited by memory to memory copy operations
        !           180: required to move data from disk buffers in the
        !           181: system's address space to data buffers in the user's
        !           182: address space.  These copy operations account for
        !           183: about 40% of the time spent performing an input/output operation.
        !           184: If the buffers in both address spaces were properly aligned, 
        !           185: this transfer could be performed without copying by
        !           186: using the VAX virtual memory management hardware.
        !           187: This would be especially desirable when transferring
        !           188: large amounts of data.
        !           189: We did not implement this because it would change the
        !           190: user interface to the file system in two major ways:
        !           191: user programs would be required to allocate buffers on page boundaries, 
        !           192: and data would disappear from buffers after being written.
        !           193: .PP
        !           194: Greater disk throughput could be achieved by rewriting the disk drivers
        !           195: to chain together kernel buffers.
        !           196: This would allow contiguous disk blocks to be read
        !           197: in a single disk transaction.
        !           198: Many disks used with UNIX systems contain either
        !           199: 32 or 48 sectors of 512 bytes per track.
        !           200: Each track holds exactly two or three 8192 byte file system blocks,
        !           201: or four or six 4096 byte file system blocks.
        !           202: The inability to use contiguous disk blocks
        !           203: effectively limits the performance
        !           204: on these disks to less than 50% of the available bandwidth.
        !           205: If the next block for a file cannot be laid out contiguously,
        !           206: then the minimum spacing to the next allocatable
        !           207: block on any platter is between a sixth and a half of a revolution.
        !           208: The implication of this is that the best possible layout without
        !           209: contiguous blocks uses only half of the bandwidth of any given track.
        !           210: If each track contains an odd number of sectors, 
        !           211: then it is possible to resolve the rotational delay to any number of sectors
        !           212: by finding a block that begins at the desired 
        !           213: rotational position on another track.
        !           214: Block chaining has not been implemented because it
        !           215: would require rewriting all the disk drivers in the system,
        !           216: and the current throughput rates are already limited by the
        !           217: speed of the available processors.
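The track arithmetic in this paragraph can be checked with a short sketch. It assumes that when the next block cannot be laid out contiguously, exactly one block position is skipped, so the gap is one block's share of a revolution; the function names are illustrative only.

```python
from fractions import Fraction

SECTOR_SIZE = 512  # bytes, as in the text


def blocks_per_track(sectors_per_track, block_size):
    """Whole file system blocks that fit on one track."""
    return (sectors_per_track * SECTOR_SIZE) // block_size


def min_gap(sectors_per_track, block_size):
    """Smallest rotational gap, as a fraction of a revolution, when the next
    block of a file cannot start right after the previous one (one block
    position is skipped)."""
    return Fraction(1, blocks_per_track(sectors_per_track, block_size))


for sectors in (32, 48):
    for block_size in (8192, 4096):
        print(sectors, block_size,
              blocks_per_track(sectors, block_size),
              min_gap(sectors, block_size))
```

The gaps range from one sixth of a revolution (48 sectors, 4096 byte blocks) to one half (32 sectors, 8192 byte blocks). Alternating one transferred block with one skipped block is what holds the best non-contiguous layout to half of a track's bandwidth.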
        !           218: .PP
        !           219: Currently only one block is allocated to a file at a time.
        !           220: When the DEMOS file system
        !           221: finds that a file is growing rapidly,
        !           222: it preallocates several blocks at once,
        !           223: releasing them when the file is closed if they remain unused.
        !           224: By batching up allocations, the system can reduce the
        !           225: overhead of allocating at each write,
        !           226: and it can cut down on the number of disk writes needed to
        !           227: keep the block pointers on the disk
        !           228: synchronized with the block allocation [Powell79].
        !           229: This technique was not included because block allocation 
        !           230: currently accounts for less than 10% of the time spent in
        !           231: a write system call and, once again, the
        !           232: current throughput rates are already limited by the speed
        !           233: of the available processors.
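A toy model of the preallocation idea is sketched below. It is not the DEMOS implementation: the class, its fields, and the batch size of eight are hypothetical, and it only counts how often the allocator would be entered.

```python
class PreallocatingFile:
    """Toy model of batched block allocation; not the DEMOS implementation."""

    def __init__(self, batch):
        self.batch = batch          # blocks requested per allocator call
        self.reserved = 0           # preallocated blocks not yet written
        self.used = 0               # blocks holding data
        self.allocator_calls = 0    # trips to the (hypothetical) allocator

    def write_block(self):
        if self.reserved == 0:
            self.allocator_calls += 1
            self.reserved = self.batch
        self.reserved -= 1
        self.used += 1

    def close(self):
        """Release unused preallocated blocks; return how many were freed."""
        freed, self.reserved = self.reserved, 0
        return freed


one_at_a_time = PreallocatingFile(batch=1)
batched = PreallocatingFile(batch=8)
for _ in range(20):
    one_at_a_time.write_block()
    batched.write_block()
freed = batched.close()
print(one_at_a_time.allocator_calls, batched.allocator_calls, freed)  # 20 3 4
```

For twenty block writes, the batched file enters the allocator three times instead of twenty and returns four unused blocks when it is closed.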
        !           234: .ds RH Functional enhancements
        !           235: .sp 2
        !           236: .ne 1i
