|
|
1.1 ! root 1: .\" Copyright (c) 1982 Regents of the University of California. ! 2: .\" All rights reserved. The Berkeley software License Agreement ! 3: .\" specifies the terms and conditions for redistribution. ! 4: .\" ! 5: .\" @(#)2.t 4.3 (Berkeley) 2/1/86 ! 6: .\" ! 7: .ds RH Overview of the file system ! 8: .NH ! 9: Overview of the file system ! 10: .PP ! 11: The file system is discussed in detail in [Mckusick84]; ! 12: this section gives a brief overview. ! 13: .NH 2 ! 14: Superblock ! 15: .PP ! 16: A file system is described by its ! 17: .I "super-block" . ! 18: The super-block is built when the file system is created (\c ! 19: .I newfs (8)) ! 20: and never changes. ! 21: The super-block ! 22: contains the basic parameters of the file system, ! 23: such as the number of data blocks it contains ! 24: and a count of the maximum number of files. ! 25: Because the super-block contains critical data, ! 26: .I newfs ! 27: replicates it to protect against catastrophic loss. ! 28: The ! 29: .I "default super block" ! 30: always resides at a fixed offset from the beginning ! 31: of the file system's disk partition. ! 32: The ! 33: .I "redundant super blocks" ! 34: are not referenced unless a head crash ! 35: or other hard disk error causes the default super-block ! 36: to be unusable. ! 37: The redundant blocks are sprinkled throughout the disk partition. ! 38: .PP ! 39: Within the file system are files. ! 40: Certain files are distinguished as directories and contain collections ! 41: of pointers to files that may themselves be directories. ! 42: Every file has a descriptor associated with it called an ! 43: .I "inode". ! 44: The inode contains information describing ownership of the file, ! 45: time stamps indicating modification and access times for the file, ! 46: and an array of indices pointing to the data blocks for the file. ! 47: In this section, ! 48: we assume that the first 12 blocks ! 49: of the file are directly referenced by values stored ! 50: in the inode structure itself\(dg. ! 51: .FS ! 52: \(dgThe actual number may vary from system to system, but is usually in ! 53: the range 5-13. ! 54: .FE ! 55: The inode structure may also contain references to indirect blocks ! 56: containing further data block indices. ! 57: In a file system with a 4096 byte block size, a singly indirect ! 58: block contains 1024 further block addresses, ! 59: a doubly indirect block contains 1024 addresses of further single indirect ! 60: blocks, ! 61: and a triply indirect block contains 1024 addresses of further doubly indirect ! 62: blocks (the triple indirect block is never needed in practice). ! 63: .PP ! 64: In order to create files with up to ! 65: 2\(ua32 bytes, ! 66: using only two levels of indirection, ! 67: the minimum size of a file system block is 4096 bytes. ! 68: The size of file system blocks can be any power of two ! 69: greater than or equal to 4096. ! 70: The block size of the file system is maintained in the super-block, ! 71: so it is possible for file systems of different block sizes ! 72: to be accessible simultaneously on the same system. ! 73: The block size must be decided when ! 74: .I newfs ! 75: creates the file system; ! 76: the block size cannot be subsequently ! 77: changed without rebuilding the file system. ! 78: .NH 2 ! 79: Summary information ! 80: .PP ! 81: Associated with the super block is non replicated ! 82: .I "summary information" . ! 83: The summary information changes ! 84: as the file system is modified. ! 85: The summary information contains ! 86: the number of blocks, fragments, inodes and directories in the file system. ! 87: .NH 2 ! 88: Cylinder groups ! 89: .PP ! 90: The file system partitions the disk into one or more areas called ! 91: .I "cylinder groups". ! 92: A cylinder group is comprised of one or more consecutive ! 93: cylinders on a disk. ! 94: Each cylinder group includes inode slots for files, a ! 95: .I "block map" ! 96: describing available blocks in the cylinder group, ! 97: and summary information describing the usage of data blocks ! 98: within the cylinder group. ! 99: A fixed number of inodes is allocated for each cylinder group ! 100: when the file system is created. ! 101: The current policy is to allocate one inode for each 2048 ! 102: bytes of disk space; ! 103: this is expected to be far more inodes than will ever be needed. ! 104: .PP ! 105: All the cylinder group bookkeeping information could be ! 106: placed at the beginning of each cylinder group. ! 107: However if this approach were used, ! 108: all the redundant information would be on the top platter. ! 109: A single hardware failure that destroyed the top platter ! 110: could cause the loss of all copies of the redundant super-blocks. ! 111: Thus the cylinder group bookkeeping information ! 112: begins at a floating offset from the beginning of the cylinder group. ! 113: The offset for ! 114: the ! 115: .I "i+1" st ! 116: cylinder group is about one track further ! 117: from the beginning of the cylinder group ! 118: than it was for the ! 119: .I "i" th ! 120: cylinder group. ! 121: In this way, ! 122: the redundant ! 123: information spirals down into the pack; ! 124: any single track, cylinder, ! 125: or platter can be lost without losing all copies of the super-blocks. ! 126: Except for the first cylinder group, ! 127: the space between the beginning of the cylinder group ! 128: and the beginning of the cylinder group information stores data. ! 129: .NH 2 ! 130: Fragments ! 131: .PP ! 132: To avoid waste in storing small files, ! 133: the file system space allocator divides a single ! 134: file system block into one or more ! 135: .I "fragments". ! 136: The fragmentation of the file system is specified ! 137: when the file system is created; ! 138: each file system block can be optionally broken into ! 139: 2, 4, or 8 addressable fragments. ! 140: The lower bound on the size of these fragments is constrained ! 141: by the disk sector size; ! 142: typically 512 bytes is the lower bound on fragment size. ! 143: The block map associated with each cylinder group ! 144: records the space availability at the fragment level. ! 145: Aligned fragments are examined ! 146: to determine block availability. ! 147: .PP ! 148: On a file system with a block size of 4096 bytes ! 149: and a fragment size of 1024 bytes, ! 150: a file is represented by zero or more 4096 byte blocks of data, ! 151: and possibly a single fragmented block. ! 152: If a file system block must be fragmented to obtain ! 153: space for a small amount of data, ! 154: the remainder of the block is made available for allocation ! 155: to other files. ! 156: For example, ! 157: consider an 11000 byte file stored on ! 158: a 4096/1024 byte file system. ! 159: This file uses two full size blocks and a 3072 byte fragment. ! 160: If no fragments with at least 3072 bytes ! 161: are available when the file is created, ! 162: a full size block is split yielding the necessary 3072 byte ! 163: fragment and an unused 1024 byte fragment. ! 164: This remaining fragment can be allocated to another file, as needed. ! 165: .NH 2 ! 166: Updates to the file system ! 167: .PP ! 168: Every working day hundreds of files ! 169: are created, modified, and removed. ! 170: Every time a file is modified, ! 171: the operating system performs a ! 172: series of file system updates. ! 173: These updates, when written on disk, yield a consistent file system. ! 174: The file system stages ! 175: all modifications of critical information; ! 176: modification can ! 177: either be completed or cleanly backed out after a crash. ! 178: Knowing the information that is first written to the file system, ! 179: deterministic procedures can be developed to ! 180: repair a corrupted file system. ! 181: To understand this process, ! 182: the order that the update ! 183: requests were being honored must first be understood. ! 184: .PP ! 185: When a user program does an operation to change the file system, ! 186: such as a ! 187: .I write , ! 188: the data to be written is copied into an internal ! 189: .I "in-core" ! 190: buffer in the kernel. ! 191: Normally, the disk update is handled asynchronously; ! 192: the user process is allowed to proceed even though ! 193: the data has not yet been written to the disk. ! 194: The data, ! 195: along with the inode information reflecting the change, ! 196: is eventually written out to disk. ! 197: The real disk write may not happen until long after the ! 198: .I write ! 199: system call has returned. ! 200: Thus at any given time, the file system, ! 201: as it resides on the disk, ! 202: lags the state of the file system represented by the in-core information. ! 203: .PP ! 204: The disk information is updated to reflect the in-core information ! 205: when the buffer is required for another use, ! 206: when a ! 207: .I sync (2) ! 208: is done (at 30 second intervals) by ! 209: .I "/etc/update" "(8)," ! 210: or by manual operator intervention with the ! 211: .I sync (8) ! 212: command. ! 213: If the system is halted without writing out the in-core information, ! 214: the file system on the disk will be in an inconsistent state. ! 215: .PP ! 216: If all updates are done asynchronously, several serious ! 217: inconsistencies can arise. ! 218: One inconsistency is that a block may be claimed by two inodes. ! 219: Such an inconsistency can occur when the system is halted before ! 220: the pointer to the block in the old inode has been cleared ! 221: in the copy of the old inode on the disk, ! 222: and after the pointer to the block in the new inode has been written out ! 223: to the copy of the new inode on the disk. ! 224: Here, ! 225: there is no deterministic method for deciding ! 226: which inode should really claim the block. ! 227: A similar problem can arise with a multiply claimed inode. ! 228: .PP ! 229: The problem with asynchronous inode updates ! 230: can be avoided by doing all inode deallocations synchronously. ! 231: Consequently, ! 232: inodes and indirect blocks are written to the disk synchronously ! 233: (\fIi.e.\fP the process blocks until the information is ! 234: really written to disk) ! 235: when they are being deallocated. ! 236: Similarly inodes are kept consistent by synchronously ! 237: deleting, adding, or changing directory entries. ! 238: .ds RH Fixing corrupted file systems
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.