Annotation of 43BSDReno/share/doc/smm/05.fsck/2.t, revision 1.1.1.1

1.1       root        1: .\" Copyright (c) 1982 Regents of the University of California.
                      2: .\" All rights reserved.  The Berkeley software License Agreement
                      3: .\" specifies the terms and conditions for redistribution.
                      4: .\"
                      5: .\"    @(#)2.t 4.3 (Berkeley) 2/1/86
                      6: .\"
                      7: .ds RH Overview of the file system
                      8: .NH
                      9: Overview of the file system
                     10: .PP
                     11: The file system is discussed in detail in [Mckusick84];
                     12: this section gives a brief overview.
                     13: .NH 2
                     14: Superblock
                     15: .PP
                     16: A file system is described by its
                     17: .I "super-block" .
                     18: The super-block is built when the file system is created (\c
                     19: .I newfs (8))
                     20: and never changes.
                     21: The super-block
                     22: contains the basic parameters of the file system,
                     23: such as the number of data blocks it contains
                     24: and a count of the maximum number of files.
                     25: Because the super-block contains critical data,
                     26: .I newfs
                     27: replicates it to protect against catastrophic loss.
                     28: The
                     29: .I "default super-block"
                     30: always resides at a fixed offset from the beginning
                     31: of the file system's disk partition.
                     32: The
                     33: .I "redundant super-blocks"
                     34: are not referenced unless a head crash
                     35: or other hard disk error causes the default super-block
                     36: to be unusable.
                     37: The redundant blocks are sprinkled throughout the disk partition.
                     38: .PP
                     39: Within the file system are files.
                     40: Certain files are distinguished as directories and contain collections
                     41: of pointers to files that may themselves be directories.
                     42: Every file has a descriptor associated with it called an
                     43: .I "inode" .
                     44: The inode contains information describing ownership of the file,
                     45: time stamps indicating modification and access times for the file,
                     46: and an array of indices pointing to the data blocks for the file.
                     47: In this section,
                     48: we assume that the first 12 blocks
                     49: of the file are directly referenced by values stored
                     50: in the inode structure itself\(dg.
                     51: .FS
                     52: \(dgThe actual number may vary from system to system, but is usually in
                     53: the range 5-13.
                     54: .FE
                     55: The inode structure may also contain references to indirect blocks
                     56: containing further data block indices.
                     57: In a file system with a 4096 byte block size, a singly indirect
                     58: block contains 1024 further block addresses,
                     59: a doubly indirect block contains 1024 addresses of further singly indirect
                     60: blocks,
                     61: and a triply indirect block contains 1024 addresses of further doubly indirect
                     62: blocks (the triply indirect block is never needed in practice).
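.PP
As an illustration of the addressing scheme just described,
the following minimal sketch classifies a logical block number
by the level of indirection needed to reach it.
The names NDADDR, NINDIR and indirection_level are hypothetical,
and the 4 byte address size is an assumption inferred from the
1024 addresses that fit in a 4096 byte block.

```python
NDADDR = 12                        # direct block pointers in the inode (as assumed in the text)
BLOCK_SIZE = 4096                  # bytes per file system block
ADDR_SIZE = 4                      # assumed bytes per block address (4096 / 1024)
NINDIR = BLOCK_SIZE // ADDR_SIZE   # addresses held by one indirect block: 1024


def indirection_level(lbn):
    """Return 0 if logical block `lbn` is direct, else 1, 2 or 3
    for singly, doubly or triply indirect."""
    if lbn < 0:
        raise ValueError("negative logical block number")
    if lbn < NDADDR:
        return 0
    lbn -= NDADDR
    span = NINDIR                  # blocks reachable at the current level
    for level in (1, 2, 3):
        if lbn < span:
            return level
        lbn -= span
        span *= NINDIR
    raise ValueError("block lies beyond the triply indirect range")
```

Under these assumptions, logical block 11 is the last direct block
and logical block 12 is the first one reached through the singly indirect block.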
                     63: .PP
                     64: To allow files of up to
                     65: 2\(ua32 bytes
                     66: using only two levels of indirection,
                     67: a file system block must be at least 4096 bytes.
                     68: The size of file system blocks can be any power of two
                     69: greater than or equal to 4096.
                     70: The block size of the file system is maintained in the super-block,
                     71: so it is possible for file systems of different block sizes
                     72: to be accessible simultaneously on the same system.
                     73: The block size must be decided when
                     74: .I newfs
                     75: creates the file system;
                     76: the block size cannot be subsequently
                     77: changed without rebuilding the file system.
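.PP
The 4096 byte minimum can be checked with a short calculation.
The sketch below is only an illustration: the function name
max_file_size is hypothetical, and it assumes 12 direct blocks
and 4 byte block addresses, as in the earlier discussion.

```python
def max_file_size(block_size, ndaddr=12, addr_size=4, levels=2):
    """Largest file, in bytes, addressable with `ndaddr` direct blocks
    plus `levels` levels of indirection (assumed 4 byte addresses)."""
    nindir = block_size // addr_size          # addresses per indirect block
    blocks = ndaddr + sum(nindir ** k for k in range(1, levels + 1))
    return blocks * block_size


# With two levels of indirection, 4096 is the smallest power of two
# whose reach is at least 2**32 bytes; 2048 falls well short.
reach_4096 = max_file_size(4096)
reach_2048 = max_file_size(2048)
```

With these assumptions a 4096 byte block size reaches slightly more than
2\(ua32 bytes, while a 2048 byte block size reaches only about 2\(ua29 bytes.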
                     78: .NH 2
                     79: Summary information
                     80: .PP
                     81: Associated with the super-block is non-replicated
                     82: .I "summary information" .
                     83: The summary information changes
                     84: as the file system is modified.
                     85: The summary information contains
                     86: the number of blocks, fragments, inodes and directories in the file system.
                     87: .NH 2
                     88: Cylinder groups
                     89: .PP
                     90: The file system partitions the disk into one or more areas called
                     91: .I "cylinder groups" .
                     92: A cylinder group consists of one or more consecutive
                     93: cylinders on a disk.
                     94: Each cylinder group includes inode slots for files, a
                     95: .I "block map"
                     96: describing available blocks in the cylinder group,
                     97: and summary information describing the usage of data blocks
                     98: within the cylinder group.
                     99: A fixed number of inodes is allocated for each cylinder group
                    100: when the file system is created.
                    101: The current policy is to allocate one inode for each 2048
                    102: bytes of disk space;
                    103: this is expected to be far more inodes than will ever be needed.
                    104: .PP
                    105: All the cylinder group bookkeeping information could be
                    106: placed at the beginning of each cylinder group.
                    107: However, if this approach were used,
                    108: all the redundant information would be on the top platter.
                    109: A single hardware failure that destroyed the top platter
                    110: could cause the loss of all copies of the redundant super-blocks.
                    111: Thus the cylinder group bookkeeping information
                    112: begins at a floating offset from the beginning of the cylinder group.
                    113: The offset for
                    114: the
                    115: .I "i+1" st
                    116: cylinder group is about one track further
                    117: from the beginning of the cylinder group
                    118: than it was for the
                    119: .I "i" th
                    120: cylinder group.
                    121: In this way,
                    122: the redundant
                    123: information spirals down into the pack;
                    124: any single track, cylinder,
                    125: or platter can be lost without losing all copies of the super-blocks.
                    126: Except for the first cylinder group,
                    127: the space between the beginning of the cylinder group
                    128: and the beginning of the cylinder group information stores data.
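.PP
The spiralling placement can be pictured with a deliberately simplified model.
The sketch below is not the file system's actual computation:
the function name cg_info_offset and its parameters are hypothetical,
the offset is taken to be exactly one track per group rather than
"about" one track, and the wrap-around after the last track of a
cylinder is an assumption of this sketch.

```python
def cg_info_offset(cg_index, tracks_per_cylinder, sectors_per_track):
    """Offset, in sectors from the start of cylinder group `cg_index`,
    at which the bookkeeping information begins in this simplified model."""
    # Each successive group starts its bookkeeping one track later,
    # wrapping around once every track position has been used (assumption).
    track = cg_index % tracks_per_cylinder
    return track * sectors_per_track


# Example with hypothetical geometry: 8 tracks per cylinder, 32 sectors per track.
offsets = [cg_info_offset(i, 8, 32) for i in range(8)]
```

In this model the first eight cylinder groups place their bookkeeping
information on eight different tracks, so the loss of one platter surface
leaves the other copies intact.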
                    129: .NH 2
                    130: Fragments
                    131: .PP
                    132: To avoid waste in storing small files,
                    133: the file system space allocator divides a single
                    134: file system block into one or more
                    135: .I "fragments" .
                    136: The fragmentation of the file system is specified
                    137: when the file system is created;
                    138: each file system block can be optionally broken into
                    139: 2, 4, or 8 addressable fragments.
                    140: The lower bound on the size of these fragments is constrained
                    141: by the disk sector size;
                    142: typically 512 bytes is the lower bound on fragment size.
                    143: The block map associated with each cylinder group
                    144: records the space availability at the fragment level.
                    145: Aligned fragments are examined
                    146: to determine block availability.
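.PP
The relationship between the fragment-level map and block availability
can be sketched as follows.
The list-of-booleans representation and the name free_blocks are
assumptions made for illustration; the on-disk map format is not shown here.

```python
def free_blocks(frag_free, frags_per_block):
    """Given one boolean per fragment (True means free), return the indices
    of the blocks whose aligned group of fragments is entirely free."""
    last_start = len(frag_free) - frags_per_block
    return [start // frags_per_block
            for start in range(0, last_start + 1, frags_per_block)
            if all(frag_free[start:start + frags_per_block])]


# Four fragments per block.  Fragments 2 through 5 are free and contiguous,
# but they straddle a block boundary, so no whole block is available.
straddling = [False, False, True, True, True, True, False, False]
```

Only aligned groups count: a run of free fragments that crosses a block
boundary does not constitute a free block.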
                    147: .PP
                    148: On a file system with a block size of 4096 bytes
                    149: and a fragment size of 1024 bytes,
                    150: a file is represented by zero or more 4096 byte blocks of data,
                    151: and possibly a single fragmented block.
                    152: If a file system block must be fragmented to obtain
                    153: space for a small amount of data,
                    154: the remainder of the block is made available for allocation
                    155: to other files.
                    156: For example,
                    157: consider an 11000 byte file stored on
                    158: a 4096/1024 byte file system.
                    159: This file uses two full size blocks and a 3072 byte fragment.
                    160: If no fragments with at least 3072 bytes
                    161: are available when the file is created,
                    162: a full size block is split yielding the necessary 3072 byte
                    163: fragment and an unused 1024 byte fragment.
                    164: This remaining fragment can be allocated to another file, as needed.
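.PP
The arithmetic of this example can be reproduced with a few lines of code.
The function name file_layout is hypothetical, and the sketch only mirrors
the calculation above; it does not model the allocator itself.

```python
def file_layout(file_size, block_size=4096, frag_size=1024):
    """Return (full_blocks, tail_fragments) for a file of `file_size` bytes."""
    full_blocks, tail_bytes = divmod(file_size, block_size)
    tail_fragments = -(-tail_bytes // frag_size)   # round up to whole fragments
    return full_blocks, tail_fragments


full, frags = file_layout(11000)
tail_bytes_allocated = frags * 1024
```

For the 11000 byte file, 8192 bytes go into two full blocks and the remaining
2808 bytes are rounded up to three 1024 byte fragments, that is, 3072 bytes.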
                    165: .NH 2
                    166: Updates to the file system
                    167: .PP
                    168: Every working day hundreds of files
                    169: are created, modified, and removed.
                    170: Every time a file is modified,
                    171: the operating system performs a
                    172: series of file system updates.
                    173: These updates, when written on disk, yield a consistent file system.
                    174: The file system stages
                    175: all modifications of critical information;
                    176: each modification can
                    177: either be completed or cleanly backed out after a crash.
                    178: Because it is known which information is written to the file system first,
                    179: deterministic procedures can be developed to
                    180: repair a corrupted file system.
                    181: To understand this process,
                    182: one must first understand the order in which
                    183: update requests are honored.
                    184: .PP
                    185: When a user program performs an operation that changes the file system,
                    186: such as a
                    187: .I write ,
                    188: the data to be written is copied into an internal
                    189: .I "in-core"
                    190: buffer in the kernel.
                    191: Normally, the disk update is handled asynchronously;
                    192: the user process is allowed to proceed even though
                    193: the data has not yet been written to the disk.
                    194: The data,
                    195: along with the inode information reflecting the change,
                    196: is eventually written out to disk.
                    197: The real disk write may not happen until long after the
                    198: .I write
                    199: system call has returned.
                    200: Thus at any given time, the file system,
                    201: as it resides on the disk,
                    202: lags the state of the file system represented by the in-core information.
                    203: .PP
                    204: The disk information is updated to reflect the in-core information
                    205: when the buffer is required for another use,
                    206: when a
                    207: .I sync (2)
                    208: is done (at 30 second intervals) by
                    209: .I "/etc/update" "(8),"
                    210: or by manual operator intervention with the
                    211: .I sync (8)
                    212: command.
                    213: If the system is halted without writing out the in-core information,
                    214: the file system on the disk will be in an inconsistent state.
                    215: .PP
                    216: If all updates are done asynchronously, several serious
                    217: inconsistencies can arise.
                    218: One inconsistency is that a block may be claimed by two inodes.
                    219: Such an inconsistency can occur when the system is halted before
                    220: the pointer to the block in the old inode has been cleared
                    221: in the copy of the old inode on the disk,
                    222: and after the pointer to the block in the new inode has been written out
                    223: to the copy of the new inode on the disk.
                    224: Here,
                    225: there is no deterministic method for deciding
                    226: which inode should really claim the block.
                    227: A similar problem can arise with a multiply claimed inode.
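.PP
Detecting a multiply claimed block amounts to recording, for every block,
which inodes point to it.
The sketch below is a minimal illustration using an in-memory mapping;
the name multiply_claimed and the dictionary representation are assumptions,
not a description of how any particular checking program is written.

```python
from collections import defaultdict


def multiply_claimed(inode_blocks):
    """Map each block claimed by more than one inode to the sorted list of
    claiming inode numbers.  `inode_blocks` maps inode number -> block list."""
    claims = defaultdict(set)
    for ino, blocks in inode_blocks.items():
        for blk in blocks:
            claims[blk].add(ino)
    return {blk: sorted(inos) for blk, inos in claims.items() if len(inos) > 1}


# Hypothetical state after a crash: block 101 was moved from inode 2 to
# inode 5, but only the new inode reached the disk before the halt.
on_disk = {2: [100, 101], 5: [101, 102], 7: [200]}
duplicates = multiply_claimed(on_disk)
```

The sketch reports the conflict but, as noted above, nothing in the on-disk
state says which of the claiming inodes is the rightful owner.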
                    228: .PP
                    229: The problem with asynchronous inode updates
                    230: can be avoided by doing all inode deallocations synchronously.
                    231: Consequently,
                    232: inodes and indirect blocks are written to the disk synchronously
                    233: (\fIi.e.\fP, the process blocks until the information is
                    234: really written to disk)
                    235: when they are being deallocated.
                    236: Similarly, inodes are kept consistent by synchronously
                    237: deleting, adding, or changing directory entries.
                    238: .ds RH Fixing corrupted file systems
