.\" 43BSDTahoe/man/man8/tahoe/crash.8, revision 1.1.1.1
.\" Copyright (c) 1980 Regents of the University of California.
.\" All rights reserved.  The Berkeley software License Agreement
.\" specifies the terms and conditions for redistribution.
.\"
.\"    @(#)crash.8v    6.2 (Berkeley) 5/20/86
.\"
.TH CRASH 8V "May 20, 1986"
.UC 4
.SH NAME
crash \- what happens when the system crashes
.SH DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.PP
When the system crashes voluntarily, it prints a message of the form
.IP
panic: why i gave up the ghost
.LP
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.IR reboot (8).
(If auto-reboot is disabled on the front panel of the machine, the system
will simply halt at this point.)
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.PP
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.PP
The most common cause of system failures is hardware failure, which
can reflect itself in different ways.  Here are the messages which
are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that hardware or software
error produced the message in some unexpected way.
.TP
.B iinit
This cryptic panic message results from a failure to mount the root filesystem
during the bootstrap process.
Either the root filesystem has been corrupted,
or the system is attempting to use the wrong device as the root filesystem.
Usually, an alternate copy of the system binary or an alternate root
filesystem can be used to bring up the system to investigate.
.TP
.B Can't exec /etc/init
This is not a panic message, as reboots are likely to be futile.
Late in the bootstrap procedure, the system was unable to locate
and execute the initialization process,
.IR init (8).
The root filesystem is incorrect or has been corrupted, or the mode
or type of /etc/init forbids execution.
.TP
.B IO err in push
.ns
.TP
.B hard IO err in swap
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
The offending disk should be fixed if it is broken or unreliable.
.TP
.B realloccg: bad optim
.ns
.TP
.B ialloc: dup alloc
.ns
.TP
.B alloccgblk: cyl groups corrupted
.ns
.TP
.B ialloccg: map corrupted
.ns
.TP
.B free: freeing free block
.ns
.TP
.B free: freeing free frag
.ns
.TP
.B ifree: freeing free inode
.ns
.TP
.B alloccg: map corrupted
These panic messages are among those that may be produced
when filesystem inconsistencies are detected.
The problem generally results from a failure to repair damaged filesystems
after a crash, hardware failures, or other conditions that should not
normally occur.
A filesystem check will normally correct the problem.
.TP
.B timeout table overflow
.ns
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.TP
.B KSP not valid
.ns
.TP
.B SBI fault
.ns
.TP
.B CHM? in kernel
These indicate either a serious bug in the system or, more often,
a glitch or failing hardware.
If SBI faults recur, check out the hardware or call
field service.  If the other faults recur, there is likely a bug somewhere
in the system, although these can be caused by a flaky processor.
Run processor microdiagnostics.
.TP
.B machine check %x:
.I description
.ns
.TP
.I \0\0\0machine dependent machine-check information
.ns
Machine checks are different on each type of CPU.
Most of the internal processor registers are saved at the time of the fault
and are printed on the console.
For most processors, there is one line that summarizes the type of machine
check.
Often, the nature of the problem is apparent from this message
and/or the contents of key registers.
The VAX Hardware Handbook should be consulted,
and, if necessary, your friendly field service people should be informed
of the problem.
.TP
.B trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are:
.sp
.nf
0      reserved addressing fault
1      privileged instruction fault
2      reserved operand fault
3      bpt instruction fault
4      xfc instruction fault
5      system call trap
6      arithmetic trap
7      ast delivery trap
8      segmentation fault
9      protection fault
10     trace trap
11     compatibility mode fault
12     page fault
13     page table fault
.fi
.sp
The favorite trap types in system crashes are trap types 8 and 9,
indicating a wild reference.
The code is the referenced address, and the pc at the
time of the fault is also printed.
These problems tend to be easy to track down if they are kernel bugs,
since the processor stops cold, but random flakiness seems to cause
them sometimes.
The debugger can be used to locate the instruction and subroutine
corresponding to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
can usually produce an explanation.
.TP
.B init died
The system initialization process has exited.  This is bad news, as no new
users will then be able to log in.  Rebooting is the only fix, so the
system just does it right away.
.TP
.B out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.PP
That completes the list of panic types you are likely to see.
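.PP
As an illustration only, the trap type table given under
``trap type %d, code=%x, pc=%x''
can be applied mechanically to a console message.
The C fragment below is a hypothetical sketch, not part of the system:
the functions
.IR trap_name
and
.IR parse_trap ,
and the sample message, are invented for this example,
and the table holds only the trap types listed on this page.
.nf

#include <stdio.h>

/*
 * Hypothetical sketch: names for the trap types listed on this page,
 * indexed by the number printed in "trap type %d, code=%x, pc=%x".
 */
static const char *trap_names[] = {
    "reserved addressing fault",        /* 0 */
    "privileged instruction fault",     /* 1 */
    "reserved operand fault",           /* 2 */
    "bpt instruction fault",            /* 3 */
    "xfc instruction fault",            /* 4 */
    "system call trap",                 /* 5 */
    "arithmetic trap",                  /* 6 */
    "ast delivery trap",                /* 7 */
    "segmentation fault",               /* 8 */
    "protection fault",                 /* 9 */
    "trace trap",                       /* 10 */
    "compatibility mode fault",         /* 11 */
    "page fault",                       /* 12 */
    "page table fault"                  /* 13 */
};

#define NTRAPS  (sizeof trap_names / sizeof trap_names[0])

/* Return the name of a trap type, or "unknown trap" if out of range. */
const char *
trap_name(int type)
{
    if (type < 0 || (size_t)type >= NTRAPS)
        return "unknown trap";
    return trap_names[type];
}

/*
 * Parse a console line of the form above.  Returns 1 and fills in
 * type, code, and pc on success; returns 0 if the line does not match.
 */
int
parse_trap(const char *line, int *type, unsigned long *code, unsigned long *pc)
{
    return sscanf(line, "trap type %d, code=%lx, pc=%lx",
        type, code, pc) == 3;
}

int
main(void)
{
    /* Made-up sample message; not taken from a real crash. */
    const char *msg = "trap type 8, code=80000010, pc=800123ab";
    char buf[128];
    int type;
    unsigned long code, pc;

    if (!parse_trap(msg, &type, &code, &pc))
        return 1;
    snprintf(buf, sizeof buf, "%s, code=%lx, pc=%lx",
        trap_name(type), code, pc);
    puts(buf);                  /* puts appends the newline */
    return 0;
}
.fi
.PP
Applied to the made-up message in the sketch, this prints
``segmentation fault, code=80000010, pc=800123ab''.
For trap types 8 and 9 the code is the referenced address,
and the pc can then be located with the debugger as described above.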
.PP
When the system crashes, it writes (or at least attempts to write)
an image of memory into the back end of the dump device,
usually the same as the primary swap
area.  After the system is rebooted, the program
.IR savecore (8)
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal.  See
.IR savecore (8)
for details.
.PP
To analyze a dump, you should begin by running
.IR adb (1)
with the
.B \-k
flag on the system load image and core dump.
If the core image is the result of a panic,
the panic message is printed.
Normally the command
``$c''
will provide a stack trace from the point of
the crash, and this will provide a clue as to
what went wrong.
A more complete discussion
of system debugging is impossible here.
See, however,
``Using ADB to Debug the UNIX Kernel''.
.SH "SEE ALSO"
adb(1),
reboot(8)
.br
.I "VAX 11/780 System Maintenance Guide"
and
.I "VAX Hardware Handbook"
for more information about machine checks.
.br
.I "Using ADB to Debug the UNIX Kernel"