.\" Copyright (c) 1990 Regents of the University of California.
.\" All rights reserved. The Berkeley software License Agreement
.\" specifies the terms and conditions for redistribution.
.\"
.\" @(#)crash.8 5.1 (Berkeley) 6/29/90
.\"
.TH CRASH 8 "June 29, 1990"
.UC 7
.SH NAME
crash \- what happens when the system crashes
.SH DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.PP
When the system crashes voluntarily it prints a message of the form
.IP
panic: why i gave up the ghost
.LP
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.IR reboot (8).
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.PP
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine that detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.PP
The most common cause of system failures is hardware failure, which
can manifest itself in different ways. Here are the messages that
are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.TP
.B iinit
This cryptic panic message results from a failure to mount the root filesystem
during the bootstrap process.
Either the root filesystem has been corrupted,
or the system is attempting to use the wrong device as the root filesystem.
Usually, an alternate copy of the system binary or an alternate root
filesystem can be used to bring up the system to investigate.
.TP
.B Can't exec /etc/init
This is not a panic message, because rebooting is likely to be futile.
Late in the bootstrap procedure, the system was unable to locate
and execute the initialization process,
.IR init (8).
Either the root filesystem is incorrect or has been corrupted, or the mode
or type of /etc/init forbids execution.
.TP
.B IO err in push
.ns
.TP
.B hard IO err in swap
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
The offending disk should be fixed if it is broken or unreliable.
.TP
.B realloccg: bad optim
.ns
.TP
.B ialloc: dup alloc
.ns
.TP
.B alloccgblk: cyl groups corrupted
.ns
.TP
.B ialloccg: map corrupted
.ns
.TP
.B free: freeing free block
.ns
.TP
.B free: freeing free frag
.ns
.TP
.B ifree: freeing free inode
.ns
.TP
.B alloccg: map corrupted
These panic messages are among those that may be produced
when filesystem inconsistencies are detected.
The problem generally results from a failure to repair a damaged filesystem
after a crash, a hardware failure, or some other condition that should not
normally occur.
A filesystem check with
.IR fsck (8)
will normally correct the problem.
.TP
.B timeout table overflow
.ns
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.TP
.B "trap type %d, code = %x, v = %x"
An unexpected trap has occurred within the system; the trap types are:
.sp
.nf
0	bus error
1	address error
2	illegal instruction
3	divide by zero
4	\fIchk\fP instruction
5	\fItrapv\fP instruction
6	privileged instruction
7	trace trap
8	MMU fault
9	simulated software interrupt
10	format error
11	FP coprocessor fault
12	coprocessor fault
13	simulated AST
.fi
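.sp
When scanning saved console output, it can help to map the trap type
number back to its name.
The following C fragment is a sketch only.
It is not part of the system, its table simply restates the list above,
and the names
.I trap_names
and
.I trap_name
are hypothetical.
As written, it prints ``trap type 8: MMU fault''.
.sp
.nf
#include <stddef.h>
#include <stdio.h>

/*
 * Trap type names, indexed by the number printed in the panic
 * message.  Hypothetical table restating the list above; it is
 * not the kernel's own table.
 */
static const char *const trap_names[] = {
	"bus error",			/* 0 */
	"address error",		/* 1 */
	"illegal instruction",		/* 2 */
	"divide by zero",		/* 3 */
	"chk instruction",		/* 4 */
	"trapv instruction",		/* 5 */
	"privileged instruction",	/* 6 */
	"trace trap",			/* 7 */
	"MMU fault",			/* 8 */
	"simulated software interrupt",	/* 9 */
	"format error",			/* 10 */
	"FP coprocessor fault",		/* 11 */
	"coprocessor fault",		/* 12 */
	"simulated AST",		/* 13 */
};

/* Return the name of a trap type, or a fallback if out of range. */
static const char *
trap_name(int type)
{
	size_t n = sizeof(trap_names) / sizeof(trap_names[0]);

	if (type < 0 || (size_t)type >= n)
		return "unknown trap type";
	return trap_names[type];
}

int
main(void)
{
	/* Look up the type from a message like "trap type 8, ..." */
	fputs("trap type 8: ", stdout);
	puts(trap_name(8));
	return 0;
}
.fi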
.sp
The favorite trap type in system crashes is trap type 8 (MMU fault),
indicating a wild reference.
``code'' (hex) is the concatenation of the MMU status register
(see <hp300/cpu.h>)
in the high 16 bits and the 68020 special status word
(see the 68020 manual, page 6-17)
in the low 16 bits.
``v'' (hex) is the virtual address that caused the fault.
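.sp
As an illustration, the two halves of ``code'' can be separated with
a shift and a mask.
This C fragment is a sketch, not system code.
The function names are hypothetical and the value 0x1234abcd is made up.
It prints ``MMU status 1234, SSW abcd''.
.sp
.nf
#include <stdio.h>

/* High 16 bits of "code": the MMU status register. */
static unsigned int
mmu_status(unsigned long code)
{
	return (unsigned int)((code >> 16) & 0xffff);
}

/* Low 16 bits of "code": the 68020 special status word. */
static unsigned int
special_status_word(unsigned long code)
{
	return (unsigned int)(code & 0xffff);
}

int
main(void)
{
	unsigned long code = 0x1234abcdUL;	/* made-up example value */
	char buf[64];

	snprintf(buf, sizeof(buf), "MMU status %04x, SSW %04x",
	    mmu_status(code), special_status_word(code));
	puts(buf);
	return 0;
}
.fi
.sp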
Additionally, the kernel will dump about a screenful of semi-useful
information.
``pid'' (decimal) is the process id of the process running at the
time of the exception.
Note that if the panic occurs in an interrupt routine,
this process may not be related to the panic.
``ps'' (hex) is the value of the 68020 processor status register.
``pc'' (hex) is the value of the program counter saved
on the hardware exception frame.
It may
.I not
be the PC of the instruction causing the fault.
``sfc'' and ``dfc'' (hex) are the 68020 source and destination function codes.
They should always be one.
``p0'' and ``p1'' are the VAX-like region registers.
They are of the form:
.sp
<length> '@' <kernel VA>
.sp
where both are in hex.
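.sp
A p0 or p1 value in this form can be split into its two numbers with
the standard C function
.IR sscanf .
The C fragment below is a sketch.
The function name is hypothetical, the sample value "1f@fffe0000" is made up,
and it assumes the two hex numbers are separated only by the
.B @
character, optionally surrounded by blanks.
It prints ``length 1f, VA fffe0000''.
.sp
.nf
#include <stdio.h>

/*
 * Parse "<length>@<kernel VA>", both in hex, into *len and *va.
 * Returns 1 on success and 0 if the text does not match.
 */
static int
parse_region(const char *s, unsigned long *len, unsigned long *va)
{
	/* "%lx" skips leading blanks; the space before "@" skips more. */
	return sscanf(s, "%lx @%lx", len, va) == 2;
}

int
main(void)
{
	unsigned long len, va;
	char buf[64];

	if (!parse_region("1f@fffe0000", &len, &va))
		return 1;
	snprintf(buf, sizeof(buf), "length %lx, VA %lx", len, va);
	puts(buf);
	return 0;
}
.fi
.sp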
These values are followed by a dump of the processor registers (hex).
Finally, the stack (user or kernel) at the time of the offense is dumped.
.TP
.B init died
The system initialization process has exited. This is bad news, as no new
users will then be able to log in. Rebooting is the only fix, so the
system just does it right away.
.TP
.B out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.PP
That completes the list of panic types you are likely to see.
.PP
When the system crashes it writes (or at least attempts to write)
an image of memory into the back end of the dump device,
usually the same as the primary swap
area. After the system is rebooted, the program
.IR savecore (8)
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal. See
.IR savecore (8)
for details.
.PP
To analyze a dump you should begin by running
.IR adb (1)
with the
.B \-k
flag on the system load image and core dump.
If the core image is the result of a panic,
the panic message is printed.
Normally the command
``$c''
prints a stack trace from the point of
the crash, which usually provides a clue as to
what went wrong.
A more complete discussion
of system debugging is impossible here.
See, however,
``Using ADB to Debug the UNIX Kernel''.
.SH "SEE ALSO"
adb(1),
init(8),
reboot(8),
savecore(8)
.br
.I "MC68020 32-bit Microprocessor User's Manual"
.br
.I "Using ADB to Debug the UNIX Kernel"
.br
.I "4.3BSD for the HP300"