1: .\" Copyright (c) 1980, 1987 Regents of the University of California.
2: .\" All rights reserved. The Berkeley software License Agreement
3: .\" specifies the terms and conditions for redistribution.
4: .\"
5: .\" @(#)uda.4 6.5 (Berkeley) 6/13/88
6: .\"
7: .TH UDA 4 "June 13, 1988"
8: .UC 4
9: .SH NAME
10: uda \- UDA50 disk controller interface
11: .SH SYNOPSIS
12: .B "controller uda0 at uba0 csr 0172150 vector udintr"
13: .br
14: .B "disk ra0 at uda0 drive 0"
15: .br
16: .B "options MSCP_PARANOIA"
17: .SH DESCRIPTION
18: This is a driver for the DEC UDA50 disk controller and other
19: compatible controllers. The UDA50 communicates with the host through
20: a packet protocol known as the Mass Storage Control Protocol (MSCP).
21: Consult the file
22: .RI < vax/mscp.h >
23: for a detailed description of this protocol.
24: .PP
25: Files with minor device numbers 0 through 7 refer to various portions
26: of drive 0; minor devices 8 through 15 refer to drive 1, etc. The
27: standard device names begin with `ra' followed by the drive number
28: and then a letter a-h for partitions 0-7 respectively.
29: In the names below, the character ? stands for a drive number in the range 0-7.
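The minor-number convention above can be sketched as follows. This is only an illustration of the arithmetic (eight partitions per drive, lettered a-h); the function name minor_to_name is hypothetical and does not appear in the driver.

```python
# Illustrative sketch of the naming convention described above.
# Assumption: minor_to_name is a made-up helper, not part of the driver.
PARTITIONS_PER_DRIVE = 8
PARTITION_LETTERS = "abcdefgh"

def minor_to_name(minor):
    """Map a minor device number to its conventional block device name."""
    if minor < 0:
        raise ValueError("minor device number must be non-negative")
    # Minor numbers 0-7 belong to drive 0, 8-15 to drive 1, and so on.
    drive, partition = divmod(minor, PARTITIONS_PER_DRIVE)
    return "ra%d%s" % (drive, PARTITION_LETTERS[partition])

if __name__ == "__main__":
    for minor in (0, 7, 8, 15):
        print(minor, minor_to_name(minor))
```

Under this convention, minor device 8 is the first partition of drive 1 and minor device 15 is its last.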
30: .PP
31: The block files access the disk via the system's normal buffering
32: mechanism and may be read and written without regard to
33: physical disk records. There is also a `raw' interface which provides
34: for direct transmission between the disk and the user's read or write
35: buffer. A single read or write call results in exactly one I/O
36: operation, and therefore raw I/O is considerably more efficient when
37: many words are transmitted. The names of the raw files conventionally
38: begin with an extra `r'.
39: .PP
40: In raw I/O, counts should be a multiple of 512 bytes (a disk sector).
41: Likewise
42: .I seek
43: calls should specify a multiple of 512 bytes.
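A caller might check the 512-byte rule before issuing a raw transfer. The sketch below is a minimal illustration, assuming only the sector size stated above; the helper names are hypothetical.

```python
# Illustrative sketch: validating raw I/O requests against the
# 512-byte sector size given above. Helper names are hypothetical.
SECTOR_SIZE = 512  # bytes per disk sector

def is_sector_aligned(offset, count):
    """True if both the seek offset and the byte count are sector multiples."""
    return offset % SECTOR_SIZE == 0 and count % SECTOR_SIZE == 0

def round_up_to_sector(count):
    """Round a byte count up to the next whole number of sectors."""
    return -(-count // SECTOR_SIZE) * SECTOR_SIZE

if __name__ == "__main__":
    print(is_sector_aligned(1024, 512))
    print(is_sector_aligned(100, 512))
    print(round_up_to_sector(513))
```

Rounding the buffer size up, rather than down, keeps a short request from being truncated, at the cost of transferring a partly unused final sector.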
44: .PP
45: The
46: .B MSCP_PARANOIA
47: option enables runtime checking on all transfer completion responses
48: from the controller. This increases disk I/O overhead and may
49: be undesirable on slow machines, but is otherwise recommended.
50: .PP
51: The first sector of each disk contains both a first-stage bootstrap program
52: and a disk label containing geometry information and partition layouts (see
53: .IR disklabel (5)).
54: This sector is normally write-protected, and disk-to-disk copies should
55: avoid copying this sector.
56: The label may be updated with
57: .IR disklabel (8),
58: which can also be used to write-enable and write-disable the sector.
59: The next 15 sectors contain a second-stage bootstrap program.
60: .SH "DISK SUPPORT"
61: During autoconfiguration,
62: as well as when a drive is opened after all partitions are closed,
63: the first sector of the drive is examined for a disk label.
64: If a label is found, the geometry of the drive and the partition tables
65: are taken from it.
66: If no label is found,
67: the driver configures the type of each drive when it is first
68: encountered. A default partition table in the driver is used for each type
69: of disk when a pack is not labelled. The origin and size
70: (in sectors) of the default pseudo-disks on each
71: drive are shown below. Not all partitions begin on cylinder
72: boundaries, as on other drives, because previous drivers used one
73: partition table for all drive types. Variants of the partition tables
74: are common; check the driver and the file
75: .IR /etc/disktab ( disktab (5))
76: for other possibilities.
77: .PP
78: .nf
79: .ta .5i +\w'000000 'u +\w'000000 'u +\w'000000 'u +\w'000000 'u
80: .PP
81: RA60 partitions
82: disk start length
83: ra?a 0 15884
84: ra?b 15884 33440
85: ra?c 0 400176
86: ra?d 49324 82080 same as 4.2BSD ra?g
87: ra?e 131404 268772 same as 4.2BSD ra?h
88: ra?f 49324 350852
89: ra?g 242606 157570
90: ra?h 49324 193282
91: .PP
92: RA70 partitions
93: disk start length
94: ra?a 0 15884
95: ra?b 15972 33440
96: ra?c 0 547041
97: ra?d 34122 15884
98: ra?e 357192 55936
99: ra?f 413457 133584
100: ra?g 341220 205821
101: ra?h 49731 29136
102: .PP
103: RA80 partitions
104: disk start length
105: ra?a 0 15884
106: ra?b 15884 33440
107: ra?c 0 242606
108: ra?e 49324 193282 same as old Berkeley ra?g
109: ra?f 49324 82080 same as 4.2BSD ra?g
110: ra?g 49910 192696
111: ra?h 131404 111202 same as 4.2BSD
112: .PP
113: RA81 partitions
114: disk start length
115: ra?a 0 15884
116: ra?b 16422 66880
117: ra?c 0 891072
118: ra?d 375564 15884
119: ra?e 391986 307200
120: ra?f 699720 191352
121: ra?g 375564 515508
122: ra?h 83538 291346
123: .PP
124: RA81 partitions with 4.2BSD-compatible partitions
125: disk start length
126: ra?a 0 15884
127: ra?b 16422 66880
128: ra?c 0 891072
129: ra?d 49324 82080 same as 4.2BSD ra?g
130: ra?e 131404 759668 same as 4.2BSD ra?h
131: ra?f 412490 478582 same as 4.2BSD ra?f
132: ra?g 375564 515508
133: ra?h 83538 291346
134: .PP
135: RA82 partitions
136: disk start length
137: ra?a 0 15884
138: ra?b 16245 66880
139: ra?c 0 1135554
140: ra?d 375345 15884
141: ra?e 391590 307200
142: ra?f 669390 466164
143: ra?g 375345 760209
144: ra?h 83790 291346
145: .DT
146: .fi
147: .PP
148: The ra?a partition is normally used for the root file system, the ra?b
149: partition as a paging area, and the ra?c partition for pack-to-pack
150: copying (it maps the entire disk).
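Because several default partitions share sectors, it can help to compute where each one ends before choosing which to use together. The sketch below copies the RA81 figures from the table above; the helper names are hypothetical, and the checks are ordinary interval arithmetic rather than anything taken from the driver.

```python
# Illustrative sketch using the default RA81 table listed above.
# Each entry is (start sector, length in sectors).
# Helper names are hypothetical; this is not driver code.
RA81 = {
    "a": (0, 15884),
    "b": (16422, 66880),
    "c": (0, 891072),
    "d": (375564, 15884),
    "e": (391986, 307200),
    "f": (699720, 191352),
    "g": (375564, 515508),
    "h": (83538, 291346),
}

def end_sector(table, part):
    """First sector past the end of the given partition."""
    start, length = table[part]
    return start + length

def fits_on_disk(table):
    """True if no partition extends past the end of the c partition."""
    disk_end = end_sector(table, "c")
    return all(end_sector(table, p) <= disk_end for p in table)

def overlaps(table, p, q):
    """True if partitions p and q share at least one sector."""
    p_start, _ = table[p]
    q_start, _ = table[q]
    return p_start < end_sector(table, q) and q_start < end_sector(table, p)

if __name__ == "__main__":
    print(fits_on_disk(RA81))
    print(overlaps(RA81, "a", "b"), overlaps(RA81, "d", "g"))
```

With these figures, ra?g runs to the last sector of the disk and overlaps ra?d, ra?e and ra?f, so it should not hold a file system at the same time as any of them.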
151: .SH FILES
152: /dev/ra[0-9][a-h]
153: .br
154: /dev/rra[0-9][a-h]
155: .SH SEE ALSO
156: disklabel(5), disklabel(8)
157: .SH DIAGNOSTICS
158: .TP
159: panic: udaslave
160: No command packets were available while the driver was looking
161: for disk drives. The controller is not extending enough credits
162: to use the drives.
163: .TP
164: uda%d: no response to Get Unit Status request
165: A disk drive was found, but did not respond to a status request.
166: This is either a hardware problem or someone pulling unit number
167: plugs very fast.
168: .TP
169: uda%d: unit %d off line
170: While searching for drives, the controller found one that
171: seems to be manually disabled. It is ignored.
172: .TP
173: uda%d: unable to get unit status
174: Something went wrong while trying to determine the status of
175: a disk drive. This is followed by an error detail.
176: .TP
177: uda%d: unit %d, next %d
178: This probably never happens, but I wanted to know if it did. I
179: have no idea what one should do about it.
180: .TP
181: uda%d: cannot handle unit number %d (max is %d)
182: The controller found a drive whose unit number is too large.
183: Valid unit numbers are those in the range [0..7].
184: .TP
185: ra%d: don't have a partition table for %s; using (s,t,c)=(%d,%d,%d)
186: The controller found a drive whose media identifier (e.g. `RA 25')
187: does not have a default partition table. A temporary partition
188: table containing only an `a' partition has been created covering
189: the entire disk, which has the indicated numbers of sectors per
190: track (s), tracks per cylinder (t), and total cylinders (c).
191: Give the pack a label with the
192: .I disklabel
193: utility.
194: .TP
195: uda%d: uballoc map failed
196: Unibus resource map allocation failed during initialisation. This
197: can only happen if you have 496 devices on a Unibus.
198: .TP
199: uda%d: timeout during init
200: The controller did not initialise within ten seconds. A hardware
201: problem, but it sometimes goes away if you try again.
202: .TP
203: uda%d: init failed, sa=%b
204: The controller refused to initialise.
205: .TP
206: uda%d: controller hung
207: The controller never finished initialisation. Retrying may sometimes
208: fix it.
209: .TP
210: ra%d: drive will not come on line
211: The drive will not come on line, probably because it is spun down.
212: This should be preceded by a message giving details as to why the
213: drive stayed off line.
214: .TP
215: uda%d: still hung
216: When the controller hangs, the driver occasionally tries to reinitialise
217: it. This means it just tried, without success.
218: .TP
219: panic: udastart: bp==NULL
220: A bug in the driver has put an empty drive queue on a controller queue.
221: .TP
222: uda%d: command ring too small
223: If you increase NCMDL2, you may see a performance improvement.
224: (See /sys/vaxuba/uda.c.)
225: .TP
226: panic: udastart
227: A drive was found marked for status or on-line functions while performing
228: status or on-line functions. This indicates a bug in the driver.
229: .TP
230: uda%d: controller error, sa=0%o (%s)
231: The controller reported an error. The error code is printed in
232: octal, along with a short description if the code is known (see the
233: .IR "UDA50 Maintenance Guide" ,
234: DEC part number AA-M185B-TC, pp. 18-22).
235: If this occurs during normal
236: operation, the driver will reset the controller and retry pending I/O. If
237: it occurs during configuration, the controller may be ignored.
238: .TP
239: uda%d: stray intr
240: The controller interrupted when it should have stayed quiet. The
241: interrupt has been ignored.
242: .TP
243: uda%d: init step %d failed, sa=%b
244: The controller reported an error during the named initialisation step.
245: The driver will retry initialisation later.
246: .TP
247: uda%d: version %d model %d
248: An informational message giving the revision level of the controller.
249: .TP
250: uda%d: DMA burst size set to %d
251: An informational message showing the DMA burst size, in words.
252: .TP
253: panic: udaintr
254: Indicates a bug in the generic MSCP code.
255: .TP
256: uda%d: driver bug, state %d
257: The driver has a bogus value for the controller state. Something
258: is quite wrong. This is immediately followed by a `panic: udastate'.
259: .TP
260: uda%d: purge bdp %d
261: A benign message tracing BDP purges. I have been trying to figure
262: out what BDP purges are for. You might want to comment out this
263: call to log() in /sys/vaxuba/uda.c.
264: .TP
265: .RI "uda%d: SETCTLRC failed: " detail
266: The Set Controller Characteristics command (the last part of the
267: controller initialisation sequence) failed. The
268: .I detail
269: message tells why.
270: .TP
271: .RI "uda%d: attempt to bring ra%d on line failed: " detail
272: The drive could not be brought on line. The
273: .I detail
274: message tells why.
275: .TP
276: uda%d: ra%d: unknown type %d
277: The type index of the named drive is not known to the driver, so the
278: drive will be ignored.
279: .TP
280: ra%d: changed types! was %d now %d
281: A drive somehow changed from one kind to another, e.g., from an RA80
282: to an RA60. The numbers printed are the encoded media identifiers (see
283: .RI < vax/mscp.h >
284: for the encoding).
285: The driver believes the new type.
286: .TP
287: ra%d: uda%d, unit %d, size = %d sectors
288: The named drive is on the indicated controller as the given unit,
289: and has that many sectors of user-file area. This is printed
290: during configuration.
291: .TP
292: .RI "uda%d: attempt to get status for ra%d failed: " detail
293: A status request failed. The
294: .I detail
295: message should tell why.
296: .TP
297: ra%d: bad block report: %d
298: The drive has reported the given block as bad. If there are multiple
299: bad blocks, the drive will report only the first; in this case this
300: message will be followed by `+ others'. Get DEC to forward the
301: block with EVRLK.
302: .TP
303: ra%d: serious exception reported
304: I have no idea what this really means.
305: .TP
306: panic: udareplace
307: The controller reported completion of a REPLACE operation. The
308: driver never issues any REPLACEs, so something is wrong.
309: .TP
310: panic: udabb
311: The controller reported completion of bad block related I/O. The
312: driver never issues any such, so something is wrong.
313: .TP
314: uda%d: lost interrupt
315: The controller has gone out to lunch, and is being reset to try to bring
316: it back.
317: .TP
318: panic: mscp_go: AEB_MAX_BP too small
319: You defined AVOID_EMULEX_BUG and increased NCMDL2 and Emulex has
320: new firmware. Raise AEB_MAX_BP or turn off AVOID_EMULEX_BUG.
321: .TP
322: uda%d: unit %d: unknown message type 0x%x ignored
323: The controller responded with a mysterious message type. See
324: /sys/vax/mscp.h for a list of known message types. This is probably
325: a controller hardware problem.
326: .TP
327: uda%d: unit %d out of range
328: The disk drive unit number (the unit plug) is higher than the
329: maximum number the driver allows (currently 7).
330: .TP
331: uda%d: unit %d not configured, \fImessage\fP ignored
332: The named disk drive has announced its presence to the controller,
333: but was not, or cannot now be, configured into the running system.
334: .I Message
335: is one of `available attention' (an `I am here' message) or
336: `stray response op 0x%x status 0x%x' (anything else).
337: .TP
338: ra%d: bad lbn (%d)?
339: The drive has reported an invalid command error, probably due to an
340: invalid block number. If the lbn value is very much greater than the
341: size reported by the drive, this is the problem. It is probably due to
342: an improperly configured partition table. Other invalid commands
343: indicate a bug in the driver, or hardware trouble.
344: .TP
345: ra%d: duplicate ONLINE ignored
346: The drive has come on-line while already on-line. This condition
347: can probably be ignored (and has been).
348: .TP
349: ra%d: io done, but no buffer?
350: Hardware trouble, or a bug; the drive has finished an I/O request,
351: but the response has an invalid (zero) command reference number.
352: .TP
353: Emulex SC41/MS screwup: uda%d, got %d correct, then
354: .br
355: .ti -5
356: changed 0x%x to 0x%x
357: .br
358: You turned on AVOID_EMULEX_BUG, and the driver successfully
359: avoided the bug. The number of correctly-handled requests is
360: reported, along with the expected and actual values relating to
361: the bug being avoided.
362: .TP
363: panic: unrecoverable Emulex screwup
364: You turned on AVOID_EMULEX_BUG, but Emulex was too clever and
365: avoided the avoidance. Try turning on MSCP_PARANOIA instead.
366: .TP
367: uda%d: bad response packet ignored
368: You turned on MSCP_PARANOIA, and the driver caught the controller in
369: a lie. The lie has been ignored, and the controller will soon be
370: reset (after a `lost' interrupt). This is followed by a hex dump of
371: the offending packet.
372: .TP
373: ra%d: bogus REPLACE end
374: The drive has reported finishing a bad sector replacement, but the
375: driver never issues bad sector replacement commands. The report
376: is ignored. This is likely a hardware problem.
377: .TP
378: ra%d: unknown opcode 0x%x status 0x%x ignored
379: The drive has reported something that the driver cannot understand.
380: Perhaps DEC has been inventive, or perhaps your hardware is ill.
381: This is followed by a hex dump of the offending packet.
382: .TP
383: \fBra%d%c: hard error %sing fsbn %d [of %d-%d] (ra%d bn %d cn %d tn %d sn %d)\fP.
384: An unrecoverable error occurred during transfer of the specified
385: filesystem block number(s),
386: which are logical block numbers on the indicated partition.
387: If the transfer involved multiple blocks, the block range is printed as well.
388: The parenthesized fields list the actual disk sector number
389: relative to the beginning of the drive,
390: as well as the cylinder, track and sector number of the block.
391: .TP
392: uda%d: %s error datagram
393: The controller has reported some kind of error, either `hard'
394: (unrecoverable) or `soft' (recoverable). If the controller is going on
395: (attempting to fix the problem), this message includes the remark
396: `(continuing)'. Emulex controllers wrongly claim that all soft errors
397: are hard errors. This message may be followed by
398: one of the following 5 messages, depending on its type, and will always
399: be followed by a failure detail message (also listed below).
400: .RS
401: .TP
402: memory addr 0x%x
403: A host memory access error; this is the address that could not be
404: read.
405: .TP
406: unit %d: level %d retry %d, %s %d
407: A typical disk error; the retry count and error recovery levels are
408: printed, along with the block type (`lbn', or logical block; or `rbn',
409: or replacement block) and number. If the string is something else, DEC
410: has been clever, or your hardware has gone to Australia for vacation
411: (unless you live there; then it might be in New Zealand, or Brazil).
412: .TP
413: unit %d: %s %d
414: Also a disk error, but an `SDI' error, whatever that is. (I doubt
415: it has anything to do with Ronald Reagan.) This lists the block
416: type (`lbn' or `rbn') and number. This is followed by a second
417: message indicating a microprocessor error code and a front panel
418: code. These latter codes are drive-specific, and are intended to
419: be used by field service as an aid in locating failing hardware.
420: The codes for RA81s can be found in the
421: .IR "RA81 Maintenance Guide" ,
422: DEC order number AA-M879A-TC, in appendices E and F.
423: .TP
424: unit %d: small disk error, cyl %d
425: Yet another kind of disk error, but for small disks. (`That's what
426: it says, guv'nor. Dunnask me what it means.')
427: .TP
428: unit %d: unknown error, format 0x%x
429: A mysterious error: the given format code is not known.
430: .RE
431: .PP
432: The detail messages are as follows:
433: .RS
434: .TP
435: success (%s) (code 0, subcode %d)
436: Everything worked, but the controller thought it would let you know
437: that something went wrong. No matter what subcode, this can probably
438: be ignored.
439: .TP
440: invalid command (%s) (code 1, subcode %d)
441: This probably cannot occur unless the hardware is out; %s should be
442: `invalid msg length', meaning some command was too short or too long.
443: .TP
444: command aborted (unknown subcode) (code 2, subcode %d)
445: This should never occur, as the driver never aborts commands.
446: .TP
447: unit offline (%s) (code 3, subcode %d)
448: The drive is offline, either because it is not around (`unknown
449: drive'), stopped (`not mounted'), out of order (`inoperative'), has the
450: same unit number as some other drive (`duplicate'), or has been
451: disabled for diagnostics (`in diagnosis').
452: .TP
453: unit available (unknown subcode) (code 4, subcode %d)
454: The controller has decided to report a perfectly normal event as
455: an error. (Why?)
456: .TP
457: media format error (%s) (code 5, subcode %d)
458: The drive cannot be used without reformatting. The Format Control
459: Table cannot be read (`fct unread - edc'), there is a bad sector
460: header (`invalid sector header'), the drive is not set for 512-byte
461: sectors (`not 512 sectors'), the drive is not formatted (`not formatted'),
462: or the FCT has an uncorrectable ECC error (`fct ecc').
463: .TP
464: write protected (%s) (code 6, subcode %d)
465: The drive is write protected, either by the front panel switch
466: (`hardware') or via the driver (`software'). The driver never
467: sets software write protect.
468: .TP
469: compare error (unknown subcode) (code 7, subcode %d)
470: A compare operation showed some sort of difference. The driver
471: never uses compare operations.
472: .TP
473: data error (%s) (code 8, subcode %d)
474: Something went wrong reading or writing a data sector. A `forced
475: error' is a software-asserted error used to mark a sector that contains
476: suspect data. Rewriting the sector will clear the forced error. This
477: is normally set only during bad block replacement, and the driver does
478: no bad block replacement, so these should not occur. A `header
479: compare' error probably means the block is shot. A `sync timeout'
480: presumably has something to do with sector synchronisation.
481: An `uncorrectable ecc' error is an ordinary data error that cannot
482: be fixed via ECC logic. A `%d symbol ecc' error is a data error
483: that can be (and presumably has been) corrected by the ECC logic.
484: It might indicate a sector that is imperfect but usable, or that
485: is starting to go bad. If any of these errors recur, the sector
486: may need to be replaced.
487: .TP
488: host buffer access error (%s) (code %d, subcode %d)
489: Something went wrong while trying to copy data to or from the host
490: (Vax). The subcode is one of `odd xfer addr', `odd xfer count',
491: `non-exist. memory', or `memory parity'. The first two could be a
492: software glitch; the last two indicate hardware problems.
493: .TP
494: controller error (%s) (code %d, subcode %d)
495: The controller has detected a hardware error in itself. A
496: `serdes overrun' is a serialiser / deserialiser overrun; `edc'
497: probably stands for `error detection code'; and `inconsistent
498: internal data struct' is obvious.
499: .TP
500: drive error (%s) (code %d, subcode %d)
501: Either the controller or the drive has detected a hardware error
502: in the drive. I am not sure what an `sdi command timeout' is, but
503: these seem to occur benignly on occasion. A `ctlr detected protocol'
504: error means that the controller and drive do not agree on a protocol;
505: this could be a cabling problem, or a version mismatch. A `positioner'
506: error means the drive seek hardware is ailing; `lost rd/wr ready'
507: means the drive read/write logic is sick; and `drive clock dropout'
508: means that the drive clock logic is bad, or the media is hopelessly
509: scrambled. I have no idea what `lost recvr ready' means. A `drive
510: detected error' is a catch-all for drive hardware trouble; `ctlr
511: detected pulse or parity' errors are often caused by cabling problems.
512: .RE
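The `hard error' diagnostic above reports a drive-relative block number (bn) together with cylinder (cn), track (tn) and sector (sn) numbers. The relationship between them can be sketched with conventional geometry arithmetic, as below. This assumes sectors are numbered track by track within a cylinder; it is not taken from the driver source, the function name is hypothetical, and the geometry in the example (51 sectors per track, 14 tracks per cylinder) is chosen only for illustration.

```python
# Illustrative sketch: splitting a drive-relative block number into
# cylinder, track and sector, given sectors per track (s) and tracks
# per cylinder (t). Assumption: conventional linear numbering; this
# is not copied from the driver.
def block_to_cts(bn, sectors_per_track, tracks_per_cylinder):
    """Return (cylinder, track, sector) for a drive-relative block number."""
    if bn < 0:
        raise ValueError("block number must be non-negative")
    sectors_per_cylinder = sectors_per_track * tracks_per_cylinder
    cn, remainder = divmod(bn, sectors_per_cylinder)
    tn, sn = divmod(remainder, sectors_per_track)
    return cn, tn, sn

if __name__ == "__main__":
    # Illustrative geometry only: 51 sectors/track, 14 tracks/cylinder.
    print(block_to_cts(1000, 51, 14))
```

The same (s,t,c) values are what the `don't have a partition table' message prints, so they can be fed to a helper like this when interpreting an error report by hand.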