|
|
1.1 root 1: License to copy and use this document and the software described
2: herein is granted provided it is identified as the "RSA Data
3: Security, Inc. MD4 Message Digest Algorithm" in all materials
4: mentioning or referencing this software, function, or document.
5:
6: License is also granted to make derivative works provided that such
7: works are identified as "derived from the RSA Data Security, Inc. MD4
8: Message Digest Algorithm" in all material mentioning or referencing
9: the derived work.
10:
11: RSA Data Security, Inc. makes no representations concerning the
12: merchantability of this algorithm or software or their suitability
13: for any specific purpose. It is provided "as is" without express or
14: implied warranty of any kind.
15:
16: These notices must be retained in any copies of any part of this
17: documentation and/or software.
18: ------------------------------------------------------------------------
19:
20: The MD4 Message Digest Algorithm
21: --------------------------------
22: by Ronald L. Rivest
23: MIT Laboratory for Computer Science, Cambridge, Mass. 02139
24: and
25: RSA Data Security, Inc., Redwood City, California 94065
26: (Version 2/17/90 -- Revised)
27:
28:
29: Abstract:
30: ---------
31:
32: This note describes the MD4 message digest algorithm. The algorithm
33: takes as input an input message of arbitrary length and produces as
34: output a 128-bit ``fingerprint'' or ``message digest'' of the input.
35: It is conjectured that it is computationally infeasible to produce two
36: messages having the same message digest, or to produce any message
37: having a given prespecified target message digest. The MD4 algorithm
38: is thus ideal for digital signature applications, where a large file
39: must be ``compressed'' in a secure manner before being signed with the
40: RSA public-key cryptosystem.
41:
42: The MD4 algorithm is designed to be quite fast on 32-bit machines. On
43: a SUN Sparc station, MD4 runs at 1,450,000 bytes/second. On a DEC
44: MicroVax II, MD4 runs at approximately 70,000 bytes/second. On a 20MHz
45: 80286, MD4 runs at approximately 32,000 bytes/second. In addition, the
46: MD4 algorithm does not require any large substitution tables; the
47: algorithm can be coded quite compactly.
48:
49: The MD4 algorithm is being placed in the public domain for review and
50: possible adoption as a standard.
51:
52: (Note: The document supersedes an earlier draft. The algorithm described
53: here is a slight modification of the one described in the draft.)
54:
55:
56: I. Terminology and Notation
57: ---------------------------
58:
59: In this note a ``word'' is a 32-bit quantity and a byte is an 8-byte
60: quantity. A sequence of bits can be interpreted in a natural manner
61: as a sequence of bytes, where each consecutive group of 8 bits is
62: interpreted as a byte with the high-order (most significant) bit of
63: each byte listed first. Similarly, a sequence of bytes can be
64: interpreted as a sequence of 32-bit words, where each consecutive
65: group of 4 bytes is interpreted as a word with the low-order (least
66: significant) byte given first.
67:
68: Let x_i denote ``x sub i''. If the subscript is an expression, we
69: surround it in braces, as in x_{i+1}. Similarly, we use ^ for
70: superscripts (exponentiation), so that x^i denotes x to the i-th
71: power.
72:
73: Let the symbol ``+'' denote addition of words (i.e., modulo-2^32
74: addition). Let X <<< s denote the 32-bit value obtained by circularly
75: shifting (rotating) X left by s bit positions. Let not(X) denote the
76: bit-wise complement of X, and let X v Y denote the bit-wise OR of X
77: and Y. Let X xor Y denote the bit-wise XOR of X and Y, and let XY
78: denote the bit-wise AND of X and Y.
79:
80:
81: II. MD4 Algorithm Description
82: -----------------------------
83:
84: We begin by supposing that we have a b-bit message as input, and
85: that we wish to find its message digest. Here b is an arbitrary
86: nonnegative integer; b may be zero, it need not be a multiple of 8,
87: and it may be arbitrarily large. We imagine the bits of the message
88: written down as follows:
89:
90: m_0 m_1 ... m_{b-1} .
91:
92: The following five steps are performed to compute the message digest of the
93: message.
94:
95:
96: Step 1. Append padding bits
97: ---------------------------
98:
99: The message is ``padded'' (extended) so that its length (in bits) is
100: congruent to 448, modulo 512. That is, the message is extended so
101: that it is just 64 bits shy of being a multiple of 512 bits long.
102: Padding is always performed, even if the length of the message is
103: already congruent to 448, modulo 512 (in which case 512 bits of
104: padding are added).
105:
106: Padding is performed as follows: a single ``1'' bit is appended to the
107: message, and then enough zero bits are appended so that the length in bits
108: of the padded message becomes congruent to 448, modulo 512.
109:
110:
111: Step 2. Append length
112: ---------------------
113:
114: A 64-bit representation of b (the length of the message before the
115: padding bits were added) is appended to the result of the previous
116: step. In the unlikely event that b is greater than 2^64, then only
117: the low-order 64 bits of b are used. (These bits are appended as two
118: 32-bit words and appended low-order word first in accordance with the
119: previous conventions.)
120:
121: At this point the resulting message (after padding with bits and with
122: b) has a length that is an exact multiple of 512 bits. Equivalently,
123: this message has a length that is an exact multiple of 16 (32-bit)
124: words. Let M[0 ... N-1] denote the words of the resulting message,
125: where N is a multiple of 16.
126:
127:
128: Step 3. Initialize MD buffer
129: ----------------------------
130:
131: A 4-word buffer (A,B,C,D) is used to compute the message digest. Here
132: each of A,B,C,D are 32-bit registers. These registers are initialized
133: to the following values (in hexadecimal, low-order bytes first):
134:
135: word A: 01 23 45 67
136: word B: 89 ab cd ef
137: word C: fe dc ba 98
138: word D: 76 54 32 10
139:
140:
141: Step 4. Process message in 16-word blocks
142: -----------------------------------------
143:
144: We first define three auxiliary functions that each take as input
145: three 32-bit words and produce as output one 32-bit word.
146:
147: f(X,Y,Z) = XY v not(X)Z
148: g(X,Y,Z) = XY v XZ v YZ
149: h(X,Y,Z) = X xor Y xor Z
150:
151: In each bit position f acts as a conditional: if x then y else z.
152: (The function f could have been defined using + instead of v since XY
153: and not(X)Z will never have 1's in the same bit position.) In each
154: bit position g acts as a majority function: if at least two of x, y, z
155: are on, then g has a one in that bit position, else g has a zero. It
156: is interesting to note that if the bits of X, Y, and Z are independent
157: and unbiased, the each bit of f(X,Y,Z) will be independent and
158: unbiased, and similarly each bit of g(X,Y,Z) will be independent and
159: unbiased. The function h is the bit-wise ``xor'' or ``parity'' function;
160: it has properties similar to those of f and g.
161:
162: Do the following:
163:
164: For i = 0 to N/16-1 do /* process each 16-word block */
165: For j = 0 to 15 do: /* copy block i into X */
166: Set X[j] to M[i*16+j].
167: end /* of loop on j */
168: Save A as AA, B as BB, C as CC, and D as DD.
169:
170: [Round 1]
171: Let [A B C D i s] denote the operation
172: A = (A + f(B,C,D) + X[i]) <<< s .
173: Do the following 16 operations:
174: [A B C D 0 3]
175: [D A B C 1 7]
176: [C D A B 2 11]
177: [B C D A 3 19]
178: [A B C D 4 3]
179: [D A B C 5 7]
180: [C D A B 6 11]
181: [B C D A 7 19]
182: [A B C D 8 3]
183: [D A B C 9 7]
184: [C D A B 10 11]
185: [B C D A 11 19]
186: [A B C D 12 3]
187: [D A B C 13 7]
188: [C D A B 14 11]
189: [B C D A 15 19]
190:
191: [Round 2]
192: Let [A B C D i s] denote the operation
193: A = (A + g(B,C,D) + X[i] + 5A827999) <<< s .
194: (The value 5A..99 is a hexadecimal 32-bit constant, written with
195: the high-order digit first. This constant represents the square
196: root of 2. The octal value of this constant is 013240474631.
197: See Knuth, The Art of Programming, Volume 2
198: (Seminumerical Algorithms), Second Edition (1981), Addison-Wesley.
199: Table 2, page 660.)
200: Do the following 16 operations:
201: [A B C D 0 3]
202: [D A B C 4 5]
203: [C D A B 8 9]
204: [B C D A 12 13]
205: [A B C D 1 3]
206: [D A B C 5 5]
207: [C D A B 9 9]
208: [B C D A 13 13]
209: [A B C D 2 3]
210: [D A B C 6 5]
211: [C D A B 10 9]
212: [B C D A 14 13]
213: [A B C D 3 3]
214: [D A B C 7 5]
215: [C D A B 11 9]
216: [B C D A 15 13]
217:
218: [Round 3]
219: Let [A B C D i s] denote the operation
220: A = (A + h(B,C,D) + X[i] + 6ED9EBA1) <<< s
221: (The value 6E..A1 is a hexadecimal 32-bit constant, written with
222: the high-order digit first. This constant represents the square
223: root of 3. The octal value of this constant is 015666365641.
224: See Knuth, The Art of Programming, Volume 2
225: (Seminumerical Algorithms), Second Edition (1981), Addison-Wesley.
226: Table 2, page 660.)
227: Do the following 16 operations:
228: [A B C D 0 3]
229: [D A B C 8 9]
230: [C D A B 4 11]
231: [B C D A 12 15]
232: [A B C D 2 3]
233: [D A B C 10 9]
234: [C D A B 6 11]
235: [B C D A 14 15]
236: [A B C D 1 3]
237: [D A B C 9 9]
238: [C D A B 5 11]
239: [B C D A 13 15]
240: [A B C D 3 3]
241: [D A B C 11 9]
242: [C D A B 7 11]
243: [B C D A 15 15]
244:
245: Then perform the following additions:
246: A = A + AA
247: B = B + BB
248: C = C + CC
249: D = D + DD
250: (That is, each of the four registers is incremented by the value it had
251: before this block was started.)
252:
253: end /* of loop on i */
254:
255:
256: Step 5. Output
257: --------------
258:
259: The message digest produced as output is A,B,C,D.
260: That is, we begin with the low-order byte of A, and end with the
261: high-order byte of D.
262:
263: This completes the description of MD4. A reference implementation in
264: C is given in the Appendix.
265:
266:
267: III. Extensions
268: ---------------
269:
270: If more than 128 bits of output are required, then the following
271: procedure is recommended to obtain a 256-bit output. (There is no
272: provision made for obtaining more than 256 bits.)
273:
274: Two copies of MD4 are run in parallel over the input. The first copy
275: is standard as described above. The second copy is modified as follows.
276:
277: The initial state of the second copy is:
278: word A: 00 11 22 33
279: word B: 44 55 66 77
280: word C: 88 99 aa bb
281: word D: cc dd ee ff
282:
283: The magic constants in rounds 2 and 3 for the second copy of MD4 are
284: changed from sqrt(2) and sqrt(3) to cuberoot(2) and cuberoot(3):
285: Octal Hex
286: Round 2 constant 012050505746 50a28be6
287: Round 3 constant 013423350444 5c4dd124
288:
289: Finally, after every 16-word block is processed (including the last
290: block), the values of the A registers in the two copies are exchanged.
291:
292: The final message digest is obtaining by appending the result of the
293: second copy of MD4 to the end of the result of the first copy of MD4.
294:
295:
296: IV. Summary
297: ------------
298:
299: The MD4 message digest algorithm is simple to implement, and provides
300: a ``fingerprint'' or message digest of a message of arbitrary length.
301:
302: It is conjectured that the difficulty of coming up with two messages
303: having the same message digest is on the order of 2^64 operations, and
304: that the difficulty of coming up with any message having a given
305: message digest is on the order of 2^128 operations. The MD4 algorithm
306: has been carefully scrutinized for weaknesses. It is, however, a
307: relatively new algorithm and further security analysis is of course
308: justified, as is the case with any new proposal of this sort. The
309: level of security provided by MD4 should be sufficient for
310: implementing very high security hybrid digital signature schemes based
311: on MD4 and the RSA public-key cryptosystem.
312:
313: V. Acknowledgements
314: -------------------
315:
316: I'd like to thank Don Coppersmith, Burt Kaliski, Ralph Merkle, and
317: Noam Nisan for numerous helpful comments and suggestions.
318:
319:
320: APPENDIX. Reference Implementation
321: ----------------------------------
322:
323: This appendix contains the following files:
324: md4.h -- a header file for using the MD4 implementation
325: md4.c -- the source code for the MD4 routines
326: md4driver.c -- a sample ``user'' routine
327: session -- sample results of running md4driver
328:
329:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.