+---------------------------------------------------------------------------+
| wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors.         |
|                                                                           |
| Copyright (C) 1992 W. Metzenthen, 22 Parker St, Ormond, Vic 3163,         |
| Australia. E-mail [email protected]                                       |
|                                                                           |
| This program is free software; you can redistribute it and/or modify      |
| it under the terms of the GNU General Public License version 2 as         |
| published by the Free Software Foundation.                                |
|                                                                           |
| This program is distributed in the hope that it will be useful,           |
| but WITHOUT ANY WARRANTY; without even the implied warranty of            |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the              |
| GNU General Public License for more details.                              |
|                                                                           |
| You should have received a copy of the GNU General Public License         |
| along with this program; if not, write to the Free Software               |
| Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.                 |
|                                                                           |
+---------------------------------------------------------------------------+


***NOTE*** THIS SHOULD BE REGARDED AS AN ALPHA TEST VERSION
           (although the beta version may be identical)


wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
in turn based upon emu387 which was written by DJ Delorie for djgpp.
The interface to the Linux kernel is based upon the original Linux
math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Numerous facets of the
functioning of the FPU are not well covered in the Reference Manual;
in the absence of clear details I have made guesses about the most
reasonable behaviour.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
See "Limitations" later in this file for a partial list of some
differences. I believe that the missing features are never used by
normal C or FORTRAN programs.

Please report bugs, etc to me at:
    [email protected]


--Bill Metzenthen
  Oct 1992

----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
    "optimal" polynomial approximations. My definition of "optimal" was
    based upon getting good accuracy with reasonable speed.

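As an illustration of item (3), here is a minimal sketch of a square root
computed with Newton's method. It is not the emulator's code: it uses
ordinary C doubles rather than the 64 bit significands the emulator works
with, and the name newton_sqrt, the starting guess and the fixed iteration
count are all assumptions made for this example.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper, not taken from the emulator: square root by
   Newton's method applied to f(y) = y*y - x. */
double newton_sqrt(double x)
{
    assert(x >= 0.0 && x <= DBL_MAX);   /* finite, non-negative input only */
    if (x == 0.0)
        return 0.0;

    /* Scale x into [0.5, 2) by powers of 4, so that the square root
       scales by the matching powers of 2. */
    double scale = 1.0;
    while (x >= 2.0) { x *= 0.25; scale *= 2.0; }
    while (x < 0.5)  { x *= 4.0;  scale *= 0.5; }

    /* Starting guess within about 6% of the true root on [0.5, 2). */
    double y = 0.5 * (1.0 + x);

    /* Each Newton step roughly doubles the number of correct bits,
       so a handful of steps is enough for a double. */
    for (int i = 0; i < 6; i++)
        y = 0.5 * (y + x / y);

    return y * scale;
}
```

Calling newton_sqrt(2.0) returns a value whose square is 2 to within
double rounding. The property this sketch leans on, that the error is
roughly squared at every step, is presumably one of the "properties of
Newton's method" referred to above, but that reading is an assumption.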
The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby changing some static
variables (e.g. FPU_st0_ptr). The code which accesses user memory
is confined to five files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c

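The hazard described above can be shown with a small single-threaded
simulation. This is only a sketch of one possible defence (take a local
copy of the static state before touching user memory and restore it
afterwards); it is not claimed to be what the emulator does. Every name
here (st0_ptr, while_suspended, read_user_word, load_to_st0 and the demo
helpers) is hypothetical.

```c
#include <assert.h>

/* Stand-in for emulator-wide static state such as FPU_st0_ptr. */
static int *st0_ptr;

/* Hypothetical hook: simulates what happens while the emulator is
   suspended waiting for a page of user memory to be swapped in. */
static void (*while_suspended)(void);

/* Simulated user-memory read that may "sleep" before returning. */
static int read_user_word(const int *user_addr)
{
    if (while_suspended)
        while_suspended();          /* another process runs here */
    return *user_addr;
}

/* Load a word from user memory into the register st0_ptr refers to. */
void load_to_st0(const int *user_addr)
{
    assert(st0_ptr != 0);
    int *saved = st0_ptr;           /* snapshot before we may sleep */
    int value = read_user_word(user_addr);
    st0_ptr = saved;                /* undo any change made meanwhile */
    *st0_ptr = value;
}

/* Demo: a second "process" repoints the static during the sleep. */
static int reg_a, reg_b;
static void other_process(void) { st0_ptr = &reg_b; }

int reentrancy_demo(void)
{
    int user_word = 42;
    reg_a = reg_b = 0;
    st0_ptr = &reg_a;
    while_suspended = other_process;
    load_to_st0(&user_word);
    return reg_a == 42 && reg_b == 0;   /* 1 if the right register was written */
}
```

Without the save and restore in load_to_st0, the value would land in
reg_b, the register belonging to the other process. Keeping all user
memory accesses in a few files, as listed above, keeps the number of
places that need this kind of care small.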
----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version ALPHA 0.7) and the 80486 FPU (apart from bugs). Some of the
more important differences are listed below:

    Internal computations do not use de-normal numbers (but external
    de-normals ARE recognised and generated). The design of wm-FPU-emu
    allows a larger exponent range than the 80486 FPU for internal
    computations.

    All computations are performed at full 64 bit precision (the PC bits
    of the FPU control word are ignored). Under Linux, the FPU normally
    runs at 64 bit precision.

    The precision flag (PE of the FPU status word) is not implemented.
    Does anyone write code which uses this feature?

    The Roundup flag (C1) is not implemented.

    The functions which load/store the FPU state are partially implemented,
    but the implementation should be sufficient for handling FPU errors etc
    in 32 bit protected mode.

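For readers unfamiliar with the bits named above, the following sketch
decodes them using the standard x87 layout: precision control (PC) is
bits 8-9 of the control word, PE is bit 5 of the status word and C1 is
bit 9 of the status word. The macro and function names are made up for
this example and do not come from the emulator source.

```c
#include <assert.h>

/* Standard x87 bit positions; the names are hypothetical. */
#define CW_PC_SHIFT 8           /* precision control: control word bits 8-9 */
#define CW_PC_MASK  0x3u
#define SW_PE       (1u << 5)   /* precision (inexact) flag in the status word */
#define SW_C1       (1u << 9)   /* condition code C1 in the status word */

/* Significand width selected by the PC field on a real FPU;
   returns 0 for the reserved encoding 01. */
int pc_precision_bits(unsigned control_word)
{
    assert(control_word <= 0xFFFFu);    /* the control word is 16 bits */
    switch ((control_word >> CW_PC_SHIFT) & CW_PC_MASK) {
    case 0:  return 24;     /* single precision */
    case 2:  return 53;     /* double precision */
    case 3:  return 64;     /* extended precision */
    default: return 0;      /* reserved */
    }
}

/* Behaviour described in this file: the PC field is ignored and every
   computation is carried out with a 64 bit significand. */
int emulated_precision_bits(unsigned control_word)
{
    (void)control_word;
    return 64;
}

/* Nonzero if a status word reports an inexact result. With the
   limitation listed above, the emulator never sets this flag. */
int precision_flag_set(unsigned status_word)
{
    return (status_word & SW_PE) != 0;
}
```

With the control word 0x037F that the FPU holds after initialisation,
pc_precision_bits() and emulated_precision_bits() agree on 64 bits; they
differ only for a program that has lowered the precision itself, for
example with control word 0x027F.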
----------------------- Performance of wm-FPU-emu -----------------------

Speed.
------

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the FPU instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos; the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function  Turbo C       djgpp 1.06        WM-emu387     wm-FPU-emu

+         60.5          154.8             76.5          139.4
-         61.1-65.5     157.3-160.8       76.2-79.5     142.9-144.7
*         71.0          190.8             79.6          146.6
/         61.2-75.0     261.4-266.9       75.3-91.6     142.2-158.1

sin()     310.8         4692.0            319.0         398.5
cos()     284.4         4855.2            308.0         388.7
tan()     495.0         8807.1            394.9         504.7
atan()    328.9         4866.4            601.1         419.5-491.9

sqrt()    128.7         crashed           145.2         227.0
log()     413.1-419.1   5103.4-5354.21    254.7-282.2   409.4-437.1
exp()     479.1         6619.2            469.1         850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

[ Linus' note: I changed look-ahead to be the default under linux, as
  there was no reason not to use it after I had edited it to be
  disabled during tracing ]

          wm-FPU-emu w    original w
          look-ahead      'soft' lib
+         106.4           190.2
-         108.6-111.6     192.4-216.2
*         113.4           193.1
/         108.8-124.4     700.1-706.2

sin()     390.5           2642.0
cos()     381.5           2767.4
tan()     496.5           3153.3
atan()    367.2-435.5     2439.4-3396.8

sqrt()    195.1           4732.5
log()     358.0-387.5     3359.2-3390.3
exp()     619.3           4046.4


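To put the two timing tables side by side, the short sketch below turns
the single-valued rows into speed-up ratios. The figures are copied from
the tables above; the struct and function names are invented for this
example. Rows that give a range of times are left out rather than
reduced to one number.

```c
#include <assert.h>
#include <stddef.h>

/* Times in microseconds, copied from the tables in this file. */
struct timing {
    const char *function;
    double no_lookahead;    /* wm-FPU-emu column of the first table */
    double lookahead;       /* wm-FPU-emu with look-ahead */
    double original_soft;   /* original emulator with the 'soft' lib */
};

static const struct timing timings[] = {
    { "+",      139.4, 106.4,  190.2 },
    { "*",      146.6, 113.4,  193.1 },
    { "sin()",  398.5, 390.5, 2642.0 },
    { "cos()",  388.7, 381.5, 2767.4 },
    { "tan()",  504.7, 496.5, 3153.3 },
    { "sqrt()", 227.0, 195.1, 4732.5 },
    { "exp()",  850.8, 619.3, 4046.4 },
};
static const size_t timing_count = sizeof timings / sizeof timings[0];

/* How many times faster row i runs once look-ahead is enabled. */
double lookahead_speedup(size_t i)
{
    assert(i < timing_count);
    return timings[i].no_lookahead / timings[i].lookahead;
}

/* How many times faster row i runs than the original emulator. */
double speedup_over_original(size_t i)
{
    assert(i < timing_count);
    return timings[i].original_soft / timings[i].lookahead;
}
```

For "+" the look-ahead ratio comes out at about 1.3, while for sin() it
is close to 1, which fits the remark that the simple instructions suffer
most from the trap overhead. The first table was measured with libm4.0
(hard); the library used for the look-ahead column is not stated, so the
first ratio should be read as indicative only.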
----------------------- Accuracy of wm-FPU-emu -----------------------


Accuracy: The following table gives the accuracy of the sqrt(), trig
and log functions. Each function was tested at about 400 points. Ideal
results would be 64 bits. The reduced accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being due to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of about
64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
in the last column.


Function  Tested x range          Worst result (bits)       Turbo C

sqrt(x)   1 .. 2                  64.1                      63.2
atan(x)   1e-10 .. 200            62.6                      62.8
cos(x)    0 .. pi/2-(1e-10)       63.2 (x <= pi/4)          62.4
                                  35.2 (x = pi/2-(1e-10))   31.9
sin(x)    1e-10 .. pi/2           63.0                      62.8
tan(x)    1e-10 .. pi/2-(1e-10)   62.4 (x <= pi/4)          62.1
                                  35.2 (x = pi/2-(1e-10))   31.9
exp(x)    0 .. 1                  63.1                      62.9
log(x)    1+1e-6 .. 2             62.4                      62.1

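The figure of 31 bits quoted for cos() near pi/2 can be checked by hand.
The working below simply evaluates the expression 64 + log2(cos(x)) at
x = pi/2-(1e-10). One way to see where that expression comes from,
offered here as an interpretation rather than as the author's derivation:
an error d in x changes cos(x) by about d*sin(x), so the relative error
in the result is about d*tan(x), which grows like 1/cos(x) as x nears
pi/2. This agrees with the expression up to a constant of less than one
bit.

```latex
\begin{aligned}
x &= \tfrac{\pi}{2} - \varepsilon, \qquad \varepsilon = 10^{-10} \\
\cos x &= \sin \varepsilon \approx \varepsilon = 10^{-10} \\
64 + \log_2(\cos x) &\approx 64 - 10 \log_2 10 \approx 64 - 33.2 = 30.8 \approx 31 \text{ bits}
\end{aligned}
```

The measured worst cases in the table at this argument (35.2 bits for
wm-FPU-emu, 31.9 for Turbo C) are of the same order as this estimate.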