Annotation of coherent/b/kernel/emulator/README, revision 1.1.1.1

1.1       root        1:  +---------------------------------------------------------------------------+
                      2:  |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
                      3:  |                                                                           |
                      4:  | Copyright (C) 1992    W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
                      5:  |                       Australia.  E-mail [email protected]    |
                      6:  |                                                                           |
                      7:  |    This program is free software; you can redistribute it and/or modify   |
                      8:  |    it under the terms of the GNU General Public License version 2 as      |
                      9:  |    published by the Free Software Foundation.                             |
                     10:  |                                                                           |
                     11:  |    This program is distributed in the hope that it will be useful,        |
                     12:  |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
                     13:  |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
                     14:  |    GNU General Public License for more details.                           |
                     15:  |                                                                           |
                     16:  |    You should have received a copy of the GNU General Public License      |
                     17:  |    along with this program; if not, write to the Free Software            |
                     18:  |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
                     19:  |                                                                           |
                     20:  +---------------------------------------------------------------------------+
                     21: 
                     22: 
                     23: ***NOTE***       THIS SHOULD BE REGARDED AS AN ALPHA TEST VERSION
                     24:                  (although the beta version may be identical)
                     25: 
                     26: 
                     27: wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
                     28: which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
                     29: in turn based upon emu387 which was written by DJ Delorie for djgpp.
                     30: The interface to the Linux kernel is based upon the original Linux
                     31: math emulator by Linus Torvalds.
                     32: 
                     33: My target FPU for wm-FPU-emu is that described in the Intel486
                     34: Programmer's Reference Manual (1992 edition). Numerous facets of the
                     35: functioning of the FPU are not well covered in the Reference Manual;
                     36: in the absence of clear details I have made guesses about the most
                     37: reasonable behaviour.
                     38: 
                     39: wm-FPU-emu does not implement all of the behaviour of the 80486 FPU. 
                     40: See "Limitations" later in this file for a partial list of some
                     41: differences.  I believe that the missing features are never used by
                     42: normal C or FORTRAN programs. 
                     43: 
                     44: Please report bugs, etc to me at:
                     45:        [email protected]
                     46: 
                     47: 
                     48: --Bill Metzenthen
                     49:   Oct 1992
                     50: 
                     51: ----------------------- Internals of wm-FPU-emu -----------------------
                     52: 
                     53: Numeric algorithms:
                     54: (1) Add, subtract, and multiply. Nothing remarkable in these.
                     55: (2) Divide has been tuned to get reasonable performance. The algorithm
                     56:     is not the obvious one which most people seem to use, but is designed
                     57:     to take advantage of the characteristics of the 80386. I expect that
                     58:     it has been invented many times before I discovered it, but I have not
                     59:     seen it. It is based upon one of those ideas which one carries around
                     60:     for years without ever bothering to check it out.
                     61: (3) The sqrt function has been tuned to get good performance. It is based
                     62:     upon Newton's classic method. Performance was improved by capitalizing
                     63:     upon the properties of Newton's method, and the code is once again
                     64:     structured taking account of the 80386 characteristics.
                     65: (4) The trig, log, and exp functions are based in each case upon quasi-
                     66:     "optimal" polynomial approximations. My definition of "optimal" was
                     67:     based upon getting good accuracy with reasonable speed.
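Item (3) above names Newton's method. As a rough illustration only, the
iteration can be sketched in C on doubles. This is not the emulator's code,
which works on 64 bit significands; the function name newton_sqrt and the
choice of starting value are assumptions made for the example.

```c
/* Hypothetical sketch of Newton's method for sqrt(a).
   Not taken from wm-FPU-emu; the name and starting value are illustrative. */
double newton_sqrt(double a)
{
    double x;

    if (a <= 0.0)
        return 0.0;              /* sketch only: no NaN or sign handling */

    x = (a < 1.0) ? 1.0 : a;     /* start at or above the true root */
    for (;;) {
        /* Newton step for f(x) = x*x - a */
        double next = 0.5 * (x + a / x);

        if (next >= x)           /* no longer decreasing: converged */
            break;
        x = next;
    }
    return x;
}
```

Starting at or above the root makes the iterates decrease towards it, so
the loop can stop as soon as a step fails to reduce x. A tuned version
would begin from a much better first guess to cut the number of steps.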
                     68: 
                     69: The code of the emulator is complicated slightly by the need to
                     70: account for a limited form of re-entrancy. Normally, the emulator will
                     71: emulate each FPU instruction to completion without interruption.
                     72: However, it may happen that when the emulator is accessing the user
                     73: memory space, swapping may be needed. In this case the emulator may be
                     74: temporarily suspended while disk I/O takes place. During this time
                     75: another process may use the emulator, thereby changing some static
                     76: variables (e.g. FPU_st0_ptr). The code which accesses user memory
                     77: is confined to five files:
                     78:     fpu_entry.c
                     79:     reg_ld_str.c
                     80:     load_store.c
                     81:     get_address.c
                     82:     errors.c
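The hazard described above can be sketched in C. This is one possible way
for code to protect itself, under stated assumptions, and is not a copy of
what the emulator does: only the name FPU_st0_ptr comes from this file,
while fpu_reg, regs, read_user_double, load_double and demo_load are
hypothetical names invented for the example.

```c
/* Hypothetical stand-ins; only the name FPU_st0_ptr comes from the README. */
typedef struct { double value; } fpu_reg;

fpu_reg  regs[8];
fpu_reg *FPU_st0_ptr = &regs[0];     /* static state shared by all users */

/* Stand-in for a user-memory access. In the real situation the process
   may sleep here while another process runs the emulator; that is
   simulated by moving the static pointer. */
static double read_user_double(const double *user_addr)
{
    FPU_st0_ptr = &regs[5];          /* the "other process" changes it */
    return *user_addr;
}

/* Load a value into st(0): keep a private copy of the static pointer
   across the user-memory access and put it back afterwards. */
void load_double(const double *user_addr)
{
    fpu_reg *st0 = FPU_st0_ptr;      /* private copy taken beforehand */
    double v = read_user_double(user_addr);

    FPU_st0_ptr = st0;               /* undo any change made meanwhile */
    st0->value = v;
}

/* Small driver: load v and report what ended up in register 0. */
double demo_load(double v)
{
    load_double(&v);
    return regs[0].value;
}
```

Keeping user-memory access in a handful of files, as listed above, limits
the number of places where this kind of care is needed.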
                     83: 
                     84: ----------------------- Limitations of wm-FPU-emu -----------------------
                     85: 
                     86: There are a number of differences between the current wm-FPU-emu
                     87: (version ALPHA 0.7) and the 80486 FPU (apart from bugs). Some of the
                     88: more important differences are listed below:
                     89: 
                     90: Internal computations do not use de-normal numbers (but external
                     91: de-normals ARE recognised and generated). The design of wm-FPU-emu
                     92: allows a larger exponent range than the 80486 FPU for internal
                     93: computations.
                     94: 
                     95: All computations are performed at full 64 bit precision (the PC bits
                     96: of the FPU control word are ignored). Under Linux, the FPU normally
                     97: runs at 64 bit precision.
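For reference, the PC field occupies bits 8 and 9 of the x87 control word.
A minimal sketch of decoding it follows; the function name
pc_significand_bits is hypothetical, and the emulator itself ignores this
field, as stated above.

```c
/* Significand width selected by the PC field (bits 8-9) of an x87
   control word. Returns 0 for the reserved encoding 01.
   Hypothetical helper, not part of wm-FPU-emu. */
int pc_significand_bits(unsigned short control_word)
{
    switch ((control_word >> 8) & 0x3) {
    case 0:  return 24;   /* single precision */
    case 2:  return 53;   /* double precision */
    case 3:  return 64;   /* extended precision */
    default: return 0;    /* reserved encoding */
    }
}
```

A control word of 0x037F, the value left by FINIT, selects the 64 bit
setting which the emulator always uses.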
                     98: 
                     99: The precision flag (PE of the FPU status word) is not implemented.
                    100: Does anyone write code which uses this feature?
                    101: 
                    102: The Roundup flag (C1) is not implemented.
                    103: 
                    104: The functions which load/store the FPU state are partially implemented,
                    105: but the implementation should be sufficient for handling FPU errors etc
                    106: in 32 bit protected mode.
                    107: 
                    108: ----------------------- Performance of wm-FPU-emu -----------------------
                    109: 
                    110: Speed.
                    111: -----
                    112: 
                    113: The speed of floating point computation with the emulator will depend
                    114: upon instruction mix. Relative performance is best for the instructions
                    115: which require most computation. The simple instructions are adversely
                    116: affected by the FPU instruction trap overhead.
                    117: 
                    118: 
                    119: Timing: Some simple timing tests have been made on the emulator functions.
                    120: The times include load/store instructions. All times are in microseconds
                    121: measured on a 33MHz 386 with 64k cache. The Turbo C tests were run under
                    122: ms-dos; the next two columns are for emulators running with the djgpp
                    123: ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
                    124: using libm4.0 (hard).
                    125: 
                    126: function      Turbo C        djgpp 1.06        wm-emu387     wm-FPU-emu
                    127: 
                    128:    +          60.5           154.8              76.5          139.4
                    129:    -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
                    130:    *          71.0           190.8              79.6          146.6
                    131:    /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1
                    132: 
                    133:  sin()        310.8          4692.0            319.0          398.5
                    134:  cos()        284.4          4855.2            308.0          388.7
                    135:  tan()        495.0          8807.1            394.9          504.7
                    136:  atan()       328.9          4866.4            601.1          419.5-491.9
                    137: 
                    138:  sqrt()       128.7          crashed           145.2          227.0
                    139:  log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
                    140:  exp()        479.1          6619.2            469.1          850.8
                    141: 
                    142: 
                    143: The performance under Linux is improved by the use of look-ahead code.
                    144: The following results show the times obtained with look-ahead
                    145: enabled. Also given are the times for the original Linux emulator
                    146: with the 4.1 'soft' lib.
                    147: 
                    148:  [ Linus' note: I changed look-ahead to be the default under linux, as
                    149:    there was no reason not to use it after I had edited it to be
                    150:    disabled during tracing ]
                    151: 
                    152:             wm-FPU-emu with  original with
                    153:             look-ahead       'soft' lib
                    154:    +         106.4             190.2
                    155:    -         108.6-111.6      192.4-216.2
                    156:    *         113.4             193.1
                    157:    /         108.8-124.4      700.1-706.2
                    158: 
                    159:  sin()       390.5            2642.0
                    160:  cos()       381.5            2767.4
                    161:  tan()       496.5            3153.3
                    162:  atan()      367.2-435.5     2439.4-3396.8
                    163: 
                    164:  sqrt()      195.1            4732.5
                    165:  log()       358.0-387.5     3359.2-3390.3
                    166:  exp()       619.3            4046.4
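The saving from look-ahead comes from avoiding traps: once the emulator
has been entered for one FPU instruction, it can check whether the next
instruction is also an FPU instruction and emulate it straight away. The
following is a toy model of that idea, not the emulator's code. It assumes
each instruction is reduced to its first opcode byte, and the names
is_fpu_opcode, count_traps and sample_ops are hypothetical.

```c
#include <stddef.h>

/* The first opcode byte of every x87 instruction lies in 0xD8..0xDF. */
static int is_fpu_opcode(unsigned char op)
{
    return (op & 0xF8) == 0xD8;
}

/* Count emulator traps for a stream of instructions, each reduced to
   its first opcode byte. With look_ahead set, one trap also covers any
   FPU instructions which immediately follow. */
size_t count_traps(const unsigned char *ops, size_t n, int look_ahead)
{
    size_t traps = 0, i = 0;

    while (i < n) {
        if (!is_fpu_opcode(ops[i])) {
            i++;                 /* runs natively, no trap */
            continue;
        }
        traps++;                 /* FPU instruction enters the emulator */
        i++;
        while (look_ahead && i < n && is_fpu_opcode(ops[i]))
            i++;                 /* emulated without a fresh trap */
    }
    return traps;
}

/* Made-up stream: three FPU instructions, one non-FPU, two FPU. */
const unsigned char sample_ops[6] = { 0xD9, 0xD8, 0xDD, 0x89, 0xD9, 0xDE };
```

For the made-up stream, the model counts five traps without look-ahead and
two with it. Real gains depend on how FPU instructions are clustered in
the code being run.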
                    167: 
                    168: 
                    169: ----------------------- Accuracy of wm-FPU-emu -----------------------
                    170: 
                    171: 
                    172: Accuracy: The following table gives the accuracy of the sqrt(), trig
                    173: and log functions. Each function was tested at about 400 points. Ideal
                    174: results would be 64 bits. The reduced accuracy of cos() and tan() for
                    175: arguments greater than pi/4 can be thought of as being due to the
                    176: precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
                    177: accurate to 64 bits can result in a relative accuracy in cos() of about
                    178: 64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
                    179: in the last column.
                    180: 
                    181: 
                    182: Function      Tested x range            Worst result (bits)         Turbo C
                    183: 
                    184: sqrt(x)       1 .. 2                    64.1                         63.2
                    185: atan(x)       1e-10 .. 200              62.6                         62.8
                    186: cos(x)        0 .. pi/2-(1e-10)         63.2 (x <= pi/4)             62.4
                    187:                                         35.2 (x = pi/2-(1e-10))      31.9
                    188: sin(x)        1e-10 .. pi/2             63.0                         62.8
                    189: tan(x)        1e-10 .. pi/2-(1e-10)     62.4 (x <= pi/4)             62.1
                    190:                                         35.2 (x = pi/2-(1e-10))      31.9
                    191: exp(x)        0 .. 1                    63.1                         62.9
                    192: log(x)        1+1e-6 .. 2               62.4                         62.1
                    193: 
