Annotation of coherent/b/kernel/emulator/README, revision 1.1

1.1     ! root        1:  +---------------------------------------------------------------------------+
        !             2:  |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
        !             3:  |                                                                           |
        !             4:  | Copyright (C) 1992    W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
        !             5:  |                       Australia.  E-mail [email protected]    |
        !             6:  |                                                                           |
        !             7:  |    This program is free software; you can redistribute it and/or modify   |
        !             8:  |    it under the terms of the GNU General Public License version 2 as      |
        !             9:  |    published by the Free Software Foundation.                             |
        !            10:  |                                                                           |
        !            11:  |    This program is distributed in the hope that it will be useful,        |
        !            12:  |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
        !            13:  |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
        !            14:  |    GNU General Public License for more details.                           |
        !            15:  |                                                                           |
        !            16:  |    You should have received a copy of the GNU General Public License      |
        !            17:  |    along with this program; if not, write to the Free Software            |
        !            18:  |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
        !            19:  |                                                                           |
        !            20:  +---------------------------------------------------------------------------+
        !            21: 
        !            22: 
        !            23: ***NOTE***       THIS SHOULD BE REGARDED AS AN ALPHA TEST VERSION
        !            24:                  (although the beta version may be identical)
        !            25: 
        !            26: 
        !            27: wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
        !            28: which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
        !            29: in turn based upon emu387 which was written by DJ Delorie for djgpp.
        !            30: The interface to the Linux kernel is based upon the original Linux
        !            31: math emulator by Linus Torvalds.
        !            32: 
        !            33: My target FPU for wm-FPU-emu is that described in the Intel486
        !            34: Programmer's Reference Manual (1992 edition). Numerous facets of the
        !            35: functioning of the FPU are not well covered in the Reference Manual;
        !            36: in the absence of clear details I have made guesses about the most
        !            37: reasonable behaviour.
        !            38: 
        !            39: wm-FPU-emu does not implement all of the behaviour of the 80486 FPU. 
        !            40: See "Limitations" later in this file for a partial list of some
        !            41: differences.  I believe that the missing features are never used by
        !            42: normal C or FORTRAN programs. 
        !            43: 
        !            44: Please report bugs, etc to me at:
        !            45:        [email protected]
        !            46: 
        !            47: 
        !            48: --Bill Metzenthen
        !            49:   Oct 1992
        !            50: 
        !            51: ----------------------- Internals of wm-FPU-emu -----------------------
        !            52: 
        !            53: Numeric algorithms:
        !            54: (1) Add, subtract, and multiply. Nothing remarkable in these.
        !            55: (2) Divide has been tuned to get reasonable performance. The algorithm
        !            56:     is not the obvious one which most people seem to use, but is designed
        !            57:     to take advantage of the characteristics of the 80386. I expect that
        !            58:     it has been invented many times before I discovered it, but I have not
        !            59:     seen it. It is based upon one of those ideas which one carries around
        !            60:     for years without ever bothering to check it out.
        !            61: (3) The sqrt function has been tuned to get good performance. It is based
        !            62:     upon Newton's classic method. Performance was improved by capitalizing
        !            63:     upon the properties of Newton's method, and the code is once again
        !            64:     structured taking account of the 80386 characteristics.
        !            65: (4) The trig, log, and exp functions are based in each case upon quasi-
        !            66:     "optimal" polynomial approximations. My definition of "optimal" was
        !            67:     based upon getting good accuracy with reasonable speed.
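
As an illustration of item (3), the sketch below applies Newton's classic
iteration to an integer significand. It is not taken from the emulator
source, which is tuned for the 80386; the name newton_isqrt and the use of
arbitrary-size Python integers are assumptions made for this example only.

```python
def newton_isqrt(n):
    """Return floor(sqrt(n)) for a non-negative integer n using Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is >= sqrt(n), so that the iterates
    # decrease monotonically towards the root.
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2   # Newton step for f(x) = x*x - n
        if y >= x:              # no further decrease: x is floor(sqrt(n))
            return x
        x = y
```

For instance, newton_isqrt(2 << 126) yields sqrt(2) scaled by 2**63, which
fits in a 64 bit significand. Each Newton step roughly doubles the number of
correct bits, which is the property a tuned implementation can exploit.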
        !            68: 
        !            69: The code of the emulator is complicated slightly by the need to
        !            70: account for a limited form of re-entrancy. Normally, the emulator will
        !            71: emulate each FPU instruction to completion without interruption.
        !            72: However, it may happen that when the emulator is accessing the user
        !            73: memory space, swapping may be needed. In this case the emulator may be
        !            74: temporarily suspended while disk i/o takes place. During this time
        !            75: another process may use the emulator, thereby changing some static
        !            76: variables (e.g. FPU_st0_ptr, etc). The code which accesses user memory
        !            77: is confined to five files:
        !            78:     fpu_entry.c
        !            79:     reg_ld_str.c
        !            80:     load_store.c
        !            81:     get_address.c
        !            82:     errors.c
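
The hazard described above can be modelled with a toy example. This is
only a sketch: the dictionary "state" stands in for the emulator's static
variables, the "yield" marks the point where disk i/o could suspend the
emulator, and taking a local snapshot before the access is one possible
remedy chosen for illustration. The README does not say which technique the
emulator itself uses, and none of these names come from its source.

```python
# One piece of static state shared by every process using the emulator.
state = {"st0": 0.0}

def store_st0(value, user_memory, addr, snapshot):
    """Emulate a store; the yield marks where a page fault may suspend us."""
    state["st0"] = value
    saved = state["st0"] if snapshot else None   # copy taken before the access
    yield                                        # disk i/o; another process may run
    user_memory[addr] = saved if snapshot else state["st0"]

def run_interleaved(snapshot):
    """Start process A, then let process B run while A is suspended."""
    mem_a, mem_b = {}, {}
    a = store_st0(1.5, mem_a, 0, snapshot)
    next(a)                                      # A blocks on its memory access
    for _ in store_st0(2.5, mem_b, 0, snapshot):
        pass                                     # B runs and overwrites the state
    for _ in a:
        pass                                     # A resumes and completes
    return mem_a[0]
```

Without the snapshot, process A stores the value left behind by process B;
with it, A stores its own value.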
        !            83: 
        !            84: ----------------------- Limitations of wm-FPU-emu -----------------------
        !            85: 
        !            86: There are a number of differences between the current wm-FPU-emu
        !            87: (version ALPHA 0.7) and the 80486 FPU (apart from bugs). Some of the
        !            88: more important differences are listed below:
        !            89: 
        !            90: Internal computations do not use de-normal numbers (but external
        !            91: de-normals ARE recognised and generated). The design of wm-FPU-emu
        !            92: allows a larger exponent range than the 80486 FPU for internal
        !            93: computations.
        !            94: 
        !            95: All computations are performed at full 64-bit precision (the PC bits
        !            96: of the FPU control word are ignored). Under Linux, the FPU normally
        !            97: runs at 64-bit precision.
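
To make the effect of ignoring the PC bits concrete, the sketch below decodes
the precision control field (bits 8 and 9 of the x87 control word) and
contrasts it with what the emulator does according to the paragraph above.
The function names are hypothetical and are not part of the emulator.

```python
PC_SHIFT = 8      # the PC field occupies bits 8-9 of the control word
PC_MASK = 0x3
# Significand width selected by each PC encoding; 0b01 is reserved.
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}

def requested_precision(control_word):
    """Return the precision in bits that the PC field asks for (None if reserved)."""
    pc = (control_word >> PC_SHIFT) & PC_MASK
    return PRECISION_BITS.get(pc)

def emulated_precision(control_word):
    """Precision actually used by the emulator as described above: always 64 bits."""
    return 64
```

With the power-on default control word 0x037F the two agree. A program which
sets the PC field to 53 bits (for example 0x027F) would still get 64 bit
results from the emulator.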
        !            98: 
        !            99: The precision flag (PE of the FPU status word) is not implemented.
        !           100: Does anyone write code which uses this feature?
        !           101: 
        !           102: The Roundup flag (C1) is not implemented.
        !           103: 
        !           104: The functions which load/store the FPU state are partially implemented,
        !           105: but the implementation should be sufficient for handling FPU errors etc
        !           106: in 32 bit protected mode.
        !           107: 
        !           108: ----------------------- Performance of wm-FPU-emu -----------------------
        !           109: 
        !           110: Speed.
        !           111: -----
        !           112: 
        !           113: The speed of floating point computation with the emulator will depend
        !           114: upon instruction mix. Relative performance is best for the instructions
        !           115: which require most computation. The simple instructions are adversely
        !           116: affected by the fpu instruction trap overhead.
        !           117: 
        !           118: 
        !           119: Timing: Some simple timing tests have been made on the emulator functions.
        !           120: The times include load/store instructions. All times are in microseconds
        !           121: measured on a 33MHz 386 with 64k cache. The Turbo C tests were run under
        !           122: ms-dos; the next two columns are for emulators running with the djgpp
        !           123: ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
        !           124: using libm4.0 (hard).
        !           125: 
        !           126: function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu
        !           127: 
        !           128:    +          60.5           154.8              76.5          139.4
        !           129:    -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
        !           130:    *          71.0           190.8              79.6          146.6
        !           131:    /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1
        !           132: 
        !           133:  sin()        310.8          4692.0            319.0          398.5
        !           134:  cos()        284.4          4855.2            308.0          388.7
        !           135:  tan()        495.0          8807.1            394.9          504.7
        !           136:  atan()       328.9          4866.4            601.1          419.5-491.9
        !           137: 
        !           138:  sqrt()       128.7          crashed           145.2          227.0
        !           139:  log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
        !           140:  exp()        479.1          6619.2            469.1          850.8
        !           141: 
        !           142: 
        !           143: The performance under Linux is improved by the use of look-ahead code.
        !           144: The following results show the improvement which is obtained under
        !           145: Linux due to the look-ahead code. Also given are the times for the
        !           146: original Linux emulator with the 4.1 'soft' lib.
        !           147: 
        !           148:  [ Linus' note: I changed look-ahead to be the default under linux, as
        !           149:    there was no reason not to use it after I had edited it to be
        !           150:    disabled during tracing ]
        !           151: 
        !           152:             wm-FPU-emu w     original w
        !           153:             look-ahead       'soft' lib
        !           154:    +         106.4             190.2
        !           155:    -         108.6-111.6      192.4-216.2
        !           156:    *         113.4             193.1
        !           157:    /         108.8-124.4      700.1-706.2
        !           158: 
        !           159:  sin()       390.5            2642.0
        !           160:  cos()       381.5            2767.4
        !           161:  tan()       496.5            3153.3
        !           162:  atan()      367.2-435.5     2439.4-3396.8
        !           163: 
        !           164:  sqrt()      195.1            4732.5
        !           165:  log()       358.0-387.5     3359.2-3390.3
        !           166:  exp()       619.3            4046.4
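
The ratios implied by the table above can be worked out with a few lines of
code. The times are copied from the rows which give a single value (rows
given as ranges are left out); the dictionary layout and the function name
are assumptions made for this example.

```python
# Times in microseconds: (wm-FPU-emu with look-ahead, original with 'soft' lib).
times = {
    "+":      (106.4, 190.2),
    "*":      (113.4, 193.1),
    "sin()":  (390.5, 2642.0),
    "cos()":  (381.5, 2767.4),
    "tan()":  (496.5, 3153.3),
    "sqrt()": (195.1, 4732.5),
    "exp()":  (619.3, 4046.4),
}

def speedup(function):
    """Return how many times faster wm-FPU-emu is than the original emulator."""
    with_lookahead, original = times[function]
    return original / with_lookahead

for name in times:
    print(f"{name:7s} {speedup(name):5.1f}x")
```

The gain is smallest for + and *, where the trap overhead dominates, and
largest for sqrt().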
        !           167: 
        !           168: 
        !           169: ----------------------- Accuracy of wm-FPU-emu -----------------------
        !           170: 
        !           171: 
        !           172: Accuracy: The following table gives the accuracy of the sqrt(), trig
        !           173: and log functions. Each function was tested at about 400 points. Ideal
        !           174: results would be 64 bits. The reduced accuracy of cos() and tan() for
        !           175: arguments greater than pi/4 can be thought of as being due to the
        !           176: precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
        !           177: accurate to 64 bits can result in a relative accuracy in cos() of about
        !           178: 64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
        !           179: in the last column.
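
The estimate quoted above can be checked numerically. The sketch below
evaluates 64 + log2(cos(x)) at x = pi/2-(1e-10); the function name is an
assumption, and Python's 53 bit floats are used only to evaluate the formula,
not to reproduce the emulator's results.

```python
import math

def expected_cos_bits(x, argument_bits=64):
    """Estimate the relative accuracy (in bits) of cos(x) when x itself
    is only accurate to argument_bits bits."""
    return argument_bits + math.log2(abs(math.cos(x)))

x = math.pi / 2 - 1e-10
print(round(expected_cos_bits(x)))   # about 31 bits
```

Near pi/2 the value of cos(x) is about 1e-10, so roughly 33 of the 64 bits
are lost to the uncertainty in the argument alone.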
        !           180: 
        !           181: 
        !           182: Function      Tested x range            Worst result (bits)         Turbo C
        !           183: 
        !           184: sqrt(x)       1 .. 2                    64.1                         63.2
        !           185: atan(x)       1e-10 .. 200              62.6                         62.8
        !           186: cos(x)        0 .. pi/2-(1e-10)         63.2 (x <= pi/4)             62.4
        !           187:                                         35.2 (x = pi/2-(1e-10))      31.9
        !           188: sin(x)        1e-10 .. pi/2             63.0                         62.8
        !           189: tan(x)        1e-10 .. pi/2-(1e-10)     62.4 (x <= pi/4)             62.1
        !           190:                                         35.2 (x = pi/2-(1e-10))      31.9
        !           191: exp(x)        0 .. 1                    63.1                         62.9
        !           192: log(x)        1+1e-6 .. 2               62.4                         62.1
        !           193: 
