 +---------------------------------------------------------------------------+
 |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
 |                                                                           |
 | Copyright (C) 1992,1993,1994,1995,1996,1997,1999                          |
 |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
 |                       Australia.  E-mail billm@melbpc.org.au              |
 |                                                                           |
 |    This program is free software; you can redistribute it and/or modify   |
 |    it under the terms of the GNU General Public License version 2 as      |
 |    published by the Free Software Foundation.                             |
 |                                                                           |
 |    This program is distributed in the hope that it will be useful,        |
 |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
 |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
 |    GNU General Public License for more details.                           |
 |                                                                           |
 |    You should have received a copy of the GNU General Public License      |
 |    along with this program; if not, write to the Free Software            |
 |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
 |                                                                           |
 +---------------------------------------------------------------------------+



wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which was my 80387 emulator for early versions of djgpp (gcc under
msdos); wm-emu387 was in turn based upon emu387 which was written by
DJ Delorie for djgpp.  The interface to the Linux kernel is based upon
the original Linux math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486's. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there are always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU,
but is very close.  See "Limitations" later in this file for a list of
some differences.

Please report bugs, etc to me at:
       billm@melbpc.org.au
or     b.metzenthen@medoto.unimelb.edu.au

For more information on the emulator and on floating point topics, see
my web pages, currently at  http://www.suburbia.net/~billm/


--Bill Metzenthen
  December 1999


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
    "optimal" polynomial approximations. My definition of "optimal" was
    based upon getting good accuracy with reasonable speed.
(5) The argument-reduction code for the trig functions effectively uses
    a value of pi which is accurate to more than 128 bits. As a consequence,
    the reduced argument is accurate to more than 64 bits for arguments up
    to a few pi, and remains so for most larger arguments, even those
    approaching 2^63. This is far superior to an 80486, which uses a
    value of pi which is accurate to 66 bits.
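
As a rough illustration of item (3), Newton's method for a square root
can be sketched on integers as below. This is a sketch in Python only,
not the emulator's tuned routine; the name isqrt_newton and the choice of
starting value are assumptions made for the example. A significand would
first be scaled to an integer.

```python
def isqrt_newton(n):
    """Floor of the square root of a non-negative integer, by Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is >= sqrt(n), so that the iterates
    # decrease monotonically towards the answer.
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2      # Newton step for f(x) = x*x - n
        if y >= x:                 # no further decrease: x is the floor root
            return x
        x = y


# sqrt(2) as a 64-bit integer significand, i.e. scaled by 2**63.
root_two = isqrt_newton(2 << 126)
print(hex(root_two))
```

Each Newton step roughly doubles the number of correct bits, which is the
property a tuned implementation can exploit by starting from a good
initial estimate.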

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby perhaps changing static
variables. The code which accesses user memory is confined to five
files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
As of version 1.12 of the emulator, no static variables are used
(apart from those in the kernel's per-process tables). The emulator is
therefore now fully re-entrant, rather than having just the restricted
form of re-entrancy which is required by the Linux kernel.
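
The hazard described above can be imitated with a toy model. The sketch
below is Python, not emulator code: the generators stand in for two
processes, the bare yield stands in for a suspension during user-memory
access, and all names (shared_scratch, emulate_with_static and so on) are
hypothetical.

```python
# Stands in for a static variable shared by every user of the emulator.
shared_scratch = {"st0": 0.0}


def emulate_with_static(value):
    shared_scratch["st0"] = value
    yield                              # suspended, e.g. while a page is swapped in
    return shared_scratch["st0"] * 2   # may see another task's value


def emulate_with_own_state(state, value):
    state["st0"] = value               # state belongs to this task only
    yield
    return state["st0"] * 2


def run_interleaved(first, second):
    """Start both tasks, then let each one finish; return their results."""
    next(first)                        # first stores its operand, then is suspended
    next(second)                       # second runs meanwhile and stores its own
    results = []
    for task in (first, second):
        try:
            next(task)
        except StopIteration as finished:
            results.append(finished.value)
    return results


clobbered = run_interleaved(emulate_with_static(1.0), emulate_with_static(5.0))
isolated = run_interleaved(emulate_with_own_state({}, 1.0),
                           emulate_with_own_state({}, 5.0))
print(clobbered, isolated)
```

With the shared variable the first task returns a result computed from
the second task's operand; with per-task state each task gets its own
answer.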

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version 2.01) and the 80486 FPU (apart from bugs).  The differences
are fewer than those which applied to the 1.xx series of the emulator.
Some of the more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
    operands were rounded to the current precision before the arithmetic
    operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

Self modifying code can cause the emulator to fail. An example of such
code is:
          movl %esp,(%ebx)
          fld1
The FPU instruction may be (usually will be) loaded into the pre-fetch
queue of the CPU before the movl instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault. Address offsets
greater than 0xffff appear to be illegal in vm86 mode but are quite
acceptable (and work) in real mode. A small test program developed to
check the addressing, and which runs successfully in real mode,
crashes dosemu under Linux and also brings Windows down with a general
protection fault message when run under the MS-DOS prompt of Windows
3.1. (The program simply reads data from a valid address).

The emulator supports 16-bit protected mode, with one difference from
an 80486DX.  An 80486DX will allow some floating point instructions to
write a few bytes below the lowest address of the stack.  The emulator
will not allow this in 16-bit protected mode: no instructions are
allowed to write outside the bounds set by the protection.

----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the FPU instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos, the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu

   +          60.5           154.8              76.5          139.4
   -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
   *          71.0           190.8              79.6          146.6
   /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1

 sin()        310.8          4692.0            319.0          398.5
 cos()        284.4          4855.2            308.0          388.7
 tan()        495.0          8807.1            394.9          504.7
 atan()       328.9          4866.4            601.1          419.5-491.9

 sqrt()       128.7          crashed           145.2          227.0
 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
 exp()        479.1          6619.2            469.1          850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

 [ Linus' note: I changed look-ahead to be the default under linux, as
   there was no reason not to use it after I had edited it to be
   disabled during tracing ]

            wm-FPU-emu with   original with
            look-ahead        'soft' lib
   +         106.4             190.2
   -         108.6-111.6      192.4-216.2
   *         113.4             193.1
   /         108.8-124.4      700.1-706.2

 sin()       390.5            2642.0
 cos()       381.5            2767.4
 tan()       496.5            3153.3
 atan()      367.2-435.5     2439.4-3396.8

 sqrt()      195.1            4732.5
 log()       358.0-387.5     3359.2-3390.3
 exp()       619.3            4046.4


These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.
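
Taking the tabulated times at face value, the gain from look-ahead can
be worked out directly. The Python sketch below assumes that the last
column of the first table (wm-FPU-emu in Linux 0.97) is the run without
look-ahead; only rows given as single values are used, and the function
name lookahead_gain is made up for the example.

```python
# Times in microseconds, copied from the two tables above.
without_lookahead = {"+": 139.4, "*": 146.6, "sin": 398.5, "cos": 388.7,
                     "tan": 504.7, "sqrt": 227.0, "exp": 850.8}
with_lookahead = {"+": 106.4, "*": 113.4, "sin": 390.5, "cos": 381.5,
                  "tan": 496.5, "sqrt": 195.1, "exp": 619.3}


def lookahead_gain(name):
    """Return (microseconds saved, speed-up factor) for one function."""
    before = without_lookahead[name]
    after = with_lookahead[name]
    return before - after, before / after


for name in without_lookahead:
    saved, factor = lookahead_gain(name)
    print(f"{name:>5}  saved {saved:6.1f} us  speed-up x{factor:.2f}")
```

On these figures an add saves about 33 microseconds (a factor of about
1.31), while sin() saves only about 8 microseconds out of nearly 400.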


----------------------- Accuracy of wm-FPU-emu -----------------------


The accuracy of the emulator is in almost all cases equal to or better
than that of an Intel 80486 FPU.

The results of the basic arithmetic functions (+,-,*,/), and fsqrt
match those of an 80486 FPU. They are the best possible; the error for
these never exceeds 1/2 an lsb. The fprem and fprem1 instructions
return exact results; they have no error.
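
What "never exceeds 1/2 an lsb" and "exact" mean can be checked with
rational arithmetic. The Python sketch below (Python 3.9 or later, for
math.ulp) tests the host machine's 53-bit double arithmetic, not the
emulator and not the 64-bit extended format; the helper names are
hypothetical.

```python
import math
from fractions import Fraction


def error_in_ulps(computed, exact):
    """Error of a float result in units of its last place, as an exact Fraction."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))


def remainder_is_exact(x, y):
    """True if fmod(x, y) differs from x by an exact integer multiple of y."""
    r = math.fmod(x, y)
    quotient = (Fraction(x) - Fraction(r)) / Fraction(y)
    return quotient.denominator == 1


quotient_error = error_in_ulps(1.0 / 3.0, Fraction(1, 3))
sum_error = error_in_ulps(0.1 + 0.2, Fraction(0.1) + Fraction(0.2))
print(float(quotient_error), float(sum_error), remainder_is_exact(1e20, 3.0))
```

A correctly rounded operation gives an error of at most 0.5 in these
units, and an exact remainder leaves an integer quotient.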


The following table compares the emulator accuracy for the sqrt(),
trig and log functions against the Turbo C "emulator". For this table,
each function was tested at about 400 points. Ideal worst-case results
would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being related to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of
about 64 + log2(cos(x)) = 31 bits.


Function      Tested x range            Worst result                Turbo C
                                        (relative bits)

sqrt(x)       1 .. 2                    64.1                         63.2
atan(x)       1e-10 .. 200              64.2                         62.8
cos(x)        0 .. pi/2-(1e-10)         64.4 (x <= pi/4)             62.4
                                        64.1 (x = pi/2-(1e-10))      31.9
sin(x)        1e-10 .. pi/2             64.0                         62.8
tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (x <= pi/4)             62.1
                                        64.1 (x = pi/2-(1e-10))      31.9
exp(x)        0 .. 1                    63.1 **                      62.9
log(x)        1+1e-6 .. 2               63.8 **                      62.1

** The accuracy for exp() and log() is low because the FPU (emulator)
does not compute them directly; two operations are required.
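
The estimate 64 + log2(cos(x)) quoted before the table can be evaluated
directly. In the Python sketch below the function name is made up, and
cos(pi/2 - d) is replaced by the identical sin(d), since forming
pi/2 - 1e-10 in double precision would discard most of the digits that
matter here.

```python
import math


def cos_bits_near_half_pi(distance, argument_bits=64):
    """Evaluate argument_bits + log2(cos(x)) for x = pi/2 - distance."""
    # cos(pi/2 - d) == sin(d); sin(d) keeps full precision for tiny d.
    return argument_bits + math.log2(math.sin(distance))


print(round(cos_bits_near_half_pi(1e-10), 1))
```

This gives about 30.8 bits, which the text rounds to 31 and which is
close to the 31.9 bits measured for Turbo C.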


The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or
later) for 'float' variables (24 bit precision numbers) when precision
control is set to 24, 53 or 64 bits, and for 'double' variables (53
bit precision numbers) when precision control is set to 53 bits (a
properly performing FPU cannot pass the 'paranoia' tests for 'double'
variables when precision control is set to 64 bits).
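
The text does not say why 64-bit precision control defeats the 'double'
tests. One commonly cited cause is double rounding (a result rounded
first to 64 bits and then to 53 bits can differ from one rounded straight
to 53 bits), and the Python sketch below assumes that explanation. The
helper round_to_bits is hypothetical and works on exact rationals.

```python
from fractions import Fraction


def round_to_bits(x, bits):
    """Round a positive Fraction to `bits` significant bits, ties to even."""
    exponent = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** exponent > x:
        exponent -= 1                    # now 2**exponent <= x < 2**(exponent + 1)
    quantum = Fraction(2) ** (exponent - bits + 1)
    return round(x / quantum) * quantum  # round() on a Fraction rounds half to even


# A value just above the midpoint between two neighbouring 53-bit numbers.
exact = 1 + Fraction(1, 2**53) + Fraction(1, 2**70)

direct = round_to_bits(exact, 53)
via_extended = round_to_bits(round_to_bits(exact, 64), 53)
print(direct == via_extended)
```

Rounding directly gives 1 + 2^-52, but the 64-bit step first discards the
2^-70 term, leaving an exact tie which then rounds to 1.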

The code for reducing the argument for the trig functions (fsin, fcos,
fptan and fsincos) has been improved and now effectively uses a value
for pi which is accurate to more than 128 bits precision. As a
consequence, the accuracy of these functions for large arguments has
been dramatically improved (and is now very much better than an 80486
FPU). There is also now no degradation of accuracy for fcos and fptan
for operands close to pi/2. Measured results are (note that the
definition of accuracy has changed slightly from that used for the
above table):

Function      Tested x range          Worst result
                                     (absolute bits)

cos(x)        0 .. 9.22e+18              62.0
sin(x)        1e-16 .. 9.22e+18          62.1
tan(x)        1e-16 .. 9.22e+18          61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer number
           8227740058411162616.0
is within about 1e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.
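
The effect of the precision of pi on this argument can be reproduced
with exact rational arithmetic. The Python sketch below is not the
emulator's reduction code. It assumes that a "66-bit" pi means pi cut off
after 66 significant bits, and that the reduction subtracts the nearest
multiple of pi; PI, PI_66 and reduce_mod_pi are names invented for the
example.

```python
from fractions import Fraction

# pi to 50 decimal places, ample for arguments below 2**63.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# Assumption: pi truncated to 66 significant bits
# (2 integer bits plus 64 fraction bits).
PI_66 = Fraction(int(PI * 2**64), 2**64)


def reduce_mod_pi(x, pi):
    """Return x - k*pi exactly, for the integer k nearest to x/pi."""
    k = round(Fraction(x) / pi)
    return Fraction(x) - k * pi


x = 8227740058411162616
accurate = float(reduce_mod_pi(x, PI))
truncated = float(reduce_mod_pi(x, PI_66))
print(accurate, truncated)
```

For a reduced argument this small the tangent is nearly equal to the
argument itself, so the two values should land near the two figures
quoted above: the correct result and the 80486 result respectively.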

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.


Prior to version 1.20 of the emulator, the accuracy of the results for
the transcendental functions (in their principal range) was not as
good as the results from an 80486 FPU. From version 1.20, the accuracy
has been considerably improved and these functions now give measured
worst-case results which are better than the worst-case results given
by an 80486 FPU.

The following table gives the measured results for the emulator. The
number of randomly selected arguments in each case is about half a
million.  The group of three columns gives the frequency of the given
accuracy in number of times per million; thus the second of these
columns shows that an accuracy of between 63.80 and 63.89 bits was
found at a rate of 133 times per one million measurements for fsin.
The results show that the fsin, fcos and fptan instructions return
results which are in error (i.e. less accurate than the best possible
result (which is 64 bits)) for about one per cent of all arguments
between -pi/2 and +pi/2.  The other instructions have a lower
frequency of results which are in error.  The last two columns give
the worst accuracy which was found (in bits) and the approximate value
of the argument which produced it.

                                frequency (per M)
                               -------------------   ---------------
instr   arg range    # tests   63.7   63.8    63.9   worst   at arg
                               bits   bits    bits    bits
-----  ------------  -------   ----   ----   -----   -----  --------
fsin     (0,pi/2)     547756      0    133   10673   63.89  0.451317
fcos     (0,pi/2)     547563      0    126   10532   63.85  0.700801
fptan    (0,pi/2)     536274     11    267   10059   63.74  0.784876
fpatan  4 quadrants   517087      0      8    1855   63.88  0.435121 (4q)
fyl2x     (0,20)      541861      0      0    1323   63.94  1.40923  (x)
fyl2xp1 (-.293,.414)  520256      0      0    5678   63.93  0.408542 (x)
f2xm1     (-1,1)      538847      4    481    6488   63.79  0.167709
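
The "about one per cent" figure can be recovered from the table by
adding the three frequency columns. The Python sketch below assumes that
these three columns account for every inexact result, which is
consistent with the worst accuracies listed (none is below 63.7 bits);
the function name is made up.

```python
# Rates per million from the table above: 63.7, 63.8 and 63.9 bit columns.
rates_per_million = {
    "fsin":    (0, 133, 10673),
    "fcos":    (0, 126, 10532),
    "fptan":   (11, 267, 10059),
    "fpatan":  (0, 8, 1855),
    "fyl2x":   (0, 0, 1323),
    "fyl2xp1": (0, 0, 5678),
    "f2xm1":   (4, 481, 6488),
}


def percent_in_error(instr):
    """Percentage of tested arguments whose result fell short of 64 bits."""
    return sum(rates_per_million[instr]) / 1_000_000 * 100


for instr in rates_per_million:
    print(f"{instr:8s} {percent_in_error(instr):5.2f} %")
```

fsin, fcos and fptan each come out slightly above 1 per cent, and the
other instructions below 1 per cent.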


Tests performed on an 80486 FPU showed results of lower accuracy. The
following table gives the results which were obtained with an AMD
486DX2/66 (other tests indicate that an Intel 486DX produces
identical results).  The tests were basically the same as those used
to measure the emulator (the values, being random, were in general not
the same).  The total number of tests for each instruction is given
at the end of the table; in each case about 100k tests were performed.
Another line of figures at the end of the table shows that most of the
instructions return results which are in error for more than 10
percent of the arguments tested.

The numbers in the body of the table give the approximate number of
times a result of the given accuracy in bits (given in the left-most
column) was obtained per one million arguments. For three of the
instructions, two columns of results are given:
  * The second column for f2xm1 gives the number of cases where the
    results of the first column were for a positive argument; this
    shows that this instruction gives better results for positive
    arguments than it does for negative.
  * In the cases of fcos and fptan, the first column gives the results
    which remain when all cases with arguments greater than 1.5 are
    removed from the results given in the second column.
Unlike the emulator, an 80486 FPU returns results of relatively poor
accuracy for these instructions when the argument approaches pi/2. The
table does not show those cases where the accuracy of the results was
less than 62 bits, which occur quite often for fsin and fptan when the
argument approaches pi/2. This poor accuracy is discussed above in
relation to the Turbo C "emulator", and the accuracy of the value of pi.


bits   f2xm1  f2xm1 fpatan   fcos   fcos  fyl2x fyl2xp1  fsin  fptan  fptan
62.0       0      0      0      0    437      0      0      0      0    925
62.1       0      0     10      0    894      0      0      0      0   1023
62.2      14      0      0      0   1033      0      0      0      0    945
62.3      57      0      0      0   1202      0      0      0      0   1023
62.4     385      0      0     10   1292      0     23      0      0   1178
62.5    1140      0      0    119   1649      0     39      0      0   1149
62.6    2037      0      0    189   1620      0     16      0      0   1169
62.7    5086     14      0    646   2315     10    101     35     39   1402
62.8    8818     86      0    984   3050     59    287    131    224   2036
62.9   11340   1355      0   2126   4153     79    605    357    321   1948
63.0   15557   4750      0   3319   5376    246   1281    862    808   2688
63.1   20016   8288      0   4620   6628    511   2569   1723   1510   3302
63.2   24945  11127     10   6588   8098   1120   4470   2968   2990   4724
63.3   25686  12382     69   8774  10682   1906   6775   4482   5474   7236
63.4   29219  14722     79  11109  12311   3094   9414   7259   8912  10587
63.5   30458  14936    393  13802  15014   5874  12666   9609  13762  15262
63.6   32439  16448   1277  17945  19028  10226  15537  14657  19158  20346
63.7   35031  16805   4067  23003  23947  18910  20116  21333  25001  26209
63.8   33251  15820   7673  24781  25675  24617  25354  24440  29433  30329
63.9   33293  16833  18529  28318  29233  31267  31470  27748  29676  30601

Per cent with error:
        30.9           3.2          18.5    9.8   13.1   11.6          17.4
Total arguments tested:
       70194  70099 101784 100641 100641 101799 128853 114893 102675 102675


------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
whom I may have forgotten, please forgive me):

Linus Torvalds
Tommy.Thorn@daimi.aau.dk
Andrew.Tridgell@anu.edu.au
Nick Holloway, alfie@dcs.warwick.ac.uk
Hermano Moura, moura@dcs.gla.ac.uk
Jon Jagger, J.Jagger@scp.ac.uk
Lennart Benschop
Brian Gallew, geek+@CMU.EDU
Thomas Staniszewski, ts3v+@andrew.cmu.edu
Martin Howell, mph@plasma.apana.org.au
M Saggaf, alsaggaf@athena.mit.edu
Peter Barker, PETER@socpsy.sci.fau.edu
tom@vlsivie.tuwien.ac.at
Dan Russel, russed@rpi.edu
Daniel Carosone, danielce@ee.mu.oz.au
cae@jpmorgan.com
Hamish Coleman, t933093@minyos.xx.rmit.oz.au
Bruce Evans, bde@kralizec.zeta.org.au
Timo Korvola, Timo.Korvola@hut.fi
Rick Lyons, rick@razorback.brisnet.org.au
Rick, jrs@world.std.com

...and numerous others who responded to my request for help with
a real 80486.