.#
.#
.#
.CH "The Hardware"
.#
.#
.MH "Internal Representation of Floating Point Values"
There are two basic forms of floating point representation on the Prime:
single precision and double precision.
Both forms are stored in memory and in the registers in much the same manner.
It should be noted, however, that the storage format [bf in memory] and the
storage format [bf in the registers] are different from each other.
Also, the representation of values in the registers differs between the
750/850 models and the other models.
.pp
Note that both forms of floating point values are available in three of the
four Prime addressing modes: R, V, and I.
For purposes of this discussion, assume that all references are being made to
the V mode instructions and registers unless noted otherwise.
Also note that when I refer to the 400/550 machines, this also includes the
550-II.
.pp
The reader might be interested in perusing {12} through {15} for information
about the proposed IEEE 754 standard on floating point representation.
These articles also contain information about internal representation and
accuracy of results.
As a matter of interest, Prime Computer, Inc. had two voting representatives
on the committee.
.#
.SH "Storage Formats"
A floating point value consists of three parts: a sign, a normalized
mantissa, and an exponent.
The mantissa is a two's complement value with an implied leading binary point
(radix point).
A normalized mantissa always represents a value in the interval [0.5, 1) if
positive, or in the interval [-1, -0.5) if negative, unless it represents
zero.
The sign bit is set to indicate a negative value, reset to indicate a
positive value.
The sign bit is always in the most significant bit position (bit one).
Following the sign bit is the mantissa.
.pp
A single precision value consists of the sign bit, 23 mantissa bits, and 8
exponent bits.
The sign bit is bit one, the mantissa is bits 2 to 24, and the exponent is
bits 25 to 32.
The exponent is stored in excess-128 representation.
That is, the value stored in the 8 bits of the exponent, if viewed as an
unsigned value, is always 128 greater than the value it represents.
Thus,
.be
0 0 0 0 0 0 0 0 represents -128
.sp
1 1 1 1 1 1 1 1 represents 127
.sp
1 0 0 0 0 0 0 0 represents 0
.sp
1 0 0 0 0 1 0 0 represents 4
.sp
0 1 1 1 1 1 0 0 represents -4
.ee
.pp
This implies that the largest possible exponent is +127, and the smallest
possible exponent is -128.
The exponent is taken to the base 2; that is, the value represented is the
mantissa multiplied by two raised to the power of the exponent.
(You may wish to refer to a reference such as {3} or {4} for more information
about value representations.)
.pp
A double precision value consists of the sign bit, a 47 bit mantissa, and 16
exponent bits.
The sign bit is bit one, the mantissa is bits 2 to 48, and the exponent is
bits 49 to 64.
The exponent is stored as a 128-biased value.
This is similar to excess-128 except that the most significant bit of the
exponent is taken as a sign bit.
Thus,
.be
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 represents 0
.sp
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 represents 4
.sp
0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 represents -4
.sp
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 represents -32896
.sp
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 represents 32639
.ee
.pp
As you can see from the examples, the range of the exponent is larger in the
negative direction than in the positive.
This means that it is possible to have values in the register whose
multiplicative inverses cannot be represented.
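.pp
To make the single precision memory format concrete, the following C fragment
decodes a 32 bit pattern according to the rules just described: the top 24
bits are read as a two's complement fraction and the low 8 bits as an
excess-128 exponent.
This is only an illustrative sketch written for this discussion (the function
name and the test patterns are invented here); it is not Prime code.
.sp
.ne 24
.nf
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Model of the single precision memory format: bit 1 = sign,
 * bits 2-24 = mantissa, bits 25-32 = excess-128 exponent.
 * Value = (24 bit two's complement fraction) * 2 ** (stored exponent - 128). */
double prime_sp_value(uint32_t w)
{
    uint32_t field = w >> 8;                  /* sign bit plus 23 mantissa bits */
    int32_t frac = (field & 0x800000u)        /* sign-extend the 24 bit         */
        ? (int32_t)field - 0x1000000          /* two's complement fraction      */
        : (int32_t)field;
    int exponent = (int)(w & 0xFFu) - 128;    /* unsigned byte minus 128        */
    return ldexp((double)frac / 8388608.0, exponent);     /* 8388608 = 2**23    */
}

int main(void)
{
    printf("%g\n", prime_sp_value(0x40000080u));   /*  0.5    * 2**0 =  0.5 */
    printf("%g\n", prime_sp_value(0xA8000083u));   /* -0.6875 * 2**3 = -5.5 */
    return 0;
}
.sp
.fi
Running the fragment prints 0.5 and -5.5, matching a hand decoding of the
same bit patterns.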
.#
.SH "Normalization"
Every arithmetic operation on a floating point value causes the mantissa to
be normalized.
[bf On the Primes] normalization means that the mantissa is shifted towards
the sign bit until the bit next to the sign bit is different from the sign
bit.
The exponent is decreased by the same amount as the number of places shifted.
Normalization does [bf not] always mean shifting until a "1" is present in
the second bit.
.pp
Let us examine an example.
Suppose we have just completed a single precision add, and the result is
either 5 1/2 or -5 1/2 as follows:
.be
0 00010110000000000000000 10000110[bl 5]5.5
1 11101010000000000000000 10000110[bl 4]-5.5
.ee
.pp
Neither of these values is normalized.
The mantissa is shifted left until its first bit (the bit next to the sign
bit) is different from the sign bit.
Note that it takes exactly 3 such shifts for each value:
.be
0 10110000000000000000000 10000011[bl 5]5.5
1 01010000000000000000000 10000011[bl 4]-5.5
.ee
.pp
Both of these values are now normalized.
The value of each is unchanged.
There is no assumed first bit as on some machines (such as certain PDP
machines).
.pp
Normalization helps maintain accuracy of results between computations.
Additionally, comparisons between floating point numbers are made much
easier -- a zero can always be recognized by examining the first word of the
value only, and comparison between two floating point numbers can sometimes
be done by a simple compare of the exponents and mantissa sign.
It also helps to ensure that only one of the two values needs to be adjusted
prior to some arithmetic operations (such as add).
.pp
A special case is when the sign bit is one (a negative value) and every bit
of the mantissa is zero.
This is [ul not] equal to zero, but rather is equal to -1.0, since the two's
complement pattern 1.000...0 represents -1 (assuming the exponent represents
zero, of course).
.pp
It should be noted that load and store operations do not cause the register
contents to be normalized.
There is also no "normalize" instruction that would allow the user to
normalize the bit pattern in the register.
.pp
Floating skip operations (e.g., FSGT, FSZE) and comparison operations (e.g.,
FCS and DFCS) will not work correctly unless the values involved are
normalized.
.#
.SH "Representation in the Registers"
The single precision floating point register has more range than can be
accommodated in the memory format.
The single precision floating point register overlaps the double precision
register and uses the extra bits available in the double floating register
as guard bits.
The register is organized as follows on 400/550 cpus:
.sp
.ne 10
.nf
S MMMMMMMMMMMMMMMMMMMMMMM GGGGGGGG HHHHHHHH EEEEEEEE 0000000000000000
1 2..24                   25..32   33..40   41..48   49..64

Where:  S is sign of the mantissa
        M is the mantissa (2's complement)
        G is mantissa extension (guard bits)
        H is exponent extension (guard bits)
        E is exponent (128-biased)
        0 extra bits -- must be zero
.sp
.fi
.pp
On 750 and 850 cpus (with hardware floating point) the organization is:
.sp
.ne 10
.nf
S MMMMMMMMMMMMMMMMMMMMMMM GGGGGGGGGGGGGGGGGGGGGGGG HHHHHHHH EEEEEEEE
1 2..24                   25..48                   49..56   57..64

Where:  S is sign of the mantissa
        M is the mantissa (2's complement)
        G is mantissa extension (guard bits)
        H is exponent extension (guard bits)
        E is exponent (excess 128)
.sp
.fi
.pp
The guard bits are always zeroed whenever a floating load operation is done
(FLD).
The high-order guard bit may be used to round the least significant bit of
the regular mantissa just before storage by using the FRN instruction.
This increases accuracy somewhat at the cost of increased execution time.
See the section on "Firmware Accuracy" for more details.
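.pp
The effect of storing with and without a preceding FRN can be sketched in a
few lines of C.
This is a schematic model only, written for this discussion: it assumes the
400/550 single precision layout above (24 mantissa bits followed by 8 guard
bits held here in one 32 bit word), the helper names are invented, and it is
not a description of the actual microcode.
.sp
.ne 26
.nf
#include <stdint.h>
#include <stdio.h>

/* reg holds the sign bit plus 23 mantissa bits in its top 24 bits and
 * the 8 mantissa guard bits below them (400/550 single precision). */

/* FST with no FRN: the guard bits are simply dropped (truncation). */
uint32_t store_truncated(uint32_t reg)
{
    return reg & ~0xFFu;
}

/* FRN before FST: the high-order guard bit rounds the mantissa LSB.
 * (The model ignores the renormalization the hardware would need if
 * the add carries out of the mantissa.) */
uint32_t store_rounded(uint32_t reg)
{
    if (reg & 0x80u)
        reg += 0x100u;      /* add one to the least significant mantissa bit */
    return reg & ~0xFFu;
}

int main(void)
{
    uint32_t reg = 0x400000C0u;      /* mantissa 0.5, high guard bit set */
    printf("%06x %06x\n",            /* prints 400000 400001             */
           (unsigned)(store_truncated(reg) >> 8),
           (unsigned)(store_rounded(reg) >> 8));
    return 0;
}
.sp
.fi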
.pp
Double precision floating point values are similar in nature to single
precision and are organized as follows on 400/550 machines:
.sp
.ne 7
.nf
S MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM EEEEEEEEEEEEEEEE MMMMMMMMMMMMMMMM
1 2..32                           33..48           49..64

Where:  S is sign of the mantissa
        M is the mantissa (2's complement)
        E is exponent (128-biased)
.sp
.fi
.pp
On 750 and 850 machines, the double precision register is organized as:
.sp
.ne 7
.nf
S MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM EEEEEEEEEEEEEEEE
1 2..48                                           49..64

Where:  S is sign of the mantissa
        M is the mantissa (2's complement)
        E is exponent (128-biased)
.sp
.fi
.#
.SH "Access Methods"
Besides the standard load and store instructions, it is possible to access
portions of the floating point registers with integer operations.
These accesses are done either through the use of P300 address traps, or
through the LDLR/STLR instructions.
.pp
If short memory references are made to locations 4, 5, and 6, the
instructions are actually accessing the first word of the mantissa, the
second word of the mantissa, and the exponent, respectively.
In single precision references, the reference to the exponent fetches both
the exponent and the exponent guard bits.
In double precision, the reference to location 6 fetches the complete
exponent.
Thus, the PMA sequence:
.be
LDA     ='40000
STA#    4
CRA
STA#    5
LDA     =128
STA#    6
.ee
results in the value 0.5 being in the single precision floating register
(note that this sequence also loads all the guard bits correctly on a
400/550).
.pp
It is also possible to access the floating point register via the LDLR and
STLR instructions.
In V mode, the first two words (bits 1 to 32) of the mantissa can be loaded
into the L register by loading from register file location '12.
The third word of the mantissa and the exponent can be obtained by loading
from location '13.
.bf 3
The difference in register file organization between 750/850 machines and
400/550 machines means that the L register contents after a "LDLR[bl]'13"
will differ between the two.
On 400/550 machines, the A register will contain the exponent and the B
register will contain the third word of the mantissa.
On a 750/850 these will be reversed.
The program in Appendix I can be used to discover which case is present on
your machine.
When dealing with the two floating accumulators in I mode addressing, a
"LDLR[bl]'11" will have the same problem.
.pp
Additionally, the floating accumulator shares the same register file
location as the second field address and length registers (in the V mode
register file).
In the I mode registers, the first floating accumulator shares the same
location as the first field address register, and the second floating
accumulator shares the same location as the second field address register.
Thus, various character manipulation instructions, including decimal
(character) arithmetic instructions, may change the floating accumulators as
a side effect.
.#
.SH "Ranges"
The effective range for single precision floating point values is
approximately 1.701412 * (10 ** 38) to -1.701412 * (10 ** 38).
The smallest, non-zero magnitude that can be represented is approximately
1.469368 * (10 ** -39).
.bf
This is the range for single precision storage in memory.
The guard bits in the register give extended range to values held in the
register.
.pp
The effective range for double precision floating point values is
approximately 2.079833 * (10 ** 9825) to -2.079833 * (10 ** 9825).
The smallest, non-zero magnitude that can be represented is approximately
1.03808 * (10 ** -9903).
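.pp
These figures follow directly from the exponent limits and mantissa sizes
given earlier, and can be checked with a short C program.
This is a host-side check of the arithmetic only, not Prime code (the small
helper is invented here); it assumes a largest normalized mantissa of
(1 - 2**-23) or (1 - 2**-47), a smallest of 0.5, and the exponent limits
quoted earlier in this chapter.
.sp
.ne 24
.nf
#include <stdio.h>
#include <math.h>

/* Print a value given as a base 10 logarithm in the form m * 10 ** e. */
static void show(const char *label, double log10_value)
{
    double e = floor(log10_value);
    printf("%s ~ %.6f * 10 ** %.0f\n", label, pow(10.0, log10_value - e), e);
}

int main(void)
{
    /* single precision memory format: (1 - 2**-23) * 2**127 down to 0.5 * 2**-128 */
    show("single max", 127.0 * log10(2.0) + log10(1.0 - ldexp(1.0, -23)));
    show("single min", -129.0 * log10(2.0));

    /* double precision format: (1 - 2**-47) * 2**32639 down to 0.5 * 2**-32896 */
    show("double max", 32639.0 * log10(2.0) + log10(1.0 - ldexp(1.0, -47)));
    show("double min", -32897.0 * log10(2.0));
    return 0;
}
.sp
.fi
The output agrees with the figures quoted above to the precision of the
host's math library.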
.# .# .MH "Available Operations" .# .# .# bt --- begin table .# Usage: .bt [length-of-table] .de bt @[cc]sp @[cc]ne [1] @[cc]nf .en bt .# .# et --- end table .# Usage: .et .de et @[cc]sp @[cc]fi .en et .# .# is --- instruction summary .# Usage: .is mnemonic format c-bit l-bit cc description .de is .ti [1]@[tc][2]@[tc][3]@[tc][4]@[tc][5]@[tc][6] .en is .# .# hl --- header line for instruction summaries .# Usage: .hl [number-of-lines-to-follow] .de hl .sp .ne [1] .ta 10 18 21 24 27 .in 27 .ti .ul Mnemonic@[tc]Format@[tc]C@[tc]L@[tc]CC@[tc]Description .en hl .# .# The following lists describe the instructions available on Prime 50 series machines to manipulate floating point values in 64V mode. This material has been extracted from the paper .ul 64V Mode Instruction Summary and Addressing Formats, by T. Allen Akin, Perry Flinn, and Eugene Spafford, Georgia Tech 1981. The abbreviation FAC refers to the floating accumulator, meaning the combination (overlapped) register. The instructions will be presented first grouped by function, then alphabetically. In the following instruction set summary, instruction formats are abbreviated as follows: .be 7 branch branch gen generic mr memory reference .ee The descriptions of restricted instructions are preceded by an asterisk (*). Note that these instructions are [ul not] restricted unless segmented memory is turned on (bit 14 in current modals) and only if a reference is made outside of the range '0 to '17 (zero to 15, decimal). .pp In the descriptions of effects on the C-bit, L-bit, and condition codes, the following abbreviations are used: .be 9 C-bit: - unchanged V arithmetic overflow indication X indeterminate .ee .be 5 L-bit: - unchanged X indeterminate .ee .be 6 Condition Codes (CC): - unchanged S properly set to reflect value of result, may be used for condition code branches X indeterminate .ee .# .SH "Branch" .ti [in] .hl 6 .is BFEQ branch - - S "branch if FAC = 0" .is BFGE branch - - S "branch if FAC >= 0" .is BFGT branch - - S "branch if FAC > 0" .is BFLE branch - - S "branch if FAC <= 0" .is BFLT branch - - S "branch if FAC < 0" .is BFNE branch - - S "branch if FAC <> 0" .in .# .ne 13 .SH "Floating Point Arithmetic" .ti [in] .hl 9 .is FAD mr V X X "add memory to single precision FAC" .is FCM gen V X X "complement single precision FAC arithmetically" .is FDBL gen - - - "convert single precision floating to double precision" .is FDV mr V X X "divide memory into single precision FAC" .is FLTA gen V X X "convert 16 bit integer to single precision float" .is FLTL gen V X X "convert 32 bit integer to single precision float" .is FMP mr V X X "multiply single precision FAC by memory" .is FRN gen V X X "floating round double to single" .is FSB mr V X X "subtract memory from single precision FAC" .hl 11 .is FCS mr X X X "compare single precision FAC to memory and skip" .is FLD mr - - - "load single precision FAC from memory" .is FLX mr - - - "load double word index" .is FST mr V X - "store single precision FAC into memory" .is INTA gen V X X "convert single precision FAC to 16 bit integer" .is INTL gen V X X "convert single precision FAC to 32 bit integer" .hl 7 .is DFAD mr V X X "add memory to double precision FAC" .is DFCM gen V X X "complement double precision FAC arithmetically" .is DFDV mr V X X "divide memory into double precision FAC" .is DFMP mr V X X "multiply double precision FAC by memory" .is DFSB mr V X X "subtract memory from double precision FAC" .is FDBL gen - - - "convert single precision floating to double precision" .is FRN gen V X 
X "floating round double to single" .hl 4 .is DFCS mr X X X "compare double precision FAC with memory and skip" .is DFLD mr - - - "load double precision FAC" .is DFLX mr - - - "load quadruple word index" .is DFST mr - - - "store double precision FAC" .in .# .ne 13 .SH "Logicize" .ti [in] .hl 6 .is LFEQ gen - - S "set A to 1 if FAC = 0; else reset A to 0" .is LFGE gen - - S "set A to 1 if FAC >= 0; else reset A to 0" .is LFGT gen - - S "set A to 1 if FAC > 0; else reset A to 0" .is LFLE gen - - S "set A to 1 if FAC <= 0; else reset A to 0" .is LFLT gen - - S "set A to 1 if FAC < 0; else reset A to 0" .is LFNE gen - - S "set A to 1 if FAC <> 0; else reset A to 0" .in .# .SH "Skip" .ti [in] .hl 6 .is FSGT gen - - - "skip if FAC > 0" .is FSLE gen - - - "skip if FAC <= 0" .is FSMI gen - - - "skip if FAC < 0" .is FSNZ gen - - - "skip if FAC <> 0" .is FSPL gen - - - "skip if FAC >= 0" .is FSZE gen - - - "skip if FAC = 0" .hl 2 .is DFCS mr X X X "compare double precision FAC with memory and skip" .is FCS mr X X X "compare single precision FAC to memory and skip" .in .# .ne 8 .SH "Data Movement" .ti [in] .hl 43 .is DFLD mr - - - "load double precision FAC" .is DFLX mr - - - "load quadruple word index" .is DFST mr - - - "store double precision FAC" .is FLD mr - - - "load single precision FAC from memory" .is FLX mr - - - "load double word index" .is FST mr V X - "store single precision FAC into memory" .is LDLR mr - - - "*load L from register file" .is STLR mr - - - "*store L into register file" .in .# .SH "Address Manipulation" .ti [in] .hl 2 .is DFLX mr - - - "load quadruple word index" .is FLX mr - - - "load double word index" .in .# .ne 18 .SH "Type Conversion" .ti [in] .hl 6 .is FDBL gen - - - "convert single precision floating to double precision" .is FLTA gen V X X "convert 16 bit integer to single precision float" .is FLTL gen V X X "convert 32 bit integer to single precision float" .is FRN gen V X X "floating round double to single" .is INTA gen V X X "convert single precision FAC to 16 bit integer" .is INTL gen V X X "convert single precision FAC to 32 bit integer" .in .# .SH "Instructions Grouped Alphabetically" .ti [in] .hl 44 .is BFEQ branch - - S "branch if FAC = 0" .is BFGE branch - - S "branch if FAC >= 0" .is BFGT branch - - S "branch if FAC > 0" .is BFLE branch - - S "branch if FAC <= 0" .is BFLT branch - - S "branch if FAC < 0" .is BFNE branch - - S "branch if FAC <> 0" .is DFAD mr V X X "add memory to double precision FAC" .is DFCM gen V X X "complement double precision FAC arithmetically" .is DFCS mr X X X "compare double precision FAC with memory and skip" .is DFDV mr V X X "divide memory into double precision FAC" .is DFLD mr - - - "load double precision FAC" .is DFLX mr - - - "load quadruple word index" .is DFMP mr V X X "multiply double precision FAC by memory" .is DFSB mr V X X "subtract memory from double precision FAC" .is DFST mr - - - "store double precision FAC" .is FAD mr V X X "add memory to single precision FAC" .is FCM gen V X X "complement single precision FAC arithmetically" .is FCS mr X X X "compare single precision FAC to memory and skip" .is FDBL gen - - - "convert single precision floating to double precision" .is FDV mr V X X "divide memory into single precision FAC" .is FLD mr - - - "load single precision FAC from memory" .is FLTA gen V X X "convert 16 bit integer to single precision float" .is FLTL gen V X X "convert 32 bit integer to single precision float" .is FLX mr - - - "load double word index" .is FMP mr V X X "multiply single precision FAC by 
memory"
.is FRN gen V X X "floating round double to single"
.is FSB mr V X X "subtract memory from single precision FAC"
.is FSGT gen - - - "skip if FAC > 0"
.is FSLE gen - - - "skip if FAC <= 0"
.is FSMI gen - - - "skip if FAC < 0"
.is FSNZ gen - - - "skip if FAC <> 0"
.is FSPL gen - - - "skip if FAC >= 0"
.is FST mr V X - "store single precision FAC into memory"
.is FSZE gen - - - "skip if FAC = 0"
.is INTA gen V X X "convert single precision FAC to 16 bit integer"
.is INTL gen V X X "convert single precision FAC to 32 bit integer"
.is LDLR mr - - - "*load L from register file"
.is LFEQ gen - - S "set A to 1 if FAC = 0; else reset A to 0"
.is LFGE gen - - S "set A to 1 if FAC >= 0; else reset A to 0"
.is LFGT gen - - S "set A to 1 if FAC > 0; else reset A to 0"
.is LFLE gen - - S "set A to 1 if FAC <= 0; else reset A to 0"
.is LFLT gen - - S "set A to 1 if FAC < 0; else reset A to 0"
.is LFNE gen - - S "set A to 1 if FAC <> 0; else reset A to 0"
.is STLR mr - - - "*store L into register file"
.in
.#
.#
.MH "Error Handling"
There are basically four floating point errors detected by the floating
point firmware: store exception, overflow, underflow, and divide by zero.
The action on these errors is determined by the state of bit 7 in the
current cpu keys.
If bit 7 is set, a floating point fault simply sets the C bit and no other
action is taken.
If bit 7 is reset, then an arithmetic fault is signalled and the standard
fault handler is invoked.
In Primos, this usually entails signalling the ERROR condition.
.pp
A store exception is triggered when an attempt is made to FST (single
precision floating store) a value which is too big or too small (negative)
to be accommodated in the two word memory format used by single precision
values.
This can happen because the value in the floating point register may have
been loaded or generated using double precision operations.
Alternatively, the value in the register could have been generated by single
precision operations, but the value is larger than the memory format can
accommodate due to the extra capacity provided by the guard bits.
A double precision store cannot cause a store exception.
.pp
Overflow and underflow faults are the result of arithmetic operations (add,
subtract, multiply, or divide) whose result is too big or too small to fit
(normalized) in the register.
Thus, the exponent of the result must be bigger than 32639 for overflow
(base 2 exponent), or less than -32896 for underflow (see the next section).
.pp
A divide by zero fault is exactly what the name implies -- an attempt to
divide by a floating point value, single or double precision, whose value is
identical to zero.
.pp
Another type of fault, not strictly a floating point fault, is triggered
when an attempt is made to convert a floating value to an integer, and the
floating value is too big or too small to be held in the corresponding
integer register.
.pp
It is possible for user programs to set bit 7 in the keys to ignore these
fault conditions, but in doing so the user should realize that results could
be invalid without any indication of error.
Explicit checks should be made of the C bit after any operation which might
cause an error.
By default, the standard compilers and the PMA assembler generate entry
control blocks (ECBs) for procedures with bit 7 reset to zero.
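.pp
The exponent thresholds involved can be summarized schematically in C.
The sketch below merely restates the range conditions described in this
section (the function, enumeration, and parameter names are invented here);
it is not how the firmware detects the faults, and the divide by zero and
integer conversion faults are not modelled.
.sp
.ne 24
.nf
#include <stdio.h>

enum fp_fault { FP_OK, FP_OVERFLOW, FP_UNDERFLOW, FP_STORE_EXCEPTION };

/* Classify a result by the base 2 exponent it needs in the register,
 * using the limits given in this section. */
enum fp_fault classify(int result_exponent, int storing_single)
{
    if (result_exponent > 32639)
        return FP_OVERFLOW;            /* cannot fit in the register       */
    if (result_exponent < -32896)
        return FP_UNDERFLOW;
    if (storing_single && (result_exponent > 127 || result_exponent < -128))
        return FP_STORE_EXCEPTION;     /* FST cannot fit it into the 8 bit
                                          memory exponent; DFST always can */
    return FP_OK;
}

int main(void)
{
    /* prints 1 3 0: overflow, store exception, no fault */
    printf("%d %d %d\n", classify(32640, 0), classify(200, 1), classify(100, 1));
    return 0;
}
.sp
.fi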
.#
.#
.MH "Firmware Accuracy"
In this document, the word "firmware" refers to the microcode or hardware
which performs the floating point arithmetic.
750 and 850 cpus have floating point operations implemented in hardware,
while the other models have these operations implemented in microcode.
Programs and subroutines depend on the accuracy of these operations, so it
is crucial that they be implemented correctly.
.#
.SH "Problems in Multiplication"
There appears to be a bug in the double precision floating point
multiplication at a few points near the maximum value.
If a value whose base 2 exponent is 32639 (maximum possible) is multiplied
by a value greater than 0.5, an overflow fault is triggered.
Thus, it is possible to multiply a value in the register by something less
than 1.0 and get an overflow!
In some cases, attempting to multiply smaller values to yield a value
theoretically in range also results in an overflow.
We have not attempted exhaustive testing to determine the limits where this
occurs, since the likelihood of encountering such an error is small.
However, the problem is there, and the user is advised to be careful when
writing tests which need to deal with values at the upper limit of register
capacity.
.pp
A much more serious flaw is to be found in the DFMP instruction on 400/550
machines.
The double floating multiply instruction appears to [ul always] return a
result in which the two least significant bits of the mantissa are zero.
That is, every multiplication potentially loses 2 out of 47 bits of
precision!
It is possible to multiply a value by 1.0 and not obtain a result equal to
the original value.
Such errors can, of course, cascade and result in severe accuracy problems
in chains of calculations.
The hardware on 750/850 machines appears to be free of this defect.
Appendix II contains a program to test your machine and illustrate this
problem.
.pp
Oddly enough, division on the 400/550 machines does not appear to truncate
any bits of precision, and according to published timing figures {5} the
DFDV instruction is just as fast (slow) as the DFMP instruction.
Thus, it might be advisable to recode critical calculations on these
machines to be composed of divisions rather than multiplications, whenever
possible.
.#
.SH "Loss of Precision in Type Conversion"
When converting from integers to floating point, there are basically two
machine instructions: FLTA and FLTL.
The FLTA instruction converts a 16 bit integer into a single precision
floating point value (24 bit mantissa).
The FLTL instruction converts a 32 bit integer into a single precision
floating point value.
Note that such a conversion potentially drops 8 bits of precision.
There is adequate storage in the double precision floating point register to
convert without a loss of precision, but there is no instruction to convert
from long integer to double precision real.
Rather, the conversion must be done by a series of instructions; see the
code for the SWT 'dble$m' routine.
.#
.SH "Problems in the Other Operations"
We have not observed any loss of precision in the addition, subtraction, or
division of double precision quantities.
We have also not been able to detect any precision losses in any of the
single precision operations.
However, this does not indicate the absence of errors; rather, it indicates
only that we have not tested extensively for such errors and that none have
appeared in any of our other tests.
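.pp
The flavor of this kind of testing can be illustrated on any machine with a
simple identity check: apply an operation that should leave the operand
unchanged (multiply by 1.0, add 0.0, divide by 1.0) and compare the result
with the original.
The loop below is only an analogue written in C for illustration; it is not
the Appendix II program.
On machines whose arithmetic does not drop mantissa bits it should report no
failures, whereas the text above indicates that the corresponding DFMP test
fails on a 400/550.
.sp
.ne 20
.nf
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int mul_fail = 0, add_fail = 0, div_fail = 0;

    for (int i = 0; i < 100000; i++) {
        double x = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
        if (x * 1.0 != x) mul_fail++;      /* the DFMP-style check */
        if (x + 0.0 != x) add_fail++;
        if (x / 1.0 != x) div_fail++;
    }
    printf("multiply: %d   add: %d   divide: %d failures\n",
           mul_fail, add_fail, div_fail);
    return 0;
}
.sp
.fi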
.#
.SH "Floating Round"
Studies performed at The Flinders University of South Australia on a 750
have indicated that some calculations performed in single precision floating
point may benefit from the fact that the register contains extra precision,
but that the results may be somewhat uneven depending on how the code is
organized {6}.
Their studies have also indicated that use of the FRN (floating round)
instruction before each store greatly enhances the accuracy of some
calculations in single precision:
.bq
"In fact for the single precision problem a simple and almost complete cure
for the problem has been demonstrated, and that is for the compiler to force
a round before every store (i.e. emit an FRN instruction before each FST
instruction emitted)." {6}
.eq
.pp
Their studies also indicated that the double precision arithmetic failed to
do correct rounding.
In fact, double precision operations truncate their results rather than
rounding (see the next section).
This leads to slightly skewed results, which are especially noticeable in
problems requiring very precise answers:
.bq
"... Consequently the Prime-750 exhibits a far larger error than the
VAX-11/780 when we use the sum of squares measure.
This error has been detected by our users in other calculations and programs
and is particularly critical when nearly unstable matrix problems are
investigated....
The consequences of such inaccuracy in a research-oriented application area
could be critical." {6}
.eq
That conclusion was made for a 750 with hardware floating point operations.
It can certainly be concluded that a 400 or 550, which truncates in the same
way and also suffers the DFMP defect described earlier, is not at all
appropriate for double precision calculations requiring any high degree of
accuracy.
.#
.SH "Precision"
The various models of Prime computer perform floating point operations to
slightly different precisions.
To quote from section 6.2.1 of {9}:
.bq
"In double and single precision add, subtract, and multiply operations, the
[ul 750 and 850] truncate results to 48 sign and magnitude bits.
Single precision divide operations on these processors produce 32 sign and
magnitude bits of rounded result....
.sp
Double precision operations on the 500-II (and 650) are identical to those
performed on the 750 and 850.
Single precision divide is also identical to 750/850 single precision
divide.
Single precision add, subtract, and multiply operations truncate results to
32 sign and magnitude bits.
.sp
For all other 50 Series systems, double precision add and subtract
operations truncate results to 48 sign and magnitude bits; multiply and
divide operations truncate to 47 sign and magnitude bits.
All single precision operations on these processors truncate results to 32
sign and magnitude bits."
.eq
These statements raise serious doubts about whether the same program will
produce results of the same accuracy when run on different models, due to
the differences in precision.
They also indicate that some program behavior might change when a program is
run on a different model cpu.
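.pp
The practical effect of truncating rather than rounding can be seen with a
rough model on any machine.
The fragment below truncates a running sum to a 24 bit mantissa after every
addition and compares it with the same sum carried out in a rounding single
precision format; it is only a crude illustration of the skew described
above (the helper name is invented, and positive operands are assumed), not
a simulation of any Prime cpu.
.sp
.ne 22
.nf
#include <stdio.h>
#include <math.h>

/* Truncate a positive value to a 24 bit mantissa (crude model). */
static double trunc24(double x)
{
    int e;
    double m = frexp(x, &e);                     /* m in [0.5, 1)             */
    return ldexp(floor(m * 16777216.0) / 16777216.0, e);    /* 2**24         */
}

int main(void)
{
    double truncated = 0.0;
    float  rounded   = 0.0f;

    for (int i = 0; i < 100000; i++) {
        truncated = trunc24(truncated + 0.1);    /* truncate after every add */
        rounded   = rounded + 0.1f;              /* rounds to nearest        */
    }
    printf("truncating: %.3f   rounding: %.3f   exact: 10000.000\n",
           truncated, rounded);
    return 0;
}
.sp
.fi
The truncating sum can be expected to fall noticeably short of the rounding
one, which is the kind of systematic skew the Flinders study describes.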