IEEE precisions

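This page collects useful information about the IEEE 754 floating-point standard, in particular how the number of significand and exponent bits in a format determines its round-off error and its range.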

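The relative error introduced by rounding depends on the number of bits s used to store the significand (the leading 1 of a normal number is implicit and not counted in s) and on the rounding mode:

with round-to-nearest mode, the result is off by at most half a unit in the last place, which amounts to one extra bit of accuracy, so the error is at most 2^-(s+1)
with round-to-zero mode (truncation), the error is less than 2^-s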
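These bounds can be checked with a small simulation. The sketch below is in Python (the page itself has no code), and `round_to_bits` and `max_rel_error` are hypothetical helpers written for illustration: the first rounds a positive, normal double to s stored fraction bits in either mode, ignoring the exponent range, and the second records the worst relative error over random inputs.

```python
import math
import random


def round_to_bits(x: float, s: int, mode: str = "nearest") -> float:
    """Round a positive, normal double x to s stored fraction bits.

    Hypothetical helper for illustration: it ignores the exponent range
    (no overflow or underflow) and assumes s < 52, so every intermediate
    value is exact in double precision.
    """
    m, exp = math.frexp(x)           # x = m * 2**exp with 0.5 <= m < 1
    scaled = m * 2.0 ** (s + 1)      # integer part holds the implicit 1 plus s fraction bits
    if mode == "nearest":
        q = round(scaled)            # built-in round() breaks ties to even, like IEEE's default mode
    elif mode == "zero":
        q = math.trunc(scaled)       # drop the remaining bits (round toward zero)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return math.ldexp(q, exp - (s + 1))


def max_rel_error(s: int, mode: str, trials: int = 20_000, seed: int = 0) -> float:
    """Largest relative error |x - rounded(x)| / x seen over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.uniform(1.0, 2.0) * 2.0 ** rng.randint(-20, 20)
        worst = max(worst, abs(x - round_to_bits(x, s, mode)) / x)
    return worst


for s in (4, 7, 10, 20):
    print(s, max_rel_error(s, "nearest"), 2.0 ** -(s + 1),
          max_rel_error(s, "zero"), 2.0 ** -s)
```

For every s, the worst observed error stays at or below 2^-(s+1) with round-to-nearest and below 2^-s with round-to-zero. With s = 4, for example, 1.09375 rounds to 1.125 under round-to-nearest but truncates to 1.0625 under round-to-zero.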

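The representable range depends on the number of exponent bits e. With the IEEE 754 exponent bias of 2^(e-1) - 1, the smallest positive normal number is 2^(2 - 2^(e-1)) and the largest finite number is (2 - 2^-s) * 2^(2^(e-1) - 1).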
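As a sketch of how the range follows from e (and, for the largest value, also from s), the Python function below, a hypothetical helper named `ieee_range`, computes the smallest positive normal and the largest finite value. It assumes the IEEE 754 exponent conventions, which the unnamed rows of the table may or may not follow.

```python
import sys


def ieee_range(e: int, s: int) -> tuple[float, float]:
    """Return (smallest positive normal, largest finite value) of a binary
    format with e exponent bits and s stored fraction bits.

    Assumes the IEEE 754 conventions: bias 2**(e-1) - 1, with the all-zeros
    exponent code reserved for zero/subnormals and all-ones for inf/NaN.
    """
    bias = 2 ** (e - 1) - 1
    emin, emax = 1 - bias, bias                        # unbiased exponents of normal numbers
    smallest_normal = 2.0 ** emin
    largest_finite = (2.0 - 2.0 ** -s) * 2.0 ** emax   # significand 1.11...1 in binary
    return smallest_normal, largest_finite


print(ieee_range(5, 10))     # IEEE half precision (binary16)
print(ieee_range(8, 23))     # IEEE single precision (binary32)
lo, hi = ieee_range(11, 52)  # double: compare with Python's own float limits
print(lo == sys.float_info.min, hi == sys.float_info.max)  # True True
```

For half precision this gives a largest finite value of 65504, which is why binary16 overflows far sooner than single or double precision.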

name	exponent bits e	significand bits s	round-off (round-to-nearest)	round-off (round-to-zero)
double	11	52	1.11e-16	2.22e-16
	11	20	4.77e-7	9.54e-7
	11	4	0.03125	0.0625
single	8	23	5.96e-8	1.19e-7
	8	7	0.00390625	0.0078125
half	5	10	0.00048828125	0.0009765625
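The round-off columns follow directly from s. The Python sketch below (the names `FORMATS`, `unit_roundoff` and `worst_roundtrip_error` are hypothetical) recomputes them and, for the two formats that Python's standard `struct` module can encode ('<f' for binary32 and '<e' for binary16), round-trips random doubles to check that the observed error stays within the round-to-nearest bound. `struct` offers no control over the rounding mode, so the round-to-zero column is not checked this way.

```python
import random
import struct

# (name, e, s) rows as listed in the table; unnamed rows keep an empty name
FORMATS = [
    ("double", 11, 52), ("", 11, 20), ("", 11, 4),
    ("single", 8, 23), ("", 8, 7), ("half", 5, 10),
]


def unit_roundoff(s: int, mode: str) -> float:
    """2**-(s+1) for round-to-nearest, 2**-s for round-to-zero."""
    return 2.0 ** -(s + 1) if mode == "nearest" else 2.0 ** -s


def worst_roundtrip_error(fmt: str, trials: int = 10_000, seed: int = 1) -> float:
    """Largest relative error seen when packing random doubles with struct
    ('<f' = binary32, '<e' = binary16) and unpacking them again."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.uniform(1.0, 1000.0)  # well inside the normal range of both formats
        (y,) = struct.unpack(fmt, struct.pack(fmt, x))
        worst = max(worst, abs(x - y) / x)
    return worst


for name, e, s in FORMATS:
    print(f"{name:6} {e:2} {s:2} "
          f"{unit_roundoff(s, 'nearest'):.3g} {unit_roundoff(s, 'zero'):.3g}")

print(worst_roundtrip_error("<f"), unit_roundoff(23, "nearest"))  # single
print(worst_roundtrip_error("<e"), unit_roundoff(10, "nearest"))  # half
```

The printed round-off values match the table up to the number of displayed digits.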

Ginkgo Library
