IEEE precisions
Goran Flegar edited this page Dec 4, 2018 · 7 revisions
This document collects useful information about the IEEE 754 floating-point standard.
The relative round-off error depends on the number of bits used for the significand, $s$, and on the rounding mode used:
- with round-to-nearest mode, rounding effectively gains one additional bit, and the error is $2^{-(s+1)}$
- with round-to-zero mode, the error is $2^{-s}$
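
The two bounds above can be sketched in a few lines of Python. This is a minimal illustration, not part of any library: the function name `unit_roundoff` and its `mode` strings are hypothetical, and the cross-checks assume that Python's `float` is an IEEE 754 double using the default round-to-nearest mode, which holds on common platforms.

```python
import sys


def unit_roundoff(s: int, mode: str = "nearest") -> float:
    """Relative round-off bound for a significand with s stored bits.

    `mode` is a hypothetical selector: "nearest" or "zero".
    """
    if mode == "nearest":
        return 2.0 ** -(s + 1)
    if mode == "zero":
        return 2.0 ** -s
    raise ValueError(f"unknown rounding mode: {mode!r}")


# Cross-checks, assuming Python floats are IEEE 754 doubles (s = 52).
# The round-to-zero bound equals the machine epsilon reported by Python.
assert unit_roundoff(52, "zero") == sys.float_info.epsilon
# Adding the round-to-nearest bound to 1.0 is rounded away again ...
assert 1.0 + unit_roundoff(52, "nearest") == 1.0
# ... while adding the round-to-zero bound gives the next larger double.
assert 1.0 + unit_roundoff(52, "zero") > 1.0
```

Calling `unit_roundoff(s)` and `unit_roundoff(s, "zero")` for a given number of significand bits gives the two round-off values for that format.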
The range depends on the number of exponent bits, $e$.
| name | exponent bits e | significand bits s | round-off, round-to-nearest (R2N) | round-off, round-to-zero (R2Z) |
|---|---|---|---|---|
| double | 11 | 52 | 1.11e-16 | 2.22e-16 |
| | 11 | 20 | 4.77e-7 | 9.54e-7 |
| | 11 | 4 | 0.03125 | 0.0625 |
| single | 8 | 23 | 5.96e-8 | 1.19e-7 |
| | 8 | 7 | 0.00391 | 0.0078125 |
| half | 5 | 10 | 0.00048828125 | 0.0009765625 |
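
To illustrate how the range follows from the number of exponent bits, the sketch below computes the smallest positive normal value and the largest finite value from `e` and `s`. It assumes the usual IEEE 754 binary encoding (exponent bias $2^{e-1}-1$, all-ones exponent reserved for infinities and NaNs); the function name `finite_range` is hypothetical, and whether the unnamed rows of the table follow this encoding is an assumption, so only double and single are checked.

```python
import sys
from typing import Tuple


def finite_range(e: int, s: int) -> Tuple[float, float]:
    """Return (smallest positive normal, largest finite) value.

    Assumes an IEEE 754 style binary format with e exponent bits
    and s stored significand bits.
    """
    bias = 2 ** (e - 1) - 1
    min_normal = 2.0 ** (1 - bias)                 # smallest normal exponent
    max_finite = (2.0 - 2.0 ** -s) * 2.0 ** bias   # all significand bits set
    return min_normal, max_finite


# double: e = 11, s = 52, checked against the limits Python reports
assert finite_range(11, 52) == (sys.float_info.min, sys.float_info.max)

# single: e = 8, s = 23
lo, hi = finite_range(8, 23)
print(f"single: {lo:.3e} .. {hi:.3e}")
```

Under this assumption, two formats with the same `e` cover almost the same range: `s` only changes the factor `2 - 2**-s` in the largest value, which is always between 1 and 2.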