It’s been a while since I posted last. Life, and death, get in the way of many things…
To recap, the last few posts have highlighted some of the issues we need to be aware of when working with floating point numbers. Some of the posts have been somewhat theoretical. How do these theoretical aspects affect us in everyday real-life programming?
Let’s think about a common everyday scenario: how do we choose a `float` data type over a `double` to represent a value? And once we’ve chosen one over the other, how will it affect the speed of our application? Is a `float` faster than a `double`, or vice versa?
Range and Precision
Obviously, the very first thing we need to know is what values we want to work with. That is the primary driver for our decision. For this we must know the range of `float`s vs `double`s. I’ve covered that in the blog post Floating Point Numbers in Java. So we don’t have to flip to that post right now, I’ll include the table here:
| Type | Size | Precision | Minimum Value | Maximum Value |
|---|---|---|---|---|
| `float` | 4 bytes | 6 – 7 digits | ±1.4E-45 | ±3.40282347E+38F |
| `double` | 8 bytes | 15 digits | ±4.9E-324 | ±1.7976931348623157E+308 |
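If you’d like to verify these limits on your own machine, the C standard header `<float.h>` exposes them (note that `FLT_TRUE_MIN` and `DBL_TRUE_MIN`, the denormal minimums, require C11):

```c
#include <float.h>
#include <stdio.h>

int main(void) {
    printf("float : %zu bytes, %d digits, min %g, max %g\n",
           sizeof(float), FLT_DIG, FLT_TRUE_MIN, FLT_MAX);
    printf("double: %zu bytes, %d digits, min %g, max %g\n",
           sizeof(double), DBL_DIG, DBL_TRUE_MIN, DBL_MAX);
    return 0;
}
```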
We also need to decide on the level of precision we need. The more useful type for most calculations is `double`. It is a lot more accurate than a `float`! The limited precision of `float` is often not good enough for many calculations. We generally only use `float` when we need to save memory space and/or we know that high precision calculations aren’t needed.
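To see that limited precision in action, here’s a minimal C sketch: a nine-digit value is stored exactly in a `double` but gets rounded in a `float` (the value 123456789 is just an illustrative choice):

```c
#include <stdio.h>

int main(void) {
    float  f = 123456789.0f;  /* 9 significant digits: more than a float can hold */
    double d = 123456789.0;

    printf("float : %.1f\n", f);  /* prints 123456792.0 -- the last digits are lost */
    printf("double: %.1f\n", d);  /* prints 123456789.0 -- stored exactly */
    return 0;
}
```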
Bjarne Stroustrup, the inventor of C++, suggests using `double`s over `float`s if we’re uncertain or don’t know any better:

“The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don’t have that understanding, get advice, take the time to learn, or use `double` and hope for the best.”
Speed
Once we’ve made those decisions, we’d like to know how it will affect the speed of our application.
The size of a `float` is 4 bytes, while the size of a `double` is 8 bytes. So our first assumption is that `float`s should be faster because they are smaller.
Unfortunately that assumption isn’t necessarily correct. The answer is that it depends on a lot of different factors, like:
- What is the native hardware? Does it have a floating point unit (FPU) that supports both `float` and `double` operations? For example, if we’re using an Intel CPU, does it only have legacy x87 FPU support, or does it have the modern SSE instruction set?
- Does the hardware implement both `float`s and `double`s, or only one or the other? Or neither?
- Are floating point operations emulated in software?
- What is the application doing, especially where is it retrieving its data from?
- Are we working on huge sets of data? Can the data be cached internally, cached externally in RAM, or must it be retrieved from disk?
- What compiler settings are we using?
- Are we using `float` or `double` versions of any maths libraries?
- What operating system are we using?
- The list goes on…
Thinking in terms of memory usage, `float` values take half as much memory as `double` values. If we’re dealing with very large datasets, this can be a very important factor. If we’re doing a lot of data access, we need to think carefully about memory and cache usage. Taking up twice the memory for each `double` value puts a heavier load on the internal caches, and more memory bandwidth is needed to move data between those caches and RAM. Because `float` values are smaller, we might have fewer cache misses. If using `double` means we have to cache to disk instead of to RAM, then the speed difference can be huge.
The only way to find out is by benchmarking our particular solution. Small changes in instructions and memory usage can have a significant impact. For large amounts of data, there’ll probably be an advantage in using single-precision `float`s. This obviously assumes that we don’t need the extra range or precision of `double`s.
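As a rough illustration of the memory-bandwidth effect, here’s a minimal C micro-benchmark sketch. The array size (10 million elements) and the use of `clock()` are my own illustrative assumptions; a serious benchmark would need warm-up runs, repeated measurements, and a proper harness.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000  /* 10 million elements: ~40 MB as float, ~80 MB as double */

int main(void) {
    float  *fa = malloc(N * sizeof *fa);
    double *da = malloc(N * sizeof *da);
    if (fa == NULL || da == NULL) return 1;
    for (size_t i = 0; i < N; i++) { fa[i] = 1.0f; da[i] = 1.0; }

    /* Sum each array; the double loop has to move twice as much memory. */
    clock_t t0 = clock();
    float fsum = 0.0f;
    for (size_t i = 0; i < N; i++) fsum += fa[i];
    clock_t t1 = clock();
    double dsum = 0.0;
    for (size_t i = 0; i < N; i++) dsum += da[i];
    clock_t t2 = clock();

    /* Printing the sums stops the compiler optimising the loops away. */
    printf("float  sum %.0f in %.3f s\n", fsum, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("double sum %.0f in %.3f s\n", dsum, (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(fa); free(da);
    return 0;
}
```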
Application Code Gotchas
Not directly related to `float`s and `double`s are some real-world problems (and their solutions) that we need to be aware of.
Promotion
Given the following code:
```c
float a, b, c;
foo(a * 3.14 + b);
```
We must be aware that the compiler will promote the values in the `a` and `b` variables to `double`, because `3.14` is a `double` literal. We should avoid that by writing `3.14F`, to help the compiler generate efficient assembler code/byte code that keeps the values as `float`s if that’s what we want.
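The single-precision version of the call then looks like this (assuming `foo` is declared to take a `float`):

```c
float a, b, c;
foo(a * 3.14F + b);  /* 3.14F is a float literal, so no promotion to double */
```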
Libraries
The `float` versions of many C library functions, such as `logf(float)` and `sinf(float)`, will be faster than `log(double)` and `sin(double)`, because they work with fewer bits of precision. They can use polynomial approximations with fewer terms to reach full `float` precision than they would need for full `double` precision.
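For example, calling both variants side by side looks like this (a minimal sketch; on most platforms it needs to be linked with `-lm`):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    float  xf = 0.5f;
    double xd = 0.5;

    /* Single-precision variants: fewer polynomial terms needed internally. */
    printf("sinf: %.7f   logf: %.7f\n", sinf(xf), logf(xf));

    /* Double-precision variants: more work for more digits. */
    printf("sin : %.15f   log : %.15f\n", sin(xd), log(xd));
    return 0;
}
```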
Division vs Multiplication
Every FPU performs multiplication much faster than division. Multiplication can be done in parallel, while division can’t, so division is always slower than multiplication.
Many CPUs can perform a multiplication in one or two clock cycles, but division always takes longer. Division can sometimes take 24 or more clock cycles.
Why is division so much slower? Multiplication can be done with many simultaneous additions. Division requires iterative subtraction, which can’t be performed in parallel. Some FPUs speed up division by computing a reciprocal approximation and multiplying by that value. For example, instead of dividing by 2, they multiply by 0.5. Depending on the values, the result might not be quite as accurate, but it is generally much faster.
So instead of:

```c
double d = 420.0 / 2.0;
```

we can write:

```c
double d = 420.0 * 0.5;
```
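The same trick pays off when dividing many values by a divisor that is only known at runtime: compute the reciprocal once and multiply inside the loop. A hypothetical sketch (the function name and signature are my own illustration):

```c
#include <stddef.h>

/* Scale every element by 1/divisor: one division up front instead of n divisions. */
void scale_by_divisor(double *data, size_t n, double divisor) {
    double reciprocal = 1.0 / divisor;  /* the only division */
    for (size_t i = 0; i < n; i++) {
        data[i] *= reciprocal;          /* multiplications only */
    }
}
```

Because the reciprocal is itself rounded, the results can differ from true division in the last bit, so this is only appropriate where that tiny loss of accuracy is acceptable.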
Further Reading and Signing Off
Lots more on floating point arithmetic to come!
There are a few pages on Stack Overflow that make for very interesting reading, but lead down all sorts of rabbit holes. Try this one or this one.
There’s also a very interesting page about PhysX87 (a real-time physics engine/library) here.
Was this useful? Please share your comments on the blog post, and as always, stay safe and keep learning!