Binary Representation of Floating Point Numbers in Java

JVM interpreter - binary code

Last post we looked at the representation of floating point numbers.

To recap, the floating point data types in Java have three components in their representation: a mantissa/significand, an exponent and a sign bit.

  • The mantissa/significand represents the actual number. The length of the mantissa determines the precision of the actual number.
  • The exponent represents a multiplier (or scaling) of the mantissa. The scaling provides a very wide range of values. The range is determined by the number of digits in the exponent. Negative exponents represent very small numbers (i.e. close to zero).
  • The sign bit represents the sign of the entire number, i.e., positive or negative.

Java has two primitive floating point types, float and double. The size of their exponents and mantissas are given by the following table:


Type Size Sign bit Mantissa Exponent
float 32 bits 1 bit 23 bits 8 bits
double 64 bits 1 bit 52 bits 11 bits

Printing the Mantissa and Exponent

To print the components of a floating point type, we can mask out the appropriate bits and shift them all to the right.

The mask values can easily be worked out from first principles using your knowledge of integer binary representation.

If you don’t feel like doing that, you can refer to the online Java API documentation of Double::doubleToLongBits(double). For float types refer to Float::floatToIntBits(float).

These links are to the Java SE 17 API documentation. If you want Java 11 or 21 documentation, just change the version number in the URL.

These mask values are used in the example code below:

private static void printParts(double value) {

    final long SIGN_BIT_MASK = 0x8000000000000000L;  // bit  63
    final long EXPONENT_MASK = 0x7ff0000000000000L;  // bits 62 to 52
    final long MANTISSA_MASK = 0x000fffffffffffffL;  // bits 51 to  0

    final int SIGN_SHIFT = 63;
    final int EXPONENT_SHIFT = 52; // biased exponent

    System.out.printf("%nPrinting in floating point and hex form%n");
    System.out.printf("Floating point value    : %f%n", value);
    System.out.printf("Hexadecimal (%%a)        : %+A%n", value);

    System.out.printf("%nUsing doubleToLongBits(value)%n");
    long bits = Double.doubleToLongBits(value);
    System.out.printf("Raw integral value (decimal): %d%n", bits);
    System.out.printf("Raw integral value (hex)    : %X%n", bits);

    System.out.printf("%nPrinting the separate parts%n");
    System.out.printf("Raw bits: %64s%n", Long.toBinaryString(bits));
    System.out.printf("Sign    : %d%n", 
                         (bits & SIGN_BIT_MASK) >> SIGN_SHIFT);
    System.out.printf("Exponent: %d%n", 
                         (bits & EXPONENT_MASK) >> EXPONENT_SHIFT); 
    System.out.printf("Mantissa: %d%n",  bits & MANTISSA_MASK);
}

private static void printParts(float value) {

    final long SIGN_BIT_MASK = 0x80000000;  // bit  31
    final long EXPONENT_MASK = 0x7f800000;  // bits 30 to 23
    final long MANTISSA_MASK = 0x007fffff;  // bits 22 to  0

    final int SIGN_SHIFT = 31;
    final int EXPONENT_SHIFT = 23; // biased exponent

    System.out.printf("%nPrinting in floating point and hex form%n");
    System.out.printf("Floating point value    : %f%n", value);
    System.out.printf("Hexadecimal (%%a)        : %+A%n", value);

    System.out.printf("%nUsing floatToIntBits(value)%n");
    int bits = Float.floatToIntBits(value);
    System.out.printf("Raw integral value (decimal): %d%n", bits);
    System.out.printf("Raw integral value (hex)    : %X%n", bits);

    System.out.printf("%nPrinting the separate parts%n");
    System.out.printf("Raw bits: %32s%n", Integer.toBinaryString(bits));
    System.out.printf("Sign    : %d%n", 
                         (bits & SIGN_BIT_MASK) >> SIGN_SHIFT);
    System.out.printf("Exponent: %d%n", 
                         (bits & EXPONENT_MASK) >> EXPONENT_SHIFT); 
    System.out.printf("Mantissa: %d%n",  bits & MANTISSA_MASK);
}

We can use these two methods in a simple program as follows:

public static void main(String args[]) {

    // change values here, or ask for user input
    double value1 = 0.3; // try other values including Double.NaN;
    printParts(value1); 

    float value2 = 0.3F; // try other values including Float.NaN;
    printParts(value2); 
}

Further Reading and Signing Off

The example code is similar to the floating point visualisation pages at https://evanw.github.io/float-toy/, https://float.exposed/0x400921fbc0000000 and https://bartaz.github.io/ieee754-visualization/, but obviously not as polished. That will be left to you, the dear reader…

You can read more on the formats of floating point numbers at https://floating-point-gui.de/formats/fp/.

More on first principles can be find at https://modelthinkers.com/mental-model/first-principle-thinking, https://fs.blog/first-principles/, or an appropriate Google search.

Was this useful? Please share your comments on the blog post, and as always, stay safe and keep learning!

Leave a Comment

Your email address will not be published. Required fields are marked *

Thank You

We're Excited!

Thank you for completing the form. We're excited that you have chosen to contact us about training. We will process the information as soon as we can, and we will do our best to contact you within 1 working day. (Please note that our offices are closed over weekends and public holidays.)

Don't Worry

Our privacy policy ensures your data is safe: Incus Data does not sell or otherwise distribute email addresses. We will not divulge your personal information to anyone unless specifically authorised by you.

If you need any further information, please contact us on tel: (27) 12-666-2020 or email info@incusdata.com

How can we help you?

Let us contact you about your training requirements. Just fill in a few details, and we’ll get right back to you.