I’m going to discuss a concept that we as Java programmers don’t usually deal with on a day-to-day basis, if at all. However, we do need to know about it because the JVM uses it behind the scenes all the time.
This is the concept of endianness. What is it, and how does it affect us?
Little-Endian vs Big-Endian
You may have heard the terms little-endian and big-endian before. These terms describe the order in which the bytes of a multi-byte value are stored in memory on a specific CPU architecture. Some platforms use big-endian order internally (e.g., older PowerPC-based Macs, IBM System/390); some use little-endian order (e.g., Intel x86).
Anything stored in computer memory can be accessed through the address where it is stored. It feels more natural to store numbers in memory with the least significant byte stored at the lower memory address, and the most significant byte being stored at the higher memory address. This is little-endian ordering.
A big-endian system does the opposite: it stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest.
So we can think of endianness as writing data either “left-to-right” or “right-to-left”.
Big-endianness is the leading ordering in networking protocols, where it is referred to as network order: the most significant byte is transmitted first. Little-endianness is the dominant ordering for processor architectures and their memory (x86, most ARM implementations, etc.). File formats can use either ordering. Some file formats use a mixture of both, or contain a byte order mark (BOM) to indicate the endianness of the file.
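As a small illustration of a byte order mark, the sketch below encodes the letter "A" with Java's three standard UTF-16 charsets. The class name BomDemo and the hex helper are made up for this example. It relies on the JDK's "UTF-16" encoder writing a big-endian BOM (FE FF) first, while the explicit BE and LE variants write no BOM at all.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "UTF-16" encodes as big-endian and prepends the BOM FE FF.
        System.out.println("UTF-16:   " + hex("A".getBytes(StandardCharsets.UTF_16)));   // fe ff 00 41
        // The explicit variants fix the byte order and write no BOM.
        System.out.println("UTF-16BE: " + hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println("UTF-16LE: " + hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00
    }
}
```

A reader that sees FE FF at the start of a UTF-16 file knows the rest is big-endian; FF FE would mean little-endian.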
Internally, any specific computer could work equally well no matter what endianness it uses. Its hardware would consistently use the same endianness to both load and store data. This allows us to normally ignore the endianness of the computer we’re working on.
There are other byte orderings. They are generically called middle-endian or mixed-endian. However, I won’t cover them here as they aren’t particularly relevant to the JVM.
Endianness Example
Let’s say that we store a 32-bit (four-byte) int on two machines using different endianness. Let’s assign a hexadecimal value of 0x12345678 to this int, as follows:
int value = 0x12345678; // using hex for ease of understanding
In both cases, the int is split over four bytes with the values of 0x12, 0x34, 0x56, and 0x78. The bytes are stored in four sequential locations in memory, starting with the address a (lowest address), then a+1, a+2, and a+3 (highest address).
The difference between big-endian and little-endian is the order in which the four bytes are stored in memory:
Address (a) | a | a+1 | a+2 | a+3 |
---|---|---|---|---|
Little-endian | 78 | 56 | 34 | 12 |
Big-endian | 12 | 34 | 56 | 78 |
A little-endian machine will store the integer with the least-significant byte (0x78) at address a, and the most-significant byte (0x12) at address a+3. Big-endian does the opposite.
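We can't look at raw memory addresses from Java, but a ByteBuffer lets us reproduce the table above. The following is a minimal sketch: the class name EndianLayout and its helper methods are invented for this example, and index 0 of the byte array stands in for address a.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianLayout {
    // Returns the four bytes of value in the given byte order;
    // index 0 plays the role of address a, index 3 of address a+3.
    static byte[] layout(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println("Little-endian: " + hex(layout(value, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
        System.out.println("Big-endian:    " + hex(layout(value, ByteOrder.BIG_ENDIAN)));    // 12 34 56 78
    }
}
```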
Endianness in Java
Java binary files, such as class files and data written with DataOutputStream, are stored in big-endian order (i.e., network order). This means that if we use only Java, all files are formatted the same way on all platforms: Windows, macOS, Linux, etc. We can exchange binary data between Java applications without worrying about endianness. The JVM translates the Java big-endian form to whatever the native CPU is using.
Chapter 4, “The class File Format”, of the Java Virtual Machine Specification states that “Multibyte data items are always stored in big-endian order, where the high bytes come first.”
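The same big-endian convention shows up in Java's standard binary I/O. As a rough sketch (the class name BigEndianStream and its writeInt helper are made up for this example), DataOutputStream.writeInt writes the high byte first, whatever CPU the JVM is running on:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BigEndianStream {
    // Serialises an int with DataOutputStream and returns the raw bytes.
    static byte[] writeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value); // documented to write four bytes, high byte first
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : writeInt(0x12345678)) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println(); // prints: 12 34 56 78
    }
}
```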
A machine can read its own data perfectly. Problems arise when one computer stores data and a different type of computer tries to read it. Any difference in endianness can become an issue when transferring data between two machines. We can run into problems when we exchange data files with a non-Java program that uses little-endian order. Similarly, if we examine a memory dump, the same value appears with its bytes in a different order depending on the endianness of the machine that produced it. In these cases, we must be aware of the endianness of the data, and handle it appropriately.
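One way to handle little-endian data in Java is to tell a ByteBuffer which order to use before reading. This is only a sketch: the class name ReadLittleEndian is invented, and the byte array is a hypothetical stand-in for four bytes read from a file that a little-endian program wrote.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReadLittleEndian {
    // Interprets the first four bytes of data as a little-endian int.
    static int readIntLittleEndian(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Hypothetical file content: 0x12345678 as a little-endian program would write it.
        byte[] fromFile = {0x78, 0x56, 0x34, 0x12};

        // A ByteBuffer is big-endian unless told otherwise, so this misreads the value.
        int wrong = ByteBuffer.wrap(fromFile).getInt();
        int right = readIntLittleEndian(fromFile);

        System.out.printf("Misread as big-endian: 0x%08x%n", wrong); // 0x78563412
        System.out.printf("Read as little-endian: 0x%08x%n", right); // 0x12345678
        // Integer.reverseBytes repairs a value that was read in the wrong order.
        System.out.printf("After reverseBytes:    0x%08x%n", Integer.reverseBytes(wrong)); // 0x12345678
    }
}
```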
Accessing Endianness with Java
We can use the method java.nio.ByteOrder.nativeOrder() to get the endianness used by our specific CPU. Here’s a snippet of code to do it:
ByteOrder byteOrder = ByteOrder.nativeOrder();
System.out.println(byteOrder);
It prints LITTLE_ENDIAN when run on an Intel CPU.
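We can use it with Java NIO ByteBuffers. A newly created ByteBuffer is always big-endian, whatever the platform, so we have to ask for the native order explicitly. If we choose the native hardware ordering when allocating a ByteBuffer, we’ll usually get better performance, and native code libraries are usually more efficient with these buffers.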
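Here is a minimal sketch of requesting the native order. The class name NativeOrderBuffer and the allocateNative helper are invented for this example, and it uses a direct buffer on the assumption that the buffer will be shared with native code. It makes no attempt to measure the performance difference.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NativeOrderBuffer {
    // Allocates a direct buffer that uses the byte order of the underlying CPU.
    static ByteBuffer allocateNative(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        // A new buffer starts out big-endian on every platform.
        ByteBuffer defaultBuffer = ByteBuffer.allocateDirect(16);
        System.out.println("Default order: " + defaultBuffer.order()); // BIG_ENDIAN

        ByteBuffer nativeBuffer = allocateNative(16);
        System.out.println("Native order:  " + nativeBuffer.order());

        // Values round-trip unchanged; only the byte layout inside the buffer differs.
        nativeBuffer.putInt(0, 0x12345678);
        System.out.println(Integer.toHexString(nativeBuffer.getInt(0))); // 12345678
    }
}
```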
Summary and Further Reading
To summarise:
- Java hides the internal endianness from us, and gives us consistent results on all platforms.
- When we exchange data files between platforms with different endianness, we need to be aware of the endianness of the data and deal with it; otherwise we can run into incompatibility problems.
Just for interest, the terms big-endian and little-endian come from the 1726 book “Gulliver’s Travels”, where two groups of Lilliputians argue over whether to break the shell of a boiled egg at the little end or the big end. People haven’t changed much…
A very easy to read article is here.
A more comprehensive page is here.
Wikipedia has a highly detailed page here.
Was this interesting? Please share your comments on the blog post, and as always, stay safe and keep learning!