Compact Strings in Java

Java 9 introduced the concept of compact strings as a performance enhancement.

Strings are a major component of heap usage and can occupy as much as 25% of heap memory. Reducing the internal size of a string can save a significant amount of memory, with a corresponding reduction in garbage collection overhead.

Background

Java uses UTF-16 internally to represent characters, which means that a char is two (2) bytes wide.

Up to and including JDK 8, the sequence of characters in each String object is internally stored using a char array, as follows:

private final char[] value;

In the years between Java 1.0 and Java 9, it was found that most strings contain only Latin-1 (ISO-8859-1) characters. Latin-1 is a single-byte character set whose first 128 values are the ASCII characters. Each Latin-1 character fits in 1 byte, while other Unicode characters need at least 2 bytes in UTF-16.

This is mainly the case for text in a language like English, which can be represented entirely in ASCII/Latin-1.

If all the characters inside a specific String object can be represented using a single byte each, then half of the space in the internal char array is not being used. Using a byte array instead of a char array therefore has the potential to reduce heap memory usage and improve GC performance.
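
To put rough numbers on this, here is a minimal sketch that compares the character payload of a string at two bytes per char (the old char array layout) with one byte per char. The class and method names are made up for illustration, and the figures ignore object headers and array overhead.

```java
import java.nio.charset.StandardCharsets;

class StorageSizeSketch {
    // Payload size at two bytes per char, as in the pre-Java 9 char[] layout.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Payload size at one byte per char. Only meaningful when every
    // character of the string is a Latin-1 character.
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String text = "Hello, world";
        System.out.println(utf16Bytes(text));  // prints 24
        System.out.println(latin1Bytes(text)); // prints 12
    }
}
```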

Compact Strings Implementation

From Java 9, the String class uses a byte array to store the characters, as follows:

private final byte[] value;

If there is even a single character in the string that cannot be represented in one byte (that is, a character outside Latin-1), then every character in the sequence is stored using 2 bytes, i.e. in UTF-16 representation. The String class still uses a byte[] internally, but the array is allocated with two bytes for each char.
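
The all-or-nothing rule can be sketched as follows. This is only an illustration of the decision, not the JDK's actual code, and the names fitsInLatin1 and payloadBytes are hypothetical.

```java
class RepresentationSketch {
    // True if every char has a value of 0xFF or less, i.e. fits in one byte.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Size of the backing byte array under the all-or-nothing rule.
    static int payloadBytes(String s) {
        return fitsInLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute (Latin-1), \u20ac is the euro sign (not Latin-1).
        System.out.println(payloadBytes("caf\u00e9"));         // prints 4
        System.out.println(payloadBytes("caf\u00e9 5\u20ac")); // prints 14
    }
}
```

Note how the single euro sign in the second string doubles the storage for all seven characters.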

Latin-1 vs UTF-16

How does a String differentiate between the LATIN-1 and UTF-16 representations? There is a variable called coder that is used to specify the representation, as follows:

/* can have the value of either LATIN1 or UTF16 */
private final byte coder;

static final byte LATIN1 = 0;
static final byte UTF16 = 1;
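
The values 0 and 1 are convenient because the coder can double as a shift count: shifting the array length right by the coder gives the number of chars. The sketch below illustrates the idea in a standalone class with a hypothetical name; it is not a copy of the String source.

```java
class LengthSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Number of chars stored in the byte array for the given coder:
    // one byte per char for LATIN1, two bytes per char for UTF16.
    static int length(byte[] value, byte coder) {
        return value.length >> coder;
    }

    public static void main(String[] args) {
        byte[] sixBytes = new byte[6];
        System.out.println(length(sixBytes, LATIN1)); // prints 6
        System.out.println(length(sixBytes, UTF16));  // prints 3
    }
}
```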

Most of the String methods first check the coder value using a call to the isLatin1() method. The String method is then dispatched to a specific implementation, either the StringLatin1 or StringUTF16 class. These changes do not affect any public interfaces of String or any other related classes.

For example:

private boolean isLatin1() {
    return COMPACT_STRINGS && coder == LATIN1;
}

public char charAt(int index) {
    if (isLatin1()) {
        return StringLatin1.charAt(value, index);
    } else {
        return StringUTF16.charAt(value, index);
    }
}
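
To see what the two implementations have to do differently, here is a simplified sketch of reading a char from each kind of byte array. The method names are hypothetical, and the UTF-16 version assumes big-endian byte order to keep the example short (as far as we know, the JDK chooses the byte order to suit the platform).

```java
import java.nio.charset.StandardCharsets;

class CharAtSketch {
    // Latin-1: one byte per char. The mask prevents sign extension,
    // because Java bytes are signed.
    static char latin1CharAt(byte[] value, int index) {
        return (char) (value[index] & 0xFF);
    }

    // UTF-16: two bytes per char, high byte first (assumption: big-endian).
    static char utf16CharAt(byte[] value, int index) {
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = "5\u20ac".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(latin1CharAt(latin1, 3) == '\u00e9'); // prints true
        System.out.println(utf16CharAt(utf16, 1) == '\u20ac');   // prints true
    }
}
```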

Most of the classes working with strings (such as StringBuilder and StringBuffer) have been updated to support the new String representation.

Disabling Compact Strings

Processing a 2-byte string can be slower because there is additional logic for handling both cases. Fortunately, 2-byte strings are in the minority in most Java applications. If your application uses more 2-byte strings than 1-byte strings, it may be worth disabling compact strings when starting the VM (measure before and after to confirm the benefit).

The CompactStrings VM option is enabled by default. To disable it, we can pass the following option on the java command line when starting the VM:

-XX:-CompactStrings
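
There is no public String method that reports which mode is active, but we can at least check whether the flag was passed on the command line. The sketch below inspects the arguments reported by the standard RuntimeMXBean. The class name is hypothetical, and it assumes that the last occurrence of the flag wins if it appears more than once. It cannot see settings that are not reported as input arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class CompactStringsFlagCheck {
    // True if the given JVM arguments switch compact strings off.
    // Assumption: when the flag is repeated, the last occurrence wins.
    static boolean disabledByFlag(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-CompactStrings")) {
                disabled = true;
            } else if (arg.equals("-XX:+CompactStrings")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Compact strings disabled by flag: " + disabledByFlag(jvmArgs));
    }
}
```

For example, after compiling the class, running java -XX:-CompactStrings CompactStringsFlagCheck should report that the flag is present.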

For more details on compact strings, see https://openjdk.org/jeps/254
