This week we’re going to look at string interning.
The concept of string interning
String interning (short for “internalising”) is a mechanism for storing only a single copy of each distinct string value. These values are kept in a string pool.
For interning to work, the string objects must be immutable (which is always the case for Java’s String class).
Interning can make string processing faster and use memory more efficiently. A downside of interning is that more time is needed when the string is first created and/or interned.
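To make the idea of a pool concrete, here is a minimal sketch of a hand-rolled interner built on a ConcurrentHashMap. The class SimpleInterner is hypothetical and is not how the JVM implements its own string pool. It only illustrates the "one canonical copy per distinct value" idea, and the map lookup on every call is the extra up-front work mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, hand-rolled pool: keeps one canonical instance per distinct value.
class SimpleInterner {
    // Maps each distinct value to its canonical instance.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for value, adding value itself if it is new.
    public String intern(String value) {
        // putIfAbsent returns the existing mapping, or null if value was just added.
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    // Number of distinct values currently pooled.
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String a = interner.intern(new String("blue"));
        String b = interner.intern(new String("blue"));
        System.out.println(a == b);          // true: both refer to the first "blue" instance
        System.out.println(interner.size()); // 1
    }
}
```

Storing each value as its own key is what lets later callers receive the first instance. Note that this sketch holds strong references, so nothing added to it is ever freed.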
The constant String pool
To save memory and improve performance (especially for comparisons), the JVM maintains a pool of unique strings internally on the heap. The string pool is maintained privately by Java’s String class.
All string literals, and more generally all strings that are the values of compile-time constant expressions, are interned automatically. Strings generated on the fly as the program executes won’t be interned automatically.
When we use a string literal, e.g. "Hello World", the JVM will look for this string value in its pool of strings, and:
- If found, it will return the reference of the already existing string.
- If not found, this string object is added to the pool and a reference to this newly added string is returned.
For example, if we use the string "Hello World" twice in our code, we will get a reference to the same string object both times.
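The following sketch (the class name LiteralInterning is just for illustration) shows which strings are interned automatically: identical literals and constant expressions such as "Hello " + "World" share one pooled object, while a concatenation involving a non-final variable produces a new object at runtime.

```java
// Shows which strings the JVM interns automatically.
class LiteralInterning {
    public static void main(String[] args) {
        String a = "Hello World";        // literal: taken from the string pool
        String b = "Hello World";        // same literal: same pooled object
        String c = "Hello " + "World";   // constant expression: folded at compile time, also pooled
        String hello = "Hello ";         // not declared final, so hello + "World" is not a constant expression
        String d = hello + "World";      // concatenated at runtime: a new object, not pooled

        System.out.println(a == b);      // true
        System.out.println(a == c);      // true
        System.out.println(a == d);      // false
        System.out.println(a.equals(d)); // true
    }
}
```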
String interning with the String.intern() method
The method String.intern() looks in the string pool for a string with the same contents (as determined by equals()). If the pool doesn’t contain one yet, this String object itself is added to the pool and a reference to it is returned. (On older JVMs such as Java 6, which kept the pool in PermGen, a copy of the string was stored there instead.)
If a String with the same contents already exists in the string pool, no new pool entry is created, and intern() returns a reference to the existing String object in the pool.
String literals and other compile-time constant strings are interned automatically, as if by calling intern(). We can also call the intern() method explicitly anywhere in our code.
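Here is a small sketch of calling intern() explicitly on a string built at runtime (InternDemo is an illustrative name). Because the literal "hello world" is already in the pool, intern() hands back that pooled instance rather than the StringBuilder result.

```java
// Shows intern() mapping a runtime-built string to the pooled literal.
class InternDemo {
    public static void main(String[] args) {
        String literal = "hello world";            // pooled automatically
        String built = new StringBuilder("hello ")
                .append("world")
                .toString();                       // built at runtime: a separate heap object

        System.out.println(built == literal);      // false: different objects
        System.out.println(built.equals(literal)); // true: same characters

        String interned = built.intern();          // the pool already holds "hello world"
        System.out.println(interned == literal);   // true: intern() returned the pooled instance
        System.out.println(interned == built);     // false: built itself was not added to the pool
    }
}
```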
String interning and comparisons
Because of interning, we can get different comparison results if we use == to test for string equality instead of the equals() method. A program that uses == might appear to work for simple comparisons (for example, between two literals), but fail on others (for example, when one of the strings was built at runtime).
Example code follows:

```java
class StringComparison {
    public static void main(String[] args) {
        String s0 = "Some string";             // literal string - taken from the string pool
        String s1 = "Some string";             // same literal - same pooled object as s0
        String s2 = new String("Some string"); // new string object - created on the heap, outside the pool
        String s3 = s2.intern();               // returns the pooled "Some string", i.e. the same object as s0
        String s4 = s2.intern();               // returns the same pooled object again
        String s5 = new String("Some string"); // another new string object created on the heap

        System.out.println("s0 == s1 -> " + (s0 == s1)); // true: same object in the string pool
        System.out.println("s1 == s2 -> " + (s1 == s2)); // false: pool vs heap
        System.out.println("s1 == s3 -> " + (s1 == s3)); // true: same object in the string pool
        System.out.println("s2 == s3 -> " + (s2 == s3)); // false: heap vs pool
        System.out.println("s3 == s4 -> " + (s3 == s4)); // true: same object in the string pool
        System.out.println("s2 == s5 -> " + (s2 == s5)); // false: two distinct objects on the heap
        System.out.println("s0.equals(s1) -> " + (s0.equals(s1))); // true: same contents
        System.out.println("s1.equals(s2) -> " + (s1.equals(s2))); // true: ditto
        System.out.println("s1.equals(s3) -> " + (s1.equals(s3))); // true: ditto
        System.out.println("s2.equals(s3) -> " + (s2.equals(s3))); // true: ditto
        System.out.println("s3.equals(s4) -> " + (s3.equals(s4))); // true: ditto
        System.out.println("s2.equals(s5) -> " + (s2.equals(s5))); // true: ditto
    }
}
```
When we explicitly create String instances using the new operator, the objects won’t be automatically interned. Every string object created using the new operator will be allocated a separate memory space on the heap.
Remember that the == operator compares references (whether both operands point to the same object in memory), whereas the equals() method compares the characters of the strings one by one.
If we want to compare strings constructed with new String() (or built at runtime) using the == operator, we should call intern() on them first.
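As a sketch of that advice, the hypothetical helper sameByIdentity below interns both arguments before comparing them with ==. It relies on the documented guarantee of String.intern() that s.intern() == t.intern() is true exactly when s.equals(t) is true. For everyday comparisons, equals() gives the same answer without the pool lookup.

```java
// Hypothetical helper: compares two strings by identity after interning both.
class InternedComparison {
    static boolean sameByIdentity(String a, String b) {
        // String.intern() guarantees s.intern() == t.intern() exactly when s.equals(t).
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        char[] chars = {'r', 'e', 'd'};
        String x = new String(chars);                  // separate heap object
        String y = new String(chars);                  // another separate heap object

        System.out.println(x == y);                    // false
        System.out.println(sameByIdentity(x, y));      // true
        System.out.println(sameByIdentity(x, "blue")); // false
    }
}
```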
Potential problems with string interning
There are some potential problems with interning.
- Objects must be immutable; otherwise we can run into problems when developing multi-threaded applications.
- There can be memory de-allocation problems. Interned strings might never be garbage collected, depending on the JVM version used. Since Java 7, the string pool has been allocated on the main heap instead of in PermGen, so unreferenced strings in the pool can be garbage collected.
- Interning saves RAM at the expense of more processing time to detect and replace duplicate strings.
Security issues with string interning
String interning can have security implications.
If we have sensitive text such as passwords stored as strings in memory, they might stay in memory even after the actual string objects have been garbage collected. That can be a security risk if memory dumps are somehow accessible.
This problem exists even without interning, since garbage collection is non-deterministic. And because strings are immutable, we can’t overwrite their contents after use. We should instead use a char[] for password input and zero it immediately after use.
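A minimal sketch of that pattern follows. The check() method is a hypothetical placeholder for real verification, and the password is hard-coded purely for illustration; the point is that the char[] is overwritten with Arrays.fill in a finally block, so the secret is wiped even if the check throws.

```java
import java.util.Arrays;

// Sketch: keep a secret in a char[] so it can be wiped right after use.
class PasswordWipe {
    // Hypothetical check; a real application would compare against a stored hash.
    static boolean check(char[] password) {
        return password.length >= 8;
    }

    // Uses the password, then overwrites it even if check() throws.
    static boolean checkAndWipe(char[] password) {
        try {
            return check(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }

    // True if every character has been overwritten with '\0'.
    static boolean isWiped(char[] password) {
        for (char ch : password) {
            if (ch != '\0') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hard-coded for illustration only; an interactive program might read it with
        // System.console().readPassword(), which also returns a char[].
        char[] password = {'s', '3', 'c', 'r', '3', 't', '!', '!'};
        System.out.println("accepted: " + checkAndWipe(password)); // accepted: true
        System.out.println("wiped: " + isWiped(password));         // wiped: true
    }
}
```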
Extra, extra, read all about it
Objects other than strings can be interned. When primitive values are boxed into a wrapper object, some values (any boolean, any byte, short or int between −128 and 127, and any char from 0 to 127) are interned. Any two boxing conversions of one of these values are guaranteed to result in the same object.
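This sketch (BoxingCache is an illustrative name) shows the guarantee in action. Values inside the cached ranges compare equal with ==, while for a value like 1000 the result of == is not guaranteed, because an implementation is allowed (but not required) to cache values outside the range.

```java
// Boxing conversions of small values are guaranteed to reuse cached wrapper objects.
class BoxingCache {
    public static void main(String[] args) {
        Integer a = 127, b = 127;              // autoboxed; 127 is inside -128..127
        Character c1 = 'A', c2 = 'A';          // 'A' is 65, inside 0..127
        Boolean t1 = true, t2 = true;

        System.out.println(a == b);            // true
        System.out.println(c1 == c2);          // true
        System.out.println(t1 == t2);          // true

        Integer big1 = 1000, big2 = 1000;
        System.out.println(big1 == big2);      // not guaranteed: may print true or false
        System.out.println(big1.equals(big2)); // true: compare wrapper values with equals()
    }
}
```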
I’m always interested in your opinion, so please leave a comment. Your feedback helps me write tips that help you.