Background
This is the third part of a multi-part series on memory leaks in Java.
In the last post, we looked at how the static modifier and anonymous inner classes can cause memory leaks. In this post, we’ll look at the effect of string interning on memory.
String Interning
One way to minimise memory usage is to avoid having duplicate objects in memory. Duplicate String objects are unfortunately very easy to create. They reduce the size of the available heap. One way of avoiding duplicate strings is to intern them.
The String class maintains a private pool of strings. This pool is initially empty when the application starts. All literal strings and constant string expressions are stored in this pool. These are then said to be interned. This is a way for the JVM to save memory by not creating duplicate string objects.
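To make the pooling of literals and constant expressions concrete, here is a minimal sketch. The class and method names are made up for illustration. It relies on the language rule that compile-time constant expressions are interned, while strings computed at run time are newly created objects.

```java
class LiteralPoolDemo {
    // A final field initialised with a literal is a compile-time constant.
    static final String PREFIX = "mem";

    // Two identical literals refer to the same pooled object.
    static boolean sameLiteral() {
        String a = "leak";
        String b = "leak";
        return a == b;
    }

    // PREFIX + "ory" is a constant expression, so the compiler folds it
    // into the single pooled literal "memory".
    static boolean constantExpression() {
        String joined = PREFIX + "ory";
        return joined == "memory";
    }

    // 'part' is not declared final, so the concatenation happens at run time
    // and produces a new object that is not taken from the pool.
    static boolean runtimeExpression() {
        String part = "mem";
        String joined = part + "ory";
        return joined == "memory";
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());        // true
        System.out.println(constantExpression()); // true
        System.out.println(runtimeExpression());  // false
    }
}
```

The == comparisons are used here only to expose object identity. Application code should still compare strings with equals().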
There is a public instance method called intern() in the String class. This method returns a String object which has the same contents as the string on which it was called. However, the returned object is guaranteed to be from a pool of non-duplicate strings.
When we manually call the intern() method on a String object, the following happens:
- If the pool already contains a string equal to this String object, then the string from the pool is returned.
- Otherwise, this String object is added to the pool and a reference to this String object is returned.
The pool does the equality checking using the equals() method. This means that for any two strings s and t, if s.equals(t) is true, then s.intern() == t.intern() will be true.
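The behaviour of intern() described above can be checked with a short sketch. The class and method names are illustrative only. new String(...) is used simply to force a second object with the same contents.

```java
class InternDemo {
    // An explicit copy has equal contents but is a different object
    // from the pooled literal.
    static boolean distinctBeforeIntern() {
        String literal = "hello";
        String copy = new String("hello");
        return copy != literal && copy.equals(literal);
    }

    // intern() returns the pooled instance, which is the literal itself.
    static boolean sameAfterIntern() {
        String literal = "hello";
        String copy = new String("hello");
        return copy.intern() == literal;
    }

    // The contract: the interned references are identical exactly when
    // the two strings are equal.
    static boolean contractHolds(String s, String t) {
        return s.equals(t) == (s.intern() == t.intern());
    }

    public static void main(String[] args) {
        System.out.println(distinctBeforeIntern());                      // true
        System.out.println(sameAfterIntern());                           // true
        System.out.println(contractHolds(new String("a"), "a"));         // true
        System.out.println(contractHolds("a", "b"));                     // true
    }
}
```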
PermGen
When we create objects, they are stored on the heap. The heap can dynamically grow as new objects are created. Before Java 7, the String pool was not stored on the heap, but was stored in a different area of memory called PermGen. PermGen (permanent generation) had a fixed size allocation. This meant it couldn’t be expanded at runtime.
Interned String objects in PermGen can stay there for as long as our application runs. PermGen is only cleaned up during a full garbage collection, and an interned string can only be reclaimed once nothing references it any more. In practice, a large number of interned strings ties up part of this fixed-size region and behaves like a memory leak in our application.
It is difficult to predict the PermGen size needed for an application. If we intern too many strings, PermGen fills up and the JVM throws java.lang.OutOfMemoryError: PermGen space.
Solutions
The JVM’s memory layout went through a major change in Java 7. The String pool was moved from PermGen to the heap. The heap is garbage collected by the JVM, so unreferenced strings are removed from the pool and their memory is released. This reduces the risk of triggering an OutOfMemoryError.
We must take care when working with large String objects on Java 6 or earlier. If we intern a lot of strings, we may need to increase the PermGen size with the following JVM runtime flag:
-XX:MaxPermSize=<size>
The easiest way to fix PermGen issues is to upgrade to a later version of Java, where the String pool is stored in heap space.
From Java 7 on, we have more options to examine and tune the pool. We can print statistics about the string pool when the JVM exits by using the following flag:
-XX:+PrintStringTableStatistics
The string pool is stored in a hash table inside the JVM (not a java.util.HashMap). The table is set up with a number of buckets, and each bucket holds the interned strings that hash to it. To change the number of buckets, we can use the flag:
-XX:StringTableSize=<size>
To get a list of all possible flags, use the flag:
-XX:+PrintFlagsFinal
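As a rough sketch, the effective value of a single flag can also be read from inside a running program. This assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean. The class name FlagReader and its readFlag helper are made up for this example, and the method returns null when the flag or the bean is not available.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class FlagReader {
    // Returns the current value of a -XX flag as text, or null if this JVM
    // does not expose the HotSpot diagnostic bean or does not know the flag.
    static String readFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return null;
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown for an unknown flag name or an unsupported bean type.
            return null;
        }
    }

    public static void main(String[] args) {
        // The printed value depends on the JVM version and command-line flags.
        System.out.println("StringTableSize = " + readFlag("StringTableSize"));
    }
}
```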
String Concatenation
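While on the topic of strings, avoid repeated string concatenation with the “+” operator, especially inside loops. Repeated concatenation can create a large number of temporary objects. This is not a cause of memory leaks, but is rather a performance issue.
When we concatenate two strings with the “+” sign, the following happens behind the scenes with compilers up to Java 8 (from Java 9 on, javac emits an invokedynamic call instead, but a new String is still created each time):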
- The compiler generates code to create a new StringBuilder object. The no-argument StringBuilder constructor is used, which only allocates space for 16 characters.
- The append() method is called, appending the contents of the original string to the StringBuilder object.
- The append() method is called, appending the value on the right hand side of the + sign.
- Finally, the StringBuilder is converted to a new String object by calling the toString() method.
So the code:
String s = "Hello,";
s = s + " world!";
actually gets converted to:
String s = "Hello,";
s = new StringBuilder().append(s).append(" world!").toString();
Done repeatedly, this is a CPU-intensive process. A number of temporary objects are created and then later garbage-collected. Each step also copies the characters accumulated so far into a new buffer or a new String, and the StringBuilder has to grow its internal array whenever the default capacity is exceeded.
If we need to do a lot of string concatenation, it would be more efficient to create a StringBuilder of the correct final size, append all the strings by calling the append() method multiple times, and then do a single toString() conversion.
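A minimal sketch of that approach is shown next to the naive loop for comparison. The class and method names are illustrative. The capacity is computed by summing the lengths of the parts, which is one simple way of knowing the final size up front.

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    // Naive version: every pass through the loop allocates a fresh String
    // and copies everything accumulated so far.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result = result + part;
        }
        return result;
    }

    // Pre-sized version: one StringBuilder with the exact final capacity,
    // several append() calls, and a single toString() at the end.
    static String joinWithBuilder(List<String> parts) {
        int totalLength = 0;
        for (String part : parts) {
            totalLength += part.length();
        }
        StringBuilder builder = new StringBuilder(totalLength);
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("Hello,", " world", "!");
        System.out.println(joinWithPlus(parts));    // Hello, world!
        System.out.println(joinWithBuilder(parts)); // Hello, world!
    }
}
```

Both methods return the same text. The difference is in how many intermediate objects are allocated along the way, which matters mainly when the list is long.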
Conclusion
We’ll look at a few more causes of memory leaks in the next post. These will include incorrect or missing equals() and hashCode() methods.
Until then, stay healthy and keep learning!