Background
This is the third part of a multi-part series on memory leaks in Java. If you missed the other two, you can find them here:
In the last post, we looked at how the static modifier and anonymous inner classes can cause memory leaks. In this post, we’ll look at the effect of string interning on memory.
String Interning
One way to minimise memory usage is to avoid having duplicate objects in memory. Duplicate String objects are unfortunately very easy to create, and each one takes up heap space that could be used for something else. One way of avoiding duplicate strings is to intern them.
The String class maintains a private pool of strings. This pool is initially empty when the application starts. All literal strings and constant string expressions are stored in this pool. These are then said to be interned. This is a way for the JVM to save memory by not creating duplicate string objects.
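The pooling of literals and constant expressions can be observed with a reference comparison. The following is a minimal sketch; the class name LiteralPoolDemo and its method names are made up for illustration.

```java
public class LiteralPoolDemo {
    // Two identical literals resolve to the same pooled object.
    static boolean literalsShareObject() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // A compile-time constant expression is folded by the compiler and pooled too.
    static boolean constantExpressionIsPooled() {
        String a = "hello";
        String b = "hel" + "lo";
        return a == b;
    }

    // A string created at runtime is a separate object, even with equal contents.
    static boolean runtimeStringIsPooled() {
        String a = "hello";
        String b = new String("hello");
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());        // true
        System.out.println(constantExpressionIsPooled()); // true
        System.out.println(runtimeStringIsPooled());      // false
    }
}
```

The == operator is used here only to show object identity. Application code should still compare string contents with equals().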
There is a public instance method called intern() in the String class. This method returns a String object which has the same contents as the string on which it was called. However, the returned object is guaranteed to be from a pool of non-duplicate strings.
When we manually call the intern() method on a String object, the following happens:
- If the pool already contains a string equal to this String object, then the string from the pool is returned.
- Otherwise, this String object is added to the pool and a reference to this String object is returned.
The pool does the equality checking using the equals() method. This means that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
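A small sketch of this guarantee follows. The class name InternDemo and the helper sameAfterIntern() are hypothetical names used only for this example.

```java
public class InternDemo {
    // Returns true when both strings resolve to the same pooled object.
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        // Both strings are built at runtime, so they are two distinct objects.
        String s = new String("memory");
        String t = new StringBuilder("mem").append("ory").toString();

        System.out.println(s == t);                // false: different objects
        System.out.println(s.equals(t));           // true: same contents
        System.out.println(sameAfterIntern(s, t)); // true: one pooled object
    }
}
```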
PermGen
When we create objects, they are stored on the heap. The heap can dynamically grow as new objects are created. Before Java 7, the String pool was not stored on the heap, but was stored in a different area of memory called PermGen. PermGen (permanent generation) had a fixed maximum size that was set when the JVM started. This meant it couldn’t be expanded at runtime.
Interned String objects in PermGen tend to stay there for a long time. On HotSpot, PermGen is only cleaned up during a full garbage collection, and an interned string that is still referenced anywhere in the application can never be removed. Interned strings therefore tie up a section of this small, fixed-size area, and in a long-running application this behaves like a memory leak.
It is difficult to predict the PermGen size needed for an application. If we intern too many strings in PermGen, we can get an OutOfMemoryError.
Solutions
The memory layout went through a major change in Java 7. The String pool was moved from PermGen to the heap. The heap is garbage collected by the JVM. Unreferenced strings will then be removed from the pool, thus releasing memory. This reduces the risk of triggering an OutOfMemoryError.
We must take care when interning a lot of String objects on Java 6 or earlier. If we do, we may need to increase the maximum PermGen size with the following JVM flag:
-XX:MaxPermSize=<size>
The easiest way to fix PermGen issues is to upgrade to a later version of Java, where the String pool is stored in heap space.
From Java 7 on, we have more options to examine and/or change the pool size. We can print statistics about the pool when the JVM exits using the following flag:
-XX:+PrintStringTableStatistics
The string pool is implemented as a hash table that stores the non-duplicate String objects. The table is set up with a number of buckets, each of which holds the strings that hash to it. To change the number of buckets, we can use the flag:
-XX:StringTableSize=<size>
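The value currently in effect can also be read from inside a running application. The sketch below assumes a HotSpot-based JVM, since it relies on the HotSpot-specific HotSpotDiagnosticMXBean. The class name StringTableInfo and the helper vmOption() are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class StringTableInfo {
    // Reads the current value of a HotSpot VM option by name.
    // Assumption: the JVM is HotSpot-based and exposes this management bean.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + vmOption("StringTableSize"));
    }
}
```

The default value differs between Java versions, so the printed number depends on the JVM in use.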
To get a list of all possible flags, use the flag:
-XX:+PrintFlagsFinal
String Concatenation
While on the topic of strings, avoid string concatenation with the “+” operator. String concatenation can create a large number of temporary objects. This is not a cause of memory leaks, but is rather a performance issue.
When we concatenate two strings with the “+” sign, the following happens behind the scenes:
- The compiler generates code to create a new StringBuilder object. The no-argument StringBuilder constructor is used, which only allocates space for 16 characters.
- The append() method is called, appending the contents of the original string to the StringBuilder object.
- The append() method is called, appending the value on the right hand side of the + sign.
- Finally, the StringBuilder is converted to a new String object by calling the toString() method.
So the code:
String s = "Hello,";
s = s + " world!";
actually gets converted to:
String s = "Hello,";
s = new StringBuilder().append(s).append(" world!").toString();
This is a CPU-intensive process. A number of temporary objects are created and then later garbage-collected. A number of additional methods are called, which internally copy the characters into the new object.
If we need to do a lot of string concatenation, it would be more efficient to create a StringBuilder of the correct final size, append all the strings by calling the append() method multiple times, and then do a single toString() conversion.
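One possible way to apply this advice is sketched below. The class name JoinDemo and the method concatAll() are made up for this example, and the approach assumes all the parts are known before the builder is created.

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // Concatenates all parts using one StringBuilder sized for the final result.
    static String concatAll(List<String> parts) {
        // First pass: work out the final length so the builder never has to grow.
        int totalLength = 0;
        for (String part : parts) {
            totalLength += part.length();
        }

        // Second pass: append every part, then convert to a String once.
        StringBuilder sb = new StringBuilder(totalLength);
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatAll(Arrays.asList("Hello,", " world!"))); // Hello, world!
    }
}
```

The extra pass over the parts is cheap compared with repeatedly growing the builder's internal buffer, although for a handful of short strings the difference is unlikely to matter.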
Conclusion
We’ll look at a few more causes of memory leaks in the next post. These will include incorrect or missing equals() and hashCode() methods.
Until then, stay healthy and keep learning!