Reducing memory usage with String.intern()

June 26, 2013 by Nikita Salnikov-Tarnovski Filed under: Memory Leaks

Every now and then you have a dying production application on your hands, and you know you need to patch it as fast as possible. So have we, and we thought it would be interesting to share one of our recent war stories. In this case we had a chance to patch an application with something as simple as String.intern(). But let me start from the beginning.

The application at hand was running out of memory and, after its recent changes, no longer even started up. The symptoms included high CPU usage after a JVM restart, followed a few minutes later by the fatal java.lang.OutOfMemoryError: Java heap space in the logs. A quick look into the heap contents gave us a suspect – the application was loading millions of objects into a certain internal data structure.

A background check with the development team revealed that the number of objects loaded had recently doubled – instead of ~five million objects the application now had to deal with approximately ten million instances in memory. This can indeed use up some heap space. But knowing the possible cause was not going to help us much – there was no way the business owners were willing to give up the precious data they had just acquired.

Digging into the data structure at hand, we found that it relied heavily on Strings underneath, which should come as no surprise to any of our readers. But some of these Strings held the same content over and over again. Think of address elements such as street names or countries as equivalent cases.

And here a quick fix started to brew in our heads. What if we interned those repeating Strings? After quickly checking with the developers of the application, we were given the green light. The developers assured us that the side effects of interning, such as having to remember to String.intern() every string that is compared with == to our interned Strings, would be contained. Thank god for encapsulation.
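To make the idea concrete, here is a minimal sketch, not the code from the application in question. The class name InternSketch, the street names and both helper methods are made up for illustration. It loads the same repeating values twice, once as-is and once through String.intern(), and counts how many distinct String instances end up being referenced.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;

final class InternSketch {
    // Builds `count` address-like Strings, cycling through a few street names.
    // The data and the method are hypothetical, for illustration only.
    static List<String> loadStreets(int count, boolean intern) {
        String[] streets = {"Main Street", "High Street", "Station Road"};
        List<String> loaded = new ArrayList<String>(count);
        for (int i = 0; i < count; i++) {
            // new String(...) simulates a value parsed from a file or a database row:
            // equal content, but a separate object every time.
            String value = new String(streets[i % streets.length]);
            loaded.add(intern ? value.intern() : value);
        }
        return loaded;
    }

    // Counts distinct String instances by identity, not by equals().
    static int distinctInstances(List<String> strings) {
        IdentityHashMap<String, Boolean> seen = new IdentityHashMap<String, Boolean>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(loadStreets(9000, false))); // 9000
        System.out.println(distinctInstances(loadStreets(9000, true)));  // 3
    }
}
```

Without interning every loaded value is its own object; with interning all equal values collapse into one shared instance per distinct street name, which is where the memory saving comes from.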


Now we just had to understand how much CPU overhead interning would introduce. To our surprise, interning ~10M Strings took just under four minutes and saved us exactly the ~500MB of memory we were short of. So the day was saved, for the time being.
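If you want a rough number for your own platform before committing to such a change, a crude timing loop along these lines is a reasonable starting point. The class InternTiming, the sample() data generator and the sizes are assumptions made for this sketch. A single unwarmed run like this gives only an order-of-magnitude estimate and is no substitute for a test against your real data.

```java
final class InternTiming {
    // Interns every element in place and returns the elapsed time in nanoseconds.
    static long internAll(String[] values) {
        long start = System.nanoTime();
        for (int i = 0; i < values.length; i++) {
            values[i] = values[i].intern();
        }
        return System.nanoTime() - start;
    }

    // Hypothetical test data: `count` Strings drawn from `distinct` different values.
    // Runtime concatenation yields a fresh String object for every element.
    static String[] sample(int count, int distinct) {
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            values[i] = "street-" + (i % distinct);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] values = sample(1000000, 1000);
        long nanos = internAll(values);
        System.out.println("interned " + values.length + " strings in "
                + (nanos / 1000000) + " ms");
    }
}
```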

Now, before you jump into your application and start interning every String you can find, I must warn you. There are a lot of things that can go wrong:

  • Your interned Strings will disappear from the heap and relocate to the permanent generation, so make sure you have enough room in the permgen space. Update: thanks to our readers, we now know this is true only up to JDK 6. If you are using Java 7, the interned Strings live in the very same heap.
  • Be sure to intern every String you are going to compare with == to your interned Strings. Otherwise you will be creating the nastiest types of bugs in your application.
  • Make sure you can tolerate the CPU overhead of interning. It is a native method call, so its cost depends on your specific platform. Try it out before rolling the changes out to production.
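The second warning is the one that bites hardest, so here is a small sketch of it. MixedComparison and runtimeValue() are hypothetical names. The point is only that a String created at runtime is a different object from its interned twin until it has been interned as well.

```java
final class MixedComparison {
    // Simulates a value that arrives at runtime (parsed input),
    // so it is a fresh object and not the pooled literal.
    static String runtimeValue(String text) {
        return new String(text.toCharArray());
    }

    public static void main(String[] args) {
        String stored = runtimeValue("Estonia").intern(); // what the data structure keeps
        String query = runtimeValue("Estonia");           // what some caller passes in

        System.out.println(stored == query);          // false: query was never interned
        System.out.println(stored == query.intern()); // true: both are the pooled instance
        System.out.println(stored.equals(query));     // true: equals() compares content
    }
}
```

Comparing with equals() sidesteps the problem entirely, so the warning matters mostly for code that already relies on == for speed or by accident.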

We admit that our case was rather rare – the data structure contained a lot of repeating String objects and was integrated with the application in a way that made it possible for us to isolate our quick fix. And even in our case, the fix was removed soon afterwards by the developers, who reworked their data structures into a more reasonable graph representation.

But the warnings aside – there are interesting and helpful tools built into the Java Virtual Machine. Know how to use them and beware of their side effects, and they will become your friends. Use them without caution and you can easily kill your application. Your best friend will always be an actual test case, built on top of your very own application.




Wouldn’t -XX:+UseStringCache also help in this case without any code changes? I have no idea how large the JVM internal String cache can grow…

Stefan Seidel

I always consider calling String.intern() after String.substring(). In Java, String.substring() only returns a limited view of the original string, which prevents GC of the original for as long as the substring is still in use.

Now imagine trying to generate a report based on a 100-byte substring from each of 1 million documents, each about 1 megabyte in size.


That is one more reason to upgrade your Java 🙂 The behaviour you refer to has been changed recently.


Substring no longer works that way (as of JDK 7). It now returns a new String, not a view.

Nitsan Wakart

That is why I prefer to develop important code with Microsoft Visual Basic 6. It has wonderful string support and runs faster than the Java. In addition, it is fully compatible with Microsoft Windows.


Thank you for pointing it out, fixed the blog post.

Ivo Mägi

Note that the built-in String#equals already starts with an `==`, so it still runs quickly even when using intern’d Strings. So using `==` only saves you the overhead of the function-call, which hopefully javac already in-lines (since String is a final class, it can do that).

So (a) the ‘encapsulation’ comment is a red herring in this case, and (b) for defensive programming I wouldn’t use `==` to compare strings even if I was really danged sure everybody on the team was remembering to intern them.


I completely agree with you that it is a (very) bad smell to use `==` to compare strings. But if you already do that and you start interning them, then you really must be careful to intern them all. Consider the following piece of code:

String constant = "aa";
char[] ch = new char[2];
ch[0] = (char) System.in.read();
ch[1] = (char) System.in.read();
String aa = String.valueOf(ch);
String aa2 = aa;
System.out.println(aa == aa2);
System.out.println(aa == aa2.intern());
System.out.println(aa.intern() == aa2.intern());

Imagine that the user has entered `aa`. What will be the output of this code? So encapsulation does help 🙂