Reducing memory consumption by 20x
This is another story from our recent experience with memory-related problems. It is drawn from a recent customer support case, where we faced a badly behaving application repeatedly dying with OutOfMemoryError messages in production. After running the application with Plumbr attached we were sure we were not facing a memory leak this time. But something was still terribly wrong.
The symptoms were discovered by one of our experimental features monitoring the overhead of certain data structures. It gave us a signal pinpointing one particular location in the source code. In order to protect the privacy of the customer we have recreated the case using a synthetic sample, while keeping it technically equivalent to the original problem. Feel free to download the source code.
We found ourselves staring at a set of objects loaded from an external source. Communication with the external system was implemented via an XML interface, which is not bad per se. But the fact that the integration details were scattered across the system – the documents received were converted to XMLBeans instances and then used throughout the codebase – was perhaps not the wisest choice.
Essentially we were dealing with a lazily-loaded caching solution. The objects cached were Persons:
// Imports and methods removed to improve readability
public class Person {
    private String id;
    private Date dateOfBirth;
    private String forename;
    private String surname;
}
Not too memory-consuming, one might guess. But things start to look more sour when we open up some more details. Namely, the implementation of this data was nothing like the simple class declaration above. Instead, the implementation used a model-generated data structure. The model used was similar to the following simplified XSD snippet:
<xs:schema targetNamespace="http://plumbr.io"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
    <xs:element name="person">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="id" type="xs:string"/>
                <xs:element name="dateOfBirth" type="xs:dateTime"/>
                <xs:element name="forename" type="xs:string"/>
                <xs:element name="surname" type="xs:string"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
Using XMLBeans, the developer had generated the model used behind the scenes. Now let's add the fact that the cache was supposed to hold up to 1.3M instances of Person, and we have laid a strong foundation for failure.
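For a feel of what the integration layer presumably looked like, here is a rough sketch, assuming XMLBeans' default scomp naming for the schema above – the io.plumbr package, the PersonDocument type and the file name are illustrative, not taken from the original code:

// A hedged sketch, not the customer's actual code: parsing a document into
// the scomp-generated model. Each parsed instance stays backed by the full
// XML token store, which is where the per-instance overhead comes from.
import java.io.File;
import io.plumbr.PersonDocument; // generated by scomp; package name assumed

public class PersonReader {
    public static PersonDocument.Person read(File xml) throws Exception {
        PersonDocument doc = PersonDocument.Factory.parse(xml);
        return doc.getPerson();
    }
}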
Running the bundled testcase gave us an indication that 1.3M instances of the XMLBeans-based solution would consume approximately 1.5GB of heap. We thought we could do better.
The first fix was obvious: integration details should not cross system boundaries. So we changed the caching solution to a simple java.util.HashMap<Long, Person>, with the ID as the key and the Person object as the value. Immediately we saw memory consumption drop to 214MB. But we were not satisfied yet.
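Roughly, the reworked cache looked like the following sketch – loadPerson() is a hypothetical stand-in for the XML integration code, which now stays behind the system boundary:

// A minimal sketch of the lazily-loaded cache after the rework.
import java.util.HashMap;
import java.util.Map;

public class PersonCache {
    private final Map<Long, Person> cache = new HashMap<Long, Person>();

    public Person get(Long id) {
        Person person = cache.get(id);
        if (person == null) {
            // Convert the external XML document into a plain Person right
            // at the system boundary and cache only the lean object.
            person = loadPerson(id);
            cache.put(id, person);
        }
        return person;
    }

    private Person loadPerson(Long id) {
        // Placeholder: the XMLBeans document would be fetched and mapped here.
        throw new UnsupportedOperationException("integration code omitted");
    }
}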
As the key in the Map was essentially a number, we had every reason to use Trove Collections to further reduce the overhead. A quick change in the implementation and we had replaced our HashMap with TLongObjectHashMap<Person>. Heap consumption dropped to 143MB.
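In the sketch above, the change boils down to swapping the map type and unboxing the key (Trove 3 package name assumed):

// The primitive long keys live in a plain long[] inside the map, so no
// Long boxes or Map.Entry objects are allocated per cached Person.
import gnu.trove.map.hash.TLongObjectHashMap;

public class PersonCache {
    private final TLongObjectHashMap<Person> cache = new TLongObjectHashMap<Person>();

    public Person get(long id) {
        Person person = cache.get(id);
        if (person == null) {
            person = loadPerson(id);
            cache.put(id, person);
        }
        return person;
    }

    private Person loadPerson(long id) {
        // Placeholder, as in the HashMap sketch above.
        throw new UnsupportedOperationException("integration code omitted");
    }
}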
We definitely could have stopped there, but an engineer's curiosity did not allow us to do so. We could not help but notice that the data used contained a redundant piece of information. Date Of Birth was actually encoded in the ID, so instead of duplicating it in an additional field, we could easily calculate the birthday from the given ID.
So we changed the layout of the Person object and now it contained just the following fields:
// Imports and methods removed to improve readability
public class Person {
    private long id;
    private String forename;
    private String surname;
}
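The article does not show the derivation itself; as a comment below points out, the IDs were Estonian personal codes, where the first digit encodes the century (and sex) and digits 2-7 encode the birth date as YYMMDD. Under that assumption, the calculation could look roughly like this:

// A hedged sketch: deriving the date of birth from an Estonian-style
// personal code (GYYMMDDSSSC). The exact encoding used by the customer's
// IDs is an assumption.
import java.util.Date;
import java.util.GregorianCalendar;

public class BirthDates {
    public static Date dateOfBirth(long id) {
        String s = Long.toString(id);
        int g = s.charAt(0) - '0';  // 1,2 -> 1800s; 3,4 -> 1900s; 5,6 -> 2000s
        int year = 1800 + ((g - 1) / 2) * 100 + Integer.parseInt(s.substring(1, 3));
        int month = Integer.parseInt(s.substring(3, 5));
        int day = Integer.parseInt(s.substring(5, 7));
        return new GregorianCalendar(year, month - 1, day).getTime();
    }
}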
Re-running the tests confirmed our expectations. Heap consumption was down to 93MB. But we were still not satisfied.
The application was running on a 64-bit machine with an old JDK 6 release, which did not compress ordinary object pointers by default. Switching on -XX:+UseCompressedOops gave us an additional gain – now we were down to 73MB.
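For reference, on that JDK the flag had to be passed explicitly at JVM startup (the jar name here is illustrative):

java -XX:+UseCompressedOops -jar cache-test.jar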
We could go further and start interning strings or building a b-tree based on the keys, but this would already have started impacting the readability of the code, so we decided to stop here. A 21.5x heap reduction should already be a good enough result.
Lessons learned?
- Do not let integration details cross system boundaries.
- Redundant data will be costly. Remove the redundancy whenever you can.
- Primitives are your friends. Know thy tools and learn Trove if you haven't already.
- Be aware of the optimization techniques provided by your JVM.
If you are curious about the experiment conducted, feel free to download the code used here. The utility used for measurements is described and available in this blog post.
And follow us on Twitter, we have many more interesting war stories to tell.
Comments
I can’t open the URL: http://www.plumbr.eu/files/plumbr-optimization-sample.zip
Date of Birth encoded in the ID? That’s interesting, why would anyone do that?
That is the Estonian way of handing out unique IDs to citizens. Part of the ID is the birth date of the citizen in question.
Actually the article is interesting, but it doesn’t look accurate to me.
1) Initial problem statement – 1.3M references in 1.5GB of heap = 1.13KB/instance. LOLWHAT? If that’s really so, just putting a simple bean in place of the XMLBeans stuff will give a 10-20x gain, depending on string constraints.
2.1) Putting things into HashMap<Long, Person>: if XMLBeans retains so much space, how did you decrease the footprint? Doesn’t that mean the main problem was with the cache data structure? I guess the main footprint gain came from replacing the XMLBeans Person with a plain Java bean, so the HashMap focus is misleading.
2.2) The numbers 214MB → 143MB don’t look realistic. Assuming a HashMap overhead of 50 bytes/entry on a 64-bit JVM, that leads to 102 bytes retained per Person on average. Given that size, Trove would give no overhead at all (which is highly unlikely).
2.3) You reported that Trove reduced the footprint by 70MB, or ~50 bytes/entry. Do you believe this came from replacing the HashMap.Entry’s Long reference with a primitive (the Long could also be cached within the object)? I guess you rather benefited from Trove’s open-addressing map implementation, which wasn’t mentioned.
3) -XX:+UseCompressedOops – logically that should be applied first, as an obvious zero-effort optimization. Applying it after reducing the reference count looks strange.
4) Re “or building a b-tree based on the keys” – it is absolutely unclear what benefit a b-tree would give, given the same number of refs and Trove’s low overhead.
“Date Of Birth was actually encoded in the ID … Heap consumption was down to 93MB.” – that is really smart and I like it. Though keeping the key inside the value, when access is already key-based, looks like overhead in itself.
PS: please consider byte[]-backed strings as well.
Thank you for your article and best regards
As Nikita is currently on a flight, a more elaborate answer will be delayed for a while – but did you actually run the code sample packaged along with the article? It should clear most of the doubts immediately. I am sure Nikita can elaborate in more detail once he has recovered from the jet lag.
Thank you for extensive feedback 🙂 It is always a pleasure to know that somebody reads your article so carefully.
1. Our first step was exactly that – using simple beans instead of XMLBeans. That gave us a 7x reduction. So we agree on that.
2.1. We focused on the HashMap next, to further reduce the memory footprint. And indeed, as can be seen from the graph in the article, the main gain came from replacing XMLBeans. No doubt here.
2.2. I dare say that the HashMap overhead is larger, but as of now I don’t have numbers to back that claim. Believe us, though, we did not pull those numbers out of thin air 🙂 You can download the code and verify them.
2.3. On a 64-bit JVM, taking object alignment into account, a Long consumes 24 bytes vs the 8 bytes of a long. That is 16 bytes per entry. The remaining gain came indeed from differences in the inner structure of Trove vs HashMap.
3. Why strange? Yes, this is the obvious (to most developers?) first step, but by no means the only one. In this particular case the optimization simply hit this waypoint a little later 🙂
4. Speculatively speaking, Trove has some small overhead due to unused slots in its inner arrays. A b-tree could eliminate that. At the same time, it could introduce more references. So that should be measured, not guessed 🙂
Hello
I tried your example on my own laptop – nice sample.
I have only one question: I added a sleep after all operations finished, made a memory dump (jmap) and then analyzed it in MAT. Why do the reported object sizes differ slightly between MAT and the agent included in this sample?
Regards
Kuba
Object sizes can be calculated and reported somewhat differently, depending on your JVM and on the tool that does the calculation – e.g. whether it accounts for object alignment, field padding, or object header size differences across JVMs/GCs.
String.intern() the names and surnames – bet that will give you another big saving.
In the attached synthetic sample, unfortunately not, as the names and surnames are randomly generated. In more realistic cases with loads of “John Smith”s we would indeed see a nice reduction.
Your example is very specific. I have rarely seen IDs with an embedded date of birth – it might be a good idea – but I’m sure if it were, everyone would have embedded all the fields in that single ID field.
Another Joke – try to do date.setTime(date.getTime()) and clean all garbage by GC. 🙂 Sometimes it helps.
Show results when you add -XX:+UseCompressedOops after switching to HashMap and before migrating to TLongObjectHashMap. 🙂
That’s an interesting observation, I would be keen to see that as well.
The results with java.util.HashMap and -XX:+UseCompressedOops: 151.8MB. So no intrigue here. Feel free to run the attached testcase – it has this option switched on in the default set of tests as well.