G1 vs CMS vs Parallel GC
This post follows up on the experiment we ran exactly a year ago, comparing the performance of different GC algorithms in real-life settings. We took the same experiment, expanded the tests to include the G1 garbage collector and ran them on a different platform. This year our tests were run with the following garbage collectors: Parallel, CMS and G1.
Description of the environment
The experiment was run on an out-of-the-box JIRA configuration. The motivation for the test run was loud and clear: Minecraft, Dalvik-based Angry Birds and Eclipse aside, JIRA should be one of the most popular Java applications out there. And as opposed to the alternatives, it is a more typical representative of what most of us deal with in everyday business. After all, Java is still by far most used in server-side Java EE apps.
What also affected our decision was that the engineers from Atlassian ship nicely packaged load tests along with the JIRA download. So we had a benchmark to use for our configuration.
We carefully unzipped our fresh JIRA 6.1 download and installed it on Mac OS X Mavericks. Then we ran the bundled tests without changing anything in the default memory settings. The Atlassian team had been kind enough to set them for us:
-Xms256m -Xmx768m -XX:MaxPermSize=256m
The tests used JIRA functionality in different common ways – creating tasks, assigning tasks, resolving tasks, searching and discovering tasks, etc. Total runtime for the test was 30 minutes.
We ran the test with three different garbage collection algorithms: Parallel, CMS and G1. Each test started with a fresh JVM boot, followed by prepopulating the storage to exactly the same state. Only after these preparations did we launch the load generation.
During each run we collected GC logs using -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log -XX:+PrintGCDetails and analyzed these statistics with the help of GCViewer.
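GCViewer does the aggregation, but the two numbers reported here are easy to state. As a rough sketch only: assuming log lines in the shape the Parallel collector typically printed on Java 7-era HotSpot with the flags above (the JDK version is an assumption, and CMS and G1 logs look different and contain concurrent phases that are not pauses), total and maximum pause could be extracted as below. The sample lines are invented for illustration and do not come from our runs; `summarize_pauses` is a hypothetical helper, not part of GCViewer.

```python
import re

# Matches the duration that closes a GC event, e.g. ", 0.0250000 secs]".
# The trailing "[Times: ... real=0.03 secs]" part is not matched, because
# a digit must directly follow the ", ".
PAUSE_RE = re.compile(r", (\d+\.\d+) secs\]")


def summarize_pauses(log_text):
    """Return (total_ms, max_ms) over all GC events found in log_text."""
    pauses_ms = []
    for line in log_text.splitlines():
        matches = PAUSE_RE.findall(line)
        if matches:
            # The last duration on a line belongs to the whole event.
            pauses_ms.append(float(matches[-1]) * 1000.0)
    if not pauses_ms:
        return 0.0, 0.0
    return sum(pauses_ms), max(pauses_ms)


# Illustrative lines only: assumed format, invented values.
SAMPLE_LOG = """\
1.234: [GC [PSYoungGen: 65536K->10720K(76288K)] 65536K->10728K(251392K), 0.0250000 secs] [Times: user=0.03 sys=0.01, real=0.03 secs]
5.678: [Full GC [PSYoungGen: 10720K->0K(76288K)] [ParOldGen: 8K->10500K(175104K)] 10728K->10500K(251392K) [PSPermGen: 21000K->20990K(42000K)], 0.1500000 secs] [Times: user=0.20 sys=0.00, real=0.15 secs]
"""

total_ms, max_ms = summarize_pauses(SAMPLE_LOG)
print(f"total={total_ms:.1f} ms, max={max_ms:.1f} ms")
```

For real analysis a dedicated tool such as GCViewer remains the safer choice, since it understands the per-collector log formats.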
The results can be aggregated as follows. Note that all measurements are in milliseconds:
[Chart: Total GC pauses]
[Chart: Max GC pause]
Interpretation and results
First stop – Parallel GC (-XX:+UseParallelOldGC). Out of the 30 minutes the tests took to complete, we spent close to 21 seconds in GC pauses with the parallel collector. And the longest pause took 721 milliseconds. So let us take this as the baseline: GC cycles reduced the throughput by 1.1% of the total runtime. And the worst-case latency was 721ms.
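The throughput figure is simply pause time divided by runtime. A minimal sketch of that arithmetic, where `gc_overhead_percent` is a hypothetical helper name and the 21 second input is the rounded figure from the text rather than the exact measurement:

```python
def gc_overhead_percent(total_pause_s, runtime_s):
    """Share of wall-clock time spent in GC pauses, in percent."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    return 100.0 * total_pause_s / runtime_s


RUNTIME_S = 30 * 60  # the load test runs for a fixed 30 minutes

# Rounded input taken from the text ("close to 21 seconds").
parallel_overhead = gc_overhead_percent(21, RUNTIME_S)
print(f"Parallel GC overhead: {parallel_overhead:.2f}%")
```

With the rounded input the result lands between 1.1% and 1.2%, in the same range as the figure quoted above; the small difference presumably comes from rounding the pause total.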
Next contestant: CMS (-XX:+UseConcMarkSweepGC). Again, 30 minutes of tests out of which we lost a bit less than 19 seconds to GC. Throughput-wise this is roughly in the same neighbourhood as the parallel mode. Latency on the other hand has been improved significantly – the worst-case latency is reduced more than 10 times! We are now facing just 64ms as the maximum pause time from the GC.
The last experiment used the newest and shiniest GC algorithm available – G1 (-XX:+UseG1GC). The very same tests were run, and throughput-wise the results suffered severely. This time our application spent more than a minute waiting for the GC to complete. Compared to the overhead of just 1% with CMS, we are now facing close to a 3.5% effect on throughput. But if you really do not care about throughput and want to squeeze the last bit out of latency, G1 improves around 20% on the already-good CMS: the longest GC pause took only 50ms.
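The latency comparisons can be checked from the three maximum pause times quoted in the text. A small sketch of the arithmetic, where the function names are ours and purely illustrative:

```python
# Worst-case pauses in milliseconds, as quoted in the text.
MAX_PAUSE_MS = {"Parallel": 721, "CMS": 64, "G1": 50}


def improvement_factor(baseline_ms, candidate_ms):
    """How many times shorter the candidate's worst pause is."""
    return baseline_ms / candidate_ms


def reduction_percent(baseline_ms, candidate_ms):
    """Relative reduction of the worst pause, in percent."""
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms


def pause_seconds(overhead_percent, runtime_s):
    """Total pause time implied by an overhead percentage."""
    return overhead_percent / 100.0 * runtime_s


cms_vs_parallel = improvement_factor(MAX_PAUSE_MS["Parallel"], MAX_PAUSE_MS["CMS"])
g1_vs_cms = reduction_percent(MAX_PAUSE_MS["CMS"], MAX_PAUSE_MS["G1"])
g1_pause_s = pause_seconds(3.5, 30 * 60)

print(f"CMS vs Parallel: {cms_vs_parallel:.1f}x shorter worst-case pause")
print(f"G1 vs CMS: {g1_vs_cms:.1f}% shorter worst-case pause")
print(f"G1 total pause at 3.5% overhead: {g1_pause_s:.0f} s")
```

This reproduces the "more than 10 times" and "around 20%" statements, and a 3.5% overhead on a 30-minute run corresponds to a little over a minute of pauses.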
As always, trying to summarize such an experiment into a single conclusion is dangerous. So if you have the time and the required skills, definitely go ahead and measure your own environment instead of adopting a one-size-fits-all solution.
But if I dared to draw such a conclusion, I would say that CMS is still the best “default” option to go with. G1 throughput is still so much worse that the improved latency is usually not worth it.
Thanks for this! Though given the age of this article and the various updates to GC over time, would you consider running an updated experiment to see if any changes have occurred? It’d be interesting to see new results! 🙂
The exact replica is no longer possible as Atlassian apparently is no longer maintaining the set of tests we used in this experiment. But I do believe it is time to recreate the test using a different app/load test combo, so I would assume during the following months we are going to do just that.
What was the hardware and software configuration of this test? Specifically, what JDK version was used?
Your experimental setup seems pretty flawed.
1) Using max GC pause is questionable. You're basing decisions on an extreme outlier, which one should never do. You must have enough data to show the distribution of the pauses. Why not include them? Are they N(15,20)+epsilon=0.01?
2) Total pause time seems important, but I don’t see any division of the data by the amount of work being done. Are you running the tests for a fixed 30 minutes, or do all runs do the same work and just happen to take around 30 minutes? The percentage of time spent in GC pauses should be used regardless.
Listen, I know this was an informal post, so don’t take it the wrong way, but it wouldn’t have been hard to do this right.
Thanks for your comments.
Using maximum GC pause as a metric is actually quite widespread. In many use cases you are much more interested in worst-case response time or latency. If you have many small pauses and one huge one, then this huge pause will have a huge impact on the affected users. And you may lose them.
JIRA load tests run for a fixed amount of time, 30 minutes. So total pause time correctly reflects the GC overhead. But you are right that we could express these numbers more clearly. Thanks for the suggestion 🙂