
Let us assume we have to deploy our solution to commodity-class hardware with up to four cores and 10 GB of RAM available. From this we can derive our capacity requirement: the maximum heap space for the application cannot exceed 8 GB, as the remaining memory is needed by the operating system and the JVM's own off-heap data structures. Having this requirement in place, we would need to turn to the third configuration on which the test was run:

Heap     GC Algorithm              Useful work   Longest pause
-Xmx12g  -XX:+UseConcMarkSweepGC   89.8%         560 ms
-Xmx12g  -XX:+UseParallelGC        91.5%         1,104 ms
-Xmx8g   -XX:+UseConcMarkSweepGC   66.3%         1,610 ms

The application is able to run with this configuration as

java -Xmx8g -XX:+UseConcMarkSweepGC Producer

but both the latency and, especially, the throughput numbers fall drastically:

  • GC now blocks the CPUs from doing useful work far more often: this configuration leaves only 66.3% of CPU time for the application. As a result, throughput drops from the best-case scenario of 13,176,000 jobs/hour to a meager 9,547,200 jobs/hour (the arithmetic is spelled out in the sketch after this list)
  • Instead of 560 ms we are now facing 1,610 ms of added latency in the worst case
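To make the throughput arithmetic explicit, here is a minimal sketch. The figures come from the table above; treating throughput as directly proportional to the useful-work percentage is a simplifying assumption:

public class ThroughputEstimate {
    public static void main(String[] args) {
        final double bestUsefulWork = 91.5;       // -Xmx12g -XX:+UseParallelGC
        final double cmsUsefulWork = 66.3;        // -Xmx8g -XX:+UseConcMarkSweepGC
        final long bestThroughput = 13_176_000L;  // jobs/hour measured in the best case

        // Scale the best-case throughput by the ratio of useful-work percentages
        long estimate = Math.round(bestThroughput * cmsUsefulWork / bestUsefulWork);
        System.out.println(estimate + " jobs/hour"); // prints 9547200 jobs/hour
    }
}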

Walking through these three dimensions makes it clear that you cannot optimize for "performance" as a single metric: you need to measure and tune latency and throughput separately, while taking the capacity constraints into account.
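In case you want to gather such measurements yourself, the pause times can be extracted from GC logs. A minimal sketch using the standard HotSpot GC-logging flags available on Java 8 (the text does not specify how the numbers above were captured, so this is an assumption):

java -Xmx8g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log Producer

Each stop-the-world pause is then recorded in gc.log with its duration, from which the longest pause and the total pause time (and thus the useful-work percentage) can be derived.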