Let us assume we have to deploy our solution to commodity-class hardware with up to four cores and 10 GB of RAM available. From this we can derive our capacity requirement: since the JVM needs memory beyond the heap (thread stacks, metaspace and other native structures) and the operating system requires some headroom as well, the maximum heap size for the application cannot exceed 8 GB. Having this requirement in place, we need to turn to the third configuration on which the test was run:
Heap | GC Algorithm | Useful work | Longest pause
---|---|---|---
-Xmx12g | -XX:+UseConcMarkSweepGC | 89.8% | 560 ms |
-Xmx12g | -XX:+UseParallelGC | 91.5% | 1,104 ms |
-Xmx8g | -XX:+UseConcMarkSweepGC | 66.3% | 1,610 ms |
The application is able to run on this configuration as
java -Xmx8g -XX:+UseConcMarkSweepGC Producer
but both the latency and especially the throughput numbers deteriorate drastically:
- GC now blocks the CPUs from doing useful work a lot more often: this configuration leaves only 66.3% of CPU time for useful work. As a result, throughput drops from the best-case 13,176,000 jobs/hour to a meager 9,547,200 jobs/hour (see the sanity check right after this list)
- Instead of 560 ms, we are now facing 1,610 ms of added latency in the worst case
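As a quick sanity check of the throughput figure above, we can assume that throughput scales linearly with the useful-work percentage. This assumption, along with the class and variable names below, is purely illustrative:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        // Best-case configuration from the table: -Xmx12g -XX:+UseParallelGC
        double bestCaseJobsPerHour = 13_176_000;
        double bestCaseUsefulWork = 0.915;
        // Capacity-constrained configuration: -Xmx8g -XX:+UseConcMarkSweepGC
        double constrainedUsefulWork = 0.663;

        // Scale the best-case throughput by the ratio of the useful-work percentages
        double estimate = bestCaseJobsPerHour * constrainedUsefulWork / bestCaseUsefulWork;
        System.out.printf("Estimated throughput: %,.0f jobs/hour%n", estimate);
        // Prints roughly 9,547,200 jobs/hour, matching the figure quoted above
    }
}
```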
Walking through these three dimensions, it becomes obvious that you cannot just optimize for “performance”; instead, you need to think in three separate dimensions, measuring and tuning both latency and throughput, and taking the capacity constraints into account.
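To arrive at numbers like the ones in the table, the pause data is typically collected from GC logs. As a minimal sketch, assuming a JDK 8 runtime (where the CMS and Parallel collectors used above are still available) and an illustrative log file name, the test run could be started with GC logging enabled:

```
java -Xmx8g -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -Xloggc:gc.log Producer
```

From such a log, the longest stop-the-world pause gives the worst-case latency figure, and the share of total run time not spent in those pauses approximates the “useful work” percentage. On JDK 9 and later, the unified -Xlog:gc* option replaces these flags.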