

How expensive is a method call in Java

February 19, 2013 by Nikita Salnikov-Tarnovski Filed under: Performance

We have all been there: looking at poorly designed code while listening to the author's explanations about how one should never sacrifice performance for design. And you just cannot convince the author to get rid of his 500-line methods, because chaining method calls would supposedly destroy the performance.

Well, it might have been true in 1996 or so. But since then the JVM has evolved into an amazing piece of software. One way to find this out is to look more deeply into the optimizations carried out by the virtual machine. The arsenal of techniques applied by the JVM is quite extensive, but let's look into one of them in more detail: method inlining. It is easiest to explain via the following sample:

int sum(int a, int b, int c, int d) {
   return sum(sum(a, b),sum(c, d));
}

int sum(int a, int b) {
   return a + b;
}

When this code is run, the JVM will figure out that it can replace it with the more efficient, so-called "inlined" version:

int sum(int a, int b, int c, int d) {
   return a + b + c + d;
}

Note that this optimization is done by the virtual machine and not by the compiler. It is not obvious at first why this decision was made – after all, if you look at the sample above, why postpone the optimization when compilation could produce more efficient bytecode right away? But considering other, not-so-obvious cases as well, the JVM is the best place to carry out this optimization:

  • The JVM is equipped with runtime data in addition to static analysis. During runtime the JVM can make better decisions based on which methods are executed most often, which loads are redundant, when it is safe to use copy propagation, etc.
  • The JVM also knows the underlying architecture – the number of cores, heap size and configuration – and can thus make the best selection based on this information.
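
You do not have to take my word for it either. Running javap -c on the compiled class shows that javac leaves all the invocations in the bytecode, while the HotSpot JVM can be asked to log its inlining decisions at runtime. A minimal sketch, assuming the sample above lives in a class called Summarizer – these are HotSpot diagnostic flags and the exact log format varies between JVM versions:

javap -c Summarizer
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Summarizer

Once sum() gets hot enough, you should see entries along the lines of "Summarizer::sum (4 bytes) inline (hot)" in the log.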

But let us see how those assumptions hold in practice. I have created a small test application which uses several different ways to add together 1024 integers:

  • A relatively reasonable one, where the implementation just iterates over the array containing 1024 integers and sums the elements together. This implementation is available in InlineSummarizer.java (a simplified sketch of it follows the list).
  • A recursion-based divide-and-conquer approach. I take the original 1024-element array and recursively divide it into halves – the first recursion depth thus gives me two 512-element arrays, the second depth four 256-element arrays, and so forth. In order to sum together all 1024 elements I introduce 1023 additional method invocations. This implementation is attached as RecursiveSummarizer.java (also sketched below).
  • A naive divide-and-conquer approach. This one also divides the original 1024-element array, but via calling additional instance methods on the separated halves – namely I nest sum512(), sum256(), sum128(), …, sum2() calls until I have summed all the elements. As with recursion, I introduce 1023 additional method invocations in the source code.
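
For illustration, here is roughly what the first two variants boil down to. This is a simplified sketch of mine, not the exact attached sources – in particular, it assumes the array length is a power of two, as in the 1024-element test:

// Iterative summing – a single method invocation per 1024-element sum:
int sumIterative(int[] elements) {
   int sum = 0;
   for (int i = 0; i < elements.length; i++) {
      sum += elements[i];
   }
   return sum;
}

// Recursive divide-and-conquer – 1023 invocations for a 1024-element array,
// called as sumRecursive(elements, 0, elements.length):
int sumRecursive(int[] elements, int from, int to) {
   if (to - from == 2) {
      return elements[from] + elements[from + 1];
   }
   int middle = (from + to) / 2;
   return sumRecursive(elements, from, middle)
        + sumRecursive(elements, middle, to);
}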

And I have a test class to run all those samples. The first results below are from runs where the JIT has not yet optimized the code:

[Figure: statistics before JIT optimization]

As seen from the above, the inlined code is the fastest, and the ones where we have introduced 1023 additional method invocations are slower by ~25,000 ns. But this image has to be interpreted with a caveat – it is a snapshot from runs where the JIT has not yet fully optimized the code. On my mid-2010 MacBook Pro that took between 200 and 3,000 runs, depending on the implementation.

The more realistic results are below. I ran all the summarizer implementations more than 1,000,000 times and discarded the runs where the JIT had not yet managed to perform its magic.

[Figure: statistics with JIT optimization]

We can see that even though the inlined code still performs best, the iterative approach also flies at a decent speed. Recursion, however, is notably different – while the iterative approach closes in with just 20% overhead, RecursiveSummarizer takes 340% of the time the inlined code needs to complete. Apparently this is a limitation to be aware of – when you use recursion, the JVM is helpless and cannot inline the method calls.

Recursion aside – method overheads are close to non-existent: just a 205 ns difference between the inlined code and the one with 1023 additional method invocations in its source. Remember, those are nanoseconds (10^-9 s) we are measuring in. So thanks to the JIT we can safely neglect most of the overhead introduced by method invocations.
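
For reference, the measurement itself is no rocket science. Conceptually the harness boils down to something like the following sketch – the warmup cutoff, array contents and method under test here are my assumptions, not the exact test class:

int[] elements = new int[1024];
java.util.Arrays.fill(elements, 1);

int warmup = 10000;     // discard early runs, before the JIT has kicked in (assumed cutoff)
int measured = 1000000;
long total = 0;
long sink = 0;          // consume the results so the JIT cannot eliminate the work
for (int run = 0; run < warmup + measured; run++) {
   long start = System.nanoTime();
   sink += sumIterative(elements);
   long elapsed = System.nanoTime() - start;
   if (run >= warmup) {
      total += elapsed;
   }
}
System.out.println("average ns per run: " + (total / measured) + ", sink: " + sink);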

The next time your coworker is hiding his lousy design decisions behind the claim that popping through a call stack is not efficient, send him through a small JIT crash course first. And if you wish to be well-equipped to block his future absurd statements, subscribe to either our RSS or Twitter feed and we will gladly provide you with future case studies.

Full disclosure: the inspiration for the test case used in this article came from Tomasz Nurkiewicz's blog post.



Comments

Thanks Nikita! Having worked with C for some time, even in Java code I always chose not to call a method in a for loop. Instead I chose to move the for loop into the method – that way there is only one method call.
The numbers captured above show the difference, though a negligible one of 205 ns. Isn't there additional work that the JVM has to do that is accounted for somewhere? What if all the other logic written makes the JVM do that work – will it not show an effect on the JVM's numbers?

R

Thanks Nikita. That was very insightful. It’s very hard to find such useful information these days.

Seth @ FBT

Feel free to suggest more topics you would like to read about 🙂

iNikem

I think you have been doing great work. I have been following your posts regularly.

Seth @ FBT

Thanks for yet another useful post. Thanks also for linking to Tomasz’ post and the JIT crash course – more brain food there 😉

Marcel Stör

Glad that you enjoyed it! Music to our ears – it motivates us to share more and better insights.

Ivo Mägi

I noticed that to capture your statistics you used “System.nanoTime();” and ran the test multiple times then found the average run for each pass to report the time a sample took to execute. Is there *nothing* better than this out there? (I ask because this is how I do things but I find it hard to believe that nobody has a better technique yet.)

Shawn Hartsock

Unfortunately I have never seen anything better than averaging multiple runs. And in Java there is not a much better source for elapsed time measurements than System.nanoTime().

iNikem

Cool paper, thanks for the tip!

Vladimir Šor

The problem with System.nanoTime() is that it is only as accurate as the underlying hardware clock, which differs between machines. You might think you are working with "ns" measurements while the hardware clock's step is actually something bigger by orders of magnitude ("ms"), thus making them inaccurate.

Kristiyan Marinov

Yes, you are right. But this is still the best way available to us on commodity hardware. Or can you propose a better way?

iNikem

Despite its drawbacks there’s nothing better I’m aware of.

Marcel Stör

ty.. as far as I experience, those lousy designs are hidden behind endless method calls and a bunch of AbstractFactoryBuilderDelegators.. 🙁

Anonymous

True. The naming pattern you refer to is something that was extremely common in the mid-2000s. Nobody took you seriously if you did not have at least three interfaces to implement and a few abstractions to inherit some random behaviour or state from. Plus a few factories here and there. But in recent years it seems to be less the case. Or maybe I have just been lucky.

Ivo Mägi

Isn’t there a method size / complexity threshold for in-lining?

Kim

There definitely is. The JIT can only inline calls when it is sure that doing so has no unwanted side effects; for complex and tightly coupled code this is usually not achievable. But – as the tests in our case show – even if you introduce more than a thousand unnecessary method calls, the overhead without the JIT interfering is ~25,000 nanoseconds. Which is still something that most applications won't notice.

Ivo Mägi

Is it possible that when we have a JVM implementation that supports tail-call optimization, we could make the recursive version comparable to the others? I have read somewhere that the IBM JVM implementation supports tail-call optimization 🙂

Maido Käära

This particular implementation is not tail-recursive – the last operation is "+", not the recursive call. But if the JVM gets tail-call optimisation, that would be really cool 🙂

iNikem