Why does my Java process consume more memory than Xmx?
Some of you have been there. You have added -Xmx option to your startup scripts and sat back relaxed knowing that there is no way your Java process is going to eat up more memory than your fine-tuned option had permitted. And then you were in for a nasty surprise. Either by yourself when checking a process table in your development / test box or if things got really bad then by operations who calls you in the middle of the night telling that the 4G memory you had asked for the production is exhausted. And that the application just died.
So what the heck is happening under the hood? Why is the process consuming more memory than you allocated? Is it a bug or something completely normal? Bear with me and I will guide you through what is happening.
First of all, part of it can definitely be a malicious native code leaking memory. But in 99% of the cases it is completely normal behaviour by the JVM. What you have specified via the -Xmx switches is limiting the memory consumed by your application heap.
Besides heap there are other regions in memory which your application is using under the hood – namely permgen and stack sizes. So in order to limit those you should also specify the -XX:MaxPermSize and -Xss options respectively. In short, you can predict your application memory usage with the following formula
Max memory = [-Xmx] + [-XX:MaxPermSize] + number_of_threads * [-Xss]
But besides the memory consumed by your application, the JVM itself also needs some elbow room. The need for it derives from several different reasons:
Garbage collection. As you might recall, Java is a garbage-collected language. In order for the garbage collector to know which objects are eligible for collection, it needs to keep track of the object graphs. So this is one part of the memory lost to this internal bookkeeping. G1 is especially known for its excessive appetite for additional memory, so be aware of this.
JIT optimization. Java Virtual Machine optimizes the code during runtime. Again, to know which parts to optimize it needs to keep track of the execution of certain code parts. So again, you are going to lose memory.
Off-heap allocations. If you happen to use off-heap memory, for example while using direct or mapped ByteBuffers yourself or via some clever 3rd party API then voila – you are extending your heap to something you actually cannot control via JVM configuration.
JNI code. When you are using native code, for example in the format of Type 2 database drivers, then again you are loading code in the native memory.
Metaspace. If you are an early adopter of Java 8, you are using metaspace instead of the good old permgen to store class declarations. This is unlimited and in a native part of the JVM.
You can end up using memory for other reasons than listed above as well, but I hope I managed to convince you that there is a significant amount of memory eaten up by the JVM internals. But is there a way to predict how much memory is actually going to be needed? Or at least understand where it disappears in order to optimize?
As we have found out via painful experience – it is not possible to predict it with reasonable precision. The JVM overhead can range from anything between just a few percentages to several hundred %. Your best friend is again the good old trial and error. So you need to run your application with loads similar to production environment and measure it.
Did you know that 20% of Java applications have memory leaks? Don’t kill your application – instead find and fix leaks with Plumbr in minutes.
Measuring the additional overhead is trivial – just monitor the process with the OS built-in tools (top on Linux, Activity Monitor on OS X, Task Manager on Windows) to find out the real memory consumption. Subtract the heap and permgen sizes from the real consumption and you see the overhead posed.
Now if you need to reduce to overhead you would like to understand where it actually disappears. We have found vmmap on Mac OS X and pmap on Linux to be a truly helpful tools in this case. We have not used the vmmap port to Windows by ourselves, but it seems there is a tool for Windows fanboys as well.
The following example illustrates this situation. I have launched my Jetty with the following startup parameters:
-Xmx168m -Xms168m -XX:PermSize=32m -XX:MaxPermSize=32m -Xss1m
Knowing that I have 30 threads launched in my application I might expect that my memory usage does not exceed 230M no matter what. But now when I look at the Activity Monitor on my Mac OS X, I see something different
The real memory usage has exceeded 320M. Now digging under the hood how the process with the help of the vmmap <pid> output we start to understand where the memory is disappearing. Lets go through some samples:
The following says we have lost close to 2MB that is lost to memory-mapped rt.jar library.
mapped file 00000001178b9000-0000000117a88000 [ 1852K] r--/r-x SM=ALI /Library/Java/JavaVirtualMachines/jdk1.7.0_21.jdk/Contents/Home/jre/lib/rt.jar
Next section explains that we are using ~6MB for a particular Dynamic Library that we have loaded
__TEXT 0000000104573000-0000000104c00000 [ 6708K] r-x/rwx SM=COW /Library/Java/JavaVirtualMachines/jdk1.7.0_21.jdk/Contents/Home/jre/lib/server/libjvm.dylib
And here we have threads no 25-30 each allocating 1MB for their stacks and stack guards
Stack 000000011a5f1000-000000011a6f0000 [ 1020K] rw-/rwx SM=ZER thread 25 Stack 000000011aa8c000-000000011ab8b000 [ 1020K] rw-/rwx SM=ZER thread 27 Stack 000000011ab8f000-000000011ac8e000 [ 1020K] rw-/rwx SM=ZER thread 28 Stack 000000011ac92000-000000011ad91000 [ 1020K] rw-/rwx SM=ZER thread 29 Stack 000000011af0f000-000000011b00e000 [ 1020K] rw-/rwx SM=ZER thread 30
STACK GUARD 000000011a5ed000-000000011a5ee000 [ 4K] ---/rwx SM=NUL stack guard for thread 25 STACK GUARD 000000011aa88000-000000011aa89000 [ 4K] ---/rwx SM=NUL stack guard for thread 27 STACK GUARD 000000011ab8b000-000000011ab8c000 [ 4K] ---/rwx SM=NUL stack guard for thread 28 STACK GUARD 000000011ac8e000-000000011ac8f000 [ 4K] ---/rwx SM=NUL stack guard for thread 29 STACK GUARD 000000011af0b000-000000011af0c000 [ 4K] ---/rwx SM=NUL stack guard for thread 30
I hope I managed to shed some light on the tricky task of predicting and measuring the actual memory consumption. If you enjoyed the content – subscribe to our RSS feed or start following us on Twitter to be notified on future posts of interest.