Native memory leak example
We have written quite a lot about memory leaks in Java. The pattern that confirms the presence of a heap memory leak is the growth of used heap memory after major GC events: each major GC frees less and less memory, exposing a clear growth trend.
There is, however, a different type of memory leak affecting Java deployments out there. This leak happens in native memory, and you will notice no clear trend when monitoring the different memory pools within the JVM. The symptoms are a perfectly healthy chart of heap and permgen consumption, as seen below, coupled with a continuous increase of the total memory used by the Java process at the operating system level:
Example
As I recently stumbled upon a problem where native memory leakage proved to be the culprit, I decided to share the details, giving you an example of how such leaks can actually happen in the real world. I was able to reduce the example to a simple piece of code that just loads and transforms classes:
public static void main(String[] args) throws InterruptedException {
    final BottomlessClassLoader loader = new BottomlessClassLoader();
    while (true) {
        loader.loadAnotherClass();
        Thread.sleep(100);
    }
}
So that is all there is: an infinite loop, just loading classes via the loadAnotherClass() method of the BottomlessClassLoader.
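The BottomlessClassLoader itself is not listed in this post. A minimal sketch of what such a loader could look like, assuming the ASM library is used to generate empty classes (the package, class and field names here are hypothetical), might be:

import java.util.ArrayList;
import java.util.List;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Opcodes;

public class BottomlessClassLoader extends ClassLoader {
    // Strong references keep every generated class alive, so class data can only accumulate.
    private final List<Class<?>> pinned = new ArrayList<>();
    private int counter = 0;

    public Class<?> loadAnotherClass() {
        String name = "generated.Bloat" + counter++;
        // Generate the bytecode of an empty class extending java.lang.Object.
        ClassWriter cw = new ClassWriter(0);
        cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, name.replace('.', '/'),
                null, "java/lang/Object", null);
        cw.visitEnd();
        byte[] bytes = cw.toByteArray();
        Class<?> clazz = defineClass(name, bytes, 0, bytes.length);
        pinned.add(clazz);
        return clazz;
    }
}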
Now let us launch this code in two different ways:
- The first launch just generates classes and keeps the references, essentially piling up class definitions in memory.
- The second launch attaches a javaagent and is a tad more complex: it generates classes just like the first launch, but additionally runs the bytecode through a transformer registered in the agent's premain method:
import java.lang.instrument.Instrumentation;

public class BloatedAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer((loader, name, clazz, pd, originalBytes) -> originalBytes, true);
    }
}
The transformer is special in that it does not actually apply any transformation: it returns the original bytes of the class unchanged.
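For reference, assuming the demo class is named Main and the agent is packaged into bloated-agent.jar with a Premain-Class: BloatedAgent manifest entry (all names here are placeholders), the two launches would look roughly like:

$ java Main
$ java -javaagent:bloated-agent.jar Main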
As the next step, the memory usage of both launches was monitored at the operating system level. In both cases, the memory usage of the Java process was captured at regular intervals using the
$ top -R -l 0 -stats mem,time -pid <pid>
command, resulting in the data shown in the following chart:
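Note that the flags above belong to the macOS flavor of top; on Linux, a rough equivalent watching the same process in batch mode would be:

$ top -b -d 5 -p <pid>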
Understanding the problem
What we see from the above is that the second launch consumes a lot more memory. This is surprising: if you recall, the transformer does not actually transform the class, returning the original bytecode instead. So one might expect the memory consumption of both launches to be identical.
The first part of the answer starts to make sense when you think about where class definitions are stored. After all, shouldn't the class definitions reside in permgen/metaspace, and wouldn't monitoring permgen then be sufficient to detect this particular issue?
Apparently not. Whenever we return a non-null value from the transform method, the JVM assumes that the class was modified in some way. Additionally, when we set the canRetransform parameter (the second argument to addTransformer, right after the lambda) to true in the agent's premain method, the JVM expects that at some point we will attempt to retransform the class, applying a different transformation. As a result, the original non-transformed bytecode is kept by the JVM "just in case".
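To make the retransformation scenario concrete: a retransform-capable agent typically keeps the Instrumentation reference around so it can later trigger exactly such a retransformation, at which point the JVM replays the cached bytes through the transformer chain. A hypothetical sketch (class and method names are mine, not from the post; requires Can-Retransform-Classes: true in the agent manifest):

import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;

public class RetransformingAgent {
    private static Instrumentation inst;

    public static void premain(String agentArgs, Instrumentation instrumentation) {
        inst = instrumentation;
        // canRetransform = true: the JVM now has to keep the original bytes around.
        inst.addTransformer((loader, name, clazz, pd, bytes) -> bytes, true);
    }

    // Called later: the JVM re-runs all retransform-capable transformers,
    // starting from the class bytes it cached at load time.
    public static void retransform(Class<?> target) throws UnmodifiableClassException {
        inst.retransformClasses(target);
    }
}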
This approach, weird at first glance, starts to make sense when you think about classloaders for which loading is an expensive operation, say, a network classloader. You would not want to go through the trouble of fetching the very same bytes once again. Therefore, the JVM caches the original bytecode of the class. It does not store it in metaspace or permgen, but in its own native memory. As a result, you would not see any growth in either heap or permgen/metaspace; you would only notice the problem when monitoring native memory consumption.
The second part of the answer is hidden in the java.lang.instrument.ClassFileTransformer Javadoc, where it is clearly stated that in cases where no transformation is actually applied, the transform() method should return null. In that case the JVM knows the class was not transformed, and there is no need to store an additional copy of the bytecode in native memory.
So the fix was as easy as making the transformer return null instead of the original bytes. But was the issue easy to troubleshoot? No way: it cost me three days of my life that I will never get back. I can only hope that sharing this knowledge saves someone from going through the same mess in the future.
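For completeness, a sketch of the fixed agent, identical to the original except for the returned value (the class name here is mine):

import java.lang.instrument.Instrumentation;

public class FixedAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // Returning null tells the JVM that no transformation was applied,
        // so no extra copy of the original bytecode needs to be cached.
        inst.addTransformer((loader, name, clazz, pd, originalBytes) -> null, true);
    }
}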
Comments
Hi Gleb,
Thanks for sharing this post. It was very helpful.
We recently came across a “Native Memory Leak” on our production servers. I looked at the premain and agentmain classes and methods, and there is nothing suspicious there. We didn't change anything; it ships by default with the Oracle WebLogic installation.
Could you please tell me what else could be the root cause of this issue?
Below is our javaagent code:
package weblogic.diagnostics.debugpatch.agent;
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import weblogic.diagnostics.debug.DebugLogger;
import weblogic.diagnostics.utils.SecurityHelper;
public class DebugPatchAgent {
    private static final DebugLogger DEBUG_LOGGER = DebugLogger.getDebugLogger("DebugDebugPatches");
    private static Instrumentation singleton;

    public DebugPatchAgent() {}

    public static void premain(String agentArguments, Instrumentation instrumentation) {
        singleton = instrumentation;
    }

    public static void agentmain(String args, Instrumentation inst) {
        premain(args, inst);
    }

    public static boolean isRedefineClassesSupported() {
        return singleton != null ? singleton.isRedefineClassesSupported() : false;
    }

    public static void redefineClasses(ClassDefinition[] classDefs)
            throws ClassNotFoundException, UnmodifiableClassException, IllegalAccessException {
        if (!isRedefineClassesSupported()) {
            if (DEBUG_LOGGER.isDebugEnabled()) {
                DEBUG_LOGGER.debug("DebugPatchAgent Class redefinition is not supported");
            }
            return;
        }
        singleton.redefineClasses(classDefs);
    }
}
Below is our Manifest file details:
Manifest-Version: 1.0
Premain-Class: weblogic.diagnostics.debugpatch.agent.DebugPatchAgent
Agent-Class: weblogic.diagnostics.debugpatch.agent.DebugPatchAgent
Can-Redefine-Classes: true
Implementation-Title: debugpatch-agent
Implementation-Version: 12.2.1.2
Implementation-Vendor: Oracle, Inc.
Hi Saish. What makes you believe that the production servers are experiencing a native memory leak? In any case, I would recommend starting the troubleshooting by enabling native memory tracking for the affected JVM; see https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html for details.
Thanks, Gleb, for the useful article. I am wondering whether this issue significantly affects Java deployments these days.
Thank you, Gleb. I am wondering where in native memory the JVM caches the original bytecode of the class. And if an additional copy of the bytecode is stored, where would it be?
Hi Vitaly, thanks for your response. I'm not sure I fully understand what you mean by "where", though. But you might want to take a look at the _cached_class_file in instanceKlass [1]. It's set from within jvmtiRedefineClasses [2].
[1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/7eab471aeaf0/src/share/vm/oops/instanceKlass.hpp#l751
[2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/7eab471aeaf0/src/share/vm/prims/jvmtiRedefineClasses.cpp#l3997
Hi Gleb,
Thank you again.
Since JVMTI should support multiple independent and simultaneous agents, does the usage of two agents increase the probability of the native memory leak described above?
Will -XX:NativeMemoryTracking=detail on the command line help?
Hi Vitaly,
Yes, I do believe that having multiple agents attached results in a greater probability of such a leak manifesting itself.
Yes, native memory tracking will help; just check the `Internal` section in the output.
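For reference, enabling native memory tracking and taking a snapshot looks roughly like this (the application and agent jar names are placeholders; the flag and the jcmd subcommand are standard HotSpot tooling):

$ java -XX:NativeMemoryTracking=detail -javaagent:agent.jar -jar yourapp.jar
$ jcmd <pid> VM.native_memory summary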