Plumbr Agents monitor JVM memory usage in three different ways:
- Pre-emptive monitoring to detect memory leaks in heap memory before such leaks escalate to become a threat to JVM availability
- Post-mortem capturing of memory dumps in case an OutOfMemoryError occurs (and no leaks have been detected prior to this)
- Permgen/Metaspace monitoring during application redeploy events to spot leaking classloaders
Memory leak detection
Memory leak detection monitors all object creation and collection events in order to detect patterns indicating that a data structure is growing because of a memory leak. When such a data structure is detected, Plumbr generates an incident containing the following information:
- The size of the leak (in MB) and the speed at which the leak is growing (in MB/h)
- Which objects are leaking
- What is currently referencing the leaked objects blocking them from being GC’d
- The line in source code where the leaking objects were created
This information allows you to zoom in on the underlying root cause in the source code and saves you from the tedious troubleshooting cycle of trying to reproduce the problem, gathering evidence, and wandering around the codebase to link that evidence to the root cause.
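To make the items in such an incident concrete, here is a minimal sketch of a typical leak pattern. The class and method names (`SessionCacheLeak`, `handleRequest`) are hypothetical and are not part of Plumbr. The static map plays the role of the reference holder that blocks collection, and the `new byte[1024]` line is the allocation point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class for illustration only; not part of Plumbr.
final class SessionCacheLeak {
    // The static map stays reachable for the lifetime of the class,
    // so every value stored in it is blocked from being garbage collected.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    // Allocation point: each call creates a buffer that is never removed.
    static void handleRequest(String sessionId) {
        CACHE.put(sessionId, new byte[1024]);
    }

    // Number of entries the map is still holding on to.
    static int retainedEntries() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest("session-" + i);
        }
        System.out.println("Entries still reachable: " + retainedEntries());
    }
}
```

In a case like this, the leaking objects would be the `byte[]` buffers, the referencing structure would be the static `CACHE` map, and the creation site would be the allocation inside `handleRequest`.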
OutOfMemoryError detection & analysis
When memory leak detection does not spot any abnormal data structure growth that would look like a memory leak, the second line of defense is to capture all OutOfMemoryError events and analyze the contents of memory when such an event occurs. When the event is captured, the Plumbr Agent’s native code takes a snapshot of statistics from JVM memory and sends it to the Plumbr Server for analysis. When the analysis completes, an incident is created containing the following information:
- What are the “fattest” data structures currently in memory (measured in MB)
- What is currently referencing such data structures, blocking them from being GC’d
- What these “fat” data structures consist of
- Where these data structures were created
Having this information allows you to quickly understand the most likely reason for the OutOfMemoryError being triggered. In the vast majority of cases, the culprit is staring right at you in one of the top three memory consumers.
The information might look somewhat similar to the dominator tree one could acquire via heap dumps, but on closer inspection you will see that Plumbr exposes a lot more information than one could capture via heap dumps (such as the allocation points and the full reference chain). In addition, the relevant information is presented in a much more user-friendly way, saving you from spending days trying to figure out why some byte arrays seem to occupy most of the heap in your heap dumps.
At the moment there are some limitations to this functionality:
- OutOfMemoryError detection is supported only for users of Plumbr’s SaaS service. On-Premises Plumbr users unfortunately cannot yet benefit from it.
- The feature is enabled only for heap sizes up to and including 8 GB. Work to support larger heap sizes is in progress.
- The feature is triggered only on “java.lang.OutOfMemoryError: Java heap space” and “java.lang.OutOfMemoryError: GC overhead limit exceeded” errors. Other OutOfMemoryErrors do not yet trigger this analysis.
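The heap size and error type limitations above can be expressed as a simple check. The sketch below is only an illustration of those two rules and does not show how the Plumbr Agent is implemented; the class `OomAnalysisEligibility` and its methods are hypothetical. It assumes the HotSpot detail messages “Java heap space” and “GC overhead limit exceeded”, and it uses `Runtime.maxMemory()` as an approximation of the configured heap size.

```java
// Hypothetical helper for illustration only; not part of Plumbr.
final class OomAnalysisEligibility {
    // Upper heap limit from the list above: 8 GB, inclusive.
    static final long MAX_HEAP_BYTES = 8L * 1024 * 1024 * 1024;

    // True only for the two error types that trigger the analysis.
    static boolean isSupportedError(OutOfMemoryError error) {
        String message = error.getMessage();
        return "Java heap space".equals(message)
                || "GC overhead limit exceeded".equals(message);
    }

    // True when the given maximum heap size is within the supported range.
    static boolean isSupportedHeap(long maxHeapBytes) {
        return maxHeapBytes <= MAX_HEAP_BYTES;
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will attempt to use.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (bytes): " + maxHeap);
        System.out.println("Heap size within limit: " + isSupportedHeap(maxHeap));
    }
}
```

An error such as “java.lang.OutOfMemoryError: Metaspace” would fail the first check, which matches the statement that other OutOfMemoryErrors do not yet trigger the analysis.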
Redeploy-time analysis for classloader leaks
Plumbr Agent monitors Permgen/Metaspace usage during redeploy events. These events are notorious for leaking memory because entire classloaders cannot be unloaded from the Permgen/Metaspace regions. When such a leak is detected, an incident is created, informing you about:
- The size of the leaked classloader
- Name of the leaked classloader
- Reference chain blocking this classloader from being unloaded
- Solution guidelines on how to fix the leak. In 65% of the cases, we also expose a known workaround or a reference to a patch fixing the issue. This is possible due to the fact that, more often than not, the classloader leak is triggered by a 3rd party library. If the library is widely used and we see such a leak happen often, we create solution guidelines specific to the leak and make them accessible to Plumbr users.
Having this information at your fingertips allows you to control your Permgen limits or avoid the uncontrollable growth of Metaspace, both of which add an unnecessary burden to your application’s capacity requirements.
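If you want to observe Permgen/Metaspace usage yourself around a redeploy, the standard `java.lang.management` API exposes it. The following is a minimal sketch and not a description of how the Plumbr Agent works; the class `ClassMetadataUsage` is hypothetical, and the pool names it matches (“Metaspace”, “Perm Gen”) are assumptions based on HotSpot naming that may differ on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only; not part of Plumbr.
final class ClassMetadataUsage {
    // Assumption: HotSpot names these pools "Metaspace" (Java 8+) or
    // "... Perm Gen" (Java 7 and earlier).
    static boolean isClassMetadataPool(String poolName) {
        return poolName.contains("Metaspace") || poolName.contains("Perm Gen");
    }

    // Sums the used bytes of all matching memory pools.
    static long usedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (isClassMetadataPool(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // null if the pool is no longer valid
                    total += usage.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Class metadata in use (bytes): " + usedBytes());
    }
}
```

Sampling `usedBytes()` before and after each redeploy gives a rough signal: if the value keeps climbing across redeploys and never drops back, an old classloader is probably still being referenced.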