What is a memory leak?
When we talk to people about our solution for discovering memory leaks we immediately get positive feedback. But when we add Java into the equation, the initial excitement is often complemented with questions: “Are there memory leaks in Java? Isn’t Java a garbage-collected language?”
In this post I will explain why memory leaks are in fact a common problem for Java applications.
What is a “Memory leak” in Java?
Let us start by outlining the difference between memory management in Java and, for example, C. When a C programmer needs memory whose lifetime is not tied to a single function call, they have to manually allocate a region of memory where the value will reside. After the application finishes using that value, the region must be manually freed, i.e. the code freeing the memory has to be written by the developer. In Java, when a developer creates a new object with, e.g., new Integer(5), they do not have to allocate memory explicitly; the Java Virtual Machine (JVM) takes care of this. During the life of the application, the JVM periodically checks which objects in memory are still being used and which are not. Unused objects can be discarded and their memory reclaimed and reused. This process is called garbage collection, and the corresponding part of the JVM is called the Garbage Collector, or GC.
Java’s automatic memory management relies on GC which periodically looks for unused objects and removes them. And here hides the dragon. Simplifying a bit, we can say that a memory leak in Java is a situation where some objects are not used by the application any more, but GC fails to recognize them as unused. As a result, these objects remain in memory indefinitely, reducing the amount of memory available to the application.
Here I would like to stress one very important point: the notion of “object is not used by the application any more” is totally, absolutely, 100% application-specific! Apart from some specific cases, where the lifespan of the object can be logically determined (such as the local variable of the method, which does not under any circumstances escape the method), object usage can be understood only by the application developer taking into account all usage patterns of the application.
How can GC distinguish between the unused objects and the ones the application will use at some point in time in the future? The basic algorithm can be described as follows:
- There are some objects which GC considers “important”. These are called GC roots and are (almost) never discarded. They include, for example, the local variables and input parameters of currently executing methods, application threads, references from native code and similar “global” objects.
- Any object referenced from those GC roots is assumed to be in use and is not discarded. One object can reference another in different ways in Java, the most common being when object A is stored in a field of object B. In that case, we say “B references A”.
- The above process is repeated until all objects that can be transitively reached from GC roots are visited and marked as “in use”.
- Everything else is unused and can be thrown away.
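The steps above can be sketched as a small graph traversal. This is only a toy model of the idea, not how any real JVM collector is implemented: the Node class, the mark method and the object names are all hypothetical and exist purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {

    /** Hypothetical stand-in for a heap object: a name plus outgoing references. */
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns every node that can be transitively reached from the given roots. */
    static Set<Node> mark(List<Node> roots) {
        // Identity-based set: two nodes are the same only if they are the same object.
        Set<Node> inUse = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (inUse.add(current)) {               // first visit: mark as "in use"
                pending.addAll(current.references); // then follow its references
            }
        }
        return inUse;
    }

    public static void main(String[] args) {
        Node root = new Node("root");     // plays the role of a GC root
        Node cache = new Node("cache");   // referenced by the root
        Node entry = new Node("entry");   // referenced by the cache, never read again
        Node orphan = new Node("orphan"); // nothing references it

        root.references.add(cache);
        cache.references.add(entry);

        Set<Node> inUse = mark(List.of(root));
        for (Node n : List.of(root, cache, entry, orphan)) {
            System.out.println(n.name + " reachable: " + inUse.contains(n));
        }
    }
}
```

In this sketch only "orphan" is reported as unreachable. The "entry" node is still marked as in use even though the application never reads it again, which is exactly the situation the definition of a Java memory leak describes.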
Now, it is fairly easy to construct a Java program that satisfies the above definition of a memory leak:
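import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class Calc {
    // Results are stored here but never read back, so the map only ever grows.
    private final Map<Integer, Integer> cache = new HashMap<>();

    public int square(int i) {
        int result = i * i;
        cache.put(i, result);
        return result;
    }

    public static void main(String[] args) throws Exception {
        Calc calc = new Calc();
        Scanner in = new Scanner(System.in);
        while (true) {
            System.out.println("Enter a number between 1 and 100");
            int i = in.nextInt(); // stands in for readUserInput(), which the original did not show
            System.out.println("Answer " + calc.square(i));
        }
    }
}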
This program reads one number at a time from its user and calculates its square. The implementation uses a primitive “cache” for storing the results of the calculation. But since these results are never read from the cache, the code represents a memory leak according to our definition above. If we let this program run and interact with users long enough, the “cached” results can consume a lot of memory: assuming the input is not actually restricted to a small range, every distinct number adds an entry that is never removed.
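For contrast, here is one possible way to keep the cache in the Calc example above from growing without bound. It is a minimal sketch, not the only fix: the class name BoundedCalc, the cacheSize helper and the limit of 100 entries are assumptions made for illustration. It relies on the standard LinkedHashMap.removeEldestEntry hook to evict the least recently accessed entry, and it actually reads from the cache before computing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedCalc {
    // Illustrative limit; a real application would choose this based on its own needs.
    static final int MAX_ENTRIES = 100;

    // Access-ordered LinkedHashMap that drops its eldest entry once the limit is exceeded.
    private final Map<Integer, Integer> cache =
            new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    int square(int i) {
        Integer cached = cache.get(i);
        if (cached != null) {
            return cached; // the cache is now actually used
        }
        int result = i * i;
        cache.put(i, result);
        return result;
    }

    int cacheSize() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedCalc calc = new BoundedCalc();
        for (int i = 0; i < 10_000; i++) {
            calc.square(i);
        }
        // The map never holds more than MAX_ENTRIES results.
        System.out.println("Cache entries: " + calc.cacheSize());
    }
}
```

After 10,000 distinct inputs the cache still holds only 100 entries, so the memory it occupies stays bounded no matter how long the program runs.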
This brings us to another important aspect of memory leaks: how big should the leak be to justify the trouble of investigating and fixing it? Technically, whenever you leave an object that you don’t use anymore lying around, you create waste. Practically, a couple of kilobytes here and there don’t really constitute real problems for modern applications, especially the “enterprise” ones … But a leak is a leak, even if it’s just 200 bytes.
Which leads us to a simple corollary: a memory leak is like fine wine, it needs aging. If you want to demonstrate the leak or, more importantly, fix it, you really should let it grow. Tiny memory leaks get lost among all the objects that are present in an application at any given point in time. Regardless of the tool you use to identify memory leaks (be it a profiler, a memory dump analyzer, an APM, or a special-purpose leak finder like Plumbr), there should be a lot of objects that have outlived their usefulness. This means that your application should run for a significant period of time AND that as many different parts of it as possible should be executed. Otherwise you will be looking for a needle in a haystack.
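To get a rough feel for how a leak “ages”, one low-tech option is to log the used heap over time and watch the trend. The sketch below does this with the standard Runtime API. The class name HeapTrend, the retained list and the allocation sizes are made up for illustration, and the readings are noisy because they depend on when the GC last ran, so treat them as a trend indicator rather than a precise measurement.

```java
import java.util.ArrayList;
import java.util.List;

class HeapTrend {

    /** Approximate number of heap bytes currently in use by this JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Simulates objects the application has forgotten about but still references.
        List<byte[]> retained = new ArrayList<>();

        for (int round = 1; round <= 5; round++) {
            for (int i = 0; i < 1_000; i++) {
                retained.add(new byte[1024]); // roughly 1 KB "leaked" per iteration
            }
            System.out.printf("round %d: %d objects retained, ~%d KB heap in use%n",
                    round, retained.size(), usedHeapBytes() / 1024);
        }
    }
}
```

With only a few megabytes retained, the growth can easily be hidden by normal allocation noise, which is the point made above: the longer the leak is allowed to grow, the easier it is to tell apart from everything else on the heap.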
If you would like to know more about Java memory leaks, especially about the different ways to hunt them down and fix them in your applications, check out our series of blog posts, titled “Solving OutOfMemoryError”. And stay tuned to our twitter @JavaPlumbr. Till next time!
Comments
Awesome explanation
You said: GC root objects are, for example, currently executing method’s local variables and input parameters, application threads, references from native code and similar “global” objects.
So if I have the following code snippet:
methodA() {
    List list = new LinkedList();
}
do you mean the list would never be discarded even after it goes out of scope? Could you please clarify this point with some examples?
It will be discarded. Because when this local variable, “list”, goes out of the scope, it will not qualify as “local variable” anymore. It will be popped out of the execution stack and so GC can reclaim it.
A memory leak is when no pointers/references to an allocated block of memory exist. Secondly, in C and C++ you don’t need to allocate every variable on the heap. There is a stack, you know?

The most important thing to note is, that when you want caching in Java, you don’t use a direct reference to an object, that’s what is wrong with your example. Java has WeakReference, which tells the GC to keep the object alive, but to reclaim it when there is a lack of memory.
Errr, that’s not what a weak reference is. A weak reference tells the GC that you want to keep a reference to the object but that it should be collected as soon as all of the strong references to the object are removed.

What you are describing is a soft reference, which is like a weak reference except that it also indicates that you would like the GC to be pessimistic about discarding the object and only do so if absolutely necessary.
You are right. We shouldn’t do such a simplistic “caching”. We should use Weak/SoftReferences, we should define expiration policy, we should restrict the size of your cache. But we also should write bug-free code 🙂
This is more an example of a space leak rather than a memory leak because you still have the reference calc pointing to the object in the heap and the object size is growing with every loop…
Yes, but that is a memory leak in Java world. Or can you give your definition and an example of memory leak, which is not “space leak”?
http://stackoverflow.com/questions/6470651/creating-a-memory-leak-with-java
Look below for the definition of a space leak. A memory leak is when you lose the pointer to an object in the heap, which is not the case here but can happen in languages like C and C++.

http://encyclopedia2.thefreedictionary.com/space+leak
It seems to me that the difference between a memory leak and a space leak is entirely a notional difference that has to do with the specific boundaries that Java places on application code versus runtime code.

From the outside looking in, it makes little difference whether or not there is a Java reference pointing to some memory, or if it is actually “unreachable,” a term that is only defined in terms of certain programming constructs. Even a classic malloc()/free() memory leak in a C program leaves memory that is “reachable” via libc memory allocator internal structures.

So a memory leak and a space leak exhibit the exact same behavior, and are only different when considered in terms of abstract implementation specifics. Calling unused reachable memory in Java a memory leak sounds totally reasonable to me.
The JRockit memory leak detector, which will eventually make it into HotSpot, is an invaluable tool in situations like this. Here is a free chapter from the book “Oracle JRockit – the definitive guide” by Marcus Lagergren & Marcus Hirt that goes into detail about this powerful tool. The chapter also talks about memory leaks in greater depth and is recommended further reading if you liked this blog post.

http://www.packtpub.com/sites/default/files/8068-chapter-10-the-memory-leak-detector.pdf