Before we can start covering Garbage Collection in its modern form, let’s do a quick recap of days where you had to manually and explicitly allocate and free memory for your data. And if you ever forgot to free it, you would not be able to reuse the memory. The memory would be claimed but not used. Such a scenario is called a memory leak.
Here is a simple example written in C using manual memory management:
int send_request() {
size_t n = read_size();
int *elements = malloc(n * sizeof(int));
if(read_elements(n, elements) < n) {
// elements not freed!
return -1;
}
// …
free(elements)
return 0;
}
As we can see, it is fairly easy to forget to free memory. Memory leaks used to be a lot more common problem than today. You could only really fight them by fixing your code. Thus, a much better approach would be to automate the reclamation of unused memory, eliminating the possibility of human error altogether. Such automation is called Garbage Collection (or GC for short).
Smart Pointers
One of the first ways to automate this was by using destructors. For instance, we could do the same thing in C++ using vector, the destructor of which will be automatically called when it’s no longer in scope:
int send_request() {
size_t n = read_size();
vector<int> elements = vector<int>(n);
if(read_elements(elements.size(), &elements[0]) < n) {
return -1;
}
return 0;
}
But in more complex cases, especially when sharing objects across multiple threads, just the destructor will not be sufficient. In comes the simplest form of garbage collection: reference counting. For each object, you simply know how many times it is referred to and when that count reaches zero the object can be safely reclaimed. A well-known example of that would be the shared pointers of C++:
int send_request() {
size_t n = read_size();
auto elements = make_shared<vector<int>>();
// read elements
store_in_cache(elements);
// process elements further
return 0;
}
Now, to avoid reading the elements next time the function is called, we may want to cache them. In such a case, destroying the vector when it’s out of scope is not an option. Therefore, we make use of shared_ptr. It keeps track of the number of references to it. This number increases as you pass it around and decreases as it leaves scope. As soon as the number of references reaches zero, the shared_ptr automatically deletes the underlying vector.