
Why is your software aging?

August 15, 2013 by Nikita Salnikov-Tarnovski Filed under: Programming

I recently stumbled upon the term "software aging". My first thoughts on the subject were not too positive, especially after reading the Wikipedia definition. "Just Another Buzzword" was the only thing resonating in my head. But after digging further into the concept, I started thinking a bit differently, even about our own product, which essentially offers protection against the outcomes of software aging. So I thought some of the concepts were worth sharing with you.

But let's start with what Wikipedia has to say about the subject:

Software aging refers to progressive performance degradation or a sudden hang/crash of a software system due to exhaustion of operating system resources, fragmentation and accumulation of errors.

This definition is boring as hell. But I think you all remember the days when your freshly booted Windows ran just fine, yet in just a few days became so sluggish that the only solution was a reboot. And in a year or so you needed a clean install, because reboots no longer helped.

Rebooting and reinstalling Windows is a good example that I guess most of you can easily relate to. Maybe you even agree with what David Lorge Parnas has said on the subject:

Programs, like people, get old. We can’t prevent aging, but we can understand its causes, take steps to limit its effects, temporarily reverse some of the damage it has caused, and prepare for the day when the software is no longer viable. We must lose our preoccupation with the first release and focus on the long term health of our products.

In this quote, Mr. Parnas also implies that legacy applications are more susceptible to aging. But regardless of the size of your code base, you are likely to suffer from different causes of software aging, such as:

  • Memory leaks (our current bread and butter)
  • Lock contention issues
  • Unreleased file handles
  • Memory/swap space bloat
  • Data corruption
  • Storage space fragmentation
  • Round off error accumulation

As the list is a bit too dry, I will try to enliven it with examples from the Java world, demonstrating the relevance (or irrelevance) of the causes.

Memory leaks. This is our current bread and butter – each day I face tens of different situations where applications are suffering from leaks. As a matter of fact, from our current data set of several thousand applications we see that roughly 50% of them contain one. The following sample illustrates the case.

The program reads one number at a time and calculates its square value. The implementation uses a primitive "cache" for storing the results of the calculation. But since these results are never read back from the cache, the code block represents a memory leak. If we let this program run and interact with users long enough, the "cached" results consume more and more memory. It is a good example of aging – the program could run for days before end users are affected.

import java.util.HashMap;
import java.util.Map;

public class Calc {
  // Results are written to the cache but never read back – a memory leak
  Map<Integer, Integer> cache = new HashMap<>();

  public int square(int i) {
     int result = i * i;
     cache.put(i, result);
     return result;
  }

  public static void main(String[] args) throws Exception {
     Calc calc = new Calc();
     while (true) {
        System.out.println("Enter a number between 1 and 100");
        int i = readUserInput(); // not shown
        System.out.println("Answer " + calc.square(i));
     }
  }
}
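A leak-free variant of the same idea, as a sketch only (the BoundedCalc name and the 1000-entry limit are my own choices, not from the original code): cap the cache with LinkedHashMap.removeEldestEntry so the least recently used entry is evicted once the size limit is reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCalc {
  static final int MAX_ENTRIES = 1000;

  // accessOrder = true turns the map into a simple LRU structure;
  // removeEldestEntry evicts the least recently used entry above the cap
  final Map<Integer, Integer> cache =
      new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  public int square(int i) {
    Integer cached = cache.get(i);
    if (cached != null) {
      return cached;
    }
    int result = i * i;
    cache.put(i, result);
    return result;
  }

  public static void main(String[] args) {
    BoundedCalc calc = new BoundedCalc();
    for (int i = 0; i < 5000; i++) {
      calc.square(i);
    }
    // Despite 5000 distinct inputs, memory use stays bounded
    System.out.println("cache size: " + calc.cache.size()); // prints "cache size: 1000"
  }
}
```

Production caches (Guava, Ehcache and the like) add time-based expiry and weight limits on top of this, but the principle is the same: bound the data structure so it cannot grow with uptime.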

Lock contention. You must all have been in a situation where the application behaves just fine for years, and then, after a small bump in load, threads start waiting behind synchronized blocks and are either starved or completely locked out.

The following sample serves as a textbook illustration of the case. The code works just fine until you launch two threads that attempt to run transfer(a, b) and transfer(b, a) at the same time, resulting in a deadlock. And again, you could happily run the code for months or years before a situation like this escalates into locked threads.

class Account {
  double balance;
  int id;

  void withdraw(double amount) {
     balance -= amount;
  }

  void deposit(double amount) {
     balance += amount;
  }

  void transfer(Account from, Account to, double amount) {
     synchronized (from) {
        synchronized (to) {
           from.withdraw(amount);
           to.deposit(amount);
        }
     }
  }
}
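One common cure for this particular deadlock, sketched below with names of my own invention (SafeAccount, the id-based ordering): always acquire the two locks in a globally consistent order, so that transfer(a, b) and transfer(b, a) can never each hold one lock while waiting for the other.

```java
class SafeAccount {
  double balance;
  final int id;

  SafeAccount(int id, double balance) {
    this.id = id;
    this.balance = balance;
  }

  void withdraw(double amount) { balance -= amount; }
  void deposit(double amount)  { balance += amount; }

  // Lock accounts in id order, regardless of transfer direction,
  // so two threads can never wait on each other's lock
  static void transfer(SafeAccount from, SafeAccount to, double amount) {
    SafeAccount first  = from.id < to.id ? from : to;
    SafeAccount second = from.id < to.id ? to : from;
    synchronized (first) {
      synchronized (second) {
        from.withdraw(amount);
        to.deposit(amount);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SafeAccount a = new SafeAccount(1, 100.0);
    SafeAccount b = new SafeAccount(2, 100.0);
    Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(a, b, 10.0); });
    Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(b, a, 10.0); });
    t1.start(); t2.start();
    t1.join(); t2.join();
    // Opposite-direction transfers cancel out, and no deadlock occurs
    System.out.println(a.balance + " " + b.balance); // prints "100.0 100.0"
  }
}
```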

Unreleased file handles. Again, I am sure you have been cursing while looking at something similar to the following, where a fellow developer has forgotten to close the resources after loading. The code might have run happily for months before the java.io.IOException: Too many open files message is thrown – another good case demonstrating the aging problem.

Properties p = new Properties();
try {
   p.load(new FileInputStream("my.properties"));
} catch (Exception ex) {
} finally {
   // no, I will NOT close the stream
}
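The fix is a try-with-resources block (Java 7+), which closes the stream when the block exits, even if load() throws. A minimal self-contained sketch; the Config class name, the file name and the greeting key are mine, written out only so the example runs on its own:

```java
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Properties;

public class Config {

  static Properties load(String path) throws IOException {
    Properties p = new Properties();
    // The stream is closed automatically when the block exits,
    // whether normally or via an exception
    try (FileInputStream in = new FileInputStream(path)) {
      p.load(in);
    }
    return p;
  }

  public static void main(String[] args) throws IOException {
    // Write a throwaway properties file so the example is runnable
    try (FileWriter w = new FileWriter("my.properties")) {
      w.write("greeting=hello\n");
    }
    Properties p = load("my.properties");
    System.out.println(p.getProperty("greeting")); // prints "hello"
  }
}
```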

Memory/swap space bloat. Modern operating systems tend to quickly page out memory that has not been touched for a while. So you might run into problems when you run out of physical memory and the OS starts swapping your heap. Things go from bad to worse due to garbage collection – a full GC requires the JVM to walk the object graph to identify every reachable object and detect garbage. While doing so, it touches every page in the application heap, triggering pages to be swapped in and out of memory.

Luckily, the effects are reduced in modern JVMs for several reasons, for example:

  • Most objects never escape the young generation, which is close to guaranteed to be resident in memory
  • Objects that are moved out of the young generation tend to be accessed frequently, which again tends to keep them resident in memory.

So you might have escaped this one, but I have seen GC cycles extended from a few hundred milliseconds to tens of seconds due to extensive swapping. So we again have a case where a perfectly well-behaving application with lazily loaded caches turns into a usability nightmare after a while due to memory bloat.
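If you suspect your GC cycles are stretching like this, the JVM's own management beans report how much wall-clock time each collector has consumed; cumulative GC time that keeps creeping upwards over days of uptime is a classic aging symptom. A monitoring sketch (the GcWatch name is mine):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatch {
  public static void main(String[] args) {
    long totalMs = 0;
    // One MXBean per collector (typically one young- and one old-generation collector)
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.println(gc.getName() + ": " + gc.getCollectionCount()
          + " collections, " + gc.getCollectionTime() + " ms total");
      totalMs += gc.getCollectionTime();
    }
    System.out.println("Cumulative GC time: " + totalMs + " ms");
  }
}
```

Logging this periodically and plotting it over a week tells you far more about aging than any single snapshot.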

Considering the samples above, I think you might agree with me that software does indeed age, just like humans do. And I am extremely glad that we have stepped in to the rescue – so far only to cure memory leaks, but I can hint that we have a lot of interesting things brewing in our labs. To stay tuned for the news, subscribe to our RSS feed or follow us on Twitter.

 


Comments

Lock contention is not really related to aging; it is about changing conditions that may cause deadlocks, e.g., adding more consumers to a queue.

Also, the memory leak example is quite bad. Sorry for saying this, but it is: you have dealt with hundreds (or maybe thousands) of memory leaks, and you chose as an example a piece of code that simply fills up a map over time. In practice people use proper caches with eviction policies and size/count limits. I would really consider a different example.

Memory/swap space bloat is spot on. One may assume it is related to memory leaks. Well, it might be, but it is a different thing. As an example, take exporting a huge .csv file from a database. The JVM will not crash; instead, there will not be enough free memory, the OS will start swapping, CPU usage will hit 100% very quickly, and the export will never complete (within meaningful time scales).

Roman Uruskyy

Hello, Roman. Thanks for your comments.

In my current view, software aging is a slow degradation of an application's runtime characteristics due to ongoing modifications and updates. As such, any change in an algorithm or business logic can lead to different problems, lock contention included. Adding one more business rule can increase a method's running time just enough to expose your contention and cause a domino effect.

The memory leak example here is of course very trivial. One reason is the constraints of a single blog post: it is hard to present and explain a real memory leak without boring your readers to death 🙂 But on the other hand, the majority of memory leaks are very simple underneath: some data structure grows unexpectedly large. How much ceremony surrounds it is just "an implementation detail" 🙂

Nikita