Performance improvements with Plumbr

May 29, 2019 by Ivo Mägi Filed under: Blog Plumbr

Deciding to work on web application performance is a decision about tomorrow. Web application performance is a living, breathing exercise that requires engineering commitment and executive buy-in. To get it completely right, you need to fix what’s already broken, make sure it does not happen again, and then move on to the new issues with the biggest impact.

Several engineering teams wear their performance culture on their sleeve. The Wikimedia Foundation, Uber, and Netflix are the ones that come to mind immediately. They all use data from their application monitoring and/or real user monitoring setup to optimize the behaviour of their applications.

Improving web application performance is a shared responsibility, from executives to engineers and everyone in between. It is part technical and part cultural. Everyone needs to buy in to better performance and play their part in shaping the way forward for the company. Developer education is the first step.

By educating you about the numeric basis for performance, and tying this to what you see in Plumbr, our goal is to equip you with the techniques you need to understand the need for performance improvements and actually carry them out.

What is software performance?

There always seem to be some interactions in web applications that are slow and take too long. This experience might be inconsistent. The speed at which an application responds to its users is what defines performance. Performance often becomes important to engineers only in hindsight. It does not (yet) have the importance that quality or security enjoy.

With Plumbr, you can find out about these slow interactions before your users complain. Plumbr also allows you to characterize these slow interactions, and decide on how to act upon them.

Percentiles 101

To understand performance well, we need to express it in numeric form. Percentiles are the unit of choice, because they have two key characteristics:

1. They allow you to express a value numerically

2. They are built upon a comparative foundation

So percentiles are the best way to communicate about performance. They immediately provide a comparative measure against all users. The numeric basis allows us to fix targets and measure improvements objectively.

Here is an example of the usage of percentiles. Assume the following distribution of test scores in a class of 100 students:

With this insight, we can say the following:

  1. 99 out of 100 children scored 90 or below.
  2. 50 out of 100 children scored 60 or below.

This translates to:

  1. The 99th percentile of this class is 90.
  2. The 50th percentile of this class is 60.

Other equivalent interpretations:

  1. 99% of the whole class scored 90 or below.
  2. 50% of the whole class scored 60 or below.
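
To make the definition concrete, here is a minimal sketch of how a percentile can be computed with the nearest-rank method. The score list is a hypothetical one, built only so that it matches the two statements above; it is not the distribution from the original example, and nearest-rank is just one of several common percentile definitions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of all values are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]


# Hypothetical class of 100 students (illustrative data only).
scores = [40] * 20 + [60] * 30 + [75] * 30 + [90] * 19 + [100]

print(percentile(scores, 50))  # 60: half of the class scored 60 or below
print(percentile(scores, 99))  # 90: 99 of 100 students scored 90 or below
```

The same function works unchanged on response times: replace the scores with interaction durations and the 50th percentile becomes the median response time.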

Let’s tabulate this for a better understanding:

Let us now take a look at the information Plumbr exposes and see how your team can take advantage of this information. First, let’s take a look at the Performance summary card.

By collecting data about the interactions of your users, Plumbr exposes how long users spend waiting for the application to respond. This summary shows you:

  1. How many unique users used your application
  2. How many sessions were spawned
  3. How many interactions all these users performed

within the selected timeframe.

This summary card is read as follows:

  • A total of 34,243 users interacted with the application
  • In total, their usage spawned 41,781 sessions
  • The total duration that these sessions lasted was 1,337 hours
  • The number of interactions by these users was 53,812
  • The amount of time users spent idle, waiting for the web application to respond, was over 90 hours
  • This constitutes about 7% of the total usage time of the application

This allows you to judge whether users spending 7% of their time waiting for the application to respond is an acceptable level of performance. That is, it helps you answer the question: “Is your application fast enough?”
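
The 7% figure is simply the waiting time divided by the total session time. A minimal sketch of that arithmetic, using the numbers from the card above (the function name is made up for illustration, and 90 hours is a lower bound since the card reports "over 90 hours"):

```python
def waiting_share(waiting_hours, total_hours):
    """Fraction of total usage time that users spent waiting."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return waiting_hours / total_hours


# Figures taken from the summary card described above.
share = waiting_share(waiting_hours=90, total_hours=1337)
print(f"{share:.1%}")  # 6.7%, which the card rounds to about 7%
```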

Up next, let’s look at the percentiles tab on the same card. Here you have 5 data points about the application. These are interpreted as follows:

  • The application responds in under 2,587ms for 50% of all interactions
  • One in every 10 interactions takes 8.5s or more
  • One interaction out of 1,000 takes 8 minutes and 20 seconds or more to complete.

This information should give you exposure to how long your application is taking to respond to your users.

Combined, the summary and the percentiles tell you that:

  • The application took over 41 seconds to respond to roughly 530 interactions.
  • More than 27,000 interactions have a response time under 3 seconds.
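
These counts follow from multiplying the total number of interactions by the share of interactions that sits above a given percentile. A minimal sketch, assuming the 41-second figure is the 99th percentile (the text implies this but does not state it); the function name is hypothetical:

```python
def slower_than_percentile(total_interactions, p):
    """Approximate number of interactions slower than the p-th percentile."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    return round(total_interactions * (100 - p) / 100)


total = 53_812  # interactions reported on the summary card

print(slower_than_percentile(total, 99))  # 538: the "roughly 530" slowest interactions
print(slower_than_percentile(total, 50))  # 26906: half of all interactions are slower than the median
```

Turning percentiles into absolute counts like this makes it easier to judge how many real interactions a slow tail represents.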

This contextualized information should help you decide where you want to invest in performance improvements. Note that neither Plumbr nor any other monitoring solution can decide whether performance improvements are a priority. The information we expose and contextualize helps you make this decision.

This provides the first impression of the performance characteristics of your web applications. Once you gain this exposure, you will be able to measure, track, and have a comparative basis upon which to measure any investments you would make into improving performance of your applications.

First steps in performance management

We recommend that all our new customers start with a simple exercise called “let’s pick the low-hanging fruit”. This exercise builds on two sources of information:

  • Plumbr exposing the current performance of the system along with the bottlenecks impacting users the most. As a result, we can say that Plumbr has identified the biggest and juiciest fruits to pick. No more guessing, you can be sure that these issues are the ones annoying the users the most!
  • Your engineering team estimating the time it takes to patch each of the most impactful bottlenecks. As a result, you have an understanding of how high a particular fruit is hanging. So besides knowing which bottlenecks are impacting your users the most, you also know how many hours your engineering team needs to alleviate each of them!

The outcome of this exercise is a decision on which bottlenecks to optimize. This decision is based on objective evidence, so you can be certain that the investment in performance is made based on facts and not on rumors. This is especially relevant considering that 90% of the applications we monitor are impacted by more than 150 different bottlenecks. As it never makes sense to deal with all of them, it is really important to spend your precious engineering hours on bottlenecks that are both high in impact and low in cost to fix.
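
One way to turn "high in impact and low in cost" into a ranking is to divide the user waiting time a bottleneck causes by the estimated hours needed to fix it, and then pick from the top until the engineering budget runs out. The sketch below is only an illustration of that idea: the bottleneck names, numbers, and the greedy selection rule are all assumptions, not something Plumbr prescribes.

```python
from dataclasses import dataclass


@dataclass
class Bottleneck:
    name: str
    user_wait_hours: float     # impact, as reported by monitoring
    fix_estimate_hours: float  # cost, as estimated by engineering

    def __post_init__(self):
        if self.fix_estimate_hours <= 0:
            raise ValueError("fix_estimate_hours must be positive")

    @property
    def payoff(self):
        """User waiting hours removed per engineering hour invested."""
        return self.user_wait_hours / self.fix_estimate_hours


def low_hanging_fruit(bottlenecks, budget_hours):
    """Greedily pick the best payoff first, skipping anything over budget."""
    picked, remaining = [], budget_hours
    for b in sorted(bottlenecks, key=lambda b: b.payoff, reverse=True):
        if b.fix_estimate_hours <= remaining:
            picked.append(b)
            remaining -= b.fix_estimate_hours
    return picked


# Hypothetical bottlenecks, for illustration only.
candidates = [
    Bottleneck("slow search query", user_wait_hours=30, fix_estimate_hours=8),
    Bottleneck("unindexed checkout lookup", user_wait_hours=12, fix_estimate_hours=2),
    Bottleneck("third-party API timeout", user_wait_hours=20, fix_estimate_hours=40),
    Bottleneck("oversized image payload", user_wait_hours=5, fix_estimate_hours=4),
]

for b in low_hanging_fruit(candidates, budget_hours=16):
    print(f"{b.name}: {b.payoff:.2f} user hours saved per engineering hour")
```

A greedy pick like this is not guaranteed to be optimal for every budget, but it keeps the decision transparent and easy to discuss with the team.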

In our experience, this exercise results in a reduction of about 30% in the time your users spend waiting. The next step in our recommended course of action is to make sure you stay in control and are aware of situations where performance drops. For this, alerts based on the median and the 99.9th percentile are the way to go.

Tying performance to business objectives

The ultimate benefit of performance improvements is realized when you are able to tie them to the business metrics that matter to your company. From helping teams align behind a shared goal to helping report the progress and success of an engineering team, this correlation can aid you in important ways.

In a recent engagement with a media company, using Plumbr, we were able to prove that a 20% increase in software performance can help them gain up to 6% in engagement.

Here are some other stories about how performance improvements helped improve business goals:

Future-proofing

Perhaps the final step in making performance improvements is making sure that there is no relapse. Any improvement to a metric should be coupled with a means of making engineers aware of future degradations. If you have made the effort to improve a particular metric, you should be able to configure an alert that fires when performance slips past the new threshold.
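
The logic behind such an alert can be sketched in a few lines. This is a generic illustration of a threshold check on the median and the 99.9th percentile, not Plumbr's alerting API; the function names are hypothetical, and the limits reuse the 2,587ms and 8 minutes 20 seconds figures from the percentiles example purely as sample thresholds.

```python
import math
import statistics


def nearest_rank(values, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_thresholds(durations_ms, median_limit_ms, p999_limit_ms):
    """Return a list of alert messages; an empty list means all is well."""
    alerts = []
    median = statistics.median(durations_ms)
    p999 = nearest_rank(durations_ms, 99.9)
    if median > median_limit_ms:
        alerts.append(f"median {median:.0f} ms exceeds {median_limit_ms} ms")
    if p999 > p999_limit_ms:
        alerts.append(f"99.9th percentile {p999:.0f} ms exceeds {p999_limit_ms} ms")
    return alerts


# Hypothetical sample: most interactions are fast, a few are extremely slow.
sample = [200] * 990 + [600_000] * 10

for message in check_thresholds(sample, median_limit_ms=2_587, p999_limit_ms=500_000):
    print(message)
```

Checking both values matters: the median catches a general slowdown, while the 99.9th percentile catches a small group of users having a very bad experience even when the median looks healthy.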

Here’s a more detailed post about how alerting can be achieved using Plumbr.

On the journey towards making better performance a reality for your organization, our Customer Success team is committed to traveling along with you. We work with our customers in many ways: helping identify performance issues, reviewing data with your teams, helping you extract insights, and providing recommendations and best practices on what you can do to improve the performance of your applications. To take advantage of this, please write to csm@plumbr.io.

A closing remark – Simply buying a treadmill isn’t going to make you fit. Sweating it out is what matters.
