Alerting When Applications Are Slow
We have all had slow and frustrating experiences with applications. Whether it is a website that is slow to load or an application that takes forever to respond, one thing is certain: users dislike slow experiences. Several studies and stories show that faster applications and websites, which offer better user experiences, are more effective at achieving desired outcomes.

Plumbr’s core mission is to help make your software faster: not just when testing in controlled environments, but in production, where actual users are affected. The steps towards building faster software are gaining awareness, creating appropriate baselines, and enabling alerts that keep you in control of degradation.
Today, we’re happy to announce an important milestone in our ability to aid this journey for thousands of engineering teams around the world! We are introducing the ability to alert when application performance degrades below a threshold. Poorly performing applications mean slower response times. Plumbr allows you to configure alerts that will tell you when applications respond too slowly for your users.

Don’t all applications (or parts of them) always respond slowly for some users? Yes. This is an unfortunate truth that engineering teams have to come to terms with. Depending on which slice of clients is exhibiting slow behaviour (OS, browser version, location, device, or demographic) and on how many users are affected, you should be able to decide if and when to invest in making the application faster.
Plumbr alerts users when the median latency rises above a threshold time set for the application. We chose median latency because it is a good measure of whether a large volume of users is affected by poor speeds. When the median exceeds the threshold, at least half of the measured interactions were slower than the threshold, which is a clear indication that a large share of your users are seeing a degraded experience.
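To make the reasoning concrete, here is a small illustrative sketch in Python. The latency samples and the 3000-millisecond threshold are made-up values for this example, not Plumbr data, and the code is not Plumbr's implementation. It compares the mean and the median of two samples: one where a couple of extreme outliers skew the mean, and one where most interactions are slow.

```python
from statistics import mean, median

THRESHOLD_MS = 3000  # hypothetical threshold, chosen for this example

# Hypothetical latencies in milliseconds.
# Eight fast interactions and two extreme outliers.
mostly_fast = [200, 220, 250, 260, 300, 310, 320, 350, 15000, 20000]
# Two fast interactions, eight slow ones.
mostly_slow = [200, 250, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800]


def summarize(sample):
    """Return (mean, median, median_exceeds_threshold) for a latency sample."""
    m = median(sample)
    return mean(sample), m, m > THRESHOLD_MS


for name, sample in [("mostly_fast", mostly_fast), ("mostly_slow", mostly_slow)]:
    avg, med, alert = summarize(sample)
    print(f"{name}: mean={avg:.0f} ms, median={med:.0f} ms, alert={alert}")
```

In the first sample the mean is above the threshold even though eight out of ten interactions were fast, while the median stays low. In the second sample the median crosses the threshold because most interactions really were slow.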
An example alerting threshold is: “If the median latency on the interactions of the Invoice Generator in the last 30 minutes exceeds 3000 milliseconds, provided there have been 20 interactions”. Let’s deconstruct this threshold to understand what Plumbr is doing. First, we anchor two data points as frames of reference: (a) a 30-minute time window and (b) a minimum of 20 interactions. Together they provide the most noise-free signal. The 30-minute window is a rolling one with 1-minute granularity: every minute, the trigger checks the condition over the preceding 30 minutes. The 20-interaction minimum prevents spurious signals, that is, false positives during periods of low application usage. The key parameter is 3000 milliseconds, the acceptable level of delay for the application. An alert is sent if the median latency of the application exceeds this value.
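The check described above can be sketched roughly as follows. This is a minimal illustration in Python, not Plumbr's actual implementation: the `Interaction` type and the `should_alert` function are hypothetical names introduced for this example, and treating "provided there have been 20 interactions" as "at least 20 interactions in the window" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record of one user interaction."""
    timestamp: datetime
    latency_ms: float


def should_alert(interactions, now,
                 window=timedelta(minutes=30),
                 threshold_ms=3000,
                 min_interactions=20):
    """Evaluate the alert condition for the window that ends at `now`.

    Meant to be called once per minute to mimic a rolling window
    with 1-minute granularity.
    """
    window_start = now - window
    recent = [i.latency_ms for i in interactions
              if window_start < i.timestamp <= now]
    # Too few interactions: stay silent to avoid false positives
    # during periods of low usage (assumed to mean "fewer than 20").
    if len(recent) < min_interactions:
        return False
    return median(recent) > threshold_ms


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    # 25 interactions, one per minute, all taking 3500 ms.
    slow = [Interaction(now - timedelta(minutes=m), 3500) for m in range(25)]
    print(should_alert(slow, now))      # enough traffic, median above threshold
    print(should_alert(slow[:5], now))  # only 5 interactions in the window
```

With 25 slow interactions in the window the condition holds. With only 5 of them the function returns `False`, even though every one was slow, because the minimum interaction count is not met.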

You can configure an alerting channel that will work with the alerts. Channels are conduits through which Plumbr data will be posted to your incident management workflows. You can set up several channels such as Slack, PagerDuty, and Jira. You can also use simple emails as a channel.
Ivo Magi, our CEO and Chief of Product, adds: “Early awareness on performance and availability is the key to professional incident management. I am glad that integrating performance alerts to the existing workflows just became easier for Plumbr customers. When using Plumbr, you can now choose the APM or RUM performance metrics for signal and send the alert to the channel of your choice. Be it PagerDuty or Slack or good-old-email, we got you covered. We have tested the approach carefully to make sure the calibration is easy and when adopting the performance alerts you will not be suffering from false positives or negatives. Stream the performance alerts from your production services to the channel your IT operations are using and gain confidence that you are always notified on performance incidents!”