Most frequent performance bottlenecks
We recently passed the “half a billion root causes for poor user experience discovered” milestone. To celebrate, we decided it is time to share the data we have gathered while detecting root causes in poorly performing applications.
To understand the dataset exposed, you should have some understanding of what we do. Plumbr keeps an eye on all end user interactions with Java applications. Whenever such an interaction is either too slow or fails altogether, Plumbr exposes the exact root cause in the source code responsible for the problem. Examples of such root causes include slow database queries, synchronization issues and blocking file system accesses.
The dataset we analyzed was extracted from the root causes detected in 1,020 different environments Plumbr was monitoring during May – August 2016.
The first view of the dataset lists the number of times a particular root cause was the reason why end users faced either performance or availability issues:
From the chart above it is, for example, visible that:
- Web Service access over HTTP calls was the source of poor performance in 26.5% of the root causes analyzed
- Synchronization issues resulting in long locks were the second most common culprit, responsible for ~15% of the root causes (a minimal sketch of such a lock follows this list)
- Slow JDBC operations ranked third, just barely behind the locking issues.
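To make the “synchronization issues resulting in long locks” category more concrete, here is a minimal, hypothetical Java sketch. The class and method names are made up for illustration and this is not code from any monitored application; it simply shows the kind of coarse-grained locking pattern that tends to surface as lock contention under load.

```java
import java.util.HashMap;
import java.util.Map;

public class ReportCache {

    private final Map<String, String> cache = new HashMap<>();

    // The whole method is synchronized, so even cache hits queue up behind
    // the thread that is currently rebuilding a report. Under load, the time
    // spent waiting for the monitor - not the work itself - dominates the
    // response time end users see.
    public synchronized String getReport(String key) {
        String report = cache.get(key);
        if (report == null) {
            report = buildReportSlowly(key);   // stand-in for a multi-second computation
            cache.put(key, report);
        }
        return report;
    }

    private String buildReportSlowly(String key) {
        try {
            Thread.sleep(2_000);               // simulates the expensive work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "report-for-" + key;
    }
}
```

Typical remedies include narrowing the critical section, using a concurrent data structure such as ConcurrentHashMap, or moving the slow computation outside the lock entirely.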
As this representation of the data is biased towards larger deployments where Plumbr was monitoring clustered applications, let’s look at a different view of the data. The chart below answers the question: “In how many unique accounts did this particular root cause impact end user experience at least once?”
From the chart above, we can see, for example, that:
- Overly long GC pauses impacted end users in more than 65% of the accounts
- Locking issues in poorly designed synchronization blocks were detected in around 60% of the accounts
- Streaming operations using the file system were detected as root causes in 11% of the accounts (a sketch of a typical pattern follows this list)
- Lucene indexes were either infrequently used or rather well built, being the source of performance issues in under 2% of the accounts.
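For the file system streaming category, the following hypothetical sketch shows one pattern that commonly turns file access into a bottleneck: copying data one byte at a time without buffering. Again, the class and method names are invented for illustration, not taken from any monitored application.

```java
import java.io.*;

public class FileCopy {

    // Anti-pattern: every read()/write() is a separate blocking call into
    // the OS, so copying a large file becomes millions of tiny file system
    // operations and the thread spends most of its time blocked on I/O.
    static void copyUnbuffered(File from, File to) throws IOException {
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to)) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        }
    }

    // The same copy with buffered streams batches the I/O into large chunks,
    // which typically removes the file system access from the hot path.
    static void copyBuffered(File from, File to) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(from));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(to))) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        }
    }
}
```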
I hope this gave you a rather interesting view of the different ways Java-based applications fail to meet performance or availability requirements. If you wish to see how your own application performs and which root causes are impacting your users, go ahead and grab a fully functional and free Plumbr trial to find out.
For those who want to understand the data above in more detail, check out all the root causes Plumbr detects to see what the different columns in the charts above are all about.