Announcement: PagerDuty-Plumbr Integration
Plumbr is a monitoring solution for managing incidents in complex IT environments. By combining data from real user monitoring (RUM) and application performance monitoring (APM), Plumbr lets infrastructure and operations (I&O) leaders govern application quality in production. It unifies data from infrastructure, applications, and clients to give complete visibility into each user's experience. This puts engineering-driven organizations firmly on the path to a faster and more reliable digital experience for their users.
PagerDuty, Inc. (NYSE: PD) is a leader in digital operations management. PagerDuty gives organizations of all sizes real-time, data-driven insights to drive better business results. DevOps, ITOps, and SecOps teams use PagerDuty’s award-winning real-time operations platform to run operations more effectively, deliver exceptional customer experiences, and accelerate innovation. Today, over 11,000 organizations across all industries have deployed PagerDuty.
We’re happy to announce that Plumbr now integrates with PagerDuty. The integration is designed to feed the data Plumbr collects directly into the workflows configured in PagerDuty. Plumbr flags degradations in user experience and hands them off to PagerDuty as incidents, where the rest of the incident management lifecycle is completed.
Plumbr provides value primarily through:
- Quantifying degradation in user experience.
- Root-cause analysis that pinpoints the source of the incident.
- Attaching relevant traces and metrics to give responders complete context.
The integration with PagerDuty helps engineering teams receive alerts about two kinds of degradation:
- Availability issues
- Latency issues
For availability issues, you create alert policies in Plumbr by setting a threshold on the rolling error rate. For example, you can specify “Send an alert to PagerDuty when 3 out of the last 100 transactions fail”. When this threshold is crossed, an alert is triggered in PagerDuty, and the responders for the web application are notified every time the criterion is met. The complete incident lifecycle is then managed in PagerDuty. By combining incident information with its impact on users, Plumbr helps teams prioritize work, communicate failures, and improve the overall digital user experience.
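To make the rolling error rate concrete, here is a minimal Python sketch of how a policy like “3 out of the last 100 transactions fail” could be evaluated. It is not Plumbr's implementation: the class `RollingErrorRateAlert`, its `record` method, and the choice to fire as soon as 3 failures appear among the most recent (up to) 100 transactions are illustrative assumptions.

```python
from collections import deque


class RollingErrorRateAlert:
    """Hypothetical sketch of a rolling-error-rate alert policy.

    Fires when at least `max_failures` of the most recent `window`
    transactions failed. Illustrative only; not Plumbr's actual code.
    """

    def __init__(self, window: int = 100, max_failures: int = 3) -> None:
        # A bounded deque drops the oldest outcome once `window` is reached,
        # so it always holds the most recent transactions (True = failed).
        self.outcomes = deque(maxlen=window)
        self.max_failures = max_failures

    def record(self, failed: bool) -> bool:
        """Record one transaction outcome and report whether the policy fires."""
        self.outcomes.append(failed)
        # sum() counts True values, i.e. failures within the window.
        return sum(self.outcomes) >= self.max_failures


# Usage: 97 successful transactions followed by 3 failures.
policy = RollingErrorRateAlert(window=100, max_failures=3)
fired = [policy.record(failed) for failed in [False] * 97 + [True] * 3]
print(fired[-1])  # True: 3 of the last 100 transactions failed
```

Recomputing `sum` on every call costs time proportional to the window; a production system would more likely keep a running failure count, and would also avoid re-notifying responders on every transaction while the condition persists.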
Latency issues make up an important share of user-experience degradations. When web applications become too slow for users, your engineering teams can receive alerts as well. For example, you can set an alert as follows: “If the median latency on the interactions of the Sales Reporting module in the last 30 minutes exceeds 3000 milliseconds, provided there have been 20 interactions, send out an alert via PagerDuty”. Responders are then alerted only when there is both a non-trivial amount of usage and a genuinely poor experience for users.
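The latency policy can be sketched the same way. The function below is a hypothetical illustration, not Plumbr's API: `should_alert_on_latency` and its parameters are assumed names, the samples are assumed to be `(timestamp_seconds, latency_ms)` pairs already filtered to the Sales Reporting module, and “exceeds” is read as strictly greater than the threshold.

```python
from collections.abc import Iterable
from statistics import median


def should_alert_on_latency(
    samples: Iterable[tuple[float, float]],
    now: float,
    window_s: float = 30 * 60,
    threshold_ms: float = 3000.0,
    min_interactions: int = 20,
) -> bool:
    """Hypothetical sketch: alert when the median latency over the last
    `window_s` seconds exceeds `threshold_ms`, provided at least
    `min_interactions` interactions occurred in that window."""
    # Keep only latencies of interactions inside the time window.
    recent = [latency for ts, latency in samples if now - window_s <= ts <= now]
    # Too little traffic: a median over a handful of requests is not meaningful.
    if len(recent) < min_interactions:
        return False
    return median(recent) > threshold_ms


# Usage: 25 slow interactions within the last 10 minutes (timestamps in seconds).
now = 10_000.0
slow = [(now - 600 + i, 3500.0) for i in range(25)]
print(should_alert_on_latency(slow, now))  # True
```

The minimum-interaction guard is what keeps a single slow request during a quiet period from paging anyone, and using the median rather than the mean keeps a few extreme outliers from triggering the alert on their own.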
This kind of quantified, meaningful alerting significantly improves how engineering teams work with incidents. By combining the monitoring data gathered by Plumbr with the incident management workflows in PagerDuty, engineering teams can practice truly mature digital operations management.