Choices While Designing Plumbr
Our post today is about three ongoing discussions in the technology world. There have been similar contrasts in numerous areas of computer science over the years. The three we are discussing today have an immense impact on engineers and engineering teams, especially those invested in performance and/or monitoring.
#1 – Correlation and Causation
Apart from a loose alliteration, these two terms have little in common, yet they are constantly confused. The classic caution that “correlation does not imply causation” spans many significant scientific areas, and the relationship between the two has been the focus of many longstanding debates.
One is quantitative, while the other is qualitative. When data sets indexed over time move together, people tend to assume that a mutual relationship exists, and many fall into the fallacy of thinking that the correlation implies causation. Three specific cases lead to this conclusion: (1) codependence, (2) deductive oversight, and (3) coincidence.
It is important for engineering teams to be able to differentiate the two and structure their troubleshooting efforts accordingly.
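To make the codependence case concrete, here is a minimal sketch in Python. The metric names and numbers are invented for illustration: both series are derived from a hidden third variable (traffic), so they correlate strongly even though neither causes the other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical requests-per-minute samples: the hidden common cause.
traffic = [100, 250, 400, 300, 650, 800, 500, 900, 200, 700]

# Both metrics depend on traffic only; neither one influences the other.
gc_pause_ms = [0.2 * t + (i % 3) for i, t in enumerate(traffic)]
error_count = [0.01 * t + (i % 2) for i, t in enumerate(traffic)]

r = pearson(gc_pause_ms, error_count)
print(f"correlation between GC pauses and errors: {r:.3f}")
```

A team looking only at these two charts could conclude that GC pauses cause errors. Tuning the garbage collector would not reduce the error count here, because the real driver is load.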
#2 – Monitoring and Observability
Historically, systems that observed the vital parameters of software have been classified as monitoring tools. Monitoring originated with networks and gradually expanded to cover servers, applications, and all other parts of the infrastructure that serve web applications. Specialized tools and processes are now available to address individual layers.
In recent times, the term observability has begun to surface in a lot of conversations, especially among teams responsible for web performance. Part of the reason can be attributed to the ‘Infrastructure as Code’ movement, where the responsibility for provisioning underlying systems has shifted to the developers. By adopting observability, engineers are expected to make their code facilitate the ‘mark and measure’ needs.
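As a rough illustration of what ‘mark and measure’ can look like inside application code, the sketch below uses a Python decorator to mark a function and record the duration of each call. The `measured` decorator, the `timings` store, and the `checkout` function are hypothetical names chosen for this example; they are not part of any particular tool.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory store of measurements; a real system would export these instead.
timings = defaultdict(list)

def measured(name):
    """Decorator that marks a function and records how long each call takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even when the call raises.
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator

@measured("checkout.total")
def checkout(cart):
    return sum(cart.values())

checkout({"book": 12.5, "pen": 1.5})
print(len(timings["checkout.total"]), "measurement(s) recorded")
```

The point of the design is that the measurement lives next to the code it describes, so the developer who writes the function also decides what is worth observing.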
#3 – Synthetics and Real-user Monitoring
By definition, synthetics is a method where traffic is generated to a website or a web application, to test a pre-determined set of actions or activities. Typically, these involve a set of scripts that simulate a click path taken by users. Feedback, in the form of measurements, is then recorded and made available to engineers to determine if the software conforms to performance requirements.
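A synthetic check of this kind can be sketched in a few lines of Python. This is an assumption-laden example: the click path, the time budget, and the `fake_fetch` stand-in (used here in place of a real HTTP client so the sketch runs offline) are all invented for illustration.

```python
import time

def run_click_path(fetch, steps, budget_seconds):
    """Replay a scripted click path and time each step against a budget."""
    results = []
    for path in steps:
        start = time.perf_counter()
        status = fetch(path)
        elapsed = time.perf_counter() - start
        results.append({
            "path": path,
            "status": status,
            "seconds": elapsed,
            "ok": status == 200 and elapsed <= budget_seconds,
        })
    return results

def fake_fetch(path):
    # Stand-in for a real HTTP client; pretends /checkout is broken.
    return 500 if path == "/checkout" else 200

click_path = ["/", "/search?q=book", "/cart", "/checkout"]
report = run_click_path(fake_fetch, click_path, budget_seconds=2.0)
for step in report:
    print(step["path"], "OK" if step["ok"] else "FAILED")
```

Note that the script only ever exercises the paths it was told about, which is exactly the limitation that real-user monitoring addresses.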
In contrast, real-user monitoring (RUM) is a more candid form of monitoring. It measures parameters based on interactions from actual users on production systems. Since many errors appear when applications are used at scale, RUM captures very valuable information. The availability and health of systems and applications are placed in the context of user experience in production.
We’re writing about these to highlight three important choices that we made as we built Plumbr. We chose to expose causation when and where we can. We chose to build our product on real-user monitoring, as opposed to synthetics. We will provide value as a monitoring tool, and our path there is to attach ourselves to regular code bases and make them intrinsically more observable.
Some specific wins we can provide to engineering teams as a result of these choices are:
1. Group isolated incidents by their underlying cause, and tag them as recurring manifestations of the same error.
2. Provide actual incident vectors for errors that are difficult to reproduce.
3. Quantify the impact of these incidents. Feedback in the form of user count and wasted time is available to assess the impact.
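The first and third points above can be illustrated with a small Python sketch. This is not how Plumbr is implemented; the record fields (`user`, `root_cause`, `wasted_seconds`) and the sample data are hypothetical, and serve only to show how grouping by cause turns scattered incidents into a measurable impact.

```python
from collections import defaultdict

# Hypothetical incident records captured from real user interactions.
incidents = [
    {"user": "u1", "root_cause": "db-lock-contention", "wasted_seconds": 4.0},
    {"user": "u2", "root_cause": "db-lock-contention", "wasted_seconds": 6.5},
    {"user": "u1", "root_cause": "db-lock-contention", "wasted_seconds": 2.5},
    {"user": "u3", "root_cause": "payment-api-timeout", "wasted_seconds": 30.0},
]

def summarize(records):
    """Group incidents by root cause and quantify the impact of each group."""
    groups = defaultdict(lambda: {"occurrences": 0, "users": set(), "wasted": 0.0})
    for record in records:
        group = groups[record["root_cause"]]
        group["occurrences"] += 1
        group["users"].add(record["user"])
        group["wasted"] += record["wasted_seconds"]
    return {
        cause: {
            "occurrences": g["occurrences"],
            "affected_users": len(g["users"]),
            "wasted_seconds": g["wasted"],
        }
        for cause, g in groups.items()
    }

summary = summarize(incidents)
for cause, impact in summary.items():
    print(cause, impact)
```

With a summary like this, a team can prioritize by affected users or wasted time rather than by the raw number of alerts.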