Results of the Plumbr DevOps Community Survey
We recently concluded a survey of engineers belonging to several DevOps communities around the world. We sought out participants through online forums and events, and sent invitations to those willing to share details about their daily DevOps routines, practices, and trade secrets. Our goal was to find engineers who actively practice DevOps culture and participate meaningfully in spreading the word about it.
We received a total of 175 responses. From the survey, we see that nearly 80% of our respondents are engineers working with systems in production. The breakdown by role is as follows:
- DevOps Engineer: 18%
- Architect / Developer: 58%
- IT Operations Engineer: 3%
- Team Lead: 14%
- CIO: 0%
Our goal for the survey was to build a deep understanding of the awareness around monitoring practices and get an idea of the tooling in place among engineering teams. We asked about the different types of monitoring currently prevalent among engineering teams, namely log monitoring, infrastructure monitoring, application performance monitoring (APM), and real user monitoring (RUM).
Predominantly, respondents had active log monitoring, infrastructure monitoring, and APM in place. 98% of all respondents said APM tools were important to them.
Among the many types of monitoring employed by engineering teams, real user monitoring (RUM) features in only a small fraction. Taking a step back, even general awareness of the concept and the benefits it can bring is rather low. By comparison, the adoption of, and overlap between, the other forms of monitoring are substantial.
This observation inspired a few things amongst the outreach team here at Plumbr:
- Organizations need to learn that real user monitoring exists and is real(!).
- Teams must recognize the lack of feedback about user experience in production.
- Engineers need to begin to include it as part of their monitoring activities.
In the next part of the survey, we asked our respondents about the different tools they use for monitoring. The responses indicate that every engineering team uses at least one monitoring product; no one responded with '0'. Interestingly, the maximum number of monitoring tools used simultaneously was four.
The monitoring solutions that are the most popular among our respondents are Prometheus, Dynatrace, and New Relic. Other notable mentions include Nagios, Zabbix, AppDynamics, and Sensu. The breakdown into products among this group is as follows:
Engineers are tuned to look for two common tools of the monitoring trade: logs and metrics. Their horizons of telemetry and observability extend as far as event streams (typically in the form of logs) and aggregations (statistical analysis of measurements). Monitoring, however, isn't about what an engineer wants; it is about what they need, and it should extend beyond simply collecting metrics. Maturity in monitoring means evidence-based, intelligent insights that are actionable, delivered with a high signal-to-noise ratio. With this rather large number of tools, engineers continue down the rabbit hole of aggregations and events, and extracting strong alert signals from that rabbit hole is challenging.
Alerting, and consequently monitoring, spans a broad spectrum of areas: availability, performance, capacity and resource saturation, and anomalies.
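Since Prometheus was the most popular tool among our respondents, these areas can be sketched as alerting rules. This is a minimal illustration rather than a recommended setup; the metric names beyond the built-in `up` series (`http_request_duration_seconds_bucket`, `node_filesystem_avail_bytes`, `node_filesystem_size_bytes`) are assumptions that depend on which exporters a team actually runs.

```yaml
groups:
  - name: example-alerts
    rules:
      # Availability: a scrape target has stopped responding.
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: page

      # Performance: 95th-percentile request latency above half a second
      # (assumes a histogram such as http_request_duration_seconds exists).
      - alert: HighLatencyP95
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 0.5
        for: 10m
        labels:
          severity: warn

      # Capacity / resource saturation: less than 10% disk space remaining
      # (assumes node_exporter filesystem metrics are available).
      - alert: DiskAlmostFull
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.10
        for: 15m
        labels:
          severity: warn

      # Anomalies typically need more than static thresholds (e.g. comparing
      # against historical baselines), so they are omitted from this sketch.
```

Even a small rule set like this shows why signal extraction is hard: each threshold and `for` duration is a judgment call, and none of these rules says anything yet about impact on end-users.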
And yet, only 22% of our respondents mentioned that they are able to quantify the impact on end-users.
The inability of an engineering team to quantify the impact of availability and performance issues on end-users indicates a lack of maturity in monitoring. Engineering teams could be lacking the right metrics to track, the proper tooling and instrumentation, or be weak with processes.
If teams do not take a measured approach towards performance and availability, it becomes impossible to improve on these fronts. The degradation in experience builds up, leading to frustrated users. Products will suffer as they begin to scale and see more availability and performance issues crop up. Ultimately, these will all affect the bottom line of the business negatively.
If only 1 in 5 engineers can claim to measure the impact of incidents on end-users, within a community of engineers who are hands-on with DevOps processes, the prospect is bleak. It leads us to conclude that teams have rather limited monitoring practices in spite of using several monitoring tools. Feedback loops from production are incomplete, and many blind spots remain for engineers working with production code.
We encourage you to try Plumbr today! It helps alleviate the most common deficiencies in engineering teams and fills the gaps in monitoring tooling. Plumbr combines the practices of traditional Application Performance Monitoring tools with real user monitoring data. This helps engineers close the feedback loop from production and communicate the complete picture of what is happening in production.
Thanks to Gleb Smirnov for helping edit drafts of this post.
P.S. Watch out for our next survey results, about on-call processes and practices.