
Using Plumbr with OpenTelemetry NodeJS agent

August 5, 2020 by Vladimir Šor Filed under: nodeJS Plumbr Product Updates Tracing

Every once in a while, we get asked how well Plumbr integrates with the up-and-coming open source distributed tracing agents. Our answer has always been “Yes, that integration will be possible one day, when the agents and their data communication protocols are mature enough”.

Then, that “one day” arrived…

About a month ago, one of our customers told us that they really like Plumbr’s availability and performance issue detection in their frontend and backend, but that a nodeJS application sitting between the two prevented Plumbr from capturing full traces. Here’s what the high-level architecture of their application looked like, with Plumbr monitoring the user experience in the browser and the API performance on the JVM backend:

The customer then added that they would be willing to consider using the OpenTelemetry nodeJS agent, provided its spans could be linked with the spans reported by the Plumbr agents. With the OpenTelemetry project gaining momentum and maturing, we thought: why not give it a try?

The goal of the OpenTelemetry project is to build vendor-neutral tracing agents and APIs. Its OpenTelemetry JS sub-project also includes the nodeJS agent. Although the OpenTelemetry agents were not yet officially released at the time of writing (July 2020), we still thought this might be a good opportunity to test integration between the Plumbr and OpenTelemetry agents. After all, we only needed a minimal subset of the agent’s functionality. Specifically, we needed it to start a span at the boundary of an HTTP request and propagate the trace context via HTTP headers. This already seemed to work pretty well.

We decided to give it a go and create a POC project that integrates data collected by OpenTelemetry (nodeJS) agents with data collected by Plumbr proprietary agents.

What did we need to do?

To use a 3rd party agent, we need to integrate three aspects. First, we need to receive the data collected by the agent. Second, we need to process the data in a way that plays well with our existing data processing pipelines. Third, the agent must be capable of joining an ongoing trace and propagating the tracing context downstream in a way that is understandable by our other agents.

OpenTelemetry itself suggests installing a so-called collector on every monitored machine. Agents send monitoring data locally to the collector, and the collector handles the efficient and reliable transmission of the collected data to the actual processing backend. You can choose between different protocols for communication between the agent and the collector, and likewise for communication between the collector and the data processing backend. The latter is handled by a piece of integration code called an exporter.

We could have implemented an exporter to convert OpenTelemetry spans into our own binary representation. But this would force our customers to use the collector, which might be overkill for some simple deployments. Instead, we chose to support the Zipkin protocol in our data reception backend. This choice has several benefits for us. First, the OpenTelemetry JS agent can use Zipkin to send data both locally and remotely, which means that customers are not forced to use the collector (but can still do so if needed). Second, once we receive span data in Zipkin format, we can easily add support for other agents that can export span data in that format.
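To give a feel for what "span data in Zipkin format" means, here is a minimal sketch that builds a span in the Zipkin v2 JSON shape. The function name toZipkinSpan, its parameter names and the sample values are illustrative assumptions; this is not the code Plumbr or the OpenTelemetry exporter actually runs.

```javascript
'use strict';

// Builds one span in the Zipkin v2 JSON shape.
// toZipkinSpan and its parameter names are hypothetical, for illustration only.
function toZipkinSpan({ traceId, spanId, parentId, name, startMs, durationMs, serviceName }) {
  const span = {
    traceId,                                  // 16 or 32 lower-hex characters
    id: spanId,                               // 16 lower-hex characters
    name,
    kind: 'SERVER',                           // span started at an incoming HTTP request
    timestamp: Math.round(startMs * 1000),    // Zipkin expects microseconds since epoch
    duration: Math.round(durationMs * 1000),  // microseconds
    localEndpoint: { serviceName },
    tags: {}
  };
  // parentId is only present when the span joins an ongoing trace.
  if (parentId) span.parentId = parentId;
  return span;
}

// Example with made-up values.
const example = toZipkinSpan({
  traceId: '463ac35c9f6413ad48485a3953bb6124',
  spanId: 'a2fb4a1d1a96d312',
  name: 'get /api/orders',
  startMs: 1596585600000,
  durationMs: 12.5,
  serviceName: 'node-gateway'
});
console.log(JSON.stringify([example], null, 2));
```

A Zipkin-compatible receiver accepts a JSON array of such spans, which is why a backend that understands this shape can take data from any agent with a Zipkin exporter.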

The data processing pipeline benefited from the good design decisions made when we developed our Universal Agent. It turned out to be really universal – once raw Zipkin data was converted to our binary format, we only had to add new labels in the UI!

The most interesting part was integrating the agent itself with our trace context propagation. The OpenTelemetry agent has really nice integration points that can be used to customize which headers will be used to transmit the context over the wire. Both Plumbr and OpenTelemetry use very similar representations for both trace id and span id so we just needed to format them differently before transmitting/receiving over the wire. 
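The idea of reformatting the ids at the wire boundary can be sketched as a pair of inject/extract functions. Everything specific here is an assumption made for illustration: the header names, the dashed wire format and the function names are hypothetical, and the sketch deliberately avoids the OpenTelemetry API, so it shows the technique rather than the actual Plumbr hooks.

```javascript
'use strict';

// Hypothetical header names; the real Plumbr headers are not given in this post.
const TRACE_HEADER = 'x-example-trace-id';
const SPAN_HEADER = 'x-example-span-id';

// Assumed wire format: the 32-hex trace id grouped as 8-4-4-4-12.
function toWire(hex) {
  return hex.replace(/^(.{8})(.{4})(.{4})(.{4})(.{12})$/, '$1-$2-$3-$4-$5');
}

function fromWire(value) {
  return value.replace(/-/g, '').toLowerCase();
}

// Outgoing request: write the current span context into the headers.
function inject(spanContext, headers) {
  headers[TRACE_HEADER] = toWire(spanContext.traceId);
  headers[SPAN_HEADER] = spanContext.spanId;
  return headers;
}

// Incoming request: read the upstream context, or return null so the
// caller starts a new trace.
function extract(headers) {
  const rawTrace = headers[TRACE_HEADER];
  const rawSpan = headers[SPAN_HEADER];
  if (!rawTrace || !rawSpan) return null;
  const traceId = fromWire(rawTrace);
  const spanId = rawSpan.toLowerCase();
  // Reject malformed ids instead of joining a broken trace.
  if (!/^[0-9a-f]{32}$/.test(traceId) || !/^[0-9a-f]{16}$/.test(spanId)) return null;
  return { traceId, spanId };
}
```

Because both sides keep the same internal id representation, only these two small functions need to know about the wire format; the rest of the agent stays untouched.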

Using Plumbr with OpenTelemetry nodeJS agent

We ended up with a very lightweight NPM package that hides away the multiple required OpenTelemetry dependencies, adds the aforementioned integration hooks, and makes installation and configuration as easy as with other Plumbr agents.

How to use it? Simple! Add OpenTelemetry-js-plumbr as a dependency to your project:

npm add bitbucket:plumbr/OpenTelemetry-js-plumbr

Add a file named plumbr.js to your project with the following content:

'use strict';
const { initPlumbrTracing } = require("OpenTelemetry-js-plumbr");
initPlumbrTracing(
   {
      apiKey: "<API key>",
      serverUrl: "https://app.plumbr.io/",
      clusterId: "<Cluster ID>",
      serverId: "<Server ID>"
   }
);

Where

  • <API key> comes from your Plumbr Account Settings page.
  • <Cluster ID> will be used as an API name for collected traces in Plumbr.
  • <Server ID> will be used to identify a Server within the cluster (optional, defaults to Cluster ID).
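If you would rather not hard-code the API key in plumbr.js, the configuration object can be assembled from environment variables. The sketch below is only one possible approach: the helper name plumbrConfigFromEnv and the PLUMBR_* variable names are assumptions, not something the package defines.

```javascript
'use strict';

// Hypothetical helper: builds the options object for initPlumbrTracing
// from environment variables. The PLUMBR_* names are assumptions.
function plumbrConfigFromEnv(env) {
  if (!env.PLUMBR_API_KEY || !env.PLUMBR_CLUSTER_ID) {
    throw new Error('PLUMBR_API_KEY and PLUMBR_CLUSTER_ID must be set');
  }
  const config = {
    apiKey: env.PLUMBR_API_KEY,
    serverUrl: env.PLUMBR_SERVER_URL || 'https://app.plumbr.io/',
    clusterId: env.PLUMBR_CLUSTER_ID
  };
  // serverId is optional; when omitted, it defaults to the Cluster ID.
  if (env.PLUMBR_SERVER_ID) config.serverId = env.PLUMBR_SERVER_ID;
  return config;
}

// In plumbr.js the result would be passed to the initializer:
// initPlumbrTracing(plumbrConfigFromEnv(process.env));
```

Failing fast on a missing key or cluster id keeps a misconfigured deployment from starting silently without tracing.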

Update your start script to pass two extra parameters to node: node -r ./plumbr.js ...<the rest of launch command>

Caveat: if you use the esm module, it has to be loaded before plumbr.js, otherwise the agent will fail to start: node -r esm -r ./plumbr.js ...

End result

After putting together our POC and performing thorough internal testing, we shipped it to the above-mentioned customer. The first external trials were a success: bottlenecks detected in the JVM application were now fully linked to the user interactions in the browser that they affected. The nodeJS layer in the middle was no longer a blind spot and showed up as part of the traces, allowing for observability across the entire technology stack.

See how the spans of a user interaction in a browser get displayed by Plumbr, and notice how they travel through a nodeJS layer to the JVM (the last two lines in the list of spans):

With a broader rollout, the new integration already handles more than 100K user interactions per day. If you have a similar deployment, why not go ahead and give our new agent a trial here.
