Manual

Introduction

What is Plumbr?

Plumbr is a software solution for monitoring real users using a web application. The goal of the monitoring is to understand whether the application is performing as expected. To help understand this, Plumbr exposes:

  • How many users are facing application failures?
  • How much time is wasted behind bottlenecks in the application?

If the information shows that either availability or performance needs improvement, the next drilldown gives answers to:

  • What functionality of the application is failing?
  • What are the errors causing failures for the most users?
  • What parts of the application perform the worst?
  • Where are the worst bottlenecks frustrating end users the most?

As a result, the engineering teams will be equipped with the list of errors & bottlenecks in the application ranked by the number of users impacted & the time wasted.

Based on the ranked lists of errors and bottlenecks, engineering teams can now focus on resolving the errors impacting the most users, or bottlenecks wasting the most time.

How does Plumbr work?

Plumbr tracks every user interaction within the web application’s user interface in a browser. The most common means of interaction are clicks, keyboard / mouse events and touches.

Every such interaction is monitored for its outcome and duration. If the interaction fails with any technical error, it is flagged as failed. The interaction duration is used to understand how quickly the application responds to a particular user action. If responding takes longer than expected, the interaction is flagged as slow.

Plumbr also monitors interactions for more details in order to detect errors causing interactions to fail & bottlenecks causing users to waste their time.

As a result, Plumbr is able to expose the user experience in terms of performance and availability. This is coupled with the bottlenecks and errors degrading that experience, ranked by their impact on end users.

What does Plumbr consist of?

Plumbr deployments contain up to three different modules:

  • Browser Agent, capturing the end user experience in the device used by your customers. This is the module you should be installing first.
  • Java Agent, tracing the user interaction through the back-end services. The presence of this Agent also exposes bottlenecks & errors in the backend layer. Installation of this module is optional. You should install this module only if your backend services are running on Java Virtual Machines.
  • Plumbr Server, responsible for receiving and processing the data collected by the Agents. The Server also exposes the UI. We recommend using our SaaS offering as the Server. In this case, the installed Agents connect to https://app.plumbr.io, and the user interface is also served via https://app.plumbr.io.

If you only use the Browser Agent, the deployment model involves injecting our JavaScript Agent as the first script in the <HEAD> section of all the HTML pages in your application. After doing so, the Agent can start listening to the end user interactions in the browser:

[Figure: Plumbr Browser concept]

The data captured is sent to Plumbr Server, which is responsible for assembling all the details about the user interaction and allowing you to run analysis on the data.

In case you are after end-to-end transparency and deploy both the Browser and Java Agents, the user interactions are monitored from the browser to all the JVM-based nodes in the backend:

[Figure: Plumbr Java and Browser Agent]

The Plumbr Java Agent is packaged as a standard -javaagent, so attaching it requires no changes to your application code. The only required change is pointing the JVM to the location of the Agent in the file system by adding -javaagent:/path/to/plumbr.jar to the JVM startup scripts.

The Agents located in the monitored nodes pass along the transaction ID as call metadata via HTTP headers. This way, the Plumbr Server can assemble all the nodes servicing the user interaction into a single interaction distributed across multiple nodes.
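
The following is a minimal sketch of this kind of ID propagation, not Plumbr's actual implementation. The header name X-Transaction-Id and the class and method names are hypothetical; the header really used by the Agents is not documented here. An ID received from upstream is reused, otherwise a new one is created, and the ID is attached to the outgoing request.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;
import java.util.UUID;

// Illustrative only: propagates a transaction ID to a downstream HTTP call.
final class TransactionIdPropagation {

    // Hypothetical header name; the header used by the real Agents is not documented here.
    static final String HEADER = "X-Transaction-Id";

    // Reuse the ID received from upstream, or start a new transaction if there is none.
    static String resolveTransactionId(Optional<String> incomingHeader) {
        return incomingHeader
                .filter(id -> !id.isBlank())
                .orElseGet(() -> UUID.randomUUID().toString());
    }

    // Build a downstream request that carries the transaction ID as call metadata.
    static HttpRequest downstreamRequest(URI target, String transactionId) {
        return HttpRequest.newBuilder(target)
                .header(HEADER, transactionId)
                .GET()
                .build();
    }
}
```

Because every node forwards the same ID, a server receiving the data from all nodes can group the per-node records by that ID to reconstruct one distributed interaction.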

User Interactions

What are user interactions?

A user interaction reflects the real user experience of a single interaction with the user interface. The interaction starts with an event generated by a real user via the UI in the browser. Common types of such events are mouse clicks, touches and keyboard events, but Plumbr supports all other means of interacting with the browser as well.

The Plumbr Browser Agent listens to all such user interactions. Only the interactions resulting in server-side requests are considered relevant. All other interactions are ignored and never sent to the Server to be reported. As a result, scrolls in static pages or clicks in empty areas are never registered as user interactions.

Every captured interaction is linked with any HTTP requests occurring because of the interaction.

If any of the requests returns a 4xx or 5xx series response, the interaction is flagged as failed, indicating that the end user did not accomplish what they intended.

Interactions that do not fail are monitored for their duration. The duration of the interaction is calculated from the interaction in the browser (click/touch/…) until the last fired HTTP request returns its response to the browser.

Plumbr also keeps track of the user performing the interaction, the application in which the interaction was performed, and the functionality that the interaction consumed. This allows you to keep track of what the particular user was actually doing within the application.

Every transaction starting in a browser thus captures and exposes the following information:

  • The ID of the transaction
  • The ID of the user performing an interaction
  • The start and end timestamps of an interaction
  • The application to which an interaction belongs
  • The functionality of the application used
  • The status of an interaction (successful/slow/failed)

Status flags

Each user interaction is assigned a status based on the outcome & duration of the interaction. As a result, user interactions can end up in one of the following statuses:

  • Successful. This is the desired status, meaning the end user did not face any errors during the interaction & the interaction completed fast enough for the end user.
  • Failed, indicating that the interaction did not complete as expected due to technical errors. This status is set if at least one of the HTTP requests during the user interaction responded with a 4xx or 5xx series error code.
  • Slow, indicating that the duration of the interaction exceeds a predetermined & configurable threshold. The very same threshold is the basis for the Very Slow and Stuck statuses.
  • Very Slow, set for interactions whose duration exceeds the slow threshold by more than 4x.
  • Stuck, set when the duration of the ongoing interaction exceeds the predetermined slow threshold by more than 100x. In this case Plumbr assumes the transaction will never complete and flags the transaction as Stuck. Plumbr will stop monitoring the stuck transaction after flagging it as such.

Status flags for API calls work similarly.
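
The status rules can be sketched as a small classification function. This is an illustration under stated assumptions, not Plumbr's code: the class, enum and parameter names are hypothetical, and "by more than 4x" and "by more than 100x" are read as "longer than 4 times" and "longer than 100 times" the slow threshold. Since Stuck is assigned while an interaction is still running, durationMs stands for the time elapsed so far in that case.

```java
import java.util.List;

// Illustrative only: derives a status flag from HTTP response codes and duration.
final class InteractionStatus {

    enum Status { SUCCESSFUL, SLOW, VERY_SLOW, STUCK, FAILED }

    static Status classify(List<Integer> httpStatusCodes, long durationMs, long slowThresholdMs) {
        // Any 4xx or 5xx response marks the whole interaction as failed;
        // only interactions that do not fail are judged by their duration.
        for (int code : httpStatusCodes) {
            if (code >= 400 && code <= 599) {
                return Status.FAILED;
            }
        }
        // The slow threshold is also the basis for the Very Slow and Stuck statuses.
        if (durationMs > 100 * slowThresholdMs) return Status.STUCK;
        if (durationMs > 4 * slowThresholdMs) return Status.VERY_SLOW;
        if (durationMs > slowThresholdMs) return Status.SLOW;
        return Status.SUCCESSFUL;
    }
}
```

With a slow threshold of 1000 ms, an interaction taking 1500 ms would be classified as Slow, and one taking 4500 ms as Very Slow.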

API Calls

What are API Calls?

Besides monitoring real user experience in browser, Plumbr server-side agents are designed to monitor the APIs published on the server-side. Currently Plumbr supports APIs running on Java Virtual Machines (JVMs).

The API call is monitored from the moment it arrives at the JVM boundary until the response is sent to the caller. The API call duration and its outcome are registered in the same way as for user interactions, making it possible to expose the performance and availability of the API.

Note that if a Plumbr Agent is already present upstream, the JVM processing the request is linked with the ongoing user interaction or API call. The outcome of this approach is a distributed trace throughout all the nodes, providing end-to-end transparency of a call throughout the infrastructure.

Additional server-side monitoring increases the transparency, as slow or failed interactions are now traced to all server-side nodes to more precisely capture the evidence needed for solving the potential error or bottleneck.

That being said, we strongly recommend starting with monitoring the real user experience by adopting our Browser Agent, and adding the server-side monitoring during the second phase of adoption. Without an understanding of the true user experience, the insights exposed from the server side lack the proper context, making it hard to make informed decisions.

Applications

What are applications?

Applications are software bundles used by end users and monitored by Plumbr. As such, applications aggregate user interactions / API calls using the same software. This aggregated view represents the user experience all the users of the software have received.

Seeing how users experienced the different applications monitored by Plumbr allows you to quickly understand which applications behave as expected and which applications require investments to improve the user experience.

Identifying applications

To distinguish between applications, the application identifier is set in the Plumbr Agent configuration. It can also be changed via the Plumbr user interface available at https://app.plumbr.io. In essence, it is just a string, uniquely identifying each software bundle you wish to monitor with Plumbr.

The configuration determining the application differs between web applications monitored by the Browser Agent and APIs running on the server side.

For web applications monitored by the Plumbr Browser Agent, the application is specified as a parameter to the Browser Agent installation, as seen in the following example via the appName parameter:

<script src="https://browser.plumbr.io/pa.js"
        data-plumbr='{"accountId":"123456789",
        "appName":"CRM",
        "serverUrl":"https://bdr.plumbr.io"}'>
</script>

For server-side APIs monitored by the Plumbr Java Agent, the application name is derived from two properties specified in the Java Agent configuration.

You will find these properties in the plumbr.properties file next to the Agent’s plumbr.jar file:

  • jvmId. Specify this parameter for JVMs which survive restarts. They can also be permanent members of clusters. jvmId is unique and cannot be reused by several JVMs.
  • clusterId. Specify this property if the JVM monitored by Plumbr is either a member of a load-balanced cluster or an ephemeral node with a short lifecycle. This way, all the members sharing the same clusterId and exposing the same API end up aggregated under the same cluster.

Specifying at least one of these attributes is mandatory.
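
As a hedged illustration of the "at least one of these attributes" rule, the sketch below loads a properties file and checks that jvmId or clusterId is set. The property names come from the text; the class and method names are hypothetical, and this is a pre-deployment sanity check you could write yourself, not something the Agent is known to ship.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Illustrative only: checks that a plumbr.properties-style file names the JVM or its cluster.
final class AgentConfigCheck {

    // Returns true when at least one of jvmId and clusterId has a non-empty value.
    static boolean hasIdentity(Reader config) throws IOException {
        Properties props = new Properties();
        props.load(config);
        return isSet(props.getProperty("jvmId")) || isSet(props.getProperty("clusterId"));
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

For example, a file containing only the line clusterId=eshop-production passes the check, while a file with neither property fails it.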

To give you an idea, some examples of typical application names we see include:

  • CRM
  • eShop
  • Order Management

or

  • eshop-test
  • eshop-staging
  • eshop-production

Feel free to pick any string uniquely identifying the application. Keep in mind that it helps if the rest of the team can later associate the name with the real software, so names such as “kate experimenting” or “second application” tend not to be good choices.

Bottlenecks

What are bottlenecks?

Bottlenecks are underlying reasons why a particular user interaction or an API call was slower than expected. Plumbr’s goal is to map each slow user interaction as closely to the underlying bottleneck as possible, giving our users the means to understand the impact of each bottleneck.

The impact of a bottleneck is measured both by the number of users who suffered from it and by the total time wasted while waiting behind it. Having access to this information allows you to rank the bottlenecks and fix the ones with higher impact (i.e., more wasted time) first.
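
A ranking of this kind can be sketched as follows. All type and field names are hypothetical, and ordering by total wasted time first with the number of affected users as a tie-breaker is an assumption made for the example, not a description of how Plumbr orders its lists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: ranks bottlenecks by total wasted time, then by affected users.
final class BottleneckRanking {

    // One slow interaction attributed to a bottleneck (field names are hypothetical).
    static final class Occurrence {
        final String bottleneck;
        final String userId;
        final long wastedMs;

        Occurrence(String bottleneck, String userId, long wastedMs) {
            this.bottleneck = bottleneck;
            this.userId = userId;
            this.wastedMs = wastedMs;
        }
    }

    // Aggregated impact of one bottleneck.
    static final class Impact {
        final String bottleneck;
        final Set<String> users = new HashSet<>();
        long totalWastedMs;

        Impact(String bottleneck) {
            this.bottleneck = bottleneck;
        }
    }

    static List<Impact> rank(List<Occurrence> occurrences) {
        Map<String, Impact> byName = new LinkedHashMap<>();
        for (Occurrence o : occurrences) {
            Impact impact = byName.computeIfAbsent(o.bottleneck, Impact::new);
            impact.users.add(o.userId);          // each user is counted once
            impact.totalWastedMs += o.wastedMs;  // accumulated waiting time
        }
        List<Impact> ranked = new ArrayList<>(byName.values());
        // Highest impact first.
        ranked.sort(Comparator
                .comparingLong((Impact i) -> i.totalWastedMs)
                .thenComparingInt(i -> i.users.size())
                .reversed());
        return ranked;
    }
}
```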

Bottlenecks are detected either in the layer monitored by a Plumbr Agent or as a downstream call from the layer. This is best understood via the following example:

  1. The application is monitored only by Plumbr Browser Agent:
    The detected bottleneck exposes the blocking XHR call to the backend taking 6 seconds. As Plumbr was not monitoring the backend, there is no transparency into what in the backend was causing the 6-second wait.
  2. The application is monitored both by Plumbr Browser Agent and Java Agents:
    The detected bottleneck is now more specific, exposing that a synchronization issue within the JVM forced the XHR call to wait 5.5 seconds for the lock to be released.

Browser Bottlenecks

Bottlenecks detected in the browser

Browser bottlenecks cover the entire lifecycle of the user interaction, exposing the reasons why a user interaction initiated in the browser took longer to complete than expected. To achieve this, the Plumbr Browser Agent monitors the different phases of all the HTTP requests triggered by the user interaction.

Plumbr uses different methods to capture the required information. By combining observation of the document tree for changes (Mutation Observer) with instrumentation of native request methods (such as XMLHttpRequest), we are able to keep track of resource loading start and end times. However, as these are only basic start and end metrics and can be inaccurate, we additionally use the Resource Timing API to get a detailed breakdown of the request’s lifecycle from the browser whenever available.

As a result, Plumbr is capable of exposing the following bottlenecks in the browser:

  • Browser – Redirect – in cases where there were a lot of redirects or some of the redirects were slow
  • Browser – Cache Fetch – in situations where requests by browser were already cached in the local disk and extracting this resource from the disk was slow
  • Browser – DNS lookup – whenever the DNS lookup was slow
  • Browser – TCP connect – in cases where establishing the TCP connection to the backend server was slow
  • Browser – SSL handshake – when establishing the secure connection to the backend server was slow
  • Browser – Request Wait (Assets, XHR, Pageload) – in situations where an asset (CSS, JS, etc.), an XHR or an HTML page request from the backend server was slow
  • Browser – Download (Assets, XHR, Pageload) – in cases where an asset, an XHR or an HTML page download from the backend server was slow
  • Browser – HTTP Call (Assets, XHR, Pageload) – whenever the HTTP request for an asset, an XHR or an HTML page from the backend server was slow, but Plumbr wasn’t able to distinguish the exact phase due to missing support for the Resource Timing API in the user’s browser.
  • Browser – Queue Wait – in situations where many resources had to be fetched for a particular page from the same domain and the browser placed these requests into a waiting queue.
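
To make the phases more concrete, here is a rough sketch of how durations can be derived from the timestamps of one request. The map keys are named after the attributes of the browser's PerformanceResourceTiming entries; the class name, the use of Java rather than browser JavaScript, and the mapping of timestamp pairs to phase names are assumptions for illustration, not Plumbr's actual calculation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: splits one request into phases from Resource Timing style timestamps (ms).
final class RequestPhases {

    static Map<String, Double> breakdown(Map<String, Double> t) {
        Map<String, Double> phases = new LinkedHashMap<>();
        phases.put("Redirect", t.get("redirectEnd") - t.get("redirectStart"));
        phases.put("DNS lookup", t.get("domainLookupEnd") - t.get("domainLookupStart"));
        // connectEnd is recorded after the secure handshake, so this span includes it.
        phases.put("TCP connect", t.get("connectEnd") - t.get("connectStart"));
        // secureConnectionStart is 0 when no secure connection is used.
        double tlsStart = t.get("secureConnectionStart");
        phases.put("SSL handshake", tlsStart > 0 ? t.get("connectEnd") - tlsStart : 0.0);
        phases.put("Request Wait", t.get("responseStart") - t.get("requestStart"));
        phases.put("Download", t.get("responseEnd") - t.get("responseStart"));
        return phases;
    }
}
```

Comparing each phase duration against a threshold is then enough to decide which of the listed bottlenecks, if any, applies to the request.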

Browser – Redirect

If you have a lot of redirects on your site, they can seriously hurt its performance, as the browser needs to request them one by one. Plumbr takes the time spent on redirects into account and, if it exceeds the given threshold, exposes a Redirect Root Cause with the full URL where the redirect chain ended.

As a general rule, avoid redirects as much as possible: they can drastically slow down page loading, especially on mobile devices. When a redirect is unavoidable, try to use only a single one.

Browser – Cache Fetch

If the resource queried by the browser was recently requested, its response may already be located in the local cache and can be retrieved from there. When the local disk is under high load, fetching the resource from the cache can take longer than expected. If that time exceeds the threshold, a Cache Fetch Root Cause is registered with the static text “Cache Fetch”. This is expected to be a very rare Root Cause, assuming that client machines nowadays are powerful enough to avoid such issues.

Browser – DNS lookup

Another issue Plumbr exposes is slowness caused by the browser making a request to a DNS server to translate a domain name to an IP address. A DNS lookup Root Cause is registered if this phase exceeds the threshold. Plumbr groups all requests to the same domain into one DNS lookup Root Cause referencing the particular domain.

DNS lookups are usually slow for one of the following reasons:

  • network is bad between client and DNS server
  • DNS server is at heavy load and cannot process requests fast enough
  • DNS server is not configured properly to make fast lookups
  • short DNS TTL setting causing servers to frequently check if their cached value is up to date

To get rid of this problem one can try to improve network conditions or change DNS servers if possible.

Browser – TCP connect

A bad network can cause one more issue exposed by Plumbr: the TCP connect Root Cause. This Root Cause is registered when establishing a connection from the browser to the backend systems takes too long. As with the DNS lookup Root Cause, these requests are grouped by the domain name for which establishing the connection was slow.

Browser – SSL handshake

During the connection establishment phase, the browser also performs an SSL handshake in case of a secure connection. An SSL handshake Root Cause can be caused by poor network conditions or by a slow client or server. The domain name is also used to group all requests affected by this Root Cause.

As it is not always possible to improve network conditions, pay attention to server performance: check that its CPU is not under heavy load and that there is sufficient RAM to keep previous connections alive. It is also good to avoid certificates with very long keys (RSA 2048 should be sufficient).

Browser – Request Wait (Assets, XHR, Pageload)

The request wait phase covers the time from when the browser starts sending the request until it receives the first response bytes from the backend. We distinguish 3 different Root Causes: for assets (CSS, images, etc.), XHR and full page load requests. Root Cause grouping is done using:

  • domain name for Assets and full page load requests
  • shortened URL for XHR requests

While the main reason behind all the previous Root Causes can be the network, here the reason is more likely to be in the backend system, and you need to performance test and debug the system to find the real reason.

Browser – Download (Assets, XHR, Pageload)

The request download phase covers the time during which the browser is receiving the response from the backend system. Here, too, we distinguish 3 different Root Causes: for assets (CSS, images, etc.), XHR and full page load requests.

Root Cause grouping is done using:

  • domain name for Assets and full page load requests
  • shortened URL for XHR requests

To solve the issue, in most cases you need to optimize either the backend system endpoint (bringing it closer to the user, e.g. by using a CDN) or the size of the content returned from the system.

Browser – HTTP Call (Assets, XHR, Pageload)

If we cannot determine a particular phase (as resource timings weren’t available) and the request is slow, an HTTP Call Root Cause is registered. Here, too, we distinguish 3 different Root Causes: for assets (CSS, images, etc.), XHR and full page load requests.

Root Cause grouping is done using:

  • domain name for Assets and full page load requests
  • shortened URL for XHR requests

Browser – Queue Wait

Less critical resources such as images can be put into a queue by the browser and processed later. Plumbr registers such requests under the Queue Wait Root Cause if the time spent in the queue exceeds the threshold. This can happen for the following reasons:

  • lack of TCP connections
  • too many requests are being processed in parallel (by default, browsers usually process up to 6 requests to the same domain in parallel)
  • some requests are postponed because the browser considers them non-critical

Grouping of Queue Wait Root Causes is done by domain name.

To solve this issue:

  • apply domain sharding
  • reduce the number of requests by:
    • concatenating CSS or JS files together
    • using CSS sprites for images

JVM Bottlenecks

Bottlenecks detected in the JVM

The Java Agent monitoring Java Virtual Machines detects server-side bottlenecks for slow user interactions & API calls. The bottlenecks detected in the JVM occur either inside the JVM (for example, lock contention on synchronization or GC pauses) or downstream from the monitored JVM (for example, a particular JDBC call to a database or a web service call over HTTP to a node not monitored by Plumbr).

Plumbr Java Agent uses bytecode instrumentation and JVMTI hooks to capture the information about the bottlenecks from within the JVM and link the bottleneck with the particular user interaction.

As a result, Plumbr Java Agent is able to expose the following as bottlenecks in JVM:

  • JDBC calls to databases, including MySQL, Oracle, Postgres and IBM DB2 databases.
  • Lock contention issues
  • GC pauses
  • Filesystem operations
  • WebService invocations over HTTP
  • MongoDB calls
  • Lucene searches and index updates
  • Threadpooling-related issues
  • N+1 issues occurring while communicating with remote systems

Excessive Number of …

“Excessive number of …” root causes are exposed in situations where many similar operations take place during a single transaction and the accumulated duration of such operations is the reason why the transaction ends up being slow.

For example, when just a single HTTP call is impacting user experience, it will be exposed as a Slow HTTP Call. In situations where many HTTP calls take place during a single transaction and the accumulated duration of such calls is the reason why the transaction ends up being slow, the Excessive Number of HTTP Calls root cause is exposed instead.

In most cases, the solution for such problems requires a change in application code. The performance gains can often be achieved by applying either of the following guidelines:

  • Reducing the number of operations invoked by changing the amount of data requested.
  • Batching the operations together instead of launching each of them via a separate call.
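
As a small, hedged illustration of the batching guideline, the helper below splits a list of items into fixed-size batches, so that each batch can be handed to one remote call (for example one bulk request) instead of issuing one call per item. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: groups items into batches so each batch can be sent in a single call.
final class Batching {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so later changes to the source list do not affect the batch.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

With 1,000 items and a batch size of 100, the number of remote calls drops from 1,000 to 10, provided the remote system offers a bulk operation.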

Slow JDBC Calls

The Plumbr Agent monitors every JDBC Type 3 and Type 4 driver detected in the application. This means that Plumbr supports almost every database vendor exposing the data storage via JDBC, including but not limited to the most widely used MySQL, Oracle, Postgres and IBM DB2 databases.

The Agent instruments all the JDBC calls which connect to databases via the Statement, PreparedStatement and CallableStatement APIs. When a call via such an API starts affecting the end user experience, the offending query is listed as a root cause exposing the JDBC operation executed along with the call stack from the thread executing the query. In such a way, you get access to the root cause of expensive JDBC operations down to a single line in the source code responsible for executing such queries.

To reduce noise and provide a prioritized list of expensive database operations, Plumbr groups together the expensive operations triggered by the same root cause, allowing you to rank them based on the number of times they are detected.

File Stream Operations

The Plumbr Agent monitors file reading and writing operations performed by using FileInputStream and FileOutputStream classes. When the wait time for the read and write operations starts impacting end user experience, Plumbr recognizes this and links a File Stream Operation root cause with the slow transaction. The root cause exposed will contain the following information:

  • File(s) being read/written, along with their path in file system, size and other relevant attributes.
  • Call stack from the thread executing the operation, zooming you right to the line in source code accessing the file system.

There are several common problems that occur when reading from or writing to a file stream and lead to slow transactions:

  • Lack of buffering: each read or write operation incurs overhead, depending on the operating system, file system and hardware. Instead of reading or writing one byte at a time, a much more performant approach is to do it in bulk. A simple way to achieve this is to use a BufferedInputStream or BufferedOutputStream.
  • System issues: as noted above, the performance of file operations depends on the operating system, the file system and the hardware. Sometimes one of these becomes the bottleneck, and even a single file stream operation can take tens of seconds.
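
The buffering advice can be illustrated with a short sketch. The class and method names are hypothetical; the point is only that wrapping a FileInputStream in a BufferedInputStream lets most read() calls be served from an in-memory buffer instead of hitting the file system each time.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

// Illustrative only: byte-by-byte reading made cheaper by a buffer.
final class BufferedRead {

    // Counts the bytes in a file. The loop still calls read() once per byte,
    // but the buffered stream fetches data from the file in larger chunks.
    static long countBytes(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file.toFile()))) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }
}
```

The same pattern applies to writing: wrapping a FileOutputStream in a BufferedOutputStream collects small writes in memory and flushes them in bulk.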

Locked Threads

The Plumbr Agent monitors all JVM threads for lock contention events. Plumbr monitors both synchronized block/method access and java.util.concurrent locks.

For synchronized blocks/methods, Plumbr tracks the situations where a thread in the JVM executes code in a synchronized block or method and another thread tries to enter the same synchronized block/method.

For java.util.concurrent locks Plumbr will detect the situations where threads are forced to wait for events originating from the use of various java.util.concurrent classes, ranging from ReentrantLock to ArrayBlockingQueue.

When the wait time in either case exceeds a predetermined threshold, the root cause is exposed, containing the following:

  • How long the thread was forced to wait before getting access to the synchronized block/method.
  • The monitor used to lock the method/code block (for synchronized usage only).
  • The name and call stack from the thread trying to enter the synchronized block/method.
  • The name and a snapshot of the call stack of the thread whose code was running in the synchronized block. The snapshot of the call stack is taken when the waiting time for the blocked thread is about to exceed the configured threshold.

Having such information allows you to zoom in to the underlying root cause with the precision of a single line in the source code, skipping the tedious and complex process of troubleshooting concurrency issues. Notice that Plumbr also binds together similar lock contention events, allowing you to rank the severity of the performance issues based on the frequency of the underlying root cause.
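
To show what such a lock contention event looks like in code, here is a self-contained sketch in which one thread holds a monitor while another waits to enter the same synchronized block. The class and method names are hypothetical, and the measurement is done by hand with System.nanoTime(); it only demonstrates the situation being monitored, not how the Agent detects it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: reproduces a wait on a contended synchronized block.
final class LockWaitDemo {

    private static final Object MONITOR = new Object();

    // Returns roughly how long the calling thread waited to enter the synchronized block.
    static long measureWaitMillis(long holdMillis) throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (MONITOR) {
                lockHeld.countDown();           // signal that the monitor is now owned
                try {
                    Thread.sleep(holdMillis);   // simulate slow work inside the critical section
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "lock-holder");
        holder.start();
        lockHeld.await();                       // make sure the other thread owns the monitor first

        long start = System.nanoTime();
        synchronized (MONITOR) {                // blocks until the holder leaves its block
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
    }
}
```

The usual remedies are to shorten the work done while holding the lock or to narrow the scope of the synchronized block.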

File Attribute Operations

The Plumbr Agent monitors file attribute querying performed by using methods such as File.exists(), File.isDirectory(), File.canWrite() and so on. While individual operations like that are usually handled very quickly, typically under a few microseconds, having a large number of them may result in a slow transaction. One of the most common cases is recursively walking a large directory that contains millions of files.

When the wait time for the attribute checking starts impacting end user experience, Plumbr recognizes this and links a File Attribute Operations root cause with the slow transaction. The root cause exposed will contain the following information:

  • File(s) being accessed, along with the operation performed (exists(), isDirectory(), etc)
  • Call stack from the thread executing the operation, zooming you right to the line in source code accessing the file system.
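
One common way to cut down on per-file attribute queries when walking a directory tree is java.nio.file.Files.walkFileTree, which hands each file's BasicFileAttributes to the visitor so that no separate File.isDirectory() or File.length() call is needed per file. The sketch below is a general Java technique with hypothetical class and method names; whether and how the Agent reports NIO-based access is not covered here.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Illustrative only: walks a directory tree using the attributes supplied by the traversal.
final class DirectoryScan {

    // Sums the sizes of all regular files under root.
    static long totalSize(Path root) throws IOException {
        final long[] total = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs was read during the traversal itself; no extra attribute query is issued here.
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }
}
```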

GC Pauses

The Plumbr Agent monitors all stop-the-world Garbage Collection pauses that take place in the JVM. If the duration of such a pause exceeds a configured threshold, an incident is created. In addition to the time and duration of the pause, a Plumbr incident contains insights that would help reduce either the duration or frequency of the long GC pauses, for example:

  • Plumbr captures memory snapshots, exposing the most memory-hungry data structures in memory. This allows you to proceed with trimming the most resource-hungry data structures.
  • Allocation and promotion rates exposed by Plumbr, along with the memory consumption in different memory pools, give you clues about poorly allocated heap structures.
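
For a rough first look at GC activity without any agent, the JVM's standard management beans can be queried as sketched below. The class and method names are hypothetical. Note that getCollectionTime() reports approximate accumulated collection time per collector, which for concurrent collectors is not the same thing as stop-the-world pause time, so this is a coarse indicator rather than an equivalent of the pause monitoring described here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: reads accumulated GC time from the JVM's standard MXBeans.
final class GcStats {

    // Approximate total time in milliseconds spent in garbage collection since JVM start.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();  // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }
}
```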

Underprovisioned Thread Pools

The Plumbr Agent monitors thread pools embedded in the application to detect situations where submitted tasks/requests end up waiting in a queue for an available executor. When the wait time in such a queue starts impacting the end user experience, an Under-provisioned Thread Pool root cause is registered. The root cause exposes to Plumbr users the call stack from the thread waiting in the queue.

Thread pools the Plumbr Java Agent is able to monitor:

  • ThreadPoolExecutor embedded in the Java SDK
  • org.apache.catalina.core.StandardThreadExecutor from the Tomcat application server (configured through the “tomcatThreadPool” Executor)

Under-provisioned thread pools are surfaced as a root cause in situations where the thread pool is not able to provide enough free threads to cope with the incoming workload. This can be due to any of the following:

  • Work done by the threads is taking unusually long to complete. The solution for such cases is to optimize the code executed by the threads in the pool.
  • The number of tasks/requests submitted to the pool is higher than usual. In such situations, the solution is either to throttle the incoming load or to load balance it.
  • Last but not least, the thread pool configuration is not providing enough threads to match the regular load. In such situations, the solution is as easy as increasing the number of threads in the pool configuration.
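
The effect of an under-provisioned pool can be reproduced with a plain ThreadPoolExecutor, as in the sketch below. The class and method names are hypothetical, and the tasks simply block on a latch to stand in for slow work; the example shows how tasks pile up in the queue when there are fewer threads than concurrent tasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: shows tasks queuing up behind busy pool threads.
final class PoolBacklog {

    // Submits `tasks` blocking tasks to a pool of `threads` workers and
    // returns how many of them are left waiting in the queue.
    static int queuedTasks(int threads, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();   // simulate work that takes unusually long
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getQueue().size();   // tasks still waiting for a free thread
        } finally {
            release.countDown();             // let the tasks finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

With one thread and three tasks, two tasks wait in the queue; with three threads, none do. Watching getQueue().size() alongside getActiveCount() in a real pool gives a similar early signal.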

Slow HTTP Calls

The Plumbr Agent monitors different HTTP client libraries used for connecting to remote systems over HTTP. When the HTTP calls to such remote endpoints start affecting the end user experience, the offending HTTP query is linked to user transactions as a root cause, exposing the outgoing HTTP request along with the call stack from the thread executing the query.

HTTP calls tend to be slow because the remote system does not respond to the call from the JVM quickly enough. To solve the problem, the system being accessed via HTTP needs to be tuned for latency. If this is not an option, caching the results can also be used to reduce the number of such operations.

The supported list of HTTP client libraries includes:

Slow MongoDB Operations

To detect expensive calls to a MongoDB instance, Plumbr monitors DBCollection and MongoCollection interface methods such as find() and findAndModify(). When a call via such an API starts affecting the end user experience, the operation is listed as a root cause exposing the MongoDB operation called along with the call stack from the thread executing the operation.

In order to reduce noise and get a prioritized list of expensive MongoDB operations, Plumbr groups expensive operations triggered by the same root cause together, allowing you to rank the expensive operations based on the number of times they are detected.

Plumbr supports and monitors both the 2.x and 3.x versions of MongoDB drivers.

Slow JDBC Connection Acquisition

Plumbr detects slow JDBC Connection Acquisition when JDBC connection retrieval via DataSource.getConnection() or DriverManager.getConnection() is affecting the end user experience. In such cases, Plumbr exposes the number of transactions affected along with the time the transactions were forced to wait for the connection retrieval.

Slow connection retrieval can be caused by any of the following:

  • Missing connection pool. Creating JDBC connections is expensive, so in this case please consider pooling the connections.
  • Uninitialized connection pool. In cases where pooled connections are initialized lazily, the first requests to the empty pool are slow. Consider initializing the pool during application startup.
  • Under-provisioned connection pool. When the number of available connections in the pool is smaller than the demand, there will be wait time in queue for the connections. Consider increasing the pool size to match the number of concurrent requests to the data source.
  • Leaking connection pool. If connections are not closed, the connection pool does not know that the connection is no longer being used by the borrower thread. To fix this, add the pool-specific options for spotting leaks to the pool configuration.
  • Testing connections. To keep unused connections in the pool from becoming stale, pool implementations often test the connection before handing it off to the executor thread. When the test query is expensive, this can result in poor performance. Consider simplifying or dropping the tests if possible.

Transaction Snapshots

Plumbr is capable of monitoring for a large number of specific root causes explicitly. Unfortunately, the variety of technologies used in the real world means that the number of ways a particular piece of code can perform poorly is effectively unlimited. Thus a fallback is implemented to cover the cases where the explicit root cause cannot be determined. In such situations, the Plumbr Agent captures one or more snapshots from the suspicious transaction.

Snapshots are effectively thread dumps taken from the thread executing the transaction. Snapshots are captured at increasing intervals during the transaction lifespan, up to a limit of 10 snapshots. The snapshots are linked to the transaction if the transaction is eventually flagged as Slow or Stuck. If the transaction ends up being successful, the snapshots are discarded.

To expose this information in a useful way, Plumbr aggregates those call stacks into a tree-like structure. Call stacks occurring most frequently are ranked higher in a tree. To reduce noise, non-repetitive occurrences are hidden, enabling you to focus on the most frequently captured snapshots first.
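
The aggregation described above can be sketched as a prefix tree of stack frames with hit counts. This is only an illustration of the idea, not Plumbr's implementation; the StackTreeNode class, its method names and the minHits cut-off are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node of a call-stack tree built from captured snapshots. */
final class StackTreeNode {
    final String frame;
    int hits;
    final Map<String, StackTreeNode> children = new HashMap<>();

    StackTreeNode(String frame) {
        this.frame = frame;
    }

    /** Adds one snapshot, given as frames ordered from outermost to innermost call. */
    void add(List<String> frames) {
        StackTreeNode node = this;
        node.hits++;
        for (String f : frames) {
            node = node.children.computeIfAbsent(f, StackTreeNode::new);
            node.hits++;
        }
    }

    /** Children seen at least minHits times, most frequent first. */
    List<StackTreeNode> ranked(int minHits) {
        List<StackTreeNode> out = new ArrayList<>();
        for (StackTreeNode child : children.values()) {
            if (child.hits >= minHits) {
                out.add(child);
            }
        }
        out.sort(Comparator.comparingInt((StackTreeNode n) -> n.hits).reversed());
        return out;
    }
}
```

Calling ranked(2) on a node hides branches that were captured only once, which mirrors the noise reduction described above.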

Slow Lucene Operations

Plumbr monitors Lucene indexes by instrumenting and monitoring org.apache.lucene.search.IndexSearcher and org.apache.lucene.index.IndexWriter, including all of their subclasses. Doing so allows Plumbr to track all the operations modifying the index or reading from the index. This support is implemented and tested on Lucene 4 and 5 releases.

By monitoring the behavior of these classes, Plumbr is capable of exposing:

  • The impact poorly performing Lucene indexes have on your end users
  • Actual root cause, down to a single line in source code accessing the index
  • Information about the index accessed, including the index size, accessed fields, accessor methods and more.

JDBC Multi-Queries

The Agent instruments all JDBC calls made to databases via the Statement, PreparedStatement and CallableStatement APIs. When a single JDBC statement executed through these APIs impacts the user experience, a Slow JDBC Call is detected as the root cause. In situations where many database calls take place during a single transaction and the accumulated duration of such calls is the reason why the transaction is flagged as slow, the multi-query root cause is exposed instead.

In the details of this root cause you will find the offending queries along with the call stacks from the threads executing such queries. To minimize overhead, smart sampling is applied when exposing this data.

The Plumbr Agent monitors every JDBC Type 3 and Type 4 driver detected in the application. This means that Plumbr is able to monitor communication with almost every database vendor exposing the data storage via JDBC, including but not limited to the most widely used MySQL, Oracle, Postgres and IBM DB2 databases.

Slow ResultSet Processing

Slow ResultSet processing is detected when a result set fetched from the database over JDBC is processed in a way that affects the end user experience. In such cases Plumbr exposes the number of transactions affected along with the time those transactions were forced to wait behind the JDBC result set processing.

To monitor the time it takes to process the result set, the Plumbr Agent monitors the cumulative duration of each java.sql.ResultSet.next() iteration. When the cumulative time of the iterations starts impacting the end user experience, the Slow ResultSet Processing root cause is created. This root cause exposes the query whose results were processed along with the call stack through which the results were processed.

Slow ResultSet processing is usually detected when processing large result sets received from the database. To improve the situation, consider either switching to more fine-grained queries or using database-backed paging to limit the size of result sets.

Errors

What are errors?

Errors are the reasons why a particular user interaction or API call fails to complete. Errors captured by Plumbr include technical errors, such as JavaScript errors in the browser or Java Exceptions in the back-end. Plumbr does not capture logical errors, such as situations where VAT on an invoice is calculated incorrectly.

Plumbr’s goal is to map each failed user interaction to the underlying error, giving our users the possibility to understand the impact of each error. The impact is measured in the number of unique users experiencing the error. Having access to this information allows you to rank the errors and fix the ones with higher impact first.

Errors detected in browser

Plumbr Browser Agent captures errors from user interactions that include at least one request completing with a 400 or 500 series response code, indicating either a client-side or server-side failure. In addition, all JavaScript errors that occurred in the browser during the interaction are exposed as root causes for failed transactions.

In all such cases Plumbr flags the interaction containing such a request as failed. In addition, the request URL and the response code are linked with the failed request as the error due to which the interaction failed.

There are some exceptions: spans with response code 400, 401, 403, 409, 418, 426 or 451 are not considered failed, because these codes are widely used in REST APIs to describe the behavior of business rules.
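
The response code rule above can be written down as a small predicate. This is a sketch of the rule as stated in the text, not the agent's actual code; the SpanStatus class and the isFailed method are names made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper expressing the response code rule described above. */
final class SpanStatus {
    // Codes the text lists as exceptions that do not mark a span as failed.
    private static final Set<Integer> NOT_FAILED =
            new HashSet<>(Arrays.asList(400, 401, 403, 409, 418, 426, 451));

    /** True if the response code is in the 400 or 500 series and not an exception. */
    static boolean isFailed(int status) {
        return status >= 400 && status <= 599 && !NOT_FAILED.contains(status);
    }
}
```

For example, a 404 or a 500 response would flag the span as failed under this rule, while a 401 or a 200 response would not.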

JavaScript Errors

When an uncaught error is thrown in JavaScript code, it is gathered up by the agent and linked to the appropriate user interaction.

Script Error

When an error happens in a script loaded from a third party domain, web browsers apply additional protections for user data. While the error details are visible in the developer console (and similar tools), gathering them programmatically is blocked and the error message is set to “Script Error”. This is done to avoid leaking the user’s personal data from other sites.

Third party scripts can be whitelisted from this behaviour by loading them in a way that passes CORS rules. This requires both server-side configuration and changes to the including script tag:

1. Server must respond with the correct CORS header(s)

Examples:

Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: https://example.com

2. Adding crossorigin attribute to the script tag with appropriate value

Example with normal script tag:

<script src="https://example.org/script.js" crossorigin="anonymous"></script>

Example with a script tag that loads another script:

<script>
var s = document.createElement('script')
s.src = 'https://example.org/script.js'
s.setAttribute('crossorigin', 'anonymous')
document.getElementsByTagName('head')[0].appendChild(s)
</script>

Note: Browser Agent installations from before August 2018 may also be missing this attribute, which can increase the number of script errors reported (because the agent wraps some browser methods). Feel free to add the attribute to the script tag loading our agent.

Warnings By Default

As JavaScript is run in the user’s browser, some errors caused by the environment may also be picked up by the agent, for example if the user has a broken extension. We use the files referenced in the stack trace (for example, whether the majority of lines point to non-http resources, such as safari-extension://, or are completely anonymous) to guess whether the error is more likely caused by such factors outside your application. Such errors are instantly demoted to warnings and can be checked separately in the warnings list.

Errors detected in JVM

If Plumbr Java Agent is used to monitor the server-side of an application, the creation of Java Exceptions is used to more specifically pinpoint the errors. Whenever a user interaction is flagged as failed, the chronologically last Exception occurring is linked to the interaction as an error. The Exception contains the full stack trace, allowing you to zoom in to the source code.

Exceptions that do not affect any user interactions or API calls, and exceptions used to steer control flow, are not exposed. Exceptions are grouped together into root causes by Exception class. For example, all instances of ArrayIndexOutOfBoundsException would be grouped together as a single error. The different call stacks are visible in the root cause details, allowing you to verify whether the source code needs patches in multiple locations.

Demoting errors to warnings

On rare occasions Plumbr ends up flagging a user interaction or API call as failed in situations where the detected error does not impact real user experience. To overcome such situations, it is possible to demote an error to a warning. This option is available in the error detail view.

After demoting an error, all future user interactions/API calls containing this error will no longer be flagged as failed. Historical data will also be updated. Depending on the data volumes, updating the history can take up to one hour to complete.

Demoted errors turn into warnings. Warnings are accessible via the Plumbr UI in the Errors menu item. To see the warnings, toggle the warning icon in the error list header.

In case an error was incorrectly demoted, promoting it back to error is also possible. Open warning details and click Promote to turn the warning back to error.

Root Causes

What happened to root causes?

Until 2018, Plumbr referred to both bottlenecks and errors via the generic term “root cause”. We learned that the two concepts are different and that representing them via the same entity/noun created confusion. The root causes were therefore retired and split into errors, representing the root causes of failures, and bottlenecks, representing the reasons why end users faced slow applications.

Services

What are services?

A service is a name for the operation the user was doing. As such, services group together similar user interactions (for example, paying an invoice or adding an item to a shopping cart).

A secondary purpose of a service is to determine the slow threshold for interactions consuming this service. Transactions exceeding this threshold are flagged as slow. Out of the box the slow threshold is 5s, but it can be overridden for any service from the service settings.

Service detection works differently depending on what type of application Plumbr is monitoring.

Services in web applications

When the Plumbr Browser Agent monitors the application, service detection builds upon three components:

  • the URL at which the user was before the interaction
  • the interaction the user performed
  • the URL at which the user ended up after the interaction completed

[Figure: monitoring web applications]

In the example above, the user was viewing an invoice with the ID 123 and decided to pay the invoice by clicking the button Pay. The application processed the payment, after which the user remained at the same URL with the confirmation message that the invoice was paid successfully.

Plumbr identifies the service for this interaction as Click “Pay” on /invoice/view/{1}. As seen from this example, all invoice payments carried out in the /invoice/view screen are grouped together under the same service, independent of the ID.

Improving the service detection

There are known cases where the out-of-the-box Plumbr configuration ends up exposing services via cryptic names. As a result you would see services similar to the following in the Plumbr UI:

Key pressed on “div > list > filter > input#filter” at /user/search

This happens in situations where no human-readable elements were present to use as the identifier of the input field where the user performed the event. As a result, Plumbr used a fallback and exposed the DOM tree branch in which the event took place. To replace this with a human-readable version, use the aria-label attribute on such elements, so instead of

<input type="text"/>

you would use

<input type="text" aria-label="Name"/>

After this change, the name of the service in Plumbr will change to Key pressed on “Name” at /user/search. As a side effect, blind people using screen readers now also have better access to the content of your site, as this is what the aria-label attribute was originally designed for.

Detecting a service from a URL

Let us explain this approach using a transaction arriving at the following URL as an example:

http://www.example.com/shop/cart/add/iPhone6?quantity=5

As a first step parameters are stripped. Next, service detection parses the URL to use /shop/cart/add/iPhone6 as the input. As seen, the last token identifying the product added to the shopping cart (iPhone6) is actually a parameter of the service. In order to group all interactions adding items to the shopping cart under the same service, Plumbr replaces the iPhone6 token in the URL with the placeholder {1}. As a result, the service detected from the transaction is

/shop/cart/add/{1}.

Using this approach makes it possible to group transactions accessing the same /shop/cart/add service together, regardless of the product you added to the shopping cart.

Certain limitations apply to which tokens can be automatically replaced by a placeholder: by default only tokens containing non-alphabetical characters are replaced.
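
The default detection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Plumbr's implementation: the UrlServiceDetector class is hypothetical, “alphabetical” is taken to mean the ASCII letters A to Z in either case, and placeholders are assumed to be numbered {1}, {2}, ... in order of appearance.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of default URL-based service detection. */
final class UrlServiceDetector {
    // Assumption: a token is kept only if it consists purely of ASCII letters.
    private static final Pattern ALPHABETICAL = Pattern.compile("[A-Za-z]+");

    static String detect(String url) {
        // Step 1: drop the scheme and host, if present.
        String path = url.replaceFirst("^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]*", "");

        // Step 2: strip request parameters and the fragment.
        int q = path.indexOf('?');
        int h = path.indexOf('#');
        int cut = (q < 0) ? h : (h < 0 ? q : Math.min(q, h));
        if (cut >= 0) {
            path = path.substring(0, cut);
        }

        // Step 3: replace tokens containing non-alphabetical characters.
        StringBuilder service = new StringBuilder();
        int placeholder = 0;
        for (String token : path.split("/")) {
            if (token.isEmpty()) {
                continue;
            }
            service.append('/');
            if (ALPHABETICAL.matcher(token).matches()) {
                service.append(token);
            } else {
                service.append('{').append(++placeholder).append('}');
            }
        }
        return service.length() == 0 ? "/" : service.toString();
    }
}
```

Under these assumptions, detect("http://www.example.com/shop/cart/add/iPhone6?quantity=5") yields /shop/cart/add/{1}, while a URL such as /#tab=checkout collapses to the root service /.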

If an application being monitored contains path parameters consisting only of alphabetical characters, such services won’t be grouped correctly, i.e., more services will be reported than expected. If an application being monitored uses an approach where there is only the root path (/) and services are encoded using request parameters (using either ? or # as a separator), the services will be grouped too eagerly under one root “/” service. For example, both of the following URLs

/#tab=shoppingcart
/#tab=checkout

will be detected as one root service: “/”

To solve this, it is advised to use custom service grouping (accessible via Settings > Service Grouping Rules). There are two types of grouping rules: prefix based and regular expression based. The prefix matcher is the simplest: it replaces all URLs matching the defined prefix with the desired service name. Regular expression grouping rules are much more powerful. They allow matching request parameters and using groups with back-references to construct service names, which makes it possible to overcome the limitations of the default URL service detection. For our particular example we could define the following regex grouping rule:

Matching pattern:

/#tab=(.*)

Service name pattern:

\1

This results in two detected services:

shoppingcart
checkout
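
To try out a matching pattern and a service name pattern before saving a rule, the behavior can be approximated with java.util.regex. The RegexGroupingRule class below is a hypothetical sketch, not Plumbr code; it assumes the whole URL must match the pattern, and it translates the \1 style back-references used in the rule into the $1 syntax that java.util.regex expects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical approximation of a regular expression grouping rule. */
final class RegexGroupingRule {
    private final Pattern matching;
    private final String replacement;

    RegexGroupingRule(String matchingPattern, String serviceNamePattern) {
        this.matching = Pattern.compile(matchingPattern);
        // Rewrite back-references such as \1 into the $1 form used by Java.
        this.replacement = serviceNamePattern.replaceAll("\\\\(\\d)", "\\$$1");
    }

    /** Returns the service name, or null if the URL does not match the rule. */
    String apply(String url) {
        Matcher m = matching.matcher(url);
        if (!m.matches()) {
            return null;
        }
        // The match covers the whole input, so the buffer ends up holding
        // only the expanded service name pattern.
        StringBuffer name = new StringBuffer();
        m.appendReplacement(name, replacement);
        return name.toString();
    }
}
```

With the matching pattern /#tab=(.*) and the service name pattern \1, applying the rule to /#tab=shoppingcart and /#tab=checkout gives the two services listed above.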

Services detected in API calls

Whenever the operation arriving at the JVM has not started in a browser monitored by the Plumbr Browser Agent, service detection is the responsibility of the JVM accepting the call. Service detection in the JVM is done differently for different applications:

  • HTTP calls. If the call arrives via the HTTP protocol, Plumbr extracts the service from either:
    • MVC framework metadata. If Plumbr supports the particular Java MVC framework (see the list below) used to process the incoming HTTP request, service detection uses the class/method name of the controller invoked.
    • The URL. If the HTTP call is not processed by a supported MVC framework, the service is detected using the information encoded in the URL.
  • EJB methods. If the call arriving at the JVM is a remote EJB call, Plumbr uses the EJB class and method name as the service.
  • Swing event listeners. If the interaction was captured in a Swing application, the Swing event and action listeners are used as the service.

Detecting a service from MVC

When a JVM monitored by Plumbr exposes its services via an MVC framework supported by Plumbr, the service name is extracted from the controller processing the transaction. For example, when an HTTP request such as

http://www.example.com/payments?actionId=payInvoice&invoiceId=411121

is mapped and processed by a Struts controller, the service is extracted from the controller. An example of such a controller would then be visible in the Plumbr interface similar to:

com.example.payments.PaymentAction.execute().

The controllers from the following MVC frameworks are currently supported for service detection:

  • Spring MVC
  • Struts 1 & 2
  • GWT 2.x
  • JSF 1.1+
  • Vaadin 6+
  • ZK 7+
  • Play framework

Whenever an HTTP request is not processed by an MVC framework known to Plumbr, service detection falls back to capturing the service from the information encoded in the URL.

 

Users

What are Users?

A user is an attribute of a transaction, which associates the transaction with the end user interacting with the application. Users in Plumbr are built upon two concepts: user tracking to distinguish one user from another and user identification to expose the identity of the user.

Tracking Users

User tracking works for web applications monitored by the Plumbr Browser Agent. User tracking is based on generating and storing a random unique string in the end user’s browser. This random string is then submitted along with each request from the particular browser.

The unique string is stored in the browser’s cookies, and so subsequent visits to the same site can be associated with the same user. The cookie used for this purpose is named plumbr_user_tracker.

In a Plumbr deployment where users are tracked but not identified, a unique user is counted each time your content is accessed from a different device or browser.

For example, the following journey appears as three different users in cases where user identification is not implemented:

  1. Searching for a product on a tablet or a phone one day,
  2. Purchasing the product on a desktop the next day,
  3. Filing a complaint about the purchased product on a laptop a week later.

Even if all these interactions were performed by an authenticated user, Plumbr would track it as three different users. While you can collect data about each of these interactions and devices, you cannot determine if any relationships exist. You only see independent data points.

The very same journey would be linked to a single user if users could be identified. In such a case, the interactions on different devices would be connected with the same user identity.

Plumbr user tracking only works for web applications. So, if you are monitoring EJB modules or Swing applications, Plumbr will be incapable of tracking the users in such deployments. If Browser Agent is used for monitoring user interactions then unidentified but tracked users will appear as separate anonymous users in the Plumbr UI. If only Java agent is used for monitoring, then users will be tracked only if it was possible to obtain identity information for authenticated users, i.e., there are no anonymous users for API calls.

Identifying Users

In order to expose the journey of a specific user and track the same user across multiple devices, Plumbr also embeds the possibility of identifying users. The exposed identity can be in any form that the particular application can handle. Typical examples of identity are the username or email address of the user.

User identity is automatically linked to a transaction in applications where Plumbr is capable of determining the location of the identity. Plumbr has 3 conceptually different ways of obtaining users’ identity:

  1. fully automatic discovery for certain frameworks,
  2. via configuration for certain sources of properties,
  3. programmatically via our APIs.

Fully automatic discovery, requiring no configuration, supports following frameworks for capturing identity:

If Plumbr has not been able to detect the user’s identity automatically, you can help Plumbr locate the identity by configuring one or more Identity Detection Rules:

  • HTTP Header Rule. If your application passes along the identity of the user using HTTP Request Headers.
  • Session Attribute Rule. If your application stores information about current user’s identity in a custom Servlet session attribute.

Configuration of Identity Detection Rules is explained in detail in the following chapter.

When automatic and configurable identity detection doesn’t suit, then Plumbr allows manually setting user identity via its APIs:

Please note that all identity detection mechanisms except the Browser Agent API only work in settings where the application is monitored by the Java Agent (regardless of whether or not the Browser Agent is used).

Identity Detection Rules

In case Plumbr has not been able to identify the users, you will need to help it find the location of the stored identities. You can do this by configuring the location of the user identity via creating a new Identity Detection Rule.

Identity Detection Rules can look for the identity of users from two different locations.

In case your application passes along the identity of the user via HTTP Request Headers, you will need to configure an HTTP Header Rule. In such a case, all you need to specify is the name of the HTTP Header from which Plumbr can extract the tracking information. The header name would then be similar to the following:

X-User-Authentication

In case your application does not use HTTP Headers to pass along the identity, you will need to configure a Session Attribute Rule instead. In such a case, you will need to configure two parameters: Attribute Name and Extraction Path.

  • Attribute Name is the name of the attribute in the session context storing the user’s identity. For example, Spring Security adds its security context under the SPRING_SECURITY_CONTEXT attribute. When the attribute specified is not found in a particular HTTP Session, either because an incorrect attribute name was provided or because the application user is not yet authenticated, the Plumbr Agent will not proceed to detect the user’s identity from the Extraction Path.
  • In the Extraction Path field, you should define the exact path where the user identity is stored. For example, if Spring Security is used, the extraction path will be getAuthentication().getPrincipal().getUsername().

Combining the two parameters allows Plumbr to look for the identity. Using Spring Security as an example again and combining the two examples above will result in Plumbr looking for:

session.getAttribute("SPRING_SECURITY_CONTEXT").getAuthentication().getPrincipal().getUsername();

Configuration: Example

To explain how you can configure the location of the User Identity let us check the following example. In this example application the successful authentication operation results in storing the User’s identity in the HTTP Session as:

request.getSession(true).setAttribute("USER_CONTEXT", new UserContext(ipAddress, username));

where request is an instance of javax.servlet.http.HttpServletRequest.

Let’s also assume that the UserContext class would be designed as:

public class UserContext {
    public String getIpAddress() {
        return ipAddress;
    }

    public User getUser() {
        return user;
    }

    private final String ipAddress;
    private final User user;

    public UserContext(String ipAddress, String username) {
        this.ipAddress = ipAddress;
        this.user = new User(username);
    }

    private final class User {
        private final String username;

        public String getUsername() {
            return username;
        }

        public User(String username) {
            this.username = username;
        }
    }
}

So we are adding an instance of UserContext to the session attributes under the key USER_CONTEXT. By default the Plumbr Agent does not look for the identity in this location. To teach the Plumbr Agent how to extract the user identity in this case, we would specify the configuration as follows:

  • Attribute name: USER_CONTEXT
  • Extraction Path: getUser().getUsername()

Equipped with this knowledge, the Plumbr Agent will now monitor setAttribute() events in all HttpSession instances. Whenever such an event arrives and the attribute set is “USER_CONTEXT”, Plumbr starts capturing the identity.

The identity itself is extracted by invoking getUser().getUsername() on the UserContext object stored under “USER_CONTEXT” key.
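
Conceptually, following an Extraction Path amounts to calling a chain of no-argument methods on the stored object. The sketch below illustrates this with reflection; it is not the agent's implementation, the ExtractionPath class is hypothetical, and the UserContext shown is a trimmed copy of the example class so that the block is self-contained. The sketch assumes every step of the path is a no-argument method call.

```java
import java.lang.reflect.Method;

/** Hypothetical resolver for paths such as "getUser().getUsername()". */
final class ExtractionPath {
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String step : path.split("\\.")) {
            if (current == null) {
                return null;
            }
            if (!step.endsWith("()")) {
                throw new IllegalArgumentException("Not a no-argument method call: " + step);
            }
            String name = step.substring(0, step.length() - 2);
            try {
                Method getter = current.getClass().getMethod(name);
                getter.setAccessible(true); // the declaring class may not be public
                current = getter.invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Failed to follow path: " + path, e);
            }
        }
        return current;
    }
}

/** Trimmed copy of the example class, kept here so the sketch compiles on its own. */
final class UserContext {
    private final String ipAddress;
    private final User user;

    UserContext(String ipAddress, String username) {
        this.ipAddress = ipAddress;
        this.user = new User(username);
    }

    public String getIpAddress() {
        return ipAddress;
    }

    public User getUser() {
        return user;
    }

    private static final class User {
        private final String username;

        User(String username) {
            this.username = username;
        }

        public String getUsername() {
            return username;
        }
    }
}
```

In this sketch, resolving getUser().getUsername() on a UserContext returns the username, while a path ending in a bare field name such as username is rejected because it is not a method call.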

If the Plumbr Agent fails to extract the identity with the defined Extraction Path, you will see the following banner in your application logs:

**********************************************************************
* Failed to extract user identity with path: getUser().username      *
* Please check your configuration here:                              *
* https://app.plumbr.io/settings/identity-detection                  *
**********************************************************************

Using the same example as above, the error message becomes clear: here the Extraction Path was configured as getUser().username. The username field in the User class is declared private and thus cannot be accessed. To fix this, change the Extraction Path to getUser().getUsername().

Browser Agent Configuration

Browser Agent Configuration

Configuration of the browser agent is done in the data-plumbr attribute of the script tag used to load the agent:

<script 
  src="https://browser.plumbr.io/pa.js" crossorigin="anonymous"
  data-plumbr='{
    "accountId" : "abcde..", 
    "appName"   : "Marketing site", 
    ...
  }'>
</script>

Make sure that the quotes used to define the attribute are being escaped properly. If the settings are generated dynamically, it is recommended to use the backend framework/language methods for JSON and HTML entities encoding. For example, in an EJS template:

<% var plumbrSettings = {
    accountId: "abcde...",
    serviceName: "User's profile"
} %>

<script src="https://browser.plumbr.io/pa.js" data-plumbr="<%= JSON.stringify(plumbrSettings) %>"></script>

Basic Configuration

The following settings are required for the browser agent to run:

  • accountId – your Plumbr account identifier. This is included in the embed code shown to you in the portal. During normal use you do not need to change this.
  • serverUrl – the server to which the browser agent sends data. If you are using On Demand Plumbr, make sure this value refers to https://bdr.plumbr.io. If the agent should connect to an on-premise Plumbr server, make sure it is set accordingly.

Transaction Configuration

Transaction Defaults

The following configuration options apply to all of the transactions generated via the browser agent and are equivalent to calling the same Browser Agent API methods.

Application Name – appName

Set the Application Name of all transactions generated on this page.

Example

<script src="https://browser.plumbr.io/pa.js" 
  data-plumbr='{
    "accountId": "abcde..",
    "serverUrl": "https://bdr.plumbr.io",
    "appName": "Public Store"
  }'>
</script>

User Identity – userId

Set the User Identity of all transactions generated on this page.

Example

<script src="https://browser.plumbr.io/pa.js" 
  data-plumbr='{
    "accountId": "abcde..",
    "serverUrl":"https://bdr.plumbr.io",
    "userId": "John Doe"
  }'>
</script>

Page Loading Transaction

These configuration options apply to the transaction generated from the user loading the page that the browser agent is included on. For example, the act of loading this page by you right now.

Response Code – responseCode

Due to technical limitations the response code of page load is inaccessible to us. In order to mark the page load span with a relevant HTTP response status it must be set in the configuration.

Example

<script src="https://browser.plumbr.io/pa.js" 
  data-plumbr='{
    "accountId": "abcde..",
    "serverUrl":"https://bdr.plumbr.io",
    "responseCode": 404
  }'>
</script>

Service Name – serviceName

Set the Service Name of the transaction that is loading this page.

Example

<script src="https://browser.plumbr.io/pa.js" 
  data-plumbr='{
    "accountId": "abcde..",
    "serverUrl":"https://bdr.plumbr.io",
    "serviceName": "Product Page"
  }'>
</script>

Compatibility Configuration

Unfortunately, there are a few cases where libraries used on the page are not compatible with the way the browser agent works. For those cases there are options to enable compatibility fixes, at the cost of a performance hit.


Prototype.js versions < 1.7.1

Prototype.js is a library that extends native JavaScript objects by adding new methods to the prototype of the objects. However as JavaScript evolves and new methods get added to the specification of the language (and therefore latest versions of browsers), the behaviour of these methods can be different than the one defined by the specification.

To fix these issues you can either upgrade to Prototype.js versions >= 1.7.1 or if that is not feasible configure the agent accordingly:

Version: 1.7.0 (inc. RC)
Notes: Has a non-ES5-compliant Function.prototype.bind implementation.
Configuration: "compat":["bind"]

Version: 1.6.1 and older
Notes: In addition to bind, Prototype.js adds toJSON to all objects, a method that ES5 uses to customise an object’s value when it is turned into a string.
Configuration: "compat":["bind","json"]

Example of using Prototype.js 1.6.1 with browser agent:

<script src="https://browser.plumbr.io/pa.js"
    data-plumbr='{"accountId":"abcdef...","serverUrl":"https://bdr.plumbr.io","compat":["bind","json"]}'>
</script>
<script src="https://ajax.googleapis.com/ajax/libs/prototype/1.6.1.0/prototype.js"></script>

 


In-HTML Event Handlers

Event listeners defined like <a onclick="doSomething(); return false"></a> are tightly coupled to the HTML, so we need to jump through some hoops in order to gather transactions from these interactions. As part of this we wrap your code with an instrumented_with_plumbr method.

If you do not want this behaviour, or you never use it and want a small performance gain, you can add "inlineEvents": false to your browser agent config. (Note: Any activity happening in inline event listeners will then be linked to the previous transaction.)

If you change the onX attribute after we instrument it, you can enable extra watchers by adding "inlineEventChanges": true to the configuration, which re-instruments your code after such changes.

Java Agent Configuration

Configuration

Java Agent configuration is stored in the file plumbr.properties located next to the Plumbr Java Agent .jar file. This configuration is used to monitor a single JVM on one machine. When monitoring multiple JVMs on the same machine, make sure that every JVM uses a different Plumbr installation to avoid clashes in the configuration.

If you cannot use the configuration via property files, an alternative is to specify the Agent parameters in the JVM startup script using -D parameters, prefixing each parameter with the “plumbr.” namespace. So, for example, you could also specify the accountId, jvmId, clusterId and serverUrl parameters for your JVM via:

java -Dplumbr.accountId=a8nd2bar -Dplumbr.jvmId=node1 -Dplumbr.clusterId=Payment -Dplumbr.serverUrl=https://app.plumbr.io

Please note that if you do not use a property file, you will need to pass the following properties via parameters: accountId, jvmId, clusterId, serverUrl, logFile, logLevel. Copy their values from the property file that you have. At least one of jvmId and clusterId must be set; each of them is optional only if the other one is present.

Basic configuration

Configuration parameters in this section are required for the Plumbr Agent to connect to the Plumbr Server, link the JVM to your account, and identify the JVM so that you could distinguish between the different JVMs monitored by Plumbr.

  • accountId – your account identifier, which binds this Agent to your account in the Plumbr Server. This identity is generated and embedded into the downloaded Agent configuration for you. During normal use, you should not change the value of the parameter.
  • jvmId – (optional if clusterId is set) JVM identifier, binding data from this particular JVM to the correct JVM in the Plumbr Server. When this identity is not provided, the connected JVM is assigned a temporary identifier, which will not survive JVM restarts. In order to have the data connected to the same JVM, provide the identifier either as a value of this property or via the server-side UI.
  • clusterId – (optional if jvmId is set) identifier of the JVM cluster. clusterId is the preferred way of grouping JVM transactions under a predetermined application (the application discovery logic is described in more detail separately). JVMs with the same clusterId are also grouped together in the Plumbr Server views where appropriate (such as the Architecture views). Setting clusterId is useful when individual JVMs run the same code and/or run in dynamically provisioned environments. If the value of clusterId is unspecified, no cluster grouping is applied to this JVM.
  • serverUrl – the server to which the Agent connects. If you are using On Demand Plumbr, make sure the value refers to https://app.plumbr.io. If the Agent is connecting to a Plumbr Server installed on your premises, make sure you have specified the correct server URL.

Proxy configuration

When your network configuration requires outgoing communication to pass through a proxy server, you can set up the communication between the Plumbr Agent and the Server via a proxy. Specifying the values for these parameters redirects the traffic from the Agent to the Server via the proxy server specified in proxyUrl.

  • proxyUrl – the proxy URL that you can use to connect to the Plumbr Server if a direct connection from the Agent is not possible. If a proxy is used, this setting is mandatory; the other proxy settings are optional. An example of the parameter: proxyUrl=http://squid.mycompany.com:3128.
  • proxyAuthUser – the username for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
  • proxyAuthPassword – the password for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
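To make the three proxy parameters more concrete, here is a minimal sketch of what they amount to on the wire: a proxy host and port taken from the proxyUrl value, and, for Basic authentication, a credential string built by Base64-encoding “user:password”. This only illustrates the general HTTP Basic scheme and is not the Agent’s own code; the class and method names are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative helper only; not part of the Plumbr Agent. */
final class ProxySettingsSketch {

    /** Host part of a proxyUrl value such as http://squid.mycompany.com:3128. */
    static String host(String proxyUrl) {
        return URI.create(proxyUrl).getHost();
    }

    /** Port part of a proxyUrl value, or -1 if the URL does not specify one. */
    static int port(String proxyUrl) {
        return URI.create(proxyUrl).getPort();
    }

    /** Value of a Proxy-Authorization header under the HTTP Basic scheme. */
    static String basicCredentials(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because Basic authentication only encodes the credentials and does not encrypt them, anyone able to read the traffic between the Agent and the proxy can recover the proxyAuthPassword value.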

Logging configuration

Parameters in logging configuration are used to tune the logging of the Plumbr Agent.

  • logConf – the location of the Logback configuration file the Plumbr Agent uses for logging purposes. Detailed logging configuration is embedded in the referred XML file, which you can tune to suit your needs.
  • tmpDir – the location of the temporary files generated by Plumbr during runtime. Such temporary data includes the buffered data the Agent has not yet sent to the Server and temporary data structures used during Agent-side analysis. The location is relative to the location of the Plumbr Agent in the file system.
  • doCleanup – whether or not Plumbr deletes the temporary files after they are no longer needed. Switch this to false only when told to do so by Plumbr support.

Network Configuration

When your network configuration is blocking Plumbr Agents from connecting to the Plumbr Server, you will see a message similar to the following in your JVM standard output logs:

****************************************************************
* Plumbr Server not responding -                               *
* cannot connect to https://app.plumbr.io.                  *
* Retrying in 60 seconds.                                      *
****************************************************************

Note that although the Server cannot be reached, the JVM will still start; it will just not be monitored by Plumbr.

To verify that the problem is related to network configuration only, try connecting to app.plumbr.io port 443 from the machine on which you are installing Plumbr to see whether the connection is allowed. This can be achieved via telnet, similar to the following:

$ telnet app.plumbr.io 443
Trying 54.171.1.110...
Connected to app.plumbr.io.
Escape character is '^]'.

When the connection is successful, you should see a message Connected to app.plumbr.io, similar to the example above. When the connection fails, the network configuration is blocking connections to app.plumbr.io port 443.
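If telnet is not available on the machine, the same check can be sketched in Java. The snippet below only attempts a plain TCP connection with a timeout, which is what the telnet test does; it does not speak TLS or the Plumbr protocol. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Illustrative connectivity probe only; not part of the Plumbr Agent. */
final class ConnectivityCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unresolvable host or blocked by the network.
            return false;
        }
    }
}
```

For example, ConnectivityCheck.canConnect("app.plumbr.io", 443, 5000) returning false from the machine in question points to the same network restriction as a failed telnet attempt.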

To overcome the situation, the first two recommended options are connecting via a proxy server or relaxing the firewall configuration. If you cannot change the network configuration, you should turn to our On Premise offering, where you can install the Server component in your network.

Proxy configuration

When your network configuration requires outgoing communication to pass through a proxy server, you can set up the communication between the Plumbr Agent and the Server via a proxy. Specifying the values for the following parameters in the plumbr.properties file located next to the Plumbr Agent redirects traffic from the Agent to the Server via the proxy server specified in proxyUrl.

  • proxyUrl – the proxy URL that you can use to connect to the Plumbr Server if a direct connection is not possible. If a proxy is used, this setting is mandatory; the other proxy settings are optional. An example of the parameter: proxyUrl=http://squid.mycompany.com:3128.
  • proxyAuthUser – the username for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
  • proxyAuthPassword – the password for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.

Firewall configuration

Another option for resolving connectivity issues is to check your firewall configuration. If the outgoing connections from the Plumbr Agent are blocked by a firewall, make sure the connection to app.plumbr.io:443 is allowed in your firewall configuration.

The exact configuration steps for this are firewall-specific. See your vendor manuals for further information.

Upgrading Java Agent

Automatic Upgrade

Starting from version 17.07.11, Plumbr Java Agents are capable of upgrading themselves automatically. The functionality is enabled by default; if this does not suit you, you can disable it under the Settings menu.

Whenever a new Plumbr Java Agent version is released, existing Agents connected to the Plumbr Server download the updated version. The switch to the newly downloaded version happens after the next JVM restart.

A more detailed flow of the auto-update, for those who want to take a peek under the hood:

  1. When a connection is established between the Java Agent and the Server, the Agent checks whether auto-update is enabled. If auto-update is disabled, the process is aborted.
  2. After the connection is established, a message is sent to the Java Agent if its build number is lower than the build number of the highest known Agent version. The message contains the version to update to, and the checksum and download URLs for that specific version.
    1. The very same message is broadcast to all connected Java Agents each time you change the setting from enabled to disabled or vice versa in the Settings menu.
  3. When the Java Agent receives the message, it downloads the new Agent using the URLs in the message and performs the checksum verification.
  4. The downloaded ZIP is unzipped into a separate directory inside the installation directory. The name of this directory starts with the version.
  5. On the next restart of the JVM the Agent is attached to, the wrapper automatically selects the new version because it has a higher build number than the previous one.
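The checksum verification mentioned in the flow above can be pictured with the following minimal sketch. The manual does not state which digest algorithm Plumbr uses, so SHA-256 is purely an assumption here, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative checksum check only; SHA-256 is an assumed algorithm. */
final class ChecksumSketch {

    /** Computes the lowercase hex SHA-256 digest of a file. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if the file's digest equals the expected hex string. */
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex.trim());
    }
}
```

A downloaded archive whose digest does not match the published checksum should be discarded rather than unzipped.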

Notice that auto-updating is only possible for our On Demand customers. On Premises users must use the manual agent update process.

Manual Upgrade

If you need to upgrade the Plumbr Java Agent manually (either from a pre-17.08.05 version or because of your company policies), go through the following steps:

  1. Download the new Java Agent from here.
  2. Back up the current Agent installation.
  3. Unzip the newly downloaded Agent .zip file to the folder you wish to install the Plumbr Java Agent to.
  4. Copy plumbr.properties from the backup to the new Agent installation.
  5. Update the startup parameters of the JVM you are monitoring to point to the new Agent JAR file:
    -javaagent:path-to-new-agent/plumbr.jar
  6. Restart the JVM you want to monitor.

Identifying the JVM

When you configure your application to run with Plumbr, you have the option to identify the JVM with a name suitable for your deployment, for example “Payment Live” or “Reporting QA”. The ID can be assigned in three ways:

  • By setting the “jvmId” property in the plumbr.properties file. This file resides in the same directory as the “plumbr.jar” file. Find the line starting with “jvmId=” in that file and append the selected name, as in the example: “jvmId=Payment”.
  • By providing your JVM with the “plumbr.jvmId” system property. Add “-Dplumbr.jvmId=Payment” to your application command line, e.g. “java -Dplumbr.jvmId=Payment -javaagent:/path/to/plumbr/plumbr.jar …”. This option is useful in dynamic environments, where JVMs are created and destroyed dynamically via scripts.
  • When your application is already running and is connected to the Plumbr Server, by going to the detail view of this JVM, clicking on the JVM name, providing a new name and clicking “Save”.

If you don’t manually identify your JVM as described above, it gets assigned an auto-generated ID. The generated ID is ephemeral in the sense that it does not persist between JVM restarts. When you restart your application and have not provided the ID yourself, a new JVM with a new identifier is created in the Plumbr Server. Also note that if you don’t specify jvmId manually, specifying clusterId is mandatory.

Agent startup checks

While starting the Agent, Plumbr goes through the following checks to verify the integrity of the installation:

  1. Verifying file system permissions: whether or not the folder in which the Plumbr Agent resides and its subdirectories are readable and writable by the user launching the JVM Plumbr is attached to.
  2. Verifying the configuration: whether or not all the required configuration parameters are present and valid.
  3. Verifying the support for the environment: whether the OS, JVM and application server used are supported by Plumbr.
  4. Verifying the connectivity: whether the Plumbr Agent can connect to the Server.
  5. Verifying the Agent version: whether the Agent is still supported by the Server the Agent is connecting to.
  6. Verifying the subscription: whether the account has an active subscription or the subscription period has expired.
  7. Other miscellaneous checks, including but not limited to:
    1. If a proxy server is used to connect to the Plumbr Server, whether the proxy can be connected to with the credentials provided.
    2. Whether or not the jvmId parameter used to identify the JVM connecting to the Server is unique.
    3. Whether or not multiple JVMs are using the same Plumbr installation.
    4. Whether or not the jvmId belongs to the account it tries to connect to.

Any of these steps can fail for various reasons. In such a case the Agent will not be attached and the JVM starts without the Plumbr Agent monitoring the end user experience. The reason for the failure will be exposed in the server’s standard output.

To find out whether the Agent failed to initialize, search the log files for the phrase “Plumbr”. If you discover one of the error messages listed below, follow the instructions for that particular error message.
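Searching a log file for the phrase can be done with any text search tool. As a minimal Java sketch, assuming the standard output of the JVM has been redirected to a UTF-8 encoded file (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative log search only; not part of the Plumbr Agent. */
final class LogScanSketch {

    /** Returns every line of the log file that contains the phrase "Plumbr". */
    static List<String> linesMentioningPlumbr(Path logFile) throws IOException {
        // Files.lines decodes the file as UTF-8 and reads it lazily.
        try (Stream<String> lines = Files.lines(logFile)) {
            return lines.filter(line -> line.contains("Plumbr"))
                        .collect(Collectors.toList());
        }
    }
}
```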

Verifying file system permissions

********************************************************************
* Plumbr encountered a filesystem permissions error.               *
* The user that runs your Java process has no write access to the  *
* /users/me/plumbr                                                 *
* Plumbr needs write access to the whole directory.                *
*                                                                  *
* Please ensure that the user that runs your Java process has      *
* read and write permissions for that directory,                   *
* its sub-directories and files inside it.                         *
*                                                                  *
* Check out https://plumbr.io/support/agent-configuration          *
* for more information or contact support@plumbr.io                *
********************************************************************

When you encounter this error message in the log files, the user launching the JVM that the Plumbr Agent is attached to does not have sufficient permissions to access the Plumbr Agent installation directory.

The Plumbr Agent needs to be able to read and write the folder in which the Agent is installed (and its subdirectories). To proceed, ensure that the user running the Java process the Agent is attached to has read and write permissions for both the Agent’s installation folder and its subdirectories.
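To find out which files or folders are the problem, a check along the following lines can be run as the same user that launches the JVM. This is a minimal sketch and not the check the Agent itself performs; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative permission check only; not part of the Plumbr Agent. */
final class PermissionCheckSketch {

    /**
     * Walks the installation directory and returns every file or folder the
     * current user cannot both read and write. Traversal itself may fail with
     * an UncheckedIOException if a subdirectory cannot be listed at all.
     */
    static List<Path> inaccessible(Path installDir) throws IOException {
        try (Stream<Path> paths = Files.walk(installDir)) {
            return paths.filter(p -> !Files.isReadable(p) || !Files.isWritable(p))
                        .collect(Collectors.toList());
        }
    }
}
```

An empty result for the Agent’s installation folder means the current user has read and write access to everything inside it.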

Verifying the configuration

******************************************************************************
* Plumbr is missing the following required properties: serverUrl.            *
* Either make sure the plumbr.properties file is present next to plumbr.jar  *
* or specify individual properties via -D parameters in your startup script. *
*                                                                            *
* Check out https://plumbr.io/support/agent-configuration                    *
* for more information or contact support@plumbr.io                          *
******************************************************************************

When you see such a banner during JVM startup, either the entire plumbr.properties configuration file or some of the mandatory properties are missing. The plumbr.properties file is located next to the Plumbr Agent .jar file.

The first step to overcome the problem is to make sure the Plumbr installation directory is intact and you have not extracted only the Agent’s plumbr.jar file. If the file is present, check the error message for the missing mandatory parameter(s) and add them to the file. When in doubt, check the Agent Configuration page in our support materials.

As an alternative to keeping the configuration in the plumbr.properties file, you can pass the parameters in the JVM startup script as -D parameters, prefixing each parameter with the “plumbr.” namespace. For example, you could specify the accountId, jvmId and serverUrl parameters for your JVM via:

java -Dplumbr.accountId=a8nd2bar -Dplumbr.jvmId=BillingProduction -Dplumbr.serverUrl=https://app.plumbr.io

Verifying the support for the environment

**********************************************************************************************
* Environment you are trying to run Plumbr in is not supported.                              *
* Windows XP operation system is unsupported. Minimum supported Windows version is Windows 7.*
*                                                                                            *
* Check out the support page https://plumbr.io/support/is-my-environment-supported-by-plumbr *
* for the list of supported environments.                                                    *
**********************************************************************************************

When facing a banner similar to the one above in your log files, the environment the Plumbr Agent is running in is not supported by the Agent. The exact message will be different, depending on which unsupported operating system, JVM vendor/version or application server was detected in the environment.

To overcome the issue, consult the list of supported environments in our support documentation to find out whether you can run Plumbr in an officially supported environment.

Verifying the connectivity

***********************************************
* Plumbr Server not responding -              *
* cannot connect to https://app.plumbr.io. *
* Retrying in 60 seconds.                     *
***********************************************

***************************************************************
* Plumbr Server not responding -                              *
* cannot connect to https://app.plumbr.io.                 *
* Retrying in 60 seconds.                                     *
*                                                             *
* In case your network configuration is blocking connections  *
* to Plumbr servers, see how to configure proxy server and/or *
* firewall https://plumbr.io/support/network-configuration.   *
*                                                             *
* In case your company policy does not allow using externally *
* hosted services, try out Hosted Plumbr which does not       *
* require external network connections                        *
* https://plumbr.io/support/hosted-plumbr                     *
***************************************************************

When you face a banner similar to either of the above in your JVM log files, the deployed Agent cannot connect to the Plumbr Server over the network. Plumbr Agents will start even when the Server is not reachable, but as there is no endpoint to send the harvested data to, the Server cannot analyze it and you receive no value from Plumbr.

Note that when the Server endpoint is only temporarily unavailable, the Agent buffers the data locally. The Agent also periodically retries connecting to the Server, and when the Server (re)appears, the buffered data is sent to it.

To verify that the problem is related to network configuration only, try connecting to app.plumbr.io port 443 from the machine on which you are installing Plumbr to see whether the connection is allowed. This can, for example, be achieved via telnet, similar to the following:

$ telnet app.plumbr.io 443
Trying 54.171.1.110...
Connected to app.plumbr.io.
Escape character is '^]'.

When the connection is successful, you should see a message Connected to app.plumbr.io, similar to the example above. When the connection fails, the network configuration is blocking connections to app.plumbr.io port 443.

To overcome the situation, the first two recommended options are connecting via a proxy server or relaxing the firewall configuration. If you cannot change the network configuration, you should turn to our On Premise offering, where you can install the Server component in your network.

Verifying Agent version

During startup, the Agent version is compared to the Server version to verify whether or not the Agent is still supported by the Server it is connecting to. In general, we use the following policy for Agent version support:

  • Servers accept connections from Agents up to one year older than the Server. Agents older than one year are rejected by the Server.
  • Servers are not compatible with Agents released later than the Server.

Recommending to upgrade

************************************************************************************************
* You are using version 16.08.02 of Plumbr. We recommend upgrading to the latest version 12356.   *
* Download the latest version of the agent here: https://app.plumbr.io/download/agent/16.09.20 *
************************************************************************************************

When facing a banner like the one above, your currently used Agent is between 1 and 3 months behind the latest Agent available. As we add new features almost every month, you should consider upgrading, but there is nothing to worry about yet.

Strongly recommending to upgrade

********************************************************************************************************
* You are using version 16.08.02 of Plumbr, which will be supported only until 2017-01-01                 *
* Download the latest version (16.12.12) of the agent here: https://app.plumbr.io/download/agent/16.12.12 *
********************************************************************************************************

When facing the banner above, your current Agent is between 3 and 6 months older than the Server the Agent connects to. The Server will still support the connecting Agent, but you should start planning for the Agent version upgrade.

Deprecated Agent version

********************************************************************************************************
* You are using deprecated version 16.08.02 of the Plumbr agent.                                         *
* Download the latest version (17.04.02) of the agent here: https://app.plumbr.io/download/agent/17.04.02 *
********************************************************************************************************

When facing the message above, the Agent connecting to the Server is 6 to 12 months older than the Server. The version is already deprecated and will become unsupported when the 12-month limit is reached. You should plan the Agent version upgrade as soon as possible.

Unsupported Agent version

**************************************************************************************************************
* You are using unsupported version 16.08.02 of the Plumbr Agent, which can no longer connect to Plumbr Server. *
* Please upgrade to the latest version of the agent from: https://app.plumbr.io/settings/download-center *
**************************************************************************************************************

When seeing the error banner above, the Agent can no longer connect to the Server, as the Agent version is older than the oldest Agent version supported by this particular Server. You need to upgrade to a new Agent version to continue benefiting from Plumbr.

Agent version newer than Server version

*************************************************************************************
* You are using unsupported version 16.08.02 of the Plumbr Agent                       *
* Plumbr Server accepts only agents released before the server                     *
* Please refer to Download Center https://app.plumbr.io/settings/download-center *
* to get supported version of Plumbr Agent                                         *
*************************************************************************************

When facing the error message above in your JVM logs, the Agent connecting to the Server is newer than the Server. As the Server accepts connections only from Agents it knew existed when the Server was released, this “agent from the future” is not allowed to connect.

If you see this message, you are using our On Premise offering, where you have installed the Server yourself. This means you have two options:

  • Preferably, upgrade the Server so that new Agents can apply all their new and shiny features in your deployment.
  • If this is not possible, use an older Agent version, so that the Agent was not released later than the Server it connects to.
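Taken together, the version support tiers described in this section can be summarized in a small sketch. It compares release dates rather than version strings; reading a version such as 16.08.02 as the date 2016-08-02 is an assumption, as are the exact handling of the tier boundaries and the absence of a banner for Agents less than one month old. The class and enum names are hypothetical and this is not the Server’s actual logic.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Illustrative classification only; boundaries are assumptions. */
final class AgentSupportSketch {

    enum Status {
        UP_TO_DATE,
        UPGRADE_RECOMMENDED,          // 1 to 3 months behind
        UPGRADE_STRONGLY_RECOMMENDED, // 3 to 6 months behind
        DEPRECATED,                   // 6 to 12 months behind
        UNSUPPORTED,                  // more than one year behind
        NEWER_THAN_SERVER             // Agent released after the Server
    }

    static Status classify(LocalDate agentRelease, LocalDate serverRelease) {
        if (agentRelease.isAfter(serverRelease)) {
            return Status.NEWER_THAN_SERVER;
        }
        if (agentRelease.plusYears(1).isBefore(serverRelease)) {
            return Status.UNSUPPORTED;
        }
        // Number of complete months between the two release dates.
        long months = ChronoUnit.MONTHS.between(agentRelease, serverRelease);
        if (months >= 6) {
            return Status.DEPRECATED;
        }
        if (months >= 3) {
            return Status.UPGRADE_STRONGLY_RECOMMENDED;
        }
        if (months >= 1) {
            return Status.UPGRADE_RECOMMENDED;
        }
        return Status.UP_TO_DATE;
    }
}
```

Under the date reading assumed above, the version pairs shown in the example banners (16.08.02 against 16.09.20, 16.12.12 and 17.04.02) fall into the recommended, strongly recommended and deprecated tiers respectively.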

Verifying the subscription

Plumbr is subscription-based software with a 14-day free trial subscription available. When the subscription has expired, the data on your account is kept for 10 more days, but you can no longer monitor your applications with Plumbr. Once 10 days have passed since the subscription expired, the data on your account is permanently deleted.
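The timeline above can be sketched as a simple date calculation. The sketch assumes that monitoring stops on the expiration date itself (as the expiration warning banner suggests) and that the data is removed on the tenth day after that date; the exact boundary days are assumptions, and the class and enum names are hypothetical.

```java
import java.time.LocalDate;

/** Illustrative timeline only; boundary days are assumptions. */
final class SubscriptionSketch {

    enum State { ACTIVE, EXPIRED_DATA_RETAINED, DATA_DELETED }

    // Retention period stated in the manual.
    static final int RETENTION_DAYS = 10;

    static State stateOn(LocalDate today, LocalDate expiresOn) {
        if (today.isBefore(expiresOn)) {
            return State.ACTIVE;
        }
        if (today.isBefore(expiresOn.plusDays(RETENTION_DAYS))) {
            return State.EXPIRED_DATA_RETAINED;
        }
        return State.DATA_DELETED;
    }
}
```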

Expiration warning

***************************************************************************************
* Your subscription will expire on 2016-01-01.                                        *
* From 2016-01-01 Plumbr will not monitor your JVM(s) any more.                       *
* To renew your subscription get in contact with Plumbr Sales at sales@plumbr.io.     *
***************************************************************************************

When encountering such a warning in your log files, your subscription will expire soon. If you wish to benefit from Plumbr after the subscription period, you should start planning for the subscription extension.

Paid account expired

**********************************************************************************************
* Your subscription expired on 2016-01-01 and Plumbr is not monitoring your JVM(s) any more. *
* Your data will be available for 10 days, after which your account will be deleted.         *
* To renew your subscription go to https://app.plumbr.io/payment                          *
**********************************************************************************************

If you encounter the message above, your subscription has expired. The data is still present on your account, but you can no longer monitor any JVMs. Once 10 days have passed since the subscription expired, the data will be permanently deleted.

To keep benefitting from Plumbr, extend your subscription.

Paid account deleted

******************************************************************************
* As you subscription was not renewed your Plumbr account has been deleted   *
* and you cannot monitor your JVM(s) with Plumbr any longer.                 *
* To start monitoring your JVM(s), sign up and purchase Plumbr               *
* one year subscription: https://app.plumbr.io/payment                    *
******************************************************************************

Encountering the message above indicates your subscription has expired and more than 10 days have passed from the expiration date. The data on your account has been deleted.

To start using Plumbr again, purchase a new subscription.

Trial account expired

***************************************************************************************
* Your free trial is now expired and Plumbr is not monitoring your JVM(s) any more.   *
* Your data will be available for 10 days, after which your account will be deleted.  *
* To activate your account go to https://app.plumbr.io/payment                     *
***************************************************************************************

The message above indicates that the free trial you used has expired. You can no longer monitor the JVMs with Plumbr. The data gathered during the trial is still available for you until 10 days have passed from the trial expiration. After this, the data on your account will be permanently deleted.

Trial account deleted

*************************************************************************************
* As your free trial expired your Plumbr account has been deleted and you cannot    *
* monitor your JVM(s) with Plumbr any longer. To start monitoring your JVM(s),      *
* sign up and purchase Plumbr subscription: https://app.plumbr.io/payment        *
*************************************************************************************

The banner indicates that your free trial has expired and the data on your account has been deleted.

If the trial demonstrated the value of Plumbr to you, the way to keep using Plumbr is to switch to a paid subscription.

Miscellaneous checks

Besides the categories above, the Plumbr Agent performs a number of other checks, which can also result in errors or warnings being printed to the JVM standard output.

Connecting to wrong Server

***************************************************************************************************
* The account does not exist at https://my-plumbr-server-installation:8080/.                      *
* Check plumbr.properties to make sure you are connecting to the correct Plumbr Server instance.  *
* If indeed so, contact support@plumbr.io                                                         *
***************************************************************************************************

When facing the error above, the Agent is connecting to a Server using an accountId the Server is not aware of. This usually means you are connecting to an incorrect Plumbr Server. If this is the case, make sure the serverUrl in plumbr.properties points to the correct Server.

If it is not the case, contact support@plumbr.io so we can figure out the source of the problem.

Lambda support in early Java 8 releases

*******************************************************************************************************
* There is a known issue with Java versions 1.8.0 - 1.8.0_31 where using Java agents                  *
* together with code that uses dynamic invocation (such as lambdas or dynamic languages)              *
* may cause segmentation faults. If these are not used in your application, your JVM may be safe,     *
* but for production sites we do not recommend using Plumbr with Java 8 versions older than 1.8.0_40. *
* To make sure this problem will not occur, either:                                                   *
*   a) Upgrade your Java version to 1.8.0_40 or newer                                                 *
*   b) If upgrading Java version is not possible, turn off JIT compilation for java.lang.invoke       *
* package by specifying -XX:CompileCommand=exclude,java/lang/invoke/ in your JVM startup script.      *
*******************************************************************************************************

When facing the warning above, you are running on an early Java 8 build. These builds are known to contain bugs that affect the JVM when lambdas or dynamic languages are used together with any Java agent attached to the JVM.

The application might work fine, but to make sure you do not run into any issues, please consider either:

  • Upgrading the Java version to 1.8.0_40 or newer.
  • Turning off JIT compilation for the java.lang.invoke package, as specified in the warning message.
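Whether a given JVM falls into the affected range can be checked from the value of the java.version system property. The sketch below assumes the legacy Java 8 version format (for example 1.8.0_31) and uses the 1.8.0_40 threshold recommended in the warning; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative version check only; not part of the Plumbr Agent. */
final class Java8UpdateCheck {

    // Matches "1.8.0", optionally followed by "_<update>" and any suffix.
    private static final Pattern JAVA8 =
            Pattern.compile("^1\\.8\\.0(?:_(\\d+))?.*$");

    /** Returns true for Java 8 versions older than the recommended 1.8.0_40. */
    static boolean isOlderThanRecommended(String javaVersion) {
        Matcher m = JAVA8.matcher(javaVersion);
        if (!m.matches()) {
            return false; // not a Java 8 version string
        }
        // A missing update number (plain "1.8.0") counts as update 0.
        int update = m.group(1) == null ? 0 : Integer.parseInt(m.group(1));
        return update < 40;
    }
}
```

Inside the JVM in question, Java8UpdateCheck.isOlderThanRecommended(System.getProperty("java.version")) tells whether one of the two measures above should be considered.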

Native agent loading failure

*****************************************************************
* Native agent could not be loaded from                         *
* /users/me/plumbr                                              *
* This may be caused by missing read or execute permissions for *
* plumbr home directory or one of its subdirectories.           *
*                                                               *
* Check out https://plumbr.io/support/agent-configuration       *
* for more information or contact support@plumbr.io             *
*****************************************************************

When facing the error above, the native agents located in the lib/ folder next to the Agent’s plumbr.jar file are not readable or executable by the user launching the JVM Plumbr is attached to.

To solve the problem, make sure the user launching the JVM that the Plumbr Agent is attached to has read and execute permissions for the lib/ folder and its subdirectories.

In all honesty, this is one of the cases where we do not fully understand how the situation can arise. So if you are facing it, we would really appreciate it if you contacted support@plumbr.io so we could understand how on earth this permission issue can even happen.

Proxy credentials missing

**********************************************************************************
* The Proxy server at your-proxy-server.ip:3039 is requesting a username and a password.           *
* Please add them to plumbr.properties file. You can find the instructions here: *
* https://plumbr.io/support/manual#network-configuration                         *
**********************************************************************************

When you see the banner above in your JVM standard output, the Agent is trying to connect to the Server through a proxy server. The proxy server requires authentication, but the configuration you provided in plumbr.properties does not contain a username and password.

To solve the issue, provide the correct username and password in the Plumbr configuration to access the proxy.

Multiple JVMs using the same jvmId

*************************************************************************************************************************
* The JVM ID “myjvmid" is already in use. This happens when multiple JVMs are connecting                                *
* to Plumbr Server using the same jvmId configuration parameter specified in plumbr.properties file.                    *
* In order to solve the problem, download new Plumbr agent from here: https://app.plumbr.io/settings/download-center *
* and make sure the agent location in the new JVM refers to a different Plumbr installation in file system.             *
*************************************************************************************************************************

When you see the error message above, the Plumbr Agent was not started, because another JVM is already connected to the Server using the same jvmId as the one specified for the rejected JVM.

This can happen when you have copied the Plumbr installation used by one JVM and are using it for a second JVM. The jvmId specified in the plumbr.properties file must be unique, so to solve the issue, make sure that every JVM you want to monitor with Plumbr has a unique jvmId specified in the configuration (either in plumbr.properties or passed as a -D parameter).

Multiple JVMs accessing the same Plumbr installation

**************************************************************************
* Working directory is locked. This might happen when you launch several *
* applications from the same plumbrHome at the same time.                *
* In order to solve the problem, download new Plumbr agent from here:    *
* https://app.plumbr.io/settings/download-center                      *
* and make sure the agent location in the new JVM refers to a different  *
* Plumbr installation in file system                                     *
**************************************************************************

When you encounter the error above, you are trying to launch two JVMs that both load the Plumbr Agent from the same location in the filesystem. Note that each JVM monitored by Plumbr must use its own Plumbr installation.

To overcome the issue, create a separate Plumbr Agent installation for each monitored JVM and load the javaagent from a different location for each of them.

Troubleshooting startup failures

When you have followed the installation instructions and attached Plumbr JVM Agent to the JVM as specified, you should see a new JVM appearing at https://app.plumbr.io/jvms. Additionally, the JVM standard output should contain an information banner similar to the following:

************************************************************
* Plumbr (15.12.14) is attached.                           *
*                                                          *
* Plumbr agent is connected to the Plumbr Server.          *
* Open up https://app.plumbr.io to follow its progress. *
************************************************************

If your experience does not match the success symptoms above, please follow the five steps below to find out the cause:

  1. The first stop is to check whether the JVM you attached Plumbr to has actually started. One way of doing this is to list all running Java processes by running jps -lvm on the command line. The output of the command is the list of the JVMs currently running on the machine. Make sure the JVM in question is among them. If not, the JVM failed to start and you should turn to the JVM/application logs to find the cause.
  2. If the JVM is up and running, the next step is to make sure the Plumbr Agent you specified in the startup scripts was picked up. Again, the way to do it is to check the output of the jps -lvm command. The output would look similar to the following, where it is visible that the process with ID 6349 has picked up the javaagent from /home/me/plumbr/plumbr.jar.
    my-precious:~ me$ jps -lvm
    6359 sun.tools.jps.Jps -lvm -Dapplication.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home -Xms8m
    6349 MonitorContention -javaagent:/home/me/plumbr/plumbr.jar
    my-precious:~ me$
  3. When the output above does not contain the -javaagent section for the JVM in question, the parameter specified in the startup script was not picked up. In that case, debug the startup scripts to see where the added / modified parameters are lost.
  4. If the -javaagent was present in the process list, the next stop is checking the application logs. Search for messages containing “Plumbr” to see whether Plumbr has logged anything that exposes the problem. In many cases, the Plumbr log records you find contain both the cause of the failure and the way to correct it. The full list of the error messages printed by Plumbr Agent along with the solution guidelines is available here.
  5. If there is nothing in the application logs, the next source of information is the Plumbr Agent logs, located in the logs/ folder next to the Agent’s plumbr.jar in the filesystem. The folder contains the plumbr.log and plumbr-debug.log files. Search these files to see whether there are any error messages in the logs. Again, in most cases the error messages would guide you to the problem along with a reference to the solution.
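
Steps 2 to 5 above can be partly scripted. The following is a minimal sketch, not an official Plumbr tool: check_javaagent is a hypothetical helper that reads the output of jps -lvm and reports whether a given process was started with a -javaagent option pointing to plumbr.jar. The paths in the usage comments reuse the example location from step 2.

```shell
# Read `jps -lvm` output on stdin and report whether the process with the
# given PID ($1) was started with a -javaagent option pointing to plumbr.jar.
check_javaagent() {
    if grep "^$1 " | grep -q -- '-javaagent:.*plumbr\.jar'; then
        echo "attached"
    else
        echo "missing"
    fi
}

# Usage:
#   jps -lvm | check_javaagent 6349
# Searching the Agent logs for error messages (steps 4 and 5):
#   grep -i "error" /home/me/plumbr/logs/plumbr.log /home/me/plumbr/logs/plumbr-debug.log
```

The helper prints "missing" both when the process is not running and when it runs without the agent, so step 1 should be checked first.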

If following the steps above does not reveal the source of the problem, feel free to contact support@plumbr.io and we will figure the solution out together.

Server Configuration

Upgrading Plumbr Server

Plumbr Server update is based on building new Docker images and mounting the data to the newly built images. Data stored inside the Docker images themselves is therefore not preserved, so you cannot expect any manual configuration changes made to the existing Docker containers to survive the upgrade.

To upgrade Plumbr Server you need to go through the following steps.

  1. Download a new version of the Plumbr Server distribution from the Download Center.
  2. Extract the downloaded archive on top of the existing plumbr-server folder, replacing all existing files.
  3. Restart the Docker Compose project by running “./launch.sh” from the plumbr-server folder. This will download all updated Docker images and then recreate all affected containers.
  4. After the process completes, the new version of Plumbr Server is available at the same URL as before.

As a next step, the Plumbr Agents connecting to the Server need to be upgraded. You can do this independently of the Server update, but for consistency you need to eventually also upgrade the Agents.

On-Premise OOM analysis

Running analysis when an OutOfMemoryError occurs in an application is computationally intensive and requires an amount of RAM proportional to the number of objects in the JVM. Therefore, when running your own Plumbr Server on-premise, additional steps must be taken to find root causes for an OutOfMemoryError.

Semi-automatic analysis

The OutOfMemoryError meta-information snapshot is always sent automatically by the Plumbr Agent to the Plumbr Server after such an error occurs. A corresponding Root Cause screen will appear in Plumbr Server, prompting you to do the following:

  1. Select or create a machine with an amount of RAM as specified on the page
  2. Download a .jar file that would perform the analysis to that machine
  3. Run it, supplying the required amount of heap to the JVM by specifying the -Xmx argument

The program will automatically download the meta-information from Plumbr Server, run the analysis and upload a complete report back to Plumbr Server. This assumes two things:

  • You have specified the plumbr.server.url property in the server properties or set it in the web interface
  • The machine that runs the analysis has access to the machine where Plumbr Server is running

In case the first condition is not met, you can still run the analysis by supplying a property to the jar file: -Dportal.url=https://address-of-your-plumbr-server-installation.

Running analysis from behind a firewall

In case access to the Plumbr Server is restricted by a firewall, some additional manual actions are required:

  1. Click on “Detailed information” on the Root Cause page to follow the instructions
  2. Copy the meta-information files named oom_dump_v4.tbz2 and oom_dump_info.txt to the target machine
  3. Supply the path to the copied .tbz2 file to the jar running command
  4. Supply the path to where the report should be saved, e.g. report.bin
  5. Run the analysis, e.g. java -Xmx1g -jar analyze-oom ~/oom-analysis/1/oom_dump.tar.bz2 report.bin
  6. Upload the report.bin file to the corresponding form on the Root Cause page

Data retention

By default, meta-information files will be immediately deleted upon successful analysis. Data files that date back more than 30 days will also be deleted, even if no analysis was performed on them. At any time you may manually delete the dumps from the ${plumbr.server.home}/data/dumps folder.
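
Before deleting anything manually, it can help to review which files are old. A minimal sketch, assuming a Unix-like host with find available; the function name list_old_dumps is illustrative, and the path in the example stands in for your own ${plumbr.server.home}:

```shell
# Print the files under the given folder ($1) that were last modified more
# than the given number of days ($2) ago. Nothing is deleted.
list_old_dumps() {
    find "$1" -type f -mtime +"$2"
}

# Example, with a hypothetical Plumbr Server home:
#   list_old_dumps /opt/plumbr-server/data/dumps 30
```

The sketch only lists files, so the decision about what to delete stays with you.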

Data backup & restoration

Plumbr Server stores all persistent data in PLUMBR_SERVER_HOME/data folder on the host server running docker containers. We have provided a sample backup script called “backup.sh” that can be used to preserve the most important data. Just run it periodically (e.g. via cron job) as follows:

cd $PLUMBR_SERVER_HOME

./backup.sh /my/backup/destination

Please note that only the final aggregated data presented in the Plumbr Server UI is preserved. Raw probe data sent by Plumbr Agents, as well as all intermediate, partially processed data, is not backed up.
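
For the periodic run mentioned above, a small wrapper keeps the cron entry short and reports failures. This is a sketch under assumptions: run_backup and the wrapper script path are illustrative and not part of the Plumbr distribution; only backup.sh and its destination argument come from the instructions above.

```shell
# Run backup.sh from the Plumbr Server home ($1), writing to the destination ($2).
# Runs in a subshell so the caller's working directory is left unchanged.
# Returns non-zero if the folder or script is missing or the backup fails.
run_backup() {
    ( cd "$1" && ./backup.sh "$2" )
}

# Example wrapper body (PLUMBR_SERVER_HOME must be set in the cron environment):
#   run_backup "$PLUMBR_SERVER_HOME" /my/backup/destination || echo "Plumbr backup failed" >&2
# Example crontab entry, daily at 03:00, with a hypothetical script path:
#   0 3 * * * /usr/local/bin/plumbr-backup.sh
```

Writing the failure message to standard error lets cron pass it on through its usual mail notification, if that is configured on the host.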

In order to restore a Plumbr Server installation after a disaster or a relocation to another server, do the following:

  • Install Plumbr Server on the new server as described in the Plumbr Server Installation Manual
  • Run Plumbr Server and wait for 5 minutes until it creates the required structures, both internally and on the file system in the data subfolder
  • Run the provided script “restore.sh”

Browser Agent API

Introduction

Plumbr Browser Agent API enables control over:

  1. Starting transactions (see this chapter for more details)
  2. Naming of Applications (see description of an application for more details)
  3. Naming of Services (see description of a service for more details)
  4. Naming of Users (see description of a user for more details)

The following guide describes how to install the Browser API and how to use it.

Installation

Plumbr Browser API is installed along with the Plumbr Browser Agent, so no additional installation is needed. See Browser Agent Installation guide for more details.

Setting transaction attributes

Plumbr Browser API allows setting the following attributes of a transaction:

  • Application
  • Service
  • User

These attributes can be set in two alternative ways: programmatically or via configuration parameters of the browser agent script. Which way to use depends on the application type. Classic web applications, which generate HTML on the server side and have all the required knowledge there, may find it easier to generate the configuration values on the server side, eliminating the need for additional JavaScript code. Single-page web applications may find it more suitable to set up these attributes via direct JavaScript calls.

For information on configuration parameters, please see the section on Browser Agent Configuration.

Plumbr Browser API is exposed on window.PLUMBR, so it can be accessed globally as PLUMBR. It is recommended to wrap API calls in try…catch blocks to avoid situations where user-side blocking of the agent (such as privacy targeted browser addons) would crash your application.

document.getElementById('add-to-cart').addEventListener('click', function() {
    try {
        PLUMBR.setServiceName('add product to cart');
    } catch(err) {}

    // ajax call to add product to cart...
});

Available configuration options and API calls will be described below.

Application

In Configuration: { "appName": "Marketing Site" }
API Call: PLUMBR.setAppName('Marketing Site')

Set the application name for the page. This is persistent across all transactions made on the page (such as soft navigations, ajax interactions), so setting it in the configuration means it doesn’t need to be called again in API.

Service

In Configuration: { "serviceName": "Product details" }
API Call: PLUMBR.setServiceName('Product details')

Set the service for the current transaction. The service name set in the configuration always applies to the transaction that represents loading the page, while an API call applies to the transaction that is active at the time of the call. For example, in a SPA, when the API method is called after the user clicks on a link, it defines the service of the transaction created by that click.

User

In Configuration: { "userId": "person@example.com" }
API Call: PLUMBR.setUserId('person@example.com')

Set the user of the page. This is persistent across all transactions made on the page (such as soft navigations, ajax interactions), so setting it in the configuration means it doesn’t need to be called again in API.

Transaction Management

In most cases the browser agent is able to automatically detect all of the user interactions. However, there might be cases where you’d want to start transactions manually.

PLUMBR.startTransaction(serviceName)

Starts a new transaction under serviceName service.

Example usage: Starting a transaction when document receives an external-force event

document.addEventListener('external-force', function (event) {
    try {
        PLUMBR.startTransaction('External force');
    } catch(err) {}

    messageServerAboutExternalForce();
});

Note: Include the API call as close to the source of the interaction as possible. That way, once the browser agent starts detecting the interaction natively, it can avoid creating multiple transactions.

document.getElementById('sign-up').addEventListener('click', function (event) {
    try {
        PLUMBR.startTransaction('User signs up');
        // Because browser agent has already detected the click it will be translated to
        // PLUMBR.setServiceName('User signs up')
    } catch(err) {}

    registerUser({ /* ... */ });
});
    

Regenerate User ID

In Configuration: { "regenerateUser": true }
API Call: PLUMBR.regenerateUser()

Plumbr browser agent uses a randomly generated tracking ID for each user, which can then be assigned an identity by setting it via the configuration or the API. However, only the first known identity is used for the user. If you wish to change the user identity later (for example, when the identity is based on “user + role” and the user changes role), a new user tracking ID must be generated.

Regenerating via configuration

Ideal for: role changes that cause a hard navigation (e.g. a form submit or a click on a link). The transaction containing the page load will be linked to the new user.

Example:

<script
  src="https://browser.plumbr.io/pa.js"
  crossorigin="anonymous"
  data-plumbr='{
    ...
    "regenerateUser": true,
    "userId": "Admin as John Doe"
  }'>
</script>

Regenerating via API

Ideal for: identity changes that happen after ajax requests. The transaction started after calling the method will be linked to the new user.

Example:

// In the callback of some ajax method (the function name is illustrative)
function onUserIdentityChanged(newUserIdentity) {
  try {
    // Generate new user tracking ID
    PLUMBR.regenerateUser();
    // Set new user identity
    PLUMBR.setUserId(newUserIdentity);
    // Start new transaction that is linked to this user
    PLUMBR.startTransaction();
  } catch (err) {}
}

Java Agent API

Introduction

Plumbr Agent API enables programmatic control over:

  1. Application naming (see description of an application for more details)
  2. Service naming (see description of a service for more details)
  3. Identification of users (see description of users for more details)
  4. Transaction boundary definition (see definition of a transaction for more details)

The following guide describes how to install the API dependency and how to use it.

Installation

To start using the Plumbr Agent API, agent-api.jar must be added as a dependency to your project. When running the application without the Plumbr Agent attached, all calls to the library will be silently ignored without any performance impact. When the Plumbr Agent sees the attached Agent API library, it will perform the requested integration calls.

The Agent API is published on Bintray and Maven Central.

Javadocs are published with the API, and are also available here.

To add the dependency, copy and paste the suitable snippet for your build system from the respective Bintray or Maven Central page.

To use the library in the code, the following import must be added to your source file:

import eu.plumbr.api.Plumbr;

Terminology

A Span represents a period of time that the application has spent executing in one thread. Spans may be started and finished. Once a span starts, it becomes associated with the current processing thread, and all root causes that are detected within that thread are associated with the active span.

A span may contain any number of child spans. Child spans may be associated with threads either in the same JVM, or in a different JVM, which also is monitored by the Plumbr Java Agent.

A span may have metadata associated with it, which is shown in the single transaction view of the unhealthy transaction that the span belongs to.

A Transaction is a tree of spans, which consists of a root span and all of its children. The transaction has some additional properties that describe that tree of spans. These properties include:

  • a transaction ID (a UUID, generated automatically)
  • an application name (taken from the root span)
  • a service name (taken from the root span)
  • an identifier of a user (taken from the root span)

Creating new transactions

When to use: Plumbr Agent fails to automatically discover transactions in a given application.

How: In this case, the transaction should be created manually by first calling eu.plumbr.api.Plumbr.newSpan(), then configuring the service name and application of the span and calling eu.plumbr.api.Span.start() to start it and eu.plumbr.api.Span.finish() to end it.

Example:

Plumbr
  .newSpan()
  .setAppName("My application")
  .setServiceName("My Service")
  .setUserId("user@domain.com")
  .start();

try {
  // do work
} catch (Exception e) {
  Plumbr.getCurrentSpan().fail(e);
} finally {
  Plumbr.getCurrentSpan().finish();
}

Setting transaction attributes

When to use: Plumbr Agent is able to detect transactions, but fails to assign meaningful service name, application or user ID to them.

How: In this case, eu.plumbr.api.Plumbr.getCurrentSpan() should be called to get a reference to the automatically created span and then the properties of that span be set with the corresponding methods in eu.plumbr.api.Span:

setServiceName(String serviceName)
setAppName(String appName)
setUserId(String userId)

The getCurrentSpan() method is null-safe: it never returns null. If there is no current span in the current thread, an instance of eu.plumbr.api.null.NullSpan is returned instead, which is in turn a null-safe implementation of Span. So, even if the agent is not attached, you can still call all the setters on the object returned by getCurrentSpan() without any additional null checks. In most cases this is sufficient.

If you really need to check whether there is a current Plumbr span within the current thread (for example, if the code you want to monitor can be called both from within a Plumbr transaction and outside of one), the Span.isNull() method returns true if the returned span is a null span and false if there is a current span.

Example:

Plumbr.getCurrentSpan().setUserId("my precious user");
Plumbr.getCurrentSpan().setServiceName("my precious service");
Plumbr.getCurrentSpan().setAppName("my precious application");

Create failed transaction with a custom exception

When to use: Plumbr is able to detect transactions, but is unable to automatically detect if they fail or associate the correct exception with the failure.

How: In this case, eu.plumbr.api.Plumbr.getCurrentSpan() should be called to get a reference to the automatically created span and then eu.plumbr.api.Span.fail(Throwable) be called to mark the span as failed and to optionally associate a specific exception as a root cause for the failure.

Example:

try {
  // do something that throws a wrapped exception
} catch (Exception e) {
  // report the wrapped exception (the cause) as the root cause of the failure
  Plumbr.getCurrentSpan().fail(e.getCause());
}

Join remote spans to existing transaction

When to use: A request made from a transaction causes a new transaction in a remote application, and you want to link that remote transaction as a child span into the first transaction.

How: In this case, before calling the remote service, the caller should create a child span of the current span by calling eu.plumbr.api.Span.createChildSpan() and then serialize it using eu.plumbr.api.SpanSerializer. The serialized span can then be included in the request to the other application (which should also be monitored by the Plumbr Agent). There it is deserialized with eu.plumbr.api.SpanSerializer and must then be started and finished manually using calls to eu.plumbr.api.Span.start() and eu.plumbr.api.Span.finish() respectively. After the remote call finishes, the calling side must acknowledge that by calling eu.plumbr.api.Span.finishChildSpan(childSpan). See the full examples below.

Listing 1: Managing a child span in the parent process:

Span childSpan = Plumbr.getCurrentSpan().createChildSpan();
String serializedChildSpan = SpanSerializer.toBase64(childSpan);

// Transfer serializedChildSpan to another machine.
// See Listing 2 about what to do there.
try {
	try {
		// perform remote call
	} finally {
		Plumbr.getCurrentSpan().finishChildSpan(childSpan);
	}
} catch (Exception e) {
	// If this failed remote call should fail the transaction:
	Plumbr.getCurrentSpan().fail(e);
}

Listing 2: Working with a child span on remote JVM:

String serializedChildSpan = … // obtain a serialized child span
Span span = SpanSerializer.fromBase64(serializedChildSpan);
span.start();
try {
	…
} catch (Exception e) {
	span.fail(e);
} finally {
	span.finish();
}

Triggering Alerts

General Approach

Through the use of Plumbr Server API, it is possible to expose the insights captured by Plumbr to any system that can make an HTTP call. One of the more common use cases for that is sending out alerts to your on-call team so they can immediately respond to a degraded service level. Let us go over some of the common use cases that you might face.

Example 1:

Suppose that there is an e-shop application monitored by Plumbr at shop.example.com. What’s the most crucial metric for this application that can directly show if the business is going well? There may be many answers to that, depending on the business model, but “is anything being sold” would probably be close to the top of the list.

Since the application is monitored by Plumbr, each click on the “CHECK OUT” button on the cart is tracked, and the outcome is recorded. In the user interface, it could appear like this:

Screenshot from Plumbr UI

Looks like we have hundreds of users successfully checking out their cart. This means that revenue is being generated, and the e-shop can keep going. However, we’d like to make sure that these deals keep happening 24/7. So let us use the Plumbr API by passing in the serviceId and applicationName seen on the screenshot above.

$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=serviceId%3D1234567890abcdef,applicationName%3Dshop.example.com&last=4h"

[
    {
        "failed": 1, 
        "onlySlow": 0, 
        "success": 249, 
        "total": 261, 
        "verySlow": 11
    }
]

These values can then be compared against some thresholds or other triggers. For instance, if there are zero sales during the last 4 hours, then something is probably broken (or it’s January the 1st). As we’ll see a bit later, it is very simple to codify such rules and send out alerts when needed.

Example 2:

Another important metric to track the well-being of the e-business would be how the users are experiencing the e-shop. If they are forced to wait for the spinning wheels, or, worse, if they are facing errors while flowing through the shop, then the long-term prospects of the application are gloomy. In such a competitive market, people can just find a different e-shop that works for them.

With Plumbr, we can directly track the status of all the interactions in the e-shop:

$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=applicationName%3Dshop.example.com&last=4h"

[
   {
      "total" : 609,
      "failed" : 3,
      "success" : 586,
      "verySlow" : 20,
      "onlySlow" : 0
   }
]

Subtracting “success” from “total” and dividing the result by “total”, we see that about 4% of the customers (23 of 609) have received a sub-par digital user experience. If that number goes up, then it’s definitely a good cause for an alert.
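
The arithmetic behind that figure can be sketched in shell integer arithmetic. The function name subpar_pct is illustrative; it works in tenths of a percent because the shell has no floating point:

```shell
# Print the share of users that were not served successfully, as a percentage
# with one decimal place (truncated, not rounded).
# $1 = "total", $2 = "success" from the API response.
subpar_pct() {
    tenths=$(( ($1 - $2) * 1000 / $1 ))
    printf '%d.%d\n' $((tenths / 10)) $((tenths % 10))
}

subpar_pct 609 586   # prints 3.7
```

A "total" of zero would cause a division by zero, so a real check should handle an empty time window separately.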

Example 3:

For a slightly more complex example, you could use Plumbr to track the longer-term behaviour of your application. For instance, in some cases it may be a good idea to track spikes in error rates for a particular API call. A straightforward (albeit naive) approach to this would be using moving average crossovers. To do that using Plumbr Server API, you would need to make two calls for different time windows:

$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId%3Dexamplequicksearch1234567890&last=24h"

[
   {
      "total" : 2997918,
      "failed" : 20361,
      "success" : 2957453,
      "verySlow" : 0,
      "onlySlow" : 104
   }
]

$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId%3Dexamplequicksearch1234567890&last=1h"

[
   {
      "total" : 125001,
      "failed" : 19117,
      "success" : 105884,
      "verySlow" : 0,
      "onlySlow" : 0
   }
]

From here, we can see that the error rate for 24 hours is under 1%, and may be within the SLO and perhaps not a reason for triggering an alert just yet. However, the majority of these errors all occurred in the last hour, with the error rate spiking to over 15%. This clearly indicates an issue. If something is not done quickly, the SLOs will be violated in no time.
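
The comparison above can be sketched as follows. The rates are kept in basis points (hundredths of a percent) so that the 24-hour rate does not truncate to zero in integer arithmetic. The function names are illustrative, and a plain "short-term rate above long-term rate" test is only one naive way to define a crossover:

```shell
# Print the error rate in basis points (1 bp = 0.01%).
# $1 = "failed", $2 = "total" from the API response.
error_rate_bp() {
    echo $(( $1 * 10000 / $2 ))
}

# Succeed when the short-term error rate is above the long-term one.
# $1, $2 = failed and total for the short window; $3, $4 = the same for the long window.
is_error_rate_spiking() {
    [ "$(error_rate_bp "$1" "$2")" -gt "$(error_rate_bp "$3" "$4")" ]
}

error_rate_bp 20361 2997918   # prints 67   (about 0.67% over 24 hours)
error_rate_bp 19117 125001    # prints 1529 (about 15.29% over the last hour)
is_error_rate_spiking 19117 125001 20361 2997918 && echo "alert"
```

In practice you would probably require the short-term rate to exceed the long-term one by some margin to avoid alerting on noise; the margin is a tuning choice that the numbers above do not determine.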

The next step would be to set up regular monitoring of these values and to send out alerts based on them. This will come in the subsequent sections.

Triggering alerts using Cron

Putting it all together now, you can use the examples from the previous section to create a rudimentary alert system by writing a simple bash script:

#!/bin/bash

# Note: "curl -u <user>" without a password prompts for it interactively.
# For unattended runs, supply the credentials non-interactively,
# for example through curl's --netrc-file option.

function alert() {

sendmail admin@example.com << EOF
subject: Alert from Plumbr
from: admin@example.com

Alert from Plumbr: $1
EOF

}

# Prints the integer percentage that $1 makes up of $2, or 0 if $2 is zero.
function percentage() {
	if [ "$2" -gt 0 ]; then
		echo $(( ($1 * 100) / $2 ))
	else
		echo 0
	fi
}

# Example 1: number of users who checked out their cart
CHECKOUT_COUNT=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=serviceId%3D1234567890abcdef,applicationName%3Dshop.example.com&last=4h" | jq ".[0].total")

if [ "$CHECKOUT_COUNT" -eq 0 ]; then
	alert "There were no carts checked out in the last 4 hours"
fi


# Example 2: share of users who were not served successfully
ESHOP_UX_STATS=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=applicationName%3Dshop.example.com&last=4h")
ESHOP_USERS_TOTAL=$(echo "$ESHOP_UX_STATS" | jq ".[0].total")
ESHOP_USERS_OK=$(echo "$ESHOP_UX_STATS" | jq ".[0].success")
ESHOP_ERROR_RATE_PCT=$(percentage $((ESHOP_USERS_TOTAL - ESHOP_USERS_OK)) "$ESHOP_USERS_TOTAL")

if [ "$ESHOP_ERROR_RATE_PCT" -gt 10 ]; then
	alert "Error rate in e-shop is $ESHOP_ERROR_RATE_PCT%"
fi


# Example 3: short-term error rate compared to the long-term one
SEARCH_API_STATS_24H=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId%3Dexamplequicksearch1234567890&last=24h")
SEARCH_API_TOTAL_24H=$(echo "$SEARCH_API_STATS_24H" | jq ".[0].total")
SEARCH_API_FAILED_24H=$(echo "$SEARCH_API_STATS_24H" | jq ".[0].failed")
SEARCH_API_ERROR_RATE_PCT_24H=$(percentage "$SEARCH_API_FAILED_24H" "$SEARCH_API_TOTAL_24H")

SEARCH_API_STATS_1H=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId%3Dexamplequicksearch1234567890&last=1h")
SEARCH_API_TOTAL_1H=$(echo "$SEARCH_API_STATS_1H" | jq ".[0].total")
SEARCH_API_FAILED_1H=$(echo "$SEARCH_API_STATS_1H" | jq ".[0].failed")
SEARCH_API_ERROR_RATE_PCT_1H=$(percentage "$SEARCH_API_FAILED_1H" "$SEARCH_API_TOTAL_1H")


if [ "$SEARCH_API_ERROR_RATE_PCT_1H" -gt "$SEARCH_API_ERROR_RATE_PCT_24H" ]; then
	alert "Short-term error rates are going up"
fi

This queries the Plumbr API for the values of all the relevant metrics of the application, and then verifies that these are within operational ranges. If not, an alert is sent out via email.

The prerequisites for this script to work are to have sendmail configured on the machine, and curl and jq installed. Then all you have to do is add this script as a cron job and go to sleep.

Besides manually running queries, you can also add Plumbr data to your existing monitoring system such as Prometheus or Nagios. Using Plumbr gives you a much clearer signal of the user experience level than low-level metrics like CPU utilization or instance health.

Integrating with Nagios

To integrate Plumbr with Nagios, you will need to use a custom check command that queries Plumbr Server API and verifies the returned numbers against a threshold. A reference implementation is available on Bitbucket, along with more detailed instructions.

Integrating with Prometheus

To integrate Plumbr with Prometheus, you will need to use a custom exporter that collects data from Plumbr Server API and exposes it as Prometheus metrics. A reference implementation of such an exporter is available on Bitbucket. This implementation can be configured to cover basic use cases, such as gathering the metrics and alerting based on their values using the standard Prometheus Alert Rules.

A pre-built Docker image is coming soon as well. More detailed documentation is available in the README file.

Integrating with Zabbix

To integrate Plumbr with Zabbix, you will need to use a custom external check item that fetches data from Plumbr Server API. Based upon the returned values, you can use the standard Zabbix triggers to send out alerts. A reference implementation is available on Bitbucket, along with more detailed instructions.

Integrating with other systems

We currently only provide ready-to-use integrations for the monitoring systems that are the most widely used by our customers. Given the existing reference implementations here and the Plumbr Server API, it should be straightforward to map the same approach to integrate Plumbr with any other system as well. If in doubt, do not hesitate to contact us at support@plumbr.io.