Manual
Introduction
What is Plumbr?
Plumbr offers products in the software monitoring market. Plumbr Real User Monitoring (RUM) is designed to monitor and expose the experience of the users in terms of performance and availability. Plumbr Application Performance Monitoring (APM) traces poor user experience down to bottlenecks and errors in the server-side source code using distributed tracing.
With Plumbr, you will gain the following:
- Exposure to the performance and availability of a particular application.
- Early awareness via PagerDuty/Slack/JIRA/email alerts in situations where performance or availability degrades.
If either the availability or the performance needs attention, Plumbr helps you to understand:
- What functionality of the application is failing?
- What are the errors causing failures for the most users?
- What parts of the applications perform the worst?
- Where are the worst bottlenecks frustrating end users the most?
As a result, the engineering teams will be equipped with a list of errors and bottlenecks in the application, ranked by the number of users impacted and the time wasted.
With errors and bottlenecks ranked by their true impact on users, engineering teams can focus on resolving the errors impacting the most users and the bottlenecks wasting the most time.
How does Plumbr work?
Plumbr tracks every user interaction within the web application’s user interface in a browser. The most common means of interaction are clicks, keyboard / mouse events and touches.
Every such interaction is monitored for its outcome and duration. If the interaction fails with any technical error, it is flagged as failed. The interaction duration is used to understand how quickly the application responds to a particular user action.
Plumbr also monitors interactions for more details in order to detect errors causing interactions to fail & bottlenecks causing users to waste their time.
As a result, Plumbr is able to expose the user experience in terms of performance and availability. This is coupled with the bottlenecks and errors degrading that experience, ranked by their impact on end users.
What does Plumbr consist of?
Plumbr deployments contain up to three different modules:
- Browser Agent, capturing the end user experience in the device used by your customers. This is the module you should be installing first.
- Java and Universal (PHP, Python, nginx, Apache) agents, tracing the user interaction through backend services. Presence of these agents also exposes bottlenecks & errors in the backend layers. Installation of these modules is optional.
- Plumbr Server, responsible for receiving and processing the data collected by the Agents. The Server also exposes the UI. We recommend using our SaaS offering as the Server. In this case, the installed Agents connect to https://app.plumbr.io, and the user interface is also served via https://app.plumbr.io.
If you only use the Browser Agent, the deployment model includes injecting our JavaScript Agent as the first script in the <HEAD> section of all the HTML pages in your application. After doing so, the Agent can start listening to the end user interactions in the browser.
The data captured is sent to Plumbr Server, which is responsible for assembling all the details about the user interaction and allowing you to run analysis on the data.
If you are after end-to-end transparency and deploy both the Browser and server agents (Java, PHP, Python, nginx, Apache), the user interactions are monitored from the browser through to all the nodes in the backend.
The Plumbr Java Agent is packaged as a standard -javaagent, and attaching it does not require any changes to your application. The only required change is pointing to the location of the Agent in the file system by adding -javaagent:/path/to/plumbr.jar to the JVM startup scripts.
The Plumbr Universal Agent runs an additional process outside of the monitored application. It is installed either as a system service, or launched manually when used in Docker images. No changes to the monitored application are required. The agent is attached by setting the LD_PRELOAD environment variable, which is automatically added to the configurations of supported system services.
The Agents located in the monitored nodes pass along the transaction ID as call metadata via HTTP headers. This way, all the nodes servicing the user interaction can be assembled in the Plumbr Server into a single interaction distributed across multiple nodes.
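The header-based propagation described above can be illustrated with a minimal sketch. The Plumbr Agents do this automatically; the header name X-Example-Trace-Id and the class below are hypothetical and only show the general idea of reusing an incoming ID and forwarding it downstream.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class TraceHeaderSketch {
    // Hypothetical header name, chosen for illustration only.
    static final String TRACE_HEADER = "X-Example-Trace-Id";

    // Reuse the incoming transaction ID when present, otherwise start a new one.
    static String resolveTransactionId(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(TRACE_HEADER);
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    // Build the headers for a downstream call so the next node sees the same ID.
    static Map<String, String> outgoingHeaders(String transactionId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_HEADER, transactionId);
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new HashMap<>();
        incoming.put(TRACE_HEADER, "tx-42");
        String id = resolveTransactionId(incoming);
        System.out.println(outgoingHeaders(id));
    }
}
```

Because every node forwards the same ID, a server receiving the data from all nodes can group the per-node records by that ID.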
Dashboard
Plumbr Dashboard
The dashboard has a single purpose: to provide a detailed overview of the availability and performance of each application (or API backend) that you’re responsible for.
The Plumbr Dashboard serves as a single source of truth for all the applications in your portfolio. You can pin any application (or API) that is being monitored by Plumbr to the Dashboard. Click on the “➕” icon or use the “PIN…” buttons to add a card that summarizes the application availability and performance characteristics of any application (or API).
When viewing the list of applications being monitored (app.plumbr.io/applications), you also have the option to toggle the ‘pin’ 📌 icon to pin or unpin a card from the Dashboard.
Each card has a link called “See Details”. Clicking this will allow you to dive into the summary of the availability and performance of each application.
Changing the date-time filter will also change the aggregate of the values displayed on the cards for each application. Whether you choose built-in time intervals or custom ones, the cards on the Plumbr Dashboard will reflect the metrics for your application from the corresponding time period.
You also have the option to rearrange the cards on the Plumbr Dashboard.
Another important facet is inviting team members to access Plumbr. You can share access to Plumbr data with others in your team or organization. The applications that you grant someone access to will be pinned to their Plumbr Dashboard by default.
When any application is successfully being monitored, the default screen shown to users is the Plumbr Dashboard. Using an account on Plumbr, you are at liberty to monitor as many applications (or API backends) as you want to. The Plumbr Dashboard allows you to pin the applications that you’re responsible for on your Dashboard.
Please note: If you’re a trial user, the behaviour will be different in some cases.
- By default, the first 8 applications (or APIs) that you monitor will be automatically added to the Plumbr Dashboard.
- You will not have the option to pin/unpin any of the applications (or APIs) to cards.
User Interactions
What are user interactions?
A user interaction reflects the real user experience of one interaction with the user interface. The interaction starts with an event generated by a real user via the UI in the browser. Common types of such events are mouse clicks, touches and keyboard events, but Plumbr supports all other means of interacting with the browser.
Plumbr Browser Agent listens to all such user interactions. Only the interactions resulting in any server-side requests are considered relevant. All other interactions are ignored and never sent to Server to be reported. As a result of this, scrolls in static pages or clicks in empty areas are never registered as user interactions.
Every captured interaction is linked with any HTTP requests occurring because of the interaction.
If any of the requests returns a 4xx or 5xx series response, the interaction is flagged as failed, indicating that the end user did not accomplish what they intended.
All interactions are monitored for their duration as well. The duration of the interaction is calculated from the interaction in the browser (click/touch/…) until the last fired HTTP request returns its response to the browser.
Plumbr also keeps track of the user performing the interaction, the application in which the interaction was performed, and the functionality that the interaction consumed. This allows you to keep track of what the particular user was actually doing within the application.
Every transaction starting in a browser thus captures and exposes the following information:
- The ID of the transaction
- The ID of the user performing an interaction
- The start and end timestamps of an interaction
- The application to which an interaction belongs
- The functionality of the application used
- Whether the interaction was failed, stuck or successful
Status flags
Some user interactions can be assigned an extra status based on their outcome:
- Failed, indicating that the interaction did not complete as expected due to technical errors. This status is set if at least one of the HTTP requests during the user interaction responded with a 4xx or 5xx series error code.
- Stuck, set when the duration of the ongoing interaction exceeds the predetermined slow threshold by more than 100x. In this case Plumbr assumes the transaction will never complete and flags the transaction as Stuck. Plumbr will stop monitoring the stuck transaction after flagging it as such.
Status flags for API calls work similarly.
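The rules above can be sketched as a small classification function. This is an illustration under stated assumptions, not Plumbr's implementation: the class and method names are hypothetical, "more than 100x" is read as a duration greater than 100 times the slow threshold, and checking Stuck before Failed is an arbitrary choice made for the example.

```java
import java.util.List;

class InteractionStatusSketch {
    enum Status { SUCCESSFUL, FAILED, STUCK }

    // Factor taken from the text: Stuck means running more than 100 times
    // longer than the slow threshold.
    static final long STUCK_FACTOR = 100;

    static Status classify(List<Integer> httpStatusCodes, long durationMs, long slowThresholdMs) {
        // Assumption: Stuck is evaluated first, since monitoring stops at that point.
        if (durationMs > slowThresholdMs * STUCK_FACTOR) {
            return Status.STUCK;
        }
        // One 4xx or 5xx response is enough to flag the whole interaction as Failed.
        for (int code : httpStatusCodes) {
            if (code >= 400 && code <= 599) {
                return Status.FAILED;
            }
        }
        return Status.SUCCESSFUL;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(200, 404), 300, 1000));
    }
}
```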
API Calls
What are API Calls?
Besides monitoring real user experience in browser, Plumbr server-side agents are designed to monitor the APIs published on the server-side. Currently Plumbr supports APIs running on Java Virtual Machines (JVMs), PHP (both on FPM and Apache), Python (CPython and PyPy, with or without uWSGI) and reverse proxies (nginx, Apache).
The API call is monitored from the moment it arrives at the server until the response is sent to the caller. The API call duration and its outcome are registered similarly to user interactions, making it possible to expose the performance and availability of the API.
Depending on the situation, zero, one or two copies of the API call can be created by Plumbr:
- In case the server upstream of the JVM processing the API call contained neither a Plumbr Browser Agent nor a server agent, one instance of the API call is created.
- In case a Plumbr Browser Agent was monitoring the real user experience upstream of the server processing the API call, two copies of the API call are created. The first instance is linked with the ongoing user interaction and will be a part of the distributed trace reflecting the entire user experience throughout the back-end infrastructure. The second instance will be linked to the specific API and will reflect the performance and availability of that server-side API.
- In case a Plumbr Agent was monitoring the server directly upstream of the server processing the API call, no separate API call is registered for this sub-API by default. You can gain full transparency into a specific API that is part of the distributed back-end by enabling API call forking on either the JVM settings screen or the Cluster settings screen.
Additional server-side monitoring increases the transparency, as all interactions are now traced to all server-side nodes to more precisely capture the evidence needed for solving the potential error or bottleneck.
That being said, we strongly recommend monitoring the real user experience first by adopting our Browser Agent, and adding server-side monitoring during the second phase of adoption. Without an understanding of the true user experience, the insights exposed from the server side lack the proper context, which makes it hard to make informed decisions.
Applications
What are applications?
Applications are software bundles used by end users and monitored by Plumbr. As such, applications aggregate user interactions / API calls using the same software. This aggregated view represents the user experience all the users of the software have received.
Seeing how users experienced the different applications monitored by Plumbr allows you to quickly understand which applications behave as expected and which applications require investment to improve the user experience.
Identifying applications
To distinguish between applications, the application identifier is set in the Plumbr Agent configuration. It can also be changed via the Plumbr user interface available at https://app.plumbr.io. In essence, it is just a string, uniquely identifying each software bundle you wish to monitor with Plumbr.
The configuration determining the application is different for web applications monitored by the Browser Agent and for APIs running on the server side:
For web applications monitored by the Plumbr Browser Agent, the application is specified as a parameter to the Browser Agent installation, as seen in the following example via the appName parameter:
<script src="https://browser.plumbr.io/pa.js" data-plumbr='{"accountId":"123456789", "appName":"CRM", "serverUrl":"https://bdr.plumbr.io"}'> </script>
For server-side APIs monitored by the Plumbr Java Agent, the application name is derived from two properties specified in the Java Agent configuration.
You will find these properties in the plumbr.properties file next to the Agent’s plumbr.jar file:
- jvmId. Specify this parameter for JVMs which survive restarts. They can also be permanent members of clusters. jvmId is unique and cannot be reused by several JVMs.
- clusterId. Specify this property if the JVM monitored by Plumbr is either a member of a load-balanced cluster or an ephemeral node with a short lifecycle. This way, all the members sharing the same clusterId and exposing the same API end up aggregated under the same cluster.
Specifying at least one of these attributes is mandatory.
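As a minimal sketch of the "at least one of jvmId or clusterId" rule, the following check loads properties in the standard Java format and verifies that one of the two keys has a non-blank value. The class is hypothetical and not part of the Plumbr Agent; the property value in the example is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PlumbrPropertiesCheck {
    // Parse text in java.util.Properties format (the format of plumbr.properties).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when at least one of jvmId or clusterId has a non-blank value.
    static boolean hasIdentity(Properties props) {
        return isSet(props.getProperty("jvmId")) || isSet(props.getProperty("clusterId"));
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical value, for illustration only.
        System.out.println(hasIdentity(parse("clusterId=eshop-production\n")));
    }
}
```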
To give you an idea, some examples of typical application names we see include:
- CRM
- eShop
- Order Management
or
- eshop-test
- eshop-staging
- eshop-production
Feel free to pick any string uniquely identifying the application. Keep in mind that it helps if the rest of the team can later associate the name with the real software, so names such as “kate experimenting” or “second application” tend not to be good choices.
Application versions
Whenever you release a new version of your application, you can let Plumbr know about it as well. Then Plumbr will be able to show you how the availability and performance metrics change across different versions of your application.
Application version is any string that uniquely and meaningfully identifies and distinguishes any given deployment from the previous one, e.g. “20.02.21” or “4.2.56” or “1.2-b45hfdf3”. We don’t impose any constraint on version identifier format or what exactly “version” means in your case.
You pass the application version as one of the configuration parameters to the Plumbr Agent. All our agents (JVM, Universal and Browser) accept this configuration parameter and annotate any collected data with its value.
Example 1
When running a JVM application, the easiest way to pass version information is via an environment variable:
export VERSION=4.3.18
export PLUMBR_APP_VERSION=${VERSION}
java -jar myapp-${VERSION}.jar
A similar approach works well with Docker-based applications:
export VERSION=4.3.18
docker run -e PLUMBR_APP_VERSION=$VERSION -d org.example/myApplication:$VERSION
Example 2
In the case of web applications, it makes sense to inject the application version together with the rest of the Plumbr Browser Agent configuration at build time, e.g. when using webpack:
<script src="https://browser.plumbr.io/pa.js" crossorigin="anonymous" data-plumbr='{ "accountId":"...", "appName":"...", "serverUrl":"https://bdr.plumbr.io", "appVersion": "<%= htmlWebpackPlugin.options.appVersion %>" }'> </script>
Then pass version information during the build:
npm run build -- --appVersion 4.3.18
Bottlenecks
What are bottlenecks?
Bottlenecks are the underlying reasons why a particular user interaction or an API call was slower than expected. Plumbr’s goal is to map each user interaction that took a significant amount of time as closely to the underlying bottleneck as possible, giving our users the means to understand the impact of each bottleneck.
The impact of a bottleneck is measured both in number of users who suffered from it, as well as the total time wasted while waiting behind this bottleneck. Having access to this information allows you to rank the bottlenecks and fix the ones with higher impact (i.e., more wasted time) first.
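The ranking idea can be sketched as a simple sort. The Bottleneck type, its fields and the sample figures are hypothetical; the sketch orders by total wasted time first and uses the number of impacted users as a tie-breaker, which is one possible reading of "higher impact".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BottleneckRankingSketch {
    // Hypothetical record of one detected bottleneck.
    static final class Bottleneck {
        final String name;
        final int usersImpacted;
        final long wastedMs;

        Bottleneck(String name, int usersImpacted, long wastedMs) {
            this.name = name;
            this.usersImpacted = usersImpacted;
            this.wastedMs = wastedMs;
        }
    }

    // Most wasted time first; more impacted users first when wasted time is equal.
    static List<Bottleneck> rank(List<Bottleneck> input) {
        List<Bottleneck> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong((Bottleneck b) -> b.wastedMs).reversed()
                .thenComparing(Comparator.comparingInt((Bottleneck b) -> b.usersImpacted).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Bottleneck> ranked = rank(List.of(
                new Bottleneck("Slow JDBC call", 40, 90_000),
                new Bottleneck("GC pause", 120, 240_000),
                new Bottleneck("Lock contention", 15, 90_000)));
        for (Bottleneck b : ranked) {
            System.out.println(b.name + " users=" + b.usersImpacted + " wastedMs=" + b.wastedMs);
        }
    }
}
```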
Bottlenecks are detected either in the layer monitored by a Plumbr Agent or as a downstream call from that layer. This is best understood via the following example:
- The application is monitored only by the Plumbr Browser Agent: the detected bottleneck exposes the blocking XHR call to the backend taking 6 seconds. As Plumbr was not monitoring the backend, there is no transparency into what in the backend was causing the six-second wait.
- The application is monitored by both the Plumbr Browser Agent and Java Agents: the detected bottleneck is now more specific, exposing that a synchronization issue within the JVM forced the XHR call to wait 5.5 seconds for the lock to be released.
Browser Bottlenecks
Bottlenecks detected in the browser
Browser bottlenecks cover the entire lifecycle of the user interaction, exposing the reasons why a user interaction initiated in the browser took longer to complete than expected. To achieve this, the Plumbr Browser Agent monitors the different phases of all the HTTP requests triggered by the user interaction.
Plumbr uses different methods to capture the required information. By combining observation of the document tree for changes (Mutation Observer) with instrumentation of native request methods (such as XMLHttpRequest), we are able to keep track of resource loading start and end times. However, as those are basic start and end metrics and can be inaccurate, we additionally use the Resource Timing API to get a detailed breakdown of the request’s lifecycle from the browser whenever available.
As a result, Plumbr is capable of exposing the following bottlenecks in the browser:
- Browser – Redirect – in cases where there were a lot of redirects or some of the redirects were slow
- Browser – Cache Fetch – in situations where requests by browser were already cached in the local disk and extracting this resource from the disk was slow
- Browser – DNS lookup – whenever the DNS lookup was slow
- Browser – TCP connect – in cases where establishing the TCP connection to the backend server was slow
- Browser – SSL handshake – when establishing the secure connection to the backend server was slow
- Browser – Request Wait (Assets, XHR, Pageload) – in situations where an asset (CSS, JS, etc.), an XHR or an HTML page request from the backend server was slow
- Browser – Download (Assets, XHR, Pageload) – in cases where an asset, an XHR or an HTML page download from the backend server was slow
- Browser – HTTP Call (Assets, XHR, Pageload) – whenever the HTTP request for an asset, an XHR or an HTML page from the backend server was slow, but Plumbr wasn’t able to distinguish the exact phase due to missing support for the Resource Timing API in the user’s browser.
- Browser – Queue Wait – in situations where many resources need to be fetched for a particular page from the same domain and the browser places these requests into a waiting queue.
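To make the phase breakdown more concrete, here is a sketch that derives phase durations from the timestamps of a Resource Timing entry. The attribute names (redirectStart, domainLookupStart, connectStart and so on) come from the browser's Resource Timing API; the logic is modeled in Java purely for illustration, and the mapping of phases to Plumbr's bottleneck names is an assumption rather than the Agent's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ResourceTimingPhasesSketch {
    // Expects all ten timestamps (in milliseconds) to be present in the map.
    static Map<String, Double> phases(Map<String, Double> t) {
        Map<String, Double> p = new LinkedHashMap<>();
        p.put("Redirect", t.get("redirectEnd") - t.get("redirectStart"));
        p.put("DNS lookup", t.get("domainLookupEnd") - t.get("domainLookupStart"));
        // connectEnd covers the whole connection setup, including any TLS handshake.
        p.put("TCP connect", t.get("connectEnd") - t.get("connectStart"));
        // secureConnectionStart is 0 when no secure connection was used.
        double tlsStart = t.get("secureConnectionStart");
        p.put("SSL handshake", tlsStart > 0 ? t.get("connectEnd") - tlsStart : 0.0);
        p.put("Request Wait", t.get("responseStart") - t.get("requestStart"));
        p.put("Download", t.get("responseEnd") - t.get("responseStart"));
        return p;
    }

    public static void main(String[] args) {
        Map<String, Double> entry = new LinkedHashMap<>();
        entry.put("redirectStart", 0.0);
        entry.put("redirectEnd", 0.0);
        entry.put("domainLookupStart", 5.0);
        entry.put("domainLookupEnd", 25.0);
        entry.put("connectStart", 25.0);
        entry.put("secureConnectionStart", 40.0);
        entry.put("connectEnd", 70.0);
        entry.put("requestStart", 70.0);
        entry.put("responseStart", 270.0);
        entry.put("responseEnd", 300.0);
        System.out.println(phases(entry));
    }
}
```

In this made-up entry the longest phase is the wait for the first response byte, which would point at the backend rather than the network.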
Browser – Redirect
If you have a lot of redirects on your site, they can hurt its performance considerably, as the browser needs to follow them one by one. Plumbr takes the time spent on redirects into account and, if it exceeds the given threshold, exposes a Redirect Root Cause with the full URL where the redirect chain ended.
As a general rule, avoid redirects as much as possible: they can drastically slow down page loading, especially on mobile devices. When a redirect is unavoidable, try to limit it to a single one.
Browser – Cache Fetch
If the resource queried by the browser was recently requested, its response may already be in the local cache and can be retrieved from there. In some cases, when the local disk is under high load, fetching the resource from the cache can take longer than expected. If that time exceeds the threshold, a Cache Fetch Root Cause is registered with the static text “Cache Fetch”. This is expected to be a very rare Root Cause, assuming that client machines nowadays are powerful enough to avoid such issues.
Browser – DNS lookup
Another issue Plumbr exposes is slowness caused by the browser making a request to a DNS server to translate a domain name to an IP address. A DNS lookup Root Cause is registered if this phase exceeds the threshold. Plumbr groups all requests to the same domain into one DNS lookup Root Cause referencing the particular domain.
The usual reasons for a slow DNS lookup are the following:
- network is bad between client and DNS server
- DNS server is at heavy load and cannot process requests fast enough
- DNS server is not configured properly to make fast lookups
- short DNS TTL setting causing servers to frequently check if their cached value is up to date
To get rid of this problem one can try to improve network conditions or change DNS servers if possible.
Browser – TCP connect
A bad network can cause one more issue exposed by Plumbr: the TCP connect Root Cause. This Root Cause is registered when establishing a connection from the browser to the backend systems takes too long. As with the DNS lookup Root Cause, these requests are grouped by the domain name for which establishing the connection was slow.
Browser – SSL handshake
During the connection establishment phase, the browser also performs an SSL handshake if the connection is secure. An SSL handshake Root Cause can be caused by poor network conditions or by a slow client or server. The domain name is again used to group all requests affected by this Root Cause.
As it is not always possible to improve network conditions, pay attention to server performance: check that its CPU is not under heavy load and that there is sufficient RAM to keep previous connections alive. It is also a good idea not to choose certificates with very long keys (RSA 2048 should be sufficient).
Browser – Request Wait (Assets, XHR, Pageload)
Request waiting phase indicates the time from when the browser starts sending the request to getting the first response bytes from the backend. We differentiate 3 different Root Causes for assets (CSS, images, etc.), XHR and full page load requests. Root Cause grouping is done using:
- domain name for Assets and full page load requests
- shortened URL for XHR requests
While the main reason behind all the previous Root Causes can be the network, here the reason is more likely to be in the backend system, and you need to performance test and debug that system to find the real cause.
Browser – Download (Assets, XHR, Pageload)
The request download phase indicates the time during which the browser is receiving the response from the backend system. Here we also distinguish 3 different Root Causes: for assets (CSS, images, etc.), XHR and full page load requests.
Root Cause grouping is done using:
- domain name for Assets and full page load requests
- shortened URL for XHR requests
To solve the issue, in most cases you need to optimize either the backend system endpoint (bringing it closer to the user, e.g. by using a CDN) or the size of the content returned from the system.
Browser – HTTP Call (Assets, XHR, Pageload)
If we cannot determine a particular phase (as resource timings weren’t available) and the request is slow, an HTTP Call Root Cause is registered. Here we also distinguish 3 different Root Causes: for assets (CSS, images, etc.), XHR and full page load requests.
Root Cause grouping is done using:
- domain name for Assets and full page load requests
- shortened URL for XHR requests
Browser – Queue Wait
Less critical resources such as images can be put into a queue by the browser and processed later. Plumbr registers such requests under the Queue Wait Root Cause if the time spent in the queue exceeds the threshold. Typical reasons are:
- lack of available TCP connections
- too many requests being processed in parallel (by default, browsers usually process up to 6 requests to the same domain in parallel)
- some requests being postponed because the browser considers them non-critical
Grouping of Queue Wait Root Causes is done by domain name.
To solve this issue:
- consider domain sharding
- reduce the number of requests by:
  - concatenating CSS or JS files together
  - using CSS sprites for images
JVM Bottlenecks
Bottlenecks detected in the JVM
The Java Agent monitoring Java Virtual Machines detects server-side bottlenecks for slow user interactions and API calls. The bottlenecks detected in the JVM occur either inside the JVM (for example, lock contention on synchronization or GC pauses) or downstream from the monitored JVM (a particular JDBC call to a database or a web service call over HTTP to a node not monitored by Plumbr).
Plumbr Java Agent uses bytecode instrumentation and JVMTI hooks to capture the information about the bottlenecks from within the JVM and link the bottleneck with the particular user interaction.
As a result, Plumbr Java Agent is able to expose the following as bottlenecks in JVM:
- JDBC calls to databases, including MySQL, Oracle, Postgres and IBM DB2 databases.
- Lock contention issues
- GC pauses
- Filesystem operations
- WebService invocations over HTTP
- MongoDB calls
- Lucene searches and index updates
- Threadpooling-related issues
- N+1 issues occurring while communicating with remote systems
Excessive Number of …
“Excessive number of …” root causes are exposed in situations where many similar operations take place during a single transaction and the accumulated duration of such operations is the reason why the transaction ends up being slow.
For example, when just a single HTTP call is impacting user experience, it will be exposed as a Slow HTTP Call. In situations where many HTTP calls take place during a single transaction and the accumulated duration of such calls is the reason why the transaction ends up being slow, the Excessive Number of HTTP Calls root cause is exposed instead.
In most cases, the solution for such problems requires a change in application code. Performance gains can often be achieved by applying either of the following guidelines:
- Reducing the number of operations invoked by changing the amount of data requested.
- Batching the operations together instead of launching each one as a separate call.
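The batching guideline can be illustrated with a small helper that splits a list of items (for example, record IDs to fetch) into fixed-size batches, so that one remote call is made per batch instead of one per item. The class and method names are hypothetical, and the right batch size depends on the remote system.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    // Split items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Five IDs with a batch size of two need three calls instead of five.
        System.out.println(partition(List.of(1, 2, 3, 4, 5), 2));
    }
}
```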
Slow JDBC Calls
The Plumbr Agent monitors every JDBC Type 3 and Type 4 driver detected in the application. This means that Plumbr supports almost every database vendor exposing the data storage via JDBC, including but not limited to the most widely used MySQL, Oracle, Postgres and IBM DB2 databases.
The Agent instruments all the JDBC calls which connect to databases via Statement, Prepared Statement and Callable Statement APIs. When a call via such an API starts affecting the end user experience, the offending query is listed as a root cause exposing the JDBC operation executed along with the call stack from the thread executing the query. In such a way, you get access to the root cause of expensive JDBC operations down to a single line in the source code responsible for executing such queries.
In order to reduce noise and get a prioritized list of expensive database operations, Plumbr groups expensive operations triggered by the same root cause together, allowing you to rank the expensive operations based on the number of times they are detected.
File Stream Operations
The Plumbr Agent monitors file reading and writing operations performed by using FileInputStream and FileOutputStream classes. When the wait time for the read and write operations starts impacting end user experience, Plumbr recognizes this and links a File Stream Operation root cause with the slow transaction. The root cause exposed will contain the following information:
- File(s) being read/written, along with their path in file system, size and other relevant attributes.
- Call stack from the thread executing the operation, zooming you right to the line in source code accessing the file system.
There are several common problems that occur when reading from or writing to a file stream and lead to slow transactions:
- Lack of buffering: each read or write operation incurs overhead, depending on the operating system, file system and hardware. Instead of reading or writing one byte at a time, a much more performant approach is to do it in bulk. A simple approach is to make use of a BufferedInputStream or BufferedOutputStream.
- System issues: as noted above, the performance of a file operation depends on the operating system, the file system and the hardware. Sometimes one of these becomes the bottleneck, and even a single file stream operation can take tens of seconds.
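As a minimal sketch of the buffering advice, the following wraps a FileInputStream in a BufferedInputStream so that single-byte reads are served from memory. The class name and the temporary-file helper are made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedReadSketch {
    // Count the bytes of a file using single-byte reads on a buffered stream.
    static long countBytes(String path) {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            // Each read() is served from the in-memory buffer; the underlying
            // file is only accessed when the buffer (8 KiB by default) is refilled.
            while (in.read() != -1) {
                total++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Helper for the example: create a temporary file of the given size.
    static String writeTempFile(int sizeBytes) {
        try {
            Path file = Files.createTempFile("buffered-read", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[sizeBytes]);
            return file.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBytes(writeTempFile(20_000)));
    }
}
```

Without the BufferedInputStream wrapper, every read() call in the loop would go to the underlying file stream.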
Locked Threads
The Plumbr Agent monitors all JVM threads for lock contention events. Plumbr monitors both synchronized block/method access and java.util.concurrent locks.
For synchronized blocks/methods, Plumbr tracks the situations where a thread in the JVM executes code in a synchronized block or method and another thread tries to enter the same synchronized block/method.
For java.util.concurrent locks Plumbr will detect the situations where threads are forced to wait for events originating from the use of various java.util.concurrent classes, ranging from ReentrantLock to ArrayBlockingQueue.
When the wait time in either case exceeds a predetermined threshold, the root cause will be exposed, containing the following:
- How long the thread was forced to wait before getting access to the synchronized block/method.
- The monitor used to lock the method/code block (for synchronized usage only).
- The name and call stack from the thread trying to enter the synchronized block/method.
- The name and a snapshot of the call stack of the thread whose code was running in the synchronized block. The snapshot of the call stack is taken when the waiting time for the blocked thread is about to exceed the configured threshold.
Having such information allows you to zoom in to the underlying root cause with the precision of a single line in the source code, skipping the tedious and complex process of troubleshooting concurrency issues. Notice that Plumbr also binds together similar lock contention events, allowing you to rank the severity of the performance issues based on the frequency of the underlying root cause.
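To make lock contention tangible, the following sketch reproduces it on a small scale: one thread holds a ReentrantLock for a while, and the calling thread measures how long it is forced to wait. This is a hand-rolled illustration with hypothetical names; the Plumbr Agent gathers this kind of information through instrumentation, not through code like this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class LockWaitSketch {
    // Returns how many milliseconds the calling thread waited for a lock
    // that another thread holds for roughly holdMs milliseconds.
    static long measureWaitMs(long holdMs) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();      // signal that the lock is now owned
                Thread.sleep(holdMs);  // simulate slow work inside the critical section
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "holder");
        holder.start();

        try {
            held.await();              // make sure the holder owns the lock first
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        long start = System.nanoTime();
        lock.lock();                   // blocks until the holder releases the lock
        long waitedNs = System.nanoTime() - start;
        lock.unlock();
        return waitedNs / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("waited ms: " + measureWaitMs(200));
    }
}
```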
File Attribute Operations
The Plumbr Agent monitors file attribute querying performed by using methods such as File.exists(), File.isDirectory(), File.canWrite() and so on. While individual operations like that are usually handled very quickly, typically under a few microseconds, having a large number of them may result in a slow transaction. One of the most common cases is recursively walking a large directory that contains millions of files.
When the wait time for the attribute checking starts impacting end user experience, Plumbr recognizes this and links a File Attribute Operations root cause with the slow transaction. The root cause exposed will contain the following information:
- File(s) being accessed, along with the operation performed (exists(), isDirectory(), etc)
- Call stack from the thread executing the operation, zooming you right to the line in source code accessing the file system.
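One common way to reduce the number of separate attribute queries during a directory walk is Files.walkFileTree, which hands each file's BasicFileAttributes to the visitor so no extra File.isDirectory() or File.exists() calls are needed per entry. The sketch below counts regular files this way; the class name and the sample directory tree are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class FileWalkSketch {
    // Count regular files under root, reusing the attributes supplied by the walk.
    static long countRegularFiles(Path root) {
        long[] count = {0};
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.isRegularFile()) {
                        count[0]++;
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    // Helper for the example: a temporary tree with three files, one in a subdirectory.
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("walk-sketch");
            Files.createFile(root.resolve("a.txt"));
            Files.createFile(root.resolve("b.txt"));
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.createFile(sub.resolve("c.txt"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRegularFiles(createSampleTree()));
    }
}
```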
GC Pauses
The Plumbr Agent monitors all stop-the-world Garbage Collection pauses that take place in the JVM. If the duration of such a pause exceeds a configured threshold, an incident is created. In addition to the time and duration of the pause, a Plumbr incident contains insights that would help reduce either the duration or frequency of the long GC pauses, for example:
- Plumbr captures memory snapshots, exposing the most memory-hungry data structures in memory. This allows you to proceed with trimming the most resource-hungry data structures.
- Allocation and promotion rates exposed by Plumbr, along with the memory consumption in the different memory pools, will give you clues about poorly allocated heap structures.
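For a rough, do-it-yourself view of garbage collection activity, the standard java.lang.management API exposes per-collector counters. The sketch below sums the accumulated collection time across all collectors. Note that this is cumulative time since JVM start, not the duration of individual stop-the-world pauses that Plumbr reports, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsSketch {
    // Sum of the approximate accumulated collection time of all collectors, in ms.
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = gc.getCollectionTime(); // -1 when the collector does not report it
            if (timeMs > 0) {
                total += timeMs;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time ms: " + totalGcTimeMs());
    }
}
```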
Underprovisioned Thread Pools
The Plumbr Agent monitors thread pools embedded in the application to detect situations where submitted tasks/requests end up waiting in a queue for an available executor thread. When the wait time in such a queue starts impacting the end user experience, an Under-provisioned Thread Pool root cause is registered. The root cause exposes the call stack from the thread waiting in the queue.
Thread pools the Plumbr Java Agent is able to monitor:
- ThreadPoolExecutor embedded in the Java SDK
- org.apache.catalina.core.StandardThreadExecutor from the Tomcat application server (configured through the “tomcatThreadPool” Executor)
Under-provisioned thread pools are surfaced as a root cause in situations where the thread pool is not able to provide enough free threads to cope with the incoming workload. This can happen for any of the following reasons:
- Work done by such threads is taking unusually long to complete. The solution for such cases is to optimize the code executed by the threads in the pool.
- The number of tasks/requests submitted to a pool is higher than usual. In such situations the solution is either controlling or load balancing the incoming load.
- Last, but not least, situations where the thread pool configuration is not providing enough threads to match the regular load. In such situations the solution is as easy as increasing the number of threads in the pool configuration.
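The queue wait described above can be made visible in your own code with a small wrapper around ThreadPoolExecutor. The following is a minimal sketch, not the way the Plumbr Agent measures it: the QueueWaitProbe class and its methods are hypothetical names, and the probe only records the longest time a task spent between submission and the start of its execution.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class QueueWaitProbe {

    // Longest time (in nanoseconds) any task spent waiting before a thread picked it up.
    private final AtomicLong maxWaitNanos = new AtomicLong();
    private final ThreadPoolExecutor pool;

    QueueWaitProbe(int threads) {
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    // Wraps the task so that the delay between submission and execution is recorded.
    void submit(Runnable task) {
        final long submittedAt = System.nanoTime();
        pool.execute(() -> {
            long waited = System.nanoTime() - submittedAt;
            maxWaitNanos.accumulateAndGet(waited, Math::max);
            task.run();
        });
    }

    // Waits for all submitted tasks to finish and returns the longest observed queue wait.
    long shutdownAndGetMaxWaitMillis() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(maxWaitNanos.get());
    }

    // Helper for simulating slow work in examples.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a single-threaded pool, a second task submitted while the first one is still sleeping has to wait for roughly the duration of the first task. Increasing the thread count passed to the constructor shortens that wait, which corresponds to the last of the three remedies listed above.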
Slow HTTP Calls
The Plumbr Agent monitors different HTTP client libraries used for connecting to remote systems over HTTP. When the HTTP calls to such remote endpoints start affecting the end user experience, the offending HTTP query is linked to user transactions as a root cause, exposing the outgoing HTTP request along with the call stack from the thread executing the query.
HTTP calls tend to be slow because the remote system does not respond to the call from the JVM quickly enough. To solve the problem, the system being accessed via HTTP needs to be tuned for latency. If this is not an option, caching the results can also be used to reduce the number of such operations.
The supported list of HTTP client libraries includes:
- JDK HTTP (java.net.HttpURLConnection and related classes)
- Apache HTTP Commons & HttpComponents
- OkHTTP client (requires Plumbr Agent 17.01.24+)
Slow MongoDB Operations
To detect expensive calls to a MongoDB instance, Plumbr monitors DBCollection and MongoCollection interface methods such as find() and findAndModify(). When a call via such an API starts affecting the end user experience, the operation is listed as a root cause exposing the MongoDB operation called along with the call stack from the thread executing the operation.
In order to reduce noise and get a prioritized list of expensive MongoDB operations, Plumbr groups expensive operations triggered by the same root cause together, allowing you to rank the expensive operations based on the number of times they are detected.
Plumbr supports and monitors both the 2.x and 3.x versions of MongoDB drivers.
Slow JDBC Connection Acquisition
Plumbr detects slow JDBC Connection Acquisition when JDBC connection retrieval via DataSource.getConnection() or DriverManager.getConnection() is affecting end user experience. In such a case Plumbr exposes the number of transactions affected, along with the time the transactions were forced to wait for the connection retrieval.
Slow connection retrieval can be caused by any of the following:
- Missing connection pool. Creating JDBC connections is expensive, so in this case consider pooling the connections.
- Uninitialized connection pool. In cases where pooled connections are initialized lazily, the first requests to the empty pool are slow. Consider initializing the pool during application startup.
- Under-provisioned connection pool. When the number of available connections in the pool is smaller than the demand, there will be wait time in queue for the connections. Consider increasing the pool size to match the number of concurrent requests to the data source.
- Leaking connection pool. If connections are not closed, the connection pool does not know that the connection is no longer being used by the borrower thread. To fix this, add pool-specific options to the pool configuration to spot leakages in the pool.
- Testing connections. To prevent unused connections in the pool from becoming stale, pool implementations often test the connection before handing it off to the executor thread. When the test query is expensive, this can result in poor performance. Consider simplifying or dropping the tests if possible.
Transaction Snapshots
Plumbr is capable of monitoring for a large number of specific root causes explicitly. Unfortunately, the variety of technologies used in the real world means that the number of ways a particular piece of code can perform poorly is effectively unlimited. Thus a fallback is implemented to cover the cases where the explicit root cause cannot be determined. In such situations, the Plumbr Agent captures snapshot(s) from the suspicious transaction.
Snapshots are effectively thread dumps taken from the thread executing the transaction. Snapshot capturing happens at increasing intervals during the transaction lifespan and is limited to 10 snapshots. The snapshots taken are linked to the transaction if the transaction is eventually flagged as Slow or Stuck. When the transaction ends up being successful, the snapshots are discarded.
To expose this information in a useful way, Plumbr aggregates those call stacks into a tree-like structure. Call stacks occurring most frequently are ranked higher in a tree. To reduce noise, non-repetitive occurrences are hidden, enabling you to focus on the most frequently captured snapshots first.
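The aggregation described above can be pictured as a prefix tree keyed by stack frames. The sketch below is an illustration only: the SnapshotTree class, its fields and the minHits cut-off are hypothetical, and the actual data structure and noise filter used by Plumbr are not documented here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SnapshotTree {
    final String frame;
    int hits; // number of snapshots whose call stack passes through this node
    final Map<String, SnapshotTree> children = new LinkedHashMap<>();

    SnapshotTree(String frame) {
        this.frame = frame;
    }

    // Adds one snapshot; frames are ordered from the outermost caller to the innermost call.
    void add(List<String> frames) {
        SnapshotTree node = this;
        node.hits++;
        for (String f : frames) {
            node = node.children.computeIfAbsent(f, SnapshotTree::new);
            node.hits++;
        }
    }

    // Children seen in at least minHits snapshots, most frequent first.
    List<SnapshotTree> rankedChildren(int minHits) {
        List<SnapshotTree> ranked = new ArrayList<>();
        for (SnapshotTree child : children.values()) {
            if (child.hits >= minHits) {
                ranked.add(child);
            }
        }
        ranked.sort(Comparator.comparingInt((SnapshotTree n) -> n.hits).reversed());
        return ranked;
    }
}
```

Calling rankedChildren(2) on a node hides branches that were captured only once, which mirrors the idea of hiding non-repetitive occurrences.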
Slow Lucene Operations
Plumbr monitors Lucene indexes by instrumenting all implementations of the org.apache.lucene.search.IndexSearcher and org.apache.lucene.index.IndexWriter interfaces. Doing so allows Plumbr to track all the operations modifying the index or reading from the index. This support is implemented and tested on Lucene 4 and 5 releases.
By monitoring the behavior of said interfaces, Plumbr is capable of exposing:
- The impact poorly performing Lucene indexes have on your end users
- Actual root cause, down to a single line in source code accessing the index
- Information about the index accessed, including the index size, accessed fields, accessor methods and more.
JDBC Multi-Queries
The Agent instruments all the JDBC calls which connect to databases via the Statement, PreparedStatement and CallableStatement APIs. When a single JDBC statement executed through such APIs impacts user experience, a Slow JDBC Call is detected as the root cause. In situations where many database calls take place during a single transaction and the accumulated duration of such calls is the reason why the transaction is flagged as slow, the multi-query root cause is exposed instead.
In the details of this root cause you will find the offending queries along with the call stacks from the threads executing such queries. To minimize overhead, smart sampling is applied when exposing this data.
The Plumbr Agent monitors every JDBC Type 3 and Type 4 driver detected in the application. This means that Plumbr is able to monitor communication with almost every database vendor exposing the data storage via JDBC, including but not limited to the most widely used MySQL, Oracle, Postgres and IBM DB2 databases.
Slow ResultSet Processing
Slow ResultSet processing is detected when the result set fetched from the database over JDBC is processed in a way that affects end user experience. In such a case Plumbr exposes the number of transactions affected, along with the time the transactions were forced to wait for the JDBC result set processing.
To monitor the time it takes to process the result set, the Plumbr Agent monitors the cumulative duration of each java.sql.ResultSet.next() iteration. When the cumulative time of the iterations starts impacting end user experience, the Slow ResultSet Processing root cause is created. This root cause will expose the query whose results were processed along with the call stack through which the results were processed.
Slow ResultSet processing is usually detected when fetching large result sets from the database. To improve the situation, consider either switching to more fine-grained queries or using database-backed paging to limit the size of result sets.
Server Bottlenecks
Bottlenecks detected by the Universal Agent
Universal Agent detects server-side bottlenecks for slow API calls in monitored processes. Bottlenecks are detected for network operations that do not show up as spans, such as SQL queries or HTTP requests to endpoints not monitored by Plumbr. Details for these are collected from network traffic of the monitored process. Each bottleneck comes with a stack trace, which is collected using language runtime specific APIs (Zend API for PHP, Python/C API for Python).
Each bottleneck instance may be collected either as a single slow operation (one query/request) or as a collection of multiple operations (prefixed with “Multi” or “Multiple”). For each operation which exceeded the global bottleneck threshold (1 second), a single operation bottleneck is registered. If the remaining operations of the same type combined exceed the bottleneck threshold, the multiple operation bottleneck of that type is registered, with its name derived from the group of operations that took the longest time.
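The single versus multiple operation rule above can be expressed in a few lines. This is a simplified sketch: the BottleneckClassifier class and the label format are hypothetical, and it does not model how the name of a multiple operation bottleneck is derived from the group of operations that took the longest time.

```java
import java.util.ArrayList;
import java.util.List;

final class BottleneckClassifier {

    // Global bottleneck threshold stated in the text (1 second).
    static final long THRESHOLD_MILLIS = 1000;

    // Returns labels for the bottlenecks registered for operations of one type
    // (for example all SQL queries) observed during a single API call.
    static List<String> classify(String type, long[] durationsMillis) {
        List<String> result = new ArrayList<>();
        long remaining = 0;
        for (long d : durationsMillis) {
            if (d > THRESHOLD_MILLIS) {
                // Each operation exceeding the threshold is a bottleneck on its own.
                result.add(type + " (" + d + " ms)");
            } else {
                remaining += d;
            }
        }
        // The operations below the threshold only count when combined.
        if (remaining > THRESHOLD_MILLIS) {
            result.add("Multiple " + type + " (" + remaining + " ms)");
        }
        return result;
    }
}
```

For durations of 1500, 400, 400 and 300 ms, the sketch yields one single operation bottleneck (1500 ms) and one multiple operation bottleneck covering the remaining 1100 ms.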
Currently the following bottlenecks are collected by the Universal Agent:
- SQL queries to MySQL and Postgres
- HTTP/1 requests
HTTP Requests
Plumbr Universal Agent monitors outgoing HTTP/1.0 and HTTP/1.1 requests made while handling an API call. These are detected by analyzing network traffic. Detecting outgoing HTTPS (SSL/TLS) requests is supported if the library used to make the requests uses OpenSSL for handling TLS.
The agent collects the request URL, request method and response code for each request. If the TCP connection is closed before request handling is finished, the request is marked as aborted, which is shown instead of the response code. The request duration is the time between when the first byte of the request was written by the application and when the last byte of the response was received. In case this was the first request for the same TCP connection, the time to open the connection is also included.
Each individual request longer than the bottleneck threshold (1 second) is registered as a bottleneck. If the combined duration of all requests below the bottleneck threshold in the same API call is longer than the threshold, then a multiple requests bottleneck is registered. HTTP request bottlenecks are not registered if the target is also monitored by either the Universal or Java agent. In that case bottlenecks from that target will be present instead.
SQL Queries
Plumbr Universal Agent supports detecting PostgreSQL (8+) and MySQL (5.5+) queries as bottlenecks. These are detected by analyzing network traffic. Monitoring TLS-encrypted SQL connections is supported if the SQL client library uses OpenSSL for handling TLS.
The agent collects the query text, server address, server version and database name for each query. Only query text is used for bottleneck name, other information is only visible when looking at a specific query in single transaction view. The query duration is the time between when the first byte of the query was written to the network socket and when the last byte of the result (or error) was received. The time to initiate the SQL connection is not included in the query duration.
Each individual query longer than the bottleneck threshold (1 second) is registered as a bottleneck. If the combined duration of all queries below the bottleneck threshold in the same API call is longer than the threshold, then a multiple queries bottleneck is registered. For MySQL, if the duration between the server signalling that the query is finished and the time when the last result byte was received exceeds the bottleneck threshold, this is registered as a separate additional bottleneck which has the “Processing” prefix instead of “Querying”. In that case, the “Querying” bottleneck is only created if the duration without the result processing time exceeds the bottleneck threshold.
Errors
What are errors?
Errors are the reasons why a particular user interaction or API call fails to complete. Errors captured by Plumbr include technical errors, such as JavaScript errors in the browser or Java Exceptions in the back-end. Plumbr does not capture logical errors, such as situations where VAT on an invoice is calculated incorrectly.
Plumbr’s goal is to map each failed user interaction to the underlying error, giving our users the possibility to understand the impact of each error. The impact is measured both in the number of unique users experiencing the error and in the number of failed interactions affected by the error. Having access to this information allows you to rank the errors and fix the ones with higher impact first.
Errors detected in browser
The Plumbr Browser Agent captures errors from user interactions that include at least one request completing with a 400 or 500 series response code, indicating either a client-side or a server-side failure. In addition, all JavaScript errors that occurred in the browser during the interaction are exposed as root causes for failed transactions.
In all such cases Plumbr flags the interaction containing such a request as failed. In addition, the request URL along with the response code is linked with the failed request as the error due to which the interaction failed.
There are some exceptions. First, spans with response codes 401, 409, 418 and 451 are not considered failed, because these codes are widely used in REST APIs to describe the behaviour of business rules. No errors or warnings are registered from such spans.
Second, response codes 400, 412 and 422 are also often used to describe the behaviour of business rules, but that is not always the case: sometimes they are symptoms of a genuine error. So when we first see spans with 400, 412 and 422 response codes, we do extract errors from those spans, but the extracted errors are automatically demoted to warnings. Should they mean an actual error for your API, you can promote them back to errors.
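The default treatment of response codes described above can be summarised as a small decision function. The ResponseCodeRules class and the Outcome enum are hypothetical names used for illustration, and the sketch covers only the defaults, not manual promotion or demotion.

```java
final class ResponseCodeRules {

    enum Outcome { OK, WARNING, ERROR }

    // Default outcome for a request with the given HTTP response code.
    static Outcome classify(int status) {
        if (status < 400 || status > 599) {
            return Outcome.OK; // only 400 and 500 series codes indicate a failure
        }
        switch (status) {
            case 401:
            case 409:
            case 418:
            case 451:
                return Outcome.OK;      // treated as business-rule responses, nothing registered
            case 400:
            case 412:
            case 422:
                return Outcome.WARNING; // error is extracted, then demoted to a warning
            default:
                return Outcome.ERROR;
        }
    }
}
```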
JavaScript Errors
When an uncaught error is thrown in JavaScript code, it is gathered up by the agent and linked to the appropriate user interaction.
Script Error
When an error happens in a script loaded from a third party domain, web browsers apply additional protections for user data. While the error details are visible in the developer console (and similar tools), gathering them programmatically is blocked and the error message is set to “Script Error”. This is done to avoid leaking the user's personal data from other sites.
Third party scripts can be whitelisted from this behaviour by loading them in a way that passes CORS rules. This requires both server-side configuration and changes to the including script tag:
1. Server must respond with the correct CORS header(s)
Examples:
Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: https://example.com
2. Add the crossorigin attribute to the script tag with an appropriate value
Example with normal script tag:
<script src="https://example.org/script.js" crossorigin="anonymous"></script>
Example with a script tag that loads another script:
<script>
var s = document.createElement('script')
s.src = 'https://example.org/script.js'
s.setAttribute('crossorigin', 'anonymous')
document.getElementsByTagName('head')[0].appendChild(s)
</script>
Scripts Failed To Load
When a JavaScript error happens and the current interaction or the related page load interaction has script assets which failed to load, the error name is “Scripts failed to load”.
Example:
Stacktrace: TypeError:$(...).tabs is not a function...
Related scripts that failed to load: https://example.com/example.js
Warnings By Default
As JavaScript is run in the user’s browser, some errors caused by the environment may also be picked up by us, for example if the user has a broken extension. We use the files referenced in the stack trace (for example if the majority of lines are pointing to non-http resources, such as safari-extension://, or are completely anonymous) to guess if the error is more likely caused by such factors outside your application. Such errors get instantly demoted to warnings and can be checked separately in the warnings list.
Errors detected in JVM
If Plumbr Java Agent is used to monitor the server-side of an application, the creation of Java Exceptions is used to more specifically pinpoint the errors. Whenever a user interaction is flagged as failed, the chronologically last Exception occurring is linked to the interaction as an error. The Exception contains the full stack trace, allowing you to zoom in to the source code. Exceptions that do not affect any user interactions / API calls or exceptions used to steer control flow are not exposed.
Exceptions are grouped together into errors by the deepest nested Exception class name and the top non-JDK calling method of this exception's stack trace. For example, all NullPointerException instances thrown by com.mybusiness.finance.Biller.billCustomer would be grouped together as instances of a single error under NullPointerException at com.mybusiness.finance.Biller.billCustomer. If NullPointerException is thrown from a different method in the code, it is grouped accordingly.
Different call stacks are visible from the error details to verify whether or not the source code would need patches in multiple locations.
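The grouping rule can be sketched as follows. This is an approximation under stated assumptions: the ErrorGrouping and Biller classes are hypothetical, and the package prefixes used to recognise JDK frames (java., javax., jdk., sun.) are a guess, since the exact list used by Plumbr is not given here.

```java
import java.util.Objects;

final class ErrorGrouping {

    // Builds a grouping key from the deepest nested exception and its top non-JDK frame.
    static String groupKey(Throwable error) {
        Throwable root = error;
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        for (StackTraceElement frame : root.getStackTrace()) {
            if (!isJdkFrame(frame.getClassName())) {
                return root.getClass().getSimpleName()
                        + " at " + frame.getClassName() + "." + frame.getMethodName();
            }
        }
        return root.getClass().getSimpleName(); // no application frame found
    }

    // Assumed definition of a JDK frame, based on the package prefix.
    private static boolean isJdkFrame(String className) {
        return className.startsWith("java.") || className.startsWith("javax.")
                || className.startsWith("jdk.") || className.startsWith("sun.");
    }
}

// Hypothetical application class used to demonstrate the grouping.
final class Biller {
    static void billCustomer(String customer) {
        // requireNonNull throws from inside the JDK, so the topmost frame is a JDK frame
        // and billCustomer is the top non-JDK calling method.
        Objects.requireNonNull(customer, "customer");
    }
}
```

If Biller.billCustomer(null) fails and the NullPointerException is wrapped in another exception further up the stack, groupKey still returns a key ending in Biller.billCustomer, because it follows the cause chain down to the deepest exception first.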
Demoting errors to warnings
On rare occasions Plumbr ends up flagging a user interaction or API call as failed in situations where the error detected does not impact real user experience. To overcome such situations, it is possible to demote an error to a warning. This possibility is available in the error detail view.
After demoting an error, all future user interactions/API calls containing this error will no longer be flagged as failed. Historical data will also be updated. Depending on the data volumes, updating the history can take up to one hour to complete.
Demoted errors turn into warnings. Warnings are accessible via the Plumbr UI in the Errors menu item. To see the warnings, toggle the warning icon in the error list header.
In case an error was incorrectly demoted, promoting it back to error is also possible. Open warning details and click Promote to turn the warning back to error.
Root Causes
What happened to root causes?
Until 2018, Plumbr referred to both bottlenecks and errors via the generic term “root cause”. We learned that the two concepts are different and that representing them via the same entity created confusion. The root causes were therefore retired and split into errors, representing the reasons for failures, and bottlenecks, representing the reasons why end users faced slow applications.
Services
What are services?
A service is a name for the operation the user was doing. As such, services group together similar user interactions (for example, paying an invoice or adding an item to a shopping cart).
The secondary purpose of a service is to determine the slow threshold for interactions consuming this service. Transactions exceeding this threshold are flagged as slow. Out of the box the slow threshold is 5s, but it can be overridden for any service from the service settings.
Service detection works differently depending on what type of application Plumbr is monitoring.
Services in web applications
When the Plumbr Browser Agent monitors the application, service detection builds upon three components:
- the URL at which the user was before the interaction
- the interaction the user performed
- the URL at which the user ended up after the interaction completed
As an example, consider a user who was viewing an invoice with the ID 123 and decided to pay the invoice by clicking the button Pay. The application processed the payment, after which the user remained at the same URL with the confirmation message that the invoice was paid successfully.
Plumbr identifies the service for this interaction as Click “Pay” on /invoice/view/{1}. As seen from this example, all invoice payments carried out in the /invoice/view screen are grouped together under the same service, independent of the ID.
How labels for interactions are found
When the Browser Agent detects that an interaction started from the user interacting with some element (for example clicking a button), it tries to label the element with the first non-empty value found from the following:
- A data-plumbr-service attribute on the element or a parent:
<a href="/profile/23" data-plumbr-service="User Profile">John Doe</a>
<p data-plumbr-service="Editable Features">
<span>Features to configure</span>
<label><input type="checkbox" /> Price</label>
<label><input type="checkbox" /> SKUs</label>
</p>
- A <label> element that has a for attribute that matches the element id:
<input type="checkbox" id="newsletter">
<label for="newsletter">Send me marketing e-mails</label>
- Within the DOM tree (starting from the interaction target) the first found:
- A parent <label> element:
<label>
<input type="checkbox" id="newsletter"> Send me marketing e-mails
</label>
- An element with an aria-label attribute:
<input type="text" name="location" aria-label="Location">
- An element with an aria-labelledby attribute:
<input type="text" name="location" aria-labelledby="location-help">
<p id="location-help">Enter Location</p>
- A parent <img> alt attribute:
<a href="/sale">
<img src="/img/sale.png" alt="Hot Sales">
</a>
- <img> filename:
<a href="/sale"><img src="/img/sale.png"></a>
- <input> value if type is button, radio or submit:
<input type="submit" value="Purchase">
- The text of the first <option> element in <select>:
<select name="delivery">
<option value="0">Select Delivery Method</option>
<option value="1">Courier</option>
</select>
- title attribute of <i>:
<i class="fa fa-times" title="Remove">
- The text in the closest parent <a>, <button>, <label>, <span>, <td> or <th>:
<a href="#">Continue to checkout</a>
Should none of these be found, a CSS selector is constructed from the target element and its parents.
<ul id="todo-list">
<li class="item">
<input type="text" name="item[]" value="Test labels">
</li>
</ul>
Results in: ul#todo-list > li.item > input[type="text"][name="item[]"]
Improving the service detection
There are known cases where the out-of-the-box Plumbr configuration ends up with services exposed via cryptic names. As a result you would see services similar to the following in the Plumbr UI:
Key pressed on “div > list > filter > input#filter” at /user/search
This happens in situations where no human-readable elements were present to use as the identifier on the input field where the user performed the event. As a result of this, Plumbr used a fallback and exposed the DOM tree branch the event took place at. To replace this with a human-readable version, use “aria-label” attribute on such elements, so instead of
<input type="text"/>
you would use
<input type="text" aria-label="Name"/>
After this change, the name of the service in Plumbr will change to Key pressed on “Name” at /user/search. As a side effect, users relying on screen readers now also have better access to the content of your site, as this is what the aria-label attribute was originally designed for.
Detecting a service from a URL
Let us explain this approach using a transaction arriving at the following URL as an example:
http://www.example.com/shop/cart/add/iPhone6?quantity=5
As a first step parameters are stripped. Next, service detection parses the URL to use /shop/cart/add/iPhone6 as the input. As seen, the last token identifying the product added to the shopping cart (iPhone6) is actually a parameter of the service. In order to group all interactions adding items to the shopping cart under the same service, Plumbr replaces the iPhone6 token in the URL with the placeholder {1}. As a result, the service detected from the transaction is
/shop/cart/add/{1}.
Using this approach makes it possible to group transactions accessing the same /shop/cart/add service together, regardless of the product you added to the shopping cart.
Certain limitations apply to which tokens can be automatically replaced by a placeholder: by default only tokens containing non-alphabetical characters are replaced.
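The default behaviour can be approximated with the sketch below. It is not Plumbr's implementation: the UrlServiceDetector class is a hypothetical name, and numbering the placeholders {1}, {2}, ... in order of appearance is an assumption based on the examples in this chapter.

```java
import java.net.URI;

final class UrlServiceDetector {

    // Derives a service name from a URL: parameters are stripped and every path token
    // containing a non-alphabetical character is replaced with a numbered placeholder.
    static String detect(String url) {
        String path = URI.create(url).getPath(); // drops the query string and the fragment
        if (path == null) {
            return "/";
        }
        StringBuilder service = new StringBuilder();
        int placeholder = 0;
        for (String token : path.split("/")) {
            if (token.isEmpty()) {
                continue;
            }
            service.append('/');
            if (token.matches("[A-Za-z]+")) {
                service.append(token); // purely alphabetical tokens are kept as they are
            } else {
                service.append('{').append(++placeholder).append('}');
            }
        }
        return service.length() == 0 ? "/" : service.toString();
    }
}
```

For the URL used in this chapter the sketch returns /shop/cart/add/{1}. A purely alphabetical product name such as iPhone would be left in place, which is the limitation mentioned above.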
If an application being monitored contains path parameters consisting only of alphabetical characters, then such services won’t be correctly grouped, i.e., there will be more services reported than expected. If an application being monitored uses an approach where there is only the root path ( / ) and services are encoded using request parameters (either using ? or # as a separator), then the services will be grouped too eagerly under one root “/” service. For example, both of the following URLs
/#tab=shoppingcart
/#tab=checkout
will be detected as one root service: “/”
To solve this, it is advised to use custom service grouping (accessible via Settings > Service Grouping Rules). There are two types of grouping rules: prefix based and regular expression based. The prefix matcher is the simplest, and it will replace all URLs matching the defined prefix with the desired service name. Regular expression grouping rules are much more powerful: they allow matching request parameters and using groups with back-references to construct service names. This makes it possible to overcome the limitations of the default URL service detection. For our particular example we could define the following regex grouping rule:
Matching pattern:
/#tab=(.*)
Service name pattern:
\1
This will result in two detected services:
shoppingcart
checkout
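To try out such a rule before saving it, its effect can be reproduced with java.util.regex. The sketch below rests on assumptions: the RegexGroupingRule class is hypothetical, the pattern is assumed to have to match the whole URL, and the \1 back-reference from the rule syntax is translated into the $1 form that Java expects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexGroupingRule {
    private final Pattern matchingPattern;
    private final String serviceNamePattern;

    RegexGroupingRule(String matchingPattern, String serviceNamePattern) {
        this.matchingPattern = Pattern.compile(matchingPattern);
        // The rule syntax uses \1 for back-references; java.util.regex expects $1.
        this.serviceNamePattern = serviceNamePattern.replaceAll("\\\\(\\d)", "\\$$1");
    }

    // Returns the service name, or null when the rule does not match the URL.
    String apply(String url) {
        Matcher m = matchingPattern.matcher(url);
        if (!m.matches()) {
            return null;
        }
        StringBuffer name = new StringBuffer();
        m.appendReplacement(name, serviceNamePattern); // expands $1 against the full match
        return name.toString();
    }
}
```

Creating the rule with the matching pattern /#tab=(.*) and the service name pattern \1 and applying it to /#tab=shoppingcart and /#tab=checkout yields shoppingcart and checkout, while a URL such as /about is left to the default detection.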
Services detected in API calls
Whenever the operation arriving at the JVM has not started in a browser monitored by the Plumbr Browser Agent, service detection is the responsibility of the JVM accepting the operation. Service detection in the JVM is done differently for different applications:
- HTTP calls. If the call arrives via the HTTP protocol, Plumbr extracts the service from either:
- MVC framework metadata. If Plumbr supports a particular Java MVC framework (see the list below) used to process the incoming HTTP request, service detection uses the class/method name of the controller invoked
- If the HTTP call is not processed by a supported MVC framework, the service is detected using the information encoded in the URL.
- EJB methods. If the call arriving at the JVM is a remote EJB call, Plumbr uses the EJB class and method name as the service
- Swing event listeners. If the interaction was captured in a Swing application, the Swing event and action listeners are used as the service
Detecting a service from MVC
When a JVM monitored by Plumbr exposes its services via an MVC framework supported by Plumbr, the service name is extracted from the controller processing the transaction. For example, when an HTTP request such as
http://www.example.com/payments?actionId=payInvoice&invoiceId=411121
is mapped and processed by a Struts controller, the service is extracted from the controller. An example of such a controller would then be visible in the Plumbr interface similar to:
com.example.payments.PaymentAction.execute().
The controllers from the following MVC frameworks are currently supported for service detection:
- Spring MVC
- Struts 1 & 2
- GWT 2.x
- JSF 1.1+
- Vaadin 6+
- ZK 7+
- Play framework
Whenever an HTTP request is not processed by an MVC framework known to Plumbr, service detection falls back to capturing the service from the information encoded in the URL.
Users
What are Users?
A user is an attribute of a transaction, which associates the transaction with the end user interacting with the application. Users in Plumbr are built upon two concepts: user tracking to distinguish one user from another and user identification to expose the identity of the user.
Tracking Users
User tracking works for web applications monitored by the Plumbr Browser Agent. User tracking is based on generating and storing a random unique string in the end user’s browser. This random string is then submitted along with each request from the particular browser.
The unique string is stored in the browser’s cookies, and so subsequent visits to the same site can be associated with the same user. The cookie used for this purpose is named plumbr_user_tracker.
In a Plumbr deployment where users are tracked but not identified, a unique user is counted each time your content is accessed from a different device or browser.
For example, the following journey appears as three different users in cases where user identification is not implemented:
- Searching for a product on a tablet or a phone one day,
- Purchasing the product on a desktop the next day,
- Filing a complaint about the purchased product on a laptop a week later.
Even if all these interactions were performed by an authenticated user, Plumbr would track it as three different users. While you can collect data about each of these interactions and devices, you cannot determine if any relationships exist. You only see independent data points.
The very same journey would be linked to a single user if users could be identified. In such a case, the interactions on different devices would be connected with the same user identity.
Plumbr user tracking only works for web applications. So, if you are monitoring EJB modules or Swing applications, Plumbr will be incapable of tracking the users in such deployments. If Browser Agent is used for monitoring user interactions then unidentified but tracked users will appear as separate anonymous users in the Plumbr UI. If only Java agent is used for monitoring, then users will be tracked only if it was possible to obtain identity information for authenticated users, i.e., there are no anonymous users for API calls.
Identifying Users
In order to expose the journey of a specific user and track the same user across multiple devices, Plumbr also embeds the possibility of identifying users. The exposed identity can be in any form that the particular application can handle. Typical examples of identity are the username or email address of the user.
User identity is automatically linked to a transaction in applications where Plumbr is capable of determining the location of the identity. Plumbr has 3 conceptually different ways of obtaining users’ identity:
- fully automatic discovery for certain frameworks,
- via configuration for certain sources of properties,
- programmatically via our APIs.
Fully automatic discovery, requiring no configuration, supports the following frameworks for capturing identity:
- JWT Bearer tokens. If your application passes the identity of the user in the HTTP request headers using JWT Bearer tokens, Plumbr will use the value of the subject extracted from the token as the identity of the user.
- Spring Security. If the application monitored by Plumbr uses the authentication built into the Spring Security library, Plumbr will extract the user’s identity from security.core.userdetails.UserDetails.getUsername().
- Java Authentication and Authorization Service (JAAS). If the application monitored by Plumbr exposes authentication data via standard Servlet API (javax.servlet.http.HttpServletRequest.getUserPrincipal()), Plumbr will extract the identity from java.security.Principal.getName().
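As an illustration of the first option, the subject of a JWT Bearer token can be read from the Authorization header as sketched below. The JwtSubject class is hypothetical and this is not how the Plumbr Agent is implemented: the sketch does not verify the token signature and extracts the sub claim with a simple regular expression instead of a JSON parser, so it is suitable for demonstration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JwtSubject {

    // Naive matcher for a string-valued "sub" claim in the decoded JSON payload.
    private static final Pattern SUB = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the "sub" claim of the bearer token, or null if it cannot be read.
    static String fromAuthorizationHeader(String header) {
        if (header == null || !header.startsWith("Bearer ")) {
            return null;
        }
        // A JWT consists of header.payload.signature, each part base64url-encoded.
        String[] parts = header.substring("Bearer ".length()).trim().split("\\.");
        if (parts.length < 2) {
            return null;
        }
        try {
            byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
            Matcher m = SUB.matcher(new String(decoded, StandardCharsets.UTF_8));
            return m.find() ? m.group(1) : null;
        } catch (IllegalArgumentException e) {
            return null; // the payload was not valid base64url
        }
    }
}
```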
If Plumbr has not been able to detect the user’s identity automatically, you can help Plumbr locate the identity by configuring one or more Identity Detection Rules:
- HTTP Header Rule. If your application passes along the identity of the user using HTTP Request Headers.
- Session Attribute Rule. If your application stores information about current user’s identity in a custom Servlet session attribute.
Configuration of Identity Detection Rules is explained in detail in the following chapter.
When automatic and configurable identity detection doesn’t suit, Plumbr allows setting the user identity manually via its APIs.
Please note that all identity detection mechanisms except the Browser Agent API only work in settings where the application is monitored by the Java Agent (regardless of whether or not the Browser Agent is used).
Identity Detection Rules
In case Plumbr has not been able to identify the users, you will need to help it find the location of the stored identities. You can do this by configuring the location of the user identity via creating a new Identity Detection Rule.
Identity Detection Rules can look for the identity of users in two different locations.
If your application passes along the identity of the user via HTTP Request Headers, you will need to configure an HTTP Header Rule. In that case, all you need to specify is the name of the HTTP Header from which Plumbr can extract the tracking information. The value for the Header would then be similar to the following:
X-User-Authentication
In case your application does not use HTTP Headers to pass along the identity, you will need to configure a Session Attribute Rule instead. In such a case, you will need to configure two parameters: Attribute Name and Extraction Path.
- Attribute Name is the name of the attribute in the session context storing the user’s identity. For example, Spring Security adds its security context under the SPRING_SECURITY_CONTEXT attribute. When the specified attribute is not found in a particular HTTP Session, either because an incorrect attribute name was provided or because the application user is not yet authenticated, the Plumbr Agent will not proceed to detect the user’s identity from the Extraction Path.
- Extraction Path defines the exact path where the user identity is stored. For example, if Spring Security is used, the extraction path will be getAuthentication().getPrincipal().getUsername().
Combining the two parameters allows Plumbr to look for the identity. Using Spring Security as an example again and combining the two examples above will result in Plumbr looking for:
session.getAttribute("SPRING_SECURITY_CONTEXT").getAuthentication().getPrincipal().getUsername();
Configuration: Example
To explain how you can configure the location of the User Identity, let us look at the following example. In this example application, a successful authentication operation results in storing the User’s identity in the HTTP Session as:
request.getSession(true).setAttribute("USER_CONTEXT", new UserContext(ipAddress, username));
where request is an instance of javax.servlet.http.HttpServletRequest.
Let’s also assume that the UserContext class would be designed as:
public class UserContext {
    public String getIpAddress() {
        return ipAddress;
    }
    public User getUser() {
        return user;
    }
    private final String ipAddress;
    private final User user;
    public UserContext(String ipAddress, String username) {
        this.ipAddress = ipAddress;
        this.user = new User(username);
    }
    private final class User {
        private final String username;
        public String getUsername() {
            return username;
        }
        public User(String username) {
            this.username = username;
        }
    }
}
So we are adding an instance of UserContext to the session attributes under the key USER_CONTEXT. By default, the Plumbr Agent does not look for the identity in this location. To teach the Plumbr Agent how to extract the user identity in this case, we would specify the configuration as follows:
- Attribute name: USER_CONTEXT
- Extraction Path: getUser().getUsername()
Equipped with this knowledge, the Plumbr Agent will now monitor setAttribute() events in all HttpSession instances. Whenever such an event arrives and the attribute being set is “USER_CONTEXT”, Plumbr starts capturing the identity.
The identity itself is extracted by invoking getUser().getUsername() on the UserContext object stored under “USER_CONTEXT” key.
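One way to picture how an Extraction Path is applied is as a chain of reflective, no-argument method calls on the session attribute. The sketch below is only an illustration under that assumption, not the Plumbr Agent's actual code; ExtractionPathSketch, extract and the simplified User and UserContext stand-ins are hypothetical names.

```java
import java.lang.reflect.Method;

/** Illustrative only: evaluates a path such as "getUser().getUsername()" via reflection. */
final class ExtractionPathSketch {

    /** Walks the path segment by segment, invoking each no-argument method on the previous result. */
    static Object extract(Object root, String path) throws ReflectiveOperationException {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (!segment.endsWith("()")) {
                // Only method calls are supported in this sketch, so "getUser().username" is rejected.
                throw new NoSuchMethodException("Not a method call: " + segment);
            }
            if (current == null) {
                return null; // nothing to extract from
            }
            String name = segment.substring(0, segment.length() - 2);
            Method method = current.getClass().getMethod(name);
            method.setAccessible(true); // the declaring class itself may be non-public
            current = method.invoke(current);
        }
        return current;
    }

    // Simplified stand-ins for the UserContext example above.
    static final class User {
        private final String username;
        User(String username) { this.username = username; }
        public String getUsername() { return username; }
    }

    static final class UserContext {
        private final User user;
        UserContext(String username) { this.user = new User(username); }
        public User getUser() { return user; }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the object that session.getAttribute("USER_CONTEXT") would return.
        Object attribute = new UserContext("jane.doe");
        System.out.println(extract(attribute, "getUser().getUsername()"));
    }
}
```

In this sketch, a path that names a field rather than a getter, such as getUser().username, is rejected with an exception instead of being resolved.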
If the Plumbr Agent fails to extract the identity with the defined Extraction Path, you will see the following banner in your application logs:
*********************************************************************
* Failed to extract user identity with path: getUser().username     *
* Please check your configuration here:                             *
* https://app.plumbr.io/settings/identity-detection                 *
*********************************************************************
Using the same example as above, the error message becomes clear. The username field in the User class is declared private and thus cannot be accessed directly. To fix this, change the Extraction Path to getUser().getUsername().
Browser Agent Configuration
Configuration of the browser agent is done in the data-plumbr attribute of the script tag used to load the agent:
<script
src="https://browser.plumbr.io/pa.js" crossorigin="anonymous"
data-plumbr='{
"accountId" : "abcde..",
"appName" : "Marketing site",
...
}'>
</script>
Make sure that the quotes used to define the attribute are being escaped properly. If the settings are generated dynamically, it is recommended to use the backend framework/language methods for JSON and HTML entities encoding. For example, in an EJS template:
<% var plumbrSettings = {
accountId: "abcde...",
serviceName: "User's profile"
} %>
<script src="https://browser.plumbr.io/pa.js" data-plumbr="<%= JSON.stringify(plumbrSettings) %>"></script>
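The same idea can be sketched for a Java backend: serialize the settings to JSON first, then HTML-escape the result before placing it in the attribute. This is a hedged illustration only. The class and helper names are hypothetical, and the hand-written JSON and HTML escaping stands in for whatever encoders your framework provides, which should be preferred in real code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: renders the agent script tag with settings encoded as JSON, then HTML-escaped. */
final class PlumbrScriptTagSketch {

    /** Serializes a flat map of string settings to JSON; a real application would use its JSON library. */
    static String toJson(Map<String, String> settings) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return json.append('}').toString();
    }

    /** Wraps a string in double quotes, escaping quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    /** Escapes the characters that are significant inside an HTML attribute value. */
    static String escapeHtmlAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String scriptTag(Map<String, String> settings) {
        return "<script src=\"https://browser.plumbr.io/pa.js\" data-plumbr=\""
                + escapeHtmlAttribute(toJson(settings)) + "\"></script>";
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("accountId", "abcde..."); // placeholder, as in the examples above
        settings.put("serviceName", "User's profile");
        System.out.println(scriptTag(settings));
    }
}
```

Encoding in two separate steps keeps the JSON valid for the agent while ensuring that quotes and apostrophes in values cannot terminate the HTML attribute early.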
Basic Configuration
The following settings are required for the browser agent to run:
Setting | Description |
---|---|
accountId | Your Plumbr account identifier. This is included in the embed code shown to you in the portal. During normal use you do not need to change this. |
serverUrl | The server to which the browser agent sends data. If you are using On Demand Plumbr, make sure this value refers to https://bdr.plumbr.io. If the agent should connect to an On Premise Plumbr server, make sure it is set accordingly. |
appName | Sets the Application Name of all transactions generated on this page. |
Optional Settings:
Cookie Domain – cookieDomain
Choose the domain that the user & session tracking cookies are set on.
- true (default) – Sets the cookie as high up as possible (for example, on the site summer.marketing.example.co.uk the cookie will try to cover all of example.co.uk).
- (string value) – Sets the cookie on the defined value (for example, to only have the cookie on a subdomain, use "marketing.example.co.uk").
- false – Cookies will be limited to the current subdomain.
Example
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{
"accountId": "abcde..",
"serverUrl": "https://bdr.plumbr.io",
"cookieDomain": "tenant.example.com"
}'>
</script>
Application Version – appVersion
Declare the application version.
Example
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{
"accountId": "abcde..",
"serverUrl": "https://bdr.plumbr.io",
"appVersion": "4.3.18"
}'>
</script>
Transaction Configuration
Transaction Defaults
The following configuration options apply to all of the transactions generated via the browser agent and are equivalent to calling the same Browser Agent API methods.
User Identity – userId
Set the User Identity of all transactions generated on this page
Example
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{
"accountId": "abcde..",
"serverUrl": "https://bdr.plumbr.io",
"userId": "John Doe"
}'>
</script>
Page Loading Transaction
These configuration options apply to the transaction generated from the user loading the page that the browser agent is included on. For example, the act of loading this page by you right now.
Response Code – responseCode
Due to technical limitations, the response code of the page load is inaccessible to the browser agent. To mark the page load span with the relevant HTTP response status, the status must be set in the configuration.
Example
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{
"accountId": "abcde..",
"serverUrl": "https://bdr.plumbr.io",
"responseCode": 404
}'>
</script>
Service Name – serviceName
Set the Service Name of the transaction that is loading this page.
Example
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{
"accountId": "abcde..",
"serverUrl": "https://bdr.plumbr.io",
"serviceName": "Product Page"
}'>
</script>
Compatibility Configuration
Occasionally, libraries used on the page are not compatible with the way the browser agent works. For those cases, there are options to enable compatibility fixes, which come with a performance cost.
Prototype.js versions < 1.7.1
Prototype.js is a library that extends native JavaScript objects by adding new methods to the prototype of the objects. However, as JavaScript evolves and new methods get added to the specification of the language (and therefore to the latest versions of browsers), the behaviour of the methods added by Prototype.js can differ from the behaviour defined by the specification.
To fix these issues, you can either upgrade to Prototype.js version 1.7.1 or later or, if that is not feasible, configure the agent accordingly:
Version | Notes | Configuration |
---|---|---|
1.7.0 (inc. RC) | Has non-ES5 compliant Function.prototype.bind implementation. | "compat":["bind"] |
1.6.1 and older | In addition to bind, Prototype.js adds toJSON to all objects. ES5 uses toJSON to customise an object’s value when the object is turned into a string. | "compat":["bind","json"] |
Example of using Prototype.js 1.6.1 with browser agent:
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{"accountId":"abcdef...","serverUrl":"https://bdr.plumbr.io","compat":["bind","json"]}'>
</script>
<script src="https://ajax.googleapis.com/ajax/libs/prototype/1.6.1.0/prototype.js"></script>
In-HTML Event Handlers
Event listeners defined like <a onclick="doSomething(); return false"></a> are tightly coupled to the HTML, so we need to jump through some hoops in order to gather transactions from these interactions. As part of this, we wrap your code with the instrumented_with_plumbr method.
If you do not want this behaviour, or you never use inline event handlers and want a small performance gain, you can add "inlineEvents": false to your browser agent configuration. (Note: Any activity happening in inline event listeners will then be linked to the previous transaction.)
If you change the onX attribute after we have instrumented it, you can enable extra watchers by adding "inlineEventChanges": true to the configuration, which re-instruments your code after changes.
Installation options
Scripts added before pa.js
Generally, the Plumbr browser agent needs to be included on the monitored web page before any other JavaScript files. Otherwise, you will see an error message in your Plumbr account.
The reason for this is that in order to correctly capture user experience telemetry, the browser agent needs to run before any JavaScript that adds its own event handlers or instrumentation, such as any frontend framework code (React, Vue, Angular, jQuery, etc.).
However, in some situations you might need to load scripts before pa.js that don’t interfere with the Plumbr browser agent. In these circumstances you might want to disable the error message.
If you have just a few such scripts, you can use the data-plumbr-allow-before attribute.
Example
<script data-plumbr-allow-before src="loaded-before-browser-agent.js"></script>
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{"accountId":"abcdef...","serverUrl":"https://bdr.plumbr.io"}'>
</script>
Alternatively, you can use the configuration property detectUnzoned, which will disable the check completely.
Example
<script src="https://browser.plumbr.io/pa.js"
data-plumbr='{"accountId":"abc...","serverUrl":"https://bdr.plumbr.io", "detectUnzoned": false}'>
</script>
Note that this way, Plumbr browser agent will not check for being included as the first script on the web page. Use only if you know what you’re doing!
Java Agent Configuration
Configuration
Java Agent configuration is stored in the file plumbr.properties located next to the Plumbr Java Agent .jar file. This configuration is used to monitor a single JVM in one machine. When monitoring multiple JVMs in the same machine make sure that every JVM uses a different Plumbr installation to avoid clashing in the configuration.
Basic configuration
Configuration parameters in this section are required for the Plumbr Agent to connect to the Plumbr Server, link the JVM to your account, and identify the JVM so that you could distinguish between the different JVMs monitored by Plumbr.
- accountId – your account identifier, which binds this Agent to your account in the Plumbr Server. This identity is generated and embedded into the downloaded Agent configuration for you. During normal use, you should not change the value of the parameter.
- jvmId – (optional if clusterId is set) JVM identifier, binding data from this particular JVM to the correct JVM in the Plumbr Server. When this identity is not provided, the connected JVM gets assigned a temporary identifier, which will not survive JVM restarts. In order to have the data connected to the same JVM, provide the identifier either as a value of this property or via the server-side UI.
- clusterId – (optional if jvmId is set) identifier of the JVM cluster. ClusterId is the preferred way of grouping JVM transactions under a predetermined application (the application discovery logic is described in more detail separately). JVMs with the same clusterId are also grouped together in the Plumbr server-side views where appropriate (such as the Architecture views). Setting clusterId is useful when individual JVMs run the same code and/or run in dynamically provisioned environments. If the value of clusterId is unspecified, no cluster grouping is applied to this JVM.
- serverUrl – the server to which the Agent connects. If you are using On Demand Plumbr, make sure the value refers to https://app.plumbr.io. If the Agent is connecting to a Plumbr Server installed on your premises, make sure you have specified the correct server URL.
- appVersion – the version of the application being monitored. It is advised to use the environment variable flavour (PLUMBR_APP_VERSION) of this option.
Proxy configuration
When your network configuration requires outgoing communication to pass through a proxy server, you can set up the communication between the Plumbr Agent and the Server via a proxy. Specifying the values for these parameters redirects the traffic from the Agent to the Server via the proxy server specified in proxyUrl.
- proxyUrl – the proxy URL that you can use to connect to the Plumbr Server if a direct connection from Agent is not possible. If proxy is used, this setting is mandatory; other proxy settings are optional. An example of the parameter: proxyUrl=http://squid.mycompany.com:3128.
- proxyAuthUser – the username for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
- proxyAuthPassword – the password for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
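For reference, under the standard HTTP Basic scheme the credentials are sent as the Base64 encoding of "user:password" in a Proxy-Authorization header. The sketch below shows only that generic encoding; how the Plumbr Agent builds its requests internally is not documented here, and the class name and the sample credentials are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: the value of a Proxy-Authorization header under the Basic scheme. */
final class BasicProxyAuthSketch {

    static String proxyAuthorization(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for proxyAuthUser and proxyAuthPassword.
        System.out.println("Proxy-Authorization: " + proxyAuthorization("user", "pass"));
    }
}
```

Note that Basic authentication only encodes the credentials and does not encrypt them.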
Logging configuration
Parameters in logging configuration are used to tune the logging of the Plumbr Agent.
- logConf – the location of the Logback configuration file Plumbr Agent uses for logging purposes. Detailed logging configuration is embedded in the referred XML file which you can tune to suit your needs.
- tmpDir – the location of the temporary files generated by Plumbr during runtime. Such temporary data includes the buffered data Agent has not yet sent to the Server and temporary data structures used during Agent-side analysis. The location is relative to the location of the Plumbr Agent in the file system.
- doCleanup – whether or not Plumbr deletes the temporary files after they are no longer needed. Switch this to false only when instructed to do so by Plumbr support.
Configuration via environment
If you cannot use the configuration via a property file, an alternative is to configure the Agent by specifying parameters in the JVM startup script using environment variables, prefixing each parameter with the “PLUMBR_” prefix and converting camel-case property names to the standard upper case snake case. So, for example, you could specify the accountId, jvmId, clusterId and serverUrl parameters for your JVM like this:
set PLUMBR_ACCOUNT_ID=a8nd2bar
set PLUMBR_JVM_ID=node1
set PLUMBR_CLUSTER_ID=Payment
set PLUMBR_SERVER_URL=https://app.plumbr.io
set PLUMBR_APP_VERSION=1.0.352.RELEASE
java com.acme.MainClass
Please note that if you do not use a property file at all, you will need to pass the following properties via parameters: PLUMBR_ACCOUNT_ID, PLUMBR_SERVER_URL, PLUMBR_LOG_CONF (just copy their values from the property file that you have downloaded with the Plumbr Java Agent installation package) and either PLUMBR_JVM_ID or PLUMBR_CLUSTER_ID. One of them is optional only if the other is present.
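The naming rule described above (add the PLUMBR_ prefix and convert camel case to upper snake case) can be sketched as follows. The helper name toEnvName is hypothetical, and only property names that appear in this manual are used as examples.

```java
/** Illustrative only: maps a camel-case property name to its PLUMBR_ environment variable name. */
final class EnvNameSketch {

    static String toEnvName(String property) {
        StringBuilder name = new StringBuilder("PLUMBR_");
        for (char c : property.toCharArray()) {
            if (Character.isUpperCase(c)) {
                name.append('_'); // an upper-case letter marks a word boundary in camel case
            }
            name.append(Character.toUpperCase(c));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        for (String property : new String[] {"accountId", "jvmId", "clusterId", "serverUrl", "logConf"}) {
            System.out.println(property + " -> " + toEnvName(property));
        }
    }
}
```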
Network Configuration
When your network configuration is blocking Plumbr Agents from connecting to the Plumbr Server, you will see a message similar to the following in your JVM standard output logs:
****************************************************************
* Plumbr Server not responding -                               *
* cannot connect to https://app.plumbr.io.                     *
* Retrying in 60 seconds.                                      *
****************************************************************
Note that although the Server cannot be reached, the JVM will still start. It will just not be monitored by Plumbr.
To verify that the problem is related to network configuration only, try connecting to app.plumbr.io port 443 from the machine on which you are installing Plumbr to see whether the connection is allowed. This can be achieved via telnet, similar to the following:
$ telnet app.plumbr.io 443
Trying 63.33.144.70...
Connected to app.plumbr.io.
Escape character is '^]'.
When the connection is successful, you should see a message Connected to app.plumbr.io, similar to the example above. When the connection fails, the network configuration is blocking connections to app.plumbr.io port 443.
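If telnet is not available on the machine, a comparable check can be made from Java itself. The following is a minimal sketch, not a Plumbr tool: the class name and the five-second timeout are arbitrary choices, and it only tests whether a TCP connection can be opened, just like the telnet example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Illustrative only: a TCP reachability check comparable to the telnet test above. */
final class ConnectivityCheckSketch {

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or the host could not be resolved
        }
    }

    public static void main(String[] args) {
        boolean reachable = canConnect("app.plumbr.io", 443, 5000);
        System.out.println(reachable
                ? "Connected to app.plumbr.io"
                : "Connection to app.plumbr.io:443 failed");
    }
}
```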
To overcome the situation, the first two recommended options are connecting via a proxy server or relaxing the firewall configuration. If you cannot change the network configuration, you should turn to our On Premise offering, where you can install the Server component in your own network.
Proxy configuration
When your network configuration requires outgoing communication to pass through a proxy server, you can set up the communication between the Plumbr Agent and the Server via a proxy. Specifying the values for the following parameters in the plumbr.properties file located next to the Plumbr Agent redirects traffic from the Agent to the Server via the proxy server specified in proxyUrl.
- proxyUrl – the proxy URL that you can use to connect to the Plumbr Server if a direct connection is not possible. If a proxy is used, this setting is mandatory; other proxy settings are optional. An example of the parameter: proxyUrl=http://squid.mycompany.com:3128.
- proxyAuthUser – the username for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
- proxyAuthPassword – the password for proxy authentication. Note that the Plumbr Agent only supports Basic authentication.
Firewall configuration
Another option for bypassing connectivity issues is to check your firewall configuration. If the outgoing connections from the Plumbr Agent are blocked by a firewall make sure the connection to app.plumbr.io:443 is allowed in your firewall configuration.
The exact configuration steps for this are firewall-specific. See your vendor manuals for further information.
Upgrading Java Agent
Automatic Upgrade
Starting from version 17.07.11, Plumbr Java Agents are capable of upgrading themselves automatically. The functionality is enabled by default; if that does not suit you, you can disable it in the Settings menu.
Whenever a new Plumbr Java Agent version is released, existing Agents connected to the Plumbr Server download the updated version. The switch to the newly downloaded version happens after the next JVM restart.
A more detailed flow of the auto-update, for those who want to take a peek under the hood:
- When a connection is established between the Java Agent and the Server, the Agent checks whether auto-update is enabled. If auto-update is disabled, the process is aborted.
- After the connection is established, a message is sent to the Java Agent if its build number is lower than the build number of the highest known Agent version. The message contains the version to update to, as well as the checksum and download URLs for that specific version.
- The very same message is broadcast to all connected Java Agents each time you change the setting from enabled to disabled or vice versa in the Settings menu.
- When the Java Agent receives the message, it uses the URLs from the message to download the new Agent and to perform the checksum verification.
- The downloaded ZIP is unzipped into a separate directory inside the installation directory; the name of that directory starts with the version.
- On the next restart of the JVM that the Agent is attached to, the wrapper automatically selects the new version because it has a higher build number than the previous one.
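The checksum verification step in the flow above can be pictured with the following sketch. The manual does not state which digest algorithm Plumbr uses, so SHA-256 is purely an assumption here, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: verifies downloaded bytes against an expected hex digest (SHA-256 is an assumption). */
final class ChecksumSketch {

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true when the digest of the downloaded bytes equals the published checksum. */
    static boolean matches(byte[] downloaded, String expectedHex) {
        return sha256Hex(downloaded).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) {
        byte[] downloaded = "abc".getBytes(StandardCharsets.UTF_8); // stand-in for the downloaded ZIP
        System.out.println(matches(downloaded, sha256Hex(downloaded)));
    }
}
```

A download whose digest does not match would be discarded in this sketch, so a corrupted archive never becomes the version picked up on the next restart.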
Notice that auto-updating is only possible for our On Demand customers. On Premises users must use the manual agent update process.
Manual Upgrade
If you need to upgrade the Plumbr Java Agent manually (either from a version older than 17.08.05 or because of your company policies), go through the following steps:
- Download the new Java Agent from here.
- Back up the current Agent installation.
- Unzip the newly downloaded Agent .zip file to the folder you wish to install the Plumbr Java Agent to.
- Copy plumbr.properties from the backup to the new Agent installation.
- Update the startup parameters of the JVM you are monitoring to point to the new Agent JAR file:
-javaagent:path-to-new-agent/plumbr.jar
- Restart the JVM you want to monitor.
Identifying the JVM
When you configure your application to run with Plumbr, you have the option to identify the JVM with a name suitable for your deployment. “Payment Live” or “Reporting QA” are examples of what this ID can look like. Assigning the ID can be done in three ways:
- By setting the “jvmId” property in the plumbr.properties file. This file resides in the same directory as the “plumbr.jar” file. Find the line starting with “jvmId=” in that file and append the selected name, similar to the example: “jvmId=Payment”.
- By providing your JVM with the “plumbr.jvmId” system property. Just add “-Dplumbr.jvmId=Payment” to your application command line, e.g. “java -Dplumbr.jvmId=Payment -javaagent:/path/to/plumbr/plumbr.jar …”. This option is useful in dynamic environments, where JVMs are created and destroyed dynamically via scripts.
- When your application is already running and connected to the Plumbr Server, by going to the detail view of the JVM, clicking on the JVM name, providing a new name and clicking “Save”.
If you don’t manually identify your JVM as described above, it gets assigned an auto-generated ID. The generated ID is ephemeral in the sense that it will not persist between JVM restarts. When you restart your application and have not provided the ID yourself, a new JVM with a new identifier will be created in the Plumbr Server. Also note that if you don’t specify jvmId manually, specifying clusterId is mandatory.
Agent startup checks
While starting the Agent, Plumbr goes through the following checks to verify the integrity of the installation:
- Verifying file system permissions: whether or not the folder the Plumbr Agent resides in and its subdirectories are readable and writable by the user launching the JVM Plumbr is attached to.
- Verifying the configuration: whether or not all the required configuration parameters are present and valid.
- Verifying the support for the environment: whether the OS, JVM and application server used are supported by Plumbr.
- Verifying the connectivity: whether the Plumbr Agent can connect to the Server.
- Verifying the Agent version: whether the Agent is still supported by the Server the Agent is connecting to.
- Verifying the subscription: whether the account has an active subscription or the subscription period has expired.
- Other miscellaneous checks, including but not limited to:
  - If a proxy server is used to connect to the Plumbr Server, whether the proxy can be connected to with the credentials provided.
  - Whether or not the jvmId parameter used to identify the JVM connecting to the Server is unique.
  - Whether or not multiple JVMs are using the same Plumbr installation.
  - Whether the jvmId belongs to the account it tries to connect to.
Some of the steps can fail for various reasons. In such a case the Agent will not be attached and the JVM starts without the Plumbr Agent monitoring the end user experience. The reason for the failure will be exposed in the server’s standard output.
So in order to find out whether or not the Agent failed to initialize, search the log files for the phrase “Plumbr”. If you discover one of the error messages listed below, follow the instructions given for that particular error message.
Verifying file system permissions
********************************************************************
* Plumbr encountered a filesystem permissions error.               *
* The user that runs your Java process has no write access to the  *
* /users/me/plumbr                                                 *
* Plumbr needs write access to the whole directory.                *
*                                                                  *
* Please ensure that the user that runs your Java process has      *
* read and write permissions for that directory,                   *
* its sub-directories and files inside it.                         *
*                                                                  *
* Check out https://plumbr.io/support/agent-configuration          *
* for more information or contact support@plumbr.io                *
********************************************************************
When you encounter this error message in the log files, it indicates that the user launching the JVM that the Plumbr Agent is attached to does not have sufficient permissions to access the Plumbr Agent installation directory.
The Plumbr Agent needs to be able to read and write the folder in which the Agent is installed, as well as its subdirectories. In order to proceed, ensure that the user running the Java process the Agent is attached to has read and write permissions for both the Agent’s installation folder and its subdirectories.
Verifying the configuration
******************************************************************************
* Plumbr is missing the following required properties: serverUrl.            *
* Either make sure the plumbr.properties file is present next to plumbr.jar  *
* or specify individual properties via -D parameters in your startup script. *
*                                                                            *
* Check out https://plumbr.io/support/agent-configuration                    *
* for more information or contact support@plumbr.io                          *
******************************************************************************
When you face such a banner in your JVM startup logs, either the entire configuration stored in the plumbr.properties file or some of the mandatory properties are missing. The plumbr.properties file is located next to the Plumbr Agent .jar file.
The first step to overcome the problem is to make sure that the Plumbr installation directory is intact and that you have not extracted only the Agent’s plumbr.jar file. If the file is present, check the error message for the missing mandatory parameter(s) and add them to the file. When in doubt, check the Agent Configuration page in our support materials.
An alternative to keeping the configuration in the plumbr.properties file is to specify the parameters in the JVM startup script using -D parameters, prefixing each parameter with the “plumbr.” namespace. So, for example, you could specify the accountId, jvmId and serverUrl parameters for your JVM via:
java -Dplumbr.accountId=a8nd2bar -Dplumbr.jvmId=BillingProduction -Dplumbr.serverUrl=https://app.plumbr.io
Verifying the support for the environment
***********************************************************************************************
* Environment you are trying to run Plumbr in is not supported.                               *
* Windows XP operation system is unsupported. Minimum supported Windows version is Windows 7. *
*                                                                                             *
* Check out the support page https://plumbr.io/support/is-my-environment-supported-by-plumbr  *
* for the list of supported environments.                                                     *
***********************************************************************************************
When you see a banner similar to the one above in your log files, the environment the Plumbr Agent is running in is not supported by the Agent. The exact message differs depending on which unsupported operating system, JVM vendor/version or application server was detected in the environment.
To overcome the issue, consult the list of supported environments in our support documentation to find out whether you can use Plumbr in an officially supported environment.
Verifying the connectivity
***********************************************
* Plumbr Server not responding -              *
* cannot connect to https://app.plumbr.io.    *
* Retrying in 60 seconds.                     *
***********************************************

***************************************************************
* Plumbr Server not responding -                              *
* cannot connect to https://app.plumbr.io.                    *
* Retrying in 60 seconds.                                     *
*                                                             *
* In case your network configuration is blocking connections  *
* to Plumbr servers, see how to configure proxy server and/or *
* firewall https://plumbr.io/support/network-configuration.   *
*                                                             *
* In case your company policy does not allow using externally *
* hosted services, try out Hosted Plumbr which does not       *
* require external network connections                        *
* https://plumbr.io/support/hosted-plumbr                     *
***************************************************************
When you see a banner similar to either of the above in your JVM log files, it indicates that the deployed Agent cannot connect to the Plumbr Server over the network. Plumbr Agents will start without the presence of the Server, but as there is no endpoint to send the harvested data to, the Server cannot analyze the gathered data and thus you receive no value from Plumbr.
Pay attention that when the Server endpoint is just temporarily unavailable, the Agent will buffer the data locally. The Agent also periodically retries to connect to Server and when the Server (re)appears, the buffered data will be sent to Server.
To verify that the problem is related to network configuration only, try connecting to app.plumbr.io port 443 from the machine on which you are installing Plumbr to see whether the connection is allowed. This can, for example, be achieved via telnet, similar to the following:
$ telnet app.plumbr.io 443
Trying 54.171.1.110...
Connected to app.plumbr.io.
Escape character is '^]'.
When the connection is successful, you should see a message Connected to app.plumbr.io, similar to the example above. When the connection fails, the network configuration is blocking connections to app.plumbr.io port 443.
To overcome the situation, the first two recommended options are connecting via a proxy server or relaxing the firewall configuration. If you cannot change the network configuration, you should turn to our On Premise offering, where you can install the Server component in your own network.
Verifying Agent version
During startup, the Agent version is compared to the Server version to verify whether or not the Agent is still supported by the Server the Agent is connecting to. In general, we use the following policy for Agent version support:
- Servers accept connections from Agents up to one year older than the Server. Agents older than one year will be rejected by the Server.
- Servers will not be compatible with Agents released later than the Server.
Recommending to upgrade
*************************************************************************************************
* You are using version 16.08.02 of Plumbr. We recommend upgrading to the latest version 12356. *
* Download the latest version of the agent here: https://app.plumbr.io/download/agent/16.09.20  *
*************************************************************************************************
When you see a banner like the one above, the Agent you are currently using is between 1 and 3 months behind the latest Agent available. As we add new features almost every month, you should consider upgrading, but there is nothing to worry about yet.
Strongly recommending to upgrade
******************************************************************************************************** * You are using version 16.08.02 of Plumbr, which will be supported only until 2017-01-01 * * Download the latest version (16.12.12) of the agent here: https://app.plumbr.io/download/agent/16.12.12 * ********************************************************************************************************
When facing the banner above, your current Agent is between 3 and 6 months older than the Server the Agent connects to. The Server will still support the connecting Agent, but you should start planning for the Agent version upgrade t
Deprecated Agent version
******************************************************************************************************** * You are using deprecated version 16.08.02 of the Plumbr agent. * * Download the latest version (17.04.02) of the agent here: https://app.plumbr.io/download/agent/17.04.02 * ********************************************************************************************************
When facing the message above, the Agent connecting to the Server is 6 to 12 months older than the Server it connects to. The version is already deprecated and will become unsupported once the 12-month limit is reached. You should plan the Agent version upgrade as soon as possible.
Unsupported Agent version
************************************************************************************************************** * You are using unsupported version 16.08.02 of the Plumbr Agent, which can no longer connect to Plumbr Server. * * Please upgrade to the latest version of the agent from: https://app.plumbr.io/settings/download-center * **************************************************************************************************************
When seeing the error banner above, the Agent can no longer connect to the Server, as the Agent version is older than the oldest Agent version supported by this particular Server. You need to upgrade to a new Agent version to keep benefitting from Plumbr.
Agent version newer than Server version
************************************************************************************* * You are using unsupported version 16.08.02 of the Plumbr Agent * * Plumbr Server accepts only agents released before the server * * Please refer to Download Center https://app.plumbr.io/settings/download-center * * to get supported version of Plumbr Agent * *************************************************************************************
When facing the error message above in your JVM logs, the Agent connecting to the Server is newer than the Server. As the Server accepts connections only from Agents it knew existed when the Server was released, this “agent from the future” is not allowed to connect.
If you see this message, you are using our On Premise offering, where you have installed the Server yourself. This means you have two options:
- Preferably, upgrade the Server, so that new Agents can apply all their new and shiny features in your deployment.
- If this is not possible, use an older Agent version, one that was not released later than the Server it connects to.
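The policy and banners in this section can be summarized in a short sketch. It is not the check the Server actually performs: it assumes that versions encode release dates as YY.MM.DD (as the sample banners suggest), compares at month granularity only, and picks its own handling for the exact boundaries between categories. The function name classify_agent and the category labels are hypothetical.

```shell
#!/bin/sh
# Illustrative only: classify an Agent version against a Server version.
# Assumes versions look like YY.MM.DD (e.g. 16.08.02) and encode release dates.
classify_agent() {
  agent=$1
  server=$2

  # Sortable numbers such as 160802, used to detect an Agent newer than the Server.
  agent_num=$(printf '%s' "$agent" | tr -d .)
  server_num=$(printf '%s' "$server" | tr -d .)

  # Extract year and month; strip one leading zero so "08" is not read as octal.
  agent_yy=${agent%%.*};   rest=${agent#*.};  agent_mm=${rest%%.*}
  server_yy=${server%%.*}; rest=${server#*.}; server_mm=${rest%%.*}
  agent_yy=${agent_yy#0};   agent_mm=${agent_mm#0}
  server_yy=${server_yy#0}; server_mm=${server_mm#0}

  # Age of the Agent relative to the Server, in whole calendar months.
  months=$(( (server_yy - agent_yy) * 12 + (server_mm - agent_mm) ))

  if [ "$agent_num" -gt "$server_num" ]; then
    echo "newer-than-server"
  elif [ "$months" -ge 12 ]; then
    echo "unsupported"
  elif [ "$months" -ge 6 ]; then
    echo "deprecated"
  elif [ "$months" -ge 3 ]; then
    echo "strongly-recommend-upgrade"
  elif [ "$months" -ge 1 ]; then
    echo "recommend-upgrade"
  else
    echo "current"
  fi
}
```

With the version pairs shown in the sample banners, `classify_agent 16.08.02 16.12.12` falls into the 3 to 6 month range and `classify_agent 16.08.02 17.04.02` into the 6 to 12 month range.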
Verifying the subscription
Plumbr is subscription-based software with a 14-day free trial available. When the subscription has expired, the data on your account is kept for 10 more days, but you can no longer monitor your applications with Plumbr. Once 10 days have passed since the subscription expired, the data on your account will be permanently deleted.
Expiration warning
*************************************************************************************** * Your subscription will expire on 2016-01-01. * * From 2016-01-01 Plumbr will not monitor your JVM(s) any more. * * To renew your subscription get in contact with Plumbr Sales at sales@plumbr.io. * ***************************************************************************************
When encountering such a warning in your log files, your subscription will expire soon. If you wish to benefit from Plumbr after the subscription period, you should start planning for the subscription extension.
Paid account expired
********************************************************************************************** * Your subscription expired on 2016-01-01 and Plumbr is not monitoring your JVM(s) any more. * * Your data will be available for 10 days, after which your account will be deleted. * * To renew your subscription go to https://app.plumbr.io/payment * **********************************************************************************************
If you encounter the message above, your subscription has expired. The data is still present on your account, but you can no longer monitor any JVMs. Once 10 days have passed since the subscription expired, the data will be permanently deleted.
To keep benefitting from Plumbr, extend your subscription.
Paid account deleted
****************************************************************************** * As you subscription was not renewed your Plumbr account has been deleted * * and you cannot monitor your JVM(s) with Plumbr any longer. * * To start monitoring your JVM(s), sign up and purchase Plumbr * * one year subscription: https://app.plumbr.io/payment * ******************************************************************************
Encountering the message above indicates that your subscription has expired and more than 10 days have passed since the expiration date. The data on your account has been deleted.
To start using Plumbr again, purchase a new subscription.
Trial account expired
*************************************************************************************** * Your free trial is now expired and Plumbr is not monitoring your JVM(s) any more. * * Your data will be available for 10 days, after which your account will be deleted. * * To activate your account go to https://app.plumbr.io/payment * ***************************************************************************************
The message above indicates that the free trial you used has expired. You can no longer monitor the JVMs with Plumbr. The data gathered during the trial is still available for you until 10 days have passed from the trial expiration. After this, the data on your account will be permanently deleted.
Trial account deleted
************************************************************************************* * As your free trial expired your Plumbr account has been deleted and you cannot * * monitor your JVM(s) with Plumbr any longer. To start monitoring your JVM(s), * * sign up and purchase Plumbr subscription: https://app.plumbr.io/payment * *************************************************************************************
The banner indicates that your free trial has expired and the data on your account has been deleted.
If the trial demonstrated the value of Plumbr to you, then the way to keep using Plumbr is to switch to a paid subscription.
Miscellaneous checks
Besides the categories above, the Plumbr Agent performs a number of other checks, which can also result in errors or warnings being printed to the JVM standard output.
Connecting to wrong Server
*************************************************************************************************** * The account does not exist at htts://my-plumbr-sever-installation:8080/. * * Check plumbr.properties to make sure you are connecting to the correct Plumbr Server instance. * * If indeed so, contact support@plumbr.io * ***************************************************************************************************
When facing the error above, the Agent is connecting to a Server using an accountId the Server is not aware of. This usually means you are connecting to an incorrect Plumbr Server. If this is the case, make sure the serverUrl in plumbr.properties points to the correct Server.
If this is not the case, contact support@plumbr.io and let us figure out the source of the problem.
Lambda support in early Java 8 releases
******************************************************************************************************* * There is a known issue with Java versions 1.8.0 - 1.8.0_31 where using Java agents * * together with code that uses dynamic invocation (such as lambdas or dynamic languages) * * may cause segmentation faults. If these are not used in your application, your JVM may be safe, * * but for production sites we do not recommend using Plumbr with Java 8 versions older than 1.8.0_40. * * To make sure this problem will not occur, either: * * a) Upgrade your Java version to 1.8.0_40 or newer * * b) If upgrading Java version is not possible, turn off JIT compilation for java.lang.invoke * * package by specifying -XX:CompileCommand=exclude,java/lang/invoke/ in your JVM startup script. * *******************************************************************************************************
When facing the warning above, you are running an early Java 8 build. These builds are known to contain bugs that affect your JVM when you use lambdas or dynamic languages together with any Java Agents attached to the JVM.
The application might work fine, but to make sure you will not run into any issues, please consider either:
- Upgrading the Java version to 1.8.0_40 or newer
- Turning off JIT compilation, as specified in the error message.
Native agent loading failure
***************************************************************** * Native agent could not be loaded from * * /users/me/plumbr * * This may be caused by missing read or execute permissions for * * plumbr home directory or one of its subdirectories. * * * * Check out https://plumbr.io/support/agent-configuration * * for more information or contact support@plumbr.io * *****************************************************************
When facing the error above, the native agents located in the lib/ folder next to the Agent’s plumbr.jar file are not readable or executable by the user launching the JVM that Plumbr is attached to.
To solve the problem, make sure the user launching the JVM that the Plumbr Agent is attached to has read and execute permissions for the lib/ folder and its subdirectories.
In all honesty, this is one of the cases where we do not fully understand how the situation can arise. So if you are facing it, we would really appreciate it if you contacted support@plumbr.io so we could understand how on earth this permission issue can even happen.
Proxy credentials missing
********************************************************************************** * The Proxy server at your-proxy-server.ip:3039 is requesting a username and a password. * * Please add them to plumbr.properties file. You can find the instructions here: * * https://plumbr.io/support/manual#network-configuration * **********************************************************************************
When seeing the banner above in your JVM standard output, you are trying to connect from the Agent to the Server through a proxy server. The proxy server requires authentication, but the configuration you provided in plumbr.properties does not contain a username and password.
To solve the issue, provide the correct username and password in the Plumbr configuration to access the proxy.
Multiple JVMs using the same jvmId
************************************************************************************************************************* * The JVM ID “myjvmid" is already in use. This happens when multiple JVMs are connecting * * to Plumbr Server using the same jvmId configuration parameter specified in plumbr.properties file. * * In order to solve the problem, download new Plumbr agent from here: https://app.plumbr.io/settings/download-center * * and make sure the agent location in the new JVM refers to a different Plumbr installation in file system. * *************************************************************************************************************************
When facing the error message above, the Plumbr Agent was not started, because another JVM is already connected to the Server using the same jvmId as the one specified for the rejected JVM.
This can happen when you have copied the Plumbr installation used by one JVM and are using it for a second JVM. The jvmId specified in the plumbr.properties file must be unique, so to solve the issue, make sure every JVM you want to monitor with Plumbr has a unique jvmId specified in its configuration (either in plumbr.properties or passed as a -D parameter).
Multiple JVMs accessing the same Plumbr installation
************************************************************************** * Working directory is locked. This might happen when you launch several * * applications from the same plumbrHome at the same time. * * In order to solve the problem, download new Plumbr agent from here: * * https://app.plumbr.io/settings/download-center * * and make sure the agent location in the new JVM refers to a different * * Plumbr installation in file system * **************************************************************************
When you encounter the error above, you are trying to launch two JVMs that both load the Plumbr Agent from the same location in the filesystem. Note that each JVM monitored by Plumbr must use its own Plumbr installation.
To overcome the issue, create a separate Plumbr Agent installation for each JVM monitored and load the javaagent from different locations for each JVM monitored.
Troubleshooting startup failures
When you have followed the installation instructions and attached Plumbr JVM Agent to the JVM as specified, you should see a new JVM appearing at https://app.plumbr.io/jvms. Additionally, the JVM standard output should contain an information banner similar to the following:
************************************************************ * Plumbr (15.12.14) is attached. * * * * Plumbr agent is connected to the Plumbr Server. * * Open up https://app.plumbr.io to follow its progress. * ************************************************************
If your experience does not match the success symptoms above, please follow the five steps below to find out the cause:
- First stop is to check whether the JVM you attached Plumbr to has actually started. One way of doing this is to list all running Java processes by running jps -lvm on the command line. The output of the command is the list of the JVMs currently running on the machine. Make sure the JVM in question is among them. If not, the JVM failed to start and you should turn to the JVM/application logs to find the cause.
- If the JVM is up and running, the next step is to make sure the Plumbr Agent you specified in the startup scripts was picked up. Again, the way to do it is to check the output of the jps -lvm command. The output would look similar to the following, where it is visible that the process with ID 6349 has picked up the javaagent from /home/me/plumbr/plumbr.jar.
my-precious:~ me$ jps -lvm
6359 sun.tools.jps.Jps -lvm -Dapplication.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home -Xms8m
6349 MonitorContention -javaagent:/home/me/plumbr/plumbr.jar
my-precious:~ me$
- When the output above does not contain the -javaagent section for the JVM in question, the parameter specified in the startup script was not picked up. In that case, debug the startup scripts to see where the added or modified parameters are lost.
- If the -javaagent was present in the process list, the next stop is checking the application logs. Search for messages containing “Plumbr” to see whether Plumbr has left any traces exposing the problem. In many cases, the Plumbr log records you find contain both the cause of what went wrong and the way to correct it. The full list of the error messages printed by Plumbr Agent along with the solution guidelines is available here.
- If there is nothing in the application logs, the next source of information is the Plumbr Agent logs, located in the logs/ folder next to the Agent’s plumbr.jar in the filesystem. The folder contains the plumbr.log and plumbr-debug.log files. Search these files for error messages. Again, in most cases the error messages will guide you to the problem along with a reference to the solution.
If following the steps above does not reveal the source of the problem, feel free to contact support@plumbr.io and we will figure the solution out together.
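The checks above can be sketched as a few small shell helpers. This is only an illustration: the function names are hypothetical, and searching the Agent logs for the word "error" is an assumption about the log format rather than something the manual specifies.

```shell
#!/bin/sh
# Steps 1-3: read `jps -lvm` output on stdin and keep only the JVMs that
# were started with a -javaagent argument pointing at plumbr.jar.
jvms_with_plumbr() {
  grep -e '-javaagent:[^ ]*plumbr\.jar'
}

# Step 4: print lines mentioning Plumbr (case-insensitive) from the given
# application log files, or from stdin when no file is given.
plumbr_lines() {
  grep -i 'plumbr' "$@"
}

# Step 5: print lines containing "error" (case-insensitive) from the given
# Agent log files. The keyword is an assumption about the log format.
agent_errors() {
  grep -i 'error' "$@"
}
```

Typical usage would be `jps -lvm | jvms_with_plumbr`, followed by `plumbr_lines` on your application log and `agent_errors /home/me/plumbr/logs/plumbr.log /home/me/plumbr/logs/plumbr-debug.log`, where the paths follow the example installation directory used above.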
Universal Agent Configuration
Configuration
Universal Agent configuration is stored in the agent.properties file in the /opt/plumbr-agent/conf directory. This file contains both the configuration for the Universal Agent service (daemon) and the cluster-specific configuration.
All other files of the agent are stored in /opt/plumbr-agent/[service_version] directories, where each service_version is also installed as a separate service in the system. All paths mentioned here are relative to the service-specific directory.
Service configuration
All configuration properties that are specific to the daemon, and thus do not affect the behavior of the agent within monitored processes, have a daemon. prefix. Required properties are:
- daemon.serverUrl – specifies the address of the Plumbr Server that the agent sends data to.
- daemon.serverId – specifies the server name which will be shown under Servers list. This is also shown when viewing individual interactions or API calls. By default, installer sets it to the server hostname.
- daemon.apiKey – authentication key. Can be viewed under Account Settings in Plumbr Server.
Optional configuration properties are:
- daemon.diskBufferSize – specifies the maximum size of the file that is used to store data that has not yet been sent to the server. This file is only used if in-memory buffers are full, which can happen either due to the server being unreachable or the rate of new data being generated exceeding the rate at which it can be sent.
Cluster configuration
A cluster is defined by a property named monitor.cluster. One instance of this line is present for each defined cluster, so there may be several of them. The value of this property is in the format [cluster_name]:[process_name_pattern], where the process name pattern is matched against the full path of the executable being monitored: ** matches any character and * matches any number of characters except a directory separator /. For example, the default cluster definition line for PHP-FPM is monitor.cluster=php-fpm:**php*. The process name patterns set up by the Plumbr installer for each of the supported technologies are:
- **php* – for PHP-FPM
- **apache2* and **httpd* – for Apache 2, also monitors PHP if it is running via mod_php within Apache
- **nginx* – for nginx
- **python* – for CPython or PyPy when not running within uWSGI
- **pypy* – for PyPy when not running within uWSGI
- **uwsgi* – for uWSGI running either with CPython or PyPy
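To get a feel for which executables a pattern would select, the matching rules can be approximated with a regular expression. The sketch below is not the agent's implementation. It assumes that ** spans any run of characters including /, that * stops at /, and that the pattern must match the whole path. The function names are hypothetical, and only dots are escaped, so patterns containing other regex metacharacters are not handled.

```shell
#!/bin/sh
# Translate a process name pattern into an extended regular expression.
# Assumption: `**` spans any characters including `/`, `*` stops at `/`.
pattern_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/\./\\./g' \
    -e 's/\*\*/<ANY>/g' \
    -e 's|\*|[^/]*|g' \
    -e 's/<ANY>/.*/g'
}

# Succeed if the executable path ($2) matches the pattern ($1) in full.
pattern_matches() {
  printf '%s\n' "$2" | grep -Eq "^$(pattern_to_regex "$1")\$"
}
```

Under these assumptions, `pattern_matches '**php*' /usr/sbin/php-fpm7.4` succeeds, while a path such as /usr/lib/php/modules/foo.so does not match, because the text after php contains a directory separator.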
Other properties are in the format monitor.[property_name] to apply to all clusters, or monitor.cluster.[property_name] to apply only to the last cluster defined before that line.
Application version configuration
Application version can be provided in several ways:
- Configuration file property monitor.appVersion
- Environment variable PLUMBR_APP_VERSION
Which option is better depends on how the monitored processes are deployed or launched. If the monitored service is managed by systemd or sysvinit, and changing the launch scripts or configuration of the service is undesired, then the configuration file property should be used. Otherwise it should generally be easier to use the environment variable.
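When the environment variable route is chosen, a small launch wrapper keeps the version next to the command that starts the service. This is a sketch under the assumption that the agent reads PLUMBR_APP_VERSION from the environment of the monitored process. The function name and the version value are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with PLUMBR_APP_VERSION set for that
# process only, leaving the calling shell's environment untouched.
run_with_app_version() {
  version=$1
  shift
  env PLUMBR_APP_VERSION="$version" "$@"
}
```

For example, `run_with_app_version 1.4.2 ./start-service.sh` would launch a (hypothetical) start script with the version set. The configuration file alternative is a single line such as monitor.appVersion=1.4.2 in agent.properties.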
Logging configuration
Log file names are in the format [type].[date].[fragment].log. A new fragment is created when the previous one reaches the maximum size. The size of a fragment can be configured, but the maximum number of fragments or the maximum overall size cannot.
Agent keeps four types of logs:
- logs/daemon contains the logs of the daemon that communicates with the Plumbr Server directly.
- data/monitor/[process_id] contains the log file fragment of a specific process for as long as that process is still active.
- logs/monitors/ contains logs mixed together from all monitored processes, but only the log lines added after a process has successfully established a connection with the daemon, so it might not contain all logs from monitored processes.
- logs/monitor-archive contains .tar files to which log files from data/monitor/[process_id] are appended when the process exits or when the specific log file is completed (a new fragment is started).
Logging configuration is specified either for the daemon (daemon. prefix), for monitored processes (monitor. prefix) or for the monitored processes of a specific cluster (monitor.cluster. prefix). The available logging configuration properties are:
- logLevel – specifies the log level for the daemon process. Default value is DEBUG. All available values are ERROR, WARN, INFO, DEBUG and TRACE (do not use TRACE in production).
- logFileMaxSize – specifies the maximum size of a single log file (fragment). If this is reached, then a new fragment is created.
Retention can be configured with the option daemon.logRetentionDays, which specifies the number of days to keep log files for. Log files older than that are automatically deleted.
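Putting the service, cluster and logging properties together, a configuration file might look like the one printed by the sketch below. Every value is a placeholder: the server URL, server name, API key, the second cluster named web and the chosen log levels are assumptions made for illustration, and the function name is hypothetical.

```shell
#!/bin/sh
# Print an example agent.properties. Every value is a placeholder.
# monitor.logLevel applies to all clusters, while the monitor.cluster.logLevel
# line applies only to php-fpm, the last cluster defined before that line.
example_agent_properties() {
  cat <<'EOF'
daemon.serverUrl=https://app.plumbr.io
daemon.serverId=web-01
daemon.apiKey=REPLACE_WITH_YOUR_API_KEY
daemon.logLevel=INFO
daemon.logRetentionDays=14
monitor.logLevel=WARN
monitor.cluster=php-fpm:**php*
monitor.cluster.logLevel=DEBUG
monitor.cluster=web:**nginx*
EOF
}
```

Running `example_agent_properties > agent.properties.example` writes the sample to a file for comparison with your own /opt/plumbr-agent/conf/agent.properties. In this sample the web cluster is left with the general WARN level, on the assumption that a cluster-specific value takes precedence over the general one only for its own cluster.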
Upgrading Universal Agent
Universal Agent can only be upgraded manually. This is done using the exact same steps as for the initial installation:
- Download the agent zip file from the download center and unzip to a convenient location on your server.
- Run the installer in your terminal:
sudo ./PlumbrAgentInstaller
- The installer will confirm your server and cluster names, but the defaults are taken from your existing configuration, therefore it is not necessary to specify anything explicitly.
- All new processes launched after this will start using the new agent version. To make sure they immediately use the new version, you can restart your monitored services.
Usage in Docker
To use the agent in a Docker container without having the agent installed on the host, the following steps must be taken. Full examples follow the steps.
Step 1 – installation
Installation can be performed either during container launch (installer directory mounted to the container) or during image building (installer copied during image building and then run). The following command must be run either while building the image or as the first command when running the container, with your desired cluster name: /path/to/plumbr-agent-installer/PlumbrAgentInstaller --unpack-only --cluster-id=example-cluster
Step 2 – launch agent process
The agent process must be launched in the background so that it does not block the next command (the original entry point). This requires changing the container entry point, which can be done either by putting all the commands (optional installation, agent launch, original entry point with environment variables) in a script file and mounting that, or by including all of it in the container start command.
If your image is using a musl-based Docker base image, then the agent process must be launched differently, by prepending /lib/ld-musl-x86_64.so.1 -- to the command.
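A launch script that has to work on both glibc-based and musl-based images could choose the command at run time. The sketch below is one way to do that, with two assumptions: the presence of an executable musl loader at the path given above is taken as the sign of a musl-based image, and the separator after the loader is the double hyphen (--) that the musl loader accepts before the program path. The function name is hypothetical.

```shell
#!/bin/sh
# Print the command line for starting the agent daemon. If an executable musl
# loader exists at the given path (default: the path named in this manual),
# it is prepended to the daemon command.
plumbrd_command() {
  loader=${1:-/lib/ld-musl-x86_64.so.1}
  daemon=/opt/plumbr-agent/plumbrd
  if [ -x "$loader" ]; then
    printf '%s -- %s\n' "$loader" "$daemon"
  else
    printf '%s\n' "$daemon"
  fi
}
```

In a launch script, the daemon could then be started in the background with `$(plumbrd_command) &` before invoking the original entry point.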
Step 3 – attach
To make the agent libraries get attached to processes in the container, the LD_PRELOAD=/opt/plumbr-agent/libplumbrmonitor.so environment variable must be set. Since the original entry point must be invoked manually anyway due to the entry point change, the easiest way to set this is to include it in that command.
Full example – install in container, mount launch script
Mount agent installer: /host/path/to/plumbr-agent-installer to /opt/plumbr-agent-installer as read-only
Mount the following as /launch-with-plumbr.sh:
#!/bin/sh
/opt/plumbr-agent-installer/PlumbrAgentInstaller --unpack-only --cluster-id=[example-cluster]
/opt/plumbr-agent/plumbrd &
LD_PRELOAD=/opt/plumbr-agent/libplumbrmonitor.so [original-entry-point] "$@"
Override entry point with /launch-with-plumbr.sh.
Full example – install in container, launch command
Mount agent installer: /host/path/to/plumbr-agent-installer to /opt/plumbr-agent-installer as read-only
Override entry point with /bin/sh, set command to -c "/opt/plumbr-agent-installer/PlumbrAgentInstaller --unpack-only --cluster-id=[example-cluster] && (/opt/plumbr-agent/plumbrd & LD_PRELOAD=/opt/plumbr-agent/libplumbrmonitor.so [original-entry-point])".
Full example – install in image
Create launch script launch-with-plumbr.sh:
#!/bin/sh
/opt/plumbr-agent/plumbrd &
LD_PRELOAD=/opt/plumbr-agent/libplumbrmonitor.so [original-entry-point] "$@"
Dockerfile change:
COPY /path/to/launch-script.sh /launch-with-plumbr.sh
COPY /path/to/plumbr-agent-installer /opt/plumbr-agent-installer
RUN /opt/plumbr-agent-installer/PlumbrAgentInstaller --unpack-only --cluster-id=[example-cluster]
ENTRYPOINT ["/launch-with-plumbr.sh"]
Server Configuration
Upgrading Plumbr Server
Plumbr Server update is based on building new Docker images and mounting the data to the newly built images. No data stored in the Docker images is preserved, so you cannot expect any manual configuration changes made to existing Docker containers to survive the upgrade.
To upgrade Plumbr Server you need to go through the following steps.
- Download a new version of Plumbr Server distribution from Download Center.
- Extract downloaded archive on top of the existing plumbr-server folder replacing all existing files.
- Restart the Docker Compose project by running “./launch.sh” from the plumbr-server folder. This will download all updated Docker images and then recreate all affected containers.
- After the process completes, the new version of Plumbr Server is available at the same URL as previously.
As a next step, the Plumbr Agents connecting to the Server need to be upgraded. You can do this independently of the Server update, but for consistency you need to eventually also upgrade the Agents.
On-Premise OOM analysis
Running analysis when an OutOfMemoryError occurs in an application is computationally intensive and requires an amount of RAM proportional to the number of objects in the JVM. Therefore, when running your own Plumbr Server on-premise, additional steps must be taken to find root causes for an OutOfMemoryError.
Semi-automatic analysis
The OutOfMemoryError meta-information snapshot will always be sent automatically by the Plumbr Agent to the Plumbr Server after such an error occurs. A corresponding Root Cause screen will appear in Plumbr Server, prompting you to do the following:
- Select or create a machine with the amount of RAM specified on the page
- Download to that machine a .jar file that performs the analysis
- Run it, supplying the required amount of heap to the JVM by specifying the -Xmx argument
The program will automatically download the meta-information from Plumbr Server, run the analysis and upload a complete report back to Plumbr Server. This assumes two things:
- You have specified the plumbr.server.url property in the server properties or set it in the web interface
- The machine that runs the analysis has access to the machine where Plumbr Server is running
If the first condition is not met, you can still run the analysis by supplying a property to the jar file: -Dportal.url=https://address-of-your-plumbr-server-installation
Running analysis from behind a firewall
In case access to the Plumbr Server is restricted by a firewall, some additional manual actions are required:
- Click on “Detailed information” on the Root Cause page to follow the instructions
- Copy the meta-information files named oom_dump_v4.tbz2 and oom_dump_info.txt to the target machine
- Supply the path to the copied .tbz2 file to the jar running command
- Supply the path to where the report should be saved, e.g. report.bin
- Run the analysis, e.g. java -Xmx1g -jar analyze-oom ~/oom-analysis/1/oom_dump.tar.bz2 report.bin
- Upload the report.bin file to the corresponding form on the Root Cause page
Data retention
By default, meta-information files will be immediately deleted upon successful analysis. Data files that date back more than 30 days will also be deleted, even if no analysis was performed on them. At any time you may manually delete the dumps from the ${plumbr.server.home}/data/dumps folder.
Browser Agent API
Introduction
Plumbr Browser Agent API enables control over:
- Starting transactions (see this chapter for more details)
- Naming of Applications (see description of an application for more details)
- Naming of Services (see description of a service for more details)
- Naming of Users (see description of a user for more details)
The following guide describes how to install the Browser API and how to use it.
Installation
Plumbr Browser API is installed along with the Plumbr Browser Agent, so no additional installation is needed. See Browser Agent Installation guide for more details.
Setting transaction attributes
Plumbr Browser API allows setting the following attributes of a transaction:
- Application
- Service
- User
These attributes can be set in two alternative ways: programmatically or via configuration parameters of the browser agent script. Which way to use depends on the application type. Classic web applications, which generate HTML on the server side and have all the knowledge there, may find it easier to generate the required configuration values on the server side, eliminating the need for additional JavaScript code. Single page web applications may find it more suitable to set up these attributes via direct JavaScript calls.
For information on configuration parameters, please see the section on Browser Agent Configuration.
Plumbr Browser API is exposed on window.PLUMBR, so it can be accessed globally as PLUMBR. It is recommended to wrap API calls in try…catch blocks to avoid situations where user-side blocking of the agent (such as privacy-targeted browser addons) would crash your application.
document.getElementById('add-to-cart').addEventListener('click', function() {
  try {
    PLUMBR.setServiceName('add product to cart');
  } catch(err) {}
  // ajax call to add product to cart...
});
Available configuration options and API calls will be described below.
Application
In Configuration | API Call |
---|---|
{ "appName": "Marketing Site" } | PLUMBR.setAppName('Marketing Site') |
Set the application name for the page. This is persistent across all transactions made on the page (such as soft navigations, ajax interactions), so setting it in the configuration means it doesn’t need to be called again in API.
Service
In Configuration | API Call |
---|---|
{ "serviceName": "Product details" } | PLUMBR.setServiceName('Product details') |
Set the service for the current transaction. The service name set in the configuration always applies to the transaction that represents loading the page, while an API call applies to the transaction that is active at the time of the call. For example, in a SPA, when the API method is called after the user clicks on a link, it defines the service of the transaction started by that click.
User
In Configuration | API Call |
---|---|
{ "userId": "person@example.com" } | PLUMBR.setUserId('person@example.com') |
Set the user of the page. This is persistent across all transactions made on the page (such as soft navigations, ajax interactions), so setting it in the configuration means it doesn’t need to be called again in API.
Transaction Management
In most cases the browser agent is able to automatically detect all of the the user interactions. However there might be cases where you’d want to start transactions manually.
PLUMBR.startTransaction(serviceName)
Starts a new transaction under serviceName
service. As there can only be one ongoing transaction at a time, there is no API call for ending transaction. Starting a new transaction will always end the ongoing transaction
Example usage: Starting a transaction when document receives an external-force event
document.addEventListener('external-force', function (event) {
  try {
    PLUMBR.startTransaction('External force');
  } catch(err) {}
  messageServerAboutExternalForce();
});
Example usage: Starting a transaction when the user clicks the sign-up button
document.getElementById('sign-up').addEventListener('click', function (event) {
  try {
    PLUMBR.startTransaction('User signs up');
    // Because browser agent has already detected the click it will be translated to
    // PLUMBR.setServiceName('User signs up')
  } catch(err) {}
  registerUser({ /* ... */ });
});
Regenerate User ID
In Configuration | API Call |
---|---|
{ "regenerateUser": true } | PLUMBR.regenerateUser() |
The Plumbr browser agent uses a randomly generated tracking ID for each user, which can then be assigned an identity via the configuration or the API. However, only the first known identity is used for the user. If you wish to change the user identity later (for example, when the identity is based on “user + role” and the user changes role), a new user tracking ID must be generated.
Regenerating via configuration
Ideal for: Role changes that cause a hard navigation (e.g. a form submit or a click on a link). The transaction containing the page load will be linked to the new user.
Example:
<script
src="https://browser.plumbr.io/pa.js"
crossorigin="anonymous"
data-plumbr='{
...
"regenerateUser": true,
"userId": "Admin as John Doe"
}'>
</script>
Regenerating via API
Ideal for: Identity changes that happen through ajax requests. The transaction that is started after calling the method will be linked to the new user.
Example:
// In callback of some ajax method
function (newUserIdentity) {
try {
// Generate new user tracking ID
PLUMBR.regenerateUser()
// Set new user identity
PLUMBR.setUserId(newUserIdentity)
// Start new transaction that is linked to this user
PLUMBR.startTransaction()
} catch(err) {}
}
Javascript errors
There are some cases where Plumbr is unable to capture JavaScript errors automatically, e.g. when frontend frameworks like Vue or React capture the errors themselves and do not propagate them further.
In these cases it is possible to report the error via the API. The errors are added to the current transaction.
Example:
try {
if (window.PLUMBR) {
window.PLUMBR.sendError(err)
}
} catch (err) {}
Where err is either a JavaScript error object or a string with an error message.
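As an illustration of where such a call could live, the sketch below wraps `sendError` in a reusable function. The name `reportToPlumbr` and the `root` parameter are hypothetical; the framework hooks in the trailing comments (`app.config.errorHandler` in Vue 3, `componentDidCatch` in React class components) are shown only as comments because they need the framework to be loaded.

```javascript
// Hypothetical helper: forwards an error caught by a framework to the agent.
function reportToPlumbr(err, root) {
  const scope = root || globalThis; // window in a browser
  try {
    if (scope.PLUMBR && typeof scope.PLUMBR.sendError === 'function') {
      // sendError accepts an Error object or a string message.
      const payload = err instanceof Error ? err : String(err);
      scope.PLUMBR.sendError(payload);
      return true;
    }
  } catch (ignored) {
    // Reporting must never throw back into the application.
  }
  return false;
}

// Possible wiring:
//   Vue 3:  app.config.errorHandler = (err) => reportToPlumbr(err);
//   React:  componentDidCatch(error) { reportToPlumbr(error); }
```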
Toggling cookies
Plumbr uses several cookies. By default user and session tracking cookies are enabled. If your end user chooses not to be tracked by cookies, you can instruct Plumbr either via configuration or API to disable usage of cookies.
In Configuration | API Call |
---|---|
{ "tracking": {"user": false, "session": false} } | PLUMBR.setCookiePermissions({"user": false, "session": false}) |
If cookie use is disabled, Plumbr will set user and session values to `00000000-0000-0000-0000-000000000000`. All interactions by such users will be aggregated under one user in the Plumbr database.
When cookie usage is disabled, the browser agent will also remove all existing Plumbr cookies.
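A consent banner could pass the user's choice on to the agent roughly as sketched here. The function names and the single yes/no consent flag are assumptions for the sake of the example; only `setCookiePermissions` and the shape of its argument are taken from the table above.

```javascript
// Hypothetical mapping from one consent flag to the permissions object
// accepted by PLUMBR.setCookiePermissions.
function cookiePermissionsFor(consentGiven) {
  const allowed = consentGiven === true;
  return { user: allowed, session: allowed };
}

// Applies the permissions if the agent is present; returns what was applied.
function applyCookieConsent(consentGiven, root) {
  const scope = root || globalThis; // window in a browser
  const permissions = cookiePermissionsFor(consentGiven);
  try {
    if (scope.PLUMBR) {
      scope.PLUMBR.setCookiePermissions(permissions);
    }
  } catch (err) {
    // Never let the monitoring call break the consent flow.
  }
  return permissions;
}
```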
Java Agent API
Introduction
Plumbr Java Agent API enables programmatic control over:
- Service naming (see description of a service for more details)
- Identification of users (see description of users for more details)
- Transaction boundary definition (see definition of a transaction for more details)
The following guide describes how to install the API dependency and how to use it.
Installation
To start using the Plumbr Agent API, agent-api.jar must be added as a dependency to your project. When running the application without the Plumbr Agent attached, all calls to the library will be silently ignored without any performance impact. When the Plumbr Agent sees the attached Agent API library, it will perform the requested integration calls.
The Agent API is published on Bintray and Maven Central.
Javadocs are published with the API, and are also available here.
To add the dependency, copy and paste the suitable snippet for your build system from the respective Bintray or Maven Central page.
To use the library in the code, the following import must be added to your source file:
import eu.plumbr.api.Plumbr;
Terminology
A span represents some time that the application has spent executing in one thread. Spans may be started and finished. Once a span starts, it becomes associated with the current processing thread, and all root causes detected within that thread are associated with the active span.
A span may contain any number of child spans. Child spans may be associated with threads either in the same JVM or in a different JVM that is also monitored by the Plumbr Java Agent.
A span may have metadata associated with it, which is shown in the single transaction view of the unhealthy transaction that the span belongs to.
A Transaction is a tree of spans, which consists of a root span and all of its children. The transaction has some additional properties that describe that tree of spans. These properties include:
- a transaction ID (a UUID, generated automatically)
- a service name (taken from the root span)
- an identifier of a user (taken from the root span)
Creating new transactions
When to use: Plumbr Agent fails to automatically discover transactions in a given application.
How: In this case, the transaction should be created manually by first calling eu.plumbr.api.Plumbr.newSpan(), then configuring the service name and application of the span, and then calling eu.plumbr.api.Span.start() to start it and eu.plumbr.api.Span.finish() to end it.
Example:
Plumbr
    .newSpan()
    .setServiceName("My Service")
    .setUserId("user@domain.com")
    .start();
try {
    // do work
} catch (Exception e) {
    Plumbr.getCurrentSpan().fail(e);
} finally {
    Plumbr.getCurrentSpan().finish();
}
NB! The service name is mandatory for manually created root spans. If it is not set, the transaction will be discarded.
Setting transaction attributes
When to use: Plumbr Agent is able to detect transactions, but fails to assign meaningful service name or user ID to them.
How: In this case, eu.plumbr.api.Plumbr.getCurrentSpan() should be called to get a reference to the automatically created span, and the properties of that span should then be set with the corresponding methods in eu.plumbr.api.Span:
- setServiceName(String serviceName)
- setUserId(String userId)
The getCurrentSpan() method is null-safe: it never returns null. If there is no current span in the current thread, an instance of eu.plumbr.api.null.NullSpan is returned instead. It is, in turn, a null-safe implementation of Span. So, even if an agent is not attached, you can still call all the setters on the object returned by getCurrentSpan() without any additional null checks. In most cases this is sufficient.
If you really need to check whether there is a current Plumbr span within the current thread (for example, if the code you want to monitor can be called both from within a Plumbr transaction and outside of one), the method Span.isNull() returns true if the returned span is a null span and false if there is a current span.
Example:
Plumbr.getCurrentSpan().setUserId("my precious user");
Plumbr.getCurrentSpan().setServiceName("my precious service");
Create failed transaction with a custom exception
When to use: Plumbr is able to detect transactions, but is unable to automatically detect if they fail or associate the correct exception with the failure.
How: In this case, eu.plumbr.api.Plumbr.getCurrentSpan() should be called to get a reference to the automatically created span, and then eu.plumbr.api.Span.fail(Throwable) should be called to mark the span as failed and to optionally associate a specific exception as the root cause of the failure.
Example:
try {
    // do something that throws wrapped exception
} catch (Exception e) {
    // Report the wrapped (underlying) exception as the root cause
    Plumbr.getCurrentSpan().fail(e.getCause());
}
Join remote spans to existing transaction
When to use: A request made from a transaction causes a new transaction in a remote application, and you want to link it as a child span into the first transaction.
How: In this case, before calling the remote service, the caller should create a child span in the current span by calling eu.plumbr.api.Span.createChildSpan() and then serialize it using eu.plumbr.api.SpanSerializer. The serialized span can then be included in the request to the other application (which should also be monitored by the Plumbr agent). There it is deserialized with eu.plumbr.api.SpanSerializer and must be started and finished manually using calls to eu.plumbr.api.Span.start() and eu.plumbr.api.Span.finish() respectively. After the remote call finishes, the calling side must acknowledge that by calling eu.plumbr.api.Span.finishChildSpan(childSpan). See full examples below.
Listing 1: Managing a child span in the parent process:
Span childSpan = Plumbr.getCurrentSpan().createChildSpan();
String serializedChildSpan = SpanSerializer.toBase64(childSpan);
// Transfer serializedChildSpan to another machine.
// See Listing 2 about what to do there.
try {
    try {
        // perform remote call
    } finally {
        Plumbr.getCurrentSpan().finishChildSpan(childSpan);
    }
} catch (Exception e) {
    // If this failed remote call should fail the transaction:
    Plumbr.getCurrentSpan().fail(e);
}
Listing 2: Working with a child span on remote JVM:
String serializedChildSpan = …; // obtain a serialized child span
Span span = SpanSerializer.fromBase64(serializedChildSpan);
span.start();
try {
    …
} catch (Exception e) {
    span.fail(e);
} finally {
    span.finish();
}
Triggering Alerts
General Approach
Through the Plumbr Server API, it is possible to expose the insights captured by Plumbr to any system that can make an HTTP call. One of the more common use cases is sending out alerts to your on-call team so they can immediately respond to a degraded service level. Let us go over some of the common use cases that you might face.
Example 1:
Suppose that there is an e-shop application monitored by Plumbr at shop.example.com. What’s the most crucial metric for this application that can directly show if the business is going well? There may be many answers to that, depending on the business model, but “is anything being sold” would probably be close to the top of the list.
Since the application is monitored by Plumbr, each click on the “CHECK OUT” button on the cart is tracked, and the outcome is recorded. In the user interface, it could appear like this:
Looks like we have hundreds of users successfully checking out their cart. This means that revenue is being generated, and the e-shop can keep going. However, we’d like to make sure that these deals keep happening 24/7. So let us use the Plumbr API by passing in the serviceId and applicationName seen on the screenshot above.
$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=serviceId%3D1234567890abcdef,applicationName%3Dshop.example.com&last=4h"
[
{
"failed": 1,
"onlySlow": 0,
"success": 249,
"total": 261,
"verySlow": 11
}
]
These values can then be compared against some thresholds or other triggers. For instance, if there are zero sales during the last 4 hours, then something is probably broken (or it’s January the 1st). As we’ll see a bit later, it is very simple to codify such rules and send out alerts when needed.
Example 2:
Another important metric to track the well-being of the e-business would be how the users are experiencing the e-shop. If they are forced to wait for the spinning wheels, or, worse, if they are facing errors while flowing through the shop, then the long-term perspectives of the application are gloomy. In such a competitive market, people can just find a different e-shop that works for them.
With Plumbr, we can directly track the status of all the interactions in the e-shop:
$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=applicationName%3Dshop.example.com&last=4h"
[
{
"total" : 609,
"failed" : 3,
"success" : 586,
"verySlow" : 20,
"onlySlow" : 0
}
]
Dividing “success” by “total” gives about 96%, which means that about 4% of the customers have received a sub-par digital user experience. If that number goes up, it is definitely a good cause for an alert.
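The arithmetic behind that figure can be reproduced with a few lines. This is only a worked example on the response shown above; the helper name `pct` is made up for the illustration.

```javascript
// Summary object copied from the response above.
const summary = { total: 609, failed: 3, success: 586, verySlow: 20, onlySlow: 0 };

// Share of `count` in `total`, as a percentage rounded to one decimal place.
function pct(count, total) {
  return total === 0 ? 0 : Math.round((count / total) * 1000) / 10;
}

const successPct = pct(summary.success, summary.total);
// Everyone who was not fully successful (failed or slow) counts as sub-par.
const subParPct = pct(summary.total - summary.success, summary.total);

console.log(successPct, subParPct); // prints: 96.2 3.8
```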
Example 3:
For a slightly more complex example, you could use Plumbr to track the longer-term behaviour of your application. For instance, in some cases it may be a good idea to track spikes in error rates for a particular API call. A straightforward (albeit naive) approach to this would be using moving average crossovers. To do that using the Plumbr Server API, you would need to make two calls for different time windows:
$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId=examplequicksearch1234567890&last=24h"
[
{
"total" : 2997918,
"failed" : 20361,
"success" : 2957453,
"verySlow" : 0,
"onlySlow" : 104
}
]
$ curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId=examplequicksearch1234567890&last=1h"
[
{
"total" : 125001,
"failed" : 19117,
"success" : 105884,
"verySlow" : 0,
"onlySlow" : 0
}
]
From here, we can see that the error rate for 24 hours is under 1%, which may be within the SLO and perhaps not a reason for triggering an alert just yet. However, the majority of these errors occurred in the last hour, with the error rate spiking to over 15%. This clearly indicates an issue. If something is not done quickly, the SLOs will be violated in no time.
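To make the comparison concrete, the two error rates can be computed from the responses above as follows. The `isSpiking` rule and its factor of 2 are assumptions chosen for illustration, not a recommendation from Plumbr.

```javascript
// "total" and "failed" copied from the two responses above.
const last24h = { total: 2997918, failed: 20361 };
const last1h = { total: 125001, failed: 19117 };

// Error rate of a summary as a percentage.
function errorRatePct(summary) {
  return summary.total === 0 ? 0 : (summary.failed / summary.total) * 100;
}

// Hypothetical crossover rule: the short window is "spiking" when its error
// rate exceeds the long window's rate by the given factor.
function isSpiking(shortWindow, longWindow, factor) {
  return errorRatePct(shortWindow) > errorRatePct(longWindow) * factor;
}

console.log(errorRatePct(last24h).toFixed(2)); // prints: 0.68
console.log(errorRatePct(last1h).toFixed(2));  // prints: 15.29
console.log(isSpiking(last1h, last24h, 2));    // prints: true
```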
The next step would be to set up regular monitoring of these values and to send out alerts based on them. This will come in the subsequent sections.
Triggering alerts using Cron
Putting it all together now, you can use the examples from the previous section to create a rudimentary alert system by writing a simple bash script:
#!/bin/bash
# Sends an alert email; $1 is the alert text.
function alert() {
  sendmail admin@example.com << EOF
subject: Alert from Plumbr
from: admin@example.com

Alert from Plumbr: $1
EOF
}
# Example 1: no checkouts in the last 4 hours.
CHECKOUT_COUNT=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=serviceId%3D1234567890abcdef,applicationName%3Dshop.example.com&last=4h" | jq ".[0].total")
if [ "$CHECKOUT_COUNT" -eq 0 ]; then
  alert "There were no carts checked out in the last 4 hours"
fi
# Example 2: share of users who did not have a fully successful experience.
ESHOP_UX_STATS=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=applicationName%3Dshop.example.com&last=4h")
ESHOP_USERS_TOTAL=$(echo "$ESHOP_UX_STATS" | jq ".[0].total")
ESHOP_USERS_OK=$(echo "$ESHOP_UX_STATS" | jq ".[0].success")
ESHOP_ERROR_RATE_PCT=$(( (ESHOP_USERS_TOTAL - ESHOP_USERS_OK) * 100 / ESHOP_USERS_TOTAL ))
if [ "$ESHOP_ERROR_RATE_PCT" -gt 10 ]; then
  alert "Error rate in e-shop is $ESHOP_ERROR_RATE_PCT%"
fi
# Example 3: compare the error rate of the last hour with the last 24 hours.
SEARCH_API_STATS_24H=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId=examplequicksearch1234567890&last=24h")
SEARCH_API_TOTAL_24H=$(echo "$SEARCH_API_STATS_24H" | jq ".[0].total")
SEARCH_API_FAILED_24H=$(echo "$SEARCH_API_STATS_24H" | jq ".[0].failed")
SEARCH_API_ERROR_RATE_PCT_24H=$(( SEARCH_API_FAILED_24H * 100 / SEARCH_API_TOTAL_24H ))
SEARCH_API_STATS_1H=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId=examplequicksearch1234567890&last=1h")
SEARCH_API_TOTAL_1H=$(echo "$SEARCH_API_STATS_1H" | jq ".[0].total")
SEARCH_API_FAILED_1H=$(echo "$SEARCH_API_STATS_1H" | jq ".[0].failed")
SEARCH_API_ERROR_RATE_PCT_1H=$(( SEARCH_API_FAILED_1H * 100 / SEARCH_API_TOTAL_1H ))
if [ "$SEARCH_API_ERROR_RATE_PCT_1H" -gt "$SEARCH_API_ERROR_RATE_PCT_24H" ]; then
  alert "Short-term error rates are going up"
fi
This queries the Plumbr API for the values of all the relevant metrics of the application, and then verifies that these are within operational ranges. If not, an alert is sent out via email.
The prerequisites for this script are a configured sendmail on the machine, plus curl and jq installed. Then all you have to do is add this script as a cron job and go to sleep.
As an alternative to alerting by email, you could also integrate with an external system. For example, for PagerDuty you would need to add a new Service that directly uses the Events V2 API, and note down the integration key. Then use the API to trigger incidents like so:
PAGERDUTY_INTEGRATION_KEY='ENTER_YOUR_INTEGRATION_KEY'
function alert() {
  # "source" and "severity" are required by Events API v2; the values below are placeholders.
  EVENT=$(cat << EOF
{
  "routing_key": "$PAGERDUTY_INTEGRATION_KEY",
  "event_action": "trigger",
  "payload": {
    "summary": "$1",
    "source": "plumbr-alert-script",
    "severity": "critical"
  }
}
EOF
)
  curl -H "Content-Type: application/json" -X POST --data "$EVENT" "https://events.pagerduty.com/v2/enqueue"
}
Besides manually running queries, you can also add Plumbr data to your existing monitoring system, such as Prometheus or Nagios. Plumbr gives you a much clearer signal of the user experience level than low-level metrics like CPU utilization or instance health.
Integrating with Nagios
To integrate Plumbr with Nagios, you will need to use a custom check command that queries the Plumbr Server API and verifies the returned numbers against a threshold. A reference implementation is available on Bitbucket, along with more detailed instructions.
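The core of such a check is a mapping from the returned numbers to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is not the reference implementation; the function name and the choice of error-rate thresholds are assumptions made for illustration.

```javascript
// Standard Nagios plugin exit codes.
const NAGIOS = { OK: 0, WARNING: 1, CRITICAL: 2, UNKNOWN: 3 };

// Hypothetical check: warnPct and critPct are error-rate thresholds in percent.
function checkErrorRate(summary, warnPct, critPct) {
  if (!summary || !(summary.total > 0)) {
    return { code: NAGIOS.UNKNOWN, text: 'UNKNOWN - no data returned' };
  }
  const rate = (summary.failed / summary.total) * 100;
  const label = rate >= critPct ? 'CRITICAL' : rate >= warnPct ? 'WARNING' : 'OK';
  return { code: NAGIOS[label], text: `${label} - error rate ${rate.toFixed(2)}%` };
}

const result = checkErrorRate({ total: 125001, failed: 19117 }, 5, 10);
console.log(result.text); // prints: CRITICAL - error rate 15.29%
// A real check command would print the text and end with process.exit(result.code).
```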
Integrating with Prometheus
To integrate Plumbr with Prometheus, you will need to use a custom exporter that collects data from the Plumbr Server API and exposes it as Prometheus metrics. A reference implementation of such an exporter is available on Bitbucket. This implementation can be configured to cover basic use cases, such as gathering the metrics and alerting based on their values using the standard Prometheus Alert Rules.
A pre-built Docker image is coming soon as well. More detailed documentation is available in the README file.
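To give an idea of what an exporter has to produce, the sketch below renders one summary response in the Prometheus text exposition format. It is not the reference exporter: the metric name `plumbr_users` and the label names are hypothetical.

```javascript
// Renders one summary object as Prometheus text exposition lines.
// The metric name and label names are hypothetical.
function toPrometheus(appName, summary) {
  const name = 'plumbr_users';
  // Escape backslash, double quote and newline as required for label values.
  const label = String(appName)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
  const lines = [`# TYPE ${name} gauge`];
  for (const status of ['success', 'failed', 'verySlow', 'onlySlow']) {
    lines.push(`${name}{application="${label}",status="${status}"} ${summary[status]}`);
  }
  return lines.join('\n') + '\n';
}

const exposition = toPrometheus('shop.example.com', {
  total: 609, failed: 3, success: 586, verySlow: 20, onlySlow: 0,
});
console.log(exposition);
```

Gauges are used here because each value describes a rolling time window rather than an ever-growing count.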
Integrating with Zabbix
To integrate Plumbr with Zabbix, you will need to use a custom external check item that fetches data from the Plumbr Server API. Based upon the returned values, you can use the standard Zabbix triggers to send out alerts. A reference implementation is available on Bitbucket, along with more detailed instructions.
Integrating with Atlassian Jira
There is a dedicated manual for integrating Plumbr with Atlassian Jira.
Integrating with other systems
We currently only provide ready-to-use integrations for the monitoring systems that are the most widely used by our customers. Given the existing reference implementations here and the Plumbr Server API, it should be straightforward to map the same approach to integrate Plumbr with any other system as well. If in doubt, do not hesitate to contact us at support@plumbr.io.
Push alerts
Configuring alerts in the manner described above will cover most alerting use cases required by your team of on-call engineers. It is a way to integrate seamlessly into your existing workflows and alerting tools such as Prometheus and Zabbix.
However, some use cases require independent alerts that are set up and operate autonomously. For many users, Plumbr is their only monitoring and alerting system. To satisfy these use cases, we provide the ability to configure Push Alerts within Plumbr. You can configure simple alerts from within the Plumbr UI and receive them without requiring external systems.
Because Plumbr is based on real user monitoring, our triggers also reflect these very parameters. An example alerting threshold would be: “Notify me if more than 2% of 1000 last user interactions resulted in failures”.
Interactions (or API calls) are chosen as the basis for these alerts. Basing an alert on a rolling window of interactions prevents alert flapping and spurious notifications by default. Presently, you can configure one notification per application. If you require multiple alerts to be set up for a single application, please contact us at support@plumbr.io.
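The idea of a threshold such as “more than 2% of the last 1000 interactions failed” can be illustrated with a small rolling window. This is a conceptual sketch only and says nothing about how Plumbr implements its alerts; the class name and the choice to stay silent until the window is full are assumptions.

```javascript
// Hypothetical rolling window over the outcomes of the last `size` interactions.
class RollingErrorRate {
  constructor(size, thresholdPct) {
    this.size = size;
    this.thresholdPct = thresholdPct;
    this.outcomes = []; // true = the interaction failed
    this.failures = 0;
  }

  record(failed) {
    this.outcomes.push(failed);
    if (failed) this.failures += 1;
    // Drop the oldest outcome once the window is over capacity.
    if (this.outcomes.length > this.size && this.outcomes.shift()) {
      this.failures -= 1;
    }
  }

  shouldAlert() {
    // Assumption: no alert until the window has filled up.
    if (this.outcomes.length < this.size) return false;
    // Integer comparison of failures/size > thresholdPct/100.
    return this.failures * 100 > this.thresholdPct * this.size;
  }
}

const monitor = new RollingErrorRate(1000, 2);
for (let i = 0; i < 980; i++) monitor.record(false);
for (let i = 0; i < 20; i++) monitor.record(true);
console.log(monitor.shouldAlert()); // prints: false (exactly 2% is not more than 2%)
monitor.record(true);
console.log(monitor.shouldAlert()); // prints: true (21 of the last 1000 failed)
```

Because a single failure moves the rate by only 0.1 percentage points in a window of 1000, one bad interaction cannot flip the alert on and off.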
Steps to take to configure alerts:
- Make sure you’ve configured alerting channels as specified.
- In order to configure an alert, ensure that you have administrator-level privileges on your account.
- Open the application/API you want to set up the alert for.
- Click on the “Alerts” side menu item.
On the alert screen you can configure 4 different alerts: error rate, throughput, latency and new errors.
Error rate alerts
- You can configure a percentage threshold
- As you set the threshold, you can see the Alert band (—) appear correspondingly.
- Click on the “Setup alert” button to save this configuration.
- You will receive alerts via the selected channel whenever the rolling error rate exceeds the selected threshold.
Latency alerts
- It is possible to configure both median and 99th percentile alerts
- As you set the threshold, you can see the Alert band (—) appear correspondingly.
- The 99th percentile is usually a lot higher than the median and can push the median chart series to the bottom, where it is difficult to see. In that case it is possible to toggle the 99th percentile chart and threshold lines in the chart legend.
- Click on the “Set up alert” button to save this configuration.
- You will receive alerts via the selected channel whenever latency exceeds the selected threshold.
Throughput alerts
- It is possible to configure time window and interaction count
- As you set the threshold, you can see the Alert band (—) appear correspondingly.
- You will receive alerts via the selected channel whenever throughput goes over or under the threshold
New error alerts
- Every time Plumbr detects a never-before-seen error, an alert is triggered
- Error history chart is always shown for the last 30 days