
Universal agent expands bottleneck coverage with slow HTTP requests

July 10, 2020 by Ago Allikmaa Filed under: Performance Plumbr Product Updates

Most complex applications are nowadays composed of smaller functional pieces (often called microservices) that communicate with each other over HTTP. When you develop and maintain one such service that depends on other services, your code’s performance in production depends not only on the efficiency of your own algorithms, but also on the performance of the downstream services that your code consumes.

This means that in order to discover the bottlenecks that impact your service’s consumers the most, your monitoring solution needs to notify you when the services you depend on perform poorly.

The July update to the Plumbr Universal Agent (versions and up) addresses exactly that use case. It introduces two new types of bottlenecks recognized in PHP and Python code, and any applications that run inside Apache or Nginx – the Slow HTTP Call and the Recurring HTTP Call.

Here’s what they look like in the list of your application’s bottlenecks:

How do we track these bottlenecks and their impact? 

The Plumbr Universal Agent monitors the outgoing HTTP/1.0 and HTTP/1.1 requests that are made while the server thread handles an incoming HTTP call. These requests are detected by analyzing network traffic. Detection of outgoing HTTPS (SSL/TLS) requests is supported if the library used to make the requests uses OpenSSL for handling TLS.

The agent collects the request URL, request method and response code for each request. If the TCP connection is closed before request handling is finished, the request is marked as aborted, which is shown instead of the response code. The request duration is the time between when the first byte of the request was written by the application and when the last byte of the response was received. In case this was the first request for the same TCP connection, the time to open the connection is also included.
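To make the timing rules above concrete, here is a minimal Python sketch of the data described in this paragraph. It is an illustration only, not the agent’s actual implementation: the `OutgoingRequest` class and all of its field names are hypothetical, timestamps are assumed to be seconds from a monotonic clock, and because the post does not say what duration an aborted request gets, the sketch simply reports none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutgoingRequest:
    """Hypothetical record of one outgoing HTTP request (names are assumptions)."""
    method: str
    url: str
    first_byte_written: float                   # first byte of the request written
    last_byte_received: Optional[float] = None  # None: connection closed too early
    response_code: Optional[int] = None
    connect_started: Optional[float] = None     # set only for the first request
                                                # on a TCP connection

    @property
    def aborted(self) -> bool:
        # The connection was closed before request handling finished.
        return self.last_byte_received is None

    @property
    def outcome(self) -> str:
        # "aborted" is shown instead of the response code.
        return "aborted" if self.aborted else str(self.response_code)

    @property
    def duration(self) -> Optional[float]:
        if self.aborted:
            return None  # assumption: the post does not define this case
        # The first request on a connection also pays for opening it.
        start = self.first_byte_written
        if self.connect_started is not None:
            start = self.connect_started
        return self.last_byte_received - start


request = OutgoingRequest(
    method="GET",
    url="https://nginx-unmonitored:8080/sleep.php",
    first_byte_written=10.25,
    last_byte_received=11.5,
    response_code=200,
    connect_started=10.0,
)
print(request.outcome, request.duration)
```

A later request that reuses the same connection would leave `connect_started` unset, so only the time between the first request byte and the last response byte counts.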

Each individual request that takes longer than 1 second is registered as a bottleneck. Here’s how we display the details of such a bottleneck. Notice how the call stacks from all 52,342 incoming HTTP calls that were impacted by the bottleneck are aggregated on one page:
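The rule above can be sketched in a few lines of Python. The 1 second threshold comes from this post; everything else is an assumption made for illustration, including the `slow_http_call_bottlenecks` function name, the tuple layout of the input, and the choice to group by method, URL and outcome.

```python
from collections import defaultdict

SLOW_THRESHOLD_SECONDS = 1.0  # "longer than 1 second", as stated in the post


def slow_http_call_bottlenecks(requests):
    """Count the incoming calls impacted by each slow outgoing request.

    `requests` is an iterable of hypothetical tuples:
    (incoming_call_id, method, url, outcome, duration_seconds).
    """
    impacted = defaultdict(set)
    for incoming_call_id, method, url, outcome, duration in requests:
        if duration > SLOW_THRESHOLD_SECONDS:
            # Assumed grouping key: one bottleneck per method, URL and outcome.
            impacted[(method, url, outcome)].add(incoming_call_id)
    return {key: len(call_ids) for key, call_ids in impacted.items()}


observed = [
    ("call-1", "GET", "https://nginx-unmonitored:8080/sleep.php", "200", 2.5),
    ("call-2", "GET", "https://nginx-unmonitored:8080/sleep.php", "200", 1.5),
    ("call-3", "GET", "https://nginx-unmonitored:8080/fast.php", "200", 0.25),
]
print(slow_http_call_bottlenecks(observed))
```

Grouping first and counting afterwards mirrors the idea of one bottleneck page that aggregates every incoming call it affected.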

In this screenshot, you can see the name of the monitored application (remote), the HTTP method and URL of the bottleneck (GET https://nginx-unmonitored:8080/sleep.php), the response code (200), and the aggregated call stacks that resulted in your code making a call to the slow URL.

Should you want to drill down to an individual incoming HTTP call that suffered from the bottleneck, here’s how the Plumbr Single Transaction View displays the HTTP Call bottleneck:

Sometimes the outgoing HTTP calls are quick to finish, but your code makes the same query over and over, and the aggregate time it spends waiting for the downstream service still makes that service a bottleneck. In such cases, Plumbr reports the Recurring HTTP Call bottleneck. It looks very similar to the one above, except that you’ll additionally see how many times the HTTP service was called.
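As a rough illustration of the idea, the sketch below adds up the time spent on repeated calls to the same URL within a single incoming HTTP call. The post does not state the thresholds Plumbr uses for this bottleneck type, so `min_calls` and `min_total_seconds` are placeholder assumptions, as are the function name and the input format.

```python
from collections import defaultdict


def recurring_http_calls(calls, min_calls=2, min_total_seconds=1.0):
    """Find repeated outgoing calls whose combined duration is significant.

    `calls` is an iterable of hypothetical (method, url, duration_seconds)
    tuples, all made while handling ONE incoming HTTP call. Both thresholds
    are assumptions, not values documented by Plumbr.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    for method, url, duration in calls:
        count[(method, url)] += 1
        total[(method, url)] += duration
    return {
        key: (count[key], total[key])
        for key in count
        if count[key] >= min_calls and total[key] > min_total_seconds
    }


# Five quick calls to the same endpoint add up to more than one slow call.
calls = [("GET", "http://inventory/items", 0.25)] * 5
calls.append(("GET", "http://pricing/quote", 0.5))
print(recurring_http_calls(calls))
```

None of the individual calls here crosses the 1 second mark, which is exactly the case a per-request threshold alone would miss.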

HTTP request bottlenecks are not registered if the downstream service is also monitored by either the Plumbr Universal or the Plumbr Java agent. In that case bottlenecks from the downstream system will be detected and presented instead.
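The post does not describe how the agent learns that a downstream service is monitored, so the following Python sketch simply assumes a known set of monitored hostnames. Both the `should_register` function and the `monitored_hosts` set are hypothetical names used only to illustrate the rule.

```python
from urllib.parse import urlsplit


def should_register(url, monitored_hosts):
    """Return True if an HTTP request bottleneck should be registered.

    `monitored_hosts` is an assumed set of hostnames that already run a
    Plumbr Universal or Java agent; their own bottlenecks are shown instead.
    """
    host = urlsplit(url).hostname
    return host not in monitored_hosts


monitored_hosts = {"nginx-monitored"}
print(should_register("https://nginx-unmonitored:8080/sleep.php", monitored_hosts))
print(should_register("https://nginx-monitored:8080/sleep.php", monitored_hosts))
```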

Finally, how do you get to use the new feature? Existing customers just need to upgrade their Plumbr Universal Agent. Make sure to download the newest version here. If you’re not a Plumbr user yet, create a trial account to try it out.