Triggering alerts using Cron
Putting it all together, you can use the examples from the previous section to build a rudimentary alert system with a simple bash script:
#!/bin/bash
# Send an alert email; $1 is the message text.
function alert() {
    sendmail admin@example.com << EOF
Subject: Alert from Plumbr
From: admin@example.com

Alert from Plumbr: $1
EOF
}
# Alert if the checkout service saw no users in the last 4 hours.
CHECKOUT_COUNT=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=serviceId%3D1234567890abcdef,applicationName%3Dshop.example.com&last=4h" | jq ".[0].total")
if [ "$CHECKOUT_COUNT" -eq 0 ]; then
    alert "There were no carts checked out in the last 4 hours"
fi
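# Alert if more than 10% of e-shop users experienced errors in the last 4 hours.
ESHOP_UX_STATS=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/users/summary?context=applicationName%3Dshop.example.com&last=4h")
ESHOP_USERS_TOTAL=$(echo "$ESHOP_UX_STATS" | jq ".[0].total")
ESHOP_USERS_OK=$(echo "$ESHOP_UX_STATS" | jq ".[0].success")
# Skip the check when there were no users, to avoid dividing by zero.
if [ "$ESHOP_USERS_TOTAL" -gt 0 ]; then
    ESHOP_ERROR_RATE_PCT=$(( (ESHOP_USERS_TOTAL - ESHOP_USERS_OK) * 100 / ESHOP_USERS_TOTAL ))
    if [ "$ESHOP_ERROR_RATE_PCT" -gt 10 ]; then
        alert "Error rate in e-shop is $ESHOP_ERROR_RATE_PCT%"
    fi
fi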
# Alert if the search API error rate over the last hour exceeds its 24-hour error rate.
SEARCH_API_STATS_24H=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId%3Dexamplequicksearch1234567890&last=24h")
SEARCH_API_TOTAL_24H=$(echo "$SEARCH_API_STATS_24H" | jq ".[0].total")
SEARCH_API_FAILED_24H=$(echo "$SEARCH_API_STATS_24H" | jq ".[0].failed")
SEARCH_API_STATS_1H=$(curl -s -u admin@example.com "https://app.plumbr.io/api/v4/transactions/summary?context=applicationName%3Dsearch.example.com,serviceId%3Dexamplequicksearch1234567890&last=1h")
SEARCH_API_TOTAL_1H=$(echo "$SEARCH_API_STATS_1H" | jq ".[0].total")
SEARCH_API_FAILED_1H=$(echo "$SEARCH_API_STATS_1H" | jq ".[0].failed")
# Compare the rates only when both windows contain transactions, to avoid dividing by zero.
if [ "$SEARCH_API_TOTAL_24H" -gt 0 ] && [ "$SEARCH_API_TOTAL_1H" -gt 0 ]; then
    SEARCH_API_ERROR_RATE_PCT_24H=$(( SEARCH_API_FAILED_24H * 100 / SEARCH_API_TOTAL_24H ))
    SEARCH_API_ERROR_RATE_PCT_1H=$(( SEARCH_API_FAILED_1H * 100 / SEARCH_API_TOTAL_1H ))
    if [ "$SEARCH_API_ERROR_RATE_PCT_1H" -gt "$SEARCH_API_ERROR_RATE_PCT_24H" ]; then
        alert "Short-term error rates are going up"
    fi
fi
The script queries the Plumbr API for the relevant application metrics and checks that each one is within its operational range. If one is not, an alert is sent out via email.
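The percentage checks divide by a total count, which fails when a time window contains no traffic or when the API returns no data. A minimal sketch of a helper that guards against this is shown below. The function name error_rate_pct is hypothetical, and treating an empty or non-numeric input as a rate of 0 is an assumption you may want to replace with a separate alert.

```shell
# Hypothetical helper: print the integer error rate in percent.
# Usage: error_rate_pct FAILED TOTAL
# Prints 0 when TOTAL is 0 or when either argument is not a
# non-negative integer (for example the string "null" from jq).
error_rate_pct() {
    failed=$1
    total=$2
    case "$failed" in
        ''|*[!0-9]*) echo 0; return 0 ;;
    esac
    case "$total" in
        ''|*[!0-9]*) echo 0; return 0 ;;
    esac
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( failed * 100 / total ))
}

# Example with the variable names used in the script:
#   SEARCH_API_ERROR_RATE_PCT_1H=$(error_rate_pct "$SEARCH_API_FAILED_1H" "$SEARCH_API_TOTAL_1H")
```

Because the result is truncated to a whole number, small differences between two rates can disappear, so this is only suitable for coarse thresholds.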
The prerequisites for this script are a configured sendmail on the machine and installed copies of curl and jq. Then all you have to do is add the script as a cron job and go to sleep.
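Adding the script as a cron job could look like the sketch below. The install path /usr/local/bin/plumbr-alerts.sh and the hourly schedule are assumptions, not values from Plumbr; an hourly run matches the shortest time window the script queries.

```shell
# Hypothetical install path; adjust to wherever you saved the script.
SCRIPT_PATH="/usr/local/bin/plumbr-alerts.sh"

# Print a crontab line that runs the given script at minute 0 of every hour.
cron_entry() {
    printf '0 * * * * %s\n' "$1"
}

cron_entry "$SCRIPT_PATH"

# To install the entry, append it to the current user's crontab:
#   (crontab -l 2>/dev/null; cron_entry "$SCRIPT_PATH") | crontab -
```

Keep in mind that the script sends an alert on every run for as long as a condition holds, so a shorter interval also means more repeated emails.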
As an alternative to alerting by email, you could also integrate with an external system. For example, for PagerDuty you would need to add a new Service that directly uses the Events V2 API, and note down the integration key. Then use the API to trigger incidents like so:
PAGERDUTY_INTEGRATION_KEY='ENTER_YOUR_INTEGRATION_KEY'
# Trigger a PagerDuty incident through the Events API v2; $1 is the summary text.
function alert() {
    EVENT=$(cat << EOF
{
  "routing_key": "$PAGERDUTY_INTEGRATION_KEY",
  "event_action": "trigger",
  "payload": {
    "summary": "$1",
    "source": "$(hostname)",
    "severity": "critical"
  }
}
EOF
)
    curl -H "Content-Type: application/json" -X POST --data "$EVENT" "https://events.pagerduty.com/v2/enqueue"
}
Besides running queries manually, you can also add Plumbr data to an existing monitoring system such as Prometheus or Nagios. Plumbr gives you a much clearer signal of the user experience than low-level metrics like CPU utilization or instance health.
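For Prometheus, one possible route is the node_exporter textfile collector, which reads files ending in .prom from a directory set with --collector.textfile.directory. The sketch below only formats a value in the Prometheus text exposition format. The metric name plumbr_error_rate_percent and the application label are hypothetical and not defined by Plumbr.

```shell
# Print one gauge sample in the Prometheus text exposition format.
# Usage: prom_gauge APPLICATION VALUE
# The metric name and label are illustrative assumptions.
prom_gauge() {
    application=$1
    value=$2
    printf '# TYPE plumbr_error_rate_percent gauge\n'
    printf 'plumbr_error_rate_percent{application="%s"} %s\n' "$application" "$value"
}

prom_gauge "shop.example.com" 7

# Typical use from the alert script, assuming TEXTFILE_DIR is the collector
# directory: write to a temporary file, then rename it so that the exporter
# never reads a half-written file.
#   prom_gauge "shop.example.com" "$ESHOP_ERROR_RATE_PCT" > "$TEXTFILE_DIR/plumbr.prom.$$" \
#     && mv "$TEXTFILE_DIR/plumbr.prom.$$" "$TEXTFILE_DIR/plumbr.prom"
```

With the value in Prometheus, the thresholds from the script can move into alerting rules instead of shell conditions.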