@mattpr
Last active November 14, 2023 16:06
using mtail and nginx access_log to export custom metrics from nginx for prometheus and other monitoring scrapers

Feedback

I'm sure I've gotten multiple things wrong here: flat-out wrong, anti-patterns, or just sub-optimal. I'm new to prometheus, grafana and mtail, so please feel free to share corrections and suggestions.

Background

There are a handful of custom nginx stats exporters.

Some tie into nginx's internal stats, like the official nginx exporter: nginx-prometheus-exporter

Others are custom applications that tail/parse nginx access logs in order to generate more detailed or custom stats (example: Martin Helmich's prometheus-nginxlog-exporter).

The problem with the first type is that you are limited to the nginx internal stats.

The problem with the second is that customization is limited to whatever the author happened to need. If you are happy with the stats the author decided on, great. For instance, the docs for Martin Helmich's exporter mention working with about 3 nginx variables in the access log. I think the exporter expects many more "standard" ones, but the documentation wasn't clear to me. Including or customizing other metrics/labels does not seem to be possible with that exporter.

Since using any exporter means deploying a binary, setting up a systemd service to manage the process, and doing a bit of configuration, I decided to just use mtail for this. I will probably use mtail (a general-purpose solution) for other things at some point, so this should mean less infrastructure provisioning work in the future.

mtail is basically a log tailer that tries to be smart about not losing data during log rotation. It lets you write custom PATTERN:ACTION logic to extract fields from log lines and combine them into custom metrics, which it exports over HTTP for prometheus and other monitoring systems under a single service.
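
To make the PATTERN:ACTION idea concrete, here is a rough Python sketch of what such a program does conceptually: match each line against a pattern with named groups, then run an action that updates labelled counters. This is not how mtail is implemented. The two-field log format and the names PATTERN, hits, bytes_total and process_line are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical two-field log line: "<status>\t<bytes>"
PATTERN = re.compile(r"^(?P<status>\d{3})\t(?P<bytes>\d+)$")

# One ever-growing counter per label value, similar in spirit to
# an mtail "counter ... by status" declaration.
hits = defaultdict(int)
bytes_total = defaultdict(int)
nomatch = 0


def process_line(line: str) -> None:
    """Apply the pattern; on a match run the action, otherwise count a miss."""
    global nomatch
    m = PATTERN.match(line.rstrip("\n"))
    if m is None:
        nomatch += 1
        return
    hits[m["status"]] += 1
    bytes_total[m["status"]] += int(m["bytes"])


for sample in ["200\t512", "200\t100", "404\t0", "garbage"]:
    process_line(sample)

print(dict(hits), dict(bytes_total), nomatch)
```

The label (status) selects which counter to bump, and the numeric field (bytes) is what gets added to it. The real program further down follows the same split, just with more fields.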

mtail doesn't provide interval-based counters (e.g. application hits in the last 5 minutes). The counters grow perpetually (and reset at restart, of course), so you will generally use prometheus' rate function, which looks at the change in the counter over time. Prometheus is smart about bridging over restarts (i.e. where counters reset to 0).
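
As a rough illustration of how a rate can be derived from an ever-growing counter that occasionally resets, here is a simplified Python sketch. It only shows the reset-bridging idea (a drop in the counter is treated as a restart from 0). Prometheus' real rate and increase functions also extrapolate to the window boundaries, which this sketch leaves out. The function names and sample values are my own.

```python
def increase(samples):
    """Total growth of a counter across samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter went backwards: assume the process restarted from 0,
            # so everything counted since the restart is new growth.
            total += cur
    return total


def rate(samples, window_seconds):
    """Average per-second growth over the window."""
    return increase(samples) / window_seconds


# Hypothetical scrapes; mtail restarted between the 3rd and 4th sample.
scraped = [100, 130, 160, 5, 35]
print(increase(scraped), rate(scraped, 60))
```

Without the reset handling, the drop from 160 to 5 would look like a large negative change, which is why raw counter differences are not a good substitute for rate.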

I've included most of the relevant files here as an example configuration.

Things we need to do

  • install mtail binary on system
    • create any custom users/groups
    • create init script or systemd service. I provide a sample systemd unit below.
  • Pick a location for a new/custom nginx access_log we will use for mtail.
    • nginx will need write access and mtail read access.
    • Beware of log rotation permission/ownership changes.
    • permissions on the log file may not be enough, you may also need execute permissions on the parent directory.
  • Configure nginx to write to the new/additional access_log
  • Create mtail program to parse the new nginx access_log
    • I include nginx.mtail below as an example.
  • Make sure appropriate firewall port is open, that nginx is reloaded/running and mtail service is running.
  • tail the new access log and make sure you are getting data. If you are buffering it may take a while.
  • on the server: curl http://localhost:<mtail-port>/metrics and make sure you are getting prometheus metrics
  • Point prometheus server at the port mtail is running on in order to scrape the new metrics
    • generally this means adding a new scrape config in prometheus.yml...but will depend on how you configure prometheus (service discovery, etc).
  • Add new prometheus alerts based on the new metrics (for instance, when the rate of non-200 responses is greater than 10% of the rate of 200 responses). Something like (untested):
    • sum(rate(nginx_request_count{nginx_status!="200"}[10m]) * 60) by (instance, nginx_host) / sum(rate(nginx_request_count{nginx_status="200"}[10m]) * 60) by (instance, nginx_host) > 0.1
  • Build grafana dashboards, alerts, etc.
    • grafana.nginx.json is an example dashboard showing how you can use these metrics.
    • screenshots at the bottom
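
The untested alert expression in the list above boils down to a ratio of two rates compared against a threshold. A small Python sketch of that arithmetic, with made-up function names, also shows that the * 60 factor appears on both sides of the division and so does not change the result:

```python
from typing import Optional

THRESHOLD = 0.1  # same 10% threshold as in the alert expression


def error_ratio(non_200_per_sec: float, ok_per_sec: float) -> Optional[float]:
    """Ratio of the non-200 request rate to the 200 request rate."""
    if ok_per_sec == 0:
        # This sketch returns None rather than dividing by zero.
        return None
    # Scaling both rates to per-minute (* 60) leaves the ratio unchanged.
    return (non_200_per_sec * 60) / (ok_per_sec * 60)


def should_alert(non_200_per_sec: float, ok_per_sec: float) -> bool:
    """True when the ratio exists and exceeds the threshold."""
    ratio = error_ratio(non_200_per_sec, ok_per_sec)
    return ratio is not None and ratio > THRESHOLD


print(should_alert(2.0, 10.0), should_alert(0.5, 10.0))
```

Note that this compares non-200 responses to 200 responses, not to all responses, so a ratio of 0.1 means roughly 9% of total traffic is non-200.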

Advice

  • Read the mtail Language doc.
  • Run mtail -h and read through the commandline parameters that are available.
  • review the /tmp/mtail* log files if something isn't working, or tail syslog (or run journalctl -f | grep mtail) while you are doing things. Watch for errors.
  • make sure nginx and mtail are both running.
  • if nginx and mtail are both running but nothing is showing up in the access_log, you probably have a permissions problem on the log file or directory where the logfile lives.
  • think in terms of labels vs metrics. some log fields will be used for filtering/grouping (ie labels), others are used to update counters. In our example:
    • Labels: $host, $server_port, $request_method, $uri, $content_type, $status
    • Metrics: $request_length, $bytes_sent, $body_bytes_sent, $request_time, $upstream_connect_time, $upstream_header_time, $upstream_response_time
    • $msec is neither a label nor a metric, but tells mtail the time of the logline since there may be a write delay due to buffering.
  • If you try to increment counters with strings (e.g. -) there seems to be an issue with mtail where you get an exported sample with no value, e.g. nginx_request{...} instead of nginx_request{...} <value>. Prometheus will report the target as "down" even though the http endpoint is up. If you look at prometheus UI > targets you can see the specific parse error message, which gives you a hint.
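
To spot the "sample with no value" problem from the last point without going through the prometheus UI, something like the following Python sketch could be run over the text returned by the /metrics endpoint. It is a deliberately simple line check, not a full parser for the exposition format, and the function name and sample text are made up.

```python
from typing import List


def find_valueless_samples(exposition_text: str) -> List[str]:
    """Return sample lines that have a name (and maybe labels) but no value."""
    bad = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # With labels, the value follows the last "}".
        # Without labels, it follows the first space.
        _, brace, tail = line.rpartition("}")
        rest = tail if brace else line.partition(" ")[2]
        if not rest.strip():
            bad.append(line)
    return bad


# Hypothetical scrape output with one broken sample.
sample_text = """# TYPE nginx_request_count counter
nginx_request_count{nginx_host="example.com",nginx_status="200"} 42
nginx_upstream_connect_time{nginx_host="example.com",nginx_status="200"}
nginx_log_nomatch_count 0
"""
print(find_valueless_samples(sample_text))
```

Feeding it the output of curl http://localhost:<mtail-port>/metrics should give an empty list when everything is healthy.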

I know it seems like a lot, but that is mostly because I am over-explaining. I started using prometheus/grafana/node_exporter less than a week ago for the first time. I figured out all the mtail stuff described here in an afternoon. The grafana dashboard only took an hour to build from scratch. So overall I'm quite happy with the mtail setup. The only gripe is the strange issue where mtail sometimes exports metrics without a value. A restart of mtail seems to resolve that, but it is annoying that I don't know exactly why it happens. Something to dig into later.
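
Before deploying, the pattern from nginx.mtail can be sanity-checked against sample log lines. The sketch below rebuilds the same named groups in Python. I am assuming Python's re module and mtail's regex engine agree on a pattern this simple. The sample field values are invented.

```python
import re

# Same fields, in the same order, as the prometheus_log format.
LINE_RE = re.compile(
    r"^"
    r"(?P<msec>\d+)\.\d+\t"
    r"(?P<host>\S+)\t"
    r"(?P<server_port>\S+)\t"
    r"(?P<request_method>\S+)\t"
    r"(?P<uri>\S+)\t"
    r"(?P<content_type>\S+)\t"
    r"(?P<status>\S+)\t"
    r"(?P<request_length>\d+)\t"
    r"(?P<bytes_sent>\d+)\t"
    r"(?P<body_bytes_sent>\d+)\t"
    r"(?P<request_time>\d+\.\d+)\t"
    r"(?P<upstream_connect_time>\S+)\t"
    r"(?P<upstream_header_time>\S+)\t"
    r"(?P<upstream_response_time>\S+)"
    r"$"
)


def parse(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


# Invented sample values, one per field of the log format.
fields = ["1550476630.144", "example.com", "443", "GET", "/index.html", "-",
          "200", "123", "4567", "4321", "0.005", "0.001", "0.004", "0.004"]
good = "\t".join(fields)

# A request Content-Type containing a space is not matched by \S+.
spaced = "\t".join(fields[:5] + ["text/plain; charset=utf-8"] + fields[6:])

print(parse(good) is not None, parse(spaced) is None)
```

The second sample is worth noting: because the content_type group is \S+, a Content-Type header with a space in it makes the whole line fail to match. In the mtail program that would presumably end up in the else branch and bump nginx_log_nomatch_count, so that counter is worth watching.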

# Tab separated nginx access_log format with just the fields we care about.
#
# Added some buffering/caching that should help with performance. I'm no expert.
#
# If you want to customize this, you just need to add more nginx variables to the log format and
# update the nginx.mtail pattern/program to match.
# http://nginx.org/en/docs/varindex.html
#
# We put this in /etc/nginx/conf.d/ since the http block in /etc/nginx/nginx.conf includes /etc/nginx/conf.d/*.conf
# prefixed with 00- so it is loaded before other server config files in conf.d
#
log_format prometheus_log '$msec\t$host\t$server_port\t$request_method\t$uri\t$content_type\t$status\t$request_length\t$bytes_sent\t$body_bytes_sent\t$request_time\t$upstream_connect_time\t$upstream_header_time\t$upstream_response_time';
# location here should match logfile location mtail is configured to look at.
access_log /opt/mtail/logs/nginx-access.log prometheus_log buffer=1k;
open_log_file_cache max=1000 inactive=20s valid=1m min_uses=2;
# we stick with nginx var names here and prefix the exported metric with nginx_ by using "as"
# "by" allows us to specify our labelling/grouping
counter request_count by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_request_count"
counter request_length_bytes by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_request_length_bytes"
counter bytes_sent by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_bytes_sent"
counter body_bytes_sent by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_body_bytes_sent"
counter request_time by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_request_time"
counter upstream_connect_time by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_upstream_connect_time"
counter upstream_header_time by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_upstream_header_time"
counter upstream_response_time by nginx_host, nginx_port, nginx_method, nginx_uri, nginx_content_type, nginx_status as "nginx_upstream_response_time"
counter nginx_log_nomatch_count
# the following pattern matches exactly the tab-separated fields in our custom access_log format
# if you want to customize...this pattern should be updated to match changes you make in the nginx log format.
/^/ +
/(?P<msec>\d+)\.\d+\t/ + # settime() needs just the seconds so we exclude the .milliseconds part
/(?P<host>\S+)\t/ +
/(?P<server_port>\S+)\t/ +
/(?P<request_method>\S+)\t/ +
/(?P<uri>\S+)\t/ +
/(?P<content_type>\S+)\t/ +
/(?P<status>\S+)\t/ +
/(?P<request_length>\d+)\t/ +
/(?P<bytes_sent>\d+)\t/ +
/(?P<body_bytes_sent>\d+)\t/ +
/(?P<request_time>\d+\.\d+)\t/ +
/(?P<upstream_connect_time>\S+)\t/ +
/(?P<upstream_header_time>\S+)\t/ +
/(?P<upstream_response_time>\S+)/ +
/$/ {
  settime($msec)
  request_count[$host][$server_port][$request_method][$uri][$content_type][$status]++
  request_length_bytes[$host][$server_port][$request_method][$uri][$content_type][$status] += $request_length
  bytes_sent[$host][$server_port][$request_method][$uri][$content_type][$status] += $bytes_sent
  body_bytes_sent[$host][$server_port][$request_method][$uri][$content_type][$status] += $body_bytes_sent
  request_time[$host][$server_port][$request_method][$uri][$content_type][$status] += $request_time
  # mtail is happier doing counters with floats/ints.
  # nginx logs '-' when there isn't a value, so
  # we check for '-' and skip updating these counters if found.
  # otherwise we cast the string to a float and increment the counter.
  $upstream_connect_time != "-" {
    upstream_connect_time[$host][$server_port][$request_method][$uri][$content_type][$status] += float($upstream_connect_time)
  }
  $upstream_header_time != "-" {
    upstream_header_time[$host][$server_port][$request_method][$uri][$content_type][$status] += float($upstream_header_time)
  }
  $upstream_response_time != "-" {
    upstream_response_time[$host][$server_port][$request_method][$uri][$content_type][$status] += float($upstream_response_time)
  }
} else {
  # our pattern doesn't match.
  # in this example since we have a single program with a single log file...it should always match
  # or we have a bug.
  # can use this metric to detect when our parser is failing.
  nginx_log_nomatch_count++
}
#
# This is the systemd service
# It would need to be customized for your system, where you deploy the binary, where the access log lives, etc.
# I also log to stderr so I can see what is going on in journalctl
# emit_prog_label=false just prevents us from having prog=mtail label on all metrics. not critical.
#
[Unit]
Description=mtail
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
Restart=on-failure
WorkingDirectory=/opt/mtail/current
ExecStart=/opt/mtail/current/mtail \
--address 0.0.0.0 --port 9101 \
--emit_prog_label=false --alsologtostderr \
--progs /opt/mtail/progs \
--logs /opt/mtail/logs/nginx-access.log
User=mtail
Group=mtail
[Install]
WantedBy=multi-user.target
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 6,
"iteration": 1550476630144,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_request_count{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Requests over Time",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_request_time{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host) / sum(increase(nginx_request_count{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Average Time per Request",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 8,
"x": 0,
"y": 9
},
"id": 14,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_request_count{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host, nginx_content_type)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "requests by content_type",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 8,
"x": 8,
"y": 9
},
"id": 16,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_request_count{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host, nginx_status)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "requests by http status",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 8,
"x": 16,
"y": 9
},
"id": 17,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_request_count{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host, nginx_uri)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "requests by URI",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 8,
"x": 0,
"y": 18
},
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_upstream_connect_time{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "upstream_connect_time",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 8,
"x": 8,
"y": 18
},
"id": 11,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_upstream_header_time{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "upstream_header_time",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 9,
"w": 8,
"x": 16,
"y": 18
},
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_upstream_response_time{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "upstream_response_time",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 27
},
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_request_length_bytes{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "request_length_bytes",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fill": 1,
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 35
},
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_bytes_sent{instance=~\"$instance\",nginx_host=~\"$nginx_host\",nginx_port=~\"$nginx_port\",nginx_uri=~\"$nginx_uri\",nginx_content_type=~\"$nginx_content_type\",nginx_status=~\"$nginx_status\"}[$interval])) by (instance, nginx_host)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "bytes_sent",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": false,
"schemaVersion": 16,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "30s",
"value": "30s"
},
"hide": 0,
"label": "interval",
"name": "interval",
"options": [
{
"selected": true,
"text": "30s",
"value": "30s"
},
{
"selected": false,
"text": "1m",
"value": "1m"
},
{
"selected": false,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "10m",
"value": "10m"
},
{
"selected": false,
"text": "30m",
"value": "30m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
},
{
"selected": false,
"text": "6h",
"value": "6h"
},
{
"selected": false,
"text": "12h",
"value": "12h"
},
{
"selected": false,
"text": "1d",
"value": "1d"
},
{
"selected": false,
"text": "7d",
"value": "7d"
},
{
"selected": false,
"text": "14d",
"value": "14d"
},
{
"selected": false,
"text": "30d",
"value": "30d"
}
],
"query": "30s,1m,5m,10m,30m,1h,6h,12h,1d,7d,14d,30d",
"refresh": 2,
"skipUrlSync": false,
"type": "interval"
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_request_count, instance)",
"hide": 0,
"includeAll": true,
"label": "Instance",
"multi": true,
"name": "instance",
"options": [],
"query": "label_values(nginx_request_count, instance)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_host)",
"hide": 0,
"includeAll": true,
"label": "Nginx Host",
"multi": true,
"name": "nginx_host",
"options": [],
"query": "label_values(nginx_host)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_port)",
"hide": 0,
"includeAll": true,
"label": "HTTP Port",
"multi": true,
"name": "nginx_port",
"options": [],
"query": "label_values(nginx_port)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_method)",
"hide": 0,
"includeAll": true,
"label": "HTTP Method",
"multi": true,
"name": "nginx_method",
"options": [],
"query": "label_values(nginx_method)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_uri)",
"hide": 0,
"includeAll": true,
"label": "HTTP URI",
"multi": true,
"name": "nginx_uri",
"options": [],
"query": "label_values(nginx_uri)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_status)",
"hide": 0,
"includeAll": true,
"label": "HTTP Status",
"multi": true,
"name": "nginx_status",
"options": [],
"query": "label_values(nginx_status)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "All",
"value": "$__all"
},
"datasource": "Prometheus",
"definition": "label_values(nginx_content_type)",
"hide": 0,
"includeAll": true,
"label": "Request Content-Type",
"multi": true,
"name": "nginx_content_type",
"options": [],
"query": "label_values(nginx_content_type)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "nginx",
"uid": "xdlNPjXmk",
"version": 18
}
@hikhvar

hikhvar commented Feb 5, 2023

Thank you for this good example! I have built upon this. Mainly, I removed some metrics and used histograms for analysis. I have written down my setup here: https://journal.petrausch.info/post/2023/02/nginx-monitoring-mtail/
