@MAS150MD200
Created August 14, 2016 18:33
https://blog.svedr.in/posts/prometheus-quick-start.html
https://blog.svedr.in/posts/prometheus-quick-start.rst
.. title: Prometheus quick start
.. slug: prometheus-quick-start
.. date: 2016-07-05 15:50:00 UTC+02:00
.. tags: linux, prometheus
.. link:
.. description:
.. type: text
Here's a little quick start procedure I like to use to get the `Prometheus <https://prometheus.io/>`_ monitoring system up and running, featuring Prometheus itself, the node and SNMP exporters, separate directories for configured and discovered targets and a couple of basic alerts.
So far, I have used this on Ubuntu 14.04 and Debian Jessie.
.. TEASER_END
Prometheus
==========
First, install Prometheus::
add-apt-repository ppa:ubuntu-lxc/lxd-stable
apt-get update
apt-get install golang build-essential
echo 'GOPATH="/opt/gocode"' >> /etc/environment
source /etc/environment
export GOPATH
go get github.com/prometheus/prometheus/cmd/prometheus
go get github.com/prometheus/prometheus/cmd/promtool
go get github.com/prometheus/node_exporter
go get github.com/prometheus/alertmanager
mkdir -p /etc/prometheus/targets/node
mkdir -p /etc/prometheus/targets/snmp
mkdir -p /var/lib/prometheus/data
mkdir -p /var/lib/prometheus/amdata
mkdir -p /var/lib/prometheus/discovery/node
mkdir -p /var/lib/prometheus/discovery/snmp
Put the following into ``/etc/prometheus/prometheus.yml``::

    scrape_configs:
      - job_name: "node"
        scrape_interval: "15s"
        file_sd_configs:
          - files:
              - '/etc/prometheus/targets/node/*.json'
              - '/var/lib/prometheus/discovery/node/*.json'

      - job_name: 'snmp'
        params:
          module: [default]
        file_sd_configs:
          - files:
              - '/etc/prometheus/targets/snmp/*.json'
              - '/var/lib/prometheus/discovery/snmp/*.json'
        relabel_configs:
          - source_labels: [instance]
            target_label: hostname
          - source_labels: [__address__]
            target_label: __param_address
          - source_labels: [__param_address]
            target_label: instance
          - target_label: __address__
            replacement: '127.0.0.1:9116'

    rule_files:
      - /etc/prometheus/alert.rules

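To see what the ``relabel_configs`` of the ``snmp`` job do to a target, here is a minimal Python 3 sketch that mimics them. It only covers the default ``replace`` action with the default regex, which is all this config uses. The function name ``relabel`` and the sample address are made up for illustration, and this is not how Prometheus itself is implemented.

```python
# Rules mirrored from the relabel_configs of the 'snmp' job.
SNMP_RELABEL_RULES = [
    {"source_labels": ["instance"], "target_label": "hostname"},
    {"source_labels": ["__address__"], "target_label": "__param_address"},
    {"source_labels": ["__param_address"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "127.0.0.1:9116"},
]


def relabel(labels, rules):
    """Apply simplified 'replace' rules to a copy of the label dict."""
    result = dict(labels)
    for rule in rules:
        # Source label values are joined with ';', the default separator.
        sources = rule.get("source_labels", [])
        joined = ";".join(result.get(name, "") for name in sources)
        # Without an explicit replacement, the joined source value is copied.
        value = rule.get("replacement", joined)
        if value:
            result[rule["target_label"]] = value
        else:
            # An empty result removes the target label.
            result.pop(rule["target_label"], None)
    return result


# Hypothetical SNMP target as it would come out of a file_sd JSON file.
target = {"__address__": "192.168.0.10", "instance": "switch1"}
for name, value in sorted(relabel(target, SNMP_RELABEL_RULES).items()):
    print(name, "=", value)
```

The upshot: the scrape itself goes to the SNMP exporter on ``127.0.0.1:9116``, the device address travels along as the ``address`` URL parameter, ``instance`` ends up holding the device address, and the name from the JSON file is preserved as ``hostname``.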
For starters, you can monitor the Prometheus node itself using Node exporter by
putting the following into ``/etc/prometheus/targets/node/localhost.json``::

    [
        {
            "targets": ["127.0.0.1:9100"],
            "labels": {
                "instance": "localhost"
            }
        }
    ]

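Since a typo in a target file is easy to make, a small check before dropping files into the target directories can save some head-scratching. This is a rough Python 3 sketch, assuming the file layout shown above (a list of groups, each with a ``targets`` list and an optional ``labels`` object). The function name ``validate_target_groups`` is hypothetical and not part of Prometheus.

```python
import json


def validate_target_groups(text):
    """Return a list of problems found in a target file (empty if none)."""
    try:
        groups = json.loads(text)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(groups, list):
        return ["top level must be a list of target groups"]

    problems = []
    for index, group in enumerate(groups):
        if not isinstance(group, dict):
            problems.append("group %d is not an object" % index)
            continue
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(
            isinstance(target, str) for target in targets
        ):
            problems.append("group %d: 'targets' must be a list of strings" % index)
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
            isinstance(value, str) for value in labels.values()
        ):
            problems.append("group %d: 'labels' must map names to strings" % index)
    return problems


example = '[{"targets": ["127.0.0.1:9100"], "labels": {"instance": "localhost"}}]'
print(validate_target_groups(example) or "looks fine")
```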
SNMP Exporter
=============
Next, install the SNMP Exporter. You get to choose between the official branch::
apt-get install python-netsnmp python-dev python-pip
pip install snmp_exporter
and my own branch, which I extended with a more modular config and where I fixed an infinite loop that occurred in our network for some reason::
apt-get install python-netsnmp python-dev python-pip
cd /opt
git clone https://github.com/Svedrin/snmp_exporter.git
cd snmp_exporter
git checkout svedrin-master
python setup.py install
cp -r snmp.yml.d /etc/prometheus
(So, you only need to run *one* of the two sections above.)
It totally helps if all your nodes are configured to use the same SNMP community
and you have a discovery tool that can generate a JSON file that knows them all.
This way, you can literally get up and running in minutes.
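If you don't have a discovery tool yet, generating such a file is not hard. Here's a minimal Python 3 sketch, assuming you can somehow come up with a mapping of host names to addresses. The function names, the host names and the output file name are made up for illustration; only the JSON layout and the discovery directory come from the config above.

```python
import json
import os
import tempfile


def build_target_groups(hosts):
    """Turn {hostname: address} into one target group per host."""
    return [
        {"targets": [address], "labels": {"instance": hostname}}
        for hostname, address in sorted(hosts.items())
    ]


def write_targets(path, groups):
    """Write the file atomically so a half-written file is never picked up."""
    directory = os.path.dirname(path) or "."
    # The temporary name ends in .tmp, so the '*.json' glob does not match it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    # mkstemp creates the file with mode 0600; make it readable for Prometheus.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)


# Usage (hypothetical hosts and file name):
#   groups = build_target_groups({"switch1": "192.168.0.11",
#                                 "switch2": "192.168.0.12"})
#   write_targets("/var/lib/prometheus/discovery/snmp/discovered.json", groups)
```

Prometheus watches the files matched by ``file_sd_configs`` and re-reads them when they change, so re-running such a script is enough to add or drop targets.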
Alerting
========
We installed Alert Manager already, time to configure it -- the config file is
``/etc/prometheus/alertmanager.conf``::

    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'smtp.derpyherp.com'
      smtp_from: 'prometheus@herpyderp.com'
      smtp_auth_username: 'derpity'
      smtp_auth_password: 'derpington'

    route:
      receiver: 'team-X-mails'
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 6h

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        # Apply inhibition if the alertname is the same.
        equal: ['alertname']

    receivers:
      - name: 'team-X-mails'
        email_configs:
          - to: 'svedrin@herpyderp.com'

Rules go into ``/etc/prometheus/alert.rules``::
ALERT node_down
IF up{job="node"} == 0
FOR 5m
ANNOTATIONS {
summary = "Node is down",
description = "Node has been unreachable for more than 5 minutes.",
severity = "warning"
}
ALERT snmp_down
IF up{job="snmp"} == 0
FOR 5m
ANNOTATIONS {
summary = "SNMP is down",
description = "SNMP has been unreachable for more than 5 minutes.",
severity = "warning"
}
ALERT fs_at_80_percent
IF hrStorageUsed{hrStorageDescr=~"/.+"} / hrStorageSize >= 0.8
FOR 15m
ANNOTATIONS {
summary = "File system {{$labels.hrStorageDescr}} is at 80%",
description = "{{$labels.hrStorageDescr}} has been at 80% for more than 15 Minutes.",
severity = "warning"
}
ALERT fs_at_90_percent
IF hrStorageUsed{hrStorageDescr=~"/.+"} / hrStorageSize >= 0.9
FOR 15m
ANNOTATIONS {
summary = "File system {{$labels.hrStorageDescr}} is at 90%",
description = "{{$labels.hrStorageDescr}} has been at 90% for more than 15 Minutes.",
severity = "average"
}
ALERT disk_load_mostly_random_reads
IF rate(diskIOReads{diskIODevice=~"sd[a-z]+"}[5m]) > 20 AND
rate(diskIONReadX{diskIODevice=~"sd[a-z]+"}[5m]) / rate(diskIOReads{diskIODevice=~"sd[a-z]+"}[5m]) < 10000
FOR 15m
ANNOTATIONS {
summary = "Disk {{$labels.diskIODevice}} reads are mostly random.",
description = "{{$labels.diskIODevice}} reads have been mostly random for the past 15 Minutes.",
severity = "info"
}
ALERT disk_load_mostly_random_writes
IF rate(diskIOWrites{diskIODevice=~"sd[a-z]+"}[5m]) > 20 AND
rate(diskIONWrittenX{diskIODevice=~"sd[a-z]+"}[5m]) / rate(diskIOWrites{diskIODevice=~"sd[a-z]+"}[5m]) < 10000
FOR 15m
ANNOTATIONS {
summary = "Disk {{$labels.diskIODevice}} writes are mostly random.",
description = "{{$labels.diskIODevice}} writes have been mostly random for the past 15 Minutes.",
severity = "info"
}
ALERT disk_load_high
IF diskIOLA1{diskIODevice=~"(s|v)d[a-z]+"} > 30
FOR 15m
ANNOTATIONS {
summary = "Disk {{$labels.diskIODevice}} is at 30%",
description = "{{$labels.diskIODevice}} Load has exceeded 30% over the past 15 Minutes.",
severity = "warning"
}
ALERT cpu_load_high
IF ssCpuIdle < 70
FOR 15m
ANNOTATIONS {
summary = "CPU is at 30%",
description = "CPU Load has constantly exceeded 30% over the past 15 Minutes.",
severity = "warning"
}
ALERT linux_load_high
IF laLoad1 > 40
FOR 15m
ANNOTATIONS {
summary = "Linux Load is at 40",
description = "Linux Load has constantly exceeded 40 over the past 15 Minutes.",
severity = "average"
}
ALERT if_operstatus_changed
IF delta(ifOperStatus[15m]) != 0
ANNOTATIONS {
summary = "Port {{$labels.ifDescr}} changed status",
description = "Port {{$labels.ifDescr}} went up or down in the past 15 Minutes",
severity = "info"
}
ALERT if_traffic_at_30_percent
IF ifSpeed > 10000000 AND
ifOperStatus == 1 AND
rate(ifInOctets[5m]) * 8 > ifSpeed * 0.3
FOR 15m
ANNOTATIONS {
summary = "Port {{$labels.ifDescr}} is at 30%",
description = "Port {{$labels.ifDescr}} has had at least 30% traffic over the past 15 Minutes.",
severity = "warning"
}
ALERT if_traffic_at_70_percent
IF ifSpeed > 10000000 AND
ifOperStatus == 1 AND
rate(ifInOctets[5m]) * 8 > ifSpeed * 0.7
FOR 15m
ANNOTATIONS {
summary = "Port {{$labels.ifDescr}} is at 70%",
description = "Port {{$labels.ifDescr}} has had at least 70% traffic over the past 15 Minutes.",
severity = "average"
}
.. note::

   Please be aware that those rules only cover SNMP data, and for the
   most part relate to data the upstream SNMP exporter doesn't even
   scrape.

   You could also put the instance name into the alert summary and/or
   description, but I'd advise against it. If you omit that info, you
   can more easily group alerts by their summary.
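The two ``disk_load_mostly_random_*`` rules may look a bit cryptic: they divide the byte rate by the operation rate, which gives the average request size, and fire if that size stays below 10000 bytes while the disk handles more than 20 operations per second. Here is the same arithmetic as a simplified Python 3 sketch. The function names and sample counter values are made up, and unlike Prometheus' ``rate()`` this ignores counter resets and extrapolation.

```python
def per_second_rate(first, last, seconds):
    """Average per-second increase of a counter over the window."""
    return (last - first) / seconds


def mostly_random_reads(bytes_first, bytes_last, ops_first, ops_last, seconds=300):
    """Mimic the read rule: busy disk with small average request size."""
    ops_rate = per_second_rate(ops_first, ops_last, seconds)
    if ops_rate <= 20:
        # Too few reads per second for the pattern to matter.
        return False
    bytes_rate = per_second_rate(bytes_first, bytes_last, seconds)
    # bytes/s divided by reads/s is the average number of bytes per read.
    return bytes_rate / ops_rate < 10000


# 100 reads/s at 4 KiB each over a 5 minute window: small requests.
print(mostly_random_reads(0, 4096 * 30000, 0, 30000))
# 100 reads/s at 1 MiB each: large, probably sequential requests.
print(mostly_random_reads(0, 1048576 * 30000, 0, 30000))
```

The first call prints ``True``, the second one ``False``.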
Upstart configs
===============
All this stuff has to be started somehow. If you're on Ubuntu 14.04,
you may want to (or find yourself forced to) use upstart. So, here goes:
``/etc/init/prometheus.conf``::
# Run prometheus
start on startup
script
cd /opt/gocode/src/github.com/prometheus/prometheus
/opt/gocode/bin/prometheus \
-storage.local.path="/var/lib/prometheus/data" \
-config.file=/etc/prometheus/prometheus.yml \
-alertmanager.url=http://localhost:9093/alert-manager/ \
-web.external-url=http://192.168.0.1/prometheus
end script
``/etc/init/alertmanager.conf``::
# Run alert manager
start on startup
script
/opt/gocode/bin/alertmanager \
-log.level=debug \
-storage.path="/var/lib/prometheus/amdata" \
-config.file=/etc/prometheus/alertmanager.conf \
-web.external-url=http://192.168.0.1/alert-manager/
end script
``/etc/init/node-exporter.conf``::
# Run node_exporter
start on startup
script
/opt/gocode/bin/node_exporter
end script
``/etc/init/snmp-exporter.conf``::
# Run snmp_exporter
start on startup
script
# This is only relevant for the Svedrin edition. Omit it for upstream.
cat /etc/prometheus/snmp.yml.d/*.yml > /var/lib/prometheus/snmp.yml
/usr/local/bin/snmp_exporter /var/lib/prometheus/snmp.yml
end script
Systemd configs
===============
If you're fortunate enough to be on a platform that supports Systemd, the following configs may come in handy.
``/etc/systemd/system/prometheus.service``::
[Unit]
Description=Prometheus server
After=network.target
[Service]
WorkingDirectory=/opt/gocode/src/github.com/prometheus/prometheus/
ExecStart=/opt/gocode/bin/prometheus \
-storage.local.path=/var/lib/prometheus/data \
-config.file=/etc/prometheus/prometheus.yml \
-alertmanager.url=http://localhost:9093/alert-manager \
-web.external-url=http://192.168.0.1/prometheus/
User=prometheus
[Install]
WantedBy=multi-user.target
``/etc/systemd/system/alertmanager.service``::
[Unit]
Description=Prometheus Alert Manager
After=network.target
[Service]
ExecStart=/opt/gocode/bin/alertmanager \
-log.level=debug \
-storage.path="/var/lib/prometheus/amdata" \
-config.file=/etc/prometheus/alertmanager.conf \
-web.external-url=http://192.168.0.1/alert-manager/
User=prometheus
[Install]
WantedBy=multi-user.target
``/etc/systemd/system/node-exporter.service``::
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/usr/local/sbin/node_exporter
User=nobody
[Install]
WantedBy=multi-user.target
``/etc/systemd/system/snmp-exporter.service``::
[Unit]
Description=Prometheus SNMP Exporter
After=network.target
[Service]
WorkingDirectory=/opt/snmp_exporter
Environment=PYTHONPATH=.
ExecStart=/usr/bin/python scripts/snmp_exporter snmp.yml
User=nobody
[Install]
WantedBy=multi-user.target
(I haven't yet ported my ``snmp.yml.d`` mechanism to my systemd machine, so I don't have a config for that yet.)
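For what it's worth, the ``cat`` line from the upstart job is easy to replicate in a few lines of Python 3 that could be run before the exporter starts. This is an untested sketch of that idea and not the setup I actually run. The function name ``merge_config_fragments`` is made up, and ``sorted()`` orders by code point, which may differ slightly from what your shell's glob does.

```python
import glob
import os


def merge_config_fragments(fragment_dir, output_path):
    """Concatenate all *.yml fragments in name order into one config file."""
    paths = sorted(glob.glob(os.path.join(fragment_dir, "*.yml")))
    with open(output_path, "w") as output:
        for path in paths:
            with open(path) as fragment:
                # Like cat, copy the content verbatim without adding anything.
                output.write(fragment.read())
    return paths


# Usage, with the paths from the upstart job:
#   merge_config_fragments("/etc/prometheus/snmp.yml.d",
#                          "/var/lib/prometheus/snmp.yml")
```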
Apache2 Reverse Proxy
=====================
``/etc/apache2/sites-available/prometheus.conf``::
ProxyPass /prometheus/ http://localhost:9090/prometheus/
ProxyPassReverse /prometheus/ http://localhost:9090/prometheus/
ProxyPass /alert-manager/ http://localhost:9093/alert-manager/
ProxyPassReverse /alert-manager/ http://localhost:9093/alert-manager/
Summary
=======
This config illustrates a quick way to get started. I consider it more
of a guideline than a production-ready setup, so please don't forget to
adapt it to your needs. The alert rules especially will need some tuning.