Prometheus exporter for barman
# vim: set ft=dosini:
# Supervisord config for the barman exporter
[program:barman_exporter]
environment=PATH=/usr/local/bin:%(ENV_PATH)s
command=/usr/local/bin/env python3 /var/barman/barman_exporter.py
user=barman
autostart=true
stopasgroup=true
killasgroup=true
autorestart=true
startretries=10000
stderr_logfile=/var/log/%(program_name)s.err.log
stdout_logfile=/var/log/%(program_name)s.out.log
stdout_logfile_maxbytes=200MB
stdout_logfile_backups=1
stderr_logfile_maxbytes=200MB
stderr_logfile_backups=1
#!/usr/bin/env python3.6
import sys
import time
import contextlib
import collections
from datetime import datetime
import prometheus_client
from prometheus_client import core
from barman import cli
from barman import output
from barman import backup
from barman.server import CheckOutputStrategy


class Output(output.ConsoleOutputWriter):
    results = collections.defaultdict(dict)

    def result_check(self, server_name, check, status, hint=None):
        self.results[check] = dict(status=status, hint=hint)


class BarmanCollector:

    def __init__(self, args):
        self.args = args
        self.results = output._writer.results

    def collect(self):
        cli.global_config(self.args)
        servers = cli.get_server_list(self.args)

        collectors = dict(
            barman_backups=core.GaugeMetricFamily(
                'barman_backups', 'total backups available',
                labels=['server']),
            barman_last_backup=core.GaugeMetricFamily(
                'barman_last_backup', 'last backup timestamp',
                labels=['server']),
            barman_last_backup_age=core.GaugeMetricFamily(
                'barman_last_backup_age', 'seconds since last backup',
                labels=['server']),
            barman_status=core.GaugeMetricFamily(
                'barman_status', 'Several barman status checks',
                labels=['server', 'check'])
        )

        for server_name, server in servers.items():
            backups = len(server.backup_manager.get_available_backups(
                status_filter=(backup.BackupInfo.DONE,)))
            collectors['barman_backups'].add_metric([server_name], backups)

            last_backup = server.backup_manager.get_last_backup_id()
            if last_backup:
                now = datetime.now()
                last_backup = datetime.strptime(last_backup, '%Y%m%dT%H%M%S')
                collectors['barman_last_backup'].add_metric(
                    [server_name], time.mktime(last_backup.timetuple()))
                collectors['barman_last_backup_age'].add_metric(
                    [server_name], (now - last_backup).total_seconds())

            # Results are keyed by check name only, so drop the previous
            # server's results before running the checks for this one
            self.results.clear()

            with contextlib.closing(server):
                check_strategy = CheckOutputStrategy()
                # Check WAL archive
                server.check_archive(check_strategy)
                # Postgres configuration is not available on passive nodes
                if not server.passive_node:
                    server.check_postgres(check_strategy)
                # Check barman directories from barman configuration
                server.check_directories(check_strategy)
                # Check retention policies
                server.check_retention_policy_settings(check_strategy)
                # Check for backup validity
                server.check_backup_validity(check_strategy)
                # Executes the backup manager set of checks
                server.backup_manager.check(check_strategy)
                # Check if the msg_list of the server
                # contains messages and output eventual failures
                server.check_configuration(check_strategy)
                # Executes check() for every archiver, passing
                # remote status information for efficiency
                for archiver in server.archivers:
                    archiver.check(check_strategy)
                # Check archiver errors
                server.check_archiver_errors(check_strategy)

            collector = collectors['barman_status']
            for name, value in self.results.items():
                key = name.replace(' ', '_').replace('-', '_').lower()
                # Checks that come with a hint are not exported
                if value['hint']:
                    continue
                collector.add_metric([server_name, key], int(value['status']))

        for collector in collectors.values():
            yield collector


if __name__ == '__main__':
    output.set_output_writer(Output())

    # Minimal stand-in for the argparse namespace that barman.cli expects
    class Args:
        server_name = ['all']
        quiet = output._writer
        debug = output._writer
        color = 'auto'
        format = debug

    args = Args()
    core.REGISTRY.register(BarmanCollector(args))
    # Start up the server to expose the metrics.
    prometheus_client.start_http_server(8000)
    # Keep the main thread alive; the HTTP server runs in a background thread.
    while True:
        time.sleep(1)
@ThomasPoty

commented Jun 24, 2019

Hello,

Thanks for the barman_exporter, very useful!

Also, I would like to let you know that after several hours of running, the exporter stops outputting metrics.
It seems to be caused by repeatedly opening the log file set in the Barman configuration without ever closing it, until the process reaches the system's open-file ulimit (1024 for us, on CentOS).

@WoLpH

Owner Author

commented Jun 24, 2019

I noticed similar issues but didn't have time to find (and fix) the cause, so I took the lazy solution and added this to my crontab: 1 * * * * root /usr/local/bin/supervisorctl restart barman_exporter

When writing this tool it became obvious to me that Barman wasn't written with any type of reuse in mind. Many of the internal Barman methods return formatted strings instead of actual values, making them somewhat difficult to parse for a tool like this. So the opening (and not automatically closing) of the log files doesn't surprise me much.

I think the best solution might be to simply execute the fetching in a separate thread or process.
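
A minimal sketch of the separate-process idea, assuming the collection logic can be written as a script that prints its results as JSON. `collect_in_child` and `DEMO_SOURCE` are hypothetical names, and the demo script only stands in for the real Barman calls. Whatever files the child opens are released by the operating system when it exits, so nothing accumulates in the long-running exporter.

```python
import json
import subprocess
import sys


def collect_in_child(source, timeout=60):
    """Run `source` in a fresh Python process and parse its stdout as JSON.

    File descriptors opened by the child (for example log files) are
    released by the operating system when the child exits.
    """
    completed = subprocess.run(
        [sys.executable, '-c', source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout.decode('utf-8'))


# Hypothetical stand-in for a script that would query Barman and dump
# the values it found; the metric name and value are made up.
DEMO_SOURCE = "import json; print(json.dumps({'barman_backups': 3}))"

if __name__ == '__main__':
    print(collect_in_child(DEMO_SOURCE))
```

The cost is one interpreter start-up per scrape, which is probably acceptable at typical Prometheus scrape intervals.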

@ahes

commented Jul 9, 2019

Hi,

Here is my barman-exporter: https://github.com/ahes/prometheus-barman-exporter
It exports similar metrics, but I made some naming and convention changes to align with the Prometheus guidelines on writing exporters.

I started by writing a JsonOutputWriter class and using Barman's cli.py directly, but after two hours I decided to let it go. Instead I parse the barman CLI output. It is way simpler and works just fine.

Thank you for your gist. It was a great inspiration.

@WoLpH

Owner Author

commented Jul 9, 2019

Yeah, I initially went the same route but that didn't work too well. It's obvious that Barman was written for a single purpose by someone who is used to writing in languages other than Python :)

Your module looks quite nice, I'll probably switch to that one soon. Thanks for packaging it so nicely!

@ahes

commented Jul 10, 2019

@WoLpH If you are wondering how to fix your code so that it does not keep opening log files indefinitely, you can add:

# [...]
import logging

logging.disable(logging.CRITICAL)  

class BarmanCollector:
# [...]
    def collect(self):
        cli.global_config(self.args)
        for handler in logging.root.handlers[:]:
            logging.root.removeHandler(handler)
        # [...]
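
The handler clean-up above can be tried in isolation. The sketch below assumes the leak comes from a new file handler being attached on every scrape; it uses a throwaway logger and a temporary file instead of Barman's real log, and `drop_handlers` and `demo` are hypothetical names. Unlike the snippet above, it also calls `handler.close()` so the file is released immediately rather than whenever the handler object is garbage collected.

```python
import logging
import os
import tempfile


def drop_handlers(logger):
    """Detach every handler from `logger` and close it explicitly."""
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)
        handler.close()


def demo():
    # Throwaway logger so the root logger is left untouched
    logger = logging.getLogger('barman_exporter_demo')
    fd, path = tempfile.mkstemp(suffix='.log')
    os.close(fd)
    try:
        # Simulate three scrapes that each attach a new file handler
        handlers = []
        for _ in range(3):
            handler = logging.FileHandler(path)
            logger.addHandler(handler)
            handlers.append(handler)
        attached = len(logger.handlers)

        drop_handlers(logger)
        # FileHandler.close() resets the stream attribute to None
        still_open = [h for h in handlers if h.stream is not None]
        return attached, len(logger.handlers), len(still_open)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo())
```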
@ahes

commented Jul 11, 2019

I was finally able to wrap my head around this and here is a version that uses the Python API: https://github.com/ahes/prometheus-barman-exporter/blob/master/barman_exporter.py

I had to expose the server checks exactly the same way you did, because server.check() uses timeout() from barman.utils, which relies on signals. Signal handlers can only be installed from the main thread, which sadly does not work with the Prometheus registry, since collect() is called from the HTTP server's thread.

I almost have JsonOutputWriter() ready for Barman, and I think that I will eventually run the barman command from the CLI and just load JSON from its output.
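
The main-thread restriction on signals mentioned in this comment is easy to reproduce with the standard library alone. This is a small sketch, not Barman's code: `install_in_worker_thread` is a hypothetical name, and SIGINT with Python's default handler is only a portable stand-in for whatever signal barman.utils actually uses.

```python
import signal
import threading


def install_in_worker_thread():
    """Try to install a signal handler from a non-main thread."""
    outcome = []

    def worker():
        try:
            signal.signal(signal.SIGINT, signal.default_int_handler)
            outcome.append('installed')
        except ValueError:
            # CPython refuses to install handlers outside the main thread
            outcome.append('ValueError')

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


if __name__ == '__main__':
    print(install_in_worker_thread())
```

Any code path that installs a signal handler therefore has to be avoided, or moved to the main thread or a child process, when it is reached from a collector callback.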

@WoLpH

Owner Author

commented Jul 11, 2019

Excellent work, that looks great already.
