@okelet
Last active July 27, 2021 18:37
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save okelet/e616438c274bc997eb5f0f03efd075d0 to your computer and use it in GitHub Desktop.
Save okelet/e616438c274bc997eb5f0f03efd075d0 to your computer and use it in GitHub Desktop.
Graylog Zabbix monitoring

The Bash script creates a Graylog user with the permissions required to read the journal status and metrics, then generates an API token for that user.

The second file is a Zabbix template (requires Zabbix 3.4 or later, since it uses dependent items and JSONPath preprocessing). It collects the following data:

  • Balancer status for the node
  • Journal: current size
  • Journal: max size
  • Journal: usage (percent)
  • Journal: uncommitted entries
  • Received messages per second
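The template reads the journal through one agent item ("Journal status") that fetches /system/journal, three dependent items that extract fields with JSONPath, and a calculated item for the usage percentage. The messages item applies a "Change per second" step to `$.count`. A minimal shell sketch of the same calculations, useful for checking the values by hand; it assumes the API base URL and the token from the script are passed as arguments, and the file and function names are hypothetical. Like the setup script, it needs curl and jq.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the template's journal items from a shell.
# Usage (hypothetical file name): ./graylog_journal.sh http://127.0.0.1:9000/api TOKEN
# Requires curl and jq.

# Journal usage in percent, like the calculated item
# 100*last("graylog.journal.currentsize")/last("graylog.journal.maxsize").
# Prints 0 when the limit is 0 to avoid a division by zero.
journal_usage_pct() {
  awk -v c="$1" -v m="$2" 'BEGIN { if (m > 0) printf "%.2f\n", 100 * c / m; else print "0" }'
}

# Rate between two samples of a counter, like the "Change per second"
# step applied to $.count of InputBufferImpl.incomingMessages.
# Arguments: old_value old_epoch new_value new_epoch.
per_second() {
  awk -v a="$1" -v ta="$2" -v b="$3" -v tb="$4" \
    'BEGIN { if (tb > ta) printf "%.2f\n", (b - a) / (tb - ta); else print "0" }'
}

# Fetch /system/journal once (the master item) and read the same fields
# the dependent items extract with JSONPath.
journal_report() {
  local api=$1 token=$2 json current max uncommitted
  json=$(curl -sSfL -u "${token}:token" "${api}/system/journal") || return 1
  current=$(jq -r '.journal_size' <<<"$json")
  max=$(jq -r '.journal_size_limit' <<<"$json")
  uncommitted=$(jq -r '.uncommitted_journal_entries' <<<"$json")
  printf 'current_size=%s max_size=%s usage_pct=%s uncommitted=%s\n' \
    "$current" "$max" "$(journal_usage_pct "$current" "$max")" "$uncommitted"
}

# Run only when executed directly with both arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 2 ]]; then
  journal_report "$1" "$2"
fi
```

For example, `journal_usage_pct 50 200` prints `25.00`, and two samples of the counter taken 60 seconds apart with values 1000 and 1600 give `per_second 1000 0 1600 60`, which prints `10.00`.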

It also defines the following triggers:

  • AVERAGE: Balancer status is DEAD or THROTTLED
  • AVERAGE: Journal usage stays above 60% for 5 minutes
  • AVERAGE: More than 1000 uncommitted journal entries for 5 minutes
  • AVERAGE: No balancer status data received during the last 5 minutes
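The two balancer triggers search the text returned by /system/lbstatus for DEAD or THROTTLED with `str()`, which is a substring match. A one-shot shell equivalent for ad-hoc checks could look like the sketch below; the argument is assumed to be the API base URL, and the file and function names are hypothetical. As in the template, no token is sent for this endpoint. Unlike the template's item, the sketch does not pass `-f` to curl, on the assumption that an unhealthy node may answer with a non-2xx HTTP status whose body should still be inspected.

```shell
#!/usr/bin/env bash
# Sketch: one-shot balancer status check mirroring the DEAD/THROTTLED triggers.
# Usage (hypothetical file name): ./graylog_lb.sh http://127.0.0.1:9000/api
# Exit codes: 0 = ALIVE, 1 = DEAD or THROTTLED, 2 = unreachable or unknown.

# Classify a response body with substring matches, like str("DEAD").
lb_classify() {
  case "$1" in
    *DEAD* | *THROTTLED*) echo problem ;;
    *ALIVE*) echo ok ;;
    *) echo unknown ;;
  esac
}

lb_check() {
  local api=$1 body state
  # No -f: keep the body even if the HTTP status is an error (assumption).
  body=$(curl -sS --max-time 10 "${api}/system/lbstatus") || { echo "unknown: request failed"; return 2; }
  state=$(lb_classify "$body")
  echo "${state}: ${body}"
  case "$state" in
    ok) return 0 ;;
    problem) return 1 ;;
    *) return 2 ;;
  esac
}

# Run only when executed directly with the API URL as argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  lb_check "$1"
fi
```

The exit codes make the check easy to chain into other tooling, for example `./graylog_lb.sh http://127.0.0.1:9000/api || echo 'Graylog node unhealthy'`.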

All triggers use AVERAGE severity; adjust it to suit your environment.

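Assign the macro {$GRAYLOG_TOKEN} to each monitored Graylog host, with the token generated by the script as its value. The template's items run curl through system.run on the Zabbix agent, which on Zabbix 3.4 agents requires EnableRemoteCommands=1 in the agent configuration. They also query the API at http://{HOST.CONN}:8080/api, while the script uses port 9000; adjust the port in the item keys and trigger expressions to match your Graylog setup.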
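The macro can be set by hand in the host's Macros tab. To script it instead, the sketch below uses the Zabbix JSON-RPC API methods `user.login`, `host.get` and `usermacro.create`. It assumes the Zabbix 3.4 request format, where `user.login` takes a `user` parameter and the session id is sent in the `auth` member; later Zabbix versions changed both the login parameter name and the authentication mechanism. The API URL, credentials and technical host name are placeholders, and the file and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set {$GRAYLOG_TOKEN} on a host through the Zabbix 3.4 JSON-RPC API.
# Usage (hypothetical file name and URL):
#   ./set_graylog_token.sh http://zabbix.example/zabbix/api_jsonrpc.php Admin PASSWORD 'Graylog host' TOKEN
# Requires curl and jq.

# Build a JSON-RPC 2.0 request. $2 (params) and $3 (auth) must already be
# JSON: auth is a quoted session id, or null for user.login.
zbx_request() {
  printf '{"jsonrpc":"2.0","method":"%s","params":%s,"auth":%s,"id":1}' "$1" "$2" "$3"
}

# POST a request; print the compact "result" member, or fail on an API error.
zbx_call() {
  local url=$1 body=$2 resp
  resp=$(curl -sSf -H 'Content-Type: application/json-rpc' -d "$body" "$url") || return 1
  if jq -e 'has("error")' <<<"$resp" >/dev/null; then
    jq -r '.error.data' <<<"$resp" >&2
    return 1
  fi
  jq -c '.result' <<<"$resp"
}

set_graylog_token() {
  local url=$1 user=$2 pass=$3 host=$4 token=$5 params auth hostid
  params=$(jq -nc --arg u "$user" --arg p "$pass" '{user: $u, password: $p}')
  # The login result is a JSON string, so it can be reused verbatim as "auth".
  auth=$(zbx_call "$url" "$(zbx_request user.login "$params" null)") || return 1
  params=$(jq -nc --arg h "$host" '{output: ["hostid"], filter: {host: [$h]}}')
  hostid=$(zbx_call "$url" "$(zbx_request host.get "$params" "$auth")" | jq -r '.[0].hostid // empty')
  [[ -n "$hostid" ]] || { echo "host not found: $host" >&2; return 1; }
  params=$(jq -nc --arg id "$hostid" --arg v "$token" \
    '{hostid: $id, macro: "{$GRAYLOG_TOKEN}", value: $v}')
  zbx_call "$url" "$(zbx_request usermacro.create "$params" "$auth")"
}

# Run only when executed directly with all five arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 5 ]]; then
  set_graylog_token "$@"
fi
```

If the macro already exists on the host, `usermacro.create` returns an error; updating it would need `usermacro.update` with the existing `hostmacroid` instead.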

#!/bin/bash
# Requires curl and jq
# Exit if any command fails, if an unset variable is used, or if any command
# in a pipeline fails (so a failed curl is not masked by jq)
set -euo pipefail
# URL to the API
GA=http://127.0.0.1:9000/api
# Existing administrator user
MYU='admin'
MYP='xxxxxxxxx'
# New user that will be created
UU=journalmetricsmonitoring
UP='yyyyyyyyyy'
# Build the request body with jq so that quotes in the password cannot break the JSON
BODY=$(jq -n --arg u "${UU}" --arg p "${UP}" '{
  username: $u,
  password: $p,
  full_name: "Journal and metrics monitoring user from API",
  email: "nobody@nowhere.com",
  permissions: ["users:tokenlist", "users:tokencreate", "users:tokenremove", "journal:read", "metrics:read"]
}')
# Create the user (newer Graylog versions reject POST requests without the X-Requested-By header)
curl -sSf -X POST -u "${MYU}:${MYP}" -H 'Content-Type: application/json' -H 'X-Requested-By: cli' "${GA}/users?pretty=true" -d "${BODY}"
# Create a token named "journalmonitoring" for the new user and save it in a variable
TOKEN=$(curl -sSf -X POST -u "${UU}:${UP}" -H 'X-Requested-By: cli' "${GA}/users/${UU}/tokens/journalmonitoring?pretty=true" | jq -r '.token')
# Test: get the journal information
curl -sSf -u "${TOKEN}:token" "${GA}/system/journal?pretty=true"
# Test: get the metrics for org.graylog2.shared.buffers.InputBufferImpl.incomingMessages
curl -sSf -u "${TOKEN}:token" "${GA}/system/metrics/org.graylog2.shared.buffers.InputBufferImpl.incomingMessages?pretty=true"
# Display the generated token
echo "Token: ${TOKEN}"
<?xml version="1.0" encoding="UTF-8"?>
<zabbix_export>
<version>3.4</version>
<date>2017-10-13T08:44:24Z</date>
<groups>
<group>
<name>Templates</name>
</group>
</groups>
<templates>
<template>
<template>Graylog</template>
<name>Graylog</name>
<description/>
<groups>
<group>
<name>Templates</name>
</group>
</groups>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<items>
<item>
<name>Current journal size</name>
<type>18</type>
<snmp_community/>
<snmp_oid/>
<key>graylog.journal.currentsize</key>
<delay>0</delay>
<history>15d</history>
<trends>90d</trends>
<status>0</status>
<value_type>3</value_type>
<allowed_hosts/>
<units>b</units>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing>
<step>
<type>12</type>
<params>$.journal_size</params>
</step>
</preprocessing>
<jmx_endpoint/>
<master_item>
<key>system.run[&quot;curl -sSfL -u {$GRAYLOG_TOKEN}:token http://{HOST.CONN}:8080/api/system/journal&quot;]</key>
</master_item>
</item>
<item>
<name>Max journal size</name>
<type>18</type>
<snmp_community/>
<snmp_oid/>
<key>graylog.journal.maxsize</key>
<delay>0</delay>
<history>15d</history>
<trends>90d</trends>
<status>0</status>
<value_type>3</value_type>
<allowed_hosts/>
<units>b</units>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing>
<step>
<type>12</type>
<params>$.journal_size_limit</params>
</step>
</preprocessing>
<jmx_endpoint/>
<master_item>
<key>system.run[&quot;curl -sSfL -u {$GRAYLOG_TOKEN}:token http://{HOST.CONN}:8080/api/system/journal&quot;]</key>
</master_item>
</item>
<item>
<name>Uncommitted journal entries</name>
<type>18</type>
<snmp_community/>
<snmp_oid/>
<key>graylog.journal.uncommitedentries</key>
<delay>0</delay>
<history>15d</history>
<trends>90d</trends>
<status>0</status>
<value_type>3</value_type>
<allowed_hosts/>
<units/>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing>
<step>
<type>12</type>
<params>$.uncommitted_journal_entries</params>
</step>
</preprocessing>
<jmx_endpoint/>
<master_item>
<key>system.run[&quot;curl -sSfL -u {$GRAYLOG_TOKEN}:token http://{HOST.CONN}:8080/api/system/journal&quot;]</key>
</master_item>
</item>
<item>
<name>Journal usage</name>
<type>15</type>
<snmp_community/>
<snmp_oid/>
<key>graylog.journal.usage</key>
<delay>60s</delay>
<history>15d</history>
<trends>90d</trends>
<status>0</status>
<value_type>0</value_type>
<allowed_hosts/>
<units>%</units>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params>100*last(&quot;graylog.journal.currentsize&quot;)/last(&quot;graylog.journal.maxsize&quot;)</params>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing/>
<jmx_endpoint/>
<master_item/>
</item>
<item>
<name>Journal status</name>
<type>7</type>
<snmp_community/>
<snmp_oid/>
<key>system.run[&quot;curl -sSfL -u {$GRAYLOG_TOKEN}:token http://{HOST.CONN}:8080/api/system/journal&quot;]</key>
<delay>60s</delay>
<history>1d</history>
<trends>0</trends>
<status>0</status>
<value_type>4</value_type>
<allowed_hosts/>
<units/>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing/>
<jmx_endpoint/>
<master_item/>
</item>
<item>
<name>Received messages per second</name>
<type>7</type>
<snmp_community/>
<snmp_oid/>
<key>system.run[&quot;curl -sSfL -u {$GRAYLOG_TOKEN}:token http://{HOST.CONN}:8080/api/system/metrics/org.graylog2.shared.buffers.InputBufferImpl.incomingMessages&quot;]</key>
<delay>60s</delay>
<history>15d</history>
<trends>90d</trends>
<status>0</status>
<value_type>0</value_type>
<allowed_hosts/>
<units/>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing>
<step>
<type>12</type>
<params>$.count</params>
</step>
<step>
<type>10</type>
<params/>
</step>
</preprocessing>
<jmx_endpoint/>
<master_item/>
</item>
<item>
<name>Balancer status</name>
<type>7</type>
<snmp_community/>
<snmp_oid/>
<key>system.run[&quot;curl -sSfL http://{HOST.CONN}:8080/api/system/lbstatus&quot;]</key>
<delay>30s</delay>
<history>15d</history>
<trends>0</trends>
<status>0</status>
<value_type>4</value_type>
<allowed_hosts/>
<units/>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>Graylog server</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<preprocessing/>
<jmx_endpoint/>
<master_item/>
</item>
</items>
<discovery_rules/>
<httptests/>
<macros/>
<templates/>
<screens/>
</template>
</templates>
<triggers>
<trigger>
<expression>{Graylog:system.run[&quot;curl -sSfL http://{HOST.CONN}:8080/api/system/lbstatus&quot;].str(&quot;DEAD&quot;)}&lt;&gt;0</expression>
<recovery_mode>0</recovery_mode>
<recovery_expression/>
<name>Graylog status is DEAD</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>3</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger>
<trigger>
<expression>{Graylog:system.run[&quot;curl -sSfL http://{HOST.CONN}:8080/api/system/lbstatus&quot;].str(&quot;THROTTLED&quot;)}&lt;&gt;0</expression>
<recovery_mode>0</recovery_mode>
<recovery_expression/>
<name>Graylog status is THROTTLED</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>3</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger>
<trigger>
<expression>{Graylog:graylog.journal.usage.min(5m)}&gt;60</expression>
<recovery_mode>0</recovery_mode>
<recovery_expression/>
<name>High Graylog journal usage ({ITEM.LASTVALUE})</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>3</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger>
<trigger>
<expression>{Graylog:system.run[&quot;curl -sSfL http://{HOST.CONN}:8080/api/system/lbstatus&quot;].nodata(5m)}&lt;&gt;0</expression>
<recovery_mode>0</recovery_mode>
<recovery_expression/>
<name>No data for Graylog status</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>3</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger>
<trigger>
<expression>{Graylog:graylog.journal.uncommitedentries.min(5m)}&gt;1000</expression>
<recovery_mode>0</recovery_mode>
<recovery_expression/>
<name>Too many uncommitted entries in the Graylog journal ({ITEM.LASTVALUE})</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>3</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger>
</triggers>
</zabbix_export>
@dio99

dio99 commented Jun 4, 2018

Is there a way to run this from a shell or a cron job and write the output to a file, or send it by mail?
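One possible approach, as a sketch: a small script that polls the same endpoints the template uses and prints one timestamped line per run, which cron can append to a file. The file name, log path and schedule are assumptions; the API base URL and the token generated by the setup script are passed as arguments. It needs curl and jq.

```shell
#!/usr/bin/env bash
# Sketch: poll Graylog once and print a single status line (for cron).
# Usage (hypothetical file name): graylog_poll.sh http://127.0.0.1:9000/api TOKEN

# Format one line: timestamp, lbstatus body, and three journal fields.
format_line() {
  printf '%s lb=%s journal_size=%s journal_limit=%s uncommitted=%s\n' "$1" "$2" "$3" "$4" "$5"
}

poll_once() {
  local api=$1 token=$2 lb journal
  # lbstatus is queried without a token, as in the template
  lb=$(curl -s --max-time 10 "${api}/system/lbstatus") || lb=UNREACHABLE
  # Fall back to an empty object so the jq lookups below print NA
  journal=$(curl -sf --max-time 10 -u "${token}:token" "${api}/system/journal") || journal='{}'
  format_line "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${lb:-EMPTY}" \
    "$(jq -r '.journal_size // "NA"' <<<"$journal")" \
    "$(jq -r '.journal_size_limit // "NA"' <<<"$journal")" \
    "$(jq -r '.uncommitted_journal_entries // "NA"' <<<"$journal")"
}

# Run only when executed directly with both arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 2 ]]; then
  poll_once "$1" "$2"
fi
```

A crontab entry such as `*/5 * * * * /usr/local/bin/graylog_poll.sh http://127.0.0.1:9000/api TOKEN >> /var/log/graylog_poll.log 2>&1` appends a line every five minutes. Without the `>>` redirection, cron mails any output to the crontab owner (or to `MAILTO`), provided the host can send mail.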
