Created
August 16, 2019 15:15
netdata disk alarms
# -----------------------------------------------------------------------------
# low disk space

# checking the latest collected values
# raise an alarm if the disk is low on
# available disk space

template: disk_space_usage
on: disk.space
families: *
calc: $used * 100 / ($avail + $used)
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? (80) : (90))
crit: $this > (($status == $CRITICAL) ? (90) : (98))
delay: up 1m down 15m multiplier 1.5 max 1h
info: current disk space usage
to: sysadmin
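
# worked example (illustrative numbers, not taken from a real host):
# with used = 460 and avail = 40 (same units),
#   calc = 460 * 100 / (40 + 460) = 92
# a CLEAR alarm compares against 90, so 92 raises WARNING.
# once in WARNING the threshold drops to 80, so the alarm
# only clears after usage falls to 80% or lower.
# likewise, CRITICAL raises above 98 and is held until usage
# falls to 90% or lower.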

template: disk_inode_usage
on: disk.inodes
families: *
calc: $used * 100 / ($avail + $used)
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? (80) : (90))
crit: $this > (($status == $CRITICAL) ? (90) : (98))
delay: up 1m down 15m multiplier 1.5 max 1h
info: current disk inode usage
to: sysadmin

# -----------------------------------------------------------------------------
# disk fill rate

# calculate the rate at which the disk fills,
# based on the change in available space
# during the last hour

# this is just a calculation - it has no alarm
# we will use it in the next template to find
# the hours remaining

template: disk_fill_rate
on: disk.space
families: *
lookup: min -10m at -50m unaligned of avail
calc: ($this - $avail) / (($now - $after) / 3600)
every: 1m
units: GB/hour
info: average rate at which the disk fills up (positive) or frees up (negative) space, over the last hour

# calculate the hours remaining
# if the disk continues to fill
# at this rate

template: out_of_disk_space_time
on: disk.space
families: *
calc: ($disk_fill_rate > 0) ? ($avail / $disk_fill_rate) : (inf)
units: hours
every: 10s
warn: $this > 0 and $this < (($status >= $WARNING) ? (48) : (8))
crit: $this > 0 and $this < (($status == $CRITICAL) ? (24) : (2))
delay: down 15m multiplier 1.2 max 1h
info: estimated time until the disk runs out of space, if the system continues to add data at the rate of the last hour
to: sysadmin
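
# worked example (illustrative numbers; assumes $after resolves to
# the start of the lookup window, about one hour before $now):
# lowest avail between 60 and 50 minutes ago = 120 GB, avail now = 100 GB
#   disk_fill_rate         = (120 - 100) / (3600 / 3600) = 20 GB/hour
#   out_of_disk_space_time = 100 / 20 = 5 hours
# 5 is below 8, so a CLEAR alarm raises WARNING; it is not below 2,
# so it does not go CRITICAL. once raised, WARNING is held until the
# estimate climbs back to 48 hours or more, and CRITICAL until it
# reaches 24 hours or more.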

# -----------------------------------------------------------------------------
# disk congestion

# raise an alarm if the disk is congested
# by calculating the average disk utilization
# for the last 10 minutes

template: 10min_disk_utilization
on: disk.util
families: *
lookup: average -10m unaligned
units: %
every: 1m
green: 90
red: 98
warn: $this > $green * (($status >= $WARNING) ? (0.7) : (1))
crit: $this > $red * (($status == $CRITICAL) ? (0.7) : (1))
delay: down 15m multiplier 1.2 max 1h
info: the percentage of time the disk was busy, during the last 10 minutes
to: sysadmin

# raise an alarm if the disk backlog
# is above 2000ms (2s) per second
# for 10 minutes
# (i.e. the disk cannot catch up)

template: 10min_disk_backlog
on: disk.backlog
families: *
lookup: average -10m unaligned
units: ms
every: 1m
green: 2000
red: 5000
warn: $this > $green * (($status >= $WARNING) ? (0.7) : (1))
crit: $this > $red * (($status == $CRITICAL) ? (0.7) : (1))
delay: down 15m multiplier 1.2 max 1h
info: average of the kernel estimated disk backlog, for the last 10 minutes
to: sysadmin
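
# effective thresholds implied by the 0.7 multiplier
# (plain arithmetic on the green/red values of the two templates above):
#   10min_disk_utilization: WARNING raises above 90, held while above 63
#                           CRITICAL raises above 98, held while above 68.6
#   10min_disk_backlog:     WARNING raises above 2000, held while above 1400
#                           CRITICAL raises above 5000, held while above 3500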