smartctl disk monitoring nagios check

check_smartctl_health

Run with no arguments (check all disks)

$ sudo ./check_smartctl_health
ok: all devices healthy (20 total)
$ echo $?
0

Run with a disk as an argument

$ sudo ./check_smartctl_health /dev/rdsk/c1t5000CCA03BB46E69d0s0
ok: all devices healthy (1 total)
$ echo $?
0

Run with a bad argument

$ sudo ./check_smartctl_health foo
unknown: failed to run - smartctl -a foo
$ echo $?
3
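
The exit statuses shown above (0 and 3, plus 2 for unhealthy devices) match the usual Nagios plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. As a rough sketch, a wrapper could translate the status into a label; `nagios_state` is a hypothetical helper and not part of this gist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): map a Nagios
# plugin exit status to its conventional state name.
nagios_state() {
    case "$1" in
        0) echo 'OK' ;;
        1) echo 'WARNING' ;;
        2) echo 'CRITICAL' ;;
        *) echo 'UNKNOWN' ;;   # 3 and anything unexpected
    esac
}
```

It could be called right after the check, for example `sudo ./check_smartctl_health; nagios_state "$?"`.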

#!/usr/bin/env bash
#
# Check for disk health using smartctl
#
# Pass one or more devices as arguments to check only those, or pass no
# arguments to check every disk found by `smartctl --scan`.
#
# Author: Dave Eddy <dave@daveeddy.com>
# Date: September 13, 2023
# License: MIT
devices=("$@")
if (( ${#devices[@]} == 0 )); then
    # no devices given, scan for them
    output=$(smartctl --scan)
    if (($? != 0)); then
        echo 'unknown: failed to enumerate devices'
        exit 3
    fi
    if [[ -z $output ]]; then
        echo 'unknown: no devices found to check'
        exit 3
    fi
    # store devices found in the "devices" array
    while read -r device _; do
        devices+=("$device")
    done <<< "$output"
fi
# get health for each device found - look for bad devices
bad_devices=()
for device in "${devices[@]}"; do
    # read device health
    # note: smartctl's exit status is a bitmask; any set bit is treated as
    # a failure to run here
    output=$(smartctl -a "$device")
    if (($? != 0)) || [[ -z $output ]]; then
        echo "unknown: failed to run - smartctl -a $device"
        exit 3
    fi
    # keep track of devices that don't report "OK"
    if ! grep -q '^SMART Health Status:.*OK$' <<< "$output"; then
        bad_devices+=("$device")
    fi
done
num_devices=${#devices[@]}
num_bad=${#bad_devices[@]}
if (( num_bad == 0 )); then
    echo "ok: all devices healthy ($num_devices total)"
    exit 0
else
    echo "critical: $num_bad devices unhealthy ($num_devices total)"
    exit 2
fi
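
The health test in the script looks for the line `SMART Health Status: OK`, which is the wording smartctl prints for SCSI/SAS devices. ATA/SATA devices instead report `SMART overall-health self-assessment test result: PASSED`. A minimal sketch of a check that accepts either wording follows; `is_healthy` is a hypothetical helper, and the sample text is canned output rather than a reading from a real disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): succeed if the
# smartctl output passed as $1 reports a healthy device in either the
# SCSI/SAS or the ATA/SATA wording.
is_healthy() {
    local output=$1
    grep -Eq '^SMART Health Status:.*OK$|^SMART overall-health self-assessment test result: PASSED$' <<< "$output"
}

# Example using canned output instead of a real device
sample=$'=== START OF READ SMART DATA SECTION ===\nSMART overall-health self-assessment test result: PASSED'
if is_healthy "$sample"; then
    echo 'healthy'
else
    echo 'unhealthy'
fi
```

In the loop above, `is_healthy "$output"` could stand in for the existing `grep -q` test if mixed device types need to be covered.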