@papamoose
Created January 6, 2022 20:11
ZFS Health Check & Discord Notification Bash Script

Summary

This is a script I made to check the health of the ZFS pools on my Ubuntu server and send a summary to a channel on the Discord server I have set up for my servers (see the image of an example notification below). I borrowed and modified some parts of the actual ZFS health check from this Gist. The script checks each pool's overall condition, capacity, errors and time since the last scrub. If an issue is detected with a pool, a role on the Discord server is pinged.

This script is only tested on Ubuntu Server 20.04.
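
Of the four checks, the scrub age is the one that involves some date arithmetic. A minimal sketch of that check in isolation, assuming GNU `date` (as on Ubuntu); the function name `scrub_is_overdue` and its arguments are hypothetical and not part of the script itself:

```shell
#!/usr/bin/bash
# Hypothetical helper: exit status 0 when the last scrub is at least maxDays old.
# Arguments: last scrub date (any format GNU date -d accepts), max age in days,
# and optionally the current time as epoch seconds (defaults to now).
scrub_is_overdue() {
  local lastScrub="$1" maxDays="$2" now="${3:-$(date +%s)}"
  local lastEpoch
  # Convert the date string to epoch seconds; return 2 if it cannot be parsed.
  lastEpoch=$(date -d "$lastScrub" +%s) || return 2
  # One day is 24 * 60 * 60 seconds.
  (( now - lastEpoch >= maxDays * 24 * 60 * 60 ))
}

# Example: compare a made-up scrub date against a 36 day limit.
if scrub_is_overdue "2022-01-01 00:00:00 UTC" 36; then
  echo "scrub overdue"
else
  echo "scrub within limit"
fi
```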

Instructions

Copy the two bash files to a Linux server with ZFS pools and modify them as required for your distro/version. Fill in the Discord variables in the discord-variables.sh file.

You can get the ID (numeric) and token (alphanumeric) by going into the settings for the Discord channel you want the notifications sent to and creating a new webhook under Integrations -> Webhooks. Create a webhook and click 'Copy Webhook URL'; the ID and the token are the last two parts of the copied URL.
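
The script builds the webhook address as `https://discord.com/api/webhooks/<ID>/<token>`, so both values can be read off a copied URL. A minimal sketch of splitting such a URL; the function name `split_webhook_url` and the example URL are made up for illustration:

```shell
#!/usr/bin/bash
# Hypothetical helper: split a copied webhook URL into its ID and token parts.
# Assumes the URL has the form https://discord.com/api/webhooks/<ID>/<TOKEN>.
split_webhook_url() {
  local url="${1%/}"        # Drop a trailing slash, if any.
  local token="${url##*/}"  # The last path segment is the token.
  local rest="${url%/*}"    # Everything before the token.
  local id="${rest##*/}"    # The segment before the token is the ID.
  printf 'DISCORD_ID=%s\nDISCORD_TOKEN=%s\n' "$id" "$token"
}

# Example with a made-up URL (not a real webhook).
split_webhook_url "https://discord.com/api/webhooks/123456789012345678/abcDEF-ghi_JKL"
```

The two printed lines can be pasted into discord-variables.sh.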

The user is just a string with any username you want to appear on the notification. The avatar URL and the message icon are URLs to the image files you want to use in the notification. I use this icon for the icon that shows beside the pool name.

The role ID is the Discord ID of a role you create on your Discord server, which gets pinged when any issues are detected. You can get the role ID by typing \@rolename in a channel on your Discord server.
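
Typing \@rolename makes Discord show the raw mention, which has the form `<@&ROLE_ID>` (the same form the script uses when it pings the role). A small sketch for pulling the number out of that text; `role_id_from_mention` is a hypothetical helper and the ID below is made up:

```shell
#!/usr/bin/bash
# Hypothetical helper: extract the numeric role ID from a raw role mention.
# Assumes the mention has the form <@&ROLE_ID>.
role_id_from_mention() {
  local mention="$1"
  # Keep the regex in a variable so bash does not try to parse < and > itself.
  local re='^<@&([0-9]+)>$'
  if [[ "$mention" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Example with a made-up role ID.
role_id_from_mention '<@&987654321098765432>'
```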

Create a cron job to run the script at any interval you desire.
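
As a rough sketch, the snippet below prints a crontab line that runs the check every day at 08:00. The directory `/opt/zfs-health-check` and the file name `zfs-health-check.sh` are assumptions, since the main script's file name and location are up to you. The entry changes into the script's directory first so the script finds discord-variables.sh next to it.

```shell
#!/usr/bin/bash
# Hypothetical location and name of the main script; adjust to your setup.
scriptDir="/opt/zfs-health-check"
scriptName="zfs-health-check.sh"

# Crontab fields: minute hour day-of-month month day-of-week command.
# "0 8 * * *" means every day at 08:00.
cronLine="0 8 * * * cd ${scriptDir} && ./${scriptName}"

# Print the line; add it to the crontab of a user allowed to run zpool,
# for example via "crontab -e".
printf '%s\n' "$cronLine"
```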

[Image: zfs-health-check-discord-notification-example]

#!/usr/bin/bash
# discord-variables.sh: secret values sourced by the health check script.
DISCORD_TOKEN=
DISCORD_ID=
DISCORD_USER=
DISCORD_AVATAR_URL=
DISCORD_MESSAGE_ICON=
DISCORD_ROLE_ID=
#!/usr/bin/bash
#### DISCORD VARIABLES ####
# Discord webhook address base.
discordWebhookBase="https://discord.com/api/webhooks"
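# Import secret variables (path resolved relative to this script, so it also works when run from cron).
source "$(dirname "${BASH_SOURCE[0]}")/discord-variables.sh"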
# Discord secret variables.
discordToken=${DISCORD_TOKEN}
discordID=${DISCORD_ID}
discordUsername=${DISCORD_USER}
discordAvatarURL=${DISCORD_AVATAR_URL}
discordRoleID=${DISCORD_ROLE_ID}
# Icon added to the message.
# Downloaded and hosting this (https://materialdesignicons.com/icon/heart-pulse)
discordMessageIcon=${DISCORD_MESSAGE_ICON}
#### INITIALIZATION & PARAMETERS ####
# Get date.
currentDate=$(date +%s)
# Declare the array to store ZFS pools.
declare -a pools
# Initialize pool number.
poolNumber=0
# Initialize JSON report string.
poolDiscordJsonReport=""
# Define keywords signifying an unhealthy condition.
unhealthyConditions='(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover)'
# Define max capacity (%).
maxCapacity=85
# Define scrub expiration time (days).
scrubExpirationDays=36
#### ZFS POOLS HEALTH CHECK ####
# Populate the array with system ZFS pools (-H drops the header line, -o name prints only the pool names).
mapfile -t pools < <(/sbin/zpool list -H -o name)
# Get total number of ZFS pools.
totalPools="${#pools[@]}"
# Check each pool.
for pool in "${pools[@]}"
do
# Initial issues flag.
poolIssues=0
# Current pool name.
poolName=${pool}
# Increment pool number.
poolNumber=$((poolNumber+1))
#### GENERAL CONDITION #####
# Set initial condition text.
poolConditionSubText="No unhealthy states found from pool status."
# Search for any term signifying an unhealthy condition.
poolCondition=$(/sbin/zpool status "${pool}" | grep -Ei "${unhealthyConditions}")
# Set flag on any hits.
if [ "${poolCondition}" ]; then
poolIssues=1
poolConditionSubText="There might be an issue with the pool health. Run "\`"zpool status ${pool}"\`" for details."
fi
# Get the overall pool state (the line starting with "state:") and set notification text.
poolOverallCondition=$(/sbin/zpool status "${pool}" | awk '$1 == "state:" {print $2}')
poolConditionText="${poolConditionSubText} Overall state is reporting as **${poolOverallCondition}** for this pool."
#### CAPACITY #####
# Set initial condition text.
poolCapacitySubText="The pool spare capacity is within limits."
# Get current capacity used.
poolCapacity=$(/sbin/zpool list -H -o capacity "${pool}" | cut -d'%' -f1)
# Set flag if at or over max capacity.
if [ "${poolCapacity}" -ge "${maxCapacity}" ]; then
poolIssues=1
poolCapacitySubText="The pool spare capacity is **low**. Run "\`"zpool list ${pool}"\`" for details."
fi
# Set notification text.
poolCapacityText="${poolCapacitySubText} The used capacity at current is **${poolCapacity}%** for this pool."
#### ERRORS #####
# Set initial condition text.
poolErrorsSubText="No errors found in read, write or checksum fields."
# Set no errors text.
poolErrorStatus="No known data errors"
# Search for error codes or text.
poolErrors=$(/sbin/zpool status ${pool} | grep ONLINE | grep -v state | awk '{print $3 $4 $5}' | grep -v 000)
poolErrorReport=$(/sbin/zpool status -v ${pool} | grep errors: | awk '{$1=""; print $0 }' | grep -v "${poolErrorStatus}")
# Set flag if any errors found.
if [ "${poolErrors}" ] || [ "${poolErrorReport}" ]; then
poolIssues=1
poolErrorsSubText="Errors were found in the read, write or checksum fields. Run "\`"zpool status -v ${pool}"\`" for details."
# Keep the default status text if the errors: line itself reports nothing.
poolErrorStatus=${poolErrorReport:-${poolErrorStatus}}
fi
# Set notification text.
poolErrorsText="${poolErrorsSubText} The pool status reports **${poolErrorStatus}**."
#### SCRUB #####
# Set initial condition text.
poolScrupSubText="The pool scrub date has not yet been exceeded."
# Reset the special condition text so it does not carry over from the previous pool.
poolScrupWarningText=""
# Get any special conditions.
if [ "$(/sbin/zpool status "${pool}" | grep -Ec "none requested")" -ge 1 ]; then
poolScrupWarningText="No status to report. A "\`"zpool scrub ${pool}"\`" command must be run before this script can monitor the scrub expiration time."
fi
if [ "$(/sbin/zpool status "${pool}" | grep -Ec "scrub in progress|resilver")" -ge 1 ]; then
poolScrupWarningText="A pool scrub or resilver is currently in progress."
fi
# Check if any special condition.
if [ "${poolScrupWarningText}" ]; then
# Set notification text.
poolScrubText=${poolScrupWarningText}
else
# Get the last scrub date.
poolScrubRawDate=$(/sbin/zpool status ${pool} | grep scrub | awk '{print $11" "$12" " $13" " $14" "$15}')
poolScrubDate=$(date -d "$poolScrubRawDate" +%s)
# Convert expiration to seconds (24 hours per day).
scrubExpirationSeconds=$((24 * 60 * 60 * scrubExpirationDays))
# Check if next scrub date has expired.
if [ $((currentDate - poolScrubDate)) -ge "${scrubExpirationSeconds}" ]; then
poolIssues=1
poolScrupSubText="The pool scrub date is overdue. Check the scrub cron jobs in */etc/cron.d/zfsutils-linux* or run "\`"zpool scrub ${pool}"\`" to initiate a manual scrub."
fi
# Set notification text.
poolScrubText="${poolScrupSubText} Last scrub was performed on **${poolScrubRawDate}** for this pool."
fi
#### POOL JSON STRING ####
# Set JSON notification color.
if [ ${poolIssues} -eq 0 ]; then
discordNotficationColor=166160
discordDescription="No issues found for this pool."
else
discordNotficationColor=15606820
discordDescription="There are issues detected for this pool. Actions required by <@&${discordRoleID}>"
fi
# Build pool JSON string for Discord notification.
poolDiscordJsonReport=''${poolDiscordJsonReport}'
{
"title": "**Health Summary**",
"author": {
"name": "Pool #'${poolNumber}' ('${poolName}')",
"icon_url": "'"${discordMessageIcon}"'"
},
"description":"'"${discordDescription}"'",
"color":"'"${discordNotficationColor}"'",
"fields": [
{
"name": "Condition",
"value":"'"${poolConditionText}"'"
},
{
"name": "Capacity",
"value": "'"${poolCapacityText}"'"
},
{
"name": "Errors",
"value": "'"${poolErrorsText}"'"
},
{
"name": "Scrub",
"value": "'"${poolScrubText}"'"
}]
},'
done
#### DISCORD NOTIFICATION ####
# Complete the Discord JSON string.
discordJson='{ "username":"'"${discordUsername}"'",
"content":"Report from ZFS health check script on **'$HOSTNAME'** server. The script checked **'${totalPools}'** pools.",
"avatar_url":"'"${discordAvatarURL}"'",
"allowed_mentions": {
"roles": [ "'"${discordRoleID}"'" ]
},
"embeds": [ '${poolDiscordJsonReport%?}' ]
}'
# Send Discord notification.
curl -H "Content-Type: application/json" -d "${discordJson}" "${discordWebhookBase}/${discordID}/${discordToken}"
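
The report strings are placed into the JSON payload as they are, so a double quote, backslash or newline in the zpool output would produce invalid JSON and the webhook call would be rejected. A minimal sketch of an escaping helper in plain bash; `json_escape` is a hypothetical name and not part of the script above, and it only covers backslash, double quote, newline, tab and carriage return:

```shell
#!/usr/bin/bash
# Hypothetical helper: escape a string so it can sit between double quotes in JSON.
# Covers only backslash, double quote, newline, tab and carriage return.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}      # Backslash first, so later escapes are not doubled.
  s=${s//\"/\\\"}      # Double quote.
  s=${s//$'\n'/\\n}    # Newline.
  s=${s//$'\t'/\\t}    # Tab.
  s=${s//$'\r'/\\r}    # Carriage return.
  printf '%s' "$s"
}

# Example with a made-up status line containing quotes.
json_escape 'state is "DEGRADED"'
echo
```

One way to use it would be to wrap a text variable before it goes into the payload, for example `"$(json_escape "${poolErrorsText}")"`.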