@tommybutler
Last active June 3, 2024 09:02
Script to quickly scan the S.M.A.R.T. health status of all your hard drive devices in Linux (at least all the ones from /dev/sda to /dev/sdzz). You need smartctl installed on your system for this script to work, and your hard drives need to have S.M.A.R.T. capabilities (they probably do).
#!/bin/bash
# install the smartmontools package first; it provides smartctl (apt-get install smartmontools)
# Prompt for sudo credentials up front and bail out if they are refused
if ! sudo true
then
    echo 'Root privileges required'
    exit 1
fi
for drive in /dev/sd[a-z] /dev/sd[a-z][a-z]
do
    # Skip glob patterns that matched no device
    if [[ ! -e $drive ]]; then continue; fi
    echo -n "$drive "
    # Pull the verdict (e.g. PASSED) off the overall-health line
    smart=$(
        sudo smartctl -H "$drive" 2>/dev/null |
        grep '^SMART overall' |
        awk '{ print $6 }'
    )
    [[ "$smart" == "" ]] && smart='unavailable'
    echo "$smart"
done
@BloodBlight

Ya, the uncorrected errors would definitely land that disk on my "SUS" list! Even the volume of fast ECC corrections would get it a "warn" from me. Sometimes uncorrected errors just happen though, and that isn't a big number. But I can totally see wanting that out of your array!
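
A rough sketch of that triage rule, in case it helps. It assumes a SAS/SCSI drive, where `smartctl -a` prints an "Error counter log" table with `read:`, `write:` and `verify:` rows. The function name `classify_error_log` and the `FAST_ECC_WARN` threshold are made up for illustration; pick whatever number fits your own drives.

```bash
#!/bin/bash
# Hypothetical helper, not part of the gist: reads `smartctl -a` output on
# stdin and prints SUS, warn, ok, or unavailable.
# Assumed thresholds: any uncorrected error => SUS; total fast ECC
# corrections above FAST_ECC_WARN => warn. The default is an arbitrary guess.
FAST_ECC_WARN=${FAST_ECC_WARN:-1000000}

classify_error_log() {
    awk -v warn="$FAST_ECC_WARN" '
        # Table rows have 8 fields: label, fast ECC, delayed ECC,
        # rereads/rewrites, total corrected, algorithm invocations,
        # gigabytes processed, total uncorrected errors.
        $1 ~ /^(read|write|verify):$/ && NF == 8 {
            seen = 1
            fast += $2
            uncorrected += $8
        }
        END {
            if (!seen)                print "unavailable"
            else if (uncorrected > 0) print "SUS"
            else if (fast > warn)     print "warn"
            else                      print "ok"
        }
    '
}
```

You would feed it something like `sudo smartctl -a /dev/sda | classify_error_log`. ATA drives do not print this table, so they come back as `unavailable`.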

If it is a performance-sensitive production environment, I would 100% yank that drive just because. Those retries can cause odd performance issues for customers that are almost impossible to trace back to the drive.

I do have some HGSTs that are 8 years old now and still going strong, some with almost as many fast ECC errors, but I have evicted others just because some of those metrics were increasing at an unhealthy rate. I have the luxury of having 4 parity disks though. With that plus regular scrubbing, I might keep that one in my cluster unless it got worse. If I didn't have that, I would not trust it. But I am a cheapskate when it comes to my home lab!
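
For the "increasing at an unhealthy rate" part, here is a minimal sketch of how a growth rate could be worked out from two readings of the same counter. `growth_per_day` is a hypothetical name, and the idea of saving a count together with a Unix timestamp (for example from `date +%s`) is an assumption, not something the script above does.

```bash
#!/bin/bash
# Hypothetical helper: average growth of a counter per day between two
# readings. Usage: growth_per_day OLD_COUNT OLD_EPOCH NEW_COUNT NEW_EPOCH
# Epochs are Unix timestamps in seconds; the result uses integer division.
growth_per_day() {
    local old_count=$1 old_epoch=$2 new_count=$3 new_epoch=$4
    local elapsed=$(( new_epoch - old_epoch ))
    # Refuse to divide by zero or by a negative interval
    if (( elapsed <= 0 )); then
        echo "unavailable"
        return 1
    fi
    echo $(( (new_count - old_count) * 86400 / elapsed ))
}
```

For example, `growth_per_day 100 0 300 172800` covers two days and a rise of 200, so it prints `100`. What counts as "unhealthy" is a judgment call per drive model, so no cutoff is baked in.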