With smartctl you can talk to the HDD/SSD's built-in firmware, get detailed status information, and ask it to run self-tests.
First, install it:
sudo apt install smartmontools
To do a quick health test (which can report OK even if the drive is having issues):
sudo smartctl -H /dev/sdX
Note that this doesn't always work through a USB adapter. smartctl has support for passing SMART commands through many common USB-to-HDD/SSD adapters, but not all, and some of them you have to specify explicitly because the support is experimental, e.g.:
smartctl -d sntjmicron ... # for JMicron USB to NVMe adapters
smartctl -d jmb39x,N ... # for JMicron RAID SATA port multipliers (where N is the drive number)
See man smartctl for specifics. Usually the USB adapter is just auto-detected, though.
Now, to get a dump of a lot of information, you can use:
smartctl -x /dev/sdX
Drives usually support a long and short self-test. When you issue the command to start a self-test, the command will immediately complete and the drive will begin testing itself in the background. You can use the drive as normal while the test is happening but it may slow the test. You will have to use a different command to check if the test is done and view the test results.
To start a test do:
# for a short test
smartctl -t short /dev/sdX
# or for a long test
smartctl -t long /dev/sdX
On a modern SSD, expect something like 2 mins for a short test and 10+ mins for a long test.
To check how much of the test is remaining:
smartctl -a /dev/sdX | grep "test remaining"
The output should be something like:
70% of test remaining.
If you don't see any output that means the test is done, either because it completed or failed early (or never started).
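The check above can be wrapped in a small polling loop. This is only a sketch: the function names are made up for illustration, the 30-second interval is arbitrary, and it assumes the "NN% of test remaining" wording shown above.

```shell
#!/usr/bin/env bash
# Print the percentage of the self-test still remaining, read from
# `smartctl -a` output on stdin. Prints nothing if no test is in progress.
selftest_remaining() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)% of test remaining.*/\1/p' | head -n 1
}

# Poll the given device until no "test remaining" line is reported.
# (Hypothetical helper; the sleep interval is an arbitrary choice.)
wait_for_selftest() {
    local dev="$1" left
    while left="$(smartctl -a "$dev" | selftest_remaining)" && [ -n "$left" ]; do
        echo "${left}% of test remaining"
        sleep 30
    done
}
```

Running wait_for_selftest /dev/sdX (as root) would return once the test is no longer in progress; it does not say whether the test passed, so the results still need to be checked afterwards.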
The tests recorded in the drive's self-test log, along with their results, can be viewed by running:
smartctl -l selftest /dev/sdX
You will see a table like this:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8708         -
# 2  Vendor (0x50)       Completed without error       00%         0         -
# 3  Short offline       Completed without error       00%      8453         -
# 4  Short offline       Completed without error       00%         3         -
The top entry is the most recently completed test. If it says something other than "Completed without error" in Status, or something other than "00%" in Remaining, then the drive has a problem. You should probably get all data off it as quickly as possible and then stop using it.
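That rule could be automated along these lines. It is a rough sketch that assumes the table layout shown above, with the newest entry on the line starting with "# 1"; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read `smartctl -l selftest` output on stdin and check the newest entry.
# Returns 0 if it completed without error with 00% remaining,
# 1 if it did not, and 2 if no "# 1" entry was found at all.
latest_selftest_ok() {
    local line
    line="$(grep -m 1 -E '^#[[:space:]]*1[[:space:]]')" || return 2
    case "$line" in
        *"Completed without error"*) ;;
        *) return 1 ;;
    esac
    # The Remaining column must read 00%.
    printf '%s\n' "$line" | grep -Eq '[[:space:]]00%[[:space:]]'
}
```

For example: smartctl -l selftest /dev/sdX | latest_selftest_ok && echo "last self-test passed"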
SSDs do a thing called "wear leveling", which tries to spread writes across different parts of the drive as much as possible. Unfortunately, this means that when you write to e.g. the zeroth byte on an SSD, the actual write could land on any arbitrary byte of the flash; the drive's firmware keeps track of which real byte is currently mapped as byte zero. Even writing to all bytes one after the other is not enough to wipe the drive, since there are more real bytes than are accessible at any time. The extra bytes are there so that if any parts of the flash memory fail they can simply be unmapped, and the drive keeps functioning until it runs out of "extra" bytes.
The best way to wipe a drive is to use the hdparm utility to tell the drive's firmware to wipe itself. Unfortunately hdparm, unlike smartctl, doesn't have good support for USB adapters, so if your drive is connected over USB then you're probably out of luck there.
There is a nice guide on how to use hdparm to wipe your drive here: https://grok.lsu.edu/Article.aspx?articleid=16716
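For orientation, here is a rough outline of the commonly documented ATA Secure Erase sequence with hdparm. It is not taken from the linked guide, so follow the guide for the real procedure. The helper names and the throwaway password "p" are placeholders, the "not frozen" check assumes the usual wording of the Security section in hdparm -I output, and the second function only prints the commands rather than running them, because they destroy all data on the drive.

```shell
#!/usr/bin/env bash
# Succeeds if `hdparm -I` output on stdin contains a "not frozen" line.
# Assumption: the Security section lists the state as "not<TAB>frozen".
security_not_frozen() {
    grep -Eq '^[[:space:]]*not[[:space:]]+frozen[[:space:]]*$'
}

# Print (do NOT run) the two erase commands for a device.
# $1 = device, $2 = temporary password (placeholder, defaults to "p").
print_secure_erase_cmds() {
    local dev="$1" pass="${2:-p}"
    printf 'hdparm --user-master u --security-set-pass %s %s\n' "$pass" "$dev"
    printf 'hdparm --user-master u --security-erase %s %s\n' "$pass" "$dev"
}
```

A cautious workflow would be to run sudo hdparm -I /dev/sdX | security_not_frozen first, and only if that succeeds, review the output of print_secure_erase_cmds /dev/sdX before typing the commands by hand.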
The other way to wipe the drive relies on something called "trim" support. Having trim support enabled for your filesystem on your SSD means that blocks freed by deleting files are reported to the SSD so that it can actually erase them. If you ensure that this feature is enabled, then you can simply delete all files and then use the fstrim command to ensure all of the unused blocks are erased. I'm honestly not sure how good this method is at wiping everything.
To enable trim support for your root filesystem, edit /etc/fstab to ensure the option discard is there. If your fstab looks something like this:
# <file system> <mount point> <type> <options> <dump> <pass>
<YOUR_UUID> / ext4 defaults 0 1
add discard like so:
# <file system> <mount point> <type> <options> <dump> <pass>
<YOUR_UUID> / ext4 defaults,discard 0 1
If you use neither LVM nor full disk encryption, rebooting should enable trim support and you can skip the next sections.
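The fstab edit described above could also be scripted. The following is a minimal sketch, assuming a whitespace-separated fstab like the example; the function name is made up, and it only prints the modified text so you can review it before replacing anything.

```shell
#!/usr/bin/env bash
# Read fstab text on stdin and print it with "discard" appended to the
# options of the root (/) entry, unless the option is already present.
# Note: awk rewrites the edited line with single spaces between fields.
add_discard_to_root() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { print; next }   # leave comments and short lines alone
        $2 == "/" && $4 !~ /(^|,)discard(,|$)/ { $4 = $4 ",discard" }
        { print }
    '
}
```

For example, add_discard_to_root < /etc/fstab > /tmp/fstab.new writes a candidate file that can be compared with diff against the original before copying it into place by hand.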
If you are using full disk encryption, then you need to edit /etc/crypttab to add discard in a similar way. If you have a line that ends in e.g. luks,keyscript=/bin/cat then change it to luks,discard,keyscript=/bin/cat.
If you use LVM, then edit /etc/lvm/lvm.conf and, in the devices { section, ensure that issue_discards is present and set to 1.
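A quick way to confirm the LVM setting after editing is to look for an uncommented issue_discards = 1 line. This sketch (the function name is hypothetical) matches on the text only and does not verify that the line sits inside the devices section.

```shell
#!/usr/bin/env bash
# Succeeds if the lvm.conf text on stdin has an uncommented
# "issue_discards = 1" line. Commented-out lines are ignored.
issue_discards_enabled() {
    grep -Eq '^[[:space:]]*issue_discards[[:space:]]*=[[:space:]]*1([[:space:]]|$)'
}
```

For example: issue_discards_enabled < /etc/lvm/lvm.conf && echo "issue_discards is on"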
Reboot after any of these changes.
Now, to properly erase all unused filesystem blocks, simply run:
sudo fstrim /
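Before relying on fstrim, it can be worth checking that the kernel sees discard support on the underlying device at all. One way, sketched here, is to read discard_max_bytes from sysfs, where a value of 0 means discards are not supported. The function name is made up, and the second argument exists only so the check can be pointed at a different directory for testing.

```shell
#!/usr/bin/env bash
# Succeeds if the kernel reports discard (trim) support for a block device.
# $1 = device name such as sda; $2 = sysfs root (defaults to /sys).
supports_discard() {
    local f="${2:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
}
```

For example: supports_discard sda && sudo fstrim -v / (the -v flag makes fstrim report how many bytes it trimmed).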