Juul/hard_drive_ssd_smart_test.md

## hard_drive_ssd_smart_test.md

      
With `smartctl` you can talk to the built-in firmware of a hard drive or SSD, get detailed status information, and ask the drive to run self-tests.

First, install the smartmontools package:

    sudo apt install smartmontools

To do a quick health check (the drive's overall-health self-assessment, which can report PASSED even if the drive is having issues):

    sudo smartctl -H /dev/sdX
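
`smartctl` also reports the result in its exit status, which is handy for scripting. According to `man smartctl`, bit 3 (value 8) is set when the health check says the disk is failing, and bits 1 and 2 indicate that the device could not be opened or a command to it failed. A minimal bash sketch under those assumptions; the function name `check_health` is made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run smartctl's health check on each drive given
# and decode the exit status bits documented in `man smartctl`.
# Returns 1 if any drive reported "DISK FAILING", 0 otherwise.
check_health() {
    local dev status failed=0
    for dev in "$@"; do
        smartctl -H "$dev" > /dev/null 2>&1
        status=$?
        if (( status & 8 )); then
            # bit 3: the health self-assessment says the disk is failing
            echo "$dev: FAILING"
            failed=1
        elif (( status & 6 )); then
            # bits 1-2: device could not be opened or a command failed
            echo "$dev: could not read SMART status (exit $status)"
        else
            echo "$dev: health check passed"
        fi
    done
    return "$failed"
}

# Example (as root): check_health /dev/sda /dev/sdb
```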

Note that this doesn't always work through a USB adapter. smartctl can pass SMART commands through many common USB-to-HD/SSD adapters, but not all of them, and for some you have to specify the adapter type explicitly because the support is experimental, e.g.:

    smartctl -d sntjmicron ...   # for JMicron USB to NVMe adapters
    smartctl -d jmb39x,N ...     # for JMicron RAID SATA port multipliers (where N is the drive number)
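
If you don't know which `-d` type an adapter needs, one option is to try a few in turn and keep the first one with which `smartctl -i` (print the drive's identity information) succeeds. A rough bash sketch; the list of types tried (`auto`, `sat`, `sntjmicron`) and the function name `find_dev_type` are assumptions, and "succeeds" here only means that exit-status bits 1 and 2 are clear:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first smartctl device type (-d) that can
# read identity info from the given device. Returns 1 if none worked.
find_dev_type() {
    local dev=$1 type status
    # Types to try, in order; adjust this list to match your adapters.
    for type in auto sat sntjmicron; do
        smartctl -d "$type" -i "$dev" > /dev/null 2>&1
        status=$?
        # Bits 1 and 2 (values 2 and 4): device open failed / command failed.
        if (( (status & 6) == 0 )); then
            echo "$type"
            return 0
        fi
    done
    return 1
}

# Example (as root): find_dev_type /dev/sdX
```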

See `man smartctl` for specifics. Usually, though, the USB adapter is auto-detected.

To get a dump of a huge amount of information (like the other smartctl commands here, this needs root):

    smartctl -x /dev/sdX

### Testing

Drives usually support both a short and a long self-test. When you issue the command to start a self-test, the command returns immediately and the drive tests itself in the background. You can use the drive as normal during the test, but doing so may slow the test down. Checking whether the test is done and viewing the results take separate commands.

To start a test, run:

    # for a short test
    smartctl -t short /dev/sdX
    # or for a long test
    smartctl -t long /dev/sdX

On a modern SSD, expect something like 2 minutes for a short test and 10+ minutes for a long test.

To check how much of the test remains:

    smartctl -a /dev/sdX | grep "test remaining"

The output should be something like:

    70% of test remaining.

If there is no output, no test is running: it has either completed, failed early, or never started.
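
Starting a test and waiting for it to finish can be combined into one loop that polls for the `test remaining` line until it disappears and then prints the self-test log. A sketch assuming a SATA drive (the `NN% of test remaining` wording is what smartctl shows for ATA self-tests) and a default 60-second polling interval; the function name `run_selftest` and its arguments are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a SMART self-test and poll until smartctl
# stops reporting "NN% of test remaining", then print the self-test log.
# Arguments: device, test kind (short|long, default short),
# polling interval in seconds (default 60).
run_selftest() {
    local dev=$1 kind=${2:-short} interval=${3:-60} line
    smartctl -t "$kind" "$dev" > /dev/null
    # bits 1-2 of the exit status: device open failed / command failed
    if (( $? & 6 )); then
        echo "could not start $kind test on $dev" >&2
        return 1
    fi
    # Sleep first, then check; the loop ends once grep finds no progress line.
    while sleep "$interval"
          line=$(smartctl -a "$dev" | grep -o '[0-9]*% of test remaining'); do
        echo "$(date +%T) $line"
    done
    smartctl -l selftest "$dev"
}

# Example (as root): run_selftest /dev/sdX long
```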

The drive keeps a log of recent self-tests and their results (on ATA drives it holds up to 21 entries), which you can view by running:

    smartctl -l selftest /dev/sdX

You will see a table like this:

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%      8708         -
    # 2  Vendor (0x50)       Completed without error       00%         0         -
    # 3  Short offline       Completed without error       00%      8453         -
    # 4  Short offline       Completed without error       00%         3         -

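The top entry (`# 1`) is the most recent test. If its Status is anything other than "Completed without error", or its Remaining is anything other than "00%", the test did not pass. "Aborted by host" or "Interrupted (host reset)" only mean the test didn't finish (for example because the machine was suspended or rebooted) and should be rerun, but a failure such as "Completed: read failure" means the drive has a problem. In that case you should probably get all data off it as quickly as possible and then stop using it.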
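
The check described above can be scripted by reading the `# 1` row of the log. A minimal bash sketch, assuming the ATA log layout shown above with columns separated by runs of two or more spaces (NVMe drives print a different table); the function name `last_selftest_ok` is hypothetical, and it reads the log on standard input:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -l selftest` output (ATA layout) on
# stdin and succeed only if the newest entry ("# 1") says
# "Completed without error" with "00%" remaining.
last_selftest_ok() {
    local row status remaining
    row=$(grep -m 1 '^# *1 ') || { echo "no self-test entries"; return 1; }
    # Columns are separated by runs of two or more spaces:
    # "# 1" | description | status | remaining | lifetime | first error LBA
    status=$(awk -F '  +' '{ print $3 }' <<< "$row")
    remaining=$(awk -F '  +' '{ print $4 }' <<< "$row")
    if [[ $status == "Completed without error" && $remaining == "00%" ]]; then
        return 0
    fi
    echo "last test: $status, remaining: $remaining"
    return 1
}

# Example (as root): smartctl -l selftest /dev/sdX | last_selftest_ok && echo OK
```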
### Wiping drives

SSDs do something called "wear leveling", which tries to spread writes across different parts of the drive as much as possible. Unfortunately this means that when you write to e.g. the zeroth byte on an SSD, the actual write could go to any physical location on the drive; the drive's firmware keeps track of which physical location is currently mapped as byte zero. Even writing to every byte, one after the other, is not enough to wipe the drive, since there are more physical bytes than are accessible at any one time. The extra capacity is there so that if parts of the flash memory fail, they can simply be unmapped, and the drive keeps functioning until it runs out of "extra" bytes.
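
A toy model can make the point about spare capacity concrete. The bash sketch below is a deliberate simplification, not how any real firmware works: 4 logical blocks are backed by 6 physical blocks, every write goes to a physical block that is not currently mapped, and the old copy is only unmapped, never erased. After "wiping" by overwriting every logical block with zeros, the OS reads back only zeros, yet two physical blocks still hold the old data. The function name `simulate_wipe` is made up for illustration:

```bash
#!/usr/bin/env bash
# Toy model for illustration only, not real SSD firmware: LOGICAL blocks
# visible to the OS are mapped onto PHYSICAL flash blocks. Every write
# goes to the next physical block not currently mapped (round-robin,
# standing in for wear leveling); the old copy is only unmapped, so its
# data stays in the flash.
simulate_wipe() {
    local LOGICAL=4 PHYSICAL=6 next=0 pass l p data n=0 reads=""
    local -a phys=() map=()
    for pass in secret zero; do          # first write data, then "wipe"
        for (( l = 0; l < LOGICAL; l++ )); do
            # skip physical blocks that are mapped to a logical block
            while [[ " ${map[*]} " == *" $next "* ]]; do
                next=$(( (next + 1) % PHYSICAL ))
            done
            phys[next]=$pass
            map[l]=$next
            next=$(( (next + 1) % PHYSICAL ))
        done
    done
    # What the OS sees when it reads every logical block back
    for p in "${map[@]}"; do
        reads+="${phys[p]} "
    done
    # What is physically left in the flash
    for data in "${phys[@]}"; do
        [[ $data == secret ]] && n=$(( n + 1 ))
    done
    echo "logical blocks read back: ${reads% }"
    echo "physical blocks still holding old data: $n"
}
```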

The best way to wipe a drive is to use the hdparm utility to tell the drive's firmware to wipe itself (ATA Secure Erase). Unfortunately hdparm, unlike smartctl, doesn't have good support for USB adapters, so if your drive is connected over USB you're probably out of luck there.

There is a nice guide to wiping your drive with hdparm here: https://grok.lsu.edu/Article.aspx?articleid=16716
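
Before following such a guide, it is worth checking that the drive supports the ATA security feature set and is not "frozen": a frozen drive refuses the erase commands, and many systems freeze drives at boot. `hdparm -I` prints a "Security:" section with this information. A read-only bash sketch, assuming the usual `hdparm -I` layout (a line containing just `supported`, and `not` followed by whitespace and `frozen` when the drive isn't frozen); the function name `erase_precheck` is hypothetical and nothing is erased:

```bash
#!/usr/bin/env bash
# Hypothetical, read-only helper: inspect `hdparm -I` output on stdin and
# report whether an ATA Secure Erase looks possible. It erases nothing.
erase_precheck() {
    local info
    info=$(cat)
    # In the "Security:" section a bare "supported" line means the
    # security feature set is supported.
    if ! grep -Eq '^[[:space:]]*supported[[:space:]]*$' <<< "$info"; then
        echo "security feature set not reported as supported"
        return 1
    fi
    # hdparm prints "not<TAB>frozen" when the drive is not frozen.
    if ! grep -Eq '^[[:space:]]*not[[:space:]]+frozen' <<< "$info"; then
        echo "drive is frozen"
        return 1
    fi
    echo "security erase looks possible"
}

# Example (as root): hdparm -I /dev/sdX | erase_precheck
```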

The other way to wipe the drive relies on something called "trim" support. With trim enabled for the filesystem on your SSD, the filesystem tells the SSD which blocks are no longer in use whenever you delete something, so the drive can erase them. If you make sure this feature is enabled, you can simply delete all files and then use the fstrim command to tell the drive that all of the unused blocks can be erased. I'm honestly not sure how thorough this method is at wiping everything.
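
Trim only helps if the device, and anything between it and the filesystem such as a USB adapter, accepts discard requests. `lsblk --discard` reports this in its `DISC-MAX` column, which shows `0B` when discards are not supported. A small bash sketch; the function name `supports_trim` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if lsblk reports a non-zero maximum discard
# size (DISC-MAX) for the given block device, i.e. it accepts TRIM.
supports_trim() {
    local max
    max=$(lsblk --nodeps --noheadings --output DISC-MAX "$1") || return 2
    max=${max//[[:space:]]/}      # lsblk right-aligns the column
    [[ -n $max && $max != 0B ]]
}

# Example: supports_trim /dev/sdX && echo "trim supported"
```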

To enable trim support for your root filesystem, edit /etc/fstab and make sure the discard option is present. If your fstab looks something like this:

    # <file system>             <mount point>  <type>  <options>  <dump>  <pass>
    <YOUR_UUID>                 /              ext4    defaults   0       1

add discard like so:

    # <file system>             <mount point>  <type>  <options>          <dump>  <pass>
    <YOUR_UUID>                 /              ext4    defaults,discard   0       1

If you use neither LVM nor full disk encryption, rebooting should enable trim support and you can skip ahead to running fstrim.
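
After rebooting, you can confirm that the option took effect by looking at the filesystem's current mount options, which `findmnt` prints. A bash sketch; the function name `has_discard` is hypothetical and it checks `/` unless given another mount point:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem mounted at the given
# mount point (default /) currently has the "discard" mount option.
has_discard() {
    local opts
    opts=$(findmnt --noheadings --output OPTIONS --target "${1:-/}") || return 2
    # Options are comma-separated, e.g. "rw,relatime,discard".
    [[ ",$opts," == *,discard,* ]]
}

# Example: has_discard / && echo "discard is on"
```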
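
If you are using full disk encryption, you also need to edit /etc/crypttab to add discard in a similar way: if you have a line that ends in e.g. `luks,keyscript=/bin/cat`, change it to `luks,discard,keyscript=/bin/cat`. On Debian/Ubuntu, also run `sudo update-initramfs -u` so the change applies when the root filesystem is unlocked at boot.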
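
If you use LVM, edit /etc/lvm/lvm.conf and, in the `devices {` section, make sure `issue_discards` is present and set to `1`. (Strictly speaking, this setting makes LVM trim the space freed by removing or shrinking logical volumes; discards issued by a filesystem on a logical volume are passed down regardless.)

Reboot after any of these changes.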
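
After the reboot, the encryption and LVM settings can be checked too: `dmsetup table --target crypt` lists the active dm-crypt mappings (a mapping with discards enabled includes `allow_discards` in its table line), and `lvmconfig --type full devices/issue_discards` prints the effective LVM setting. A bash sketch under those assumptions; the function name `check_discard_layers` is hypothetical and it needs root:

```bash
#!/usr/bin/env bash
# Hypothetical helper (run as root): report whether each active dm-crypt
# mapping allows discards, then print LVM's effective issue_discards value.
check_discard_layers() {
    local name rest
    # `dmsetup table --target crypt` prints one "name: table" line per
    # crypt mapping; the table contains "allow_discards" when enabled.
    while IFS=: read -r name rest; do
        [[ -z $rest ]] && continue          # e.g. "No devices found"
        if [[ $rest == *allow_discards* ]]; then
            echo "$name: discards allowed"
        else
            echo "$name: discards NOT allowed"
        fi
    done < <(dmsetup table --target crypt)
    if command -v lvmconfig > /dev/null; then
        lvmconfig --type full devices/issue_discards
    fi
}
```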
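
Now, to have the SSD discard all unused filesystem blocks, simply run:

    sudo fstrim /

(fstrim works even without the discard mount option; discard only makes the filesystem trim continuously as files are deleted.)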