@yorickdowne
Last active September 14, 2023 10:42
Move Dell server from hardware RAID to software RAID

Overview

I was looking after a Dell PowerEdge R420 server with a hardware RAID card, a PERC8 H710P Mini. Connected to it were two WD Blue 3D NAND SATA SSDs in RAID-1, which handled a write-intensive database: Geth, in this case.

After 11 months of running without problems, I had reason to "resync" Geth, and the server could not keep up with the required IOPS. The symptoms were Geth's "Database compacting, degraded performance" messages repeating for hours without recovery, and read/write latency in excess of 20/70 ms, as measured by the r_await and w_await columns of sudo iostat -mdx.
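To watch only those two latency columns while a sync runs, a small filter can help. This is a minimal sketch and not part of the original setup: await_columns is a hypothetical helper name, and it looks the columns up by header name on the assumption that their position differs between sysstat versions.

```shell
# Print device name, r_await and w_await from `iostat -mdx` output read on stdin.
# await_columns is a hypothetical helper, not a sysstat tool.
await_columns() {
  awk '
    # Header line: remember where the two latency columns are.
    $1 == "Device" || $1 == "Device:" {
      for (i = 1; i <= NF; i++) {
        if ($i == "r_await") r = i
        if ($i == "w_await") w = i
      }
      next
    }
    # Data lines: only print once both columns have been located.
    r && w && NF >= r && NF >= w { print $1, $r, $w }
  '
}

# Usage (assumes sysstat is installed; 5-second samples):
#   sudo iostat -mdx 5 | await_columns
```

Sustained values far above what the same SSD delivered when new are the signal to look for.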

The cause was that the RAID controller does not support TRIM on the SSDs, so their performance degraded over time.
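One way to see whether TRIM reaches a drive is to check what the kernel reports for it. The sketch below reads discard_max_bytes from sysfs, where a value of 0 means the device does not accept discards. supports_discard and its second parameter are hypothetical, introduced here for illustration; that a disk behind a RAID-mode PERC reports 0 is an assumption based on the behavior described above.

```shell
# Return 0 if the kernel reports discard (TRIM) support for a block device,
# 1 if it does not, and 2 if the device cannot be found.
# $1 = device name without /dev/, e.g. sda
# $2 = sysfs root, defaults to /sys/block (hypothetical parameter, handy for testing)
supports_discard() {
  dev=$1
  sysfs_root=${2:-/sys/block}
  max=$(cat "$sysfs_root/$dev/queue/discard_max_bytes" 2>/dev/null) || return 2
  # 0 means discards are not passed to the device.
  [ "${max:-0}" -gt 0 ]
}

# Usage:
#   supports_discard sda && echo "sda accepts TRIM" || echo "sda does not accept TRIM"
# `lsblk --discard` shows the same information as a table (DISC-GRAN, DISC-MAX).
```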

Solution

To solve this, I decided to cross-flash the RAID controller to "IT" mode, which turns it into a plain HBA. It loses all RAID capability, but at that point I could use Linux software RAID and have TRIM support.

Cross-flash the PERC

Very, very carefully, cross-flash the PERC to "IT" mode to make it an HBA. Removing the battery first is a good idea. Make sure you follow the instructions for your exact card: the H710P Mini, for example, may exist in a non-blade server as well, and has different instructions than the H710 Mini. Getting this wrong can brick the controller. If your server is set to UEFI boot, make sure to also flash the optional UEFI boot ROM to the controller.

After this, the system should still boot from your primary disk, but you have lost RAID, so let's wipe the drives and set up software RAID.

Wipe and use software RAID-1

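Boot into a Linux Live ISO and wipe the Dell RAID markers from the drives, as well as all partition information. The commands below zero the first and last 2 MiB (4096 sectors of 512 bytes) of each drive. This destroys all data on these SSDs, so check with lsblk that /dev/sda and /dev/sdb really are the two SSDs before you run them.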

sudo dd if=/dev/zero of=/dev/sda bs=512 count=4096 seek=$(( $(sudo blockdev --getsz /dev/sda) - 4096 ))
sudo dd if=/dev/zero of=/dev/sda bs=512 count=4096
sudo dd if=/dev/zero of=/dev/sdb bs=512 count=4096 seek=$(( $(sudo blockdev --getsz /dev/sdb) - 4096 ))
sudo dd if=/dev/zero of=/dev/sdb bs=512 count=4096
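Because these commands are destructive, it can help to print them for review before running anything. The following is a dry-run sketch only: wipe_cmds is a hypothetical helper that takes the device path and its size in 512-byte sectors (the number that blockdev --getsz reports) and echoes the two dd commands without executing them.

```shell
# Print (do not run) the dd commands that zero the last and first 2 MiB of a device.
# $1 = device path, $2 = device size in 512-byte sectors (from `blockdev --getsz`)
wipe_cmds() {
  dev=$1
  sectors=$2
  count=4096                          # 4096 sectors x 512 bytes = 2 MiB
  tail_start=$((sectors - count))     # first sector of the final 2 MiB
  echo "dd if=/dev/zero of=$dev bs=512 count=$count seek=$tail_start"
  echo "dd if=/dev/zero of=$dev bs=512 count=$count"
}

# Usage:
#   wipe_cmds /dev/sda "$(sudo blockdev --getsz /dev/sda)"
```

Reading the printed seek value against the drive's sector count is a quick sanity check that the tail write ends exactly at the end of the device.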

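Verify that the markers are gone with sudo wipefs /dev/sda, sudo wipefs /dev/sdb and sudo dmraid -r. Without options, wipefs only lists signatures and erases nothing, so both calls should list none.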

Boot into an Ubuntu Server install ISO and follow instructions to set up software RAID-1.

Lastly, once you have booted into your fresh Ubuntu Server install, run sudo fstrim -av to trim all mounted filesystems that support it.
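It is also worth confirming that the new mirror has both members active. The sketch below scans /proc/mdstat for a status field such as [U_], which marks a missing or failed member. mdstat_healthy is a hypothetical helper name, and the file argument exists only so the check can be run against a saved copy.

```shell
# Return 0 if no md array in the given mdstat file shows a missing member,
# 1 if at least one array is degraded, 2 if the file cannot be read.
# Note: a file that lists no arrays at all also returns 0.
mdstat_healthy() {
  file=${1:-/proc/mdstat}
  [ -r "$file" ] || return 2
  # Healthy two-disk mirrors show [UU]; an underscore marks a missing member.
  if grep -Eq '\[[U_]*_[U_]*\]' "$file"; then
    return 1
  fi
  return 0
}

# Usage:
#   mdstat_healthy && echo "all md arrays have every member up"
```

For more detail on a single array, sudo mdadm --detail /dev/md0 shows its state and member disks; md0 is an assumed name here, the installer may pick a different one.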

After this, your SSDs should be as performant as they were when new, or nearly so. In my testing, read/write latency as measured by sudo iostat -mdx during Geth sync came back down to 0.8/2.5 ms, or thereabouts.

Resources

The latest 9400/9500 series controllers do support TRIM. For Dell, I believe that is the PERC11 line (H755(N), H750, H350 and H355).

PERC9 does not support TRIM in RAID mode (H330, H730, H730P, H730P MX, and H830 cards). It can in JBOD, but at that point you're better off cross-flashing it to a plain HBA. Ditto PERC10 (H345, H740P, H745, H745P MX, and H840 cards).

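Software RAID-1 in Linux has roughly the same write performance as a single disk, since every write goes to both drives, and can up to double the read performance for multi-file reads, since those can be spread across both drives.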
