Walkthrough of what I did to increase performance on my Synology NAS box during an expansion, and afterwards.

Performance on Synology RAIDs

(especially while expanding)

Warning: The exact commands may not match your particular Linux OS / Synology (NAS) device. I had to customize the commands after exploring my particular system's setup.

If you're new to Linux, or this is a new piece of hardware / a new Synology device, jump down to the section called "Inspecting a setup".

Contents

  1. Links
  2. Figuring out I had a problem
  3. Fixing the problem (Temporarily)
  4. Results
  5. Cleanup
  6. Improving Performance Permanently
  7. Inspecting a setup

Links

  1. https://lucatnt.com/2013/06/improve-software-raid-speed-on-linux/
  2. https://www.simplicate.info/2014/05/07/speeding-up-synology-volume-expansion/
  3. Gist by @stevenharman
  4. https://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html

Figuring out I had a problem

I wanted to expand my Synology SHR RAID by adding one new drive. I followed their instructions, but two days after installing the drive, the GUI showed the parity check at less than 20% complete. My services also seemed to be running slower than normal.

$ cat /proc/mdstat

The output (snipped to relevant line):

      [<ascii art progress bar>]  reshape = <current percentage>% (<raw blocks/raw total blocks>) finish=<time-in-minutes>min speed=<your speed>/sec

The finish time was on the order of 10 days! -- That's 10 days from the time I install a drive until I get to use the added capacity!
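
If you want to watch the progress without retyping the command, a simple loop works (a sketch; watch(1) may not ship on every Synology firmware, so this uses plain sleep):

$ while true; do grep -A 2 md2 /proc/mdstat; sleep 60; done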

Fixing the problem (Temporarily)

aka what to do if you want to temporarily sacrifice RAM/CPU for increased performance in the RAID setup.

I had 8GB of RAM and a CPU that was clearly not doing much, so I decided to figure out how to make use of them to speed things up.

The referenced links above talked about several techniques:

  1. Increasing stripe_cache_size at the cost of RAM
  2. Increasing speed_limit_min -- a minimum speed target for the sync -- at the cost of dedicating more CPU
  3. Increasing read_ahead_kb -- volume read-ahead, which can increase read speed for workloads where you're scanning most of the drive.
  4. Enabling the "Bitmap Option" via mdadm -- this improves rebuilds when you had a drive crash, or had to remove & re-add a device but the data is still present. You should not leave this on normally, so make sure to disable it after the rebuild is complete (see the mdadm sketch just after this list).
  5. Disabling "NCQ - Native Command Queueing" -- this is a drive feature, but I believe Synology already has it disabled, or it doesn't apply to my drives.
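
For option 4, toggling the bitmap is done with mdadm. A minimal sketch, assuming your data array is /dev/md2 like mine:

$ mdadm --grow --bitmap=internal /dev/md2   # enable a write-intent bitmap before the rebuild
$ mdadm --grow --bitmap=none /dev/md2       # turn it back off once the rebuild is complete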

For the raid expansion, I interactively checked the values of the first 3 options, and determined that the values were comparatively low.

$ cat /sys/block/md2/md/stripe_cache_size
256
$ cat /proc/sys/dev/raid/speed_limit_min
10000
$ cat /sys/block/md2/queue/read_ahead_kb
256 #-- Note I don't remember exactly what the initial value was for my specific device, and an untimely console clear lost it 🤦‍♂️
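
Before changing anything, it's worth snapshotting whatever your device shipped with, so you can restore it later (a sketch; the output filename is just an example):

$ for f in /sys/block/md2/md/stripe_cache_size \
           /proc/sys/dev/raid/speed_limit_min \
           /sys/block/md2/queue/read_ahead_kb; do
    echo "$f: $(cat $f)"
  done | tee raid-defaults.txt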

I switched the values with the following commands:

$ echo 32768 > /sys/block/md2/md/stripe_cache_size   # The max value; this cache synchronizes read/write operations, and per the kernel md docs it consumes page_size * nr_disks * this many entries of RAM (roughly 384 MiB for my 3-disk array with 4 KiB pages)
$ echo 50000 > /proc/sys/dev/raid/speed_limit_min    # This is a hint that you want more focus on the sync-expansion task
$ echo 32768 > /sys/block/md2/queue/read_ahead_kb    # This is how far ahead of a read request the drive array will preload
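
Note that these writes need a real root shell: with plain sudo, the > redirection happens in your unprivileged shell and fails. The usual workaround is tee (and either way, the settings do not survive a reboot):

$ echo 32768 | sudo tee /sys/block/md2/md/stripe_cache_size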

Results

After the above changes:

      [=======>.............]  reshape = 39.5% (2316072704/5855691456) finish=1459.7min speed=40414K/sec

This means that I moved the completion estimate from 8ish days remaining to roughly 24 hours, and in actual practice, it was done in less than 16 hours! #Success!

Cleanup

After the resync was complete, I checked the settings again, and weirdly, stripe_cache_size had reverted to 4096 (not the 256 default I saw originally).

I reset all the values back to normal before starting to write this article.

Improving Performance Permanently

My NAS is mostly a home media & backup server. I'm not running any databases, so most of my workload is sequential streaming of relatively large files. Based on this, I decided to set read_ahead_kb to 2048 -- based on my reading, this gives the maximum benefit from read-ahead's ability to limit unnecessary seeking.
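
If you want to sanity-check a read-ahead value against your own workload, a crude before/after sequential-read benchmark is enough (a sketch: the file path is just an example, and dropping the page cache first makes sure the read actually hits the disks):

$ sync && echo 3 > /proc/sys/vm/drop_caches
$ dd if=/volume1/video/some-large-file.mkv of=/dev/null bs=1M count=4096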

This was done by writing a script that is called automatically on startup:

#!/bin/bash

# Increase the read_ahead_kb to 2048 to maximise sequential large-file read/write performance.

# Put this in /usr/local/etc/rc.d/
# chown this to root
# chmod this to 755
# Must be run as root!

onStart() {
	echo "Starting $0…"
	echo 2048 > /sys/block/md2/queue/read_ahead_kb
	echo "Started $0."
}

onStop() {
	echo "Stopping $0…"
	echo 1024 > /sys/block/md2/queue/read_ahead_kb
	echo "Stopped $0."
}

case "$1" in
	start) onStart ;;
	stop) onStop ;;
	*) echo "Usage: $0 [start|stop]" ;;
esac
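
Installing it then looks something like this (the script name is just an example):

$ sudo cp increase-read-ahead.sh /usr/local/etc/rc.d/
$ sudo chown root:root /usr/local/etc/rc.d/increase-read-ahead.sh
$ sudo chmod 755 /usr/local/etc/rc.d/increase-read-ahead.sh
$ sudo /usr/local/etc/rc.d/increase-read-ahead.sh start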

Inspecting a setup

  1. Some of these commands work with sudo, but some require being logged in as root

    $ sudo su - # to log in as root@<yourhost>
  2. Look at mdstat to learn about your raids

    $ cat /proc/mdstat
    
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
    md2 : active raid5 sdc5[2] sda5[0] sdb5[1]
          11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
    
    md1 : active raid1 sdc2[2] sda2[0] sdb2[1]
          2097088 blocks [5/3] [UUU__]
    
    md0 : active raid1 sdc1[2] sda1[0] sdb1[1]
          2490176 blocks [5/3] [UUU__]
    
    unused devices: <none>

    Interpretation: This means that there are three RAID arrays: one big one, and two small ~2GB arrays (block counts in mdstat are 1 KiB units, so 2097088 blocks ≈ 2 GiB).

    This means that our real data array is probably /dev/md2

  3. Use mdadm to give you details:

    $ mdadm --detail /dev/md0
    $ mdadm --detail /dev/md1
    $ mdadm --detail /dev/md2
  4. Look at the results, and learn about your system!

    On my Synology, md0 and md1 were RAID1 (mirroring) devices configured across all my drives, each about 2 GB in size. I didn't see any specific documentation, but I assume these are used by the Synology OS/GUI.

    /dev/md2 was a raid device with 3 drives with relevant status lines of:

         Raid Level : raid5
         Array Size : 11711382912 (11168.85 GiB 11992.46 GB)
      Used Dev Size : 5855691456 (5584.42 GiB 5996.23 GB)

    Results: now I know that /dev/md2 is the relevant device!
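
To pull just those interesting lines for every array at once, a quick loop over the md devices works (a sketch; run as root):

$ for dev in /dev/md*; do
    echo "== $dev =="
    mdadm --detail "$dev" | grep -E 'Raid Level|Array Size|Used Dev Size'
  done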

Command summary

sudo su -
cat /sys/block/md2/md/stripe_cache_size
cat /proc/sys/dev/raid/speed_limit_min
cat /sys/block/md2/queue/read_ahead_kb
cat /proc/mdstat
echo 32768 > /sys/block/md2/md/stripe_cache_size
echo 50000 > /proc/sys/dev/raid/speed_limit_min
echo 32768 > /sys/block/md2/queue/read_ahead_kb
echo 4096 > /sys/block/md2/md/stripe_cache_size
echo 10000 > /proc/sys/dev/raid/speed_limit_min
echo 512 > /sys/block/md2/queue/read_ahead_kb
@d8ahazard commented Feb 15, 2019

FWIW, I was able to set my speed_limit_min all the way up to 150000 without any issue. Looking at a rebuild time of about 8 hours for a single disk in a 4x5TB array, ~13.5TB total usable size.

@d8ahazard commented Feb 15, 2019

[screenshot attachment]

@mqahtani commented May 3, 2019

One more important thing to add: disabling the S.M.A.R.T. testing schedule will immediately increase read/write speed. Disable it in the schedule and stop any running test; you can do that from the GUI.

@GiampaoloGabba commented Feb 8, 2020

Another little tip: make sure to close the DSM interface. For me this made a huge difference: from 1000+ minutes down to 270 just by closing DSM.
I don't know why, but I tried three times to open DSM in the browser, and every time the rebuild speed started to slow down a lot.

@kevindd992002 commented Aug 13, 2020

Are these changes still applicable even though the DSM GUI already has a "make RAID resync faster" option, which changes speed_limit_min to 300000? Is it still advisable to change the stripe_cache_size and read_ahead_kb values and make them persistent? I too am using my NAS for home use and for media files alone (no databases and stuff). I also use it for a couple of Docker containers.

@fbartho (Owner) commented Aug 13, 2020

@kevindd992002 -- I haven't seen that DSM GUI Feature -- do you have a link to it, or a screenshot?

I haven't tested that feature yet, but I would imagine that if it's a GUI Feature that it's probably good enough / perhaps even better than what I wrote here almost 2.5 years ago!

@kevindd992002 commented Aug 13, 2020

> @kevindd992002 -- I haven't seen that DSM GUI Feature -- do you have a link to it, or a screenshot?
>
> I haven't tested that feature yet, but I would imagine that if it's a GUI Feature that it's probably good enough / perhaps even better than what I wrote here almost 2.5 years ago!

https://www.synology.com/en-us/knowledgebase/DSM/help/DSM/StorageManager/storage_pool_adjust_resync_speed

Yeah, that's what I thought. But that GUI option only modifies one of the three options you discussed. I asked Synology support about stripe_cache_size and read_ahead_kb, but they don't support changing them.

@fbartho (Owner) commented Sep 10, 2020

Turns out, I have a drive that's getting increasing numbers of Bad Sector Errors, so I'm going to swap it out -- and I guess I'll need to follow these instructions!

@kevindd992002 commented Sep 11, 2020

Yeah, it practically has the same effect.

@Milananas commented Jan 14, 2021

We had this exact issue on multiple systems that we have (RS2818RP+), and none of the above increased our speed, which was stuck at ~2000K/sec. After an extensive Google search we ended up here:
https://eprints.bbk.ac.uk/id/eprint/30457/1/2020-01-03-accelerating-synology-raid-6-reshapes.markdown
https://community.synology.com/enu/forum/1/post/130936

The last command:
echo max > /sys/block/md2/md/sync_max
did the trick for us, upping the speed from 2K to 30K, even though the existing value was already quite a high number.

@fbartho (Owner) commented Jun 14, 2021

Thanks @Milananas! max was already the value in our sync_max, so unfortunately, that didn't help me (just replaced a drive today). I wonder why that differed from what you saw!

@Ryeera commented Jun 22, 2021

> We had this exact issue on multiple systems that we have (RS2818RP+), and none of the above increased our speed, which was stuck at ~2000K/sec. After an extensive Google search we ended up here:
> https://eprints.bbk.ac.uk/id/eprint/30457/1/2020-01-03-accelerating-synology-raid-6-reshapes.markdown
> https://community.synology.com/enu/forum/1/post/130936
>
> The last command:
> echo max > /sys/block/md2/md/sync_max
> did the trick for us, upping the speed from 2K to 30K, even though the existing value was already quite a high number.

It works freaking wonders! I wonder if it only works for a change to RAID 6 though... Thank you so much!

@rjblake commented Jun 30, 2021

I'm busy expanding my array (3x 16TB Exos drives to 4x 16TB) on a DS1821+. Checked sync_max, which was already set to 'max', and it is currently running at between 80000K/sec and 85480K/sec. My setup is SHR-1. Contemplating adding another disk once this is done and migrating to SHR-2, but I may just keep the drive as a hot spare. I'm using DSM 7.0.

@fbartho (Owner) commented Jun 30, 2021

@rjblake thanks for the data point!

I haven’t had a chance to upgrade to DSM7.x — how safe is it? Any problems you ran into?

I’m mostly running stuff in Docker, so as long as that keeps working, I think it’ll be fine, but I’m super hesitant to do the upgrade until I find evidence of stability, haha.

I’m really hoping it’ll finally upgrade smb such that I can do a network-based time machine backup, and not have the experience be terribly slow and unstable.
