# Improving software RAID speed on a Synology NAS (especially while expanding)
Warning: the exact commands may not match your particular Linux OS / Synology (NAS) device. I had to customize the commands after exploring my particular system's setup.
If you're new to Linux, or this is a new piece of hardware / a new Synology device, jump down to the section called "Inspecting a setup".
- Links
- Finding the Problem
- Fixing the Problem (Temporarily)
- Improving Performance
- Inspecting a setup -- do this first if you're new to the hardware you're working on!
## Links

- https://lucatnt.com/2013/06/improve-software-raid-speed-on-linux/
- https://www.simplicate.info/2014/05/07/speeding-up-synology-volume-expansion/
- Gist by @stevenharman
- https://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html
## Finding the Problem

I wanted to expand my Synology SHR RAID by adding one new drive. I followed Synology's instructions, but two days after installing the drive I noticed in the GUI that the parity check was less than 20% complete. It also looked like my services were running slower than they should have been.
$ cat /proc/mdstat
The output (snipped to relevant line):
[<ascii art progress bar>] reshape = <current percentage>% (<raw blocks/raw total blocks>) finish=<time-in-minutes>min speed=<your speed>/sec
The finish time was on the order of 10 days! -- That's 10 days from the time I install a drive until I get to use the added capacity!
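To keep an eye on the reshape without retyping the command, something like this works (a minimal sketch; `md2` is my array, and if `watch` isn't on your box, re-running `cat` by hand is just as good):

```
# Re-print the md status every 60 seconds so you can watch the reshape estimate change
$ watch -n 60 cat /proc/mdstat
```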
## Fixing the Problem (Temporarily)

AKA: what to do if you want to temporarily sacrifice RAM/CPU for increased performance in the RAID setup.

I had 8GB of RAM and a CPU that was clearly not doing much, so I decided to figure out how to use them to speed things up.
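To confirm there is actually headroom to spare, a quick check with standard tools is enough (a sketch; if your box's `free` doesn't accept `-m`, `cat /proc/meminfo` shows the same information):

```
$ free -m    # how much RAM is free or merely being used as cache
$ uptime     # load average -- a rough sense of how busy the CPU is
```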
The referenced links above talked about several techniques:

- Increasing `stripe_cache_size`, at the cost of RAM.
- Increasing `speed_limit_min` -- a minimum goal target for performance -- at the cost of increased dedicated CPU.
- Increasing `read_ahead_kb` -- volume read-ahead, which can increase read speed for workloads where you're scanning most of the drive.
- Enabling the "Bitmap Option" via mdadm -- this improves rebuilds when you've had a drive crash, or had to remove & re-add a device whose data is still present. You should not have this on normally, so make sure to disable it after the rebuild is complete (see the sketch after this list).
- Disabling NCQ (Native Command Queueing) -- this is a drive feature, but I believe Synology already has it disabled, or it doesn't apply to my drives.
For the RAID expansion, I interactively checked the values of the first three options and determined that they were comparatively low.
$ cat /sys/block/md2/md/stripe_cache_size
256
$ cat /proc/sys/dev/raid/speed_limit_min
10000
$ cat /sys/block/md2/queue/read_ahead_kb
256 #-- Note I don't remember exactly what the initial value was for my specific device, and an untimely console clear lost it 🤦♂️
I switched the values with the following commands:
$ echo 32768 > /sys/block/md2/md/stripe_cache_size # Max value; the cache uses this many 4 KiB pages per member drive, so it trades a chunk of RAM for faster synchronization of read/write operations during the reshape
$ echo 50000 > /proc/sys/dev/raid/speed_limit_min # The minimum rebuild speed (KB/s) md should try to maintain -- a hint that you want more focus on the sync/expansion task
$ echo 32768 > /sys/block/md2/queue/read_ahead_kb # How far ahead of a read request (in KiB) the drive array will preload
After the above changes:
[=======>.............] reshape = 39.5% (2316072704/5855691456) finish=1459.7min speed=40414K/sec
This moved the estimated completion from roughly eight days remaining to about a day, and in actual practice it was done in less than 16 hours! #Success!
After the resync was complete, I checked the settings again and, weirdly, `stripe_cache_size` had reverted to 4096 (not the 256 default I saw earlier).
I reset all the values back to normal before starting to write this article.
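If you want that restore step to be painless, it's worth recording the defaults before touching anything. A small sketch (same paths as above, `md2` assumed, run as root):

```
# Save each tunable's path and current value before changing anything
$ for f in /sys/block/md2/md/stripe_cache_size \
           /proc/sys/dev/raid/speed_limit_min \
           /sys/block/md2/queue/read_ahead_kb; do
    echo "$f $(cat $f)"
  done > ~/md2-tunables.orig

# Later, push each saved value back into its path
$ while read f v; do echo "$v" > "$f"; done < ~/md2-tunables.orig
```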
## Improving Performance

My NAS is mostly a home media & backup server. I'm not running any databases, so most of my workload is sequential streaming of relatively large files. Based on this, I decided to set `read_ahead_kb` to 2048 -- from my reading, that gets the most benefit out of read-ahead's ability to limit unnecessary seeking.
I did this with a script that gets called automatically on startup:
#!/bin/bash
# Increase read_ahead_kb to 2048 to maximise sequential large-file read/write performance.
# Put this in /usr/local/etc/rc.d/
# chown this to root
# chmod this to 755
# Must be run as root!

onStart() {
  echo "Starting $0…"
  echo 2048 > /sys/block/md2/queue/read_ahead_kb
  echo "Started $0."
}

onStop() {
  echo "Stopping $0…"
  echo 1024 > /sys/block/md2/queue/read_ahead_kb
  echo "Stopped $0."
}

case $1 in
  start) onStart ;;
  stop)  onStop ;;
  *)     echo "Usage: $0 [start|stop]" ;;
esac
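Installing it on DSM looks roughly like this (a sketch; `readahead.sh` is just an example filename, and the chown/chmod steps mirror the comments in the script):

```
$ sudo cp readahead.sh /usr/local/etc/rc.d/
$ sudo chown root:root /usr/local/etc/rc.d/readahead.sh
$ sudo chmod 755 /usr/local/etc/rc.d/readahead.sh
$ sudo /usr/local/etc/rc.d/readahead.sh start   # apply it now instead of waiting for a reboot
```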
## Inspecting a setup

- Some of these commands work with sudo, but some require being logged in as root:
  $ sudo su - # to log in as root@<yourhost>
- Look at mdstat to learn about your RAIDs:
  $ cat /proc/mdstat
  Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
  md2 : active raid5 sdc5[2] sda5[0] sdb5[1]
        11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
  md1 : active raid1 sdc2[2] sda2[0] sdb2[1]
        2097088 blocks [5/3] [UUU__]
  md0 : active raid1 sdc1[2] sda1[0] sdb1[1]
        2490176 blocks [5/3] [UUU__]
  unused devices: <none>

  Interpretation: there are three RAIDs here -- one big one and two ~2GB ones -- which means our real data array is probably /dev/md2.
- Use mdadm to give you details:
  $ mdadm --detail /dev/md0
  $ mdadm --detail /dev/md1
  $ mdadm --detail /dev/md2
- Look at the results, and learn about your system! On my Synology, md0 and md1 were raid1 (mirroring) devices configured across all of my drives and ~2GB in size. I didn't see any specific documentation, but I assume they are used by the Synology OS/GUI. /dev/md2 was a RAID device across 3 drives, with the relevant status lines being:
       Raid Level : raid5
       Array Size : 11711382912 (11168.85 GiB 11992.46 GB)
    Used Dev Size : 5855691456 (5584.42 GiB 5996.23 GB)
  Result: now I know that /dev/md2 is the relevant device! (A quick way to pull just those lines for every array is sketched after this list.)
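To pull just those lines for every array in one go, a quick loop over `mdadm --detail` does it (a sketch; it assumes your arrays show up as /dev/md0, /dev/md1, … and that you're running it as root):

```
# Show the level and size lines for each md array
for md in /dev/md?; do
  echo "== $md =="
  mdadm --detail "$md" | grep -E 'Raid Level|Array Size|Used Dev Size'
done
```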
---

TLDR: Updating `/sys/block/md2/md/sync_max` to the max value made the biggest difference in RAID reshaping speed.
My original values, on a 1520+ running 20GB of RAM and DSM 7.2.2-72806:

- Original speed: finish=7553.4min speed=8502K/sec
- After setting the values to the ones above (except for speed_limit_min, which was set through the GUI by putting it in "RAID Resync Faster"), reshaping from SHR to SHR2 was still running at around the same speed: finish=7320.2min speed=8768K/sec
- After setting sync_max to MAX, there was a massive increase in speed: finish=1038.3min speed=60979K/sec
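For reference, sync_max accepts either a sector offset or the literal word max, so the change described above looks like this (a sketch; `md2` assumed, run as root -- I haven't verified how DSM manages this value, so re-check it if the speed drops back down):

```
# See the current ceiling md will sync/reshape up to (a sector count, or "max")
$ cat /sys/block/md2/md/sync_max

# Remove the ceiling
$ echo max > /sys/block/md2/md/sync_max
```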