@lobeck
Last active April 20, 2018 00:14
Synology WD RED WD60EFRX queue workaround

I've recently switched my NAS drives to WD RED 6 TB (WD60EFRX). Already during the rebuild I noticed very poor performance of ~37 MB/s.

My first stop was the usual /proc/sys/dev/raid/speed_limit_max; however, this turned out to be 200 MB/s, which is fine. I also checked various other settings without any result.
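
For reference, the limits can be checked like this (the values are in KB/s, so 200000 corresponds to the 200 MB/s mentioned above):

cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max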

Closer inspection with iostat showed 100% disk utilization, which was weird, but it also explained why none of the other settings had any effect on performance.
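
Something along these lines shows the per-device utilization (the %util column); the 5-second interval is just an example:

iostat -x 5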

A blog post recommended disabling NCQ, which seemed very odd to me, but desperate times call for desperate measures. While figuring out how to eventually revert this, I noticed that the queue depth for a WD RED 3 TB (WD30EFRX) is 31, whereas the 6 TB drive had a value of 1.

This seemed like a bingo! When I increased the 6 TB drive to 31 as well, throughput immediately jumped to ~170 MB/s, which is the value I was looking for. Once the rebuild finished I wanted to verify this, especially since Synology recommends a reboot after adding certain drives, which I hadn't done yet. So I plugged in the second drive and rebooted the NAS.

Upon reboot the value for both drives was 1 again.
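
For reference, the value in question is the per-device queue depth in sysfs (the same path that shows up in the strings output further down); checking and restoring it manually looks roughly like this, with sda as an example device and root required for the echo:

# current NCQ queue depth of the drive
cat /sys/block/sda/device/queue_depth
# raise it back to 31
echo 31 > /sys/block/sda/device/queue_depth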

After a bit of reading I wasn't sure whether this was a bug in the drive firmware, especially as there were some reports of early 6 TB drives shipping with an incorrect NCQ configuration.

But this didn't seem to be the case, as the kernel correctly detected the queue depth:

ata2.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
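
That line comes straight from the kernel log and can be re-checked at any time with something like:

dmesg | grep -i ncq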

So something else must have disabled the queue. Digging through the kernel source didn't show anything obvious, as the value printed on the console is essentially the value that actually gets set.

However grepping the logs revealed the cause:

scemd: disk/disk_config_single.c:234 apply /usr/syno/bin/DiskApmSet.sh 255  /dev/sdb 1>/dev/null 2>&1
scemd: disk/disk_config_single.c:234 apply /usr/syno/bin/syno_disk_ctl --ncq-on  /dev/sdb 1>/dev/null 2>&1
scemd: disk/disk_config_single.c:234 apply /usr/syno/bin/syno_disk_ctl --ncq-off  /dev/sdb 1>/dev/null 2>&1
scemd: disk/disk_config_single.c:234 apply /usr/syno/bin/DiskApmSet.sh 255  /dev/sda 1>/dev/null 2>&1
scemd: disk/disk_config_single.c:234 apply /usr/syno/bin/syno_disk_ctl --ncq-on  /dev/sda 1>/dev/null 2>&1
scemd: disk/disk_config_single.c:234 apply /usr/syno/bin/syno_disk_ctl --ncq-off  /dev/sda 1>/dev/null 2>&1
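
Something along these lines should surface those entries (the log path is an assumption and may differ between DSM versions):

grep syno_disk_ctl /var/log/messages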

syno_disk_ctl is unfortunately a binary, but running strings on it showed that it's just messing with /sys/block/%s/device/queue_depth. I couldn't find the exact trigger, but the calls seem to originate from libhwcontrol.so, which is another binary without available source code.
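
The sysfs path shows up directly in the strings output, e.g.:

strings /usr/syno/bin/syno_disk_ctl | grep queue_depth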

Most annoyingly, changing the queue_depth to 31 after a reboot isn't sufficient, as syno_disk_ctl gets called after every drive wake-up.

Without any simple means of patching it, and wanting an immediate fix, I decided to create a wrapper that filters out the nasty --ncq-off.

First, move /usr/syno/bin/syno_disk_ctl out of the way:

mv /usr/syno/bin/syno_disk_ctl /usr/syno/bin/syno_disk_ctl.orig

Then place the shell script below in its place. Don't forget to chmod +x it.
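
For example, assuming the script below was saved locally as syno_disk_ctl.sh (the filename is just an example):

cp syno_disk_ctl.sh /usr/syno/bin/syno_disk_ctl
chmod +x /usr/syno/bin/syno_disk_ctl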

Now scemd will still try to call --ncq-off, but the wrapper simply returns without executing it and therefore no longer degrades performance.
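
A quick sanity check, mirroring the logged scemd invocation (the device name is just an example):

# should print "filtering ncq-off" and leave the queue depth at 31
/usr/syno/bin/syno_disk_ctl --ncq-off /dev/sdb
cat /sys/block/sdb/device/queue_depth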

I've reported this to Synology and they requested access to my NAS, which I didn't want to provide. However, they still wanted to look into it based on all the details I gave them.

Questions? https://twitter.com/spacebeck

#!/usr/bin/bash
# Wrapper for syno_disk_ctl: swallow --ncq-off calls, forward everything else
# to the original binary.
if [[ "$*" == *ncq-off* ]]; then
    echo "filtering ncq-off"
    exit 0
fi
exec /usr/syno/bin/syno_disk_ctl.orig "$@"