A flexible storage solution on a Linux distro, using off-the-shelf tools: RAID (mdraid), LVM and parted.
I have always admired how Synology, Drobo and other commercial vendors do NAS - with the flexibility to add and swap disks - and owning one has been a dream of mine. But on a very tight budget I had to improvise and build my own way of RAID management: a flexible setup that lets me use every disk I already had, and swap any of those disks on-line in the future with only a slight degradation of performance.
I am an engineer by day and by night. I love this stuff.
Here are my notes; I hope I left enough breadcrumbs for anyone to follow.
- DO NOT USE ANY SLIVER THAT IS NOT PART OF A RAID!
- For any raid with 2 members - RAID1.
- For any raid with 3 members - RAID5.
- For any raid with 4 or more members - RAID6.
8TB - WDC WD80EFAX
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
# parted /dev/sdl unit s p
Model: ATA WDC WD80EFAX-68L (scsi)
Disk /dev/sdl: 15628053168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
4TB - WDC WD40EZRX
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
# parted /dev/sdc unit s p
Model: ATA WDC WD40EZRX-00S (scsi)
Disk /dev/sdc: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
3TB - WDC WD30EZRX
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
# parted /dev/sdj unit s p
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdj: 5860533168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
2TB - WDC WD20EFRX
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
# parted /dev/sde unit s p
Model: ATA WDC WD20EFRX-68E (scsi)
Disk /dev/sde: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
-------------------------------------------------------------------------------
| LVL   | SDC  | SDD  | SDE  | SDF  | SDG  | SDH  | SDI  | SDJ  | SDK  | SDL  |
-------------------------------------------------------------------------------
| RAID6 | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |
| RAID6 | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |
| RAID6 | 1 TB | 1 TB |      |      |      |      | 1 TB | 1 TB |      |      |
| RAID1 | 1 TB | 1 TB |      |      |      |      |      |      |      |      |
-------------------------------------------------------------------------------
2 TB - 3907029168s, so 1 TB is 1953514584s
3 TB - 5860533168s, so 1 TB is 1953511056s (3528s = 1.72265625 MB smaller than on the 2 TB disk)
4 TB - 7814037168s, so 1 TB is 1953509292s (1764s = 0.861328125 MB smaller than on the 3 TB disk)
Determined the RAID SLIVER to be 1953484800s (953850 MB): small enough to fit on every disk, and a multiple of 2048s, so each sliver stays 1 MiB-aligned.
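The arithmetic above can be re-checked in plain shell (a sanity check only; the 8 TB figure is derived the same way from its sector count):

```shell
# Re-derive the per-TB sector counts from the whole-disk sizes quoted
# above (512-byte sectors); pure arithmetic, no devices touched.
for d in 2:3907029168 3:5860533168 4:7814037168 8:15628053168; do
  tb=${d%%:*}
  sectors=${d##*:}
  echo "${tb}TB: $((sectors / tb)) sectors per TB"
done
# The chosen sliver is smaller than all of the above, and a multiple of
# 2048s, so every sliver starts on a 1 MiB boundary.
echo "sliver: $((1953484800 * 512 / 1048576)) MB"   # -> sliver: 953850 MB
```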
# parted -a optimal /dev/sdX
mklabel gpt
For 2, 3, 4 and 8 TB disks:
mkpart one 2048s 1953486847s
set 1 raid on
mkpart two 1953486848s 3906971647s
set 2 raid on
For 3, 4 and 8 TB disks:
mkpart three 3906971648s 5860456447s
set 3 raid on
For 4 and 8 TB disks:
mkpart four 5860456448s 7813941247s
set 4 raid on
For 8 TB disks ONLY:
mkpart five 7813941248s 9767426047s
set 5 raid on
mkpart six 9767426048s 11720910847s
set 6 raid on
mkpart seven 11720910848s 13674395647s
set 7 raid on
mkpart eight 13674395648s 15627880447s
set 8 raid on
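The recipes above can be scripted. A sketch, using a hypothetical `mkslivers` helper that only prints the parted commands so they can be reviewed before piping to sh:

```shell
# mkslivers DISK NPARTS - print the parted commands that carve DISK into
# NPARTS 1 TB slivers (2 TB disk -> 2, 3 TB -> 3, 4 TB -> 4, 8 TB -> 8).
mkslivers() {
  disk=$1
  nparts=$2
  sliver=1953484800                 # sliver size in 512-byte sectors
  echo "parted -s -a optimal $disk mklabel gpt"
  start=2048                        # first partition starts 1 MiB in
  i=1
  for name in one two three four five six seven eight; do
    [ "$i" -gt "$nparts" ] && break
    end=$((start + sliver - 1))
    echo "parted -s -a optimal $disk mkpart $name ${start}s ${end}s"
    echo "parted -s $disk set $i raid on"
    start=$((end + 1))
    i=$((i + 1))
  done
}

# Example: a 4 TB disk carries four slivers.
mkslivers /dev/sdX 4
```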
# mdadm --create --metadata 1.0 --verbose /dev/md1 --chunk=128 --level=6 --raid-devices=10 /dev/sd[cdefghijkl]1
# mdadm --create --metadata 1.0 --verbose /dev/md2 --chunk=128 --level=6 --raid-devices=10 /dev/sd[cdefghijkl]2
# mdadm --create --metadata 1.0 --verbose /dev/md3 --chunk=128 --level=6 --raid-devices=4 /dev/sd[cdij]3
# mdadm --create --metadata 1.0 --verbose /dev/md4 --chunk=128 --level=1 --raid-devices=2 /dev/sd[cd]4
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sdd4[1] sdc4[0]
976742208 blocks super 1.0 [2/2] [UU]
resync=DELAYED
md3 : active raid6 sdj3[3] sdi3[2] sdd3[1] sdc3[0]
1953484288 blocks super 1.0 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
resync=DELAYED
md2 : active raid6 sdl2[9] sdk2[8] sdj2[7] sdi2[6] sdh2[5] sdg2[4] sdf2[3] sde2[2] sdd2[1] sdc2[0]
7813937152 blocks super 1.0 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
resync=DELAYED
md1 : active raid6 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
7813937152 blocks super 1.0 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
[==============>......] resync = 71.9% (702995712/976742144) finish=694.8min speed=6565K/sec
# pvcreate -M2 --dataalignment 128k /dev/md1
# pvcreate -M2 --dataalignment 128k /dev/md2
# pvcreate -M2 --dataalignment 128k /dev/md3
# pvcreate -M2 --dataalignment 128k /dev/md4
Check alignment: the reported data offset (pe_start) of each PV should be divisible by 4096, the physical sector size:
# pvs -o +pe_start
# vgcreate -v raidgroup /dev/md1 /dev/md2 /dev/md3 /dev/md4
# vgdisplay raidgroup
# lvcreate -l +100%FREE raidgroup -n storage
# lvdisplay /dev/raidgroup/storage
- block size: the file system block size (ex. 4096)
- stripe size: same as the mdadm chunk size (ex. 512k)
- stride: stripe size / block size (ex. 512k / 4k = 128)
- stripe-width: stride * number of data disks (ex. a 4-disk RAID5 has 3 data disks; 128 * 3 = 384)
Work these out by hand or with a stripe calculator, then create the FS:
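Applied to the 10-member RAID6 arrays here (128k chunk, 4k blocks), the rules above give the stride and stripe-width used in the mkfs call that follows; a quick shell check:

```shell
# ext4 geometry for this array: 128k chunk, 4k FS blocks,
# 10-member RAID6 -> 8 data disks (two members go to parity).
CHUNK_KB=128
BLOCK_KB=4
DATA_DISKS=$((10 - 2))
STRIDE=$((CHUNK_KB / BLOCK_KB))
STRIPE_WIDTH=$((STRIDE * DATA_DISKS))
echo "stride=$STRIDE stripe-width=$STRIPE_WIDTH"   # -> stride=32 stripe-width=256
```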
# mkfs.ext4 -L storage -m 0 -b 4096 -E stride=32,stripe-width=256,lazy_journal_init=0,lazy_itable_init=0 /dev/mapper/raidgroup-storage
Add to fstab:
# cat /etc/fstab
[...]
/dev/mapper/raidgroup-storage /storage ext4 relatime 0 2
Add a sliver and change the RAID level:
# mdadm -a /dev/mdX /dev/sdY1
# mdadm --grow --verbose /dev/mdX --level=5 --raid-devices=3
Example:
# mdadm --add /dev/md4 /dev/sdi4
mdadm: added /dev/sdi4
# mdadm --grow --verbose /dev/md4 --level=5 --raid-devices=3
mdadm: level of /dev/md4 changed to raid5
mdadm: Need to backup 128K of critical section..
Change chunk size:
# mdadm --grow --chunk=128 /dev/mdX
Once the array has grown, resize the PV so LVM sees the new space, then extend the LV and the file system:
# pvresize /dev/mdX
# lvextend -l +100%FREE /dev/VG/LV
# resize2fs -p /dev/VG/LV
Need to play with:
--create --metadata 1.0 --verbose /dev/md1 --chunk=128 --level=6 --raid-devices=10
Original inspiration - nakanote blog
MDADM Cheat Sheet
Linux Raid For Admins - Hardware Monitoring for Linux
------------------------------------------------------------------------------------------------
| MD DEV | LVL   | SPACE | SDC  | SDD  | SDE  | SDF  | SDG  | SDH  | SDI  | SDJ  | SDK  | SDL  |
------------------------------------------------------------------------------------------------
| MD1    | RAID6 | 8 TB  | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |
| MD2    | RAID6 | 8 TB  | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |
| MD3    | RAID6 | 8 TB  | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |
| MD4    | RAID6 | 8 TB  | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |
| MD5    | RAID6 | 3 TB  |      |      | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |      |      |      |
| MD6    | RAID6 | 3 TB  |      |      | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |      |      |      |
| MD7    | RAID6 | 3 TB  |      |      | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |      |      |      |
| MD8    | RAID6 | 3 TB  |      |      | 1 TB | 1 TB | 1 TB | 1 TB | 1 TB |      |      |      |
------------------------------------------------------------------------------------------------
                   TOTAL: 44 TB