Flexible, extensible and resilient MDADM and LVM based storage on Linux

Quick Intro

What

A flexible storage solution on a Linux distro, using off-the-shelf tools: software RAID (mdraid, managed with mdadm), LVM and parted.

Why

I have always loved how Synology, Drobo and other commercial vendors do NAS, with the flexibility to add and swap disks; owning one has been a dream of mine. But on a very tight budget I had to improvise and create my own way of managing RAID - a flexible solution that lets me use all the disks I already have and swap any of them later on-line, with only a slight degradation of performance.

Who

I am an engineer by day and by night. I love this stuff.

How

Here are my notes. I hope I left enough breadcrumbs for anyone to follow.

Engineering Notes

Rules of Engagement

  1. DO NOT USE ANY SLIVER (a 1 TB partition of a disk) THAT IS NOT PART OF A RAID!
  2. For any raid with 2 members - RAID1.
  3. For any raid with 3 members - RAID5.
  4. For any raid with 4 or more members - RAID6.
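
A minimal sketch of the rules above as a shell helper, in case you want to script the layout. The name `raid_level_for` is made up for this example and is not part of mdadm; it only maps a member count to the level the rules prescribe.

```shell
# Pick the RAID level for an array from its member count,
# following the rules of engagement above.
# raid_level_for is a hypothetical helper name.
raid_level_for() {
    members=$1
    if [ "$members" -lt 2 ]; then
        echo "none"     # rule 1: a lone sliver is never used
    elif [ "$members" -eq 2 ]; then
        echo "1"
    elif [ "$members" -eq 3 ]; then
        echo "5"
    else
        echo "6"
    fi
}

for n in 2 3 4 10; do
    echo "$n members -> level $(raid_level_for "$n")"
done
```

The result can be passed to `mdadm --create` as the `--level=` value.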

Disks Used

8TB - WDC WD80EFAX
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical

# parted /dev/sdl unit s p
Model: ATA WDC WD80EFAX-68L (scsi)
Disk /dev/sdl: 15628053168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

4TB - WDC WD40EZRX
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical

# parted /dev/sdc unit s p
Model: ATA WDC WD40EZRX-00S (scsi)
Disk /dev/sdc: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

3TB - WDC WD30EZRX
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical

# parted /dev/sdj unit s p
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdj: 5860533168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

2TB - WDC WD20EFRX
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical

# parted /dev/sde unit s p
Model: ATA WDC WD20EFRX-68E (scsi)
Disk /dev/sde: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Layout

-----------------------------------------------------------------------------------------
|  LVL  |  SDC  |  SDD  |  SDE  |  SDF  |  SDG  |  SDH  |  SDI  |  SDJ  |  SDK  |  SDL  |
-----------------------------------------------------------------------------------------
| RAID6 | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
| RAID6 | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
| RAID6 | 1 TB  | 1 TB  |-------------------------------| 1 TB  | 1 TB  |----------------
| RAID1 | 1 TB  | 1 TB  |                               -----------------
-------------------------

Calculations

2TB - 3907029168s, so 1TB is 1953514584s
3TB - 5860533168s, so 1TB is 1953511056s (3528s less = 1.72265625 MiB smaller than on the 2TB disk)
4TB - 7814037168s, so 1TB is 1953509292s (1764s less = 0.861328125 MiB smaller than on the 3TB disk)
The RAID SLIVER was set to 1953484800s (953850 MiB), which fits within 1TB of every disk above.
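
The arithmetic above can be re-run with a few lines of shell. This is only a sketch: the helper names `per_tb_sectors` and `sectors_to_mib` are invented here, the sector counts are the ones parted reported in "Disks Used", and it assumes a shell with 64-bit arithmetic.

```shell
SECTOR_BYTES=512
SLIVER=1953484800   # chosen sliver size in sectors, from the notes above

# Sectors available per nominal TB: total sectors / nominal size in TB.
per_tb_sectors() {
    echo $(( $1 / $2 ))
}

# Convert a sector count to whole MiB.
sectors_to_mib() {
    echo $(( $1 * $SECTOR_BYTES / 1048576 ))
}

# "nominal TB" and "total sectors" for each disk model listed above.
while read -r tb sectors; do
    per_tb=$(per_tb_sectors "$sectors" "$tb")
    echo "${tb}TB: $per_tb sectors per TB, $(( per_tb - SLIVER )) sectors more than the sliver"
done <<EOF
2 3907029168
3 5860533168
4 7814037168
8 15628053168
EOF

echo "sliver: $(sectors_to_mib "$SLIVER") MiB"
```

The sliver has to be no larger than the smallest per-TB value, otherwise the last partition would not fit on the larger disks.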

Partitioning

# parted -a optimal /dev/sdX
 mklabel gpt
For 2, 3, 4 and 8TB disks:
 mkpart one 2048s 1953486847s
 set 1 raid on
 mkpart two 1953486848s 3906971647s
 set 2 raid on
For 3, 4 and 8TB disks:
 mkpart three 3906971648s 5860456447s
 set 3 raid on
For 4 and 8TB disks:
 mkpart four 5860456448s 7813941247s
 set 4 raid on
For 8TB disks ONLY:
 mkpart five 7813941248s 9767426047s
 set 5 raid on
 mkpart six 9767426048s 11720910847s
 set 6 raid on
 mkpart seven 11720910848s 13674395647s
 set 7 raid on
 mkpart eight 13674395648s 15627880447s
 set 8 raid on
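
The start and end sectors above follow one pattern: partition i starts at 2048 + (i - 1) * sliver and is exactly one sliver long. A small sketch that regenerates them, assuming the same 2048s first start and the 1953484800s sliver (`sliver_bounds` is a hypothetical helper name):

```shell
FIRST=2048          # start sector of partition 1, as used above
SLIVER=1953484800   # sliver size in sectors

# Print "index start end" for partitions 1..count.
sliver_bounds() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(( $FIRST + ($i - 1) * $SLIVER ))
        end=$(( $start + $SLIVER - 1 ))
        echo "$i ${start}s ${end}s"
        i=$(( $i + 1 ))
    done
}

# 8 slivers for an 8TB disk; use 2, 3 or 4 for the smaller disks.
sliver_bounds 8
```

Because both 2048 and the sliver size are multiples of 2048 sectors, every partition start lands on a 1 MiB boundary, which keeps the 4096-byte physical sectors aligned.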

Raid creation

# mdadm --create --metadata 1.0 --verbose /dev/md1 --chunk=128 --level=6 --raid-devices=10 /dev/sd[cdefghijkl]1
# mdadm --create --metadata 1.0 --verbose /dev/md2 --chunk=128 --level=6 --raid-devices=10 /dev/sd[cdefghijkl]2
# mdadm --create --metadata 1.0 --verbose /dev/md3 --chunk=128 --level=6 --raid-devices=4 /dev/sd[cdij]3
# mdadm --create --metadata 1.0 --verbose /dev/md4 --chunk=128 --level=1 --raid-devices=2 /dev/sd[cd]4
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sdd4[1] sdc4[0]
      976742208 blocks super 1.0 [2/2] [UU]
        resync=DELAYED

md3 : active raid6 sdj3[3] sdi3[2] sdd3[1] sdc3[0]
      1953484288 blocks super 1.0 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
        resync=DELAYED

md2 : active raid6 sdl2[9] sdk2[8] sdj2[7] sdi2[6] sdh2[5] sdg2[4] sdf2[3] sde2[2] sdd2[1] sdc2[0]
      7813937152 blocks super 1.0 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
        resync=DELAYED

md1 : active raid6 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
      7813937152 blocks super 1.0 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      [==============>......]  resync = 71.9% (702995712/976742144) finish=694.8min speed=6565K/sec

Setting up LVM

Create PVs

# pvcreate -M2 --dataalignment 128k /dev/md1
# pvcreate -M2 --dataalignment 128k /dev/md2
# pvcreate -M2 --dataalignment 128k /dev/md3
# pvcreate -M2 --dataalignment 128k /dev/md4

Check alignment: the pe_start (offset of the first physical extent) of each PV should be divisible by 4096 bytes (the physical sector size):

# pvs -o +pe_start
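
Eyeballing the offsets works, but the check can also be scripted. A rough sketch: `check_pe_alignment` is a made-up helper that reads "name offset-in-bytes" pairs on stdin and tests each offset against an alignment in bytes. The sample input below is illustrative, not real output from this box.

```shell
# Read "pv_name pe_start_bytes" lines on stdin and flag any PV whose
# first physical extent does not start on a multiple of ALIGN bytes.
# check_pe_alignment is a hypothetical helper name.
check_pe_alignment() {
    align=$1
    while read -r name offset; do
        [ -n "$name" ] || continue      # skip blank lines
        if [ $(( $offset % $align )) -eq 0 ]; then
            echo "$name aligned"
        else
            echo "$name MISALIGNED"
        fi
    done
}

# Sample input standing in for real pvs output (values are illustrative).
# 131072 bytes = the 128k chunk size used for the arrays.
printf '%s\n' "/dev/md1 1048576" "/dev/md2 69632" | check_pe_alignment 131072
```

On a live system the input would come from something like `pvs --noheadings --units b --nosuffix -o pv_name,pe_start`; I believe those options exist in current LVM2, but check `man pvs` on your distro before relying on them.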

Create the VG

# vgcreate -v raidgroup /dev/md1 /dev/md2 /dev/md3 /dev/md4
# vgdisplay raidgroup

Create the LV

# lvcreate -l 100%FREE -n storage raidgroup
# lvdisplay /dev/raidgroup/storage

FS Creation

Rules of FS Creation

  1. block size (file system block size, ex. 4096)
  2. chunk size (the mdadm --chunk value, ex. 512k)
  3. stride: chunk size / block size (ex. 512k / 4k = 128)
  4. stripe-width: stride * #-of-data-disks (ex. 4 disks RAID 5 is 3 data disks; 128*3 = 384)
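
The rules above reduce to two integer divisions and multiplications, so here is a small sketch that prints the `-E` values. `ext4_stripe_opts` is a hypothetical helper; its arguments are chunk size in KiB, file system block size in bytes, number of array members, and number of parity members (1 for RAID5, 2 for RAID6).

```shell
# ext4_stripe_opts CHUNK_KIB BLOCK_BYTES MEMBERS PARITY
# Prints the stride and stripe-width values for mkfs.ext4 -E.
ext4_stripe_opts() {
    chunk_kib=$1
    block_bytes=$2
    members=$3
    parity=$4
    stride=$(( $chunk_kib * 1024 / $block_bytes ))
    stripe_width=$(( $stride * ($members - $parity) ))
    echo "stride=$stride,stripe-width=$stripe_width"
}

ext4_stripe_opts 128 4096 10 2   # 10-member RAID6 with 128k chunks
ext4_stripe_opts 512 4096 4 1    # the 4-disk RAID5 example from the list
```

Note that the logical volume here spans arrays with different member counts, so a single stripe-width can only match one of them; the values used in these notes match the two 10-member RAID6 arrays.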

Or work the values out with a stripe calculator, then create the FS.

Execution

# mkfs.ext4 -L storage -m 0 -b 4096 -E stride=32,stripe-width=256,lazy_journal_init=0,lazy_itable_init=0 /dev/mapper/raidgroup-storage

Add to fstab:

# cat /etc/fstab
[...]
/dev/mapper/raidgroup-storage   /storage        ext4    relatime        0       2

Maintenance

Convert RAID1 to RAID5

# mdadm -a /dev/mdX /dev/sdY1
# mdadm --grow --verbose /dev/mdX --level=5 --raid-disk=3

Example:

# mdadm --add /dev/md4 /dev/sdi4
mdadm: added /dev/sdi4
# mdadm --grow --verbose /dev/md4 --level=5 --raid-disk=3
mdadm: level of /dev/md4 changed to raid5
mdadm: Need to backup 128K of critical section..

Misc. Raid Adjustments

Change chunk size:

# mdadm --grow --chunk=128 /dev/mdX

Resize LVM

# lvextend -l +100%FREE /dev/VG/LV

Resize FS

# resize2fs -p /dev/VG/LV

Playground

Need to play with:

--create --metadata 1.0 --verbose /dev/md1 --chunk=128 --level=6 --raid-devices=10

Resources

Original inspiration - nakanote blog

MDADM Cheat Sheet
Linux Raid For Admins - Hardware Monitoring for Linux

HWRaid for Linux

Updated layout: 9/28/2018

----------------------------------------------------------------------------------------------------------
| MD DEV |  LVL  | SPACE |  SDC  |  SDD  |  SDE  |  SDF  |  SDG  |  SDH  |  SDI  |  SDJ  |  SDK  |  SDL  |
----------------------------------------------------------------------------------------------------------
|  MD1   | RAID6 |  8 TB | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
|  MD2   | RAID6 |  8 TB | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
|  MD3   | RAID6 |  8 TB | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
|  MD4   | RAID6 |  8 TB | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
|  MD5   | RAID6 |  3 TB |---------------| 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |------------------------
|  MD6   | RAID6 |  3 TB |               | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
|  MD7   | RAID6 |  3 TB |               | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
|  MD8   | RAID6 |  3 TB |               | 1 TB  | 1 TB  | 1 TB  | 1 TB  | 1 TB  |
-----------------=========               -----------------------------------------
           TOTAL:  44 TB
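
The SPACE column can be recomputed from the member counts alone. A sketch under the rules of engagement (RAID1 keeps 1 sliver of data, RAID5 loses 1 sliver to parity, RAID6 loses 2), treating each sliver as a nominal 1 TB; `usable_slivers` is a made-up helper name.

```shell
# usable_slivers MEMBERS -> data slivers in one array under the rules above
usable_slivers() {
    case $1 in
        0|1) echo 0 ;;               # never used un-raided
        2)   echo 1 ;;               # RAID1
        3)   echo 2 ;;               # RAID5
        *)   echo $(( $1 - 2 )) ;;   # RAID6
    esac
}

# Member counts of MD1..MD8 as drawn in the table above.
total=0
for members in 10 10 10 10 5 5 5 5; do
    total=$(( $total + $(usable_slivers "$members") ))
done
echo "TOTAL: $total TB"
```

Changing the list of member counts gives a quick estimate for any other mix of disks before repartitioning anything.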