Created October 9, 2017 21:21
############## VRAID ##############
TITLE: Pseudo-RAID5 block-over-block device driver for analysis and recovery purposes.
AUTHOR: KB <kyle@brosoft.com.au>; based on source from https://github.com/OrenKishon/stackbd
What you need to do to get this working:
1. Get as much info about your old HP SmartArray as possible.
a. The order of the disks in the array (recording it by serial number is always a good idea)
b. The chunk size (HP calls it the stripe size). The default is 64KiB; other common values
are 32, 128 and 256KiB.
2. Open vraid.c; you'll see where to list the disks as they appear on your current system (yes, they're
hard coded). Set your chunk size, readonly = 1, and turn debug on if required.
3. make
a. Do not run make install. You don't want this thing in your initramfs long-term.
4. insmod vraid.ko
a. Watch dmesg / syslog for vraid output; you should see something along the lines of:
(In this case, it's assembling 4 x 2TB disks...)
Dec 6 19:29:27 gwol kernel: [3493262.260384] vraid: Starting...
Dec 6 19:29:27 gwol kernel: [3493262.260466] vraid: [0] /dev/sdc
Dec 6 19:29:27 gwol kernel: [3493262.260471] Opening /dev/sdc
Dec 6 19:29:27 gwol kernel: [3493262.260480] vraid: [0] capacity: 3907029168
Dec 6 19:29:27 gwol kernel: [3493262.260482] vraid: [0] max_sectors = 128
Dec 6 19:29:27 gwol kernel: [3493262.260484] vraid: [1] /dev/sde
Dec 6 19:29:27 gwol kernel: [3493262.260486] Opening /dev/sde
Dec 6 19:29:27 gwol kernel: [3493262.260490] vraid: [1] capacity: 3907029168
Dec 6 19:29:27 gwol kernel: [3493262.260492] vraid: [1] max_sectors = 128
Dec 6 19:29:27 gwol kernel: [3493262.260493] vraid: [2] /dev/sdf
Dec 6 19:29:27 gwol kernel: [3493262.260496] Opening /dev/sdf
Dec 6 19:29:27 gwol kernel: [3493262.260499] vraid: [2] capacity: 3907029168
Dec 6 19:29:27 gwol kernel: [3493262.260501] vraid: [2] max_sectors = 128
Dec 6 19:29:27 gwol kernel: [3493262.260503] vraid: [3] /dev/sdg
Dec 6 19:29:27 gwol kernel: [3493262.260505] Opening /dev/sdg
Dec 6 19:29:27 gwol kernel: [3493262.260508] vraid: [3] capacity: 5860533168
Dec 6 19:29:27 gwol kernel: [3493262.260510] vraid: [3] max_sectors = 128
Dec 6 19:29:27 gwol kernel: [3493262.260511] vraid: 4 disks processed
Dec 6 19:29:27 gwol kernel: [3493262.260513] vraid: total array capacity: 11721087504 sectors
Dec 6 19:29:27 gwol kernel: [3493262.260514] vraid: Array max_sectors: 128
Dec 6 19:29:27 gwol kernel: [3493262.260709] vraid: done initializing successfully
(This is when the kernel scans for a partition table):
Dec 6 19:29:27 gwol kernel: [3493262.263709] GPT:Primary header thinks Alt. header is not at the end of the disk.
Dec 6 19:29:27 gwol kernel: [3493262.263714] GPT:11720890543 != 11721087503
Dec 6 19:29:27 gwol kernel: [3493262.263716] GPT:Alternate GPT header not at the end of the disk.
Dec 6 19:29:27 gwol kernel: [3493262.263718] GPT:11720890543 != 11721087503
Dec 6 19:29:27 gwol kernel: [3493262.263720] GPT: Use GNU Parted to correct GPT errors.
(Hey, it found one - a good sign you've got the correct disk order):
Dec 6 19:29:27 gwol kernel: [3493262.263727] vraid0: p1
Dec 6 19:29:27 gwol kernel: [3493262.264141] vraid: init done
(...and you may see a few of the following if in read-only mode or if there is a small difference in total sectors):
Dec 6 19:29:27 gwol kernel: [3493262.266220] attempt to access beyond end of device
Dec 6 19:29:27 gwol kernel: [3493262.266225] sde: rw=0, want=3907029208, limit=3907029168
Dec 6 19:29:27 gwol kernel: [3493262.266228] quiet_error: 1399 callbacks suppressed
Dec 6 19:29:27 gwol kernel: [3493262.266231] Buffer I/O error on device vraid0, logical block 1465135930
Dec 6 19:29:27 gwol kernel: [3493262.266498] attempt to access beyond end of device
Dec 6 19:29:27 gwol kernel: [3493262.266501] sde: rw=0, want=3907029208, limit=3907029168
Dec 6 19:29:27 gwol kernel: [3493262.266503] Buffer I/O error on device vraid0, logical block 1465135930
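The "total array capacity" line above is the smallest member's capacity multiplied by (disk count - 1), which is what vraid_start() computes. A minimal user-space sketch of that arithmetic, assuming whole-disk capacities in 512-byte sectors (the helper name `array_capacity` is hypothetical, not part of vraid):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring vraid_start(): the usable array size in
 * 512-byte sectors is the smallest member capacity times (disk count - 1),
 * because one chunk in every stripe holds parity. */
static uint64_t array_capacity(const uint64_t *caps, size_t ndisks)
{
	uint64_t smallest;
	size_t i;

	assert(ndisks >= 3); /* RAID5 needs at least 3 disks */
	smallest = caps[0];
	for (i = 1; i < ndisks; i++)
		if (caps[i] < smallest)
			smallest = caps[i];
	return smallest * (ndisks - 1);
}
```

With the four capacities from the log (three at 3907029168 sectors, one at 5860533168), this gives 3907029168 x 3 = 11721087504 sectors, matching the log. The larger 3TB disk only contributes as much space as the smallest member.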
Once it's running, try mounting it:
mount -o defaults,ro /dev/vraid0p1 /mnt
If it fails,
a. try a different chunk size (the "stripe size" in HP terms)
b. try a different disk order
c. or change the parity delay (parity_delay in translate_dp4la) from 4 to 2, 8 or 16
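To sanity-check a candidate layout without loading the module, the mapping done by translate_dp4la can be modelled in user space. This is a sketch that mirrors the driver's arithmetic, with the parity delay as a parameter; the struct and function names are hypothetical, not part of vraid:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where one logical sector lands. */
struct dp4la_map {
	unsigned disk;   /* zero-based index into disklist[] */
	uint64_t sector; /* sector on that member disk */
};

/* User-space model of translate_dp4la(). chunk_kib is the chunk size in KiB,
 * ndisks the number of member disks and parity_delay the number of stripes
 * before parity moves one disk to the left. */
static struct dp4la_map dp4la_map_sector(uint64_t sec, unsigned chunk_kib,
					 unsigned ndisks, unsigned parity_delay)
{
	struct dp4la_map m;
	uint64_t chunk_secs, data_per_stripe, stripe;
	unsigned parity, data;

	assert(chunk_kib > 0 && ndisks >= 3 && parity_delay > 0);
	chunk_secs = (uint64_t)chunk_kib * 2;        /* 1 KiB = 2 x 512-byte sectors */
	data_per_stripe = chunk_secs * (ndisks - 1); /* parity chunk excluded */
	stripe = sec / data_per_stripe;
	parity = (ndisks - 1) - (unsigned)((stripe / parity_delay) % ndisks);
	data = (unsigned)((sec % data_per_stripe) / chunk_secs);
	if (parity <= data)
		data++;                                  /* step over the parity chunk */
	m.disk = data;
	m.sector = stripe * chunk_secs + sec % chunk_secs;
	return m;
}
```

For example, with 128KiB chunks, 4 disks and a delay of 4, logical sector 3672 falls in the third data chunk of zero-based stripe 4. That stripe's parity sits on disk index 2, so the sector maps to disk index 3, sector 1112. Trying another parity delay only means changing the last argument.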
If the filesystem is reasonably intact, it should mount. If you get several thousand dmesg entries referring to group descriptor
corruption (for an ext filesystem), it's time to try a different RAID configuration.
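Trying configurations by hand is easy to lose track of. Below is a minimal sketch that lists every (disk order, chunk size, parity delay) combination to try, one rebuild and insmod per line. It assumes the chunk sizes and parity delays suggested in this README; the function names are hypothetical, and the printed disk order is a list of indexes into your disklist[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Reverses a[lo..hi] (inclusive). */
static void reverse_range(int *a, size_t lo, size_t hi)
{
	while (lo < hi) {
		int t = a[lo];
		a[lo++] = a[hi];
		a[hi--] = t;
	}
}

/* Steps a[0..n-1] to the next lexicographic permutation; returns false
 * (and resets to ascending order) once every permutation has been seen. */
static bool next_order(int *a, size_t n)
{
	size_t i, j;
	int t;

	if (n < 2)
		return false;
	i = n - 1;
	while (i > 0 && a[i - 1] >= a[i])
		i--;
	if (i == 0) {
		reverse_range(a, 0, n - 1);
		return false;
	}
	j = n - 1;
	while (a[j] <= a[i - 1])
		j--;
	t = a[i - 1];
	a[i - 1] = a[j];
	a[j] = t;
	reverse_range(a, i, n - 1);
	return true;
}

/* Prints every candidate configuration and returns how many there are. */
static unsigned list_candidates(size_t ndisks)
{
	static const unsigned chunks_kib[] = {32, 64, 128, 256};
	static const unsigned delays[] = {2, 4, 8, 16};
	int order[32]; /* struct vraid_t holds at most 32 disks */
	unsigned count = 0;
	size_t c, d, k;

	assert(ndisks >= 3 && ndisks <= 32);
	for (k = 0; k < ndisks; k++)
		order[k] = (int)k;
	do {
		for (c = 0; c < 4; c++) {
			for (d = 0; d < 4; d++) {
				printf("order");
				for (k = 0; k < ndisks; k++)
					printf(" %d", order[k]);
				printf(", chunksize %u, parity_delay %u\n", chunks_kib[c], delays[d]);
				count++;
			}
		}
	} while (next_order(order, ndisks));
	return count;
}
```

With 4 disks that is 24 orders x 4 chunk sizes x 4 delays = 384 candidates. The count grows factorially with the number of disks, so pinning down the order from serial numbers (step 1a) saves a lot of time.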
Of course, there is no warranty, implied or explicit. Use at your own risk.
Tested with Ubuntu 14.04 LTS x64.
Trouble compiling?
Hints
1. apt-get install build-essential linux-headers-`uname -r`
2. If you're running a different flavour of Linux, it may be easier to boot an Ubuntu Live CD than to get it working on the native OS.
Happy assembling,
-KB (kyle@brosoft.com.au)
EXTRA_CFLAGS += -D_GNU_SOURCE
obj-m := vraid.o
all:
	make -C /usr/src/linux-headers-$(shell uname -r) SUBDIRS=$(PWD) modules
clean:
	make -C /usr/src/linux-headers-$(shell uname -r) SUBDIRS=$(PWD) clean
#include "vraid.h"
MODULE_LICENSE("GPL v2");
static int major_num = 0;
module_param(major_num, int, 0);
static int LOGICAL_BLOCK_SIZE = 512;
module_param(LOGICAL_BLOCK_SIZE, int, 0);
static struct vraid_t vraid;
// This is the chunk size in KiB. People sometimes (incorrectly) call this the stripe size.
// A stripe is one chunk from each disk in the set. With a 128KiB chunk and 4 disks, a stripe holds
// 384KiB of data (3 data chunks; the 4th chunk in each stripe holds parity and is not counted).
static unsigned int chunksize = 128;
// Set >= 1 to see the logical -> physical disk translations.
static int debug = 0;
// Set = 0 to enable system writes to the array.
// Note that parity is NOT recalculated, so any write leaves that stripe's parity stale.
static int readonly = 1;
// List your disks here in array order; parity starts on the last disk.
static char* disklist[] = {
"/dev/sdc",
"/dev/sdd",
"/dev/sde",
"/dev/sdf",
NULL};
static DECLARE_WAIT_QUEUE_HEAD(req_event);
//
// Delayed Parity = 4, Left-asymmetric
// (Typical HP SmartArray P410/P200 RAID5 algorithm)
//
static int translate_dp4la(struct bio *bio)
{
uint64_t chunkinsectors;
uint64_t chunkindex;
uint64_t stripeindex;
uint64_t sec = bio->bi_sector;
uint64_t newsec;
unsigned datadisk;
unsigned paritydisk;
unsigned parity_delay = 4;
// ### First, find the chunk index of the request ###
// chunksize is in units of 1024 bytes,
// we need to map that to sectors (512-bytes)
// chunkinsectors = chunksize * 1024 / 512
// chunkinsectors = chunksize * (1024 / 512)
// chunkinsectors = chunksize * 2
// optimisation,
// chunkinsectors = chunksize << 1
chunkinsectors = chunksize << 1;
// next, divide the requested sector by the chunk size in sectors to give the zero-based chunk index.
chunkindex = sec / chunkinsectors;
// Now we must calculate the stripe this chunk belongs to. There are [disk count - 1] chunks to a stripe + 1 parity chunk.
// Thus, we just divide by the disk count - 1
stripeindex = chunkindex / (vraid.diskcount - 1);
// Last but not least, the parity chunk changes disk location depending on parity delay, and stripe index.
// For delayed parity 4 / left-asymmetric with 4 disks, the layout looks like this:
/*
STRIPE# Disk1 Disk2 Disk3 Disk4
-----------------------------------------
1 D1 D2 D3 [P1]
2 D4 D5 D6 [P2]
3 D7 D8 D9 [P3]
4 D10 D11 D12 [P4]
5 D13 D14 [P5] D15 <--- Parity shifts left by 1 disk after 4 stripes
6 D16 D17 [P6] D18
...
8 D22 D23 [P8] D24
9 D25 [P9] D26 D27 <--- Parity shifts again
...
and so on...
*/
// Assuming all zero-based indexes:
// ParityDisk# = (DiskCount - 1) - (Stripe# / ParityDelay) % DiskCount
paritydisk = (vraid.diskcount - 1) - ((stripeindex / parity_delay) % vraid.diskcount);
// And we get the data disk the request is for in the context of the stripe by modulus.
// the datadisk zero-based offset ignores which disk is currently parity so in the case of a 4 disk
// array, we'll only get values between 0 and 2.
//
// data disk = (RequestSector % StripeSizeInSectors) / ChunkSizeInSectors
datadisk = (sec % (chunkinsectors * (vraid.diskcount - 1))) / chunkinsectors;
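// Now we calculate the new sector on the physical disk. Each disk lends one chunk to every stripe, so
// stripe N occupies sectors [N x ChunkInSectors, (N + 1) x ChunkInSectors) on every member disk.
// Going forwards (DataDisk being the index before the parity skip below), the array's sector is:
//   LogicalSector = StripeIndex x (DiskCount - 1) x ChunkInSectors + DataDisk x ChunkInSectors + (PhysicalSector % ChunkInSectors)
// Working backwards from that:
//   PhysicalSector = StripeIndex x ChunkInSectors + (LogicalSector % ChunkInSectors)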
newsec = sec;
newsec /= chunkinsectors * (vraid.diskcount - 1);
newsec *= chunkinsectors;
newsec += sec % chunkinsectors;
// Dump info
if (debug)
{
printk(KERN_INFO "Translation: %llu: chunkindex = %llu, stripeindex = %llu, pdisk = %u, ddisk = %u, newsec = %llu, size = %u, numdisks = %u, chunkinsecs = %llu\n",
sec, chunkindex, stripeindex, paritydisk, datadisk, newsec, bio->bi_size, vraid.diskcount, chunkinsectors);
}
// This is where we make up for the parity disk's position. Since datadisk only counts the data chunks
// within the stripe, if the parity disk's index is less than or equal to it, we increment datadisk to skip parity.
// If paritydisk <= datadisk, datadisk++
if (paritydisk <= datadisk) datadisk++;
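// Worked example (chunksize = 128, so chunkinsectors = 256; 4 disks; parity_delay = 4):
//   sec = 3672:  chunkindex = 3672 / 256 = 14, stripeindex = 14 / 3 = 4
//   paritydisk = 3 - ((4 / 4) % 4) = 2
//   datadisk   = (3672 % 768) / 256 = 600 / 256 = 2; paritydisk <= datadisk, so datadisk = 3
//   newsec     = (3672 / 768) * 256 + (3672 % 256) = 4 * 256 + 88 = 1112
// so this request is sent to vraid.disks[3] (the 4th entry in disklist[]) at sector 1112.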
// Hand over the new disk and sector.
bio->bi_sector = newsec;
bio->bi_bdev = vraid.disks[datadisk].bdev_raw;
// Done
return 0;
}
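static void vraid_io_fn(struct bio *bio)
{
// Remember where the request was aimed (our vraid0 device) for the trace event below.
dev_t orig_dev = bio->bi_bdev->bd_dev;
sector_t orig_sector = bio->bi_sector;
if (translate_dp4la(bio)) {
printk("vraid: Something is about to go very wrong.\n");
}
// Emit the block_bio_remap tracepoint (visible in blktrace) recording the original device and sector.
// The remapping itself was already done by translate_dp4la() above.
trace_block_bio_remap(bdev_get_queue(bio->bi_bdev), bio, orig_dev, orig_sector);
// ...and the kernel does the rest.
generic_make_request(bio);
}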
static int vraid_threadfn(void *data)
{
struct bio *bio;
set_user_nice(current, -20);
// Basic worker loop: listen for requests, hand them off to our request handler (vraid_io_fn)
while (!kthread_should_stop())
{
/* wake_up() is after adding bio to list. No need for condition */
wait_event_interruptible(req_event, kthread_should_stop() ||
!bio_list_empty(&vraid.bio_list));
spin_lock_irq(&vraid.lock);
if (bio_list_empty(&vraid.bio_list))
{
spin_unlock_irq(&vraid.lock);
continue;
}
bio = bio_list_pop(&vraid.bio_list);
spin_unlock_irq(&vraid.lock);
vraid_io_fn(bio);
}
return 0;
}
//
// Handle an I/O request.
//
static void vraid_make_request(struct request_queue *q, struct bio *bio)
{
if (debug) {
printk("vraid: make request %-5s sector %-12llu #pages %-4hu total-size "
"%-10u\n", bio_data_dir(bio) == WRITE ? "write" : "read",
(unsigned long long)bio->bi_sector, bio->bi_vcnt, bio->bi_size);
}
if ((bio_data_dir(bio) == WRITE) && readonly) {
printk("vraid: Write disabled.\n");
bio_io_error(bio);
return;
}
spin_lock_irq(&vraid.lock);
if (!vraid.is_active)
{
printk("vraid: Device not active yet, aborting\n");
goto abort;
}
bio_list_add(&vraid.bio_list, bio);
wake_up(&req_event);
spin_unlock_irq(&vraid.lock);
return;
abort:
spin_unlock_irq(&vraid.lock);
printk("<%p> Abort request\n\n", bio);
bio_io_error(bio);
}
//
// Get a /dev/xyz name and open it for read/write
//
static struct block_device *vraid_bdev_open(char dev_path[])
{
int ret;
/* Open underlying device */
struct block_device *bdev_raw = lookup_bdev(dev_path);
printk("Opening %s\n", dev_path);
if (IS_ERR(bdev_raw))
{
printk("vraid: error opening raw device <%ld>\n", PTR_ERR(bdev_raw));
return NULL;
}
if (!bdget(bdev_raw->bd_dev))
{
printk("vraid: error bdget()\n");
return NULL;
}
if ((ret = blkdev_get(bdev_raw, VRAID_BDEV_MODE, &vraid)))
{
printk("vraid: error blkdev_get(): %d\n", ret);
bdput(bdev_raw);
return NULL;
}
return bdev_raw;
}
//
// Init the array's disks, sizes, etc...
//
static int vraid_start(void)
{
int disk;
unsigned max_sectors;
sector_t totalsize = 0;
disk = 0;
// Go through the statically defined disk list, open each one and pull its specs from the device.
while (disklist[disk] != NULL) {
printk("vraid: [%d] %s\n", disk, disklist[disk]);
if (!(vraid.disks[disk].bdev_raw = vraid_bdev_open(disklist[disk]))) {
printk("vraid: [%d] failed to open device\n", disk);
goto error_after_bdev;
}
/* Set up our internal device */
vraid.disks[disk].capacity = get_capacity(vraid.disks[disk].bdev_raw->bd_disk);
printk("vraid: [%d] capacity: %llu\n", disk, (unsigned long long)vraid.disks[disk].capacity);
// Get the smallest disk size in the set.
if (disk == 0) {
totalsize = vraid.disks[disk].capacity;
} else {
if (vraid.disks[disk].capacity < totalsize) {
totalsize = vraid.disks[disk].capacity;
}
}
vraid.disks[disk].max_sectors = queue_max_hw_sectors(bdev_get_queue(vraid.disks[disk].bdev_raw));
printk("vraid: [%d] max_sectors = %u\n", disk, vraid.disks[disk].max_sectors);
disk++;
}
printk("vraid: %d disks processed\n", disk);
vraid.diskcount = disk;
// Minimum of 3 disks needed to make a RAID5 set
if (disk < 3) {
printk("vraid: Not enough disks to make a RAID5 set.\n");
goto error_after_bdev;
}
disk--;
// [totalsize] now equals the smallest disk in the set; multiply it by (disk count - 1) to exclude
// the parity chunk in each stripe. [disk] already equals disk count - 1.
totalsize *= disk;
set_capacity(vraid.gd, totalsize);
vraid.capacity = totalsize;
printk("vraid: total array capacity: %llu sectors\n", (unsigned long long)totalsize);
// max_sectors is the largest request (in 512-byte sectors) a disk's queue accepts. Use the smallest
// value across all members so that a remapped bio never exceeds any disk's limit.
// Note: bios are not split at chunk boundaries; each bio is sent whole to the disk holding its first sector.
max_sectors = vraid.disks[0].max_sectors;
while (disk > 0) {
if (vraid.disks[disk].max_sectors < max_sectors)
max_sectors = vraid.disks[disk].max_sectors;
disk--;
}
blk_queue_max_hw_sectors(vraid.queue, max_sectors);
printk("vraid: Array max_sectors: %u\n", max_sectors);
// gd->disk_name is not filled in until after vraid_start() returns, so name the thread explicitly.
vraid.thread = kthread_create(vraid_threadfn, NULL, VRAID_NAME_0);
if (IS_ERR(vraid.thread))
{
printk("vraid: error kthread_create <%ld>\n", PTR_ERR(vraid.thread));
goto error_after_bdev;
}
printk("vraid: done initializing successfully\n");
vraid.is_active = 1;
wake_up_process(vraid.thread);
return 0;
error_after_bdev:
// Release every disk opened so far (unopened slots were zeroed by vraid_init()).
disk = 0;
while (disklist[disk] != NULL) {
if (vraid.disks[disk].bdev_raw != NULL) {
blkdev_put(vraid.disks[disk].bdev_raw, VRAID_BDEV_MODE);
bdput(vraid.disks[disk].bdev_raw);
vraid.disks[disk].bdev_raw = NULL;
}
disk++;
}
return -EFAULT;
}
//
// Left over from where I stole the original code:
// https://github.com/OrenKishon/stackbd
//
static int vraid_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg)
{
char dev_path[80];
void __user *argp = (void __user *)arg;
switch (cmd)
{
case VRAID_DO_IT:
printk("\n*** DO IT!!!!!!! ***\n\n");
if (copy_from_user(dev_path, argp, sizeof(dev_path)))
return -EFAULT;
return 0;
default:
return -ENOTTY;
}
}
//
// Who the hell uses C/H/S anymore?!?
//
static int vraid_getgeo(struct block_device * block_device, struct hd_geometry * geo)
{
long size;
/* We have no real geometry, of course, so make something up. */
size = vraid.capacity * (LOGICAL_BLOCK_SIZE / KERNEL_SECTOR_SIZE);
geo->cylinders = (size & ~0x3f) >> 6;
geo->heads = 4;
geo->sectors = 16;
geo->start = 0;
return 0;
}
//
// The device operations structure we hand back to the kernel.
//
static struct block_device_operations vraid_ops = {
.owner = THIS_MODULE,
.getgeo = vraid_getgeo,
.ioctl = vraid_ioctl,
};
//
// Entry point. Initialise things that need to be. Fire up our own block device.
//
static int __init vraid_init(void)
{
int disk;
int ret;
printk(KERN_INFO "vraid: Starting...\n");
/* Set up our internal device */
spin_lock_init(&vraid.lock);
/* blk_alloc_queue() instead of blk_init_queue() so it won't set up the
* queue for requests.
*/
if (!(vraid.queue = blk_alloc_queue(GFP_KERNEL)))
{
printk("vraid: alloc_queue failed\n");
return -EFAULT;
}
blk_queue_make_request(vraid.queue, vraid_make_request);
blk_queue_logical_block_size(vraid.queue, LOGICAL_BLOCK_SIZE);
// Get registered. register_blkdev() returns the allocated major when asked for major 0,
// but returns 0 on success when a specific major was requested via the module parameter.
if ((ret = register_blkdev(major_num, VRAID_NAME)) < 0)
{
printk("vraid: unable to get major number\n");
goto error_after_alloc_queue;
}
if (major_num == 0)
major_num = ret;
// Initialise the raid disk structs to zero; makes cleaning up easier.
disk = 0;
while (disklist[disk] != NULL) {
memset(&(vraid.disks[disk]), 0, sizeof(struct raiddisk_t));
disk++;
}
// Gendisk structure and disk init
if (!(vraid.gd = alloc_disk(16))) {
printk("vraid: alloc_disk failed\n");
goto error_after_register_blkdev;
}
if (vraid_start()) {
printk("vraid: failed to open disks.\n");
goto error_after_alloc_disk;
}
vraid.gd->major = major_num;
vraid.gd->first_minor = 0;
vraid.gd->fops = &vraid_ops;
vraid.gd->private_data = &vraid;
strcpy(vraid.gd->disk_name, VRAID_NAME_0);
vraid.gd->queue = vraid.queue;
add_disk(vraid.gd);
printk("vraid: init done\n");
return 0;
error_after_alloc_disk:
put_disk(vraid.gd);
error_after_register_blkdev:
unregister_blkdev(major_num, VRAID_NAME);
error_after_alloc_queue:
blk_cleanup_queue(vraid.queue);
return -EFAULT;
}
//
// Called when rmmod: cleanup, release handles.
//
static void __exit vraid_exit(void)
{
int disk;
printk("vraid: exit\n");
if (vraid.is_active)
{
kthread_stop(vraid.thread);
disk = 0;
while (disklist[disk] != NULL) {
if (vraid.disks[disk].bdev_raw == NULL) {
disk++;
continue;
}
blkdev_put(vraid.disks[disk].bdev_raw, VRAID_BDEV_MODE);
bdput(vraid.disks[disk].bdev_raw);
disk++;
}
}
del_gendisk(vraid.gd);
put_disk(vraid.gd);
unregister_blkdev(major_num, VRAID_NAME);
blk_cleanup_queue(vraid.queue);
}
//
// Standard kmod defines.
//
module_init(vraid_init);
module_exit(vraid_exit);
#pragma once
/* IOCTL */
#define VRAID_DO_IT _IOW( 0xad, 0, char * )
#define VRAID_NAME "vraid"
#define VRAID_NAME_0 VRAID_NAME "0"
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/kernel.h> /* printk() */
#include <linux/fs.h> /* everything... */
#include <linux/errno.h> /* error codes */
#include <linux/types.h> /* size_t */
#include <linux/vmalloc.h>
#include <linux/genhd.h>
#include <linux/blkdev.h>
#include <linux/hdreg.h>
#include <linux/kthread.h>
#include <trace/events/block.h>
//#include <../stdint.h>
//#define VRAID_BDEV_MODE (FMODE_READ | FMODE_WRITE | FMODE_EXCL)
#define VRAID_BDEV_MODE (FMODE_READ | FMODE_WRITE)
#define DEBUGGG printk("vraid: %d\n", __LINE__);
/*
* We can tweak our hardware sector size, but the kernel talks to us
* in terms of small sectors, always.
*/
#define KERNEL_SECTOR_SIZE 512
struct raiddisk_t
{
spinlock_t lock;
sector_t capacity;
unsigned max_sectors;
struct block_device *bdev_raw;
};
struct vraid_t {
spinlock_t lock;
struct raiddisk_t disks[32];
unsigned int diskcount;
struct gendisk *gd;
struct bio_list bio_list;
struct task_struct *thread;
struct request_queue *queue;
sector_t capacity;
int is_active;
};
static int vraid_start(void);
static int vraid_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg);
static int vraid_getgeo(struct block_device * block_device, struct hd_geometry * geo);
static int translate_dp4la(struct bio *bio);
static void vraid_io_fn(struct bio *bio);
static int vraid_threadfn(void *data);
static void vraid_make_request(struct request_queue *q, struct bio *bio);
static struct block_device *vraid_bdev_open(char dev_path[]);