@jpetazzo
Created April 4, 2012 16:20
Repair a Riak bitcask-based cluster when the ring has gone out of control

So I heard you hosed your Riak cluster

I don't know what you did (I don't know what I did when this happened to me), but you ended up with a completely borked Riak cluster. Possible causes and symptoms include:

  • riak-admin transfers shows different things depending on the node you run it on (see the sketch after this list)
  • you tried to leave/join nodes to fix things, but that only made things worse
  • you ran mixed versions in parallel, instead of doing a clean rolling upgrade
  • some data seems to be missing, and when you list the keys in a bucket, there are clearly fewer than you expected
  • YOU'RE AFRAID YOU MIGHT HAVE LOST DATA
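
If you want to see the first symptom for yourself, here is a minimal sketch that runs riak-admin transfers on every node so you can compare the output. It assumes you can SSH to each node; riak1, riak2 and riak3 are hypothetical hostnames, substitute your own.

    # Hypothetical hostnames; replace with the nodes of your cluster.
    for h in riak1 riak2 riak3; do
        echo "== $h =="
        ssh "$h" riak-admin transfers
    done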

Don't panic—at least not before having tried this.

  1. Install a new server (spin up a VM, whatever...)
  2. Install a brand new, virgin Riak on it
  3. Stop the riak node running on the new server: riak stop
  4. Wipe it out: rm -rf /var/lib/riak/*
  5. Recreate the bitcask directory: mkdir /var/lib/riak/bitcask
  6. Create a directory (e.g. ~/bitcasks) on the new server
  7. Copy the /var/lib/riak/bitcask directory of each node of your borked cluster into ~/bitcasks/node-$HOSTNAME (this $HOSTNAME should be the hostname of the node, not the hostname of your new server; see the sketch after this list)
  8. Copy the merge-bitcask.py file to the same directory
  9. Run it and inspect the output (it should print one cp line per partition, i.e. 64 by default)
  10. Run it again for real: python merge-bitcask.py | sh
  11. Start the Riak node and see if your data is there
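
For steps 6 to 8, here is a rough sketch of what that can look like, assuming the old nodes are reachable over SSH as riak1, riak2 and riak3 (hypothetical names), that rsync is available, and that you can read /var/lib/riak/bitcask on each of them:

    # On the new server; riak1..riak3 are the hostnames of the borked nodes.
    mkdir -p ~/bitcasks
    cd ~/bitcasks
    for h in riak1 riak2 riak3; do
        # trailing slash: copy the contents of bitcask into node-$h
        rsync -a "$h:/var/lib/riak/bitcask/" "node-$h/"
    done
    # then continue with steps 8-10 from this directory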

How does it work?

The bitcask directory contains one subdirectory per partition. Sometimes (at least, that's what happened to me!) the partitions get all messed up, and nodes no longer know which other node owns which partition. The method described here merges all the partitions onto a single new node. In some cases, though, several versions of the same partition will be present on different nodes. The script below simply measures the size of each copy and keeps the biggest one. You can probably do the same thing with a mix of du/sort/awk (a rough sketch of that follows the script).

#!/usr/bin/env python
# merge-bitcask.py
# Emits "cp -r" commands that rebuild a single bitcask directory from the
# node-* copies in the current directory, keeping the biggest copy of each
# partition (vnode). Run it once to inspect the output, then pipe it to sh.
import os
import glob

# One node-* directory per node of the borked cluster.
sourcedirs = glob.glob('node-*')

# Collect every partition (vnode) directory seen on any node.
vnodes = set()
for sourcedir in sourcedirs:
    vnodes |= set(os.listdir(sourcedir))
# Ignore this non-partition directory if it happens to be there.
vnodes.discard('manual_cleanup')

for vnode in vnodes:
    # Find the node holding the biggest copy of this partition.
    biggestsize = 0
    biggestsource = None
    for sourcedir in sourcedirs:
        thissize = 0
        if not os.path.isdir(os.path.join(sourcedir, vnode)):
            continue
        for bcfile in os.listdir(os.path.join(sourcedir, vnode)):
            thissize += os.stat(os.path.join(sourcedir, vnode, bcfile)).st_size
        if thissize > biggestsize:
            biggestsize = thissize
            biggestsource = sourcedir
    if biggestsource is None:
        # Only empty copies of this partition were found; nothing to copy.
        continue
    print('cp -r {biggestsource}/{vnode} /var/lib/riak/bitcask'.format(**locals()))
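
The paragraph above mentions that you could do the same with du/sort/awk; here is a rough, untested sketch of that approach. It skips the manual_cleanup directory, keeps the biggest copy of each partition, and prints the same kind of cp commands (inspect them before piping to sh):

    du -sk node-*/* | grep -v manual_cleanup | sort -n |
      awk '{split($2, a, "/"); best[a[2]] = $2}
           END {for (v in best) print "cp -r", best[v], "/var/lib/riak/bitcask"}'
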
@gburd

gburd commented Apr 5, 2012

I'd like to understand more about what you mean by "all messed up" because that is not something that generally happens. Please email me greg AT basho DOT company. This is not a normal repair process that we've used so I'm not sure what effect it will have. By the way, doing excessive leaves/joins is not a great idea in general. We should talk about your use case and experiences.

@jpetazzo

jpetazzo commented Apr 5, 2012

Hi Greg—sent you an e-mail this morning. We'll be happy to provide as much info as we can.

Note for innocent bystanders: this doesn't mean that Riak is unreliable/broken/buggy/whatever. It means that we did something wrong, and Riak did let us shoot ourselves in the foot. That might sound bad. However, even without knowing Riak internals, we were able to recover, without losing a single bit of data. That's definitely good, IMHO.

@gburd

gburd commented Apr 5, 2012

There's enough blame to go around. :) The thing you "did wrong" was not entirely your fault. Basically, the transfers were still happening and things were just taking a while to work out during a ring rebalance when you issued join/leave and that caused the confusion. You needed more feedback from our product to make better decisions about what to do when administering a cluster.
