Block device sync between remote hosts. Based off http://www.bouncybouncy.net/programs/blocksync.py
#!/usr/bin/env python
"""
Synchronise block devices over the network

Copyright 2006-2008 Justin Azoff <justin@bouncybouncy.net>
Copyright 2011 Robert Coup <robert@coup.net.nz>
License: GPL

Getting started:

* Copy blocksync.py to the home directory on the remote host
* Make sure your remote user can either sudo or is root itself.
* Make sure your local user can ssh to the remote host
* Invoke:
    sudo python blocksync.py /dev/source user@remotehost /dev/dest
"""

import sys
import subprocess
import time

try:
    from hashlib import sha1 as sha
except ImportError:
    # Python < 2.5 has no hashlib
    from sha import sha

SAME = "same\n"
DIFF = "diff\n"


def do_open(f, mode):
    f = open(f, mode)
    # Seek to the end to find the device size, then rewind
    f.seek(0, 2)
    size = f.tell()
    f.seek(0)
    return f, size


def getblocks(f, blocksize):
    while 1:
        block = f.read(blocksize)
        if not block:
            break
        yield block


def server(dev, blocksize):
    # Handshake: echo the device and block size, then report the device size
    print dev, blocksize
    f, size = do_open(dev, 'r+')
    print size
    sys.stdout.flush()

    for block in getblocks(f, blocksize):
        print sha(block).hexdigest()
        sys.stdout.flush()
        res = sys.stdin.readline()
        if res != SAME:
            # Overwrite the block just read with the client's copy
            newblock = sys.stdin.read(blocksize)
            f.seek(-len(newblock), 1)
            f.write(newblock)


def sync(srcdev, dsthost, dstdev=None, blocksize=1024 * 1024):
    if not dstdev:
        dstdev = srcdev

    print "Block size is %0.1f MB" % (float(blocksize) / (1024 * 1024))
    # cmd = ['ssh', '-c', 'blowfish', dsthost, 'sudo', 'python', 'blocksync.py', 'server', dstdev, '-b', str(blocksize)]
    cmd = ['ssh', '-c', 'blowfish', dsthost, 'python', 'blocksync.py', 'server', dstdev, '-b', str(blocksize)]
    print "Running: %s" % " ".join(cmd)

    p = subprocess.Popen(cmd, bufsize=0, stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
    p_in, p_out = p.stdin, p.stdout

    # First line of the handshake: "<device> <blocksize>"
    line = p_out.readline()
    p.poll()
    if p.returncode is not None:
        print "Error connecting to or invoking blocksync on the remote host!"
        sys.exit(1)

    a, b = line.split()
    if a != dstdev:
        print "Dest device (%s) doesn't match with the remote host (%s)!" % (dstdev, a)
        sys.exit(1)
    if int(b) != blocksize:
        print "Source block size (%d) doesn't match with the remote host (%d)!" % (blocksize, int(b))
        sys.exit(1)

    try:
        f, size = do_open(srcdev, 'r')
    except Exception, e:
        print "Error accessing source device! %s" % e
        sys.exit(1)

    # Second line of the handshake: the remote device size
    line = p_out.readline()
    p.poll()
    if p.returncode is not None:
        print "Error accessing device on remote host!"
        sys.exit(1)
    remote_size = int(line)
    if size != remote_size:
        print "Source device size (%d) doesn't match remote device size (%d)!" % (size, remote_size)
        sys.exit(1)

    same_blocks = diff_blocks = 0

    print "Starting sync..."
    t0 = time.time()
    t_last = t0
    # Round up so that a short final block is counted in the progress total
    size_blocks = (size + blocksize - 1) // blocksize
    for i, l_block in enumerate(getblocks(f, blocksize)):
        l_sum = sha(l_block).hexdigest()
        r_sum = p_out.readline().strip()

        if l_sum == r_sum:
            p_in.write(SAME)
            p_in.flush()
            same_blocks += 1
        else:
            # Hashes differ: send the local block for the server to write
            p_in.write(DIFF)
            p_in.flush()
            p_in.write(l_block)
            p_in.flush()
            diff_blocks += 1

        t1 = time.time()
        if t1 - t_last > 1 or (same_blocks + diff_blocks) >= size_blocks:
            rate = (i + 1.0) * blocksize / (1024.0 * 1024.0) / (t1 - t0)
            print "\rsame: %d, diff: %d, %d/%d, %5.1f MB/s" % (same_blocks, diff_blocks, same_blocks + diff_blocks, size_blocks, rate),
            t_last = t1

    print "\n\nCompleted in %d seconds" % (time.time() - t0)

    return same_blocks, diff_blocks


if __name__ == "__main__":
    from optparse import OptionParser
    parser = OptionParser(usage="%prog [options] /dev/source user@remotehost [/dev/dest]")
    parser.add_option("-b", "--blocksize", dest="blocksize", action="store", type="int", help="block size (bytes)", default=1024 * 1024)
    (options, args) = parser.parse_args()

    if len(args) < 2:
        parser.print_help()
        print __doc__
        sys.exit(1)

    if args[0] == 'server':
        dstdev = args[1]
        server(dstdev, options.blocksize)
    else:
        srcdev = args[0]
        dsthost = args[1]
        if len(args) > 2:
            dstdev = args[2]
        else:
            dstdev = None
        sync(srcdev, dsthost, dstdev, options.blocksize)

@rcoup

Owner

rcoup commented Mar 28, 2012

I have updated it to work with old Pythons that don't have a format() method. Enjoy :)

@geraldh

geraldh commented Aug 8, 2012

Attention: this code eats your data!

This gist contains a major flaw - look at gist:3296554 for a fix.

@rcoup

Owner

rcoup commented Mar 11, 2013

Thanks @geraldh! It definitely didn't eat my data, but I guess I was lucky: I'm probably using LVM so much that everything is a nice size :)

I've rolled your changes into the above gist.
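
As an illustration of why the device size matters here, the following is a minimal sketch in Python 3. It is not the fix from gist:3296554; the helper names `block_count` and `block_lengths` are hypothetical. It only shows that a device whose size is not a multiple of the block size ends in a short final block, which floor division does not count.

```python
def block_count(size, blocksize):
    """Number of blocks needed to cover `size` bytes, including a short final block."""
    return (size + blocksize - 1) // blocksize


def block_lengths(size, blocksize):
    """Yield the length of each block when `size` bytes are read in `blocksize` chunks."""
    offset = 0
    while offset < size:
        length = min(blocksize, size - offset)
        yield length
        offset += length


if __name__ == "__main__":
    blocksize = 1024 * 1024
    size = 10 * blocksize + 512  # hypothetical device that is not block-aligned
    print("floor division:", size // blocksize)
    print("blocks actually read:", block_count(size, blocksize))
    print("final block length:", list(block_lengths(size, blocksize))[-1])
```

Any code that reads, hashes, or rewrites the last block has to work with its actual length rather than assume a full `blocksize`.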

@lkraav

lkraav commented Jul 25, 2013

Out of curiosity, is this better in any way than rsync --copy-devices --inplace? Aside from sparing you a custom patch for rsync, that is.

@DiagonalArg

DiagonalArg commented Jan 5, 2014

rsync --copy-devices is completely useless for large devices. It spends too much time searching for pieces that are identical. See the comment here:
https://serverfault.com/questions/27397/sync-lvm-snapshots-to-backup-server

@sirio81

sirio81 commented Jan 27, 2014

This is a great tool.
Please add the default block size to the help description:

-b BLOCKSIZE, --blocksize=BLOCKSIZE
block size (bytes) - default 1024*1024 (1M)
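
One way to get this without hardcoding the number is optparse's `%default` placeholder, which is expanded in help strings. The sketch below is written for Python 3 and wraps the parser in a hypothetical `build_parser` helper; it is a suggestion, not what the gist currently does.

```python
from optparse import OptionParser


def build_parser():
    """Build the blocksync option parser with the default shown in --help."""
    parser = OptionParser(usage="%prog [options] /dev/source user@remotehost [/dev/dest]")
    parser.add_option(
        "-b", "--blocksize", dest="blocksize", action="store", type="int",
        default=1024 * 1024,
        # optparse replaces %default with the option's default value
        help="block size (bytes) - default %default",
    )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

The help text then follows the `default=` argument automatically if the default is ever changed.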

@sirio81

sirio81 commented Jan 27, 2014

I also think it might be useful to be able to run blocksync.py on localhost without involving ssh:

blocksync.py /dev/vg00/test /dev/vg00/test_clone

Could you add this option?
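
A local-to-local mode could look roughly like the sketch below (Python 3). The function name `sync_local` is hypothetical, and it assumes the destination already exists and has the same size as the source. With both devices on one host the blocks can be compared directly, since hashing only serves to avoid sending unchanged blocks over the network.

```python
def sync_local(src_path, dst_path, blocksize=1024 * 1024):
    """Copy only the differing blocks from src_path to dst_path.

    Returns a (same_blocks, diff_blocks) tuple, like sync() in the script.
    """
    same = diff = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            src_block = src.read(blocksize)
            if not src_block:
                break
            offset = dst.tell()
            dst_block = dst.read(len(src_block))
            if src_block == dst_block:
                same += 1
            else:
                # Rewind to the start of this block and overwrite it in place
                dst.seek(offset)
                dst.write(src_block)
                diff += 1
    return same, diff


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print("same: %d, diff: %d" % sync_local(sys.argv[1], sys.argv[2]))
```

Unchanged blocks are only read, never rewritten, which is the point when the destination is a snapshot-backed or thin-provisioned volume.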

@slesru

slesru commented Apr 10, 2014

Hello!

Is it possible to use it in the reverse direction, i.e. to copy from a remote host's device to the local host?
It could be useful for backing up changes...
Thank you!

@slesru

slesru commented Apr 21, 2014

OK, here is a quick and dirty patch for the reverse direction. At least I hope so :-)

diff blocksync.py blocksyncR.py
45c45
<     f, size = do_open(dev, 'r+')
---
>     f, size = do_open(dev, 'r')
54,56c54,58
<             newblock = sys.stdin.read(blocksize)
<             f.seek(-len(newblock), 1)
<             f.write(newblock)
---
>             sys.stdout.write(block)
>             sys.stdout.flush()
>             #newblock = sys.stdin.read(blocksize)
>             #f.seek(-len(newblock), 1)
>             #f.write(newblock)
66c68
<     cmd = ['ssh', '-c', 'blowfish', dsthost, 'python', 'blocksync.py', 'server', dstdev, '-b', str(blocksize)]
---
>     cmd = ['ssh', '-c', 'blowfish', dsthost, 'python', 'blocksyncR.py', 'server', dstdev, '-b', str(blocksize)]
87c89
<         f, size = do_open(srcdev, 'r')
---
>         f, size = do_open(srcdev, 'r+')
119,120c121,125
<             p_in.write(l_block)
<             p_in.flush()
---
>             newblock = p_out.read(blocksize)
>             f.seek(-len(newblock), 1)
>             f.write(newblock)
>             #p_in.write(l_block)
>             #p_in.flush()

@ramcq

ramcq commented Jun 16, 2014

Hi there, I merged in some changes from holgere's (Holger Ernst) fork to enable non-interactive usage, and some changes of my own. https://gist.github.com/ramcq/0dc76d494598eb09740f/revisions has the changes.

@funkyflash

funkyflash commented Aug 28, 2014

Excellent work! Extremely useful! I'm afraid I have absolutely no idea what I'm doing with git or python, but here's my hack to add local-to-local logic:

https://gist.github.com/funkyflash/7430f21fa4200c1f7061

@papu12

papu12 commented Jan 18, 2015

Hi, I ran this script with two LV devices, both mounted. On the first one I added some files with contents. When the sync completed, I could see the files on the other end, but they had no contents. My LVM version is:
[root@localhost test]# lvm version
LVM version: 2.02.105(2)-RHEL7 (2014-03-26)
Library version: 1.02.84-RHEL7 (2014-03-26)
Driver version: 4.27.0

@papu12

papu12 commented Jan 18, 2015

To add more: on the first sync, the files and their contents are copied. From then on, if I update those files and run the sync again, the updates are not reflected on the destination; only newly added files show up.

@jfesler

jfesler commented Oct 3, 2015

Thanks for sharing - this was exactly what I needed to minimize downtime on a kvm migration involving raw block devices.

@JustinAzoff

JustinAzoff commented Feb 28, 2016

Hi everyone,

Thanks for maintaining this! I'm surprised it is still useful after all these years.

@ken-zheng

ken-zheng commented Jul 11, 2016

Thanks. It helps a lot!

@shodanshok

shodanshok commented Aug 18, 2016

Hi, I added some options and corrected two bugs, one of which was quite nasty.

The file can be downloaded from here: https://gist.github.com/shodanshok/b35875e9baabe01f6dda315259d9046c

Solved bugs:

  • incorrect display of size_blocks during final transfer report
  • wrong last-block sync when source/dest size is not a multiple of 512 bytes (only affects sync between files, as block devices are almost always 512-byte aligned)

Added features:

  • changed default hash alg to sha512
  • added '-a' option to select hash alg
  • added '-e' option to select ssh encryption alg
  • added '-x' option to minimize cache pollution during reads (via posix_fadvise)
  • added '-c' option to calculate and show complete hash of source dev/file
  • added '-s' option to select sudo execution
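
A selectable hash algorithm like the '-a' option described above can be built on `hashlib.new`, which looks an algorithm up by name. The sketch below (Python 3) is not taken from the linked gist; the helper name `block_digest` is hypothetical, and its sha512 default only mirrors the default mentioned in the list.

```python
import hashlib


def block_digest(block, alg="sha512"):
    """Return the hex digest of `block` using the hash algorithm named `alg`."""
    if alg not in hashlib.algorithms_available:
        raise ValueError("unsupported hash algorithm: %s" % alg)
    return hashlib.new(alg, block).hexdigest()


if __name__ == "__main__":
    data = b"\0" * 4096
    for name in ("sha1", "sha256", "sha512"):
        print(name, block_digest(data, name))
```

Both ends must use the same algorithm, so the chosen name would have to be passed to the remote `server` invocation along with the block size.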

Maybe we should start a new full-fledged repository to better track all the changes accumulated in the past years?

@ct16k

ct16k commented Sep 25, 2016

Hi,

I already keep a version in a repository here: https://github.com/ct16k/blocksync
It is based on this one, to which I've added stuff I found myself needing, main feature being multiple worker processes, for very large files, but also a "dry run" option (just reports number of different blocks), self-copy so the same code is run both locally and on the remote server, and misc stuff like extra options for SSH, a second hash comparison for the paranoid, output to log, create destination when sync-ing to file etc.

I've also now borrowed two features from @shodanshok's version (automatically using posix_fadvise where available and configurable hash algorithms, instead of just hardcoding sha512+sha384 or sha128+md5).

@mephisstopheles

mephisstopheles commented Jun 15, 2018

Hello,
I tried to run blocksync.py from an ESXi 6.5 system and keep getting this error: "OSError: [Errno 27] File too large".
If anyone has a solution, I would be thankful.
