@FlorianHeigl
Created April 18, 2021 13:35
fault tolerant rsync backup script
#!/bin/bash -u

STOREDIR=/srv/backup
# run at idle I/O priority and lowest CPU priority
NICE="ionice -c3 nice -n 19"

run_rsync()
{
    # use -vv to gather options for remote server to put in .ssh/authorized_keys
    # that way you can limit rsync to intended use (sender will be fixed)
    # $NICE is left unquoted on purpose so it splits into command and arguments
    $NICE rsync --bwlimit=20000 --delete -arHLz --numeric-ids --exclude="*/backup/*" --exclude="weblogs" host.domain.tld: .
    return $?
}

loop_rsync()
{
    SYNCED=no
    TRIES=0
    cd "$STOREDIR" || exit 1
    while [ "$SYNCED" = "no" ]; do
        # -ge so that we really stop after 100 attempts, not 101
        if [ "$TRIES" -ge 100 ]; then
            echo "Bailing out after 100 incomplete sync attempts"
            break
        fi
        TRIES=$(( TRIES + 1 ))
        # call the actual sync, return code will be passed back
        run_rsync ; ret=$?
        echo "rsync returned $ret"
        case ${ret} in
            # group return codes by types of issues, see man page
            0|24)
                # success, or only "source files vanished" during the transfer
                SYNCED=yes
                ;;
            1|2|3|4|5)
                echo "rsync says we misconfigured it"
                break
                ;;
            # cases where a retry could help
            # note: the man page lists 20 as "Received SIGUSR1 or SIGINT",
            # so an interrupted rsync is retried here as well
            10|11|12|14|20|23|30)
                continue
                ;;
            19)
                echo "user aborted!"
                exit 1
                ;;
            *)
                echo "unknown issue"
                break
                ;;
        esac
    done
}

loop_rsync
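
The comment in run_rsync about .ssh/authorized_keys is about pinning the key on the remote server to the one rsync invocation this backup needs. Below is a minimal sketch of one way to do that with an SSH forced command. It is not part of the original script: the wrapper path, the function name and the matched command prefix are assumptions for illustration, and the exact server-side argument string depends on your rsync version and options (which is why the script suggests running with -vv once to see it).

```bash
#!/bin/sh
# Hypothetical forced-command wrapper for the remote (sender) side.
# Assumed authorized_keys entry on the remote host (path and key are placeholders):
#   command="/usr/local/bin/rsync-only",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA... backup
# sshd puts the command requested by the client into SSH_ORIGINAL_COMMAND.

# Return 0 if the requested command looks like an rsync sender invocation.
# The prefix is an assumption; compare it with what -vv shows for your setup.
is_allowed_command()
{
    case "$1" in
        "rsync --server --sender "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    if is_allowed_command "$SSH_ORIGINAL_COMMAND"; then
        set -f                       # no globbing when the string is split into words
        exec $SSH_ORIGINAL_COMMAND   # word splitting only, no shell evaluation
    fi
    echo "rejected: $SSH_ORIGINAL_COMMAND" >&2
    exit 1
fi
```

With this in place the key can only be used to read from the remote host via rsync; anything else, including an interactive login, is refused or simply ends. A stricter variant would match the full argument string instead of a prefix.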

FlorianHeigl commented Apr 18, 2021

this is used to back up a web shop with millions of files, preserving the timestamps, in spite of timeouts and similar issues.

one notable, important thing:
to make this feasible, the receiver side (local) is optimized for this job.

In order of importance the optimizations are:

  1. filesystem: jfs
  2. storage optimization (ssd)
  3. mount options (noatime)
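
The points in this list can be checked on the receiving machine before a run. A minimal sketch, assuming Linux with findmnt from util-linux and the STOREDIR path from the script; the function names are made up for illustration and are not part of the original gist.

```bash
#!/bin/sh
STOREDIR=/srv/backup

# Return 0 if a comma-separated mount option string contains "noatime".
has_noatime()
{
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print filesystem type and mount options of the filesystem that holds
# the given path, and warn if noatime is missing.
check_store()
{
    fstype=$(findmnt -n -o FSTYPE --target "$1") || return 1
    opts=$(findmnt -n -o OPTIONS --target "$1") || return 1
    echo "$1: fstype=$fstype options=$opts"
    has_noatime "$opts" || echo "warning: $1 is not mounted with noatime" >&2
}

# usage (not run automatically):
#   check_store "$STOREDIR"
```

Whether the underlying device is an SSD can be read from the ROTA column of `lsblk -d -o NAME,ROTA` (0 means non-rotational).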

the key thing is that the local latency, or rather the rsync processing time, has been minimized to practically nothing.
this means there is only one source of latency left, so we avoid any latency feedback loops.

for numbers: a run now takes about 2 minutes.
the original setup (bad Adaptec RAID controller without BBU, ext3, HDD, slow link) could not finish within hours.

the original inspiration was

  • backup of a 500GB+ HDD
  • over a 10 Mbit/s-ish DSL line
  • broken RAM in the local system that corrupted SSH transmissions
  • $boss's idea that this would be basically the same as when he was backing up(*) between two PCs on the LAN, so no real changes (like using real backup software, or not running budget dedicated servers and broken systems) were justified

(*) rsync is not a backup
