@jarvisms jarvisms/PruneBackups.py
Last active Jun 30, 2019

rsync incremental backup time machine to remote server of entire root tree
command="/path/to/rrsync /path/to/backup/",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa ....public_key....== root@backmeup
#!/bin/bash
### These assume a public key without a passphrase, and that rrsync is in use, rooted at the relevant remote backup destination
BACKUPROOT=root@host
LINKDIR=/current
EXCLUDELIST=/root/exclude-list
LOGFILE=/tmp/rsync.log
SYMLINK=/tmp/current
DATE=$(date "+%Y-%m-%dT%H-%M-%S")
rsync -viPhhhaHSxyy --no-R --delete-during --delete-excluded --stats --numeric-ids --link-dest="${LINKDIR}" --log-file="${LOGFILE}" --exclude-from="${EXCLUDELIST}" / "${BACKUPROOT}:/${DATE}"
ln -sfTv "${DATE}" "${SYMLINK}"
rsync -viPhhhaHSx --no-R --delete-during --stats --numeric-ids --remove-source-files "${LOGFILE}" "${SYMLINK}" "${BACKUPROOT}:/"
# System folders to exclude
# assuming -x option (does not cross file systems)
/tmp/*
/var/swap
/var/tmp/*
/media/*
/mnt/*
#!/usr/bin/env python3
import datetime
import os, os.path
import shutil
def striptime(dt):
    """Parse a backup folder name into a datetime, or return None if it isn't one."""
    try:
        return datetime.datetime.strptime(dt, "%Y-%m-%dT%H-%M-%S")  # Convert it
    except ValueError:  # If it doesn't look like a date, return None (boolean False)
        return None

def testretention(then, now=None):
    if now is None:  # Default to midnight today, evaluated on each call rather than once at definition
        now = datetime.datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
    delta = (now - then)
    # Return True for the following "then" datetime compared with "now" datetime:
    #   Within 7 days
    #   A Sunday in the last 4 weeks
    #   1st of the month in the last year
    #   (Any future date (then after now), while not explicitly checked, will also return True)
    # Else return False
    return (
        (delta <= datetime.timedelta(days=7))
        or (then.weekday() == 6 and delta <= datetime.timedelta(weeks=4))
        or (then.day == 1 and delta <= datetime.timedelta(days=365))
    )

dir = r"/path/to/"  # the parent of /path/to/backup/
dir = os.path.abspath(os.path.realpath(os.path.expandvars(os.path.expanduser(dir))))  # Make it absolute
start = datetime.datetime.now()
for device in os.scandir(dir):  # Root folder contains the various backups for each device
    if device.is_dir(follow_symlinks=False):  # Ignore symlinks
        for entry in os.scandir(device.path):  # Within each device's sub folder will be the dated backups
            if entry.is_dir(follow_symlinks=False):  # Ignore symlinks
                dt = striptime(entry.name)  # Try to capture the datetime from the folder name
                if dt is not None and not testretention(dt):  # None if it's not a date; if it is, test if it's a keeper
                    print("DELETING:", entry.path)  # Report back
                    shutil.rmtree(entry.path, ignore_errors=True)  # Delete the backup
end = datetime.datetime.now()
print("Start:", start.strftime("%c"))
print("End:", end.strftime("%c"))
print("Duration:", (end - start))

This achieves full file system backups to a remote destination. I use this (with a few minor modifications) to back up the entire SD cards of my Raspberry Pis daily via my crontab.

The rrsync script is required and is available from https://www.samba.org/ftp/unpacked/rsync/support/rrsync

The sample line from the authorized_keys file must reside on the destination server where the backups will be stored. The path to the backup is given there, and all restrictions are in place so that this is the only thing the ssh user can do and the only place they have access to. A public/private key pair without a passphrase must have been created, and the public key shared, beforehand. See https://www.guyrutenberg.com/2014/01/14/restricting-ssh-access-to-rsync/

The exclude-list can be extended to skip particular folders or files, although the rsync parameters (the -x option) will already prevent it from crossing file systems.

The backup.sh file is where the deed is done and is inspired by https://blog.interlinked.org/tutorials/rsync_time_machine.html (link may now be broken). You specify the remote server, your exclude-list and a few other bits. You can look up what the various rsync parameters do, but in essence everything is mirrored or deleted, and any file that already exists unchanged in LINKDIR is hard linked on the remote server instead of being sent again. Therefore only changed files need to be transferred. The backup is timestamped, and a symlink called "current" is made pointing to the backup just made. The second rsync call moves the local log and the new symlink to the remote server too. The first backup will copy everything, but subsequent ones should be faster and incremental. Since rrsync (restricted rsync) is used, the root of the destination folder is actually defined in the authorized_keys file, and LINKDIR is relative to that destination and can't use . or .. relative folder names.
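To make the hard-linking idea concrete, here is a minimal local sketch in Python of what a `--link-dest` style snapshot does. It is not how rsync is implemented: the `snapshot` and `demo` functions are hypothetical names, and files are compared by content here, whereas rsync by default decides based on size and modification time. It also assumes a file system that supports hard links.

```python
import filecmp
import os
import shutil
import tempfile

def snapshot(source, dest, link_dest=None):
    """Copy `source` into `dest`, hard-linking files unchanged since `link_dest`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(dest, rel, name))
            prev = os.path.join(link_dest, rel, name) if link_dest else None
            if prev and os.path.isfile(prev) and filecmp.cmp(src, prev, shallow=False):
                os.link(prev, dst)      # unchanged: share the previous snapshot's inode
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy

def demo():
    """Take two snapshots and report whether each file is shared between them."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        for name, text in (("same.txt", "unchanged"), ("edit.txt", "version 1")):
            with open(os.path.join(src, name), "w") as fh:
                fh.write(text)
        first = os.path.join(tmp, "2019-04-14T20-45-30")
        second = os.path.join(tmp, "2019-04-15T20-45-30")
        snapshot(src, first)
        with open(os.path.join(src, "edit.txt"), "w") as fh:
            fh.write("version two")  # modify one file between backups
        snapshot(src, second, link_dest=first)
        return (
            os.path.samefile(os.path.join(first, "same.txt"), os.path.join(second, "same.txt")),
            os.path.samefile(os.path.join(first, "edit.txt"), os.path.join(second, "edit.txt")),
        )
```

Calling `demo()` should show that the untouched file is one inode shared by both snapshots while the edited file is stored separately, which is why each dated folder looks complete yet only changed files cost extra space.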

The resulting structure will be /path/to/backup/ containing sub-folders such as 2019-04-14T20-45-30, each incrementally hard linked to the last but appearing to be complete in its own right. There will also be the "rsync.log" of the last backup, and the "current" symlink to the last backup folder. Since all unchanging files are hard linked, this is very space efficient. However, it may not be obvious which files are shared between backups, so all of the backups should be considered read-only.
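If you want a rough idea of how much of a given snapshot is shared, the link count of each file is a reasonable hint. The following is a small sketch, not part of the original scripts; `snapshot_usage` and `demo` are hypothetical names. Note the assumption that a link count above 1 means "shared with another backup": because rsync runs with -H, hard links that already existed on the source will also show up this way.

```python
import os
import stat
import tempfile

def snapshot_usage(snapshot_dir):
    """Return (shared_bytes, unique_bytes) for the regular files in one snapshot."""
    shared = unique = 0
    for root, _dirs, files in os.walk(snapshot_dir):
        for name in files:
            st = os.lstat(os.path.join(root, name))  # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_nlink > 1:
                shared += st.st_size   # inode is referenced from somewhere else too
            else:
                unique += st.st_size   # only this snapshot holds the data
    return shared, unique

def demo():
    """Build two tiny fake snapshots and measure the newer one."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "2019-04-14T20-45-30")
        new = os.path.join(tmp, "2019-04-15T20-45-30")
        os.makedirs(old)
        os.makedirs(new)
        with open(os.path.join(old, "same.txt"), "w") as fh:
            fh.write("12345")
        os.link(os.path.join(old, "same.txt"), os.path.join(new, "same.txt"))
        with open(os.path.join(new, "fresh.txt"), "w") as fh:
            fh.write("1234567")
        return snapshot_usage(new)
```

In the demo, the newer snapshot holds 5 shared bytes and 7 unique bytes, so deleting it would free only the 7 unique bytes.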

Note that the backup script does not delete old backups, so eventually you may run out of space, particularly if the file systems change a lot.
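One way to keep an eye on this is to check the free space on the backup file system before or after each run. This is only a sketch under assumptions: `free_fraction` and `needs_pruning` are hypothetical helpers, and the 10% threshold is an arbitrary example rather than a recommendation from the original scripts.

```python
import shutil

def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the file system at `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / usage.total

def needs_pruning(path, min_free=0.10):
    """True when less than `min_free` of the file system remains available."""
    return free_fraction(path) < min_free
```

For example, `needs_pruning("/path/to/backup/")` could be used in a cron job to send a warning before the disk fills up.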

To sort out this last issue, the PruneBackups.py script (Python 3) will prune old backups from the parent folder of all of your backups. It is designed to run as root directly on the file system containing the backups, so it could run on the machine hosting your backups, or via NFS network shares (without root-squash) etc. Since I back up multiple devices, each to its own folder, it has been designed to run on the parent folder containing all of these. If you don't need that, remove the first (outer) loop over devices and unindent the rest of the block so that it scans the backup folder directly. In essence, the script visits each sub-folder of the root, which is where the dated backups for each device live, and compares each backup's timestamp to the retention policy defined in the "testretention" function. Any files or folders whose names are not in the exact timestamp format will be ignored. You can therefore forcefully keep a backup by renaming its folder so it doesn't look exactly like the timestamp above (you could literally append "keep-me" to the end of it if you want).
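To see how the retention policy treats particular folder names, here is a small standalone sketch that mirrors the rules in "testretention" but takes "now" as an explicit argument so the result is reproducible. The names `parse_name`, `keep` and `classify` are hypothetical and not part of PruneBackups.py, and nothing is deleted.

```python
import datetime

FORMAT = "%Y-%m-%dT%H-%M-%S"  # same timestamp format as the backup folders

def parse_name(name):
    """Return a datetime for an exact timestamp name, else None."""
    try:
        return datetime.datetime.strptime(name, FORMAT)
    except ValueError:
        return None

def keep(then, now):
    """Same rules as testretention: 7 days, Sundays for 4 weeks, 1st of month for a year."""
    delta = now - then
    return (
        delta <= datetime.timedelta(days=7)
        or (then.weekday() == 6 and delta <= datetime.timedelta(weeks=4))
        or (then.day == 1 and delta <= datetime.timedelta(days=365))
    )

def classify(names, now):
    """Map each folder name to 'keep', 'delete' or 'ignored' (not a timestamp)."""
    result = {}
    for name in names:
        then = parse_name(name)
        if then is None:
            result[name] = "ignored"
        else:
            result[name] = "keep" if keep(then, now) else "delete"
    return result

if __name__ == "__main__":
    now = datetime.datetime(2019, 6, 30)  # midnight on a Sunday
    names = [
        "2019-06-25T03-00-00",          # within 7 days
        "2019-06-09T03-00-00",          # a Sunday within 4 weeks
        "2019-06-10T03-00-00",          # a Monday, older than 7 days
        "2019-01-01T03-00-00",          # 1st of the month within a year
        "2018-05-01T03-00-00",          # 1st of the month, but over a year old
        "2019-06-10T03-00-00-keep-me",  # renamed, so no longer a timestamp
    ]
    for name, verdict in classify(names, now).items():
        print(verdict, name)
```

Running a check like this against a listing of your own backup folder is a cheap dry run before letting the real script delete anything.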
