jarvisms/PruneBackups.py

## authorized_keys
command="/path/to/rrsync /path/to/backup/",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa ....public_key....== root@backmeup

## backup.sh
#!/bin/bash

### These assume a public key without passphrase and that rrsync is in use, rooted to the relevant remote backup destination
BACKUPROOT=root@host
LINKDIR=/current
EXCLUDELIST=/root/exclude-list
LOGFILE=/tmp/rsync.log
SYMLINK=/tmp/current
DATE=$(date "+%Y-%m-%dT%H-%M-%S")

rsync -viPhhhaHSxyy --no-R --delete-during --delete-excluded --stats --numeric-ids --link-dest="${LINKDIR}" --log-file="${LOGFILE}" --exclude-from="${EXCLUDELIST}" / "${BACKUPROOT}:/${DATE}" \
&& ln -sfTv "${DATE}" "${SYMLINK}"
rsync -viPhhhaHSx --no-R --delete-during --stats --numeric-ids --remove-source-files "${LOGFILE}" "${SYMLINK}" "${BACKUPROOT}:/"

## exclude-list
# System folders to exclude
# assuming -x option (does not cross file systems)

/tmp/*
/var/swap
/var/tmp/*
/media/*
/mnt/*

## PruneBackups.py
#!/usr/bin/env python3
import datetime
import os
import os.path
import shutil

def striptime(dt):
  try:
    return datetime.datetime.strptime(dt,"%Y-%m-%dT%H-%M-%S") # Convert it
  except ValueError:  # If it doesn't look like a date, return None (boolean False)
    return None

def testretention(then, now=datetime.datetime.now().replace(hour=0,minute=0,second=0,microsecond=0)):
  delta = (now-then)
  # Return True for the following "then" datetime compared with "now" datetime:
  #   Within 7 days
  #   A sunday in the last 4 weeks
  #   1st of the month in the last year
  #   (Any future date (then after now), while not explicitly checked, will also return True)
  # Else return False
  return (
    ( delta <= datetime.timedelta(days=7) )
    or ( then.weekday() == 6 and delta <= datetime.timedelta(weeks=4) )
    or ( then.day == 1 and delta <= datetime.timedelta(days=365) )
  )

root = r"/path/to/"  # the parent of /path/to/backup/
root = os.path.abspath(os.path.realpath(os.path.expandvars(os.path.expanduser(root))))  # Make it absolute

start = datetime.datetime.now()

for device in os.scandir(root):  # Root folder contains the various backups for each device
  if device.is_dir(follow_symlinks=False):  #ignore SymLinks
    for entry in os.scandir(device.path): # Within each device's sub folder will be the dated backups
      if entry.is_dir(follow_symlinks=False):  #ignore SymLinks
        dt = striptime(entry.name) # Try to capture the datetime from the folder name
        if dt is not None and not testretention(dt): # dt is None if the name isn't a date; if it is a date, test whether it's a keeper
          print("DELETING:",entry.path) # Report back
          shutil.rmtree(entry.path, ignore_errors=True) # Delete the backup

end = datetime.datetime.now()

print("Start:", start.strftime("%c"))
print("End:", end.strftime("%c"))
print("Duration:", (end-start))

## rsync remote incremental hardlinked backups.md

      
This achieves full file system backups to a remote destination. I use this to back up my entire Raspberry Pi SD cards (with a few minor modifications) daily via my crontab.
The rrsync script is required from https://www.samba.org/ftp/unpacked/rsync/support/rrsync
The sample line from the authorized_keys file must reside on the destination server where the backups will be stored. The path to the backup is given here, and all of the restrictions are in place so that this is the only thing the ssh user can do and the only place they have access to. A public/private key pair without a passphrase must have been created and shared out first; see https://www.guyrutenberg.com/2014/01/14/restricting-ssh-access-to-rsync/
The exclude-list can be extended to skip particular folders or files, although the rsync -x option already prevents the backup from crossing file systems.
The backup.sh file is where the deed is done and is inspired by https://blog.interlinked.org/tutorials/rsync_time_machine.html (link may now be broken)
You specify the remote server, your exclude-list and a few other bits. You can look up what the various rsync parameters do, but in essence everything is mirrored or deleted, and any file that already exists unchanged in LINKDIR is hard linked on the remote server instead of being copied again.
Therefore only changed files need to be transferred. The backup is timestamped and a symlink called "current" is made, pointing to the backup just made. The second rsync call moves the local log and the new symlink to the remote server too.
The first backup will copy everything, but subsequent ones should be faster and incremental.
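
To make the hard-linking idea concrete, here is a minimal local sketch of what --link-dest does conceptually. It is not how rsync is implemented: the "snapshot" and "demo" helpers are hypothetical names of mine, and the sketch compares file contents, whereas rsync by default decides with a quick check of size and modification time.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy source into target, hard-linking files unchanged since previous."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: reuse the data already stored
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy


def write(path, text):
    with open(path, "w") as handle:
        handle.write(text)


def demo():
    """Take two snapshots of a tiny tree and report which files share storage."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        os.makedirs(source)
        write(os.path.join(source, "static.txt"), "never changes")
        write(os.path.join(source, "notes.txt"), "version 1")

        first = os.path.join(tmp, "2019-04-13T20-45-30")
        second = os.path.join(tmp, "2019-04-14T20-45-30")
        snapshot(source, None, first)                 # first backup copies everything
        write(os.path.join(source, "notes.txt"), "version 2")
        snapshot(source, first, second)               # second backup links what it can

        return (
            os.path.samefile(os.path.join(first, "static.txt"), os.path.join(second, "static.txt")),
            os.path.samefile(os.path.join(first, "notes.txt"), os.path.join(second, "notes.txt")),
        )


static_shared, notes_shared = demo()
print("static.txt shared between snapshots:", static_shared)
print("notes.txt shared between snapshots:", notes_shared)
```

Both snapshot folders look complete when browsed, but only the changed file takes up extra space in the second one.
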
Since rrsync (restricted rsync) is used, the root of the destination folder is actually defined in the authorized_keys file, and LINKDIR is relative to that root and can't use . or .. relative folder names.
The resulting structure will be /path/to/backup/ containing sub-folders such as 2019-04-14T20-45-30, each incrementally hardlinked to the last but appearing to be complete in its own right. There will also be the "rsync.log" of the last backup, and the "current" symlink to the last backup folder.
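
As a reading aid, the first rsync call from backup.sh can be laid out one option per line. This is only a sketch: "build_backup_command" and its parameters are hypothetical names of mine, the short flags are split into groups purely for commenting, and nothing is executed. Note how both the destination and the --link-dest path start at /, which rrsync maps onto the folder named in authorized_keys.

```python
import shlex
from datetime import datetime


def build_backup_command(backup_root, link_dir, exclude_list, log_file, now=None):
    """Return the argument list for the first rsync call in backup.sh."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H-%M-%S")
    return [
        "rsync",
        "-viPhhh",                         # verbose, itemized changes, partial+progress, human-readable numbers
        "-aHSx",                           # archive mode, keep hard links, handle sparse files, stay on one file system
        "-yy",                             # fuzzy basis-file matching, also searching the --link-dest folder
        "--no-R",                          # do not use relative path names
        "--delete-during",                 # remove files that vanished from the source while transferring
        "--delete-excluded",               # also remove excluded files from the destination
        "--stats",                         # print transfer statistics at the end
        "--numeric-ids",                   # keep uid/gid numbers instead of mapping names
        f"--link-dest={link_dir}",         # hard link unchanged files against the previous backup
        f"--log-file={log_file}",          # local log, uploaded by the second rsync call
        f"--exclude-from={exclude_list}",  # patterns from the exclude-list file
        "/",                               # source: the whole root file system
        f"{backup_root}:/{stamp}",         # destination: a new timestamped folder
    ]


command = build_backup_command(
    "root@host", "/current", "/root/exclude-list", "/tmp/rsync.log",
    now=datetime(2019, 4, 14, 20, 45, 30),
)
print(shlex.join(command))
```
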
Since all unchanging files are hardlinked, this is very space efficient. However, it may not be obvious which files these are, and modifying a hardlinked file in place changes it in every backup that shares it, so all of the backups should be considered read-only.
Note that the script does not delete old backups and eventually you may just run out of space, particularly if the file systems change a lot.
To sort out this last issue, the PruneBackups.py script (Python 3) prunes old backups from the parent folder of all of your backups. It is designed to run as root directly on the filesystem containing all of the backups, so it could run on the machine hosting your backups, or via NFS network shares (without root-squash) etc.
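
If you do want to see which files are shared, the link count of each file gives it away. The sketch below is an illustration only: "link_report", "unique_bytes" and "demo" are hypothetical helpers, and the demo builds two tiny fake backup folders in a temporary directory rather than touching real backups. A link count above 1 means the data is shared with at least one other name, typically the same file in a neighbouring backup.

```python
import os
import stat
import tempfile


def link_report(snapshot_dir):
    """Map each regular file under snapshot_dir to its hard-link count."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(snapshot_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISREG(info.st_mode):
                report[os.path.relpath(path, snapshot_dir)] = info.st_nlink
    return report


def unique_bytes(root):
    """Total size of regular files under root, counting shared data once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            key = (info.st_dev, info.st_ino)
            if stat.S_ISREG(info.st_mode) and key not in seen:
                seen.add(key)
                total += info.st_size
    return total


def demo():
    """Build two fake backups sharing one file and inspect the newer one."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "2019-04-13T20-45-30")
        new = os.path.join(tmp, "2019-04-14T20-45-30")
        os.makedirs(old)
        os.makedirs(new)
        with open(os.path.join(old, "shared.txt"), "w") as handle:
            handle.write("x" * 100)
        os.link(os.path.join(old, "shared.txt"), os.path.join(new, "shared.txt"))
        with open(os.path.join(new, "fresh.txt"), "w") as handle:
            handle.write("y" * 10)
        return link_report(new), unique_bytes(tmp)


report, total = demo()
print(report)
print("bytes actually stored:", total)
```
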
Since I back up multiple devices, each to its own folder, this has been designed to run on the parent folder containing all of these. If that is not needed, remove the outer "for device" loop and its is_dir check, scan the backup folder directly, and unindent the rest of the block. In essence, the script visits each sub-folder of the root, which is where the dated backups for each device live, and compares each backup's timestamp with the retention policy defined in the "testretention" function. Any files or folders not in the exact timestamp format are ignored. You can therefore forcefully keep a backup by renaming its folder so it doesn't look exactly like the timestamp above (you could literally append "keep-me" to the end of it if you want).
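
To see the retention policy in action without deleting anything, here is a small dry-run sketch. It reuses the same three rules as "testretention" but takes "now" as an explicit argument so the result is reproducible; "parse_stamp", "keep" and "plan" are hypothetical names, and the folder names are made-up samples.

```python
from datetime import datetime, timedelta

STAMP = "%Y-%m-%dT%H-%M-%S"


def parse_stamp(name):
    """Return the datetime encoded in a folder name, or None if it is not one."""
    try:
        return datetime.strptime(name, STAMP)
    except ValueError:
        return None


def keep(then, now):
    """Same rules as testretention: 7 days, Sundays for 4 weeks, 1st of month for a year."""
    age = now - then
    return (
        age <= timedelta(days=7)
        or (then.weekday() == 6 and age <= timedelta(weeks=4))
        or (then.day == 1 and age <= timedelta(days=365))
    )


def plan(names, now):
    """Split folder names into (kept, pruned, ignored) without touching the disk."""
    kept, pruned, ignored = [], [], []
    for name in names:
        then = parse_stamp(name)
        if then is None:
            ignored.append(name)   # not in the exact timestamp format
        elif keep(then, now):
            kept.append(name)
        else:
            pruned.append(name)
    return kept, pruned, ignored


names = [
    "2019-04-14T20-45-30",          # yesterday
    "2019-03-31T20-45-30",          # a Sunday about two weeks ago
    "2019-01-01T20-45-30",          # 1st of a month within the last year
    "2019-04-03T20-45-30",          # a Wednesday, older than 7 days
    "2019-03-10T20-45-30",          # a Sunday, but older than 4 weeks
    "2018-03-01T20-45-30",          # 1st of a month, but older than a year
    "2019-04-02T20-45-30-keep-me",  # renamed, so never considered
    "lost+found",
]
kept, pruned, ignored = plan(names, now=datetime(2019, 4, 15))
print("keep:   ", kept)
print("prune:  ", pruned)
print("ignored:", ignored)
```

The renamed folder ends up in the ignored list because strptime rejects the trailing "-keep-me", which is exactly why renaming protects a backup from pruning.
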