Convert TimeMachine backup into a Differential-like backup
#!/usr/bin/env bash
# thisfile: remove_dups_tmbackup.sh
# save the current path
CURRENT_DIR=$(pwd)
# receive the backup destination directory as argument
DEST_DIR="$1"
# find the second-newest snapshot directory (the one before the latest)
PREVIOUS_DEST=$(find "$DEST_DIR" -maxdepth 1 -type d -name "????-??-??-??????" -prune | sort -r | head -n 2 | tail -n 1)
# find the newest snapshot directory
LAST_DEST=$(find "$DEST_DIR" -maxdepth 1 -type d -name "????-??-??-??????" -prune | sort -r | head -n 1)
# if both are the same there is only one backup: exit (to avoid comparing it to itself!)
# (plain test instead of eval, so odd characters in the path are never executed)
if [ "$PREVIOUS_DEST" = "$LAST_DEST" ]; then echo "There is only one backup" ; exit 1 ; fi
# get only the directory name from the path
LAST_DEST_DIRNAME=$(basename "$LAST_DEST")
# enter the previous destination directory
cd "$PREVIOUS_DEST" || { echo "Failed to enter the previous backup directory" >&2 ; exit 1 ; }
# walk the files of the previous backup and compare each with the latest at the same path
# if both point to the same inode, delete the copy in the previous backup
find . -type f -print0 | while IFS= read -r -d '' file; do
  # use "rm -vf" instead of "rm -f" to print every removed file
  if [ "$file" -ef "../$LAST_DEST_DIRNAME/$file" ]; then rm -f -- "$file" ; fi
done
# also delete the empty directories left behind
find . -type d -empty -delete
# go back to previous directory
cd "$CURRENT_DIR"
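
Since the script deletes files, it may help to preview what it would remove first. The following is only a sketch of a dry run, not part of the original script: the function name list_prunable is made up for illustration, and it assumes the same YYYY-MM-DD-HHMMSS directory naming that the script searches for. It prints the files in the second-newest snapshot that share an inode with the newest one and removes nothing.

```shell
#!/usr/bin/env bash
# Dry run (hypothetical helper): list the files the script would delete.
list_prunable() {
  local dest="$1" d n prev latest file rel
  local snaps=()
  # The glob expands in sorted order, which is chronological for these names.
  for d in "$dest"/????-??-??-??????/; do
    [ -d "$d" ] && snaps+=("${d%/}")
  done
  n=${#snaps[@]}
  if [ "$n" -lt 2 ]; then
    echo "Need at least two backups" >&2
    return 1
  fi
  prev=${snaps[n-2]}
  latest=${snaps[n-1]}
  find "$prev" -type f -print0 | while IFS= read -r -d '' file; do
    rel=${file#"$prev"/}            # path relative to the snapshot root
    # -ef is true when both paths refer to the same device and inode
    if [ "$file" -ef "$latest/$rel" ]; then
      printf '%s\n' "$file"
    fi
  done
}
```

Calling `list_prunable /path/to/backups` should print one path per line; if the list looks right, the real script can be run on the same destination.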
@braian87b

commented Nov 20, 2017

This script must be run after each backup made by this other script: https://github.com/laurent22/rsync-time-backup
After that, each timestamp directory holds only the files that are not hardlinked to the next backup (those that changed or were deleted afterwards); the latest directory always conserves the full backup.
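
One way to chain the two steps could look like the sketch below. It is an assumption about how you might wire things up, not something from the gist: backup_and_prune, BACKUP_CMD and PRUNE_CMD are hypothetical names, and the defaults assume both scripts (rsync_tmbackup.sh from the linked project and this gist saved as remove_dups_tmbackup.sh) are on your PATH.

```shell
#!/usr/bin/env bash
# Assumed locations of the two scripts; override them to match your setup.
BACKUP_CMD="${BACKUP_CMD:-rsync_tmbackup.sh}"
PRUNE_CMD="${PRUNE_CMD:-remove_dups_tmbackup.sh}"

# Run the backup, then prune the previous snapshot only if the backup succeeded.
backup_and_prune() {
  local src="$1" dest="$2"
  if ! "$BACKUP_CMD" "$src" "$dest"; then
    echo "Backup failed, skipping prune" >&2
    return 1
  fi
  "$PRUNE_CMD" "$dest"
}
```

For example, `backup_and_prune /home/user /mnt/backup` (both paths are placeholders). Skipping the prune after a failed backup avoids comparing against an incomplete latest snapshot.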

@braian87b

commented Mar 22, 2018

It leaves something like this:

2018-03-10 16:40\
    file1.xls

2018-03-11 10:20\
    file1.xls
    file2.xls

2018-03-11 15:30\
    file2.xls

latest-> 2018-03-22 17:45\
    file1.xls
    file2.xls
    file3.xls
    file4.xls

instead of

2018-03-10 16:40\
    file1.xls
    file2.xls (hardlink)
    file3.xls (hardlink)
    file4.xls (hardlink)
(...) (no modifications here, but several useless backup folders between)
2018-03-11 10:20\
    file1.xls
    file2.xls
    file3.xls (hardlink)
    file4.xls (hardlink)
(...) (no modifications here, but several useless backup folders between)
2018-03-11 12:20\ (no modifications here, but a useless backup folder)
    file1.xls (hardlink)
    file2.xls (hardlink)
    file3.xls (hardlink)
    file4.xls (hardlink)
(...) (no modifications here, but several useless backup folders between)
2018-03-11 15:30\
    file1.xls (hardlink)
    file2.xls
    file3.xls (hardlink)
    file4.xls (hardlink)
(...) (no modifications here, but several useless backup folders between)
latest-> 2018-03-22 17:45\
    file1.xls
    file2.xls
    file3.xls
    file4.xls
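
To see why only the entries marked (hardlink) disappear, here is a small self-contained sketch on throwaway data. The names is_hardlink_of and demo_hardlink_check and the prev/latest folders are made up for the illustration; the only thing taken from the script is the `-ef` test, which is true when two paths refer to the same inode.

```shell
#!/usr/bin/env bash
# Succeeds when both paths refer to the same device and inode (a hard link).
is_hardlink_of() { [ "$1" -ef "$2" ]; }

demo_hardlink_check() {
  local root f
  root=$(mktemp -d) || return 1
  mkdir "$root/prev" "$root/latest"
  # file1 changed between the two backups: two independent copies
  echo v1 > "$root/prev/file1.xls"
  echo v2 > "$root/latest/file1.xls"
  # file2 did not change: the backup tool stores it once and hard-links it
  echo same > "$root/latest/file2.xls"
  ln "$root/latest/file2.xls" "$root/prev/file2.xls"
  for f in file1.xls file2.xls; do
    if is_hardlink_of "$root/prev/$f" "$root/latest/$f"; then
      echo "$f: hardlink, would be removed from prev"
    else
      echo "$f: distinct copy, kept"
    fi
  done
  rm -rf "$root"
}
```

Running `demo_hardlink_check` reports file1.xls as a distinct copy and file2.xls as a hardlink, which matches the trimmed listing above: changed files stay in their timestamp folder, unchanged ones survive only in the newer backup.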
@braian87b

commented Jun 17, 2018

Previously, instead of this script, I ran rmlint after each backup, comparing the previous and latest directories:

rmlint -gvkm './2018-03-11 15:30' // './2018-03-22 17:45'

this means:

rmlint -gvkm './duplicates-to-be-deleted' // './original-to-be-preserved'
rmlint -gvkm './previous-directory' // './latest-directory'

-g, --progress Enable progressbar
-v, --loud Be more verbose (-vvv for much more)
-k, --keep-all-tagged Keep all tagged files
-m, --must-match-tagged Must have twin in tagged dir

bash rmlint.sh -d

-d performs the deletions without asking (rmlint.sh is the removal script that rmlint generates in the current directory).
