Last active February 20, 2023 18:48
Save chuckhoupt/6265268 to your computer and use it in GitHub Desktop.
A shell script to backup a user's recently modified files to a remote server via rsync.
#!/bin/sh
# Back up recently modified files in $HOME to a remote server via rsync.
set -eux

DAYS=30                                 # how many days back counts as "recent"
SHADOW_DIR=/tmp/backup-recent-shadow    # scratch hard-link shadow copy
BACKUP_ACCOUNT=user@backup.example.com  # backup host (example value -- set your own)

# Wait for the net to come up
while [ -z "$(ifconfig | grep 'status: active')" ]; do
    sleep 1
done

HOST=$(hostname -s)

# Start from a clean shadow directory
rm -rf "$SHADOW_DIR"
mkdir -p "$SHADOW_DIR"

# Build a hard-linked shadow copy of recently modified files,
# pruning temporary and derivative directories
find "$HOME" \
    -path "$HOME/Library" -prune -or \
    -path "$HOME/Pictures/iPhoto Library/Thumbnails" -prune -or \
    -path "$HOME/Music/iTunes" -prune -or \
    -path "$HOME/.Trash" -prune -or \
    -path "$HOME/.m2" -prune -or \
    -path "$HOME/Downloads" -prune -or \
    -name .svn -prune -or \
    -name CVS -prune -or \
    -name 'iPod Photo Cache' -prune -or \
    -not -type d \
    -not -name .DS_Store \
    -not -name '*.o' \
    -not -name '*.class' \
    -not -name '*.jar' \
    -mtime -"$DAYS" \
    -print \
    | cpio -pdl "$SHADOW_DIR"

# Report how much data will be synced
du -sk "$SHADOW_DIR"

# Omit dir times, because cpio doesn't preserve dir times, causing re-sync every time
rsync -avz --delete --stats --omit-dir-times \
    "$SHADOW_DIR/" "$BACKUP_ACCOUNT:recent-changes/$HOST"

rm -rf "$SHADOW_DIR"

Backup-Recent Shell Script

Offsite backups can be very useful for disaster recovery. However, backing up all data to the cloud can be slow and expensive. A compromise is to back up only recently modified data to the cloud. Many ISPs/web hosts provide small amounts of online storage, which can be useful for recent-file backup. For example, DreamHost accounts come with 50GB of free backup storage -- not enough for a complete backup, but plenty of space to hold many months of recent work.

How it Works

Although rsync is very powerful, it doesn't have an option to filter files based on age. To work around this, the script creates a scratch shadow copy of the user's home directory containing only recently modified files.

Of course, a naive shadow copy of recent files would consume large amounts of disk and time. For efficiency, the shadow copy is built with hard-linked copies, which means only a partial directory structure is truly duplicated; file contents are shared. This is done by combining find's modification-time filter (-mtime) with cpio's pass-through hard-link mode (-p and -l). The core of the script is this pair of commands:

find $HOME -mtime -30d -print | cpio -dpl $SHADOW_HOME
rsync -avz --delete --omit-dir-times $SHADOW_HOME/ $BACKUP_ACCOUNT:recent-changes

Much of the script's bulk is filtering added to exclude temporary or derivative files and directories from the process.
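That filtering relies on find's -prune idiom: when a directory matches, -prune stops the descent into it and evaluates true, so the following -or short-circuits and the pruned path is never printed. A small sketch with made-up paths:

```shell
# Hypothetical paths for illustration only
mkdir -p /tmp/prune-demo/keep /tmp/prune-demo/.Trash
touch /tmp/prune-demo/keep/work.txt /tmp/prune-demo/.Trash/junk.txt
# .Trash is pruned, so only files outside it are printed
find /tmp/prune-demo -name .Trash -prune -or -type f -print
# -> /tmp/prune-demo/keep/work.txt
```

Each `-path … -prune -or` clause in the script follows this same pattern, chained before the final `-print`.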
