@Spoygg
Last active March 24, 2024 13:26
Use rsync to sync two directories, but keep a history of what has been synced to make the process faster for a huge number of files.
#!/bin/bash
#
# Sync two directories with rsync, but keep history to optimize the process.
# On subsequent runs it will only sync files added since the last sync.
# This makes it easy to continue if the script is interrupted.
# It also helps with network connectivity to remotes, since each file is
# transferred by a separate rsync invocation. This solves one issue I have
# faced when syncing a large number of files: the connection breaking
# if the process takes too long (in my case it took about 10 hours to
# transfer all the files).
#
# Accepts two arguments: the source and destination directories.
# Make sure source and destination do not end with a slash.
# The script assumes that both the source and destination directories already
# exist. It is meant only to sync the source content to the destination.
#
# When started, it prints the name of its history file (created in the current
# directory). Do not delete that file!
# Delete the history file if you want to re-sync from a clean state.
#
# Example usage
# In the parent directory of Audiobooks run:
# ./sync_with_history.sh Audiobooks user@hostname:/media/ServerMedia/Audiobooks
# to sync files to a remote server. You'll need SSH access set up.
#
# To sync local directories, run:
# ./sync_with_history.sh Audiobooks /path/to/destination/Audiobooks
# in the parent directory of Audiobooks.
# Create history file name
escaped1=$(printf '%s' "$1" | tr / -)
escaped2=$(printf '%s' "$2" | tr / -)
sync_with_history_done_list="sync_with_history_done_list-$escaped1-to-$escaped2"
echo "Saving history to $sync_with_history_done_list"
# Ensure sync_with_history_done_list exists
touch "$sync_with_history_done_list"
# List all files not yet rsync-ed. -x makes grep match whole lines only, so a
# path that merely contains an already-synced path is not skipped by mistake.
find "$1" -mindepth 1 -type f -printf '%P\n' | grep -vxFf "$sync_with_history_done_list" > sync_with_history_todo_list
# IFS= and -r keep leading spaces and backslashes in file names intact.
while IFS= read -r line
do
    echo "Sending: $line"
    printf '%s\n' "$line" > files-to-include
    # NOTE: use rsync -a if you want to keep permissions, owner, group etc.
    # I use -r because I don't need those.
    # Record the file as done only if rsync succeeded, so failures are retried
    # on the next run. </dev/null stops rsync from consuming the loop's input.
    if rsync -r --files-from=files-to-include "$1/" "$2/" </dev/null; then
        printf '%s\n' "$line" >> "$sync_with_history_done_list"
    else
        echo "Failed: $line" >&2
    fi
done < sync_with_history_todo_list
# Clean up; leave only sync_with_history_done_list
rm -f files-to-include sync_with_history_todo_list
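The core of the script is a set difference: every file under the source, minus every line already in the history file. Below is a minimal standalone sketch of that step, together with the history file naming. It assumes GNU find (for `-printf`) and a grep that supports `-x`. The function names `history_file_name` and `pending_files` are hypothetical and not part of the script above.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers that mirror the script's history logic.
# Assumes GNU find (for -printf) and a grep supporting -F, -x, -v and -f.

# Derive the history file name the way the script does: every "/" in both
# arguments is replaced with "-".
history_file_name() {
    local src dst
    src=$(printf '%s' "$1" | tr / -)
    dst=$(printf '%s' "$2" | tr / -)
    printf 'sync_with_history_done_list-%s-to-%s\n' "$src" "$dst"
}

# Print paths (relative to directory $1) of regular files that are not
# listed in history file $2.
#   -F: treat history lines as fixed strings, not regexes
#   -x: a history line must match the whole path, not just part of it
#   -v: keep only paths that match no history line
pending_files() {
    find "$1" -mindepth 1 -type f -printf '%P\n' | grep -vxFf "$2"
}
```

For example, `history_file_name Audiobooks user@hostname:/media/ServerMedia/Audiobooks` yields `sync_with_history_done_list-Audiobooks-to-user@hostname:-media-ServerMedia-Audiobooks`. If `a.mp3` is already in the history, `pending_files` still lists `Book/a.mp3`. Without `-x`, that file would be skipped, because `a.mp3` is a substring of `Book/a.mp3`.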

Spoygg commented Mar 22, 2023

I was setting up a self-hosted audiobook server on an old laptop and was trying to sync my audiobook library from my main computer (about 100 GB and 4.5k files).

My main goal was to sync the Audiobooks directory on my main computer, where I keep all my audiobooks from Audible and other sources. I fetch new books from Audible via Libation, and I don't want to run that on my modest audiobook server. On the server I also have an Audiobooks directory, which is used by Audiobookshelf.
With this in mind, the workflow is: sync new books I bought on Audible to the main computer via Libation, then run a file sync to copy them to the server.

I started with scp, which proved unreliable for my use case. Then I switched to rsync, which wasn't much better: although it wouldn't transfer everything again, it would query every file on the server to check whether it already existed, and that also took a long time.
I researched other sync utilities like Unison and self-hosted solutions like Syncthing. Either they didn't do what I wanted, or they were too heavy for my simple use case. I wanted just a command-line utility that keeps track of what is already synced and doesn't sync it again. My server is very low on resources, and I don't want to waste any of them on useless syncing.

So, since I couldn't find anything, this script was born. Use at your own risk!


chapmanjacobd commented Oct 23, 2023

Thanks for this!

I'm doing a similar thing and used this for inspiration to write this fish shell function:

function stickysync_backup --argument historyfile from to
    set new_files (mktemp)
    combine (ssh backup "cd $from && fd -S+1b --changed-before '24 hours'" | psub) not $historyfile >$new_files
    cat $new_files | while read line
        rsync -r --files-from=(echo $line | psub) backup:$from $to
        and echo $line >>$historyfile
        or echo failed $line
    end
end

stickysync_backup ~/d/00_Metadata/stickysync_audiobooks d/_audiobooks/ /mnt/d/82_Audiobooks/
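The fish version only picks up files that are non-empty and have not been modified in the last 24 hours (`fd -S+1b --changed-before '24 hours'`). Presumably this keeps it from syncing files that are still being downloaded, though that is an assumption. Below is a rough equivalent using GNU find, which the bash script already relies on. It is a sketch, and the function name `settled_files` is hypothetical.

```shell
#!/bin/bash
# Sketch only: list regular files under $1, as paths relative to $1, that are
# non-empty and were last modified more than 24 hours ago. This is roughly
# what fd -S+1b --changed-before '24 hours' selects. Assumes GNU find.
settled_files() {
    # ! -empty    : skip zero-byte files
    # -mmin +1440 : last modified more than 1440 minutes (24 hours) ago
    # -printf %P  : print the path relative to the starting directory
    find "$1" -mindepth 1 -type f ! -empty -mmin +1440 -printf '%P\n'
}
```

In the bash script, this would replace the `find` part of the pipeline that feeds `grep`, leaving the history filtering unchanged.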

🐁
