# In the rsync command below, be sure to replace $TREE with the name of the rsync tree you are instructed to use,
# and $LOCAL_PATH with the local path you want to save the data to.
# sync full dataset - this single command is all you need for the full dataset
rsync --copy-links --delete --ignore-errors --recursive --times --verbose datasets.hathitrust.org::$TREE $LOCAL_PATH
# sync volumes from ID list - for subsets you'll need an additional step to generate the list of paths to sync
# Step 1: generate list of paths to rsync (run either the ruby or the perl command below, not both)
# Input file id_list.txt must be a plain text file containing one HathiTrust volume ID per line, with Unix line endings and no other encoding (no URL escaping, quotes, etc.)
# using ruby: you must install the rpairtree gem before running (note: there is an 'r' in the gem name but not in the require statement; this is not a typo)
ruby -e 'require "pairtree";ARGF.each {|l|l.chomp!;n,i=l.split(/\./,2);puts "#{n}/pairtree_root/#{Pairtree::Path.id_to_path i}"}' id_list.txt > path_list.txt
# alternatively, using perl: requires the File::Pairtree CPAN module to be installed
perl -MFile::Pairtree -ne 'chomp;($n,$i)=split /\./,$_,2;print "$n/".File::Pairtree::id2ppath($i).File::Pairtree::s2ppchars($i)."\n"' id_list.txt > path_list.txt
# Step 2: sync files
rsync --copy-links --delete --ignore-errors --recursive --times --verbose --files-from=path_list.txt datasets.hathitrust.org::$TREE $LOCAL_PATH
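Because Step 1 fails or produces wrong paths when id_list.txt breaks the input rules above (one ID per line, Unix line endings, no URL escaping or quotes), it can help to check the file first. This is a minimal sketch, not part of the original gist; the script name check_id_list.rb and the helper id_list_problems are hypothetical, and the "looks URL-escaped" test is only a heuristic.

```ruby
# Hypothetical pre-flight check for id_list.txt, based on the gist's input
# rules: one namespace.id per line, Unix line endings, no URL escaping or quotes.
# Returns a list of [line_number, problem] pairs (empty if nothing was found).
def id_list_problems(text)
  problems = []
  text.each_line.with_index(1) do |line, lineno|
    problems << [lineno, 'Windows (CRLF) line ending'] if line.end_with?("\r\n")
    id = line.chomp # removes "\n" or "\r\n"
    if id.strip.empty?
      problems << [lineno, 'blank line']
      next
    end
    problems << [lineno, 'leading/trailing whitespace'] if id != id.strip
    problems << [lineno, 'quote characters'] if id.match?(/["']/)
    # Heuristic: %XX sequences suggest the ID was URL-escaped.
    problems << [lineno, 'looks URL-escaped (%XX)'] if id.match?(/%\h\h/)
    problems << [lineno, 'missing "namespace." prefix'] unless id.include?('.')
  end
  problems
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  # binread keeps "\r\n" intact (text-mode reads on Windows would convert it).
  problems = id_list_problems(File.binread(ARGV[0]))
  problems.each { |lineno, msg| warn "line #{lineno}: #{msg}" }
  exit(problems.empty? ? 0 : 1)
end
```

Run it as ruby check_id_list.rb id_list.txt; it prints one warning per problem and exits non-zero if any were found.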

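The ruby and perl one-liners in Step 1 turn each namespace.id into namespace/pairtree_root/ followed by the Pairtree path of the id. Where neither the rpairtree gem nor File::Pairtree can be installed, the conversion can be approximated in plain Ruby. This is a sketch, not the gem's code: it assumes both modules follow the Pairtree 0.1 specification, which hex-encodes the characters " * + , < = > ? \ ^ | (and anything outside visible ASCII) as ^ plus two lowercase hex digits, then maps / to =, : to + and . to ",". The encoded id is split into two-character directories, and the encoded id itself is appended as the final directory. The file name htid_to_path.rb and all helper names are hypothetical.

```ruby
# Sketch of Step 1 without the rpairtree gem, ASSUMING the gem follows the
# Pairtree 0.1 encoding rules. Helper names are hypothetical.

# Characters hex-encoded by the Pairtree spec, plus anything outside 0x21-0x7E.
HEX_ENCODE = /["*+,<=>?\\^|]|[^\x21-\x7e]/

# Pairtree "clean" encoding: hex-encode first, then single-character swaps.
def pairtree_encode(id)
  hexed = id.gsub(HEX_ENCODE) { |c| c.unpack1('H*').scan(/../).map { |h| "^#{h}" }.join }
  hexed.tr('/:.', '=+,')
end

# Two-character directories (the last may be one character), then the
# encoded id itself as the leaf directory.
def pairtree_path(id)
  encoded = pairtree_encode(id)
  (encoded.scan(/..?/) + [encoded]).join('/')
end

# "namespace.id" -> "namespace/pairtree_root/<pairtree path>"; only the first
# "." separates the namespace, as in the gist's split(/\./,2).
def htid_to_rsync_path(htid)
  namespace, id = htid.split('.', 2)
  raise ArgumentError, "not a namespace.id HathiTrust ID: #{htid.inspect}" if id.nil? || id.empty?
  "#{namespace}/pairtree_root/#{pairtree_path(id)}"
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  ARGF.each_line { |line| puts htid_to_rsync_path(line.chomp) }
end
```

Usage mirrors the one-liners: ruby htid_to_path.rb id_list.txt > path_list.txt. For example, mdp.39015012345678 becomes mdp/pairtree_root/39/01/50/12/34/56/78/39015012345678. Comparing its output with the gem's on a few of your own IDs is a cheap way to confirm the assumption holds.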
rrotter commented Dec 17, 2015

  • list.txt is a text file containing a list of HathiTrust items to sync. Substitute the name of your actual list.
  • ht_text_pd is one of several rsync points available, substitute the name of the rsync point you are instructed to use.
  • You may want to use the --dry-run option, or skip the --verbose option for the rsync command.
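The --dry-run and --verbose tips can also be applied from a script, which sidesteps shell quoting differences between bash and PowerShell. In this hedged sketch (the helper rsync_args and its keyword options are hypothetical, not part of the gist), Ruby builds the same rsync argument list as the gist and passes it to system as separate arguments, so no shell parses it. With --dry-run, rsync only reports what it would transfer and, because of --delete, what it would remove from the local path.

```ruby
# Hypothetical helper: the gist's rsync command as an argument array,
# optionally adding --dry-run, dropping --verbose, or using --files-from.
def rsync_args(tree, local_path, files_from: nil, dry_run: false, verbose: true)
  args = %w[rsync --copy-links --delete --ignore-errors --recursive --times]
  args << '--verbose' if verbose
  args << '--dry-run' if dry_run
  args << "--files-from=#{files_from}" if files_from
  args + ["datasets.hathitrust.org::#{tree}", local_path]
end

if __FILE__ == $PROGRAM_NAME && ARGV.length >= 2
  # Multiple arguments to system run rsync directly, without a shell.
  system(*rsync_args(ARGV[0], ARGV[1], dry_run: true)) or abort 'rsync failed'
end
```

For example, ruby rsync_dry_run.rb ht_text_pd /data/ht would preview a full sync of that tree without changing anything.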

rrotter commented Mar 3, 2016

In addition to Linux and OS X, this works in PowerShell on Windows. Here are links to get rsync and ruby for Windows.
