# In the rsync commands below, replace $TREE with the name of the rsync tree you are instructed to use,
# and $LOCAL_PATH with the local path where you want to save the data.
# Sync the full dataset: this single command is all you need for a full sync
rsync --copy-links --delete --ignore-errors --recursive --times --verbose $TREE $LOCAL_PATH
# Sync volumes from an ID list: for subsets, an additional step generates the list of paths to sync
# Step 1: generate list of paths to rsync
# The input file id_list.txt must be a plain text file containing one HathiTrust volume ID per line, with Unix line endings and no additional encoding (URL escaping, quotes, etc.)
# Using Ruby: install the rpairtree gem before running (note that the gem name has an 'r' but the require statement does not; this is not a typo)
ruby -e 'require "pairtree";ARGF.each {|l|l.chomp!;n,i=l.split(/\./,2);puts "#{n}/pairtree_root/#{Pairtree::Path.id_to_path i}"}' id_list.txt > path_list.txt
# Alternatively, using Perl: requires the File::Pairtree CPAN module to be installed
perl -MFile::Pairtree -ne 'chomp;($n,$i)=split /\./,$_,2;print "$n/".File::Pairtree::id2ppath($i).File::Pairtree::s2ppchars($i)."\n"' id_list.txt > path_list.txt
# Step 2: sync files
rsync --copy-links --delete --ignore-errors --recursive --times --verbose --files-from=path_list.txt $TREE $LOCAL_PATH
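
As an alternative to the Ruby and Perl one-liners in Step 1, here is a minimal pure-bash sketch of the same ID-to-path conversion. It assumes the pairtree cleaning rules (reserved characters hex-escaped as ^hh, then / → =, : → +, . → ,) and is intended to mirror the Ruby one-liner's output: the namespace, then pairtree_root/, then 2-character directories, then the encoded ID as the leaf directory. It has not been checked against rpairtree, and it does not escape whitespace or non-ASCII characters. The script name id2path.sh is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pure-bash alternative to the Ruby/Perl one-liners in Step 1.
# Assumption: IDs are printable ASCII without spaces; characters outside
# 0x21-0x7e (which pairtree also hex-escapes) are NOT handled here.
set -euo pipefail

# Pairtree "cleaning": hex-escape reserved characters as ^hh, and map
# / -> =, : -> +, . -> , (one pass per character, so output is never re-escaped).
pairtree_encode() {
  local id=$1 out='' c i
  for (( i = 0; i < ${#id}; i++ )); do
    c=${id:i:1}
    case $c in
      '"'|'*'|'+'|','|'<'|'='|'>'|'?'|'\'|'^'|'|')
        out+=$(printf '^%02x' "'$c") ;;
      /) out+='=' ;;
      :) out+='+' ;;
      .) out+=',' ;;
      *) out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

# Split the encoded ID into 2-character directories, then append the
# encoded ID itself as the leaf directory.
pairtree_path() {
  local enc dirs='' i
  enc=$(pairtree_encode "$1")
  for (( i = 0; i < ${#enc}; i += 2 )); do
    dirs+="${enc:i:2}/"
  done
  printf '%s%s' "$dirs" "$enc"
}

# Read "namespace.id" lines on stdin; print "namespace/pairtree_root/<path>".
ids_to_paths() {
  local line ns id
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue
    ns=${line%%.*}   # text before the first dot
    id=${line#*.}    # everything after the first dot
    printf '%s/pairtree_root/%s\n' "$ns" "$(pairtree_path "$id")"
  done
}

# Only run when given an input file, so the functions can also be sourced.
if (( $# == 1 )); then
  ids_to_paths < "$1"
fi
```

Usage: `bash id2path.sh id_list.txt > path_list.txt`. With this sketch, the ID mdp.39015012345678 becomes mdp/pairtree_root/39/01/50/12/34/56/78/39015012345678.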

rrotter commented Dec 17, 2015

  • list.txt is a text file containing a list of HathiTrust items to sync. Substitute the name of your actual list.
  • ht_text_pd is one of several rsync points available, substitute the name of the rsync point you are instructed to use.
  • You may want to add the --dry-run option to preview changes first, or omit the --verbose option to reduce rsync's output.
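
Following the tip about --dry-run, here is a minimal bash sketch that defaults to a dry run, which is especially useful with --delete because rsync reports deletions without performing them. TREE and LOCAL_PATH are the values from the script; FILES_FROM, DRY_RUN, QUIET and the script name preview_sync.sh are hypothetical names introduced only for this example.

```shell
#!/usr/bin/env bash
# Sketch: preview a sync with --dry-run before transferring anything.
# TREE and LOCAL_PATH must be set as instructed; FILES_FROM, DRY_RUN and
# QUIET are hypothetical variables introduced only for this example.
set -euo pipefail

# Assemble the same rsync options used in the script into RSYNC_ARGS.
build_rsync_args() {
  : "${TREE:?set TREE to the rsync tree you were instructed to use}"
  : "${LOCAL_PATH:?set LOCAL_PATH to your local destination}"
  RSYNC_ARGS=(--copy-links --delete --ignore-errors --recursive --times)
  # QUIET=1 drops --verbose.
  if [[ ${QUIET:-0} != 1 ]]; then RSYNC_ARGS+=(--verbose); fi
  # Default to a dry run: rsync reports what it would do but changes nothing.
  if [[ ${DRY_RUN:-1} == 1 ]]; then RSYNC_ARGS+=(--dry-run); fi
  # Subset sync: restrict the transfer to a generated path list.
  if [[ -n ${FILES_FROM:-} ]]; then RSYNC_ARGS+=("--files-from=$FILES_FROM"); fi
  RSYNC_ARGS+=("$TREE" "$LOCAL_PATH")
}

# Only call rsync when explicitly asked, so the function can be sourced.
if [[ ${1:-} == run ]]; then
  build_rsync_args
  rsync "${RSYNC_ARGS[@]}"
fi
```

Usage: `TREE=... LOCAL_PATH=... FILES_FROM=path_list.txt bash preview_sync.sh run` prints what would change; rerun with `DRY_RUN=0` to perform the transfer.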

rrotter commented Mar 3, 2016

In addition to Linux and OS X, this works in PowerShell on Windows. Here are links to get rsync and Ruby for Windows.
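
Because id_list.txt must use Unix line endings and contain bare IDs, a list prepared on Windows may need cleaning first. Here is a minimal bash sketch that strips carriage returns and surrounding whitespace, skips blank lines, and warns about entries that look quoted, URL-escaped, or lack the namespace dot. These warning heuristics are assumptions, not HathiTrust rules, and the script name clean_ids.sh is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: normalize an ID list before generating path_list.txt.
# The "looks malformed" heuristics below are assumptions for illustration.
set -euo pipefail

clean_ids() {
  local line n=0
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
    line=${line//$'\r'/}                        # drop CRs from CRLF line endings
    line="${line#"${line%%[![:space:]]*}"}"     # trim leading whitespace
    line="${line%"${line##*[![:space:]]}"}"     # trim trailing whitespace
    [[ -z $line ]] && continue                  # skip blank lines
    # Warn (on stderr) about quotes, %XX escapes, or a missing namespace dot.
    if [[ $line == *\"* || $line == *\'* || $line =~ %[0-9A-Fa-f]{2} || $line != *.* ]]; then
      printf 'warning: line %d looks malformed: %s\n' "$n" "$line" >&2
    fi
    printf '%s\n' "$line"
  done
}

# Usage: bash clean_ids.sh raw_ids.txt id_list.txt
if (( $# == 2 )); then
  clean_ids < "$1" > "$2"
fi
```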

