@steindev
Last active September 30, 2022 21:14
Script to transfer data from Juelich Supercomputing Center to HZDR.
#!/bin/bash
#
# Script to transfer data from Juelich Supercomputing Center to HZDR.
# Call with
# > screen                     # open a screen session first to be able to log out of the
#                              # data mover system
# > exec ssh-agent bash        # start a shell under ssh-agent so that the key passphrase
#                              # does not have to be typed for every transfer
# > ssh-add ~/.ssh/id_ed25519  # add the ssh key to ssh-agent (asks for the passphrase once)
# > xargs -a dirs.list -n 1 -P 5 ~/bin/data-transfer_judac.sh | tee transfer.out
# which transfers the directories named in the file dirs.list, 5 at a time.
# Prepare dirs.list, the list of directories that need to be transferred,
# by running on judac in the directory that contains the directories to transfer
# > find . -maxdepth 1 -name "your_directory_naming_pattern" -print
#
# In ~/.ssh/config I have a definition for Host judac providing the HostName judac.fz-juelich.de,
# User, IdentityFile, and setting ForwardAgent No, AddKeysToAgent No.
#
# Note that rsync always verifies that each transferred file was correctly reconstructed on the
# receiving side by checking a whole-file checksum that is generated as the file is transferred.
#
# Nevertheless, successful file transfer can be checked manually by creating md5 checksums of all
# files in the source and comparing them to the respective checksums on the target system.
# In the source directory, create the file checksums.md5 by
# > find . -type f ! -name checksums.md5 -exec md5sum {} + > checksums.md5
# (checksums.md5 itself is excluded, since it is still being written while find runs)
# and compare with the transferred directory on the target system by
# > md5sum --check checksums.md5 2>check.err >check.out
# where checksums.md5 can be transferred via rsync.
# Then check if everything is as expected by grepping the check.out file for failed files
# > grep -i 'failed' check.out
#
# It is better to parallelize the creation and checking of the md5sums per directory with
# xargs, in the same way as the data transfer, i.e. the rsync call below.
#
# CC0 Klaus Steiniger, 2021-2022
# Define source directory on JSC file system
SOURCE="/p/scratch/project/parent/directory/of/folders/to/copy"
# Define target directory on HZDR file system
TARGET="/net/gssnfs/bigdata/project/parent/directory/where/folders/are/saved"
# Quote all expansions so that names containing spaces or glob characters are not split
printf "Transfer of %s started at %s\n" "${1##*/}" "$(date +%F_%H%M%S)"
rsync --stats -avzhPe 'ssh' "judac:$SOURCE/${1}" "$TARGET/" 2>"$TARGET/transfer_${1##*/}.err" >"$TARGET/transfer_${1##*/}.out"
printf "Transfer of %s finished at %s\n" "${1##*/}" "$(date +%F_%H%M%S)"
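
The header comments suggest parallelizing checksum creation and checking per directory with xargs, but give no commands for it. The following is a minimal sketch of how that might look, assuming GNU find, md5sum and xargs are available on both systems. The function names `create_checksums` and `verify_checksums` are hypothetical helpers and not part of the original script.

```shell
#!/bin/bash
# Hypothetical companion helpers for the transfer script (illustrative sketch only).

# Write checksums.md5 inside the directory given as $1.
create_checksums() {
  (
    cd "$1" || exit 1
    # Exclude the checksum file itself: the redirection creates it before find
    # runs, so it would otherwise be hashed while it is still being written.
    find . -type f ! -name checksums.md5 -exec md5sum {} + > checksums.md5
  )
}

# Verify the directory given as $1 against the checksums.md5 stored in it.
# Only mismatching or unreadable files are reported; the return code is
# non-zero if any check failed.
verify_checksums() {
  (
    cd "$1" || exit 1
    md5sum --check --quiet checksums.md5
  )
}
```

Assuming the functions are defined in the current bash session, they could be run for 5 directories at a time, in the same way as the transfer itself:

    export -f create_checksums verify_checksums
    xargs -a dirs.list -n 1 -P 5 bash -c 'create_checksums "$1"' _   # in the source directory on judac
    xargs -a dirs.list -n 1 -P 5 bash -c 'verify_checksums "$1"' _   # in the target directory

Because checksums.md5 is written into each directory, rsync transfers it together with the data.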