@brainstorm
Created September 19, 2011 14:18
MD5 + tar for Illumina HiSeq 2000 datasets, with an extra metadata tar
#!/bin/sh
# Generates a TAR archive while computing MD5 checksum for each file.
# From VeriTAR: http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/
DATASET_PATH=$(readlink -f "$1")
DATASET=$(basename "$DATASET_PATH")
EXCLUDE_FILES="--exclude .DS_Store --exclude .AppleDouble"
echo "Creating TAR + MD5 for dataset: $DATASET"
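# Exclude options are placed before the path they apply to.
# tar -v prints each archived name on stdout; md5sum runs on regular files only.
tar -cvpf "$DATASET.tar" $EXCLUDE_FILES "$DATASET_PATH" \
| xargs -I '{}' sh -c "test -f '{}' && md5sum '{}'" > "$DATASET.md5"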
# Keep the run metadata handy in a small, separate archive.
# Untarring a whole dataset takes ~20 min, so a faster way to reach it is needed.
echo "Creating subtar dataset ${DATASET}_meta.tar..."
SUBTAR="
RunInfo.xml
runParameters.xml
InterOp
Data/Status.htm
Data/Status_Files
Data/reports
"
for meta in $SUBTAR
do
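    # -C resolves the relative path from the dataset's parent directory,
    # so the script no longer has to be run from there.
    tar -upf "${DATASET}_meta.tar" $EXCLUDE_FILES -C "$(dirname "$DATASET_PATH")" "$DATASET/$meta"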
done
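
The script produces a checksum list and a small metadata archive but does not show how to consume them. A minimal sketch of that follow-up step, assuming GNU `md5sum` and `tar`; the function names `verify_md5` and `show_meta` are hypothetical helpers, not part of the original script:

```shell
#!/bin/sh
# Hypothetical helpers (assumptions, not part of the original gist).

# Re-check every checksum recorded in <dataset>.md5 against the files on disk.
# The .md5 file stores paths as tar printed them, so this has to run where
# those paths still resolve. --quiet prints only the files that fail.
verify_md5() {
    md5sum --quiet -c "$1"
}

# Print one member of the metadata tar to stdout without unpacking the archive.
# $1 is the archive, $2 the member name exactly as stored in it.
show_meta() {
    tar -xOf "$1" "$2"
}
```

For a dataset directory called `run1` (a made-up name), this would be used as `verify_md5 run1.md5` followed by `show_meta run1_meta.tar run1/RunInfo.xml`.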