@rpetit3
Created November 10, 2021 16:28
Quick script to compare two versions of sra-human-scrubber
#! /bin/bash
SCRUBBER_DB=${SCRUBBER_SHARE}/data/human_filter.db
# Current scrubber
echo "Run test on current Scrubber"
which scrub.sh
scrub.sh test
echo "Run test on PR Scrubber"
sra-human-scrubber/scripts/scrub.sh -t -d "${SCRUBBER_DB}"
echo "Compare scrubbers"
printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" "fastq" "input_size" "current_runtime" "gunzip_runtime" "scrub_runtime" "gzip_runtime" "pr_runtime" "current_md5" "pr_md5" "md5s_equal"
rm -rf current_scrubber pr_scrubber
mkdir current_scrubber pr_scrubber
for f in fastqs/*.fastq.gz; do
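# Size of the compressed input in bytes (-b is a GNU du option)
file_size=$(du -b "${f}" | cut -f 1)
# Strip the directory and the .fastq.gz suffix, keeping any dots in the sample name
file_name="$(basename "${f}" .fastq.gz)"
# Test current scrubber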
start=$(date +%s)
start_gunzip=$(date +%s)
gunzip -c "${f}" > "current_scrubber/${file_name}.fastq"
end_gunzip=$(date +%s)
start_scrub=$(date +%s)
scrub.sh "current_scrubber/${file_name}.fastq"
end_scrub=$(date +%s)
start_gzip=$(date +%s)
gzip "current_scrubber/${file_name}.fastq.clean"
end_gzip=$(date +%s)
rm "current_scrubber/${file_name}.fastq"
end=$(date +%s)
current_runtime=$((end-start))
current_gunzip=$((end_gunzip-start_gunzip))
current_scrub=$((end_scrub-start_scrub))
current_gzip=$((end_gzip-start_gzip))
current_fastq="current_scrubber/${file_name}.fastq.clean.gz"
# test PR scrubber
pr_fastq="pr_scrubber/${file_name}.clean.fastq.gz"
start=$(date +%s)
zcat "${f}" | sra-human-scrubber/scripts/scrub.sh -d "${SCRUBBER_DB}" | gzip > "${pr_fastq}"
end=$(date +%s)
pr_runtime=$((end-start))
# Compare md5s of the decompressed FASTQ contents, not of the .gz files
current_md5=$(zcat "${current_fastq}" | md5sum | cut -d " " -f 1)
pr_md5=$(zcat "${pr_fastq}" | md5sum | cut -d " " -f 1)
md5s_equal="false"
if [ "${current_md5}" == "${pr_md5}" ]; then
md5s_equal="true"
fi
printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" "${f}" "${file_size}" "${current_runtime}" "${current_gunzip}" "${current_scrub}" "${current_gzip}" "${pr_runtime}" "${current_md5}" "${pr_md5}" "${md5s_equal}"
done
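
The table printed by the loop above can be condensed into a few totals. The following is a minimal sketch, not part of the original gist: the function name `summarize_results` and the file name `results.tsv` are hypothetical, and it assumes the script's stdout was saved to a file. Because the script also echoes status messages to stdout, the sketch skips any line that does not have exactly 10 tab-separated columns, as well as the header row.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): summarize the comparison table.
# Assumes data rows follow the header's column order:
#   fastq, input_size, current_runtime, gunzip_runtime, scrub_runtime,
#   gzip_runtime, pr_runtime, current_md5, pr_md5, md5s_equal
summarize_results() {
    local tsv="$1"
    awk -F '\t' '
        # Skip the header row and any non-table lines (e.g. echo messages).
        NF != 10 || $1 == "fastq" { next }
        {
            n++
            current += $3                      # current_runtime, in seconds
            pr += $7                           # pr_runtime, in seconds
            if ($10 != "true") mismatches++    # md5s_equal column
        }
        END {
            printf "files\t%d\n", n
            printf "current_total_s\t%d\n", current
            printf "pr_total_s\t%d\n", pr
            printf "md5_mismatches\t%d\n", mismatches
        }
    ' "${tsv}"
}
```

Assuming the comparison script is saved as `compare.sh` (also a hypothetical name), one way to use this would be `bash compare.sh > results.tsv` followed by `summarize_results results.tsv`. A non-zero `md5_mismatches` count would indicate that the two scrubber versions produced different FASTQ contents for at least one input.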