@lgautier
Last active January 30, 2017 04:37
Demo/benchmark: building MinHash sketches with mashing-pumpkins, compared against sourmash
#!/bin/bash
echo
echo '---------------'
echo ' test w/ FASTA '
echo '---------------'
testchrom=chr1.fa.gz
if [ ! -f "${testchrom}" ]; then
wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/${testchrom};
fi
testfile=${testchrom}
echo '- mashing-pumpkins'
time python -m mashingpumpkins.demo.cmdline --format=FASTA --ncpu=3 --maxsize=500 ${testfile}
mv ${testfile}.sig.json ${testfile}_mashingpumpkins.sig.json
echo '---'
echo '- sourmash'
time sourmash compute --force --dna -o ${testfile}_sourmash.sig.json ${testfile}
echo '***'
echo 'The same signature, just faster?'
sourmash search ${testfile}_mashingpumpkins.sig.json ${testfile}_sourmash.sig.json
echo
echo '---------------'
echo ' test w/ FASTQ '
echo '---------------'
testfastq=DRR013190.fastq.gz
ksize=31
maxsize=1000
if [ ! -f "${testfastq}" ]; then
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR013/DRR013190/${testfastq}
fi
testfile=${testfastq}
echo '- mashing-pumpkins'
for ncpu in 2 3 4
do
echo " Using ${ncpu} cores."
time python -m mashingpumpkins.demo.cmdline --ncpu=${ncpu} --format=FASTQ --ksize=${ksize} --maxsize=${maxsize} ${testfile};
echo ' --'
done
mv ${testfile}.sig.json ${testfile}_mashingpumpkins.sig.json
echo '---'
echo '- sourmash'
time sourmash compute --force --dna -n ${maxsize} --k ${ksize} -o ${testfile}_sourmash.sig.json ${testfile}
echo '***'
echo 'The same signature, just faster?'
sourmash search --k ${ksize} -n ${maxsize} --dna --no-protein ${testfile}_mashingpumpkins.sig.json ${testfile}_sourmash.sig.json
ncpu=3
echo " Using faster parser and ${ncpu} cores."
time python -m mashingpumpkins.demo.cmdline --ncpu=${ncpu} --parser=fastqandfurious --format=FASTQ --ksize=${ksize} --maxsize=${maxsize} ${testfile};
echo ' --'
echo
echo '----------------------'
echo ' test w/ larger FASTQ '
echo '----------------------'
testfastq=DRR065801.fastq.gz
if [ ! -f "${testfastq}" ]; then
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR065/DRR065801/${testfastq}
fi
testfile=${testfastq}
echo '- mashing-pumpkins'
time python -m mashingpumpkins.demo.cmdline --ncpu=3 --format=FASTQ --ksize=${ksize} --maxsize=${maxsize} ${testfile};
echo ' Using faster parser'
time python -m mashingpumpkins.demo.cmdline --ncpu=3 --parser=fastqandfurious --format=FASTQ --ksize=${ksize} --maxsize=${maxsize} ${testfile};
echo ' --'
echo '---'
echo '- sourmash'
time sourmash compute --force --dna -n ${maxsize} --k ${ksize} -o ${testfile}_sourmash.sig.json ${testfile}
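Each test section guards its download with `if [ ! -f ... ]; then wget ...; fi`. One weakness of that pattern is that an interrupted `wget` leaves a truncated file behind, which the next run then treats as complete. The sketch below factors the guard into a function and downloads to a temporary name first. The helper name `fetch_if_missing` and the `.part` suffix are hypothetical, not part of the original script; only `wget -O` is assumed from the standard tool.

```shell
#!/bin/sh
# Hypothetical helper: download a file only when it is not already present,
# mirroring the `if [ ! -f ... ]; then wget ...; fi` blocks in the script.
fetch_if_missing() {
    url=$1
    dest=$2
    if [ -f "$dest" ]; then
        echo "using cached $dest"
        return 0
    fi
    # Download under a temporary name and rename only on success, so an
    # interrupted transfer does not leave a truncated file that a later run
    # would mistake for a complete download.
    wget -O "${dest}.part" "$url" && mv "${dest}.part" "$dest"
}
```

For example, the FASTA section's download would become `fetch_if_missing ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/chr1.fa.gz chr1.fa.gz`.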
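In the FASTQ section, every iteration of the `for ncpu in 2 3 4` loop writes to the same `${testfile}.sig.json`, so only the last run's signature (4 cores) survives to be renamed and compared. A minimal sketch of a core-count sweep that keeps one signature per run and prints a one-line timing summary might look like the following. The names `run_sketch` and `sweep_ncpu` and the `_ncpu<N>` file suffix are hypothetical. Timing uses `date +%s`, which only resolves whole seconds and is therefore coarser than the shell's `time`.

```shell
#!/bin/sh
# Hypothetical command wrapper; adjust the flags to match the run being timed.
# As in the script above, the demo is assumed to write "<input>.sig.json".
run_sketch() {
    python -m mashingpumpkins.demo.cmdline --ncpu="$1" --format=FASTQ \
        --ksize=31 --maxsize=1000 "$2"
}

# Hypothetical helper: call run_sketch once per core count, report elapsed
# wall-clock time in whole seconds, and rename each run's signature so the
# next iteration does not overwrite it.
sweep_ncpu() {
    input=$1
    shift
    for ncpu in "$@"; do
        start=$(date +%s)
        run_sketch "$ncpu" "$input" || return 1
        end=$(date +%s)
        echo "ncpu=${ncpu} seconds=$((end - start))"
        mv "${input}.sig.json" "${input}_mashingpumpkins_ncpu${ncpu}.sig.json" || return 1
    done
}
```

Usage would be `sweep_ncpu DRR013190.fastq.gz 2 3 4`. Keeping `run_sketch` as a separate function makes it easy to swap in the `--parser=fastqandfurious` variant without touching the loop.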
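The "same signature, just faster?" checks call `sourmash search` directly on the two signature files. If either sketching run fails, the comparison fails with a less direct error. A small guard, sketched below under the assumption that a missing or empty file means the run failed, reports the problem first. The function name `compare_sigs` is hypothetical. It uses only the two-positional-argument form of `sourmash search` seen in the FASTA section and omits the extra options (`--k`, `-n`, `--dna`, `--no-protein`) that the FASTQ comparison passes.

```shell
#!/bin/sh
# Hypothetical guard around the `sourmash search` comparison step: refuse to
# compare unless both signature files exist and are non-empty.
compare_sigs() {
    for sig in "$1" "$2"; do
        if [ ! -s "$sig" ]; then
            echo "missing or empty signature: $sig" >&2
            return 1
        fi
    done
    sourmash search "$1" "$2"
}
```

For the FASTA test this would be `compare_sigs chr1.fa.gz_mashingpumpkins.sig.json chr1.fa.gz_sourmash.sig.json`. The larger-FASTQ section currently stops after timing `sourmash compute` and runs no comparison, so a call like this is one way such a check could be added.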