Example script for downloading (using aspera) and extracting paired-end data from the SRA and performing parallel compression using pigz.
# Bash script to download a bunch of *.sra files from the NCBI SRA, using
# the aspera client, and extract FASTQ data using the SRA Toolkit.
# These SRA files are for the durum genome
for file in "${files[@]}"; do
  echo "${file}"
  ~/.aspera/connect/bin/ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.putty -k1 -QTr -l${max_bandwidth_mbps}m${file:0:3}/${file:0:6}/${file%.sra}/${file} ./
  # Only process SRA files that have finished downloading (i.e. no *.aspx file) and for which no fastq.gz file exists yet
  if [[ ! -e ${file%.sra}.aspx && ! -e ${file%.sra}.fastq.gz ]]; then
    echo -n " Extracting data into FASTQ format ... "
    # Convert SRA to FASTQ, trimming the deflines to save disk space and to match normal Illumina read naming, i.e. /1 and /2 suffixes
    /home/nhaigh/bioinf/sratoolkit.2.2.2a-ubuntu32/bin/fastq-dump --split-spot --stdout --readids --defline-seq '@$ac.$si/$ri' --defline-qual '+' "${file}" | pigz --best --processes 10 > "${file%.sra}.fastq.gz"
    echo "DONE"
  else
    echo " Skipping"
  fi
done
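
The remote path in the ascp command is built from the file name with Bash parameter expansion: the first three characters, the first six characters, the name without its .sra suffix, and then the full file name. A minimal sketch of just that step follows; the helper name sra_remote_path and the accession SRR123456 are made up for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): build the relative
# directory path for an SRA file using the same expansions as the ascp line.
sra_remote_path() {
  local file="$1"
  # ${file:0:3}  -> first 3 characters, e.g. SRR
  # ${file:0:6}  -> first 6 characters, e.g. SRR123
  # ${file%.sra} -> file name with the .sra suffix removed
  echo "${file:0:3}/${file:0:6}/${file%.sra}/${file}"
}

# SRR123456.sra is an illustrative accession, not one of the durum runs
sra_remote_path "SRR123456.sra"
# prints: SRR/SRR123/SRR123456/SRR123456.sra
```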

tomck commented Jul 22, 2016

Might it make more sense to use a while read loop and have the user running the script specify the text file list of SRR runs as an argument?
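
One way this suggestion could look is sketched below. It is an illustration only, not the script author's code: the function names process_runs and sra_name, and the list format (one run accession per line, with blank lines and lines starting with # ignored), are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: process_runs, sra_name and the list-file format are
# assumptions and do not appear in the original script.

# Read run accessions from a text file, one per line, and call the given
# command once per accession. Blank lines and '#' comment lines are skipped.
process_runs() {
  local run_list="$1"; shift
  local run
  # '|| [[ -n ... ]]' also handles a last line without a trailing newline
  while IFS= read -r run || [[ -n "${run}" ]]; do
    # Trim leading and trailing whitespace (including a carriage return)
    run="${run#"${run%%[![:space:]]*}"}"
    run="${run%"${run##*[![:space:]]}"}"
    [[ -z "${run}" || "${run}" == \#* ]] && continue
    # Redirect stdin so the command cannot consume the rest of the list
    "$@" "${run}" < /dev/null
  done < "${run_list}"
}

# Placeholder for the per-run work: print the SRA file name for a run
sra_name() {
  echo "${1}.sra"
}

# Only runs when a list file is passed as the first argument
if [[ $# -ge 1 ]]; then
  process_runs "$1" sra_name
fi
```

Saved as, say, list_runs.sh, it would be invoked as `bash list_runs.sh runs.txt`. In a real script the download and extraction steps would take the place of sra_name.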
