Created July 11, 2013 05:24
Example script for downloading paired-end data from the SRA (using Aspera), extracting it to FASTQ, and compressing it in parallel with pigz.
#!/bin/bash
# Bash script to download a bunch of *.sra files from the NCBI SRA, using
# the Aspera client, and extract FASTQ data using the SRA Toolkit.
# These SRA files are for the durum genome.
# Requires: a files=(...) array of SRR*.sra file names and max_bandwidth_mbps, both set before this loop.
for file in "${files[@]}"; do
  echo "${file}"
  ~/.aspera/connect/bin/ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.putty -k1 -QTr -l "${max_bandwidth_mbps}m" "${file:0:3}/${file:0:6}/${file%.sra}/${file}" ./
  if [[ ! -e ${file%.sra}.aspx && ! -e ${file%.sra}.fastq.gz ]]; then
    # Only process SRA files that have finished downloading (i.e. no *.aspx file) and for which no fastq.gz file exists yet
    echo -n " Extracting data into FASTQ format ... "
    # Convert SRA to FASTQ, changing the deflines to save disk space and to match normal Illumina read naming, i.e. /1 and /2 suffixes
    /home/nhaigh/bioinf/sratoolkit.2.2.2a-ubuntu32/bin/fastq-dump --split-spot --stdout --readids --defline-seq '@$ac.$si/$ri' --defline-qual '+' "${file}" | pigz --best --processes 10 > "${file%.sra}.fastq.gz"
    echo "DONE"
  else
    echo " Skipping"
  fi
done
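The source path passed to `ascp` and the skip test in the loop both rely on bash parameter expansion (`${file:0:3}`, `${file:0:6}`, `${file%.sra}`). The following minimal sketch isolates that logic in two hypothetical helpers, `sra_relpath` and `needs_extraction`, so it can be checked without Aspera or the SRA Toolkit. The remote host and base directory that would precede this relative path are not part of the sketch.

```shell
#!/usr/bin/env bash
# Sketch only: sra_relpath and needs_extraction are hypothetical helpers that
# mirror expressions used in the download loop.

# Build the <first 3 chars>/<first 6 chars>/<run>/<run>.sra relative path that
# the loop passes to ascp (no remote host or base directory included).
sra_relpath() {
  local file=$1                 # e.g. SRR1234567.sra
  local run=${file%.sra}        # remove the .sra suffix -> SRR1234567
  # ${file:0:3} = first 3 characters (SRR), ${file:0:6} = first 6 (SRR123)
  printf '%s/%s/%s/%s\n' "${file:0:3}" "${file:0:6}" "$run" "$file"
}

# Succeed (exit status 0) only when neither <run>.aspx nor <run>.fastq.gz
# exists in the current directory: the same condition the loop uses to
# decide whether to run fastq-dump.
needs_extraction() {
  local run=${1%.sra}
  [[ ! -e ${run}.aspx && ! -e ${run}.fastq.gz ]]
}
```

For a hypothetical accession, `sra_relpath SRR1234567.sra` prints `SRR/SRR123/SRR1234567/SRR1234567.sra`.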
tomck commented Jul 22, 2016

Might it make more sense to use a while read loop and have the user running the script specify the text file list of SRR runs as an argument?
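The following is a minimal sketch of that suggestion. It assumes a plain text file with one SRR run accession per line. `read_run_list` is a hypothetical helper name, and the resulting `files` array would feed the original `for` loop unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: read_run_list is a hypothetical helper, not part of the
# original script. It reads SRR run accessions (one per line) from a text file
# and prints them as <run>.sra names, the form the download loop expects.
read_run_list() {
  local list_file=$1 run
  # `|| [[ -n $run ]]` also keeps a last line that has no trailing newline
  while IFS= read -r run || [[ -n $run ]]; do
    run=${run%%#*}                   # drop anything after a '#' (comments)
    run=${run//[[:space:]]/}         # remove whitespace, including stray \r characters
    [[ -z $run ]] && continue        # skip blank and comment-only lines
    printf '%s.sra\n' "${run%.sra}"  # accept entries with or without .sra
  done < "$list_file"
}

# Hypothetical use at the top of the download script, in place of a
# hard-coded files=(...) array (mapfile needs bash 4 or later):
#   [[ $# -eq 1 ]] || { echo "Usage: $0 runs.txt" >&2; exit 1; }
#   mapfile -t files < <(read_run_list "$1")
#   for file in "${files[@]}"; do ... done   # original loop body unchanged
```

This sketch collects the list first rather than running `ascp` and `fastq-dump` directly inside the `while read` loop. That avoids a common pitfall: commands inside such a loop inherit its standard input, so any of them that reads stdin can swallow the remaining lines of the list.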

