@nathanhaigh
Created July 11, 2013 05:24
Example script for downloading (using aspera) and extracting paired-end data from the SRA and performing parallel compression using pigz.
#!/bin/bash
# Bash script to download a bunch of *.sra files from the NCBI SRA, using
# the aspera client, and extract FASTQ data using the SRA Toolkit.
max_bandwidth_mbps=50
# These SRA files are for the durum genome
files=(
  'SRR567512.sra'
  'SRR567559.sra'
  'SRR567563.sra'
  'SRR570310.sra'
  'SRR567544.sra'
  'SRR567549.sra'
  'SRR567552.sra'
)
for file in "${files[@]}"; do
  echo "${file}"
  # Download the .sra file (-k1 resumes a partial transfer, -l caps the transfer rate)
  ~/.aspera/connect/bin/ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.putty -k1 -QTr -l${max_bandwidth_mbps}m "anonftp@ftp-trace.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/${file:0:3}/${file:0:6}/${file%.sra}/${file}" ./
  # Only process .sra files that exist, have finished downloading (ascp keeps its
  # resume file as <file>.aspx while a transfer is incomplete) and for which no
  # fastq.gz file exists yet
  if [[ -e ${file} && ! -e ${file}.aspx && ! -e ${file%.sra}.fastq.gz ]]; then
    echo -n " Extracting data into FASTQ format ... "
    # Convert SRA to FASTQ. The custom deflines reduce disk space (bare '+' quality
    # header) and match normal Illumina read naming, i.e. /1 and /2 suffixes
    /home/nhaigh/bioinf/sratoolkit.2.2.2a-ubuntu32/bin/fastq-dump --split-spot --stdout --readids --defline-seq '@$ac.$si/$ri' --defline-qual '+' "${file}" | pigz --best --processes 10 > "${file%.sra}.fastq.gz"
    echo "DONE"
  else
    echo " Skipping"
  fi
done
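
The remote path passed to ascp is assembled from the file name with bash parameter expansion. The sketch below isolates that step so it can be checked without touching the network; `sra_remote_path` is a hypothetical helper name introduced only for illustration and is not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): build the remote path
# that the ascp command requests for one .sra file name.
sra_remote_path() {
  local file=$1
  local run=${file%.sra}   # strip the extension, e.g. SRR567512
  # ${file:0:3} is the first three characters (SRR),
  # ${file:0:6} is the first six characters (SRR567)
  printf '/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s\n' \
    "${file:0:3}" "${file:0:6}" "${run}" "${file}"
}

sra_remote_path 'SRR567512.sra'
# prints /sra/sra-instant/reads/ByRun/sra/SRR/SRR567/SRR567512/SRR567512.sra
```

Printing the path first is a cheap way to confirm the directory layout for a new accession before starting a large transfer.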
@tomck

tomck commented Jul 22, 2016

Might it make more sense to use a while read loop and have the user running the script pass a text file listing the SRR runs as an argument?
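
One way this suggestion could look is sketched below. It assumes a plain text file with one run accession per line, where blank lines and lines starting with `#` are ignored; both the file layout and the helper name `list_sra_files` are assumptions made for illustration, not something the gist defines.

```shell
#!/bin/bash
# list_sra_files is a hypothetical helper, not part of the original gist.
# It reads run accessions (one per line) and prints the matching .sra names.
list_sra_files() {
  local run
  # Default IFS trims surrounding whitespace; "_" absorbs any extra columns.
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while read -r run _ || [ -n "${run}" ]; do
    case ${run} in
      ''|'#'*) continue ;;            # skip blank lines and comments
    esac
    printf '%s.sra\n' "${run%.sra}"   # accept SRR567512 or SRR567512.sra
  done < "$1"
}

# Demo with a throwaway list file
list=$(mktemp)
printf 'SRR567512\n# durum runs\n\nSRR567559.sra\n' > "${list}"
list_sra_files "${list}"
rm -f "${list}"
```

In the script, the hard-coded `files` array and its `for` loop could then give way to something like `list_sra_files "$1" | while read -r file; do ... done`, keeping the loop body unchanged. If any command inside that loop reads standard input, redirecting its input from `/dev/null` stops it from consuming the remaining lines of the list.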