@danielecook
Last active June 21, 2019 07:26
Download data from the sequence read archive and convert to fastq format
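Download_SRP_Runs() {
    # Look up the run accessions (e.g. SRR IDs) that belong to the given SRP ID
    SRP_IDs=$(esearch -db sra -query "$1" | efetch -format docsum | xtract -pattern DocumentSummary -element Run@acc | tr '\t' '\n')
    for r in ${SRP_IDs}; do
        # Path layout: <first 6 characters of accession>/<accession>/<accession>.sra
        url="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/${r:0:6}/${r}/${r}.sra"
        wget "$url"
    done
}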
Download_SRP_Runs <SRP ID GOES HERE>
# Convert to fastq
parallel fastq-dump --split-files --gzip {} ::: *.sra
# Perform quality control
parallel fastqc {} ::: *.fastq.gz
@adomingues

Just to point out that the way ${url} is constructed will fail in some instances. For example, SRR1049814 generates a URL that does not exist:

r='SRR1049814'
url="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/${r:0:6}/${r:0:9}/${r}.sra"
echo $url
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR104/SRR104981/SRR1049814.sra

--2015-12-02 13:26:51--  ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR104/SRR104981/SRR1049814.sra
           => `SRR1049814.sra'
Resolving ftp-trace.ncbi.nih.gov (ftp-trace.ncbi.nih.gov)... 2607:f220:41e:250::13, 130.14.250.11
Connecting to ftp-trace.ncbi.nih.gov (ftp-trace.ncbi.nih.gov)|2607:f220:41e:250::13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /sra/sra-instant/reads/ByRun/sra/SRR/SRR104/SRR104981 ...
No such directory `sra/sra-instant/reads/ByRun/sra/SRR/SRR104/SRR104981'.

The problem lies in this bit: ${r:0:9}, which truncates this 10-character accession to its first nine characters. According to the manual, that path component should be the full accession number rather than a truncated portion:

/sra/sra-instant/reads/ByRun/sra/{SRR|ERR|DRR}/<first 6 characters of accession>/<accession>/.sra
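
To see the truncation concretely, here is a small illustration of bash substring expansion applied to the accession from the example above. It only prints strings and does not touch the network:

```shell
#!/usr/bin/env bash
# Illustration only: how bash substring expansion treats a 10-character accession.
r='SRR1049814'

echo "${#r}"      # length of the accession: 10
echo "${r:0:6}"   # first six characters: SRR104
echo "${r:0:9}"   # first nine characters: SRR104981 (last digit is dropped)
echo "${r}"       # full accession: SRR1049814
```

Accessions that are exactly nine characters long come through ${r:0:9} unchanged, which would explain why the original construction works in most cases.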

The solution is a small change in the construction of the URL:

url="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/${r:0:6}/${r}/${r}.sra"
echo $url
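
As a rough sketch of the pattern quoted from the manual, the {SRR|ERR|DRR} directory could also be taken from the accession itself instead of being hard-coded as SRR. The function name `sra_url` is made up for illustration and is not part of the gist, and the sketch assumes the FTP layout is exactly as described above:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original gist): build the .sra URL
# for a run accession, following the layout quoted from the manual.
sra_url() {
    local acc="$1"
    local prefix="${acc:0:3}"   # SRR, ERR or DRR (assumed to be the first 3 characters)
    local dir="${acc:0:6}"      # first 6 characters of the accession
    echo "ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/${prefix}/${dir}/${acc}/${acc}.sra"
}

# Prints the URL for the accession from the example above
sra_url SRR1049814
```

Inside the download loop this would be used as wget "$(sra_url "$r")". The function only builds the string and does not check that the file exists on the server.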

Your script does of course work for most cases; I just happened to stumble across the SRR that breaks it :)

@danielecook
Author

url="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/${r:0:6}/${r}/${r}.sra"

Just seeing this but thank you very much!
