@martijnvermaat
Last active November 11, 2020 13:09
Download SRA run files for given sample names

List all sample names in a file called samples, one per line, and run:

for sample in $(cat samples); do
    # Split the script output on newlines only, so each "run url" pair stays on one line.
    IFS=$'\n'
    for line in $(./sra-runs.py "$sample"); do
        echo "$sample $line" >> runs
    done
    unset IFS
done

To download the run files, run:

for file in $(cut -d ' ' -f 3 runs); do
    wget "$file"
done
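
Each line of runs holds three space-separated fields: the sample name, the run accession, and the download URL, which is why the loop above takes field 3. As a rough sketch, the same file could be read in Python as follows. The helper name read_runs and the sample name my_sample are made up for illustration and are not part of the gist; the accession and URL are the example from the comment in sra-runs.py.

```python
import io


def read_runs(lines):
    """Yield (sample, run, url) tuples from the lines of a runs file."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Three space-separated fields, as written by the shell loop.
        sample, run, url = line.split(' ', 2)
        yield sample, run, url


# Hypothetical runs content standing in for open('runs').
example = io.StringIO(
    'my_sample ERR006600 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/'
    'reads/ByRun/sra/ERR/ERR006/ERR006600/ERR006600.sra\n'
)

records = list(read_runs(example))
for sample, run, url in records:
    print(sample, run, url)
```

With a real file, passing open('runs') instead of the StringIO object should behave the same way.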

Extract to FASTQ with:

for file in *.sra; do
    fastq-dump "$file"
done

The samples file:

F11Aptl
F11Aptr
F11Ewxl
F11Ewxr

The sra-runs.py script:

#!/usr/bin/env python
"""
Get run accession numbers and SRA download urls for given SRA sample name.

2013 Martijn Vermaat <m.vermaat.hg@lumc.nl>
"""
import argparse
import sys

import lxml.etree
from Bio import Entrez

# Uncomment and fill in so NCBI can identify the requests.
#Entrez.email = 'your email address'


def error(message):
    """Write a message to stderr and exit with a non-zero status."""
    sys.stderr.write(message + '\n')
    sys.exit(1)


def get_runs(sample):
    """Return the run accessions of the single SRA entry matching sample."""
    handle = Entrez.esearch(db='sra', term=sample)
    record = Entrez.read(handle)
    if len(record['IdList']) != 1:
        error('Found %d entries in SRA for "%s" instead of the expected 1'
              % (len(record['IdList']), sample))
    result = record['IdList'][0]
    handle = Entrez.efetch(db='sra', id=result)
    entry = lxml.etree.parse(handle)
    return entry.xpath('//EXPERIMENT_PACKAGE_SET/EXPERIMENT_PACKAGE/RUN_SET/RUN/@accession')


def get_url(run):
    """Build the download URL for a run accession."""
    # ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR006/ERR006600/ERR006600.sra
    url_template = 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/{leading3}/{leading6}/{all}/{all}.sra'
    return url_template.format(leading3=run[:3], leading6=run[:6], all=run)


def main(sample):
    for run in get_runs(sample):
        # The parenthesized form prints the same line under Python 2 and 3.
        print('%s %s' % (run, get_url(run)))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__.split('\n\n')[0])
    parser.add_argument('sample', metavar='SAMPLE',
                        help='SRA sample name (or any search term)')
    args = parser.parse_args()
    main(args.sample)
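
To see what the XPath query and the URL template do without contacting NCBI, here is a small offline sketch. It uses the standard library's xml.etree.ElementTree in place of lxml. The XML document is a heavily abridged, made-up stand-in for an efetch response: only the element names from the XPath in the script are assumed, and the second accession is invented for the example. The function name run_accessions is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, abridged stand-in for an SRA efetch response.
SAMPLE_XML = """\
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <RUN_SET>
      <RUN accession="ERR006600"/>
      <RUN accession="ERR006601"/>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>
"""

URL_TEMPLATE = ('ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/'
                'ByRun/sra/{leading3}/{leading6}/{all}/{all}.sra')


def run_accessions(xml_text):
    """Return the accession attribute of every RUN element."""
    root = ET.fromstring(xml_text)
    # ElementTree paths cannot select attributes directly, so select
    # the RUN elements and read the attribute from each one.
    runs = root.findall('./EXPERIMENT_PACKAGE/RUN_SET/RUN')
    return [run.get('accession') for run in runs]


def get_url(run):
    """Fill the template with the first 3 and 6 characters and the full accession."""
    return URL_TEMPLATE.format(leading3=run[:3], leading6=run[:6], all=run)


for accession in run_accessions(SAMPLE_XML):
    print('%s %s' % (accession, get_url(accession)))
```

The directory layout nests each run under its three-character prefix and then its six-character prefix, so the URL can be derived from the accession alone.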