@iracooke
Last active August 29, 2015 14:06
NCBI SRA

Download files from SRA

This gist contains a few snippets of code to help automate downloading data using the NCBI SRA Toolkit.

  1. Use the NCBI website to export a csv file with the details of the experiments you wish to download. This file will contain only experiment accession numbers, not run accessions.
  2. Use R to convert the experiment accessions to run accessions. Save these in runs.txt
  3. Use this command to download all the files
  cat runs.txt | ./fetchall.sh

This will probably take a very long time, so it's a good idea to wrap the command in a batch job script.
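
For a long unattended run it can help if one failed download does not stop the rest. The sketch below is only an illustration and is not part of the original gist: the `fetch_runs` function, the `FETCH_CMD` override and the `failed_runs.txt` log name are all assumptions. Any scheduler-specific job header depends on the cluster and is left out.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of the original gist): download each run
# accession read from standard input and log the ones that fail.

# FETCH_CMD is an assumed override hook so the loop can be dry-run
# (for example FETCH_CMD=echo). It defaults to fastq-dump from the SRA Toolkit.
FETCH_CMD="${FETCH_CMD:-fastq-dump}"

fetch_runs() {
  # $1: file that collects failed accessions (assumed default name)
  local failed_log="${1:-failed_runs.txt}"
  local run
  while IFS= read -r run; do
    [ -z "$run" ] && continue            # skip blank lines
    # Redirect stdin so the download command cannot swallow the accession list
    if ! "$FETCH_CMD" "$run" < /dev/null; then
      echo "$run" >> "$failed_log"       # record the failure and carry on
    fi
  done
}
```

With the function defined, `fetch_runs < runs.txt` downloads every run in turn, and setting `FETCH_CMD=echo` first gives a dry run that only prints the accessions. Anything listed in the failure log can be fed back in later with `fetch_runs < failed_runs.txt`.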

#!/bin/bash
# fetchall.sh: download each run accession read from standard input
while read -r p; do
  fastq-dump "$p"
done
# Example showing how to translate experiment accessions into run accessions needed for download
library(SRAdb)
library(stringr)
# This downloads a large (about 7 GB) SQLite file
sqlfile = getSRAdbFile()
# Connect using the RSQLite driver object
dbcon = dbConnect(RSQLite::SQLite(),sqlfile)
# Cephalopods.csv is a file of exported experiments from NCBI
exp_data = read.csv("Cephalopods.csv",stringsAsFactors=FALSE)
exp_acc = str_match(exp_data$Id,'accession:(.*)')[,2]
run_acc = sraConvert(exp_acc,'run',dbcon)[,2]
# col.names=FALSE keeps a header line out of runs.txt, so every line is an accession
write.table(run_acc,file="runs.txt",quote=FALSE,row.names=FALSE,col.names=FALSE)
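
Before starting a long download it may be worth checking that runs.txt holds nothing but run accessions, since a stray line such as a header row or an `NA` would be passed straight to fastq-dump. The following is a minimal sketch rather than part of the original gist: the `check_runs` name is hypothetical, and it assumes run accessions have the form SRR, ERR or DRR followed by digits.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): report any line of a
# runs file that does not look like a run accession.

# Assumed accession shape: SRR, ERR or DRR followed by one or more digits
RUN_PATTERN='^[SED]RR[0-9]+$'

check_runs() {
  # $1: path to the runs file
  if grep -q -v -E "$RUN_PATTERN" "$1"; then
    echo "Unexpected lines in $1:" >&2
    grep -n -v -E "$RUN_PATTERN" "$1" >&2   # show offending lines with numbers
    return 1
  fi
}
```

Chaining the check in front of the download, as in `check_runs runs.txt && cat runs.txt | ./fetchall.sh`, means the download only starts when every line matches.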