@crazyhottommy
Created April 27, 2017 19:51

From Mike Love: https://gist.github.com/mikelove/f539631f9e187a8931d34779436a1c01

That gist is an R implementation of the following rule:

Archive-generated fastq files are organised by run accession number under the vol1/fastq directory on ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/dir1[/dir2]/run_accession/

dir1 is the first 6 characters of the run accession (e.g. ERR000 for ERR000916).

dir2 does not exist if the run accession has six digits.

For example, fastq files for run ERR000916 are in directory: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/.

If the run accession has seven digits, then dir2 is 00 + the last digit of the run accession.

For example, fastq files for run SRR1016916 are in directory: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR101/006/SRR1016916/.

If the run accession has eight digits, then dir2 is 0 + the last two digits of the run accession.

If the run accession has nine digits, then dir2 is the last three digits of the run accession.
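
The rule above can be sketched as a small bash function that only prints the directory URL and downloads nothing. This is a minimal sketch, not the original script: the function name ena_fastq_dir is hypothetical, and it assumes a three-letter accession prefix (ERR, SRR, DRR) followed by six to nine digits.

```bash
#!/bin/bash
# Sketch: print the ENA FTP directory for a run accession.
# ena_fastq_dir is a hypothetical helper name, not part of the original gist.
ena_fastq_dir() {
  local accession="$1"
  local prefix="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
  local dir1="${accession:0:6}"   # first six characters, e.g. ERR000
  local digits="${accession:3}"   # assumption: three-letter prefix, the rest are digits
  local dir2
  case "${#digits}" in
    6) dir2="" ;;                   # six digits: no second directory level
    7) dir2="00${digits: -1}" ;;    # seven digits: 00 + last digit
    8) dir2="0${digits: -2}" ;;     # eight digits: 0 + last two digits
    9) dir2="${digits: -3}" ;;      # nine digits: last three digits
    *) echo "unsupported accession: $accession" >&2; return 1 ;;
  esac
  # ${dir2:+$dir2/} expands to "dir2/" only when dir2 is non-empty
  echo "$prefix/$dir1/${dir2:+$dir2/}$accession/"
}

ena_fastq_dir ERR000916    # ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/
ena_fastq_dir SRR1016916   # ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR101/006/SRR1016916/
```

Counting digits rather than total string length keeps the branches aligned with the wording of the rule; the two calls reproduce the two example directories given above.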

http://www.ebi.ac.uk/ena/browse/read-download#downloading_files_ena_browser

./stream_ena SRR3185782.fastq | head
@SRR3185782.1 HWI-D00361:180:HJG3GADXX:2:1101:1460:2181/1
AGTGTGTTCATCAGTGTGGATTTGCCAATGCCGGTCTCCCCCACACAGAG
+
BBBFFBFFFB<FFFFFBFF<FFFFFFFFFFFFFIIIIFFFFFFFFIFFFF
@SRR3185782.2 HWI-D00361:180:HJG3GADXX:2:1101:1613:2218/1
GCCAATTTTCTTAATGTAAGTGCTGACTTCCTTAACAATTTCCTCATATC
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@SRR3185782.3 HWI-D00361:180:HJG3GADXX:2:1101:2089:2243/1
CGGGTTCTTGGACTTCAGCCAGTTGAGCAGGGCATCCTTGTTGAAGGCGG


salmon quant -l IU \
-i Homo_sapiens.GRCh38.78.cdna_ERCC_repbase.fa \
-r <(./stream_ena SRR3185782.fastq) -o SRR3185782

salmon quant -l IU \
-i Homo_sapiens.GRCh38.78.cdna_ERCC_repbase.fa \
-1 <(./stream_ena SRR1274127_1.fastq) \
-2 <(./stream_ena SRR1274127_2.fastq) -o SRR1274127

./stream_ena SRR1274127_1.fastq | fastqc -o SRR1274127_1_fastqc -f fastq stdin

Very cool!!!!!

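#!/bin/bash
# from http://www.nxn.se/valent/streaming-rna-seq-data-from-ena
# Usage: ./stream_ena SRR3185782.fastq
# Streams the decompressed fastq file from the ENA FTP server to stdout.
fastq="$1"
prefix=ftp://ftp.sra.ebi.ac.uk/vol1/fastq
# The run accession is everything before the first '.' or '_'
# (e.g. SRR1274127_1.fastq -> SRR1274127).
accession=$(echo "$fastq" | tr '.' '_' | cut -d'_' -f 1)
dir1=${accession:0:6}
a_len=${#accession}
# dir2 carries its own trailing slash so the URL has no empty path segment.
if (( a_len == 9 )); then
  dir2=""                       # six digits: no second directory level
elif (( a_len == 10 )); then
  dir2="00${accession:9:1}/"    # seven digits: 00 + last digit
elif (( a_len == 11 )); then
  dir2="0${accession:9:2}/"     # eight digits: 0 + last two digits
else
  dir2="${accession:9:3}/"      # nine digits: last three digits
fi
url="$prefix/$dir1/$dir2$accession/$fastq.gz"
curl --keepalive-time 4 -s "$url" | zcat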