@konrad
Created January 7, 2016 20:34
# Problem: You have an NCBI GEO accession and would like to get the URL of the SRA file that contains the sequencing data.
# The sed command that removes the last character of the string is essential: the line ends in an invisible character
# (presumably a carriage return) that otherwise breaks the downstream steps.
GEO_ACCESSION="GSM1655353" # set your GEO accession here
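# Fetch the brief text record, keep the line with the FTP location, drop the 31-character key prefix and the trailing invisible character.
SRA_FTP_URL=$(curl "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=${GEO_ACCESSION}&targ=self&form=text&view=brief" 2>/dev/null | grep 'ftp-trace.ncbi.nlm.nih.gov' | cut -c 32- | sed 's/.$//')
# The FTP location contains a sub-folder, which in turn contains the SRA file.
FTP_SUB_FOLDER=$(ncftpls "${SRA_FTP_URL}/")
SRA_FILE=$(ncftpls "${SRA_FTP_URL}/${FTP_SUB_FOLDER}/")
echo "${GEO_ACCESSION}" "${SRA_FTP_URL}/${FTP_SUB_FOLDER}/${SRA_FILE}"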
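
The extraction step above relies on a fixed column offset and on dropping whatever the last character happens to be. A slightly more defensive variant, sketched here under assumptions, splits on the `key = value` separator and deletes carriage returns explicitly. The helper name `extract_url`, the sample record, and its `/some/path` location are made up for illustration; the real GEO output may differ.

```shell
# extract_url: hypothetical helper (not part of the original script).
# Reads "key = value" lines on stdin and prints the value of the first
# line that contains the given literal host name, without carriage returns.
extract_url() {
    grep -F "$1" | head -n 1 | sed 's/^[^=]*= //' | tr -d '\r'
}

# Made-up sample record with CRLF line endings (assumed format, illustrative path).
sample=$(printf '!Sample_title = demo\r\n!Sample_supplementary_file_1 = ftp://ftp-trace.ncbi.nlm.nih.gov/some/path\r\n')

url=$(printf '%s\n' "$sample" | extract_url 'ftp-trace.ncbi.nlm.nih.gov')
echo "$url"
```

In the script itself, the `curl` output could be piped into `extract_url` in place of the `grep | cut | sed` chain; whether that is preferable depends on how stable the record layout is.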