  • Save shenwei356/5daa3718a753a4005f442e5f8e1cacdf to your computer and use it in GitHub Desktop.
Downloading genome annotation files from the NCBI FTP site, given a list of FTP URLs

URL list

$ head choose_ftp.txt
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/405/GCA_000015405.1_ASM1540v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/620/625/GCA_000620625.1_ASM62062v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/972/925/GCA_000972925.1_ASM97292v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/021/385/GCA_001021385.1_ASM102138v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/328/565/GCA_000328565.1_ASM32856v1

Target files

  • genome: *_genomic.fna.gz
  • protein: *_protein.faa.gz (may be missing)
  • gff: *_genomic.gff.gz
  • genbank: *_genomic.gbff.gz
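
Each target URL is a directory URL from the list, followed by that directory's basename and one of the suffixes above. A minimal sketch of this expansion is shown here; `target_urls` is a hypothetical helper name, not part of the original recipe, and it only prints URLs without downloading anything.

```shell
# target_urls is a hypothetical helper: it prints the four target file URLs
# for one assembly directory URL taken from choose_ftp.txt.
target_urls() {
    local url="${1%/}"        # drop a trailing slash, if any
    local name="${url##*/}"   # basename, e.g. GCA_000015405.1_ASM1540v1
    local s
    for s in _genomic.fna.gz _protein.faa.gz _genomic.gff.gz _genomic.gbff.gz; do
        printf '%s/%s%s\n' "$url" "$name" "$s"
    done
}
```

For example, `target_urls ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/405/GCA_000015405.1_ASM1540v1` prints four lines, the first ending in `GCA_000015405.1_ASM1540v1/GCA_000015405.1_ASM1540v1_genomic.fna.gz`.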

Downloading in parallel with wget, using either rush or GNU parallel

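# rush: {} is the full URL and {%} is its basename (the assembly directory name).
# -j 6 runs six jobs at once; -c skips commands that already finished successfully.
# wget -c resumes partial downloads; -q keeps it quiet, so a missing file gives no message.
$ cat choose_ftp.txt | rush -j 6 -c --verbose \
    'mkdir -p {%}; cd {%}; \
    for s in _genomic.fna.gz _protein.faa.gz _genomic.gff.gz _genomic.gbff.gz; do \
    wget -c -q {}/{%}$s; done'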

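# parallel: {} is the full URL and {/} is its basename; --joblog writes a job record to the file "log".
# -j 1 downloads one assembly at a time; raise it to run several jobs concurrently.
$ cat choose_ftp.txt | parallel -j 1 --joblog log --verbose \
    'mkdir -p {/}; cd {/}; for s in _genomic.fna.gz _protein.faa.gz _genomic.gff.gz _genomic.gbff.gz; do wget -c -q {}/{/}$s; done'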
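
Because wget runs with -q and the protein file may be missing, it can help to check afterwards which files actually arrived. The following is a minimal sketch, assuming the commands above were run so that each assembly has its own subdirectory named after the URL basename. `check_assembly` is a hypothetical helper name, not part of the original recipe.

```shell
# check_assembly is a hypothetical helper: for one assembly directory it
# reports target files that are absent (or empty) or fail the gzip integrity test.
check_assembly() {
    local dir="${1%/}"        # drop a trailing slash, if any
    local name="${dir##*/}"   # directory basename, used as the file prefix
    local s f
    for s in _genomic.fna.gz _protein.faa.gz _genomic.gff.gz _genomic.gbff.gz; do
        f="$dir/$name$s"
        if [ ! -s "$f" ]; then
            echo "missing: $f"
        elif ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt: $f"
        fi
    done
}
```

Assuming the working directory contains only assembly subdirectories, it could be run as `for d in */; do check_assembly "$d"; done`. A "missing" line for *_protein.faa.gz does not necessarily mean the download failed, since that file may not exist on the server.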