@kloetzl
Created February 23, 2018 09:36
#!/usr/bin/zsh
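# Fetch the species table for Ensembl Bacteria release 36 and save it as species.txt.
# -O overwrites a stale copy instead of letting wget write a numbered duplicate.
wget -O species.txt "ftp://ftp.ensemblgenomes.org/pub/bacteria/release-36/species_EnsemblBacteria.txt"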
# |head -n 2
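# Read columns 2 and 5 (used below as species name and assembly name) of every
# E. coli row into an array, one tab-separated element per row.
ECOLI=("${(@f)$(grep 'Escherichia coli' species.txt | cut -f 2,5)}")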
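for line in $ECOLI; do
    NAME=$(echo "$line" | cut -f 1)
    # Map space, '#', '/' and ':' in the assembly name to underscores.
    ASM=$(echo "$line" | cut -f 2 | tr ' #/:' '____')
    # The FASTA file name is the species name with its leading 'e' capitalized.
    FILE_NAME="E${NAME#e}.$ASM.dna.toplevel.fa.gz"
    # Skip genomes that were already downloaded.
    if [[ ! -e "$FILE_NAME" ]]; then
        # Scrape the species page for the first link into the release-36 FASTA
        # tree; it is expected to be the directory that holds the file.
        DIR_URL=$(curl -s -L "http://bacteria.ensembl.org/$NAME/Info/Index/" |
            grep -oP 'ftp://ftp\.ensemblgenomes\.org/pub/bacteria/release-36/fasta/[^"]*' |
            head -n 1)
        # Without a link there is nothing to download; report it and move on.
        if [[ -z "$DIR_URL" ]]; then
            printf '%s: no FASTA link found, skipping\n' "$NAME" >&2
            continue
        fi
        FILE_URL="$DIR_URL$FILE_NAME"
        printf '%s\t%s\n' "$NAME" "$FILE_URL"
        wget -nv "$FILE_URL"
    fi
done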
## broken species:
# escherichia_coli_gca_001499595
# escherichia_coli_e1728
# escherichia_coli_k_12_gca_000981485
# escherichia_coli_o26_h11
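# The synthetic strain needs explicit handling: "E${NAME#e}" above would turn
# synthetic_escherichia_coli_c321_deltaa into "Esynthetic_...", not "Synthetic_...".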
wget -nv "ftp://ftp.ensemblgenomes.org/pub/bacteria/release-36/fasta/bacteria_50_collection/synthetic_escherichia_coli_c321_deltaa/dna/Synthetic_escherichia_coli_c321_deltaa.ASM47403v1.dna.toplevel.fa.gz"
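
The loop's `E${NAME#e}` expansion only capitalizes names that begin with `e`, which is why the synthetic strain is fetched by hand. Below is a minimal sketch of a more general file-name builder. It assumes the file name is always the species name with its first letter upper-cased, followed by the sanitized assembly name. `fasta_file_name` is a hypothetical helper, not part of the original script, and it covers the file name only, not the lookup of the directory URL.

```shell
# Hypothetical helper: build the FASTA file name from a species name ($1)
# and an assembly name ($2). The body runs in a subshell, so its variables
# do not leak into the caller.
fasta_file_name() (
    name=$1
    # Upper-case only the first character of the species name.
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    # Same sanitizing as in the loop: space, '#', '/' and ':' become '_'.
    asm=$(printf '%s' "$2" | tr ' #/:' '____')
    printf '%s%s.%s.dna.toplevel.fa.gz\n' "$first" "${name#?}" "$asm"
)

fasta_file_name synthetic_escherichia_coli_c321_deltaa ASM47403v1
# prints Synthetic_escherichia_coli_c321_deltaa.ASM47403v1.dna.toplevel.fa.gz
```

The printed name matches the file fetched explicitly above. Whether the species page for the synthetic strain exposes a usable directory link is not something this sketch checks.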