@markziemann
Created February 2, 2024 03:00
Run a simple BLAST workflow. Prerequisites: blast+ (NCBI), emboss, unwrap_fasta.pl
#!/bin/bash
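# Download the E. coli K-12 MG1655 CDS sequences from Ensembl Genomes
URL="ftp://ftp.ensemblgenomes.org/pub/bacteria/release-42/fasta/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/cds/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.cds.all.fa.gz"
# Name of the uncompressed FASTA file, derived from the URL
FA=$(basename "$URL" .gz)
# Download and unzip, unless the FASTA file is already present
if [[ ! -r $FA ]] ; then
  wget -N "$URL"
  gunzip -kf "$FA.gz"
fi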
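# Extract 100 random sequences (requires unwrap_fasta.pl):
# keep only the sequence ID in each header, put each sequence on a single line,
# then shuffle the header/sequence pairs and keep 100 of them.
# sample_named.fa keeps the original IDs; sample.fa renumbers the records from 1.
cut -d ' ' -f1 "$FA" \
| perl unwrap_fasta.pl - - \
| paste - - | shuf | head -100 | tr '\t' '\n' | tee sample_named.fa \
| grep -v '>' | nl -n ln | sed 's/^/>/' | tr '\t' '\n' > sample.fa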
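# Build the BLAST nucleotide database if it has not been indexed yet
# (makeblastdb is the blast+ replacement for the legacy formatdb command)
if [[ ! -r $FA.ndb ]] ; then
  makeblastdb -in "$FA" -dbtype nucl -parse_seqids
fi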
# Introduce mismatches with EMBOSS msbar
# (-point 4 = base changes; block and codon mutations are switched off).
# msbar may print some error output but still works; check sample_mutated.fa
msbar -sequence sample.fa -count 100 -point 4 -block 0 -codon 0 -outseq sample_mutated.fa
# Run blastn against the database: tabular output (-outfmt 6), E-value cutoff 0.001
blastn -outfmt 6 -evalue 0.001 -db "$FA" -query sample_mutated.fa > blast_results.tsv
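
One way to sanity-check blast_results.tsv is to count how many mutated queries still find their source sequence as the best hit. The sketch below is not part of the gist: `check_top_hits` is a hypothetical helper, and it rests on a few assumptions. It assumes sample_named.fa holds one header line and one sequence line per record, in the same order as the numbered records in sample.fa; that the default `-outfmt 6` column layout is in use (query ID in column 1, subject ID in column 2, bit score in column 12); and that subject IDs are printed bare, without a prefix such as `lcl|`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): report how many
# queries have their source sequence as the best-scoring BLAST hit.
check_top_hits() {
  local named_fa=$1 blast_tsv=$2
  awk -F '\t' '
    # First file: the Nth header names the true source of query N
    NR == FNR { if (/^>/) truth[++n] = substr($1, 2); next }
    # Second file: remember the subject with the highest bit score per query
    !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; hit[$1] = $2 }
    END {
      for (q in hit) if (hit[q] == truth[q + 0]) ok++
      printf "%d of %d queries recovered their source as top hit\n", ok, n
    }
  ' "$named_fa" "$blast_tsv"
}
```

After the workflow has finished, it could be called as `check_top_hits sample_named.fa blast_results.tsv`. Queries with no hit below the E-value cutoff count as not recovered.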
@markziemann
sudo apt install ncbi-blast+
sudo apt install emboss
unwrap_fasta.pl was downloaded from https://chk.ipmb.sinica.edu.tw/wiki/doku.php/tutorials/perl/unwrap_fasta.pl
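
Before running the workflow it can help to confirm that the prerequisites are actually on the PATH. This is a minimal sketch, not part of the gist: `missing_tools` is a hypothetical helper that prints every command it cannot find and returns non-zero if any is missing.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the original gist).
# Prints the name of each required command that is not found on PATH
# and returns 1 if at least one is missing, 0 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@" ; do
    if ! command -v "$tool" > /dev/null 2>&1 ; then
      echo "$tool"
      status=1
    fi
  done
  return $status
}
```

For this workflow a call might look like `missing_tools wget gunzip perl blastn msbar || echo "install the tools listed above first"`. Note that unwrap_fasta.pl is a script file rather than a command on the PATH, so its presence would need a separate check such as `[[ -r unwrap_fasta.pl ]]`.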
