@mfcovington
Last active August 29, 2015 14:15
Extract sequences from a CDS FASTA file and translate them
# Required files:
#   - from https://github.com/mfcovington/extract_genotype_specific_seq (commit 0867551092)
#       - parse_fasta-lite.pl
#   - from https://github.com/mfcovington/fasta-manipulation (v0.1.4)
#       - amino_acid_translation.pm
#       - translate_cds_fasta.pl
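# Relative paths below are resolved from FASTA_DIR once the loop starts,
# so absolute paths are safest for OUT_DIR, TRANSLATE_PATH, and PARSE_PATH.
FASTA_DIR=       # Directory w/ FASTA files
SPECIES_LIST=    # A file w/ species IDs to extract, one ID per line
OUT_DIR=         # Output directory
TRANSLATE_PATH=  # Path to translate_cds_fasta.pl (amino_acid_translation.pm should be in same directory)
PARSE_PATH=      # Path to parse_fasta-lite.pl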
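mkdir -p "$OUT_DIR"
cd "$FASTA_DIR" || exit 1                                       # Stop if the FASTA directory is missing
for FASTA in *.fa; do
    [ -e "$FASTA" ] || continue                                 # Skip the literal '*.fa' when nothing matches
    "$PARSE_PATH" "$SPECIES_LIST" "$FASTA"                      # Extract the sequences you want
    "$TRANSLATE_PATH" "$SPECIES_LIST.$FASTA" "$OUT_DIR/aa.$FASTA"  # Translate the sequences
    rm "$SPECIES_LIST.$FASTA"                                   # Remove the CDS version of the extracted sequences
done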
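
The loop above assumes that every path variable is filled in and that both Perl scripts can be executed directly. A minimal pre-flight sketch follows. The function name `check_inputs` and its messages are hypothetical and not part of the original gist, and the check for `amino_acid_translation.pm` relies only on the gist's own note that it should sit in the same directory as `translate_cds_fasta.pl`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original gist).
# Usage: check_inputs FASTA_DIR SPECIES_LIST TRANSLATE_PATH PARSE_PATH
# Prints one line per problem to stderr; returns non-zero if any problem was found.
check_inputs() {
    local fasta_dir=$1 species_list=$2 translate_path=$3 parse_path=$4
    local status=0

    [ -d "$fasta_dir" ]      || { echo "Missing FASTA directory: $fasta_dir" >&2; status=1; }
    [ -s "$species_list" ]   || { echo "Missing or empty species list: $species_list" >&2; status=1; }
    [ -x "$translate_path" ] || { echo "Not executable: $translate_path" >&2; status=1; }
    [ -x "$parse_path" ]     || { echo "Not executable: $parse_path" >&2; status=1; }

    # Assumption: the module lives beside translate_cds_fasta.pl,
    # as the comment on TRANSLATE_PATH suggests.
    [ -f "$(dirname "$translate_path")/amino_acid_translation.pm" ] \
        || { echo "amino_acid_translation.pm not found beside $translate_path" >&2; status=1; }

    return "$status"
}
```

Calling `check_inputs "$FASTA_DIR" "$SPECIES_LIST" "$TRANSLATE_PATH" "$PARSE_PATH" || exit 1` before `mkdir -p` would stop the run before any file is created or removed.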