Last active
August 29, 2015 14:15
Extract sequences from a CDS FASTA file and translate them
# Required files:
#   - from https://github.com/mfcovington/extract_genotype_specific_seq (commit 0867551092)
#       - parse_fasta-lite.pl
#   - from https://github.com/mfcovington/fasta-manipulation (v0.1.4)
#       - amino_acid_translation.pm
#       - translate_cds_fasta.pl

# Use absolute paths below: the script changes into FASTA_DIR before running anything.
FASTA_DIR=       # Directory w/ FASTA files
SPECIES_LIST=    # A file w/ species IDs to extract, one ID per line
OUT_DIR=         # Output directory
TRANSLATE_PATH=  # Path to translate_cds_fasta.pl (amino_acid_translation.pm should be in the same directory)
PARSE_PATH=      # Path to parse_fasta-lite.pl

mkdir -p "$OUT_DIR"
cd "$FASTA_DIR" || exit 1

for FASTA in *.fa; do
    # Extract the sequences you want
    "$PARSE_PATH" "$SPECIES_LIST" "$FASTA"
    # Translate the sequences
    "$TRANSLATE_PATH" "$SPECIES_LIST.$FASTA" "$OUT_DIR/aa.$FASTA"
    # Remove the CDS version of the extracted sequences
    rm "$SPECIES_LIST.$FASTA"
done
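
Before running the loop on real data, it can help to see which file names it will touch. The sketch below is a dry run that only prints the intermediate and output names, following the naming used in the script (`$SPECIES_LIST.$FASTA` for the extracted CDS file and `$OUT_DIR/aa.$FASTA` for the translation). The helper `plan_names` and the sample names `species.txt`, `/tmp/aa_out`, `genes1.fa`, and `genes2.fa` are made up for illustration and are not part of the original gist. Neither Perl script is called.

```shell
#!/usr/bin/env bash
# Dry run: print the file names the loop would use, without calling either Perl script.
# plan_names is a hypothetical helper, not part of the original gist.
plan_names() {
    local species_list=$1 out_dir=$2
    shift 2
    local fasta
    for fasta in "$@"; do
        # Intermediate CDS file, named as in the script: <species list>.<FASTA>
        printf 'extract:   %s\n' "$species_list.$fasta"
        # Translated output, named as in the script: <out dir>/aa.<FASTA>
        printf 'translate: %s\n' "$out_dir/aa.$fasta"
    done
}

# Example with made-up names
plan_names species.txt /tmp/aa_out genes1.fa genes2.fa
```

To preview a real run, the same function could be called from inside the FASTA directory as `plan_names "$SPECIES_LIST" "$OUT_DIR" *.fa`.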