@jrjhealey
Created November 27, 2017 14:13
A useful pure-bash construct for working with FASTA files. It can be tweaked to perform all sorts of actions on sequence headers (e.g. rearrangement, regex or plain-text matching).
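#!/bin/bash
#### Print out all fastas ending in a certain string. ####
# Change *"$string" to *"$string"* to find headers containing the string,
# or to ">$string"* to find headers starting with it (the header keeps its leading ">").
# Assumes each sequence sits on a single line after its header.
file="$1"
string="$2"
# IFS= and -r keep whitespace and backslashes in each line intact.
while IFS= read -r line ; do
    # Quoting plus [[ ]] keeps the test valid on empty lines.
    if [[ "${line:0:1}" == ">" ]] ; then
        header="$line"
    else
        seq="$line"
        if [[ "$header" == *"$string" ]] ; then
            # printf avoids echo -e expanding escapes inside the data.
            printf '%s\n%s\n' "$header" "$seq"
        fi
    fi
done < "$file"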
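#### Print out all fastas but custom re-order their headers based on delimiter. ####
# Alter the order of "${header[0]}", "${header[1]}", "${header[2]}", ... in the
# printf call to change the order in which the fields are output.
# Note that "${header[0]}" still carries the leading ">".
file="$1"
delimiter="$2"
while IFS= read -r line ; do
    if [[ "${line:0:1}" == ">" ]] ; then
        # Split the header into an array on the delimiter.
        IFS="$delimiter" read -r -a header <<< "$line"
    else
        seq="$line"
        # Fields are re-joined with "|" regardless of the input delimiter.
        printf '%s|%s|%s\n%s\n' "${header[0]}" "${header[2]}" "${header[1]}" "$seq"
    fi
done < "$file"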
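Both loops above treat every non-header line as a complete sequence, so they are only reliable on FASTA files with one sequence line per record. For wrapped (multi-line) FASTA, one option is to join the sequence lines first. The function below is a minimal sketch of that idea; `linearize_fasta` is a hypothetical helper name, not part of the original gist, and it assumes records start with a `>` header line.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): join wrapped
# sequence lines so each record becomes one header line plus one
# sequence line. Reads FASTA on stdin, writes to stdout.
linearize_fasta() {
    local line header="" seq=""
    # The "|| [[ -n ... ]]" clause also handles a final line with no newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "$line" == ">"* ]]; then
            # Flush the previous record before starting a new one.
            if [[ -n "$header" ]]; then
                printf '%s\n%s\n' "$header" "$seq"
            fi
            header="$line"
            seq=""
        else
            # Append this chunk of sequence to the current record.
            seq+="$line"
        fi
    done
    # Flush the final record.
    if [[ -n "$header" ]]; then
        printf '%s\n%s\n' "$header" "$seq"
    fi
}
```

A possible usage, assuming a wrapped input file named `wrapped.fasta`, would be `linearize_fasta < wrapped.fasta > single_line.fasta`, after which either loop above can be run on `single_line.fasta`.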