Created
November 27, 2017 14:13
A useful pure-bash construct for working with FASTA files. It can be tweaked to perform all sorts of actions on sequence headers (e.g. rearrangement, regex or plain-text matching).
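#!/bin/bash
#### Print all FASTA records whose header ends in a given string. ####
# Change *"$string" to *"$string"* to match headers containing the string,
# or to ">$string"* to match headers starting with it (headers keep their leading ">").
# Assumes each sequence sits on a single line (no wrapping).
file="$1"
string="$2"
while IFS= read -r line ; do
    # Quote the expansion so blank lines do not break the test.
    if [[ "${line:0:1}" == ">" ]] ; then
        header="$line"
    else
        seq="$line"
        if [[ "$header" == *"$string" ]] ; then
            # printf avoids echo -e interpreting backslashes in the data.
            printf '%s\n%s\n' "$header" "$seq"
        fi
    fi
done < "$file"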
#### Print all FASTA records, re-ordering header fields split on a delimiter. ####
# Change the ${header[N]} indices in the printf line to change the order in which fields are output.
# Fields are always joined with "|" on output; ${header[0]} keeps the leading ">".
file="$1"
delimiter="$2"
while IFS= read -r line ; do
    if [[ "${line:0:1}" == ">" ]] ; then
        # Split the header line into an array on the chosen delimiter.
        IFS="$delimiter" read -r -a header <<< "$line"
    else
        seq="$line"
        printf '%s|%s|%s\n%s\n' "${header[0]}" "${header[2]}" "${header[1]}" "$seq"
    fi
done < "$file"
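
Both loops above treat every non-header line as a complete sequence, so a FASTA file with wrapped (multi-line) sequences would have its header printed once per sequence line. As a rough sketch of one way around that, the function below buffers sequence lines until the next header and then applies the same ends-with test. The name `fasta_filter_suffix` and its argument order are assumptions made for illustration and are not part of the original gist.

```bash
#!/bin/bash
# Hypothetical helper (not in the original gist): print records whose header
# ends in a suffix, joining wrapped sequence lines into a single line.
fasta_filter_suffix() {
    local file="$1" suffix="$2"
    local line header="" seq=""
    # "|| [[ -n $line ]]" also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "$line" == ">"* ]]; then
            # A new header closes the previous record, so emit it if it matched.
            if [[ -n "$header" && "$header" == *"$suffix" ]]; then
                printf '%s\n%s\n' "$header" "$seq"
            fi
            header="$line"
            seq=""
        else
            # Append wrapped sequence lines to the current record.
            seq+="$line"
        fi
    done < "$file"
    # Emit the final record, which has no following header to trigger it.
    if [[ -n "$header" && "$header" == *"$suffix" ]]; then
        printf '%s\n%s\n' "$header" "$seq"
    fi
}
```

After sourcing the function, a call such as `fasta_filter_suffix input.fasta "_A"` (file name and suffix are placeholders) prints each matching header followed by its sequence on one line. The whole sequence is held in a shell variable, so this suits modest record sizes rather than whole chromosomes.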