@Juke34
Last active April 6, 2020 10:15
Manipulating FASTA files
# Remove FASTA records whose header line contains STRING (a placeholder; literal, case-sensitive substring match)
awk '/^>/ {P=index($0,"STRING")==0} {if(P) print} ' in.fasta > out.fasta
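A reusable version of the line above, as a minimal sketch: the helper name fasta_exclude is hypothetical, and the search string is passed in with awk -v instead of being hard-coded. index() does a literal, case-sensitive substring match, so characters such as "|" or "." in the search string have no special meaning.

```shell
# Hypothetical helper: drop every record whose header contains the literal
# string given as $1. Remaining arguments are input files (stdin if none).
# Note: awk -v still interprets backslash escapes (e.g. "\t") in the value.
fasta_exclude() {
    pat=$1
    shift
    # On each header line, decide whether this record is kept;
    # the bare "keep" pattern then prints every line while it is true.
    awk -v pat="$pat" '/^>/ { keep = (index($0, pat) == 0) } keep' "$@"
}
```

For example, `fasta_exclude STRING in.fasta > out.fasta` behaves like the one-liner above.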
# Keep only FASTA records whose header line contains "Escherichia coli" (literal, case-sensitive)
awk '/^>/ {ok=index($0,"Escherichia coli");} {if(ok) print;}' in.fasta > out.fasta
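To keep several organisms in one pass, one option is to swap index() for awk's regular-expression match operator ~. This is a sketch, not the gist's method; the helper name fasta_keep_re is hypothetical. With a regex, characters such as ".", "(" or "|" become special, so escape them if you mean them literally.

```shell
# Hypothetical helper: keep records whose header matches the awk extended
# regular expression given as $1 (e.g. 'Escherichia coli|Salmonella').
# Remaining arguments are input files (stdin if none).
# Note: awk -v processes backslashes in the value, so double them in $1.
fasta_keep_re() {
    re=$1
    shift
    # Re-evaluate the match on each header; print lines while it holds.
    awk -v re="$re" '/^>/ { ok = ($0 ~ re) } ok' "$@"
}
```

For example, `fasta_keep_re 'Escherichia coli|Salmonella' in.fasta > out.fasta` keeps records from either genus.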
# Remove empty records, i.e. headers with no sequence (seq1 below):
#>seq1
#
#>seq2
#ATGCATGCATAGC
awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' genome.fa > genome.ok.fa
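Before writing genome.ok.fa, it can help to see which records the one-liner above would drop. This sketch reuses the same RS=">" / FS="\n" split and prints the header text ($1) of every record whose line after the header ($2) is empty; the helper name fasta_list_empty is hypothetical.

```shell
# Hypothetical dry run: list the headers (without ">") of records that the
# RS=">" one-liner would discard. FNR > 1 skips the empty chunk that awk
# sees before the first ">" of each input file.
fasta_list_empty() {
    awk -v RS=">" -v FS="\n" 'FNR > 1 && $2 == "" { print $1 }' "$@"
}
```

For example, `fasta_list_empty genome.fa` prints `seq1` for the sample records shown above.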
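# Remove blank lines (e.g. the blank line in the seq1 record below):
#>seq1
#ATGCATGCATAGC
#<blank or whitespace-only line>
#>seq2
#ATGCATGCATAGC
# '^[[:space:]]*$' also matches lines holding only spaces, tabs or a stray \r (CRLF files)
grep -v '^[[:space:]]*$' genome.fa > genome.ok.fa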
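The order of these two clean-ups matters. The RS=">" filter only checks the line directly after the header ($2), so a blank line between a header and its sequence makes it drop a record that does contain sequence. A minimal sketch that removes blank lines first and empty records second; the helper name fasta_clean is hypothetical.

```shell
# Hypothetical helper: blank-line removal first, then empty-record removal.
# Input files are concatenated (stdin if none) so grep adds no filename prefixes.
fasta_clean() {
    cat -- "$@" |
        grep -v '^[[:space:]]*$' |
        awk -v RS=">" -v FS="\n" -v ORS="" '{ if ($2) print ">" $0 }'
}
```

For example, `fasta_clean genome.fa > genome.ok.fa` keeps a record written as ">seq2", a blank line, then its sequence, which the empty-record one-liner alone would discard.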