

@lindenb
Last active June 23, 2023 19:41
How to linearize a FASTA sequence using awk.

Linearize a FASTA file (one "header<TAB>sequence" line per record)

awk -f linearizefasta.awk < input.fa

or

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < input.fa
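To see what the linearized output looks like, here is a small self-contained run of the one-liner on a made-up two-record FASTA. The wrapper name linearize and the record names are invented for illustration. Each header ends up on the same line as its whole (previously multi-line) sequence, separated by a tab.

```shell
#!/bin/sh
# linearize: hypothetical wrapper around the one-liner above.
# Reads FASTA on stdin, writes one "header<TAB>sequence" line per record.
linearize() {
  awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}'
}

# A made-up FASTA whose first sequence spans two lines.
printf '>seq1 first record\nACGTACGT\nTTGG\n>seq2\nGGCC\n' | linearize
# prints (tab shown as <TAB>):
# >seq1 first record<TAB>ACGTACGTTTGG
# >seq2<TAB>GGCC
```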

Format back to FASTA

tr "\t" "\n" < linearized.tsv
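Because every record now occupies exactly one line, line-oriented tools can act on whole records, and tr restores the FASTA layout afterwards. Below is a minimal sketch of that round trip. The helper names linearize and delinearize and the sample records are hypothetical, and it assumes no header contains a tab character (tr would split it). After tr, each sequence sits on a single unwrapped line.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the commands from this gist.
linearize() {
  awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}'
}
# Turn each "header<TAB>sequence" line back into two FASTA lines.
# Assumes headers contain no tab characters.
delinearize() {
  tr '\t' '\n'
}

# Keep only records whose header (field 1) mentions "keep", then restore FASTA.
printf '>r1 keep\nAAAA\nCC\n>r2 drop\nGGGG\n>r3 keep\nTT\n' \
  | linearize \
  | awk -F '\t' '$1 ~ /keep/' \
  | delinearize
# prints:
# >r1 keep
# AAAACC
# >r3 keep
# TT
```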

If all your FASTA headers are at most 60 characters long, you can also wrap the sequences at 60 columns (fold would split any longer header):

tr "\t" "\n" < linearized.tsv | fold -w 60
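If some headers may be longer than 60 characters, one alternative (a sketch, not part of the original gist) is to let awk split each "header<TAB>sequence" line itself and wrap only the sequence, so headers are never broken. The name delinearize_wrapped is hypothetical, and the default width of 60 simply mirrors the fold example above.

```shell
#!/bin/sh
# Hypothetical helper: linearized TSV on stdin -> FASTA with the
# sequence wrapped at $1 columns (default 60); headers are never split.
delinearize_wrapped() {
  awk -F '\t' -v w="${1:-60}" '{
    print $1                                  # header, printed whole
    for (i = 1; i <= length($2); i += w)      # sequence in chunks of w
      print substr($2, i, w)
  }'
}

# A header longer than 10 characters and a 25-base sequence, wrapped at 10.
printf '>a_rather_long_header\tACGTACGTACGTACGTACGTACGTA\n' | delinearize_wrapped 10
# prints:
# >a_rather_long_header
# ACGTACGTAC
# GTACGTACGT
# ACGTA
```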
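linearizefasta.awk

# Header line: end the previous record (if any), print the header and a tab.
/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;}
# Sequence line: append it to the current record without a newline.
{printf("%s",$0);}
# End of input: terminate the last record.
END {printf("\n");}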
@upendrak

Thanks. I find it quite useful
