Last active
August 29, 2015 14:17
Code used for post: http://btovar.com/2015/03/trim-fasta-header/
>gi|1620934|emb|Z80230.1| Calliphora vicina trp gene | |
AAAAGTTTAAATTGGATAAATTGCAAAAGGACAATTAAGGATACGGAATATATGCGTAGTTTGTGTAAAA | |
TGCGCTTATAGAAACACAGAAAAAAAAAATAAAAACGGATAAATCTTTAGAAACAATAAACACTAGCTTA | |
AAAATTAAAAGCAAAACAAACAATAAAACATGGCAACTGATCCGGAAAAAGGGAAAAATGAGGAAGAAAA | |
CTATAATATACAGTTTGCAGATGAATACGTGTTGACGGAGACAGAGAAAACCTTTATATTGGCTTGTGAG | |
CGCGGTGACATAGCAAGTGTCAAGGTAATAATTGAGGAAAATAAAGGTGCACCGGAAAAGTTTAATATTA | |
ATTGTGTTGATCCCATGAATCGTTCGGCCTTAATATCAGCCATTGAAAATGAAAATTTTGATTTAATGAT | |
TGTACTGTTGGAGGAAGGCATAGATGTGGGCGATGCATTGTTGCATGCTATTTCTGAAGAATATGTGGAG | |
GCTGTGGAGGAACTGTTGCAATGGGAAGAAACGCATCATAAGGAGGGTACACCATATAGTTGGGAGGCAG | |
TTGATCGTTCGAAATCGACATTTACGCCTGATATAACGCCTCTAATATTGGCAGCACATCGTAACAATTA | |
CGAAATTTTAAAAATTCTATTGGATCGTGGTGCCACATTACCAATGCCTCATGATGTCAAATGCGGTTGC | |
GATGAGTGTGTAACATCACAAGAAACCGACTCCTTGCGTCACTCTCAGTCGCGTATTAATGCTTTTCGTG | |
CTCTTTCGGCTAGCTCACTGATATCTCTTAGTTCACGTGATCCCGTCCTAACAGCTTTTGAACTCGCATG | |
GGAATTGAAACGTTTGCAGGCAATGGAATCAGAATTTCGTGCTGAATATGGAGAAATGCGTCATGGTGTC | |
CAAGAATTTGTTACATCTTTACTAGATCATGCCAGAACCTCAACCGAACTGGAAGTAATGTTAAACTTTA | |
ATCATGAAGCTTCCAATGACATTTGGACTCCTGGACAAAGGCAGACATTGGAGAGACTGAAATTGGCCAT | |
TAAGTATAAGCAAAAGACGTTTGTGGCTCATCCCAATGTACAACAACTATTGGCTGCCATTTGGTATGAA | |
GGTTTGCCCGGATTTCGACGTAAAAAAAGTTCTCAACAAATTTTAGAAGTTATAAAATTGGGTTGCATGT | |
TTCCCATTAACAGTATGAATTATCTTATGGCTCCCGAATCGGATGCTGGCAAATTTATGAGAAAACCTTT | |
CGTTAAATTCATTACACATTCATGCTCCTACATGTTCTTTTTGATGTTACTTGGTGCTGCTTCATTGCGT | |
GTTGTTCAAATAACATTTGAGTTGTTAGCGTTCCCATGGATGATTGAAATGTTGGACGATTGGCGTAAAC | |
ATGAAAGAGGCTCTTTGCCGGGACCTATTGAATTAGGTATCATTACGTACATTTCAAGTTTAGTCTTAGG | |
CGAATTGAAGTCATTATATTCGGATGGTTTGTTTGATTACATTATGGATCTATGGAATATTGTGGATTTC | |
ATTTCGAATATGTTTTATGTCACCTGGATACTGTGTAGGGCCACCGCCTGGATTATAGTGCATCGTGATC | |
TCTGGTTTCGAGATATAAATCCTTATTTTCCACGAGAACATTGGCATCCGTTTGATCCCATGCTGCTGTC | |
AGAAGGTGCATTTGCCGCAGGCATGGTATTCTCTTATTTGAAACTAGTTCACATTTTCTCTATAAATCCA | |
CACCTGGGACCATTACAAGTCTCATTGGGCCGCATGATTATCGATATTATAAAGTTCTTCTTTATTTATA | |
CTTTGGTGTTGTTTGCTTTCGGCTGTGGTCTCAATCAATTACTATGGTATTATGCTGAATTGGAGAAAAA | |
TAAATGCTATCACTTGCATCCAGATGTGGCGGATTTTGATGATCAAGAGAAAGCTTGTACTATATGGAGA | |
CGATTTTCAAATCTTTTTGAAACTTCACAATCTCTATTTTGGGCCTCATTTGGTTTAGTGGACTTGGTGT | |
CATTCGATTTGGCCGGCATTAAGAGTTTCACCCGTTTCTGGGCCTTGTTGATGTTCGGCTCATATTCGGT | |
AATCAATATTATCGTATTGCTCAACATGCTTATTGCCATGATGTCCAATTCATATCAAATTATTTCTGAA | |
CGTGCCGATGTTGAATGGAAATTTGCCCGCTCTCAATTGTGGATGAGCTATTTTGAGGATGGTGGCACTG | |
TACCTCCACCATTTAATATGTTCCCCTCAGTAAAGTTGATGCGTAAGGTATTTGGCAAGCAGAGACCAAA | |
ACGTTCCAGGAGTTTTATGTTGAAATCAAGGGTTAAGGCACAATCACTGCATGAGCGTGTAATGAAGCTA | |
TTGATACGTCGTTATATTACGGCCGAACAGCGTCACAGAGATGATTTCGGCATAACCGAGGATGATATCA | |
TTGAAGTGCGTCAGGATATTAGCTCATTGCGTTTCGAACTGTTGGACATTTTTACAAATAATAAATTTGT | |
TGTACCGGATATTGAAAAGAAATCAGCAAGTGCAGCGGCTGGTAAGAAGGGTAAGACAATAGAACGTCGA | |
ATTTTGAAAGATTTCCAAATAGGTTTCGTTGAAACTCTACAAACTGAGTTAGTAAATAGTGTCGAAGAGG | |
GCAAGGATATATTCTCGTCATTGGCCAATGCTATACGCAAAAAAAGATCTCAGAAAGGCGAAAAAGATTG | |
GAATGCTATTGCTCGCAAAAATACTATGAGCTCAAATCCCATTGGTTCCAAACGTTCATCAATTCAGCGG | |
CATTCCCAACGCAGTTTGAGACGCCGCATTATAGATGAAGCAAACGAAGGTCTTAGAATGAATCAGAATC | |
AACTTATAGAATACAATCCCTCTCTTGGAGATGTTTCGAGAGCTACTCGTGTTGCATATGTTAAATTCAT | |
GAAAAAGAAATTAGTGGCTGAAGAAGGTATATCACCTGAAGAACTCAATGAATCTAATAATTCTACGGAC | |
GCTGCAGCAAAACTCGACGCTTTTGCCAGGTCCACAATGAAGAGAGTAGAGAGCAAAAAAGATGATAGTG | |
CAAGTGCTGATGACTCGAAAGCAGCTACTGACAAACCGAAAGCTCCAGTTGGAGATAAAGCAAAACCAGC | |
GGCGCCCGCTAAGCCTGGAGATGCCAAAACAGCTGATGCTAAGGCACCAGCCCCTGCTCCACCTAACAAA | |
CCAGCAGATGCTGCTGCTAAGCCAGCTGTAGCTAAGCCCGGAGCAGAAACCAAACCTGAAGCAGCAGCTA | |
AAAAGGGTGAAGCTGCTAAAACTGCGGATGCCAAACCCGAAGCACCTACTGCAGCAGCTGCAACAAAATC | |
GGCCGCGCCCGCTGCTCCCGCCAAACCCGACGCCTCAACGAAGCCCCCTGCTGATGCTGCAAAACCTGGT | |
ACAGATGCTGCTACAAAAGCAGCTGATGGTGCTGATAAGAAAGCTGACGACAAGAAAGCAGATGATAAGA | |
AACCCGATGACAAGAAACCCGATGACAAGAAACCTGATGACAAGAAACCCGATGACAAGAAGCCCGACGA | |
TAAAAAACCAGGAGCCGATGCCAAGAAACCTGATGACAAAAAAGATCCTCCCGCCAAGCCCGCAATAAAA | |
GTTGGCCAAAGTAGTGCTGCCGCCGGTGGTGATCGTGGTAAATCGGTTGTTACTGGTCGCATGATATCGG | |
GTTGGCTTTAAATTCATTTTATACCAATCATTTATTTGAGACTTGAGACAAATAAAACTGATACTTTTAA | |
AATTAAAATTGAAAAAAGAAAACTTGTAGTTAATTGGAAATCGATTATTATTTTTACGAATAAAAACTAT | |
AAGTAAATACCAACAGT | |
>gi|636806189|gb|KJ572527.1| Limulus polyphemus transient receptor potential ion channel subfamily C trp-like protein (trp) mRNA, complete cds | |
TAGAGAGAAAAGTGACTTAGTCTACGTGATACTGAGCTGGAGAGAAAGGTGACGTATCTACGTGATACTG | |
AGCTAGAGAGAAAGGTGACGTATCTACGTGATTATAGAGGTAGAGAGAAAGGTGACGTATCTACGTGATT | |
ACTGAGTAGAGAGAAAGGTGACGTATCTACGTGAATACTGAACTAGAGAGAAAAAGTGACGCATCTACGT | |
GAATACAGAACTAGAGAGAAAAAGTGACGTATCTACGTGAATACTGAACTAGAGAGAAAAAGTGACGTAT | |
CTACGTGAATACTGAGCTAGAGAGAAAAAGTGACGTATCTACGGTGAATACTAGTAGAGAGAAAAGGTGA | |
CGTATCTACGTGAATTACTGAGCTAGAGAGAAAAGGTGAAGTATCTACGTGAATACTGAGCTAGAGGGAA | |
AAGGTGACGTATCTACGTGAATTACTGAGCTAGAGAGAAAAGGTGACGTATCTACGTGAATACTGAGCTA | |
GAGAGAAAAAGTGACGAATCTACGTGAATATTGAGCTGGCTACAAGAAGTGACATAGCTGCTACATTCAG | |
TATAAAGTTGCCGACAATGGGGAAGAACAACCTTCTCGGGATCCAGTCAAGCGACTGTAAAGACACATTT | |
TCTTCATTAGCAGAACAACATGATTGTCAACTCAGTGTTTACGAAAAGAAATATATGTTATGTGCAGAGC | |
GGGGAGACATTGTTTCCGTTCGAAAACTTTTGGAGACTCACAAATCTAATCCACGTTTTAATATCAATGT | |
TTGTGATCCGCTCGGACGAACAGCCTTGGTGGTGGCTATCGAGAACGAGGATTCTGGACTAATCGAACTT | |
TTACTGAAACATGGAATACAGGCCGGAGATGGACTCTTGCATGCTATTGAAGAGGAATATGTGGAAGCTG | |
TAGAGATGTTGCTCCAGCATGAAGAGAAGGTTCATGTACCTGGACAACTCTACAGTTGGGAGAATATGCC | |
TCAAGAAATGGCCAAGTACACGCCTGACGTGACCCCTTTAATTCTAGCTGCTCATAAAGACAATTACGAG | |
ATACTGAAGCTGCTACTCAACCGTGGCGCCACCTTACCGATGCCACATGATGTACGCTGCGGATGCGATG | |
ACTGCGTAGTGGCGACCTCTAGCGACTGTTTGCGTCACTCTCGTTCTAGAATTAACGCTTATCGTGCTCT | |
CACGTCTCCATCGCTGATTGCCTTGTCTAGTAAAGATCCGATTCTTACAGCTTTTGAACTGAGCTGGGAA | |
CTGCGACGTCTCAGTTCCATTGAAAATGAGTTTAAGGCCGACTACACAGAACTACGTCAAAGATGCCAAG | |
AGTTCGCTACTTCGCTTCTCGATCACGCAAGAACTTCTTCAGAACTGGAAATCATATTAAATTATGACCC | |
GTCTGGTCCGGCTTATCAGACAGGAGAAAGAATGAAATTAGAACGTCTCAAGCTCGCTGTAAATTACAAA | |
CAGAAAACGTTTGTTGCTCATTCCAAAGTGCAACAGCTGCTTGCCTCTGTGTGGTATGAAGGGATGCCAG | |
GATTTAGGAGGAAAGGCCCGATTGGTCAGATACTGGAGATCGGAAAGCTTGGTGCTATGTTTCCAATCTA | |
CTCAATGATATACATCTTAGCACCACACAGCAAAATGGGTCTGAAGATGAAGAAGCCCTTCGCGAAATTC | |
GTCTGTCAGTCAGCATCTTACTGTTTCTTCTTGTTTCTTTTAATCTTGGCCTCCCAACGAGTGGAACAAA | |
TGGTTATCGAATGGTTTGGAACACCGTTTCTCCAGGAGTGGCTTCGAATAGCACAGGAACATGAACGTGG | |
AAAACTTCCTGGATTTGTAGAAACAGCCATTATCATGTATGTGTTTGGGTTTGTATGGTGCGAAGTTAAA | |
CAGCTATGGGACTCCGGGTTGCTTAAGTACATCAGTAACATGTGGAACATCGTGGACTTCATAACGAACA | |
TGCTCTATCTAACATGGATTAGCTTGCGAATGATAGCCTGGGTACAGGTTCGTCGAGAGTTAGCAATGGG | |
CATGGACGCTTATCAACCACGAGAAAATTGGCACCCTTATGATCCAATGCTCATAGCTGAAGGAGTCTTT | |
GGAGCAGCCAATATCTTCAGTTTTCTGAAGATGGTCCACATTTTTTCTGTTAATCATCATCTTGGACCAA | |
TACAAATCTCTCTCGGAAGAATGGTCTATGACATCATGAAATTCTTCTTCATTTACACTCTCGTTCTCTT | |
TGCTTTCGGATGTGGTATGAACCAAATGTTGTGGTACTACGCCGAGATGGATAAGAACATCTGCTACAGT | |
GGCCCAGGAGGGACTCCTAACCCGGAAGAGGATAAGTCGTGCGACATCTGGCGGCGGTTTGCTAACCTTT | |
TCGAAACGTCGCAGAGTTTATTCTGGGCAAGTTTTGGGTTAGTCGAGCTTGAAAGTTTCGAGCTGACAGG | |
AGTGAAAAGTTACACACGGTTCTGGAGTTTGTTGATGTTTGGTTGTTACTCGGTTATCAACATCGTCGTT | |
CTTCTTAATCTTCTCATCGCCATGATGAATCATTCGTACGAGATCATTTCGGAACGTTCGGATATCGAGT | |
GGAAGTTTGCTCGAAGCAAACTATGGATCAGTTACTTTGAAGACGGAGGAACGGTTCCACCCCCTTTTAA | |
CATTATCCCAACACCTAAAGCTTTCAAACACCTATTTGGCTGCAGATCTCATGTTGGAACTTTGTCAGTC | |
AAGATCAGAAAAGAGAAAGATAGAGACGCTAAGTATCAGAAGGTCATGAGGTGTTTAGTTCGGCGGTATG | |
TAACAAGGCAACAGGAAACGTCCGATGATACAGGAGTCACAGAGGATGATGTGAACGAAATTAAACAAAG | |
TATTCATGGCTTCCGTTCGGAACTCTTTGAAATCCTGAAAGATAACGGAATGAAGTTCTCTGCTGGGAAT | |
ACTGATAGAGAAGTCGGTAAGAAAGAGCGCCAGAAAGAACGTCGTCTGCTCAAGGGATTCAACATTGGTA | |
CAGTAGAAAACTTGATTAGCGCAGTTTTTCAAGAAAAAAAGCAGACGACAGGTTTCAAGATCACCCGTCG | |
TCTAACTGGAACCAAAAAACGGAATGCCAGAAAAAATTGGAACGCTATTGTGTCAGCTGCGAAGAACAAA | |
CAAAACCAAATTGGAGAAACCGATTCGAGTATCAGTGGAGATCTCGAAGATCTGAAAGCCCAGGTCCTAA | |
ACGAAACAGAAGTTGATGTCCCACTGACAGGGATAAAGCGAGACAATACATCATCAGTGACGTCACTTTC | |
CCGTTATCCTTCGACCAATGAAACTAACTTAGCTCCAATTTCTGAAGAGTTCAACTCAACAGTTCAGGTT | |
GAGATTGTCAATGAAATAACTCCAGAAATTCCAACCCATACACAAGGGAAAACGTTTTCAGAAGCTTCCG | |
CTTCTTTCAACTCAACTGATGACGTCACCCACAAAGATAATTCTTCCCTCGCACCGAGAGTTGGATCAGA | |
AGATGAAGAAGGAATCATACAAGAAAGTAGTGAACTGTCTTCACGTTCGAATCGTTCTTCGGAAACGAAA | |
TACCATCAAGAACCACTGAAAAGTCGCGTTGGAACAAACAGAGTGACTGCTTCTTCTGTTTCCCTACAAC | |
CGAAAAATGGTTGGCTATGAGGCAAAC | |
>gi|600513|gb|M21306.1|DROTRPC Drosophila melanogaster photoreceptor membrane-associated protein (trp), complete cds | |
AGCCACATTGGGCACTAATGTAATTAGTGGAATATAGCGACCCGTGGCTGCCACTTTTCAGCAGTGCAAC | |
GCGGCTAATTGGAGGCGGAACATCGCCACGATGGAACACTAAAGGATACAGTGCGCGAAAGGATTAGCCC | |
AAGGCTCCCCGAGGAGCAGGGATAAATGCCCATAGTGTTTGTGAGATGTGAAGTGACCAAGTGATCCGAT | |
CCTGATTATCGCGTTCGCATAGACCAGTAAATCAGTGCAGATATGGGCAGCAATACGGAATCCGATGCCG | |
AGAAGGCGTTGGGGTCTCGCCTGGATTACGACCTGATGATGGCCGAGGAGTACATCCTCAGTGATGTGGA | |
GAAGAATTTCATATTGTCCTGCGAGCGGGGTGACTTGCCAGGTGTCAAGAAGTGAGGCTTCCTCCCTATG | |
GCTTCGAGTAACTCGGCTAATATGATTTCGTCCTTGAAGGATCCTCGAGGAGTACCAGGGCACGGACAAG | |
TTCAACATTAACTGCACGGATCCCATGAACCGCTCCGCCCTCATTTCGGCCATCGAGAACGAGAACTTCG | |
ACCTGATGGTGATCCTGCTGGAGCATAACATCGAGGTGGGCGACGCCCTGTTGCACGCCATCTCGGAGGA | |
GTATGTGGAGGCGGTGGAGGAGCTGCTGCAGTGGGAGGAGACCAACCACAAGGAGGGCCAGCCATACGTA | |
GGTCACATCCTGCCGCAGGATATGCCACATCTCTTACTCTTTTCCCCTCCTTTGGCAGAGCTGGGAGGCG | |
GTGGACCGCTCCAAGTCCACCTTCACCGTGGACATCACGCCCCTTATCCTGGCCGCCCACCGAAATAACT | |
ACGAGATACTCAAAATCCTCCTGGATCGCGGGGCCACGCTGCCCATGCCGCACGACGTCAAGTGAGTGGG | |
GTGGTATAGGGATGCAGTTGCTCGGTGATATGGCTATGTAGTGAGCTTCCTCGTCCTTCGAGCGGCACGT | |
TCCATTGTTTGTCCAATTGGCAACACACTTATGTAGCACATCGCTGGGTGCCTCTAATTTTAGATAATTG | |
CGGGCAAATTAGGGTGAAATATTAATACAGGGTGTGATGTTGTGTGTGGGAAGAGAAGGTCCTTGATGTC | |
CTTAAATTGGGTCAATGGACTCAAGCCTTTAACAGCGTCATAAATGTCCTTGCTAGGAGAGAAGGTCCTT | |
TGTACTCCTTTCAGATTGATTTAAATGTATTTATTATTATAGAACTCTTGGGTTCCGTACACTTTGTATA | |
TCATCGCTTAATCTCAATTGTTCCTAGGTGCGGCTGCGATGAGTGTGTGACCTCCCAGACGACGGACTCC | |
CTGCGCCACTCGCAGTCGAGGATCAACGCATACCGCGCCCTGTCCGCCAGCTCGCTGATAGCGCTCAGCT | |
CCCGGGACCCTGTACTGACCGCCTTCCAATTGTCCTGGGAACTCAAGCGCCTGCAGGCGATGGAATCGGA | |
GTTTCGTGCCGAATACACGGTACGGTAGTAGTGTGCAATGTAAGCGGTGGCCAGTGTGCCACTAGAGTCC | |
ATGCGAATGCGAATGCGAGCCTTTGTTTGTGTTTGCCCAACAGGAGATGCGTCAGATGGTGCAGGACTTC | |
GGGACCTCGCTCCTGGACCACGCACGCACATCCATGGAACTCGAGGTGATGCTCAACTTCAACCACGAGC | |
CGTCCCACGACATCTGGTGCCTTGCCAGCAGCGAAACCCTGGAACGACTGAAGCTGGCCATTCGCTATAA | |
GCAAAAGACGGTGGGTTCGGACTAATCCGTGCACATTAAGAAACTGTGAAGGTTACAGGCTTGTGGCTAG | |
ATACCATGCTTAACCCAGACCACAGACTAACTGGGATTTAATAGCTTTATATATAGAAACGGATCTTAGT | |
ACCCTGCTTGAATTTATAAATATTACTCCCCCTTCTAGTTTGTGGCACATCCAAATGTCCAACAATTGTT | |
GGCCGCCATTTGGTACGACGGACTGCCGGGCTTTCCGCAAGAGGCCTCCCAGCAGCTGATGGATGTCGTG | |
AAGCTGGGATGCAGCTTCCCCATCTACAGCTTGAAGTACATCCTGGCCCCGGATTCCGAGGGTGCCAAGT | |
TCATGCGCAATCCTTTGTCAAGTTCATCACGCACTCCTTGCTCCTACATGTTCTTCCTGAGTGGGTAAAT | |
GGGCTGCTTAATACTTCATCCATCTAAATATGCGCCCATTCACAGTGCTCCTGGGTGCTGCCTCCCTGAG | |
GGTGGTGCAAATCACCTTTGAACTCCTCGCATTTCCCTGGATGCTGACCATGCTGGAGGATTGGCGCAAA | |
CACGAGAGAGGTTCACTACCGGGTCCCATTGAACTGGCAATCATTACCTACATAATGGCTCTAATATTTG | |
AGGAACTGAAATCTTTATATTCGGACGGCTTGTTTGAGTACATCATGGATCTTTGGAACATAGTGGACTA | |
CATATCGAACATGTTCTATGTGACGTGGATTCTTTGTAGGGCCACCGCTTGGGTAATCGTCCATGTGAGT | |
CCATATGAATCTAATGAATAAATTGAAGTTACTGAAACGAATTGATTTCAGCGCGATCTCTGGTTCCGGG | |
GCATAGATCCTTACTTCCCGAGGGAACACTGGCATCCGTTTGATCCAATGCTTCTATCAGAGGGCGCCTT | |
TGCTGCCGGAATGGTCTTCTCCTATCTAAAGCTCGTCCACATCTTCTCAATTAATCCCCACCTGGGACCC | |
TTGCAAGTTTCACTGGGTCGCATGATAATCGACATCATCAAGTTCTTCTTCATCTACACACTGGTGTTGT | |
TTGCCTTCGGATGTGGTCTCAACCAGTTGCTATGGTACTACGCTGAGCTGGAGAAGAACAAGTGCTATCA | |
CCTGCATCCCGATGTGGCTGACTTTGATGACCAGGAAAAGGCTTGTACCATCTGGCGAAGATTTTCCAAG | |
TATATTACATTCATTTCTATGTACTTGATCACATTGGCATCCCATTTCAATTTACTTTAGCTTATTCGAA | |
ACATCACAATCGCTCTTCTGGGCCTCTTTTGGCCTGGTGGACCTGGTCTCCTTCGATCTGGCGGGAATCA | |
AGAGCTTCACCCGCTTCTGGGCACTGCTGATGTTCGGCTCCTATTCGGTTATCAACATCATTGTGCTTCT | |
CAACATGCTGATTGCCATGATGTCCAACTCCTACCAAATCATCTCGGAGCGAGCCGACACCGAGTGGAAG | |
TTCGCCCGATCCCAGCTGTGGATGAGCTACTTCGAGGATGGCGGCACCATTCCACCGCCCTTCAACCTCT | |
GTCCCAACATGAAGATGTTGAGGAAGACCCTGGGCCGAAAGCGACCGTCACGAACCAAGAGCTTCATGGT | |
AAGTTACACTCCAGAGAAATTCGAATATTTCAGTTTAAGTATTATACAAATCCACACATATATATTAAAT | |
TCAGTGTACTCTACTTTAAGTGATATCTTCTTATCATATTACCTTTTAAATGTAAGTAACATTTCGAAAG | |
TTCGATTTTTTCAAGTTGAATCACCTCACAGTGAGAGGCGGCTCACAATCAATACAAATCTTAAGCGAAA | |
TCATTTCAATCAAGCCCATGCAGAGCTATCTGTGCGTCAAATTGACGTCTTAATACATCCTTTCTGTGTC | |
CTTCCACCCTCTAATCGCTCCCTGTTTCCAACTGCCAGCGAAAGTCCATGGAACGGGCACAGACGCTGCA | |
TGACAAAGTGATGAAGCTGCTGGTCAGGAGGTACATTACGGCGGAGCAGCGGCGGCGGGACGATTACGGC | |
ATTACCGAGGATGATATCATTGAGGTGCGCCAGGACATCAGCTCCTTGCGGTTCGAGTTGCTGGAGATTT | |
TCACCAACAATAACTGGGATGTACCCGACATTGAGAAGAAGTCGCAGGGTAAGGGAAAGCCTACCCACAT | |
CCAAATTGGCAACTTACACAATCTTAAGATAGAAGTAGCAGTATATATGAATAATTATTTACAATTAGCT | |
TTTGAATACCTAATAAAGTTCAACCCCATCTTGAATATTAATTTGAAATATGTTGTGCTAGAAGACTACA | |
AGCTAGATCGGATTTTAATCGGATTCTATTCACTAAACTTACTATTGAAGGAGTTGCTCGAACCACCAAG | |
GGCAAGGTGATGGAACGTCGCATCCTGAAGGACTTCCAGATTGGCTTCGTCGAGAATCTGAAGCAGGAAA | |
TGAGCGAATCTGAAAGCGGACGAGATATATTCTCATCGCTGGCCAAGGTCATCGGCAGAAAGAAGACCCA | |
GAAGGGGTAGGTCCACTTAAATCTGTAAGACCCTTTGAGTAAAGGTTACTATTCTCAATAGAGACAAGGA | |
TTGGAACGCCATTGCGAGGAAGAATACTTTCGCCTCCGATCCCATTGGCTCCAAGCGCTCCTCCATGCAA | |
CGTCATAGCCAGCGAAGCTTGAGGAGGAAGATCATCGAGCAGGCGAATGAGGGTCTTCAGATGAACCAGA | |
CCCAGTTGATTGGTAAATTAATAAAGATGTGCCCAATGTATGTATATTTCCATACCCTTGATCTCCTCCA | |
GAATTCAATCCCAACTTGGGTGATGTGACGCGTGCCACAAGAGTGGCTTATGTCAAGTTCATGCGGAAGA | |
AGATGGCTGCCGACGAGGTTTCCTTGGCCGATGACGAGGGTGCTCCAAATGGCGAAGGCGAAAAGAAGCC | |
ACTGGATGCCTCTGGGGTAAGTTCCAATCAATATATATATATGTATTCCCTGGCTAATATATAATCTCAC | |
TTGCAGTCTAAAAAGTCCATAACTAGTGGTGGAACTGGAGGAGGAGCTTCTATGTTGGCTGCAGCTGCTC | |
TAAGAGCATCGGTCAAGAATGTGGATGAAAAATCCGGAGCCGATGGCAAGCCCGGCACGATGGGCAAGCC | |
AACGGATGACAAGAAACCAGGTGATGATAAGGATAAGCAGCAGCCTCCCAAGGACTCCAAGCCGTCAGCA | |
GGTGGTCCCAAGCCCGGGGATCAGAAGCCAACTCCGGGTGCGGGAGCTCCAAAGCCCCAAGCAGCTGGCA | |
CTATCAGCAAGCCCGGTGAGTCACAAAAGAAGGACGCTCCGGCACCACCTACCAAACCTGGAGACACCAA | |
GCCTGCTGCGCCGAAGCCTGGAGAATCCGCCAAGCCCGAGGCCGCTGCCAAAAAGGAGGAGTCTTCCAAG | |
ACCGAAGCTAGCAAGCCGGCAGCCACAAATGGAGCAGCCAAGAGCGCAGCTCCCTCCGCTCCTTCGGATG | |
CCAAGCCGGATTCCAAACTGAAACCAGGAGCAGCTGGAGCACCAGAAGCAACCAAGGCAACCAATGGGGC | |
CTCCAAGCCGGACGAAAAGAAGAGCGGTCCGGAGGAGCCAAAAAAGGCTGCAGGAGACTCCAAGCCAGGA | |
GACGATGCCAAGGACAAGGATAAGAAACCCGGCGACGATAAGGACAAGAAACCTGGCGACGACAAAGACA | |
AGAAACCTGCCGACAATAATGATAAGAAGCCAGCCGATGACAAGGACAAGAAGCCGGGAGACGATAAGGA | |
CAAGAAGCCGGGTGACGACAAGGACAAGAAGCCGAGCGATGATAAGGACAAGAAGCCTGCCGATGACAAG | |
GACAAGAAGCCAGCAGCAGCTCCTCTGAAGCCGGCGATCAAGGTGGGTCAGAGCAGTGCCGCAGCTGGCG | |
GAGAACGAGGCAAATCCACGGTCACAGGACGCATGATCTCCGGCTGGCTCTAAGCCGCGGAATCCACTTT | |
CATAGCAATTAAATAATTAAGTCCTTTGTTTGTGTGAACAATAAAAAAAAAGAATCTCAAGTACCACGTT | |
TCTGCAACTTGTTGCTAAAGGCCACCTGTTGCCAGC | |
# *******************************
# Author: Benjamin Tovar
# Date: March 15, 2015
# Post: http://btovar.com/2015/03/trim-fasta-header/
# *******************************
# FASTA files often come with very long header lines, like this one:
# >gi|600513|gb|M21306.1|DROTRPC Drosophila melanogaster photoreceptor membrane-associated protein (trp), complete cds
# AGCCACATTGGGCACTAATGTAATTAGTGGAATATAGCGACCCGTGGCTGCCACTTTTCAGCAGTGCAAC
# GCGGCTAATTGGAGGCGGAACATCGCCACGATGGAACACTAAAGGATACAGTGCGCGAAAGGATTAGCCC
# AAGGCTCCCCGAGGAGCAGGGATAAATGCCCATAGTGTTTGTGAGATGTGAAGTGACCAAGTGATCCGAT
# CCTGATTATCGCGTTCGCATAGACCAGTAAATCAGTGCAGATATGGGCAGCAATACGGAATCCGATGCCG
# To clean up the headers, keep only the first whitespace-delimited field of each line
# (sequence lines contain no whitespace, so they pass through unchanged):
# $ cat <input_file> | awk '{print $1}' > <output_file>
# EXAMPLE (option 1):
cat sequence.fasta | awk '{print $1}' > sequence_filtered_op1.fasta
# EXAMPLE (option 2):
awk '{print $1}' < sequence.fasta > sequence_filtered_op2.fasta
# And the output will be:
# >gi|600513|gb|M21306.1|DROTRPC
# AGCCACATTGGGCACTAATGTAATTAGTGGAATATAGCGACCCGTGGCTGCCACTTTTCAGCAGTGCAAC
# GCGGCTAATTGGAGGCGGAACATCGCCACGATGGAACACTAAAGGATACAGTGCGCGAAAGGATTAGCCC
# AAGGCTCCCCGAGGAGCAGGGATAAATGCCCATAGTGTTTGTGAGATGTGAAGTGACCAAGTGATCCGAT
# CCTGATTATCGCGTTCGCATAGACCAGTAAATCAGTGCAGATATGGGCAGCAATACGGAATCCGATGCCG
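The commands above apply `print $1` to every line, which is safe as long as the sequence lines contain no whitespace. A slightly more defensive variant, sketched below, trims only the lines that start with `>` and passes all other lines through untouched. This is an illustrative alternative rather than the author's original command, and the file names `demo.fasta` and `demo_trimmed.fasta` are hypothetical, as are the shortened sequence lines.

```shell
# Build a tiny two-record FASTA file for the demo (hypothetical file name,
# sequences shortened to the first 20 bases of the records in sequence.fasta).
printf '%s\n' \
  '>gi|600513|gb|M21306.1|DROTRPC Drosophila melanogaster photoreceptor membrane-associated protein (trp), complete cds' \
  'AGCCACATTGGGCACTAATG' \
  '>gi|1620934|emb|Z80230.1| Calliphora vicina trp gene' \
  'AAAAGTTTAAATTGGATAAA' > demo.fasta

# Header lines (starting with ">"): keep only the first whitespace-delimited field.
# All other lines: print unchanged.
awk '/^>/ {print $1; next} {print}' demo.fasta > demo_trimmed.fasta

cat demo_trimmed.fasta
```

For a well-formed FASTA file the result should match the output of `awk '{print $1}'`. The two differ only when a non-header line happens to contain spaces or tabs, in which case this variant leaves that line intact.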