>gi|1620934|emb|Z80230.1| Calliphora vicina trp gene
AAAAGTTTAAATTGGATAAATTGCAAAAGGACAATTAAGGATACGGAATATATGCGTAGTTTGTGTAAAA
TGCGCTTATAGAAACACAGAAAAAAAAAATAAAAACGGATAAATCTTTAGAAACAATAAACACTAGCTTA
AAAATTAAAAGCAAAACAAACAATAAAACATGGCAACTGATCCGGAAAAAGGGAAAAATGAGGAAGAAAA
CTATAATATACAGTTTGCAGATGAATACGTGTTGACGGAGACAGAGAAAACCTTTATATTGGCTTGTGAG
CGCGGTGACATAGCAAGTGTCAAGGTAATAATTGAGGAAAATAAAGGTGCACCGGAAAAGTTTAATATTA
ATTGTGTTGATCCCATGAATCGTTCGGCCTTAATATCAGCCATTGAAAATGAAAATTTTGATTTAATGAT
TGTACTGTTGGAGGAAGGCATAGATGTGGGCGATGCATTGTTGCATGCTATTTCTGAAGAATATGTGGAG
GCTGTGGAGGAACTGTTGCAATGGGAAGAAACGCATCATAAGGAGGGTACACCATATAGTTGGGAGGCAG
TTGATCGTTCGAAATCGACATTTACGCCTGATATAACGCCTCTAATATTGGCAGCACATCGTAACAATTA
CGAAATTTTAAAAATTCTATTGGATCGTGGTGCCACATTACCAATGCCTCATGATGTCAAATGCGGTTGC
GATGAGTGTGTAACATCACAAGAAACCGACTCCTTGCGTCACTCTCAGTCGCGTATTAATGCTTTTCGTG
CTCTTTCGGCTAGCTCACTGATATCTCTTAGTTCACGTGATCCCGTCCTAACAGCTTTTGAACTCGCATG
GGAATTGAAACGTTTGCAGGCAATGGAATCAGAATTTCGTGCTGAATATGGAGAAATGCGTCATGGTGTC
CAAGAATTTGTTACATCTTTACTAGATCATGCCAGAACCTCAACCGAACTGGAAGTAATGTTAAACTTTA
ATCATGAAGCTTCCAATGACATTTGGACTCCTGGACAAAGGCAGACATTGGAGAGACTGAAATTGGCCAT
TAAGTATAAGCAAAAGACGTTTGTGGCTCATCCCAATGTACAACAACTATTGGCTGCCATTTGGTATGAA
GGTTTGCCCGGATTTCGACGTAAAAAAAGTTCTCAACAAATTTTAGAAGTTATAAAATTGGGTTGCATGT
TTCCCATTAACAGTATGAATTATCTTATGGCTCCCGAATCGGATGCTGGCAAATTTATGAGAAAACCTTT
CGTTAAATTCATTACACATTCATGCTCCTACATGTTCTTTTTGATGTTACTTGGTGCTGCTTCATTGCGT
GTTGTTCAAATAACATTTGAGTTGTTAGCGTTCCCATGGATGATTGAAATGTTGGACGATTGGCGTAAAC
ATGAAAGAGGCTCTTTGCCGGGACCTATTGAATTAGGTATCATTACGTACATTTCAAGTTTAGTCTTAGG
CGAATTGAAGTCATTATATTCGGATGGTTTGTTTGATTACATTATGGATCTATGGAATATTGTGGATTTC
ATTTCGAATATGTTTTATGTCACCTGGATACTGTGTAGGGCCACCGCCTGGATTATAGTGCATCGTGATC
TCTGGTTTCGAGATATAAATCCTTATTTTCCACGAGAACATTGGCATCCGTTTGATCCCATGCTGCTGTC
AGAAGGTGCATTTGCCGCAGGCATGGTATTCTCTTATTTGAAACTAGTTCACATTTTCTCTATAAATCCA
CACCTGGGACCATTACAAGTCTCATTGGGCCGCATGATTATCGATATTATAAAGTTCTTCTTTATTTATA
CTTTGGTGTTGTTTGCTTTCGGCTGTGGTCTCAATCAATTACTATGGTATTATGCTGAATTGGAGAAAAA
TAAATGCTATCACTTGCATCCAGATGTGGCGGATTTTGATGATCAAGAGAAAGCTTGTACTATATGGAGA
CGATTTTCAAATCTTTTTGAAACTTCACAATCTCTATTTTGGGCCTCATTTGGTTTAGTGGACTTGGTGT
CATTCGATTTGGCCGGCATTAAGAGTTTCACCCGTTTCTGGGCCTTGTTGATGTTCGGCTCATATTCGGT
AATCAATATTATCGTATTGCTCAACATGCTTATTGCCATGATGTCCAATTCATATCAAATTATTTCTGAA
CGTGCCGATGTTGAATGGAAATTTGCCCGCTCTCAATTGTGGATGAGCTATTTTGAGGATGGTGGCACTG
TACCTCCACCATTTAATATGTTCCCCTCAGTAAAGTTGATGCGTAAGGTATTTGGCAAGCAGAGACCAAA
ACGTTCCAGGAGTTTTATGTTGAAATCAAGGGTTAAGGCACAATCACTGCATGAGCGTGTAATGAAGCTA
TTGATACGTCGTTATATTACGGCCGAACAGCGTCACAGAGATGATTTCGGCATAACCGAGGATGATATCA
TTGAAGTGCGTCAGGATATTAGCTCATTGCGTTTCGAACTGTTGGACATTTTTACAAATAATAAATTTGT
TGTACCGGATATTGAAAAGAAATCAGCAAGTGCAGCGGCTGGTAAGAAGGGTAAGACAATAGAACGTCGA
ATTTTGAAAGATTTCCAAATAGGTTTCGTTGAAACTCTACAAACTGAGTTAGTAAATAGTGTCGAAGAGG
GCAAGGATATATTCTCGTCATTGGCCAATGCTATACGCAAAAAAAGATCTCAGAAAGGCGAAAAAGATTG
GAATGCTATTGCTCGCAAAAATACTATGAGCTCAAATCCCATTGGTTCCAAACGTTCATCAATTCAGCGG
CATTCCCAACGCAGTTTGAGACGCCGCATTATAGATGAAGCAAACGAAGGTCTTAGAATGAATCAGAATC
AACTTATAGAATACAATCCCTCTCTTGGAGATGTTTCGAGAGCTACTCGTGTTGCATATGTTAAATTCAT
GAAAAAGAAATTAGTGGCTGAAGAAGGTATATCACCTGAAGAACTCAATGAATCTAATAATTCTACGGAC
GCTGCAGCAAAACTCGACGCTTTTGCCAGGTCCACAATGAAGAGAGTAGAGAGCAAAAAAGATGATAGTG
CAAGTGCTGATGACTCGAAAGCAGCTACTGACAAACCGAAAGCTCCAGTTGGAGATAAAGCAAAACCAGC
GGCGCCCGCTAAGCCTGGAGATGCCAAAACAGCTGATGCTAAGGCACCAGCCCCTGCTCCACCTAACAAA
CCAGCAGATGCTGCTGCTAAGCCAGCTGTAGCTAAGCCCGGAGCAGAAACCAAACCTGAAGCAGCAGCTA
AAAAGGGTGAAGCTGCTAAAACTGCGGATGCCAAACCCGAAGCACCTACTGCAGCAGCTGCAACAAAATC
GGCCGCGCCCGCTGCTCCCGCCAAACCCGACGCCTCAACGAAGCCCCCTGCTGATGCTGCAAAACCTGGT
ACAGATGCTGCTACAAAAGCAGCTGATGGTGCTGATAAGAAAGCTGACGACAAGAAAGCAGATGATAAGA
AACCCGATGACAAGAAACCCGATGACAAGAAACCTGATGACAAGAAACCCGATGACAAGAAGCCCGACGA
TAAAAAACCAGGAGCCGATGCCAAGAAACCTGATGACAAAAAAGATCCTCCCGCCAAGCCCGCAATAAAA
GTTGGCCAAAGTAGTGCTGCCGCCGGTGGTGATCGTGGTAAATCGGTTGTTACTGGTCGCATGATATCGG
GTTGGCTTTAAATTCATTTTATACCAATCATTTATTTGAGACTTGAGACAAATAAAACTGATACTTTTAA
AATTAAAATTGAAAAAAGAAAACTTGTAGTTAATTGGAAATCGATTATTATTTTTACGAATAAAAACTAT
AAGTAAATACCAACAGT
>gi|636806189|gb|KJ572527.1| Limulus polyphemus transient receptor potential ion channel subfamily C trp-like protein (trp) mRNA, complete cds
TAGAGAGAAAAGTGACTTAGTCTACGTGATACTGAGCTGGAGAGAAAGGTGACGTATCTACGTGATACTG
AGCTAGAGAGAAAGGTGACGTATCTACGTGATTATAGAGGTAGAGAGAAAGGTGACGTATCTACGTGATT
ACTGAGTAGAGAGAAAGGTGACGTATCTACGTGAATACTGAACTAGAGAGAAAAAGTGACGCATCTACGT
GAATACAGAACTAGAGAGAAAAAGTGACGTATCTACGTGAATACTGAACTAGAGAGAAAAAGTGACGTAT
CTACGTGAATACTGAGCTAGAGAGAAAAAGTGACGTATCTACGGTGAATACTAGTAGAGAGAAAAGGTGA
CGTATCTACGTGAATTACTGAGCTAGAGAGAAAAGGTGAAGTATCTACGTGAATACTGAGCTAGAGGGAA
AAGGTGACGTATCTACGTGAATTACTGAGCTAGAGAGAAAAGGTGACGTATCTACGTGAATACTGAGCTA
GAGAGAAAAAGTGACGAATCTACGTGAATATTGAGCTGGCTACAAGAAGTGACATAGCTGCTACATTCAG
TATAAAGTTGCCGACAATGGGGAAGAACAACCTTCTCGGGATCCAGTCAAGCGACTGTAAAGACACATTT
TCTTCATTAGCAGAACAACATGATTGTCAACTCAGTGTTTACGAAAAGAAATATATGTTATGTGCAGAGC
GGGGAGACATTGTTTCCGTTCGAAAACTTTTGGAGACTCACAAATCTAATCCACGTTTTAATATCAATGT
TTGTGATCCGCTCGGACGAACAGCCTTGGTGGTGGCTATCGAGAACGAGGATTCTGGACTAATCGAACTT
TTACTGAAACATGGAATACAGGCCGGAGATGGACTCTTGCATGCTATTGAAGAGGAATATGTGGAAGCTG
TAGAGATGTTGCTCCAGCATGAAGAGAAGGTTCATGTACCTGGACAACTCTACAGTTGGGAGAATATGCC
TCAAGAAATGGCCAAGTACACGCCTGACGTGACCCCTTTAATTCTAGCTGCTCATAAAGACAATTACGAG
ATACTGAAGCTGCTACTCAACCGTGGCGCCACCTTACCGATGCCACATGATGTACGCTGCGGATGCGATG
ACTGCGTAGTGGCGACCTCTAGCGACTGTTTGCGTCACTCTCGTTCTAGAATTAACGCTTATCGTGCTCT
CACGTCTCCATCGCTGATTGCCTTGTCTAGTAAAGATCCGATTCTTACAGCTTTTGAACTGAGCTGGGAA
CTGCGACGTCTCAGTTCCATTGAAAATGAGTTTAAGGCCGACTACACAGAACTACGTCAAAGATGCCAAG
AGTTCGCTACTTCGCTTCTCGATCACGCAAGAACTTCTTCAGAACTGGAAATCATATTAAATTATGACCC
GTCTGGTCCGGCTTATCAGACAGGAGAAAGAATGAAATTAGAACGTCTCAAGCTCGCTGTAAATTACAAA
CAGAAAACGTTTGTTGCTCATTCCAAAGTGCAACAGCTGCTTGCCTCTGTGTGGTATGAAGGGATGCCAG
GATTTAGGAGGAAAGGCCCGATTGGTCAGATACTGGAGATCGGAAAGCTTGGTGCTATGTTTCCAATCTA
CTCAATGATATACATCTTAGCACCACACAGCAAAATGGGTCTGAAGATGAAGAAGCCCTTCGCGAAATTC
GTCTGTCAGTCAGCATCTTACTGTTTCTTCTTGTTTCTTTTAATCTTGGCCTCCCAACGAGTGGAACAAA
TGGTTATCGAATGGTTTGGAACACCGTTTCTCCAGGAGTGGCTTCGAATAGCACAGGAACATGAACGTGG
AAAACTTCCTGGATTTGTAGAAACAGCCATTATCATGTATGTGTTTGGGTTTGTATGGTGCGAAGTTAAA
CAGCTATGGGACTCCGGGTTGCTTAAGTACATCAGTAACATGTGGAACATCGTGGACTTCATAACGAACA
TGCTCTATCTAACATGGATTAGCTTGCGAATGATAGCCTGGGTACAGGTTCGTCGAGAGTTAGCAATGGG
CATGGACGCTTATCAACCACGAGAAAATTGGCACCCTTATGATCCAATGCTCATAGCTGAAGGAGTCTTT
GGAGCAGCCAATATCTTCAGTTTTCTGAAGATGGTCCACATTTTTTCTGTTAATCATCATCTTGGACCAA
TACAAATCTCTCTCGGAAGAATGGTCTATGACATCATGAAATTCTTCTTCATTTACACTCTCGTTCTCTT
TGCTTTCGGATGTGGTATGAACCAAATGTTGTGGTACTACGCCGAGATGGATAAGAACATCTGCTACAGT
GGCCCAGGAGGGACTCCTAACCCGGAAGAGGATAAGTCGTGCGACATCTGGCGGCGGTTTGCTAACCTTT
TCGAAACGTCGCAGAGTTTATTCTGGGCAAGTTTTGGGTTAGTCGAGCTTGAAAGTTTCGAGCTGACAGG
AGTGAAAAGTTACACACGGTTCTGGAGTTTGTTGATGTTTGGTTGTTACTCGGTTATCAACATCGTCGTT
CTTCTTAATCTTCTCATCGCCATGATGAATCATTCGTACGAGATCATTTCGGAACGTTCGGATATCGAGT
GGAAGTTTGCTCGAAGCAAACTATGGATCAGTTACTTTGAAGACGGAGGAACGGTTCCACCCCCTTTTAA
CATTATCCCAACACCTAAAGCTTTCAAACACCTATTTGGCTGCAGATCTCATGTTGGAACTTTGTCAGTC
AAGATCAGAAAAGAGAAAGATAGAGACGCTAAGTATCAGAAGGTCATGAGGTGTTTAGTTCGGCGGTATG
TAACAAGGCAACAGGAAACGTCCGATGATACAGGAGTCACAGAGGATGATGTGAACGAAATTAAACAAAG
TATTCATGGCTTCCGTTCGGAACTCTTTGAAATCCTGAAAGATAACGGAATGAAGTTCTCTGCTGGGAAT
ACTGATAGAGAAGTCGGTAAGAAAGAGCGCCAGAAAGAACGTCGTCTGCTCAAGGGATTCAACATTGGTA
CAGTAGAAAACTTGATTAGCGCAGTTTTTCAAGAAAAAAAGCAGACGACAGGTTTCAAGATCACCCGTCG
TCTAACTGGAACCAAAAAACGGAATGCCAGAAAAAATTGGAACGCTATTGTGTCAGCTGCGAAGAACAAA
CAAAACCAAATTGGAGAAACCGATTCGAGTATCAGTGGAGATCTCGAAGATCTGAAAGCCCAGGTCCTAA
ACGAAACAGAAGTTGATGTCCCACTGACAGGGATAAAGCGAGACAATACATCATCAGTGACGTCACTTTC
CCGTTATCCTTCGACCAATGAAACTAACTTAGCTCCAATTTCTGAAGAGTTCAACTCAACAGTTCAGGTT
GAGATTGTCAATGAAATAACTCCAGAAATTCCAACCCATACACAAGGGAAAACGTTTTCAGAAGCTTCCG
CTTCTTTCAACTCAACTGATGACGTCACCCACAAAGATAATTCTTCCCTCGCACCGAGAGTTGGATCAGA
AGATGAAGAAGGAATCATACAAGAAAGTAGTGAACTGTCTTCACGTTCGAATCGTTCTTCGGAAACGAAA
TACCATCAAGAACCACTGAAAAGTCGCGTTGGAACAAACAGAGTGACTGCTTCTTCTGTTTCCCTACAAC
CGAAAAATGGTTGGCTATGAGGCAAAC
>gi|600513|gb|M21306.1|DROTRPC Drosophila melanogaster photoreceptor membrane-associated protein (trp), complete cds
AGCCACATTGGGCACTAATGTAATTAGTGGAATATAGCGACCCGTGGCTGCCACTTTTCAGCAGTGCAAC
GCGGCTAATTGGAGGCGGAACATCGCCACGATGGAACACTAAAGGATACAGTGCGCGAAAGGATTAGCCC
AAGGCTCCCCGAGGAGCAGGGATAAATGCCCATAGTGTTTGTGAGATGTGAAGTGACCAAGTGATCCGAT
CCTGATTATCGCGTTCGCATAGACCAGTAAATCAGTGCAGATATGGGCAGCAATACGGAATCCGATGCCG
AGAAGGCGTTGGGGTCTCGCCTGGATTACGACCTGATGATGGCCGAGGAGTACATCCTCAGTGATGTGGA
GAAGAATTTCATATTGTCCTGCGAGCGGGGTGACTTGCCAGGTGTCAAGAAGTGAGGCTTCCTCCCTATG
GCTTCGAGTAACTCGGCTAATATGATTTCGTCCTTGAAGGATCCTCGAGGAGTACCAGGGCACGGACAAG
TTCAACATTAACTGCACGGATCCCATGAACCGCTCCGCCCTCATTTCGGCCATCGAGAACGAGAACTTCG
ACCTGATGGTGATCCTGCTGGAGCATAACATCGAGGTGGGCGACGCCCTGTTGCACGCCATCTCGGAGGA
GTATGTGGAGGCGGTGGAGGAGCTGCTGCAGTGGGAGGAGACCAACCACAAGGAGGGCCAGCCATACGTA
GGTCACATCCTGCCGCAGGATATGCCACATCTCTTACTCTTTTCCCCTCCTTTGGCAGAGCTGGGAGGCG
GTGGACCGCTCCAAGTCCACCTTCACCGTGGACATCACGCCCCTTATCCTGGCCGCCCACCGAAATAACT
ACGAGATACTCAAAATCCTCCTGGATCGCGGGGCCACGCTGCCCATGCCGCACGACGTCAAGTGAGTGGG
GTGGTATAGGGATGCAGTTGCTCGGTGATATGGCTATGTAGTGAGCTTCCTCGTCCTTCGAGCGGCACGT
TCCATTGTTTGTCCAATTGGCAACACACTTATGTAGCACATCGCTGGGTGCCTCTAATTTTAGATAATTG
CGGGCAAATTAGGGTGAAATATTAATACAGGGTGTGATGTTGTGTGTGGGAAGAGAAGGTCCTTGATGTC
CTTAAATTGGGTCAATGGACTCAAGCCTTTAACAGCGTCATAAATGTCCTTGCTAGGAGAGAAGGTCCTT
TGTACTCCTTTCAGATTGATTTAAATGTATTTATTATTATAGAACTCTTGGGTTCCGTACACTTTGTATA
TCATCGCTTAATCTCAATTGTTCCTAGGTGCGGCTGCGATGAGTGTGTGACCTCCCAGACGACGGACTCC
CTGCGCCACTCGCAGTCGAGGATCAACGCATACCGCGCCCTGTCCGCCAGCTCGCTGATAGCGCTCAGCT
CCCGGGACCCTGTACTGACCGCCTTCCAATTGTCCTGGGAACTCAAGCGCCTGCAGGCGATGGAATCGGA
GTTTCGTGCCGAATACACGGTACGGTAGTAGTGTGCAATGTAAGCGGTGGCCAGTGTGCCACTAGAGTCC
ATGCGAATGCGAATGCGAGCCTTTGTTTGTGTTTGCCCAACAGGAGATGCGTCAGATGGTGCAGGACTTC
GGGACCTCGCTCCTGGACCACGCACGCACATCCATGGAACTCGAGGTGATGCTCAACTTCAACCACGAGC
CGTCCCACGACATCTGGTGCCTTGCCAGCAGCGAAACCCTGGAACGACTGAAGCTGGCCATTCGCTATAA
GCAAAAGACGGTGGGTTCGGACTAATCCGTGCACATTAAGAAACTGTGAAGGTTACAGGCTTGTGGCTAG
ATACCATGCTTAACCCAGACCACAGACTAACTGGGATTTAATAGCTTTATATATAGAAACGGATCTTAGT
ACCCTGCTTGAATTTATAAATATTACTCCCCCTTCTAGTTTGTGGCACATCCAAATGTCCAACAATTGTT
GGCCGCCATTTGGTACGACGGACTGCCGGGCTTTCCGCAAGAGGCCTCCCAGCAGCTGATGGATGTCGTG
AAGCTGGGATGCAGCTTCCCCATCTACAGCTTGAAGTACATCCTGGCCCCGGATTCCGAGGGTGCCAAGT
TCATGCGCAATCCTTTGTCAAGTTCATCACGCACTCCTTGCTCCTACATGTTCTTCCTGAGTGGGTAAAT
GGGCTGCTTAATACTTCATCCATCTAAATATGCGCCCATTCACAGTGCTCCTGGGTGCTGCCTCCCTGAG
GGTGGTGCAAATCACCTTTGAACTCCTCGCATTTCCCTGGATGCTGACCATGCTGGAGGATTGGCGCAAA
CACGAGAGAGGTTCACTACCGGGTCCCATTGAACTGGCAATCATTACCTACATAATGGCTCTAATATTTG
AGGAACTGAAATCTTTATATTCGGACGGCTTGTTTGAGTACATCATGGATCTTTGGAACATAGTGGACTA
CATATCGAACATGTTCTATGTGACGTGGATTCTTTGTAGGGCCACCGCTTGGGTAATCGTCCATGTGAGT
CCATATGAATCTAATGAATAAATTGAAGTTACTGAAACGAATTGATTTCAGCGCGATCTCTGGTTCCGGG
GCATAGATCCTTACTTCCCGAGGGAACACTGGCATCCGTTTGATCCAATGCTTCTATCAGAGGGCGCCTT
TGCTGCCGGAATGGTCTTCTCCTATCTAAAGCTCGTCCACATCTTCTCAATTAATCCCCACCTGGGACCC
TTGCAAGTTTCACTGGGTCGCATGATAATCGACATCATCAAGTTCTTCTTCATCTACACACTGGTGTTGT
TTGCCTTCGGATGTGGTCTCAACCAGTTGCTATGGTACTACGCTGAGCTGGAGAAGAACAAGTGCTATCA
CCTGCATCCCGATGTGGCTGACTTTGATGACCAGGAAAAGGCTTGTACCATCTGGCGAAGATTTTCCAAG
TATATTACATTCATTTCTATGTACTTGATCACATTGGCATCCCATTTCAATTTACTTTAGCTTATTCGAA
ACATCACAATCGCTCTTCTGGGCCTCTTTTGGCCTGGTGGACCTGGTCTCCTTCGATCTGGCGGGAATCA
AGAGCTTCACCCGCTTCTGGGCACTGCTGATGTTCGGCTCCTATTCGGTTATCAACATCATTGTGCTTCT
CAACATGCTGATTGCCATGATGTCCAACTCCTACCAAATCATCTCGGAGCGAGCCGACACCGAGTGGAAG
TTCGCCCGATCCCAGCTGTGGATGAGCTACTTCGAGGATGGCGGCACCATTCCACCGCCCTTCAACCTCT
GTCCCAACATGAAGATGTTGAGGAAGACCCTGGGCCGAAAGCGACCGTCACGAACCAAGAGCTTCATGGT
AAGTTACACTCCAGAGAAATTCGAATATTTCAGTTTAAGTATTATACAAATCCACACATATATATTAAAT
TCAGTGTACTCTACTTTAAGTGATATCTTCTTATCATATTACCTTTTAAATGTAAGTAACATTTCGAAAG
TTCGATTTTTTCAAGTTGAATCACCTCACAGTGAGAGGCGGCTCACAATCAATACAAATCTTAAGCGAAA
TCATTTCAATCAAGCCCATGCAGAGCTATCTGTGCGTCAAATTGACGTCTTAATACATCCTTTCTGTGTC
CTTCCACCCTCTAATCGCTCCCTGTTTCCAACTGCCAGCGAAAGTCCATGGAACGGGCACAGACGCTGCA
TGACAAAGTGATGAAGCTGCTGGTCAGGAGGTACATTACGGCGGAGCAGCGGCGGCGGGACGATTACGGC
ATTACCGAGGATGATATCATTGAGGTGCGCCAGGACATCAGCTCCTTGCGGTTCGAGTTGCTGGAGATTT
TCACCAACAATAACTGGGATGTACCCGACATTGAGAAGAAGTCGCAGGGTAAGGGAAAGCCTACCCACAT
CCAAATTGGCAACTTACACAATCTTAAGATAGAAGTAGCAGTATATATGAATAATTATTTACAATTAGCT
TTTGAATACCTAATAAAGTTCAACCCCATCTTGAATATTAATTTGAAATATGTTGTGCTAGAAGACTACA
AGCTAGATCGGATTTTAATCGGATTCTATTCACTAAACTTACTATTGAAGGAGTTGCTCGAACCACCAAG
GGCAAGGTGATGGAACGTCGCATCCTGAAGGACTTCCAGATTGGCTTCGTCGAGAATCTGAAGCAGGAAA
TGAGCGAATCTGAAAGCGGACGAGATATATTCTCATCGCTGGCCAAGGTCATCGGCAGAAAGAAGACCCA
GAAGGGGTAGGTCCACTTAAATCTGTAAGACCCTTTGAGTAAAGGTTACTATTCTCAATAGAGACAAGGA
TTGGAACGCCATTGCGAGGAAGAATACTTTCGCCTCCGATCCCATTGGCTCCAAGCGCTCCTCCATGCAA
CGTCATAGCCAGCGAAGCTTGAGGAGGAAGATCATCGAGCAGGCGAATGAGGGTCTTCAGATGAACCAGA
CCCAGTTGATTGGTAAATTAATAAAGATGTGCCCAATGTATGTATATTTCCATACCCTTGATCTCCTCCA
GAATTCAATCCCAACTTGGGTGATGTGACGCGTGCCACAAGAGTGGCTTATGTCAAGTTCATGCGGAAGA
AGATGGCTGCCGACGAGGTTTCCTTGGCCGATGACGAGGGTGCTCCAAATGGCGAAGGCGAAAAGAAGCC
ACTGGATGCCTCTGGGGTAAGTTCCAATCAATATATATATATGTATTCCCTGGCTAATATATAATCTCAC
TTGCAGTCTAAAAAGTCCATAACTAGTGGTGGAACTGGAGGAGGAGCTTCTATGTTGGCTGCAGCTGCTC
TAAGAGCATCGGTCAAGAATGTGGATGAAAAATCCGGAGCCGATGGCAAGCCCGGCACGATGGGCAAGCC
AACGGATGACAAGAAACCAGGTGATGATAAGGATAAGCAGCAGCCTCCCAAGGACTCCAAGCCGTCAGCA
GGTGGTCCCAAGCCCGGGGATCAGAAGCCAACTCCGGGTGCGGGAGCTCCAAAGCCCCAAGCAGCTGGCA
CTATCAGCAAGCCCGGTGAGTCACAAAAGAAGGACGCTCCGGCACCACCTACCAAACCTGGAGACACCAA
GCCTGCTGCGCCGAAGCCTGGAGAATCCGCCAAGCCCGAGGCCGCTGCCAAAAAGGAGGAGTCTTCCAAG
ACCGAAGCTAGCAAGCCGGCAGCCACAAATGGAGCAGCCAAGAGCGCAGCTCCCTCCGCTCCTTCGGATG
CCAAGCCGGATTCCAAACTGAAACCAGGAGCAGCTGGAGCACCAGAAGCAACCAAGGCAACCAATGGGGC
CTCCAAGCCGGACGAAAAGAAGAGCGGTCCGGAGGAGCCAAAAAAGGCTGCAGGAGACTCCAAGCCAGGA
GACGATGCCAAGGACAAGGATAAGAAACCCGGCGACGATAAGGACAAGAAACCTGGCGACGACAAAGACA
AGAAACCTGCCGACAATAATGATAAGAAGCCAGCCGATGACAAGGACAAGAAGCCGGGAGACGATAAGGA
CAAGAAGCCGGGTGACGACAAGGACAAGAAGCCGAGCGATGATAAGGACAAGAAGCCTGCCGATGACAAG
GACAAGAAGCCAGCAGCAGCTCCTCTGAAGCCGGCGATCAAGGTGGGTCAGAGCAGTGCCGCAGCTGGCG
GAGAACGAGGCAAATCCACGGTCACAGGACGCATGATCTCCGGCTGGCTCTAAGCCGCGGAATCCACTTT
CATAGCAATTAAATAATTAAGTCCTTTGTTTGTGTGAACAATAAAAAAAAAGAATCTCAAGTACCACGTT
TCTGCAACTTGTTGCTAAAGGCCACCTGTTGCCAGC
# *******************************
# Author: Benjamin Tovar
# Date: March 15, 2015
# Post http://btovar.com/2015/03/trim-fasta-header/
# *******************************
# How often do you get a FASTA file whose headers are very long, like this one?
# >gi|600513|gb|M21306.1|DROTRPC Drosophila melanogaster photoreceptor membrane-associated protein (trp), complete cds
# AGCCACATTGGGCACTAATGTAATTAGTGGAATATAGCGACCCGTGGCTGCCACTTTTCAGCAGTGCAAC
# GCGGCTAATTGGAGGCGGAACATCGCCACGATGGAACACTAAAGGATACAGTGCGCGAAAGGATTAGCCC
# AAGGCTCCCCGAGGAGCAGGGATAAATGCCCATAGTGTTTGTGAGATGTGAAGTGACCAAGTGATCCGAT
# CCTGATTATCGCGTTCGCATAGACCAGTAAATCAGTGCAGATATGGGCAGCAATACGGAATCCGATGCCG
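# To trim each header down to its first whitespace-delimited field, use this command.
# awk prints field 1 of every line; sequence lines contain no spaces, so they pass through unchanged:
# $ cat <input_file> | awk '{print $1}' > <output_file>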
# EXAMPLE (option 1):
cat sequence.fasta | awk '{print $1}' > sequence_filtered_op1.fasta
# EXAMPLE (option 2):
awk '{print $1}' < sequence.fasta > sequence_filtered_op2.fasta
# The Drosophila record shown above then becomes:
# >gi|600513|gb|M21306.1|DROTRPC
# AGCCACATTGGGCACTAATGTAATTAGTGGAATATAGCGACCCGTGGCTGCCACTTTTCAGCAGTGCAAC
# GCGGCTAATTGGAGGCGGAACATCGCCACGATGGAACACTAAAGGATACAGTGCGCGAAAGGATTAGCCC
# AAGGCTCCCCGAGGAGCAGGGATAAATGCCCATAGTGTTTGTGAGATGTGAAGTGACCAAGTGATCCGAT
# CCTGATTATCGCGTTCGCATAGACCAGTAAATCAGTGCAGATATGGGCAGCAATACGGAATCCGATGCCG
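# The commands above apply `print $1` to every line, which is safe only because
# sequence lines contain no spaces. A minimal sketch of a variant that trims
# header lines only is shown below. It assumes headers start with ">"; the file
# names sample.fasta and sample_trimmed.fasta and the shortened sequences are
# placeholders for illustration, not part of the original gist.

```shell
# Build a small two-record sample (headers copied from sequence.fasta,
# sequences shortened for brevity).
printf '%s\n' \
  '>gi|600513|gb|M21306.1|DROTRPC Drosophila melanogaster photoreceptor membrane-associated protein (trp), complete cds' \
  'AGCCACATTGGGCACTAATG' \
  '>gi|1620934|emb|Z80230.1| Calliphora vicina trp gene' \
  'AAAAGTTTAAATTGGATAAA' > sample.fasta

# On lines starting with ">", print only the first whitespace-delimited field;
# print every other line exactly as it is.
awk '/^>/ { print $1; next } { print }' sample.fasta > sample_trimmed.fasta

cat sample_trimmed.fasta
```

# To list only the trimmed headers afterwards: grep '^>' sample_trimmed.fasta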