@necrolyte2
Created October 14, 2014 23:10
Removes records with duplicate sequence names from a FASTA file, keeping the first occurrence of each name
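#!/usr/bin/env python
# Usage: pass the input FASTA path as the first argument.
# Output is written to nodup.fasta in the current directory.
import sys
from Bio import SeqIO

f = sys.argv[1]
seqs = SeqIO.parse(f, 'fasta')
# Map each sequence id to the first record seen with that id.
# Output order follows input order only where dicts preserve insertion order (Python 3.7+).
index = {}
for seq in seqs:
    if seq.id not in index:
        index[seq.id] = seq
with open('nodup.fasta', 'w') as fh:
    SeqIO.write(index.values(), fh, 'fasta')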
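
For checking the result without Biopython installed, the same first-occurrence rule can be sketched in plain Python. This is an illustrative alternative, not part of the original script: `dedup_fasta` is a hypothetical helper name, and it assumes the sequence id is the first whitespace-delimited token after `>`. Unlike `SeqIO.write`, it passes sequence lines through unchanged instead of re-wrapping them, and it drops any lines that appear before the first header.

```python
def dedup_fasta(lines):
    """Yield FASTA lines, skipping every record whose id was already seen.

    Hypothetical helper: the id is taken as the first whitespace-delimited
    token after '>' (an assumption about how names are compared).
    """
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith('>'):
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ''
            keep = seq_id not in seen  # keep only the first record per id
            seen.add(seq_id)
        if keep:
            yield line


# Small in-memory example; the second ">a" record is a duplicate by id.
example = [">a desc\n", "ACGT\n", ">b\n", "GG\n", ">a other\n", "TTTT\n"]
print("".join(dedup_fasta(example)), end="")
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the list.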