

@gregpinero
Created October 8, 2012 18:31
Clean up a FASTA file that has unwanted characters in the description lines
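The rule applied to each description line can be isolated in a small function. This is a minimal sketch. It assumes the same character class (`[0-9a-zA-Z_]`) and 50-character limit as the script below. The names `clean_header` and `UNWANTED` and the sample header are hypothetical. The regex also strips the leading `>` and the trailing newline (including a Windows `\r`), so the `>` is added back after truncation.

```python
import re

# Anything outside letters, digits and underscore counts as "unwanted"
# (same character class as the gist's script).
UNWANTED = re.compile(r'[^0-9a-zA-Z_]')


def clean_header(line: str, max_len: int = 50) -> str:
    """Return a FASTA description line with unwanted characters removed.

    The regex removes the leading '>' and the trailing newline along with
    everything else, so '>' is re-added after truncating to max_len.
    """
    return '>' + UNWANTED.sub('', line)[:max_len]


# Hypothetical header used only for illustration.
print(clean_header('>chr2L | D. simulans, r1.3\n'))  # prints >chr2LDsimulansr13
```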
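import re

infile = 'dsim-all-chromosome-r1.3.reassembly1.updated_wsu1.fasta'
ref_count = 0  # number of description (header) lines seen
with open(infile, 'r') as fasta:
    for line in fasta:
        if line.startswith('>'):
            ref_count += 1
            # Drop every character that is not a letter, digit or underscore
            # (this also removes the leading '>' and the newline), keep the
            # first 50 characters, then restore the '>' header marker.
            print('>' + re.sub(r'[^0-9a-zA-Z_]', '', line)[:50])
        else:
            # Sequence lines already end in '\n', so suppress print's newline.
            print(line, end='')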
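The script writes the cleaned FASTA to standard output and never reports `ref_count`. A possible variant, sketched under the assumption that you want to write directly to an output file and get the header count back, is shown here. The function name `clean_fasta` and its parameters are hypothetical. The cleaning rule is the same as in the script.

```python
import re

# Same "unwanted character" class as the gist's script.
UNWANTED = re.compile(r'[^0-9a-zA-Z_]')


def clean_fasta(in_path: str, out_path: str, max_len: int = 50) -> int:
    """Copy in_path to out_path, cleaning description lines.

    Returns the number of description lines (records) processed.
    """
    ref_count = 0
    with open(in_path, 'r') as src, open(out_path, 'w') as dst:
        for line in src:
            if line.startswith('>'):
                ref_count += 1
                # The regex strips '>' and the newline, so add both back.
                dst.write('>' + UNWANTED.sub('', line)[:max_len] + '\n')
            else:
                dst.write(line)  # sequence lines are copied unchanged
    return ref_count
```

For example, `clean_fasta('in.fasta', 'out.fasta')` (hypothetical file names) writes the cleaned file and returns how many records it saw. Truncating to 50 characters can make two different headers identical, so check the result if unique IDs matter downstream.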