@yk-tanigawa
Created April 19, 2016 01:35
```python
from itertools import groupby


def fasta_iter(fasta_name):
    '''
    Given a FASTA file, yield (header, sequence) tuples.

    Modified from Brent Pedersen,
    "Correct Way To Parse A Fasta File In Python"
    https://www.biostars.org/p/710/
    '''
    with open(fasta_name) as f:
        # Ditch the boolean (x[0]) and keep only the header or sequence
        # group, since we know they alternate.
        data = (x[1] for x in groupby(f, lambda line: line[0] == ">"))
        for header in data:
            # Drop the leading ">".
            header = next(header)[1:].strip()
            # Join all sequence lines into one string.
            seq = "".join(s.strip() for s in next(data))
            yield header, seq
```
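The same `groupby` idea can be exercised without a file on disk. The sketch below is a hypothetical variant, `fasta_iter_lines`, that takes any iterable of lines (an open file, an `io.StringIO`, or a plain list) instead of a filename; the name and the in-memory example data are illustrative assumptions, not part of the original gist. Like the original, it assumes the input starts with a header line and that every header is followed by at least one sequence line.

```python
import io
from itertools import groupby


def fasta_iter_lines(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines.

    Hypothetical variant of fasta_iter: accepts any line iterable
    rather than a filename. Assumes headers and sequence runs alternate.
    """
    # groupby starts a new group each time the key flips between
    # "header line" (True) and "sequence line" (False).
    data = (group for _, group in groupby(lines, lambda line: line.startswith(">")))
    for header_group in data:
        # First line of the header group, without ">" and the newline.
        header = next(header_group)[1:].strip()
        # The following group holds every sequence line of this record.
        seq = "".join(s.strip() for s in next(data))
        yield header, seq


# io.StringIO stands in for an open FASTA file in this example.
example = io.StringIO(">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(fasta_iter_lines(example))
print(records)  # [('seq1 first record', 'ACGTTTGA'), ('seq2', 'GGCC')]
```

Taking an iterable rather than a path keeps the parsing logic independent of how the data is opened; for example, a gzip-compressed file opened with `gzip.open(path, "rt")` yields text lines and could be passed in directly.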