Created
April 19, 2016 01:35
from itertools import groupby


def fasta_iter(fasta_name):
    '''
    Given a FASTA file, yield tuples of (header, sequence).

    Modified from Brent Pedersen,
    "Correct Way To Parse A Fasta File In Python"
    https://www.biostars.org/p/710/
    '''
    with open(fasta_name) as f:
        # Ditch the boolean (x[0]) and just keep the header or sequence
        # group, since we know they alternate.
        data = (x[1] for x in groupby(f, lambda line: line[0] == ">"))
        for header in data:
            # Drop the ">".
            header = next(header)[1:].strip()
            # Join all sequence lines into one.
            seq = "".join(s.strip() for s in next(data))
            yield header, seq
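
A minimal usage sketch for the generator above. It restates the same groupby technique so the block runs on its own, then parses a small temporary file. The file contents, the record names `seq1` and `seq2`, and the helper `demo_records` are hypothetical and only for illustration. It assumes every header line is followed by at least one sequence line; a trailing header with no sequence would make `next()` fail inside the generator.

```python
import os
import tempfile
from itertools import groupby


def fasta_iter(fasta_name):
    """Yield (header, sequence) tuples using the same groupby technique."""
    with open(fasta_name) as f:
        # Runs of header lines and runs of sequence lines alternate.
        groups = (g for _, g in groupby(f, lambda line: line.startswith(">")))
        for header_lines in groups:
            header = next(header_lines)[1:].strip()
            seq = "".join(s.strip() for s in next(groups))
            yield header, seq


def demo_records():
    """Write a tiny hypothetical FASTA file and parse it back."""
    text = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n"
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        # list() exhausts the generator, which also closes the file.
        return list(fasta_iter(path))
    finally:
        os.remove(path)


records = demo_records()
for header, seq in records:
    print(header, len(seq))
```

Because `fasta_iter` is a generator, records are produced one at a time, so a large file can be iterated without loading every sequence into memory; `list()` is used here only because the example file is tiny.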