Last active
November 29, 2016 05:45
Efficient method to read words from a file.
import itertools

def readwords(file_object):
    byte_stream = itertools.groupby(
        itertools.takewhile(lambda c: bool(c),
            itertools.imap(file_object.read,
                itertools.repeat(1))), str.isspace)
    return ("".join(group) for pred, group in byte_stream if not pred)

# Example usage
import sys

if __name__ == '__main__':
    # read from a user file
    with open(sys.argv[1], 'r') as f:
        for w in readwords(f):
            print (w)

    # read from stdin
    for w in readwords(sys.stdin):
        print (w)
Replace `itertools.imap` with `map` if running this on Python 3.

The function assumes words are separated by whitespace characters. By whitespace, I assume the same characters as those contained in `string.whitespace`.
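As a rough sketch of the Python 3 substitution mentioned in this comment, the function might look like the following. It assumes a text-mode file object whose `read(1)` returns an empty string at end of file. The `io.StringIO` sample input and the names `chars`, `groups`, and `is_space` are only for illustration and are not part of the original gist.

```python
import io
import itertools


def readwords(file_object):
    """Lazily yield whitespace-separated words from a text-mode file object."""
    # In Python 3 the built-in map is lazy, so it takes the place of itertools.imap.
    # Each step calls file_object.read(1); takewhile stops at the empty string (EOF).
    chars = itertools.takewhile(bool, map(file_object.read, itertools.repeat(1)))
    # Group consecutive characters by whether they are whitespace.
    groups = itertools.groupby(chars, str.isspace)
    # Keep only the non-whitespace runs and join each run into a word.
    return ("".join(group) for is_space, group in groups if not is_space)


if __name__ == "__main__":
    # Hypothetical in-memory input standing in for a real file.
    sample = io.StringIO("hello  world\n\tfoo bar\n")
    for word in readwords(sample):
        print(word)
```

Note that on Python 3 `str.isspace` also treats Unicode whitespace such as the no-break space as a separator, so the set of separators can be wider than `string.whitespace`.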