@insightcoder
Last active October 14, 2017 03:04
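```python
import re
import pprint


def header_split(text, header_pattern):
    """Return chunks of text split by the header pattern.

    Keyword arguments:
    text -- Text to split.
    header_pattern -- Regular expression pattern string that sufficiently
                      describes the header of each chunk. It should not
                      contain capturing groups, or re.findall will return
                      the group contents instead of whole chunks.
    """
    # Each chunk starts at a header and lazily extends up to (but not
    # including) the next header or the end of the text. The (?:...) groups
    # keep an alternation inside header_pattern scoped to the header.
    find_pattern = r'(?:%s).*?(?=(?:%s)|$)' % (header_pattern, header_pattern)
    # re.DOTALL lets '.' match newlines, so a chunk can span many lines.
    parts = re.findall(find_pattern, text, re.DOTALL)
    return parts


def main():
    with open('data.txt') as f:
        text = f.read()
    header_pattern = r'# Person \d+'
    parts = header_split(text, header_pattern)
    for part in parts:
        pprint.pprint(part)


if __name__ == '__main__':
    main()
```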
Output:

```
'# Person 1\nName: Frodo\nDOB: 9/22/2968\nFavorite Food: Mushroom Pizza\n'
'# Person 2\nName: Samwise\nDOB: 4/6/2980\nFavorite Food: Lembas with Queso\n'
'# Person 3\nName: Gollum\nDOB: 1/1/2430\nFavorite Food: Fruit Rings Cereal'
```
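To try `header_split` without a `data.txt` file, the sketch below runs it on an in-memory string. The `SAMPLE` text is an assumption: it is reconstructed from the printed output above, with a trailing newline added at the end. The function body follows the same findall-with-lookahead approach as the gist, with the header wrapped in non-capturing `(?:...)` groups.

```python
import re
import pprint


def header_split(text, header_pattern):
    """Return chunks of text, each starting at a match of header_pattern."""
    # Lazily match from one header up to the next header or the end of text.
    find_pattern = r'(?:%s).*?(?=(?:%s)|$)' % (header_pattern, header_pattern)
    return re.findall(find_pattern, text, re.DOTALL)


# Hypothetical stand-in for data.txt, rebuilt from the printed output.
SAMPLE = (
    '# Person 1\nName: Frodo\nDOB: 9/22/2968\nFavorite Food: Mushroom Pizza\n'
    '# Person 2\nName: Samwise\nDOB: 4/6/2980\nFavorite Food: Lembas with Queso\n'
    '# Person 3\nName: Gollum\nDOB: 1/1/2430\nFavorite Food: Fruit Rings Cereal\n'
)

parts = header_split(SAMPLE, r'# Person \d+')
for part in parts:
    pprint.pprint(part)
```

Two behaviours are worth knowing. First, any text before the first header is silently dropped. Second, the last chunk loses its trailing newline, because without `re.MULTILINE` the `$` anchor also matches just before a final newline. Replacing `$` with `\Z` keeps that newline. The `(?:...)` wrapping matters when the header pattern contains an alternation. With `'A|B'` on `'A1\nB2\n'` it yields `['A1\n', 'B2']`. Without the wrapping, the regex splits as `A` or `B.*?...`, and the first chunk would be just `'A'`.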