@leeyspaul
Created July 31, 2018 23:20
Convert a .txt file to a raw corpus (Code snippet)
def txt_to_raw(file_name):
    '''
    Take in a file_name and return the raw corpus of the contained text
    as a list of lines (each line keeps its trailing newline).
    '''
    with open(file_name, 'r') as file:
        raw_corpus = file.readlines()
    return raw_corpus


# task_training and task_testing are file paths defined outside this snippet.
task_training_corpus = txt_to_raw(task_training)
task_testing_corpus = txt_to_raw(task_testing)

print(f'The first few lines of our raw training corpus: \n{task_training_corpus[:3]}')
print(f'\n---\n\nThe first few lines of our raw testing corpus: \n{task_testing_corpus[:3]}')
print(f'\n---\n\nWhat the raw corpus looks like: \n\n{task_training_corpus}')
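A minimal sketch of how the function behaves on a small file, assuming a throwaway temporary file in place of the real `task_training` path, which is defined outside this snippet. The sample text and the names `sample_path`, `sample_corpus`, and `stripped_corpus` are made up for illustration. It shows that `readlines()` returns a list of lines with their trailing newlines kept, and one way to strip them if that is not wanted.

```python
import os
import tempfile


def txt_to_raw(file_name):
    """Return the lines of a text file as a list of strings."""
    with open(file_name, 'r') as file:
        return file.readlines()


# Hypothetical sample data standing in for the real training file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    sample_path = tmp.name

# Each element is one line, newline character included.
sample_corpus = txt_to_raw(sample_path)

# Optional cleanup step (an assumption, not part of the original snippet).
stripped_corpus = [line.rstrip('\n') for line in sample_corpus]

# Remove the temporary file once it has been read.
os.remove(sample_path)

print(sample_corpus)
print(stripped_corpus)
```

Whether to strip the newlines depends on the downstream parsing step; the original snippet keeps them.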