@CrazyPython
Created December 29, 2016 00:10
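import nltk

# Requires the NLTK Gutenberg corpus, e.g. nltk.download('gutenberg').
# Sentence splitting may also need the Punkt tokenizer data
# ('punkt' or 'punkt_tab', depending on the NLTK version).

# Despite the name, emma_sents collects sentences from the Shakespeare plays below.
emma_sents = []
for fileid in ['shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt']:
    emma_sents.extend(nltk.corpus.gutenberg.sents(fileid))


def join(array):
    """Join tokens with spaces, attaching punctuation tokens to the preceding text."""
    result = ''
    for i in array:
        # Substring test: a token found inside this punctuation string gets no leading space.
        if i not in ', .,"/[]\\+-)(*&^%$#@!~`<>:;{}|-_?' + "'":
            result += ' ' + i
        else:
            result += i
    return result.strip()


# Write each sentence followed by a blank line; the with-block closes the file even on error.
with open('train.txt', 'w') as f:
    for i in emma_sents:
        f.write(join(i) + '\n\n')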
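The joining step can be tried without downloading any corpus. The following is a minimal sketch, not the gist's own code: `detokenize` and `PUNCTUATION` are hypothetical names, and it swaps the original substring check for a set of single punctuation characters, so multi-character tokens are always treated as words.

```python
# Hypothetical stand-in for the gist's join(); names here are illustrative only.
# Assumption: punctuation tokens are single characters, so a set lookup suffices.
PUNCTUATION = set(', .,"/[]\\+-)(*&^%$#@!~`<>:;{}|-_?' + "'")


def detokenize(tokens):
    """Join tokens with spaces, attaching punctuation to the preceding text."""
    result = ''
    for token in tokens:
        if token in PUNCTUATION:
            result += token          # no space before punctuation
        else:
            result += ' ' + token    # ordinary word: separate with a space
    return result.strip()


if __name__ == '__main__':
    # Tokens in the style NLTK returns for a sentence.
    print(detokenize(['Who', "'", 's', 'there', '?']))  # prints: Who' s there?
```

Note that contractions come out as `Who' s` rather than `Who's`, because the token after the apostrophe is an ordinary word and still receives a leading space. The original `join` behaves the same way on such input.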