@janprill
Created July 16, 2016 10:43
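# Export word lists from the Brown corpus, one file per part-of-speech tag,
# using the Natural Language Toolkit (NLTK).
# Requires the corpus data: nltk.download('brown') and nltk.download('universal_tagset').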
import io
import nltk
brown_tagged = nltk.corpus.brown.tagged_words(tagset='universal')
word_tag_fd = nltk.FreqDist(brown_tagged)
types = ['NOUN', 'VERB', 'DET', 'ADJ', 'ADP', 'CONJ', 'ADV', 'PRT']
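for t in types:
    # Words tagged t, ordered by how often the (word, tag) pair occurs.
    words = [wt[0] for (wt, _) in word_tag_fd.most_common() if wt[1] == t]
    # Write one word per line to a file named after the tag, e.g. NOUN.txt.
    with io.open("{0}.txt".format(t), 'w', encoding='utf8') as out:
        out.write('\n'.join(words))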
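
The filtering step can be tried without downloading the corpus. The following is a minimal sketch, not part of the original script: `toy_tagged` is made-up data standing in for `brown.tagged_words(tagset='universal')`, `words_for_tag` is a hypothetical helper name, and `collections.Counter` stands in for `nltk.FreqDist` (which subclasses it, so `most_common()` behaves the same way).

```python
from collections import Counter

# Hypothetical toy data in the same (word, tag) shape the corpus reader yields.
toy_tagged = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("dog", "NOUN"),
    ("a", "DET"), ("dog", "NOUN"), ("run", "VERB"),
]

# Count each (word, tag) pair, as nltk.FreqDist does in the script.
toy_fd = Counter(toy_tagged)


def words_for_tag(fd, tag):
    """Return the words carrying `tag`, most frequent (word, tag) pair first."""
    return [word for (word, word_tag), _count in fd.most_common()
            if word_tag == tag]


nouns = words_for_tag(toy_fd, "NOUN")
dets = words_for_tag(toy_fd, "DET")
verbs = words_for_tag(toy_fd, "VERB")
print(nouns, dets, verbs)
```

Because counts are kept per (word, tag) pair, a word such as "run" can appear in more than one list, each time ranked by how often it occurs with that particular tag.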