@vcidst
Last active May 31, 2020 10:52
Builds a dictionary mapping every verb in the NLTK Brown Corpus to the three prepositions that most often follow it directly, using a ConditionalFreqDist, and exports that dictionary as JSON.
import nltk
from nltk.corpus import brown
import json

# Requires the NLTK data packages 'brown' and 'universal_tagset'
# (fetch them once with nltk.download).

# Count, for each verb, the adpositions (universal tag "ADP") that directly follow it.
prepchoices = nltk.ConditionalFreqDist(
    (v[0], p[0])
    for (v, p) in nltk.bigrams(brown.tagged_words(tagset="universal"))
    if v[1] == "VERB" and p[1] == "ADP"
)

pc = dict()
for word in prepchoices.conditions():
    # get three most common prepositions
    word_dist = prepchoices[word].most_common(3)
    # convert [('with', 5), ('to', 4), ('about', 3)]
    # to ['with', 'to', 'about']
    word_dist_clean = [x[0] for x in word_dist]
    pc[word] = word_dist_clean

# Write the plain dict: a ConditionalFreqDist is not exported directly here.
with open('prepositions.json', 'w') as fp:
    json.dump(pc, fp)
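
To see what the script counts without downloading the corpus, here is a minimal standard-library sketch of the same idea. It is not the gist's code: the helper name `top_prepositions` and the `sample` data are hypothetical, and `collections.Counter` stands in for NLTK's `ConditionalFreqDist`. It assumes input in the same `(word, tag)` shape that `brown.tagged_words(tagset="universal")` yields.

```python
import json
from collections import Counter, defaultdict


def top_prepositions(tagged_words, n=3):
    """Map each verb to the n adpositions that most often follow it directly."""
    counts = defaultdict(Counter)
    # Pair every token with its successor, as nltk.bigrams does.
    for (word, tag), (next_word, next_tag) in zip(tagged_words, tagged_words[1:]):
        if tag == "VERB" and next_tag == "ADP":
            counts[word][next_word] += 1
    # Keep only the words, dropping the counts, as the script above does.
    return {verb: [prep for prep, _ in c.most_common(n)]
            for verb, c in counts.items()}


# Hand-made sample (hypothetical data, not taken from the Brown Corpus).
sample = [
    ("She", "PRON"), ("talked", "VERB"), ("to", "ADP"), ("him", "PRON"),
    ("and", "CONJ"), ("talked", "VERB"), ("to", "ADP"), ("me", "PRON"),
    ("then", "ADV"), ("talked", "VERB"), ("about", "ADP"), ("it", "PRON"),
    ("and", "CONJ"), ("looked", "VERB"), ("at", "ADP"), ("us", "PRON"),
]

pc = top_prepositions(sample)
# Same round trip as writing prepositions.json and reading it back.
restored = json.loads(json.dumps(pc))
print(restored["talked"])  # ['to', 'about']
```

The result is a plain dict of lists, so it survives the JSON round trip unchanged. Verbs are keyed by their exact surface form, so "Talked" and "talked" would be separate entries unless the words are lowercased first.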