@dyerrington dyerrington/bi-gram
Created Apr 14, 2015

Find bi-grams, filter on frequency, return PMI
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

bigram_measures = BigramAssocMeasures()

# change this to read in your data
finder = BigramCollocationFinder.from_words(
    nltk.corpus.genesis.words('/var/www/htdocs/rapstats/data/albums/wutang_all.txt'))

# keep only bigrams that appear at least 2 times
# (apply_freq_filter drops anything seen fewer than min_freq times)
finder.apply_freq_filter(2)

# print the 10 bigrams with the highest PMI
print(finder.nbest(bigram_measures.pmi, 10))
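
For anyone curious what the PMI ranking is doing, here is a minimal pure-Python sketch of the same idea with no NLTK dependency. It assumes PMI is computed as log2 of (bigram count × total words) / (count of first word × count of second word), which is my understanding of the measure rather than a copy of NLTK's implementation. The function name `top_pmi_bigrams`, its parameters, and the sample sentence are made up for illustration.

```python
import math
from collections import Counter


def top_pmi_bigrams(words, min_freq=2, n=10):
    """Return up to n (bigram, pmi) pairs, highest PMI first.

    Hypothetical helper for illustration; not part of NLTK.
    """
    word_counts = Counter(words)
    bigram_counts = Counter(zip(words, words[1:]))
    total = len(words)

    scored = []
    for (w1, w2), count in bigram_counts.items():
        # frequency filter: skip bigrams seen fewer than min_freq times
        if count < min_freq:
            continue
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), using raw counts
        pmi = math.log2(count * total / (word_counts[w1] * word_counts[w2]))
        scored.append(((w1, w2), pmi))

    # highest PMI first; ties broken alphabetically for a stable order
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n]


# made-up sample text, only to show the call
sample = "wu tang clan aint nothing to mess with wu tang clan".split()
result = top_pmi_bigrams(sample, min_freq=2)
for bigram, score in result:
    print(bigram, round(score, 3))
```

The frequency filter matters because PMI strongly favours rare pairs: a bigram whose two words each occur only once gets the maximum possible score, so without a minimum count the top of the list tends to be noise.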