@trylks
Last active May 29, 2016 11:52
import nltk
from nltk.probability import LidstoneProbDist
# NOTE: nltk.model.ngram is part of NLTK 2.x; the nltk.model package was removed in NLTK 3.
from nltk.model.ngram import NgramModel
import pandas as pd
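# Load the tweets; the CSV needs at least the 'user' and 'text' columns used below.
tweets = pd.read_csv('tweeets.csv')

def tokenize(x):
    """Split a text into word tokens; str() also covers non-string cells such as NaN."""
    return nltk.word_tokenize(str(x))

# Training corpus: one token list per tweet by the user 'trylks'.
train = [tokenize(text) for text in tweets[tweets.user == 'trylks']['text']]
# Sentence to score against the trained model.
text = tokenize("I think that the #Python library #nltk is great")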
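# Here is the good part: the estimator takes (fdist, bins); bins is ignored
# and Lidstone smoothing with gamma = 0.2 is applied to each distribution.
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
# Trigram model (n = 3) trained on the tokenized tweets.
model = NgramModel(3, train, estimator=estimator)
# Perplexity of the test sentence under the model; lower means a better fit.
perplexity = model.perplexity(text)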
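
For readers without NLTK 2.x at hand, the following is a minimal, dependency-free sketch of what the script above computes conceptually: a trigram model with Lidstone (additive) smoothing and the perplexity of a token list under it. It is not NLTK's implementation. The helper names (`train_trigrams`, `lidstone_prob`, `perplexity`), the `<s>`/`</s>` padding, the toy corpus, and the choice of `bins` as vocabulary size plus one are all assumptions made for illustration, so the numbers will not match `NgramModel.perplexity` exactly.

```python
import math
from collections import Counter


def train_trigrams(sentences):
    """Count trigrams and their two-token contexts over padded sentences."""
    trigram_counts = Counter()
    context_counts = Counter()
    vocab = set()
    for tokens in sentences:
        # Hypothetical padding symbols; NLTK's own padding may differ.
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts, vocab


def lidstone_prob(word, context, trigram_counts, context_counts, bins, gamma=0.2):
    """Lidstone estimate: (count + gamma) / (context_count + gamma * bins)."""
    numerator = trigram_counts[context + (word,)] + gamma
    denominator = context_counts[context] + gamma * bins
    return numerator / denominator


def perplexity(tokens, trigram_counts, context_counts, bins, gamma=0.2):
    """Return 2 ** (average negative log2 probability per predicted token)."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    log_sum = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        prob = lidstone_prob(padded[i], context, trigram_counts, context_counts, bins, gamma)
        log_sum += math.log2(prob)
    return 2 ** (-log_sum / (len(padded) - 2))


# Toy corpus standing in for the tokenized tweets.
corpus = [["nltk", "is", "great"], ["python", "is", "great"]]
tri, ctx, vocab = train_trigrams(corpus)
bins = len(vocab) + 1  # assumption: one extra bin reserved for unseen words

seen = perplexity(["nltk", "is", "great"], tri, ctx, bins)
unseen = perplexity(["java", "is", "slow"], tri, ctx, bins)
print(seen, unseen)  # the sentence that resembles the corpus scores lower
```

With gamma above zero, every trigram receives a nonzero probability, so the perplexity of a sentence containing unseen words stays finite instead of diverging as it would under an unsmoothed maximum-likelihood estimate.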