Created January 10, 2012 15:10
NLTK, Ex. 2.18: Write a program to print the 50 most frequent bigrams (pairs of adjacent words) of a text, omitting bigrams that contain stopwords.
import nltk

def top_bigrams(text, n=50):
    # Count every bigram (tuple of two adjacent tokens) in the text.
    fdist = nltk.probability.FreqDist(nltk.bigrams(text))
    # Build the English stopword list (as a set, for fast membership tests).
    stopwords = set(nltk.corpus.stopwords.words('english'))
    # Keep a bigram only if both elements are alphabetic words and neither is a stopword; most frequent first.
    top_list = [(x, y) for (x, y), _ in fdist.most_common() if x.isalpha() and y.isalpha() and x.lower() not in stopwords and y.lower() not in stopwords]
    return top_list[:n]
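
The same count-then-filter idea can be sketched without NLTK, which may help when checking the logic on a small input. This is only an illustration under stated assumptions: `top_bigrams_plain` and `DEMO_STOPWORDS` are hypothetical names, not part of NLTK or of the gist, and the stopword set is a tiny stand-in for NLTK's English list. The input is assumed to be a list of tokens.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK's English stopword list is much longer.
DEMO_STOPWORDS = {"the", "a", "of", "on", "and"}

def top_bigrams_plain(tokens, n=50, stopwords=DEMO_STOPWORDS):
    """Return up to n (bigram, count) pairs, most frequent first,
    skipping bigrams with a stopword or a non-alphabetic token."""
    # zip(tokens, tokens[1:]) pairs each token with its successor.
    counts = Counter(zip(tokens, tokens[1:]))
    kept = [
        (pair, freq)
        for pair, freq in counts.most_common()
        if all(w.isalpha() and w.lower() not in stopwords for w in pair)
    ]
    return kept[:n]

tokens = "the black cat sat on the mat and the black cat slept".split()
print(top_bigrams_plain(tokens, n=3))
```

Filtering after counting, rather than before, keeps the bigrams faithful to the original word order: dropping stopwords first would create pairs of words that were never adjacent in the text.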