@AlexDel
Created January 10, 2012 15:10
NLTK, Exercise 2.18: Write a program to print the 50 most frequent bigrams (pairs of adjacent words) of a text, omitting bigrams that contain stopwords.
import nltk  # needs the stopwords corpus: nltk.download('stopwords')

def top_bigrams(text, n=50):
    stopwords = set(nltk.corpus.stopwords.words('english'))  # build the stoplist
    # keep a bigram only if both elements are alphabetic words not in the stoplist (case-insensitive)
    bigrams = [(x, y) for x, y in nltk.bigrams(text)
               if x.isalpha() and y.isalpha()
               and x.lower() not in stopwords and y.lower() not in stopwords]
    fdist = nltk.FreqDist(bigrams)  # count the remaining bigrams
    return [bigram for bigram, _ in fdist.most_common(n)]  # the n most frequent, highest first
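The same filter-then-count idea can be sketched with only the standard library, which may help when NLTK or its corpora are not installed. This is a minimal sketch, not the gist author's code: `zip(words, words[1:])` stands in for `nltk.bigrams`, `collections.Counter` stands in for `FreqDist`, and `STOPWORDS` is a hypothetical, deliberately tiny stoplist rather than NLTK's English list.

```python
from collections import Counter

# Hypothetical, tiny stoplist for illustration only; NLTK's English list is much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "it"}

def top_bigrams(words, n=50, stopwords=STOPWORDS):
    """Return the n most frequent adjacent word pairs in the list `words`,
    skipping any pair that contains a stopword or a non-alphabetic token."""
    pairs = zip(words, words[1:])  # adjacent pairs, analogous to nltk.bigrams
    counts = Counter(
        (x, y) for x, y in pairs
        if x.isalpha() and y.isalpha()
        and x.lower() not in stopwords and y.lower() not in stopwords
    )
    # most_common sorts by count, highest first; drop the counts to return bare pairs
    return [pair for pair, _ in counts.most_common(n)]

text = "the quick fox saw the quick fox and the lazy dog saw a quick fox".split()
print(top_bigrams(text, n=3))
```

Filtering before counting gives the same ranking as counting everything and filtering afterwards, but it keeps stopword pairs such as `('the', 'quick')` out of the frequency table entirely, so `most_common(n)` can be used directly.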