@AlexDel
Created January 10, 2012 12:45
Genre diversity score. Computes lexical diversity (distinct words divided by total words) for each genre in the Brown corpus and prints the results. NLTK Exercise 2.16
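# __future__ imports must be the first statement in the file
from __future__ import division, print_function
from nltk.corpus import brown  # needs the corpus data: nltk.download('brown')

# Lexical diversity per genre: distinct tokens / total tokens, as a percentage
for genre in brown.categories():
    words = brown.words(categories=genre)
    diversity = len(set(words)) / len(words)
    print('%s - %.4f%%' % (genre, diversity * 100))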
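
The ratio computed in the loop can also be pulled out into a small helper and checked on a toy token list without downloading the corpus. This is only a sketch: the name `lexical_diversity` and the sample sentence are illustrative assumptions, not part of the original gist. Like the gist, it compares tokens case-sensitively, so "The" and "the" count as two distinct words.

```python
from __future__ import division


def lexical_diversity(tokens):
    """Return distinct tokens / total tokens, or 0.0 for empty input.

    Hypothetical helper for illustration; not part of the original gist.
    """
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy input: 6 tokens, 5 distinct ("the" appears twice)
sample = "the cat sat on the mat".split()
print("%.4f%%" % (lexical_diversity(sample) * 100))  # prints 83.3333%
```

Guarding against an empty token list avoids a ZeroDivisionError if a category filter ever returns no words.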