@ikegami-yukino
Created July 31, 2015 05:00
Japanese Lexical Density
"""
Lexical Density
http://web.archive.org/web/20110810174351/http://www.unisanet.unisa.edu.au/Resources/la/Readability/Content%20words%20and%20lexical%20density.htm
"""
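from __future__ import division

import MeCab

# Parts of speech counted as content words: noun, verb, adjective, adverb.
CONTENT_WORD_POS = ('名詞', '動詞', '形容詞', '副詞')


def compute_lexical_density(sentence):
    """Return the ratio of content words to all tokens in `sentence`."""
    t = MeCab.Tagger()
    n = t.parseToNode(sentence)
    content_words = 0
    total = 0
    while n:
        # Skip the sentence-boundary nodes MeCab adds at both ends.
        if not n.feature.startswith('BOS/EOS'):
            # Count independent content words only; ',非自立,' marks dependent ones.
            if n.feature.startswith(CONTENT_WORD_POS) and ',非自立,' not in n.feature:
                content_words += 1
            total += 1
        n = n.next
    # Guard against empty input, which would otherwise divide by zero.
    return content_words / total if total else 0.0


print(compute_lexical_density("もう何もこわくない"))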
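
The counting rule can also be tried without MeCab installed. The following is a minimal sketch, assuming Python 3: `lexical_density_from_features` is a hypothetical helper (not part of the gist) that applies the same filter to a list of feature strings. The sample features are hand-written in an IPAdic-like layout for illustration and are not real MeCab output; an actual tagger may segment or tag the sentence differently.

```python
# Same content-word categories as the gist: noun, verb, adjective, adverb.
CONTENT_WORD_POS = ('名詞', '動詞', '形容詞', '副詞')


def lexical_density_from_features(features):
    """Hypothetical helper: content-word ratio over MeCab-style feature strings."""
    content_words = 0
    total = 0
    for feature in features:
        # Sentence-boundary markers are not tokens; leave them out of both counts.
        if feature.startswith('BOS/EOS'):
            continue
        total += 1
        # Count a token only if it is an independent content word.
        if feature.startswith(CONTENT_WORD_POS) and ',非自立,' not in feature:
            content_words += 1
    return content_words / total if total else 0.0


# Hand-written, illustrative features (assumption: not actual tagger output).
sample = [
    'BOS/EOS,*,*,*,*,*,*',
    '副詞,一般,*,*,*,*,もう',
    '名詞,代名詞,一般,*,*,*,何',
    '助詞,係助詞,*,*,*,*,も',
    '形容詞,自立,*,*,*,*,こわい',
    '助動詞,*,*,*,*,*,ない',
    'BOS/EOS,*,*,*,*,*,*',
]

print(lexical_density_from_features(sample))  # 3 content words / 5 tokens -> 0.6
```

Separating the counting from the tagging this way makes the rule easy to check by hand: in the sample, the adverb, the pronoun, and the adjective count as content words, while the particle and the auxiliary do not.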