@yono
Created November 11, 2009 11:33
Analyzes the text in a given file and returns each word with its frequency of occurrence.
#!/usr/bin/env python
# -*- coding:utf-8 -*-
"""
feature_vector.py

Analyzes the text in a given file and
returns each word with its frequency of occurrence.

Usage (from the command line):
    % python feature_vector.py file

Usage 2 (from a Python script):
    import feature_vector
    text = '単語とその出現頻度を返す'
    result = feature_vector.analyse(text)
"""
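import MeCab


def analyse(text):
    """Return a dict mapping each word (surface form) in text to its count."""
    mecab = MeCab.Tagger()  # create the MeCab tagger instance
    feature_vector = {}  # dictionary that holds the result
    node = mecab.parseToNode(text)  # run the analysis
    while node:
        # Under Python 3, node.surface is already a str, so no decode is needed.
        surface = node.surface
        # the word and its information (part of speech, etc.)
        print(surface, node.feature)
        if surface:  # skip the empty BOS/EOS marker nodes
            feature_vector[surface] = feature_vector.get(surface, 0) + 1  # count the word
        node = node.next
    return feature_vector


if __name__ == '__main__':
    import sys
    filename = sys.argv[1]
    # Read the whole file as UTF-8 text (assumed encoding of the input).
    with open(filename, encoding='utf-8') as f:
        text = f.read()
    feature_vector = analyse(text)
    for word, freq in feature_vector.items():
        print("%s\t%d" % (word, freq))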
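
The script prints the words in whatever order the dictionary yields them. If a ranked list is more useful, the result can be sorted by count first. The following is only a minimal sketch: `top_words` is a hypothetical helper that is not part of the original script, and `sample` is a hand-written stand-in for the dictionary that `analyse()` returns, so the snippet runs without MeCab installed.

```python
def top_words(feature_vector, n=10):
    """Return the n most frequent (word, count) pairs, highest count first."""
    # Sort by descending count, then by word, so ties come out in a fixed order.
    ranked = sorted(feature_vector.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hand-written stand-in for the dictionary returned by analyse().
sample = {'単語': 3, 'と': 1, '頻度': 2, 'を': 1}
for word, freq in top_words(sample, n=2):
    print("%s\t%d" % (word, freq))
```

Sorting on the pair `(-count, word)` keeps the output deterministic when several words share the same count.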