
@hujuu
Last active August 19, 2018 13:13
【R】【MeCab】Installing RMeCab and running morphological analysis ref: https://qiita.com/hujuu/items/314a64a50875cdabf755
# Check that Homebrew is in a healthy state before installing MeCab
$ brew doctor
Your system is ready to brew.
# If brew doctor finds problems, it lists warnings and ends with this note:
Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry and just ignore them. Thanks!
# Install MeCab and the IPA dictionary (IPAdic)
$ brew install mecab
$ brew install mecab-ipadic
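# RMeCab is installed from the author's repository rather than from CRAN
install.packages("RMeCab", repos = "http://rmecab.jp/R")
library(RMeCab)
# Split a sample sentence into morphemes (名詞 = noun, 助詞 = particle)
res <- RMeCabC("すもももももももものうち")
unlist(res)
#> 名詞 助詞 名詞 助詞 名詞 助詞 名詞
#> "すもも" "も" "もも" "も" "もも" "の" "うち"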
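Because every morpheme is named by its part of speech, filtering by POS only takes a comparison on `names()`. This is a minimal sketch. It assumes, as the `unlist(res)` output above suggests, that `RMeCabC()` returns a list of one-element character vectors named by part of speech. The list is rebuilt by hand as a hypothetical stand-in so the snippet runs without MeCab installed.

```r
# Hypothetical stand-in for RMeCabC("すもももももももものうち"), rebuilt by hand
# from the unlist() output so this runs without MeCab installed.
res <- list(c("名詞" = "すもも"), c("助詞" = "も"), c("名詞" = "もも"),
            c("助詞" = "も"), c("名詞" = "もも"), c("助詞" = "の"),
            c("名詞" = "うち"))

# unlist() keeps each element's name, i.e. its part of speech
morphemes <- unlist(res)

# Keep only the nouns (名詞) and drop the POS names
nouns <- unname(morphemes[names(morphemes) == "名詞"])
print(nouns)

# Count how often each noun occurs
print(table(nouns))
```

Here `nouns` holds すもも, もも, もも and うち, and the table counts もも twice.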
library(RMeCab)
library(ggplot2)
# Read the text to analyze and tabulate its term frequencies
res <- RMeCabFreq("steve-jobs-speech.txt")
# Extract only the nouns (名詞, column 2) into the data frame res_noun
res_noun <- res[res[,2]=="名詞",]
# Keep only nouns that appear at least twice (res[,4] is the "Freq" column) and count them
res_noun <- res[res[,2]=="名詞" & res[,4] > 1,]
nrow(res_noun)
# Sort res_noun by Freq in descending order
res_noun[order(res_noun$Freq, decreasing = TRUE),]
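You can try the filtering and sorting steps above without MeCab by running them on a small hand-built table. This sketch assumes `RMeCabFreq()` returns a data frame with the columns Term, Info1 (part of speech), Info2 and Freq, which matches the `res[,2]`, `res[,4]` and `$Freq` indexing in the script. The rows are a hypothetical stand-in that uses the counts from the sample sentence すもももももももものうち. Selecting columns by name instead of by position makes each condition easier to read.

```r
# Hypothetical stand-in for RMeCabFreq() output, built by hand so this runs
# without MeCab. Assumed column layout: Term, Info1 (POS), Info2, Freq.
# Counts are those of the sample sentence すもももももももものうち.
res <- data.frame(
  Term  = c("うち", "すもも", "もも", "の", "も"),
  Info1 = c("名詞", "名詞", "名詞", "助詞", "助詞"),
  Info2 = rep("*", 5),  # placeholder values, not used here
  Freq  = c(1, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# All nouns, selected by column name rather than position
nouns <- res[res$Info1 == "名詞", ]

# Number of nouns that occur at least twice (only もも here)
n_repeated <- nrow(nouns[nouns$Freq > 1, ])

# Nouns sorted by frequency, most frequent first
nouns_sorted <- nouns[order(nouns$Freq, decreasing = TRUE), ]

print(n_repeated)
print(nouns_sorted)
```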
# Build a data frame from column 1 (the term) and column 4 (its frequency)
res_noun2 <- data.frame(word=as.character(res_noun[,1]),
                        freq=res_noun[,4])
# Keep the top 25 words by frequency
res_noun2 <- subset(res_noun2, rank(-freq) <= 25)
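Selecting the "top N" rows with `rank()` has two subtleties. First, a strict `rank(-freq) < n` keeps only n - 1 words when there are no ties. Second, `rank()` averages tied ranks by default (`ties.method = "average"`), so words tied at the cut-off can drop out together. The minimal sketch below compares three options on a hypothetical toy table (the words and counts are made up for illustration).

```r
# Hypothetical word-frequency table (toy values for illustration only)
freqs <- data.frame(word = c("a", "b", "c", "d", "e"),
                    freq = c(5, 4, 3, 3, 1),
                    stringsAsFactors = FALSE)
n <- 3

# Strict "<" with default average ranks (1, 2, 3.5, 3.5, 5): only a and b
strict <- subset(freqs, rank(-freq) < n)

# "<=" with ties.method = "min" (ranks 1, 2, 3, 3, 5): c and d tie at rank 3,
# so both are kept and the result has 4 rows
with_ties <- subset(freqs, rank(-freq, ties.method = "min") <= n)

# Exactly n rows: sort by freq, then take the first n (one tied word is cut)
exact_n <- head(freqs[order(freqs$freq, decreasing = TRUE), ], n)

print(strict)
print(with_ties)
print(exact_n)
```

Which variant fits depends on whether the chart should show exactly N bars or every word tied at the boundary.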
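# Draw a horizontal bar chart of the noun frequencies with ggplot
ggplot(res_noun2, aes(x=reorder(word,freq), y=freq)) +
  geom_bar(stat = "identity", fill="grey") +
  # HiraKakuProN-W3 (Hiragino Kaku Gothic ProN) is a macOS font that renders Japanese labels
  theme_bw(base_size = 10, base_family = "HiraKakuProN-W3") +
  coord_flip()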