@takehiko
Created March 2, 2011 20:49
kc.rb: Japanese Educational Kanji Checker
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
if RUBY_VERSION < "1.9"
  # Ruby 1.8 needs $KCODE so that regexps and split(//) treat UTF-8 text as characters
  $KCODE = "utf8"
end
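# kc.rb: counts how often each kanji appears in the input.
# It can also find kanji that are not educational kanji (kyoiku kanji, also
# called learning kanji or elementary-school kanji).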
class KanjiChecker
  def initialize(filename = nil)
    @filename = filename # file name string; if nil, read from standard input
    @freq_display_method = 1 # output format used by update_line (1, 2 or other)
    @use_grep = true # true to print where each out-of-list kanji occurs
    @kanji_grade = {} # '一' => 1, ...
    @kanji_freq = Hash.new(0) # frequency of each character in the input
    setup_kanji
  end
  def start
    analyze
    report
  end
  private
  def setup_kanji
    # Kanji allocation table by grade (学年別漢字配当表): http://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
    kanji1 = '一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石赤千川先早草足村大男竹中虫町天田土二日入年白八百文木本名目立力林六'
    kanji2 = '引羽雲園遠何科夏家歌画回会海絵外角楽活間丸岩顔汽記帰弓牛魚京強教近兄形計元言原戸古午後語工公広交光考行高黄合谷国黒今才細作算止市矢姉思紙寺自時室社弱首秋週春書少場色食心新親図数西声星晴切雪船線前組走多太体台地池知茶昼長鳥朝直通弟店点電刀冬当東答頭同道読内南肉馬売買麦半番父風分聞米歩母方北毎妹万明鳴毛門夜野友用曜来里理話'
    kanji3 = '悪安暗医委意育員院飲運泳駅央横屋温化荷開界階寒感漢館岸起期客究急級宮球去橋業曲局銀区苦具君係軽血決研県庫湖向幸港号根祭皿仕死使始指歯詩次事持式実写者主守取酒受州拾終習集住重宿所暑助昭消商章勝乗植申身神真深進世整昔全相送想息速族他打対待代第題炭短談着注柱丁帳調追定庭笛鉄転都度投豆島湯登等動童農波配倍箱畑発反坂板皮悲美鼻筆氷表秒病品負部服福物平返勉放味命面問役薬由油有遊予羊洋葉陽様落流旅両緑礼列練路和'
    kanji4 = '愛案以衣位囲胃印英栄塩億加果貨課芽改械害各覚街完官管関観願希季紀喜旗器機議求泣救給挙漁共協鏡競極訓軍郡径型景芸欠結建健験固功好候航康告差菜最材昨札刷殺察参産散残士氏史司試児治辞失借種周祝順初松笑唱焼象照賞臣信成省清静席積折節説浅戦選然争倉巣束側続卒孫帯隊達単置仲貯兆腸低底停的典伝徒努灯堂働特得毒熱念敗梅博飯飛費必票標不夫付府副粉兵別辺変便包法望牧末満未脈民無約勇要養浴利陸良料量輪類令冷例歴連老労録'
    kanji5 = '圧移因永営衛易益液演応往桜恩可仮価河過快賀解格確額刊幹慣眼基寄規技義逆久旧居許境均禁句群経潔件券険検限現減故個護効厚耕鉱構興講混査再災妻採際在財罪雑酸賛支志枝師資飼示似識質舎謝授修述術準序招承証条状常情織職制性政勢精製税責績接設舌絶銭祖素総造像増則測属率損退貸態団断築張提程適敵統銅導徳独任燃能破犯判版比肥非備俵評貧布婦富武復複仏編弁保墓報豊防貿暴務夢迷綿輸余預容略留領'
    kanji6 = '異遺域宇映延沿我灰拡革閣割株干巻看簡危机貴揮疑吸供胸郷勤筋系敬警劇激穴絹権憲源厳己呼誤后孝皇紅降鋼刻穀骨困砂座済裁策冊蚕至私姿視詞誌磁射捨尺若樹収宗就衆従縦縮熟純処署諸除将傷障城蒸針仁垂推寸盛聖誠宣専泉洗染善奏窓創装層操蔵臓存尊宅担探誕段暖値宙忠著庁頂潮賃痛展討党糖届難乳認納脳派拝背肺俳班晩否批秘腹奮並陛閉片補暮宝訪亡忘棒枚幕密盟模訳郵優幼欲翌乱卵覧裏律臨朗論'
    add_grade(1, kanji1)
    add_grade(2, kanji2)
    add_grade(3, kanji3)
    add_grade(4, kanji4)
    add_grade(5, kanji5)
    add_grade(6, kanji6)
    # One array of single-kanji strings per grade, walked in order by report
    @kanji_array = [kanji1, kanji2, kanji3, kanji4, kanji5, kanji6].map {|str| str.split(//)}
  end
  # Record the grade of every kanji in str
  def add_grade(grade, str)
    str.split(//).each do |c|
      @kanji_grade[c] = grade
    end
  end
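  def analyze
    # Read the named file, or standard input when no file is given.
    # File.open (unlike Kernel#open) never treats a name starting with "|"
    # as a command to run.
    f_in = @filename ? File.open(@filename) : $stdin
    begin
      f_in.each_line do |line0|
        # Keep only kanji in the range 一 (U+4E00) to 龠 (U+9FA0)
        line = line0.gsub(/[^一-龠]/, '')
        line.split(//).each do |c|
          @kanji_freq[c] += 1
        end
      end
    ensure
      # Close the file even if reading fails; leave $stdin open
      f_in.close if @filename
    end
  end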
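  # Append one kanji and its frequency to line in the format chosen by
  # @freq_display_method. Returns the updated line, the running total of
  # occurrences (freq_all) and the number of distinct kanji so far (sort).
  def update_line(line, freq_all, sort, kanji, freq)
    case @freq_display_method
    when 1
      line += kanji * freq                # repeat the kanji freq times
    when 2
      line += "%s(%d) " % [kanji, freq]   # "kanji(freq) " on one line
    else
      line += "\n %s(%d)" % [kanji, freq] # one "kanji(freq)" per line
    end
    freq_all += freq
    sort += 1
    return line, freq_all, sort
  end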
  def report
    # One line per grade: "N年(K種M字): ..." means grade N, K distinct kanji,
    # M occurrences in total. Kanji found in a grade are deleted from
    # @kanji_freq, so whatever remains afterwards is out-of-list.
    @kanji_array.each_with_index do |kanji_grade_array, i|
      grade = i + 1
      line = ""
      freq = 0
      sort = 0
      kanji_grade_array.each do |c|
        if @kanji_freq[c] > 0
          line, freq, sort = update_line(line, freq, sort, c, @kanji_freq[c])
          @kanji_freq.delete(c)
        end
      end
      line = "#{grade}年(#{sort}種#{freq}字): " + line
      puts line
    end
    # Kanji not assigned to any grade (配当外, "out-of-list")
    line = ""
    freq = 0
    sort = 0
    kanji_outside = @kanji_freq.keys.sort
    kanji_outside.each do |c|
      line, freq, sort = update_line(line, freq, sort, c, @kanji_freq[c])
      # @kanji_freq.delete(c)
    end
    line = "配当外(#{sort}種#{freq}字): " + line
    puts line
    if @use_grep && @filename && !kanji_outside.empty?
      report_grep_outside(kanji_outside)
    end
  end
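  # Show every line of the input file that contains each out-of-list kanji
  def report_grep_outside(kanji_outside)
    puts
    kanji_outside.each do |c|
      grep(c)
    end
  end
  def grep(c)
    puts "grep -n #{c} #{@filename}"
    # Pass the arguments separately so no shell interprets the file name
    # (spaces, quotes, ";" and so on)
    system("grep", "-n", c, @filename)
  end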
end
# Usage: ruby kc.rb [FILE]  (reads standard input when FILE is omitted)
if __FILE__ == $0
  KanjiChecker.new(ARGV.shift).start
end
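The counting step in `analyze` rests on two idioms: a regexp that deletes every character outside 一..龠, and a `Hash.new(0)` whose missing keys read as 0, so `+= 1` needs no initialization. A minimal standalone sketch of that step, assuming Ruby 1.9 or later; `count_kanji` is a hypothetical helper (not part of kc.rb), and it uses `each_char` where the script uses `split(//)`:

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: count kanji (一..龠) in text, like KanjiChecker#analyze.
def count_kanji(text)
  freq = Hash.new(0)                   # missing keys read as 0
  text.each_line do |line|
    # drop kana, punctuation, ASCII, etc., then tally each remaining kanji
    line.gsub(/[^一-龠]/, '').each_char { |c| freq[c] += 1 }
  end
  freq
end

freq = count_kanji("山と川。山の上に木。\n")
# freq["山"] is 2; 川, 上 and 木 are 1; kana and punctuation are not counted
```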
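The grade lookup can be sketched the same way. This is a variant of what `setup_kanji`, `add_grade` and `report` do, not the script's own code: it builds a character-to-grade hash and then splits the counted kanji into per-grade groups and out-of-list kanji by lookup, instead of walking each grade list and deleting matches from `@kanji_freq`. `GRADE_LISTS`, `build_grade_table` and `classify` are hypothetical names, and the grade lists are cut down to a few characters for illustration:

```ruby
# -*- coding: utf-8 -*-
# Hypothetical, heavily truncated grade lists (kc.rb uses the full
# 学年別漢字配当表 strings for grades 1-6).
GRADE_LISTS = {
  1 => '一右雨山川',
  2 => '引羽雲',
}

# Map each kanji to its grade, as add_grade does.
def build_grade_table(lists)
  table = {}
  lists.each do |grade, str|
    str.each_char { |c| table[c] = grade }
  end
  table
end

# Split the kanji of a frequency hash into per-grade groups plus
# the kanji that belong to no grade (out-of-list).
def classify(freq, table)
  by_grade = Hash.new { |h, k| h[k] = [] }
  outside = []
  freq.keys.sort.each do |c|
    if (g = table[c])
      by_grade[g] << c
    else
      outside << c
    end
  end
  [by_grade, outside]
end

table = build_grade_table(GRADE_LISTS)
by_grade, outside = classify({ '山' => 2, '雲' => 1, '魔' => 1 }, table)
# by_grade[1] is ["山"], by_grade[2] is ["雲"], outside is ["魔"]
```

A single hash lookup per kanji keeps the cost proportional to the number of distinct kanji in the input rather than to the size of the grade lists.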
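The three output formats of `update_line`, and the header that `report` puts in front of each line, reduce to a few string operations. A sketch assuming Ruby 1.9 or later; `format_freqs` and `summary_line` are hypothetical helpers that take `[kanji, count]` pairs instead of threading `line`, `freq` and `sort` through repeated calls:

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: the formats selected by @freq_display_method.
def format_freqs(pairs, method)
  pairs.map do |kanji, freq|
    case method
    when 1 then kanji * freq               # repeat the kanji: "山山"
    when 2 then "%s(%d) " % [kanji, freq]  # inline: "山(2) "
    else "\n %s(%d)" % [kanji, freq]       # one entry per line
    end
  end.join
end

# Hypothetical helper: header as built in #report
# (種 = distinct kanji, 字 = total occurrences).
def summary_line(label, pairs, method = 1)
  kinds = pairs.size
  total = pairs.inject(0) { |sum, (_, n)| sum + n }
  "#{label}(#{kinds}種#{total}字): " + format_freqs(pairs, method)
end

pairs = [['山', 2], ['川', 1]]
format_freqs(pairs, 1)     # => "山山川"
format_freqs(pairs, 2)     # => "山(2) 川(1) "
summary_line('1年', pairs) # => "1年(2種3字): 山山川"
```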
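The out-of-list report runs the external `grep -n` once per kanji. The same "line number:line" listing can be produced inside Ruby without spawning a process. A sketch assuming Ruby 1.9 or later; `grep_lines` is a hypothetical helper, and the `StringIO` sample stands in for the opened input file:

```ruby
# -*- coding: utf-8 -*-
require 'stringio'

# Hypothetical helper: like `grep -n`, return "lineno:line" for each line
# of io that contains char, without running an external command.
def grep_lines(io, char)
  hits = []
  io.each_with_index do |line, i|
    hits << "#{i + 1}:#{line.chomp}" if line.include?(char)
  end
  hits
end

# Hypothetical sample input; in kc.rb this would be File.open(@filename).
text = StringIO.new("山に登る。\n魔法の本。\n魔女。\n")
hits = grep_lines(text, '魔')
# hits is ["2:魔法の本。", "3:魔女。"]
```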