@takehiko
Created April 9, 2020 12:42
Find the kanji whose glyph shape matches each radical, using the MJ character information (MJ文字情報)
#!/usr/bin/env ruby
# radicalfinder.rb : find the kanji whose glyph shape matches each radical
# by takehikom
require "roo" # gem install roo

class Radicalfinder
  def initialize(opt = {})
    @opt_kyoikukanji_only = opt[:kyoiku]
    @filename_xlsx = "mji.00601.xlsx"
    @rad_h = {} # Radical number => character
  end

  def start
    scan_sheet
    if @opt_kyoikukanji_only
      setup_kyoikukanji
      find_radical(@kyoikukanji)
    else
      find_radical(@allkanji)
    end
    print_radical
  end

  # Load the first worksheet and index every row by its character (column 1).
  def scan_sheet
    unless File.file?(@filename_xlsx)
      puts "#{@filename_xlsx} not found."
      puts "Download the file via https://mojikiban.ipa.go.jp/1311.html"
      exit 1
    end
    puts "scanning #{@filename_xlsx}..."
    @xlsx = Roo::Spreadsheet.open(@filename_xlsx)
    @sheet = @xlsx.sheet(0)
    @char_h = {} # "一" => [row of "一"] + [pos]
    @allkanji = +"" # mutable buffer of every character seen
    @sheet.each_with_index do |row, i|
      c = row[1]
      # Skip the header row ("font") and entries marked "実装なし" (not implemented).
      # The nil check is a defensive guard against blank cells.
      next if c.nil? || c == "font" || c == "実装なし"
      @allkanji << c
      row << i
      @char_h[c] = row
    end
    puts "done!"
    @char_h
  end

  # Build the kyoiku kanji list (kanji taught in Japanese elementary school), one string per grade.
  def setup_kyoikukanji
kanji1 = '一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石赤千川先早草足村大男竹中虫町天田土二日入年白八百文木本名目立力林六'
kanji2 = '引羽雲園遠何科夏家歌画回会海絵外角楽活間丸岩顔汽記帰弓牛魚京強教近兄形計元言原戸古午後語工公広交光考行高黄合谷国黒今才細作算止市矢姉思紙寺自時室社弱首秋週春書少場色食心新親図数西声星晴切雪船線前組走多太体台地池知茶昼長鳥朝直通弟店点電刀冬当東答頭同道読内南肉馬売買麦半番父風分聞米歩母方北毎妹万明鳴毛門夜野友用曜来里理話'
kanji3 = '悪安暗医委意育員院飲運泳駅央横屋温化荷開界階寒感漢館岸起期客究急級宮球去橋業曲局銀区苦具君係軽血決研県庫湖向幸港号根祭皿仕死使始指歯詩次事持式実写者主守取酒受州拾終習集住重宿所暑助昭消商章勝乗植申身神真深進世整昔全相送想息速族他打対待代第題炭短談着注柱丁帳調追定庭笛鉄転都度投豆島湯登等動童農波配倍箱畑発反坂板皮悲美鼻筆氷表秒病品負部服福物平返勉放味命面問役薬由油有遊予羊洋葉陽様落流旅両緑礼列練路和'
kanji4 = '愛案以衣位茨印英栄媛塩岡億加果貨課芽賀改械害街各覚潟完官管関観願岐希季旗器機議求泣給挙漁共協鏡競極熊訓軍郡群径景芸欠結建健験固功好香候康佐差菜最埼材崎昨札刷察参産散残氏司試児治滋辞鹿失借種周祝順初松笑唱焼照城縄臣信井成省清静席積折節説浅戦選然争倉巣束側続卒孫帯隊達単置仲沖兆低底的典伝徒努灯働特徳栃奈梨熱念敗梅博阪飯飛必票標不夫付府阜富副兵別辺変便包法望牧末満未民無約勇要養浴利陸良料量輪類令冷例連老労録'
kanji5 = '圧囲移因永営衛易益液演応往桜可仮価河過快解格確額刊幹慣眼紀基寄規喜技義逆久旧救居許境均禁句型経潔件険検限現減故個護効厚耕航鉱構興講告混査再災妻採際在財罪殺雑酸賛士支史志枝師資飼示似識質舎謝授修述術準序招証象賞条状常情織職制性政勢精製税責績接設絶祖素総造像増則測属率損貸態団断築貯張停提程適統堂銅導得毒独任燃能破犯判版比肥非費備評貧布婦武復複仏粉編弁保墓報豊防貿暴脈務夢迷綿輸余容略留領歴'
kanji6 = '胃異遺域宇映延沿恩我灰拡革閣割株干巻看簡危机揮貴疑吸供胸郷勤筋系敬警劇激穴券絹権憲源厳己呼誤后孝皇紅降鋼刻穀骨困砂座済裁策冊蚕至私姿視詞誌磁射捨尺若樹収宗就衆従縦縮熟純処署諸除承将傷障蒸針仁垂推寸盛聖誠舌宣専泉洗染銭善奏窓創装層操蔵臓存尊退宅担探誕段暖値宙忠著庁頂腸潮賃痛敵展討党糖届難乳認納脳派拝背肺俳班晩否批秘俵腹奮並陛閉片補暮宝訪亡忘棒枚幕密盟模訳郵優預幼欲翌乱卵覧裏律臨朗論'
@kyoikukanji = kanji1 + kanji2 + kanji3 + kanji4 + kanji5 + kanji6
  end

  # Kangxi radical number (1..214) => radical character in U+2F00..U+2FD5.
  def radical(code)
    (0x2f00 - 1 + code).chr('UTF-8')
  end

  def find_radical(k = @allkanji)
    k.each_char do |c|
      if @char_h.key?(c)
        r = @char_h[c]
        radical_detail_a = []
        [19, 21, 23, 25].each do |i|
          # Look at the pairs radical 1 / residual strokes 1, ..., radical 4 / residual strokes 4.
          if r[i + 1] == 0
            radical_detail_a << "#{radical(r[i])} (U+#{(0x2f00 - 1 + r[i]).to_s(16).upcase})"
            # For example, the 0-stroke kanji under the 田 radical include "由" and "申" as well as "田".
            # In that case, pick the kanji that appears earliest in the Excel sheet.
            if !@rad_h.key?(r[i]) || r[-1] < @char_h[@rad_h[r[i]]][-1]
              @rad_h[r[i]] = c
            end
          end
        end
        puts "#{c} (#{r[3]}; \##{r.last + 1})" +
             (radical_detail_a.empty? ? "" : " => ") +
             radical_detail_a.join(", ")
      else
        puts "#{c} not found"
      end
    end
  end

  # Print one "code point | radical | matching kanji" line per Kangxi radical.
  def print_radical
    1.upto(214) do |i|
      u = "U+#{(0x2f00 - 1 + i).to_s(16).upcase}"
      puts [u, radical(i), @rad_h[i]].join(' | ')
    end
  end
end

if __FILE__ == $0
  Radicalfinder.new.start
  # Radicalfinder.new(kyoiku: true).start # kyoiku kanji only
end
=begin
U+2F00 | ⼀ | 一
U+2F01 | ⼁ | 丨
U+2F02 | ⼂ | 丶
U+2F03 | ⼃ | 丿
U+2F04 | ⼄ | 乙
U+2F05 | ⼅ | 亅
U+2F06 | ⼆ | 二
U+2F07 | ⼇ | 亠
U+2F08 | ⼈ | 人
U+2F09 | ⼉ | 儿
U+2F0A | ⼊ | 両
U+2F0B | ⼋ | 八
U+2F0C | ⼌ | 冂
U+2F0D | ⼍ | 冖
U+2F0E | ⼎ | 冫
U+2F0F | ⼏ | 几
U+2F10 | ⼐ | 凵
U+2F11 | ⼑ | 刀
U+2F12 | ⼒ | 力
U+2F13 | ⼓ | 勹
U+2F14 | ⼔ | 业
U+2F15 | ⼕ | 匚
U+2F16 | ⼖ | 匸
U+2F17 | ⼗ | 十
U+2F18 | ⼘ | 卜
U+2F19 | ⼙ | 㔾
U+2F1A | ⼚ | 厂
U+2F1B | ⼛ | 厶
U+2F1C | ⼜ | 又
U+2F1D | ⼝ | 単
U+2F1E | ⼞ | 円
U+2F1F | ⼟ | 土
U+2F20 | ⼠ | 士
U+2F21 | ⼡ | 夂
U+2F22 | ⼢ | 夊
U+2F23 | ⼣ | 夕
U+2F24 | ⼤ | 大
U+2F25 | ⼥ | 女
U+2F26 | ⼦ | 子
U+2F27 | ⼧ | 写
U+2F28 | ⼨ | 寸
U+2F29 | ⼩ | 小
U+2F2A | ⼪ | 兀
U+2F2B | ⼫ | 尸
U+2F2C | ⼬ | 屮
U+2F2D | ⼭ | 山
U+2F2E | ⼮ | 巛
U+2F2F | ⼯ | 㔫
U+2F30 | ⼰ | 己
U+2F31 | ⼱ | 巾
U+2F32 | ⼲ | 干
U+2F33 | ⼳ | 幺
U+2F34 | ⼴ | 壥
U+2F35 | ⼵ | 廴
U+2F36 | ⼶ | 廾
U+2F37 | ⼷ | 弋
U+2F38 | ⼸ | 弓
U+2F39 | ⼹ | 彐
U+2F3A | ⼺ | 彡
U+2F3B | ⼻ | 彳
U+2F3C | ⼼ | 㣺
U+2F3D | ⼽ | 戈
U+2F3E | ⼾ | 戶
U+2F3F | ⼿ | 手
U+2F40 | ⽀ | 支
U+2F41 | ⽁ | 攴
U+2F42 | ⽂ | 文
U+2F43 | ⽃ | 斗
U+2F44 | ⽄ | 斤
U+2F45 | ⽅ | 方
U+2F46 | ⽆ | 无
U+2F47 | ⽇ | 日
U+2F48 | ⽈ | 会
U+2F49 | ⽉ | 月
U+2F4A | ⽊ | 木
U+2F4B | ⽋ | 欠
U+2F4C | ⽌ | 帰
U+2F4D | ⽍ | 歹
U+2F4E | ⽎ | 殳
U+2F4F | ⽏ | 毋
U+2F50 | ⽐ | 比
U+2F51 | ⽑ | 毛
U+2F52 | ⽒ | 氏
U+2F53 | ⽓ | 气
U+2F54 | ⽔ | 水
U+2F55 | ⽕ | 火
U+2F56 | ⽖ | 争
U+2F57 | ⽗ | 父
U+2F58 | ⽘ | 爻
U+2F59 | ⽙ | 丬
U+2F5A | ⽚ | 片
U+2F5B | ⽛ | 㸦
U+2F5C | ⽜ | 牛
U+2F5D | ⽝ | 犬
U+2F5E | ⽞ | 玄
U+2F5F | ⽟ | 玉
U+2F60 | ⽠ | 瓜
U+2F61 | ⽡ | 瓦
U+2F62 | ⽢ | 甘
U+2F63 | ⽣ | 生
U+2F64 | ⽤ | 用
U+2F65 | ⽥ | 当
U+2F66 | ⽦ | 疋
U+2F67 | ⽧ | 疒
U+2F68 | ⽨ | 癶
U+2F69 | ⽩ | 白
U+2F6A | ⽪ | 皮
U+2F6B | ⽫ | 尽
U+2F6C | ⽬ | 目
U+2F6D | ⽭ | 矛
U+2F6E | ⽮ | 矢
U+2F6F | ⽯ | 石
U+2F70 | ⽰ | 示
U+2F71 | ⽱ | 禸
U+2F72 | ⽲ | 禾
U+2F73 | ⽳ | 穴
U+2F74 | ⽴ | 並
U+2F75 | ⽵ | 竹
U+2F76 | ⽶ | 米
U+2F77 | ⽷ | 県
U+2F78 | ⽸ | 缶
U+2F79 | ⽹ | 㓁
U+2F7A | ⽺ | 羊
U+2F7B | ⽻ | 羽
U+2F7C | ⽼ | 老
U+2F7D | ⽽ | 而
U+2F7E | ⽾ | 耒
U+2F7F | ⽿ | 声
U+2F80 | ⾀ | 聿
U+2F81 | ⾁ | 肉
U+2F82 | ⾂ | 臣
U+2F83 | ⾃ | 自
U+2F84 | ⾄ | 至
U+2F85 | ⾅ | 与
U+2F86 | ⾆ | 舌
U+2F87 | ⾇ | 舛
U+2F88 | ⾈ | 舟
U+2F89 | ⾉ | 艮
U+2F8A | ⾊ | 色
U+2F8B | ⾋ | 䒑
U+2F8C | ⾌ | 乕
U+2F8D | ⾍ | 虫
U+2F8E | ⾎ | 血
U+2F8F | ⾏ | 行
U+2F90 | ⾐ | 衣
U+2F91 | ⾑ | 襾
U+2F92 | ⾒ | 見
U+2F93 | ⾓ | 角
U+2F94 | ⾔ | 言
U+2F95 | ⾕ | 谷
U+2F96 | ⾖ | 豆
U+2F97 | ⾗ | 豕
U+2F98 | ⾘ | 豸
U+2F99 | ⾙ | 売
U+2F9A | ⾚ | 赤
U+2F9B | ⾛ | 走
U+2F9C | ⾜ | 足
U+2F9D | ⾝ | 身
U+2F9E | ⾞ | 車
U+2F9F | ⾟ | 辛
U+2FA0 | ⾠ | 辰
U+2FA1 | ⾡ | 辵
U+2FA2 | ⾢ | 邑
U+2FA3 | ⾣ | 医
U+2FA4 | ⾤ | 釆
U+2FA5 | ⾥ | 里
U+2FA6 | ⾦ | 金
U+2FA7 | ⾧ | 長
U+2FA8 | ⾨ | 門
U+2FA9 | ⾩ | 阜
U+2FAA | ⾪ | 隶
U+2FAB | ⾫ | 旧
U+2FAC | ⾬ | 雨
U+2FAD | ⾭ | 靑
U+2FAE | ⾮ | 非
U+2FAF | ⾯ | 面
U+2FB0 | ⾰ | 革
U+2FB1 | ⾱ | 韋
U+2FB2 | ⾲ | 韭
U+2FB3 | ⾳ | 音
U+2FB4 | ⾴ | 頁
U+2FB5 | ⾵ | 凬
U+2FB6 | ⾶ | 飛
U+2FB7 | ⾷ | 食
U+2FB8 | ⾸ | 首
U+2FB9 | ⾹ | 香
U+2FBA | ⾺ | 馬
U+2FBB | ⾻ | 骨
U+2FBC | ⾼ | 高
U+2FBD | ⾽ | 髟
U+2FBE | ⾾ | 鬥
U+2FBF | ⾿ | 鬯
U+2FC0 | ⿀ | 鬲
U+2FC1 | ⿁ | 鬼
U+2FC2 | ⿂ | 魚
U+2FC3 | ⿃ | 鳥
U+2FC4 | ⿄ | 塩
U+2FC5 | ⿅ | 鹿
U+2FC6 | ⿆ | 麥
U+2FC7 | ⿇ | 麻
U+2FC8 | ⿈ | 黃
U+2FC9 | ⿉ | 黍
U+2FCA | ⿊ | 点
U+2FCB | ⿋ | 黹
U+2FCC | ⿌ | 黽
U+2FCD | ⿍ | 鼎
U+2FCE | ⿎ | 鼓
U+2FCF | ⿏ | 䑕
U+2FD0 | ⿐ | 鼻
U+2FD1 | ⿑ | 斉
U+2FD2 | ⿒ | 歯
U+2FD3 | ⿓ | 竜
U+2FD4 | ⿔ | 亀
U+2FD5 | ⿕ | 龠
=end
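
The code point mapping in `radical` and the tie-break in `find_radical` can be tried without downloading the spreadsheet. What follows is a minimal sketch, not the author's code: `kangxi_radical`, `pick_representatives`, and `sample_rows` are hypothetical names and made-up data, with each row reduced to `[character, radical number, residual strokes, row index]`. As a cross-check it also prints the NFKC normalization of each radical character. That is a different technique from the spreadsheet lookup: it relies on the compatibility mapping Unicode defines from the Kangxi Radicals block to the unified ideographs.

```ruby
# Hypothetical helper: Kangxi radical number (1..214) => character in U+2F00..U+2FD5.
def kangxi_radical(number)
  raise ArgumentError, "radical number must be in 1..214" unless (1..214).cover?(number)

  (0x2F00 - 1 + number).chr("UTF-8")
end

# Hypothetical helper: for each radical, keep the character with 0 residual
# strokes that has the smallest row index (i.e. appears first in the sheet).
def pick_representatives(rows)
  best = {} # radical number => [character, row index]
  rows.each do |char, radical_no, residual_strokes, row_index|
    next unless residual_strokes == 0

    current = best[radical_no]
    best[radical_no] = [char, row_index] if current.nil? || row_index < current[1]
  end
  best.transform_values(&:first)
end

# Made-up rows for illustration only; the row indices are not from the real file.
sample_rows = [
  ["由", 102, 0, 30],
  ["田", 102, 0, 10],
  ["男", 102, 2, 20],
  ["木", 75, 0, 5]
]

reps = pick_representatives(sample_rows)
reps.each do |radical_no, char|
  rad = kangxi_radical(radical_no)
  nfkc = rad.unicode_normalize(:nfkc) # Unicode compatibility mapping of the radical
  puts format("U+%04X | %s | %s | NFKC: %s", rad.ord, rad, char, nfkc)
end
```

With these sample rows, 田 is chosen over 由 for radical 102 because its row index is smaller, which mirrors the comparison on `r[-1]` in `find_radical`.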