@mandel59
Created March 24, 2024 11:25
Counting the total number of kanji per radical
-- Count the number of characters per radical using the Unihan data.
-- A kanji with more than one registered radical is counted once under each of those radicals.
-- Command: erq --init mojidata.erq < unihan_radicals.erq
table temp.radicals_chart =
unihan_kRSUnicode
join regexp_all(value, '(?<r>\d+)''*\.(?<s>\d+)')
{ UCS, unpack groups {r, s} }
join radicals on r=radical
{radical: 部首漢字 => count: count(distinct UCS)}
;;
-- Show the top 20
radicals_chart order by count desc limit 20;;
-- Draw a bar chart
radicals_chart output vega lite with
encoding { x: count q, y: radical n sort(count desc) },
layer(mark bar; mark text {align:"left",dx:3}, encoding { text: count q })
;;
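The query above extracts every `radical.strokes` pair from each kRSUnicode value, joins the radical number against the `radicals` table, and counts distinct code points per radical (the compiled SQL in the output below shows the `count(distinct UCS)` grouping). The doubled `''` in the Erq string literal is presumably an escaped single quote, so the pattern itself is `(?<r>\d+)'*\.(?<s>\d+)`: the `'*` skips the apostrophes Unihan uses to mark variant radical forms such as `120'.3`. As an illustration only, here is a minimal Python sketch of the same logic over in-memory rows; `count_by_radical`, `top_n`, and the toy data are hypothetical and not part of Mojidata or Erq.

```python
import re
from collections import defaultdict

# Same pattern as the Erq query: radical number, optional apostrophes
# (variant-radical markers), a dot, then the residual stroke count.
RS_PATTERN = re.compile(r"(?P<r>\d+)'*\.(?P<s>\d+)")


def count_by_radical(rows, radicals):
    """Count distinct characters per radical.

    rows: iterable of (ucs, kRSUnicode value) pairs (hypothetical input shape).
    radicals: dict mapping radical number -> radical character.
    """
    chars = defaultdict(set)  # radical character -> set of distinct UCS
    for ucs, value in rows:
        # A value may hold several space-separated entries; match all of them.
        for m in RS_PATTERN.finditer(value):
            radical = radicals.get(int(m.group("r")))
            if radical is not None:  # behaves like an inner join
                chars[radical].add(ucs)
    return {radical: len(members) for radical, members in chars.items()}


def top_n(counts, n=20):
    """Equivalent of `order by count desc limit n`."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Toy rows with made-up code points, for illustration only.
    rows = [
        ("U+0001", "120.3"),
        ("U+0002", "120'.3"),
        ("U+0003", "85.5 140.4"),
        ("U+0004", "120.2 120'.2"),
    ]
    radicals = {85: "水", 120: "糸", 140: "艸"}
    counts = count_by_radical(rows, radicals)
    print(top_n(counts))
```

Counting distinct code points per radical keeps a character listed as both `120.2` and `120'.2` from being counted twice under 糸, while a character carrying radicals 85 and 140 still counts once under each, matching the comment at the top of the script.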


Inspired by: https://twitter.com/JUMANJIKYO/status/1771847201221677188

This gist performs the same tally using Mojidata and Erq.

How to run

  1. Clone the erq-mojidata-playground repository with the following commands:
     git clone https://github.com/mandel59/erq-mojidata-playground.git
     cd erq-mojidata-playground
  2. Put the unihan_radicals.erq file in the erq-mojidata-playground directory.
  3. Run erq --init mojidata.erq < unihan_radicals.erq

Output

$ erq --init mojidata.erq < unihan_radicals.erq
Connected to :memory:
attach 'node_modules/@mandel59/mojidata/dist/moji.db' as moji
ok (0.006s)
create table `temp`.radicals_chart as select 部首漢字 as radical, count(distinct UCS) as count from (select UCS, `groups`->>'$."r"' as r, `groups`->>'$."s"' as s from unihan_kRSUnicode join regexp_all(value, '(?<r>\d+)''*\.(?<s>\d+)')) join radicals on r = radical group by (部首漢字)
ok (0.167s)
select * from radicals_chart order by count desc limit 20
["radical","count"]
["艸",3985]
["水",3774]
["口",3708]
["木",3390]
["手",2764]
["金",2678]
["心",2476]
["火",2188]
["人",2107]
["土",1991]
["糸",1945]
["虫",1887]
["竹",1835]
["言",1795]
["女",1761]
["鳥",1733]
["山",1604]
["魚",1551]
["肉",1509]
["玉",1443]
20 rows (0.001s)
select * from radicals_chart
214 rows loaded (0.000s)
WARN Domains that should be unioned has conflicting sort properties. Sort will be set to true.

[Image: bar chart of character counts per radical]
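The `output vega lite` statement at the end of the script renders the result as a layered chart: count on the x axis, radicals on the y axis sorted by count in descending order, a bar layer, and a text layer that prints each count left-aligned 3 pixels past the end of its bar. As a rough, unofficial illustration of that encoding, the Python sketch below builds a comparable Vega-Lite spec by hand. It is an assumption about what such a chart looks like in Vega-Lite, not the exact spec Erq generates; `radicals_bar_chart_spec` is a hypothetical helper, and the two sample rows come from the output above.

```python
import json


def radicals_bar_chart_spec(rows):
    """Build a layered Vega-Lite spec (bars plus count labels) from (radical, count) pairs."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": [{"radical": r, "count": c} for r, c in rows]},
        # Shared encoding: quantitative count on x, nominal radical on y,
        # with the y categories ordered by count, largest first.
        "encoding": {
            "x": {"field": "count", "type": "quantitative"},
            "y": {
                "field": "radical",
                "type": "nominal",
                "sort": {"field": "count", "order": "descending"},
            },
        },
        "layer": [
            {"mark": "bar"},
            # Label each bar with its count, left-aligned and shifted 3px right.
            {
                "mark": {"type": "text", "align": "left", "dx": 3},
                "encoding": {"text": {"field": "count", "type": "quantitative"}},
            },
        ],
    }


if __name__ == "__main__":
    spec = radicals_bar_chart_spec([("艸", 3985), ("水", 3774)])
    print(json.dumps(spec, ensure_ascii=False, indent=2))
```

Putting `x` and `y` in the top-level `encoding` lets both layers share the same scales, so the labels line up with their bars.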
