This script inductively analyzes the relationships between conditional values and their dependent substructures in XML files exported from the official website of the Standard Korean Language Dictionary (標準國語大辭典). [1]
Tested with Python 3.9; it will probably work with Python 3.8, and maybe even 3.7.
Usage is simple: pass all of the exported dictionary XML files as arguments:
./analyze.py ./*.xml
Here's an example result (as of July 2022):
<pos>
(Common): comm_pattern_info, pos_code
동사:
구:
명사:
품사 없음:
부사:
형용사:
어미:
접사:
의존 명사:
대명사:
관형사:
보조 동사:
감탄사:
조사:
수사:
<unit>
(Common): link, link_target_code, type, word
의미:
어휘:
<word_type>
(Common): lexical_info, origin, original_language_info, pos_info, relation_info, word, word_unit
고유어: conju_info, pronunciation_info
한자어: conju_info, pronunciation_info
외래어: allomorph
혼종어: conju_info, pronunciation_info
<word_unit>
(Common): pos_info, word
단어: allomorph, conju_info, lexical_info, origin, original_language_info, pronunciation_info, relation_info, word_type
속담:
구: lexical_info, original_language_info, word_type
관용구:
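To make the shape of this result easier to read, here is a minimal sketch of one way such an analysis could be done. It is not the code of analyze.py itself, and it rests on an assumption about what the output means: for each conditional element (for example word_unit), the sibling tags seen next to it are grouped by its text value, "(Common)" is taken to be the tags seen with every value, and each value's line lists the tags left over. The function names analyze and summarize, the entry wrapper element, and the sample data are all hypothetical; only the tag names and values are borrowed from the result above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def analyze(roots, conditional_tags):
    """Map each conditional tag to {value: set of sibling tags seen with it}."""
    seen = {tag: defaultdict(set) for tag in conditional_tags}
    for root in roots:
        for parent in root.iter():
            children = list(parent)
            for child in children:
                if child.tag in seen:
                    value = (child.text or "").strip()
                    siblings = {c.tag for c in children if c is not child}
                    seen[child.tag][value] |= siblings
    return seen


def summarize(by_value):
    """Split sibling tags into those shared by every value and the rest."""
    common = set.intersection(*by_value.values()) if by_value else set()
    return common, {value: tags - common for value, tags in by_value.items()}


# Hypothetical sample data; the real export schema may nest things differently.
SAMPLE = """
<entries>
  <entry><word_unit>단어</word_unit><word>a</word><word_type>고유어</word_type></entry>
  <entry><word_unit>속담</word_unit><word>b</word></entry>
</entries>
""".strip()

root = ET.fromstring(SAMPLE)
result = analyze([root], ["word_unit"])
common, extra = summarize(result["word_unit"])

print("<word_unit>")
print("(Common):", ", ".join(sorted(common)))
for value, tags in extra.items():
    print(f"{value}:", ", ".join(sorted(tags)))
```

For real files, the roots could be built with ET.parse(path).getroot() for each path in sys.argv[1:]. Under this reading, an empty line such as "속담:" means that the value occurs but brings no substructure beyond the common one.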
I know it's trivial, but it is distributed under GPLv3 or later.
Footnotes
1. For your information, you need an account on stdict.korean.go.kr to download the exported XML files.