@bebop-001
bebop-001 / JapRegexUtils.kt
Created June 3, 2020 04:36
Kotlin Unicode-block regexes for extracting various Japanese character types into a list of strings.
// see: http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
val fullWidthHiraganaRegex = "[ぁ-ゟ]".toRegex()
val fullWidthKatakanaRegex = "[゠-ヿ]".toRegex()
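// Escapes keep the compatibility-ideograph bounds (U+F900-U+FA6A) from being normalized to ordinary kanji.
val kanjiRegex = "[\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A]".toRegex()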
val radicalsRegex = "[⺀-⿕]".toRegex()
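// Half-width katakana, U+FF66-U+FF9F (the range given in the article linked above).
val halfWidthKatakanaRegex = "[\uFF66-\uFF9F]".toRegex()
// Full-width punctuation, digits, and Roman letters, U+FF01-U+FF5E (not the ASCII range !-~).
val fullWidthAlphaNumRegex = "[\uFF01-\uFF5E]".toRegex()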
val japSymbolsRegex = "[、-〿]".toRegex()
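
The description mentions extracting characters to a string list, but the file itself only defines the patterns. A minimal sketch of one way to do the extraction follows. The `extractMatches` helper and the two regex names are hypothetical, not part of the original gist; the patterns are restated with Unicode escapes so the block runs on its own.

```kotlin
// Hypothetical helper (not in the original gist): returns every match of
// `regex` in this string, one list entry per matched character.
fun String.extractMatches(regex: Regex): List<String> =
    regex.findAll(this).map { it.value }.toList()

// Same ranges as fullWidthHiraganaRegex and fullWidthKatakanaRegex above,
// renamed here so the sketch does not clash with the gist's declarations.
val hiraganaRegex = "[\u3041-\u309F]".toRegex()
val katakanaRegex = "[\u30A0-\u30FF]".toRegex()

fun main() {
    val text = "カタカナとひらがな"
    println(text.extractMatches(hiraganaRegex)) // [と, ひ, ら, が, な]
    println(text.extractMatches(katakanaRegex)) // [カ, タ, カ, ナ]
}
```

Because each pattern is a single character class, every list entry is one character. Appending `+` to a class (for example `"[\u3041-\u309F]+"`) would return contiguous runs instead.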