@oanhnn
Forked from terrancesnyder/regex-japanese.txt
Last active October 25, 2023 01:33
Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9faf) ~ The Big Kahuna!
([一-龯])
Regex for matching Hiragana or Katakana (*)
([ぁ-んァ-ン])
Regex for matching Non-Hiragana or Non-Katakana
([^ぁ-んァ-ン])
Regex for matching Hiragana or Katakana or basic punctuation (、。’)
([ぁ-んァ-ン、。’])
Regex for matching Hiragana or Katakana and random other characters
([ぁ-んァ-ン!:/])
Regex for matching Hiragana
([ぁ-ん])
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
Regex for matching full-width Numbers (zenkaku 全角)
([０-９])
Regex for matching full-width Letters (zenkaku 全角)
([Ａ-Ｚａ-ｚ])
Regex for matching Hiragana codespace characters (includes non phonetic characters)
([ぁ-ゞ])
Regex for matching full-width (zenkaku) Katakana codespace characters (includes non phonetic characters)
([ァ-ヶ])
Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set so the order is inconsistent with the hiragana)
([ｦ-ﾟ])
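A minimal Python sketch of how the character classes above might be used to label characters by script. The `classify` helper and its label names are illustrative assumptions, not part of the original list. The ranges are written as code point escapes so that text normalization cannot silently turn half-width characters into full-width ones.

```python
import re

# Ranges from the list above, spelled out as code points.
HIRAGANA = re.compile(r"[\u3041-\u3093]")            # ぁ-ん
KATAKANA = re.compile(r"[\u30A1-\u30F3]")            # ァ-ン (full-width)
HALFWIDTH_KATAKANA = re.compile(r"[\uFF66-\uFF9F]")  # half-width katakana codespace
KANJI = re.compile(r"[\u4E00-\u9FAF]")               # 一-龯


def classify(ch: str) -> str:
    """Return a rough script label for a single character (hypothetical helper)."""
    if HIRAGANA.fullmatch(ch):
        return "hiragana"
    if KATAKANA.fullmatch(ch):
        return "katakana"
    if HALFWIDTH_KATAKANA.fullmatch(ch):
        return "halfwidth_katakana"
    if KANJI.fullmatch(ch):
        return "kanji"
    return "other"


# hiragana hi, full-width ka, half-width ka, the kanji U+6F22, and ASCII "a"
sample = "\u3072\u30AB\uFF76\u6F22a"
labels = [classify(c) for c in sample]
print(labels)
```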
Regex for matching Japanese Post Codes
/^\d{3}-\d{4}$/
/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
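A short Python sketch of the seven-digit post code check, under two assumptions worth stating. In Python 3, `\d` on a `str` pattern matches any Unicode decimal digit, including full-width digits, unless `re.ASCII` is set; and `$` also matches just before a trailing newline, so `fullmatch` is used instead of anchors. The name `is_postcode` is hypothetical.

```python
import re

# Same shape as /^\d{3}-\d{4}$/, restricted to ASCII digits.
POSTCODE = re.compile(r"\d{3}-\d{4}", re.ASCII)


def is_postcode(text: str) -> bool:
    """True if text is exactly NNN-NNNN with ASCII digits (hypothetical helper)."""
    return POSTCODE.fullmatch(text) is not None


print(is_postcode("100-0001"))    # well-formed
print(is_postcode("1000001"))     # missing hyphen
print(is_postcode("100-0001\n"))  # trailing newline is rejected by fullmatch
```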
Regex for matching Japanese mobile phone numbers (keitai bangou)
/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
/^0\d0-\d{4}-\d{4}$/
Regex for matching Japanese fixed line phone numbers
/^[0-9-]{6,9}$|^[0-9-]{12}$/
/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
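A possible way to apply the phone patterns in Python, assuming input may arrive with full-width digits and hyphens. The sketch first folds those to ASCII with NFKC normalization, then tries the stricter mobile pattern before the fixed-line one. `phone_kind` is a hypothetical helper, and only the area-code form of the fixed-line pattern is used here.

```python
import re
import unicodedata

MOBILE = re.compile(r"0\d0-\d{4}-\d{4}", re.ASCII)       # e.g. 090-1234-5678
FIXED = re.compile(r"\d{2,5}-\d{1,4}-\d{4}", re.ASCII)   # e.g. 03-1234-5678


def phone_kind(raw: str):
    """Return "mobile", "fixed", or None (hypothetical helper)."""
    # NFKC maps full-width digits and the full-width hyphen-minus to ASCII.
    text = unicodedata.normalize("NFKC", raw).strip()
    if MOBILE.fullmatch(text):
        return "mobile"
    if FIXED.fullmatch(text):
        return "fixed"
    return None


print(phone_kind("090-1234-5678"))
print(phone_kind("03-1234-5678"))
print(phone_kind("12345"))
```

The mobile pattern is checked first because every number it accepts also fits the looser fixed-line pattern.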
Note: Katakana without mentioning "full-width" or "half-width" means "full-width katakana".
@tonyhoangdev
good

@twwn
twwn commented Mar 21, 2020

The kana patterns don't cover their full Unicode blocks.

Few people will ever need to match the rarer characters, but the narrow ranges also exclude the combining characters found in normalized strings, the chōonpu (ー) and the nakaguro (・). So, the full one:

([ぁ-ゟ゠-ヿ])

Also, note that the "Big Kahuna" for kanji doesn't match the iteration mark 々.
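To illustrate both points, here is a small Python sketch comparing the narrow ranges with the full-block pattern, and the kanji range with and without 々 (U+3005). The constant names and the `all_match` helper are assumptions made for this example only.

```python
import re

NARROW_KANA = re.compile(r"[\u3041-\u3093\u30A1-\u30F3]")  # ぁ-ん ァ-ン
FULL_KANA = re.compile(r"[\u3041-\u309F\u30A0-\u30FF]")    # ぁ-ゟ ゠-ヿ
KANJI = re.compile(r"[\u4E00-\u9FAF]")                     # 一-龯
KANJI_PLUS = re.compile(r"[\u4E00-\u9FAF\u3005]")          # same, plus 々


def all_match(pattern, text: str) -> bool:
    """True if every character of text matches the class (hypothetical helper)."""
    return all(pattern.fullmatch(ch) for ch in text)


coffee = "\u30B3\u30FC\u30D2\u30FC"  # katakana word containing the chōonpu U+30FC
tokidoki = "\u6642\u3005"            # kanji followed by the iteration mark U+3005

print(all_match(NARROW_KANA, coffee), all_match(FULL_KANA, coffee))
print(all_match(KANJI, tokidoki), all_match(KANJI_PLUS, tokidoki))
```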
