@ryanmcgrath
Forked from sym3tri/JapaneseRegex.js
Created May 20, 2011 02:32
Regex to test for presence of Japanese characters
// REFERENCE UNICODE TABLES:
// http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
// http://www.tamasoft.co.jp/en/general-info/unicode.html
//
// TEST EDITOR:
// http://www.gethifi.com/tools/regex
//
// UNICODE RANGE : DESCRIPTION
//
// 3000-303F : punctuation
// 3040-309F : hiragana
// 30A0-30FF : katakana
// FF00-FFEF : Full-width roman + half-width katakana
// 4E00-9FAF : Common and uncommon kanji
//
// Non-Japanese punctuation/formatting characters commonly used in Japanese text
// 2605-2606 : Stars
// 2190-2195 : Arrows
// 203B : Reference mark (※)
// No "g" flag: a global regex keeps lastIndex between test() calls,
// so reusing it on another string can report a false miss.
var regex = /[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B/;
var input = "input string";
if (regex.test(input)) {
  console.log("Japanese characters found");
} else {
  console.log("No Japanese characters");
}
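
A minimal sketch of wrapping the same ranges in a reusable helper. The names `JAPANESE_CHARS` and `containsJapanese` are hypothetical and not part of the gist. The second half shows why a regex carrying the `g` flag is risky with `test()`: it remembers `lastIndex` between calls.

```javascript
// Hypothetical helper; names are assumptions for illustration only.
// Same ranges as the gist, written as one character class, without the "g" flag.
var JAPANESE_CHARS = /[\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\uFF00-\uFFEF\u4E00-\u9FAF\u2605-\u2606\u2190-\u2195\u203B]/;

function containsJapanese(text) {
  return JAPANESE_CHARS.test(text);
}

// Demonstration of the stateful behaviour of a global regex.
var globalRegex = /[\u3040-\u309F]/g;
var first = globalRegex.test("あ");  // true, lastIndex advances to 1
var second = globalRegex.test("あ"); // false, the search resumes at index 1

console.log(containsJapanese("hello"), containsJapanese("こんにちは"));
console.log(first, second);
```

Because the helper's regex is not global, calling `containsJapanese` repeatedly on the same or different strings gives a consistent answer.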
@ryanmcgrath
Author

This is a kickass bit of code right here.

@pinopino

it works! thank you for your amazing work!

@swathiPaipalle

Thanks! It helps me

@FlashJunior

thx!

@koreahadif

Thanks~!! It works

@ram4git

ram4git commented Apr 15, 2017

Why can't the initial few or conditions be consolidated into [\u3000-\u30FF]?
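
On the question above: the first three ranges are contiguous (303F is followed by 3040, and 309F by 30A0), so as far as I can tell they cover exactly the same code points as a single 3000-30FF range. A quick sketch that checks this by brute force over every UTF-16 code unit; the variable names are illustrative and not from the gist.

```javascript
// The three separate ranges from the gist versus one merged range.
var separate = /[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]/;
var merged = /[\u3000-\u30FF]/;

// Count code units where the two patterns disagree.
var mismatches = 0;
for (var code = 0; code <= 0xFFFF; code++) {
  var ch = String.fromCharCode(code);
  if (separate.test(ch) !== merged.test(ch)) mismatches++;
}

console.log(mismatches); // 0
```

Keeping the ranges separate seems to be mainly a readability choice, since each one lines up with a row in the comment table.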

@den-chan

den-chan commented Apr 30, 2017

As a side note, the characters from 0x4e00 to 0x9faf include Chinese-only characters. The following code will give you a list of the standard 6355 Japanese kanji:

for (var i = 0x4e00, acc = []; i < 0x9faf; i++) acc.push(String.fromCharCode(i));
var sortedChars = acc.sort(new Intl.Collator("ja-JP").compare);
var level1Kanji = sortedChars.slice(0, 2965); // JIS X 0208 - Level 1 Kanji (2965 characters)
var level2Kanji = sortedChars.slice(2965, 6355); // JIS X 0208 - Level 2 Kanji (3390 characters)

@mortress

Thanks! It helps me

@hanhpp

hanhpp commented Aug 22, 2017

Thanks, this helped me too.

@AndrewThian

AndrewThian commented Nov 17, 2017

my god you lifesaver <3 hearts and cookies!

@paulgaumer

Big thank you, this helped a lot!

@binhapp

binhapp commented May 22, 2018

Thank you!

@wlgnsdh

wlgnsdh commented Aug 24, 2018

thank you!!

@littletsu

thx

@Yatufo

Yatufo commented Nov 7, 2018

thanks!

@sudheeshms

(y)

@KeshavGeek

Thanks for such creative work.
Cheers

@wtberry

wtberry commented Sep 5, 2019

Thanks, super useful code!

@icarcal

icarcal commented Mar 20, 2020

This is awesome 👏 🎉 🚀

@hoandm

hoandm commented Jul 14, 2020

THANKS (Y)

@izadiegizabal

This is perfect, thank you! 🥳

@CescaMerlin

So helpful, thank you!

@azoom-tran-nhat-anh

Thanks!!!

@mydatahack

OMG 🤯 🤯 🤯 Love it!

@devprantoroy

Thank you


ghost commented Jan 17, 2023

Thank you so much, helped me out big time!

@stefan-pribicevic

Works like a charm, thanks.

@seishunpop

I believe this code is what the kids call "bussin"
