Skip to content

Instantly share code, notes, and snippets.

@chinghanho
Created June 14, 2013 16:31
Show Gist options
  • Save chinghanho/5783349 to your computer and use it in GitHub Desktop.
Save chinghanho/5783349 to your computer and use it in GitHub Desktop.
判斷字串是否含有中文
// 英文、數字、符號:[a-z0-9~!@#&;=_\$\%\^\*\-\+\,\.\/(\\)\?\:\'\"\[\]\(\)]
// 中文:\u4e00-\u9fa5
// 日文:\u3040-\u30FF
var string = "中文內容"
string.match(/[\u4e00-\u9fa5]+/)
#encoding: UTF-8
# \p{} matches a character’s Unicode script.
# The following scripts are supported: Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.
class String
def contains_cjk?
!!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
end
end
strings= ['日本', '광고 프로그램', '艾弗森将退出篮坛', 'Watashi ha bakana gaijin desu.']
strings.each{|s| puts s.contains_cjk?}
#true
#true
#true
#false
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment