Created
June 14, 2013 16:31
Check whether a string contains Chinese characters
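// English letters, digits, symbols: [a-z0-9~!@#&;=_\$\%\^\*\-\+\,\.\/(\\)\?\:\'\"\[\]\(\)]
// Chinese: \u4e00-\u9fa5
// Japanese: \u3040-\u30FF
var string = "中文內容"
// match() returns the first run of Chinese characters, or null if there is none
string.match(/[\u4e00-\u9fa5]+/)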
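The snippet above returns a match array or `null`. When only a yes/no answer is needed, the same ranges can be wrapped in small predicates, roughly as sketched below. The names `containsChinese` and `containsKana` are hypothetical and not part of the original gist. Note that Japanese kanji share the `\u4e00-\u9fa5` range, so a string such as "日本" also counts as containing Chinese under this check.

```javascript
// Hypothetical helpers; the names are illustrative, not from the original gist.
var CHINESE = /[\u4e00-\u9fa5]/; // same range as the snippet above
var KANA = /[\u3040-\u30ff]/;    // Hiragana and Katakana blocks

// True if the text has at least one character in the Chinese range.
function containsChinese(text) {
  return CHINESE.test(text);
}

// True if the text has at least one Hiragana or Katakana character.
function containsKana(text) {
  return KANA.test(text);
}

console.log(containsChinese("中文內容")); // true
console.log(containsChinese("hello"));    // false
console.log(containsKana("ひらがな"));     // true
```

The patterns deliberately omit the `g` flag, so `test()` keeps no state between calls.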
# encoding: UTF-8
# \p{} matches a character’s Unicode script.
# The following scripts are supported: Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.
class String
  def contains_cjk?
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
  end
end

strings = ['日本', '광고 프로그램', '艾弗森将退出篮坛', 'Watashi ha bakana gaijin desu.']
strings.each { |s| puts s.contains_cjk? }
# true
# true
# true
# false
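For comparison with the JavaScript file, the script-based check can also be sketched in JavaScript using Unicode property escapes, which need the `u` flag and an ES2018-capable engine. The function name `containsCjk` is hypothetical and simply mirrors the Ruby method; unlike the Ruby version, it does not extend the built-in `String` prototype.

```javascript
// Hypothetical JavaScript counterpart of contains_cjk?; assumes ES2018 support.
var CJK = /\p{Script=Han}|\p{Script=Katakana}|\p{Script=Hiragana}|\p{Script=Hangul}/u;

// True if the text has at least one Han, Katakana, Hiragana, or Hangul character.
function containsCjk(text) {
  return CJK.test(text);
}

var strings = ['日本', '광고 프로그램', '艾弗森将退出篮坛', 'Watashi ha bakana gaijin desu.'];
strings.forEach(function (s) {
  console.log(containsCjk(s));
});
// true
// true
// true
// false
```

Matching by script rather than by a fixed code-point range means characters outside `\u4e00-\u9fa5` that the engine classifies as Han are also detected.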