chinese string
# Byte-level regexps to check whether a string consists entirely of Chinese characters
class String
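  # ~20k chars: U+4E00..U+9FA5 as big-endian UCS-2 bytes
  # \A and \z anchor the whole string; ^ and $ would only anchor a line,
  # and 0x0A is a legal low byte here
  CHINESE_UCS2 = /\A(?:
    [\x4e-\x9e][\x00-\xff]
    |\x9f[\x00-\xa5]
  )+\z/xn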
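  # ~20k chars: U+4E00..U+9FA5 as UTF-8 bytes
  # \A and \z anchor the whole string; ^ and $ would accept "abc\n<chinese>"
  CHINESE_UTF8 = /\A(?:
    \xe4[\xb8-\xbf][\x80-\xbf]
    |[\xe5-\xe8][\x80-\xbf][\x80-\xbf]
    |\xe9[\x80-\xbd][\x80-\xbf]
    |\xe9\xbe[\x80-\xa5]
  )+\z/xn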
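  # ~27k chars (two-byte GBK ranges plus one four-byte range)
  # 0x7F is not a valid GBK trail byte, so it is excluded from the trail ranges
  CHINESE_GB_2000 = /\A(?:
    [\xB0-\xF7][\xA1-\xFE]
    |[\x81-\xA0][\x40-\x7E\x80-\xFE]
    |[\xAA-\xFE][\x40-\x7E\x80-\xA0]
    |[\x81-\x82][\x30-\x39][\x81-\xFE][\x30-\x39]
  )+\z/xn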
  # ~70k chars (including minority-script characters)
  # 0x7F is not a valid GBK trail byte, so it is excluded from the trail ranges
  CHINESE_GB_2005 = /\A(?:
    [\xB0-\xF7][\xA1-\xFE]
    |[\x81-\xA0][\x40-\x7E\x80-\xFE]
    |[\xAA-\xFE][\x40-\x7E\x80-\xA0]
    |[\x81-\x82][\x30-\x39][\x81-\xFE][\x30-\x39]
    |[\x95-\x98][\x30-\x39][\x81-\xFE][\x30-\x39]
  )+\z/xn
end
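
The patterns above are byte-level (`/n`), so on Ruby 1.9 and later they probably need a binary view of the string before matching. Below is a minimal sketch of how the UTF-8 pattern might be applied. The module name `ChineseCheck`, the constant `UTF8_BYTES`, and the method `pure_utf8?` are hypothetical names introduced for illustration and are not part of the gist. The byte ranges are copied from `CHINESE_UTF8`, and the sketch anchors with `\A` and `\z` so that the whole string has to match.

```ruby
# Hypothetical wrapper (not from the gist) around the UTF-8 byte pattern.
module ChineseCheck
  # Same byte ranges as CHINESE_UTF8 (U+4E00..U+9FA5 encoded as UTF-8),
  # anchored to the whole string rather than to a single line.
  UTF8_BYTES = /\A(?:
      \xe4[\xb8-\xbf][\x80-\xbf]
    | [\xe5-\xe8][\x80-\xbf][\x80-\xbf]
    | \xe9[\x80-\xbd][\x80-\xbf]
    | \xe9\xbe[\x80-\xa5]
  )+\z/xn

  # String#b returns an ASCII-8BIT copy of the string, so the binary regexp
  # can be matched against it without an encoding compatibility error.
  def self.pure_utf8?(str)
    UTF8_BYTES.match?(str.b)
  end
end

puts ChineseCheck.pure_utf8?("\u4e2d\u6587")     # two Chinese characters
puts ChineseCheck.pure_utf8?("\u4e2d\u6587 abc") # mixed content
```

The other constants would be used the same way, assuming the input is first transcoded to the matching byte encoding (for example with `String#encode`) and then viewed as binary with `String#b`. The UCS-2 pattern expects the high byte first, so it assumes big-endian input.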