@luikore
Created July 18, 2009 09:55
Chinese string
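# Regexps that check whether a string consists only of Chinese characters.
# They match raw bytes (the n flag), so the string must already be in the
# encoding the constant names. On Ruby 1.9+ match a binary copy, e.g.
# str.b =~ String::CHINESE_UTF8 (String#b is Ruby 2.0+), because matching a
# UTF-8 string with non-ASCII characters against these ASCII-8BIT regexps
# raises Encoding::CompatibilityError. \A and \z anchor the whole string,
# whereas ^ and $ would accept a string with just one all-Chinese line.
class String
  # 20k chars: CJK Unified Ideographs U+4E00..U+9FA5 in UCS-2 big-endian
  CHINESE_UCS2 = /\A(?:
     [\x4e-\x9e][\x00-\xff]
    |\x9f[\x00-\xa5]
  )+\z/xn
  # 20k chars: U+4E00..U+9FA5 in UTF-8 (three bytes each)
  CHINESE_UTF8 = /\A(?:
     \xe4[\xb8-\xbf][\x80-\xbf]         # U+4E00..U+4FFF
    |[\xe5-\xe8][\x80-\xbf][\x80-\xbf]  # U+5000..U+8FFF
    |\xe9[\x80-\xbd][\x80-\xbf]         # U+9000..U+9F7F
    |\xe9\xbe[\x80-\xa5]                # U+9F80..U+9FA5
  )+\z/xn
  # 27k chars: GB18030-2000 hanzi in the GB2312 and GBK areas, plus the
  # four-byte codes led by 0x81-0x82 (the area that includes CJK Extension A)
  CHINESE_GB_2000 = /\A(?:
     [\xB0-\xF7][\xA1-\xFE]                        # GB2312 hanzi
    |[\x81-\xA0][\x40-\x7E\x80-\xFE]               # GBK level 3, trail byte never 0x7F
    |[\xAA-\xFE][\x40-\x7E\x80-\xA0]               # GBK level 4, trail byte never 0x7F
    |[\x81-\x82][\x30-\x39][\x81-\xFE][\x30-\x39]  # four-byte codes
  )+\z/xn
  # 70k chars (including characters of minority scripts): GB18030-2005, which
  # adds the four-byte codes led by 0x95-0x98 (the area that includes CJK Extension B)
  CHINESE_GB_2005 = /\A(?:
     [\xB0-\xF7][\xA1-\xFE]
    |[\x81-\xA0][\x40-\x7E\x80-\xFE]
    |[\xAA-\xFE][\x40-\x7E\x80-\xA0]
    |[\x81-\x82][\x30-\x39][\x81-\xFE][\x30-\x39]
    |[\x95-\x98][\x30-\x39][\x81-\xFE][\x30-\x39]
  )+\z/xn
end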
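The byte ranges in CHINESE_UCS2 and CHINESE_UTF8 both describe the code points U+4E00 to U+9FA5, which is 20,902 characters (the "20k" in the comments). As a sketch, assuming Ruby 1.9+ and using hypothetical helper names (`utf8_hex`, `ucs2be_hex`, `BOUNDARIES`), the boundary code point of each UTF-8 alternative can be dumped with Ruby's own encoder and compared against the character classes:

```ruby
# Hypothetical helpers: hex dump of one code point in UTF-8 and in UCS-2
# big-endian (identical to UTF-16BE for characters in the BMP).
def utf8_hex(cp)
  [cp].pack("U").bytes.map { |b| format("%02X", b) }.join(" ")
end

def ucs2be_hex(cp)
  [cp].pack("U").encode("UTF-16BE").bytes.map { |b| format("%02X", b) }.join(" ")
end

# First and last code point covered by each alternative of CHINESE_UTF8.
BOUNDARIES = [0x4E00, 0x4FFF, 0x5000, 0x8FFF, 0x9000, 0x9F7F, 0x9F80, 0x9FA5]

BOUNDARIES.each do |cp|
  puts format("U+%04X  UTF-8: %s  UCS-2BE: %s", cp, utf8_hex(cp), ucs2be_hex(cp))
end
```

The UTF-8 column pairs up with the four alternatives: E4 B8 80 to E4 BF BF, E5 80 80 to E8 BF BF, E9 80 80 to E9 BD BF, and E9 BE 80 to E9 BE A5. The UCS-2BE column runs from 4E 00 to 9F A5, which is why the last UCS-2 alternative stops at a low byte of A5.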
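On Ruby 1.9 and later a regexp with the n flag and non-ASCII escapes is ASCII-8BIT, so the subject has to be matched as raw bytes in the right encoding. A minimal sketch, assuming Ruby 2.4+ (for String#b and Regexp#match?): `PURE_CHINESE` and `pure_chinese?` are hypothetical names, and the two patterns are copies of the UTF-8 and GB18030-2000 regexps anchored with \A and \z, with the GB trail-byte ranges skipping 0x7F, which GBK does not use as a trail byte.

```ruby
# Hypothetical table: one whole-string, byte-level pattern per encoding.
PURE_CHINESE = {
  Encoding::UTF_8 => /\A(?:
     \xe4[\xb8-\xbf][\x80-\xbf]
    |[\xe5-\xe8][\x80-\xbf][\x80-\xbf]
    |\xe9[\x80-\xbd][\x80-\xbf]
    |\xe9\xbe[\x80-\xa5]
  )+\z/xn,
  Encoding::GB18030 => /\A(?:
     [\xB0-\xF7][\xA1-\xFE]
    |[\x81-\xA0][\x40-\x7E\x80-\xFE]
    |[\xAA-\xFE][\x40-\x7E\x80-\xA0]
    |[\x81-\x82][\x30-\x39][\x81-\xFE][\x30-\x39]
  )+\z/xn
}.freeze

# Hypothetical helper: transcode +str+ to +encoding+, then match its raw bytes.
# String#b gives a binary (ASCII-8BIT) copy, which the /n regexps can match.
def pure_chinese?(str, encoding = Encoding::UTF_8)
  PURE_CHINESE.fetch(encoding).match?(str.encode(encoding).b)
end
```

For example, `pure_chinese?("中文")` returns true, while `pure_chinese?("abc\n中文")` returns false; a ^...$ pattern would accept the latter because its second line matches. `pure_chinese?("中文", Encoding::GB18030)` also returns true, since 中文 is D6 D0 CE C4 in GB18030, inside the GB2312 hanzi area.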