Created
September 27, 2010 13:45
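# encoding: utf-8
# Transliterate-hacked from Perl
# http://blog.naver.com/PostView.nhn?blogId=mokomoji&logNo=130013133481

# $KCODE only has an effect on Ruby 1.8; later versions ignore it with a warning
$KCODE = 'UTF8' if RUBY_VERSION < '1.9'
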
class String
  # I think in the original the text was forced to cp949...
  def split_korean
    # Initial consonants (19):
    # ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
    chosung = [0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141, 0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149, 0x314a, 0x314b, 0x314c, 0x314d, 0x314e]
    # Medial vowels (21):
    # ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
    jwungsung = [0x314f, 0x3150, 0x3151, 0x3152, 0x3153, 0x3154, 0x3155, 0x3156, 0x3157, 0x3158, 0x3159, 0x315a, 0x315b, 0x315c, 0x315d, 0x315e, 0x315f, 0x3160, 0x3161, 0x3162, 0x3163]
    # Final consonants (27, preceded by a 0 placeholder for "no final"):
    # ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
    jongsung = [0, 0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137, 0x3139, 0x313a, 0x313b, 0x313c, 0x313d, 0x313e, 0x313f, 0x3140, 0x3141, 0x3142, 0x3144, 0x3145, 0x3146, 0x3147, 0x3148, 0x314a, 0x314b, 0x314c, 0x314d, 0x314e]
    raw_chars = self.unpack("U*")
    result = []
    raw_chars.each do |char|
      # Precomposed Hangul syllables occupy U+AC00..U+D7A3
      if char >= 0xAC00 && char <= 0xD7A3
        # Offset from the first syllable
        c = char - 0xAC00
        # Syllables are ordered as (initial * 21 + medial) * 28 + final,
        # so integer division and remainder recover the three indices
        a = c / (21 * 28)
        c = c % (21 * 28)
        b = c / 28
        c = c % 28
        result.push(chosung[a], jwungsung[b])
        # Index 0 means the syllable has no final consonant
        if c != 0
          result.push(jongsung[c])
        end
      else
        # Anything that is not a Hangul syllable passes through unchanged
        result.push(char)
      end
    end
    return result.pack("U*")
  end
end

# 안녕하세요 decomposes to ㅇㅏㄴ ㄴㅕㅇ ㅎㅏ ㅅㅔ ㅇㅛ
p '안녕하세요'.split_korean
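
The index arithmetic in `split_korean` may be easier to follow in isolation. The following is only a sketch of that step, not part of the original gist: the constants and the `decompose_indices` and `compose_codepoint` helpers are hypothetical names introduced here for illustration. It assumes the standard layout of the Hangul syllable block, where each code point is `0xAC00 + (initial * 21 + medial) * 28 + final`.

```ruby
# encoding: utf-8

# Hypothetical names, not from the gist above.
HANGUL_BASE  = 0xAC00 # first precomposed syllable
MEDIAL_COUNT = 21     # number of medial vowels
FINAL_COUNT  = 28     # 27 final consonants plus "no final"

# Split a syllable code point into [initial, medial, final] table indices.
def decompose_indices(codepoint)
  offset = codepoint - HANGUL_BASE
  initial, rest = offset.divmod(MEDIAL_COUNT * FINAL_COUNT)
  medial, final = rest.divmod(FINAL_COUNT)
  [initial, medial, final]
end

# Inverse operation: rebuild the code point from the three indices.
def compose_codepoint(initial, medial, final)
  HANGUL_BASE + (initial * MEDIAL_COUNT + medial) * FINAL_COUNT + final
end

codepoint = '한'.unpack('U*').first
indices = decompose_indices(codepoint)
p indices                                   # [18, 0, 4]
p compose_codepoint(*indices) == codepoint  # true
```

Here indices 18, 0 and 4 correspond to ㅎ, ㅏ and ㄴ in the gist's `chosung`, `jwungsung` and `jongsung` tables. Using `divmod` yields quotient and remainder in one call, which avoids the float round trip.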