@elegantcoder
Last active July 22, 2018 14:44
Korean initial consonant (choseong) extractor
# 삼성전자 => ㅅㅅㅈㅈ, 유안타제3호스팩 => ㅇㅇㅌㅈ3ㅎㅅㅍ
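def extract_korean_initials(keyword)
  initials = ['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ']
  hangul_first = 44032 # '가'.ord
  hangul_last = 55203 # '힣'.ord
  size = 588 # '까'.ord - '가'.ord (21 vowels * 28 finals per initial)
  keyword.each_char.map do |k|
    k_char_code = k.ord
    # Pass non-syllable characters through unchanged. Without this range check,
    # code points just below '가' produce a negative index that wraps around
    # the array and returns a wrong initial.
    next k unless k_char_code.between?(hangul_first, hangul_last)
    initials[(k_char_code - hangul_first) / size]
  end.join
end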
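The `size = 588` constant follows from how Unicode lays out precomposed Hangul syllables: 19 initials, each with 21 medial vowels times 28 finals (including "no final"). As a rough sketch of that arithmetic, the helper below splits a syllable into its three indices. The names `HANGUL_BASE`, `MEDIAL_COUNT`, `FINAL_COUNT`, and `syllable_indices` are hypothetical and not part of the gist.

```ruby
HANGUL_BASE = 0xAC00   # '가'.ord == 44032
INITIAL_COUNT = 19
MEDIAL_COUNT = 21
FINAL_COUNT = 28       # 27 final consonants plus "no final"

# Returns [initial_index, medial_index, final_index] for a precomposed
# syllable, or nil for any character outside '가'..'힣'.
def syllable_indices(char)
  offset = char.ord - HANGUL_BASE
  max_offset = INITIAL_COUNT * MEDIAL_COUNT * FINAL_COUNT - 1
  return nil unless offset.between?(0, max_offset)

  # 588 code points per initial, then 28 per medial vowel.
  initial, rest = offset.divmod(MEDIAL_COUNT * FINAL_COUNT)
  medial, final = rest.divmod(FINAL_COUNT)
  [initial, medial, final]
end
```

The first element is the same index the gist uses to look up the `initials` array, so `syllable_indices('삼').first` is 9, which points at 'ㅅ'.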
@fallroot

fallroot commented Jul 20, 2018

require 'active_support/all'

def extract_korean_initials(keyword)
  keyword.mb_chars.chars.map do |char|
    char.mb_chars.decompose.first
  end.join
end

Using RoR's ActiveSupport::Multibyte::Chars would probably make this a bit easier. I tinkered with it briefly a while back, haha.
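For anyone who wants the decomposition approach without pulling in ActiveSupport, here is a minimal sketch using Ruby's built-in `String#unicode_normalize`. One caveat: canonical decomposition (NFD) yields conjoining jamo from the U+1100 block, not the compatibility jamo (such as 'ㅅ', U+3145) that the original gist returns, so this sketch maps them back. The ActiveSupport snippet above likely shows the same difference. The names `COMPAT_INITIALS`, `CHOSEONG_FIRST`, and `extract_korean_initials_nfd` are made up for illustration.

```ruby
# Compatibility jamo, in the same order as the conjoining initials
# U+1100..U+1112 produced by canonical decomposition.
COMPAT_INITIALS = %w[ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ].freeze
CHOSEONG_FIRST = 0x1100

def extract_korean_initials_nfd(keyword)
  keyword.each_char.map do |char|
    # NFD splits a precomposed syllable into initial, medial, (final);
    # the first code point is the conjoining initial consonant.
    lead = char.unicode_normalize(:nfd)[0]
    offset = lead.ord - CHOSEONG_FIRST
    # Anything that is not a conjoining initial passes through unchanged.
    offset.between?(0, COMPAT_INITIALS.size - 1) ? COMPAT_INITIALS[offset] : char
  end.join
end
```

With the mapping step in place, the result for the gist's two sample inputs should match the index-arithmetic version.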

@elegantcoder
Author

@fallroot This is remarkably, almost too simple!?

Thank you, haha.
