@tune
Created December 5, 2012 06:01
Splitting a string into Thai display-character units with Ruby ref: http://qiita.com/items/55c4347df63472346ac8
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
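text = "พี่ชาย" # code points: ["e1e", "e35", "e48", "e0a", "e32", "e22"]
# Match one base character plus any following Thai combining vowels and tone marks.
text.scan(/.(?:[\u0E31]|[\u0E33-\u0E3A]|[\u0E47-\u0E4E])*/).each do |ch|
  # Format each code point in the cluster as U+xxxx (zero-padded to four hex digits).
  clist = ch.each_codepoint.map { |cp| sprintf("U+%04x", cp) }
  puts "#{ch} [#{clist.join(' ')}]"
end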
$ ruby disp_ucs_thai.rb
พี่ [U+0e1e U+0e35 U+0e48]
ช [U+0e0a]
า [U+0e32]
ย [U+0e22]
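
The same scan can be wrapped in a small helper that returns the display units as an array instead of printing them. This is only a minimal sketch under the same assumptions as the script above; the names `THAI_CLUSTER` and `thai_display_units` are hypothetical and do not appear in the original gist.

```ruby
# Hypothetical helper (not part of the original gist).
# One base character followed by any run of U+0E31, U+0E33-U+0E3A, or U+0E47-U+0E4E.
THAI_CLUSTER = /.(?:[\u0E31]|[\u0E33-\u0E3A]|[\u0E47-\u0E4E])*/

# Returns the display units of +text+ as an array of strings.
def thai_display_units(text)
  text.scan(THAI_CLUSTER)
end

units = thai_display_units("พี่ชาย")
puts units.length # 4 display units for this 6-code-point string
```

On Ruby 2.5 or later, `String#grapheme_clusters` is a built-in alternative based on Unicode segmentation rules. Its results are not guaranteed to match this regex for every Thai string, so it is worth comparing the two on real data before switching.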