@afair
Created June 11, 2012 16:32
Ruby 1.8 Unicode Examples
#!/usr/bin/env ruby
# encoding: UTF-8 # must be on line 1 or 2; says this file contains UTF-8 data and names
#####################################################################
# Ruby 1.8 Unicode Demonstration
#####################################################################
# Enable Unicode Support, or start with: ruby -Ku
$KCODE = "UTF-8"
puts 123 # => 123
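require 'jcode'     # String#jsize and other $KCODE-aware string methods
require 'iconv'     # conversion between character encodings
require 'rubygems'
require 'unicode'   # gem: Unicode-aware case mapping and normalization
require 'rchardet'  # gem: CharDet, character encoding detection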
FR = "Résumé"
EN = "Hello world"
CN = "你好世界"
JP = "こんにちは、世界"
AR = "مرحبا العالم"
GR = "Γεια σας κόσμο"
HE = "שלום עולם"
RU = "привет мир"
VN = "Xin chào thế giới"
KO = "안녕하세요 세계"
puts "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Characters"
s = FR
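puts s                     # => Résumé
puts s.length              # => 8 (Ruby 1.8 counts bytes, not characters)
puts s.jsize               # => 6 (jcode counts characters under $KCODE)
puts s.chars.to_a.inspect  # one element per character (String#chars needs Ruby 1.8.7)
puts s.upcase              # => RéSUMé (only ASCII letters change; see Unicode.upcase in the unicode gem)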
puts "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex"
puts s.match(/^\w+/)   # no flag: the regex follows the current $KCODE
puts s.match(/^\w+/u)  # /u: forces UTF-8 matching regardless of $KCODE
puts "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ICONV"
def convert_encoding(str, from, to='UTF-8')
  # Iconv.conv converts a single string; Iconv.iconv(to, from, *strs) converts several.
  Iconv.conv(to, from, str)
rescue Iconv::InvalidEncoding, Iconv::InvalidCharacter, Iconv::IllegalSequence => e
  # Report the failure for any of the three error types and return nil.
  puts "Could not convert #{str} Error:#{e}"
  nil
end
text = FR
latin1 = convert_encoding(text, 'UTF-8', 'LATIN1')
puts "UTF-8: #{text}, Latin-1: #{latin1}"
puts "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CharDet"
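def to_utf8(text)
  cd = CharDet.detect(text)
  #puts cd.inspect # => {"confidence"=>0.813842214321461, "encoding"=>"ISO-8859-2"}
  # Convert only when an encoding was detected with enough confidence;
  # otherwise return the text unchanged.
  confident = cd['encoding'] && cd['confidence'] > 0.6
  confident ? Iconv.conv('UTF-8', cd['encoding'], text) : text
end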
s = to_utf8(latin1)
puts "To UTF-8: #{s}"
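
The script above targets Ruby 1.8, where `$KCODE`, `jcode`, and `Iconv` carry the Unicode support. As a rough point of comparison, the same UTF-8 to Latin-1 round trip might look like the sketch below on Ruby 1.9 or later, where strings carry their own encoding. The helper name `convert_encoding_modern` is made up for this illustration and is not part of the original gist; the sketch assumes a Ruby version with `String#encode` and `String#force_encoding`.

```ruby
# encoding: UTF-8
# Sketch only: assumes Ruby 1.9 or later. convert_encoding_modern is a
# hypothetical name chosen for this example, not part of the original gist.

def convert_encoding_modern(str, from, to = 'UTF-8')
  # Tag a copy of the bytes with the source encoding, then transcode.
  str.dup.force_encoding(from).encode(to)
rescue EncodingError, ArgumentError => e
  # EncodingError covers invalid bytes, undefined conversions, and missing
  # converters; ArgumentError is raised for an unknown encoding name.
  warn "Could not convert #{str.inspect} Error:#{e.message}"
  nil
end

text = "Résumé"
latin1 = convert_encoding_modern(text, 'UTF-8', 'ISO-8859-1')
round_trip = convert_encoding_modern(latin1, 'ISO-8859-1', 'UTF-8')

puts text.length         # => 6 (characters)
puts text.bytesize       # => 8 (bytes in UTF-8)
puts latin1.bytesize     # => 6 (one byte per character in Latin-1)
puts round_trip == text  # => true
```

Here `length` counts characters and `bytesize` counts bytes, which roughly corresponds to the `jsize` and `length` pair in the 1.8 script.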