@cris
Created February 9, 2011 11:15
UTF-8 & UCS-4
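# Iconv was removed from the standard library in Ruby 2.0;
# on newer Rubies it is available as the iconv gem.
require 'iconv'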
utf8 = "123 abc привет"
utf8.each_codepoint {|cp| puts cp}
ucs4 = Iconv.iconv("UCS-4", "UTF-8", utf8).first
# Each slice holds the 4 bytes of one UCS-4 character. s.pack("C*") joins
# them into a 4-byte binary string, and unpack("N") reads that string as
# one 32-bit big-endian unsigned integer, i.e. the code point.
ucs4.each_byte.each_slice(4) {|s| puts s.pack("C*").unpack("N")}
# Both loops print the same code points:
# 49
# 50
# 51
# 32
# 97
# 98
# 99
# 32
# 1087
# 1088
# 1080
# 1074
# 1077
# 1090
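
The same comparison can probably be made without Iconv. The sketch below is not part of the original gist: it assumes Ruby 1.9 or later, where `String#encode` is built in, and it uses UTF-32BE as a stand-in for UCS-4 (both store each code point as one 4-byte big-endian integer). The variable name `utf32_codepoints` is made up for illustration.

```ruby
utf8 = "123 abc привет"

# Assumption: UTF-32BE stands in for UCS-4 here. Every code point becomes
# one 4-byte big-endian integer, so unpack("N*") can read the whole string
# at once instead of slicing it into 4-byte groups by hand.
utf32_codepoints = utf8.encode("UTF-32BE").unpack("N*")

utf32_codepoints.each { |cp| puts cp }

# Compare against the code points Ruby reports for the UTF-8 string.
puts utf32_codepoints == utf8.codepoints
```

Using `"N*"` rather than `"N"` unpacks every 32-bit group in one call, which replaces the `each_byte.each_slice(4)` loop.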