@jamis
Created August 3, 2011 21:56
Repairing a Unicode string that contains invalid characters
# encoding: utf-8
s = "Blah \xe9 blah 헌글"
puts "BEFORE"
puts "encoding: #{s.encoding}"
puts "valid : #{s.valid_encoding?}"
puts "text : #{s}"
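# Round-trip through UTF-16LE so the transcoder actually runs; on older
# Rubies a UTF-8 to UTF-8 encode can be a no-op that leaves bad bytes in place.
s = s.
  encode('utf-16le', 'utf-8',
         :invalid => :replace,
         :undef   => :replace,
         :replace => "#").
  encode('utf-8')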
puts "\nAFTER"
puts "encoding: #{s.encoding}"
puts "valid : #{s.valid_encoding?}"
puts "text : #{s}"
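
On Ruby 2.1 and later, the same repair can probably be written more directly with `String#scrub`, which replaces invalid byte sequences without the detour through UTF-16LE. This is a minimal sketch and not part of the original gist: the helper name `repair_utf8` is hypothetical, and it assumes the input is already tagged as UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the original gist): replaces each invalid
# byte sequence in a UTF-8 string with the given replacement text.
# Assumes Ruby >= 2.1, where String#scrub is available.
def repair_utf8(str, replacement = "#")
  str.scrub(replacement)
end

s = "Blah \xe9 blah 헌글"   # \xe9 is a stray byte, not valid UTF-8 on its own
fixed = repair_utf8(s)

puts "encoding: #{fixed.encoding}"
puts "valid : #{fixed.valid_encoding?}"
puts "text : #{fixed}"
```

Unlike the round-trip above, `scrub` only touches invalid bytes, so the `:undef` option has no counterpart here. With no argument, `scrub` substitutes U+FFFD for Unicode encodings instead of a custom marker.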