public
Created

Repairing a unicode string that contains invalid characters

  • Download Gist
demo.rb
Ruby
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# encoding: utf-8
 
s = "Blah \xe9 blah 헌글"
 
puts "BEFORE"
puts "encoding: #{s.encoding}"
puts "valid : #{s.valid_encoding?}"
puts "text : #{s}"
 
s = s.
encode('utf-16le', 'utf-8',
:invalid => :replace,
:undef => :replace,
:replace => "#").
encode('utf-8')
 
puts "\nAFTER"
puts "encoding: #{s.encoding}"
puts "valid : #{s.valid_encoding?}"
puts "text : #{s}"

Please sign in to comment on this gist.

Something went wrong with that request. Please try again.