Created December 14, 2012 01:06
Hacky band-aid to convert Windows-1252 CSV uploads to UTF-8
def ensure_utf8
  # TODO Proper solution using https://github.com/brianmario/charlock_holmes & detection
  # processing copied from http://trackingrails.com/posts/video-encoding-processor-for-carrierwave
  cache_stored_file! unless cached?

  # Read the raw bytes and tag them as UTF-8 so the check below does not
  # depend on Encoding.default_external.
  file_data = File.binread(current_path).force_encoding('UTF-8')

  # String#valid_encoding? reports bad bytes directly, which avoids relying on
  # a regexp match raising ArgumentError with one exact message. See
  # http://bibwild.wordpress.com/2012/04/17/checkingfixing-bad-bytes-in-ruby-1-9-char-encoding/
  return if file_data.valid_encoding?

  # Assume that it's windows-1252. Bytes with no windows-1252 mapping become '?'.
  File.open(current_path, 'wb') do |f|
    f.write file_data.encode('UTF-8', 'windows-1252', :replace => '?', :invalid => :replace, :undef => :replace)
  end
end
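
The same check-then-transcode idea can be tried outside an uploader. The sketch below is not part of the gist: `to_utf8` is a hypothetical helper name, and it keeps the gist's assumption that anything that is not valid UTF-8 must be windows-1252. It works on an in-memory string rather than a file, so it can be run in irb.

```ruby
# Hypothetical standalone helper (not from the gist): returns a UTF-8 string
# for the given raw bytes. Assumes non-UTF-8 input is windows-1252.
def to_utf8(raw_bytes)
  # Tag a copy as UTF-8 and see whether the bytes are well formed.
  text = raw_bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # Passing the source encoding explicitly makes encode interpret the bytes
  # as windows-1252, whatever encoding the string is currently tagged with.
  raw_bytes.encode('UTF-8', 'windows-1252',
                   :invalid => :replace, :undef => :replace, :replace => '?')
end

# "café" followed by curly-quoted text, as windows-1252 bytes.
cp1252 = "caf\xE9 \x93quoted\x94".b
utf8   = to_utf8(cp1252)

# Bytes that are already valid UTF-8 pass through unchanged.
already_utf8 = to_utf8("na\u00EFve".b)
```

The guess can be wrong: some windows-1252 byte sequences also happen to be valid UTF-8, and other legacy encodings will be transcoded as if they were windows-1252. That is the gap the TODO about charlock_holmes detection is pointing at.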