@nunommc
Created December 17, 2013 17:53
CSV charset conversion to utf-8
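require 'fileutils'
require 'shellwords'

module FileUtils
  # Converts inputfile in place to the given encoding (utf-8 by default).
  # Shells out to the `file` and `iconv` command-line tools.
  def self.convert_file_encoding inputfile, encoding='utf-8'
    if File.exist? inputfile
      # `file -ib` prints the MIME type and charset, e.g. "text/plain; charset=utf-8"
      file_info = %x{file -ib #{inputfile.shellescape}}
      file_charset = file_info[/charset=(\S+)/, 1]
      if file_charset
        puts "Current encoding of #{inputfile} is #{file_charset}"
      else
        # we're relying on file-5.11, which returns the correct charset;
        # our DEV environment has an older version of this package
        # that doesn't return 'charset=', so we default to iso-8859-1
        file_charset = 'iso-8859-1'
        puts "Current encoding of #{inputfile} defaulted to #{file_charset}"
      end
      if file_charset.downcase != encoding.downcase
        converted = "#{inputfile}_conv"
        %x{ iconv --from-code=#{file_charset.shellescape} --to-code=#{encoding.shellescape} #{inputfile.shellescape} --output=#{converted.shellescape} }
        # only replace the original when iconv exited successfully
        if $?.success?
          FileUtils.mv converted, inputfile
          puts "File saved on encoding: #{encoding}"
        else
          puts "#{self} :: iconv failed for '#{inputfile}', file left unchanged"
        end
      end
    else
      puts "#{self} :: File '#{inputfile}' does not exist"
    end
  end
end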
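For environments where `file` or `iconv` is unavailable, the same conversion can be sketched in pure Ruby with `String#encode`. This swaps the gist's shell-based approach for Ruby's built-in transcoder and is only an illustration: `reencode_file` is a hypothetical helper name, and the sketch assumes the caller already knows the source charset, since it does no detection.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the gist): re-encodes a file in place
# using Ruby's built-in transcoder instead of the iconv command.
# Assumption: the caller supplies the source charset; nothing is detected.
def reencode_file(path, from: 'iso-8859-1', to: 'utf-8')
  raw = File.binread(path)                      # raw bytes, no decoding
  converted = raw.force_encoding(from).encode(to)
  File.binwrite(path, converted)                # write bytes back unchanged
  converted
end

# Usage: write a Latin-1 CSV row to a temp file, then convert it to UTF-8.
Tempfile.create(['sample', '.csv']) do |f|
  f.binmode
  f.write("caf\xE9;1\n".b)                      # 0xE9 is "é" in iso-8859-1
  f.flush
  reencode_file(f.path)
  puts File.read(f.path, encoding: 'utf-8')
end
```

Unlike the `iconv` version, this reads the whole file into memory, so it suits small CSV files better than large ones. `String#encode` raises an exception on bytes it cannot convert, which leaves the original file untouched.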