converting characters that blow up our app

char_converter.rb
# config/initializers/char_converter.rb
require 'uri'

module Support
  # Rack middleware: repairs request values that are not valid UTF-8
  # before they reach the app.
  class CharConverter

    def initialize(app)
      @app = app
    end

    def call(env)
      @app.call(self.class.sanitize_env(env))
    end

    def self.sanitize_env(env)
      # URL-ish values are percent-decoded, cleaned, then re-encoded.
      # NOTE: URI.decode and URI.encode were removed in Ruby 3.0.
      ["HTTP_REFERER", "PATH_INFO", "QUERY_STRING", "REQUEST_PATH", "REQUEST_URI"].each do |key|
        next unless (value = env[key])
        fixed = sanitize_string(URI.decode(value))
        env[key] = URI.encode(fixed) if fixed
      end
      # Cookies are cleaned as-is, without percent-decoding.
      ["HTTP_COOKIE"].each do |key|
        next unless (value = env[key])
        fixed = sanitize_string(value)
        env[key] = fixed if fixed
      end
      env
    end

    # Returns a cleaned copy, or nil when there is nothing to fix
    # (not a String, empty, or already valid UTF-8).
    def self.sanitize_string(string)
      return if !string.is_a?(String) || string == ''

      # Try it as UTF-8 directly
      cleaned = string.dup.force_encoding(Encoding::UTF_8)
      return if cleaned.valid_encoding?

      # Some of it might be old Windows code page
      cleaned.encode(Encoding::UTF_8, Encoding::Windows_1250)
    rescue EncodingError
      # Force it to UTF-8, replacing invalid bytes
      # (on Ruby < 2.1 a UTF-8 to UTF-8 encode is a no-op, so bad bytes can survive)
      cleaned.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
    end

  end
end
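
One way to try a middleware of this shape without booting Rails is to call it directly with a stub app and a hand-built env hash. The sketch below is an illustration only: `CookieScrubber` and `echo_app` are hypothetical names that do not appear in the gist, and it assumes Ruby 2.1+ for `String#scrub` instead of reproducing the Windows-1250 fallback above.

```ruby
# Hypothetical stand-in with the same initialize/call shape as the middleware above.
class CookieScrubber
  def initialize(app)
    @app = app
  end

  def call(env)
    cookie = env['HTTP_COOKIE']
    if cookie.is_a?(String)
      utf8 = cookie.dup.force_encoding(Encoding::UTF_8)
      # Only touch the env when the bytes are not valid UTF-8.
      env['HTTP_COOKIE'] = utf8.scrub('?') unless utf8.valid_encoding?
    end
    @app.call(env)
  end
end

# Stub downstream app: echoes the cookie it received in a Rack-style triplet.
echo_app = ->(env) { [200, { 'Content-Type' => 'text/plain' }, [env['HTTP_COOKIE']]] }

# 0xFF can never appear in valid UTF-8, so it stands in for a broken cookie byte.
status, _headers, body = CookieScrubber.new(echo_app).call({ 'HTTP_COOKIE' => "id=\xFFabc".b })
puts status       # 200
puts body.first   # id=?abc
```

Calling the middleware with a lambda keeps the check independent of Rack and Rails, since the only contract involved is `call(env)` returning a status, headers, body triplet.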

Thanks! I came across this problem today and modified it to work with Ruby 1.9.3:

https://gist.github.com/3639014

In some cases, for example with a URL such as /companies/123-%d0%9d%d0%be%d0%b2%d0%be%d0%b5-%d0%b6%d0%b5%d0%bb%d1%82%d0%be%d0%b5-%d1%82%d0%b0%d0%ba%0d-0.1%81%d0%b8, your code can't help.

The solution is to use:

cleaned.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
cleaned.encode!('UTF-8', 'UTF-16')

Link: http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
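
A minimal sketch of why the round trip helps, using the tail of the URL above decoded by hand. The variable names are made up for illustration, and the `String#scrub` comparison assumes Ruby 2.1 or newer, which is later than the Ruby 1.9 setup discussed in this thread.

```ruby
# Tail of the problem URL ("...-0.1%81%d0%b8"), percent-decoded by hand:
# "0.1", a stray 0x81 byte, then the two bytes of the Cyrillic letter "и".
raw = "0.1\x81\xD0\xB8".dup.force_encoding(Encoding::UTF_8)
puts raw.valid_encoding?          # false: 0x81 cannot start a UTF-8 character

# The round trip from the comment above: going through UTF-16 forces a real
# transcoding pass, so invalid: :replace gets a chance to drop the bad byte.
round_trip = raw.encode('UTF-16', 'UTF-8', invalid: :replace, replace: '')
                .encode('UTF-8', 'UTF-16')
puts round_trip.valid_encoding?   # true

# Assumption: Ruby 2.1+, where String#scrub removes invalid bytes in one call.
scrubbed = raw.scrub('')
puts scrubbed.valid_encoding?     # true
```

On a Ruby that has `String#scrub`, that single call is probably the simpler choice; the UTF-16 detour is mainly useful on older versions.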
