
@phoet
Last active September 27, 2015 21:48
converting characters that blow up our app
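# config/initializers/char_converter.rb
require 'uri'

module Support
  # Rack middleware that repairs badly encoded request data before it
  # reaches the app. Needs Ruby 2.2+ (URI::RFC2396_Parser, String#scrub).
  class CharConverter
    # Percent-encoded keys: decoded, repaired, then re-encoded.
    URI_KEYS = %w[HTTP_REFERER PATH_INFO QUERY_STRING REQUEST_PATH REQUEST_URI].freeze
    # Raw keys: repaired as-is.
    RAW_KEYS = %w[HTTP_COOKIE].freeze
    # URI.decode/URI.encode were removed in Ruby 3.0; this parser's
    # unescape/escape behave the same way (and leave "+" alone).
    URI_PARSER = URI::RFC2396_Parser.new

    def initialize(app)
      @app = app
    end

    def call(env)
      @app.call(self.class.sanitize_env(env))
    end

    def self.sanitize_env(env)
      URI_KEYS.each do |key|
        value = env[key]
        next unless value
        fixed = sanitize_string(URI_PARSER.unescape(value))
        env[key] = URI_PARSER.escape(fixed) if fixed
      end
      RAW_KEYS.each do |key|
        value = env[key]
        next unless value
        fixed = sanitize_string(value)
        env[key] = fixed if fixed
      end
      env
    end

    # Returns a valid UTF-8 copy of +string+, or nil when there is nothing
    # to fix (not a String, empty, or already valid UTF-8).
    def self.sanitize_string(string)
      return if !string.is_a?(String) || string.empty?
      # Try it as UTF-8 directly
      cleaned = string.dup.force_encoding(Encoding::UTF_8)
      return if cleaned.valid_encoding?
      # Some of it might be old Windows code page
      cleaned.encode(Encoding::UTF_8, Encoding::Windows_1250)
    rescue EncodingError
      # A byte undefined in Windows-1250 (e.g. 0x81): keep the valid UTF-8
      # and replace only the invalid bytes with U+FFFD
      cleaned.scrub
    end
  end
end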
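To make the URI branch of `sanitize_env` concrete, here is a minimal standalone sketch of its decode, repair, re-encode steps for one value (happy path only, no `rescue`). It assumes `URI::RFC2396_Parser` as a stand-in for the removed `URI.decode`/`URI.encode`; `PARSER` and `repair_uri_value` are hypothetical names used only for illustration, and the input `q=caf%E9` is a made-up query string carrying a single-byte `é`.

```ruby
require 'uri'

# Hypothetical stand-in for URI.decode / URI.encode (removed in Ruby 3.0).
PARSER = URI::RFC2396_Parser.new

# Hypothetical helper mirroring the URI-key branch of sanitize_env.
def repair_uri_value(value)
  decoded = PARSER.unescape(value).force_encoding(Encoding::UTF_8)
  return value if decoded.valid_encoding? # already UTF-8: leave untouched
  # Reinterpret the bytes as Windows-1250, then percent-encode the UTF-8 result
  PARSER.escape(decoded.encode(Encoding::UTF_8, Encoding::Windows_1250))
end

puts repair_uri_value("q=caf%E9")    # single-byte é, rewritten as UTF-8
puts repair_uri_value("q=caf%C3%A9") # valid UTF-8, returned as-is
```

Both calls print `q=caf%C3%A9`: the broken value is normalized to the same percent-encoding a UTF-8 client would have sent, while the valid one is never round-tripped.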
@pithyless

Thanks! I came across this problem today and modified it to work with Ruby 1.9.3:

https://gist.github.com/3639014

@petRUShka

In some cases, for example with a URL like /companies/123-%d0%9d%d0%be%d0%b2%d0%be%d0%b5-%d0%b6%d0%b5%d0%bb%d1%82%d0%be%d0%b5-%d1%82%d0%b0%d0%ba%0d-0.1%81%d0%b8, your code can't help.
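A short sketch of why this URL lands in the `rescue EncodingError` branch of `sanitize_string`: the path is valid Cyrillic UTF-8 except for the stray `%81`, which decodes to a lone continuation byte. That byte is also undefined in Windows-1250, so the code-page fallback raises too. Decoding with `URI::RFC2396_Parser` is an assumption standing in for `URI.decode`.

```ruby
require 'uri'

# The path from the comment, with the stray "%81" near the end.
path = "/companies/123-%d0%9d%d0%be%d0%b2%d0%be%d0%b5-%d0%b6%d0%b5%d0%bb" \
       "%d1%82%d0%be%d0%b5-%d1%82%d0%b0%d0%ba%0d-0.1%81%d0%b8"

# Decode it (RFC 2396 unescape assumed as the URI.decode replacement).
decoded = URI::RFC2396_Parser.new.unescape(path).force_encoding(Encoding::UTF_8)
puts decoded.valid_encoding? # false: 0x81 is a continuation byte with no lead byte

# The Windows-1250 fallback cannot map 0x81 either, so it raises.
windows_1250_failed =
  begin
    decoded.encode(Encoding::UTF_8, Encoding::Windows_1250)
    false
  rescue Encoding::UndefinedConversionError
    true
  end
puts windows_1250_failed
```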

The solution is to use:

cleaned.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
cleaned.encode!('UTF-8', 'UTF-16')

Link: http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
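The UTF-16 hop works because a real transcoding step honours `invalid: :replace`, whereas on Ruby 1.9 encoding from UTF-8 to UTF-8 returned the string unchanged. A minimal sketch comparing that trick with `String#scrub` (Ruby 2.1+), both dropping invalid bytes; the helper names are hypothetical and the sample is the tail of the URL above.

```ruby
# Tail of the URL above: "0.1", a stray 0x81, then "и" (0xD0 0xB8).
sample = "0.1\x81\xD0\xB8".dup.force_encoding(Encoding::UTF_8)

# Hypothetical helper: the round trip from the comment. Transcoding to UTF-16
# drops invalid bytes (replace: ''); transcoding back restores UTF-8.
def strip_invalid_via_utf16(str)
  s = str.dup
  s.encode!('UTF-16', 'UTF-8', invalid: :replace, replace: '')
  s.encode!('UTF-8', 'UTF-16')
end

# Hypothetical helper: String#scrub does the same in one call.
def strip_invalid_via_scrub(str)
  str.scrub('')
end

puts strip_invalid_via_utf16(sample)
puts strip_invalid_via_scrub(sample)
```

Both keep the valid Cyrillic and print `0.1и`. Passing `''` deletes the bad bytes; calling `scrub` with no argument substitutes U+FFFD instead.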
