@mperham
Created February 10, 2010 19:46
def to_ascii
  s = nil
  begin
    # fast but fails on bad UTF-8 data
    s = self.to_s.unpack('U*').collect { |c| (c <= 127) ? c.chr : translation_hash[c] }.join
  rescue
    # much slower but doesn't fail on bad UTF-8 data
    s = self.chars.split('').collect { |c| (c[0] <= 127) ? c : translation_hash[c[0]] }.join
  end
  # transliterate whatever is left, then keep only code points below 127 (this also drops DEL)
  converter = Iconv.new('ASCII//IGNORE//TRANSLIT', 'UTF-8')
  converter.iconv(s).unpack('U*').select { |cp| cp < 127 }.pack('U*')
end
def translation_hash
  # build the lookup table once and cache it in a class variable
  @@translation_hash ||= setup_translation_hash
end

def setup_translation_hash
  # the two strings are parallel: the character at index i maps to the replacement at index i
  accented_chars = "–—ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüý’”‘“\240".chars.split('')
  unaccented_chars = "--AAAAAACEEEEIIIIDNOOOOOxOUUUUYaaaaaaceeeeiiiinoooooouuuuy'\"'\" ".split('')
  translation_hash = {}
  accented_chars.each_with_index { |char, idx| translation_hash[char[0]] = unaccented_chars[idx] }
  # ligatures expand to two ASCII characters, so they are added separately
  translation_hash["Æ".chars[0]] = 'AE'
  translation_hash["æ".chars[0]] = 'ae'
  translation_hash
end
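The snippet above relies on `Iconv` and on ActiveSupport's `String#chars` proxy as they behaved under Ruby 1.8. As a rough sketch only, and not the author's method, the same table-driven idea might be written for Ruby 2.1 or later as follows. The names `ACCENTED`, `UNACCENTED`, `TRANSLATIONS`, and `ascii_fold` are hypothetical, the `Iconv` transliteration pass is left out, and the `\240` byte is assumed to stand for a non-breaking space (U+00A0).

```ruby
# Hypothetical modern-Ruby sketch of the gist's lookup-table approach.
# All constant and method names here are illustrative, not from the gist.

# Parallel strings: the character at index i maps to the replacement at index i.
# "\u00A0" assumes the original "\240" byte was meant as a non-breaking space.
ACCENTED   = "–—ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüý’”‘“\u00A0"
UNACCENTED = "--AAAAAACEEEEIIIIDNOOOOOxOUUUUYaaaaaaceeeeiiiinoooooouuuuy'\"'\" "

TRANSLATIONS = ACCENTED.chars.zip(UNACCENTED.chars).to_h
# Ligatures expand to two ASCII characters, so they are added separately.
TRANSLATIONS["Æ"] = "AE"
TRANSLATIONS["æ"] = "ae"
TRANSLATIONS.freeze

def ascii_fold(value)
  # scrub("") removes invalid byte sequences so each_char does not yield broken characters.
  value.to_s.scrub("").each_char.map { |ch|
    # ASCII passes through; anything else is looked up, and unmapped characters are dropped.
    ch.ascii_only? ? ch : TRANSLATIONS.fetch(ch, "")
  }.join
end

puts ascii_fold("Crème brûlée")
puts ascii_fold("Æther")
```

Unlike the original, this sketch keeps the DEL character (code point 127) and makes no attempt to transliterate characters missing from the table; it simply drops them.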