@acehand
Last active December 30, 2015 12:49
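Handling typographic multibyte characters (curly quotes, dashes, bullets, ellipses): replace their UTF-8 byte sequences with ASCII equivalents and return the result as UTF-8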
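# Replaces the UTF-8 byte sequences of common typographic punctuation with
# ASCII equivalents. Matching is done on raw bytes, so the incoming encoding
# label does not matter.
def uncoupleEncodings(text)
  replacements = {
    [226, 128, 152] => "'",   # U+2018 left single quotation mark
    [226, 128, 153] => "'",   # U+2019 right single quotation mark
    [226, 128, 156] => '"',   # U+201C left double quotation mark
    [226, 128, 157] => '"',   # U+201D right double quotation mark
    [226, 128, 147] => '--',  # U+2013 en dash
    [226, 128, 148] => '---', # U+2014 em dash
    [226, 128, 162] => '*',   # U+2022 bullet
    [226, 128, 166] => '...', # U+2026 horizontal ellipsis
    [194, 183]      => '*'    # U+00B7 middle dot
  }
  # dup first: force_encoding relabels its receiver in place, which would
  # otherwise change (or fail on, if frozen) the caller's string.
  bytes = text.dup.force_encoding('ASCII-8BIT')
  replacements.each do |codes, ascii|
    bytes = bytes.gsub(codes.pack('C*'), ascii)
  end
  bytes.force_encoding('UTF-8')
end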
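
When the input is already valid, correctly labeled UTF-8, the same mapping can probably be expressed at the character level instead of the byte level. The following is a minimal sketch under that assumption, not part of the original gist; the names `PUNCTUATION_TO_ASCII`, `PUNCTUATION_PATTERN`, and `asciify_punctuation` are hypothetical and chosen only for illustration. It relies on `String#gsub` accepting a Hash as the replacement argument.

```ruby
# Hypothetical character-level variant; assumes `text` is valid UTF-8.
# The code points below correspond to the byte sequences handled above.
PUNCTUATION_TO_ASCII = {
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2013" => '--',  # en dash
  "\u2014" => '---', # em dash
  "\u2022" => '*',   # bullet
  "\u2026" => '...', # horizontal ellipsis
  "\u00B7" => '*'    # middle dot
}.freeze

# One pattern that matches any of the characters listed in the table.
PUNCTUATION_PATTERN = Regexp.union(PUNCTUATION_TO_ASCII.keys)

def asciify_punctuation(text)
  # gsub with a Hash replaces each match with the value stored under it.
  text.gsub(PUNCTUATION_PATTERN, PUNCTUATION_TO_ASCII)
end

puts asciify_punctuation("\u201CIt\u2019s done\u201D \u2013 almost\u2026")
# prints: "It's done" -- almost...
```

This variant makes a single pass over the string and returns a new string without touching the caller's copy, but it will not match anything if the text carries a wrong encoding label; the byte-level approach is the safer choice for input of unknown provenance.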