@eyberg
Created May 6, 2009 15:33
unicode boms
# test.txt (the raw BOM bytes that precede the text on each line do not render here)
#
# sample text
# more text...
# sample more text...
#
# read in our multi-byte file as raw bytes
File.open('test.txt', 'rb') do |f|
  @hfile = f.read
end
# re-encode each byte as the UTF-8 character with the same code point
unistring = @hfile.each_byte.map do |byte|
  [byte].pack('U')
end
# drop any null bytes
blah = unistring.reject { |char| char == "\000" }
# convert the array back to a string (Array#to_s only concatenates on Ruby 1.8)
newstr = blah.join
# drop the BOM: its bytes FF FE became U+00FF U+00FE after pack('U'),
# which is \303\277 \303\276 in UTF-8
newstr = newstr.gsub(/\303\277\303\276/, '')
puts newstr
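
On Ruby 1.9 and later, the same cleanup can probably be done with the built-in encoding support instead of byte-by-byte packing. The following is only a sketch under two assumptions: that the input is UTF-16LE (which the FF FE byte order suggests), and that every BOM should be removed. The helper name `utf16le_to_utf8` is hypothetical and not part of the original gist.

```ruby
# Hypothetical helper, assuming the input bytes are UTF-16LE.
def utf16le_to_utf8(bytes)
  # Tag a copy of the raw bytes as UTF-16LE, then transcode to UTF-8.
  text = bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # After transcoding, each BOM is the single character U+FEFF.
  text.gsub("\uFEFF", '')
end

# Build sample input in memory instead of reading test.txt.
raw = "\uFEFFsample text\n\uFEFFmore text...\n".encode(Encoding::UTF_16LE).b
puts utf16le_to_utf8(raw)
```

Unlike dropping null bytes, transcoding also keeps characters outside the ASCII range intact.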