@sgonyea
Created April 25, 2012 00:19
How to test if there are non-ASCII characters in a file.
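# Surely there is a better way than this. You can do grep:
#   grep --color='auto' -P -n "[\x80-\xFF]"
def string_has_non_ascii_chars?(string)
  string.encode("US-ASCII")
  false
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # UndefinedConversionError: a valid character with no US-ASCII equivalent.
  # InvalidByteSequenceError: bytes that are malformed in the source encoding.
  true
end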
# Very, very hackish.
# To identify which line / column the unicode characters are on, in the given file.
def check_for_unicode(filename)
  file      = File.read(filename)
  bad_lines = []
  file.lines.each_with_index do |line, line_no|
    next unless string_has_non_ascii_chars?(line)
    line.each_char.with_index do |char, col_no|
      next unless string_has_non_ascii_chars?(char)
      bad_lines << [line_no + 1, col_no + 1]
    end
  end
  bad_lines.map! { |line_col| line_col.join(":") }
  puts "#{filename} has unicode characters on line:col -- #{bad_lines.join(', ')}" if bad_lines.any?
end
# Lulz
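The opening comment asks whether there is a better way. One possibility, offered only as a minimal sketch, is Ruby's built-in `String#ascii_only?`, which answers the same question without raising and rescuing an exception. The helper name `non_ascii_positions` is hypothetical and not part of the gist; it takes the file contents as a string and returns the positions instead of printing them.

```ruby
# Hypothetical helper, not part of the original gist.
# Returns [line, col] pairs (both 1-based) for every non-ASCII character.
def non_ascii_positions(text)
  positions = []
  text.each_line.with_index(1) do |line, line_no|
    next if line.ascii_only? # cheap check that skips clean lines
    line.each_char.with_index(1) do |char, col_no|
      positions << [line_no, col_no] unless char.ascii_only?
    end
  end
  positions
end
```

Something like `non_ascii_positions(File.read(filename))` should then yield the same line:col pairs that `check_for_unicode` prints. Columns count characters, not bytes, so a multibyte character occupies one column.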