@pmarreck
Created September 22, 2011 18:15
A little code to validate your entire repo for proper UTF-8 encoding.
#!/usr/bin/env ruby
require 'shellwords'

# Walks a directory tree and reports text files that are not valid UTF-8.
class Utf8Checker
  IMAGE_EXT = /\.(jpg|png|gif)$/

  def initialize(path = Dir.pwd)
    # Keep regular files only; directories cannot be read as text.
    @files = Dir.glob("#{path}/**/*").select { |f| File.file?(f) }
    @total_num = @files.count
    puts "#{@total_num} files to process."
    @num_left = @total_num
    @extension_hash = Hash.new(0)
  end

  # Checks every file and returns a hash of extension => text files seen.
  def process
    @files.each do |file|
      valid = true
      error = nil
      begin
        # `file -b` leaves the file name out of its output, so a path that
        # merely contains "text" cannot match. The path is shell-escaped so
        # names with spaces or quotes are passed through intact.
        if file !~ IMAGE_EXT && `file -b #{Shellwords.escape(file)}` =~ /text/
          ext = file.match(/\.([a-z]{3,5})$/)
          @extension_hash[ext[1]] += 1 if ext
          valid = File.read(file).force_encoding("utf-8").valid_encoding?
        end
      rescue ArgumentError => e
        valid = false
        error = e
      end
      unless valid
        puts
        puts "File #{file} has INVALID utf-8 encoding!#{' ' + error.message if error}"
      end
      @num_left -= 1
      puts "#{@num_left} left to process" if @num_left % 1000 == 0
    end
    @extension_hash
  end
end

u = Utf8Checker.new
u.process
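
The heart of the script is the `force_encoding("utf-8").valid_encoding?` call. As a minimal sketch of what that check does on its own, the snippet below runs it against two in-memory byte strings instead of files. The helper name `valid_utf8?` and the sample strings are assumptions made for illustration; they are not part of the gist.

```ruby
# Hypothetical helper (not in the original script): returns true when the
# given bytes form a well-formed UTF-8 sequence.
def valid_utf8?(bytes)
  # dup so the caller's string keeps its own encoding tag
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

good = "caf\u00E9".b  # "café" encoded as UTF-8, viewed as raw bytes
bad  = "caf\xE9".b    # Latin-1 byte 0xE9 with no continuation bytes

puts valid_utf8?(good)  # => true
puts valid_utf8?(bad)   # => false
```

`force_encoding` only relabels the bytes and never transcodes them, which is why the check can flag a Latin-1 file without modifying it.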