@l1x
Created April 25, 2011 15:44
require 'zlib'
require 'pp'
require 'logger'
# original seen on http://www.igvita.com/2011/04/20/intuition-data-driven-machine-learning/
$log = Logger.new(STDOUT)
$log.level = Logger::WARN
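class CompareFiles
  attr_reader :pairs

  # dir defaults to the first command-line argument.
  def initialize(dir = ARGV[0])
    raise ArgumentError, 'usage: pass a directory of files to compare' if dir.nil?
    # Keep regular files only; subdirectories cannot be read as text.
    @files = Dir[File.join(dir, '*')].select { |f| File.file?(f) }
    @pairs = process_files(@files)
  end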
  # Returns the deflate-compressed size, in bytes, of the given files
  # joined by newlines.
  def deflate(*files)
    data = files.collect { |f| File.read(f) }.join("\n")
    Zlib::Deflate.deflate(data).size
  end
  private :deflate
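  # Scores every pair of files by how many bytes are saved when the two
  # are compressed together rather than separately; a higher score means
  # more shared content.
  def process_files(files)
    sizes = Hash.new { |h, f| h[f] = deflate(f) } # compress each file once
    files.combination(2).collect do |f1, f2|
      both = deflate(f1, f2)
      { :files => [f1, f2], :score => (sizes[f1] + sizes[f2]) - both }
    end
  end
  private :process_files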
end
begin
  pairwise = CompareFiles.new.pairs
  # Print the 20 highest-scoring (most similar) pairs.
  pp pairwise.sort_by { |pair| -pair[:score] }.first(20)
rescue => ex
  $log.fatal("Caught exception: #{ex.class}")
  $log.fatal(ex)
end
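
The scoring idea in the script can be tried without a directory of files. The sketch below applies the same formula, (a + b) - both, to in-memory strings. The helper names `compressed_size` and `similarity_score` and the sample texts are assumptions made for illustration; they are not part of the gist.

```ruby
require 'zlib'

# Hypothetical helper (not in the gist): compressed size, in bytes, of the
# given strings joined by newlines, mirroring the gist's deflate method.
def compressed_size(*texts)
  Zlib::Deflate.deflate(texts.join("\n")).size
end

# Hypothetical helper: bytes saved by compressing two texts together rather
# than separately. A higher value suggests more shared content.
def similarity_score(a, b)
  compressed_size(a) + compressed_size(b) - compressed_size(a, b)
end

# Illustrative sample data, chosen so one pair overlaps heavily.
base      = "the quick brown fox jumps over the lazy dog\n" * 20
near_copy = base + "one extra line\n"
unrelated = (1..200).map { |i| (i * 7919 % 1000).to_s }.join(' ')

puts "base vs near_copy: #{similarity_score(base, near_copy)}"
puts "base vs unrelated: #{similarity_score(base, unrelated)}"
```

The near-duplicate pair should score higher than the unrelated pair, because the compressor can encode the second text almost entirely as back-references to the first. Exact numbers depend on the zlib version and compression level, so only the ordering is meaningful.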