@Rio517
Created April 29, 2011 16:16
fuzzymatch.rb
# http://rcoder.net/content/testing-string-similarity-using-ruby-and-zlib
# see also: http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html
# see also: http://flori.github.com/amatch/
require 'zlib'
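# Optimized for storing a relatively small set of common index terms
# against which new strings are tested.
class ZlibDistanceCalc
  def initialize
    @c_hash = {}
  end

  # Store a term under +key+, caching its compressed form.
  def index(key, term)
    @c_hash[key] = [term, compress(term)]
  end

  # Return { key => delta } for every indexed term. The delta is the number
  # of extra compressed bytes needed to append +term+ to the indexed string,
  # divided by the length of +term+; lower values mean more similar strings.
  def test(term)
    coeff = term.size.to_f
    results = {}
    @c_hash.each do |key, (orig_str, orig_cmp)|
      results[key] = (compress(orig_str + term).size - orig_cmp.size) / coeff
    end
    results
  end

  # Return the indexed entries whose delta is at most +delta+, or yield each
  # key/delta pair when a block is given.
  def search(term, delta = 0.5)
    # Hash#select returns a Hash on Ruby 1.9 and later, so no rebuild is needed.
    matches = test(term).select { |_key, value| value <= delta }
    return matches unless block_given?

    matches.each { |key, value| yield key, value }
  end

  # Normalize (strip, downcase) and deflate a string.
  def compress(term)
    Zlib::Deflate.deflate(term.strip.downcase)
  end
end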
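
The measure computed in `test` can also be sketched on its own, without the index. The helper name `compression_delta` and the sample strings below are hypothetical and are not part of the gist. They only illustrate the idea that a string sharing long substrings with the reference adds few compressed bytes, while an unrelated string adds many.

```ruby
require 'zlib'

# Hypothetical helper (not in the gist): extra compressed bytes needed to
# append `candidate` to `reference`, per character of `candidate`.
# Uses the same normalization as ZlibDistanceCalc#compress.
def compression_delta(reference, candidate)
  deflated_size = ->(s) { Zlib::Deflate.deflate(s.strip.downcase).size }
  base = deflated_size.call(reference)
  combined = deflated_size.call(reference + candidate)
  (combined - base) / candidate.size.to_f
end

reference = "the quick brown fox jumps over the lazy dog"

# Shares almost every character with the reference, so deflate can encode
# most of it as a back-reference.
near = compression_delta(reference, "the quick brown fox jumps over the lazy cat")

# Shares nothing useful with the reference, so it is stored mostly as literals.
far = compression_delta(reference, "zxqv 91827 wjkp 00331 mmnb ylru 4456 qqzt")

puts "near: #{near.round(3)}, far: #{far.round(3)}"
puts "near is smaller than far: #{near < far}"
```

A lower delta means the candidate is closer to the reference. The default threshold of 0.5 in `search` is the gist's own choice; a suitable cutoff probably depends on string length and would need tuning against real data.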