require 'levenshtein'

# Given two files of strings (one item per line), fuzzy-match the strings in one file against the other.
# Prints only the matching pairs and their distance (normalized to 0.0-1.0; lower is a closer match).
short_list = []
File.open("shortList", "r") do |f|
  f.each_line do |v|
    short_list << v.chomp # drop the trailing newline so it stays out of the comparison and the output
  end
end

File.open("longList", "r") do |f|
  f.each_line do |s|
    s = s.chomp
    short_list.each do |v|
      d = Levenshtein.normalized_distance(v, s)
      # For my data, 0.25 was a good cutoff for fuzzy matches.
      puts "#{v} matches #{s} --distance #{d}" if d < 0.25
    end
  end
end
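
For readers without the gem installed, here is a rough pure-Ruby sketch of what a normalized distance on a 0.0-1.0 scale can look like. It assumes the normalization is the edit distance divided by the length of the longer string; that is an assumption about how the `levenshtein` gem behaves, not something confirmed here. The method names `edit_distance` and `normalized_distance` are hypothetical helpers defined only for this illustration.

```ruby
# Hypothetical helper, not part of the levenshtein gem.
# Classic dynamic-programming Levenshtein distance, keeping one row at a time.
def edit_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a # distances from the empty prefix of a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # deleting the first i characters of a
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper. Assumption: normalize by the longer string's length,
# so 0.0 means identical and 1.0 means nothing in common.
def normalized_distance(a, b)
  longest = [a.length, b.length].max
  return 0.0 if longest.zero?

  edit_distance(a, b).to_f / longest
end

# "colour" vs "color": 1 edit over 6 characters, which is under the 0.25 cutoff.
puts normalized_distance("colour", "color")
```

Under this definition a fixed cutoff such as 0.25 tolerates more edits on long strings than on short ones, so the right value depends on the data.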