Created December 30, 2016 12:54
Gist: romiras/386e3694a59949f6bef29f11af03531c
Simple function for fuzzy string match
require 'active_support/all' # provides String#mb_chars

# True when the normalized strings differ by at most one edit (Levenshtein distance < 2)
def simple_fuzzy_match(s1, s2)
  levenshtein_distance(normalize_str(s1), normalize_str(s2)) < 2
end

def normalize_str(s)
  s.
    mb_chars.     # wrap as ActiveSupport::Multibyte::Chars so downcase is Unicode-aware; only needed before Ruby 2.4
    downcase.     # lower-case every character
    strip.        # remove leading and trailing whitespace
    split(/\s+/). # split on runs of whitespace into an array of words
    sort.         # sort the words alphabetically, so word order does not affect the match
    join(' ')     # rejoin with single spaces for comparison by Levenshtein distance
end
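On Ruby 2.4 and later, `String#downcase` already applies Unicode case mapping, so the `mb_chars` step matters only on older rubies. A minimal sketch of the same normalization using core Ruby alone, assuming Ruby >= 2.4; the name `normalize_str_core` is hypothetical and not part of the original gist:

```ruby
# Hypothetical core-Ruby variant of normalize_str.
# Assumes Ruby >= 2.4, where String#downcase is Unicode-aware without ActiveSupport.
def normalize_str_core(s)
  s.
    downcase.     # Unicode-aware lower-casing
    strip.        # drop leading/trailing whitespace
    split(/\s+/). # split on runs of whitespace
    sort.         # order words alphabetically (byte-wise String comparison)
    join(' ')     # rebuild a single space-separated string
end

puts normalize_str_core("  Hello   World ") # prints: hello world
puts normalize_str_core("World hello")      # prints: hello world
puts normalize_str_core("ÄPFEL Birne")      # prints: birne äpfel
```

Because the words are sorted, "World hello" and "Hello World" normalize to the same string, which is what lets the Levenshtein step ignore word order.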

### Helper function
# http://stackoverflow.com/questions/16323571/measure-the-distance-between-two-strings-with-ruby
def levenshtein_distance(s, t)
  m = s.length
  n = t.length
  return m if n == 0
  return n if m == 0

  # d[i][j] = edit distance between the first i chars of s and the first j chars of t
  d = Array.new(m + 1) { Array.new(n + 1) }
  (0..m).each { |i| d[i][0] = i }
  (0..n).each { |j| d[0][j] = j }

  (1..n).each do |j|
    (1..m).each do |i|
      d[i][j] = if s[i - 1] == t[j - 1] # the table is 1-indexed, the strings are 0-indexed
                  d[i - 1][j - 1]       # characters match: no operation required
                else
                  [d[i - 1][j] + 1,     # deletion
                   d[i][j - 1] + 1,     # insertion
                   d[i - 1][j - 1] + 1  # substitution
                  ].min
                end
    end
  end

  d[m][n]
end
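The full (m+1)×(n+1) table is only needed to recover the sequence of edits; the distance alone can be computed while keeping just the previous and current rows, using O(n) memory instead of O(mn). A sketch under that assumption, with hypothetical names `levenshtein_two_rows` and `fuzzy_match?`. The `max_distance` keyword is also hypothetical: its default of 1 reproduces the hard-coded `< 2` test. Normalization is inlined in core Ruby, assuming Ruby >= 2.4:

```ruby
# Hypothetical two-row Levenshtein distance: same recurrence as
# levenshtein_distance above, but only two rows are kept in memory.
def levenshtein_two_rows(s, t)
  return t.length if s.empty?
  return s.length if t.empty?

  prev = (0..t.length).to_a # distances from "" to each prefix of t
  s.each_char.with_index(1) do |sc, i|
    curr = [i] # distance from the first i chars of s to ""
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1,        # deletion
               curr[j - 1] + 1,    # insertion
               prev[j - 1] + cost  # substitution (free when the chars match)
              ].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical wrapper; max_distance: 1 is equivalent to "distance < 2".
# Assumes Ruby >= 2.4 (Unicode-aware String#downcase).
def fuzzy_match?(s1, s2, max_distance: 1)
  norm = ->(s) { s.downcase.strip.split(/\s+/).sort.join(' ') }
  levenshtein_two_rows(norm.(s1), norm.(s2)) <= max_distance
end

puts levenshtein_two_rows("kitten", "sitting") # prints: 3
puts fuzzy_match?("John  Smith", "smith johm") # prints: true
puts fuzzy_match?("John Smith", "Jane Smith")  # prints: false
```

The second call matches because, after lower-casing, collapsing spaces and sorting words, "john smith" and "johm smith" differ by a single substitution; "john" and "jane" differ by three.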