Skip to content

Instantly share code, notes, and snippets.

@shaggyone
Created January 27, 2014 09:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save shaggyone/8645347 to your computer and use it in GitHub Desktop.
Save shaggyone/8645347 to your computer and use it in GitHub Desktop.
Ruby implementation of Sift3 strings distance much faster alternative to levenshtein distance
# Ruby version of Siderite's fast and accurate string distance algorith described here:
#
# http://siderite.blogspot.com/2007/04/super-fast-and-accurate-string-distance.html
class StringTools
def self.sift3distance(s1, s2, maxOffset, maxDistance)
if (s1.try(:length) || 0) == 0
return s2.try(:length) || 0
end
if (s2.try(:length) || 0) == 0
return s1.length
end
if s1 == s2
return 0
end
c1 = c2 = lcs = 0
temporaryDistance = nil
while (c1 < s1.length) && (c2 < s2.length)
if s1[c1] == s2[c2]
lcs += 1
else
if (c1<c2)
c2=c1
else
c1=c2
end
maxOffset.times do |i|
if ((c1 + i < s1.length) && (s1[c1 + i] == s2[c2]))
c1 += i
break
end
if ((c2 + i < s2.length) && (s1[c1] == s2[c2 + i]))
c2 += i
break
end
end
end
c1 += 1
c2 += 1
if maxDistance.present?
temporaryDistance=(c1+c2)/1.5-lcs
if (temporaryDistance>=maxDistance)
return temporaryDistance.round
end
end
end
return ((s1.length + s2.length) / 1.5 - lcs).round
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment