@taylorbrooks
Created April 18, 2017 21:10
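# Assumes a gem that provides Levenshtein.distance is already loaded.
emails = Customer.pluck(:email)
# For each email, group all emails by edit distance to it and keep the near misses.
emails.map do |email|
  grouping = emails
    .group_by { |em| Levenshtein.distance(email, em) }
    .select { |k, _v| k < 5 && k != 0 } # ignore exact matches and ones far off
  [email, grouping] unless grouping.empty?
end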
@rolentle

rolentle commented Apr 18, 2017

Assuming that the following is desired:

email | matches
-----------------
abc@test.com | [acb@test.com, cab@test.com]

My guess is that the actual query would look something like this:

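-- levenshtein() comes from PostgreSQL's fuzzystrmatch extension.
-- The outer table is aliased so the subquery compares against the outer row;
-- an unqualified "email" inside the subquery would resolve to b.email.
SELECT a.email,
       ARRAY(SELECT b.email
             FROM customers b
             WHERE b.email != a.email -- also rules out distance 0
               AND levenshtein(a.email, b.email) < 5) AS matches
FROM customers a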

Then an ActiveRecord version of it would be:

# The outer table has no alias here, so the subquery refers to it as customers.email.
Customer.select(:email).select(<<~SQL)
  ARRAY(SELECT b.email
        FROM customers b
        WHERE b.email != customers.email
          AND levenshtein(customers.email, b.email) < 5) AS matches
SQL

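To sanity-check the query on a handful of addresses without a database, the same email | matches shape can be sketched in plain Ruby. This is only an illustration under a few assumptions: `edit_distance` and `near_matches` are hypothetical helper names, the distance is a hand-rolled Levenshtein rather than the gem's or PostgreSQL's implementation, and the sample addresses are made up. It also compares each pair only once via `combination(2)`, instead of every email against every other email.

```ruby
# Plain-Ruby Levenshtein distance using two rows of the DP table.
# Hypothetical helper for illustration only.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution (or match)
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Returns { email => [similar emails] }. Duplicates are removed first, so
# every compared pair has distance > 0; pairs at max_distance or more are skipped.
def near_matches(emails, max_distance = 5)
  matches = Hash.new { |hash, key| hash[key] = [] }
  emails.uniq.combination(2) do |a, b|
    next unless edit_distance(a, b) < max_distance
    matches[a] << b
    matches[b] << a
  end
  matches
end

# Made-up sample data.
sample = %w[abc@test.com acb@test.com cab@test.com someone.else@example.org]
near_matches(sample).each { |email, similar| puts "#{email} | #{similar.inspect}" }
```

Emails with no near matches never become keys in the hash, so they are simply absent from the output. For a large table the SQL version is likely the better fit, since this sketch still does a quadratic number of comparisons in memory.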