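# Assumes a gem providing Levenshtein.distance is loaded.
emails = Customer.pluck(:email)
emails.map do |email|
  grouping = emails
    .group_by { |em| Levenshtein.distance(email, em) }
    .select { |distance, _| distance < 5 && distance != 0 } # ignore exact matches and ones far off
  [email, grouping] unless grouping.empty?
end.compact # drop the nils left by emails with no near matches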
@rolentle

commented Apr 18, 2017

Assuming that the following is desired:

email | matches
-----------------
abc@test.com | [acb@test.com, cab@test.com]

My guess is that the actual query would look something like this:

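-- levenshtein() comes from PostgreSQL's fuzzystrmatch extension.
-- The outer table is aliased so that a.email is not resolved as b.email
-- inside the subquery.
SELECT a.email,
       ARRAY(SELECT b.email
             FROM customers b
             WHERE b.email != a.email
               AND levenshtein(a.email, b.email) < 5
               AND levenshtein(a.email, b.email) > 0) AS matches
FROM customers a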

Then an ActiveRecord version of it would be:

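# The outer column is qualified as customers.email; a bare "email" inside
# the subquery would resolve to b.email and never match anything.
Customer.select(:email).select(<<~SQL)
  ARRAY(SELECT b.email
        FROM customers b
        WHERE b.email != customers.email
          AND levenshtein(customers.email, b.email) < 5
          AND levenshtein(customers.email, b.email) > 0) AS matches
SQL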
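To sanity-check either version without a database, here is a minimal plain-Ruby sketch of the same filter. It is only an illustration under a few assumptions: `edit_distance` and `near_matches` are hypothetical helper names, not part of the gist or of any gem, the hand-rolled distance stands in for `Levenshtein.distance` / `levenshtein()`, and the sample addresses are made up.

```ruby
# Plain-Ruby Levenshtein distance (two-row dynamic programming).
# Hypothetical helper; stands in for Levenshtein.distance from a gem.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# For each email, collect the other emails within max_distance edits.
# Emails with no near matches are left out of the result.
def near_matches(emails, max_distance: 4)
  unique = emails.uniq
  unique.each_with_object({}) do |email, result|
    matches = unique.select do |other|
      distance = edit_distance(email, other)
      distance > 0 && distance <= max_distance
    end
    result[email] = matches unless matches.empty?
  end
end

# Made-up sample data, for illustration only.
sample = %w[abc@test.com acb@test.com cab@test.com someone@example.org]
near_matches(sample).each do |email, matches|
  puts "#{email} | #{matches.inspect}"
end
```

Like the original snippet, this compares every pair of emails, so it is only practical for small samples; the SQL version pushes the same quadratic work into the database.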