@carlwiedemann
Last active August 29, 2015 14:09
results.rb
results = {
  '123' => [1, 2, 3],
  '234' => [1, 2],
  '345' => [1, 2],
  '456' => [4, 5],
  '567' => [6, 7]
}
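# Hash whose missing keys default to a new empty array on first access.
duplicates = Hash.new { |hash, key| hash[key] = [] }
# Fraction of a result's elements that must also appear in another result
# before that other result is logged (here: more than half).
threshold = 0.5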
results.each do |primary_id, primary_result|
  # Compare each result against every other result.
  # If the share of the primary result's elements found in the secondary
  # result exceeds the threshold, log the secondary id under the primary id.
  results.each do |secondary_id, secondary_result|
    if primary_id != secondary_id
      intersect = primary_result & secondary_result
      if (intersect.length.to_f / primary_result.length.to_f) > threshold
        duplicates[primary_id] << secondary_id
      end
    end
  end
end
puts duplicates
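
The same comparison could also be wrapped in a method so it can be reused with other data and thresholds. The following is only a sketch under a few assumptions: the name `overlapping_ids` is hypothetical and not part of the original script, `filter_map` requires Ruby 2.7 or later, and empty results are skipped here to sidestep the 0/0 division (the original loop simply never matches them).

```ruby
# Hypothetical helper (not in the original script): returns a hash mapping
# each id to the ids whose results share more than `threshold` of its elements.
def overlapping_ids(results, threshold = 0.5)
  results.each_with_object({}) do |(primary_id, primary_result), found|
    # Assumption: skip empty results rather than divide zero by zero.
    next if primary_result.empty?

    matches = results.filter_map do |secondary_id, secondary_result|
      next if secondary_id == primary_id

      overlap = (primary_result & secondary_result).length
      secondary_id if overlap.to_f / primary_result.length > threshold
    end

    # Only record ids that have at least one match, as the original does.
    found[primary_id] = matches unless matches.empty?
  end
end

sample = { 'a' => [1, 2, 3], 'b' => [1, 2], 'c' => [9] }
p overlapping_ids(sample)
```

With this sample, 'a' and 'b' list each other and 'c' does not appear at all. The measure is directional, since the overlap is divided by the primary result's length, so one id can list another without being listed in return.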