
@pavlov99
Last active January 4, 2017 05:55
Check groups overlap
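// Data example:
// id group
// 1  A
// 2  A
// 2  B
// In this case object `2` belongs to both groups "A" and "B".
// Assumes a spark-shell session (or a SparkSession named `spark`); the import enables the $"col" syntax.
import spark.implicits._

// The example data above as a DataFrame.
val groups = Seq((1, "A"), (2, "A"), (2, "B")).toDF("id", "group")

// Second copy of the table with `group` renamed, so the self-join has unambiguous column names.
val overlappedGroups = groups.select($"id", $"group" as "_group")

groups
  .join(overlappedGroups, Seq("id"))  // pair every group of an id with every group of the same id
  .where($"group" < $"_group")        // keep each pair once: drops self-pairs (A,A) and the mirrored (B,A)
  .groupBy("group", "_group")
  .count()                            // number of ids shared by each pair of groups
  .show()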
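To make the join logic explicit without a Spark session, here is a minimal plain-Scala sketch of the same computation on an in-memory list of `(id, group)` pairs. The names `GroupOverlap` and `overlapCounts` are hypothetical, chosen for illustration. One assumed difference: it collapses duplicate `(id, group)` rows into a set, whereas the DataFrame self-join would count each duplicate row separately.

```scala
// Hypothetical helper: plain-Scala equivalent of the Spark self-join above.
object GroupOverlap {
  // For each pair of distinct groups (g1 < g2), count how many ids belong to both.
  def overlapCounts(rows: Seq[(Int, String)]): Map[(String, String), Int] = {
    // Set of groups per id (plays the role of the join on "id"; duplicates collapse here).
    val groupsById: Map[Int, Set[String]] =
      rows.groupBy(_._1).map { case (id, rs) => id -> rs.map(_._2).toSet }

    // Emit each pair once, ordered so g1 < g2 (plays the role of `group < _group`).
    val pairs: Seq[(String, String)] = for {
      gs <- groupsById.values.toSeq
      g1 <- gs
      g2 <- gs
      if g1.compareTo(g2) < 0
    } yield (g1, g2)

    // Count occurrences of each pair (plays the role of groupBy(...).count()).
    pairs.groupBy(identity).map { case (pair, ps) => pair -> ps.size }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq((1, "A"), (2, "A"), (2, "B"))
    println(overlapCounts(rows)) // Map((A,B) -> 1)
  }
}
```

With the example data only id `2` appears in both groups, so the single pair `(A,B)` gets a count of 1.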