@drbridgewater
Created March 22, 2014 04:59
Filter joins in Scoobi are amazing
// Deep joins are easy using Scoobi.
// For instance, suppose you have a function that computes your most frequent queries in Hadoop, and you want to
// return only the search session objects (which could carry tons of event data other than searches)
// that contain at least one of those queries. You can do it like this:
val sessions_w_pop_queries = listOfPopQueries.join(sessions).filter {
  // Keep a session only if its search keywords overlap the popular queries.
  case (pop_queries, session) => session.justSearchPages.flatMap { _.keyword }.toSet.intersect(pop_queries).nonEmpty
}.map { case (_, session) => session } // drop the joined query set, keep just the session
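To try the filter logic without a Hadoop cluster, the same pattern can be sketched on plain Scala collections. This is only an illustration under assumptions: `SearchPage`, `Session`, and `joinAll` are hypothetical stand-ins (the gist does not show the real session type), `keyword` is assumed to be an `Option[String]`, and `joinAll` merely mimics pairing one value with every element of a collection rather than calling Scoobi itself.

```scala
// Hypothetical stand-ins for the gist's types; the real classes are not shown in the source.
case class SearchPage(keyword: Option[String])
case class Session(id: String, justSearchPages: Seq[SearchPage])

object FilterJoinSketch {
  // Local analogue of the join: pair the single value with every element.
  def joinAll[A, B](single: A, many: Seq[B]): Seq[(A, B)] =
    many.map(b => (single, b))

  // Keep only sessions whose search keywords overlap the popular queries.
  def sessionsWithPopQueries(popQueries: Set[String], sessions: Seq[Session]): Seq[Session] =
    joinAll(popQueries, sessions)
      .filter { case (pop, session) =>
        session.justSearchPages.flatMap(_.keyword).toSet.intersect(pop).nonEmpty
      }
      .map { case (_, session) => session }

  def main(args: Array[String]): Unit = {
    val popQueries = Set("scala", "hadoop")
    val sessions = Seq(
      Session("s1", Seq(SearchPage(Some("scala")), SearchPage(None))),
      Session("s2", Seq(SearchPage(Some("cooking"))))
    )
    // Print the ids of the sessions that survive the filter.
    println(sessionsWithPopQueries(popQueries, sessions).map(_.id))
  }
}
```

Because the query set travels alongside every session, the filter needs nothing outside the pair it is given, which is what makes the join-then-filter shape convenient in a distributed setting.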