@dgadiraju
Last active September 10, 2019 13:17
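// Customer ids (field 2) of orders placed in August 2013; order_date is field 1.
val orderCustomers1 = sc.textFile("/public/retail_db/orders").
  filter(o => o.split(",")(1).contains("2013-08")).
  map(o => o.split(",")(2).toInt)

// Customer ids of orders placed in September 2013.
val orderCustomers2 = sc.textFile("/public/retail_db/orders").
  filter(o => o.split(",")(1).contains("2013-09")).
  map(o => o.split(",")(2).toInt)

// Number of orders per month (a customer id appears once per order).
orderCustomers1.count
orderCustomers2.count

// Customers who placed orders in both months; intersection drops duplicates.
orderCustomers1.
  intersection(orderCustomers2).
  take(10).
  foreach(println)

// Number of distinct customers who placed orders in both months.
orderCustomers1.
  intersection(orderCustomers2).
  count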
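
// Customer ids of orders placed in August 2013 and in September 2013.
val orderCustomers1 = sc.textFile("/public/retail_db/orders").
  filter(o => o.split(",")(1).contains("2013-08")).
  map(o => o.split(",")(2).toInt)
val orderCustomers2 = sc.textFile("/public/retail_db/orders").
  filter(o => o.split(",")(1).contains("2013-09")).
  map(o => o.split(",")(2).toInt)

// Customer records keyed by customer_id (field 0) so they can be joined.
val customers = sc.textFile("/public/retail_db/customers").
  map(c => (c.split(",")(0).toInt, c))

// Details of customers who ordered in both months: pair each id with a
// dummy value to make it joinable, then keep only the customer record.
orderCustomers1.
  intersection(orderCustomers2).
  map(c => (c, 1)).
  join(customers).
  map(c => c._2._2).
  take(10).
  foreach(println)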
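
The intersection-then-join logic above can be checked without a cluster. The following is a minimal sketch on plain Scala collections, not the gist author's code: the sample ids, the customer lines, and the names `augustCustomers`, `septemberCustomers`, `inBothMonths`, and `details` are all hypothetical. It assumes only that `RDD.intersection` returns each common element once and that `join` keeps only keys present on both sides.

```scala
// Hypothetical sample data, not taken from retail_db.
val augustCustomers = Seq(1, 2, 2, 3, 5)
val septemberCustomers = Seq(2, 3, 3, 4)

// RDD.intersection returns each common element once, so the local
// equivalent is a set intersection rather than Seq.intersect.
val inBothMonths = augustCustomers.toSet.intersect(septemberCustomers.toSet)

// Hypothetical customer records keyed by customer id, mirroring (id, line) pairs.
val customers = Map(
  1 -> "1,Ann",
  2 -> "2,Bob",
  3 -> "3,Cy",
  4 -> "4,Di"
)

// Inner-join behaviour: ids with no matching customer record are dropped.
val details = inBothMonths.toSeq.sorted.flatMap(id => customers.get(id))
details.foreach(println)
```

Sorting the ids is only there to make the local output deterministic; the Spark version makes no ordering guarantee for `take(10)`.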