John Ferguson InvisibleTech

InvisibleTech / joiner.scala
Created February 5, 2016 01:34
Apache Spark Example - take two columns, join them, and duplicate the other.
// Run this in the Spark shell using :paste mode (sc is the shell's SparkContext).
//
// Load up the columns
val alpha = sc.parallelize(List("a", "b", "c", "d"))
val nums = sc.parallelize(List(1, 2, 3, 4))
// Key them by index: zipWithIndex yields (element, index) pairs with a Long index,
// so swap each pair to get (index, element).
val alphaK = alpha.zipWithIndex.map(_.swap)
val numsK = nums.zipWithIndex.map(_.swap)