kjsingh / Split.scala
Created Aug 6, 2019
Split RDDs based on index
val rdd = sc.parallelize(1 to 100)
//rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
var max = 3
//max: Int = 3
var multiRdds:List[org.apache.spark.rdd.RDD[Int]] = Nil
//multiRdds: List[org.apache.spark.rdd.RDD[Int]] = List()
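// Index the elements once; zipWithIndex runs a Spark job when the RDD has more than one partition.
val indexed = rdd.zipWithIndex
// Prepending reverses the order: multiRdds(0) ends up holding the elements with index % max == max - 1.
for (i <- 0 until max) {
  multiRdds = indexed.filter(_._2 % max == i).map(_._1) :: multiRdds
}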
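
To see what the index-based split produces without starting Spark, the same round-robin logic can be sketched on an ordinary Scala collection. This is only an illustration under assumptions: `splitByIndex` is a hypothetical helper name that does not appear in the gist, and it returns the parts in ascending remainder order rather than the reversed order the loop above builds.

```scala
// Hypothetical helper (not part of the original gist): split a sequence into
// n parts, sending the element at position idx to part number idx % n.
def splitByIndex[A](xs: Seq[A], n: Int): List[Seq[A]] = {
  require(n > 0, "n must be positive")
  // Pair every element with its zero-based position, once.
  val indexed = xs.zipWithIndex
  // For each remainder i, keep the elements whose position satisfies idx % n == i.
  (0 until n).toList.map { i =>
    indexed.collect { case (x, idx) if idx % n == i => x }
  }
}

val parts = splitByIndex(1 to 10, 3)
// parts(0) holds 1, 4, 7, 10; parts(1) holds 2, 5, 8; parts(2) holds 3, 6, 9
```

On an RDD the equivalent filter runs once per part over the whole indexed dataset, so the number of passes grows with `max`.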