@HyukjinKwon
Last active March 25, 2016 15:39
[Spark][Scala] repartition issue reproduce
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

case class Custom(a: Int, b: String)

// Set the parallelism level to 2.
val conf = new SparkConf().setAppName("RepartitionIssue").setMaster("local[2]")
val sc = new SparkContext(conf)

val data = Seq(Custom(1, "a"), Custom(2, "b"))
val rdd = sc.parallelize(data)

// Try to repartition the data.
val mappedRDD = rdd.repartition(2)
  .mapPartitions { iter =>
    // Print each element out at the executor side.
    // Note: foreach consumes the iterator, so the iterator returned
    // below is already exhausted when Spark reads from it.
    iter.foreach(println)
    iter
  }

mappedRDD.collect()
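One detail worth knowing when reading this repro: a Scala Iterator is single-pass, so the foreach inside mapPartitions exhausts it before it is handed back to Spark, and the collected result will be empty regardless of any repartitioning behavior. The following minimal sketch (plain Scala, no Spark) demonstrates that Iterator semantic on its own:

// A Scala Iterator can be traversed only once.
val it = Iterator(Custom(1, "a"), Custom(2, "b"))
it.foreach(println)           // consumes every element
val leftover = it.toList      // the iterator is now exhausted
println(leftover.isEmpty)     // true

// To print elements and still pass them through, map instead:
val it2 = Iterator(Custom(1, "a"), Custom(2, "b"))
val passedThrough = it2.map { x => println(x); x }.toList
println(passedThrough.size)   // 2

Inside mapPartitions, the same idea applies: returning iter.map { x => println(x); x } would log each element lazily while still yielding them downstream.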