Skip to content

Instantly share code, notes, and snippets.

@4lex1v
Created November 16, 2015 10:30
Show Gist options
  • Save 4lex1v/095d05e534e74f4ed714 to your computer and use it in GitHub Desktop.
Save 4lex1v/095d05e534e74f4ed714 to your computer and use it in GitHub Desktop.
Code for demystifying spark serization error article
/**
* Model for our case.
* Note that [[Value]] class is not serializable, while
* [[Wrapper]] is a case class, which is serializable by default.
*/
class Value
case class Wrapper(v: Value)
// defined class Value
// defined class Wrapper
val wrapper = Wrapper(new Value)
// wrapper: Wrapper = Wrapper(Value@2d8f65a4)
val data = sc.parallelize(1 to 10)
// data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1]
data.map(_ => value).foreach(println)
// org.apache.spark.SparkException: Task not serializable
// at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
// ...
sc.parallelize(1 to 10).map(_ => Wrapper(new Value)).foreach(println)
// Wrapper($line15.$read$$iwC$$iwC$Value@1479edee)
// Wrapper($line15.$read$$iwC$$iwC$Value@342efa0e)
// 8 more ...
data.map(_ => Wrapper(new Value)).collect()
// 15/11/16 10:07:38 ERROR Executor: Exception in task 6.0 in stage 1.0 (TID 14)
// java.io.NotSerializableException: $line15.$read$$iwC$$iwC$Value
// ...
data.map(_ => Wrapper(new Value)).count()
// res6: Long = 10
data.map(_ => Wrapper(new Value)).map(_.toString).reduce(_ + _)
// res7: String = Wrapper(...)Wrapper(..)...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment