@PanAeon
Last active December 17, 2015 15:41
Serialization Error
val rdd = sc.parallelize(Stream.from(1).take(1000).toList,4)
class NotSerializable()
val bar = new NotSerializable()
val simpleVariable = Set(1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19)
// Here's the problem: this code runs forever.
rdd
  .filter(x => simpleVariable(x)) // translates to this.simpleVariable
  .collect()
// By the way, this works fine, because the Set itself is passed as the predicate
// (a Set[Int] is an Int => Boolean), so no closure over `this` is created:
rdd
  .filter(simpleVariable)
  .collect()
/* =========== Here's a workaround: =============================
object Foo1 {
  val simpleVariable = Set(1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19)
}
import Foo1._
rdd
  .filter(x => simpleVariable(x))
  .collect()
*/
// But a newcomer to Scala would find it quite difficult to figure out this solution!
// So my question is: why don't we wrap each line in an object automatically, as in the workaround?
// If the answer is positive, I will create a new ticket, try to fix this, and submit a PR.
rdd
  .filter(x => simpleVariable(x)) // translates to $line11.$read$$iw.......$iw.simpleVariable, works fine
  .collect()
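
The capture behaviour described in the comments above can be reproduced without Spark. The following is a minimal sketch, assuming the root cause is that the closure drags in a non-serializable enclosing instance (as the `this.simpleVariable` comment suggests). `ClosureCaptureSketch`, `CellWrapper`, and `trySerialize` are hypothetical names introduced only for illustration; they are not part of Spark or of any notebook. Plain Java serialization stands in for the step where a closure has to be serialized before it is shipped to executors.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

object ClosureCaptureSketch {

  // Hypothetical helper: try to Java-serialize a function value and
  // report the number of bytes written, or the failure.
  def trySerialize(f: AnyRef): Try[Int] = Try {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(f)
    out.close()
    bytes.size()
  }

  // Stand-in (assumption) for a class-based cell wrapper that is NOT Serializable.
  class CellWrapper {
    val simpleVariable = Set(1, 2, 3)

    // The lambda body reads this.simpleVariable, so the closure captures `this`.
    val viaThis: Int => Boolean = x => simpleVariable(x)

    // The Set itself is the function value; nothing else is captured.
    val direct: Int => Boolean = simpleVariable
  }

  // Same shape as the Foo1 workaround: the value lives in a top-level-style object.
  object Foo1 {
    val simpleVariable = Set(1, 2, 3)
  }

  // The lambda reaches simpleVariable through a static path, not through an instance.
  val viaObject: Int => Boolean = x => Foo1.simpleVariable(x)

  def main(args: Array[String]): Unit = {
    val cell = new CellWrapper
    println(s"closure over this: ${trySerialize(cell.viaThis)}")
    println(s"Set passed directly: ${trySerialize(cell.direct)}")
    println(s"closure over object member: ${trySerialize(viaObject)}")
  }
}
```

In this sketch only the first closure should fail to serialize, because writing it out also means writing the `CellWrapper` instance it captured. The other two carry no reference to the wrapper, which matches the pattern of the two working variants above.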

PanAeon commented Dec 17, 2015

Yeah, I already see a problem with defining companion objects, for example, and with out-of-order definitions. Maybe we'll need a special cell type for them, which will complicate things, but I hope we will avoid a greater evil that way.
