This writeup describes a tricky Scala serialization issue that we ran into while porting Spark to Scala 2.11.
First, let's create a factory that builds anonymous functions:
class FunctionFactory extends Serializable {
// This method returns a function whose $outer scope is this class.
// Note that the function does not reference any variables in this scope,
// so the generated class won't contain any fields.
def createFunc = (a: Int) => a
}
Let's also create a class that holds functions:
class FunctionHolder(val func: Int => Int) extends Serializable { }
Finally, let's put these pieces together by using the factory to create a function, putting that function into our function holder, then serializing the function holder using Java serialization:
val functionFactory = new FunctionFactory()
val functionHolder = new FunctionHolder(functionFactory.createFunc)
val fs = new FileOutputStream("test.bin")
val os = new ObjectOutputStream(fs)
os.writeObject(functionHolder)
os.close()
fs.close()
Let's read it back:
val fs = new FileInputStream("test.bin")
val os = new ObjectInputStream(fs)
val functionHolder = os.readObject().asInstanceOf[FunctionHolder]
So far, so good. Here's where things break: imagine that I want to deserialize my function in a remote JVM whose class path doesn't contain FunctionFactory
. In Scala 2.10.4, I'm able to deserialize test.bin
even after deleting FunctionFactory.class
, whereas in Scala 2.11.2 this results in ClassNotFoundException
.
To be more concrete, let's run an actual example and compare the results on 2.10.4 and 2.11.2. In addition to this README, this gist contains a Test.scala
file and test.sh
script for running this. The test script is parameterized by different Scala versions and will use sbt
to download the specified version, so we can test this with any Scala release.
Testing with Scala 2.10.4
+ rm test.bin
+ sbt '++ 2.10.4' clean compile
[info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[info] Setting version to 2.10.4
[info] Reapplying settings...
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[success] Total time: 0 s, completed Nov 4, 2014 11:00:12 PM
[info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.10/classes...
[success] Total time: 2 s, completed Nov 4, 2014 11:00:15 PM
+ scala -cp target/scala-2.10/classes/ Main
Serializing object to test.bin
+ scala -cp target/scala-2.10/classes/ Main read
Deserializing object from test.bin
+ rm target/scala-2.10/classes/Functionfactory.class
+ scala -cp target/scala-2.10/classes/ Main read
Deserializing object from test.bin
Testing with Scala 2.11.2
+ rm test.bin
+ sbt '++ 2.11.2' clean compile
[info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[info] Setting version to 2.11.2
[info] Reapplying settings...
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[success] Total time: 0 s, completed Nov 4, 2014 11:00:54 PM
[info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211...
[info] Resolving jline#jline;2.12 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.11/classes...
[success] Total time: 3 s, completed Nov 4, 2014 11:00:57 PM
+ scala -cp target/scala-2.11/classes/ Main
Serializing object to test.bin
+ scala -cp target/scala-2.11/classes/ Main read
Deserializing object from test.bin
+ rm target/scala-2.11/classes/Functionfactory.class
+ scala -cp target/scala-2.11/classes/ Main read
Deserializing object from test.bin
java.lang.ClassNotFoundException: FunctionFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
at java.lang.Class.getDeclaredConstructors(Class.java:1901)
at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at Main$.main(Test.scala:38)
at Main.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:68)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:99)
at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:68)
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:99)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:65)
at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
Looking only at the ClassNotFoundException
, your first instinct might be to see whether FunctionFactory
objects are being included in the serialized data.
If we disassemble the anonymous function, though, we see that it doesn't contain any fields. Here's the class generated by Scala 2.11.2:
javap -p target/scala-2.11/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class
Compiled from "Test.scala"
public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
public final int apply(int);
public int apply$mcII$sp(int);
public final java.lang.Object apply(java.lang.Object);
public FunctionFactory$$anonfun$createFunc$1(FunctionFactory);
}
However, its constructor does reference FunctionFactory
. This constructor isn't actually called during deserialization, though; it's only called from createFunc
when creating the anonymous function.
Let's compare this to the class generated by Scala 2.10.4:
javap -p target/scala-2.10/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class
Compiled from "Test.scala"
public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
public static final long serialVersionUID;
public final int apply(int);
public int apply$mcII$sp(int);
public final java.lang.Object apply(java.lang.Object);
public FunctionFactory$$anonfun$createFunc$1(FunctionFactory);
}
The key difference is that Scala 2.10.4 defined a serialVersionUID
, whereas 2.11.2 did not.
Since our class does not define a serialVersionUID
, Java attempts to calculate one when deserializing our object in order to determine whether the serialized representation is compatible with its version of the function class. When calculating the default SUID, Java reflectively inspects the class's constructors. In our case, the anonymous function's constructor has a parameter $outer
of type FunctionFactory
, which triggers class loading of the missing FunctionFactory
class. We can see this from the error stack trace:
java.lang.ClassNotFoundException: FunctionFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
at java.lang.Class.getDeclaredConstructors(Class.java:1901)
at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
[...]
Our fancy testing script can help to find the specific Scala 2.11 release or milestone that changed this behavior. The test passes on 2.10.0
buts fails on 2.11.1
. There were only a few changes between those releases, including three commits related to serialVersionUID
:
- SI-8549 Serialization: fix regression with @SerialVersionUID / start enforcing backwards compatibility
- SI-8574 Copy @SerialVersionUID, etc, to specialized subclasses
- Avoid ClassfileAnnotation warning for @SerialVersionUID
My hunch is that the problem was introduced by one of these commits, but I'm unfamiliar with the Scala compiler internals and could be totally wrong.
Here's a rough outline of the original Spark reproduction of this issue:
package org.apache.spark
import org.scalatest.FunSuite
import org.apache.spark.{SparkConf, SparkContext}
class ReproSuite extends FunSuite {
test("Reproducing Scala 2.11 failure") {
val sc = new SparkContext("local-cluster[%d, 1, 512]".format(2),
"test", new SparkConf())
sc.parallelize(1 to 10, 10).map(x).collect()
}
// Simply moving this to the same scope as the test fixes it
def x = (k: Int) => k
}
In this test, the map(x)
function is executed in a separate JVM (launched by Spark's local-cluster
mode) that contains our test classes (such as ReproSuite.class
) but not our third-party test dependencies (such as ScalaTest's FunSuite
class).