Skip to content

Instantly share code, notes, and snippets.

@hubte1g
Forked from JoshRosen/README.md
Last active August 29, 2015 14:08
Show Gist options
  • Save hubte1g/cb56e4f4ae656643d6a3 to your computer and use it in GitHub Desktop.
Save hubte1g/cb56e4f4ae656643d6a3 to your computer and use it in GitHub Desktop.

Confusing Scala serialization puzzle

This writeup describes a tricky Scala serialization issue that we ran into while porting Spark to Scala 2.11.

First, let's create a factory that builds anonymous functions:

class FunctionFactory extends Serializable {
  // This method returns a function whose $outer scope is this class.
  // Note that the function does not reference any variables in this scope,
  // so the generated class won't contain any fields.
  def createFunc = (a: Int) => a
}

Let's also create a class that holds functions:

class FunctionHolder(val func: Int => Int) extends Serializable { }

Finally, let's put these pieces together by using the factory to create a function, putting that function into our function holder, then serializing the function holder using Java serialization:

val functionFactory = new FunctionFactory()
val functionHolder = new FunctionHolder(functionFactory.createFunc)
val fs = new FileOutputStream("test.bin")
val os = new ObjectOutputStream(fs)
os.writeObject(functionHolder)
os.close()
fs.close()

Let's read it back:

val fs = new FileInputStream("test.bin")
val os = new ObjectInputStream(fs)
val functionHolder = os.readObject().asInstanceOf[FunctionHolder]

So far, so good. Here's where things break: imagine that I want to deserialize my function in a remote JVM whose class path doesn't contain FunctionFactory. In Scala 2.10.4, I'm able to deserialize test.bin even after deleting FunctionFactory.class, whereas in Scala 2.11.2 this results in ClassNotFoundException.

To be more concrete, let's run an actual example and compare the results on 2.10.4 and 2.11.2. In addition to this README, this gist contains a Test.scala file and test.sh script for running this. The test script is parameterized by different Scala versions and will use sbt to download the specified version, so we can test this with any Scala release.

With 2.10.4

Testing with Scala 2.10.4
+ rm test.bin
+ sbt '++ 2.10.4' clean compile
[info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[info] Setting version to 2.10.4
[info] Reapplying settings...
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[success] Total time: 0 s, completed Nov 4, 2014 11:00:12 PM
[info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.10/classes...
[success] Total time: 2 s, completed Nov 4, 2014 11:00:15 PM
+ scala -cp target/scala-2.10/classes/ Main
Serializing object to test.bin
+ scala -cp target/scala-2.10/classes/ Main read
Deserializing object from test.bin
+ rm target/scala-2.10/classes/Functionfactory.class
+ scala -cp target/scala-2.10/classes/ Main read
Deserializing object from test.bin

With 2.11.2

Testing with Scala 2.11.2
+ rm test.bin
+ sbt '++ 2.11.2' clean compile
[info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[info] Setting version to 2.11.2
[info] Reapplying settings...
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
[success] Total time: 0 s, completed Nov 4, 2014 11:00:54 PM
[info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211...
[info] Resolving jline#jline;2.12 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.11/classes...
[success] Total time: 3 s, completed Nov 4, 2014 11:00:57 PM
+ scala -cp target/scala-2.11/classes/ Main
Serializing object to test.bin
+ scala -cp target/scala-2.11/classes/ Main read
Deserializing object from test.bin
+ rm target/scala-2.11/classes/Functionfactory.class
+ scala -cp target/scala-2.11/classes/ Main read
Deserializing object from test.bin
java.lang.ClassNotFoundException: FunctionFactory
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
	at java.lang.Class.getDeclaredConstructors(Class.java:1901)
	at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
	at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
	at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
	at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at Main$.main(Test.scala:38)
	at Main.main(Test.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:68)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:99)
	at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:68)
	at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:99)
	at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)
	at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39)
	at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29)
	at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39)
	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:65)
	at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)
	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)
	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)
	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Understanding the exception

Looking only at the ClassNotFoundException, your first instinct might be to see whether FunctionFactory objects are being included in the serialized data.

If we disassemble the anonymous function, though, we see that it doesn't contain any fields. Here's the class generated by Scala 2.11.2:

javap -p target/scala-2.11/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class
Compiled from "Test.scala"
public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
  public final int apply(int);
  public int apply$mcII$sp(int);
  public final java.lang.Object apply(java.lang.Object);
  public FunctionFactory$$anonfun$createFunc$1(FunctionFactory);
}

However, its constructor does reference FunctionFactory. This constructor isn't actually called during deserialization, though; it's only called from createFunc when creating the anonymous function.

Let's compare this to the class generated by Scala 2.10.4:

javap -p target/scala-2.10/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class
Compiled from "Test.scala"
public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
  public static final long serialVersionUID;
  public final int apply(int);
  public int apply$mcII$sp(int);
  public final java.lang.Object apply(java.lang.Object);
  public FunctionFactory$$anonfun$createFunc$1(FunctionFactory);
}

The key difference is that Scala 2.10.4 defined a serialVersionUID, whereas 2.11.2 did not.

Since our class does not define a serialVersionUID, Java attempts to calculate one when deserializing our object in order to determine whether the serialized representation is compatible with its version of the function class. When calculating the default SUID, Java reflectively inspects the class's constructors. In our case, the anonymous function's constructor has a parameter $outer of type FunctionFactory, which triggers class loading of the missing FunctionFactory class. We can see this from the error stack trace:

java.lang.ClassNotFoundException: FunctionFactory
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
	at java.lang.Class.getDeclaredConstructors(Class.java:1901)
	at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
	at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
	at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
	at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
	[...]	

Finding where the Scala behavior changed

Our fancy testing script can help to find the specific Scala 2.11 release or milestone that changed this behavior. The test passes on 2.10.0 buts fails on 2.11.1. There were only a few changes between those releases, including three commits related to serialVersionUID:

My hunch is that the problem was introduced by one of these commits, but I'm unfamiliar with the Scala compiler internals and could be totally wrong.

Appendix: the original Spark reproduction of this issue

Here's a rough outline of the original Spark reproduction of this issue:

package org.apache.spark 

import org.scalatest.FunSuite 

import org.apache.spark.{SparkConf, SparkContext} 

class ReproSuite extends FunSuite { 
  test("Reproducing Scala 2.11 failure") { 
    val sc = new SparkContext("local-cluster[%d, 1, 512]".format(2), 
    "test", new SparkConf()) 
    sc.parallelize(1 to 10, 10).map(x).collect() 
  } 

  // Simply moving this to the same scope as the test fixes it 
  def x = (k: Int) => k 
}

In this test, the map(x) function is executed in a separate JVM (launched by Spark's local-cluster mode) that contains our test classes (such as ReproSuite.class) but not our third-party test dependencies (such as ScalaTest's FunSuite class).

import java.io.ObjectOutputStream
import java.io.ObjectInputStream
import java.io.FileOutputStream
import java.io.FileInputStream
class FunctionFactory extends Serializable {
// This method returns a function whose $outer scope is this class.
// Note that the function does not reference any variables in this scope,
// so the generated class won't contain any fields.
def createFunc = (a: Int) => a
}
// This class is a simple wrapper that holds a function:
class FunctionHolder(val func: Int => Int) extends Serializable { }
object Main {
def main(args: Array[String]) {
if (args.size == 0) {
println("Serializing object to test.bin")
// Define a function in an outer scope
val functionFactory = new FunctionFactory()
// Put that function into an object
val functionHolder = new FunctionHolder(functionFactory.createFunc)
// Serialize that object
val fs = new FileOutputStream("test.bin")
val os = new ObjectOutputStream(fs)
os.writeObject(functionHolder)
os.close()
fs.close()
}
else {
println("Deserializing object from test.bin")
// Read the object serialized in the previous step
val fs = new FileInputStream("test.bin")
val os = new ObjectInputStream(fs)
val functionHolder = os.readObject().asInstanceOf[FunctionHolder]
}
}
}
#!/usr/bin/env bash
if [ -z $1 ]; then
echo "Usage: $0 scalaVersion"
exit -1
fi
SCALA_VERSION=$1
echo "Testing with Scala $1"
set -x
rm test.bin
sbt "++ $SCALA_VERSION" clean compile
scala -cp target/scala-*/classes/ Main
scala -cp target/scala-*/classes/ Main read
rm target/scala-*/classes/Functionfactory.class
scala -cp target/scala-*/classes/ Main read
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment