Several Apache Spark APIs rely on the ability to serialize Scala closures. Closures may reference non-Serializable objects, preventing them from being serialized. In some cases (SI-1419 and others), however, these references are unnecessary and can be nulled out, allowing otherwise-unserializable closures to be serialized (in Spark, this nulling is performed by the ClosureCleaner
).
Scala 2.12's use of Java 8 lambdas for implementing closures appears to have broken our ability to serialize closures which contain local def
s. If we cannot resolve this problem, Spark will be unable to support Scala 2.12 and will be stuck on 2.10 and 2.11 forever.
As an example which illustrates this problem, the following closure
has a nested localDef
and is defined inside of a non-serializable class:
object NonSerializableClass {
val closure: Int => Int = (i: Int) => {
def localDef = 2
2 + localDef
}
}
In Scala 2.10 / 2.11, this closure is compiled into its own class and localDef
is lifted into a private method of that class:
public final class NonSerializableClass$$anonfun$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
public static final long serialVersionUID;
descriptor: J
public final int apply(int);
descriptor: (I)I
public int apply$mcII$sp(int);
descriptor: (I)I
public final java.lang.Object apply(java.lang.Object);
descriptor: (Ljava/lang/Object;)Ljava/lang/Object;
private final int localDef$1();
descriptor: ()I
public NonSerializableClass$$anonfun$1();
descriptor: ()V
}
As a result, instances of this class do not contain references to NonSerializableClass
instances and therefore are serializable.
In Scala 2.12.0-M4, closure
gets compiled into a Java 8 lambda and localDef
gets lifted into a method of NonSerializableClass$
:
public final class NonSerializableClass$ {
public static final NonSerializableClass$ MODULE$;
descriptor: LNonSerializableClass$;
private final scala.Function1<java.lang.Object, java.lang.Object> closure;
descriptor: Lscala/Function1;
public static {};
descriptor: ()V
public scala.Function1<java.lang.Object, java.lang.Object> closure();
descriptor: ()Lscala/Function1;
private final int localDef$1();
descriptor: ()I
public final int NonSerializableClass$$$anonfun$5(int);
descriptor: (I)I
private NonSerializableClass$();
descriptor: ()V
private static java.lang.Object $deserializeLambda$(java.lang.invoke.SerializedLambda);
descriptor: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object;
}
Because localDef
was lifted into the private final int localDef$1()
instance method of NonSerializableClass$
, the closure
lambda is implemented as an invocation of the NonSerializableClass$$$anonfun$5()
instance method of NonSerializableClass$
. As a result, closure
's SerializedLambda
captures the instance of NonSerializableClass$
, rendering this closure non-serializable:
import java.lang.invoke.SerializedLambda
val writeReplaceMethod = closure.getClass.getDeclaredMethod("writeReplace")
writeReplaceMethod.setAccessible(true)
val serializedLambda = writeReplaceMethod.invoke(closure).asInstanceOf[SerializedLambda]
println(serializedLambda)
println(serializedLambda.getCapturedArg(0).getClass.getName)
outputs
SerializedLambda[capturingClass=class NonSerializableClass$, functionalInterfaceMethod=scala/runtime/java8/JFunction1$mcII$sp.apply$mcII$sp:(I)I, implementation=invokeVirtual NonSerializableClass$.NonSerializableClass$$$anonfun$5:(I)I, instantiatedMethodType=(I)I, numCaptured=1]
NonSerializableClass$
We can't null out or otherwise eliminate this captured reference because the lambda's own code is implemented by instance methods rather than static methods.
I believe that one possible workaround would be to have scalac
's lambdalift
phase lift the local def
into a static method rather than an instance method.
I'm also hitting this problem at my dependency injection library for Scala: wvlet/airframe#39.
@JoshRosen Did you find any workaround for it?