Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 7 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save JoshRosen/8aacdee0162da430868e7f73247d45d8 to your computer and use it in GitHub Desktop.
Save JoshRosen/8aacdee0162da430868e7f73247d45d8 to your computer and use it in GitHub Desktop.
Serialization of Scala closures that contain local defs

Serialization of Scala closures that contain local defs

Several Apache Spark APIs rely on the ability to serialize Scala closures. Closures may reference non-Serializable objects, preventing them from being serialized. In some cases (SI-1419 and others), however, these references are unnecessary and can be nulled out, allowing otherwise-unserializable closures to be serialized (in Spark, this nulling is performed by the ClosureCleaner).

Scala 2.12's use of Java 8 lambdas for implementing closures appears to have broken our ability to serialize closures which contain local defs. If we cannot resolve this problem, Spark will be unable to support Scala 2.12 and will be stuck on 2.10 and 2.11 forever.

As an example which illustrates this problem, the following closure has a nested localDef and is defined inside of a non-serializable class:

object NonSerializableClass {
  val closure: Int => Int = (i: Int) => {
    def localDef = 2
    2 + localDef
  }
}

In Scala 2.10 / 2.11, this closure is compiled into its own class and localDef is lifted into a private method of that class:

public final class NonSerializableClass$$anonfun$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
  public static final long serialVersionUID;
    descriptor: J
  public final int apply(int);
    descriptor: (I)I

  public int apply$mcII$sp(int);
    descriptor: (I)I

  public final java.lang.Object apply(java.lang.Object);
    descriptor: (Ljava/lang/Object;)Ljava/lang/Object;

  private final int localDef$1();
    descriptor: ()I

  public NonSerializableClass$$anonfun$1();
    descriptor: ()V
}

As a result, instances of this class do not contain references to NonSerializableClass instances and therefore are serializable.

In Scala 2.12.0-M4, closure gets compiled into a Java 8 lambda and localDef gets lifted into a method of NonSerializableClass$:

public final class NonSerializableClass$ {
  public static final NonSerializableClass$ MODULE$;
    descriptor: LNonSerializableClass$;
  private final scala.Function1<java.lang.Object, java.lang.Object> closure;
    descriptor: Lscala/Function1;
  public static {};
    descriptor: ()V

  public scala.Function1<java.lang.Object, java.lang.Object> closure();
    descriptor: ()Lscala/Function1;

  private final int localDef$1();
    descriptor: ()I

  public final int NonSerializableClass$$$anonfun$5(int);
    descriptor: (I)I

  private NonSerializableClass$();
    descriptor: ()V

  private static java.lang.Object $deserializeLambda$(java.lang.invoke.SerializedLambda);
    descriptor: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object;
}

Because localDef was lifted into the private final int localDef$1() instance method of NonSerializableClass$, the closure lambda is implemented as an invocation of the NonSerializableClass$$$anonfun$5() instance method of NonSerializableClass$. As a result, closure's SerializedLambda captures the instance of NonSerializableClass$, rendering this closure non-serializable:

  import java.lang.invoke.SerializedLambda
  
  val writeReplaceMethod = closure.getClass.getDeclaredMethod("writeReplace")
  writeReplaceMethod.setAccessible(true)
  
  val serializedLambda = writeReplaceMethod.invoke(closure).asInstanceOf[SerializedLambda]
  
  println(serializedLambda)
  println(serializedLambda.getCapturedArg(0).getClass.getName)

outputs

SerializedLambda[capturingClass=class NonSerializableClass$, functionalInterfaceMethod=scala/runtime/java8/JFunction1$mcII$sp.apply$mcII$sp:(I)I, implementation=invokeVirtual NonSerializableClass$.NonSerializableClass$$$anonfun$5:(I)I, instantiatedMethodType=(I)I, numCaptured=1]
NonSerializableClass$

We can't null out or otherwise eliminate this captured reference because the lambda's own code is implemented by instance methods rather than static methods.

I believe that one possible workaround would be to have scalac's lambdalift phase lift the local def into a static method rather than an instance method.

@xerial
Copy link

xerial commented Nov 8, 2016

I'm also hitting this problem at my dependency injection library for Scala: wvlet/airframe#39.

@JoshRosen Did you find any workaround for it?

@JoshRosen
Copy link
Author

@xerial, it sounds like this may have been fixed for 2.12 in https://issues.scala-lang.org/browse/SI-9390

@asarkar
Copy link

asarkar commented Dec 14, 2017

Would you have any comment on this SO question regarding function serialization?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment