Skip to content

Instantly share code, notes, and snippets.

@abeln
Last active December 18, 2020 13:57
Show Gist options
  • Star 10 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save abeln/9f79774bac111d99b3ae2cb9016a33e6 to your computer and use it in GitHub Desktop.
Save abeln/9f79774bac111d99b3ae2cb9016a33e6 to your computer and use it in GitHub Desktop.
Scala with Explicit Nulls

project logo

Scala with Explicit Nulls

Contents

What is This?

This proposal and the accompanying pull request describe a modification to the Scala type system that makes reference types (anything that extends AnyRef) non-nullable.

This means the following code will no longer typecheck

val x: String = null // error: found `Null`,  but required `String`

Instead, to mark a type as nullable we use a type union

val x: String|Null = null // ok

Read on for details.

New Type Hierarchy

There are two type hierarchies with respect to Null, depending on whether we're before or after erasure.

Before erasure, Null is no longer a subtype of all reference types. Additionally, Null is a subtype of Any directly, as opposed to AnyRef.

type hierarchy before erasure

After erasure, Null remains a subtype of all reference types (as forced by the JVM).

type hierarchy after erasure

Unsoundness

The new type system is unsound with respect to null. This means there are still instances where an expressions has a non-nullable type like String, but its value is null.

The unsoundness happens because uninitialized fields in a class start out as null:

class C {
  val f: String = foo(f)
  def foo(f2: String): String = if (f2 == null) "field is null" else f2 
}
val c = new C()
// c.f == "field is null"

Enforcing sound initialization is a non-goal of this proposal. However, once we have a type system where nullability is explicit, we can use a sound initialization scheme like the one proposed by @liufengyun and @biboudis in scala/scala3#4543 to eliminate this particular source of unsoundness.

Equality

Because of the unsoundness, we need to allow comparisons of the form x == null or x != null even when x has a non-nullable reference type (but not a value type). This is so we have an "escape hatch" for when we know x is nullable even when the type says it shouldn't be.

val x: String|Null = null
x == null       // ok: x is a nullable string
"hello" == null // ok: String is a reference type
1 == null       // error: Int is a value type

Reference Equality

Recall that Null is now a direct subtype of Any, as opposed to AnyRef. However, we also need to allow reference equality comparisons:

val x: String = null
x eq null // ok: could return `true` because of unsoundness

We implement this by moving the eq and ne methods out of AnyRef and into a new trait RefEq that is extended by both AnyRef and Null:

trait RefEq {
  def eq(that: RefEq): Boolean
  def ne(that: RefEq): Boolean
}

class AnyRef extends Any with RefEq
class Null extends Any with RefEq

Working with Null

To make working with nullable values easier, we propose adding a few utilities to the standard library. So far, we have found the following useful:

  • An extension method .nn to "cast away" nullability
    implicit class NonNull[T](x: T|Null) extends AnyVal {
      def nn: T = if (x == null) {
        throw new NullPointerException("tried to cast away nullability, but value is null")
      } else {
        x.asInstanceOf[T]
      }
    }
    This means that given x: String|Null, x.nn has type String, so we can call all the usual methods on it. Of course, x.nn will throw a NPE if x is null.
  • Implicit conversions from/to nullable arrays
    implicit def fromNullable1[T](a: Array[T|Null]): Array[T] = a.asInstanceOf[Array[T]]
    implicit def toNullable1[T](a: Array[T]): Array[T|Null] = a.asInstanceOf[Array[T|Null]]
    // similar methods for matrices and higher dimensions: e.g. `toNullable{2, 3, ...}`
    These are useful because Java APIs often return nullable arrays. Additionally, because Array is invariant neither Array[T] <: Array[T|Null] nor the other way, so to go from one to the other we need casts. For example, suppose we want to write a function that sorts arrays
    def sort[T <: AnyRef : Ordering](a: Array[T]): Array[T] = {
      // error: `copyOf` expects an `Array[T|Null]` and returns an `Array[T|Null]`
      val a2: Array[T] = java.util.Arrays.copyOf(a, a.length).nn
      scala.util.Sorting.quickSort(a2)
      a2
    }
    To fix the error we use the implicit conversions above, which turn the third line into
    val a2: Array[T] = fromNullable1(java.util.Arrays.copyOf(toNullable1(a), a.length)).nn
    Of course, this is also unsound and a2 could end up with a null value in it.

Java Interop

The compiler can load Java classes in two ways: from source or from bytecode. In either case, when a Java class is loaded, we "patch" the type of its members to reflect that Java types remain implicitly nullable.

Specifically, we patch

  • the type of fields
  • the argument type and return type of methods

Nullification Function

We do the patching with a "nullification" function nf on types:

1. nf(R)      = R|JavaNull          if R is a reference type (a subtype of AnyRef)
2. nf(R)      = R                   if R is a value type (a subtype of AnyVal)
3. nf(T)      = T|JavaNull          if T is a type parameter
4. nf(C[R])   = C[R]|JavaNull       if C is Java-defined
5. nf(C[R])   = C[nf(R)]|JavaNull   if C isn't Java-defined
6. nf(A => B) = nf(A) => nf(B)    
7. nf(A & B)  = nf(A) & nf(B) 
8. nf(T)      = T                   otherwise (T is any other type)

JavaNull is an alias for Null with magic properties (see below). We illustrate the rules for nf below with examples.

  • The first two rules are easy: we nullify reference types but not value types.

    class C {
      String s;
      int x;
    }
    ==>
    class C {
      val s: String|Null
      val x: Int
    }
  • In rule 3 we nullify type parameters because in Java a type parameter is always nullable, so the following code compiles.

    class C<T> { T foo() { return null; } }
    ==>
    class C[T] { def foo(): T|Null } 

    Notice this is rule is sometimes too conservative, as witnessed by

    class InScala {
      val c: C[Bool] = ???  // C as above
      val b: Bool = c.foo() // no longer typechecks, since foo now returns Bool|Null
    }
  • Rule 4 reduces the number of redundant nullable types we need to add. Consider

    class Box<T> { T get(); }
    class BoxFactory<T> { Box<T> makeBox(); }
    ==>
    class Box[T] { def get(): T|JavaNull }
    class BoxFactory[T] { def makeBox(): Box[T]|JavaNull }

    Suppose we have a BoxFactory[String]. Notice that calling makeBox() on it returns a Box[String]|JavaNull, not a Box[String|JavaNull]|JavaNull, because of rule 4. This seems at first glance unsound ("What if the box itself has null inside?"), but is sound because calling get() on a Box[String] returns a String|JavaNull, as per rule 3.

    Notice that for rule 4 to be correct we need to patch all Java-defined classes that transitively appear in the argument or return type of a field or method accessible from the Scala code being compiled. Absent crazy reflection magic, we think that all such Java classes must be visible to the Typer in the first place, so they will be patched.

  • Rule 5 is needed because Java code might use a generic that's defined in Scala as opposed to Java.

    class BoxFactory<T> { Box<T> makeBox(); } // Box is Scala defined
    ==>
    class BoxFactory[T] { def makeBox(): Box[T|JavaNull]|JavaNull }

    In this case, since Box is Scala-defined, nf is applied to the type argument T, so rule 3 applies and we get Box[T|JavaNull]|JavaNull. This is needed because our nullability function is only applied (modularly) to the Java classes, but not to the Scala ones, so we need a way to tell Box that it contains a nullable value.

  • Rules 6 and 7 just recurse structurally on the components of the type. The implementation of rule 7 n the compiler are a bit more involved than the presentation above. Specifically, the implementation makes sure to add | Null only at the top level of a type: e.g. nf(A & B) = (A & B) | JavaNull, as opposed to (A | JavaNull) & (B | JavaNull).

JavaNull

To enable method chaining on Java-returned values, we have a special JavaNull alias

type JavaNull = Null @FromJava

JavaNull behaves just like Null, except it allows (unsound) member selections:

// Assume someJavaMethod()'s original Java signature is
// String someJavaMethod() {}
val s2: String = someJavaMethod().trim().substring(2).toLowerCase() // unsound

Here, all of trim, substring and toLowerCase return a String|JavaNull. The Typer notices the JavaNull and allows the member selection to go through. However, if someJavaMethod were to return null, then the first member selection would throw a NPE.

Without JavaNull, the chaining becomes too cumbersome

val ret = someJavaMethod()
val s2 = if (ret != null) {
  val tmp = ret.trim()
  if (tmp != null) {
    val tmp2 = tmp.substring(2)
    if (tmp2 != null) {
      tmp2.toLowerCase()
    }
  }
}
// Additionally, we need to handle the `else` branches.

JavaNull is also special in a few other ways

  • users cannot directly write the JavaNull type (in this sense, JavaNull is similar to Kotlin's platform types which are too non-denotable). Instead, users can only write Null, which is more restrictive.
    val s: String|Null = someJavaMethod()
    s.trim().substring(2) // error: `trim` is not a member of `String|Null`
    
  • JavaNull can, however, be inferred
    val s = someJavaMethod() // s: String|JavaNull inferred
    s.trim().substring(2) // ok
    
  • if the user wants to use JavaNull "non-locally" (e.g. by passing it to another method), they'll need to either cast away the nullability or use vanilla Null (this is a consequence of the first two bullets)
    val s = someJavaMethod() // s: String|JavaNull inferred
    def foo(s: String|Null) = ???
    foo(s) // allowed, but within `foo` we have to work with the strict `Null`
    
  • When looking for an implicit conversion from A|JavaNull to B, if we can't find it, then also look for an implicit conversion from A to B. This is done for ease of interop with Java.

Improving Precision

Our nullification function is sometimes too conservative. For example, consider the trim method in java.lang.String:

/** Returns a copy of the string, with leading and trailing whitespace omitted. */
public String trim()

A vanilla application of nullification will turn the above into def trim(): String|JavaNull. However, inspecting the function contract in the javadoc reveals that if s is a non-null string to begin with, then s.trim() should in fact never return null (if s is null, then s.trim() should throw). From the point of view of Scala it is then preferable to have a more precise signature for trim where the return type is String.

In general, this is a problem for many methods in the Java standard library: in particular, methods in commonly-used classes like String and Class. We address this by maintaining a whitelist of methods that need "special" treatment during nullification (e.g. nullify only the arguments, or only the return type).

Currently, the whitelist is hard-coded in Dotty, but the plan is to use an annotated version of the Java JDK with nullability annotations produced by the Checker Framework (hat tip to @liufengyun and Werner Dietl for suggesting this).

toString

The toString method is a special case where we chose to trade away soundness for precision and usability. toString is special because even though it lives in scala.Any, it actually comes from java.lang.Object and it could, in principle, be overriden to return null.

However, changing toString's signature to return String|JavaNull would break too much existing Scala code. Additionally, our assumption is that toString is unlikely to return null.

This means the signature of toString remains def toString(): String.

Binary Compatibility

Our strategy for binary compatibility with Scala binaries that predate explicit nulls is to leave the types unchanged and be compatible but unsound.

Concretely, the problem is how to interpret the return type of foo below

// As compiled by e.g. Scala 2.12
class Old {
  def foo(): String = ???
}

There are two options:

  • def foo(): String
  • def foo(): String|Null

The first option is unsound. The second option matches how we handle Java methods.

However, this approach is too-conservative in the presence of generics

class Old[T] {
  def id(x: T): T = x
}
==>
class Old[T] {
  def id(x: T|Null): T|Null = x
}

If we instantiate Old[T] with a value type, then id now returns a nullable value, even though it shouldn't:

val o: Old[Boolean] = ???
val b = o.id(true) // b: Boolean|Null

So really the options are between being unsound and being too conservative. The unsoundness only kicks in if the Scala code being used returns a null value. We hypothesize that null is used infrequently in Scala libraries, so we go with the first option.

If a using an unported Scala library that produces null, the user can wrap the (hopefully rare) API in a type-safe wrapper:

// Unported library
class Old {
  def foo(): String = null
}

// User code in explicit-null world
def fooWrapper(o: Old): String|Null = o.foo() // ok: String <: String|Null

val o: Old = ???
val s = fooWrapper(o)

If the offending API consumes null, then the user can cast the null literal to the right type (the cast will succeed, since at runtime Null is a subtype of any reference type).

// Unported library
class Old() {
  /** Pass a String, or null to signal a special case */
  def foo(s: String): Unit = ???
}

// User code in explicit-null world
val o: Old = ???
o.foo(null.asInstanceOf[String]) // ok: cast will succeed at runtime

Flow-Sensitive Type Inference

We added a simple form of flow-sensitive type inference. The idea is that if p is a stable path, then we can know that p is non-null if it's "compared" with the null literal. This information can then be propagated to the then and else branches of an if-statement (among other places).

Example:

val s: String|Null = ???
if (s != null) {
  // s: String
}
// s: String|Null

A similar inference can be made for the else case if the test is p == null

if (s == null) {
  // s: String|Null
} else {
  // s: String
}

What exactly is considered a comparison for the purposes of the flow inference:

  • == and !=
  • eq and ne
  • isInstanceOf[Null]. Only kicks in if the type test is against Null:
    val s: String|Null
    if (!s.isInstanceOf[Null]) {
      // s: String
    }
    If the test had been (if (s.isInstanceOf[String])), we currently don't infer non-nullness.

Non-Stable Paths

If p isn't stable, then inferring non-nullness is potentially unsound:

var s: String|Null = "hello"
if (s != null && {s = null; true}) {
  // s == null
}

Logical Operators

We also support logical operators (&&, ||, and !):

val s: String|Null = ???
val s2: String|Null = ???
if (s != null && s2 != null) {
  // s: String
  // s2: String
}

if (s == null || s2 == null) {
  // s: String|Null
  // s2: String|Null
} else {
  // s: String
  // s2: String
}

Inside Conditions

We also support type specialization within the condition, taking into account that && and || are short-circuiting:

val s: String|Null
if (s != null && s.length > 0) { // s: String in `s.length > 0`
  // s: String
}

if (s == null || s.length > 0) // s: String in `s.length > 0` {
  // s: String|Null
} else {
  // s: String|Null
}

Algorithm

Let NN(cond, true/false) be the set of paths (TermRefs in the compiler) that we can infer to be non-null if cond is true/false, respectively.

Then define NN by (basically De Morgan's laws)

NN(p == null, true)  = {}                     
NN(p == null, false) = {p} if p is stable     
NN(p != null, true)  = {p} if p is stable   
NN(p != null, false) = {}                    
NN(A && B,    true)  = ∪(NN(A, true),  NN(B, true))
NN(A && B,    false) = ∩(NN(A, false), NN(B, false))
NN(A || B,    true)  = ∩(NN(A, true),  NN(B, true))
NN(A || B,    false) = ∪(NN(A, false), NN(B, false))
NN(!A,        true)  = NN(A, false)
NN(!A,        false) = NN(A, true)
NN(cond,      _)     = {} otherwise

To type If(cond, then, else)

  1. compute NN(cond, true) and NN(cond, false)
  2. type the then branch with the knowledge that the paths in NN(cond, true) are non-nullable
  3. ditto for the else branch and the paths in NN(cond, false)

To propagate nullability facts within a condition, when typing A && B

  1. type A
  2. compute NN(A, true)
  3. type B augmented context with the facts in NN(A, true)

Similarly, when typing A || B

  1. type A
  2. compute NN(A, false)
  3. type B in an augmented context with the facts in NN(A, false)

Unsupported Idioms

We don't support

  • reasoning about non-stable paths
  • flow facts not related to nullability (if (x == 0) { // x: 0.type not inferred })
  • tracking aliasing between non-nullable paths
    val s: String|Null = ???
    val s2: String|Null = ???
    if (s != null && s == s2) {
      // s:  String inferred
      // s2: String not inferred
    }

Local Type Inference

Explicit nullability interacts with local type inference in a few ways:

  • Suppor for SAM types: if S is a SAM type, then we consider S|Null as a SAM type as well. This allows

    trait S {
      def foo(x: Int): Int
    }
    val s: S|Null = (x: Int) => x
  • Inferred nullable unions: Dotty currently never infers a union type; instead, the union is eagerly widened. (see #4687).

    def foo(): Int|String = 42
    val x = foo() // x: Any inferred

    We changed the inference rules so that unions of the form T|Null are inferred:

    def foo2(): String|Null = "hello"
    val x = foo2() // x: String|Null inferred

    Otherwise, the nullable type String|Null would be collapsed to Any, which loses too much information.

  • Nullable unions in prototypes: the compiler currently doesn't allow unions in prototypes

    def bar(a: Array[String|Int]|Int) = ()
    bar(Array(42))
    
    4 |  bar(Array(42))
      |      ^^^^^^^^^
      |      found:    Array[Int]
      |      required: Array[String | Int] | Int

    Notice that the type of Array(42) is synthesized as Array[Int], even though the (correct) type Array[String|Int] could be inherited. The problem is that the compiler sees a prototype of Array[String|Int]|Int, and because the outer type is a union it discards the prototype and switches to synthesis.

    See https://github.com/lampepfl/dotty/commit/8067b952875426d640968be865773f6ef3783f3c for why this was done.

    We've changed this behaviour so that nullable unions can be used in prototypes

    def bar(a: Array[String|Null]|Null) = ()
    bar(Array("hello")) // ok: inferred type parameter via prototype is String|Null`

Other Languages

Kotlin

TODO

TODOs

  • Bootstrap Dotty with the new type system
  • Port the standard library to the new type system
  • Figure out how we'll present JavaNull in e.g. type errors
  • Support flow-sensitive type inference in TASTy
  • Flow-sensitive type inference for pattern-matching
    val x: String|Null = ???
    x match {
      case null => ???
      case _ => // x: String inferred
    }
  • Use the Checkers Framework to whitelist non-nullable Java methods
  • Recognize @NonNull annotation
@abeln
Copy link
Author

abeln commented Jan 22, 2019

@kjsingh opened abeln/dotty#20 for tracking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment