Skip to content

Instantly share code, notes, and snippets.

@smarter smarter/gadt.md
Last active Mar 5, 2020

Embed
What would you like to do?
GADTs in Scala

Generalized Algebraic Data Types in Scala

Basic GADTs

Here's an ADT which is not a GADT, in Haskell:

data Expr = IntExpr Int | BoolExpr Bool

And in Scala:

sealed trait Expr
case class IntExpr(i: Int) extends Expr
case class BoolExpr(b: Boolean) extends Expr

And here's a GADT:

data Expr a where
  IntExpr :: Int -> Expr Int
  BoolExpr :: Bool -> Expr Bool
sealed trait Expr[T]
case class IntExpr(i: Int) extends Expr[Int]
case class BoolExpr(b: Boolean) extends Expr[Boolean]

At first, it may look like GADTs should not require any special handling from the compiler in Scala since they're defined like regular ADTs, but consider this example:

object Test {
  def eval[T](e: Expr[T]): T = e match {
    case IntExpr(i) => i
    case BoolExpr(b) => b
  }
}

eval must return a value of type T, but i has type Int and b has type Boolean, how does this typecheck? The answer is that inside each case we know more information about T: since IntExpr is a subtype of Expr[Int], T must be equal to Int, etc. This is very unusual for types in Scala, they normally always keep the same meaning, this is why GADTs need to be treated specially by the compiler.

TODO: expand this section with some details on how GADTs are implemented in Dotty

A type soudness bug in pattern matching

Consider the following code:

case class A[S](_s: S) {
  var s: S = _s
}

The parameter S of A is invariant, which means that A[String] is not a subtype of A[Object], if we replace A[S] by A[+S] to make S covariant, the compiler will not allow us to compile A until we remove the var, this is important for type soundness, otherwise we would be able to write the following:

val x = new A[String]("abc")
val y: A[Object] = x // Does not compile because A[String] is not a subtype of A[Object]
y.s = new Integer(1)
val z: String = x.s // If this compiled, we would get a crash here since x.s is
                    // now an Integer

However, by combining pattern matching with variance, we can get around this restriction in Scala 2, the following code compiles without any warning or error but crashes at runtime with a ClassCastException:

class C[+T]

case class D[S](_s: S) extends C[S] {
  var s: S = _s
}

object Test {
  def main(args: Array[String]): Unit = {
    val x = new D[String]("abc")
    val y: C[Object] = x

    y match {
      case d @ D(x) =>
        d.s = new Integer(1)
    }

    val z: String = x.s // ClassCastException: Integer cannot be cast into String
  }
}

The issue is that when we typecheck case d @ D(x) =>, the compiler needs to infer the type parameter S of D under the constraint D[S] <:< C[Object] (<:< means is a subtype of). This constraint does not uniquely determine S, at this point we only know that S <:< Object, but the compiler still chooses to infer S = Object which is incorrect since D[String] is not a subtype of D[Object].

In Dotty this code still compiles by default but you get a warning:

warning: There is no best instantiation of pattern type D[S]
that makes it a subtype of selector type C[Object].
Non-variant type variable S cannot be uniquely instantiated.
(This would be an error under strict mode)
    case d @ D(x) =>
              ^

This warning becomes an error if -strict is passed to the compiler, this is not currently the default but is supposed to become so in the future. What the warning says is that since we don't know exactly what S is (we only know S <:< Object) we should give up and fail to compile the pattern match.

The strict mode is too restrictive

So far, what I've described is not specific to GADTs, however using -strict currently prevents many useful GADTs from compiling:

trait Exp[T]
case class Lit(value: Int) extends Exp[Int]
case class Pair[A, B](fst: Exp[A], snd: Exp[B]) extends Exp[(A, B)]

object Test {
  def eval[T](e: Exp[T]): T = e match {
    case Lit(x) => x
    case Pair(a, b) => (eval(a), eval(a))
  }
}

This code is perfectly safe but fails to compile under -strict because the type parameters A and B of Pair(a, b) are not constrained (we only know that T = (A, B)).

A possible fix

I believe that it is possible to fix the soudness issue without preventing the previous example from compiling: instead of always assigning a specific type to type variables of patterns, we should treat them as abstract bounded types, so in:

    y match {
      case d @ D(x) =>
        d.s = new Integer(1)
    }

we only know that S is some unknown type such that S <:< Object, this means that d.s = new Integer(1) won't compile, because we don't know if Integer is a subtype of S, on the other hand, the GADT example above will compile because even though we don't know anything about A and B themselves we, do know that T = (A, B). Implementing this should not be too complicated because a lot of the required machinery is already present.

GADTs and variance

Combining GADTs with variance lead to some interesting issues, see Open GADTs and Declaration-site Variance: A Problem Statement.

GADTs and exhaustiveness checking

The exhaustiveness checker is the component of the compiler that warns you when you forget some cases in a pattern match, a new implementation for Dotty is currently being worked on at https://github.com/lampepfl/dotty/pull/1364 and already covers more cases than the existing Scala 2 checker. GADTs are especially challenging to analyze, a recent paper describes improvement in this area done for Haskell: GADTs meet their match, it might be possible to adapt this to Scala.

Special syntax for GADTs ?

For Dotty we are considering adding some syntactic sugar to make ADTs less verbose to write, it would be interesting to figure out if and how this syntax could allow defining GADTs.

@Blaisorblade

This comment has been minimized.

Copy link

Blaisorblade commented Dec 14, 2016

I believe that it is possible to fix the soudness issue without preventing the previous example from compiling: instead of always assigning a specific type to type variables of patterns, we should treat them as abstract bounded types

I agree that's a fundamental necessary step. As you say,

This constraint does not uniquely determine S, at this point we only know that S <:< Object, but the compiler still chooses to infer S = Object which is incorrect since D[String] is not a subtype of D[Object].

However, if D were covariant, D[String] <: D[Object] would be true, but deducing S <: Object would still be incorrect. I'd rather say:

This constraint does not uniquely determine S, at this point we only know that S <:< Object, but the compiler still chooses to infer S = Object which is incorrect because (AGAIN) we only know that S <:< Object and we shouldn't deduce more!

The difference appears in a different example:

case class D[+S]
def f[S](v: D[S]): S = 
  v match {
    case D(x) => new Object() // If we deduce S = Object, this typechecks.
  }
f(D[Int](1)) + 2 //D(1): D[Object] is true, but this will still explode.

There we know no extra info on S after the match—we only know S >: Nothing <: Any, and that's all the compiler should know.
Since D is covariant, picking S = Object seems fine because of declaration-site variance (D[Int](1): D[Object]), but we still get a soundness violation.

@smarter

This comment has been minimized.

Copy link
Owner Author

smarter commented Dec 14, 2016

Yes, thanks for the example! We should of course only deduct things which are sound, which shouldn't be too hard since this is already what we do whenever type parameter inference happens.

@Blaisorblade

This comment has been minimized.

Copy link

Blaisorblade commented Dec 16, 2016

We should of course only deduct things which are sound

😄

which shouldn't be too hard since this is already what we do whenever type parameter inference happens.

FWIW, pattern matching seems harder — type parameter inference takes Const(1) and figures that T can be Int. Here you have an arbitrary Expr[Int] and must figure out if it can be Const[T] and for which Ts (and which constructors are plausible for exhaustiveness checking). Accounting for variance and unknown subclasses complicates things.

BTW, relevant papers (from Burak Emir, the author of the old Scalac pattern matcher, which got rewritten by Adriaan because of quite a few concerns) include:

https://infoscience.epfl.ch/record/85494
https://infoscience.epfl.ch/record/98468/files/MatchingObjectsWithPatterns-TR.pdf

but note that both formalizations forbid defining Unsound as in my example.

@nicball

This comment has been minimized.

Copy link

nicball commented Jul 24, 2018

I think maybe we should reinforce variance through inheritance? For example class Base[+T] has covariance type parameter T, then in class Derived[+T] extends Base[T] , Derived should have the same variance on T because T is used on Base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.