chpatrick/typeable-reify

## typeable-reify
# Black magic and GADT serialization

I thought it would be a cool exercise to write a simple but flexible RPC system in Haskell. The idea was that the server would have some functions and values available for you to use, and you can assemble them into a function call as you like, or provide your own parameters. In the end, writing the serialization and deserialization code ended up leading to some far more interesting challenges.

Some types, such as functions, cannot be serialized so they must be made available on the server. I represented this with the following GADT:

```haskell
type ValueID = Word32
type TypeID = Word32

data Term a where
  Shared :: ValueID -> Term a
  Constant :: (Typeable a, Binary a) => a -> Term a
  App :: Typeable b => Term (b -> a) -> Term b -> Term a
  deriving Typeable
```

We'll need some way to store shared values. However, these have different types so we'll need to hide what they are. I initially used `Dynamic` for these but they turned out to be inadequate, as we'll see later. We'll also need to be able to encode the types of the constants sent over, so we know how to parse them.

```haskell
-- something typeable
data Dyn where
  Dyn :: Typeable a => a -> Dyn

-- a Proxy for a shared, serializable type
data KnownType where
  KnownType :: (Typeable a, Binary a) => Proxy a -> KnownType

type SharedValues = Vector Dyn
type KnownTypes = Vector KnownType
```

In case you haven't seen a `Proxy` before, it's a datatype used to carry a type variable but no data. If we pattern match on a `KnownType` and have `ScopedTypeVariables` turned on, we can get a name for the type it contains:

```haskell
case someType of
  -- get something of the type of someType
  KnownType (Proxy :: Proxy t) -> get :: Get t
```

We can now encode shared values as indices in our shared value table, and types as indices in our shared type table. Let's write the serialization function:

```haskell
type TypeMap = Map TypeRep TypeID

typeMap :: KnownTypes -> TypeMap
typeMap knownTypes = M.fromList $ zipWith mapping [0..] (V.toList knownTypes)

serialize :: TypeMap -> Term a -> Put
serialize typeMap = serialize'
  where
    serialize' :: Term a -> Put
    serialize' = \case
      Shared valueID -> putWord8 0 >> put valueID
      Constant x -> do
        putWord8 1
        case M.lookup (typeOf x) typeMap of
          Nothing -> fail "Unserializable constant type."
          Just typeID -> put typeID >> put x
      App f x -> putWord8 2 >> serialize' f >> serialize' x
```

This produces a simple tagged structure. Shared value references are encoded as `0` followed by the value index, constants are encoded as `1`, then the type index, then the serialized constant, and applications are encoded as `2` followed by the serialized function and the serialized argument. To figure out a constant's type's index in the shared type table, we can look up its `TypeRep` in our `TypeMap`.  `TypeRep` is a datatype containing the representation of a type, and can be obtained at runtime for any `Typeable` type. Internally, it's a tree of type constructors. `typeRep (Proxy :: Proxy (Maybe ( Int, String )) == TypeRep <<Maybe>> [ TypeRep <<(,)>> [ <<Int>>, <<String>> ] ]`.

Now, how about deserialization. Since we're getting some `Term` over the wire, we don't actually know what its inner type is. Let's define its type signature first.

```haskell
-- a known type with an unknown parameter
data Dyn1 f where
  Dyn1 :: Typeable a => f a -> Dyn1 f

deserialize :: SharedValues -> KnownTypes -> Get (Dyn1 Term)
```

Let's write the first two cases of deserialize:

```haskell
deserialize shared types = deserialize'
  where
    deserialize' = getWord8 >>= \case
      0 -> do
        valueID <- get
        case shared V.!? fromIntegral valueID of
          Nothing -> fail "Value ID out of bounds."
          Just (Dyn (_ :: v)) -> return $ Dyn1 (Shared valueID :: Term v)
      1 -> do
        typeID :: TypeID <- get
        case types V.!? fromIntegral typeID of
          Nothing -> fail "Type ID out of bounds."
          Just (KnownType (_ :: Proxy t)) -> do
            value :: t <- get
            return $ Dyn1 (Constant value)
```

Nothing too crazy so far. If we read a shared value, we need to make a `Term a` where `a` is the type of the shared value. This can be done by looking it up and doing the pattern matching described above. Likewise, for constants we need to look up their type in `KnownTypes`, and `get` the appropriate type. We then wrap up either case in `Dyn1`, since these types are unknown at compile-time.

Now for application. This is where it gets tricky.

```haskell
2 -> do
  df <- deserialize'
  dx <- deserialize'
  case app df dx of
    Nothing -> fail "Type mismatch in application."
    Just dr -> return dr

app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term)
```

Seems simple enough. We deserialize both types, and then make an application. However, we don't know that the types of the terms that we parsed are actually `(a -> b)` and `a`. I was actually beating my head against this problem for hours. :)

The thing is, we can get the `TypeRep`s for the function and value term types, since we know they're `Typeable`. We can even figure out the `TypeRep` of the result (if the types are correct), using `funResultTy` from `Data.Typeable`. If we have a function type `f` and a value type `a`, and a result type `b`, we could use `eqT` from `Data.Typeable` to get a proof that `f ~ a -> b`. The problem is, we know `b`'s `TypeRep`, but we don't have `b` itself. What to do?

Well, we need a function that takes a `TypeRep` and gives you a `Proxy` for the same type.

```haskell
reifyTypeRep
  :: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r
```

What's going on here? Well, we don't know `a` at runtime, so the only way we can provide this `Proxy` is to pass it into a function that works for _any_ `a`, then return what it returns.

This seems like a function that should be possible, but I couldn't figure it out. At this point I asked my insane type wizard friend Slemi for advice. He couldn't do it either, but the next day he came back with this:

```haskell
-- dark magic a la Data.Reflection
newtype Magic r = Magic (forall a. Typeable a => Proxy a -> r)
reifyTypeRep
  :: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r
reifyTypeRep rep f
  = unsafeCoerce (Magic f) typeRep' Proxy
    where
      -- typeRep seems to take a zero-length argument
      typeRep' :: Proxy# a -> TypeRep
      typeRep' _ = rep
```

This is some serious black magic, and I wish I thought of it. The trick behind this is that internally the implementation of typeclasses is passed in by GHC as an explicit parameter. So if you write `Show a => a -> String`, internally it gets converted into `ShowImpl a -> a -> String`, where `ShowImpl` contains the implementations of the methods in `Show`. If your typeclass has just one method, like `Typeable`, then the implementation consists of a single function.
This means that we can `unsafeCoerce` `(forall a. Typeable a => Proxy a -> r)` into `(forall a. (Proxy# a -> TypeRep) -> Proxy a -> r)`, where `(Proxy# a -> TypeRep)` is our implementation of `typeRep`. This trick is borrowed from ekmett's `Data.Reflection`.

We can use this to produce a Proxy for a type such that if you call `typeRep` on it, you get our given `TypeRep` back. The compiler will then happily propagate our made-up Typeable instance everywhere where the type is used, meaning that for all intents and purposes it is the reified `TypeRef`.

Now we've done a deal with the devil, let's use it to deconstruct unknown types:

```haskell
-- witnesses c ~ (some) f a
data Decons (c :: k) where
  Decons :: (Typeable f, Typeable a) => c :~: (f :: * -> k) a -> Decons c

-- if c ~ (some) f a, provide a witness
deconsT :: forall proxy c. Typeable c => proxy c -> Maybe (Decons (c :: k))
deconsT p = buildProof <$> viewEnd tyParams
  where
    ( tyCon, tyParams ) = splitTyConApp (typeRep p)
    viewEnd = \case
      [] -> Nothing -- if this happens, c is not of the form f a
      [ x ] -> Just ( x, [] )
      x : xs -> fmap (x:) <$> viewEnd xs
    buildProof ( aRep, prevParams ) =
      let fRep = mkTyConApp tyCon prevParams in
        reifyTypeRep fRep $ \(_ :: Proxy f) ->
          reifyTypeRep aRep $ \(_ :: Proxy a) ->
            Decons (unsafeCoerce Refl :: c :~: f a)
             -- eqT would just check what we already know, so we can cheat
```

As described above, a `TypeRep` is just a tree of type constructors. So, we just need to do some simple list manipulation to separate `Either String Int` into `Either String` and `Int`. If the list of parameters is empty, we don't have something of the form `f a`, so we return `Nothing`. Now we've got a TypeRep for `f` and `a`, we just reify them both into type variables, and we can produce a witness for `c ~ f a`. We could use `eqT` for this, but since it internally checks if the `TypeReps` match and `unsafeCoerce`s a proof if they do, and we know for a fact that they match since we made them ourselves, we can just `unsafeCoerce` the proof ourselves.

We can now implement app easily:

```haskell
app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term)
app (Dyn1 f) (Dyn1 (x :: Term a)) = do
  Decons (Refl :: fab :~: fa b) <- deconsT f
  Refl :: fa :~: (->) a <- eqT
  return $ Dyn1 (App f x)
```

All in the Maybe Monad, we first show `fab ~ fa b` using `f` as a proxy. Then, if `fa ~ (->) a`, the compiler knows that `fab ~ ((->) a) b ~ (a -> b)`, and it lets us apply `App` to `f` and `x`.

Now you might say "OK, this works, but the unsafeCoerces make me uneasy". That's true, but all the black magic is contained in a couple of small general primitives that can be checked for safety easily (much like `cast` or `eqT`, which also `unsafeCoerce`). The rest of the code can then benefit from them without anything dodgy.
	# Black magic and GADT serialization

	I thought it would be a cool exercise to write a simple but flexible RPC system in Haskell. The idea was that the server would have some functions and values available for you to use, and you can assemble them into a function call as you like, or provide your own parameters. In the end, writing the serialization and deserialization code ended up leading to some far more interesting challenges.

	Some types, such as functions, cannot be serialized so they must be made available on the server. I represented this with the following GADT:

	```haskell
	type ValueID = Word32
	type TypeID = Word32

	data Term a where
	Shared :: ValueID -> Term a
	Constant :: (Typeable a, Binary a) => a -> Term a
	App :: Typeable b => Term (b -> a) -> Term b -> Term a
	deriving Typeable
	```

	We'll need some way to store shared values. However, these have different types so we'll need to hide what they are. I initially used `Dynamic` for these but they turned out to be inadequate, as we'll see later. We'll also need to be able to encode the types of the constants sent over, so we know how to parse them.

	```haskell
	-- something typeable
	data Dyn where
	Dyn :: Typeable a => a -> Dyn

	-- a Proxy for a shared, serializable type
	data KnownType where
	KnownType :: (Typeable a, Binary a) => Proxy a -> KnownType

	type SharedValues = Vector Dyn
	type KnownTypes = Vector KnownType
	```

	In case you haven't seen a `Proxy` before, it's a datatype used to carry a type variable but no data. If we pattern match on a `KnownType` and have `ScopedTypeVariables` turned on, we can get a name for the type it contains:

	```haskell
	case someType of
	-- get something of the type of someType
	KnownType (Proxy :: Proxy t) -> get :: Get t
	```

	We can now encode shared values as indices in our shared value table, and types as indices in our shared type table. Let's write the serialization function:

	```haskell
	type TypeMap = Map TypeRep TypeID

	typeMap :: KnownTypes -> TypeMap
	typeMap knownTypes = M.fromList $ zipWith mapping [0..] (V.toList knownTypes)

	serialize :: TypeMap -> Term a -> Put
	serialize typeMap = serialize'
	where
	serialize' :: Term a -> Put
	serialize' = \case
	Shared valueID -> putWord8 0 >> put valueID
	Constant x -> do
	putWord8 1
	case M.lookup (typeOf x) typeMap of
	Nothing -> fail "Unserializable constant type."
	Just typeID -> put typeID >> put x
	App f x -> putWord8 2 >> serialize' f >> serialize' x
	```

	This produces a simple tagged structure. Shared value references are encoded as `0` followed by the value index, constants are encoded as `1`, then the type index, then the serialized constant, and applications are encoded as `2` followed by the serialized function and the serialized argument. To figure out a constant's type's index in the shared type table, we can look up its `TypeRep` in our `TypeMap`. `TypeRep` is a datatype containing the representation of a type, and can be obtained at runtime for any `Typeable` type. Internally, it's a tree of type constructors. `typeRep (Proxy :: Proxy (Maybe ( Int, String )) == TypeRep <<Maybe>> [ TypeRep <<(,)>> [ <<Int>>, <<String>> ] ]`.

	Now, how about deserialization. Since we're getting some `Term` over the wire, we don't actually know what its inner type is. Let's define its type signature first.

	```haskell
	-- a known type with an unknown parameter
	data Dyn1 f where
	Dyn1 :: Typeable a => f a -> Dyn1 f

	deserialize :: SharedValues -> KnownTypes -> Get (Dyn1 Term)
	```

	Let's write the first two cases of deserialize:

	```haskell
	deserialize shared types = deserialize'
	where
	deserialize' = getWord8 >>= \case
	0 -> do
	valueID <- get
	case shared V.!? fromIntegral valueID of
	Nothing -> fail "Value ID out of bounds."
	Just (Dyn (_ :: v)) -> return $ Dyn1 (Shared valueID :: Term v)
	1 -> do
	typeID :: TypeID <- get
	case types V.!? fromIntegral typeID of
	Nothing -> fail "Type ID out of bounds."
	Just (KnownType (_ :: Proxy t)) -> do
	value :: t <- get
	return $ Dyn1 (Constant value)
	```

	Nothing too crazy so far. If we read a shared value, we need to make a `Term a` where `a` is the type of the shared value. This can be done by looking it up and doing the pattern matching described above. Likewise, for constants we need to look up their type in `KnownTypes`, and `get` the appropriate type. We then wrap up either case in `Dyn1`, since these types are unknown at compile-time.

	Now for application. This is where it gets tricky.

	```haskell
	2 -> do
	df <- deserialize'
	dx <- deserialize'
	case app df dx of
	Nothing -> fail "Type mismatch in application."
	Just dr -> return dr

	app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term)
	```

	Seems simple enough. We deserialize both types, and then make an application. However, we don't know that the types of the terms that we parsed are actually `(a -> b)` and `a`. I was actually beating my head against this problem for hours. :)

	The thing is, we can get the `TypeRep`s for the function and value term types, since we know they're `Typeable`. We can even figure out the `TypeRep` of the result (if the types are correct), using `funResultTy` from `Data.Typeable`. If we have a function type `f` and a value type `a`, and a result type `b`, we could use `eqT` from `Data.Typeable` to get a proof that `f ~ a -> b`. The problem is, we know `b`'s `TypeRep`, but we don't have `b` itself. What to do?

	Well, we need a function that takes a `TypeRep` and gives you a `Proxy` for the same type.

	```haskell
	reifyTypeRep
	:: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r
	```

	What's going on here? Well, we don't know `a` at runtime, so the only way we can provide this `Proxy` is to pass it into a function that works for _any_ `a`, then return what it returns.

	This seems like a function that should be possible, but I couldn't figure it out. At this point I asked my insane type wizard friend Slemi for advice. He couldn't do it either, but the next day he came back with this:

	```haskell
	-- dark magic a la Data.Reflection
	newtype Magic r = Magic (forall a. Typeable a => Proxy a -> r)
	reifyTypeRep
	:: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r
	reifyTypeRep rep f
	= unsafeCoerce (Magic f) typeRep' Proxy
	where
	-- typeRep seems to take a zero-length argument
	typeRep' :: Proxy# a -> TypeRep
	typeRep' _ = rep
	```

	This is some serious black magic, and I wish I thought of it. The trick behind this is that internally the implementation of typeclasses is passed in by GHC as an explicit parameter. So if you write `Show a => a -> String`, internally it gets converted into `ShowImpl a -> a -> String`, where `ShowImpl` contains the implementations of the methods in `Show`. If your typeclass has just one method, like `Typeable`, then the implementation consists of a single function.
	This means that we can `unsafeCoerce` `(forall a. Typeable a => Proxy a -> r)` into `(forall a. (Proxy# a -> TypeRep) -> Proxy a -> r)`, where `(Proxy# a -> TypeRep)` is our implementation of `typeRep`. This trick is borrowed from ekmett's `Data.Reflection`.

	We can use this to produce a Proxy for a type such that if you call `typeRep` on it, you get our given `TypeRep` back. The compiler will then happily propagate our made-up Typeable instance everywhere where the type is used, meaning that for all intents and purposes it is the reified `TypeRef`.

	Now we've done a deal with the devil, let's use it to deconstruct unknown types:

	```haskell
	-- witnesses c ~ (some) f a
	data Decons (c :: k) where
	Decons :: (Typeable f, Typeable a) => c :~: (f :: * -> k) a -> Decons c

	-- if c ~ (some) f a, provide a witness
	deconsT :: forall proxy c. Typeable c => proxy c -> Maybe (Decons (c :: k))
	deconsT p = buildProof <$> viewEnd tyParams
	where
	( tyCon, tyParams ) = splitTyConApp (typeRep p)
	viewEnd = \case
	[] -> Nothing -- if this happens, c is not of the form f a
	[ x ] -> Just ( x, [] )
	x : xs -> fmap (x:) <$> viewEnd xs
	buildProof ( aRep, prevParams ) =
	let fRep = mkTyConApp tyCon prevParams in
	reifyTypeRep fRep $ \(_ :: Proxy f) ->
	reifyTypeRep aRep $ \(_ :: Proxy a) ->
	Decons (unsafeCoerce Refl :: c :~: f a)
	-- eqT would just check what we already know, so we can cheat
	```

	As described above, a `TypeRep` is just a tree of type constructors. So, we just need to do some simple list manipulation to separate `Either String Int` into `Either String` and `Int`. If the list of parameters is empty, we don't have something of the form `f a`, so we return `Nothing`. Now we've got a TypeRep for `f` and `a`, we just reify them both into type variables, and we can produce a witness for `c ~ f a`. We could use `eqT` for this, but since it internally checks if the `TypeReps` match and `unsafeCoerce`s a proof if they do, and we know for a fact that they match since we made them ourselves, we can just `unsafeCoerce` the proof ourselves.

	We can now implement app easily:

	```haskell
	app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term)
	app (Dyn1 f) (Dyn1 (x :: Term a)) = do
	Decons (Refl :: fab :~: fa b) <- deconsT f
	Refl :: fa :~: (->) a <- eqT
	return $ Dyn1 (App f x)
	```

	All in the Maybe Monad, we first show `fab ~ fa b` using `f` as a proxy. Then, if `fa ~ (->) a`, the compiler knows that `fab ~ ((->) a) b ~ (a -> b)`, and it lets us apply `App` to `f` and `x`.

	Now you might say "OK, this works, but the unsafeCoerces make me uneasy". That's true, but all the black magic is contained in a couple of small general primitives that can be checked for safety easily (much like `cast` or `eqT`, which also `unsafeCoerce`). The rest of the code can then benefit from them without anything dodgy.