Skip to content

Instantly share code, notes, and snippets.

@chpatrick
Last active September 2, 2017 07:53
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save chpatrick/05ff34e0f352be06f04c to your computer and use it in GitHub Desktop.
Save chpatrick/05ff34e0f352be06f04c to your computer and use it in GitHub Desktop.
Black magic and GADT serialization
# Black magic and GADT serialization
I thought it would be a cool exercise to write a simple but flexible RPC system in Haskell. The idea was that the server would have some functions and values available for you to use, and you can assemble them into a function call as you like, or provide your own parameters. In the end, writing the serialization and deserialization code ended up leading to some far more interesting challenges.
Some types, such as functions, cannot be serialized so they must be made available on the server. I represented this with the following GADT:
```haskell
type ValueID = Word32
type TypeID = Word32
data Term a where
Shared :: ValueID -> Term a
Constant :: (Typeable a, Binary a) => a -> Term a
App :: Typeable b => Term (b -> a) -> Term b -> Term a
deriving Typeable
```
We'll need some way to store shared values. However, these have different types so we'll need to hide what they are. I initially used `Dynamic` for these but they turned out to be inadequate, as we'll see later. We'll also need to be able to encode the types of the constants sent over, so we know how to parse them.
```haskell
-- something typeable
data Dyn where
Dyn :: Typeable a => a -> Dyn
-- a Proxy for a shared, serializable type
data KnownType where
KnownType :: (Typeable a, Binary a) => Proxy a -> KnownType
type SharedValues = Vector Dyn
type KnownTypes = Vector KnownType
```
In case you haven't seen a `Proxy` before, it's a datatype used to carry a type variable but no data. If we pattern match on a `KnownType` and have `ScopedTypeVariables` turned on, we can get a name for the type it contains:
```haskell
case someType of
-- get something of the type of someType
KnownType (Proxy :: Proxy t) -> get :: Get t
```
We can now encode shared values as indices in our shared value table, and types as indices in our shared type table. Let's write the serialization function:
```haskell
type TypeMap = Map TypeRep TypeID
typeMap :: KnownTypes -> TypeMap
typeMap knownTypes = M.fromList $ zipWith mapping [0..] (V.toList knownTypes)
serialize :: TypeMap -> Term a -> Put
serialize typeMap = serialize'
where
serialize' :: Term a -> Put
serialize' = \case
Shared valueID -> putWord8 0 >> put valueID
Constant x -> do
putWord8 1
case M.lookup (typeOf x) typeMap of
Nothing -> fail "Unserializable constant type."
Just typeID -> put typeID >> put x
App f x -> putWord8 2 >> serialize' f >> serialize' x
```
This produces a simple tagged structure. Shared value references are encoded as `0` followed by the value index, constants are encoded as `1`, then the type index, then the serialized constant, and applications are encoded as `2` followed by the serialized function and the serialized argument. To figure out a constant's type's index in the shared type table, we can look up its `TypeRep` in our `TypeMap`. `TypeRep` is a datatype containing the representation of a type, and can be obtained at runtime for any `Typeable` type. Internally, it's a tree of type constructors. `typeRep (Proxy :: Proxy (Maybe ( Int, String )) == TypeRep <<Maybe>> [ TypeRep <<(,)>> [ <<Int>>, <<String>> ] ]`.
Now, how about deserialization. Since we're getting some `Term` over the wire, we don't actually know what its inner type is. Let's define its type signature first.
```haskell
-- a known type with an unknown parameter
data Dyn1 f where
Dyn1 :: Typeable a => f a -> Dyn1 f
deserialize :: SharedValues -> KnownTypes -> Get (Dyn1 Term)
```
Let's write the first two cases of deserialize:
```haskell
deserialize shared types = deserialize'
where
deserialize' = getWord8 >>= \case
0 -> do
valueID <- get
case shared V.!? fromIntegral valueID of
Nothing -> fail "Value ID out of bounds."
Just (Dyn (_ :: v)) -> return $ Dyn1 (Shared valueID :: Term v)
1 -> do
typeID :: TypeID <- get
case types V.!? fromIntegral typeID of
Nothing -> fail "Type ID out of bounds."
Just (KnownType (_ :: Proxy t)) -> do
value :: t <- get
return $ Dyn1 (Constant value)
```
Nothing too crazy so far. If we read a shared value, we need to make a `Term a` where `a` is the type of the shared value. This can be done by looking it up and doing the pattern matching described above. Likewise, for constants we need to look up their type in `KnownTypes`, and `get` the appropriate type. We then wrap up either case in `Dyn1`, since these types are unknown at compile-time.
Now for application. This is where it gets tricky.
```haskell
2 -> do
df <- deserialize'
dx <- deserialize'
case app df dx of
Nothing -> fail "Type mismatch in application."
Just dr -> return dr
app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term)
```
Seems simple enough. We deserialize both types, and then make an application. However, we don't know that the types of the terms that we parsed are actually `(a -> b)` and `a`. I was actually beating my head against this problem for hours. :)
The thing is, we can get the `TypeRep`s for the function and value term types, since we know they're `Typeable`. We can even figure out the `TypeRep` of the result (if the types are correct), using `funResultTy` from `Data.Typeable`. If we have a function type `f` and a value type `a`, and a result type `b`, we could use `eqT` from `Data.Typeable` to get a proof that `f ~ a -> b`. The problem is, we know `b`'s `TypeRep`, but we don't have `b` itself. What to do?
Well, we need a function that takes a `TypeRep` and gives you a `Proxy` for the same type.
```haskell
reifyTypeRep
:: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r
```
What's going on here? Well, we don't know `a` at runtime, so the only way we can provide this `Proxy` is to pass it into a function that works for _any_ `a`, then return what it returns.
This seems like a function that should be possible, but I couldn't figure it out. At this point I asked my insane type wizard friend Slemi for advice. He couldn't do it either, but the next day he came back with this:
```haskell
-- dark magic a la Data.Reflection
newtype Magic r = Magic (forall a. Typeable a => Proxy a -> r)
reifyTypeRep
:: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r
reifyTypeRep rep f
= unsafeCoerce (Magic f) typeRep' Proxy
where
-- typeRep seems to take a zero-length argument
typeRep' :: Proxy# a -> TypeRep
typeRep' _ = rep
```
This is some serious black magic, and I wish I thought of it. The trick behind this is that internally the implementation of typeclasses is passed in by GHC as an explicit parameter. So if you write `Show a => a -> String`, internally it gets converted into `ShowImpl a -> a -> String`, where `ShowImpl` contains the implementations of the methods in `Show`. If your typeclass has just one method, like `Typeable`, then the implementation consists of a single function.
This means that we can `unsafeCoerce` `(forall a. Typeable a => Proxy a -> r)` into `(forall a. (Proxy# a -> TypeRep) -> Proxy a -> r)`, where `(Proxy# a -> TypeRep)` is our implementation of `typeRep`. This trick is borrowed from ekmett's `Data.Reflection`.
We can use this to produce a Proxy for a type such that if you call `typeRep` on it, you get our given `TypeRep` back. The compiler will then happily propagate our made-up Typeable instance everywhere where the type is used, meaning that for all intents and purposes it is the reified `TypeRef`.
Now we've done a deal with the devil, let's use it to deconstruct unknown types:
```haskell
-- witnesses c ~ (some) f a
data Decons (c :: k) where
Decons :: (Typeable f, Typeable a) => c :~: (f :: * -> k) a -> Decons c
-- if c ~ (some) f a, provide a witness
deconsT :: forall proxy c. Typeable c => proxy c -> Maybe (Decons (c :: k))
deconsT p = buildProof <$> viewEnd tyParams
where
( tyCon, tyParams ) = splitTyConApp (typeRep p)
viewEnd = \case
[] -> Nothing -- if this happens, c is not of the form f a
[ x ] -> Just ( x, [] )
x : xs -> fmap (x:) <$> viewEnd xs
buildProof ( aRep, prevParams ) =
let fRep = mkTyConApp tyCon prevParams in
reifyTypeRep fRep $ \(_ :: Proxy f) ->
reifyTypeRep aRep $ \(_ :: Proxy a) ->
Decons (unsafeCoerce Refl :: c :~: f a)
-- eqT would just check what we already know, so we can cheat
```
As described above, a `TypeRep` is just a tree of type constructors. So, we just need to do some simple list manipulation to separate `Either String Int` into `Either String` and `Int`. If the list of parameters is empty, we don't have something of the form `f a`, so we return `Nothing`. Now we've got a TypeRep for `f` and `a`, we just reify them both into type variables, and we can produce a witness for `c ~ f a`. We could use `eqT` for this, but since it internally checks if the `TypeReps` match and `unsafeCoerce`s a proof if they do, and we know for a fact that they match since we made them ourselves, we can just `unsafeCoerce` the proof ourselves.
We can now implement app easily:
```haskell
app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term)
app (Dyn1 f) (Dyn1 (x :: Term a)) = do
Decons (Refl :: fab :~: fa b) <- deconsT f
Refl :: fa :~: (->) a <- eqT
return $ Dyn1 (App f x)
```
All in the Maybe Monad, we first show `fab ~ fa b` using `f` as a proxy. Then, if `fa ~ (->) a`, the compiler knows that `fab ~ ((->) a) b ~ (a -> b)`, and it lets us apply `App` to `f` and `x`.
Now you might say "OK, this works, but the unsafeCoerces make me uneasy". That's true, but all the black magic is contained in a couple of small general primitives that can be checked for safety easily (much like `cast` or `eqT`, which also `unsafeCoerce`). The rest of the code can then benefit from them without anything dodgy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment