Last active
September 2, 2017 07:53
-
-
Save chpatrick/05ff34e0f352be06f04c to your computer and use it in GitHub Desktop.
Black magic and GADT serialization
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Black magic and GADT serialization | |
I thought it would be a cool exercise to write a simple but flexible RPC system in Haskell. The idea was that the server would have some functions and values available for you to use, and you can assemble them into a function call as you like, or provide your own parameters. In the end, writing the serialization and deserialization code ended up leading to some far more interesting challenges. | |
Some types, such as functions, cannot be serialized so they must be made available on the server. I represented this with the following GADT: | |
```haskell | |
type ValueID = Word32 | |
type TypeID = Word32 | |
data Term a where | |
Shared :: ValueID -> Term a | |
Constant :: (Typeable a, Binary a) => a -> Term a | |
App :: Typeable b => Term (b -> a) -> Term b -> Term a | |
deriving Typeable | |
``` | |
We'll need some way to store shared values. However, these have different types so we'll need to hide what they are. I initially used `Dynamic` for these but they turned out to be inadequate, as we'll see later. We'll also need to be able to encode the types of the constants sent over, so we know how to parse them. | |
```haskell | |
-- something typeable | |
data Dyn where | |
Dyn :: Typeable a => a -> Dyn | |
-- a Proxy for a shared, serializable type | |
data KnownType where | |
KnownType :: (Typeable a, Binary a) => Proxy a -> KnownType | |
type SharedValues = Vector Dyn | |
type KnownTypes = Vector KnownType | |
``` | |
In case you haven't seen a `Proxy` before, it's a datatype used to carry a type variable but no data. If we pattern match on a `KnownType` and have `ScopedTypeVariables` turned on, we can get a name for the type it contains: | |
```haskell | |
case someType of | |
-- get something of the type of someType | |
KnownType (Proxy :: Proxy t) -> get :: Get t | |
``` | |
We can now encode shared values as indices in our shared value table, and types as indices in our shared type table. Let's write the serialization function: | |
```haskell | |
type TypeMap = Map TypeRep TypeID | |
typeMap :: KnownTypes -> TypeMap | |
typeMap knownTypes = M.fromList $ zipWith mapping [0..] (V.toList knownTypes) | |
serialize :: TypeMap -> Term a -> Put | |
serialize typeMap = serialize' | |
where | |
serialize' :: Term a -> Put | |
serialize' = \case | |
Shared valueID -> putWord8 0 >> put valueID | |
Constant x -> do | |
putWord8 1 | |
case M.lookup (typeOf x) typeMap of | |
Nothing -> fail "Unserializable constant type." | |
Just typeID -> put typeID >> put x | |
App f x -> putWord8 2 >> serialize' f >> serialize' x | |
``` | |
This produces a simple tagged structure. Shared value references are encoded as `0` followed by the value index, constants are encoded as `1`, then the type index, then the serialized constant, and applications are encoded as `2` followed by the serialized function and the serialized argument. To figure out a constant's type's index in the shared type table, we can look up its `TypeRep` in our `TypeMap`. `TypeRep` is a datatype containing the representation of a type, and can be obtained at runtime for any `Typeable` type. Internally, it's a tree of type constructors. `typeRep (Proxy :: Proxy (Maybe ( Int, String )) == TypeRep <<Maybe>> [ TypeRep <<(,)>> [ <<Int>>, <<String>> ] ]`. | |
Now, how about deserialization. Since we're getting some `Term` over the wire, we don't actually know what its inner type is. Let's define its type signature first. | |
```haskell | |
-- a known type with an unknown parameter | |
data Dyn1 f where | |
Dyn1 :: Typeable a => f a -> Dyn1 f | |
deserialize :: SharedValues -> KnownTypes -> Get (Dyn1 Term) | |
``` | |
Let's write the first two cases of deserialize: | |
```haskell | |
deserialize shared types = deserialize' | |
where | |
deserialize' = getWord8 >>= \case | |
0 -> do | |
valueID <- get | |
case shared V.!? fromIntegral valueID of | |
Nothing -> fail "Value ID out of bounds." | |
Just (Dyn (_ :: v)) -> return $ Dyn1 (Shared valueID :: Term v) | |
1 -> do | |
typeID :: TypeID <- get | |
case types V.!? fromIntegral typeID of | |
Nothing -> fail "Type ID out of bounds." | |
Just (KnownType (_ :: Proxy t)) -> do | |
value :: t <- get | |
return $ Dyn1 (Constant value) | |
``` | |
Nothing too crazy so far. If we read a shared value, we need to make a `Term a` where `a` is the type of the shared value. This can be done by looking it up and doing the pattern matching described above. Likewise, for constants we need to look up their type in `KnownTypes`, and `get` the appropriate type. We then wrap up either case in `Dyn1`, since these types are unknown at compile-time. | |
Now for application. This is where it gets tricky. | |
```haskell | |
2 -> do | |
df <- deserialize' | |
dx <- deserialize' | |
case app df dx of | |
Nothing -> fail "Type mismatch in application." | |
Just dr -> return dr | |
app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term) | |
``` | |
Seems simple enough. We deserialize both types, and then make an application. However, we don't know that the types of the terms that we parsed are actually `(a -> b)` and `a`. I was actually beating my head against this problem for hours. :) | |
The thing is, we can get the `TypeRep`s for the function and value term types, since we know they're `Typeable`. We can even figure out the `TypeRep` of the result (if the types are correct), using `funResultTy` from `Data.Typeable`. If we have a function type `f` and a value type `a`, and a result type `b`, we could use `eqT` from `Data.Typeable` to get a proof that `f ~ a -> b`. The problem is, we know `b`'s `TypeRep`, but we don't have `b` itself. What to do? | |
Well, we need a function that takes a `TypeRep` and gives you a `Proxy` for the same type. | |
```haskell | |
reifyTypeRep | |
:: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r | |
``` | |
What's going on here? Well, we don't know `a` at runtime, so the only way we can provide this `Proxy` is to pass it into a function that works for _any_ `a`, then return what it returns. | |
This seems like a function that should be possible, but I couldn't figure it out. At this point I asked my insane type wizard friend Slemi for advice. He couldn't do it either, but the next day he came back with this: | |
```haskell | |
-- dark magic a la Data.Reflection | |
newtype Magic r = Magic (forall a. Typeable a => Proxy a -> r) | |
reifyTypeRep | |
:: forall r. TypeRep -> (forall a. Typeable a => Proxy a -> r) -> r | |
reifyTypeRep rep f | |
= unsafeCoerce (Magic f) typeRep' Proxy | |
where | |
-- typeRep seems to take a zero-length argument | |
typeRep' :: Proxy# a -> TypeRep | |
typeRep' _ = rep | |
``` | |
This is some serious black magic, and I wish I thought of it. The trick behind this is that internally the implementation of typeclasses is passed in by GHC as an explicit parameter. So if you write `Show a => a -> String`, internally it gets converted into `ShowImpl a -> a -> String`, where `ShowImpl` contains the implementations of the methods in `Show`. If your typeclass has just one method, like `Typeable`, then the implementation consists of a single function. | |
This means that we can `unsafeCoerce` `(forall a. Typeable a => Proxy a -> r)` into `(forall a. (Proxy# a -> TypeRep) -> Proxy a -> r)`, where `(Proxy# a -> TypeRep)` is our implementation of `typeRep`. This trick is borrowed from ekmett's `Data.Reflection`. | |
We can use this to produce a Proxy for a type such that if you call `typeRep` on it, you get our given `TypeRep` back. The compiler will then happily propagate our made-up Typeable instance everywhere where the type is used, meaning that for all intents and purposes it is the reified `TypeRef`. | |
Now we've done a deal with the devil, let's use it to deconstruct unknown types: | |
```haskell | |
-- witnesses c ~ (some) f a | |
data Decons (c :: k) where | |
Decons :: (Typeable f, Typeable a) => c :~: (f :: * -> k) a -> Decons c | |
-- if c ~ (some) f a, provide a witness | |
deconsT :: forall proxy c. Typeable c => proxy c -> Maybe (Decons (c :: k)) | |
deconsT p = buildProof <$> viewEnd tyParams | |
where | |
( tyCon, tyParams ) = splitTyConApp (typeRep p) | |
viewEnd = \case | |
[] -> Nothing -- if this happens, c is not of the form f a | |
[ x ] -> Just ( x, [] ) | |
x : xs -> fmap (x:) <$> viewEnd xs | |
buildProof ( aRep, prevParams ) = | |
let fRep = mkTyConApp tyCon prevParams in | |
reifyTypeRep fRep $ \(_ :: Proxy f) -> | |
reifyTypeRep aRep $ \(_ :: Proxy a) -> | |
Decons (unsafeCoerce Refl :: c :~: f a) | |
-- eqT would just check what we already know, so we can cheat | |
``` | |
As described above, a `TypeRep` is just a tree of type constructors. So, we just need to do some simple list manipulation to separate `Either String Int` into `Either String` and `Int`. If the list of parameters is empty, we don't have something of the form `f a`, so we return `Nothing`. Now we've got a TypeRep for `f` and `a`, we just reify them both into type variables, and we can produce a witness for `c ~ f a`. We could use `eqT` for this, but since it internally checks if the `TypeReps` match and `unsafeCoerce`s a proof if they do, and we know for a fact that they match since we made them ourselves, we can just `unsafeCoerce` the proof ourselves. | |
We can now implement app easily: | |
```haskell | |
app :: Dyn1 Term -> Dyn1 Term -> Maybe (Dyn1 Term) | |
app (Dyn1 f) (Dyn1 (x :: Term a)) = do | |
Decons (Refl :: fab :~: fa b) <- deconsT f | |
Refl :: fa :~: (->) a <- eqT | |
return $ Dyn1 (App f x) | |
``` | |
All in the Maybe Monad, we first show `fab ~ fa b` using `f` as a proxy. Then, if `fa ~ (->) a`, the compiler knows that `fab ~ ((->) a) b ~ (a -> b)`, and it lets us apply `App` to `f` and `x`. | |
Now you might say "OK, this works, but the unsafeCoerces make me uneasy". That's true, but all the black magic is contained in a couple of small general primitives that can be checked for safety easily (much like `cast` or `eqT`, which also `unsafeCoerce`). The rest of the code can then benefit from them without anything dodgy. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment