In the blog post "Limits and Colimits", Bartosz Milewski defines the category theoretic concept of a "pullback", and states that a pullback can be used to infer the type of \f x -> f (f x)
:
For instance, suppose that the compiler wants to infer the type of a function:
twice f x = f (f x)
[...]
The most general substitution is obtained using a pullback. I won't go into the details, because they are beyond the scope of this book, but you can convince yourself that the result should be:
twice :: (t -> t) -> t -> t
with
t
a free type variable.
While the given type of twice
is certainly the most general type for normal Hasekll, the Rank2Types
language extension makes it possible to give many other different types to \f x -> f (f x)
.
As a non-exhaustive list, consider:
{-# LANGUAGE Rank2Types #-}
-- ...
doubleWrap :: (forall a. a -> m a) -> t -> m (m t)
doubleWrap f x = f (f x)
doubleUnwrap :: (forall a. w a -> a) -> w (w t) -> t
doubleUnwrap f x = f (f x)
pairAndRepair :: (forall a. a -> (a,a)) -> t -> ((t,t),(t,t))
pairAndRepair f x = f (f x)
lookupBoth :: (forall a b. c a => (e, a -> b) -> (e, b)) -> (c i, c j) => (e, i -> j -> k) -> (e, k)
lookupBoth f x = f (f x)
doubleRotation :: (forall a0 a1 a2. m a0 a1 a2 -> m a1 a2 a0) -> m t0 t1 t2 -> m t2 t0 t1
doubleRotation f x = f (f x)
Enabling ConstraintKinds
and AllowAmbiguousTypes
allows even more types for \f x -> f (f x)
, for example:
{-# LANGUAGE ConstraintKinds, AllowAmbiguousTypes #-}
-- ...
constrainedTwice :: (forall a. c a => a -> a) -> c t => t -> t
constrainedTwice f x = f (f x)
All of these functions have the same implementation, \f x -> f (f x)
, yet their types look very different.
What then is a more general type for \f x -> f (f x)
?
Before we get to that question, there's a more basic question that needs to be answered. What does it mean for two types to be different, or, for that matter, the same?
In haskell, we compare non-polymorphic types using nominal equality; if two types have the same name, they are nominally equal. If not, they're not. So Int
is nominally equal to Int
, but not to Char
.
Parametric types are a little different since some parts of the name (the parameters) are missing, so they can't necessarily be tested for nominal equality. However, the compiler can still test if two parametric types are unifiable; that is, determine if there's a substitution of parameters that makes the two types equal. For example, (Int, a)
and m (Char -> b)
are unifiable using the substitutions m ~ (,) Int
and a ~ (Char -> b)
, but there's no way to unify (Int, a)
and [b]
.
Type specialization is a specific sort of type unification where given a type a and b, on esays that type a specializes to type b if there exists a substitution of parameters to type a that makes it equal to type b. In the prior example, though (Int, a)
and m (Char -> b)
are unifiable, neither type can specialize to the other. Both, though, can specialize to (Int, Char -> b)
.
The basic question then becomes "What is a type for \f x -> f (f x)
that can specialize to any other type for \f x -> f (f x)
?
GHC is pretty good at type unification and specialization. For example, the type of twice
can be specialized to the type of constrainedTwice
:
constrainedTwice_using_twice :: (forall a. c => a -> a) -> c t => t -> t
constrainedTwice_using_twice = twice
However, GHC can't specialize the type of constrainedTwice
to that of twice
without assistance:
ghci> :{
... twice_using_constrainedTwice :: (t -> t) -> t -> t
... twice_using_constrainedTwice = constrainedTwice
... :}
<interactive>:152:32: error:
* Could not deduce: c0 t arising from a use of `constrainedTwice` ##FIXME *, quotes
...
<interactive>:152:32: error:
* Could not deduce: t ~ a
...
Part of the problem is that GHC can not automatically infer what the type variable c
in constrainedTwice
's type should be (this is the ambiguity AllowAmbiguousTypes
allowed), and so doesn't know whether the outer type parameter t
satisfies it. Another part is that it doesn't know how to unify forall a. c a => a -> a
and t -> t
without assuming t ~ a
.
The constraint t ~ a
can also be written in prefix form as (~) t a
, which gives an elegant solution to both problems. If GHC can be told to substitute (~) t
for c
, then not only is the ambiguity resolved, but forall a. (t ~ a) => a -> a
can be unified with t -> t
.
GHC's TypeApplications
language extension adds an @
operator to perform such substitutions. This can be used to give GHC the hint it needs to specialize the type of constrainedTwice
to that of twice
:
{-# LANGUAGE TypeApplications, ScopedTypeVariables #-}
-- ...
twice_using_constrainedTwice :: forall t. (t -> t) -> t -> t
twice_using_constrainedTwice = constrainedTwice @((~) t)
As a step towards discovering a more general type, consider a more specific one. \f x -> f (f x)
can also be applied to a family of functions forall a. a -> a
.
restrictedTwice :: (forall a. a -> a) -> t -> t
restrictedTwice f x = f (f x)
The only value of type forall a. a -> a
that doesn't involve bottom is id
, so restrictedTwice
isn't a very useful function, but the type of restrictedTwice
is unifiable with that of twice
:
restrictedTwice_using_twice :: (forall a. a -> a) -> t -> t
restrictedTwice_using_twice = twice
The type of restrictedTwice
is also almost identical to that of constrainedTwice
, but GHC again needs some help to specialize the type of the latter to that of the former.
{-# LANGUAGE FlexibleInstances #-}
-- ...
class Any a
instance Any a
restrictedTwice_using_constrainedTwice :: (forall a. a -> a) -> t -> t
restrictedTwice_using_constrainedTwice = constrainedTwice @Any
Even though the type of restrictedTwice
can't be specialized from the types of doubleWrap
or doubleUnwrap
, it can be implemented in terms of either of them using the Identity
functor.
-- import Data.Functor.Identity
-- ...
restrictedTwice_using_doubleWrap :: (forall a. a -> a) -> t -> t
restrictedTwice_using_doubleWrap f = runIdentity . runIdentity . doubleWrap (Identity . f)
-- restrictedTwice_using_doubleWrap = coerce . doubleWrap @Identity . coerce
-- restrictedTwice_using_doubleWrap = coerce (doubleWrap @Identity)
restrictedTwice_using_doubleUnwrap :: (forall a. a -> a) -> t -> t
restrictedTwice_using_doubleUnwrap f = doubleUnwrap (f . runIdentity) . Identity . Identity
-- restrictedTwice_using_doubleUnwrap f = coerce . doubleUnwrap @Identity . coerce
Here Identity
is used to substitute for the type constructor m
in doubleWrap
's type (forall a. a -> m a) -> t -> m (m t)
and for the type constructor w
in doubleUnwrap
's type (forall a. w a -> a) -> w (w t) -> t
, yet the wrapping and unwrapping it introduces obscures the relation between the types of restrictedTwice
and doubleWrap
or doubleUnwrap
.
Consider the type synonym Is
:
type Is a = a
If it were possible to use the unsaturated type synonym Is
as a substitute for m
in doubleWrap
's type or w
in doubleUnwrap
's type, then Is (Is a) ~ a
would follow for free, without all that wrapping and unwrapping. However, unsaturated type synonyms like Is
can't be passed arround in Haskell. Attempting to do so results in a compile error:
ghci> :t doubleWrap @Is
<interactive>:1:1: error:
* The type synonym 'Is' should have 1 argument, but has been given none #FIXME
This limitation can be overcome using defunctionalization.
In broad strokes, there are two sorts of type-level functions in Haskell:
-
type synonyms defined by
type
(likeIs
above). These can't be passed around unsaturated, but saturated type synonyms can evaluate to arbitrary types of any kind. -
type constructors defined by
data
ornewtype
(e.g.IO
,Maybe
orEither
). These may be passed around unsaturated, but type constructors must eventually evaluate to parameterized types of kindType
(aka*
).
Defunctionalization uses the best of both varieties of type-level functions while bypassing the limitations of each.
-
Define a type constructor to use as an identifier for a type-level function; allowing the type-level function to be passed around unsaturated.
-
Use a type family to match type-functions' identifiers with their implementations; allowing the type-level function to evaluate to types of any kind.
In Haskell, strongly-typed defunctionalization can be implemented with only a slight syntatic cost.
First, define a kind a --> b
for identifier types for type-functions that map a type of kind a
to a type of kind b
:
{-# LANGUAGE TypeOperators #-}
-- ...
type a --> b = (b -> Type) -> a -> Type
infixr 0 -->
The choice of implementation of a --> b
here is thoroughly arbitrary. (a,b,b) -> Type
or Either b a -> Type
or a -> b -> Type
all work just as well, as do infinitely many others. As we will never be saturating the type, all that matters on the right-hand side of the definition is:
-
it mentions
a
andb
, so they're still in scope for typechecking the implementation, and -
it's a function to kind
Type
, soa --> b
can be used as a kind for unsaturated type constructors.
An open type family can be used to connect an identifier type with its implementation. Since types and values have different namespaces in Haskell, this type family is here named $
, punning off the use of $
at the value level for application.
{-# LANGUAGE PolyKinds, DataKinds, TypeInType, TypeFamilies #-}
-- ...
type family ($) (f :: a --> b) (x :: a) :: b
infixr 0 $
Defining type-level functions is now very similar to defining value-level functions.
-
Declare the type-level function's identifier type and its kind on one line.
-
Define the type-level function's implementation on the next line.
For instance, compare
data ID :: a --> a
type instance ID $ a = a
with
id :: a -> a
id a = a
or
data CONST :: c -> a --> c
type instance (CONST c) $ _ = c
with
const :: c -> a -> c
const c _ = c
Another useful identifier is LIFT f
, which lifts a type constructor f :: a -> b
to a type-level function identifier of kind a --> b
.
data LIFT :: (a -> b) -> a --> b
type instance (LIFT f) $ a = f a
(More discussion of these type-level function identifiers can be found in Appendix C)
Applying defunctionalization to the types of doubleWrap
, doubleUnwrap
, and constrainedTwice
gives:
(forall a. a -> output $ a) -> t -> output $ output $ t
(forall a. input $ a -> a) -> input $ input $ t -> t
(forall a. constrain $ a => a -> a) -> constrain $ t => t -> t
It can be seen from looking at these types that the various type-level function identifiers only interfere with each other by introducing a constraint that the output
of the first application of f
must be a possible input
to the second. ###FIXME REWRITE
Making that constraint explicit enables the types of these functions all to have a common generalization, selfCompose
:
selfCompose ::
forall constrain input output v w.
(forall u. constrain $ u => input $ u -> output $ u) ->
(constrain $ v, (output $ v) ~ (input $ w), constrain $ w) =>
input $ v -> output $ w
selfCompose f x = f @w (f @v x)
-- ghci> :t selfCompose @(LIFT Any) @ID @(LIFT _)
-- selfCompose @(LIFT Any) @ID @(LIFT _)
-- :: (forall u. Any u => u -> t u) -> v -> t (t v)
doubleWrap_using_selfCompose :: forall m t. (forall a. a -> m a ) -> t -> m (m t)
doubleWrap_using_selfCompose = selfCompose @(LIFT Any) @ID @(LIFT m)
-- ghci> :t selfCompose @(LIFT Any) @(LIFT _) @ID
-- selfCompose @(LIFT Any) @(LIFT _) @ID
-- :: (forall u. Any u => t u -> u) -> t (t w) -> w
doubleUnwrap_using_selfCompose :: forall w t. (forall a. w a -> a) -> w (w t) -> t
doubleUnwrap_using_selfCompose = selfCompose @(LIFT Any) @(LIFT w) @ID
-- ghci> :t selfCompose @(LIFT _) @ID @ID
-- selfCompose @(LIFT _) @ID @ID
-- :: (forall u. t u => u -> u) -> w -> w
constrainedTwice_using_selfCompose :: forall c t. (forall a. c a => a -> a) -> c t -> t -> t
constrainedTwice_using_selfCompose = selfCompose @(LIFT c) @ID @ID
Furthermore, the type of selfCompose
turns out to be sufficiently general to specialize with all the other types of \f x -> f (f x)
given above (see Appendixes A and B).