Macro types
I very often find myself wanting unboxed polymorphic types
(e.g. types that contain UNPACK
ed type variables). I find
it extremely frustrating that it's easier to write fast and
generic code in C++ than in Haskell.
I'd like to submit to the mailing list a very rough proposal on how this could be achieved in a pretty straightforward way in GHC.
The proposal is meant to be a proof of concept, just to show that this could be done rather easily. I did not think about a nice interface or the implementation details in GHC. My goal is to check the feasibility of this plan with GHC developers.
I'll call such types "macro types", since their effect is similar to defining a macro that defines a new type for each type variable instantiation.
Consider
data #Point a = Point
{ x :: {-# UNPACK #-} !a
, y :: {-# UNPACK #-} !a
}
This definition defines the macro type #Point
, with one parameter
a
.
Macro types definition would be allowed only for single-constructor
records. The intent is that if we mention #Point Double
, it will
be equivalent to
data PointDouble = PointDouble
{ x :: {-# UNPACK #-} !Double
, y :: {-# UNPACK #-} !Double
}
To use #Point
generically, the following type class would be
generated:
class PointFamily a where
data #Point a :: * -- Family of types generated by @data #Point a@.
#Point :: a -> a -> #Point a -- Constructor.
#x :: #Point a -> a -- Projection @x@.
#y :: #Point a -> a -- Projection @y@.
Thi type class lets us work with #Point
s generically, for example
distance :: (PointFamily a, Fractional a) => #Point a -> #Point a -> a
distance p1 p2 =
let dx = #x p1 - #x p2
dy = #y p1 - #y p2
in sqrt (dx*dx + dy*dy)
Internally, for every type appearing for a
, e.g. #Point Double
,
a new type equivalent to the PointDouble
above would be generated
by GHC, with the corresponding instance
instance PointFamily Double where
data #Point Double = PointDouble
#x = x
#y = x
If it's not possible to instantiate #Point
with the provided type
(for example because the type is not UNPACK
able, e.g.
#Point (Maybe A)
), GHC would throw an error.
Note that we can compile distance
in its polymorphic version
(as opposed to C++ templates, where template functions must be
instantiated at every use). The polymorphic distance
would
require a call to "virtual functions" #x
and #y
, as provided by
the PointFamily
dictionary. But if we use
INLINE
or SPECIALIZE
pragmas the virtual calls to #x
and #y
would disappear, making this as efficient as if we were to define
distance
on the manually defined PointDouble
. Compiler hints
would be put in place to always inline functions using macro types,
if possible.
Note that the inlining is only important so that the PointFamily
dictionary disappears, e.g. functions containing recursive
helpers are fine, such as
{-# INLINE leftmost #-}
leftmost :: forall a. (PointFamily a, Ord a) => [#Point a] -> #Point a
leftmost [] = error "leftmost: no points"
leftmost (p0 : ps0) = go p0 ps0
where
go :: #Point a -> [#Point a] -> Point# a
go candidate (p : ps) =
if #x p < #x candidate
then go p ps
else go candidate ps
It might be worth considering throwing a warning when a top-level definition whose type contains a macro type cannot be inlined, since the main performance benefit of using macro types would be lost.
We can define instances for these types as normal, for instance
instance (Show a, PointFamily a) => Show (#Point a) where
{-# INLINE show #-}
show pt = "Point{x = " ++ #x pt ++ ", y = " ++ #y pt ++ "}"
deriving
support could also be added.
Further ideas
PointFamily
from the user
Hide or remove In the examples above PointFamily
is manipulated explicitely
(e.g. in the type signature for distance
).
In most cases the right constraint could be generated
automatically by GHC, although I think direct access to the
type class would be beneficial (history shows that direct
access to these facilities is good, see upcoming explicit
type applications).
Maybe the type class associated to a macro type should not even
exist -- for example we could simply represent #Point
as a type
family and treat construction and destruction as built-in syntax
(the #
prefix).
Pattern matching
Sugar could be added to pattern match, e.g.
foo :: Point# a -> ...
distance (Point# x1 y1) = ...
or
dinstance Point#{..} = ... -- #x and #y are available
No "record types" limitation
Instead of restricting ourselves to single-constructor records, we could simply generate
data Point a = Point a
{ x :: !a
, y :: !a
}
class PointFamily a where
data Point# a :: *
destruct :: Point# a -> Point a
construct :: Point a -> Point# a
However, I think it would be harder to guarantee the well-behavedness of the inlined functions if we had this intermediate type. I also don't think macro types would be very useful beyond polymorphic unboxed types.