Skip to content

Instantly share code, notes, and snippets.

What would you like to do?

Macro types

I very often find myself wanting unboxed polymorphic types (e.g. types that contain UNPACKed type variables). I find it extremely frustrating that it's easier to write fast and generic code in C++ than in Haskell.

I'd like to submit to the mailing list a very rough proposal on how this could be achieved in a pretty straightforward way in GHC.

The proposal is meant to be a proof of concept, just to show that this could be done rather easily. I did not think about a nice interface or the implementation details in GHC. My goal is to check the feasibility of this plan with GHC developers.

I'll call such types "macro types", since their effect is similar to defining a macro that defines a new type for each type variable instantiation.


data #Point a = Point
  { x :: {-# UNPACK #-} !a
  , y :: {-# UNPACK #-} !a

This definition defines the macro type #Point, with one parameter a.

Macro types definition would be allowed only for single-constructor records. The intent is that if we mention #Point Double, it will be equivalent to

data PointDouble = PointDouble
  { x :: {-# UNPACK #-} !Double
  , y :: {-# UNPACK #-} !Double

To use #Point generically, the following type class would be generated:

class PointFamily a where
  data #Point a :: * -- Family of types generated by @data #Point a@.
  #Point :: a -> a -> #Point a -- Constructor.
  #x :: #Point a -> a -- Projection @x@.
  #y :: #Point a -> a -- Projection @y@.

Thi type class lets us work with #Points generically, for example

distance :: (PointFamily a, Fractional a) => #Point a -> #Point a -> a
distance p1 p2 =
  let dx = #x p1 - #x p2
      dy = #y p1 - #y p2
  in sqrt (dx*dx + dy*dy)

Internally, for every type appearing for a, e.g. #Point Double, a new type equivalent to the PointDouble above would be generated by GHC, with the corresponding instance

instance PointFamily Double where
  data #Point Double = PointDouble
  #x = x
  #y = x

If it's not possible to instantiate #Point with the provided type (for example because the type is not UNPACKable, e.g. #Point (Maybe A)), GHC would throw an error.

Note that we can compile distance in its polymorphic version (as opposed to C++ templates, where template functions must be instantiated at every use). The polymorphic distance would require a call to "virtual functions" #x and #y, as provided by the PointFamily dictionary. But if we use INLINE or SPECIALIZE pragmas the virtual calls to #x and #y would disappear, making this as efficient as if we were to define distance on the manually defined PointDouble. Compiler hints would be put in place to always inline functions using macro types, if possible.

Note that the inlining is only important so that the PointFamily dictionary disappears, e.g. functions containing recursive helpers are fine, such as

{-# INLINE leftmost #-}
leftmost :: forall a. (PointFamily a, Ord a) => [#Point a] -> #Point a
leftmost [] = error "leftmost: no points"
leftmost (p0 : ps0) = go p0 ps0
    go :: #Point a -> [#Point a] -> Point# a
    go candidate (p : ps) =
      if #x p < #x candidate
        then go p ps
        else go candidate ps

It might be worth considering throwing a warning when a top-level definition whose type contains a macro type cannot be inlined, since the main performance benefit of using macro types would be lost.

We can define instances for these types as normal, for instance

instance (Show a, PointFamily a) => Show (#Point a) where
  {-# INLINE show #-}
  show pt = "Point{x = " ++ #x pt ++ ", y = " ++ #y pt ++ "}"

deriving support could also be added.

Further ideas

Hide or remove PointFamily from the user

In the examples above PointFamily is manipulated explicitely (e.g. in the type signature for distance). In most cases the right constraint could be generated automatically by GHC, although I think direct access to the type class would be beneficial (history shows that direct access to these facilities is good, see upcoming explicit type applications).

Maybe the type class associated to a macro type should not even exist -- for example we could simply represent #Point as a type family and treat construction and destruction as built-in syntax (the # prefix).

Pattern matching

Sugar could be added to pattern match, e.g.

foo :: Point# a -> ...
distance (Point# x1 y1) = ...
dinstance Point#{..} = ... -- #x and #y are available

No "record types" limitation

Instead of restricting ourselves to single-constructor records, we could simply generate

data Point a = Point a
  { x :: !a
  , y :: !a

class PointFamily a where
  data Point# a :: *
  destruct :: Point# a -> Point a
  construct :: Point a -> Point# a

However, I think it would be harder to guarantee the well-behavedness of the inlined functions if we had this intermediate type. I also don't think macro types would be very useful beyond polymorphic unboxed types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment