The type system of HN is known as Hindley-Milner Let Polymorphism. Out of many variants of implementation, so called Algorithm W by Damas is used.
The algorithm is not implemented directly by hand but builds upon 2 abstractions: UUAGC attribute grammar and unification-fd library.
data T
= T String -- simple types such as int or char
| TT [T] -- function types (T,T,...,T) -> T, but stored as a uniform list
| TV String -- type variable
| TD String [T] -- parametrized types such as list<char> or pair<int, char>
While usually described for types, the so called Structural Unification of First-Order Terms is defined for any trees with variables in leaves.
The library requires us to decompose our T into 2 types:
data UTerm t v = UTerm (t (UTerm t v)) | UVar t
data T a
= T a
| TT [a]
| TD String [a]
The approach to modularity here is known as two-level types and it is similar to that of Datatypes a la Carte (Swierstra, 2008). That is, fixpoint of T is the term without variables, and the library decorates the fixpoint with UTerm to extend the type with variables in a modular way (see also Expression Problem).
The approach makes unification-fd not only a library for unification, but also a library for variable bindings. That part of unification-fd is used by the type inference algorithm as it needs to find term free variables and similar things that unification-fd implements in a generic way.
The unification logic itself is only 7 lines of work (not counting the instance..where boilerplate):
maybeZip a b | length a == length b = Just $ zipWith (curry Right) a b
maybeZip _ _ = Nothing
instance Unifiable T where
zipMatch (TT a1) (TT a2) = TT <$> maybeZip a1 a2
zipMatch (T a) (T b) | a == b = Just $ T a
zipMatch (TU a) (TU b) | a == b = Just $ TU a
zipMatch (TD l1 a1) (TD l2 a2) | l1 == l2 = TD l1 <$> maybeZip a1 a2
zipMatch _ _ = Nothing
Quoting the documentation, zipMatch must only perform one level of equality testing for terms. If the term constructors are unequal then return Nothing; if they are equal, then return the one-level spine filled with resolved subterms and/or pairs of subterms to be recursively checked.
The idea of "spine with pairs" is somewhat analogous to that of paramorphism recursion scheme.
The declaration of T and the instance are in Unifier.Unifier module. Unifier.Restricted module
reexports unify
and subsumes
functions from unification-fd with types restricted to HNC
choice of representation of variables and errors. Without the restriction error messages
become much worse, and ambiguity often arises, so Unifier.Restricted is a boilerplate module
containing only type signatures.
subsumes
is an asymmetric version of unify
: it checks if the first term can be converted into
the second one by substitutions in it while unify
allows substitutions in both.
You can use the following gist to see the unifier at work: https://gist.github.com/nponeccop/739a720582c961ab05991d61fb5cb598
The example is self-contained: only the aforementioned Unifier/Unifier.hs and Unifier/Restricted.hs are needed. Load the min-unifier-hn.hs into
GHCi and try runStack uni1
or runStack uni2
Below is a complete Hindley-Milner type checking algorithm for an combinatorial subset of HNC. There are only applications, no
LetIn
, Constant
or Definition
or uncurried functions with more than one parameter.
The necessary boilerplate along with predefined library and examples are provided in https://gist.github.com/nponeccop/7ba833db625fed06cb449cdfbd538be1
data Expression = Atom String | App Expression Expression
check :: Lib -> Expression -> Stack Type
check lib = \case
Atom a -> look lib a
App a b -> do
UTerm (TT [argA, res]) <- check lib a
argB <- check lib b
unify argA argB
return res
As the mini-language doesn't let us define new combinators, types of all atoms are supplied by the standard library Lib
. We check that
the symbol is defined at all (by look
) and that if a symbol is applied to anything, its type must be a TT
(function). Then we see if
argA
type the function accepts matches the actual inferred type argB
of passed value.