Skip to content

Instantly share code, notes, and snippets.

@nponeccop
Created February 17, 2017 00:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nponeccop/69105a7005ffbcaf916b13220d00b3bd to your computer and use it in GitHub Desktop.
Save nponeccop/69105a7005ffbcaf916b13220d00b3bd to your computer and use it in GitHub Desktop.
Groupoid HNC posts raw texts

Type Checker

Type Inference

The type system of HN is known as Hindley-Milner Let Polymorphism. Out of many variants of implementation, so called Algorithm W by Damas is used.

The algorithm is not implemented directly by hand but builds upon 2 abstractions: UUAGC attribute grammar and unification-fd library.

Abstract Syntax of Types

data T
    = T String        -- simple types such as int or char
    | TT [T]          -- function types (T,T,...,T) -> T, but stored as a uniform list
    | TV String       -- type variable
    | TD String [T]   -- parametrized types such as list<char> or pair<int, char> 

Type Unification

While usually described for types, the so called Structural Unification of First-Order Terms is defined for any trees with variables in leaves.

The library requires us to decompose our T into 2 types:

data UTerm t v = UTerm (t (UTerm t v)) | UVar t 

data T a
    = T a 
    | TT [a]
    | TD String [a]

The approach to modularity here is known as two-level types and it is similar to that of Datatypes a la Carte (Swierstra, 2008). That is, fixpoint of T is the term without variables, and the library decorates the fixpoint with UTerm to extend the type with variables in a modular way (see also Expression Problem).

The approach makes unification-fd not only a library for unification, but also a library for variable bindings. That part of unification-fd is used by the type inference algorithm as it needs to find term free variables and similar things that unification-fd implements in a generic way.

The unification logic itself is only 7 lines of work (not counting the instance..where boilerplate):

maybeZip a b | length a == length b = Just $ zipWith (curry Right) a b
maybeZip _ _ = Nothing

instance Unifiable T where
    zipMatch (TT a1) (TT a2) = TT <$> maybeZip a1 a2
    zipMatch (T a) (T b) | a == b = Just $ T a
    zipMatch (TU a) (TU b) | a == b = Just $ TU a
    zipMatch (TD l1 a1) (TD l2 a2) | l1 == l2 = TD l1 <$> maybeZip a1 a2
    zipMatch _ _ = Nothing

Quoting the documentation, zipMatch must only perform one level of equality testing for terms. If the term constructors are unequal then return Nothing; if they are equal, then return the one-level spine filled with resolved subterms and/or pairs of subterms to be recursively checked.

The idea of "spine with pairs" is somewhat analogous to that of paramorphism recursion scheme.

The declaration of T and the instance are in Unifier.Unifier module. Unifier.Restricted module reexports unify and subsumes functions from unification-fd with types restricted to HNC choice of representation of variables and errors. Without the restriction error messages become much worse, and ambiguity often arises, so Unifier.Restricted is a boilerplate module containing only type signatures.

subsumes is an asymmetric version of unify: it checks if the first term can be converted into the second one by substitutions in it while unify allows substitutions in both.

You can use the following gist to see the unifier at work: https://gist.github.com/nponeccop/739a720582c961ab05991d61fb5cb598

The example is self-contained: only the aforementioned Unifier/Unifier.hs and Unifier/Restricted.hs are needed. Load the min-unifier-hn.hs into GHCi and try runStack uni1 or runStack uni2

Unification and Type Inference

Below is a complete Hindley-Milner type checking algorithm for an combinatorial subset of HNC. There are only applications, no LetIn, Constant or Definition or uncurried functions with more than one parameter.

The necessary boilerplate along with predefined library and examples are provided in https://gist.github.com/nponeccop/7ba833db625fed06cb449cdfbd538be1

data Expression = Atom String | App Expression Expression

check :: Lib -> Expression -> Stack Type
check lib = \case
    Atom a -> look lib a
    App a b -> do
        UTerm (TT [argA, res]) <- check lib a
        argB <- check lib b
        unify argA argB
        return res

As the mini-language doesn't let us define new combinators, types of all atoms are supplied by the standard library Lib. We check that the symbol is defined at all (by look) and that if a symbol is applied to anything, its type must be a TT (function). Then we see if argA type the function accepts matches the actual inferred type argB of passed value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment