emilaxelsson/FeldsparBound.lhs

## FeldsparBound.lhs
 > [UPDATE, 2015-02-23: The post has been slightly updated based on comments by Edward Kmett.]

In trying to understand the [Bound library](http://hackage.haskell.org/package/bound), and in particular, its limitations (if any) in terms of expressiveness and efficiency. I will here try to implement a small subset of the [Feldspar](http://hackage.haskell.org/package/feldspar-language) EDSL using Bound.

DISCLAIMER: Statements in this text may very well reflect my misunderstandings rather than actual truths about Bound.

Note that I'm assuming basic familiarity with Bound. If you're not familiar with the library, see [its documentation](http://hackage.haskell.org/package/bound/docs/Bound.html).

I would be happy to receive feedback and hints for how to improve the code.

The whole code can be found here: [FeldsparBound.lhs](https://gist.github.com/emilaxelsson/f9fca8f27afb4da01829#file-feldsparbound-lhs)


  <!--

\begin{code}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE Rank2Types #-}

{-# OPTIONS_GHC -fno-warn-missing-methods #-}

import Bound
import Bound.Scope
import Bound.Var
import Control.Applicative
import Control.Monad (ap)
import Control.Monad.Trans
import Prelude.Extras
import Data.Foldable as Foldable
import Data.Traversable
\end{code}

  -->


Implementing Feldspar using Bound
----------------------------------------------------------------------------------------------------

Here is the subset of Feldspar that we will work with:

\begin{code}
data Data a
    = Var a
    | Int Int
    | Add (Data a) (Data a)
    | Ix (Data a) (Data a)
    | Par (Data a) (Scope () Data a)
  deriving (Eq,Show,Functor,Foldable,Traversable)
\end{code}

`Ix` represents array indexing, and `Par` represents array construction, corresponding to the following function on lists:

\begin{code}
parList :: Int -> (Int -> a) -> [a]
parList l f = [f i | i <- [0 .. l-1]]
\end{code}

For simplicity, I chose to put the `Scope` directly on `Par` instead of going through a lambda as we do in Feldspar.

  <!--

For the deriving of `Eq` and `Show` above, we need some class instances from the package [prelude-extras](http://hackage.haskell.org/package/prelude-extras):

\begin{code}
instance Eq1 Data
instance Show1 Data
\end{code}

These are required by the `Eq` and `Show` instances for `Scope`.

  -->

The `Monad` instance defines substitution in a standard way:

\begin{code}
instance Monad Data
  where
    return = Var
    Var a   >>= k = k a
    Int a   >>= k = Int a
    Add a b >>= k = Add (a >>= k) (b >>= k)
    Ix a i  >>= k = Ix (a >>= k) (i >>= k)
    Par l f >>= k = Par (l >>= k) (f >>>= k)

instance Applicative Data where pure = return; (<*>) = ap
\end{code}


User interface
----------------------------------------------------------------------------------------------------

Here I choose to represent free variables as strings:

\begin{code}
type Name = String

type D = Data Name
\end{code}

The user interface is then as follows:

\begin{code}
v :: Name -> D
v = Var

instance Num D
  where
    fromInteger = Int . fromInteger
    (+) = Add

(!) :: D -> D -> D
(!) = Ix

par :: Name -> D -> D -> D
par i l f = Par l $ abstract1 i f
\end{code}

Here are two simple example expressions:

\begin{code}
ex1 = par "i" 10 $ v "i" + 0
  -- Corresponds to `parList 10 $ \i -> i + 0`

ex2 = par "i" 10 $ v "arr" ! (v "i" + 0)
  -- Corresponds to `parList 10 $ \i -> arr !! (i + 0)`
\end{code}

In the definition of `par`, we use `abstract1` is to bind the free variable (convert to deBruijn). It has to traverse the whole body, so the complexity is quadratic in the number of nested binders.


Transformations
----------------------------------------------------------------------------------------------------

Now to the interesting question of how one goes about to transform terms in the above representation.

Ideally, we want to do this without looking at deBruijn indices at all. Two questions arise:

  1. How to apply a transformation under a `Scope`?
  2. How to pattern match on binders and variables and rewrite them?

For the first question, my initial naive attempt was to use a combination of `instantiate1` and `abstract1` to lift a transformation on `D` to a transformation on `Scope`:

\begin{code}
transScopeBAD :: (D -> D) -> Scope () Data Name -> Scope () Data Name
transScopeBAD f = abstract1 "x" . f . instantiate1 (Var "x")
\end{code}

The use of `instantiate1` changes all occurrences of the bound variable to the free variable `Var "x"` and `abstract1` re-binds that variable after `f` has been applied to the body. Unfortunately, this doesn't work. If there are nested scopes, their corresponding variables will all be renamed to `"x"`, so the innermost `abstract1` is going to capture all of them. Also, since `abstract1` has to traverse the whole body, nested uses of `transScope` are going to be inefficient.

A better approach is to work more directly with the representation and use `fromScope` and `toScope` (here with the type specialized to `Data`):

\begin{code}
transScope :: (Data (Var b a) -> Data (Var b a)) -> Scope b Data a -> Scope b Data a
transScope f = toScope . f . fromScope
\end{code}

(This version of `transScope` was suggested by Edward. In my initial attempt, I used `hoistScope` instead.)

Note how the type clearly expresses the fact that the term passed to the transformation function `f` has one extra variable (of type `b`) in addition to the free variables (of type `a`). This information was missing in `transScopeBAD`.

Using `transScope`, we can now write the following simplifier:

\begin{code}
simplify :: Data a -> Data a
simplify e = simp $ case e of
    Add a b -> Add (simplify a) (simplify b)
    Ix a i  -> Ix (simplify a) (simplify i)
    Par l f -> Par (simplify l) $ transScope simplify f
    _ -> e
  where
    simp (Add (Int 0) b) = b
    simp (Add a (Int 0)) = a
    -- The catch-all case `simp e = e` will come later
\end{code}

Note that it is important for `simplify` to be polymorphic in the free variables since we're using it at a different type when passing it to `transScope`. This requirement was not obvious to me at first. (In retrospect, it is obvious that the polymorpic type is preferred, since I don't want the simplifier to do anything with the free variables.)

To make the simplifier more interesting, we add a case corresponding to the following rule for `parList` (defined above):

~~~~~~~~~~~~~~~~~~~~
parList l f !! i   ==>   f i
~~~~~~~~~~~~~~~~~~~~

This turns out to be easy:

\begin{code}
    simp (Ix (Par l f) i) = instantiate1 i f
\end{code}

Another similar rule that we may want to express is

~~~~~~~~~~~~~~~~~~~~
parList l (\i -> arr !! i)   ==>   arr   (where i `notElem` freeVars arr)
~~~~~~~~~~~~~~~~~~~~

(In both of these rules, we ignore out-of-bounds indexing for simplicity.)

To express this rule, we need to pattern match on binders and variables. My initial attempt was the following:

\begin{code}
--     simp (Par l f)
--       | Ix arr i <- instantiate1 (v "i") f
--       , i == v "i"
--       , "i" `Foldable.notElem` arr
--       = arr
\end{code}

This looks quite nice from a readability perspective, but it doesn't work because `transScope` does not permit free variables as strings. (And, in general, using `instantiate1` in this way has the same problem as `transScopeBAD` above, that the variable `"i"` might easily clash with other instantiated variables.)

Instead, it seems that one has to work more directly with the deBruijn representation:

\begin{code}
    simp (Par l f)
      | Ix arr i   <- fromScope f
      , Var (B ()) <- i
      , Just arr'  <- lower arr
      = arr'

    -- Catch-all case to complete the definition of `simp`:
    simp e = e
\end{code}

The `lower` function tries to lower the scope of an expression, and succeeds if the expression does not refer to the closest bound variable:

\begin{code}
lower :: Data (Var b a) -> Maybe (Data a)
lower = traverse $ unvar (const Nothing) Just
\end{code}

Note the use of `fromScope` rather than `unscope` above. This is to change the generalized deBruijn representation into one the standard representation which `lower` expects.

I would have hoped to avoid matching explicitly on the deBruijn representation of `i` (namely `Var (B ())`). But I guess this is just unavoidable?

The check that `i` does not occur in `arr` is replaced by the pattern match on the result of `lower`. One nice thing here is that the type system does not allow us to return `arr` directly, so we cannot accidentally forget to check whether `i` occurs in `arr` (which I tend to forget everytime!).


Examples
--------

~~~~~~~~~~~~~~~~~~~~
*Main> ex1
Par (Int 10) (Scope (Add (Var (B ())) (Int 0)))

*Main> simplify ex1
Par (Int 10) (Scope (Var (B ())))

*Main> ex2
Par (Int 10) (Scope (Ix (Var (F (Var "arr"))) (Add (Var (B ())) (Int 0))))

*Main> simplify ex2
Var "arr"
~~~~~~~~~~~~~~~~~~~~


Discussion
----------------------------------------------------------------------------------------------------

At first I was tempted to use `instantiate1` and `abstract1` everywhere and to represent free variables as strings. Although this leads to code that looks readable, it does not work in general:

  * Representing free variables as strings can easily lead to capturing bugs
  * It leads to inefficient code:
      - `abstract1` has to traverse the whole expression, so it should be used sparingly
      - `instantiate1` may not have to traverse the whole expression due to the generalized deBruijn representation, but I'm not sure what this means in practice

From a capturing point of view, it may be considered OK to use `abstract1` and variables as strings the way I did in the user interface above. The user will just have to make sure to use distinct variable names. However, due to the nested uses of `abstract1`, it is still very inefficient to construct terms in this way. A better way is probably to construct terms in a different representation and then convert them all to deBruijn in a single pass.

Once I realized that I couldn't use `instantiate1` and `abstract1` for transformations under a `Scope`, I was forced to use `fromScope`/`toScope` which expose a bit the representation of variables. However, this turned out to be a very good thing that just made the code safer.

The resulting code does not use `abstract1`, so it avoids that source of inefficiency. There is one use of `instantiate1`, but that is a substitution coming from the rewrite rule -- it has nothing to do with the Bound library (and it could be avoided by rewriting to a let binding instead of performing the substitution).

In the end, there appears to be only one problematic inefficiency left: the `lower` function. In order to know whether an expression refers to the closest bound variable, `lower` naturally has to traverse the whole expression. Of course, this is true regardless of how variables are represented. But in a standard named representation of variables, it is e.g. possible to return a set of free variables from `simplify`, so that membership can be tested without traversing the expression. I don't see how that could be done here.

In conclusion, I see many advantages of using Bound, especially when it comes to safety and not worrying about capturing. However, there are some things that make me hesitate on adopting it for something like Feldspar:

  * I wasn't able to completely hide the deBruijn indices in the simplifier. Can it be done? Is this going to be a big problem for people trying to express rewrite rules?
  * The `lower` function is a potential source of inefficiency in a simplifier. Can it be fixed? Will there be other inefficiencies as the rules get more complex?

If someone has the answers to those questions, I would be happy to know!
	> [UPDATE, 2015-02-23: The post has been slightly updated based on comments by Edward Kmett.]

	In trying to understand the [Bound library](http://hackage.haskell.org/package/bound), and in particular, its limitations (if any) in terms of expressiveness and efficiency. I will here try to implement a small subset of the [Feldspar](http://hackage.haskell.org/package/feldspar-language) EDSL using Bound.

	DISCLAIMER: Statements in this text may very well reflect my misunderstandings rather than actual truths about Bound.

	Note that I'm assuming basic familiarity with Bound. If you're not familiar with the library, see [its documentation](http://hackage.haskell.org/package/bound/docs/Bound.html).

	I would be happy to receive feedback and hints for how to improve the code.

	The whole code can be found here: [FeldsparBound.lhs](https://gist.github.com/emilaxelsson/f9fca8f27afb4da01829#file-feldsparbound-lhs)



	<!--

	\begin{code}
	{-# LANGUAGE DeriveFunctor #-}
	{-# LANGUAGE DeriveFoldable #-}
	{-# LANGUAGE DeriveTraversable #-}
	{-# LANGUAGE FlexibleInstances #-}
	{-# LANGUAGE Rank2Types #-}

	{-# OPTIONS_GHC -fno-warn-missing-methods #-}

	import Bound
	import Bound.Scope
	import Bound.Var
	import Control.Applicative
	import Control.Monad (ap)
	import Control.Monad.Trans
	import Prelude.Extras
	import Data.Foldable as Foldable
	import Data.Traversable
	\end{code}

	-->



	Implementing Feldspar using Bound
	----------------------------------------------------------------------------------------------------

	Here is the subset of Feldspar that we will work with:

	\begin{code}
	data Data a
	= Var a
	\| Int Int
	\| Add (Data a) (Data a)
	\| Ix (Data a) (Data a)
	\| Par (Data a) (Scope () Data a)
	deriving (Eq,Show,Functor,Foldable,Traversable)
	\end{code}

	`Ix` represents array indexing, and `Par` represents array construction, corresponding to the following function on lists:

	\begin{code}
	parList :: Int -> (Int -> a) -> [a]
	parList l f = [f i \| i <- [0 .. l-1]]
	\end{code}

	For simplicity, I chose to put the `Scope` directly on `Par` instead of going through a lambda as we do in Feldspar.

	<!--

	For the deriving of `Eq` and `Show` above, we need some class instances from the package [prelude-extras](http://hackage.haskell.org/package/prelude-extras):

	\begin{code}
	instance Eq1 Data
	instance Show1 Data
	\end{code}

	These are required by the `Eq` and `Show` instances for `Scope`.

	-->

	The `Monad` instance defines substitution in a standard way:

	\begin{code}
	instance Monad Data
	where
	return = Var
	Var a >>= k = k a
	Int a >>= k = Int a
	Add a b >>= k = Add (a >>= k) (b >>= k)
	Ix a i >>= k = Ix (a >>= k) (i >>= k)
	Par l f >>= k = Par (l >>= k) (f >>>= k)

	instance Applicative Data where pure = return; (<*>) = ap
	\end{code}



	User interface
	----------------------------------------------------------------------------------------------------

	Here I choose to represent free variables as strings:

	\begin{code}
	type Name = String

	type D = Data Name
	\end{code}

	The user interface is then as follows:

	\begin{code}
	v :: Name -> D
	v = Var

	instance Num D
	where
	fromInteger = Int . fromInteger
	(+) = Add

	(!) :: D -> D -> D
	(!) = Ix

	par :: Name -> D -> D -> D
	par i l f = Par l $ abstract1 i f
	\end{code}

	Here are two simple example expressions:

	\begin{code}
	ex1 = par "i" 10 $ v "i" + 0
	-- Corresponds to `parList 10 $ \i -> i + 0`

	ex2 = par "i" 10 $ v "arr" ! (v "i" + 0)
	-- Corresponds to `parList 10 $ \i -> arr !! (i + 0)`
	\end{code}

	In the definition of `par`, we use `abstract1` is to bind the free variable (convert to deBruijn). It has to traverse the whole body, so the complexity is quadratic in the number of nested binders.



	Transformations
	----------------------------------------------------------------------------------------------------

	Now to the interesting question of how one goes about to transform terms in the above representation.

	Ideally, we want to do this without looking at deBruijn indices at all. Two questions arise:

	1. How to apply a transformation under a `Scope`?
	2. How to pattern match on binders and variables and rewrite them?

	For the first question, my initial naive attempt was to use a combination of `instantiate1` and `abstract1` to lift a transformation on `D` to a transformation on `Scope`:

	\begin{code}
	transScopeBAD :: (D -> D) -> Scope () Data Name -> Scope () Data Name
	transScopeBAD f = abstract1 "x" . f . instantiate1 (Var "x")
	\end{code}

	The use of `instantiate1` changes all occurrences of the bound variable to the free variable `Var "x"` and `abstract1` re-binds that variable after `f` has been applied to the body. Unfortunately, this doesn't work. If there are nested scopes, their corresponding variables will all be renamed to `"x"`, so the innermost `abstract1` is going to capture all of them. Also, since `abstract1` has to traverse the whole body, nested uses of `transScope` are going to be inefficient.

	A better approach is to work more directly with the representation and use `fromScope` and `toScope` (here with the type specialized to `Data`):

	\begin{code}
	transScope :: (Data (Var b a) -> Data (Var b a)) -> Scope b Data a -> Scope b Data a
	transScope f = toScope . f . fromScope
	\end{code}

	(This version of `transScope` was suggested by Edward. In my initial attempt, I used `hoistScope` instead.)

	Note how the type clearly expresses the fact that the term passed to the transformation function `f` has one extra variable (of type `b`) in addition to the free variables (of type `a`). This information was missing in `transScopeBAD`.

	Using `transScope`, we can now write the following simplifier:

	\begin{code}
	simplify :: Data a -> Data a
	simplify e = simp $ case e of
	Add a b -> Add (simplify a) (simplify b)
	Ix a i -> Ix (simplify a) (simplify i)
	Par l f -> Par (simplify l) $ transScope simplify f
	_ -> e
	where
	simp (Add (Int 0) b) = b
	simp (Add a (Int 0)) = a
	-- The catch-all case `simp e = e` will come later
	\end{code}

	Note that it is important for `simplify` to be polymorphic in the free variables since we're using it at a different type when passing it to `transScope`. This requirement was not obvious to me at first. (In retrospect, it is obvious that the polymorpic type is preferred, since I don't want the simplifier to do anything with the free variables.)

	To make the simplifier more interesting, we add a case corresponding to the following rule for `parList` (defined above):

	~~~~~~~~~~~~~~~~~~~~
	parList l f !! i ==> f i
	~~~~~~~~~~~~~~~~~~~~

	This turns out to be easy:

	\begin{code}
	simp (Ix (Par l f) i) = instantiate1 i f
	\end{code}

	Another similar rule that we may want to express is

	~~~~~~~~~~~~~~~~~~~~
	parList l (\i -> arr !! i) ==> arr (where i `notElem` freeVars arr)
	~~~~~~~~~~~~~~~~~~~~

	(In both of these rules, we ignore out-of-bounds indexing for simplicity.)

	To express this rule, we need to pattern match on binders and variables. My initial attempt was the following:

	\begin{code}
	-- simp (Par l f)
	-- \| Ix arr i <- instantiate1 (v "i") f
	-- , i == v "i"
	-- , "i" `Foldable.notElem` arr
	-- = arr
	\end{code}

	This looks quite nice from a readability perspective, but it doesn't work because `transScope` does not permit free variables as strings. (And, in general, using `instantiate1` in this way has the same problem as `transScopeBAD` above, that the variable `"i"` might easily clash with other instantiated variables.)

	Instead, it seems that one has to work more directly with the deBruijn representation:

	\begin{code}
	simp (Par l f)
	\| Ix arr i <- fromScope f
	, Var (B ()) <- i
	, Just arr' <- lower arr
	= arr'

	-- Catch-all case to complete the definition of `simp`:
	simp e = e
	\end{code}

	The `lower` function tries to lower the scope of an expression, and succeeds if the expression does not refer to the closest bound variable:

	\begin{code}
	lower :: Data (Var b a) -> Maybe (Data a)
	lower = traverse $ unvar (const Nothing) Just
	\end{code}

	Note the use of `fromScope` rather than `unscope` above. This is to change the generalized deBruijn representation into one the standard representation which `lower` expects.

	I would have hoped to avoid matching explicitly on the deBruijn representation of `i` (namely `Var (B ())`). But I guess this is just unavoidable?

	The check that `i` does not occur in `arr` is replaced by the pattern match on the result of `lower`. One nice thing here is that the type system does not allow us to return `arr` directly, so we cannot accidentally forget to check whether `i` occurs in `arr` (which I tend to forget everytime!).



	Examples
	--------

	~~~~~~~~~~~~~~~~~~~~
	*Main> ex1
	Par (Int 10) (Scope (Add (Var (B ())) (Int 0)))

	*Main> simplify ex1
	Par (Int 10) (Scope (Var (B ())))

	*Main> ex2
	Par (Int 10) (Scope (Ix (Var (F (Var "arr"))) (Add (Var (B ())) (Int 0))))

	*Main> simplify ex2
	Var "arr"
	~~~~~~~~~~~~~~~~~~~~



	Discussion
	----------------------------------------------------------------------------------------------------

	At first I was tempted to use `instantiate1` and `abstract1` everywhere and to represent free variables as strings. Although this leads to code that looks readable, it does not work in general:

	* Representing free variables as strings can easily lead to capturing bugs
	* It leads to inefficient code:
	- `abstract1` has to traverse the whole expression, so it should be used sparingly
	- `instantiate1` may not have to traverse the whole expression due to the generalized deBruijn representation, but I'm not sure what this means in practice

	From a capturing point of view, it may be considered OK to use `abstract1` and variables as strings the way I did in the user interface above. The user will just have to make sure to use distinct variable names. However, due to the nested uses of `abstract1`, it is still very inefficient to construct terms in this way. A better way is probably to construct terms in a different representation and then convert them all to deBruijn in a single pass.

	Once I realized that I couldn't use `instantiate1` and `abstract1` for transformations under a `Scope`, I was forced to use `fromScope`/`toScope` which expose a bit the representation of variables. However, this turned out to be a very good thing that just made the code safer.

	The resulting code does not use `abstract1`, so it avoids that source of inefficiency. There is one use of `instantiate1`, but that is a substitution coming from the rewrite rule -- it has nothing to do with the Bound library (and it could be avoided by rewriting to a let binding instead of performing the substitution).

	In the end, there appears to be only one problematic inefficiency left: the `lower` function. In order to know whether an expression refers to the closest bound variable, `lower` naturally has to traverse the whole expression. Of course, this is true regardless of how variables are represented. But in a standard named representation of variables, it is e.g. possible to return a set of free variables from `simplify`, so that membership can be tested without traversing the expression. I don't see how that could be done here.

	In conclusion, I see many advantages of using Bound, especially when it comes to safety and not worrying about capturing. However, there are some things that make me hesitate on adopting it for something like Feldspar:

	* I wasn't able to completely hide the deBruijn indices in the simplifier. Can it be done? Is this going to be a big problem for people trying to express rewrite rules?
	* The `lower` function is a potential source of inefficiency in a simplifier. Can it be fixed? Will there be other inefficiencies as the rules get more complex?

	If someone has the answers to those questions, I would be happy to know!