WillNess/edit 45225142 fix_cata_.txt

## edit 45225142 fix_cata_.txt
my edit for
  https://stackoverflow.com/questions/45225142/the-pattern-with-functions-like-bool-either-etc/45238164#45238164
  by https://stackoverflow.com/users/3234959/chi


First, the types you mention are not really GADTs, they are plain ADTs, since the return type of each constructor is always `T a`. A proper GADT would be something like

    data T a where
       K1 :: T Bool  -- not T a

Anyway, the technique you mention is a well known method to encode algebraic data types into (polymorphic) functions. It goes under many names, like Church encoding, Boehm-Berarducci encoding, encoding as a catamorphism, etc. Sometimes the Yoneda lemma is used to justify this approach, but there's no need to understand the category-theoretic machinery to understand the method.

Basically, the idea is the following. All the ADTs can be generated by

- *product* types `(,) a b`
- *sum* types `Either a b`
- *arrow* types `a -> b`
- *unit* type `()`
- *void* type `Void` (rarely used in Haskell, but theoretically nice)
- *variables* (if the type being defined has parameters)
- possibly, *basic* types (`Integer`, ...)
- type level *recursion*

Type level recursion is used when some value constructor takes the type which is being defined as an argument. The classic example is Peano-style naturals:

    data Nat where
       O :: Nat
       S :: Nat -> Nat
         -- ^^^ recursive!

Lists are also common:

    data List a where
       Nil  :: List a
       Cons :: a -> List a -> List a
                 -- ^^^^^^ recursive!

Types like `Maybe a`, pairs, etc. are non recursive.

Note that each ADT, recursive or not, can be reduced to a single constructor with a single argument, by summing (`Either`) over all the constructors, while multiplying (`(,)`) all the arguments of each constructor. For instance, `Nat` is isomorphic to

    data Nat1 where
      O1 :: () -> Nat
      S1 :: Nat -> Nat

which is isomorphic to

    data Nat2 where K2 :: (Either () Nat) -> Nat

Lists become

    data List1 a where K1 :: (Either () (a, List a)) -> List a

The steps above make use of the algebra of types, which makes the sum and products of types obey the same rules as high school algebra, while `a -> b` behaves like the exponential `b^a`<sup>1</sup>.

Hence, we can write any ADT as a pair of `T` and `F`, in the form

    -- pseudo code
    data T where
       K :: F T -> T
    type F r = .....

For instance

    type F_Nat    r = Either () r       -- for T = Nat,
                                        --     K_Nat    :: F_Nat Nat -> Nat
    type F_List_a r = Either () (a, r)  -- for T = List a,
                                        --     K_List_a :: F_List_a (List a) -> List a

(Note that the latter type function `F_List_a` depends on `a`, but it's not important right now.)

Non recursive types will not use `r`:

    type F_Maybe_a r = Either () a      -- for T = Maybe a,
                                        --     K_Maybe_a :: F_Maybe_a (Maybe a) -> Maybe a

Note that constructor `K` above makes the type `T` isomorphic to `F T` (let's ignore the lifting / extra bottom introduced by it). Essentially, we have that

    Nat     ~= F_Nat     Nat       = Either () Nat
    List  a ~= F_List_a  (List a)  = Either () (a, List a)
    Maybe a ~= F_Maybe_a (Maybe a) = Either () a

We can even formalize this further by abstracting from `F`s,

    newtype Fix f = Fix { unFix :: f (Fix f) }

By definition `Fix F` will now be isomorphic to `F (Fix F)`. We could let

    type Nat       =  Fix  F_Nat       -- ~ F_Nat     (Fix F_Nat)     = F_Nat     Nat
    type (List a)  =  Fix  F_List_a    -- ~ F_List_a  (Fix F_List_a)  = F_List_a  (List a)
    type (Maybe a) =  Fix  F_Maybe_a   -- ~ F_Maybe_a (Fix F_Maybe_a) = F_Maybe_a (Maybe a)

(In Haskell, we need newtype wrappers around them, which I omit for clarity.)

Finally, the general encoding, or catamorphism, is:

    cata :: (F r -> r) -> Fix F -> r

This assumes that `F` is a functor.

For `Nat ~= Fix F_Nat`, we get

    cata :: (Either () r -> r) -> Nat -> r
    -- isomorphic to
    cata :: (() -> r, r -> r) -> Nat -> r
    -- isomorphic to
    cata :: (r, r -> r) -> Nat -> r
    -- isomorphic to
    cata :: r -> (r -> r) -> Nat -> r

Note the "high school albegra" steps, where `r^(1+r) = r^1 * r^r`, hence `Either () r -> r ~= (() -> r, r -> r)`.

Note that we get two arguments, `r` and `r -> r`, which correspond to `O` and `S`. This is not a coincidence -- we summed over all the constructors. This means that `cata` expects to be passed the value of type `r` which "plays the role of `O`" there, and then the value of type `r -> r` which plays the role of `S`.

More informally, `cata` is telling us that, if we want to map a natural in `r`, we only have to state what is the "zero" inside `r` and how to take the "successor" in `r`, and then every `Nat` can be mapped consequently.

For lists we get:

    cata :: (Either () (a, r) -> r) -> List a -> r
    -- isomorphic to
    cata :: (() -> r, (a, r) -> r) -> List a -> r
    -- isomorphic to
    cata :: (r, a -> r -> r) -> List a -> r
    -- isomorphic to
    cata :: r -> (a -> r -> r) -> List a -> r

which is `foldr`.

Again, this is `cata` telling us that, if we state how to take the "empty list" in `r` and to "cons" `a` and `r` inside `r`, we can map any list in `r`.

`Maybe a` is the same:

    cata :: (Either () a -> r) -> Maybe a -> r
    -- isomorphic to
    cata :: (() -> r, a -> r) -> Maybe a -> r
    -- isomorphic to
    cata :: (r, a -> r) -> Maybe a -> r
    -- isomorphic to
    cata :: r -> (a -> r) -> Maybe a -> r

If we can map `Nothing` in `r`, and perform `Just` mapping `a` in `r`, the we can map any `Maybe a` in `r`.

If we try to apply the same approach to `Bool` and `(a,b)` we reach the functions which were posted in the question.

More advanced theoretical topics to look up:

- (initial) F-algebras in category theory
- eliminators / recursors / induction principles in type theory (these can be applied to GADTs as well)

----
<sup>1</sup>  for the intuition about this, think *"how many elements (values) there are in the type"*, e.g. for `Bool -> Either () (Either () ())`, also written as `2 -> 3` in shorthand -- enumerated as a full table, each such function takes one of three values for each of its two possible inputs; thus there are <i>3*3 = 3^2 = 9</i> such functions possible: `9 = length [(x,y) | x<-[1..3], y<-[1..3]]` where each `(x,y)` encodes a function `f_x_y False = x; f_x_y True = y`.
	my edit for
	https://stackoverflow.com/questions/45225142/the-pattern-with-functions-like-bool-either-etc/45238164#45238164
	by https://stackoverflow.com/users/3234959/chi


	First, the types you mention are not really GADTs, they are plain ADTs, since the return type of each constructor is always `T a`. A proper GADT would be something like

	data T a where
	K1 :: T Bool -- not T a

	Anyway, the technique you mention is a well known method to encode algebraic data types into (polymorphic) functions. It goes under many names, like Church encoding, Boehm-Berarducci encoding, encoding as a catamorphism, etc. Sometimes the Yoneda lemma is used to justify this approach, but there's no need to understand the category-theoretic machinery to understand the method.

	Basically, the idea is the following. All the ADTs can be generated by

	- product types `(,) a b`
	- sum types `Either a b`
	- arrow types `a -> b`
	- unit type `()`
	- void type `Void` (rarely used in Haskell, but theoretically nice)
	- variables (if the type being defined has parameters)
	- possibly, basic types (`Integer`, ...)
	- type level recursion

	Type level recursion is used when some value constructor takes the type which is being defined as an argument. The classic example is Peano-style naturals:

	data Nat where
	O :: Nat
	S :: Nat -> Nat
	-- ^^^ recursive!

	Lists are also common:

	data List a where
	Nil :: List a
	Cons :: a -> List a -> List a
	-- ^^^^^^ recursive!

	Types like `Maybe a`, pairs, etc. are non recursive.

	Note that each ADT, recursive or not, can be reduced to a single constructor with a single argument, by summing (`Either`) over all the constructors, while multiplying (`(,)`) all the arguments of each constructor. For instance, `Nat` is isomorphic to

	data Nat1 where
	O1 :: () -> Nat
	S1 :: Nat -> Nat

	which is isomorphic to

	data Nat2 where K2 :: (Either () Nat) -> Nat

	Lists become

	data List1 a where K1 :: (Either () (a, List a)) -> List a

	The steps above make use of the algebra of types, which makes the sum and products of types obey the same rules as high school algebra, while `a -> b` behaves like the exponential `b^a`<sup>1</sup>.

	Hence, we can write any ADT as a pair of `T` and `F`, in the form

	-- pseudo code
	data T where
	K :: F T -> T
	type F r = .....

	For instance

	type F_Nat r = Either () r -- for T = Nat,
	-- K_Nat :: F_Nat Nat -> Nat
	type F_List_a r = Either () (a, r) -- for T = List a,
	-- K_List_a :: F_List_a (List a) -> List a

	(Note that the latter type function `F_List_a` depends on `a`, but it's not important right now.)

	Non recursive types will not use `r`:

	type F_Maybe_a r = Either () a -- for T = Maybe a,
	-- K_Maybe_a :: F_Maybe_a (Maybe a) -> Maybe a

	Note that constructor `K` above makes the type `T` isomorphic to `F T` (let's ignore the lifting / extra bottom introduced by it). Essentially, we have that

	Nat ~= F_Nat Nat = Either () Nat
	List a ~= F_List_a (List a) = Either () (a, List a)
	Maybe a ~= F_Maybe_a (Maybe a) = Either () a

	We can even formalize this further by abstracting from `F`s,

	newtype Fix f = Fix { unFix :: f (Fix f) }

	By definition `Fix F` will now be isomorphic to `F (Fix F)`. We could let

	type Nat = Fix F_Nat -- ~ F_Nat (Fix F_Nat) = F_Nat Nat
	type (List a) = Fix F_List_a -- ~ F_List_a (Fix F_List_a) = F_List_a (List a)
	type (Maybe a) = Fix F_Maybe_a -- ~ F_Maybe_a (Fix F_Maybe_a) = F_Maybe_a (Maybe a)

	(In Haskell, we need newtype wrappers around them, which I omit for clarity.)

	Finally, the general encoding, or catamorphism, is:

	cata :: (F r -> r) -> Fix F -> r

	This assumes that `F` is a functor.

	For `Nat ~= Fix F_Nat`, we get

	cata :: (Either () r -> r) -> Nat -> r
	-- isomorphic to
	cata :: (() -> r, r -> r) -> Nat -> r
	-- isomorphic to
	cata :: (r, r -> r) -> Nat -> r
	-- isomorphic to
	cata :: r -> (r -> r) -> Nat -> r

	Note the "high school albegra" steps, where `r^(1+r) = r^1 * r^r`, hence `Either () r -> r ~= (() -> r, r -> r)`.

	Note that we get two arguments, `r` and `r -> r`, which correspond to `O` and `S`. This is not a coincidence -- we summed over all the constructors. This means that `cata` expects to be passed the value of type `r` which "plays the role of `O`" there, and then the value of type `r -> r` which plays the role of `S`.

	More informally, `cata` is telling us that, if we want to map a natural in `r`, we only have to state what is the "zero" inside `r` and how to take the "successor" in `r`, and then every `Nat` can be mapped consequently.

	For lists we get:

	cata :: (Either () (a, r) -> r) -> List a -> r
	-- isomorphic to
	cata :: (() -> r, (a, r) -> r) -> List a -> r
	-- isomorphic to
	cata :: (r, a -> r -> r) -> List a -> r
	-- isomorphic to
	cata :: r -> (a -> r -> r) -> List a -> r

	which is `foldr`.

	Again, this is `cata` telling us that, if we state how to take the "empty list" in `r` and to "cons" `a` and `r` inside `r`, we can map any list in `r`.

	`Maybe a` is the same:

	cata :: (Either () a -> r) -> Maybe a -> r
	-- isomorphic to
	cata :: (() -> r, a -> r) -> Maybe a -> r
	-- isomorphic to
	cata :: (r, a -> r) -> Maybe a -> r
	-- isomorphic to
	cata :: r -> (a -> r) -> Maybe a -> r

	If we can map `Nothing` in `r`, and perform `Just` mapping `a` in `r`, the we can map any `Maybe a` in `r`.

	If we try to apply the same approach to `Bool` and `(a,b)` we reach the functions which were posted in the question.

	More advanced theoretical topics to look up:

	- (initial) F-algebras in category theory
	- eliminators / recursors / induction principles in type theory (these can be applied to GADTs as well)

	----
	<sup>1</sup> for the intuition about this, think "how many elements (values) there are in the type", e.g. for `Bool -> Either () (Either () ())`, also written as `2 -> 3` in shorthand -- enumerated as a full table, each such function takes one of three values for each of its two possible inputs; thus there are <i>3*3 = 3^2 = 9</i> such functions possible: `9 = length [(x,y) \| x<-[1..3], y<-[1..3]]` where each `(x,y)` encodes a function `f_x_y False = x; f_x_y True = y`.