NorfairKing/Thoughts

## Thoughts
I've been thinking about where one would get started to find problems in code.

Ideally I'd come up with some mathematical properties to spot that would be
likely to indicate that bugs are more common.

Part 1: Possible implementations

One idea I've had stems from algebra. It starts with the concept of cardinality.
Out of all possible implementations of a solution to a certain problem, there
are only few of them that correctly solve the problem.

We already know about cardinality of types, so let me illustrate my point with
those.

If `A` and `B` are types of size `a` and `b`, then `Either A B` has $a + b$
elements (sum type). `(A,B)` has $a * b$ elements (product type). Most
significantly: `A -> B` has $b ^ a$ elements.
This is, in my estimation, the root cause of bad software: there are
exponentially many ways to write the wrong code with respect to the size of the
types/problem you are trying to solve.

If we could somehow eliminate all the wrong implementations from having a
possibilty of existing, then we could increase the likelihood that the code we
write is correct.
So how do we constrain this number of possible implementations such that only
correct implementations remain?

The first step would be using correct (small) types. Haskellers love this
approach. If you don't allow `null` in the input type of a function, and
enforce that constraint with a compiler, then you automatically rule out
$(b - 1)$ times the number of remaining implementations.
($b ^ (a + 1) - b ^ a = (b-1) b ^ a)
To give you an example, the number of possible implementations of a function of
type `Ordering -> Ordering` is $27$ whereas the number of possible
implementations of a function of type `Maybe Ordering -> Ordering` is `81`.
If you don't allow `null` in the output type, and enforce that with a
compiler, then you automatically rule out $(b + 1) ^ a - b ^ a$ possible
implementations.
To go back to the previous example, that's $64$ possible implementations for a
function of type `Ordering -> Maybe Ordering`.
Remember: for large types this difference is much much bigger.

The next step would be to use parametric types.
The number of possible implementations of `Maybe Bool -> Maybe Bool` is $27$.
This is already a very large number considering that `Bool` is a very small
type.
The number of possible implementations of `Maybe a -> Maybe a` is only $2$!
You can either have `id :: Maybe a -> Maybe a` or `const Nothing :: Maybe a ->
Maybe a`.
This is because the implementation of a function of this type has to be able to
deal with the fact that it cannot inspect the value of the type that `a` will
be at a use-site. `Nothing` can never be mapped to anything other than
`Nothing`, because to create a `Just` value, you need a value of type `a`.
`Just a` can never inspect the value `a` to see what to do next. It must either
ignore the value and map to `Nothing`, or it must map to `Just a`.

If you look at the previous two paragraphs, you can also immediately see why
Python programs tend to be broken. There are about two total 'function's in
Python: The function that does nothing, and the identity function.
All the others will crash for some input.

When you make a function as parametric as possible, you could still be left with
an infinite number of possible implementations.
Take for example the reverse function. It has type `[a] -> [a]`.
Because `[a]` could inhabit a infinite number of values (say when `a ~ ()`),
there could be an infinite number of implementations of this function.
(You may think that there would be an uncountably infinite number of
implementations of this function, but the total number of possible programs is
already countable. Not that this really matters.)

So we need to find ways to constrain the number of possible implementations
further. This is where we turn to tests.
If we write a test to specify that `reverse []` must evaluate to `[]`, then we
can use this test to enforce that the implementation that we end up with will
satisfy this limitation. `NonEmpty a` is not any smaller than `[a]`, so this
test will not constrain the number of implementations.
HOWEVER, not every implementation is equally likely to be written.
A 5000 line implementation is less likely to be written than a 2 line
implementation. An implementation that deals with `[]` incorrectly is quite
likely, so the `reverse []` test rules out many wrong implementations that are
likely to be written.
The safety model of a unit test usually assumes that the programmer will not
make certain obscure mistakes. For example, when testing the `reverse` function
, unit tests would likely not test specifically for `Deadlock` exceptions to
be thrown. We assume that programmers may get the `reverse` function wrong, but
not that they will get it THAT wrong.
This is why unit tests are useful: to rule out many (wrong) implementations
that are not unlikely to be written.

We can go even further:
With property tests, we can constrain the possible implementations to the
implementations that satisfy certain properties. There is no hard rule for how
many (wrong) implementations you can rule with a property, but properties tend
to work very well for this purpose anyway, because they relate to what the
programmer means by `correct`.
Back to the `reverse` example: If we add a property test that specifies that
`reverse (reverse xs)` must equal `xs` for _any_ generated `xs`, we don't
constrain the number of possible implementations per-se.
However, we greatly constrain the number of possible likely implementations.
Indeed, the most likely implementation that we haven't ruled out is `id :: [a]
-> [a]`, but the next is the following:

```
reverse :: [a] -> [a]
reverse xs
  | odd (length xs) = xs
  | otherwise = case xs of
      [] -> []
      [_] -> xs -- Never called
      (x:y:xs') -> y:x:reverse xs'
```

This implementation is significantly larger than the correct implementation.
The only implementation we have left to rule out is `id :: [a] -> [a]`.
Then we have ruled out all wrong implementations that are more likely to be
written than the most likely correct implementation.
In the case of `reverse`, we can do this with either a single unit test or a
comprehensive property test.
If we add `reverse [1,4,9] == [9,4,1]`, it becomes _very_ hard to write an
implementation that passes these three tests but is still wrong.
We could also add the following property:

```
reverse (x:xs) = reverse xs ++ [x]
```

This property is very significant because it "pins down" the implementations
that satisfy it. By itself, it already constrains the possible implementations
to only the correct implementation.
It would be great if we could have such a property test for every function, but
writing this property test requires very deep understanding of what `reverse`
means so I would prefer not to assume that we can always do that.
This is why I mention this property test last.

The strategy of constraining possible implementations also leads to the
discovery of a natural stopping point for QA: when we cannot think of any
alternative implementations that pass all tests but are still wrong.


Part 2: Common mistakes.

In the previous section, I completely ignored `undefined :: a` or `unsafeCoerce
:: a -> b`.
However, these kinds of functions raise a valid concern: given that partial
functions exist, do we need to do something different?

First: by evaluating a function with _any_ input, we rule out the `undefined`
implementation.
By evaluating a function with _many random inputs_ (and evaluting the result),
we can (probabilistically) rule out many partial implementations.

Not every partial code path is equally likely to be taken or to exist.
However, we can use a bit of algebra again to predict likely occurrences of
partiality.

Let's look at a function of type `A -> B` again. Traditional modern maths has
defined this to mean that for every `a :: A`, there exists a `b :: B`, but in
computer science this is nonsense of course. something something halting
problem. But how do we find these `a :: A` values for which there is no such `b
:: B` but a crash instead?


*This is where I need your help, I have found only one obvious way, but I would
be very interested in finding more.*


Consider a function `f` of type `A -> B` such that we expect there to be an
function `g` of type `B -> A` such that `g (f x)` equals `x` for every `x :: A`.
For this to be possible at all, mathematics says that `f` and `g` must be
bijective. However, in computer science `f` only has to be injective.
Now, we know many examples of such `f` and `g` where `f` is injective and `g` is
partial: `parseJSON` and `toJSON` form a nice example.

My claim is that whenever you have a pair of functions `f :: A -> B` and
`g :: B -> A` where `f` is injective, it is certainly interesting to investigate
`g` further.
(You can easily see that `f` is injective if we expect a `g` to exist and `B`
is larger than (or equal to) `A`.)
This is because `g` *must now necessarily be partial*.
Deserialisation functions should solve this problem by evaluating to values of
type `Parser A` instead of `A` so that they can naturally handle failure.
However, we typically find that programmers make assumptions about what the
values of type `B` look like when these values are being parsed and these
assumptions cause typical mistakes.
	I've been thinking about where one would get started to find problems in code.

	Ideally I'd come up with some mathematical properties to spot that would be
	likely to indicate that bugs are more common.

	Part 1: Possible implementations

	One idea I've had stems from algebra. It starts with the concept of cardinality.
	Out of all possible implementations of a solution to a certain problem, there
	are only few of them that correctly solve the problem.

	We already know about cardinality of types, so let me illustrate my point with
	those.

	If `A` and `B` are types of size `a` and `b`, then `Either A B` has $a + b$
	elements (sum type). `(A,B)` has $a * b$ elements (product type). Most
	significantly: `A -> B` has $b ^ a$ elements.
	This is, in my estimation, the root cause of bad software: there are
	exponentially many ways to write the wrong code with respect to the size of the
	types/problem you are trying to solve.

	If we could somehow eliminate all the wrong implementations from having a
	possibilty of existing, then we could increase the likelihood that the code we
	write is correct.
	So how do we constrain this number of possible implementations such that only
	correct implementations remain?

	The first step would be using correct (small) types. Haskellers love this
	approach. If you don't allow `null` in the input type of a function, and
	enforce that constraint with a compiler, then you automatically rule out
	$(b - 1)$ times the number of remaining implementations.
	($b ^ (a + 1) - b ^ a = (b-1) b ^ a)
	To give you an example, the number of possible implementations of a function of
	type `Ordering -> Ordering` is $27$ whereas the number of possible
	implementations of a function of type `Maybe Ordering -> Ordering` is `81`.
	If you don't allow `null` in the output type, and enforce that with a
	compiler, then you automatically rule out $(b + 1) ^ a - b ^ a$ possible
	implementations.
	To go back to the previous example, that's $64$ possible implementations for a
	function of type `Ordering -> Maybe Ordering`.
	Remember: for large types this difference is much much bigger.

	The next step would be to use parametric types.
	The number of possible implementations of `Maybe Bool -> Maybe Bool` is $27$.
	This is already a very large number considering that `Bool` is a very small
	type.
	The number of possible implementations of `Maybe a -> Maybe a` is only $2$!
	You can either have `id :: Maybe a -> Maybe a` or `const Nothing :: Maybe a ->
	Maybe a`.
	This is because the implementation of a function of this type has to be able to
	deal with the fact that it cannot inspect the value of the type that `a` will
	be at a use-site. `Nothing` can never be mapped to anything other than
	`Nothing`, because to create a `Just` value, you need a value of type `a`.
	`Just a` can never inspect the value `a` to see what to do next. It must either
	ignore the value and map to `Nothing`, or it must map to `Just a`.

	If you look at the previous two paragraphs, you can also immediately see why
	Python programs tend to be broken. There are about two total 'function's in
	Python: The function that does nothing, and the identity function.
	All the others will crash for some input.

	When you make a function as parametric as possible, you could still be left with
	an infinite number of possible implementations.
	Take for example the reverse function. It has type `[a] -> [a]`.
	Because `[a]` could inhabit a infinite number of values (say when `a ~ ()`),
	there could be an infinite number of implementations of this function.
	(You may think that there would be an uncountably infinite number of
	implementations of this function, but the total number of possible programs is
	already countable. Not that this really matters.)

	So we need to find ways to constrain the number of possible implementations
	further. This is where we turn to tests.
	If we write a test to specify that `reverse []` must evaluate to `[]`, then we
	can use this test to enforce that the implementation that we end up with will
	satisfy this limitation. `NonEmpty a` is not any smaller than `[a]`, so this
	test will not constrain the number of implementations.
	HOWEVER, not every implementation is equally likely to be written.
	A 5000 line implementation is less likely to be written than a 2 line
	implementation. An implementation that deals with `[]` incorrectly is quite
	likely, so the `reverse []` test rules out many wrong implementations that are
	likely to be written.
	The safety model of a unit test usually assumes that the programmer will not
	make certain obscure mistakes. For example, when testing the `reverse` function
	, unit tests would likely not test specifically for `Deadlock` exceptions to
	be thrown. We assume that programmers may get the `reverse` function wrong, but
	not that they will get it THAT wrong.
	This is why unit tests are useful: to rule out many (wrong) implementations
	that are not unlikely to be written.

	We can go even further:
	With property tests, we can constrain the possible implementations to the
	implementations that satisfy certain properties. There is no hard rule for how
	many (wrong) implementations you can rule with a property, but properties tend
	to work very well for this purpose anyway, because they relate to what the
	programmer means by `correct`.
	Back to the `reverse` example: If we add a property test that specifies that
	`reverse (reverse xs)` must equal `xs` for _any_ generated `xs`, we don't
	constrain the number of possible implementations per-se.
	However, we greatly constrain the number of possible likely implementations.
	Indeed, the most likely implementation that we haven't ruled out is `id :: [a]
	-> [a]`, but the next is the following:

	```
	reverse :: [a] -> [a]
	reverse xs
	\| odd (length xs) = xs
	\| otherwise = case xs of
	[] -> []
	[_] -> xs -- Never called
	(x:y:xs') -> y:x:reverse xs'
	```

	This implementation is significantly larger than the correct implementation.
	The only implementation we have left to rule out is `id :: [a] -> [a]`.
	Then we have ruled out all wrong implementations that are more likely to be
	written than the most likely correct implementation.
	In the case of `reverse`, we can do this with either a single unit test or a
	comprehensive property test.
	If we add `reverse [1,4,9] == [9,4,1]`, it becomes _very_ hard to write an
	implementation that passes these three tests but is still wrong.
	We could also add the following property:

	```
	reverse (x:xs) = reverse xs ++ [x]
	```

	This property is very significant because it "pins down" the implementations
	that satisfy it. By itself, it already constrains the possible implementations
	to only the correct implementation.
	It would be great if we could have such a property test for every function, but
	writing this property test requires very deep understanding of what `reverse`
	means so I would prefer not to assume that we can always do that.
	This is why I mention this property test last.

	The strategy of constraining possible implementations also leads to the
	discovery of a natural stopping point for QA: when we cannot think of any
	alternative implementations that pass all tests but are still wrong.


	Part 2: Common mistakes.

	In the previous section, I completely ignored `undefined :: a` or `unsafeCoerce
	:: a -> b`.
	However, these kinds of functions raise a valid concern: given that partial
	functions exist, do we need to do something different?

	First: by evaluating a function with _any_ input, we rule out the `undefined`
	implementation.
	By evaluating a function with _many random inputs_ (and evaluting the result),
	we can (probabilistically) rule out many partial implementations.

	Not every partial code path is equally likely to be taken or to exist.
	However, we can use a bit of algebra again to predict likely occurrences of
	partiality.

	Let's look at a function of type `A -> B` again. Traditional modern maths has
	defined this to mean that for every `a :: A`, there exists a `b :: B`, but in
	computer science this is nonsense of course. something something halting
	problem. But how do we find these `a :: A` values for which there is no such `b
	:: B` but a crash instead?


	*This is where I need your help, I have found only one obvious way, but I would
	be very interested in finding more.*


	Consider a function `f` of type `A -> B` such that we expect there to be an
	function `g` of type `B -> A` such that `g (f x)` equals `x` for every `x :: A`.
	For this to be possible at all, mathematics says that `f` and `g` must be
	bijective. However, in computer science `f` only has to be injective.
	Now, we know many examples of such `f` and `g` where `f` is injective and `g` is
	partial: `parseJSON` and `toJSON` form a nice example.

	My claim is that whenever you have a pair of functions `f :: A -> B` and
	`g :: B -> A` where `f` is injective, it is certainly interesting to investigate
	`g` further.
	(You can easily see that `f` is injective if we expect a `g` to exist and `B`
	is larger than (or equal to) `A`.)
	This is because `g` must now necessarily be partial.
	Deserialisation functions should solve this problem by evaluating to values of
	type `Parser A` instead of `A` so that they can naturally handle failure.
	However, we typically find that programmers make assumptions about what the
	values of type `B` look like when these values are being parsed and these
	assumptions cause typical mistakes.