LittleHelicase/datastructures_in_dataflow.md

## datastructures_in_dataflow.md

      
    Raw
  

              datastructures_in_dataflow.md
            
          
    Defining data structures in data-flow diagrams

We have seen, how we can define functions in data flow diagrams, but can we go further and define complex data structures using only the essential language elements of data-flow diagrams. Well as it turns out: yes, you can define algebraic data types in a day to day data-flow diagram and I want to show you how it is done.
But first a note on algebraic data types (ADT). If you don't know what those are, don't feel bad, they are rather simple. I will first show you how we define ADTs and what their benefits are, before I will introduce them into data-flow diagrams. If this seems rather odd, why would anybody want to do this, I can tell you, that it taught me a lot about the structure of data and how they work hand in hand with functions. I hope to also discuss how to create a powerful meta-language and the requirements to create those are so fundamental, that they apply to every language even our data-flow language.
So let's start with a simple definition of an ADT.
data Bool = True | False
This is how you can create an ADT Bool in Haskell. The only keyword here is data the rest consist of valid identifiers for the name (here Bool) and the two so called constructors (here True and False). I will explain shortly, why they are called constructors. One important syntactical element is the | that tells Haskell, that the type consists of two disjoint parts. You can easily extend an ADT with more options like data BoolNoLEM = True | False | Undecided where we add a third option to the "bool" that does not require the Law of excluded middle (which roughly states: everything must either be true or false). Let's have a look at another common ADT:
data MaybeInt = Nothing | Just Int
You might have seen this one (probably in a more general definition, but this will suffice for now). The only structural difference to the Bool type is, that the second case Just has a type after the constructor name. One important thing to note is, that the first name for each | case is special. It is the name of the resulting constructor, followed by zero or more data arguments. Let's look at how to create those data types.
let b1 = True -- define a Bool using the True constructor
let b2 = False -- define a Bool using the False constructor
let m1 = Nothing -- define a MaybeInt using the Nothing constructor
let m2 = Just 4 -- define a MaybeInt using the Just constructor
You might wonder that Just 4 is written without any parenthesis. Well Haskell is especially friendly to parenthesophobic people, if you don't like parenthesis, you might like Haskell. One neat thing that happens in Haskell is, that without parenthesis a nullary function is the same as a constant. I'm pointing that out, because our ADTs create functions for each case (remember cases are created by |). So True is a nullary and in other languages you would probably write True() to create an instance of it. And to create m2 in some languages you would have to write Just(4). And as those create elements of the defined type, they are regarded as constructors.
Okay, so one important thing about ADTs is, that they define functions that create them, the so called constructors. So besides the name, they only define functions. And we can define functions in our data-flow diagrams. So let's start with that.
The first thing to stumble over is, how would we ever create a kind of or or a disjoint union. Well we can create pairs of data simply with two parallel wires ↓ ↓ (going down). But we want to represent data that is either one thing or the other without knowing which one it will be. If we have a Bool as input we want to have a function that can handle a True and  a False value.
As you probably know from earlier, I forbade to use branches to prevent unwillingly creating implicit state. But how would we process a True or False value without a branch? Let's start with a very basic example, the function oneOrZero that returns 1 for True and 0 for False. It would somehow have a diagram that looks like this:
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾↓‾‾‾\
|oneOrZero      |    |
|               |    |
|      1    0   |    |
|        \   \  |    |
|          \  \ |    |
|         ( magic )  |
|             |      |
\_____________↓______/

And we want to know, how to do the magic part. It must look like this, as we need to create the values 0 or 1 somewhere. And even if we do not need both, we can always discontinue their wire. So let's do this for the True case.
            /‾‾‾‾‾‾‾\
            | True  |
            \_______/
                |
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾↓‾‾‾\
|oneOrZero      |    |
|               |    |
|      1    0   |    |
|        \   \  |    |
|          \  \ |    |
|         ( magic )  |
|             |      |
\_____________↓______/

/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|oneOrZero* /‾‾‾‾‾‾‾\|
|           | True  ||
|           \_______/|
|               |    |
|      1    0   |    |
|        \   \  |    |
|          \  \ |    |
|         ( magic )  |
|             |      |
\_____________↓______/

/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|oneOrZero*          |
|      1    0        |
|        \   \       |
|          \  #      |
|            \       |
|             |      |
\_____________↓______/

The False constructor must somehow disconnect the other wire. So if you think about it, the disconnect must happen inside the True or False constructor (because that's how they are different). Okay, we have two wires that could potentially be disconnected inside the True and False constructors, and we need the not disconnected wire as our result. Let's give it a try.
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
| True              |
|       /‾‾‾‾‾‾‾\   |
|       | /‾‾‾\  |  |
|       | |   |  #  |
\_______↑_↑___↓_____/

here wir simply disconnect the first wire. Our false would look like this:
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
| False             |
|       /‾‾‾‾‾‾‾\   |
|       | /‾‾‾\  |  |
|       | |   #  |  |
\_______↑_↑______↓__/

It simply disconnects the other wire. Neat. So True and False both produce three "output" wires (two upwards, one downwards). Let's try plugging that into our oneOrZero* example from above:
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|oneOrZero* /‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\ |
|           | True              | |
|           |       /‾‾‾‾‾‾‾\   | |
|           |       | /‾‾‾\  |  | |
|           |       | |   |  #  | |
|           \_______↑_↑___↓_____/ |
|                   | |  /        |
|                   \ | /         |
|                    ↑↑↓          |
|      1    0        |||          |
|        \   \______/ /|          |
|          \_________/ |          |
|                      |          |
\______________________↓__________/

To make it obvious, remove the border, collapse the disconnected wire and remove the wiggle.
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|oneOrZero*                       |
|                                 |
|                   /‾‾‾‾‾‾‾\     |
|                   | /‾‾‾\  |    |
|                   | |   |  #    |
|                   ↑ ↑   ↓       |
|                   | |  /        |
|                   \ | /         |
|                    ↑↑↓          |
|      1    0        |||          |
|        \   \______/ /|          |
|          \_________/ |          |
|                      |          |
\______________________↓__________/

/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|oneOrZero*                       |
|                                 |
|                                 |
|                   #             |
|                   |             |
|                   ↑ /‾‾‾\       |
|                   | |  /        |
|                   \ | /         |
|                    ↑||          |
|      1    0        |↑↓          |
|        \   \______/ /|          |
|          \_________/ |          |
|                      |          |
\______________________↓__________/

/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|oneOrZero*            |
|                      |
|      1    0   
|        \   \         |
|          \  #        |
|           |          |
\___________↓__________/

That looks exactly, like our magic from above. You can check for yourself, that False indeed selects the value 0 and disconnects 1.
Now we are interested in what happens, if we have more than two cases. For that we have a look at our BoolNoLEM type. If we simply extend our oneOrZero example to a twoOneOrZero example, we notice, that we need three values and only one gets through. In total that would give us four wires. I leave it as an exercise to draw the data-flow diagram for the three constructors of BoolNoLEM (you probably can already picture the solution to this one. If not give it a try). You will notice, that we get one upwards wire for every case in the data type.
Now, before we get to the MaybeInt example lets look at an ADT where we have only one case (or no | in Haskell notation) and where we store some value in it. Let's call it Value1 and in Haskell you would define it as data Value1 = Value1 Int. So we simply store one integer value in it (btw. it is not problematic in Haskell to give the constructor the same name as your data type). Now, how can we access the value inside this data type? We start by creating a rather simple example, simply getting the value out of this type. It should look like this:
                5
                | <- create the Value1 ADT with the value 5 stored in it.
            /‾‾‾↓‾‾‾‾\
            | Value1 |
            \________/
                |
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾↓‾‾‾\
|extractVal     |    |
|               |    |
|     ( ? )     |    |
|        \      |    |
|          \    |    |
|         ( magic )  |
|             |      |
\_____________↓______/
              | <- we want to get a 5 here.

Here it is obvious, what the effect of the magic operation paired with the ? should be. Let's make it trivial again.
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|extractVal*    5      |
|               |      |
|           /‾‾‾↓‾‾‾‾\ |
|           | Value1 | |
|           \________/ |
|               |      |
|     ( ? )     |      |
|        \      |      |
|          \    |      |
|         ( magic )    |
|             |        |
\_____________↓________/

Boom!
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|extractVal*    5      |
|               |      |
|               |      |
\_______________↓______/

Well, I must admit, the last step was rather, abrupt. The combination of the Value1 constructor, the ? and the magic must simply cancle out. Notice how I didn't disconnect the ? like in the other examples. I want to make it is obvious that this would not be the right choice. Consider a function constant0 that takes a value of type Value1 and always outputs 0. I want to encode that in ?, which must look like this:
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾↓‾‾‾‾‾‾\
|constant0      |Value1|
|       0       |      |
|        \      |      |
|          \    |      |
|         ( magic )    |
|             |        |
\_____________↓________/

And magic!
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾↓‾‾‾‾‾‾\
|constant0      |      |
|       0       |      |
|        \      |      |
|          \    #      |
|            \         |
|             |        |
\_____________↓________/

Here we ignore the value of type Value1 and simply pass 0 to the output. Now let me transform that graph a little bit.
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾↓‾‾‾‾‾‾\
|constant0      |      |
|       0       |      |
|        \    /‾|‾\    |
|          \_/  #  |   |
|                 /    |
|               /      |
\______________↓_______/

                |
              /‾|‾\
/‾‾‾‾‾‾‾‾‾‾‾‾‾↑‾↓‾‾↓‾‾‾\
|constant0    | |  |   |
|       0     | |  |   |
|        \    | |  |   |
|          \_/  #  |   |
|                 /    |
|               /      |
\______________↓_______/

Now I let you guess, what could be on the disconnected wire there. A hint, we didn't use the value packed into our value of type Value1.
Okay that was a rather specific hint. And yes, we have a loop up there in our constructor and we get the stored value out of it. Our Value1 constructor might look like this:
                |
      /‾‾‾‾‾‾‾‾‾↓‾‾‾‾\
      |Value1 /‾|‾\  |
      \_______↑_↓_↓__/
              \ | /
               |||

Neat and with that, we can solve our extractVal problem.
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|extractVal*    5      |
|               |      |
|     /‾‾‾‾‾‾‾‾‾↓‾‾‾‾\ |
|     |Value1 /‾|‾\  | |
|     |       \ | /  | |
|     \________↑↓↓___/ |
|              |||     |
|        /‾\   ||\     |
|        \  \_//  |    |
|          \__/  /     |
|              /       |
|             |        |
\_____________↓________/

Remove the boundaries and let's untie it.
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|extractVal*    5      |
|               |      |
|               ↓      |
|             /‾|‾\    |
|             \ | /    |
|              ↑↓↓     |
|              |||     |
|        /‾\   ||\     |
|        \  \_//  |    |
|          \__/  /     |
|              /       |
|             |        |
\_____________↓________/

/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|extractVal*           |
|                      |
|                      |
|             /‾‾‾\    |
|             \   /    |
|              ↑ ↓     |
|              | |     |
|        /‾\   | \     |
|        \  \_/   |    |
|          \__5  /     |
|              /       |
|             |        |
\_____________↓________/

/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
|extractVal*           |
|             5        |
|             |        |
|             |        |
\_____________↓________/

Exactly, what we wanted, perfect! Btw. it is no coincidence, that this looks like some knot-puzzle. In fact identifying knots in some rat's nest of cabling ist very similar to running a program. We will see, that there are some physical limitations that hinder us from building turing-complete knot-puzzles, but theoretically it all works out.
Now we want to look at data Value2 = Value2 Int Int (a simple pair of integers). The constructor simply looks like this:
                ||
      /‾‾‾‾‾‾‾‾‾↓↓‾‾‾‾\
      |Value2 /‾||‾\  |
      \_______↑_↓↓_↓__/
              \ || /
               ||||

And if we want to get the second value out of it, this would simply look like this:
                ||
      /‾‾‾‾‾‾‾‾‾↓↓‾‾‾‾\
      |Value2 /‾||‾\  |
      \_______↑_↓↓_↓__/
              \ || /
               ||||
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾↑↓↓↓‾‾‾‾\
|extractSnd    ||||    |
|              |#||    |
|              | ||    |
|        /‾\   |/ |    |
|        \  \_//  |    |
|          \__/  /     |
|              /       |
|             |        |
\_____________↓________/

But this time, I let you untie this mess (or follow the wires). The # operation usually would be in the place where the ?  was, but I'm rather certain, that you get the gist of it. It seems that we get one extra downward wire for every value that is stored inside the datatype.
Now it's time for our MaybeInt example. If we look back at the two rules we found in the simpler examples:

we get one upwards wire for every case
we get one extra downward wire for every value

So we would expect that for data MaybeInt = Nothing | Just Int would create two upwards going wires for the two cases, one downward going wire for the Int value stored in the Just case and the one downward going wire that each data type has (the result of any computation with this data type). So I would suggest the following constructors:
/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\    /‾‾‾‾‾‾‾‾‾‾↓‾‾‾‾‾\
|Nothing         |    |Just      |     |
|       /‾‾‾\    |    |      # /‾|‾\   |
|      | # # |   |    |      | | | |   |
\______↑_↑_↓_↓___/    \______↑_↑_↓_↓___/

The Nothing is very similar to the False case, but it has the extra wire that otherwise would transport the value stored in Just. I will come to why in a second. The Just case looks much like Value1 but it also has the disconnected wire that is used in the Nothing case. That is the exact same setup like in the True and False constructors. But why do we need the extra wire in the Nothing case? Well if we write a function that has a MaybeInt value as its input, the signature of the input must fit both constructors. The Nothing constructor and the Just constructor both create a MaybeInt type and it must be possible to accept both, a Nothing value and a Just value. I will show you, how to use this in a rather simple example, where we return 0 in the Nothing case and the value stored in Just otherwise:
            |||| <- some MaybeInt value
/‾‾‾‾‾‾‾‾‾‾‾↑↑↓↓‾‾‾\
| /‾\  0   /// |   |
|  \ \  \_///  |   |
|    \ ‾‾‾ /   |   |
|     ‾‾‾‾‾    |   |
\______________↓___/
               | <- a "simple" value 

                                     5
                                     |
    /‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\    /‾‾‾‾‾‾‾‾‾‾↓‾‾‾‾‾\
    |Nothing         |    |Just      |     |
    |       /‾‾‾\    |    |      # /‾|‾\   |
    |      | # # |   |    |      | | | |   |
    \______↑_↑_↓_↓___/    \______↑_↑_↓_↓___/
            \|/ /                 \|/ /
            ||||                  |||| 
/‾‾‾‾‾‾‾‾‾‾‾↑↑↓↓‾‾‾\  /‾‾‾‾‾‾‾‾‾‾‾↑↑↓↓‾‾‾\
| /‾\  0   /// |   |  | /‾\  0   /// |   |
|  \ \  \_///  |   |  |  \ \  \_///  |   |
|    \ ‾‾‾ /   |   |  |    \ ‾‾‾ /   |   |
|     ‾‾‾‾‾    |   |  |     ‾‾‾‾‾    |   |
\______________↓___/  \______________↓___/

You can hopefully check yourself, that we get a 0 on the left and a 5 on the right (some mazing skills required).
Well there is one important thing left to do. We want to get rid of all the wires (or at least most of them), but we do not want to lose the generality. I will start with a short notation for our wires. We have two types of wires, upwards wires and downward wires. In a first step I want to introduce a short notation if we have multiple arrows. I will add a number after the arrowhead (right of it) that indicates the number of arrows pointing in the direction of the arrow and a number before the arrowhead (to the left) indicating the number of arrows pointing in the opposing direction.
           |||| ||||||           ||| ||||
  4↑6   =  ↓↓↓↓ ↑↑↑↑↑↑ ,  3↓4  = ↓↓↓ ↑↑↑↑
           |||| ||||||           ||| ||||

When combining the wires we automatically order the wires by direction. First all downwards wires then all upwards wires. If the number of wires is clear from the context, I will omit them. Sometimes it is not clear, or we do not care how many wires there are and we simply ignore the number of wires.
This sounds a bit weird, but it is a necessary part of our abstraction mechanism later on. For example, we always used numbers as a given, but if we want we could represent numbers by up and down wires. I did decide to not inform you about this and I will ignore it until later on (because we are not ready yet). Now we need some way of combining multiple wires and separating them again. I will use the operators ÷ to divide a cable into its cables and × to combine cables. So our example above could look like this.
    /‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
    |Nothing         |
    |       /‾‾‾\    |
    |      | # # |   |
    \______↑_↑_↓_↓___/
            \|/ /
            ||||
            ( × )
              |
             2↓2
              |
/‾‾‾‾‾‾‾‾‾‾‾‾‾|‾‾‾‾\
| box       ( ÷ )  |
|           ||||   |
| /‾\  0   /// |   |
|  \ \  \_///  |   |
|    \ ‾‾‾ /   |   |
|     ‾‾‾‾‾    |   |
\______________↓___/

Convinced... ?
Me neither. Our syntax is still not helping us. We know how to generate our ADTs and I honestly do not want to draw those loops all the time. So we also should hide that. Furthermore I would like to clarify the inner part of our box. How about the following?
        /‾‾‾‾‾‾‾‾‾‾\
        |Maybe/    |
        | Nothing  |
        |          |
        \____2↓2___/
              |
/‾‾‾‾‾‾‾‾‾‾‾‾‾|‾‾‾‾\
| box         |    |
|             |    |
|  /‾\   0    |    |
|  ↑ ↓   ↓   2↓2   |
| ([=J][=N] match) | <- pattern match Just via =J and Nothing via =N
|         ↓        |
\_________|________/
          |

Here we create the Nothing value indicating that it belongs to Maybe (this will be rather important later on). I indicated, that our Nothing constructor creates 2↓ and 2↑. Then we get to a match that defines the cases for our constructors. I drew the arrowheads for the cases and the output. On the right we get an wire with 2↓ and 2↑ (that is our value that we want to run our pattern matching against). Then we have =J and =N inside the match in brackets that indicate our cases Just and Nothing. Those make sure, that the correct wires are redirected accordingly. In the end we have one output wire ↓. Now if you count we have the 2↓ and 2↑ wire. We have 1↑ wire and 2↓ wires in our match-cases and 1↓ as our return value. If you add those we get 1↓ and 3↑. That somehow doesn't add up.  You should pause for a moment and look at the full-detail version of this if you don't see why this is.
Well the answer ist simple. The pattern matching cases are going up, whereas the return value goes down. We have to flip all pattern matching cases before adding those to the return value (because they literally change direction for those cases in the match box).
Well there is much more to this, but for many data structures we need the fixpoint combinator (Y-combinator) on functions and data types. After I introduce the Y-combinator on functions, I will show how to simply extend it to data types and there we will see the first beautiful result of this theory, so stay tuned ;). Another important step is removing the implicit definition of the data types like Maybe. We want to explicitly define those types in our language and work with those definitions. I will tell you, why we want this in one of the next posts.