Skip to content

Instantly share code, notes, and snippets.

@mbilokonsky
Last active November 13, 2018 22:49
Show Gist options
  • Save mbilokonsky/ebea6bac180f4c3b57eed5b1fc5e170e to your computer and use it in GitHub Desktop.
Save mbilokonsky/ebea6bac180f4c3b57eed5b1fc5e170e to your computer and use it in GitHub Desktop.
An introduction to Functions As Relations

We're Gonna Need a Bigger Table

 ...A thing or idea seems meaningful only when we have several different ways to represent it — different perspectives and different associations. ...Then we can turn it around in our minds, so to speak: however it seems at the moment, we can see it another way and we never come to a full stop. In other words, we can 'think' about it. If there were only one way to represent this thing or idea, we would not call this representation thinking. - Marvin Minsky

Today I'd like to show you a way to model functional programming concepts as operations on a single infinite table, reducing complex n-dimensional concepts into a more familiar two-dimensional space.

Remember that this is just a modeling exercise - I'm not necessarily making ontological claims about programming, I'm simply trying to explore a new vocabulary and model for reasoning about computation.

Defining Terms

A tuple is a collection of values organized into a data structure. For instance, ["myk", 34, 0x0000FF] is a tuple - it says "these three values are somehow related". The exact nature of that relationship is unknown without more context, but the important thing is that we can model any set of values in a fixed order as a tuple.

A relation is perhaps more commonly known as a table - it's a set of tuples where each tuple has the same structure, and that structure reflects some specific semantics. For instance, our example tuple above may be a row from a table that looks like this:

Name Age Favorite Color
"myk" 34 0x0000FF
"han" 29 0x00FF00

A function is defined as a relation between some set of inputs (defined as some subset of the columns in our table) and some set of outputs (defined as some other subset of the columns in our table). In the simplest case, we can have an identity function - its output is its input.

Input Output
1 1
"abc" "abc"

Each row in our "identity function" table is a single call to the function with a specific input argument. It returns a specific output argument. To flesh this out a bit, a slightly more complex function may look something like this -

input1 input2 output1
1 1 2
7 3 10

It looks like we may be looking at a 'Sum' function here, where input1 is the augend, input2 is the addend and output is the sum. We'll get into more details about this later, but I want to make sure that it makes sense that you can represent a function as a table, aka as a relation of tuples.

A data type can be understood as the set of all values that satisfy some constraint. It's really a predicate function - given some input X, is X in the set of acceptable values? Some common types are things like Integer or String or Boolean, and applications will often introduce new types like User or Session.

In our functions-as-tables metaphor, data types are used to constrain the size of our tables by restricting the number of valid values for each column. For instance, if we specify that the three columns in our sum table above all have the Integer data type then we won't have any rows where we're adding null to "Hello world!" - those are simply invalid rows, and logically cannot exist in our (still infinite) table.

Abstraction, in this model, has a very specific definition: it's when you add a new column to your table. For instance, we can create a more abstract version of the sum function by adding an operator column. The resulting math function looks like this:

input1 input2 operator output
1 1 + 2
1 1 - 0
1 2 + 3

Concretion is what we'll call the dual of Abstraction, in this model - rather than adding a column, we're removing a column. We do that by "partially applying" a value to the function - restricting the size of the function, much as we did with data types, such that one of our columns has a data type that has only a single value in it. So for instance, if we wanted to create a multiply function we could partially apply the * operator into the Math function to create a new function:

input1 input2 output
1 1 1
7 3 21

By now you should be thinking "Huh... that sure seems like sum, multiply and math are all just different views into the same abstract space..." and you'd be 100% correct! There are interesting consequences of this that we'll get to shortly!

Finally, a higher order function is a function where one or more of the columns has the data type function. In other words, every value for one of our columns becomes a whole table! As an admittedly forced example, consider this greetUser() function which takes as inputs a user's first name, a user's last name, and a function used to get the appropriate greeting for the current time of day.

First Name Last Name getGreeting() Output
Myk Bilokonsky
Current Time Greeting
5-12 "Good Morning"
12-6 "Good Afternoon"
6-9 "Good Evening"
9-5 "Good Night"
??!!

At first glance it seems like there's a problem here - if each row of one of our tables is an invocation of the represented function, what do we do if our row contains a nested table? How do we represent that idea? It seems like our output column can't have a value until we step into this nested space?

But, thanks to the magic of math - I think, can anyone give me some links to formal references to this kind of thing? - we can simply change how we represent the problem. A critical rule here is that the columns of a nested table can be pulled into the parent table. The function above can be rewritten as follows:

First Name Last Name Current Time Greeting Output
Myk Bilokonsky 5-12 "Good Morning" "Good Morning Myk Bilokonsky"
Myk Bilokonsky 12-6 "Good Afternoon" "Good Afternoon Myk Bilokonsky"
Myk Bilokonsky 6-9 "Good Evening" "Good Evening Myk Bilokonsky"
Myk Bilokonsky 9-5 "Good Night" "Good Night Myk Bilokonsky"

etc and so on. Now our specific invocation of the function maps cleanly to a row in our relation.

Inputs vs Arguments, Outputs vs Return Value - when we model functions in our code we generally think about our function's arguments as its inputs, and its return value as its output. But as anyone who has ever tried to shoe-horn formal logic into an impure (read: existing) codebase, our functions often live within the scope of some existing state and they often mutate that state as a side-effect when invoked.

This is fine - but it means that we need to think of things like "current time" as an input, even when it's not an argument. It's simply a property of the environment at execution time. Similarly, a function can, say, write some values to a database without actually returning anything. That change of database state must be included when we discuss the "output" of the function.

So What?

So far we've seen how functions can be modeled as relations, where each column can be constrained by a data type and where we can ultimately treat every function call as a lookup in an infinite table. add(1, 7) can be expressed as something like SELECT output FROM sum WHERE input1=1 AND input2=7, right?

(Interestingly, it means we can also do something like SELECT input1 FROM sum WHERE input2=7 AND output=8 - we get, at least conceptually, the full implications of using immutable relations.)

To summarize:

  • Functions can be modeled as relations.
  • Data Types can be used to restrict valid rows in our relations.
  • Abstraction and Concretion can be treated as formal processes whereby columns are added or removed from our relations.
  • Higher Order Functions work when you realize that they are simply a mechanism to express abstraction, adding columns to our table.
  • Inputs are not limited to arguments, outputs are not limited to return values. We can easily model side-effects here.

Subsequent essays on this topic will explore complex, abstract ideas from functional programming in this simplified but expressive space. We'll look at what functors and monads and applicatives are, we'll explore currying and some of the ramifications of abstraction, and we'll look at how once you go down this rabbit hole of modeling functions as relations you quickly run into the much larger rabbit hole of modeling applications as relations - and then, just maybe, we'll explore what it means to think of the universe itself as a single enormous function.

@dmoney
Copy link

dmoney commented Nov 3, 2017

I wrote some code to play around with this concept: https://gist.github.com/dmoney/104ab11ba6393499a6cb045bc4b2e52d

@leotrs
Copy link

leotrs commented Mar 29, 2018

RE "can anyone give me some links to formal references to this kind of thing?".
If you think of a table as a subset of the Cartesian product of the sets of possible values of each column (so that the very first table is a subset of {"myk", "han"} x {34, 29} x {0x0000FF, 0x00FF00}). So when you have a higher order function, the sixth table of this essay is a subset of {"Myk"} x {"Bilokonsky"} x ({5-12, 12-6, 6-9, 9-5} x {greeting1, greeting2, greeting3, greeting4}) and the seventh is a subset of {"Myk"} x {"Bilokonsky"} x {5-12, 12-6, 6-9, 9-5} x {greeting1, greeting2, greeting3, greeting4} (note the lack of parens). There is a natural bijection between these two last products, from which follows your comment on the equivalence of representing the relation as either a nested table or a "regular" table, where all rows in the nested table have been distributed.

The generalization is then, to say that because the Cartesian product is associative (you can place parens anywhere), you can group and nest tables any way you want.

@mbilokonsky
Copy link
Author

Finally turning this into a course for egghead.io, for anyone following along at home. This is gonna be fun! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment