mbilokonsky/conservation_of_information.md

## conservation_of_information.md

      
    Raw
  

              conservation_of_information.md
            
          
    I'm kind of making this up but I have this notion that there exists some sort of law of conservation of information in programming. It works like this:
[Total Information In the System] = [computations done by your code] + [work done by your runtime] + [external environment]
Depending on what you're trying to do, it can be helpful and instructive to think about how you shift information around between these three buckets. To illustrate this point I'd like to step through three different implementations of the same program.
Let's do everyone's favorite simple program, a fibonacci number generator. Given some input INDEX, it'll give you the INDEX fibonacci number. So fib(0) = 1, fib(1) = 1, fib(2) = 2, fib(3) = 3, fib(4) = 5, fib(5) = 8, etc.
Computation

In our first example, let's just write a computational function that'll figure this out for us. It'll look something like this lifted from the first google result:
function fibonacci(num, memo) {
  memo = memo || {};

  if (memo[num]) return memo[num];
  if (num <= 1) return 1;

  return memo[num] = fibonacci(num - 1, memo) + fibonacci(num - 2, memo);
}

When we run this program against a general-purpose interpreter we can derive the fibonacci number we need through computation. This is how most of us think about programming, right? This example is JavaScript but it could just as easily be Java or C or Erlang - we describe the computation we care about, and we allow the general-purpose runtime to execute it and return the value that we care about.
Crucially, the person writing this code has to know what a Fibonacci number is and how to derive one. The rules for generating those numbers need to be expressed in our code, correctly. Our runtime has no idea what a fibonacci number is, right? What would it look like if it did?
Runtime

A thing that doesn't get enough attention, I think, is our ability to create custom languages and runtimes that are not general purpose. Most of CS is so wrapped up in universal computability that we ignore the fact that specialized, non-generalizable computability is also possible and in fact useful.
Imagine a programming language called Fib. Its syntax is super simple - you're only allowed to have one token per line of code, and that token has to be an integer. When your source code is interpreted by the Fib runtime, the runtime will treat each line of input as an index, and it will generate the corresponding fibonacci number for that index.
So if our source code looks like this:
0
5
5

and we run that through Fib, it returns
1
8
8

How did it do that? Well, we don't care, right? The information required to perform that transformation is encoded into our runtime, and as developers who only care about getting the fibonacci number associated with a given index it just doesn't matter to us how the ultimate value is derived.
Under the hood it's possible that our runtime is just invoking a function like we saw above - the runtime is just its own software, afterall, right? So the runtime could be deriving our values for us as it runs. But not necessarily, right? Is there a way for us to implement this runtime without having it do any kind of math?
Environment

This is the third pillar of this model. Computation is work, and any time we can avoid doing it we should - there's no reason to burn those picojoules if we don't have to. So how can we avoid computing things when our program executes?
By computing them, once, in advance. Our Fib programming language could be compiled with the first N fibonacci numbers available for constant-time lookup. Similarly, our fibonacci function above could replace its computational work with a constant-time lookup if that data exists (and in fact it does do its own memoization).
But there is also a whole environment that we all exist within out there, and the fibonacci numbers exist any number of places within this environment. Fibonacci in particular is a great example for this because we see these numbers in nature constantly. We could encode them in a text-file but we could also just examine a sea-shell, right, with some basic lookups? Our runtime would need the capacity to measure the spiral in the image we show it, but that's still potentially less computationally intensive than computing, in theory.
So What?

I'm not sure. I'm fascinated by the idea that functions are just relations, relations are just lookups in a table. So in a sense all of this collapses because at the end of the day even computation is just a set of lookups.
So maybe another way to think about this is to say that if we get clever with memoization we can write much less code, which leads to fewer bugs and less energy consumption.
Thoughts?