Skip to content

Instantly share code, notes, and snippets.

@BrianWill
Last active April 26, 2024 08:09
Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save BrianWill/e122e4cffa000d30f187 to your computer and use it in GitHub Desktop.
Save BrianWill/e122e4cffa000d30f187 to your computer and use it in GitHub Desktop.

Writing Functionally Pure Code

Is all my code going to be pure?

Nope. No program can be composed 100% of pure code because every program must do I/O work, which is inherently stateful. Most programs also need to carry around some amount of internal state, to one degree or another. The goal of functional programming, then, is to maximize the proportion of pure code to impure code in your program.

Pure vs. Impure

What does it mean for data and code to be pure? The short answer is that pure things are immutable, but this is not quite accurate: all pure things are immutable, but not all immutable things are pure. Pure things are not only unmodifiable but also definitional.

1) An immutable object on the heap is always itself pure, even when variables which reference the object are impure.

2) A global variable is pure only when it is not just immutable but also initialized by a pure expression (an expression which involves only pure variables, pure functions, and pure objects).

const STARTUP_TIME = getTime();   // immutable but impure global variable
const PI = 3.1415;                // pure global variable

Impure global variable STARTUP_TIME gets its value from an impure function, and so STARTUP_TIME will have different values from one run of the program to the next. In contrast, the pure global variable PI is definitional: it has the same value in every run of the program.

3) The same rule applies to local variables: a pure local variable is immutable and initialized by a pure expression, and so it will have the same value in every call to the function which is given a particular set of arguments, e.g. for every call of foo with the arguments 3 and 5, the pure local variables of foo will have the same values.

4) A pure function is referentially transparent: for a given set of arguments, a pure function always does the same thing every single time, no matter what. The stateful context of the program and the world outside the program never affects any calls to pure functions. So for a function to be pure, its arguments and return value must also be pure. Furthermore, a pure function only uses pure things from enclosing scopes:

// for alice to be pure, ...
function alice(x, y) {     // the arguments passed to x and y must also be pure
    var z = foo(bar);      // foo and bar from enclosing scope must both be pure
    return x + y + z;      // the return value must be an immutable object or a pure function
}

In short, a pure function neither reads nor modifies any state outside of itself. As we say, a pure function does not produce side-effects nor rely upon side-effects. So pure functions cannot do any I/O work! They may not read or write files, send or receive data on the network, get user input, draw on the screen, etc. So a pure function is always called only for the value it returns.

Perhaps surprisingly, each call to a pure function may internally have its own state, so long as the state is confined to the individual call. A pure function's local variables may be mutable, and the function may create and use mutable objects, so long as those mutable objects are not passed to other functions or returned.

However, because stateful logic is generally harder to understand than stateless logic (even when the state is confined to a single function call), the preferred style in functional programming is to not use mutable objects or mutable local variables in our functions—even our impure functions! When we do occasionally use mutable state in our functions, it is only for the sake of optimization.

How do we do work using only immutable objects?

In most languages today, numbers, strings, booleans, and the value null are already immutable. But what about collections and class instances? How can we make them immutable in a way that's reasonably efficient, i.e. that doesn't require us to create a whole copy of a collection every time we wish to modify it?

Enter 'persistent' collections. A persistent collection is an immutable collection implemented in such a way that we can create derivative copies that share in memory everything they have in common with the original. For example, given a persistent list, we can create a derviative copy of the list that is just like the original but has one more element appended on the end. Because the derivative copy and the original share in memory every element except one, creating this derivative is relatively cheap, in terms of both memory usage and clock time.

Facebook publishes a library of persistent collections for Javascript called Immutable.js:

// `npm install immutable`
var im = require('immutable');

var m1 = im.Map();                    // immutable map with no items
var m2 = im.Map({a: 1, b: 2, c: 3});  // immutable map with items a: 1, b: 2, c:3 
var m3 = m2.set('bert', 5);           // immutable map with items a: 1, b: 2, c: 3, bert: 5
var m4 = m2.set('ernie', -7);         // immutable map with items a: 1, b: 2, c: 3, ernie: -7
var m5 = m4.set('b', "hi");           // immutable map with items a: 1, b: "hi", c: 3, ernie -7

m2.get('b');                          // 2
m5.get('b');                          // "hi"

var l1 = im.List();                   // immutable list with no items
var l2 = im.List([1, 2, 3, 4, 5]);    // immutable list with items 1, 2, 3, 4, 5
var l3 = l2.set(0, "hi");             // immutable list with items "hi", 2, 3, 4, 5
var l4 = l2.push(-9);                 // immutable list with items 1, 2, 3, 4, 5, -9
var l5 = l2.remove(1);                // immutable list with items 1, 3, 4, 5

// is()  deep equality
im.is(l2, l4)                          // false
im.is(l2, l4.remove(5))                // true

While you could store mutable objects (JS arrays and objects) in these immutable structures, that would make them no longer immutable, so it's something we generally don't do.

Every collection type implements the methods first() and rest().

For ordered collections, a call to first() returns the first element of the collection, and a call to rest() returns a new collection containing everything but the first element.

For unordered collections, like maps, which element is returned by first() is determinisitic but undefined, so we shouldn't expect any particular element when calling first() on an unordered collection. Calling rest() on an unordered collection returns a new collection containing everything except the element returned by first().

For empty collections, first() returns undefined, and rest() returns an empty collection of the same type.

l3.first();                // "hi"
l3.rest();                 // immutable list with items 2, 3, 4, 5

l1.first()                 // undefined
l1.rest()                  // an empty list

// let's say the 'first' of m3 happens to be the value of key 'a'
m3.first()                 // 1
m3.rest()                  // a new immutable map with items b: 2, c: 3, bert: 5

m1.first()                 // undefined
m1.rest()                  // an empty map

Using first() and rest(), we can iterate through any collection:

// iterating through every element of a list
var li = im.List([1, 2, 3, 4, 5, 6]);
while (li.count() > 0) {
    foo(li.first());
    li = li.rest();
}

For maps, we may sometimes wish to iterate through the keys as well as the values, so we have these methods:

m3.values();             // an ordered collection of the values of the map
m3.keys();               // an ordered collection of the keys of the map
m3.entries();            // an ordered collection of the key-value pairs of the map

A lazy sequence is a collection type in which the values are generated when first needed. A Range, for example, is a type of lazy sequence which represents a range of numbers: the Range object doesn't store these values when first created but rather generates them the first time they are accessed:

im.Range(0, 5)               // a Range representing 0 to 4
im.Range(0, 1000000, 2)       // a Range representing 0 to 999999 by step of 2

As we'll discuss later, there are many more kinds of lazy sequences and many more methods for operating on the collection types.

How do we do work using only immutable local variables?

1. Do not assign to variables of enclosing functions.

As we covered earlier, this is a requirement of pure functions, but the rule applies in any function where we want to avoid mutable local variables.

2. Do not put assignments in conditional expressions.
// don't write this
function alice(x) {
    x > 3 ? x = 3 : x = 5;
    return bob(x); 
}
3. Do not use if or if-else.

Instead we do all of our branching with the ternary operator:

function alice(arg) {
    var x;
    if (arg) {
        x = 2;
    } else {    
        x = 5;
    }

    // to find assignment of x used here, we must reason about control flow
    return bob(x);  
}

// instead, we could write this:
function alice(arg) {
    var x = (arg ? 2 : 5);   // only one assignment of x
    return bob(x);  
}

The two branches of a ternary operator must be single expressions, but we can evaluate multiple expressions in each branch with the comma operator:

// call foo() then bar() if x is greater than 3
x > 3 ? (foo(), bar()) : null;
4. Do not use while, do-while, or for.

We instead do all of our iteration with recursion. For example:

// impure loop
for (var i = 0; i < things.length; i++) {
    doStuff(things[i]);
}

// recursive equivalent
function loop (i) {
    (i < things.length) ? (doStuff(things[i]), loop(i + 1)) : null;
}
loop(0);

The general trick is that everything which might change from one iteration to the next becomes a parameter of the function. At some point in the function, we branch, either recursively calling the function to do another iteration or returning when we have no more iterations to do.

By turning our loops into function calls, our loops are expressions rather than statments and so themselves return values:

// invoke foo with value of x last set by the loop
var x = 0;
while (carol(x)) {   
    x = bob(x);      
}
foo(x);

// invoke foo with value returned by the loop
foo(
    (function loop (x) {
        return carol(x) ? loop(bob(x)) : x;
    })(0)
);

(If you're concerned about the function call overhead introduced by doing our iteration with recursive calls, tail call elimination solves this problem.)

So what does following these rules get us?

Well consider the following two functions that use mutable local variables:

// pure
function alice(arg) {
    if (arg) {
        x = 2;
    } else {    
        x = 5;
    }
    return bob(x);  // value of x depends upon control flow
}

// pure
function alice() {
    x = 0;
    while (carol(x)) {   // value of x depends upon control flow
        x = bob(x);      // value of x depends upon control flow
    }
    return x;            // value of x depends upon control flow
}

Even though both of the functions are pure, to understand each use of local variable x, we have to reason about the control flow. In contrast, when we avoid mutable local variables, the value of each variable is always its one and only assignment:

// pure
function alice(arg) {
    let x = arg ? 2 : 5;
    return bob(x);       // value of x determined by the sole assignment to x
}

// pure
function alice() {
    return
        // value of x is fixed during each iteration 
        (function loop (x) {      
            return carol(x) ? loop(bob(x)) : x;
        })(0)
    ;
}

While the parameter x of the loop function does have a different value in each successive iteration, and in that sense, the value of x depends upon control flow, there is however never a question about where in code x gets its value.

We can actually assign to our 'immutable' local variables more than once.

If we follow the four rules above, we can actually assign to a local variable more than once and still get essentially the same benefits. Consider these three logically equivalent functions:

// this version assigns to each variable only once
function alice() {
    let x = foo();       
    let y = x * 9;      // to understand x, read the only assignment of x
    let z = y - 2;      // to understand y, read the only assignment of y
    return z;           // to understand z, read the only assignment of z
}

// ...but this version assigns to x twice 
function alice() {
    let x = foo();          
    let y = x * 9;      // to understand x, scan up to last assignment of x
    x = y - 2;          // to understand y, scan up to last assignment of y
    return x;           // to understand x, scan up to last assignment of x
}

// this version introduces a subscope which shadows x with its own x
function alice() {
    let x = foo();              
    let y = x * 9;          // to understand x, scan up to last assignment of x
    {
        let x = y - 2;      // to understand y, scan up to last assignment of y
        return x;           // to understand x, scan up to last assignment of x
    }          
}

When we stick to one scope with only one assignment for each variable, we only ever have one place to look for a variable's assignment. But as long as we follow the rules, we can assign to our local variables more than once or shadow them in subscopes and still get nearly the same effect. The only difference is that, for each use of a variable, we find its assignment by scaning up in code to the last assignment for that name. This is only very slightly harder to do and still frees us from having to reason about the control flow. So whether we assign to a variable more than once or instead write the equivalent code with separate, immutable variables is a minor point of style.

Putting it all together

So here's an example of imperative code rewritten in the functional style using only immutable objects and immutable local variables:

// impure: mutable objects and local variables
function alice(list) {
    var newList = [];
    for (var i = 0; i < list.length; i++) {
        newList.push(process(list[i]));
    }
    return newList;
}   

var immutable = require('immutable');
// pure: the parameter expects to receive an immutable list
function alice(list) {
    return 
        (function loop (list, newList) {
            return 
                list.count() > 0 
                ? loop(list.rest(), newList.push(process(list.first())))
                : newList
            ;               
        })(list, immutable.List())
    ;
}

A pure function body always consists of just one expression (or the logical equivalent).

If a function avoids mutable state entirely, then its body will consist of zero or more assignment statements followed by a return statement. Every step creates a value that feeds into a successive step; any step for which this can't be said must be pointless or must be producing side-effects.

Arguably, assignments are just a convenience because we could write any pure function as just one expression statement and get the same result:

// instead of writing this...
function alice() {
    var x = bob();
    var y = david();                 
    return carol(x, y);   
}

// ...we could write this
function alice() {    
    return carol(bob(), david());   
}

In some cases, though, assignments spare us from computing the same value more than once, so they aren't strictly a matter of style:

// this assignment spares us from having to invoke bar twice
var x = bar();
foo(x, x);

(Actually, some functional languages, will automatically optimize duplicate calls by reusing previously computed values. With such optimizations, assignments really are just a matter of style.)

Let's give Javascript some new syntax

Javascript was not designed with functional programming in mind, so following the functional style in Javascript is pretty clumsy and verbose. Let's add some fictional syntax that makes expressing functional code more palatable.

First, let's replace the ternary conditional operator with a new IF operator:

IF(condition, then, else)

// original version
function alice(arg) {
    var x = (arg ? 2 : 5);   // only one assignment of x
    return bob(x);  
}

// version using IF
function alice(arg) {
    var x = IF(arg, 2, 5);
    return bob(x);
}

Next we'll introduce an operator LET that creates and calls an anonymous function that assigns values to zero or more local variables and returns the result of an expression:

LET(assignments, expression)

// original version
(function () {
    let x = 3;
    let y = 9;
    return x * y;
})()

// version using LET
LET(
    x = 3,
    y = 9,
    x * y
)

The FN operator returns a function with zero or more parameters and a body consisting of just one return expression:

FN(params, expression)

// original version
(function (a, b) {
    return a * b;
})

// version using FN
FN(a, b,
    a * b
)

If you we want to do assignments inside our FN operators, we just use a LET operator:

// the value returned by the LET is the value returned by the FN
FN(a, b,
    LET(
        x = a * b,
        y = b + 2,
        x + y
    )
)

To make a recursive call in an FN form, we have the RECUR operator, to which we supply arguments for the recursive call:

RECUR(arguments)

// original version
(function _func_(a, b) {
    return (a < 100) ? _func_(a + 1, a * b) : b;
})

// version using FN, IF, and RECUR
FN(a, b,
    IF(
        a < 100,
        RECUR(a + 1, a * b),
        b
    )
)

(Note that to implement RECUR, we have to give the functions created by FN operations a special, designated name. We'll go with func and hope it doens't conflict with anything. Because this name is used by all FN operations, RECUR can only invoke the FN operation in which it is immediately contained.)

Lastly, to do our looping more conveniently, we introduce one last operator, LOOP, which is exactly just like FN except we specify the parameters as assignments because the created function is immediately invoked with these assigned values:

LOOP(assignments, expression)

To iterate in a LOOP, we use RECUR:

// original version
(function _func_(a, b) {
    return (a < 100) ? _func_(a + 1, a * b) : b;
})(3, 5)

// version using LOOP, IF, and RECUR
LOOP(
    a = 3,
    b = 5,
    IF(
        a < 100,
        RECUR(a + 1, a * b),
        b
    )
)

// LOOP is just a convenience for an FN which gets immediately called:
FN(
    a, b,
    IF(
        a < 100,
        RECUR(a + 1, a * b),
        b
    )
)(3, 5)

Be clear that a RECUR always applies to the LOOP or FN in which it is most immediately contained.

Putting these new operators to use

Here's an earlier example rewriten with these new operators:

// original version
var immutable = require('immutable');
function alice(list) {
    return 
        (function loop (list, newList) {
            return 
                list.count() > 0, 
                ? loop(list.rest(), newList.push(process(list.first())))
                : newList
            ;               
        })(list, immutable.List())
    ;
}

// version with IF
var immutable = require('immutable');
function alice(list) {
    return 
        (function loop (listi, newList) {
            return IF(
                list.count() > 0, 
                loop(list.rest(), newList.push(process(list.first()))),
                newList
            );               
        })(list, immutable.List())
    ;
}

// version with LOOP, RECUR, and IF
var immutable = require('immutable');
function alice(list) {
    return 
        LOOP(
            list = list,
            newList = immutable.List(),
            IF(
                list.count() > 0, 
                RECUR(list.rest(), newList.push(process(list.first()))),
                newList
            )
        )
    ; 
}

// version with FN, LOOP, RECUR, and IF
var immutable = require('immutable');
var alice = FN(
    list,
    LOOP(
        list = list,
        newList = immutable.List(),
        IF(
            list.count() > 0, 
            RECUR(list.rest(), newList.push(process(list.first()))),
            newList
        )
    )
);

Using higher-order functions with collections

In working with collections, functional programming typically makes heavy use of higher-order functions (functions which take other functions as argument). One such function is built into Javascript: apply, a function object method which invokes its function with arguments supplied in an array:

function foo(a, b, c) {
    return a + b + c;
}
var x = [1, 2, 3];

foo.apply(null, x)   // 6  (null is passed to foo as 'this')

The collections of Immutable.js implement a number of higher-order functions. We'll look at the three most essential: map, filter, and reduce.

First, the map method takes a single-parameter function as argument and passes each element of the collection, in turn, to this function; the values returned by these calls get bundled into an ordered collection, which is the object returned by the map method itself. So map returns the collected results of passing each element to the function:

var im = require('immutable');
var x = im.List([1, 2 ,3]);
x.map(function (a) { return a * 3;});    // a list with elements 3, 6, 9

The filter method also takes a single-parameter function and passes each element of the collection, in turn, to this function; the elements for which the function returns a true value are bundled into the ordered collection returned from the filter method. Effectively, filter returns a new collection with only the elements which satsify the predicate function:

var im = require('immutable');
var x= im.List([1, 2, 3, 4, 5, 6, 7, 8, 9]);
x.filter(function (a) { return (a % 2) === 0;});   // a list with elements 2, 4, 6, 8

Example code

Here's a function I originally wrote in Javascript:

Reader.prototype._consolidateCatenae = function _consolidateCatenae(elements) {
    var consolidated = [];
    for (var i = 0; i < elements.length; i++) {
        var element = elements[i];
        if (element instanceof PersistentList) {
            consolidated.push(this._consolidateCatenaeInList(element));
        } else if (element instanceof PersistentMap) {
            consolidated.push(this._consolidateCatenaeInMap(element));
        } else {
            consolidated.push(element);
        }
    }

    elements = consolidated;
    consolidated = [];

    if (elements.length === 1) {
        return elements;        
    }
    for (var i = 0; i < elements.length; ) {
        for (var j = i; j < elements.length - 1; j++) {
            var element = elements[j];
            var nextElement = elements[j + 1];
            if (!this._isAdjacent(element, nextElement)) {
                break;
            }         
        }
        if (j === i) {
            consolidated.push(element);
            i++;
        } else {
            var adjacentElements = elements.slice(i, j + 1);
            i += adjacentElements.length;
            var tokens = [];
            for (var k = 0; k < adjacentElements.length; k++) {
                tokens = tokens.concat(adjacentElements[k].getTokens());
            }
            consolidated.push(new Catena(adjacentElements, tokens));
        }
    }
    return consolidated;
};

And here is the equivalent rewritten in Clojure:

(defn consolidate-catenae [elements]
    (let [[els rest-els]
          (split-at
             (inc
              (count
               (first
                (split-with #(= (first %) (second %))
                            (partition 2 1 (conj elements nil))))))
             elements)
          els (merge els {:contained (map #(consolidate-catenae (:contained %)) els)})]
      (case (count els)
        0 []
        1 (concat (first els) (consolidate-catenae rest-els))
        (concat (->Catena els) (consolidate-catenae rest-els)))))

Not only is the functional implementation less than a third the number of lines, the control flow involved is vastly simpler: the only control flow in the Clojure version is a simple three-way switch. No looping, no nested if's...just a single switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment