Skip to content

Instantly share code, notes, and snippets.

@jsturg
Forked from tomekowal/pure-impure.md
Last active January 18, 2017 21:53
Show Gist options
  • Save jsturg/6d665ea5439a5108695d31c453055ca9 to your computer and use it in GitHub Desktop.
Save jsturg/6d665ea5439a5108695d31c453055ca9 to your computer and use it in GitHub Desktop.
Pure vs impure

Pure vs impure and why should I care?

Some of the most important advancements in programming came from adding restrictions to the way we program. Famous article by Edsger Dijkstra Go To Statement Considered Harmful shows how introducing one statement into programming language breaks many key properties of that language. Using goto gives us freedom and solves a couple of simple problems like breaking from nested loops easily, but it can make programs very hard to understand.

This happens, because we cannot look at two pieces of code in separation. The execution can jump to some other place in code or even worse: it can jump from another piece of code to here and we need to keep that in mind.

This concept is very similar to global variables, we know we shouldn't use them, because it tightly couples every piece of code that uses it. Let's say you wrote a function that needed to compute something using global variable without modifying it, but modified it by accident. It is possible that bug in your program will manifest in completely different piece of code and that buggy function would appear to work fine in tests.

It is the same with design problems as with goto, we cannot look at two pieces of code in separation. Global state is considered harmful.

Let's make a bold move! Let's consider all states are harmful!

Pure functions

Pure functions by definition always return the same value for given input.

Example:

def inc(x), do: x + 1

Pure functions can't depend on or modify application state.

For example this is not a pure function:

def print_value(x), do: IO.inspect(x+1)

Given the same argument, it will always return the same value, but it also has a side effect of printing that value. Pure functions have a couple of really nice properties:

Easy to understand

We can read pure function in isolation from rest of the code. There is no external state we need to worry about. No other piece of code can influence return value of pure function.

Easy to compose

Programming by changing state is called imperative programming and it is often prone to errors. For example, it is common to first create an element and operate on it. It is hard to do that in functional language, so my next example will be in JavaScript.

Let's say we created a piece of HTML:

<p class="blue_box">Some text</p>
<a href="#">Link</a>

And we need to make the background of that paragraph blue using the class:

$(".blue_box").css('background-color', 'blue');

Later we decide we want all links to be blue so in another place we add this:

$("a").css('color', 'blue');

At the end someone modifies original html like this:

<p class="blue_box"><a href="#">Link</a>Some text</p>

and the link disappeared, because it is blue text on blue background. Such situations remind me of this picture:

two doors

Things appear to be working fine in isolation, but not when combined.

Where should we even fix the bug?

This is quite common and it happens because imperative code describes steps to achieve result and not intention. In functional programming, we would need to know both colors in advance when we create html, so we would need a function that computes them. Colors of element depend on elements type and class and we need to compute both foreground and background. Let's say that as before we first want to apply blue background only.

@spec colors(class :: string) -> background_color
def colors("blue_box") -> "blue"
def colors(_) -> "white"

We use pattern matching on function head to determine background color. Underscore is a "catch all", so the code works like this:

  • first clause tells us that in "blue_box" class everything has a blue background so links won't be an exception.
  • and by default everything has a white background

Next, someone tells us to make all links blue, so our function depends now on both class and type:

@spec colors(type :: string, class :: string) -> {foreground_color, background_color}
def colors(_, "blue_box") -> {"black", "blue"}
def colors("a", _) -> {"blue", "white"}
def colors(_) -> {"black, "white"}

Now it works like this:

  • first clause says that if something is in "blue_box" class it will be black on blue (with no exception to links)
  • second clause says that in other cases links will be blue on white background
  • and everything else will be black on white background

The cool thing is that pure code forced us to come up with solution to a blue link in a blue box problem before we even thought about it. Of course, we could solve in some other way by adding other clause in the beginning:

def colors("a", "blue_box") -> {"yellow", "blue"}

The important thing is that even though it required more code, it is more explicit, simpler and less prone to errors.

Easy to test

Without any external state, we can test our programs using only assertions:

assert return_value = function(arguments)

There is no need for mocking libraries or setting up proper state with before hooks. It’s just functions and data.

Purely functional data structures

Programmers often think about data structures as something that is created to change over time. This may be counterintuitive to discover that purely functional data structures are immutable. Once created, they cannot change. If you need slightly modified versions of the same structure you need to create a new one. However, it is often optimized to be very fast.

Let's take a list as an example. Elixir list is just a singly linked list under the hood. If you declare a list like this:

a = [1, 2, 3]

The memory layout looks like this:

a → 1 → 2 → 3 → NULL

This can never change. After last reference to this list is lost we can discard it entirely, but we can never modify it. What happens if we want to append something to that list?

b = [0 | a]

It creates a new list:

    a → 1 → 2 → 3 → NULL
b → 0 ↗

So we just have to create one element in memory and point to the rest of previous list. Immutability guarantees list a will never change, so we don't have to worry about some other part of code mutating it. This allows for very efficient O(1) prepend operation and is thread safe. Many processes can read the same piece of data without any need for locks or mutexes, because no-one is going to change it.

On the other hand, if we wanted to append at the end of list, we have to copy entire list O(n), because we are not allowed to just change last element:

c = a ++ [4]
a → 1 → 2 → 3 → NULL
c → 1 → 2 → 3 → 4 → NULL

We have to remember that using purely functional data structures is a trade-off. We gain thread safety and easier to maintain programs, but some operations are slower than in imperative programs.

Purely functional languages

The only problem with programs that use only pure functions and purely functional data structures is that they are useless. We can't read user input, because it is a side effect.

Even if we encode all input parameters directly in source code, we can't print to screen or write to database, because it is also a side effect. Even if our program returns some value, we won't know about it.

There are couple of solutions to this problem. Haskell uses Monads for all operations that need side effects. It is a trick that conceptually takes an entire outside world or IO system as a function argument adds modified world as a return value. This way you can include information about all side effects that given function will perform in function type.

So purely functional languages are languages that mainly use pure functions and data structures and mark all non-pure parts explicitly as a danger zone.

Many libraries in Elixir do similar thing. For example - Ecto database wrapper has a concept of Multi which aggregates operations on database. Instead of scattering your functions operating on database throughout your codebase we can use Ecto.Multi which is a data structure that encodes multiple db operations.

Without Multi all functions in example below are impure:

def logic1(), do: add_to_db(...)
def logic2(), do: update_db(...)
def logic3(), do: delete_from_db(...)

def do_stuff() do
   logic1
   logic2
   logic3
end

With Multi we can rewrite it like this:

def logic1(multi), do: Multi.insert(multi, ...)
def logic2(multi), do: Multi.update(multi, ...)
def logic3(multi), do: Multi.delete(multi, ...)

def do_stuff() do
   Multi.new #start with empty multi
   |> logic1 #returns multi with insert
   |> logic2 #returns multi with insert and update
   |> logic3 #returns multi with all three operations
   |> Repo.transaction #performs all three operations
end

This way all 3 logic functions become pure. We can test them in isolation and we don't need to mock anything for those three. We still need to setup the database to test the high level do_stuff function, but we can treat it more like an integration test.

Why purity matters?

We've seen that pure functions are easier to test and make programs easier to understand. Purely functional data structures give similar advantages, they can add some clever optimizations and are thread safe. We can write purely functional programs by encoding impure operations as data and then having one point in our program that deals with side effects like in example with Ecto.Multi.

Pushing impure code to the boundaries of our program requires some discipline, but constraining ourselves in this way is compensated by the better design that emerges. It is similar to constraining ourselves by not using goto or global variables. Even non functional languages adopt some parts of this design. Ruby introduced lately immutable strings.

React and many other JS frameworks try to push all functions updating DOM model out of your logic by introducing virtual DOM.

I hope this article helps you you further understand functional programming and dispels any doubts about the meaning of purity.

@cschneid
Copy link

This concept is very similar with global variables, we know we shouldn't use them, because it tightly couples every piece of code that uses it - switches plural to singular.

@cschneid
Copy link

It is the same with design problems as with goto, we cannot look at two pieces of code in separation. Global state is considered harmful.

Let's make a bold move! Lets consider all states are harmful!

probably can turn that into one line that flows better.

@cschneid
Copy link

Pure functions by definition always return the same value for given input.

Example:

def inc(x), do: x + 1
Pure functions can't depend on or modify application state.

For example this is not a pure function:

def print_value(x), do: IO.inspect(x+1)
Given the same argument, it will always return the same value, but it also has a side effect of printing that value. Pure functions have a couple of really nice properties:

This example isn't great. There are two different things, and this confuses them.

1️⃣ Same output for same input. "If I pass 10 in to this plusOne function, will it always return 11?"
2️⃣ No side effects. "Did it change the world somehow while doing that plusOne?

@cschneid
Copy link

Without Multi all functions in example below are impure:

There's an underlying idea here which may be worth expanding either in this post, or another.

"It's often useful to split an operation into the WHAT and the HOW. Build up a data structure of WHAT. Then have an interpreter that takes that data structure and turns it into the HOW."

that's what Multi is doing - it's turning direct action on the db into data about requested actions on the db. So the first lines are building the data structure, then the .transaction code is the interpreter.

@cschneid
Copy link

Generally I feel like this article has 2 too many sections. The entire functional data structures section is only tangentially related to pure functions and could be dropped. It's an interesting topic, but pretty nitpicky, and unlikely for people to implement for themselves.

Maybe expand on what pure means (see my 1 and 2 notes). Comment on if/when logging is a side effect. (this segues into the distinction between logging & tracing which I've run into a few times).

Then the testing part can be expanded, maybe hinting at a future article about quickcheck (which is awesome, and enabled by pure functions existing).

@cschneid
Copy link

I hope this article helps you you further understand functional programming and dispels any doubts about the meaning of purity. Too bold of a closing for a short blog post

@jsturg
Copy link
Author

jsturg commented Jan 18, 2017

Added several spelling and grammatical changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment