Skip to content

Instantly share code, notes, and snippets.

@TimWSpence
Created January 29, 2019 14:43
Show Gist options
  • Save TimWSpence/9997c014564f3f4497880d23c3c7836d to your computer and use it in GitHub Desktop.
Save TimWSpence/9997c014564f3f4497880d23c3c7836d to your computer and use it in GitHub Desktop.
Having your cake and eating it

Having your cake and eating it

At Permutive, we're very much committed to functional programming. That typically means committing to immutable data structures (and this is a good thing!). However, there are times when an algorithm would run much faster or more space efficiently if it could update state in place.

Consider for example the classic implementation of quick sort:

qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort (h:t) = [x | x <- t, x <= h] ++ [h] ++ [x | x <- t, x > h]

Whilst conceptually elegant, this is hugely space efficient (and slow) due to repeated list concatenation. It should be possible to sort in-place, as in the classical implementation you will find in any imperative language. But to do so requires us to have mutable state!

One option for mutable state would be to store our states inside an IORef. This provides mutable memory protected by sequenced IO actions. This would run in constant space but is not very satisfactory as our types become polluted with IO. Clearly this should not be necessary as our qsort function does not want to modify any non-local state.

This in fact reveals something about the nature of the problem. The reason we're required to introduce IO is because the compiler has no way to guarantee that we won't share our mutable array with another thread and hence to ensure referential transparency we must work within the IO monad.

So can we construct a type which permits in-place mutation but which cannot be shared between threads?

Yes! :) (Many thanks to John Launchbury and Simon Peyton Jones - cf their 1994 paper Lazy Functional State Threads)

The ST Monad

Haskell defines types ST s a and STRef s a which support (amongst others) the following operations:

newSTRef :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()
modifySTRef :: STRef s a -> (a -> a) -> ST s ()

which means that we can write (pseudo-imperative) code like:

increment :: STRef s Int -> ST s ()
increment ref = do
  current <- readSTRef ref
  writeSTRef ref (current + 1)
  
prog :: ST s Int
prog = do
  ref <- newSTRef 0
  increment ref
  increment ref
  readSTRef ref

We can also extract a value from ST using

{-# LANGUAGE RankNTypes #-}

runST :: (forall s. ST s a) -> a

eg

runST prog :: Int

(runST prog) == 2 -- not haskell but you get the picture!

An STRef is backed by a piece of mutable memory. How can the compiler be sure that this is referentially transparent and doesn't have the same potential problems as IORef above?

Let's try to 'leak' the ref from an ST value to see what happens

runST $ newSTRef 0 -- attempt to leak the ref from a runST call
                   -- does not compile!!

The compiler will error with the message

Couldn't match type ‘a’ with ‘STRef s Integer’ because type variable ‘s’ would escape its scope

Why is this? Consider the types:

newSTRef 0 :: ST s (STRef s Int)
runST :: (forall s. ST s a) -> a

Suppose we could apply runST to newSTRef 0. I guess we would have:

runST $ newSTRef 0 :: STRef s Int

But this contradicts the type of runST as the return type is dependent on s, whereas the return type of runST is independent of s

Similarly we can show that the compiler will not allow us to pass in an STRef to be evaluated by runST:

let v = runST (newVar True)
  in runST (readVar v)

If this were legal, v would have type ST s (STRef s Bool) and hence readVar v :: ST s Bool. Again, this does not match the type (forall s. ST s a) required by runST - s is free in the latter but not in the former.

TLDR; the compiler will in fact prohibit us from passing an STRef as a parameter to runST or returning an STRef via runST, hence we can guarantee that the mutable state is never shared! (For a more rigorous proof see the original paper)

Back to sorting

What has the ST monad bought us? We now have a way to perform in-place mutations whilst maintaining referential transparency! Here is qsort running in constant space as promised (using STArray which is built on top of ST and works much the same way)

qort :: Ord a => [a] -> [a]
qort items = runST $ do
    let len = length items
    arr <- newListArray (1, len) items
    qsort' arr 1 len
    getElems arr
  where
    qsort' :: Ord a => (STArray s Int a) -> Int -> Int -> ST s ()
    qsort' arr low high = do
      when (low < high) $ do
        q <- partition arr low high
        qsort' arr low (q - 1)
        qsort' arr (q + 1) high

    partition arr low high = do
      i <- newSTRef (low - 1)
      pivot <- readArray arr high
      forM_ [low .. pred high] $ \j -> do
        jj <- readArray arr j
        when (jj <= pivot) $ do
          modifySTRef i (+1)
          readSTRef i >>= \ii -> swap arr ii j
      ii <- readSTRef i
      swap arr (ii + 1) high
      return (ii + 1)

    swap arr i j = do
      v <- readArray arr i
      w <- readArray arr j
      writeArray arr i w
      writeArray arr j v

Note that we have not had to sacrifice referential transparency to obtain this - the signature of the function is unchanged. It is also an almost direct translation from the corresponding C implementation, but guarantees that our mutable state cannot escape the scope of the qsort function - something that no imperative language can do.

TLDR

We can use the ST monad to implement functions which use mutable state for efficiency, whilst having the compiler guarantee that we won't accidentally share that mutable state :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment