Skip to content

Instantly share code, notes, and snippets.

@hanshoglund
Last active April 15, 2020 10:43
Show Gist options
  • Save hanshoglund/1fd212aa16265cb96da59991982dbd2d to your computer and use it in GitHub Desktop.
Save hanshoglund/1fd212aa16265cb96da59991982dbd2d to your computer and use it in GitHub Desktop.

RIO and impure monads

This post was prompted by a colleague asking for my opinion about the RIO type. In brief, I like it and recommend it, but for impure monads only.

To recap, RIO is defined like this:

data RIO env a = RIO { runRIO :: env -> IO a }

Like IO it is a Functor, Applicative and Monad.

Pure vs. impure monads

Monads are commonly used for descibing effects in purely functional languages such as Haskell. They come in two, fundamentally different shapes: pure and impure.

By "impure" I mean monads that are somehow defined in terms of IO (or PrimMonad). Note that we are talking about concrete monads here, not transformers. Transformers may be applied to either pure or impure monads and do not affect their purity.

  • The following monads are pure: Maybe, StateT Int [] and (Bool ->)
  • The following monads are impure: IO, STM and ReaderT Env IO

Pure monads are just normal types with a certain structure. They are not contrary to the spirit of pure functional programming, though they in some sense allow embedding of imperative languages. Impure monads however are inherently imperative. They are not normal types and require special implementation support.

Why impure?

It is well known that imperative programming has disadvantages: complex semantics, difficulty of testing, inherent sequentiality, and so on. Given this, why bother with impure monads at all? The reason is that current operating systems, language implementations and libraries expose impure interfaces. Impure monads such as IO allow us to consume these interfaces.

Facing the choice is between using impure code and rethinking computation from the ground up we are sometimes compelled to define and use impure monads. So how do we do that?

Defining monads

Implementing pure monads amounts to defining a data structure, writing an instance and proving the laws. There are many techniques to make this easier, such as transformers or free monads.

Implementations of impure monads should use RIO. It does not matter whether you use the "official" RIO type from the eponymous package, ReaderT env IO, or a custom newtype wrapper, these are all equivalent. This type is sufficient to implement any kind of effect you might want: you can pass around configuration, file handles, IORefs for state, and throw exceptions.

Other ways of writing impure monads, such as transformer stacks over IO can be considered an anti-pattern. This does not mean that transformers are useless, far from. They are great for:

  • Implementing pure monads
  • The "IdentityT pattern"

Using monads

By "using monads" I mean writing effectfull functions, using either monadic types or constraints. This can be done in a variety of ways:

  • Using IO: Int -> IO Bool
  • Using RIO: HasLogging env => Int -> RIO env Bool
  • Finally tagless: MonadWriter m => Int -> m Bool

The finally tagless style amounts to using type classes like MonadReader, MonadWriter, MonadLog etc. This is the good approach when a (meaningful) pure implementation is available. This means that type class has laws and that there is at least one lawful pure instance.

An example of a type class allowing a pure implementation is MonadState:

class MonadState s m where
  get    :: m s
  set    :: s -> m s
  modify :: (s -> s) -> m ()

instance MonadState s (State s m) where
  get      = State (\s -> (s, s))
  set x    = State (\_ -> (x, ())
  modify f = State (\s -> (f s, ()))

We can also define an impure implementation:

class HasState s env where
  getState :: env -> IORef s

instance HasState env => MonadState s (RIO env) where
  get      = RIO $ \env -> readIORef (getState env) f
  set x    = RIO $ \env -> writeIORef (getState env) x
  modify f = RIO $ \env -> atomicModifyIORef' (getState env) f

Functions defined on concrete impure monads should be used when a pure implementation is not available. These can be written using IO or RIO, depending on whether an environment is required or not.

appendFile :: FilePath -> Text -> IO ()
writeToSyslog :: SyslogConfig env => Text -> RIO env ()

Note that there is no point in using Monad... type classes here, as a pure implementation makes no sense.

Assuming our impure implementations are all defined in terms of RIO, constraints like MonadIO, MonadUnliftIO, MonadPrim rule out pure implementations without providing any tangible benefits (aside from brevity). It is usually preferable to use IO A instead of MonadIO m => m A and RIO env A instead of MonadUnliftIO m => m A.

What about Free/algebraic effect systems?

Algebraic effect systems (often based on free monads) is gaining traction in the functional programming community. The distinction between pure and impure is somewhat blurred here, but it does remain. An algebraic effect system can be either based on interpretation of pure data structures (making it pure), or built into the language (making it impure).

My caution about impure effects extends to algebraic effect systems. While this paradigm allows a much more fine-grained tracking of effects at the type-level, they do not solve the fundamental problems of impure code, any more than IO does.

Summary

  • Avoid impure monads whenever possible
  • When writing monads:
    • Impure monads: use RIO
    • Pure monads: use transformers, free monads, etc.
  • When using monads:
    • If a pure implementation is possible: Use finally tagless
    • Otherwise use concrete IO/RIO

APPENDIX

finally tagless allows one to mock and test

My argument is that this is only true when:

  1. the class has laws
  2. the laws admit a pure implementation.

Here's an example. Let's say we're writing a binding to some benchmarking library. We define a class:

class Monad m => MonadBenchmark m where
  getTime   :: m Time
  resetTime :: m ()
  runTask   :: Task -> m ()

and a mock implementation:

data Mock a = Mock { runMock :: a }
  deriving (Functor, Applicative, Monad)

instance MonadBenchmark Mock where
  getTime   = Mock 0
  resetTime = Mock ()
  runTask _ = Mock ()

and a piece of code using it:

bench :: (MonadBenchmark m) => [Task] -> m Time
bench tasks = do
  for tasks runTask
  getTime

This code has a bug: the stopwatch is not reset before the batch. This test could have caught the problem:

testBench = do
  prop "Order of tasks doesn't matter" $ \tasks -> runMock $ do
    t1 <- bench tasks
    t2 <- bench (reverse tasks)
    assert $ t1 == t2

But this test silently succeeds. Why? Because the stopwatch doesn't actually move! The time returned is always zero. The Mock implementation is does not conform to our intuitions about the class, even though it type checks.

We can add laws to capture our intuitions. For example:

forall t .
  do { x <- getTime; runTask t; y <- getTime; pure $ x < y }
  ≡
  do { runTask t; pure True }

Meaining "time moves forward when waiting".

The Mock instance violates this law. Let's fix it:

data Mock a = Mock { runMock :: State Time a }
  deriving (Functor, Applicative, Monad)

instance MonadBenchmark Mock where
  getTime   = Mock $ get
  resetTime = Mock $ pure ()
  runTask t = Mock $ modify (+ 1)

The test now fails. Great! We can amend our code to fix the bug:

bench :: (MonadBenchmark m) => [Task] -> m Time
bench tasks = do
  resetTime
  for tasks runTask
  getTime

But this also fails. Why? Because our mock is failing to properly reset the time. Let's add another law:

do { resetTime; getTime } ≡ do { resetTime; pure 0 }

Meaning "after a reset, the time is 0".

We update our mock to respect this law as well:

data Mock a = Mock { runMock :: State Time a }
  deriving (Functor, Applicative, Monad)

instance MonadBenchmark Mock where
  getTime   = Mock $ get
  resetTime = Mock $ set 0
  runTask _ = Mock $ modify (+ 1)

This succeeds (and fails with the original version of bench). While the above mock is pure, it behaves sufficiently like a "real" benchmarking tool.

Now, let's consider some examples where adding laws is not possible.

Let's say we're writing a binding to the Google Search API.

class MonadGoogle m where
  search :: Text -> m [Uri]

This class has expected behavior. But that behavior is impossible to capture in laws. Here is a "mock":

class Mock a = Mock a

instance MonadGoogle Mock where
  search "tweag" = Mock ["http://tweag.io"]
  search _       = Mock []

Now imagine some effectful code using MonadGoogle. This code might issue various searches behave in a certain way depending on the results. We can observe how it behaves against the mock, but that doesn't tell us how it will behave when run against the actual Google Search API.

I picked Google search because it was an outrageous example. But any form of external mutable state will have the same problem.

class MonadKeyValueStore m where
  put    :: Text -> Text -> m ()
  get    :: Text -> m Text

This looks like a state monad, so we might expect typical state monad laws to hold:

do { put k v; get k } ≡ do { put k v; pure v }

But if anybody else is writing to the key-value store, that is not the case!

Maybe a function that logs its process while working could write to stdout when used in one function, or collected as [String] in another function. It allows the caller to make that decision.

class MonadLog m where
  log :: Text -> m ()

This is sort of a degenerate case. It has an "empty set" of laws, which makes any instance valid:

instance MonadLog IdentityT where
  log = pure ()

instance Monad f => MonadLog (WriterT [Text] f) where
  log = tell

instance MonadLog IO where
  log = putStrLn . Text.unpack

But this implies that the log operation is entirely useless, as far as the execution path/result of the computation is concerned. If there aren't any laws, we can not rely on the MonadLog methods to do anything meaningful.

resolveComplexQuery :: (MonadLog m, ...) => Query -> m Result
resolveComplexQuery q = do
  ...
  log "Parsing done"
  ...
  log "Resolving things"
  ...

For logging this happens to be OK: it isn't really part of the computation, and all the log statements can be removed without affecting the result. But for other effects we really need laws.


My argument isn't that RIO should be used everywhere, only when a pure implementation is completely impossible, or when there aren't that describe correct behavior.

Of course in many codebases, pure instances and laws are missing, not because they are impossible to define, but because nobody had the time to write them. There are likely many unheard of teffects which can be given a precise specification and pure implmentation, including complex things like job executors, schedulers, caching etc.

On the other hand there are cases where laws/pure implementations are impossible to come up with. These tend to involve non-monotonic state that reside outside the process, e.g. in the file system or on the internet.


monad transformers make abstraction easier

I should probably clarify that I'm not opposed to writing transformers, or even applying them to IO, but rather to "implementing concrete monads by applying transfomers to IO". In other words this is fine:

newtype FooT f a = { runFooT :: ... }
instance MonadTransf FooT
instance (Monad f, ... f) => MonadFoo (FooT f)

newtype BarT f a = { runBarT :: ...}
instance MonadTransf BarT
instance (Monad f, ... f) => MonadBar (BarT f)

-- elsewhere:
main :: IO ()
main = runFooT $ runBarT $ ...

However this is an anti-pattern

newtype Foo a = Foo { runLog :: FooT IO a }
  deriving (Monad, MonadIO, MonadFoo...)

because it adds another type that allow the user to run arbitrary IO. Writing entire libraries in terms of lawful type classes and monad transformers is great. I know this was not very clear in the original post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment