larskuhtz/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Setup

This gist contains benchmarks for read-only concurrent transaction that involve many shared variables.
The test data is a mutable single-linked list where individual list "cells" are linked via mutable shared
variables. The goal is to measure the overhead of different transaction types and variables in read-only
transactions when there are no concurrent writes.
The input to a benchmark run is a list of a of successive Int values.
The benchmark code runs a number of concurrent threads that each fold over the list and
each compute the sum of all entries. The list isn't mutated during the benchmarks.
Benchmark parameters are

the length of the list,
the number of concurrent threads, and
the type of the variables and transactions.

Variable and transaction types are

MVar,
Compose IORef Maybe (which is fine, since the data structure isn't mutated during the benchmark),
TMVar (in STM) evaluated in a single STM transaction,
TMVar (hoisted) with each STM action hoisted into a separate singleton STM transaction,
TMVar (in IO) with each STM action called in a separate singleton STM transaction,
Compose TVar Maybe (in STM) evaluated in a single STM transaction,
Compose TVar Maybe (hoisted) with each STM action hoisted into a separate singleton STM transaction, and
Compose TVar Maybe (in IO) access in IO without using STM.

"TMVar hoisted" and "TMVar in IO" result in equivalent code. However, the overhead of the hoist implementation
for the Stream type that is used in the fold and different inlining by GHC may result in different performance.
Build:
ghc -O2 -threaded -rtsopts -main-is VarList -o main VarList.hs
Run:
./main +RTS -N
Results


As expected Compose IORef Maybe performs best in all settings. Compose TVar Maybe with readTVarIO performs on par
with plain IORef. MVar has only little overhead.
TMVar in IO and TMVar hoisted both have acceptable performance with some overhead due to the STM transactions.
The hoisted version is somewhat slower, most likely due to overhead in hoist and different inlining by GHC.
In the context of STM transactions TMVar and Compose TVar Maybe perform similar, which can be explained by the
fact that TMVar a is implemented as TVar (Maybe a) in base.
Benchmarks didn't complete in acceptable time for benchmarks that run the complete fold in a single STM transaction
when there was more than a single thread and a moderately large lists lengths (in the order of thousands).
Results for large STM transactions with many variables

Rendered Notebook
Focusing on the scenarios where the complete fold is done in a single STM transaction, it seems that the run time
scales linear in the number of threads and, up to some point, quadratic in the length of the list.
However, if there is more than a single thread and the length of the list growth beyond a certain point
(~15,000 - 20,000 in my tests) the running time distribution exhibits a growing tail of very long runs and eventually
most runs become long running. The behavior seems to indicate a race between concurrent transactions, which
is surprising, since there are no updates to the variables during the test run.
Conclusion

It seems that read-only STM transactions without concurrent writes scale quadratic in the number involved
variables and linear in the number of concurrent transactions.
Small (singleton) read transactions come with little overhead, but are still slower than MVar with tryReadMVar.
TVars with readTVarIO have the same performance as plain IORefs. Thus TVars have the benefit that they can be used
either like plain IORef without any transaction overhead or in transactions.
Using STM transactions that involve a large number of variables should be avoided, even if when there are no concurrent
writes. In addition to the quadratic complexity it seems that there can be races between read-only transactions
even in the absence of concurrent updates.
Thus,

IORefs are best when there is no concurrent updates beyond what can be done with atomicModifyIORef.
MVars are best when concurrent writes can't be ruled out.
TVars are best when variables are used in contexts that don't have concurrent
writes (where readTVarIO can be used) as well as in contexts with concurrent writes.

It would be nice if base could include a function tryReadTMVarIO that avoids the overhead of an STM transaction.

  
## VarList.hs
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeSynonymInstances #-}

-- |
-- Module: VarList
-- Copyright: Copyright © 2018 Lars Kuhtz <lakuhtz@gmail.com>
-- License: MIT
-- Maintainer: Lars Kuhtz <lakuhtz@gmail.com>
-- Stability: experimental
--
module VarList
(
-- * Mutable Variables

  Var(..)
, MaybeIORef
, MaybeTVar

-- * Cell
, Cell(..)
, foldCell
, cellToStream
, cellFromStream

-- * Test and Benchmark Data
, testCell

-- * Main functions
, main
, main1
, main2
, main3
) where

import Control.Concurrent
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.DeepSeq
import Control.Monad
import Control.Monad.Morph
import Control.StopWatch

import Criterion
import Criterion.Main

import Data.Foldable
import Data.Functor.Compose
import Data.Functor.Of
import Data.IORef
import qualified Data.List.NonEmpty as L
import qualified Data.Sequence as Seq

import qualified Streaming.Prelude as S

import System.Clock
import System.Environment
import System.IO
import System.Mem

-- -------------------------------------------------------------------------- --
-- Variables
--
-- Note that 'Compose' is used here for type constructors that aren't functors.

class Monad m => Var m v where
    readVar :: v x -> m (Maybe x)
    newEmptyVar :: m (v x)
    newVar :: x -> m (v x)

    -- | Depending on the type of the variable this function may block or spin.
    awaitVar :: v x -> m x

type MaybeIORef = Compose IORef Maybe
type MaybeTVar = Compose TVar Maybe

instance Var STM TMVar where
    readVar = tryReadTMVar
    newEmptyVar = newEmptyTMVar
    newVar = newTMVar
    awaitVar = takeTMVar
    {-# INLINE readVar #-}
    {-# INLINE newEmptyVar #-}
    {-# INLINE newVar #-}
    {-# INLINE awaitVar #-}

instance Var IO TMVar where
    readVar = atomically . tryReadTMVar
    newEmptyVar = newEmptyTMVarIO
    newVar = newTMVarIO
    awaitVar = atomically . awaitVar
    {-# INLINE readVar #-}
    {-# INLINE newEmptyVar #-}
    {-# INLINE newVar #-}
    {-# INLINE awaitVar #-}

instance Var IO MVar where
    readVar = tryReadMVar
    newEmptyVar = newEmptyMVar
    newVar = newMVar
    awaitVar = takeMVar
    {-# INLINE readVar #-}
    {-# INLINE newEmptyVar #-}
    {-# INLINE newVar #-}
    {-# INLINE awaitVar #-}

instance Var STM (Compose TVar Maybe) where
    readVar = readTVar . getCompose
    newEmptyVar = Compose <$> newTVar Nothing
    newVar = fmap Compose . newTVar . Just
    awaitVar = readVar >=> maybe retry return
    {-# INLINE readVar #-}
    {-# INLINE newEmptyVar #-}
    {-# INLINE newVar #-}
    {-# INLINE awaitVar #-}

instance Var IO (Compose TVar Maybe) where
    readVar = readTVarIO . getCompose
    newEmptyVar = Compose <$> newTVarIO Nothing
    newVar = fmap Compose . newTVarIO . Just
    awaitVar = atomically . awaitVar
    {-# INLINE readVar #-}
    {-# INLINE newEmptyVar #-}
    {-# INLINE newVar #-}
    {-# INLINE awaitVar #-}

instance Var IO (Compose IORef Maybe) where
    readVar = readIORef . getCompose
    newEmptyVar = Compose <$> newIORef Nothing
    newVar = fmap Compose . newIORef . Just

    -- | This spins
    awaitVar v = let go = readVar v >>= maybe (yield >> go) return in go
    {-# INLINE readVar #-}
    {-# INLINE newEmptyVar #-}
    {-# INLINE newVar #-}
    {-# INLINE awaitVar #-}

-- -------------------------------------------------------------------------- --
-- Cell

data Cell v a = Cell a !(v (Cell v a))

instance NFData a => NFData (Cell v a) where
    rnf (Cell a _) = rnf a

-- | Fold over the values in a cell
--
foldCell
    :: Monad m
    => Var m v
    => (b -> a -> b)
    -> b
    -> Cell v a
    -> m b
foldCell f b = go b
  where
    go !i (Cell !h !t) = readVar t >>= \case
        Nothing -> return i
        Just c -> go (f i h) c

cellToStream :: Var m v => Cell v a -> S.Stream (Of a) m ()
cellToStream = go
  where
    go (Cell h t) = do
        S.yield h
        lift (readVar t) >>= \case
            Nothing -> return ()
            Just c -> go c

-- | Transforms a stream into a cell and returns the first cell if the stream
-- isn't empty.
--
-- This could be made lazy using 'unsafeInterleavedIO', which would be fine,
-- since the 'Var's won't change once the are evaluated. However, for
-- benchmarking we want it strict.
--
cellFromStream :: Var m v => S.Stream (Of a) m () -> m (v (Cell v a))
cellFromStream s = S.foldrM f (s >> lift newEmptyVar)
  where
    f !a !x = newVar =<< (Cell a <$> x)

cellFromList :: forall v a m . Var m v => L.NonEmpty a -> m (Cell v a)
cellFromList (a L.:| l) = Cell a <$> go l
  where
    go [] = newEmptyVar
    go (!h:t) = newVar =<< (Cell h <$> go t)

maybeIORefCellFromList :: forall a . L.NonEmpty a -> IO (Cell MaybeIORef a)
maybeIORefCellFromList (a L.:| l) = do
    v <- newEmptyVar
    go v l
    return $ Cell a v
  where
    go :: MaybeIORef (Cell MaybeIORef a) -> [a] -> IO ()
    go _ [] = return ()
    go !v (!h:t) = do
        !v' <- newEmptyVar
        writeIORef (getCompose v) (Just (Cell h v'))
        go v' t

-- -------------------------------------------------------------------------- --
-- Main

testCell :: forall v m . Var m v => Int -> m (Cell v Int)
testCell i = awaitVar =<< cellFromStream (S.each [1..i])

main :: IO ()
main = main3

-- -------------------------------------------------------------------------- --
-- Test

main1 :: IO ()
main1 = do
    [minThreadCount, maxThreadCount, threadCountStep, minListLength, maxListLength, listLengthStep] <- fmap read . take 2 <$> getArgs
    hSetBuffering stdout LineBuffering
    putStrLn $ "threadCount, listLength, time"

    forM_ [minThreadCount, (minThreadCount + threadCountStep) .. maxThreadCount] $ \threadCount -> do
        forM_ [minListLength, (minListLength + listLengthStep) .. maxListLength] $ \listLength -> do
            -- Compute test cell
            cell <- testCell @MaybeTVar listLength

            -- prime test cell evaluation
            void $ atomically $ S.sum $ cellToStream $ cell

            -- measure concurrent test cell evaluation
            performMajorGC
            (_, t) <- stopWatch $ replicateConcurrently_ threadCount
                $ atomically $ S.sum_ $ cellToStream $ cell

            -- print result
            putStrLn
                $ show threadCount
                <> ", " <> show listLength
                <> ", " <> show (toNanoSecs t)

-- -------------------------------------------------------------------------- --
-- Comparison with Pure Sequence

main3 :: IO ()
main3 = defaultMain
    [ bs 1
    , bs 100
    , create
    ]
  where
    create = bgroup "create" $
        [ bench "seq"
            $ nf (\s -> Seq.fromList [0..s]) (1000000 :: Int)
        , bench "cell"
            $ nfAppIO (testCell @MaybeIORef) 1000000
        , bench "cell from list"
            $ nfAppIO (\s -> cellFromList @MaybeIORef @Int (0 L.:| [1..s])) 1000000
        , bench "cell from list 2"
            $ nfAppIO (\s -> maybeIORefCellFromList @Int (0 L.:| [1..s])) 1000000
        ]

    bs i = bgroup (show i <> " threads")
        [ bgroup "stream" $
            [ env (return $ Seq.fromList [0..1000000]) $ \(e :: Seq.Seq Int) -> bench "seq"
                $ nfAppIO (S.sum_ . S.each) e
            , env (testCell @MaybeIORef 1000000) $ \e -> bench "cell"
                $ nfAppIO (S.sum_ . cellToStream) e
            ]
        , bgroup "fold" $
            [ env (return $ Seq.fromList [0..1000000]) $ \(e :: Seq.Seq Int) -> bench "seq"
                $ nf (foldl' (+) 0) e
            , env (testCell @MaybeIORef 1000000) $ \e -> bench "cell"
                $ nfAppIO (foldCell (+) 0) e
            ]
        ]

-- -------------------------------------------------------------------------- --
-- Criterion Benchmarks

main2 :: IO ()
main2 = defaultMain
    [ benchmarks 1
    , benchmarks 10
    , benchmarks 1000
    ]

benchmarks :: Int -> Benchmark
benchmarks i = bgroup (show i <> " threads")
    [ env (testCell @MVar 10000) $ \e -> bench "MVar"
        $ nfAppIO (threaded $ S.sum_ . cellToStream) e
    , env (testCell @MaybeIORef 10000) $ \e -> bench "MaybeIORef"
        $ nfAppIO (threaded $ S.sum_ . cellToStream) e

    , bgroup "TMVar" $
        [ env (testCell @TMVar 10000) $ \e -> bench "TMVar hoisted"
            $ nfAppIO (threaded $ S.sum_ . hoist atomically . cellToStream) e
        , env (testCell @TMVar 10000) $ \e -> bench "TMVar in IO"
            $ nfAppIO (threaded $ S.sum_ . cellToStream) e
        ]
        <> if i > 1 then [] else
            [ env (testCell @TMVar 10000) $ \e -> bench "TMVar in STM"
                $ nfAppIO (threaded $ atomically . S.sum_ . cellToStream) e
            -- , env (testCell @TMVar 10000) $ \e -> bench "foldCell Cell TMVar"
            --     $ nfAppIO (threaded $ atomically . foldCell ((+) @Int) 0) e
            ]

    , bgroup "MaybeTVar" $
        [ env (testCell @MaybeTVar 10000) $ \e -> bench "MaybeTVar hoisted"
            $ nfAppIO (threaded $ S.sum_ . hoist atomically . cellToStream) e
        , env (testCell @MaybeTVar 10000) $ \e -> bench "MaybeTVar in IO"
            $ nfAppIO (threaded $ S.sum_ . cellToStream) e
        ]
        <> if i > 1 then [] else
            [ env (testCell @MaybeTVar 10000) $ \e -> bench "MaybeTVar in STM"
                $ nfAppIO (threaded $ atomically . S.sum_ . cellToStream) e
            ]
    ]
  where
    threaded f a = replicateConcurrently_ i (f a)


## results.ipynb

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              results.ipynb
            
          
        Loading

      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.
	{-# LANGUAGE AllowAmbiguousTypes #-}
	{-# LANGUAGE BangPatterns #-}
	{-# LANGUAGE DeriveGeneric #-}
	{-# LANGUAGE FlexibleInstances #-}
	{-# LANGUAGE LambdaCase #-}
	{-# LANGUAGE MultiParamTypeClasses #-}
	{-# LANGUAGE OverloadedStrings #-}
	{-# LANGUAGE RankNTypes #-}
	{-# LANGUAGE ScopedTypeVariables #-}
	{-# LANGUAGE TypeApplications #-}
	{-# LANGUAGE TypeSynonymInstances #-}

	-- \|
	-- Module: VarList
	-- Copyright: Copyright © 2018 Lars Kuhtz <lakuhtz@gmail.com>
	-- License: MIT
	-- Maintainer: Lars Kuhtz <lakuhtz@gmail.com>
	-- Stability: experimental
	--
	module VarList
	(
	-- * Mutable Variables

	Var(..)
	, MaybeIORef
	, MaybeTVar

	-- * Cell
	, Cell(..)
	, foldCell
	, cellToStream
	, cellFromStream

	-- * Test and Benchmark Data
	, testCell

	-- * Main functions
	, main
	, main1
	, main2
	, main3
	) where

	import Control.Concurrent
	import Control.Concurrent.Async
	import Control.Concurrent.STM
	import Control.DeepSeq
	import Control.Monad
	import Control.Monad.Morph
	import Control.StopWatch

	import Criterion
	import Criterion.Main

	import Data.Foldable
	import Data.Functor.Compose
	import Data.Functor.Of
	import Data.IORef
	import qualified Data.List.NonEmpty as L
	import qualified Data.Sequence as Seq

	import qualified Streaming.Prelude as S

	import System.Clock
	import System.Environment
	import System.IO
	import System.Mem

	-- -------------------------------------------------------------------------- --
	-- Variables
	--
	-- Note that 'Compose' is used here for type constructors that aren't functors.

	class Monad m => Var m v where
	readVar :: v x -> m (Maybe x)
	newEmptyVar :: m (v x)
	newVar :: x -> m (v x)

	-- \| Depending on the type of the variable this function may block or spin.
	awaitVar :: v x -> m x

	type MaybeIORef = Compose IORef Maybe
	type MaybeTVar = Compose TVar Maybe

	instance Var STM TMVar where
	readVar = tryReadTMVar
	newEmptyVar = newEmptyTMVar
	newVar = newTMVar
	awaitVar = takeTMVar
	{-# INLINE readVar #-}
	{-# INLINE newEmptyVar #-}
	{-# INLINE newVar #-}
	{-# INLINE awaitVar #-}

	instance Var IO TMVar where
	readVar = atomically . tryReadTMVar
	newEmptyVar = newEmptyTMVarIO
	newVar = newTMVarIO
	awaitVar = atomically . awaitVar
	{-# INLINE readVar #-}
	{-# INLINE newEmptyVar #-}
	{-# INLINE newVar #-}
	{-# INLINE awaitVar #-}

	instance Var IO MVar where
	readVar = tryReadMVar
	newEmptyVar = newEmptyMVar
	newVar = newMVar
	awaitVar = takeMVar
	{-# INLINE readVar #-}
	{-# INLINE newEmptyVar #-}
	{-# INLINE newVar #-}
	{-# INLINE awaitVar #-}

	instance Var STM (Compose TVar Maybe) where
	readVar = readTVar . getCompose
	newEmptyVar = Compose <$> newTVar Nothing
	newVar = fmap Compose . newTVar . Just
	awaitVar = readVar >=> maybe retry return
	{-# INLINE readVar #-}
	{-# INLINE newEmptyVar #-}
	{-# INLINE newVar #-}
	{-# INLINE awaitVar #-}

	instance Var IO (Compose TVar Maybe) where
	readVar = readTVarIO . getCompose
	newEmptyVar = Compose <$> newTVarIO Nothing
	newVar = fmap Compose . newTVarIO . Just
	awaitVar = atomically . awaitVar
	{-# INLINE readVar #-}
	{-# INLINE newEmptyVar #-}
	{-# INLINE newVar #-}
	{-# INLINE awaitVar #-}

	instance Var IO (Compose IORef Maybe) where
	readVar = readIORef . getCompose
	newEmptyVar = Compose <$> newIORef Nothing
	newVar = fmap Compose . newIORef . Just

	-- \| This spins
	awaitVar v = let go = readVar v >>= maybe (yield >> go) return in go
	{-# INLINE readVar #-}
	{-# INLINE newEmptyVar #-}
	{-# INLINE newVar #-}
	{-# INLINE awaitVar #-}

	-- -------------------------------------------------------------------------- --
	-- Cell

	data Cell v a = Cell a !(v (Cell v a))

	instance NFData a => NFData (Cell v a) where
	rnf (Cell a _) = rnf a

	-- \| Fold over the values in a cell
	--
	foldCell
	:: Monad m
	=> Var m v
	=> (b -> a -> b)
	-> b
	-> Cell v a
	-> m b
	foldCell f b = go b
	where
	go !i (Cell !h !t) = readVar t >>= \case
	Nothing -> return i
	Just c -> go (f i h) c

	cellToStream :: Var m v => Cell v a -> S.Stream (Of a) m ()
	cellToStream = go
	where
	go (Cell h t) = do
	S.yield h
	lift (readVar t) >>= \case
	Nothing -> return ()
	Just c -> go c

	-- \| Transforms a stream into a cell and returns the first cell if the stream
	-- isn't empty.
	--
	-- This could be made lazy using 'unsafeInterleavedIO', which would be fine,
	-- since the 'Var's won't change once the are evaluated. However, for
	-- benchmarking we want it strict.
	--
	cellFromStream :: Var m v => S.Stream (Of a) m () -> m (v (Cell v a))
	cellFromStream s = S.foldrM f (s >> lift newEmptyVar)
	where
	f !a !x = newVar =<< (Cell a <$> x)

	cellFromList :: forall v a m . Var m v => L.NonEmpty a -> m (Cell v a)
	cellFromList (a L.:\| l) = Cell a <$> go l
	where
	go [] = newEmptyVar
	go (!h:t) = newVar =<< (Cell h <$> go t)

	maybeIORefCellFromList :: forall a . L.NonEmpty a -> IO (Cell MaybeIORef a)
	maybeIORefCellFromList (a L.:\| l) = do
	v <- newEmptyVar
	go v l
	return $ Cell a v
	where
	go :: MaybeIORef (Cell MaybeIORef a) -> [a] -> IO ()
	go _ [] = return ()
	go !v (!h:t) = do
	!v' <- newEmptyVar
	writeIORef (getCompose v) (Just (Cell h v'))
	go v' t

	-- -------------------------------------------------------------------------- --
	-- Main

	testCell :: forall v m . Var m v => Int -> m (Cell v Int)
	testCell i = awaitVar =<< cellFromStream (S.each [1..i])

	main :: IO ()
	main = main3

	-- -------------------------------------------------------------------------- --
	-- Test

	main1 :: IO ()
	main1 = do
	[minThreadCount, maxThreadCount, threadCountStep, minListLength, maxListLength, listLengthStep] <- fmap read . take 2 <$> getArgs
	hSetBuffering stdout LineBuffering
	putStrLn $ "threadCount, listLength, time"

	forM_ [minThreadCount, (minThreadCount + threadCountStep) .. maxThreadCount] $ \threadCount -> do
	forM_ [minListLength, (minListLength + listLengthStep) .. maxListLength] $ \listLength -> do
	-- Compute test cell
	cell <- testCell @MaybeTVar listLength

	-- prime test cell evaluation
	void $ atomically $ S.sum $ cellToStream $ cell

	-- measure concurrent test cell evaluation
	performMajorGC
	(_, t) <- stopWatch $ replicateConcurrently_ threadCount
	$ atomically $ S.sum_ $ cellToStream $ cell

	-- print result
	putStrLn
	$ show threadCount
	<> ", " <> show listLength
	<> ", " <> show (toNanoSecs t)

	-- -------------------------------------------------------------------------- --
	-- Comparison with Pure Sequence

	main3 :: IO ()
	main3 = defaultMain
	[ bs 1
	, bs 100
	, create
	]
	where
	create = bgroup "create" $
	[ bench "seq"
	$ nf (\s -> Seq.fromList [0..s]) (1000000 :: Int)
	, bench "cell"
	$ nfAppIO (testCell @MaybeIORef) 1000000
	, bench "cell from list"
	$ nfAppIO (\s -> cellFromList @MaybeIORef @Int (0 L.:\| [1..s])) 1000000
	, bench "cell from list 2"
	$ nfAppIO (\s -> maybeIORefCellFromList @Int (0 L.:\| [1..s])) 1000000
	]

	bs i = bgroup (show i <> " threads")
	[ bgroup "stream" $
	[ env (return $ Seq.fromList [0..1000000]) $ \(e :: Seq.Seq Int) -> bench "seq"
	$ nfAppIO (S.sum_ . S.each) e
	, env (testCell @MaybeIORef 1000000) $ \e -> bench "cell"
	$ nfAppIO (S.sum_ . cellToStream) e
	]
	, bgroup "fold" $
	[ env (return $ Seq.fromList [0..1000000]) $ \(e :: Seq.Seq Int) -> bench "seq"
	$ nf (foldl' (+) 0) e
	, env (testCell @MaybeIORef 1000000) $ \e -> bench "cell"
	$ nfAppIO (foldCell (+) 0) e
	]
	]

	-- -------------------------------------------------------------------------- --
	-- Criterion Benchmarks

	main2 :: IO ()
	main2 = defaultMain
	[ benchmarks 1
	, benchmarks 10
	, benchmarks 1000
	]

	benchmarks :: Int -> Benchmark
	benchmarks i = bgroup (show i <> " threads")
	[ env (testCell @MVar 10000) $ \e -> bench "MVar"
	$ nfAppIO (threaded $ S.sum_ . cellToStream) e
	, env (testCell @MaybeIORef 10000) $ \e -> bench "MaybeIORef"
	$ nfAppIO (threaded $ S.sum_ . cellToStream) e

	, bgroup "TMVar" $
	[ env (testCell @TMVar 10000) $ \e -> bench "TMVar hoisted"
	$ nfAppIO (threaded $ S.sum_ . hoist atomically . cellToStream) e
	, env (testCell @TMVar 10000) $ \e -> bench "TMVar in IO"
	$ nfAppIO (threaded $ S.sum_ . cellToStream) e
	]
	<> if i > 1 then [] else
	[ env (testCell @TMVar 10000) $ \e -> bench "TMVar in STM"
	$ nfAppIO (threaded $ atomically . S.sum_ . cellToStream) e
	-- , env (testCell @TMVar 10000) $ \e -> bench "foldCell Cell TMVar"
	-- $ nfAppIO (threaded $ atomically . foldCell ((+) @Int) 0) e
	]

	, bgroup "MaybeTVar" $
	[ env (testCell @MaybeTVar 10000) $ \e -> bench "MaybeTVar hoisted"
	$ nfAppIO (threaded $ S.sum_ . hoist atomically . cellToStream) e
	, env (testCell @MaybeTVar 10000) $ \e -> bench "MaybeTVar in IO"
	$ nfAppIO (threaded $ S.sum_ . cellToStream) e
	]
	<> if i > 1 then [] else
	[ env (testCell @MaybeTVar 10000) $ \e -> bench "MaybeTVar in STM"
	$ nfAppIO (threaded $ atomically . S.sum_ . cellToStream) e
	]
	]
	where
	threaded f a = replicateConcurrently_ i (f a)