I've briefly tried VCache, and here is how it went

Why

I really like the potential of VCache and acid-state: having a database layer that can stay largely transparent to the application developer, all the while providing ACID persistence to the application. For a long time I have wanted to give either of them a try.

I do, however, really dislike acid-state's limitation of having to keep all application state in memory at all times. That's a no-go for web servers, for example, and also for local desktop applications that may want to store a few hundred megabytes or a few gigabytes of state on disk. I wouldn't want to put such an unnecessary, wasteful burden on my users' RAM.

VCache seems to offer much greater promise on the RAM-requirement front; however, it seems to have a much narrower user base as of now.

I tried to look for applications that use it and for experience reports from its users, without much success. So I thought I would give it a try myself and share my exploratory findings to alleviate that lack, in the hope that they will be useful to others.

I'm furthermore hoping that sharing this will lead to

  • more people coming forward and sharing their positive and negative experiences with VCache (and similar systems) before I commit more of my time to using it,
  • or, if it turns out there aren't many negative experiences to report, to greater adoption.

How

In this file I tried to get an intuitive, practical, operational understanding of how VCache works, while also stress-testing it a bit to see some of its limitations and strengths.

What

Since this is a compilable and runnable Literate Haskell file, let's get the imports out of the way first:

\begin{code}
{-# language LambdaCase #-}
{-# language RankNTypes #-}
{-# language ViewPatterns #-}
{-# language DeriveAnyClass #-}
{-# language PatternSynonyms #-}
{-# language ExtendedDefaultRules #-}

import qualified  Data.IntSet                       as IS
import qualified  Data.IntMap                       as IM
import qualified  Data.IntMap.Base                  as IMB

import            Data.Time
import            Test.Hspec
import            Data.IORef
import            Text.Printf
import            Debug.Trace
import            Control.Lens                      hiding ((.>))
import            Control.Monad
import            Database.VCache
import            Data.ByteString                   (ByteString(..))
import qualified  Data.ByteString                   as BS
import            System.IO.Unsafe
import            Control.Concurrent
import            Control.Applicative
import qualified  Data.ByteString.Char8             as BSC8

default (Int, Double, String)
\end{code}

At first I was just familiarizing myself with the library, figuring out how to do reads and writes. Feel free to skip ahead to the more interesting bits.

\begin{code}
main1 :: Maybe VCache -> IO ()
main1 mayVc = do
  vc <- maybe (openVCache 500 "db") return mayVc

  let vs = vcache_space vc
  let root = loadRootPVar vc (BSC8.pack "root2") (1, Nothing)
  a <- runVTx vs $ do
    r <- readPVar root
    traceShow ("root content", r) noop


    modifyPVar root (& _1 %~ (+1))
    r <- readPVar root
    traceShow ("X", r) noop
    case r of
      (a, Nothing) -> writePVar root (a, Just (vref vs [1,2,3], vref vs [1,2,3]))
      (a, Just b) -> do
        traceShow ("X1", b) noop
        traceShow ("X2", over both deref b) noop

    return r
  print a


  let x = vref vs (352 :: Int)
  print $ deref x

  threadDelay $ 50000
  putStrLn "end"
\end{code}
Running: main1
("root content",(4,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X",(5,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X1",(VRef#1000000004::[Int],VRef#1000000004::[Int]))
("X2",([1,2,3],[1,2,3]))
(5,Just (VRef#1000000004::[Int],VRef#1000000004::[Int]))
352
end

Then, since one of the main data structures I am considering persisting in VCache is a hashmap, I thought I would see how involved it is to write a VCacheable instance for one of its underlying structures, the binary-tree-based IntMap.

And the surprising answer: not too involved. After having written it, it seemed rather straightforward to me:

\begin{code}
instance VCacheable a => VCacheable (IMB.IntMap a) where
  put = \case
    (IMB.Nil)         -> putWord8 0
    (IMB.Tip k v)     -> putWord8 1 >> put k >> put v
    (IMB.Bin p m a b) -> putWord8 2 >> put p >> put m >> put a >> put b
  get = getWord8 >>= \case
    0 -> return IMB.Nil
    1 -> IMB.Tip <$> get <*> get
    2 -> IMB.Bin <$> get <*> get <*> get <*> get
\end{code}

Then I thought I would stress-test it a bit.

\begin{code}
main2 :: Maybe VCache -> IO ()
main2 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar vc (BSC8.pack "root3") initial
      initial =
        IMB.fromList [] :: IMB.IntMap Int
        -- IMB.fromList [(22, 33)] :: IMB.IntMap Int
      disp = do
        r <- readPVar root
        traceNoop ("root content " ++ (take 40 $ show r))
        traceNoop ("root tip " ++ (take 40 $ show $ getMapTip r))
        traceNoop ("root content size " ++ (show $ IMB.size r))
      next = IMB.fromList $ zip [0..5000000] [0..]

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read End"

    traceLn

    traceNoop "Write begin"
    writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main2
(2017-07-22 11:06:45.215611 UTC) Init
(  0.000165s) Read begin
( 21.816722s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
( 13.292584s) root tip Just (0,0)
(  0.007298s) root content size 5000001
(  0.279039s) Read End

(  0.000155s) Write begin
(  0.000756s) Write end
(  0.000063s) Read begin
(  0.000234s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
(  1.552749s) root tip Just (0,0)
(  0.000687s) root content size 5000001
(  0.150842s) Read end
(  0.000467s) vcacheSync begin
( 24.551119s) vcacheSync end

Here I noticed something rather strange. Yes, yes, writing and reading 5 million items to and from disk ought to take some time, but why does reading just the tip of the binary tree take about as long as reading the whole structure from disk?

I thought I would simplify the question and went on to try it with a linked list. The result was much the same.

Note: the actual writing doesn't take place between "Write begin" and "Write end", but rather between "vcacheSync begin" and "vcacheSync end". AFAIK this is because VCache writes asynchronously on a separate thread by default.
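
To make that pattern explicit, here is a small helper capturing what every experiment below does by hand: run a write transaction, then call vcacheSync so the data is actually on disk before continuing. (If I remember the API correctly, VCache also offers a per-transaction markDurable, but I haven't tried it, so I'm sticking to vcacheSync here.) This helper is illustration only and is not used by the experiments below.

\begin{code}
-- Illustration only: force a write transaction onto disk before returning,
-- since VCache's background writer would otherwise persist it
-- asynchronously at some later point.
runVTxSynced :: VSpace -> VTx a -> IO a
runVTxSynced vs tx = do
  a <- runVTx vs tx
  vcacheSync vs
  return a
\end{code}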

\begin{code}
main3 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar vc (BSC8.pack "root-list-of-ints") initial
      initial =
        [1] :: [Int]
      disp = do
        r <- readPVar root
        traceNoop $ "root content " ++ (take 40 $ show r)
        traceNoop $ "root tip " ++ (take 40 $ show $ head r)
      next = [0..5000000]

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"
    writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main3
(  0.000168s) Init
(  0.000083s) Read begin
(  4.804215s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000187s) root tip 0
(  0.000046s) Read end

(  0.000043s) Write begin
(  0.000058s) Write end
(  0.000040s) Read begin
(  0.000034s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000039s) root tip 0
(  0.000032s) Read end
(  0.000037s) vcacheSync begin
(  4.121671s) vcacheSync end

How come?

I wasn't quite sure, but I had a few ideas: maybe we can't take advantage of on-demand reads because VCache internally serializes linked lists into something more compact, e.g. an array, which it then needs to read back in its entirety? Or does it reverse the list for some reason? Or is it simply that VCacheable's get function is monadic and can't return a value until everything has been read in?
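
To make that last hypothesis a bit more concrete: I haven't inspected how VCache actually serializes lists, but a naive instance shaped like the one below (wrapped in a newtype so it doesn't clash with VCache's own list instance) would behave exactly like this, because get has to walk the entire serialized spine inside the monad before it can hand back even the first element.

\begin{code}
-- Hypothetical, illustration-only encoding; not VCache's actual list
-- instance. The applicative chain in get means the whole list is
-- deserialized before anything is returned, so there is no laziness to
-- exploit when we only want the tip.
newtype NaiveList a = NaiveList { unNaiveList :: [a] }

instance VCacheable a => VCacheable (NaiveList a) where
  put = \case
    (NaiveList [])     -> putWord8 0
    (NaiveList (x:xs)) -> putWord8 1 >> put x >> put (NaiveList xs)
  get = getWord8 >>= \case
    0 -> return (NaiveList [])
    _ -> (\x (NaiveList xs) -> NaiveList (x:xs)) <$> get <*> get
\end{code}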

Nevertheless, since VRefs are alleged to be a rather transparent storage abstraction with a pure and lazy interface, I thought I would try a linked list that has its next element wrapped in a vref.

\begin{code}
data ListTest = ListTest {ltVs :: VSpace, unListTest :: [Int]}

instance VCacheable ListTest where
  put = \case
    (ListTest vs [])     -> putWord8 0
    (ListTest vs (x:xs)) -> putWord8 1
      >> put (vref vs x) >> put (vref vs (ListTest vs xs))
  get = getWord8 >>= \case
    0 -> ListTest <$> getVSpace <*> pure []
    1 -> ListTest <$> getVSpace <*> (
      ((:) <$> (deref <$> get) <*> (unListTest <$> deref <$> get)))



main4 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar vc (BSC8.pack "root-lazier-list-of-ints") initial
      initial =
        ListTest vs [1] :: ListTest
      disp = do
        r <- readPVar root
        traceNoop $ "root content " ++ (take 40 $ show $ unListTest r)
        traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest r)
        traceNoop $ "content length " ++ (take 40 $ show $ length $ unListTest r)
      next = ListTest vs [0..200000]

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"
    writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"


  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main4
(  0.000157s) Init
(  0.000096s) Read begin
(  0.000150s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.152087s) root tip 0
(  0.000254s) content length 200001
(  4.715509s) Read end

(  0.000104s) Write begin
(  0.000044s) Write end
(  0.000034s) Read begin
(  0.000037s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000046s) root tip 0
(  0.000037s) content length 200001
(  0.012061s) Read end
(  0.000132s) vcacheSync begin
( 14.975086s) vcacheSync end

And indeed! Reading just the tip became much faster. But what is this? Suddenly, reading the full length of the list or writing the whole thing back in is significantly slower; so much so that I had to lower the element count to a mere 200k to get comparable running times. That's roughly a 25x slowdown for reading and a 150x slowdown for writing (assuming the relationship between size and time is linear, which I had not investigated at the time of writing).

Could vref indirection be this costly?

What if I chunked the data, so that the internal content of each chunk could be read eagerly and quickly, while the chunks themselves are read on demand, lazily? Could that give us the best of both worlds: fast reading of the whole structure, and even faster reading of just the tip?

\begin{code}
data ListTest2 = ListTest2 {lt2vs :: VSpace, unListTest2 :: [Int]}

instance VCacheable ListTest2 where
  put = \case
    (ListTest2 vs [])                          -> putWord8 0
    (ListTest2 vs (splitAt 500000 -> (xs,ys))) -> putWord8 1
      >> put (vref vs xs) >> put (vref vs (ListTest2 vs ys))
  get = getWord8 >>= \case
    0 -> ListTest2 <$> getVSpace <*> pure []
    1 -> ListTest2 <$> getVSpace <*> (
      ((++) <$> (deref <$> get) <*> (unListTest2 <$> deref <$> get)))



main5 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar
        vc (BSC8.pack "root-lazier-chunked-list-of-ints") initial
      initial =
        ListTest2 vs [0..5000000] :: ListTest2
      disp = do
        r <- readPVar root
        traceNoop $ "root content " ++ (take 40 $ show $ unListTest2 r)
        traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest2 r)

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"

    r <- readPVar root
    writePVar root r
    -- writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"


  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main5
(  0.000273s) Init
(  0.000073s) Read begin
(  0.000144s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.332200s) root tip 0
(  0.000359s) Read end

(  0.001204s) Write begin
(  0.000304s) Write end
(  0.000047s) Read begin
(  0.000038s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000048s) root tip 0
(  0.000037s) Read end
(  0.000040s) vcacheSync begin
( 26.737134s) vcacheSync end

And indeed it does just that. Wonderful! But there seems to be a bit of a downside: writing the same data structure to disk again, even without any modification, takes about 27 seconds. That is slower than without chunking, which took around 4 seconds. But even then, 4 seconds for unmodified data seems superfluous. Can't we make this near instantaneous, and have the best of all three worlds?

From what I know, VCache internally de-duplicates data fully, meaning that no change actually takes place in the database when we write back unmodified data. But, I thought, it still needs to compute the bit-wise representation of the whole data structure to decide whether there are any differences; it unfortunately has no way to check whether the value points to the same memory region it was originally read from.

Could we make use of Merkle trees to keep track of changes? That way we could know from the tip of the list that nothing has changed and nothing needs to be written.

Or could it be as simple as making the VRefs part of the data structure itself, since a VRef already identifies an immutable chunk?

This is worth a try.

\begin{code}
data ListTest3 = ListTest3
  { lt3vs :: VSpace
  , lt3a1 :: [Int]
  , lt3a2 :: Maybe (VRef ListTest3)
  }

instance VCacheable ListTest3 where
  put (ListTest3 vs a b) = put a >> put b
  get = ListTest3 <$> getVSpace <*> get <*> get
\end{code}

Whoa, what a simple instance.

But of course it is, since we are pushing some computation into the toListTest3 and unListTest3 functions:

\begin{code}
toListTest3 :: VSpace -> [Int] -> ListTest3
toListTest3 vs (splitAt l3ChunkSize -> (xs, ys)) = ListTest3 vs xs $ case ys of
  [] -> Noth
  ys -> Just $ vref vs $ toListTest3 vs ys

unListTest3 :: ListTest3 -> [Int]
unListTest3 (ListTest3 vs a1 (Noth              )) = a1
unListTest3 (ListTest3 vs a1 (Just (deref -> a2))) = a1 ++ unListTest3 a2

l3ChunkSize = 500000
\end{code}

Let's give this a run:

\begin{code}
main6 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc

  let
    vs = vcache_space vc
    root = loadRootPVar vc (BSC8.pack "test-for-faster-writes") initial
    initial =
      toListTest3 vs [0..5000000] :: ListTest3
    disp = do
      r <- readPVar root
      traceNoop $ "root content " ++ (take 40 $ show $ unListTest3 r)
      traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest3 r)
    -- next = toListTest3 vs [0..5000000] :: ListTest3

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"
    r <- readPVar root

    -- writePVar root $ toListTest3 vs.unListTest3 $ r
    -- writePVar root $ toListTest3 vs.unListTest3 $ r
    writePVar root $ (sub 1 $ head $ unListTest3 r) `l3cons` r
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"


  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main6
(  0.000159s) Init
(  0.000046s) Read begin
(  0.000108s) root content [-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,1
(  0.214008s) root tip -3
(  0.000145s) Read end

(  0.000063s) Write begin
(  0.000049s) Write end
(  0.000040s) Read begin
(  0.000044s) root content [-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,1
(  0.000057s) root tip -4
(  0.000044s) Read end
(  0.000051s) vcacheSync begin
(  0.003887s) vcacheSync end

Magnificent! Suddenly everything is fast!

There are only two smaller(?) downsides I can think of:

  1. The initial insertion of the data will still incur the ~7x (or potentially non-linear) indirection cost of chunking with vrefs. That seems like an acceptable trade-off to me. And maybe it can still be optimized further, or even optimized away?

  2. For writing to and modifying the data structure, we'll have to use custom-made functions that are aware of the internal chunking. E.g. (:) needs to be re-written like this:

\begin{code}
l3cons a as@(ListTest3 vs a1 a2) = if length a1 < l3ChunkSize
  then ListTest3 vs (a:a1)  a2
  else ListTest3 vs [a]     (Just $ vref vs as)
\end{code}

I do wonder whether these chunking-aware modifying functions could be derived generically somehow, e.g. for (++), and for other chunked data structures too. Any ideas? Or could this perhaps be tackled in a layer of VCache that stays transparent to higher-level users of the API? A sketch of a hand-written chunking-aware append follows below.
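
For what it's worth, here is a hedged sketch of what a chunking-aware (++) could look like for ListTest3 (l3append is not part of the code above, just an illustration). The right-hand list's existing chunks are reused behind their VRefs; only the left spine gets rebuilt. It also keeps the left list's chunk boundaries, so middle chunks may end up under-full, which unListTest3 and l3cons both tolerate.

\begin{code}
-- Hypothetical sketch of a chunking-aware append for ListTest3.
l3append :: ListTest3 -> ListTest3 -> ListTest3
-- No tail on the left: cons its head chunk element by element onto the right.
l3append (ListTest3 _  a1 Nothing  ) ys = foldr l3cons ys a1
-- Otherwise keep the head chunk and recurse into the left tail; the right
-- list is only ever referenced, never re-chunked.
l3append (ListTest3 vs a1 (Just a2)) ys =
  ListTest3 vs a1 (Just $ vref vs $ deref a2 `l3append` ys)
\end{code}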

Closure

That's it for today. If you have any answers to my questions above, I'd be glad to hear them. Other questions and constructive comments are also welcome.

Let's close down with some miscellaneous top-level bindings and the main function (which, when executed, created a 321 MiB db file on my hard drive).

\begin{code}
sub = subtract

parenthesize a = "(" ++ a ++ ")"
a +.+ b = a ++ " " ++ b

getMapTip = \case
  (IMB.Nil)         -> Nothing
  (IMB.Tip k v)     -> Just (k, v)
  (IMB.Bin p m a b) -> getMapTip a <|> getMapTip b

ln = putStrLn ""

pattern Noth = Nothing

noop :: Monad m => m ()
noop = return ()

-- makeTimer :: IO (IO String)
makeTimer :: IO (IO (Either UTCTime NominalDiffTime))
makeTimer = do
  prevTime <- newIORef Nothing
  return $ do
    pTime <- readIORef prevTime
    time <- getCurrentTime
    writeIORef prevTime $ Just time
    case pTime of
      Nothing -> return $ Left time
      Just a  -> return $ (Right $ diffUTCTime time a)
      -- Just a  -> return $ show time ++ ", diff: " ++ (show $ diffUTCTime time a)

{-# NOINLINE globalTimer #-}
globalTimer = unsafePerformIO makeTimer


traceLn :: Monad m => m ()
traceLn = trace "" noop

traceNoop a = trace (timing () +.+ a) noop
  where
    {-# NOINLINE timing #-}
    timing () = either
      (parenthesize.show)
      (parenthesize.printf "%10.6fs".realToFrac)
      $ unsafePerformIO globalTimer

showDiffTimeMs = (* 10e6)



main = do
  jvc <- Just <$> openVCache 500 "db"

  forM_
    [ ("main1", main1)
    , ("main2", main2)
    , ("main3", main3)
    , ("main4", main4)
    , ("main5", main5)
    , ("main6", main6)
    ] $ \(mainName, mainFn) -> do
      ln >> ln
      putStrLn $ "Running:" +.+ mainName
      mainFn jvc
\end{code}

And for reference, here is the full output of main when executed, all at once.

Running: main1
("root content",(7,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X",(8,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X1",(VRef#1000000004::[Int],VRef#1000000004::[Int]))
("X2",([1,2,3],[1,2,3]))
(8,Just (VRef#1000000004::[Int],VRef#1000000004::[Int]))
352
end


Running: main2
(2017-07-22 11:38:45.560164 UTC) Init
(  0.000286s) Read begin
( 19.364240s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
(  9.529168s) root tip Just (0,0)
(  0.000130s) root content size 5000001
(  0.259378s) Read End

(  0.000110s) Write begin
(  0.000754s) Write end
(  0.000046s) Read begin
(  0.000036s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
(  1.557133s) root tip Just (0,0)
(  0.000338s) root content size 5000001
(  0.151536s) Read end
(  0.000111s) vcacheSync begin
( 24.629844s) vcacheSync end


Running: main3
(  0.000191s) Init
(  0.000047s) Read begin
(  2.973794s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000180s) root tip 0
(  0.000052s) Read end

(  0.000049s) Write begin
(  0.000042s) Write end
(  0.000038s) Read begin
(  0.000039s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000425s) root tip 0
(  0.000061s) Read end
(  0.000071s) vcacheSync begin
(  4.216694s) vcacheSync end


Running: main4
(  0.002252s) Init
(  0.000080s) Read begin
(  0.000142s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.099453s) root tip 0
(  0.000103s) content length 200001
(  4.851402s) Read end

(  0.000099s) Write begin
(  0.000042s) Write end
(  0.000035s) Read begin
(  0.000036s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000044s) root tip 0
(  0.000034s) content length 200001
(  0.013227s) Read end
(  0.000123s) vcacheSync begin
( 15.903327s) vcacheSync end


Running: main5
(  0.000273s) Init
(  0.000073s) Read begin
(  0.000144s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.332200s) root tip 0
(  0.000359s) Read end

(  0.001204s) Write begin
(  0.000304s) Write end
(  0.000047s) Read begin
(  0.000038s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000048s) root tip 0
(  0.000037s) Read end
(  0.000040s) vcacheSync begin
( 26.737134s) vcacheSync end


Running: main6
(  0.000159s) Init
(  0.000046s) Read begin
(  0.000108s) root content [-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,1
(  0.214008s) root tip -3
(  0.000145s) Read end

(  0.000063s) Write begin
(  0.000049s) Write end
(  0.000040s) Read begin
(  0.000044s) root content [-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,1
(  0.000057s) root tip -4
(  0.000044s) Read end
(  0.000051s) vcacheSync begin
(  0.003887s) vcacheSync end