/vcache.lhs.md Secret

## vcache.lhs.md

      
    Raw
  

              vcache.lhs.md
            
          
I've briefly tried VCache, and here is how it was


Why

I really like the potential of VCache and acid-state: Having a database layer that
is capable of being rather transparent to the app-developer, all the
while providing ACIDic persistence to an application.
For a long time I wanted to give either of these a try.
I, however, really dislike the
limitation that acid-state needs to have all the application state in-memory
all the time to work. That's a no-go for web servers for example. And also
local desktop applications that may want to store a few hundred megabytes or
some gigabytes of state on disk. I wouldn't want to put such an unnecessary
burden on my users' RAM with that kind of wastefulness.
VCache seems to offer much greater promise on the RAM-requirement front,
however it seems to enjoy a much narrower user-base as of now.
I tried to look for applications that use it, and experience reports of its users.
Without much success. So I thought to give it a try myself, and share my
exploratory findings to alleviate the aforementioned lack.
Which I hope will be useful for others.
I'm furthermore also hoping for that sharing this will lead to

more people coming forward and sharing their positive
and negative experiences with
VCache (and similar systems) before I commit more of my time on using it.
or if it turns out that there isn't many people to report on negative
experiences, then I hope greater adoption will follow.

How

In this file I tried to get an intuitive and practical, operational
understanding of how VCache works. While also stress-testing it a bit to see
what are some of its limitations and strengths.
What

Since this is a compilable and runnable Literate Haskell file, let's get the
imports out of the way first:
\begin{code}
{-# language LambdaCase #-}
{-# language RankNTypes #-}
{-# language ViewPatterns #-}
{-# language DeriveAnyClass #-}
{-# language PatternSynonyms #-}
{-# language ExtendedDefaultRules #-}

import qualified  Data.IntSet                       as IS
import qualified  Data.IntMap                       as IM
import qualified  Data.IntMap.Base                  as IMB

import            Data.Time
import            Test.Hspec
import            Data.IORef
import            Text.Printf
import            Debug.Trace
import            Control.Lens                      hiding ((.>))
import            Control.Monad
import            Database.VCache
import            Data.ByteString                   (ByteString(..))
import qualified  Data.ByteString                   as BS
import            System.IO.Unsafe
import            Control.Concurrent
import            Control.Applicative
import qualified  Data.ByteString.Char8             as BSC8

default (Int, Double, String)
\end{code}
At first I was just familiarizing myself with the library, figuring out how to
do reads and writes. Feel free to skip further to more interesting bits.
\begin{code}
main1 :: Maybe VCache -> IO ()
main1 mayVc = do
  vc <- maybe (openVCache 500 "db") return mayVc

  let vs = vcache_space vc
  let root = loadRootPVar vc (BSC8.pack "root2") (1, Nothing)
  a <- runVTx vs $ do
    r <- readPVar root
    traceShow ("root content", r) noop


    modifyPVar root (& _1 %~ (+1))
    r <- readPVar root
    traceShow ("X", r) noop
    case r of
      (a, Nothing) -> writePVar root (a, Just (vref vs [1,2,3], vref vs [1,2,3]))
      (a, Just b) -> do
        traceShow ("X1", b) noop
        traceShow ("X2", over both deref b) noop

    return r
  print a


  let x = vref vs (352 :: Int)
  print $ deref x

  threadDelay $ 50000
  putStrLn "end"
\end{code}
Running: main1
("root content",(4,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X",(5,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X1",(VRef#1000000004::[Int],VRef#1000000004::[Int]))
("X2",([1,2,3],[1,2,3]))
(5,Just (VRef#1000000004::[Int],VRef#1000000004::[Int]))
352
end
Then, since one of the main datastructures that I am considering persisting in
VCache is a hashmap, I thought I would try to see how involved it is to write a
VCacheable instance for one of its underlying structures, a binary-tree
based IntMap.
And the surprising answer: not too involved.
After having written it, it seemed rather straightforward to me:
\begin{code}
instance VCacheable a => VCacheable (IMB.IntMap a) where
  put = \case
    (IMB.Nil)         -> putWord8 0
    (IMB.Tip k v)     -> putWord8 1 >> put k >> put v
    (IMB.Bin p m a b) -> putWord8 2 >> put p >> put m >> put a >> put b
  get = getWord8 >>= \case
    0 -> return IMB.Nil
    1 -> IMB.Tip <$> get <*> get
    2 -> IMB.Bin <$> get <*> get <*> get <*> get
\end{code}
Then I thought I would stress-test it a bit.
\begin{code}
main2 :: Maybe VCache -> IO ()
main2 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar vc (BSC8.pack "root3") initial
      initial =
        IMB.fromList [] :: IMB.IntMap Int
        -- IMB.fromList [(22, 33)] :: IMB.IntMap Int
      disp = do
        r <- readPVar root
        traceNoop ("root content " ++ (take 40 $ show r))
        traceNoop ("root tip " ++ (take 40 $ show $ getMapTip r))
        traceNoop ("root content size " ++ (show $ IMB.size r))
      next = IMB.fromList $ zip [0..5000000] [0..]

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read End"

    traceLn

    traceNoop "Write begin"
    writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main2
(2017-07-22 11:06:45.215611 UTC) Init
(  0.000165s) Read begin
( 21.816722s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
( 13.292584s) root tip Just (0,0)
(  0.007298s) root content size 5000001
(  0.279039s) Read End

(  0.000155s) Write begin
(  0.000756s) Write end
(  0.000063s) Read begin
(  0.000234s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
(  1.552749s) root tip Just (0,0)
(  0.000687s) root content size 5000001
(  0.150842s) Read end
(  0.000467s) vcacheSync begin
( 24.551119s) vcacheSync end
Here I noticed someting rather strange. Yes, yes, writing and reading 5 million
items to disk ought to take some time, but why does it take the same amount of
time to read the length of this structure from disk as reading just the tip of
the binary tree?
I thought I would simplify this question, and went on to try it with a linked list.
The result was quite the same.
Note: the actual writing doesn't take place between "Write begin" and
"Write end", but rather between "vcacheSync begin" and "vcacheSync end".
AFAIK this is because VCache writes asynchronously on a separate thread
by default.
\begin{code}
main3 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar vc (BSC8.pack "root-list-of-ints") initial
      initial =
        [1] :: [Int]
      disp = do
        r <- readPVar root
        traceNoop $ "root content " ++ (take 40 $ show r)
        traceNoop $ "root tip " ++ (take 40 $ show $ head r)
      next = [0..5000000]

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"
    writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main3
(  0.000168s) Init
(  0.000083s) Read begin
(  4.804215s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000187s) root tip 0
(  0.000046s) Read end

(  0.000043s) Write begin
(  0.000058s) Write end
(  0.000040s) Read begin
(  0.000034s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000039s) root tip 0
(  0.000032s) Read end
(  0.000037s) vcacheSync begin
(  4.121671s) vcacheSync end
How come?
I wasn't quite sure, but had a few ideas: Maybe we can't take advantage of
on-demand reads because internally VCache serializes linked lists into something
more compact, e.g. an array which it needs to read in all the way? Or was it that
it reverses the list for some reason? Or is it simply the fact that VCacheable's
get function is monadic, and can't return a value until everything is read in?
Nevertheless since VRefs are alleged to be a rather transparent storage
abstraction with a pure and lazy interface, I thought I would try a linked list
that has it's next element wrapped in a vref.
\begin{code}
data ListTest = ListTest {ltVs :: VSpace, unListTest :: [Int]}

instance VCacheable ListTest where
  put = \case
    (ListTest vs [])     -> putWord8 0
    (ListTest vs (x:xs)) -> putWord8 1
      >> put (vref vs x) >> put (vref vs (ListTest vs xs))
  get = getWord8 >>= \case
    0 -> ListTest <$> getVSpace <*> pure []
    1 -> ListTest <$> getVSpace <*> (
      ((:) <$> (deref <$> get) <*> (unListTest <$> deref <$> get)))


main4 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar vc (BSC8.pack "root-lazier-list-of-ints") initial
      initial =
        ListTest vs [1] :: ListTest
      disp = do
        r <- readPVar root
        traceNoop $ "root content " ++ (take 40 $ show $ unListTest r)
        traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest r)
        traceNoop $ "content length " ++ (take 40 $ show $ length $ unListTest r)
      next = ListTest vs [0..200000]

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"
    writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"


  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main4
(  0.000157s) Init
(  0.000096s) Read begin
(  0.000150s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.152087s) root tip 0
(  0.000254s) content length 200001
(  4.715509s) Read end

(  0.000104s) Write begin
(  0.000044s) Write end
(  0.000034s) Read begin
(  0.000037s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000046s) root tip 0
(  0.000037s) content length 200001
(  0.012061s) Read end
(  0.000132s) vcacheSync begin
( 14.975086s) vcacheSync end
And indeed! Reading just the tip became much faster. But what is this?
Suddenly reading the whole length or writing the whole thing back in is
significantly slower. So much so that I had to lower the amount to a mere
200k to get comparable running times. ~25x slowdown reading, and
150x slowdown writing (in case the time-complexity relationship between them is
linear, which I have not investigated by the time of writing this).
Could vref indirection be this costly?
What if I could chunk the data so it can read those chunks internal content
eagerly quick, and read the chunks on-demand, lazily? Could that give us best
of both worlds, as in: Fast reading of the whole structure, and even faster
reading of just the tip of the structure?
\begin{code}
data ListTest2 = ListTest2 {lt2vs :: VSpace, unListTest2 :: [Int]}

instance VCacheable ListTest2 where
  put = \case
    (ListTest2 vs [])                          -> putWord8 0
    (ListTest2 vs (splitAt 500000 -> (xs,ys))) -> putWord8 1
      >> put (vref vs xs) >> put (vref vs (ListTest2 vs ys))
  get = getWord8 >>= \case
    0 -> ListTest2 <$> getVSpace <*> pure []
    1 -> ListTest2 <$> getVSpace <*> (
      ((++) <$> (deref <$> get) <*> (unListTest2 <$> deref <$> get)))


main5 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc
  let vs = vcache_space vc
      root = loadRootPVar
        vc (BSC8.pack "root-lazier-chunked-list-of-ints") initial
      initial =
        ListTest2 vs [0..5000000] :: ListTest2
      disp = do
        r <- readPVar root
        traceNoop $ "root content " ++ (take 40 $ show $ unListTest2 r)
        traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest2 r)

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"

    r <- readPVar root
    writePVar root r
    -- writePVar root next
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"


  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main5
(  0.000273s) Init
(  0.000073s) Read begin
(  0.000144s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.332200s) root tip 0
(  0.000359s) Read end

(  0.001204s) Write begin
(  0.000304s) Write end
(  0.000047s) Read begin
(  0.000038s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000048s) root tip 0
(  0.000037s) Read end
(  0.000040s) vcacheSync begin
( 26.737134s) vcacheSync end
And indeed it does just that. Wonderful! But there seems to be a bit of a
downside: writing the same data structure to disk again, even without any
modification seems to take 27 seconds. It seems to be slower than without
chunking, which took around 4 seconds. But even then 4 seconds for unmodified
data seems superfluous. Can't we make this near instantaneous?
To have best of all three worlds?
From what I know, VCache internally fully de-duplicates data, meaning that no
change is actually taking place in the database when we write un-modified.
And then I thought that it
still needs to compute the bit-wise representation of the whole data-structure
to decide if there are differences or not. It unfortunately doesn't have access
to check if it points to the same memory region as it was read from.
Could we make use of Merkle trees to keep track of changes? That way
we could
know from the tip of the list that nothing has changed and nothing needs to be
written.
Or could it be as simple as putting the VReferences to be a part of the data
structure, since it already identifies an immutable chunk?
This is worth a try.
\begin{code}
data ListTest3 = ListTest3
  { lt3vs :: VSpace
  , lt3a1 :: [Int]
  , lt3a2 :: Maybe (VRef ListTest3)
  }

instance VCacheable ListTest3 where
  put (ListTest3 vs a b) = put a >> put b
  get = ListTest3 <$> getVSpace <*> get <*> get
\end{code}
Whoa, what a simple instance.
But of course it is, since we are pushing some computation into the
toListTest3 and unListTest3 functions:
\begin{code}
toListTest3 :: VSpace -> [Int] -> ListTest3
toListTest3 vs (splitAt l3ChunkSize -> (xs, ys)) = ListTest3 vs xs $ case ys of
  [] -> Noth
  ys -> Just $ vref vs $ toListTest3 vs ys

unListTest3 :: ListTest3 -> [Int]
unListTest3 (ListTest3 vs a1 (Noth              )) = a1
unListTest3 (ListTest3 vs a1 (Just (deref -> a2))) = a1 ++ unListTest3 a2

l3ChunkSize = 500000
\end{code}
Let's give this a run:
\begin{code}
main6 mayVc = do
  traceNoop "Init"

  vc <- maybe (openVCache 500 "db") return mayVc

  let
    vs = vcache_space vc
    root = loadRootPVar vc (BSC8.pack "test-for-faster-writes") initial
    initial =
      toListTest3 vs [0..5000000] :: ListTest3
    disp = do
      r <- readPVar root
      traceNoop $ "root content " ++ (take 40 $ show $ unListTest3 r)
      traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest3 r)
    -- next = toListTest3 vs [0..5000000] :: ListTest3

  runVTx vs $ do
    traceNoop "Read begin"
    disp
    traceNoop "Read end"

    traceLn

    traceNoop "Write begin"
    r <- readPVar root

    -- writePVar root $ toListTest3 vs.unListTest3 $ r
    -- writePVar root $ toListTest3 vs.unListTest3 $ r
    writePVar root $ (sub 1 $ head $ unListTest3 r) `l3cons` r
    traceNoop "Write end"
    traceNoop "Read begin"
    disp
    traceNoop "Read end"


  traceNoop "vcacheSync begin"
  vcacheSync vs
  traceNoop "vcacheSync end"
\end{code}
Running: main6
(  0.000159s) Init
(  0.000046s) Read begin
(  0.000108s) root content [-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,1
(  0.214008s) root tip -3
(  0.000145s) Read end

(  0.000063s) Write begin
(  0.000049s) Write end
(  0.000040s) Read begin
(  0.000044s) root content [-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,1
(  0.000057s) root tip -4
(  0.000044s) Read end
(  0.000051s) vcacheSync begin
(  0.003887s) vcacheSync end
Magnificent! Suddenly everything is fast!
There are only two smaller(?) downsides I can think of:


Initial insertion of the data will still incur the ~7x (or potentially
non-linear) indirection cost of
chunking with vrefs. This seems to be an acceptable trade-off for me.
And maybe it can still be optimized further, or even optimized away?


For writing, modifying the data-structure we'll have to use custom-made
functions that are aware of the internal chunking. E.g. (:) needs to be
re-written like:


\begin{code}
l3cons a as@(ListTest3 vs a1 a2) = if length a1 < l3ChunkSize
  then ListTest3 vs (a:a1)  a2
  else ListTest3 vs [a]     (Just $ vref vs as)
\end{code}
I do wonder if these chunking-aware modifying functions can be derived generically
somehow. E.g. for ++. And for other chunked datastructures too. Any ideas?
Or could this be perhaps tackled on the layer of VCache that is transparent for
higher level users of the API?
Closure

That's it for today. If you have any answers to my questions above,
I'd be glad to hear them. Other questions and constructive comments are also welcome.
Let's close down with some miscellaneous top level bindings and the main
function (which when executed has created a 321MiB db file on my
hard drive).
\begin{code}
sub = subtract

parenthasize a = "(" ++ a ++ ")"
a +.+ b = a ++ " " ++ b

getMapTip = \case
  (IMB.Nil)         -> Nothing
  (IMB.Tip k v)     -> Just (k, v)
  (IMB.Bin p m a b) -> getMapTip a <|> getMapTip b

ln = putStrLn ""

pattern Noth = Nothing

noop :: Monad m => m ()
noop = return ()

-- makeTimer :: IO (IO String)
makeTimer :: IO (IO (Either UTCTime NominalDiffTime))
makeTimer = do
  prevTime <- newIORef Nothing
  return $ do
    pTime <- readIORef prevTime
    time <- getCurrentTime
    writeIORef prevTime $ Just time
    case pTime of
      Nothing -> return $ Left time
      Just a  -> return $ (Right $ diffUTCTime time a)
      -- Just a  -> return $ show time ++ ", diff: " ++ (show $ diffUTCTime time a)

{- NOINLINE globalTimer -}
globalTimer = unsafePerformIO makeTimer


traceLn :: Monad m => m ()
traceLn = trace "" noop

traceNoop a = trace (timing () +.+ a) noop
  where
    {- NOINLINE timing -}
    timing () = either
      (parenthasize.show)
      (parenthasize.printf "%10.6fs".realToFrac)
      $ unsafePerformIO globalTimer

showDiffTimeMs = (* 10e6)


main = do
  jvc <- Just <$> openVCache 500 "db"

  forM
    [ ("main1", main1)
    , ("main2", main2)
    , ("main3", main3)
    , ("main4", main4)
    , ("main5", main5)
    , ("main6", main6)
    ] $ \(mainName, mainFn) -> do
      ln >> ln
      putStrLn $ "Running:" +.+ mainName
      mainFn jvc
\end{code}
And for reference, here is the full output of main when executed, all at once.
Running: main1
("root content",(7,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X",(8,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X1",(VRef#1000000004::[Int],VRef#1000000004::[Int]))
("X2",([1,2,3],[1,2,3]))
(8,Just (VRef#1000000004::[Int],VRef#1000000004::[Int]))
352
end


Running: main2
(2017-07-22 11:38:45.560164 UTC) Init
(  0.000286s) Read begin
( 19.364240s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
(  9.529168s) root tip Just (0,0)
(  0.000130s) root content size 5000001
(  0.259378s) Read End

(  0.000110s) Write begin
(  0.000754s) Write end
(  0.000046s) Read begin
(  0.000036s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
(  1.557133s) root tip Just (0,0)
(  0.000338s) root content size 5000001
(  0.151536s) Read end
(  0.000111s) vcacheSync begin
( 24.629844s) vcacheSync end


Running: main3
(  0.000191s) Init
(  0.000047s) Read begin
(  2.973794s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000180s) root tip 0
(  0.000052s) Read end

(  0.000049s) Write begin
(  0.000042s) Write end
(  0.000038s) Read begin
(  0.000039s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000425s) root tip 0
(  0.000061s) Read end
(  0.000071s) vcacheSync begin
(  4.216694s) vcacheSync end


Running: main4
(  0.002252s) Init
(  0.000080s) Read begin
(  0.000142s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.099453s) root tip 0
(  0.000103s) content length 200001
(  4.851402s) Read end

(  0.000099s) Write begin
(  0.000042s) Write end
(  0.000035s) Read begin
(  0.000036s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000044s) root tip 0
(  0.000034s) content length 200001
(  0.013227s) Read end
(  0.000123s) vcacheSync begin
( 15.903327s) vcacheSync end


Running: main5
(  0.000273s) Init
(  0.000073s) Read begin
(  0.000144s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.332200s) root tip 0
(  0.000359s) Read end

(  0.001204s) Write begin
(  0.000304s) Write end
(  0.000047s) Read begin
(  0.000038s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
(  0.000048s) root tip 0
(  0.000037s) Read end
(  0.000040s) vcacheSync begin
( 26.737134s) vcacheSync end


Running: main6
(  0.000159s) Init
(  0.000046s) Read begin
(  0.000108s) root content [-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,1
(  0.214008s) root tip -3
(  0.000145s) Read end

(  0.000063s) Write begin
(  0.000049s) Write end
(  0.000040s) Read begin
(  0.000044s) root content [-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,1
(  0.000057s) root tip -4
(  0.000044s) Read end
(  0.000051s) vcacheSync begin
(  0.003887s) vcacheSync end