I really like the potential of VCache and acid-state: Having a database layer that is capable of being rather transparent to the app-developer, all the while providing ACIDic persistence to an application. For a long time I wanted to give either of these a try.
I, however, really dislike the limitation that acid-state needs to have all the application state in-memory all the time to work. That's a no-go for web servers for example. And also local desktop applications that may want to store a few hundred megabytes or some gigabytes of state on disk. I wouldn't want to put such an unnecessary burden on my users' RAM with that kind of wastefulness.
VCache seems to offer much greater promise on the RAM-requirement front, however it seems to enjoy a much narrower user-base as of now.
I tried to look for applications that use it, and experience reports of its users. Without much success. So I thought to give it a try myself, and share my exploratory findings to alleviate the aforementioned lack. Which I hope will be useful for others.
I'm furthermore also hoping for that sharing this will lead to
- more people coming forward and sharing their positive and negative experiences with VCache (and similar systems) before I commit more of my time on using it.
- or if it turns out that there isn't many people to report on negative experiences, then I hope greater adoption will follow.
In this file I tried to get an intuitive and practical, operational understanding of how VCache works. While also stress-testing it a bit to see what are some of its limitations and strengths.
Since this is a compilable and runnable Literate Haskell file, let's get the imports out of the way first:
\begin{code}
{-# language LambdaCase #-}
{-# language RankNTypes #-}
{-# language ViewPatterns #-}
{-# language DeriveAnyClass #-}
{-# language PatternSynonyms #-}
{-# language ExtendedDefaultRules #-}
import qualified Data.IntSet as IS
import qualified Data.IntMap as IM
import qualified Data.IntMap.Base as IMB
import Data.Time
import Test.Hspec
import Data.IORef
import Text.Printf
import Debug.Trace
import Control.Lens hiding ((.>))
import Control.Monad
import Database.VCache
import Data.ByteString (ByteString(..))
import qualified Data.ByteString as BS
import System.IO.Unsafe
import Control.Concurrent
import Control.Applicative
import qualified Data.ByteString.Char8 as BSC8
default (Int, Double, String)
\end{code}
At first I was just familiarizing myself with the library, figuring out how to do reads and writes. Feel free to skip further to more interesting bits.
\begin{code}
main1 :: Maybe VCache -> IO ()
main1 mayVc = do
vc <- maybe (openVCache 500 "db") return mayVc
let vs = vcache_space vc
let root = loadRootPVar vc (BSC8.pack "root2") (1, Nothing)
a <- runVTx vs $ do
r <- readPVar root
traceShow ("root content", r) noop
modifyPVar root (& _1 %~ (+1))
r <- readPVar root
traceShow ("X", r) noop
case r of
(a, Nothing) -> writePVar root (a, Just (vref vs [1,2,3], vref vs [1,2,3]))
(a, Just b) -> do
traceShow ("X1", b) noop
traceShow ("X2", over both deref b) noop
return r
print a
let x = vref vs (352 :: Int)
print $ deref x
threadDelay $ 50000
putStrLn "end"
\end{code}
Running: main1
("root content",(4,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X",(5,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X1",(VRef#1000000004::[Int],VRef#1000000004::[Int]))
("X2",([1,2,3],[1,2,3]))
(5,Just (VRef#1000000004::[Int],VRef#1000000004::[Int]))
352
end
Then, since one of the main datastructures that I am considering persisting in
VCache is a hashmap, I thought I would try to see how involved it is to write a
VCacheable
instance for one of its underlying structures, a binary-tree
based IntMap
.
And the surprising answer: not too involved. After having written it, it seemed rather straightforward to me:
\begin{code}
instance VCacheable a => VCacheable (IMB.IntMap a) where
put = \case
(IMB.Nil) -> putWord8 0
(IMB.Tip k v) -> putWord8 1 >> put k >> put v
(IMB.Bin p m a b) -> putWord8 2 >> put p >> put m >> put a >> put b
get = getWord8 >>= \case
0 -> return IMB.Nil
1 -> IMB.Tip <$> get <*> get
2 -> IMB.Bin <$> get <*> get <*> get <*> get
\end{code}
Then I thought I would stress-test it a bit.
\begin{code}
main2 :: Maybe VCache -> IO ()
main2 mayVc = do
traceNoop "Init"
vc <- maybe (openVCache 500 "db") return mayVc
let vs = vcache_space vc
root = loadRootPVar vc (BSC8.pack "root3") initial
initial =
IMB.fromList [] :: IMB.IntMap Int
-- IMB.fromList [(22, 33)] :: IMB.IntMap Int
disp = do
r <- readPVar root
traceNoop ("root content " ++ (take 40 $ show r))
traceNoop ("root tip " ++ (take 40 $ show $ getMapTip r))
traceNoop ("root content size " ++ (show $ IMB.size r))
next = IMB.fromList $ zip [0..5000000] [0..]
runVTx vs $ do
traceNoop "Read begin"
disp
traceNoop "Read End"
traceLn
traceNoop "Write begin"
writePVar root next
traceNoop "Write end"
traceNoop "Read begin"
disp
traceNoop "Read end"
traceNoop "vcacheSync begin"
vcacheSync vs
traceNoop "vcacheSync end"
\end{code}
Running: main2
(2017-07-22 11:06:45.215611 UTC) Init
( 0.000165s) Read begin
( 21.816722s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
( 13.292584s) root tip Just (0,0)
( 0.007298s) root content size 5000001
( 0.279039s) Read End
( 0.000155s) Write begin
( 0.000756s) Write end
( 0.000063s) Read begin
( 0.000234s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
( 1.552749s) root tip Just (0,0)
( 0.000687s) root content size 5000001
( 0.150842s) Read end
( 0.000467s) vcacheSync begin
( 24.551119s) vcacheSync end
Here I noticed someting rather strange. Yes, yes, writing and reading 5 million items to disk ought to take some time, but why does it take the same amount of time to read the length of this structure from disk as reading just the tip of the binary tree?
I thought I would simplify this question, and went on to try it with a linked list. The result was quite the same.
Note: the actual writing doesn't take place between "Write begin" and "Write end", but rather between "vcacheSync begin" and "vcacheSync end". AFAIK this is because VCache writes asynchronously on a separate thread by default.
\begin{code}
main3 mayVc = do
traceNoop "Init"
vc <- maybe (openVCache 500 "db") return mayVc
let vs = vcache_space vc
root = loadRootPVar vc (BSC8.pack "root-list-of-ints") initial
initial =
[1] :: [Int]
disp = do
r <- readPVar root
traceNoop $ "root content " ++ (take 40 $ show r)
traceNoop $ "root tip " ++ (take 40 $ show $ head r)
next = [0..5000000]
runVTx vs $ do
traceNoop "Read begin"
disp
traceNoop "Read end"
traceLn
traceNoop "Write begin"
writePVar root next
traceNoop "Write end"
traceNoop "Read begin"
disp
traceNoop "Read end"
traceNoop "vcacheSync begin"
vcacheSync vs
traceNoop "vcacheSync end"
\end{code}
Running: main3
( 0.000168s) Init
( 0.000083s) Read begin
( 4.804215s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000187s) root tip 0
( 0.000046s) Read end
( 0.000043s) Write begin
( 0.000058s) Write end
( 0.000040s) Read begin
( 0.000034s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000039s) root tip 0
( 0.000032s) Read end
( 0.000037s) vcacheSync begin
( 4.121671s) vcacheSync end
How come?
I wasn't quite sure, but had a few ideas: Maybe we can't take advantage of
on-demand reads because internally VCache serializes linked lists into something
more compact, e.g. an array which it needs to read in all the way? Or was it that
it reverses the list for some reason? Or is it simply the fact that VCacheable
's
get
function is monadic, and can't return a value until everything is read in?
Nevertheless since VRef
s are alleged to be a rather transparent storage
abstraction with a pure and lazy interface, I thought I would try a linked list
that has it's next element wrapped in a vref
.
\begin{code}
data ListTest = ListTest {ltVs :: VSpace, unListTest :: [Int]}
instance VCacheable ListTest where
put = \case
(ListTest vs []) -> putWord8 0
(ListTest vs (x:xs)) -> putWord8 1
>> put (vref vs x) >> put (vref vs (ListTest vs xs))
get = getWord8 >>= \case
0 -> ListTest <$> getVSpace <*> pure []
1 -> ListTest <$> getVSpace <*> (
((:) <$> (deref <$> get) <*> (unListTest <$> deref <$> get)))
main4 mayVc = do
traceNoop "Init"
vc <- maybe (openVCache 500 "db") return mayVc
let vs = vcache_space vc
root = loadRootPVar vc (BSC8.pack "root-lazier-list-of-ints") initial
initial =
ListTest vs [1] :: ListTest
disp = do
r <- readPVar root
traceNoop $ "root content " ++ (take 40 $ show $ unListTest r)
traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest r)
traceNoop $ "content length " ++ (take 40 $ show $ length $ unListTest r)
next = ListTest vs [0..200000]
runVTx vs $ do
traceNoop "Read begin"
disp
traceNoop "Read end"
traceLn
traceNoop "Write begin"
writePVar root next
traceNoop "Write end"
traceNoop "Read begin"
disp
traceNoop "Read end"
traceNoop "vcacheSync begin"
vcacheSync vs
traceNoop "vcacheSync end"
\end{code}
Running: main4
( 0.000157s) Init
( 0.000096s) Read begin
( 0.000150s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.152087s) root tip 0
( 0.000254s) content length 200001
( 4.715509s) Read end
( 0.000104s) Write begin
( 0.000044s) Write end
( 0.000034s) Read begin
( 0.000037s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000046s) root tip 0
( 0.000037s) content length 200001
( 0.012061s) Read end
( 0.000132s) vcacheSync begin
( 14.975086s) vcacheSync end
And indeed! Reading just the tip became much faster. But what is this? Suddenly reading the whole length or writing the whole thing back in is significantly slower. So much so that I had to lower the amount to a mere 200k to get comparable running times. ~25x slowdown reading, and 150x slowdown writing (in case the time-complexity relationship between them is linear, which I have not investigated by the time of writing this).
Could vref
indirection be this costly?
What if I could chunk the data so it can read those chunks internal content eagerly quick, and read the chunks on-demand, lazily? Could that give us best of both worlds, as in: Fast reading of the whole structure, and even faster reading of just the tip of the structure?
\begin{code}
data ListTest2 = ListTest2 {lt2vs :: VSpace, unListTest2 :: [Int]}
instance VCacheable ListTest2 where
put = \case
(ListTest2 vs []) -> putWord8 0
(ListTest2 vs (splitAt 500000 -> (xs,ys))) -> putWord8 1
>> put (vref vs xs) >> put (vref vs (ListTest2 vs ys))
get = getWord8 >>= \case
0 -> ListTest2 <$> getVSpace <*> pure []
1 -> ListTest2 <$> getVSpace <*> (
((++) <$> (deref <$> get) <*> (unListTest2 <$> deref <$> get)))
main5 mayVc = do
traceNoop "Init"
vc <- maybe (openVCache 500 "db") return mayVc
let vs = vcache_space vc
root = loadRootPVar
vc (BSC8.pack "root-lazier-chunked-list-of-ints") initial
initial =
ListTest2 vs [0..5000000] :: ListTest2
disp = do
r <- readPVar root
traceNoop $ "root content " ++ (take 40 $ show $ unListTest2 r)
traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest2 r)
runVTx vs $ do
traceNoop "Read begin"
disp
traceNoop "Read end"
traceLn
traceNoop "Write begin"
r <- readPVar root
writePVar root r
-- writePVar root next
traceNoop "Write end"
traceNoop "Read begin"
disp
traceNoop "Read end"
traceNoop "vcacheSync begin"
vcacheSync vs
traceNoop "vcacheSync end"
\end{code}
Running: main5
( 0.000273s) Init
( 0.000073s) Read begin
( 0.000144s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.332200s) root tip 0
( 0.000359s) Read end
( 0.001204s) Write begin
( 0.000304s) Write end
( 0.000047s) Read begin
( 0.000038s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000048s) root tip 0
( 0.000037s) Read end
( 0.000040s) vcacheSync begin
( 26.737134s) vcacheSync end
And indeed it does just that. Wonderful! But there seems to be a bit of a downside: writing the same data structure to disk again, even without any modification seems to take 27 seconds. It seems to be slower than without chunking, which took around 4 seconds. But even then 4 seconds for unmodified data seems superfluous. Can't we make this near instantaneous? To have best of all three worlds?
From what I know, VCache internally fully de-duplicates data, meaning that no change is actually taking place in the database when we write un-modified. And then I thought that it still needs to compute the bit-wise representation of the whole data-structure to decide if there are differences or not. It unfortunately doesn't have access to check if it points to the same memory region as it was read from.
Could we make use of Merkle trees to keep track of changes? That way we could know from the tip of the list that nothing has changed and nothing needs to be written.
Or could it be as simple as putting the VRef
erences to be a part of the data
structure, since it already identifies an immutable chunk?
This is worth a try.
\begin{code}
data ListTest3 = ListTest3
{ lt3vs :: VSpace
, lt3a1 :: [Int]
, lt3a2 :: Maybe (VRef ListTest3)
}
instance VCacheable ListTest3 where
put (ListTest3 vs a b) = put a >> put b
get = ListTest3 <$> getVSpace <*> get <*> get
\end{code}
Whoa, what a simple instance.
But of course it is, since we are pushing some computation into the
toListTest3
and unListTest3
functions:
\begin{code}
toListTest3 :: VSpace -> [Int] -> ListTest3
toListTest3 vs (splitAt l3ChunkSize -> (xs, ys)) = ListTest3 vs xs $ case ys of
[] -> Noth
ys -> Just $ vref vs $ toListTest3 vs ys
unListTest3 :: ListTest3 -> [Int]
unListTest3 (ListTest3 vs a1 (Noth )) = a1
unListTest3 (ListTest3 vs a1 (Just (deref -> a2))) = a1 ++ unListTest3 a2
l3ChunkSize = 500000
\end{code}
Let's give this a run:
\begin{code}
main6 mayVc = do
traceNoop "Init"
vc <- maybe (openVCache 500 "db") return mayVc
let
vs = vcache_space vc
root = loadRootPVar vc (BSC8.pack "test-for-faster-writes") initial
initial =
toListTest3 vs [0..5000000] :: ListTest3
disp = do
r <- readPVar root
traceNoop $ "root content " ++ (take 40 $ show $ unListTest3 r)
traceNoop $ "root tip " ++ (take 40 $ show $ head $ unListTest3 r)
-- next = toListTest3 vs [0..5000000] :: ListTest3
runVTx vs $ do
traceNoop "Read begin"
disp
traceNoop "Read end"
traceLn
traceNoop "Write begin"
r <- readPVar root
-- writePVar root $ toListTest3 vs.unListTest3 $ r
-- writePVar root $ toListTest3 vs.unListTest3 $ r
writePVar root $ (sub 1 $ head $ unListTest3 r) `l3cons` r
traceNoop "Write end"
traceNoop "Read begin"
disp
traceNoop "Read end"
traceNoop "vcacheSync begin"
vcacheSync vs
traceNoop "vcacheSync end"
\end{code}
Running: main6
( 0.000159s) Init
( 0.000046s) Read begin
( 0.000108s) root content [-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,1
( 0.214008s) root tip -3
( 0.000145s) Read end
( 0.000063s) Write begin
( 0.000049s) Write end
( 0.000040s) Read begin
( 0.000044s) root content [-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,1
( 0.000057s) root tip -4
( 0.000044s) Read end
( 0.000051s) vcacheSync begin
( 0.003887s) vcacheSync end
Magnificent! Suddenly everything is fast!
There are only two smaller(?) downsides I can think of:
-
Initial insertion of the data will still incur the ~7x (or potentially non-linear) indirection cost of chunking with
vref
s. This seems to be an acceptable trade-off for me. And maybe it can still be optimized further, or even optimized away? -
For writing, modifying the data-structure we'll have to use custom-made functions that are aware of the internal chunking. E.g.
(:)
needs to be re-written like:
\begin{code}
l3cons a as@(ListTest3 vs a1 a2) = if length a1 < l3ChunkSize
then ListTest3 vs (a:a1) a2
else ListTest3 vs [a] (Just $ vref vs as)
\end{code}
I do wonder if these chunking-aware modifying functions can be derived generically
somehow. E.g. for ++
. And for other chunked datastructures too. Any ideas?
Or could this be perhaps tackled on the layer of VCache that is transparent for
higher level users of the API?
That's it for today. If you have any answers to my questions above, I'd be glad to hear them. Other questions and constructive comments are also welcome.
Let's close down with some miscellaneous top level bindings and the main function (which when executed has created a 321MiB db file on my hard drive).
\begin{code}
sub = subtract
parenthasize a = "(" ++ a ++ ")"
a +.+ b = a ++ " " ++ b
getMapTip = \case
(IMB.Nil) -> Nothing
(IMB.Tip k v) -> Just (k, v)
(IMB.Bin p m a b) -> getMapTip a <|> getMapTip b
ln = putStrLn ""
pattern Noth = Nothing
noop :: Monad m => m ()
noop = return ()
-- makeTimer :: IO (IO String)
makeTimer :: IO (IO (Either UTCTime NominalDiffTime))
makeTimer = do
prevTime <- newIORef Nothing
return $ do
pTime <- readIORef prevTime
time <- getCurrentTime
writeIORef prevTime $ Just time
case pTime of
Nothing -> return $ Left time
Just a -> return $ (Right $ diffUTCTime time a)
-- Just a -> return $ show time ++ ", diff: " ++ (show $ diffUTCTime time a)
{- NOINLINE globalTimer -}
globalTimer = unsafePerformIO makeTimer
traceLn :: Monad m => m ()
traceLn = trace "" noop
traceNoop a = trace (timing () +.+ a) noop
where
{- NOINLINE timing -}
timing () = either
(parenthasize.show)
(parenthasize.printf "%10.6fs".realToFrac)
$ unsafePerformIO globalTimer
showDiffTimeMs = (* 10e6)
main = do
jvc <- Just <$> openVCache 500 "db"
forM
[ ("main1", main1)
, ("main2", main2)
, ("main3", main3)
, ("main4", main4)
, ("main5", main5)
, ("main6", main6)
] $ \(mainName, mainFn) -> do
ln >> ln
putStrLn $ "Running:" +.+ mainName
mainFn jvc
\end{code}
And for reference, here is the full output of main when executed, all at once.
Running: main1
("root content",(7,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X",(8,Just (VRef#1000000004::[Int],VRef#1000000004::[Int])))
("X1",(VRef#1000000004::[Int],VRef#1000000004::[Int]))
("X2",([1,2,3],[1,2,3]))
(8,Just (VRef#1000000004::[Int],VRef#1000000004::[Int]))
352
end
Running: main2
(2017-07-22 11:38:45.560164 UTC) Init
( 0.000286s) Read begin
( 19.364240s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
( 9.529168s) root tip Just (0,0)
( 0.000130s) root content size 5000001
( 0.259378s) Read End
( 0.000110s) Write begin
( 0.000754s) Write end
( 0.000046s) Read begin
( 0.000036s) root content fromList [(0,0),(1,1),(2,2),(3,3),(4,4),
( 1.557133s) root tip Just (0,0)
( 0.000338s) root content size 5000001
( 0.151536s) Read end
( 0.000111s) vcacheSync begin
( 24.629844s) vcacheSync end
Running: main3
( 0.000191s) Init
( 0.000047s) Read begin
( 2.973794s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000180s) root tip 0
( 0.000052s) Read end
( 0.000049s) Write begin
( 0.000042s) Write end
( 0.000038s) Read begin
( 0.000039s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000425s) root tip 0
( 0.000061s) Read end
( 0.000071s) vcacheSync begin
( 4.216694s) vcacheSync end
Running: main4
( 0.002252s) Init
( 0.000080s) Read begin
( 0.000142s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.099453s) root tip 0
( 0.000103s) content length 200001
( 4.851402s) Read end
( 0.000099s) Write begin
( 0.000042s) Write end
( 0.000035s) Read begin
( 0.000036s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000044s) root tip 0
( 0.000034s) content length 200001
( 0.013227s) Read end
( 0.000123s) vcacheSync begin
( 15.903327s) vcacheSync end
Running: main5
( 0.000273s) Init
( 0.000073s) Read begin
( 0.000144s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.332200s) root tip 0
( 0.000359s) Read end
( 0.001204s) Write begin
( 0.000304s) Write end
( 0.000047s) Read begin
( 0.000038s) root content [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
( 0.000048s) root tip 0
( 0.000037s) Read end
( 0.000040s) vcacheSync begin
( 26.737134s) vcacheSync end
Running: main6
( 0.000159s) Init
( 0.000046s) Read begin
( 0.000108s) root content [-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,1
( 0.214008s) root tip -3
( 0.000145s) Read end
( 0.000063s) Write begin
( 0.000049s) Write end
( 0.000040s) Read begin
( 0.000044s) root content [-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,1
( 0.000057s) root tip -4
( 0.000044s) Read end
( 0.000051s) vcacheSync begin
( 0.003887s) vcacheSync end