@PharkMillups
Created July 15, 2010 16:37
ronr_ # Basically, we're looking into NoSQL databases as a datastore solution for our
application. So far we've concentrated on Cassandra, and while things went well, there
are still a few issues we're trying to overcome.
ronr_ # justinsheehy: first thing, we were wondering if riak has any caching of its
data. If so, when is it filled: on writes, reads, or both?
justinsheehy # ronr_: two answers to start with: first, internal caching depends on your
choice of backend. second, the http interface allows you to easily put a proxy cache
(such as varnish, squid, etc) in front of Riak and have it Just Work.
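
A minimal sketch of the proxy-cache point, assuming Riak's default HTTP interface on
localhost:8098; the 'sessions' bucket and 'user42' key are made up for illustration. A
plain GET already returns ETag and Last-Modified validators, which is what lets a cache
like varnish or squid sit in front of Riak without any application changes:

    from urllib.request import urlopen

    # Fetch an object over Riak's HTTP interface (hypothetical bucket/key).
    with urlopen("http://127.0.0.1:8098/riak/sessions/user42") as resp:
        body = resp.read()
        print(resp.headers.get("ETag"))           # validator a proxy can revalidate against
        print(resp.headers.get("Last-Modified"))  # likewise usable by varnish/squid
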
ronr_ # justinsheehy: can you elaborate a bit more on the backend?
justinsheehy # ronr_: the main thing you should figure out first (and maybe you already have)
is the access patterns and data model of your actual application. for instance, bitcask
(one of the backends, and the default) is structured to take advantage of typical operating
system / filesystem caching instead of doing the caching itself, and that works out very well
for a number of workloads
justinsheehy # ronr_: another choice, innostore, has all of the configurability of embedded
innodb if that is what you desire, and is suitable for some other situations
ronr_ # justinsheehy: our app is very read/write/delete intensive. all data that is written will
be read, normally within a matter of seconds (or minutes in the worst case), and after it is read,
there's a high probability it'll be deleted.
justinsheehy # ronr_: that's an interesting and relatively unusual (but not unheard-of) one.
I suggest that you start out with riak using bitcask in default mode and do some benchmarking
of your access pattern, since you do seem to have a handle on the shape of it. people here will
certainly help you to tune if needed and even to point you in other directions if a better fit
emerges based on your needs and results.
ronr_ # justinsheehy: I hope you don't mind me saying, but from the only benchmark we
found so far, it seemed that riak isn't too strong in performance. the design concept
behind it is interesting, but we're a bit skeptical about the performance. am I being
completely wrong or is there something to it? (it's legitimate if performance is not
the main issue of riak, of course)
justinsheehy # ronr_: it depends on the use case, but in many situations riak's
performance is excellent. without knowing the machines, app configuration, cluster
configuration, and the benchmark code used it is hard to evaluate your specific
situation. that's enough info that it might work better on the mailing list.
benblack # ronr_: which benchmark did you find?
justinsheehy # ronr_: short answer is that while we're always working on improvement
we are actually quite happy with performance overall right now, so please do post your
benchmarking methods and details for discussion
ronr_ # benblack: I don't recall exactly. Can look for it.
benblack # ronr_: do you recall what the performance was that seemed slow?
ronr_ # justinsheehy: okay, point taken. I do have an additional question if you
don't mind. How does riak handle data deletions?
seancribbs # ronr_: creates tombstones, then reaps them with read repair
ronr_ # like cassandra?
seancribbs # similar concept; beyond that, I'm not sure
benblack # not implemented at all the same way
ronr_ # the basic question is whether deleted data is flushed to the disk if it was
deleted before the original data was flushed to the disk.
justinsheehy # ronr_: right. similar in the most basic concept, but very different
in actual implementation. deletion works.
benblack # seancribbs: in cassandra, tombstones are created. after GCGraceSeconds, those
tombstones are eligible for removal, and when a major compaction is next run they are eliminated.
seancribbs # benblack: ah, very different then
justinsheehy # ronr_: ah. that's a very specific question.
ronr_ # benblack: the benchmark ran on 10 fairly weak servers and showed results that
weren't much higher than our single cassandra node on a semi-strong machine. of course
there are _many_ other variables, but it gives a general sense of it. justinsheehy: :)
justinsheehy # ronr_: in general, yes. the data will be written. if you're really deleting
that fast and don't want to touch storage, perhaps you want to manage that in your
application instead of your storage system.
justinsheehy # ronr_: I suggest running the same benchmark against the same hardware
for starters. what you describe isn't a useful comparison
ronr_ # I realize that. justinsheehy: I can't. I need to make sure the data persists
in case of app failure.
justinsheehy # ronr_: then you can't ask for it to not hit disk. :-)
ronr_ # justinsheehy: of course it should hit the disk. again, I may have made an
assumption here. cassandra has a commitlog which is immediately written to disk,
whereas the 'organized' data is flushed to disk every so often based on many
variables. if cassandra drops, when it comes up it'll replay the necessary commit logs.
seancribbs # ronr_: bitcask is a bunch simpler than that (the default riak backend)
justinsheehy # ronr_: yeah, with bitcask the commitlog _is_ the canonical store
justinsheehy # ronr_: very, very different storage model than cassandra. much simpler
in exchange for not having some of the same features.
benblack # ronr_: cassandra does it that way to keep the stuff on disk sorted. bitcask
doesn't keep things sorted.
justinsheehy # right
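
A toy sketch of the storage model being described, not Riak's actual code: bitcask-style
storage appends every write (including deletes, as tombstones) to a log and keeps an
in-memory "keydir" pointing at the latest value for each key, so nothing on disk has to
be kept sorted the way Cassandra's SSTables are:

    import os

    TOMBSTONE = b"__tombstone__"

    class TinyBitcask:
        """Append-only log plus an in-memory key -> (offset, length) directory."""

        def __init__(self, path):
            self.f = open(path, "a+b")
            self.keydir = {}

        def put(self, key, value):
            self.f.seek(0, os.SEEK_END)
            offset = self.f.tell()
            self.f.write(value)           # every write goes to the end of the log
            self.f.flush()
            self.keydir[key] = (offset, len(value))

        def get(self, key):
            entry = self.keydir.get(key)
            if entry is None:
                return None
            offset, length = entry
            self.f.seek(offset)           # one seek per read
            return self.f.read(length)

        def delete(self, key):
            self.put(key, TOMBSTONE)      # the delete itself is persisted on disk
            del self.keydir[key]          # later merges reclaim the dead entries

    db = TinyBitcask("/tmp/tiny_bitcask.log")
    db.put(b"k1", b"hello")
    print(db.get(b"k1"))   # b'hello'
    db.delete(b"k1")
    print(db.get(b"k1"))   # None, even though the tombstone record hit the disk
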
ronr_ # okay, so what's the end result of it not being sorted? what functionality is removed?
seancribbs # range queries, primarily
ronr_ # yeah, but with the Random Partitioner in cassandra, range queries are pretty useless.
so that's not necessarily a downside. anything else?
bbrowning # ronr_: You still get ranged queries across columns in a key - hard comparison
to Riak though b/c data model is different
ronr_ # I imagine that's true. I actually forgot to look into riak's datamodel :}
benblack # um.
* ronr_ # feels kinda dumb
seancribbs # ronr_: riak is a key-value store, with a few extras. cassandra is more
like bigtable. BIG differences
ronr_ # so it's key to a single value?
seancribbs # yes. it is mostly agnostic to the internal structure of the value
benblack # ...until you query ;)
seancribbs # well, then it helps to use JSON ;) or Erlang terms
ronr_ # what if I just want to store a serialized object per key?
seancribbs # you can do that. Riak won't care
ronr_ # then again, there are other problems.
justinsheehy # people do that often
seancribbs # if you're using HTTP, specify a content type though
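
A hedged example of storing a serialized object per key over the HTTP interface, assuming
the default localhost:8098 endpoint; the 'messages' bucket, key, and payload are invented
here. The Content-Type is what tells Riak (and any later map/reduce job) how to treat the
otherwise opaque value:

    import json
    from urllib.request import Request, urlopen

    payload = json.dumps({"user": 42, "created_at": 1279211820}).encode()
    req = Request(
        "http://127.0.0.1:8098/riak/messages/msg-001",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urlopen(req) as resp:
        print(resp.status)   # 204 when the store succeeds and no body is returned
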
ronr_ # silly question, but what about indexing?
benblack # ...until you query ;)
ronr_ # it sounds like the only way to query is to ask for specific keys.
benblack # no, there is a whole map/reduce query system built in
seancribbs # ronr_: you typically need to know at least one key, but you can use
map-reduce or link-walking
benblack # less useful if your values are just serialized blobs
seancribbs # benblack: yes, unless you're shoving more transparent information into metadata
ronr_ # actually, the link-walking was one of the features that caught our attention.
benblack # trudat
justinsheehy # right, or you could make your link-walking deserialize them to find the links
bbrowning # And, if your serialization is JSON or binary erlang you can still do m/r
pretty easily
seancribbs # bbrowning: ^^ see above ;)
justinsheehy # more expensive that way, generally, but doable
ronr_ # but unless I misunderstood, a key can only have a single link in its value?
seancribbs # no, it can have many links in its metadata
ronr_ # and while I'm a bit familiar with the m/r concept, can anyone please give me a
specific example to what can be done with it?
seancribbs # ronr_: lots of info on the blog, and wiki about that
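
For a concrete flavor of what a map/reduce query looks like, here is a hedged sketch of a
job POSTed to Riak's /mapred resource (default localhost:8098 assumed; the 'messages'
bucket is hypothetical). Riak.mapValuesJson is one of the built-in JavaScript map
functions and simply decodes each stored JSON value:

    import json
    from urllib.request import Request, urlopen

    job = {
        "inputs": "messages",   # run the map phase over every key in the bucket
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }
    req = Request("http://127.0.0.1:8098/mapred",
                  data=json.dumps(job).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        print(json.loads(resp.read()))   # list of decoded values from the map phase
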
ronr_ # seancribbs: maybe a weird question, but can I link to data that doesn't exist yet?
seancribbs # ronr_: yes
ronr_ # okay, will look into that.
seancribbs # if you know the bucket/key
ronr_ # yup, I should know that.
seancribbs # there's no validation of links (i.e. no referential integrity)
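
A sketch of links in practice, with the usual hedges (default localhost:8098, made-up
'nodes' bucket, keys, and a "child" riaktag). Links live in the object's metadata as Link
headers, nothing checks that the targets exist, and link-walking is just another GET:

    import json
    from urllib.request import Request, urlopen

    # Store a child object, then a root object that links to it.
    child = Request("http://127.0.0.1:8098/riak/nodes/child-1",
                    data=json.dumps({"label": "child"}).encode(),
                    headers={"Content-Type": "application/json"}, method="PUT")
    urlopen(child).close()

    root = Request("http://127.0.0.1:8098/riak/nodes/root",
                   data=json.dumps({"label": "root"}).encode(),
                   headers={"Content-Type": "application/json",
                            "Link": '</riak/nodes/child-1>; riaktag="child"'},
                   method="PUT")
    urlopen(root).close()

    # Walk from root to everything in the 'nodes' bucket tagged "child".
    walk = urlopen("http://127.0.0.1:8098/riak/nodes/root/nodes,child,1")
    print(walk.read())   # multipart/mixed body containing the linked objects
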
ronr_ # great. there's one other thing though, for which I need to know if there's a
solution before I dive more into the documentation.
ronr_ # I need to be able to pick a key (or a group of keys) randomly, and only within a
certain age (i.e., data that is old enough). any way to solve that?
ronr_ # we found a work around in cassandra for it. just wondering if there's a way to solve it here.
benblack # you could do that with m/r
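
Under the same assumptions as before (JSON values in a hypothetical 'messages' bucket,
each carrying an invented created_at epoch field), the "only old enough" filter could be
a JavaScript map phase, with the random pick done client-side; note that mapping over a
whole bucket implies a key listing, so this is only a sketch of the shape:

    import json
    import random
    from urllib.request import Request, urlopen

    MAP_JS = """
    function(v, keyData, arg) {
      var obj = JSON.parse(v.values[0].data);
      // emit the key only if the object is older than the cutoff passed in 'arg'
      return (obj.created_at <= arg) ? [v.key] : [];
    }
    """

    job = {
        "inputs": "messages",
        "query": [{"map": {"language": "javascript",
                           "source": MAP_JS,
                           "arg": 1279200000}}],   # example epoch cutoff
    }
    req = Request("http://127.0.0.1:8098/mapred",
                  data=json.dumps(job).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        old_enough = json.loads(resp.read())
        if old_enough:
            print(random.choice(old_enough))   # one random key of sufficient age
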
ronr_ # not sure that it matters, but if I have fairly few physical nodes (2-4),
would m/r work efficiently?
benblack # map phases are distributed, so more nodes is generally better there
ronr_ # eventually the write and read throughput are crucial to our app.
seancribbs # ronr_: you might also want to consider hybrid solutions. no datastore,
including riak, is going to solve all your problems
ronr_ # right. I want to use a hybrid solution. Unfortunately, my boss
(who apparently knows just about everything in the world) wants it to be a disk-based
solution only, and with one external technology.
seancribbs # that's not a technical problem then
ronr_ # I know.
But since I have to do it, I'm trying to find out which NoSQL solutions might
have improved results compared to our previous tests. Even though I believe the proper
solution is a combination of a NoSQL datastore and a data grid like infinispan.
benblack # i don't understand that at all, but that's ok.
ronr_ # Don't understand what? Why I try to check things that I know won't solve the problem?
benblack # no, why a combination of some nosql thing and infinispan is a good solution.
ronr_ # benblack: I won't have to do complex queries on the datastore, but could do
most in memory. When I know which objects are needed, then I'll read them, do whatever
with them, and delete them from the datastore.
* Damm # has found hybrid solutions will be common until we have enough NoSQL choices
with enough maturity that you can pick and choose and fake enlightenment.
ronr_ # Damm :)
ronr_ # benblack: I need to build trees from the objects, but there are different
rules as to how to connect the objects. If I have to read all the data from the
datastore, I'll have many round trips since I won't be able to know which are the
next objects to query until I have the previous objects in hand (which is where
Riak's links come in handy). Since I don't need the whole objects to do that, I
could keep the relevant data in an in-memory index, find the proper
objects, and load them all at once. a graph db might have been a better solution, but
unless google releases their Pregel as open source, there aren't really any other
viable solutions right now.
seancribbs # ronr_: really?
ronr_ # really what?
seancribbs # there's no viable graph dbs? I don't see how you come by that
ronr_ # the best option out there is Neo4j. but it doesn't have a distributed solution.
ronr_ # infinitegraph is still in beta. InfoGrid _might_ be possible. FlockDB is not
mature enough (and not really a graph db). there's one that's written in .net which isn't
good for us. anything else I missed?
seancribbs # Allegro
ronr_ # right, I forgot about Allegro.
seancribbs # Sones also has a graph database http://www.sones.com/products
ronr_ # right. sones is the .net one. Allegro doesn't seem to be open source.
seancribbs # open source and "good/bad software" are not necessarily correlated
seancribbs # although OSS is easier to evaluate
ronr_ # Right. Bad wording on my part.
boonkerz # sones is ok
ronr_ # never said it wasn't.
boonkerz # because a friend makes it :D
ronr_ # I just said I can't use it. of course my boss throws even more limitations
at me (like trying to avoid paying money as much as possible), but that's something I just
have to deal with.