Created
July 15, 2010 16:37
ronr_ # Basically, we're looking into NoSQL databases as a datastore solution for our | |
application. So far, we've concentrated on Cassandra, and while things went great, there | |
were still a few issues we're trying to overcome. | |
ronr_ # justinsheehy: first thing, we were wondering if riak has any caching system of | |
its data. If so, when is it filled: on writes, reads or both? | |
justinsheehy # ronr_: two answers to start with: first, internal caching depends on your | |
choice of backend. second, the http interface allows you to easily put a proxy cache | |
(such as varnish, squid, etc) in front of Riak and have it Just Work. | |
ronr_ # justinsheehy: can you elaborate a bit more on the backend? | |
justinsheehy # ronr_: the main thing you should figure out first (and maybe you already have) | |
is the access patterns and data model of your actual application. for instance, bitcask | |
(one of the backends, and the default) is structured to take advantage of typical operating | |
system / filesystem caching instead of doing the caching itself, and that works out very well | |
for a number of workloads | |
justinsheehy # ronr_: another choice, innostore, has all of the configurability of embedded | |
innodb if that is what you desire, and is suitable for some other situations | |
ronr_ # justinsheehy: our app is very read/write/delete intensive. all data that is written will | |
be read, normally within a matter of seconds (or minutes in the worst case). after they are read, | |
there's a high probability they'll be deleted. | |
justinsheehy # ronr_: that's an interesting and relatively unusual (but not unheard-of) one. | |
I suggest that you start out with riak using bitcask in default mode and do some benchmarking | |
of your access pattern, since you do seem to have a handle on the shape of it. people here will | |
certainly help you to tune if needed and even to point you in other directions if a better fit | |
emerges based on your needs and results. | |
ronr_ # justinsheehy: I hope you don't mind me saying, but from the only benchmark we | |
found so far, it seemed that riak isn't too strong in performance. the design concept | |
behind it is interesting, but we're a bit skeptical about the performance. am I being | |
completely wrong or is there something to it? (it's legitimate if performance is not | |
the main issue of riak, of course) | |
justinsheehy # ronr_: it depends on the use case, but in many situations riak's | |
performance is excellent. without knowing the machines, app configuration, cluster | |
configuration, and the benchmark code used it is hard to evaluate your specific | |
situation. that's enough info that it might work better on the mailing list. | |
benblack # ronr_: which benchmark did you find? | |
justinsheehy # ronr_: short answer is that while we're always working on improvement | |
we are actually quite happy with performance overall right now, so please do post your | |
benchmarking methods and details for discussion | |
ronr_ # benblack: I don't recall exactly. Can look for it. | |
benblack # ronr_: do you recall what the performance was that seemed slow? | |
ronr_ # justinsheehy: okay, point taken. I do have an additional question if you | |
don't mind. How does riak handle data deletions? | |
seancribbs # ronr_: creates tombstones, then reaps them with read repair | |
ronr_ # like cassandra? | |
seancribbs # similar concept; beyond that, I'm not sure | |
benblack # not implemented at all the same way | |
ronr_ # the basic question is whether deleted data is flushed to the disk if it was | |
deleted before the original data was flushed to the disk. | |
justinsheehy # ronr_: right. similar in the most basic concept, but very different | |
in actual implementation. deletion works. | |
benblack # seancribbs: in cassandra, tombstones are created. after GCGraceSeconds, those | |
tombstones are eligible for removal, and when a major compaction is next run they are eliminated. | |
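The Cassandra behaviour benblack describes can be sketched roughly as follows. This is a toy model in plain Python, not Cassandra code: the `Cell` type, the `major_compaction` function and the dictionary layout are hypothetical names made up for illustration. Only the rule itself (a tombstone becomes eligible after the grace period and is dropped at the next major compaction) comes from the conversation.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[bytes]  # None marks a tombstone (hypothetical encoding)
    written_at: float       # seconds since the epoch


def major_compaction(cells, gc_grace_seconds, now=None):
    """Return the cells that survive: live values, plus tombstones still inside the grace period."""
    now = time.time() if now is None else now
    survivors = {}
    for key, cell in cells.items():
        is_tombstone = cell.value is None
        expired = now - cell.written_at >= gc_grace_seconds
        if is_tombstone and expired:
            continue  # eligible for removal, so this compaction drops it
        survivors[key] = cell
    return survivors
```

Note that nothing is removed at delete time in this model: the tombstone is just another write, and space is reclaimed only when a compaction runs after the grace period.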
seancribbs # benblack: ah, very different then | |
justinsheehy # ronr_: ah. that's a very specific question. | |
ronr_ # benblack: the benchmark ran on 10 fairly weak servers and showed results that | |
weren't much higher than on our single node of cassandra on a semi-strong machine. of course | |
there are _many_ other variables, but it's a general sense of it. justinsheehy: :) | |
justinsheehy # ronr_: in general, yes. the data will be written. if you're really deleting | |
that fast and don't want to touch storage, perhaps you want to manage that in your | |
application instead of your storage system. | |
justinsheehy # ronr_: I suggest running the same benchmark against the same hardware | |
for starters. what you describe isn't a useful comparison | |
ronr_ # I realize that. justinsheehy: I can't. I need to make sure the data persists | |
in case of app failure. | |
justinsheehy # ronr_: then you can't ask for it to not hit disk. :-) | |
ronr_ # justinsheehy: of course it should hit the disk. again, I may have made an | |
assumption here. cassandra has a commitlog which is immediately written to disk, | |
whereas the 'organized' data is flushed to disk every so and so based on many | |
variables. if cassandra drops, when it comes up it'll replay the necessary commit logs. | |
seancribbs # ronr_: bitcask (the default riak backend) is a bunch simpler than that | |
justinsheehy # ronr_: yeah, with bitcask the commitlog _is_ the canonical store | |
justinsheehy # ronr_: very, very different storage model than cassandra. much simpler | |
in exchange for not having some of the same features. | |
benblack # ronr_: cassandra does it that way to keep the stuff on disk sorted. bitcask | |
doesn't keep things sorted. | |
justinsheehy # right | |
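To make "the commitlog _is_ the canonical store" concrete, here is a minimal sketch of an append-only log with an in-memory hash index, the general shape of a Bitcask-style store. It is not Bitcask's code or on-disk format: the `LogStore` class, the record header and the `TOMBSTONE` sentinel are assumptions chosen for illustration, and the sketch leaves out checksums, timestamps, multiple data files and merging.

```python
import os
import struct

HEADER = struct.Struct(">II")  # key length, value length (hypothetical layout)
TOMBSTONE = 0xFFFFFFFF         # value-length sentinel marking a delete


class LogStore:
    def __init__(self, path):
        self.index = {}               # key -> (value offset, value length)
        self.log = open(path, "ab+")  # append-only writes, random reads
        self._replay()

    def _replay(self):
        """Rebuild the in-memory index by scanning the log from the start."""
        self.log.seek(0)
        while True:
            header = self.log.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key_len, value_len = HEADER.unpack(header)
            key = self.log.read(key_len)
            if value_len == TOMBSTONE:
                self.index.pop(key, None)
            else:
                self.index[key] = (self.log.tell(), value_len)
                self.log.seek(value_len, os.SEEK_CUR)

    def put(self, key: bytes, value: bytes):
        self.log.seek(0, os.SEEK_END)
        start = self.log.tell()
        self.log.write(HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        self.index[key] = (start + HEADER.size + len(key), len(value))

    def delete(self, key: bytes):
        # A delete is just another appended record; the old value stays in the log.
        self.log.seek(0, os.SEEK_END)
        self.log.write(HEADER.pack(len(key), TOMBSTONE) + key)
        self.log.flush()
        self.index.pop(key, None)

    def get(self, key: bytes):
        if key not in self.index:
            return None
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)
```

Two points from the discussion show up directly. A value that is written and then quickly deleted still reaches the log, followed by its tombstone. And because the index is a hash keyed on the exact key, point lookups are cheap but there is no sorted order to scan.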
ronr_ # okay, so what's the end result of it not being sorted? what functionality is removed? | |
seancribbs # range queries, primarily | |
ronr_ # yeah, but with Random Partitioner in cassandra, ranged queries are pretty useless. | |
so that's not necessarily a downside. anything else? | |
bbrowning # ronr_: You still get ranged queries across columns in a key - hard comparison | |
to Riak though b/c data model is different | |
ronr_ # I imagine that's true. I actually forgot to look into riak's datamodel :} | |
benblack # um. | |
* ronr_ # feels kinda dumb | |
seancribbs # ronr_: riak is a key-value store, with a few extras. cassandra is more | |
like bigtable. BIG differences | |
ronr_ # so it's key to a single value? | |
seancribbs # yes. it is mostly agnostic to the internal structure of the value | |
benblack # ...until you query ;) | |
seancribbs # well, then it helps to use JSON ;) or Erlang terms | |
ronr_ # what if I just want to store a serialized object per key? | |
seancribbs # you can do that. Riak won't care | |
ronr_ # then again, there are other problems. | |
justinsheehy # people do that often | |
seancribbs # if you're using HTTP, specify a content type though | |
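As a rough illustration of storing a serialized object with an explicit content type over HTTP, the sketch below builds (but does not send) a PUT request using Python's standard library. The URL layout `/riak/<bucket>/<key>`, the host and port, and the bucket and key names are assumptions for the example and should be checked against the Riak documentation for your version.

```python
import json
import urllib.request


def build_put(base_url, bucket, key, obj):
    """Build (but do not send) a PUT request that stores obj as JSON."""
    body = json.dumps(obj).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/riak/{bucket}/{key}",  # assumed URL layout
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, bucket and key, for illustration only.
req = build_put("http://127.0.0.1:8098", "jobs", "job-1", {"state": "queued"})
```

Passing `req` to `urllib.request.urlopen` would send it. For an opaque serialized blob, a content type such as `application/octet-stream` would be the natural choice instead of JSON.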
ronr_ # silly question, but what about indexing? | |
benblack # ...until you query ;) | |
ronr_ # it sounds like the only way to query is to ask for specific keys. | |
benblack # no, there is a whole map/reduce query system built in | |
seancribbs # ronr_: you typically need to know at least one key, but you can use | |
map-reduce or link-walking | |
benblack # less useful if your values are just serialized blobs | |
seancribbs # benblack: yes, unless you're shoving more transparent information into metadata | |
ronr_ # actually, the link-walking was one of the features that caught our attention. | |
benblack # trudat | |
justinsheehy # right, or you could make your link-walking deserialize them to find the links | |
bbrowning # And, if your serialization is JSON or binary erlang you can still do m/r | |
pretty easily | |
seancribbs # bbrowning: ^^ see above ;) | |
justinsheehy # more expensive that way, generally, but doable | |
ronr_ # but unless I misunderstood, a key could have a single link in its value? | |
seancribbs # no, it can have many links in its metadata | |
ronr_ # and while I'm a bit familiar with the m/r concept, can anyone please give me a | |
specific example of what can be done with it? | |
seancribbs # ronr_: lots of info on the blog, and wiki about that | |
ronr_ # seancribbs: maybe a weird question, but can I link to data that doesn't exist yet? | |
seancribbs # ronr_: yes | |
ronr_ # okay, will look into that. | |
seancribbs # if you know the bucket/key | |
ronr_ # yup, I should know that. | |
seancribbs # there's no validation of links (i.e. no referential integrity) | |
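Since links live in an object's metadata and are not validated, attaching several of them amounts to formatting a list of bucket/key references. Over HTTP this is carried in a `Link` header; the sketch below formats one. The `riaktag` parameter name and the `/riak/<bucket>/<key>` path are my understanding of Riak's HTTP interface at the time and should be treated as assumptions, and `link_header` is a hypothetical helper, not part of any client library.

```python
def link_header(links, prefix="/riak"):
    """Format (bucket, key, tag) triples as a single Link header value."""
    return ", ".join(
        f'<{prefix}/{bucket}/{key}>; riaktag="{tag}"'
        for bucket, key, tag in links
    )


# The targets do not need to exist yet; nothing checks them.
links = [("nodes", "child-1", "child"), ("nodes", "child-2", "child")]
header = link_header(links)
```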
ronr_ # great. there's one other thing though, for which I need to know if there's a | |
solution, before I dive more into the documentation. | |
ronr_ # I need to be able to pick a key (or a group of keys) randomly, and only within a | |
certain age (i.e., data that is old enough). any way to solve that? | |
ronr_ # we found a work around in cassandra for it. just wondering if there's a way to solve it here. | |
benblack # you could do that with m/r | |
ronr_ # not sure that it matters, but if I have fairly few physical nodes (2-4), | |
would m/r work efficiently? | |
benblack # map phases are distributed, so more nodes is generally better there | |
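The shape of such a map/reduce job can be sketched in plain Python. This only simulates the two phases locally; it is not Riak's m/r API, where phase functions are written in JavaScript or Erlang. The function names, the `partitions` structure and the assumption that each key's write time is available (for example stored in the value or its metadata) are all hypothetical.

```python
import random
import time


def map_old_keys(partition, min_age_seconds, now):
    """Map phase: runs once per partition and emits keys whose data is old enough."""
    return [key for key, written_at in partition.items()
            if now - written_at >= min_age_seconds]


def reduce_sample(mapped, count, rng):
    """Reduce phase: merge the per-partition lists and pick up to count keys at random."""
    candidates = [key for keys in mapped for key in keys]
    return rng.sample(candidates, min(count, len(candidates)))


def pick_random_old_keys(partitions, min_age_seconds, count, now=None, rng=None):
    now = time.time() if now is None else now
    rng = rng or random.Random()
    # In a real cluster the map calls would run in parallel on the nodes holding the data.
    mapped = [map_old_keys(p, min_age_seconds, now) for p in partitions]
    return reduce_sample(mapped, count, rng)
```

The map phase touches every stored key, so the cost grows with the size of the data set rather than with the number of keys picked. That is worth benchmarking on a small cluster of 2-4 nodes.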
ronr_ # eventually the write and read throughput are crucial to our app. | |
seancribbs # ronr_: you might also want to consider hybrid solutions. no datastore, | |
including riak, is going to solve all your problems | |
ronr_ # right. I want to use a hybrid solution. Unfortunately, my boss | |
(who apparently knows just about anything in the world) wants it to be a disk-based | |
solution only, and with one external technology. | |
seancribbs # that's not a technical problem then | |
ronr_ # I know. | |
But since I have to do it, I'm trying to find out which NoSQL solutions might | |
have improved results compared to our previous tests. Even though I believe the proper | |
solution is a combination of a NoSQL datastore and a data grid like infinispan. | |
benblack # i don't understand that at all, but that's ok. | |
ronr_ # Don't understand what? Why I try to check things that I know won't solve the problem? | |
benblack # no, why a combination of some nosql thing and infinispan is a good solution. | |
ronr_ # benblack: I won't have to do complex queries on the datastore, but could do | |
most in memory. When I know which objects are needed, then I'll read them, do whatever | |
with them, and delete them from the datastore. | |
* Damm # has found hybrid solutions will be common until we have enough NoSQL choices | |
with enough maturity that you can pick and choose and fake enlightenment. | |
ronr_ # Damm :) | |
ronr_ # benblack: I need to build trees from the objects, but there are different | |
rules as to how to connect the objects. If I have to read all the data from the | |
datastore, I'll have many round trips since I won't be able to know which are the | |
next objects to query until I have the previous objects in hand (which is where | |
Riak's links come in handy). Since I don't need the whole objects to do that, I | |
could keep the relevant data in an in-memory index, find the proper | |
objects, and load them all at once. a graph db might have been a better solution, but | |
unless google releases their Pregel as open source, there aren't really any other | |
viable solutions right now. | |
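The approach ronr_ describes, resolving the tree from an in-memory index and then loading the objects together, can be sketched as follows. All names are hypothetical: `index` stands in for whatever in-memory structure holds the parent-to-children relationships, and `fetch_many` for whatever batched or parallel read the chosen datastore offers.

```python
from collections import deque


def collect_tree(index, root):
    """Breadth-first walk over an in-memory parent -> children index."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        key = queue.popleft()
        order.append(key)
        for child in index.get(key, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def load_tree(index, root, fetch_many):
    """Resolve the tree in memory, then fetch every object in one batch."""
    keys = collect_tree(index, root)
    return fetch_many(keys)
```

The point of the design is that the number of datastore round trips no longer depends on the depth of the tree: the traversal happens in memory, and the store sees a single batch of keys.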
seancribbs # ronr_: really? | |
ronr_ # really what? | |
seancribbs # there's no viable graph dbs? I don't see how you come by that | |
ronr_ # the best option out there is Neo4j. but it doesn't have a distributed solution. | |
ronr_ # infinitegraph is still in beta. InfoGrid _might_ be possible. FlockDB is not | |
mature enough (and not really a graph db). there's one that's written in .net which isn't | |
good for us. anything else I missed? | |
seancribbs # Allegro | |
ronr_ # right, I forgot about Allegro. | |
seancribbs # Sones also has a graph database: http://www.sones.com/products | |
ronr_ # right. sones is the .net one. Allegro doesn't seem to be open source. | |
seancribbs # open source and "good/bad software" are not necessarily correlated | |
seancribbs # although OSS is easier to evaluate | |
ronr_ # Right. Bad wording on my part. | |
boonkerz # sones is ok | |
ronr_ # never said it wasn't. | |
boonkerz # because a friend makes it :D | |
ronr_ # I just said I can't use it. of course my boss throws even more limitations | |
at me (like try to avoid paying money as much as possible), but that's something I just | |
have to deal with. |