PharkMillups/gist:477185

## gistfile1.txt
ronr_ # Basically, we're looking into a NoSQL database as a datastore solution for our
 application. So far, we concentrated on Cassandra, and while things went great, there
 were still a few issues we try to overcome.

ronr_ # justinsheehy: first thing, we were wondering if riak has any caching system of
its data. If so, when is it filled: on writes, reads or both?

justinsheehy # ronr_: two answers to start with: first, internal caching depends on your
choice of backend. second, the http interface allows you to easily put a proxy cache
(such as varnish, squid, etc) in front of Riak and have it Just Work.

ronr_ # justinsheehy: can you elaborate a bit more on the backend?

justinsheehy # ronr_: the main thing you should figure out first (and maybe you already have)
is the access patterns and data model of your actual application. for instance, bitcask
 (one of the backends, and the default) is structured to take advantage of typical operating
system / filesystem caching instead of doing the caching itself, and that works out very well
for a number of workloads

justinsheehy # ronr_: another choice, innostore, has all of the configurability of embedded
innodb if that is what you desire, and is suitable for some other situations

ronr_ # justinsheehy: our app is very read/write/delete intensive. all data that is written will
be read, normally within a matter of seconds (or minutes at the worst case), after they are read,
 there's a high probability they'll be deleted.

justinsheehy # ronr_: that's an interesting and relatively unusual (but not unheard-of) one.
 I suggest that you start out with riak using bitcask in default mode and do some benchmarking
of your access pattern, since you do seem to have a handle on the shape of it. people here will
 certainly help you to tune if needed and even to point you in other directions if a better fit
emerges based on your needs and results.

ronr_ # justinsheehy: I hope you don't mind me saying, but from the only benchmark we
found so far, it seemed that riak isn't too strong in performance. the design concept
behind it is interesting, but we're a bit skeptical about the performance. am I being
completely wrong or is there something to it? (it's legitimate if performance is not
the main issue of riak, of course)

justinsheehy # ronr_: it depends on the use case, but in many situations riak's
performance is excellent. without knowing the machines, app configuration, cluster
configuration, and the benchmark code used it is hard to evaluate your specific
 situation. that's enough info that it might work better on the mailing list.

benblack # ronr_: which benchmark did you find?

justinsheehy # ronr_: short answer is that while we're always working on improvement
we are actually quite happy with performance overall right now, so please do post your
 benchmarking methods and details for discussion

ronr_ # benblack: I don't recall exactly. Can look for it.

benblack # ronr_: do you recall what the performance was that seemed slow?

ronr_ # justinsheehy: okay, point taken. I do have an additional question if you
don't mind. How does riak handle data deletions?

seancribbs # ronr_: creates tombstones, then reaps them with read repair

ronr_ # like cassandra?

seancribbs # similar concept. beyond that, I'm not sure

benblack # not implemented at all the same way

ronr_ # the basic question is whether deleted data is flushed to the disk if it was
deleted before the original data was flushed to the disk.

justinsheehy # ronr_: right. similar in the most basic concept, but very different
in actual implementation. deletion works.

benblack # seancribbs: in cassandra, tombstones are created. after GCGraceSeconds, those
tombstones are eligible for removal, and when a major compaction is next run they are eliminated.

seancribbs # benblack: ah, very different then

justinsheehy # ronr_: ah. that's a very specific question.

ronr_ # benblack: the benchmark ran on 10 fairly weak servers and showed results that
 weren't much higher than on our single node of cassandra on a semi-strong machine. of course
there are _many_ other variables, but it's a general sense of it. justinsheehy: :)

justinsheehy # ronr_: in general, yes. the data will be written. if you're really deleting
that fast and don't want to touch storage, perhaps you want to manage that in your
application instead of your storage system.

justinsheehy # ronr_: I suggest running the same benchmark against the same hardware
for starters. what you describe isn't a useful comparison

ronr_ # I realize that. justinsheehy: I can't. I need to make sure the data persists
 in case of app failure.

justinsheehy # ronr_: then you can't ask for it to not hit disk. :-)

ronr_ # justinsheehy: of course it should hit the disk. again, I may have made an
assumption here. cassandra has a commitlog which is immediately written to disk,
 whereas the 'organized' data is flushed to disk every so and so based on many
variables. if cassandra drops, when it comes up it'll replay the necessary commit logs.

seancribbs # ronr_: bitcask is a bunch simpler than that (the default riak backend)

justinsheehy # ronr_: yeah, with bitcask the commitlog _is_ the canonical store

justinsheehy # ronr_: very, very different storage model than cassandra. much simpler
in exchange for not having some of the same features.

benblack # ronr_: cassandra does it that way to keep the stuff on disk sorted. bitcask
doesn't keep things sorted.
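
To make "the commitlog is the canonical store" concrete, here is a minimal sketch of an append-only store with an in-memory key directory. It is a toy model and not Bitcask's actual code or on-disk format; the `TinyLogStore` class, the `TOMBSTONE` marker, and the record layout are all assumptions made for illustration. It shows that a delete is itself an append (so it still reaches storage) and that the index can be rebuilt from the log alone.

```python
import io
import struct

# Hypothetical marker value; the real backend encodes deletions differently.
TOMBSTONE = b"__tombstone__"


class TinyLogStore:
    """Toy append-only store: the log is the only copy of the data."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for an on-disk file
        self.keydir = {}         # key -> (value offset, value size)

    def _append(self, key, value):
        """Append one record and return the offset where its value starts."""
        self.log.seek(0, io.SEEK_END)
        self.log.write(struct.pack(">II", len(key), len(value)) + key + value)
        return self.log.tell() - len(value)

    def put(self, key, value):
        self.keydir[key] = (self._append(key, value), len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.log.seek(offset)
        return self.log.read(size)

    def delete(self, key):
        # A delete is just another append, so it always touches the log.
        self._append(key, TOMBSTONE)
        self.keydir.pop(key, None)

    def log_size(self):
        return len(self.log.getvalue())

    def rebuild_keydir(self):
        """Recreate the in-memory index by replaying the log from the start."""
        self.keydir.clear()
        data = self.log.getvalue()
        pos = 0
        while pos < len(data):
            klen, vlen = struct.unpack_from(">II", data, pos)
            val_start = pos + 8 + klen
            key = data[pos + 8:val_start]
            value = data[val_start:val_start + vlen]
            if value == TOMBSTONE:
                self.keydir.pop(key, None)
            else:
                self.keydir[key] = (val_start, vlen)
            pos = val_start + vlen
```

Because the key directory is a plain hash map and records sit in write order, nothing in this layout keeps keys sorted, which is why range queries are the feature that goes missing.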

justinsheehy # right

ronr_ # okay, so what's the end result of it not being sorted? what functionality is removed?

seancribbs # range queries, primarily

ronr_ # yeah, but with Random Partitioner in cassandra, ranged queries are pretty useless.
so that's not necessarily a downside. anything else?

bbrowning # ronr_: You still get ranged queries across columns in a key - hard comparison
to Riak though b/c data model is different

ronr_ # I imagine that's true. I actually forgot to look into riak's datamodel :}

benblack # um.

* ronr_ # feels kinda dumb

seancribbs # ronr_: riak is a key-value store, with a few extras. cassandra is more
 like bigtable. BIG differences

ronr_ # so it's key to a single value?

seancribbs # yes. it is mostly agnostic to the internal structure of the value

benblack # ...until you query ;)

seancribbs # well, then it helps to use JSON ;) or Erlang terms

ronr_ # what if I just want to store a serialized object per key?

seancribbs # you can do that. Riak won't care

ronr_ # then again, there are other problems.

justinsheehy # people do that often

seancribbs # if you're using HTTP, specify a content type though
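
As a rough illustration of storing a serialized object over HTTP with an explicit content type, the sketch below only builds the request and does not send it. The base URL, the `/riak/<bucket>/<key>` path shape, and the `build_put_request` helper are assumptions for illustration; check the Riak HTTP documentation for the paths your version actually exposes.

```python
import json
import urllib.request


def build_put_request(base_url, bucket, key, obj):
    """Build (but do not send) a PUT carrying a JSON-serialized object.

    The URL layout is an assumption; only the Content-Type header is the point.
    """
    body = json.dumps(obj).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{bucket}/{key}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, bucket, and key names.
request = build_put_request(
    "http://localhost:8098/riak", "jobs", "job-1", {"state": "new"}
)
```

Passing `request` to `urllib.request.urlopen` would send it; for a non-JSON serialization, a content type such as `application/octet-stream` would be the analogous choice.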

ronr_ # silly question, but what about indexing?

benblack # ...until you query ;)

ronr_ # it sounds like the only way to query is to ask for specific keys.

benblack # no, there is a whole map/reduce query system built in

seancribbs # ronr_: you typically need to know at least one key, but you can use
 map-reduce or link-walking

benblack # less useful if your values are just serialized blobs

seancribbs # benblack: yes, unless you're shoving more transparent information into metadata

ronr_ # actually, the link-walking was one of the features that caught our attention.

benblack # trudat

justinsheehy # right, or you could make your link-walking deserialize them to find the links

bbrowning # And, if your serialization is JSON or binary erlang you can still do m/r
 pretty easily

seancribbs # bbrowning: ^^ see above ;)

justinsheehy # more expensive that way, generally, but doable

ronr_ # but unless I misunderstood, a key could have a single link in its value?

seancribbs # no, it can have many links in its metadata

ronr_ # and while I'm a bit familiar with the m/r concept, can anyone please give me a
specific example of what can be done with it?

seancribbs # ronr_: lots of info on the blog, and wiki about that

ronr_ # seancribbs: maybe a weird question, but can I link to data that doesn't exist yet?

seancribbs # ronr_: yes

ronr_ # okay, will look into that.

seancribbs # if you know the bucket/key

ronr_ # yup, I should know that.

seancribbs # there's no validation of links (i.e. no referential integrity)
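
The points about links (many per object, kept in metadata, no referential integrity) can be sketched with a toy in-memory model. This is not Riak's API; the dictionary layout and the `walk` helper are hypothetical. The walker simply skips links whose target has not been written yet.

```python
def walk(store, start, tag=None):
    """Follow the links of the object at `start`, optionally filtered by tag.

    `store` maps (bucket, key) -> {"value": ..., "links": [(bucket, key, tag)]}.
    Links to missing objects are allowed and are silently skipped.
    """
    obj = store.get(start)
    if obj is None:
        return []
    found = []
    for bucket, key, link_tag in obj["links"]:
        if tag is not None and link_tag != tag:
            continue
        target = store.get((bucket, key))
        if target is not None:
            found.append(((bucket, key), target["value"]))
    return found


# Hypothetical data: "root" links to one existing child and one not yet written.
store = {
    ("nodes", "root"): {
        "value": "root payload",
        "links": [("nodes", "a", "child"), ("nodes", "missing", "child")],
    },
    ("nodes", "a"): {"value": "a payload", "links": []},
}
children = walk(store, ("nodes", "root"), tag="child")
```

Once `("nodes", "missing")` is written, the same walk picks it up without any change to the parent object.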

ronr_ # great. there's one other thing though, for which I need to know if there's a
solution, before I dive more into the documentation.

ronr_ # I need to be able to pick a key (or a group of keys) randomly, and only within a
certain age (i.e., data that is old enough). any way to solve that?

ronr_ # we found a work around in cassandra for it. just wondering if there's a way to solve it here.

benblack # you could do that with m/r
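
One way to read "you could do that with m/r" is a map phase that keeps only objects past a minimum age and a reduce phase that samples from the survivors. The sketch below simulates that shape in plain Python and is not a Riak map/reduce job; the `written_at` field and all function names are assumptions, and it assumes each object records its own write time.

```python
import random


def map_old_enough(obj, now, min_age):
    """Map phase: emit the key only if the object is at least min_age old."""
    return [obj["key"]] if now - obj["written_at"] >= min_age else []


def reduce_sample(keys, count, rng):
    """Reduce phase: pick up to `count` keys at random from the mapped keys."""
    return rng.sample(keys, min(count, len(keys)))


def pick_random_old_keys(objects, now, min_age, count, seed=None):
    """Run the map over every object, then a single reduce step."""
    rng = random.Random(seed)
    mapped = [k for obj in objects for k in map_old_enough(obj, now, min_age)]
    return reduce_sample(mapped, count, rng)
```

Note that the map step touches every object it is given, so the cost grows with the size of the input set rather than with the number of keys returned.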

ronr_ # not sure that it matters, but if I have fairly few physical nodes (2-4),
would m/r work efficiently?

benblack # map phases are distributed, so more nodes is generally better there

ronr_ # eventually the write and read throughput are crucial to our app.

seancribbs # ronr_: you might also want to consider hybrid solutions. no datastore,
 including riak, is going to solve all your problems

ronr_ # right. I want to use a hybrid solution. Unfortunately, my boss
(who apparently knows just about anything in the world) wants it to be a disk-based
solution only, and with one external technology.

seancribbs # that's not a technical problem then

ronr_ # I know. But since I have to do it, I'm trying to find out which NoSQL solutions might
have improved results compared to our previous tests. Even though I believe the proper
solution is a combination of a NoSQL datastore and a data grid like infinispan.

benblack # i don't understand that at all, but that's ok.

ronr_ # Don't understand what? Why I try to check things that I know won't solve the problem?

benblack # no, why a combination of some nosql thing and infinispan is a good solution.

ronr_ # benblack: I won't have to do complex queries on the datastore, but could do
most in memory. When I know which objects are needed, then I'll read them, do whatever
with them, and delete them from the datastore.

* Damm # has found hybrid solutions will be common until we have enough NoSQL choices
 with enough maturity that you can pick and choose and fake enlightenment.

ronr_ # Damm :)

ronr_ # benblack: I need to build trees from the objects, but there are different
rules as to how to connect the objects. If I have to read all the data from the
datastore, I'll have many round trips since I won't be able to know which are the
next objects to query until I have the previous objects in hand (which is where
Riak's links come in handy). Since I don't need the whole objects to do that, I
could keep the relevant data in an in-memory index, find the proper
objects, and load them all at once. a graph db might have been a better solution, but
 unless google releases their Pregel as open source, there aren't really any other
 viable solutions right now.

seancribbs # ronr_: really?

ronr_ # really what?

seancribbs # there's no viable graph dbs? I don't see how you come by that

ronr_ # the best option out there is Neo4j. but it doesn't have a distributed solution.

ronr_ # infinitegraph is still in beta. InfoGrid _might_ be possible. FlockDB is not
mature enough (and not really a graph db). there's one that's written in .net which isn't
 good for us. anything else I missed?

seancribbs # Allegro

ronr_ # right, I forgot about Allegro.

seancribbs # Sones also has a graph database: http://www.sones.com/products

ronr_ # right. sones is the .net one. Allegro doesn't seem to be open source.

seancribbs # open source and "good/bad software" are not necessarily correlated

seancribbs # although OSS is easier to evaluate

ronr_ # Right. Bad wording on my part.

boonkerz # sones is ok

ronr_ # never said it wasn't.

boonkerz # because a friend makes it :D

ronr_ # I just said I can't use it. of course my boss throws even more limitations
at me(like try to avoid paying money as much as possible), but that's something I just
have to deal with.