07:10 <svjson> Hey guys
07:10 <svjson> Time for a couple of questions?
07:21 <roder> svjson: what's up?
07:22 <svjson> I have some general questions & thoughts. I'm currently
building a prototype for a new application for a customer,
which is using riak for data storage
07:23 <svjson> Or riak-search, rather
07:24 <rustyk> svjson: sure thing, fire away
07:24 <svjson> I did some fiddling last week and got things
to work the way I want
07:24 <svjson> But now I'm wondering a bit about indexing and data size
07:24 <svjson> The application is going to be, more or less,
constantly bombarded with data and will need to store this data
historically for several years
07:26 <svjson> So it's kind of write heavy - all data is
guaranteed to be read at least once, as part of its life-cycle,
and after that it _may_ be requested again, but will mostly just... lie around
07:27 <svjson> Are there any known limits, or predictable increases
in response time when hitting an index, proportional to the
amount of data?
07:28 <rustyk> svjson: the response time is going to grow proportional
to two things: 1) how many entries are in the index under the searched
term, and 2) how many objects are in the final result set
07:29 <svjson> Right
07:29 <rustyk> svjson: in other words, roughly how many objects will match each clause of the query, and how many objects will match the entire query
07:31 <svjson> In the case of 1), the first part of a query will
be constantly growing, but the matches for the entire query will
vary within a predictable range
07:32 <svjson> One case, for example, is events tied to a specific entity
07:32 <svjson> An entity will usually have 50-60 events, sometimes more,
but never as many as.. say.. a thousand.
07:34 <svjson> Grabbing these 50-60 events with the help of an
index on a reference to such an entity should, in other words, not
noticeably slow down with a huge data set, unless the query
contains a search term that matches entries outside of these?
07:36 <rustyk> yes, that's correct
07:37 <rustyk> I'd get worried if you started to have clauses
that matched hundreds of thousands to millions of entries
(depending on machine size), but 50-60 is no problem
07:38 <svjson> well, it might in "extreme" cases go up
to a couple of hundred, but these will not be the norm
07:38 <rustyk> excellent… sounds like a really good use case
for Riak Search
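
[As a hedged illustration, a scoped query of the kind discussed above might look like the following against the Solr-style interface; the field names are invented for this example. The first clause bounds the candidate set to the ~50-60 events of one entity, so response time shouldn't grow with total data volume.]

    entity_id:e42 AND event_type:statuschange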
07:39 <svjson> Any idea how large a data set I would need
to test with in order to make sure that I get acceptable
performance?
07:39 <svjson> Or is that like asking how many trees will fit in my garden?
07:40 <svjson> (without more detail)
07:40 <rustyk> yeah… there are many factors… RAM, disk speed, # of processors, network speed, how large your hot set of data is, etc.
07:41 <rustyk> if you haven't played with Basho Bench before,
you might want to check it out… it helps you create repeatable
performance tests for things like this
07:41 <rustyk> plus it will give you an excuse to dive back into Erlang
07:41 <rustyk> ;-)
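
[For reference, a minimal sketch of a Basho Bench configuration, which is a file of Erlang terms. The driver and generator names follow the examples shipped with Basho Bench but should be verified against its examples/ directory.]

    %% write-heavy workload sketch against a local Riak node
    {mode, max}.                             %% run as fast as possible
    {duration, 10}.                          %% minutes
    {concurrent, 5}.                         %% worker processes
    {driver, basho_bench_driver_riakc_pb}.   %% protocol buffers client driver
    {key_generator, {int_to_bin, {uniform_int, 1000000}}}.
    {value_generator, {fixed_bin, 4096}}.    %% 4 KB values
    {riakc_pb_ips, [{127,0,0,1}]}.
    {operations, [{put, 4}, {get, 1}]}.      %% four writes for every read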
07:44 <svjson> Right. Will do.
07:45 <svjson> How is Riak aware of and dependent on what
my "hot" data set is?
07:45 <svjson> For caching?
07:46 <svjson> Sounds like a good opportunity, yeah.
07:46 <rustyk> Riak doesn't actually keep track of it, but the
underlying OS will naturally cache that data
07:46 <svjson> Right
07:47 <svjson> Well, during the life-cycle of one of these entities
in this example, events constantly go down into riak, and when the
life-cycle has expired, the events will be audited, and after that
access is arbitrary
07:47 <svjson> I.e., administrators or staff have some sort of reason to go back and look at it. Which is rare
07:48 <svjson> So there won't be a lot of random access, no
07:48 <svjson> However, that brings me to another thing I wanted to bring up. About erlang.
07:48 <svjson> Some more background on this: the
project at my customer has been in the pipeline for a while and is
informally referred to as "The Cassandra-project".
07:49 <svjson> I'm doing this prototype with Riak because I have a bad gut-feeling about Cassandra.
07:50 <svjson> Someone told me that the fact that I'd written
some erlang to make this work as I wanted, was a potential nail
in the coffin because "This is a java shop"
07:50 <svjson> So I'll be needing some good pro-Riak arguments
to go on. I'm sure you're chock-full of those :)
07:51 <svjson> (Yeah, I know the reasoning behind that is so
narrow-mindedly dumb that it's a threat to the fabric of space-time,
but that's the way it is...)
07:52 <rustyk> my only argument is "use the best tool for the job" ;-)
07:53 <rustyk> that said, if they are getting hung up on the custom
extractor you wrote, you could potentially do some more processing
*before* you store the data in Riak that would eliminate the need
for the extractor
07:53 <seancribbs> svjson: point to erjang - even if they are
a Java shop, non-Java languages are bursting onto the JVM
07:57 <svjson> rustyk: Yeah, that's true...
07:58 <svjson> sean: If only that was an argument that would bite.
07:58 <seancribbs> svjson: yeah, it was a shot in the dark
07:58 <DeadZen> erjang is pretty cool
07:58 <svjson> I responded by saying that if I could learn
enough erlang to get that together in just a day, then
anyone else should be able to as well
07:58 <seancribbs> you'd be better off arguing the
differences between the data models… Riak's is much
simpler to understand
07:59 <svjson> But I was thinking more about the merits
of the data stores
07:59 <svjson> sean, exactly
07:59 <seancribbs> Cassandra's is richer
07:59 <seancribbs> IMO those should be the primary concerns
07:59 <svjson> "rich" translates to "weird & complex" in my
book, when it comes to Cassandra
07:59 <seancribbs> otherwise, it's just a piece of infrastructure
07:59 <seancribbs> i mean, do they complain that Oracle
isn't written in Java? I think not
08:00 <DeadZen> the lines between oracle and java will fade ;)
08:00 <peschkaj> svjson: you can also point to apparent
complexity - riak is fairly simple, especially when compared
to the types of tuning you need for Cassandra.
08:00 <DeadZen> is riak HA?
08:00 <seancribbs> DeadZen: depends on what you mean by HA
08:01 <DeadZen> that's true
08:01 <DeadZen> i'm asking because, for instance, while I'm
running a map reduce test
08:01 <svjson> peschkaj: Thanks - exactly. Thanks to the
esoteric tuning that seems to be necessary with Cassandra,
I kind of have a gut-feeling that if we do this project with it,
people's phones will start ringing in the middle of night
08:01 <DeadZen> if i introduce a node during the test, things break ;)
08:01 <seancribbs> if you mean, survives single-node failure, then yes
08:01 <DeadZen> i guess that depends on what you mean by survive
08:01 <svjson> But I don't have enough Cassandra experience
to verify that
08:01 <seancribbs> DeadZen: touche
08:02 <DeadZen> heh
08:02 <seancribbs> however, if you introduce a node, you're
causing data to shift around the cluster
08:02 <seancribbs> so… the plan a MapReduce query was
running will become moot
08:02 <seancribbs> I use the word "plan" lightly of course
08:02 <DeadZen> *nods* I get what you mean
08:03 <DeadZen> so say I'm using riak as a queue; until
that's complete (the data migration), the queue will be
unrecognizable/inaccessible?
08:03 <seancribbs> first, i wouldn't use it as a queue - use a queue.
08:04 <seancribbs> second, it will be accessible, it'll just
be on the other 2 replicas
08:04 <peschkaj> DeadZen: you would still be able to access
your data, it might be slow, depending on your networking gear
08:04 <DeadZen> *nods* I guess I tried to make a queue to better understand its personality
08:04 <peschkaj> svjson: you could take a look at the Cassandra
documentation or the tutorials from Riptano
08:05 <DeadZen> seancribbs: do you know of a distributed queue
for erlang?
08:05 <seancribbs> rabbitmq-cluster
08:05 <peschkaj> svjson: the other argument I use is that
with riak I get a rock solid set of core functionality and
I can build any other pieces that I need
08:05 <svjson> peschkaj: Build any other pieces - in what respect?
08:05 <DeadZen> seancribbs: yeah, that's true... do riak and
rabbit play well together? ;)
08:06 <seancribbs> DeadZen: they don't run in the same
erlang nodes, but yes
08:06 <peschkaj> svjson: well, if there's something that I don't
need, it isn't there. But if I need an index to act a certain way,
I can either use riak-search or else implement my own indexing
mechanism
08:07 <peschkaj> svjson: theoretically, I could do the
same thing with Cassandra, but it already has a lot more
complexity that needs to be considered when I'm extending it
08:12 <svjson> peschkaj: Right, but then implementing our own stuff on top of it would backfire as an argument.
08:12 <svjson> (again - erlang)
08:12 <DeadZen> mapred_verify is cool
08:13 <peschkaj> svjson: true, but ideally you'll
never have to extend. I guess I
just use the simplicity of Riak as an argument in
favor of using it.
08:13 <svjson> There was a recent post...
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis -
that sums up Riak like so:
08:13 <svjson> Best used: If you want something Cassandra-like
(Dynamo-like), but no way you're gonna deal with the bloat and
complexity. If you need very good single-site scalability,
availability and fault-tolerance, but you're ready to pay for
multi-site replication.
08:14 <peschkaj> yup
08:15 <seancribbs> svjson: it was very well put, yes
08:15 <svjson> Although that is the opinion of one individual,
I'd like to know more about what he is actually referring to
08:15 <DeadZen> that sounds like a good synopsis
08:16 <svjson> With bloat I guess he means what you just
mentioned - a lot of features we're not interested in
08:16 <seancribbs> svjson: also SLOC
08:16 <seancribbs> Riak is pretty small, all things considered
08:16 <peschkaj> yup, and riak is also readable. I find
enterprise-y java to be infinitely illegible
08:16 <svjson> Complexity - The BigTable data model, the tuning stuff, etc?
08:17 <peschkaj> yeah, the complexity comes in when you're
dealing with column families, super columns, data modeling
and partitioning
08:17 <seancribbs> svjson: rumor has it that every major
installation of Cassandra has their own customizations too
08:17 <seancribbs> which is a point of complexity
08:17 <peschkaj> there are a lot of tuning parameters when compared to riak
08:18 <DeadZen> and a giant xml config ;)
08:18 <peschkaj> from personal experience - I never got
Cassandra to behave properly when I experimented with it
08:18 <svjson> Yeah, so we're effectively storing chunks of XML,
caring only about a couple of fields in it for indexing, which
Riak-search now handles perfectly fine
08:19 <svjson> DeadZen: Remember, this is Java-land. Configuring
stuff in a large XML file is considered a good thing over here ;)
08:19 <peschkaj> Java is a DSL for generating XML
08:19 * seancribbs likes erlang terms - short and sweet
08:19 <peschkaj> svjson: show them how much it would cost to
implement with Oracle ;)
08:19 <DeadZen> file:consult ;)
08:19 <seancribbs> DeadZen: exactly
08:19 <svjson> peschkaj: Hah - that one goes in my quote-book
08:20 <svjson> So what about that last sentence - what is he referring to?
08:20 <svjson> "but you're ready to pay for multi-site replication" ?
08:20 <DeadZen> riak enterprise
08:20 <peschkaj> Riak can replicate across data centers, but you need
to pay for the enterprise licensing
08:21 <peschkaj> so you could have three data centers and each
one has a complete copy of your data
08:21 <seancribbs> svjson: that's another advantage of riak if you
need long-haul repl… nodes in other DCs aren't part of the same cluster
08:21 <seancribbs> rack-awareness is… sketchy
08:21 <kreynolds> I did not realize you were required to pay for
multi-dc riak setups
08:22 <seancribbs> kreynolds: you also get the 877 number that wakes
me in the middle of the night ;)
08:22 <svjson> Right, there is talk about doing exactly that
08:22 <svjson> Or at least, having a separate "backup-rack" or
something like that.
08:22 <svjson> I haven't been briefed about the exact idea
08:23 <kreynolds> seancribbs: I can definitely see paying to wake
somebody up, I just didn't know that was a requirement
08:23 <seancribbs> kreynolds: we think that if you're ready
for multiple data centers, you're probably ready for top-tier
support too
08:24 <DeadZen> it makes sense; at that level, why risk overlooking something?
08:24 <kreynolds> I don't know that it's a bad thing, precisely
08:25 <svjson> Yeah, well, I don't think my customer is
going to put either Riak or Cassandra into production
without paying for support, anyway
08:25 <seancribbs> svjson: our VP BizDev does a great pitch for our services, if you'd like to be put in touch
08:26 <peschkaj> Paid support is an awesome thing - there's someone you can call to fix things :)
08:26 <svjson> sean: Might come a time for that. At this point,
it's still in a pre-study phase.
08:26 <seancribbs> svjson: gotcha. just let us know
08:27 <svjson> sean: Cool. First step is to convince devs &
architects about the technical merits
08:27 <svjson> About that, another question about the
technical side of things...
08:27 <svjson> I mentioned before that data will be stored for a
number of years, and after that it should be thrown out
08:28 <svjson> Are there any special features to cater for
this, or should I index a timestamp and periodically query
for documents that are older than a certain time limit?
08:28 <seancribbs> svjson: bitcask has expiration built-in
08:28 <seancribbs> but for decidedly smaller timeframes… it's measured in seconds
08:30 <kreynolds> seancribbs: My gut reaction was that paid
support shouldn't be coupled to deployment, as people
frequently want support even in simpler deployments. But in
practice, for a really-real multi-dc deployment, there are
probably only a fraction of instances where somebody
*wouldn't* want paid support
08:30 <svjson> How does that work under the hood?
08:31 <seancribbs> svjson: it's two mechanisms, first,
if the expiry_secs configuration is set, it will be
checked against on read. second, when bitcask merges,
expired items will not be written to the new file
08:32 <peschkaj> would there be any harm setting
expiry_secs to something like 31536000?
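
[As a concrete sketch, enabling this is one setting in the bitcask section of app.config; the data_root path below is illustrative, and expiry_secs defaults to -1, meaning no expiration.]

    {bitcask, [
        {data_root, "/var/lib/riak/bitcask"},
        {expiry_secs, 31536000}   %% expire entries roughly one year after they are written
    ]},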
08:32 <svjson> Umm. When does bitcask merge, and why does it do it? :)
08:32 <seancribbs> peschkaj: i don't see why not
08:32 <seancribbs> svjson: there are various triggers
08:32 <peschkaj> svjson: there's a nice PDF that gives
an overview: http://downloads.basho.com/papers/bitcask-intro.pdf
08:33 <svjson> sean: In my case that would be multiples of
31536000, still not a problem?
08:33 <DeadZen> is there any reason to use innostore over bitcask?
08:33 <seancribbs> Erlang has no problem with large numbers ;)
08:33 <peschkaj> Erlang - it can count!
08:33 <seancribbs> DeadZen: yes, to strictly control memory usage
08:33 <peschkaj> svjson: the merges happen to save
space because of updates or deletions
08:33 <svjson> Amazing language.
08:33 <DeadZen> seancribbs: ahhh
08:33 <DeadZen> seancribbs: at the cost of? request speed? backup time?
08:34 <seancribbs> DeadZen: latency on inno is much less
predictable
08:34 <DeadZen> got it
08:34 <seancribbs> also, it has costlier recovery
08:34 <seancribbs> svjson: https://github.com/basho/bitcask/blob/master/ebin/bitcask.app#L47
08:34 <seancribbs> those are the config settings that specify
when bitcask merges
08:35 <seancribbs> there are triggers and thresholds, any of which
can cause merging
08:36 <seancribbs> sorry, got that a little mixed up. thresholds
determine whether a file should be merged, triggers cause merges
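
[For reference, a hedged sketch of those knobs with what are believed to be the defaults from the bitcask.app file linked above; check the file for the authoritative values.]

    {frag_merge_trigger, 60},               %% trigger: a file is >= 60% fragmented
    {dead_bytes_merge_trigger, 536870912},  %% trigger: a file holds >= 512 MB of dead bytes
    {frag_threshold, 40},                   %% threshold: merge files >= 40% fragmented
    {dead_bytes_threshold, 134217728},      %% threshold: merge files with >= 128 MB of dead bytes
    {small_file_threshold, 10485760},       %% threshold: merge files smaller than 10 MB
    {max_file_size, 2147483648}             %% roll to a new data file at 2 GB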
08:36 <svjson> Right. But no updates will ever be issued on this data, and deletes would only happen because the data is expired. So nothing would actually trigger a bitcask merge, unless I start reading expired docs
08:36 <seancribbs> svjson: you misunderstand… merges will happen because those expired items become "dead bytes"
08:37 <svjson> sean: So somehow the buckets are scanned for expired items, still?
08:38 <seancribbs> there's an erlang process called
bitcask_merge_worker which periodically checks whether it
should merge files
08:38 <seancribbs> eventually, the space will be reclaimed
08:39 <seancribbs> keep in mind, too, that bitcask stores keys in
RAM. so if you're storing lots of data, plan for lots of RAM
or many machines
08:39 <seancribbs> https://spreadsheets.google.com/a/basho.com/ccc?key=0Ar810jMxoZMDdEdMMnA5Rjl0R1puRlVhR09wajUxWlE&authkey=CJ_ajeUB&hl=en#gid=0
08:40 <svjson> sean: Many "cheap" machines, is the plan
08:40 <seancribbs> sounds good
08:42 <svjson> How would it handle a situation where there's
not enough RAM? Swapping, at the cost of performance degradation?
08:42 <seancribbs> svjson: yes. at that point, you're in a world of hurt
08:43 <peschkaj> svjson: another thing to keep in mind is that
you can't change the number of virtual nodes once you create a
bucket so you need to plan accordingly
08:43 <seancribbs> peschkaj: you mean cluster, but yes
08:43 <peschkaj> derp, yes
08:44 <DeadZen> once you create a bucket you can't change
the number of virtual nodes?
08:44 <seancribbs> cluster
08:44 <seancribbs> not bucket
08:44 <DeadZen> ah
08:44 <seancribbs> generally you plan that AOT
08:44 <svjson> Um, now I'm confused
08:45 <svjson> Once I set up a cluster, I cannot add nodes?
08:45 <seancribbs> the consistent hashing ring is divided into
a number of partitions
08:45 <seancribbs> you can add nodes, but not vnodes
08:45 <seancribbs> 1 vnode belongs to 1 partition
08:45 <svjson> Right, so if I set it up with 64, I could never
outgrow 64 physical nodes
08:46 <seancribbs> right. and that would be bad anyway because you'd have only one vnode per node
08:46 <peschkaj> seancribbs: isn't the recommendation ~10 vnodes per pnode?
08:46 <seancribbs> 10 or more
08:46 <svjson> What are the implications of using a large
number of vnodes on a small number of physical nodes?
08:46 <DeadZen> how do you control that?
08:47 <seancribbs> DeadZen: http://wiki.basho.com/Configuration-Files.html#ring_creation_size
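
[A minimal app.config sketch of that setting; note that it must be chosen before the cluster is first started and cannot be changed afterwards.]

    {riak_core, [
        {ring_creation_size, 256}   %% number of partitions; a power of 2
    ]},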
08:47 <seancribbs> svjson: more resource utilization
08:48 <seancribbs> keep in mind that within the "good" range
you can usually grow your cluster by a factor of 5 before
needing to consider repartitioning
08:48 <seancribbs> and your needs at 4 nodes will be very different than at 20
08:49 <svjson> Right.
08:50 <svjson> But we know roughly how much data we generate
per day, and for how long we'll want to store it. It should
be possible to come up with a reasonable estimate and add
some margin to that.
08:50 <seancribbs> svjson: yep
08:50 <svjson> It's not a case of "if data comes", but rather "when data comes". :)
08:50 <seancribbs> you're already a step ahead of most people if you're doing capacity estimation
08:52 <svjson> Yeah, but it's not foolproof either.
The organization, volume, and usage patterns of load-bringing
customers might change over time.
08:52 <svjson> What if we need to grow the number of partitions in the end - how do we approach this?
08:53 <svjson> Down-time is out of the question
08:53 <peschkaj> svjson: as I understand it, you'd build a
new cluster with more vnodes and then start migrating data to
the new cluster. As you migrate data, you can then decommission
servers from the original cluster and move them into the new cluster
08:54 <seancribbs> what peschkaj said
08:56 <svjson> Right. That adds a couple of things to consider in the application layer
08:56 <svjson> Cheap lunch, but not free :)
08:57 <peschkaj> Yup. If you know you're getting close
to your size limits, you can start writing to the new
cluster and use some kind of shard key on dates that you
keep moving backwards in time as you push data into the new cluster.
08:57 <peschkaj> Too much customer data is usually a good problem to have.
08:58 <DeadZen> so an initial ring creation size of say
512 is overkill?
09:00 <svjson> Hmm. Btw, why is it important to have several vnodes per physical machine?
09:00 <seancribbs> DeadZen: with 512 partitions, you'd reach a
practical limit of 50 nodes
09:00 <seancribbs> svjson: IO concurrency
09:00 <svjson> sean: one concurrent read or write per vnode?
09:00 <DeadZen> interesting aspect to think about
09:00 <seancribbs> svjson: yes, requests are serialized at the vnode level
09:01 <seancribbs> that is, a vnode can handle only
one at a time
09:01 <svjson> Aha. Got it.
09:01 <seancribbs> also, it's important for spread
09:01 <seancribbs> more partitions = more even spread
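
[Rough arithmetic combining the two rules of thumb above; figures are approximate.]

    ring_creation_size    practical ceiling at ~10 vnodes per node
    64                    ~6 nodes
    256                   ~25 nodes
    512                   ~50 nodes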
09:04 <DeadZen> btw I learned riak_core the other day
09:04 <DeadZen> wanted to say it's really cool ;)
09:05 <seancribbs> :)
09:07 <DeadZen> I think it helps to learn riak_core
before trying to understand riak
09:07 <DeadZen> but that's just my opinion ;)
09:09 <svjson> So, what about Java client support for riak search?
I've found two java clients, but can't see that any of them mention
riak-search support
09:12 <seancribbs> svjson: there's no specific support,
although we are working on that. but in some circumstances
you can use a Solr client
09:12 <svjson> "some" sounds vague? :)
09:14 <seancribbs> well… the interface is nominally compatible with Solr
09:16 <svjson> Right. And the Solr interface is the way I
query it from my application (Erlang CLI or search-cmd
doesn't seem to be what I'm looking for)
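
[A hedged sketch of querying that Solr-style endpoint from Erlang, assuming a local node on the default HTTP port and an index named "events"; both are illustrative, and Riak Search may not support every standard Solr parameter.]

    %% minimal HTTP GET against Riak Search's /solr/<index>/select interface
    inets:start(),
    Url = "http://localhost:8098/solr/events/select?q=entity_id:e42",
    {ok, {{_Version, 200, _Reason}, _Headers, Body}} = httpc:request(Url),
    io:format("~s~n", [Body]).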
09:46 <svjson> in any case, I think that concludes my Q&A session for the day. Thanks, everyone!