PharkMillups/gist:584539

## gistfile1.txt
07:50 <outerim> if allow_mult is not set on a bucket what happens with 2 conflicting writes
(assuming the create case) ie client A writes first with rw=quorum client B writes next with rw=quorum?

07:50 <outerim> I presume that A wins because B's write would not have a descending
vector clock

07:51 <outerim> and B would get some error response back. what code?

07:51 <seancribbs> outerim: depends on what vector clock is submitted

07:51 <seancribbs> and no error responses

07:51 <outerim> seancribbs: what if no vector clock is submitted, doesn't riak
generate an initial one based on the client id?

07:52 <seancribbs> basically.. it assumes the empty vector-clock

07:52 <outerim> seancribbs: so you'd just have to do a readbody and test if it was what
you wrote?

07:52 <seancribbs> yes

07:52 <seancribbs> returnbody

07:54 <outerim> seancribbs: so correct me if I'm wrong but with quorum r/rw values you are
guaranteed that the first write will win (in the create case) and that the second will
receive the first's body with returnbody?

07:54 <outerim> seancribbs: (thanks for your help btw)

07:55 <seancribbs> outerim: mmm not sure exactly how to answer that

07:55 <outerim> seancribbs: also worth mentioning that I'm using riak-client from ripple,
haven't looked to see if it creates an initial vclock that riak understands...

07:55 <seancribbs> depends on the ordering of events

07:55 <seancribbs> it won't send the vclock the first time because there is none

07:55 <outerim> seancribbs: do tell if you don't mind :

07:55 <seancribbs> i.e. the empty vclock

07:55 <outerim> right

07:57 <seancribbs> outerim: probably best shown in an example. I'll gist some IRB output
in a minute

07:57 <outerim> seancribbs: that would be excellent

08:00 <seancribbs> outerim: http://gist.github.com/580770

08:00 <outerim> looking

08:00 <seancribbs> so… the answer is, when allow_mult is false, it takes the one with the
latest timestamp

08:01 <outerim> seancribbs: is that only true in the case where both clients are using the same client id?

08:02 <seancribbs> no, those are using different client ids

08:02 <outerim> oh yeah I see 2 clients... hmm

08:03 <outerim> seancribbs: so what is the point of last_write_wins on a bucket?

08:03 <outerim> and what does it default to?

08:03 <seancribbs> ignoring vector clocks entirely has significant speed implications

08:03 <seancribbs> it means you don't ever compare or update them

08:04 <outerim> so isn't it ignoring vector clocks when it says (as in your example) the last
write wins?

08:04 <seancribbs> so…if i can explain this correctly, in the gist this worked because both versions
descended from the same vclock (the empty one)

08:04 <outerim> or since there are no vclocks at that point..

08:05 <seancribbs> so if you updated obj2 a couple of times, then updated obj1 with a stale vclock,
it should not take

08:05 <outerim> but the second didn't descend from the first one which should have been stored
at the time the second write occurred right?

08:05 <seancribbs> but they had the same parent

08:07 <outerim> hmm, so it allows a write to be lost if 2 conflicting writes have the same parent?
that doesn't seem right to me :\

08:07 <seancribbs> soo…. as i was telling people last night, if you're trying to prevent concurrent
creations from clobbering each other, don't expect Riak to stop you, unless you have allow_mult turned on
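
Editor's note: the exchange above can be sketched as a toy model. This is a minimal illustration, not Riak's implementation: the `Version` class, the `descends` and `resolve` helpers, and the dict-based clocks are all assumptions made for the example. It shows only the rule described in the chat: a write whose vclock is stale is dropped, and two writes that share a parent (here, the empty vclock) are either kept as siblings (`allow_mult` true) or reduced to the one with the latest timestamp (`allow_mult` false).

```python
# Toy model of the conflict rule discussed above; NOT Riak's real code.
from dataclasses import dataclass, field


@dataclass
class Version:
    value: str
    vclock: dict = field(default_factory=dict)  # hypothetical: client id -> counter
    timestamp: float = 0.0


def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(client, 0) >= count for client, count in b.items())


def resolve(stored: Version, incoming: Version, allow_mult: bool) -> list:
    """Return the version(s) kept when `incoming` meets `stored`."""
    if incoming.vclock != stored.vclock and descends(incoming.vclock, stored.vclock):
        return [incoming]              # a true successor replaces the stored value
    if descends(stored.vclock, incoming.vclock):
        return [stored]                # a stale (or identical) write does not take
    # Neither clock descends from the other: the writes are concurrent.
    if allow_mult:
        return [stored, incoming]      # both are kept as siblings
    return [max((stored, incoming), key=lambda v: v.timestamp)]


# Two creations that both started from the empty vclock.
first = Version("claimed by A", {"a": 1}, timestamp=1.0)
second = Version("claimed by B", {"b": 1}, timestamp=2.0)

print(resolve(first, second, allow_mult=False))  # only B's later write survives
print(resolve(first, second, allow_mult=True))   # both survive as siblings
```

With `allow_mult` false the first client's write is silently lost, which is the clobbering outerim objects to; with `allow_mult` true the conflict is at least visible to the application.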

08:11 <outerim> anyway so if that's not going to work, what if allow_mult is true? presumably
client 2's write would have gotten a multiple choices back. in my application I need to be
able to guarantee that if one client claimed a key by creating it that cannot be clobbered

08:11 <outerim> how would you do that?

08:11 <seancribbs> (updated my gist)

08:11 <outerim> can I trust the timestamps (hopefully my machines clocks are in sync) :\

08:12 <seancribbs> there's no 100% way to guarantee your clients won't clobber each other

08:12 <seancribbs> allow_mult just exposes it to you

08:13 <seancribbs> and the timestamps are internal to riak, so if your riak machines are reasonably
synced via NTP, it should be mostly ok

08:13 <seancribbs> but we usually suggest other means for resolving conflicts

08:15 <outerim> seancribbs: in this case, think of someone claiming something akin to a subdomain in a
system for themselves. first write succeeds and the client sees no conflict second write generates a conflict,
we want the second one to lose because the first one was already told he'd claimed it. is there
a better way than timestamps for this case

08:15 <seancribbs> well, if this is in the context of web requests, check for its existence first
(maybe via ajax)

08:16 <seancribbs> and also check for its existence when they submit the form

08:16 <outerim> seancribbs: API actually, and yeah I will of course check for existence before
attempting to write

08:17 <seancribbs> ok

08:17 <seancribbs> you can also use If-None-Match: * header for a little extra checking

08:17 <seancribbs> i should add that to the client

08:18 <seancribbs> something like obj.store_conditionally = true

08:18 <seancribbs> or obj.prevent_stale_writes = true
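
Editor's note: a minimal sketch of the conditional create seancribbs mentions, using only Python's standard library. The host and port, the bucket and key names, and the `build_conditional_put` helper are assumptions for illustration; the `/riak` path, the `returnbody` parameter, and the `If-None-Match: *` header come from the conversation. The request is only built here, not sent.

```python
import urllib.request


def build_conditional_put(base_url: str, bucket: str, key: str, body: bytes):
    """Build a PUT that should only succeed if nothing is stored at the key yet."""
    # returnbody=true asks for the stored object in the response, so the
    # client can compare it with what it tried to write.
    url = f"{base_url}/riak/{bucket}/{key}?returnbody=true"
    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Content-Type", "text/plain")
    request.add_header("If-None-Match", "*")  # "create only" precondition
    return request


# Hypothetical node address, bucket and key.
req = build_conditional_put("http://localhost:8098", "subdomains", "example", b"owner-1")
print(req.get_method(), req.full_url)
print(req.header_items())

# To actually send it: urllib.request.urlopen(req). A failed precondition is
# expected to surface as an HTTP error response (412 under standard HTTP
# semantics), raised by urllib as urllib.error.HTTPError.
```

As the rest of the conversation makes clear, this narrows the window for two clients claiming the same key but does not close it.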

08:19 <outerim> seancribbs: ooh, that could be good... how would riak handle that internally if 2 writes
literally came in on different nodes at ~ the exact same time

08:19 <seancribbs> outerim: you would be hard pressed to deal with that without allow_mult

08:20 <seancribbs> the scope of the critical section on the Riak side is very small, but there is a
read before the write

08:21 <outerim> seancribbs: I am having a hard time even visualizing the madness on the
riak side, so request comes in and it's committed to at least N nodes where N is my rw value

08:21 <seancribbs> right, so this is why we have vector clocks

08:21 <seancribbs> to remove the madness

08:22 <seancribbs> or at least, to make it deterministic

08:22 <outerim> any one of those could either fail because of If-None-Match or generate a
conflict right?

08:22 <outerim> if allow_mult is true

08:22 <seancribbs> if-none-match is checked after the read, in the web endpoint

08:23 <outerim> seancribbs: by web endpoint you mean the host the client is connected
to for its session

08:23 <seancribbs> yes, i mean the webmachine resource sitting at /riak

08:24 <outerim> so even with if-none-match * you're only guaranteed that there wasn't
anything there when your write entered the system to be committed 2 could have come in
both passed the if-none-match check and been committed (either generating a conflict with allow_mult,
or one of them winning based on timestamp)

08:24 <outerim> do I understand that correctly

08:24 <seancribbs> yep
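
Editor's note: the race outerim describes can be replayed deterministically. This is a simplified sketch under stated assumptions: a plain dict stands in for the stored data, and the `Claim` class with its `check` and `write` steps is hypothetical. It only illustrates why a read-then-write check cannot guarantee a single winner when the two steps of different clients interleave.

```python
class Claim:
    """One client's attempt to create a key only if it does not exist yet."""

    def __init__(self, store: dict, key: str, owner: str):
        self.store, self.key, self.owner = store, key, owner
        self.passed = False

    def check(self) -> bool:
        # The read before the write (where an If-None-Match: * test would happen).
        self.passed = self.key not in self.store
        return self.passed

    def write(self) -> bool:
        # The write goes ahead only if the earlier check saw nothing.
        if self.passed:
            self.store[self.key] = self.owner
        return self.passed


# Sequential requests: the second claim is rejected.
store = {}
a, b = Claim(store, "example", "A"), Claim(store, "example", "B")
a.check(); a.write()
b.check()
print(b.write(), store)   # False {'example': 'A'}

# Interleaved requests: both checks run before either write.
store = {}
a, b = Claim(store, "example", "A"), Claim(store, "example", "B")
a.check(); b.check()
print(a.write(), b.write(), store)   # True True {'example': 'B'}
```

In the interleaved run both clients are told they succeeded, and the dict ends up holding only the later write, which corresponds to the `allow_mult` false outcome; with `allow_mult` true the two values would be kept as siblings instead.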

08:25 <seancribbs> ordering of those events is unpredictable

08:25 <seancribbs> even on a per-replica basis

08:25 <outerim> seancribbs: so is it possible that neither could get back a conflict
message at the time of their write (assuming rw/r=quorum)?

08:26 <seancribbs> yes

08:26 <outerim> seancribbs: and if the ordering per replica is unpredictable is it
possible for different replicas to have different values

08:26 <seancribbs> although, i doubt it's that likely

08:26 <seancribbs> outerim: yes, absolutely.

08:26 <outerim> seancribbs: oh good lord... that scares me ;)

08:27 <seancribbs> the thing is, the window for these conflicts will be really small

08:27 <seancribbs> much smaller than a single request through your API

08:27 <outerim> seancribbs: yeah, totally understood, but it still exists :)

08:28 <outerim> so those things are fixed up in post by read repairs?

08:28 <seancribbs> outerim: right, so I would say use allow_mult for the "just in case" scenarios

08:28 <seancribbs> yes

08:28 <outerim> when idle is riak checking those sorts of things proactively by chance?

08:28 <seancribbs> no

08:29 <outerim> read repair was one thing from the docs I wasn't entirely clear on,
with an r value of 'all' and inconsistent data between 2 replicas would my read fail or
would the data be repaired

08:29 <outerim> ?

08:30 <seancribbs> read repair happens regardless of whether your request quorum was
met, and after a reply has been returned

08:30 <seancribbs> outerim: http://github.com/seancribbs/ripple/issues#issue/61

08:30 <seancribbs> so you might see one request fail, and the following one succeed

08:34 <outerim> seancribbs: so the request quorum is the # of hosts that have to agree on a value
for the read to succeed right? assuming the bucket's n val is 3 and one host has stale data if my
read has r=2 I may see a failed read (if it picks a good and stale host) or succeed (picked 2 good
hosts) only in the first case will the stale host be repaired and subsequent reads should not fail?

08:35 <outerim> seancribbs: would a pull request be appreciated for issue 61 if I were to whip something up?

08:36 <seancribbs> outerim: yes, absolutely

08:39 <outerim> seancribbs: and not 2 replicas but 2 partition owners right? (just clarifying)

08:39 <outerim> seancribbs: you said "so you might see one request fail, and the following one
succeed" under what conditions might a request fail? only if the data hasn't been replicated
to enough partitions or a host is down?

08:40 <outerim> so if I write with an rw=1 and try to read with an r=3 it
might fail? other cases?

08:40 <seancribbs> 2 replicas == 2 partition owners

08:41 <seancribbs> the request might fail if not all of the replicas are
populated

08:41 <seancribbs> i.e. if enough of them come back with "not found" so that the
quorum can't be satisfied

08:41 <seancribbs> but if the last one comes back with a value (which happens frequently
because it's slower to send the value than send "not found")

08:41 <seancribbs> it will be read-repaired back to the other nodes

08:41 <seancribbs> err partitions

08:42 <seancribbs> then on the next request, you'll get a success

08:42 <outerim> seancribbs: and does riak always try the reads from all replicas or just whatever
your r value is.

08:42 <outerim> seancribbs: that makes sense and was how I understood it I think..

08:43 <seancribbs> always requests all replicas, replies to client as soon as R is met or is known
not to be satisfiable
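
Editor's note: a rough model of the read path as seancribbs describes it, not the real `riak_kv_get_fsm`. The `read` and `read_repair` functions are hypothetical and ignore vector clocks entirely; a reply of `None` stands for "not found". The sketch shows two points from the chat: the coordinator answers as soon as R is met or can no longer be met, and read repair afterwards fills in the replicas that were missing the value, so a retry can succeed.

```python
NOT_FOUND = None


def read(replies: list, r: int) -> tuple:
    """Process replies from all N replicas in arrival order.

    Returns (status, number of replies seen when the client was answered).
    """
    n = len(replies)
    found = 0
    for seen, reply in enumerate(replies, start=1):
        if reply is not NOT_FOUND:
            found += 1
        if found >= r:
            return ("ok", seen)            # R is met: reply immediately
        if seen - found > n - r:
            return ("notfound", seen)      # R can no longer be satisfied
    return ("notfound", n)


def read_repair(replicas: list) -> list:
    """After the client reply, copy a found value to replicas that lack it."""
    values = [v for v in replicas if v is not NOT_FOUND]
    if not values:
        return list(replicas)
    return [values[0] if v is NOT_FOUND else v for v in replicas]


# N=3, R=2: two fast "not found" replies arrive before the slower value.
replicas = [NOT_FOUND, NOT_FOUND, "v1"]
print(read(replicas, r=2))        # ('notfound', 2)

replicas = read_repair(replicas)  # happens after the failed reply
print(replicas)                   # ['v1', 'v1', 'v1']
print(read(replicas, r=2))        # ('ok', 2)
```

This matches the "one request fails, the following one succeeds" behaviour mentioned earlier in the log.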

08:43 <outerim> by all replicas I mean all the replicas that own a specific key

08:43 <seancribbs> outerim: yes, we're on the same page

08:44 <outerim> seancribbs: cool, appreciate your help very much it's clarified a lot for me.


08:44 <seancribbs> no problem. if you know some Erlang and want more definitive
answers to how this works, read the riak_kv_get_fsm module

08:45 <seancribbs> it's illuminating

08:45 <seancribbs> also the get_fsm_qc QuickCheck tests

08:46 <outerim> seancribbs: I've attempted a few times to learn erlang. perhaps I'll have a look