@PharkMillups
Created September 17, 2010 16:57
07:50 <outerim> if allow_mult is not set on a bucket what happens with 2 conflicting writes
(assuming the create case) ie client A writes first with rw=quorum client B writes next with rw=quorum?
07:50 <outerim> I presume that A wins because B's write would not have a descending
vector clock
07:51 <outerim> and B would get some error response back. what code?
07:51 <seancribbs> outerim: depends on what vector clock is submitted
07:51 <seancribbs> and no error responses
07:51 <outerim> seancribbs: what if no vector clock is submitted, doesn't riak
generate an initial one based on the client id?
07:52 <seancribbs> basically.. it assumes the empty vector-clock
07:52 <outerim> seancribbs: so you'd just have to do a readbody and test if it was what
you wrote?
07:52 <seancribbs> yes
07:52 <seancribbs> returnbody
07:54 <outerim> seancribbs: so correct me if I'm wrong but with quorum r/rw values you are
guaranteed that the first write will win (in the create case) and that the second will
receive the first's body with returnbody?
07:54 <outerim> seancribbs: (thanks for your help btw)
07:55 <seancribbs> outerim: mmm not sure exactly how to answer that
07:55 <outerim> seancribbs: also worth mentioning that I'm using riak-client from ripple,
haven't looked to see if it creates an initial vclock that riak understands...
07:55 <seancribbs> depends on the ordering of events
07:55 <seancribbs> it won't send the vclock the first time because there is none
07:55 <outerim> seancribbs: do tell if you don't mind :
07:55 <seancribbs> i.e. the empty vclock
07:55 <outerim> right
07:57 <seancribbs> outerim: probably best shown in an example. I'll gist some IRB output
in a minute
07:57 <outerim> seancribbs: that would be excellent
08:00 <seancribbs> outerim: http://gist.github.com/580770
08:00 <outerim> looking
08:00 <seancribbs> so… the answer is, when allow_mult is false, it takes the one with the
latest timestamp
08:01 <outerim> seancribbs: is that only true in the case where both clients are using the same client id?
08:02 <seancribbs> no, those are using different client ids
08:02 <outerim> oh yeah I see 2 clients... hmm
08:03 <outerim> seancribbs: so what is the point of last_write_wins on a bucket?
08:03 <outerim> and what does it default to?
08:03 <seancribbs> ignoring vector clocks entirely has significant speed implications
08:03 <seancribbs> it means you don't ever compare or update them
08:04 <outerim> so isn't it ignoring vector clocks when it says (as in your example) the last
write wins?
08:04 <seancribbs> so…if i can explain this correctly, in the gist this worked because both versions
descended from the same vclock (the empty one)
08:04 <outerim> or since there are no vclocks at that point..
08:05 <seancribbs> so if you updated obj2 a couple of times, then updated obj1 with a stale vclock,
it should not take
08:05 <outerim> but the second didn't descend from the first one which should have been stored
at the time the second write occurred right?
08:05 <seancribbs> but they had the same parent
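The "same parent" point above can be sketched with a toy vector clock (a hypothetical Hash of client id => counter; this is not Riak's internal encoding, just an illustration of the descent check):

```ruby
# Minimal vector-clock sketch (hypothetical, not Riak's wire format).
# Clock b "descends from" clock a when every counter in a appears in b
# with an equal or greater value.
def descends?(b, a)
  a.all? { |client, count| (b[client] || 0) >= count }
end

empty = {}
obj1  = { "client_a" => 1 }  # client A's first write, starting from the empty clock
obj2  = { "client_b" => 1 }  # client B's first write, also from the empty clock

descends?(obj1, empty)  # => true  -- both descend from the shared (empty) parent
descends?(obj2, obj1)   # => false -- neither descends from the other: a conflict
```

Both writes are valid descendants of the same (empty) parent, so neither is "stale"; with allow_mult off, Riak has to break the tie some other way.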
08:07 <outerim> hmm, so it allows a write to be lost if 2 conflicting writes have the same parent?
that doesn't seem right to me :\
08:07 <seancribbs> soo…. as i was telling people last night, if you're trying to prevent concurrent
creations from clobbering each other, don't expect Riak to stop you, unless you have allow_mult turned on
08:11 <outerim> anyway so if that's not going to work, what if allow_mult is true? presumably
client 2's write would have gotten a multiple choices back. in my application I need to be
able to guarantee that if one client claimed a key by creating it that cannot be clobbered
08:11 <outerim> how would you do that?
08:11 <seancribbs> (updated my gist)
08:11 <outerim> can I trust the timestamps (hopefully my machines clocks are in sync) :\
08:12 <seancribbs> there's no 100% way to guarantee your clients won't clobber each other
08:12 <seancribbs> allow_mult just exposes it to you
08:13 <seancribbs> and the timestamps are internal to riak, so if your riak machines are reasonably
synced via NTP, it should be mostly ok
08:13 <seancribbs> but we usually suggest other means for resolving conflicts
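A minimal sketch of the last-write-wins resolution described above, assuming (as in the gist) that each stored version carries a node-local timestamp and the later one survives when allow_mult is false:

```ruby
# Hypothetical sketch of timestamp-based resolution when allow_mult is false.
# Each version carries the time at which it was stored; on a vclock conflict
# the copy with the latest timestamp survives and the other write is silently lost.
Version = Struct.new(:value, :timestamp)

def resolve_lww(siblings)
  siblings.max_by(&:timestamp)
end

a = Version.new("client A's data", Time.utc(2010, 9, 17, 7, 50, 0))
b = Version.new("client B's data", Time.utc(2010, 9, 17, 7, 50, 1))

resolve_lww([a, b]).value  # => "client B's data" -- the later write wins
```

This is why clock skew between Riak nodes matters: the comparison is only as good as the timestamps.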
08:15 <outerim> seancribbs: in this case, think of someone claiming something akin to a subdomain in a
system for themselves. the first write succeeds and the client sees no conflict; the second write generates
a conflict. we want the second one to lose because the first client was already told he'd claimed it. is there
a better way than timestamps for this case?
08:15 <seancribbs> well, if this is in the context of web requests, check for its existence first
(maybe via ajax)
08:16 <seancribbs> and also check for its existence when they submit the form
08:16 <outerim> seancribbs: API actually, and yeah I will of course check for existence before
attempting to write
08:17 <seancribbs> ok
08:17 <seancribbs> you can also use If-None-Match: * header for a little extra checking
08:17 <seancribbs> i should add that to the client
08:18 <seancribbs> something like obj.store_conditionally = true
08:18 <seancribbs> or obj.prevent_stale_writes = true
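The conditional-create behavior that an If-None-Match: * header gives over HTTP can be sketched with a toy in-memory store (hypothetical class, not the riak-client API; as noted below, the real check runs in Riak's web endpoint after a read, so it narrows the race window rather than closing it):

```ruby
# Toy model of create-only PUT semantics (If-None-Match: *),
# mirroring HTTP 201 Created vs 412 Precondition Failed.
class TinyStore
  def initialize
    @data = {}
  end

  # Refuses the write if the key already exists.
  def put_if_absent(key, value)
    return :precondition_failed if @data.key?(key)
    @data[key] = value
    :created
  end
end

store = TinyStore.new
store.put_if_absent("subdomain/foo", "client A")  # => :created
store.put_if_absent("subdomain/foo", "client B")  # => :precondition_failed
```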
08:19 <outerim> seancribbs: ooh, that could be good... how would riak handle that internally if 2 writes
literally came in on different nodes at ~ the exact same time
08:19 <seancribbs> outerim: you would be hard pressed to deal with that without allow_mult
08:20 <seancribbs> the scope of the critical section on the Riak side is very small, but there is a
read before the write
08:21 <outerim> seancribbs: I am having a hard time even visualizing the madness on the
riak side, so request comes in and it's committed to at least N nodes where N is my rw value
08:21 <seancribbs> right, so this is why we have vector clocks
08:21 <seancribbs> to remove the madness
08:22 <seancribbs> or at least, to make it deterministic
08:22 <outerim> any one of those could either fail because of If-None-Match or generate a
conflict right?
08:22 <outerim> if allow_mult is true
08:22 <seancribbs> if-none-match is checked after the read, in the web endpoint
08:23 <outerim> seancribbs: by web endpoint you mean the host the client is connected
to for its session?
08:23 <seancribbs> yes, i mean the webmachine resource sitting at /riak
08:24 <outerim> so even with if-none-match * you're only guaranteed that there wasn't
anything there when your write entered the system to be committed; 2 could have come in,
both passed the if-none-match check, and been committed (either generating a conflict with allow_mult,
or one of them winning based on timestamp)
08:24 <outerim> do I understand that correctly
08:24 <seancribbs> yep
08:25 <seancribbs> ordering of those events is unpredictable
08:25 <seancribbs> even on a per-replica basis
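By contrast, a sketch of what turning allow_mult on changes: causally unrelated writes that both slip past the checks are kept as siblings for the client to reconcile, rather than one silently winning (toy model, not the riak-client API):

```ruby
# Toy model of sibling accumulation under allow_mult = true:
# concurrent writes with no causal relationship are all retained.
def store_write(siblings, new_value)
  siblings + [new_value]
end

siblings = []
siblings = store_write(siblings, "client A")
siblings = store_write(siblings, "client B")
siblings  # => ["client A", "client B"] -- both survive; the app must resolve
```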
08:25 <outerim> seancribbs: so is it possible that neither could get back a conflict
message at the time of their write (assuming rw/r=quorum)?
08:26 <seancribbs> yes
08:26 <outerim> seancribbs: and if the ordering per replica is unpredictable is it
possible for different replicas to have different values
08:26 <seancribbs> although, i doubt it's that likely
08:26 <seancribbs> outerim: yes, absolutely.
08:26 <outerim> seancribbs: oh good lord... that scares me ;)
08:27 <seancribbs> the thing is, the window for these conflicts will be really small
08:27 <seancribbs> much smaller than a single request through your API
08:27 <outerim> seancribbs: yeah, totally understood, but it still exists :)
08:28 <outerim> so those things are fixed up in post by read repairs?
08:28 <seancribbs> outerim: right, so I would say use allow_mult for the "just in case" scenarios
08:28 <seancribbs> yes
08:28 <outerim> when idle is riak checking those sorts of things proactively by chance?
08:28 <seancribbs> no
08:29 <outerim> read repair was one thing from the docs I wasn't entirely clear on,
with an r value of 'all' and inconsistent data between 2 replicas would my read fail or
would the data be repaired
08:29 <outerim> ?
08:30 <seancribbs> read repair happens regardless of whether your request quorum was
met, and after a reply has been returned
08:30 <seancribbs> outerim: http://github.com/seancribbs/ripple/issues#issue/61
08:30 <seancribbs> so you might see one request fail, and the following one succeed
08:34 <outerim> seancribbs: so the request quorum is the # of hosts that have to agree on a value
for the read to succeed, right? assuming the bucket's n val is 3 and one host has stale data: if my
read has r=2 I may see a failed read (if it picks a good and a stale host) or a success (it picked 2 good
hosts). only in the first case will the stale host be repaired, and subsequent reads should not fail?
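The quorum behavior being discussed can be sketched as follows (assumed semantics taken from this conversation: all replicas are asked, nil stands in for a "not found" reply, and the read fails when fewer than R replies carry a value):

```ruby
# Sketch of an R-quorum read over the replicas owning a key.
# replies: one entry per replica; nil models a "not found" response.
def quorum_read(replies, r)
  found = replies.compact
  found.size >= r ? found.first : :not_found_error
end

# n=3, one replica not yet populated:
quorum_read(["v1", "v1", nil], 2)  # => "v1" -- quorum met
quorum_read(["v1", nil, nil], 2)   # => :not_found_error -- only one reply has data
```

In the failing case the one populated reply still triggers read repair (as described above), so a subsequent read can succeed.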
08:35 <outerim> seancribbs: would a pull request be appreciated for issue 61 if I were to whip something up?
08:36 <seancribbs> outerim: yes, absolutely
08:39 <outerim> seancribbs: and not 2 replicas but 2 partition owners right? (just clarifying)
08:39 <outerim> seancribbs: you said "so you might see one request fail, and the following one
succeed" under what conditions might a request fail? only if the data hasn't been replicated
to enough partitions or a host is down?
08:40 <outerim> so if I write with an rw=1 and try to read with an r=3 it
might fail? other cases?
08:40 <seancribbs> 2 replicas == 2 partition owners
08:41 <seancribbs> the request might fail if not all of the replicas are
populated
08:41 <seancribbs> i.e. if enough of them come back with "not found" so that the
quorum can't be satisfied
08:41 <seancribbs> but if the last one comes back with a value (which happens frequently
because it's slower to send the value than send "not found")
08:41 <seancribbs> it will be read-repaired back to the other nodes
08:41 <seancribbs> err partitions
08:42 <seancribbs> then on the next request, you'll get a success
08:42 <outerim> seancribbs: and does riak always try the reads from all replicas, or just whatever
your r value is?
08:42 <outerim> seancribbs: that makes sense and was how I understood it I think..
08:43 <seancribbs> always requests all replicas, replies to client as soon as R is met or is known
not to be satisfiable
08:43 <outerim> by all replicas I mean all the replicas that own a specific key
08:43 <seancribbs> outerim: yes, we're on the same page
08:44 <outerim> seancribbs: cool, appreciate your help very much it's clarified a lot for me.
08:44 <seancribbs> no problem. if you know some Erlang and want more definitive
answers to how this works, read the riak_kv_get_fsm module
08:45 <seancribbs> it's illuminating
08:45 <seancribbs> also the get_fsm_qc QuickCheck tests
08:46 <outerim> seancribbs: I've attempted a few times to learn erlang. perhaps I'll have a look