@PharkMillups
Created February 2, 2011 16:52
10:35 <siculars> nodes a,b,c . a goes poof . some time later d
comes online. keys are shipped to d on read (read/repair) . can i
get a pointer on where to look in the code to see how that happens?
aka. how does riak distinguish internally between a new write and a
copy from a pre-existing replica. aka. how does it know to not trigger
a hook again for example. is there an internal copy method? yes, i could
just stumble through the code... but then what would you all do all day if not
answer questions on irc?
10:37 <seancribbs> siculars: lol
10:39 <siculars> what would mark have to write about in the recap?
gotta give him some gist materials occasionally...
10:40 <seancribbs> siculars: so are you serious about the question or what
10:40 <siculars> seancribbs: ya! lf answers pls.
10:46 <drev1> siculars: hooks are executed by the coordinating node
when the request is processed
10:49 <drev1> the coordinating node is also responsible for
initiating read repair:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_get_fsm.erl#L247
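
(A minimal sketch of the decision the coordinating node makes after collecting replies; module and function names here are illustrative, not the actual riak_kv_get_fsm API, and plain version numbers stand in for Riak's vector clocks.)

    -module(read_repair_sketch).
    -export([repair_targets/2]).

    %% Replies is a list of {VnodeIndex, notfound | {ok, Version}} pairs and
    %% MergedVersion is the version of the object after merging all replies
    %% (real Riak compares vector clocks, not integers).
    repair_targets(Replies, MergedVersion) ->
        [Idx || {Idx, Reply} <- Replies, needs_repair(Reply, MergedVersion)].

    %% A vnode needs repair if it had no copy at all or its copy is stale;
    %% the coordinating node then writes the merged object back to it.
    needs_repair(notfound, _Merged)     -> true;
    needs_repair({ok, Version}, Merged) -> Version < Merged.
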
10:53 <drev1> in addition to read repair, data is also exchanged via handoff.
In the scenario you described, when node d was brought online a portion of
the vnodes would become aware that they needed to migrate to node d
and would start handing off data. Each vnode becomes aware of
needing to handoff by periodically checking if it is running on
the correct node:
https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L130
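
(A rough sketch of that periodic check; the names are illustrative rather than riak_core's actual API, and the ring is modeled as a plain map from partition to owner node.)

    -module(vnode_handoff_check_sketch).
    -export([maybe_handoff/2]).

    %% Ring is modeled as a map of Partition => OwnerNode. Each vnode runs
    %% this sort of check on a timer: if the ring says its partition now
    %% belongs to another node, it kicks off handoff to that node.
    maybe_handoff(Partition, Ring) ->
        Owner = maps:get(Partition, Ring),
        case Owner =:= node() of
            true  -> ok;                                %% still the rightful owner
            false -> {start_handoff, Partition, Owner}  %% hand the data to the new owner
        end.
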
11:00 <siculars> drev1: thank you. so in theory, in the simple example
above, wouldn't a handoff cover all keys where a replica was missing and
obviate the need for a read/repair? i take it the read/repair would occur in
the scenario where a key was read while the cluster was still operating with
2 nodes (b,c) before d came online.
11:02 <drev1> the vnodes that belong to node A would need to be read repaired
11:02 <drev1> with or without node D
11:07 <siculars> so what does "when node d was brought online a portion of the
vnodes would become aware that they needed to migrate to node d and would start
handing off data." mean? isn't "handing off data" an auto data dump/copy? what's
going on there...
11:08 <drev1> when node D is joined to the cluster, the cluster determines that
a portion of vnodes will move to node D
11:08 <drev1> the ring file is changed
11:09 <drev1> on all nodes in the cluster
11:09 <siculars> what about the data ...
11:09 <drev1> on node D, the vnode master starts up vnodes to handle
the partitions it is responsible for
11:10 <drev1> on nodes B and C, some of the vnodes will realize that
they've been moved to node D
11:10 <drev1> based on the updated ring file
11:10 <drev1> once the vnodes realize they've been moved, they will
contact node D and start transferring data
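
(An illustrative sketch of what "start transferring data" amounts to, not riak_core's actual handoff protocol: the outgoing vnode folds over everything it stores locally and ships each object to the partition's new home on node D.)

    -module(handoff_transfer_sketch).
    -export([handoff/3]).

    %% LocalData is modeled as a map of Key => Object; SendFun is whatever
    %% transport delivers one object to {Partition, TargetNode}. Returns the
    %% number of objects shipped.
    handoff(LocalData, Target, SendFun) ->
        maps:fold(fun(Key, Obj, Sent) ->
                          SendFun(Target, Key, Obj),
                          Sent + 1
                  end, 0, LocalData).
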
11:11 <siculars> so data -is- moving without a read repair , ya ?
11:11 <drev1> yes
11:11 <siculars> cause in that scenario no data was read by a client
11:12 <siculars> sorry to beat a horse here , but would that not cover
100% of the keys and their replica levels? why is a read repair necessary?
11:13 <drev1> in the scenario you presented you also had node A fail
11:13 <drev1> at the beginning you had 3 nodes
11:13 <drev1> 64 partitions are spread across those 3 nodes
11:13 <drev1> when node A fails, the data in those partitions is not available
11:14 <siculars> and you do in the end a -> d
11:14 <drev1> I don't understand
11:14 <siculars> but that data on a exists in some replica on b,c
11:14 <drev1> yes
11:14 <drev1> you don't lose any data
11:14 <drev1> but some replicas are not available
11:14 <drev1> because they were on node A
11:16 <siculars> so you have to copy the specific replica. riak won't
copy the replica on B to D as if it were A.
11:17 <drev1> Riak will transfer vnodes from B to D
11:17 <drev1> let's forget about replicas for a moment
11:17 <drev1> you have 64 partitions
11:17 <drev1> distributed across 3 nodes
11:17 <drev1> you lose a node
11:17 <drev1> 1/3 of the partitions are gone
11:17 <drev1> the data is not available
11:18 <drev1> you bring a new node online, do some remove/join magic
11:18 <drev1> you now have 3 nodes again
11:18 <drev1> with 64 partitions
11:18 <drev1> but 1/3 of the partitions are empty
11:18 <drev1> because the data on the failed node is not available
11:19 <drev1> read repair will populate the empty partitions
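
(A back-of-the-envelope sketch of those numbers; the round-robin claim below is a simplification of riak_core's real claim algorithm, but it shows roughly a third of the 64 partitions landing on each node, and those are the partitions that sit empty on the replacement node until read repair refills them.)

    -module(ring_math_sketch).
    -export([claim/2, owned_by/3]).

    %% Assign RingSize partitions round-robin across Nodes (simplified;
    %% real Riak hashes the key space and uses riak_core_claim).
    claim(RingSize, Nodes) ->
        N = length(Nodes),
        [{P, lists:nth((P rem N) + 1, Nodes)} || P <- lists:seq(0, RingSize - 1)].

    %% Partitions owned by Node under that claim; when Node is a fresh
    %% replacement these start out empty until read repair populates them.
    owned_by(Node, RingSize, Nodes) ->
        [P || {P, Owner} <- claim(RingSize, Nodes), Owner =:= Node].

For example, with the nodes in the scenario modeled as atoms, length(owned_by(d, 64, [d, b, c])) is 22: the partitions the new node owns but holds no data for.
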
11:31 <siculars> drev1: thanks