Created February 2, 2011 16:52
10:35 <siculars> nodes a,b,c. a goes poof. some time later d comes online. keys are shipped to d on read (read/repair). can i get a pointer on where to look in the code to see how that happens? aka, how does riak distinguish internally between a new write and a copy from a pre-existing replica? aka, how does it know not to trigger a hook again, for example? is there an internal copy method? yes, i could just stumble through the code... but then what would you all do all day if not answer questions on irc?
10:37 <seancribbs> siculars: lol
10:39 <siculars> what would mark have to write about in the recap? gotta give him some gist materials occasionally...
10:40 <seancribbs> siculars: so are you serious about the question or what
10:40 <siculars> seancribbs: ya! lf answers pls.
10:46 <drev1> siculars: hooks are executed by the coordinating node when the request is processed
10:49 <drev1> the coordinating node is also responsible for initiating read repair: https://github.com/basho/riak_kv/blob/master/src/riak_kv_get_fsm.erl#L247
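The flow drev1 links to can be sketched outside Erlang. This is an illustrative Python model, not Basho's code: `Replica` stands in for a vnode's key/value store, and a plain version counter stands in for Riak's vector clocks.

```python
class Replica:
    """Toy stand-in for a vnode's local store (illustrative only)."""
    def __init__(self, data=None):
        self.data = dict(data or {})
    def get(self, key):
        return self.data.get(key)          # (version, value) or None
    def put(self, key, versioned):
        self.data[key] = versioned

def coordinate_get(key, replicas):
    """What a coordinating node does on a get: collect replies, pick the
    newest value, and push it back to any replica that returned a stale
    or missing copy (read repair)."""
    replies = [(r, r.get(key)) for r in replicas]
    newest = max((v for _, v in replies if v is not None),
                 key=lambda v: v[0], default=None)
    if newest is None:
        return None                        # no replica has the key
    for replica, reply in replies:
        if reply is None or reply[0] < newest[0]:
            replica.put(key, newest)       # repair the stale/missing copy
    return newest[1]
```

In siculars' scenario, the empty store plays the part of node d: the first read of a key repairs d's copy as a side effect.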
10:53 <drev1> in addition to read repair, data is also exchanged via handoff. In the scenario you described, when node d was brought online a portion of the vnodes would become aware that they needed to migrate to node d and would start handing off data. Each vnode becomes aware of needing to hand off by periodically checking if it is running on the correct node: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L130
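That periodic home check can be modeled in a few lines. This is a hedged sketch, not riak_core's API: the ring is just a dict mapping partition to owner node, and `transfer` stands in for the handoff machinery.

```python
class RunningVnode:
    """Toy vnode: a partition being served on some physical node."""
    def __init__(self, partition, node):
        self.partition = partition   # which ring partition this vnode serves
        self.node = node             # the physical node it is running on

def maybe_handoff(vnode, ring, transfer):
    """Periodic check: if the ring says our partition now belongs to a
    different node, start handing data off to that node."""
    owner = ring[vnode.partition]
    if owner != vnode.node:
        transfer(vnode, owner)       # begin handoff to the new owner
        return True
    return False                     # we are the rightful owner; do nothing
```

Each vnode runs this check on a timer; a vnode on b or c whose partition now maps to d is the one that initiates the transfer.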
11:00 <siculars> drev1: thank you. so in theory, in the simple example above, wouldnt a handoff cover all keys where a replica was missing and obviate the need for a read/repair? i take it the read/repair would occur in the scenario where a key was read while the cluster was still operating with 2 nodes (b,c) before d came online.
11:02 <drev1> the vnodes that belong to node A would need to be read repaired
11:02 <drev1> with or without node D
11:07 <siculars> so what does "when node d was brought online a portion of the vnodes would become aware that they needed to migrate to node d and would start handing off data." mean? isnt "handing off data" an auto data dump/copy? whats going on there...
11:08 <drev1> when node D is joined to the cluster, the cluster determines that a portion of vnodes will move to node D
11:08 <drev1> the ring file is changed
11:09 <drev1> on all nodes in the cluster
11:09 <siculars> what about the data ...
11:09 <drev1> on node D, the vnode master starts up vnodes to handle the partitions it is responsible for
11:10 <drev1> on nodes B and C, some of the vnodes will realize that they've been moved to node D
11:10 <drev1> based on the updated ring file
11:10 <drev1> once the vnodes realize they've been moved, they will contact node D and start transferring data
11:11 <siculars> so data -is- moving without a read repair, ya?
11:11 <drev1> yes
11:11 <siculars> cause in that scenario no data was read by a client
11:12 <siculars> sorry to beat a horse here, but would that not cover 100% of the keys and their replica levels? why is a read repair necessary?
11:13 <drev1> in the scenario you presented you also had node A fail
11:13 <drev1> at the beginning you had 3 nodes
11:13 <drev1> 64 partitions are spread across those 3 nodes
11:13 <drev1> when node A fails, the data in those partitions is not available
11:14 <siculars> and you do in the end a -> d
11:14 <drev1> I don't understand
11:14 <siculars> but that data on a exists in some replica on b,c
11:14 <drev1> yes
11:14 <drev1> you don't lose any data
11:14 <drev1> but some replicas are not available
11:14 <drev1> because they were on node A
11:16 <siculars> so you have to copy the specific replica. riak wont copy the replica on B to D as if it were A.
11:17 <drev1> Riak will transfer vnodes from B to D
11:17 <drev1> let's forget about replicas for a moment
11:17 <drev1> you have 64 partitions
11:17 <drev1> distributed across 3 nodes
11:17 <drev1> you lose a node
11:17 <drev1> 1/3 of the partitions are gone
11:17 <drev1> the data is not available
11:18 <drev1> you bring a new node online, do some remove/join magic
11:18 <drev1> you now have 3 nodes again
11:18 <drev1> with 64 partitions
11:18 <drev1> but 1/3 of the partitions are empty
11:18 <drev1> because the data on the failed node is not available
11:19 <drev1> read repair will populate the empty partitions
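drev1's closing arithmetic can be made concrete with a toy model (illustrative names, not Riak internals): the replacement node inherits the failed node's slots in the ring, but those partitions start empty, and reads refill them one key at a time.

```python
PARTITIONS = 64
nodes = ["a", "b", "c"]
owner = {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}

# node a fails and d takes over its slots in the ring
owner = {p: ("d" if n == "a" else n) for p, n in owner.items()}

# d's partitions exist in the ring but hold no data yet: roughly 1/3 of
# the ring (here 22 of 64 partitions) is empty until handoff and read
# repair refill it
empty = sorted(p for p, n in owner.items() if n == "d")
```

This is why handoff alone doesn't cover siculars' case: handoff moves whatever a vnode has, and the vnodes that replaced node a have nothing to move until read repair (or a later write) puts data back in them.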
11:31 <siculars> drev1: thanks