Created February 2, 2011 16:52
10:35 <siculars> nodes a,b,c. a goes poof. some time later d comes online. keys are shipped to d on read (read/repair). can i get a pointer on where to look in the code to see how that happens? aka, how does riak distinguish internally between a new write and a copy from a pre-existing replica? aka, how does it know not to trigger a hook again, for example? is there an internal copy method? yes, i could just stumble through the code... but then what would you all do all day if not answer questions on irc?
10:37 <seancribbs> siculars: lol
10:39 <siculars> what would mark have to write about in the recap? gotta give him some gist materials occasionally...
10:40 <seancribbs> siculars: so are you serious about the question or what
10:40 <siculars> seancribbs: ya! lf answers pls.
10:46 <drev1> siculars: hooks are executed by the coordinating node when the request is processed
10:49 <drev1> the coordinating node is also responsible for initiating read repair: https://github.com/basho/riak_kv/blob/master/src/riak_kv_get_fsm.erl#L247
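The flow drev1 links to can be sketched outside Erlang. This is an illustrative Python model, not Basho's code: `Replica` stands in for a vnode's key/value store, and a plain version counter stands in for Riak's vector clocks.

```python
class Replica:
    """Toy stand-in for a vnode's local store (illustrative only)."""
    def __init__(self, data=None):
        self.data = dict(data or {})
    def get(self, key):
        return self.data.get(key)          # (version, value) or None
    def put(self, key, versioned):
        self.data[key] = versioned

def coordinate_get(key, replicas):
    """What a coordinating node does on a get: collect replies, pick the
    newest value, and push it back to any replica that returned a stale
    or missing copy (read repair)."""
    replies = [(r, r.get(key)) for r in replicas]
    newest = max((v for _, v in replies if v is not None),
                 key=lambda v: v[0], default=None)
    if newest is None:
        return None                        # no replica has the key
    for replica, reply in replies:
        if reply is None or reply[0] < newest[0]:
            replica.put(key, newest)       # repair the stale/missing copy
    return newest[1]
```

In siculars' scenario, the empty store plays the part of node d: the first read of a key repairs d's copy as a side effect.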
10:53 <drev1> in addition to read repair, data is also exchanged via handoff. In the scenario you described, when node d was brought online a portion of the vnodes would become aware that they needed to migrate to node d and would start handing off data. Each vnode becomes aware of needing to hand off by periodically checking if it is running on the correct node: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L130
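That periodic home check can be modeled in a few lines. This is a hedged sketch, not riak_core's API: the ring is just a dict mapping partition to owner node, and `transfer` stands in for the handoff machinery.

```python
class RunningVnode:
    """Toy vnode: a partition being served on some physical node."""
    def __init__(self, partition, node):
        self.partition = partition   # which ring partition this vnode serves
        self.node = node             # the physical node it is running on

def maybe_handoff(vnode, ring, transfer):
    """Periodic check: if the ring says our partition now belongs to a
    different node, start handing data off to that node."""
    owner = ring[vnode.partition]
    if owner != vnode.node:
        transfer(vnode, owner)       # begin handoff to the new owner
        return True
    return False                     # we are the rightful owner; do nothing
```

Each vnode runs this check on a timer; a vnode on b or c whose partition now maps to d is the one that initiates the transfer.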
11:00 <siculars> drev1: thank you. so in theory, in the simple example above, wouldnt a handoff cover all keys where a replica was missing and obviate the need for a read/repair? i take it the read/repair would occur in the scenario where a key was read while the cluster was still operating with 2 nodes (b,c) before d came online.
11:02 <drev1> the vnodes that belong to node A would need to be read repaired
11:02 <drev1> with or without node D
11:07 <siculars> so what does "when node d was brought online a portion of the vnodes would become aware that they needed to migrate to node d and would start handing off data." mean? isnt "handing off data" an auto data dump/copy? whats going on there...
11:08 <drev1> when node D is joined to the cluster, the cluster determines that a portion of vnodes will move to node D
11:08 <drev1> the ring file is changed
11:09 <drev1> on all nodes in the cluster
11:09 <siculars> what about the data ...
11:09 <drev1> on node D, the vnode master starts up vnodes to handle the partitions it is responsible for
11:10 <drev1> on nodes B and C, some of the vnodes will realize that they've been moved to node D
11:10 <drev1> based on the updated ring file
11:10 <drev1> once the vnodes realize they've been moved, they will contact node D and start transferring data
11:11 <siculars> so data -is- moving without a read repair, ya?
11:11 <drev1> yes
11:11 <siculars> cause in that scenario no data was read by a client
11:12 <siculars> sorry to beat a horse here, but would that not cover 100% of the keys and their replica levels? why is a read repair necessary?
11:13 <drev1> in the scenario you presented you also had node A fail
11:13 <drev1> at the beginning you had 3 nodes
11:13 <drev1> 64 partitions are spread across those 3 nodes
11:13 <drev1> when node A fails, the data in those partitions is not available
11:14 <siculars> and you do in the end a -> d
11:14 <drev1> I don't understand
11:14 <siculars> but that data on a exists in some replica on b,c
11:14 <drev1> yes
11:14 <drev1> you don't lose any data
11:14 <drev1> but some replicas are not available
11:14 <drev1> because they were on node A
11:16 <siculars> so you have to copy the specific replica. riak wont copy the replica on B to D as if it were A.
11:17 <drev1> Riak will transfer vnodes from B to D
11:17 <drev1> let's forget about replicas for a moment
11:17 <drev1> you have 64 partitions
11:17 <drev1> distributed across 3 nodes
11:17 <drev1> you lose a node
11:17 <drev1> 1/3 of the partitions are gone
11:17 <drev1> the data is not available
11:18 <drev1> you bring a new node online, do some remove/join magic
11:18 <drev1> you now have 3 nodes again
11:18 <drev1> with 64 partitions
11:18 <drev1> but 1/3 of the partitions are empty
11:18 <drev1> because the data on the failed node is not available
11:19 <drev1> read repair will populate the empty partitions
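drev1's closing arithmetic can be made concrete with a toy model (illustrative names, not Riak internals): the replacement node inherits the failed node's slots in the ring, but those partitions start empty, and reads refill them one key at a time.

```python
PARTITIONS = 64
nodes = ["a", "b", "c"]
owner = {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}

# node a fails and d takes over its slots in the ring
owner = {p: ("d" if n == "a" else n) for p, n in owner.items()}

# d's partitions exist in the ring but hold no data yet: roughly 1/3 of
# the ring (here 22 of 64 partitions) is empty until handoff and read
# repair refill it
empty = sorted(p for p, n in owner.items() if n == "d")
```

This is why handoff alone doesn't cover siculars' case: handoff moves whatever a vnode has, and the vnodes that replaced node a have nothing to move until read repair (or a later write) puts data back in them.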
11:31 <siculars> drev1: thanks