PharkMillups/gist:713854

## gistfile1.txt
12:02 <hoodoos> hello guys!

12:02 <seancribbs> howdy

12:04 <hoodoos> i was experimenting with leaving/joining today. I do it this way: make 1 of 3
nodes leave cluster, delete bitcask on it, then join in again. after handoffs are done bitcask
size is very small (like 1 megabyte, compared to 100 and 97 megabytes on the two other nodes). Why
is that so? Riak thinks that old node returned and still has the data and will append data
during read repairs?

12:05 <seancribbs> hoodoos: did you wait for the node to exit?

12:05 <hoodoos> yep

12:05 <seancribbs> ok

12:05 <hoodoos> waited till handoffs are done on 2 remaining nodes after leave

12:06 <seancribbs> are they handing off data to the node that came back?

12:06 <hoodoos> yep

12:06 <hoodoos> with strange partitions numbers :)

12:07 <seancribbs> well, they are always strange

12:07 <seancribbs> or rather… large

12:07 cyberdelia joined

12:07 <hoodoos> yep

12:07 <hoodoos> well, when I leave I can understand the numbers, they're reasonable, but when
I join they're strange

12:08 <hoodoos> let me tell you exact numbers, I have a 512 partitions cluster

12:08 <seancribbs> for 3 nodes?

12:09 <hoodoos> well, it will eventually grow bigger I hope :)

12:09 <seancribbs> ok

12:09 <seancribbs> how big

12:09 <hoodoos> is it big overhead?

12:09 <hoodoos> well, to 20-30 nodes in a year

12:09 <seancribbs> you should aim for between 10 and 50 partitions per node

12:09 <seancribbs> 256 will nicely support up to 30
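
A rough way to compare a ring size against the rule of thumb above is to divide partitions by node count. This is only an illustrative sketch: the function name `partitions_per_node` is hypothetical, and the ring sizes and node counts are the ones mentioned in the conversation.

```python
def partitions_per_node(ring_size: int, nodes: int) -> float:
    """Average number of partitions each node would own."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return ring_size / nodes


# Ring sizes and node counts mentioned in the conversation.
for ring_size, nodes in [(512, 3), (512, 30), (256, 30)]:
    avg = partitions_per_node(ring_size, nodes)
    print(f"{ring_size} partitions / {nodes} nodes = {avg:.1f} per node")
```

With 3 nodes, a 512-partition ring works out to roughly 171 partitions per node, which is why the ring size drew a question.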

12:10 <seancribbs> let me guess, AWS?

12:10 <seancribbs> i.e. EC2?

12:10 <hoodoos> no

12:10 <hoodoos> physical machines

12:10 <hoodoos> just a dedicated hosting

12:10 <seancribbs> yay :D

12:10 <hoodoos> ehm? why yay? :)

12:11 <seancribbs> you'll get more bang-for-buck from physical machines

12:11 <hoodoos> we tried linode for riak

12:11 <hoodoos> but it can't guarantee disk IO performance

12:11 <hoodoos> so some nodes were doing really bad

12:12 <seancribbs> right, most virtualized platforms will have that problem

12:12 <hoodoos> so we decided to get physical servers with single SSD drives

12:12 <hoodoos> in case of riak it's reasonable i guess, no need in raids unless
you want to make it even more faster :)

12:13 <seancribbs> sounds like a good plan

12:13 <hoodoos> so, admin is missing, I have numbers for joining only:

12:13 <hoodoos> the node i join is doing: 170 handoffs to one of the other nodes and
waiting for 342 partitions

12:14 <hoodoos> two others exchange 170 partitions and are sending 342 partitions to the newly joined one

12:14 <hoodoos> very strange :)

12:14 <seancribbs> those two add up to 512

12:15 <hoodoos> hmm

12:15 <seancribbs> and 2x170 == 340

12:15 <hoodoos> how many partitions should each node have?

12:15 <hoodoos> 170?

12:16 <hoodoos> ~170

12:16 <seancribbs> about 170, yes

12:16 <seancribbs> 170 or 171
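
The "170 or 171" figure is simply 512 divided as evenly as possible across 3 nodes. The sketch below shows that arithmetic only; `even_split` is a hypothetical helper, and it does not model how Riak actually assigns partitions to nodes.

```python
from typing import List


def even_split(ring_size: int, nodes: int) -> List[int]:
    """Partition counts per node if the ring were divided as evenly as possible."""
    base, extra = divmod(ring_size, nodes)
    # The first `extra` nodes take one additional partition each.
    return [base + 1 if i < extra else base for i in range(nodes)]


counts = even_split(512, 3)
print(counts)       # [171, 171, 170]
print(sum(counts))  # 512
print(170 + 342)    # 512, the two numbers reported during the join
```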

12:16 <hoodoos> and what is the newly created node trying to give away? and why to only one node :/

12:17 banjiewen joined

12:17 <seancribbs> not sure

12:17 <seancribbs> sounds like you got your cluster in an odd state

12:17 <hoodoos> hm, any way to examine it more closely?

12:18 <hoodoos> cluster is working, new node is growing if I add new data

12:19 <hoodoos> anyways normal situation is when newly joined node bitcask will grow to normal
size related to other nodes, right?

12:20 <seancribbs> well, there might be old versions still sitting around on the other nodes.
but eventually, they should hand off enough data and then compact their own data files,
approaching equal balance

12:20 <seancribbs> but I don't know that it will be as immediate as you expect

12:20 <hoodoos> well, I'm not sure too :)

12:20 siculars joined

12:21 <hoodoos> anyways it's not a big pain to start read repairs manually

12:21 <seancribbs> in normal ops you shouldn't have to force read repair

12:21 <hoodoos> seancribbs, can I ask you one more question? :) I know you're kinda tired of mentor's job :)

12:21 <seancribbs> ha

12:21 <seancribbs> go ahead

12:22 <hoodoos> i tried to rename one of the nodes today without leaving the cluster and it
resulted in a 4 node cluster with 1 dead node i guess (i wasn't doing it myself so can't tell for sure),
i guess it is expected behaviour? to rename a node I need to leave and join as a new node, right?

12:23 <seancribbs> reip is kind of weird. but you generally have to run the 'riak-admin reip' command on all nodes

12:24 <seancribbs> if that's how you were renaming

12:24 <hoodoos> i didn't change ip.. i changed riak@10.0.0.10 to riak@riak00.somedomain

12:25 <hoodoos> i guess reip is when you change Ip of server running node without name change

12:25 <seancribbs> reip is just the name of the command

12:25 <hoodoos> ah..

12:25 <hoodoos> i see node names there

12:30 <hoodoos> well, thanks, I guess I need to read more on reip command..

12:37 <hoodoos> how to make /riak/bucketname list keys?

12:37 <seancribbs> add ?keys=true

12:37 <seancribbs> or ?keys=stream

12:37 <hoodoos> what will stream do?

12:38 fedesilva joined

12:38 <seancribbs> sends it back in chunked-encoding
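
For reference, the two query strings can be built like this. It is a minimal sketch: the base address `http://127.0.0.1:8098` is an assumption about a local node, and `list_keys_url` and `fetch_keys_raw` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def list_keys_url(bucket: str, stream: bool = False,
                  base: str = "http://127.0.0.1:8098") -> str:
    """Build the key-listing URL for a bucket under the /riak prefix."""
    query = urlencode({"keys": "stream" if stream else "true"})
    return f"{base}/riak/{quote(bucket, safe='')}?{query}"


def fetch_keys_raw(bucket: str, stream: bool = False) -> bytes:
    """Fetch the raw response body; the HTTP client decodes chunked encoding."""
    with urlopen(list_keys_url(bucket, stream)) as response:
        return response.read()


print(list_keys_url("bucketname"))
print(list_keys_url("bucketname", stream=True))
```

`fetch_keys_raw` needs a reachable node, so only the URL builder is exercised here.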

12:38 <hoodoos> hmm.. my nginx won't eat that much headers :)

12:39 <hoodoos> well, thanks

12:53 <johnae> hoodoos: maybe your nginx won't return the content properly since nginx
doesn't seem to proxy chunked encoding very well

12:54 <hoodoos> nono, just headers too big for its current settings

12:54 <hoodoos> johnae, i use chunking module, it handles chunking encoding nice

12:55 <johnae> but, chunking module is just for input isnt it?

12:55 <hoodoos> ah yes, but output was handled nice, never mentioned any problems

12:55 <hoodoos> is there any known?

12:56 <johnae> well, I may still be wrong about this, but I've not been able to proxy luwak
using nginx and afaik it's because luwak returns chunked encoding only

12:57 <johnae> so I thought that maybe ?keys=stream would not work either then since that would also
result in chunked encoding

13:06 <hoodoos> btw, after reading most of the keys in the bucket, the new node's bitcask
grew to its normal size :)

13:06 <hoodoos> seancribbs, ^^

13:07 <seancribbs> hoodoos: that's strange

13:07 <hoodoos> it should be done during handoff?

13:08 <seancribbs> not exactly

13:11 <hoodoos> so what is strange exactly?

13:12 <seancribbs> that it wouldn't appear the correct size until you listed the keys. listing
keys doesn't touch disk (at least on 0.13+)

13:12 <hoodoos> no i didn't

13:12 <hoodoos> i just get them all

13:12 <seancribbs> oh well that explains it

13:12 <hoodoos> listing didn't do it, and for some reason mapred didn't either

13:12 <seancribbs> read repair

13:12 <hoodoos> ah

13:12 <hoodoos> i understand why..

13:13 <seancribbs> brb

13:13 <hoodoos> it was reading values from available nodes
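
The difference between listing keys and reading values can be illustrated with a toy model. This is not Riak's implementation: it ignores vector clocks, quorum settings, and siblings, and every name in it (`list_keys`, `get`, the replica dicts) is made up for illustration. It only shows why a read can fill in a wiped replica while a listing cannot.

```python
from typing import Dict, List, Optional

Replica = Dict[str, str]


def list_keys(replicas: List[Replica]) -> List[str]:
    """Union of keys across replicas; nothing is written back."""
    return sorted(set().union(*replicas))


def get(replicas: List[Replica], key: str) -> Optional[str]:
    """Read a key from every replica and copy the value to replicas lacking it."""
    values = [r[key] for r in replicas if key in r]
    if not values:
        return None
    value = values[0]
    for r in replicas:
        if key not in r:
            r[key] = value  # the "read repair" step in this toy model
    return value


full_a = {"k1": "v1", "k2": "v2"}
full_b = {"k1": "v1", "k2": "v2"}
wiped: Replica = {}  # stands in for the node whose bitcask directory was deleted
replicas = [full_a, full_b, wiped]

list_keys(replicas)
print(len(wiped))  # 0: listing keys did not repair anything
for key in list_keys(replicas):
    get(replicas, key)
print(len(wiped))  # 2: reading each key filled in the missing copies
```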

13:14 <hoodoos> i guess I will repead bitcask deletion to see whether it will
repair with time or not. Should it really?

13:14 <hoodoos> *repeat

13:14 <seancribbs> it only repairs on reads

13:14 <hoodoos> why can't it repair in background i wonder.. :)

13:15 <seancribbs> that's called handoff, and it only works if the
data got to the other nodes

13:15 <seancribbs> how did you perform the leave exactly

13:20 <hoodoos> wait a sec please

13:22 <hoodoos> 1. riak-admin leave riak@node

13:22 <hoodoos> 2. riak stop

13:22 <hoodoos> 3. wait for riak-admin transfers on other nodes

13:22 <hoodoos> 4. rm -r bitcask ring // on left node

13:22 <hoodoos> 5. riak start

13:22 <hoodoos> 6. riak-admin join

13:22 <hoodoos> seancribbs, ^^ like that

13:22 <seancribbs> eliminate step 2

13:22 <seancribbs> it will automatically stop when it's done handing off

13:22 <hoodoos> ah

13:23 <seancribbs> which explains why there was no data when you
brought the node back
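
The corrected procedure is the same list with the manual stop removed. The sketch below only manipulates the step text exactly as it was typed in the conversation; it does not run any command, and `corrected_procedure` is a hypothetical helper name.

```python
from typing import List

# Steps exactly as listed in the conversation (commands are not executed here).
original_steps: List[str] = [
    "riak-admin leave riak@node",
    "riak stop",
    "wait for riak-admin transfers on other nodes",
    "rm -r bitcask ring  # on the node that left",
    "riak start",
    "riak-admin join",
]


def corrected_procedure(steps: List[str]) -> List[str]:
    """Drop the manual stop so the leaving node can finish handing off."""
    return [step for step in steps if step != "riak stop"]


for number, step in enumerate(corrected_procedure(original_steps), start=1):
    print(f"{number}. {step}")
```

Per the advice above, the leaving node shuts itself down once handoff completes, so the data directory should only be removed after that point.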

13:23 <hoodoos> hmm, let me try it :)