Created November 24, 2010 15:53
12:02 <hoodoos> hello guys!
12:02 <seancribbs> howdy
12:04 <hoodoos> I was experimenting with leaving/joining today. I do it this way: make 1 of 3 nodes leave the cluster, delete bitcask on it, then join it again. After handoffs are done the bitcask size is very small (like 1 megabyte, compared to 100 and 97 megabytes on the two other nodes). Why is that so? Does Riak think the old node returned and still has the data, and will it add the data during read repairs?
12:05 <seancribbs> hoodoos: did you wait for the node to exit?
12:05 <hoodoos> yep
12:05 <seancribbs> ok
12:05 <hoodoos> waited till handoffs were done on the 2 remaining nodes after the leave
12:06 <seancribbs> are they handing off data to the node that came back?
12:06 <hoodoos> yep
12:06 <hoodoos> with strange partition numbers :)
12:07 <seancribbs> well, they are always strange
12:07 <seancribbs> or rather… large
12:07 cyberdelia joined | |
12:07 <hoodoos> yep
12:07 <hoodoos> well, when I leave I can understand the numbers, they're reasonable, but when I join they're strange
12:08 <hoodoos> let me tell you the exact numbers, I have a 512-partition cluster
12:08 <seancribbs> for 3 nodes?
12:09 <hoodoos> well, it will eventually grow bigger I hope :)
12:09 <seancribbs> ok
12:09 <seancribbs> how big
12:09 <hoodoos> is it a big overhead?
12:09 <hoodoos> well, to 20-30 nodes in a year
12:09 <seancribbs> you should aim for between 10 and 50 partitions per node
12:09 <seancribbs> 256 will nicely support up to 30
12:10 <seancribbs> let me guess, AWS?
12:10 <seancribbs> i.e. EC2?
12:10 <hoodoos> no
12:10 <hoodoos> physical machines
12:10 <hoodoos> just dedicated hosting
12:10 <seancribbs> yay :D
12:10 <hoodoos> ehm? why yay? :)
12:11 <seancribbs> you'll get more bang-for-buck from physical machines
12:11 <hoodoos> we tried linode for riak
12:11 <hoodoos> but it can't guarantee disk IO performance
12:11 <hoodoos> so some nodes were doing really badly
12:12 <seancribbs> right, most virtualized platforms will have that problem
12:12 <hoodoos> so we decided to get physical servers with single SSD drives
12:12 <hoodoos> in the case of riak it's reasonable i guess, no need for RAID unless you want to make it even faster :)
12:13 <seancribbs> sounds like a good plan
12:13 <hoodoos> so, admin is missing, I have numbers for joining only:
12:13 <hoodoos> the node I join is doing 170 handoffs to one of the other nodes and waiting for 342 partitions
12:14 <hoodoos> the two others exchange 170 partitions and are sending 342 partitions to the newly joined one
12:14 <hoodoos> very strange :)
12:14 <seancribbs> those two add up to 512
12:15 <hoodoos> hmm
12:15 <seancribbs> and 2x170 == 340
12:15 <hoodoos> how many partitions should each node have?
12:15 <hoodoos> 170?
12:16 <hoodoos> ~170
12:16 <seancribbs> about 170, yes
12:16 <seancribbs> 170 or 171
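
The "170 or 171" figure is what you get when 512 partitions are divided as evenly as possible among 3 nodes. A minimal Python sketch of that arithmetic follows. `even_split` is a hypothetical helper written for illustration only; it is not part of Riak, and Riak's actual partition claim logic may hand out ownership differently.

```python
def even_split(ring_size, nodes):
    """Split ring_size partitions among nodes as evenly as possible.

    Hypothetical helper: plain integer arithmetic, not Riak's claim algorithm.
    """
    base, extra = divmod(ring_size, nodes)
    # 'extra' nodes receive one partition more than the rest.
    return [base + 1] * extra + [base] * (nodes - extra)


if __name__ == "__main__":
    print(even_split(512, 3))  # [171, 171, 170]
```

The counts always sum back to the ring size, so any per-node number far from this range (in either direction) suggests ownership is unbalanced.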
12:16 <hoodoos> and what is the newly created node trying to give away? and why to only one node :/
12:17 banjiewen joined | |
12:17 <seancribbs> not sure
12:17 <seancribbs> sounds like you got your cluster in an odd state
12:17 <hoodoos> hm, any way to examine it more closely?
12:18 <hoodoos> the cluster is working, the new node is growing if I add new data
12:19 <hoodoos> anyway, the normal situation is that the newly joined node's bitcask grows to a normal size relative to the other nodes, right?
12:20 <seancribbs> well, there might be old versions still sitting around on the other nodes. but eventually, they should hand off enough data and then compact their own data files, approaching equal balance
12:20 <seancribbs> but I don't know that it will be as immediate as you expect
12:20 <hoodoos> well, I'm not sure too :)
12:20 siculars joined | |
12:21 <hoodoos> anyway it's not a big pain to start read repairs manually
12:21 <seancribbs> in normal ops you shouldn't have to force read repair
12:21 <hoodoos> seancribbs, can I ask you one more question? :) I know you're kinda tired of the mentor's job :)
12:21 <seancribbs> ha
12:21 <seancribbs> go ahead
12:22 <hoodoos> i tried to rename one of the nodes today without leaving the cluster and it resulted in a 4-node cluster with 1 dead node i guess (i wasn't doing it myself so can't tell for sure). i guess that is expected behaviour? to rename a node I need to leave and join as a new node, right?
12:23 <seancribbs> reip is kind of weird. but you generally have to run the 'riak-admin reip' command on all nodes
12:24 <seancribbs> if that's how you were renaming
12:24 <hoodoos> i didn't change ip.. i changed riak@10.0.0.10 to riak@riak00.somedomain
12:25 <hoodoos> i guess reip is for when you change the IP of the server running the node without a name change
12:25 <seancribbs> reip is just the name of the command
12:25 <hoodoos> ah..
12:25 <hoodoos> i see node names there
12:30 <hoodoos> well, thanks, I guess I need to read more on the reip command..
12:37 <hoodoos> how to make /riak/bucketname list keys?
12:37 <seancribbs> add ?keys=true
12:37 <seancribbs> or ?keys=stream
12:37 <hoodoos> what will stream do?
12:38 fedesilva joined | |
12:38 <seancribbs> sends it back in chunked-encoding
12:38 <hoodoos> hmm.. my nginx won't eat that many headers :)
12:39 <hoodoos> well, thanks
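
The two key-listing forms seancribbs mentions differ only in the `keys` query parameter. The Python sketch below just builds the two URLs and makes no request. The `/riak/bucketname` path and the `keys=true` / `keys=stream` values come from the conversation; the base address `http://127.0.0.1:8098` and the helper name `list_keys_url` are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def list_keys_url(bucket, stream=False, base="http://127.0.0.1:8098"):
    """Build a key-listing URL for a bucket.

    'base' is an assumed address; point it at your own node.
    stream=True selects keys=stream (chunked response) instead of keys=true.
    """
    query = urlencode({"keys": "stream" if stream else "true"})
    return f"{base}/riak/{quote(bucket, safe='')}?{query}"


if __name__ == "__main__":
    print(list_keys_url("bucketname"))
    print(list_keys_url("bucketname", stream=True))
```

Any proxy sitting in front of the node has to pass chunked responses through for the `keys=stream` form to work.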
12:53 <johnae> hoodoos: maybe your nginx won't return the content properly since nginx doesn't seem to proxy chunked encoding very well
12:54 <hoodoos> nono, just headers too big for its current settings
12:54 <hoodoos> johnae, i use the chunking module, it handles chunked encoding nicely
12:55 <johnae> but, the chunking module is just for input isn't it?
12:55 <hoodoos> ah yes, but output was handled nicely, never mentioned any problems
12:55 <hoodoos> are there any known ones?
12:56 <johnae> well, I may still be wrong about this, but I've not been able to proxy luwak using nginx and afaik it's because luwak returns chunked encoding only
12:57 <johnae> so I thought that maybe ?keys=stream would not work either then since that would also result in chunked encoding
13:06 <hoodoos> btw, after reading most of the keys in the bucket the new node's bitcask grew to its normal size :)
13:06 <hoodoos> seancribbs, ^^
13:07 <seancribbs> hoodoos: that's strange
13:07 <hoodoos> it should be done during handoff?
13:08 <seancribbs> not exactly
13:11 <hoodoos> so what is strange exactly?
13:12 <seancribbs> that it wouldn't appear the correct size until you listed the keys. listing keys doesn't touch disk (at least on 0.13+)
13:12 <hoodoos> no i didn't
13:12 <hoodoos> i just get them all
13:12 <seancribbs> oh well that explains it
13:12 <hoodoos> listing didn't do it, and for some reason mapred didn't either
13:12 <seancribbs> read repair
13:12 <hoodoos> ah
13:12 <hoodoos> i understand why..
13:13 <seancribbs> brb
13:13 <hoodoos> it was reading values from available nodes
13:14 <hoodoos> i guess I will repead the bitcask deletion to see whether it will repair with time or not. Should it really?
13:14 <hoodoos> *repeat
13:14 <seancribbs> it only repairs on reads
13:14 <hoodoos> why can't it repair in the background i wonder.. :)
13:15 <seancribbs> that's called handoff, and it only works if the data got to the other nodes
13:15 <seancribbs> how did you perform the leave exactly
13:20 <hoodoos> wait a sec please
13:22 <hoodoos> 1. riak-admin leave riak@node
13:22 <hoodoos> 2. riak stop
13:22 <hoodoos> 3. wait for riak-admin transfers on other nodes
13:22 <hoodoos> 4. rm -r bitcask ring // on the node that left
13:22 <hoodoos> 5. riak start
13:22 <hoodoos> 6. riak-admin join
13:22 <hoodoos> seancribbs, ^^ like that
13:22 <seancribbs> eliminate step 2
13:22 <seancribbs> it will automatically stop when it's done handing off
13:22 <hoodoos> ah
13:23 <seancribbs> which explains why there was no data when you brought the node back
13:23 <hoodoos> hmm, let me try it :)
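
Put together, the corrected procedure is hoodoos' list with the manual `riak stop` dropped, so the leaving node can finish handing off before it shuts itself down. The Python sketch below only prints that sequence as a checklist and runs nothing. The command strings are copied from the conversation as typed (including the bare `riak-admin join`, whose target node is not given in the chat); the helper name and the notes are illustrative assumptions.

```python
def leave_and_rejoin_steps(node="riak@node"):
    """Return the corrected sequence as (command, note) pairs.

    Nothing is executed; the commands are quoted from the chat as typed.
    """
    return [
        (f"riak-admin leave {node}", "do NOT run 'riak stop' afterwards"),
        ("riak-admin transfers", "on the other nodes; wait until handoff is done"),
        ("rm -r bitcask ring", "on the node that left, after it has stopped by itself"),
        ("riak start", "on the node that left"),
        ("riak-admin join", "target node not given in the chat"),
    ]


if __name__ == "__main__":
    for number, (command, note) in enumerate(leave_and_rejoin_steps(), start=1):
        print(f"{number}. {command}  ({note})")
```

The only change from the original six steps is the missing stop: stopping the node by hand interrupts handoff, which is why the node came back empty.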