@PharkMillups
Created November 24, 2010 15:53
12:02 <hoodoos> hello guys!
12:02 <seancribbs> howdy
12:04 <hoodoos> i was experimenting with leaving/joining today. I do it this way: make 1 of 3
nodes leave the cluster, delete bitcask on it, then join it again. After handoffs are done its bitcask
size is very small (like 1 megabyte, compared to 100 and 97 megabytes on the two other nodes). Why
is that so? Does Riak think the old node returned and still has the data, and will append data
during read repairs?
12:05 <seancribbs> hoodoos: did you wait for the node to exit?
12:05 <hoodoos> yep
12:05 <seancribbs> ok
12:05 <hoodoos> waited till handoffs are done on 2 remaining nodes after leave
12:06 <seancribbs> are they handing off data to the node that came back?
12:06 <hoodoos> yep
12:06 <hoodoos> with strange partitions numbers :)
12:07 <seancribbs> well, they are always strange
12:07 <seancribbs> or rather… large
12:07 cyberdelia joined
12:07 <hoodoos> yep
12:07 <hoodoos> well, on leave I can understand the numbers, they're reasonable, but when
I join they're strange
12:08 <hoodoos> let me tell you exact numbers, I have a 512 partition cluster
12:08 <seancribbs> for 3 nodes?
12:09 <hoodoos> well, it will eventually grow bigger I hope :)
12:09 <seancribbs> ok
12:09 <seancribbs> how big
12:09 <hoodoos> is it big overhead?
12:09 <hoodoos> well, to 20-30 nodes in a year
12:09 <seancribbs> you should aim for between 10 and 50 partitions per node
12:09 <seancribbs> 256 will nicely support up to 30
12:10 <seancribbs> let me guess, AWS?
12:10 <seancribbs> i.e. EC2?
12:10 <hoodoos> no
12:10 <hoodoos> physical machines
12:10 <hoodoos> just a dedicated hosting
12:10 <seancribbs> yay :D
12:10 <hoodoos> ehm? why yay? :)
12:11 <seancribbs> you'll get more bang-for-buck from physical machines
12:11 <hoodoos> we tried linode for riak
12:11 <hoodoos> but it can't guarantee disk IO performance
12:11 <hoodoos> so some nodes were doing really badly
12:12 <seancribbs> right, most virtualized platforms will have that problem
12:12 <hoodoos> so we decided to get physical servers with single SSD drives
12:12 <hoodoos> in case of riak it's reasonable i guess, no need for RAID unless
you want to make it even faster :)
12:13 <seancribbs> sounds like a good plan
12:13 <hoodoos> so, admin is missing, I have numbers for joining only:
12:13 <hoodoos> the node i join is doing 170 handoffs to one of the other nodes and
waiting for 342 partitions
12:14 <hoodoos> the two others exchange 170 partitions and are sending 342 partitions to the newly joined node
12:14 <hoodoos> very strange :)
12:14 <seancribbs> those two add up to 512
12:15 <hoodoos> hmm
12:15 <seancribbs> and 2x170 == 340
12:15 <hoodoos> how many partitions should each node have?
12:15 <hoodoos> 170?
12:16 <hoodoos> ~170
12:16 <seancribbs> about 170, yes
12:16 <seancribbs> 170 or 171
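
The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch of plain even division, not Riak's actual claim algorithm; the function name `partitions_per_node` is made up for illustration.

```python
def partitions_per_node(ring_size, node_count):
    """Split ring_size partitions as evenly as possible across node_count nodes.

    Plain integer division for illustration only; Riak's own claim
    logic may place partitions differently.
    """
    base, extra = divmod(ring_size, node_count)
    # `extra` nodes carry one partition more than the rest.
    return [base + 1] * extra + [base] * (node_count - extra)


shares = partitions_per_node(512, 3)
print(shares)       # [171, 171, 170]
print(sum(shares))  # 512
print(170 + 342)    # 512, the two handoff counts reported for the joining node
```

With 512 partitions and 3 nodes, each node ends up with 170 or 171, and 342 is exactly two shares of 171.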
12:16 <hoodoos> and what is the newly created node trying to give away? and why to only one node :/
12:17 banjiewen joined
12:17 <seancribbs> not sure
12:17 <seancribbs> sounds like you got your cluster in an odd state
12:17 <hoodoos> hm, any way to examine it more closely?
12:18 <hoodoos> cluster is working, new node is growing if I add new data
12:19 <hoodoos> anyways, in the normal situation the newly joined node's bitcask will grow to a normal
size compared to the other nodes, right?
12:20 <seancribbs> well, there might be old versions still sitting around on the other nodes.
but eventually, they should hand off enough data and then compact their own data files,
approaching equal balance
12:20 <seancribbs> but I don't know that it will be as immediate as you expect
12:20 <hoodoos> well, I'm not sure too :)
12:20 siculars joined
12:21 <hoodoos> anyways it's not a big pain to start read repairs manually
12:21 <seancribbs> in normal ops you shouldn't have to force read repair
12:21 <hoodoos> seancribbs, can I ask you one more question? :) I know you're kinda tired of mentor's job :)
12:21 <seancribbs> ha
12:21 <seancribbs> go ahead
12:22 <hoodoos> i tried to rename one of the nodes today without leaving the cluster and it
resulted in a 4 node cluster with 1 dead node i guess (i wasn't doing it myself so can't tell for sure),
i guess that is expected behaviour? to rename a node I need to leave and join as a new node, right?
12:23 <seancribbs> reip is kind of weird. but you generally have to run the 'riak-admin reip' command on all nodes
12:24 <seancribbs> if that's how you were renaming
12:24 <hoodoos> i didn't change ip.. i changed riak@10.0.0.10 to riak@riak00.somedomain
12:25 <hoodoos> i guess reip is when you change Ip of server running node without name change
12:25 <seancribbs> reip is just the name of the command
12:25 <hoodoos> ah..
12:25 <hoodoos> i see node names there
12:30 <hoodoos> well, thanks, I guess I need to read more on reip command..
12:37 <hoodoos> how to make /riak/bucketname list keys?
12:37 <seancribbs> add ?keys=true
12:37 <seancribbs> or ?keys=stream
12:37 <hoodoos> what will stream do?
12:38 fedesilva joined
12:38 <seancribbs> sends it back in chunked-encoding
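
A minimal sketch of the two key-listing requests mentioned above. Only the `/riak/bucketname` path and the `keys=true` / `keys=stream` parameters come from the conversation. The base URL (Riak's default HTTP port 8098 on localhost), the helper names, and the assumption that the `keys=true` response is a JSON object with a `keys` array are illustrative and should be checked against your own node.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the node's HTTP interface listens on the default port locally.
DEFAULT_BASE = "http://127.0.0.1:8098"


def bucket_keys_url(bucket, mode="true", base=DEFAULT_BASE):
    """Build the key-listing URL; mode is 'true' (one body) or 'stream' (chunked)."""
    if mode not in ("true", "stream"):
        raise ValueError("mode must be 'true' or 'stream'")
    return f"{base}/riak/{quote(bucket, safe='')}?{urlencode({'keys': mode})}"


def list_keys(bucket, base=DEFAULT_BASE):
    """Fetch all keys in a single response (assumes a JSON body with a 'keys' array)."""
    with urlopen(bucket_keys_url(bucket, "true", base)) as resp:
        return json.load(resp).get("keys", [])


print(bucket_keys_url("bucketname"))
print(bucket_keys_url("bucketname", mode="stream"))
```

`list_keys` is defined but not called here, since it needs a running node. With `keys=stream` the body arrives in chunked encoding, so a client or proxy in front of Riak has to handle chunked responses.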
12:38 <hoodoos> hmm.. my nginx won't eat that much headers :)
12:39 <hoodoos> well, thanks
12:53 <johnae> hoodoos: maybe your nginx won't return the content properly since nginx
doesn't seem to proxy chunked encoding very well
12:54 <hoodoos> nono, just headers too big for its current settings
12:54 <hoodoos> johnae, i use the chunking module, it handles chunked encoding nicely
12:55 <johnae> but, chunking module is just for input isnt it?
12:55 <hoodoos> ah yes, but output was handled nicely, never noticed any problems
12:55 <hoodoos> is there any known?
12:56 <johnae> well, I may still be wrong about this, but I've not been able to proxy luwak
using nginx and afaik it's because luwak returns chunked encoding only
12:57 <johnae> so I thought that maybe ?keys=stream would not work either then since that would also
result in chunked encoding
13:06 <hoodoos> btw, after reading most of the keys in the bucket, the new node's bitcask
grew to its normal size :)
13:06 <hoodoos> seancribbs, ^^
13:07 <seancribbs> hoodoos: that's strange
13:07 <hoodoos> it should be done during handoff?
13:08 <seancribbs> not exactly
13:11 <hoodoos> so what is strange exactly?
13:12 <seancribbs> that it wouldn't appear the correct size until you listed the keys. listing
keys doesn't touch disk (at least on 0.13+)
13:12 <hoodoos> no i didn't
13:12 <hoodoos> i just get them all
13:12 <seancribbs> oh well that explains it
13:12 <hoodoos> listing didn't do it, and for some reason mapred didn't either
13:12 <seancribbs> read repair
13:12 <hoodoos> ah
13:12 <hoodoos> i understand why..
13:13 <seancribbs> brb
13:13 <hoodoos> it was reading values from available nodes
13:14 <hoodoos> i guess I will repead bitcask deletion to see whether it will
repair with time or not. Should it really?
13:14 <hoodoos> *repeat
13:14 <seancribbs> it only repairs on reads
13:14 <hoodoos> why can't it repair in background i wonder.. :)
13:15 <seancribbs> that's called handoff, and it only works if the
data got to the other nodes
13:15 <seancribbs> how did you perform the leave exactly
13:20 <hoodoos> wait a sec please
13:22 <hoodoos> 1. riak-admin leave riak@node
13:22 <hoodoos> 2. riak stop
13:22 <hoodoos> 3. wait for riak-admin transfers on other nodes
13:22 <hoodoos> 4. rm -r bitcask ring // on left node
13:22 <hoodoos> 5. riak start
13:22 <hoodoos> 6. riak-admin join
13:22 <hoodoos> seancribbs, ^^ like that
13:22 <seancribbs> eliminate step 2
13:22 <seancribbs> it will automatically stop when it's done handing off
13:22 <hoodoos> ah
13:23 <seancribbs> which explains why there was no data when you
brought the node back
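
The fix above amounts to waiting for the leaving node to exit on its own before deleting its data. A rough sketch of that wait follows. The helper names are hypothetical, and the default check assumes `riak ping` exits with status 0 only while the local node is running, which you should verify on your version.

```python
import subprocess
import time


def riak_is_running():
    # Assumption: `riak ping` returns exit status 0 only while the node is up.
    return subprocess.run(["riak", "ping"], capture_output=True).returncode == 0


def wait_until_stopped(is_running=riak_is_running, poll_seconds=5.0, timeout=3600.0):
    """Poll until the node has exited by itself; return False if the timeout passes.

    Intended to run after `riak-admin leave`, in place of a manual `riak stop`,
    so handoff can finish before the bitcask and ring directories are removed.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

Only when `wait_until_stopped()` returns True would steps 4 to 6 (remove the data, start, join) go ahead.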
13:23 <hoodoos> hmm, let me try it :)