Created
May 17, 2016 12:51
Riak issue with upgrade from old -sname rings to new -name rings.
Dear Team,
What we are trying to do:
We are trying to change -sname to -name within our existing cluster without taking the cluster down.
We had a call with Basho and were advised to follow the steps described here:
http://docs.basho.com/riak/1.4.9/ops/running/nodes/renaming/#Multi-Node-Clusters
They did not work.
Here is how you can reproduce the issue:
Download Riak 1.4.9.
Copy it to 3 locations:
ring1
ring2
ring3
Next, edit ring*/etc/vm.args and change the -name/-sname line. For ring1:
-sname ring1
for ring2:
-sname ring2
and for ring3:
-sname ring3
So the nodes will be called ring1, ring2 and ring3.
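The vm.args edit above could also be scripted. This is only a sketch: the helper name set_sname is made up for illustration, and it assumes each vm.args contains exactly one line starting with "-name " or "-sname ".

```shell
#!/bin/sh
# Hypothetical helper (not part of Riak): rewrite the node-name line of one vm.args.
# Assumption: the file has a single line starting with "-name " or "-sname ".
set_sname() {
    vm_args="$1"   # path to vm.args, e.g. ring1/etc/vm.args
    node="$2"      # short node name, e.g. ring1
    # Write to a temporary file first so the edit works with both GNU and BSD sed.
    sed -E "s/^-s?name[[:space:]].*/-sname ${node}/" "$vm_args" > "${vm_args}.tmp" &&
        mv "${vm_args}.tmp" "$vm_args"
}

# Apply to the three copies if they exist in the current directory.
for n in 1 2 3; do
    if [ -f "ring${n}/etc/vm.args" ]; then
        set_sname "ring${n}/etc/vm.args" "ring${n}"
    fi
done
```

All other lines in vm.args (cookie, VM flags) are left untouched.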
Now edit app.config on each of them and change the port in
{pb, [ {"127.0.0.1", 8287 } ]}
so that it is unique. I suggest using 8187, 8287 and 8387 on ring1, ring2 and ring3 respectively.
The same goes for
{http, [ {"127.0.0.1", 8298 } ]},
and
{handoff_port, 8299 },
This lets you start 3 Riak nodes on the same box to reproduce the issue.
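For reference, one possible port layout for the three copies is sketched below. The 8N87 / 8N98 / 8N99 pattern is an assumption inferred from the values quoted above (8187, 8287, 8387 for pb, and 8298 / 8299 for http / handoff); the helper name ports_for is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the pb, http and handoff ports for node number N.
# Assumption: ports follow the 8N87 / 8N98 / 8N99 pattern inferred from the text.
ports_for() {
    n="$1"
    echo "pb=8${n}87 http=8${n}98 handoff=8${n}99"
}

# List the ports to put into each copy's app.config.
for n in 1 2 3; do
    echo "ring${n}: $(ports_for "$n")"
done
```

Any other scheme works as long as no two nodes on the box share a port.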
Now start all of them:
./ring1/bin/riak start
./ring2/bin/riak start
./ring3/bin/riak start
Then create the cluster
(side note: my local box name is pc4):
./ring2/bin/riak-admin cluster join ring1
./ring3/bin/riak-admin cluster join ring1
./ring3/bin/riak-admin cluster plan
./ring3/bin/riak-admin cluster commit
This should create the ring. You can use
./ring3/bin/riak-admin status
to check that everything is OK.
Actual problem presentation:
We will take ring3 out and try to change its name to ring3@127.0.0.1 (or localdomain), using -name instead of -sname.
First we stop the ring3 node:
./ring3/bin/riak stop
Mark it as down:
./ring2/bin/riak-admin down riak3
Change the name in vm.args:
nano ring3/etc/vm.args
-name riak3@127.0.0.1
so it is a -name, not a -sname.
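Since this step leaves the copies in different naming modes, it may help to confirm which mode each vm.args currently selects before restarting. A minimal sketch; the helper name name_mode is made up for illustration, and it assumes the name flag sits at the start of its own line.

```shell
#!/bin/sh
# Hypothetical helper: report whether a vm.args file selects short or long node names.
# Assumption: the flag appears at the start of a line, followed by whitespace.
name_mode() {
    if grep -q '^-sname[[:space:]]' "$1"; then
        echo short
    elif grep -q '^-name[[:space:]]' "$1"; then
        echo long
    else
        echo unknown
    fi
}

# Report the mode of each copy that exists in the current directory.
for n in 1 2 3; do
    f="ring${n}/etc/vm.args"
    if [ -f "$f" ]; then
        echo "ring${n}: $(name_mode "$f")"
    fi
done
```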
Next we remove the ring directory:
rm -rf ./ring3/data/ring
Next we start the node:
./ring3/bin/riak start
Next we try to join the cluster:
riak-admin cluster join riak2
and the node is reported as unreachable. You can use the names of the existing cluster nodes, like riak2@pc4, or try with riak1; it can't reach them.
When you attach to the node and check whether it can see the others, it kind of can.
Example:
(riak3@127.0.0.1)1> net_adm:names().
{ok,[{"riak1",56623},
{"riak2",56642},
{"riak3",56856},
{"c_87421_riak3",57309}]}
Initially I thought it was an epmd issue, so I made a tcpdump capture. It suggests that the nodes just don't want to communicate even though they can see each other.
192.168.120.118 was trying to join the cluster on 192.168.120.117.
Here is the dump from it:
11:24:38.050408 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [S], seq 3547573965, win 2920, options [mss 1460,sackOK,TS val 3955041498 ecr 0,nop,wscale 1], length 0
11:24:38.050448 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [S.], seq 3095205376, ack 3547573966, win 2896, options [mss 1460,sackOK,TS val 3955040296 ecr 3955041498,nop,wscale 1], length 0
11:24:38.050547 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [.], ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.050652 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [P.], seq 1:8, ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 7
11:24:38.050670 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [.], ack 8, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.050708 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [P.], seq 1:19, ack 8, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 18
11:24:38.050727 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [F.], seq 19, ack 8, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.050785 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [.], ack 19, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.050902 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [F.], seq 8, ack 20, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.050918 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [.], ack 9, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.051068 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [S], seq 128976478, win 2920, options [mss 1460,sackOK,TS val 3955041498 ecr 0,nop,wscale 1], length 0
11:24:38.051081 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [S.], seq 2270681428, ack 128976479, win 2896, options [mss 1460,sackOK,TS val 3955040296 ecr 3955041498,nop,wscale 1], length 0
11:24:38.051185 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [.], ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.051358 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [P.], seq 1:32, ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 31
11:24:38.051367 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [.], ack 32, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.052032 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [P.], seq 1:6, ack 32, win 1448, options [nop,nop,TS val 3955040297 ecr 3955041498], length 5
11:24:38.052157 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [.], ack 6, win 1460, options [nop,nop,TS val 3955041499 ecr 3955040297], length 0
11:24:38.052239 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [P.], seq 6:52, ack 32, win 1448, options [nop,nop,TS val 3955040297 ecr 3955041499], length 46
I also tried changing IPs, domains etc. within a cluster of -name nodes and it works fine; that part is OK, but migration from -sname to -name does not seem to be possible. What am I doing wrong?