Riak issue with upgrade from old -sname rings to new -name rings.
Dear Team
What are we trying to do:
We are trying to change -sname to -name within our existing cluster without taking the cluster down.
We had a call with Basho and were advised to follow the steps described here:
http://docs.basho.com/riak/1.4.9/ops/running/nodes/renaming/#Multi-Node-Clusters
They didn't work.
Here is how you can reproduce the issue:
Download Riak 1.4.9.
Copy it to 3 locations:
ring1
ring2
ring3
Next, edit ring*/etc/vm.args and change the -name/-sname line. For ring1:
-sname ring1
for ring2:
-sname ring2
and for ring3:
-sname ring3
So the nodes will be called ring1, ring2 and ring3.
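The vm.args edit above can be scripted. This is a minimal sketch, not part of the original setup: set_node_name is a hypothetical helper, and it assumes each vm.args has exactly one line starting with -name or -sname.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Riak): rewrite the node-name line
# in a vm.args file.
# Usage: set_node_name FILE FLAG NAME   where FLAG is "sname" or "name"
set_node_name() {
    file=$1
    flag=$2
    name=$3
    tmp="$file.tmp"
    # Replace an existing "-name ..." or "-sname ..." line with the requested
    # one; every other line (for example -setcookie) is left untouched.
    sed -E "s/^-s?name[[:space:]].*/-$flag $name/" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

For example, `set_node_name ring1/etc/vm.args sname ring1` would switch ring1 to the short name used in this reproduction.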
Now edit app.config on each of them and change the port in
{pb, [ {"127.0.0.1", 8287 } ]}
so that it is unique. I suggest using 8187, 8287 and 8387 on ring1, ring2 and ring3 respectively.
The same goes for
{http, [ {"127.0.0.1", 8298 } ]},
and
{handoff_port, 8299 },
This lets you start 3 Riak nodes on the same box to reproduce the issue.
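To keep the three port sets from colliding, the numbering can be derived from the node index. This is a sketch under an assumption: only the pb ports 8187, 8287 and 8387 and the values 8298 and 8299 appear above; the other http and handoff ports follow the same pattern by guesswork, and ports_for is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "pb http handoff" ports for node N (1, 2 or 3).
# Assumed scheme: 8N87 for pb, 8N98 for http, 8N99 for handoff.
ports_for() {
    n=$1
    base=$((8000 + n * 100))
    echo "$((base + 87)) $((base + 98)) $((base + 99))"
}
```

Running `ports_for 2` prints the three values shown in the app.config lines above; the values for nodes 1 and 3 then go into ring1 and ring3 by hand.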
Now start all of them:
./ring1/bin/riak start
./ring2/bin/riak start
./ring3/bin/riak start
Create a ring
(side note: my local box name is pc4)
./ring2/bin/riak-admin cluster join ring1
./ring3/bin/riak-admin cluster join ring1
./ring3/bin/riak-admin cluster plan
./ring3/bin/riak-admin cluster commit
This should create the ring. You can use
./ring3/bin/riak-admin status
to check that everything is OK.
The actual problem:
We take ring3 out of the cluster and try to change its name to ring3@127.0.0.1 (or a localdomain name), using -name instead of -sname.
First we stop the ring3 node:
./ring3/bin/riak stop
Mark it as down:
./ring2/bin/riak-admin down riak3
Change the name in vm.args:
nano ring3/etc/vm.args
-name riak3@127.0.0.1
so that it uses -name, not -sname.
Next we remove the ring directory:
rm -rf ./ring3/data/ring
Next we start the node:
./ring3/bin/riak start
Next we try to join the cluster:
riak-admin cluster join riak2
and the node is reported as unreachable. You can try other names from the existing cluster, such as riak2@pc4, or try riak1: it still can't reach them.
When you attach to the node and check whether it can see the others, it kind of can.
Example:
(riak3@127.0.0.1)1> net_adm:names().
{ok,[{"riak1",56623},
{"riak2",56642},
{"riak3",56856},
{"c_87421_riak3",57309}]}
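One thing that may help narrow this down, assuming standard Erlang distribution behaviour: epmd only registers the part of the node name before the @, so a node showing up in net_adm:names() does not by itself mean a connection to it will succeed. The Erlang documentation also states that a node with a long name cannot communicate with a node with a short name. The sketch below uses a hypothetical helper, name_mode, to guess the mode from the host part of a node name (a dot in the host suggests a long name). It is only a heuristic for comparing names, not something Riak provides.

```shell
#!/bin/sh
# Hypothetical helper: guess whether a full node name looks like a long
# (-name) or short (-sname) name by checking the host part for a dot.
name_mode() {
    host=${1#*@}          # strip everything up to and including the "@"
    case $host in
        *.*) echo long ;;   # e.g. 127.0.0.1 or host.example.com
        *)   echo short ;;  # e.g. pc4
    esac
}
```

With the names used above, `name_mode riak3@127.0.0.1` and `name_mode riak2@pc4` give different answers, which is worth keeping in mind when reading the join failure.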
Initially I thought it was an epmd issue, so I made a tcpdump capture, which suggests that the nodes just don't want to communicate even though they can see each other.
192.168.120.118 was trying to join the cluster on 192.168.120.117.
Here is the dump:
11:24:38.050408 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [S], seq 3547573965, win 2920, options [mss 1460,sackOK,TS val 3955041498 ecr 0,nop,wscale 1], length 0
11:24:38.050448 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [S.], seq 3095205376, ack 3547573966, win 2896, options [mss 1460,sackOK,TS val 3955040296 ecr 3955041498,nop,wscale 1], length 0
11:24:38.050547 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [.], ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.050652 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [P.], seq 1:8, ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 7
11:24:38.050670 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [.], ack 8, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.050708 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [P.], seq 1:19, ack 8, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 18
11:24:38.050727 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [F.], seq 19, ack 8, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.050785 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [.], ack 19, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.050902 IP 192.168.120.117.56058 > 192.168.120.118.4369: Flags [F.], seq 8, ack 20, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.050918 IP 192.168.120.118.4369 > 192.168.120.117.56058: Flags [.], ack 9, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.051068 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [S], seq 128976478, win 2920, options [mss 1460,sackOK,TS val 3955041498 ecr 0,nop,wscale 1], length 0
11:24:38.051081 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [S.], seq 2270681428, ack 128976479, win 2896, options [mss 1460,sackOK,TS val 3955040296 ecr 3955041498,nop,wscale 1], length 0
11:24:38.051185 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [.], ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 0
11:24:38.051358 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [P.], seq 1:32, ack 1, win 1460, options [nop,nop,TS val 3955041498 ecr 3955040296], length 31
11:24:38.051367 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [.], ack 32, win 1448, options [nop,nop,TS val 3955040296 ecr 3955041498], length 0
11:24:38.052032 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [P.], seq 1:6, ack 32, win 1448, options [nop,nop,TS val 3955040297 ecr 3955041498], length 5
11:24:38.052157 IP 192.168.120.117.61236 > 192.168.120.118.6281: Flags [.], ack 6, win 1460, options [nop,nop,TS val 3955041499 ecr 3955040297], length 0
11:24:38.052239 IP 192.168.120.118.6281 > 192.168.120.117.61236: Flags [P.], seq 6:52, ack 32, win 1448, options [nop,nop,TS val 3955040297 ecr 3955041499], length 46
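To read a capture like this more quickly, the connection attempts can be pulled out by destination port. This is a rough sketch: syn_ports is a hypothetical helper that assumes tcpdump's default text layout as shown above and reads it from standard input. Port 4369 is the default epmd port; the second port in this capture (6281) is presumably the node's distribution listener.

```shell
#!/bin/sh
# Hypothetical helper: print the destination port of every initial SYN
# (Flags [S], not the SYN-ACK [S.]) found in tcpdump text output on stdin.
syn_ports() {
    awk '$6 == "Flags" && $7 == "[S]," {
        port = $5                 # e.g. 192.168.120.118.4369:
        sub(/:$/, "", port)       # drop the trailing colon
        sub(/^.*\./, "", port)    # keep only what follows the last dot
        print port
    }'
}
```

Saving the dump to a file and running `syn_ports < dump.txt` (dump.txt being whatever name you saved it under) lists one port per new connection, in order.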
I also tried changing IPs, domains etc. within a cluster of -name nodes and that works fine, so this part is OK. Migrating from -sname to -name, however, does not seem to be possible. What am I doing wrong?