
@shino
Created July 22, 2015 15:11
Operation steps when claimant node is down in Riak cluster

Create a healthy 3-node cluster. Node dev1 is the claimant at this point.

% dev/dev1/bin/riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      50.0%      --      'dev1@127.0.0.1'
valid      25.0%      --      'dev2@127.0.0.1'
valid      25.0%      --      'dev3@127.0.0.1'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

% dev/dev1/bin/riak-admin ring-status
================================== Claimant ===================================
Claimant:  'dev1@127.0.0.1'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable
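The two checks above (claimant status and ring readiness) can be combined into a small health gate. A minimal sketch; the function name ring_healthy and the embedded sample text are illustrative — in practice you would pipe the live riak-admin ring-status output into the function:

```shell
#!/bin/sh
# Return 0 only when the claimant is up and the ring has converged.
# Reads `riak-admin ring-status` output on stdin.
ring_healthy() {
    out=$(cat)
    echo "$out" | grep -q '^Status: *up' &&
    echo "$out" | grep -q '^Ring Ready: *true'
}

# Sample lines from the healthy cluster output above:
sample="Claimant:  'dev1@127.0.0.1'
Status:     up
Ring Ready: true"

if echo "$sample" | ring_healthy; then
    echo "ring healthy"
else
    echo "ring NOT healthy"
fi
```

Against live output: dev/dev2/bin/riak-admin ring-status | ring_healthy.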

Then kill node dev1 forcefully. Note that member-status on dev2 still reports all nodes as valid:

% DEV1=`ps auw | grep beam | grep riak | grep dev1 | awk '{print $2;}'`; echo $DEV1
64693
% kill -9 $DEV1
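The ps | grep chain above is fragile: it can match the grep process itself or unrelated processes. A sketch of a tighter PID extraction; the sample_ps line is a fabricated stand-in for real ps output (the PID 64693 matches the transcript above):

```shell
#!/bin/sh
# Extract the PID (second column) of the dev1 beam process.
# Live usage would be:  ps auw | awk '/[b]eam.*riak.*dev1/ {print $2}'
# The [b]eam bracket trick keeps the matching process itself out of the results.
sample_ps='user  64693  0.3  1.2  ...  /usr/lib/erlang/bin/beam.smp ... riak ... dev1'

DEV1=$(echo "$sample_ps" | awk '/[b]eam.*riak.*dev1/ {print $2}')
echo "$DEV1"
# kill -9 "$DEV1"   # forceful kill, as in the transcript
```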
% dev/dev2/bin/riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      50.0%      --      'dev1@127.0.0.1'
valid      25.0%      --      'dev2@127.0.0.1'
valid      25.0%      --      'dev3@127.0.0.1'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Check the ring status; the claimant is now reported as down:

% dev/dev2/bin/riak-admin ring-status
================================== Claimant ===================================
Claimant:  'dev1@127.0.0.1'
Status:     down
Ring Ready: unknown

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
The following nodes are unreachable: ['dev1@127.0.0.1']

WARNING: The cluster state will not converge until all nodes
are up. Once the above nodes come back online, convergence
will continue. If the outages are long-term or permanent, you
can either mark the nodes as down (riak-admin down NODE) or
forcibly remove the nodes from the cluster (riak-admin
force-remove NODE) to allow the remaining nodes to settle.
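The choice the warning describes (wait for the node, mark it down, or force-remove it) hinges on the unreachable-node list. A sketch that extracts that list from ring-status output; the function name and embedded sample line are illustrative, and a live run would pipe riak-admin ring-status in:

```shell
#!/bin/sh
# Extract unreachable node names from `riak-admin ring-status` output.
unreachable_nodes() {
    # The line looks like:
    #   The following nodes are unreachable: ['dev1@127.0.0.1']
    sed -n "s/^The following nodes are unreachable: \[\(.*\)\]/\1/p" |
        tr -d "'" | tr ',' '\n'
}

sample="The following nodes are unreachable: ['dev1@127.0.0.1']"
echo "$sample" | unreachable_nodes
```

Each extracted name can then be fed to riak-admin down NODE.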

As expected, cluster operations fail because the claimant is unreachable:

% dev/dev4/bin/riak-admin cluster join dev2@127.0.0.1
Success: staged join request for 'dev4@127.0.0.1' to 'dev2@127.0.0.1'

% dev/dev2/bin/riak-admin cluster plan
RPC to 'dev2@127.0.0.1' failed: {'EXIT',
                                 {{nodedown,'dev1@127.0.0.1'},
                                  {gen_server,call,
                                   [{riak_core_claimant,'dev1@127.0.0.1'},
                                    plan,infinity]}}}
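Scripts driving these steps should detect this failure mode. Whether riak-admin also sets a nonzero exit code here may vary by version, so matching the output text is the safer check; a sketch with an illustrative function name and a sample line from the error above:

```shell
#!/bin/sh
# Detect a claimant-unreachable RPC failure in captured riak-admin output.
plan_failed() {
    grep -q "^RPC to .* failed"
}

sample="RPC to 'dev2@127.0.0.1' failed: {'EXIT',"
if echo "$sample" | plan_failed; then
    echo "plan failed: claimant unreachable?"
fi
```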

Mark node dev1 as down from node dev2; the claimant role then moves to dev2.

% ./dev/dev2/bin/riak-admin down dev1@127.0.0.1
Success: "dev1@127.0.0.1" marked as down
% dev/dev2/bin/riak-admin ring-status
================================== Claimant ===================================
Claimant:  'dev2@127.0.0.1'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

Retry the cluster plan; it succeeds this time.

% dev/dev2/bin/riak-admin cluster plan
=============================== Staged Changes ================================
Action         Details(s)
-------------------------------------------------------------------------------
join           'dev4@127.0.0.1'
-------------------------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

###############################################################################
                         After cluster transition 1/1
###############################################################################

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
down       50.0%      --      'dev1@127.0.0.1'
valid      25.0%      --      'dev2@127.0.0.1'
valid      25.0%      --      'dev3@127.0.0.1'
valid       0.0%      --      'dev4@127.0.0.1'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:1

WARNING: Not all replicas will be on distinct nodes

The commit can now be applied. Hooray!

% dev/dev2/bin/riak-admin cluster commit
Cluster changes committed
% dev/dev2/bin/riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
down       50.0%      --      'dev1@127.0.0.1'
valid      25.0%      --      'dev2@127.0.0.1'
valid      25.0%      --      'dev3@127.0.0.1'
valid       0.0%      --      'dev4@127.0.0.1'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:1
% dev/dev2/bin/riak-admin ring-status
================================== Claimant ===================================
Claimant:  'dev2@127.0.0.1'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable
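The end state (claimant moved to dev2, dev1 still marked down) can be verified mechanically from the member-status summary line. A sketch; the summary text is the sample line from the output above:

```shell
#!/bin/sh
# Parse the member-status summary line into a shell-checkable count.
summary='Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:1'
down_count=$(echo "$summary" | sed -n 's/.*Down:\([0-9]*\).*/\1/p')
echo "down nodes: $down_count"
# dev1 stays in the down state until it comes back online or is
# removed with riak-admin force-remove.
```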