@thedebugger
Last active August 29, 2015 13:57
nodetool status shows 2 nodes are down
Problem: nodetool status, when run from node 6 only, reports 2 nodes as down (DN), but both nodes appear to be running.
* Environment
* Running Cassandra 2.0.1
* 6 nodes in the cluster
* nodetool status - from node 6 (10.82.23.119)
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.84.2.135 13.71 GB 30.7% f6eb0cd6-f82d-41c9-91be-3f4e9dc4a59d -9223372036854775808 rack1
DN 10.82.23.116 3.43 GB 14.1% b114fe60-1bb0-4f7b-aaf3-b941f261026e -6614363251442489157 rack1
UN 10.84.13.121 13.86 GB 19.2% 5e4c7c51-447b-4c4a-a14b-b495814fb45e -3074457345618258603 rack1
UN 10.82.22.116 2.86 GB 29.2% 1ee1ec8f-6dc3-4d2c-8a3b-befe082933c1 2317766532957582345 rack1
DN 10.84.14.117 13.57 GB 4.1% 9bf76178-84d7-4c0f-8ad5-dedfd305a269 3074457345618258602 rack1
UN 10.82.23.119 8.03 GB 2.6% a952d445-5b7f-46ee-8238-4451dfaf0e50 3558217197862924070 rack1
* nodetool info - from node 6
Token : 3558217197862924070
ID : a952d445-5b7f-46ee-8238-4451dfaf0e50
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 8.03 GB
Generation No : 1395260457
Uptime (seconds) : 790697
Heap Memory (MB) : 3134.13 / 7987.25
Data Center : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : size 104857540 (bytes), capacity 104857600 (bytes), 54295052 hits, 61109135 requests, 0.000 recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
* nodetool status - from node 2 (10.84.14.117)
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.84.2.135 13.71 GB 30.7% f6eb0cd6-f82d-41c9-91be-3f4e9dc4a59d -9223372036854775808 rack1
UN 10.82.23.116 3.43 GB 14.1% b114fe60-1bb0-4f7b-aaf3-b941f261026e -6614363251442489157 rack1
UN 10.84.13.121 13.86 GB 19.2% 5e4c7c51-447b-4c4a-a14b-b495814fb45e -3074457345618258603 rack1
UN 10.82.22.116 2.86 GB 29.2% 1ee1ec8f-6dc3-4d2c-8a3b-befe082933c1 2317766532957582345 rack1
UN 10.84.14.117 13.57 GB 4.1% 9bf76178-84d7-4c0f-8ad5-dedfd305a269 3074457345618258602 rack1
UN 10.82.23.119 8.03 GB 2.6% a952d445-5b7f-46ee-8238-4451dfaf0e50 3558217197862924070 rack1
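The two views can be compared mechanically instead of by eye. The following is a minimal Python sketch, assuming the ring lines keep the layout shown above (status/state flags first, then the address). `parse_status` and `disagreements` are hypothetical helper names, and the sample rows are trimmed to their first columns.

```python
import re

# Matches ring lines such as "DN 10.82.23.116 3.43 GB 14.1% ...":
# group 1 = Up/Down flag, group 2 = state flag, group 3 = IPv4 address.
ROW = re.compile(r"^([UD])([NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def parse_status(text):
    """Return {address: 'U' or 'D'} for every ring line in the output."""
    view = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            view[match.group(3)] = match.group(1)
    return view


def disagreements(view_a, view_b):
    """Return sorted addresses that both views list but flag differently."""
    shared = view_a.keys() & view_b.keys()
    return sorted(addr for addr in shared if view_a[addr] != view_b[addr])


# Rows copied from the outputs above, trimmed to status, address, load, owns.
NODE6_VIEW = """\
-- Address Load Owns Host ID Token Rack
UN 10.84.2.135 13.71 GB 30.7%
DN 10.82.23.116 3.43 GB 14.1%
UN 10.84.13.121 13.86 GB 19.2%
UN 10.82.22.116 2.86 GB 29.2%
DN 10.84.14.117 13.57 GB 4.1%
UN 10.82.23.119 8.03 GB 2.6%
"""

NODE2_VIEW = """\
-- Address Load Owns Host ID Token Rack
UN 10.84.2.135 13.71 GB 30.7%
UN 10.82.23.116 3.43 GB 14.1%
UN 10.84.13.121 13.86 GB 19.2%
UN 10.82.22.116 2.86 GB 29.2%
UN 10.84.14.117 13.57 GB 4.1%
UN 10.82.23.119 8.03 GB 2.6%
"""

if __name__ == "__main__":
    for addr in disagreements(parse_status(NODE6_VIEW), parse_status(NODE2_VIEW)):
        print("views disagree on", addr)
```

Running the same comparison against the output of every node would show whether node 6 is the only one with a divergent view.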
* Verified that all the ports are open
* Ran the following queries from node 6 using cqlsh with the consistency level set to ALL
select * from evnts where token(device_guid) > -5614363251442489156 limit 1;
select * from evnts where token(device_guid) < -5614363251442489156 limit 1;
select * from evnts where token(device_guid) > 3074457345618258603 limit 1;
All three queries returned results successfully. With the consistency level set to ALL, these should fail if the nodes were really down... right?
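Whether a query at ALL has to contact a given node depends on which token ranges that node replicates, and the replication factor of the keyspace is not stated here. As a rough aid, the Python sketch below looks up only the primary owner of a token, assuming the usual ring rule that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive). `primary_owner` is a hypothetical helper and it ignores replication entirely.

```python
import bisect

# (token, address) pairs copied from the nodetool status output above.
RING = sorted([
    (-9223372036854775808, "10.84.2.135"),
    (-6614363251442489157, "10.82.23.116"),
    (-3074457345618258603, "10.84.13.121"),
    (2317766532957582345, "10.82.22.116"),
    (3074457345618258602, "10.84.14.117"),
    (3558217197862924070, "10.82.23.119"),
])
TOKENS = [token for token, _ in RING]


def primary_owner(token):
    """Return the node whose range (previous_token, own_token] contains token.

    Assumption: primary ownership only; replicas are not considered.
    """
    index = bisect.bisect_left(TOKENS, token)  # first node token >= token
    return RING[index % len(RING)][1]          # wrap past the highest token


if __name__ == "__main__":
    # Boundary tokens used in the cqlsh queries above.
    for boundary in (-5614363251442489156, 3074457345618258603):
        print(boundary, "->", primary_owner(boundary))
```

Under that assumption, both boundary tokens fall in ranges whose primary owner is a node reported as UN (10.84.13.121 and 10.82.23.119). Note also that the boundary -5614363251442489156 is not adjacent to the node token -6614363251442489157. Whether the two DN nodes hold replicas of the scanned ranges depends on the replication factor.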
* Enabled TRACE logging for the gossiper and the failure detector on node 6
log4j.logger.org.apache.cassandra.gms.Gossiper=TRACE
log4j.logger.org.apache.cassandra.gms.FailureDetector=TRACE
Here is the log output:
https://gist.github.com/thedebugger/9845383
Everything looks OK to me. I was expecting a problem similar to the one described at http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html, but found nothing like it.
* In the logs, I saw this warning only once:
WARN [OptionalTasks:1] 2014-03-28 11:40:57,635 HintedHandoffMetrics.java (line 79) /10.84.14.117 has 1 dropped hints, because node is down past configured hint window
* nodetool gossipinfo - from node 6
/10.84.13.121
SEVERITY:0.0
RACK:rack1
RPC_ADDRESS:10.84.13.121
NET_VERSION:7
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
HOST_ID:5e4c7c51-447b-4c4a-a14b-b495814fb45e
RELEASE_VERSION:2.0.1
LOAD:1.4882345955E10
STATUS:NORMAL,-3074457345618258603
DC:datacenter1
/10.84.14.117
SEVERITY:0.0
RACK:rack1
RPC_ADDRESS:10.84.14.117
NET_VERSION:7
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
HOST_ID:9bf76178-84d7-4c0f-8ad5-dedfd305a269
RELEASE_VERSION:2.0.1
LOAD:1.4574199355E10
STATUS:NORMAL,3074457345618258602
DC:datacenter1
/10.82.23.119
SEVERITY:0.0
RACK:rack1
RPC_ADDRESS:10.82.23.119
NET_VERSION:7
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
RELEASE_VERSION:2.0.1
HOST_ID:a952d445-5b7f-46ee-8238-4451dfaf0e50
LOAD:8.624908077E9
DC:datacenter1
STATUS:NORMAL,3558217197862924070
/10.84.2.135
SEVERITY:0.0
RACK:rack1
RPC_ADDRESS:10.84.2.135
NET_VERSION:7
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
HOST_ID:f6eb0cd6-f82d-41c9-91be-3f4e9dc4a59d
RELEASE_VERSION:2.0.1
LOAD:1.4718471412E10
STATUS:NORMAL,-9223372036854775808
DC:datacenter1
/10.82.22.116
SEVERITY:0.0
RACK:rack1
RPC_ADDRESS:10.82.22.116
NET_VERSION:7
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
HOST_ID:1ee1ec8f-6dc3-4d2c-8a3b-befe082933c1
RELEASE_VERSION:2.0.1
LOAD:3.072407862E9
STATUS:NORMAL,2317766532957582345
DC:datacenter1
/10.82.23.116
SEVERITY:0.0
RACK:rack1
RPC_ADDRESS:10.82.23.116
NET_VERSION:7
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
HOST_ID:b114fe60-1bb0-4f7b-aaf3-b941f261026e
RELEASE_VERSION:2.0.1
LOAD:3.685326727E9
STATUS:NORMAL,-6614363251442489157
DC:datacenter1
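The gossip state can also be checked mechanically. The following is a minimal Python sketch, assuming output in the `/address` plus `KEY:value` layout shown above. `parse_gossipinfo` and `summarize` are hypothetical helper names, and the sample holds only the entries for the two nodes reported as DN, trimmed to three fields each.

```python
def parse_gossipinfo(text):
    """Return {endpoint: {field: value}} from `nodetool gossipinfo` output."""
    nodes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # A line such as "/10.84.14.117" starts a new endpoint entry.
            current = nodes.setdefault(line[1:], {})
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key] = value
    return nodes


def summarize(nodes):
    """Return (set of schema versions, sorted endpoints whose STATUS is not NORMAL)."""
    schemas = {fields.get("SCHEMA") for fields in nodes.values()}
    not_normal = sorted(
        endpoint
        for endpoint, fields in nodes.items()
        if not fields.get("STATUS", "").startswith("NORMAL")
    )
    return schemas, not_normal


# Entries for the two nodes that node 6 reports as DN, copied from above.
SAMPLE = """\
/10.84.14.117
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
RELEASE_VERSION:2.0.1
STATUS:NORMAL,3074457345618258602
/10.82.23.116
SCHEMA:8295a71a-1d4f-38f7-a513-9a74c0aa8c5d
RELEASE_VERSION:2.0.1
STATUS:NORMAL,-6614363251442489157
"""

if __name__ == "__main__":
    schemas, not_normal = summarize(parse_gossipinfo(SAMPLE))
    print("schema versions:", schemas)
    print("endpoints not NORMAL:", not_normal)
```

A single schema version and an empty not-NORMAL list would suggest that the gossip application state is consistent and that the DN flags come from somewhere else, such as node 6's failure detector. That is a hypothesis, not a confirmed diagnosis.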
* Why? Could there be a bug in nodetool, or am I missing something?
@adampwells

Did you ever get a resolution to this? I am having the same problem...
