- Upgrade to Cassandra 2.0.9 #1381
- Cron Repairs
- Salt Cassandra boxes
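Cassandra wants every node repaired at least once per gc_grace_seconds (10 days by default, hence the usual advice to repair weekly). That way every replica learns about a delete before its tombstone is garbage-collected; otherwise deleted cells can come back from the dead. Sadly, repairs fail frequently for us, so we currently don't attempt them.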
Running a repair is easy:
nodetool repair
That command blocks until the repair finishes (which it may never do), but you can kill the command at any time. Killing the command does not kill the repair, which keeps running on the cluster.
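Since the repair outlives the client anyway, one option is to start it in the background and keep its output in a file. This is only a sketch: the function name and the default log path are made up for illustration, and it assumes nodetool is on the PATH.

```shell
# Start a repair in the background and capture the client's output.
# $1: where to write the output (hypothetical default path).
start_repair() {
    log_file="${1:-/tmp/nodetool-repair.log}"
    # The client blocks until the repair ends, so run it detached from the prompt.
    nodetool repair > "$log_file" 2>&1 &
    echo "repair client started as PID $!, output in $log_file"
}
```

Calling `start_repair` returns immediately; killing the reported PID later only stops the client, not the repair itself.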
Repairs run validation compactions and then stream the differing data via AntiEntropy Sessions. There are three commands to watch these:
nodetool tpstats
Shows how many AntiEntropy Sessions are running and how many are pending. Note that not all AE sessions for a repair are queued up at the beginning, so it's not possible to tell how much longer a repair will take given the number of pending AE sessions.
nodetool compactionstats
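Shows outstanding compactions. Compactions also run during normal operation, so pending compactions suggest a repair is taking place but don't prove it; repair-driven ones should show up with the compaction type Validation.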
nodetool netstats
Shows current streams. Streaming also happens during bootstrap, decommission and similar operations, so active streams suggest a repair is taking place but don't prove it.
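The three commands above can be bundled into one snapshot so you don't have to run them by hand. This is a rough sketch that assumes nodetool is on the PATH and can reach the local node; the function name is made up, and the grep simply keeps whatever tpstats rows mention AntiEntropy.

```shell
# Print one snapshot of repair-related activity on the local node.
repair_snapshot() {
    echo "== AntiEntropy thread pools (tpstats) =="
    # Keep only the AntiEntropy rows; the columns follow the tpstats header.
    nodetool tpstats | grep -i 'AntiEntropy' || echo "(no AntiEntropy pools listed)"

    echo "== Compactions (compactionstats) =="
    nodetool compactionstats

    echo "== Streams (netstats) =="
    nodetool netstats
}
```

Running it in a loop, for example `while true; do repair_snapshot; sleep 30; done`, gives a crude live view of a repair.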
Grepping /var/log/cassandra/system.log for RepairJobTask (or just Repair) should give log lines indicating the progress of the repair.
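A small helper for that log check might look like the following. It is a sketch: the function name and the default of 20 lines are arbitrary choices, and the log path is the one mentioned above.

```shell
# Show the most recent repair-related lines from the Cassandra system log.
# $1: log path (optional); $2: number of lines to show (optional, default 20).
repair_log_tail() {
    log_file="${1:-/var/log/cassandra/system.log}"
    count="${2:-20}"
    # "Repair" also matches "RepairJobTask"; use the longer pattern for less noise.
    grep 'Repair' "$log_file" | tail -n "$count"
}
```

For example, `repair_log_tail` prints the last 20 matches from the default log, and `repair_log_tail /path/to/other.log 5` prints the last 5 from another file.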
To kill a stuck repair, execute the JMX operation StorageService.forceTerminateAllRepairSessions()
... on each node? Running it on only one node didn't work for me: the stuck repair session just continued!
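If the operation does need to run on every node (unconfirmed), something like the sketch below could loop over the hosts. It rests on several assumptions that are not from these notes: the third-party jmxterm CLI is used as the JMX client, its jar sits at the hypothetical path in JMXTERM_JAR, and JMX listens on port 7199 without authentication. The function name is made up.

```shell
# Hypothetical helper: ask each listed node to terminate its repair sessions.
# Assumes jmxterm as the JMX client and unauthenticated JMX on port 7199.
JMXTERM_JAR="${JMXTERM_JAR:-./jmxterm-uber.jar}"   # placeholder path

terminate_repairs() {
    for host in "$@"; do
        echo "Terminating repair sessions on $host"
        # -l: JMX address to connect to, -n: non-interactive mode
        echo "run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions" |
            java -jar "$JMXTERM_JAR" -l "$host:7199" -n
    done
}
```

Usage would be `terminate_repairs node1 node2 node3` with your own host names. Whether this actually unsticks a repair is exactly the open question above.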
- Doing a rolling restart of our cluster was recommended.
- Doing range repairs instead of full cluster repairs.
- nodetool repair
- Repair Docs
- Range Repair Script
- #cassandra on Freenode