schmichael/gist:814a993735308f7040ea

## gistfile1.md

      
TODO


Upgrade to Cassandra 2.0.9 #1381
Cron Repairs
Salt Cassandra boxes

Repairs

Cassandra wants every node repaired at least once per gc_grace_seconds (10 days by default, hence weekly) so that deletes reach every replica before their tombstones expire; otherwise deleted cells can come back from the dead. Sadly, repairs seem to fail frequently, so we currently don't attempt them.
Running a repair is easy:
nodetool repair

That command blocks until the repair finishes (which it may never do). You can kill the command at any time, but that only stops the nodetool client; the repair itself keeps running.
Repair Progress

Repairs run validation compactions and then stream any differing data between replicas via AntiEntropy Sessions. Three commands help watch this:
nodetool tpstats

Shows how many AntiEntropy Sessions are running and how many are pending. Note that not all AE sessions for a repair are queued up at the beginning, so it's not possible to tell how much longer a repair will take given the number of pending AE sessions.
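
To avoid eyeballing the whole tpstats table, the AntiEntropySessions row can be pulled out with awk. This is a minimal sketch that assumes the 2.0-era column order (pool name, Active, Pending, Completed, ...); the function name `ae_sessions` and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# Read "nodetool tpstats" output on stdin and print the Active and Pending
# counts for the AntiEntropySessions pool.
# Assumption: columns are Pool Name, Active, Pending, Completed, ...
ae_sessions() {
    awk '$1 == "AntiEntropySessions" { print "active=" $2 " pending=" $3 }'
}

# Against a live node (not run here):
#   nodetool tpstats | ae_sessions
# Demonstration with made-up sample text in the assumed format:
printf '%s\n' \
    'Pool Name                    Active   Pending      Completed' \
    'ReadStage                         0         0         123456' \
    'AntiEntropySessions               1         2              5' \
  | ae_sessions
# prints: active=1 pending=2
```

As noted above, the pending count only reflects sessions queued so far, so it is a liveness signal rather than an estimate of remaining time.
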
nodetool compactionstats

Shows outstanding compactions, which probably indicate a repair taking place. This is not conclusive, since ordinary compactions show up here too.
nodetool netstats

Shows current streams, which probably indicate a repair taking place. This is not conclusive either, since other operations (such as bootstrapping a node) also stream data.
Grepping /var/log/cassandra/system.log for RepairJobTask or just Repair should give log lines indicating the progress of the repair.
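
A small helper keeps that grep handy. This is only a sketch: the function name `repair_log_tail` and the default of 20 lines are arbitrary choices, and the default path is the one mentioned above.

```shell
#!/bin/sh
# Print the most recent repair-related lines from a Cassandra log.
# $1: log file (defaults to /var/log/cassandra/system.log)
# $2: how many matching lines to show (defaults to 20)
repair_log_tail() {
    log=${1:-/var/log/cassandra/system.log}
    count=${2:-20}
    # "Repair" also matches RepairJobTask, RepairSession, etc.
    grep 'Repair' "$log" | tail -n "$count"
}

# Usage on a Cassandra box (not run here):
#   repair_log_tail
#   repair_log_tail /var/log/cassandra/system.log 50
```
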
Killing a Repair

Execute StorageService.forceTerminateAllRepairSessions() ... on each node? Running this on one node didn't work for me and just caused the stuck repair session to continue!
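
If it does turn out that the operation has to run on every node, something like the following could generate the calls. This is a hedged sketch, not a tested procedure: it assumes jmxterm (a third-party JMX command-line client) is available, that `JMXTERM_JAR` points at its jar (a hypothetical path), that JMX listens on Cassandra's default port 7199, and that `cass1`/`cass2` stand in for real hostnames. It only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) one jmxterm invocation per node that calls
# forceTerminateAllRepairSessions on the StorageService MBean.
# Assumptions: jmxterm is installed, JMXTERM_JAR is a hypothetical path,
# and JMX is reachable on the default port 7199.
JMXTERM_JAR=${JMXTERM_JAR:-jmxterm-uber.jar}
BEAN='org.apache.cassandra.db:type=StorageService'

terminate_repairs_cmds() {
    for host in "$@"; do
        echo "echo 'run -b $BEAN forceTerminateAllRepairSessions' | java -jar $JMXTERM_JAR -l $host:7199 -n"
    done
}

# cass1 and cass2 are placeholder hostnames.
terminate_repairs_cmds cass1 cass2
```

Piping the output through `sh` would execute the calls; given that a single-node attempt did not stop the stuck session, whether this actually helps is still an open question.
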
Next Steps


Doing a rolling restart of our cluster was recommended.
Doing range repairs instead of full cluster repairs.
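
To illustrate the range repair idea: nodetool repair accepts -st and -et (start and end token), so a node's token range can be split into smaller pieces that are repaired one at a time. The sketch below only prints the commands. The function name, the keyspace `my_keyspace`, and the token values are hypothetical, and it assumes start < end with no wrap-around and a span that fits in a signed 64-bit integer.

```shell
#!/bin/sh
# Print one "nodetool repair" command per subrange of (start, end].
# Assumptions: start < end, no wrap-around range, and end - start fits in
# a signed 64-bit integer. The keyspace name is a placeholder.
subrange_repairs() {
    start=$1; end=$2; steps=$3; keyspace=$4
    width=$(( (end - start) / steps ))
    i=0
    lo=$start
    while [ "$i" -lt "$steps" ]; do
        i=$(( i + 1 ))
        if [ "$i" -eq "$steps" ]; then
            hi=$end    # last subrange absorbs the rounding remainder
        else
            hi=$(( lo + width ))
        fi
        echo "nodetool repair $keyspace -st $lo -et $hi"
        lo=$hi
    done
}

# Split the (made-up) token range 0..1000 into 4 pieces.
subrange_repairs 0 1000 4 my_keyspace
```

Smaller subranges mean each repair session has less to validate and stream, so a failure costs less to retry. The Range Repair Script listed under Resources is the more complete take on this approach.
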

Resources


nodetool repair
Repair Docs
Range Repair Script
#cassandra on Freenode