@schmichael
Created July 8, 2014 18:32

TODO

  • Upgrade to Cassandra 2.0.9 #1381
  • Cron Repairs
  • Salt Cassandra boxes

Repairs

Cassandra wants every node repaired at least once per gc_grace_seconds (10 days by default, hence the usual advice to repair weekly) so that deletes reach every replica before their tombstones are purged. Otherwise deleted cells can come back from the dead. Sadly repairs seem to fail frequently, so we currently don't attempt them.

Running a repair is easy:

nodetool repair

That command blocks until the repair finishes (which it may never do). You can kill the command at any time, but that only kills the nodetool client: the repair itself keeps running on the cluster.
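Since the repair outlives the nodetool client anyway, one option is to start it detached from the terminal and keep its output in a file. This is only a sketch: the `start_repair` function, the `NODETOOL` override and the `repair.log` default are made-up names for illustration, not anything Cassandra provides.

```shell
# NODETOOL is a hypothetical override so the command can be swapped out.
NODETOOL="${NODETOOL:-nodetool}"

# Start "nodetool repair" detached from the terminal.
# $1: file that receives the client's output (default: repair.log, an assumption).
# Prints the PID of the nodetool client, NOT of the repair itself.
start_repair() {
    repair_log="${1:-repair.log}"
    nohup "$NODETOOL" repair > "$repair_log" 2>&1 &
    echo "$!"
}
```

Usage would look like `start_repair /tmp/repair.log`. Killing the printed PID stops only the client, same as hitting Ctrl-C on a foreground `nodetool repair`.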

Repair Progress

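A repair first runs validation compactions (to build the Merkle trees that replicas compare) and then streams the ranges that differ, all coordinated through AntiEntropy sessions. There are three commands to watch these: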

nodetool tpstats

Shows how many AntiEntropy Sessions are running and how many are pending. Note that not all AE sessions for a repair are queued up at the beginning, so it's not possible to tell how much longer a repair will take given the number of pending AE sessions.

nodetool compactionstats

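Shows outstanding compactions. Validation compactions indicate a repair taking place; other compaction types also run during normal operation, so compaction activity alone doesn't prove a repair is running.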

nodetool netstats

Shows current streams, which probably indicate a repair taking place, though not always: other operations such as bootstrapping a node stream data too.

Grepping /var/log/cassandra/system.log for RepairJobTask or just Repair should give log lines indicating the progress of the repair.
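The checks above can be bundled into one helper to run every so often while a repair is in flight. A rough sketch: the `repair_status` function and the `NODETOOL` / `CASSANDRA_LOG` overrides are hypothetical names; only the three nodetool subcommands and the log path come from the notes above.

```shell
# Hypothetical overrides; defaults match the commands and path used above.
NODETOOL="${NODETOOL:-nodetool}"
CASSANDRA_LOG="${CASSANDRA_LOG:-/var/log/cassandra/system.log}"

# Print a snapshot of everything that hints at repair progress.
# $1: how many recent repair log lines to show (default 10).
repair_status() {
    echo "== AntiEntropy thread pools =="
    "$NODETOOL" tpstats | grep -i 'AntiEntropy'
    echo "== compactions =="
    "$NODETOOL" compactionstats
    echo "== streams =="
    "$NODETOOL" netstats
    echo "== recent repair log lines =="
    grep 'Repair' "$CASSANDRA_LOG" | tail -n "${1:-10}"
}
```

Something like `watch -n 30` around a script that calls `repair_status` would give a crude live view. It still can't say how long the repair has left, for the reason given under tpstats.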

Killing a Repair

Execute the JMX operation StorageService.forceTerminateAllRepairSessions() ... on each node? Running it on only one node didn't work for me: the stuck repair session just continued!
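If it does need to run on every node, a loop over the cluster would look something like the sketch below. Big assumptions here: it uses jmxterm as the command-line JMX client (any JMX client should do), the jar path is a placeholder, JMX is assumed to listen on Cassandra's default port 7199 without authentication, and the function names are made up. I haven't confirmed that calling this on all nodes actually unsticks a repair.

```shell
# Placeholder path to the jmxterm jar (assumption: jmxterm is the JMX client).
JMXTERM_JAR="${JMXTERM_JAR:-jmxterm-uber.jar}"

# The jmxterm command that invokes the operation on the StorageService MBean.
terminate_repairs_cmd() {
    echo "run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions"
}

# Invoke the operation on every host given as an argument.
# Assumes unauthenticated JMX on port 7199 (Cassandra's default).
terminate_repairs_on() {
    for host in "$@"; do
        echo "terminating repair sessions on $host"
        terminate_repairs_cmd | java -jar "$JMXTERM_JAR" -l "$host:7199" -n
    done
}
```

Usage would be `terminate_repairs_on node1 node2 node3`, where the node names stand in for the real cluster hosts.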

Next Steps

  • Do a rolling restart of our cluster (this was recommended).
  • Do range repairs instead of full cluster repairs.
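One way to do range repairs is `nodetool repair -pr`, which repairs only the primary range of the node it runs against, so it has to be run on every node to cover the whole ring. A sketch of walking the cluster one node at a time follows; the `repair_primary_ranges` function, the `NODETOOL` override and the host names in the usage line are hypothetical, and whether this avoids the failures described above is untested.

```shell
# NODETOOL is a hypothetical override so the command can be swapped out.
NODETOOL="${NODETOOL:-nodetool}"

# Repair each host's primary range in turn.
# Stops at the first failure so a broken repair isn't followed by more load.
repair_primary_ranges() {
    for host in "$@"; do
        echo "repairing primary range on $host"
        "$NODETOOL" -h "$host" repair -pr || return 1
    done
}
```

Usage would be `repair_primary_ranges node1 node2 node3`. Running the nodes sequentially keeps only one repair in flight at a time, which seems the cautious choice given how often repairs get stuck.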

Resources
