@mihasya
Created August 12, 2015 20:41

These are our Cassandra upgrade checklists, almost exactly as performed. They include notes about unexpected things that occurred on the first node.

  • stop Chef from running via cron
  • knife node run_list set ***********.opsmatic.com steph-role
  • nodetool upgradesstables - this appears to be required before AND after upgrade?
  • nodetool drain (gracefully stop serving traffic)
  • sudo service cassandra stop
  • remove prod-cassandra security group and verify the node cannot talk to the rest of the machines on the service ports
    • aws ec2 modify-instance-attribute --instance-id i-******** --groups sg-********
    • on host: telnet cass02.usw2.opsmatic.com 7000
  • sudo chef-client --once - actually apply the new Chef cookbooks.
    • Note: small snag: first chef run fails because somehow cassandra ends up getting started up with the old cassandra.yaml file; simply sudo rm /etc/cassandra/cassandra.yaml and run chef again
    • Note: small snag: cluster_ips used to populate seed nodes returned no results because no hosts were yet in steph-role. Using the _cluster_ips attribute override in the environment to get around this. That's probably for the best for the early part of the process, since I can manually set that to just have all cassandra-role nodes, which will allow the 2.x node to gossip with the original cluster.
    • Note: big snag: we were using a version of the cassandra cookbook that didn't support multiple data directories. Had to upgrade to version 3.4.0 of the cookbook (from 2.9.0) - the cookbook had since been renamed etc. Was a bit of open-heart chef surgery, but we're back at it.
  • visual spot check of configuration
    • initial_token should not be set in /etc/cassandra/cassandra.yaml
    • data directories should be pointing to /data1/keyspaces and /data2/keyspaces
    • Heap size should be 6GB
  • start the service back up (probably already done by chef) - sudo service cassandra start
  • service should start but complain that it cannot talk to the rest of the cluster. It SHOULD show itself in nodetool status as holding a bunch of data, though it may never get to the point where nodetool works, because it cannot gossip
  • restore prod-cassandra security group - aws ec2 modify-instance-attribute --instance-id i-******** --groups sg-******** sg-********
  • start service back up if it had previously died due to being unable to gossip
  • nodetool upgradesstables
    • Note: default compaction throughput got reset to 16 MB/s in steph-role; setting it manually to 1024 so that upgradesstables finishes more quickly. Will set the chef default to whatever cassandra-role was running with once this operation is finished
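
The "visual spot check of configuration" step above can be partly scripted. What follows is a minimal sketch, not something that was run during the original upgrade. The function name check_cassandra_yaml is hypothetical. It assumes the data directories appear as plain, uncommented list entries in cassandra.yaml (as they do under data_file_directories). It does not check heap size, since the checklist does not say where that is configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: spot-check a cassandra.yaml against the checklist.
# Usage: check_cassandra_yaml [path-to-cassandra.yaml]
check_cassandra_yaml() {
  local yaml="${1:-/etc/cassandra/cassandra.yaml}"
  local rc=0

  if [ ! -r "$yaml" ]; then
    echo "FAIL: cannot read $yaml"
    return 2
  fi

  # initial_token should not be set; commented-out or empty entries are fine.
  if grep -Eq '^[[:space:]]*initial_token:[[:space:]]*[^[:space:]#]' "$yaml"; then
    echo "FAIL: initial_token is set"
    rc=1
  else
    echo "ok: initial_token is not set"
  fi

  # Both data directories should appear as uncommented list entries.
  # Assumption: this only looks for the list items themselves and does not
  # verify that they sit under the data_file_directories key.
  local dir
  for dir in /data1/keyspaces /data2/keyspaces; do
    if grep -Eq "^[[:space:]]*-[[:space:]]*[\"']?${dir}/?[\"']?[[:space:]]*(#.*)?\$" "$yaml"; then
      echo "ok: data directory $dir is listed"
    else
      echo "FAIL: data directory $dir is missing"
      rc=1
    fi
  done

  return "$rc"
}
```

After sourcing the file, running check_cassandra_yaml with no argument checks /etc/cassandra/cassandra.yaml; a non-zero exit status means at least one check failed, so it is still worth reading the file by eye before starting the service.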