These are our Cassandra upgrade checklists, almost exactly as performed. They include notes about unexpected things that occurred on the first node.
- stop Chef from running via cron
- switch the node's run list to the upgrade role:
knife node run_list set ***********.opsmatic.com steph-role
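A sketch of the cron half plus a check that the run-list change took; the cron entry location is an assumption, adjust to wherever your setup installs it:
sudo rm /etc/cron.d/chef-client               # assumed location of the chef-client cron entry
knife node show ***********.opsmatic.com -r   # confirm the node is now on steph-role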
- nodetool upgradesstables - this appears to be required before AND after upgrade?
nodetool drain
(gracefully stop serving traffic)
sudo service cassandra stop
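The same sequence with a verification step in the middle; nodetool netstats reports the node's mode, which should read DRAINED before the service is stopped:
nodetool drain                   # flush memtables and stop accepting writes
nodetool netstats | grep Mode    # expect: Mode: DRAINED
sudo service cassandra stop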
- remove prod-cassandra security group and verify the node cannot talk to the rest of the machines on the service ports
aws ec2 modify-instance-attribute --instance-id i-******** --groups sg-********
- on host:
telnet cass02.usw2.opsmatic.com 7000
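telnet covers the gossip port; a quick loop over the other service ports makes the check more thorough (the list assumes stock Cassandra ports: 7000/7001 gossip, 9042 native protocol, 9160 Thrift):
for port in 7000 7001 9042 9160; do
  nc -z -w 2 cass02.usw2.opsmatic.com $port && echo "port $port still reachable!"
done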
- actually apply the new Chef cookbooks:
sudo chef-client --once
- Note: small snag: the first Chef run fails because somehow Cassandra ends up getting started with the old cassandra.yaml file; simply
sudo rm /etc/cassandra/cassandra.yaml
and run Chef again
- Note: small snag: cluster_ips, which is used to populate the seed nodes, returned no results because no hosts were yet in steph-role. Using the cluster_ips attribute override in the environment to get around this (a sketch of one way to set it follows the next note). That's probably for the best for the early part of the process, since I can manually set it to just the cassandra-role nodes, which will allow the 2.x node to gossip with the original cluster.
- Note: big snag: we were using a version of the cassandra cookbook that didn't support multiple data directories. Had to upgrade to version 3.4.0 of the cookbook (from 2.9.0); the cookbook had since been renamed, etc. It was a bit of open-heart Chef surgery, but we're back at it.
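To make the "attribute override in the environment" bit concrete, a sketch via knife; the environment name and the exact attribute path are illustrative and depend on the cookbook version:
knife environment edit production
# then add something along these lines to the environment JSON:
#   "override_attributes": {
#     "cassandra": { "cluster_ips": ["10.x.x.1", "10.x.x.2", "10.x.x.3"] }
#   }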
- visual spot check of configuration (see the grep sketch after this list):
- initial_token should not be set in /etc/cassandra/cassandra.yaml
- data directories should be pointing to /data1/keyspaces and /data2/keyspaces
- Heap size should be 6GB
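A grep version of the same spot check; cassandra-env.sh as the home of the heap setting assumes the standard package layout:
grep -E '^initial_token' /etc/cassandra/cassandra.yaml          # should print nothing
grep -A 3 data_file_directories /etc/cassandra/cassandra.yaml   # expect /data1/keyspaces and /data2/keyspaces
grep MAX_HEAP_SIZE /etc/cassandra/cassandra-env.sh              # expect 6G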
- start the service back up (probably already done by Chef) -
sudo service cassandra start
- the service should start but complain about not being able to talk to the rest of the cluster. It SHOULD show itself in nodetool status as having a bunch of data, though it may never get far enough for nodetool to work, since it can't gossip
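A couple of ways to keep an eye on it while it sits isolated; the log path assumes the stock package layout:
tail -f /var/log/cassandra/system.log   # gossip failures are expected while the node is cut off
nodetool status                         # may fail or hang until gossip is restored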
- restore prod-cassandra security group -
aws ec2 modify-instance-attribute --instance-id i-******** --groups sg-******** sg-********
- start service back up if it had previously died due to being unable to gossip
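A sketch of confirming the node is back in the cluster once the security group is restored:
nc -z -w 2 cass02.usw2.opsmatic.com 7000 && echo "gossip port reachable again"
sudo service cassandra start   # only needed if the service died while isolated
nodetool status                # this node should show as UN (Up/Normal) with its full load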
nodetool upgradesstables
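upgradesstables rewrites every sstable in the new on-disk format and can run for hours on a node with this much data, so running it under screen (tmux or nohup would do equally well; a habit, not a requirement) protects it from a dropped SSH session:
screen -S upgradesstables
nodetool upgradesstables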
- Note: default compaction throughput got reset to 16 MB/sec in steph-role; setting it manually to 1024 so that upgradesstables finishes more quickly (commands sketched below). We'll set the Chef default to whatever cassandra-role was running with when this operation is finished.
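The throughput bump from the note above as nodetool commands; setcompactionthroughput takes a value in MB/sec, and compactionstats shows how much compaction work is still queued:
nodetool setcompactionthroughput 1024   # temporary bump so upgradesstables finishes sooner
nodetool compactionstats                # watch the pending upgrade tasks drain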