A customer has Chef Server 12 (HA/DRBD) in Production. They want to test an in-place upgrade (or maintenance) using their current OPC Production data and configuration. This gives us a good chance to make corrections if we find that their data is too broken for the migrations to handle, and it gives the customer experience in managing the upgrade in Production.
The sequence of events will broadly be these:
- Install the same version of Chef Server on the target HA Test cluster
- Restore data from Production instance backup (LVM snapshot or full-stop backup)
- Test
1. Start with a clean target system: nothing in `/etc/opscode`, `/opt/opscode`, or `/var/opt/opscode`, and no running processes matching:

   ```shell
   ps -ef | grep opscode
   ```
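The clean-state check can also be scripted so it is easy to rerun on each target node. A minimal sketch, using the directory list and process pattern named above (`check_clean` is a hypothetical helper name, not a Chef tool):

```shell
# check_clean DIR... -> reports any listed path that exists; succeeds
# only when every path is absent.
check_clean() {
  rc=0
  for dir in "$@"; do
    if [ -e "$dir" ]; then
      echo "NOT CLEAN: $dir exists"
      rc=1
    fi
  done
  return "$rc"
}

# The standard Chef Server 12 locations:
check_clean /etc/opscode /opt/opscode /var/opt/opscode && echo "directories clean"

# The [o] bracket trick keeps grep from matching its own command line:
ps -ef | grep '[o]pscode' || echo "no opscode processes running"
```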
2. Create a data backup from the Production source (the bootstrap backend, which should also be the ACTIVE backend).

   A. If the server is 12.2.0 or greater, use the `chef-server-ctl backup` tool and create an OFFLINE backup.

   B. Otherwise, you will need to create a manual backup:
      - On the standby backend:

        ```shell
        chef-server-ctl stop keepalived
        ```

      - The rest of the steps run on the active backend. Stop all services except keepalived:

        ```shell
        chef-server-ctl stop
        ```

      - Ensure keepalived still considers this server to be active and keeps the DRBD volume mounted, but all services except keepalived are stopped. Check with:

        ```shell
        chef-server-ctl ha-status
        chef-server-ctl status
        mount | grep drbd
        ```

      - Create the backup tarball:

        ```shell
        tar -czf /tmp/chefbackup-destination.tar.gz /etc/opscode /etc/opscode-reporting /etc/opscode-manage /var/opt/opscode/drbd/data
        ```

      - Copy the tarball off of the system and restart the services that were stopped.
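Before copying the tarball off the host, it can be worth confirming that the expected trees actually made it into the archive. A small sketch (`verify_backup` is a hypothetical helper; tar stores these paths without the leading `/`):

```shell
# verify_backup TARBALL PATH... -> succeeds only when every required
# path prefix appears in the archive listing.
verify_backup() {
  tarball=$1; shift
  listing=$(tar -tzf "$tarball") || return 1
  for want in "$@"; do
    if ! printf '%s\n' "$listing" | grep -q "^$want"; then
      echo "MISSING from backup: $want"
      return 1
    fi
  done
  echo "backup contains all expected paths"
}

# Example, matching the tarball created above:
# verify_backup /tmp/chefbackup-destination.tar.gz \
#   etc/opscode etc/opscode-reporting etc/opscode-manage var/opt/opscode/drbd/data
```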
3. Install CS12 on the backends and frontends as described in the documentation, and validate that the system is working correctly.
4. Restore the Production cluster data.

   A. If your backup data was created by a Chef Server 12.10.0 or greater version using `chef-server-ctl backup`, copy the backup tarball onto the primary/bootstrap backend and run:

      ```shell
      chef-server-ctl restore /path/to/backup.tar.gz
      ```

   B. If your backup data was created by an older CS12 cluster, follow the same steps as 2B to prepare the cluster for the restore:
      - On the standby backend:

        ```shell
        chef-server-ctl stop keepalived
        ```

      - The rest of the steps run on the active backend. Stop all services except keepalived:

        ```shell
        chef-server-ctl stop
        ```

      - Ensure keepalived still considers this server to be active and keeps the DRBD volume mounted, but all services except keepalived are stopped. Check with:

        ```shell
        chef-server-ctl ha-status
        chef-server-ctl status
        mount | grep drbd
        ```

      - Remove the DRBD data:

        ```shell
        rm -rf /var/opt/opscode/drbd/data/*
        ```

      - Restore the backup on the current bootstrap/primary target system:

        ```shell
        tar -xvzf chefbackup-source.tar -C /
        chef-server-ctl reconfigure && opscode-manage-ctl reconfigure
        chef-server-ctl start
        ```
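Because the `rm -rf` of the DRBD data is destructive, it is worth guarding it behind the mount check from the earlier step. A sketch of that guard (`drbd_mounted` is a hypothetical helper that reads `mount` output on stdin, which also makes it easy to test in isolation):

```shell
# drbd_mounted: succeeds when a drbd volume appears in the mount
# listing fed to it on stdin.
drbd_mounted() {
  grep -q drbd
}

# On the active backend, only wipe when keepalived still holds the
# DRBD volume mounted here:
if mount | drbd_mounted; then
  echo "DRBD mounted here: safe to wipe and restore"
  # rm -rf /var/opt/opscode/drbd/data/*
  # tar -xvzf chefbackup-source.tar -C /
else
  echo "refusing: DRBD volume is not mounted on this node" >&2
fi
```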
5. Copy the configuration folders (`/etc/opscode`, `/etc/opscode-manage`, `/etc/opscode-reporting`) to the frontends and the secondary backend.

6. Reconfigure the frontends and TEST:
   - On each Chef server, edit the `/etc/hosts` file:
     - Determine the `api_fqdn` value (e.g. `chef.mycompany.com`).
     - Determine the primary IP address of the given node (e.g. `10.10.10.5`).
     - Alias the `api_fqdn` to the local host by adding an entry to the `/etc/hosts` file like so:

       ```
       10.10.10.5 chef.mycompany.com
       ```

     - NOTE: It is safe to leave this entry in place permanently; it is only relied upon by the test suite.
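Adding the alias can be made idempotent so rerunning the setup never duplicates the entry. A sketch using the example values from above (`add_hosts_alias` is a hypothetical helper; the optional third argument exists only to make it testable against a scratch file):

```shell
# add_hosts_alias IP FQDN [HOSTS_FILE] -> appends "IP FQDN" unless the
# FQDN already appears in the hosts file.
add_hosts_alias() {
  ip=$1 fqdn=$2 hosts=${3:-/etc/hosts}
  if grep -qw "$fqdn" "$hosts"; then
    echo "entry for $fqdn already present"
  else
    printf '%s %s\n' "$ip" "$fqdn" >> "$hosts"
  fi
}

# add_hosts_alias 10.10.10.5 chef.mycompany.com
```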
   - Test the system:

     ```shell
     chef-server-ctl test
     curl -k https://localhost/_status
     ```
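Rather than eyeballing a single curl, the `_status` check can be polled until the API reports healthy; a healthy Chef Server answers `/_status` with a JSON body whose `status` field is `"pong"`. A sketch (`wait_for_status` is a hypothetical helper name):

```shell
# wait_for_status URL [TRIES] -> polls the endpoint until it reports
# "status":"pong", giving up after TRIES attempts.
wait_for_status() {
  url=$1 tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    if curl -sk "$url" | grep -q '"status":"pong"'; then
      echo "API healthy"
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep 3
  done
  echo "API did not become healthy" >&2
  return 1
}

# wait_for_status https://localhost/_status
```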
   - Check the DRBD status to be sure we are replicating to the secondary.
   - Test a selection of orgs and operations on those orgs: client lists, node lists, chef-client runs, group memberships, and any other important or desired tests.
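The per-org spot checks can be scripted with knife against each organization's API URL. A sketch where the org names and FQDN are placeholders to substitute with real values (`org_url` is a hypothetical helper):

```shell
# org_url FQDN ORG -> the Chef API base URL for one organization.
org_url() {
  printf 'https://%s/organizations/%s' "$1" "$2"
}

# Placeholder org names; use a representative sample of real orgs.
for org in acme-dev acme-prod; do
  echo "== checking $(org_url chef.mycompany.com "$org") =="
  # knife client list -s "$(org_url chef.mycompany.com "$org")"
  # knife node list   -s "$(org_url chef.mycompany.com "$org")"
done
```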
I wanted to know what will happen to the Chef server which is not active if I stop it with `chef-server-ctl stop` and then start it again. Will it become active?