A customer has Chef Server 12 (HA/DRBD) in Production. They want to test an in-place upgrade (or maintenance) using their current OPC Production data and configuration. This gives us a good chance to make corrections if we find that their data is too broken for the migrations to handle, and it gives the customer experience in managing the upgrade in Production.
The sequence of events will broadly be these:
- Install the same version of Chef Server on the target HA Test cluster
- Restore data from Production instance backup (LVM snapshot or full-stop backup)
- Test
1. Start with a clean target system: nothing in `/etc/opscode`, `/opt/opscode`, or `/var/opt/opscode`, and no running processes matching:

   ```shell
   ps -ef | grep opscode
   ```
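The clean-state check can also be scripted so it is easy to rerun on each target node. A minimal sketch, using the directory list and process pattern named above (`check_clean` is a hypothetical helper name, not a Chef tool):

```shell
# check_clean DIR... -> reports any listed path that exists; succeeds
# only when every path is absent.
check_clean() {
  rc=0
  for dir in "$@"; do
    if [ -e "$dir" ]; then
      echo "NOT CLEAN: $dir exists"
      rc=1
    fi
  done
  return "$rc"
}

# The standard Chef Server 12 locations:
check_clean /etc/opscode /opt/opscode /var/opt/opscode && echo "directories clean"

# The [o] bracket trick keeps grep from matching its own command line:
ps -ef | grep '[o]pscode' || echo "no opscode processes running"
```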
2. Create a data backup from the Production source (the bootstrap backend, which should also be the ACTIVE backend).

   A. If the server is 12.2.0 or greater, use the `chef-server-ctl backup` tool and create an OFFLINE backup.

   B. Otherwise, you will need to create a manual backup:
      - On the standby backend:

        ```shell
        chef-server-ctl stop keepalived
        ```

      - The rest of the steps run on the active backend. Stop all services except keepalived:

        ```shell
        chef-server-ctl stop
        ```

      - Ensure keepalived still considers this server to be active and keeps the DRBD volume mounted, but all services except keepalived are stopped. Check with:

        ```shell
        chef-server-ctl ha-status
        chef-server-ctl status
        mount | grep drbd
        ```

      - Create the backup tarball:

        ```shell
        tar -czf /tmp/chefbackup-destination.tar.gz /etc/opscode /etc/opscode-reporting /etc/opscode-manage /var/opt/opscode/drbd/data
        ```

      - Copy the tarball off of the system and restart the services that were stopped.
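Before copying the tarball off the host, it can be worth confirming that the expected trees actually made it into the archive. A small sketch (`verify_backup` is a hypothetical helper; tar stores these paths without the leading `/`):

```shell
# verify_backup TARBALL PATH... -> succeeds only when every required
# path prefix appears in the archive listing.
verify_backup() {
  tarball=$1; shift
  listing=$(tar -tzf "$tarball") || return 1
  for want in "$@"; do
    if ! printf '%s\n' "$listing" | grep -q "^$want"; then
      echo "MISSING from backup: $want"
      return 1
    fi
  done
  echo "backup contains all expected paths"
}

# Example, matching the tarball created above:
# verify_backup /tmp/chefbackup-destination.tar.gz \
#   etc/opscode etc/opscode-reporting etc/opscode-manage var/opt/opscode/drbd/data
```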
3. Install CS12 on the backends and frontends as described in the documentation, and validate that the system is working correctly.
4. Restore the Production cluster data.

   A. If your backup data was created by a Chef Server 12.10.0 or greater version using `chef-server-ctl backup`, copy the backup tarball onto the primary/bootstrap backend and run:

      ```shell
      chef-server-ctl restore /path/to/backup.tar.gz
      ```

   B. If your backup data was created by an older CS12 cluster, follow the same steps as 2B to prepare the cluster for the restore:
      - On the standby backend:

        ```shell
        chef-server-ctl stop keepalived
        ```

      - The rest of the steps run on the active backend. Stop all services except keepalived:

        ```shell
        chef-server-ctl stop
        ```

      - Ensure keepalived still considers this server to be active and keeps the DRBD volume mounted, but all services except keepalived are stopped. Check with:

        ```shell
        chef-server-ctl ha-status
        chef-server-ctl status
        mount | grep drbd
        ```

      - Remove the DRBD data:

        ```shell
        rm -rf /var/opt/opscode/drbd/data/*
        ```

      - Restore the backup on the current bootstrap/primary target system:

        ```shell
        tar -xvzf chefbackup-source.tar -C /
        chef-server-ctl reconfigure && opscode-manage-ctl reconfigure
        chef-server-ctl start
        ```
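Because the `rm -rf` of the DRBD data is destructive, it is worth guarding it behind the mount check from the earlier step. A sketch of that guard (`drbd_mounted` is a hypothetical helper that reads `mount` output on stdin, which also makes it easy to test in isolation):

```shell
# drbd_mounted: succeeds when a drbd volume appears in the mount
# listing fed to it on stdin.
drbd_mounted() {
  grep -q drbd
}

# On the active backend, only wipe when keepalived still holds the
# DRBD volume mounted here:
if mount | drbd_mounted; then
  echo "DRBD mounted here: safe to wipe and restore"
  # rm -rf /var/opt/opscode/drbd/data/*
  # tar -xvzf chefbackup-source.tar -C /
else
  echo "refusing: DRBD volume is not mounted on this node" >&2
fi
```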
5. Copy the configuration folders (`/etc/opscode`, `/etc/opscode-manage`, `/etc/opscode-reporting`) to the frontends and the secondary backend.

6. Reconfigure the frontends and TEST:
   - On each Chef server, edit the `/etc/hosts` file:
     - Determine the `api_fqdn` value (e.g. `chef.mycompany.com`).
     - Determine the primary IP address of the given node (e.g. `10.10.10.5`).
     - Alias the `api_fqdn` to the local host by adding an entry to the `/etc/hosts` file like so:

       ```
       10.10.10.5 chef.mycompany.com
       ```

     - NOTE: It is safe to leave this entry in place permanently; it is only relied upon by the test suite.
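Adding the alias can be made idempotent so rerunning the setup never duplicates the entry. A sketch using the example values from above (`add_hosts_alias` is a hypothetical helper; the optional third argument exists only to make it testable against a scratch file):

```shell
# add_hosts_alias IP FQDN [HOSTS_FILE] -> appends "IP FQDN" unless the
# FQDN already appears in the hosts file.
add_hosts_alias() {
  ip=$1 fqdn=$2 hosts=${3:-/etc/hosts}
  if grep -qw "$fqdn" "$hosts"; then
    echo "entry for $fqdn already present"
  else
    printf '%s %s\n' "$ip" "$fqdn" >> "$hosts"
  fi
}

# add_hosts_alias 10.10.10.5 chef.mycompany.com
```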
   - Test the system:

     ```shell
     chef-server-ctl test
     curl -k https://localhost/_status
     ```
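Rather than eyeballing a single curl, the `_status` check can be polled until the API reports healthy; a healthy Chef Server answers `/_status` with a JSON body whose `status` field is `"pong"`. A sketch (`wait_for_status` is a hypothetical helper name):

```shell
# wait_for_status URL [TRIES] -> polls the endpoint until it reports
# "status":"pong", giving up after TRIES attempts.
wait_for_status() {
  url=$1 tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    if curl -sk "$url" | grep -q '"status":"pong"'; then
      echo "API healthy"
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep 3
  done
  echo "API did not become healthy" >&2
  return 1
}

# wait_for_status https://localhost/_status
```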
   - Check the DRBD status to be sure we are replicating to the secondary.
   - Test a selection of orgs and operations on those orgs: client lists, node lists, chef-client runs, group memberships, and any other important or desired tests.
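The per-org spot checks can be scripted with knife against each organization's API URL. A sketch where the org names and FQDN are placeholders to substitute with real values (`org_url` is a hypothetical helper):

```shell
# org_url FQDN ORG -> the Chef API base URL for one organization.
org_url() {
  printf 'https://%s/organizations/%s' "$1" "$2"
}

# Placeholder org names; use a representative sample of real orgs.
for org in acme-dev acme-prod; do
  echo "== checking $(org_url chef.mycompany.com "$org") =="
  # knife client list -s "$(org_url chef.mycompany.com "$org")"
  # knife node list   -s "$(org_url chef.mycompany.com "$org")"
done
```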
I wanted to know what will happen to the Chef server which is not active if I stop it with `chef-server-ctl stop` and then start it again. Will it become active?