Triton Disaster Recovery

Useful context

According to the docs, and confirmed by investigation, Triton stores all important persistent state in the manatee cluster, which is a cluster of PostgreSQL instances.

The manatee cluster itself depends on the binder service, which provides the ZooKeeper and DNS services that manatee uses.

Most services store their state in manatee via Moray, which provides a key-value store API backed by manatee. Restoring binder, manatee and moray is therefore the main part of a restore; all other services depend on these three and are almost completely stateless.
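
As a quick sanity check on the architecture above, the state of the surviving manatee cluster can be inspected from inside any manatee zone. A minimal sketch (the exact alias argument to sdc-login may need to be manatee0 or similar on your headnode):

sdc-login manatee            # log into the first zone whose alias matches
manatee-adm pg-status        # show the primary/sync/async peers and replication lag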

Depending on how image storage is configured, in the event of a complete headnode loss images will not be restored, as they are only stored locally on the headnode.

Special measures should be taken to ensure any required images can be recovered from another source.

Total headnode loss recovery with HA binder/manatee

These instructions deal with restoring Triton services after the complete loss of a headnode that cannot be recovered, i.e. the headnode has gone up in flames and the data has been lost. If the issue is just a hardware failure, much simpler recovery methods exist (see the final section).

These instructions are based on notes from working through a headnode recovery to restore Triton services. There may be better ways and procedures may change, but these instructions are from a tested restore as of Triton image 20170330.

Recovery Steps

Clean headnode

If you are testing a recovery, or have some state left and want to start from a fresh headnode, factory reset the headnode:

zfs set smartdc:factoryreset=yes zones/var
zfs set smartdc:factoryreset=yes zones
reboot

NOTE: If nothing appears on COM2 on reboot, make sure the Dell BIOS is configured correctly: Enter system setup -> go to Serial Communication -> make sure Serial1 = COM1 and Serial2 = COM2.

  • Re-image the USB key remotely, following SOP-NNN, with the previously running platform image. Place the relevant answers.json file in the /private directory on the USB stick. The answers.json should be a backup of the answers to all the installer questions from the previous install.

  • When the node reboots, at the GRUB menu select the Live Image option if it is not the default.

  • After booting it will do some rsync work and then present a login prompt. It is still not ready: although it looks to have completed, setup is still running in the background. Leave it be.

  • After install completes you will be told setup is complete and to press enter to continue.

  • The headnode will be up and the compute nodes will appear in CNAPI, but at this stage not much works yet.

  • After the initial bring-up you need to reconfigure any NIC aggregations, as the installer does not handle these.

  • To do this, mount the USB key with sdc-usbkey mount, edit /mnt/usbkey/config to re-add the aggregation settings, save to the usbkey, and reboot. A sketch follows below.
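
A minimal sketch of that edit cycle, assuming the aggregation stanzas from the old config are available to paste back in (the unmount subcommand is an assumption; check sdc-usbkey's help on your image):

sdc-usbkey mount
vi /mnt/usbkey/config      # re-add the aggregation and admin_nic settings from the old config
sdc-usbkey unmount
reboot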

Returning Triton services to operation

No VMs are likely to be listed by sdc-vmapi list; however, compute nodes are listed by sdc-server list. Restarting vm-agent on a compute node re-registers its VMs and they appear again, which is useful if you need to find zones running on other compute nodes.
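
A hedged sketch of restarting vm-agent from the headnode global zone (assuming -c restricts sdc-oneachnode to compute nodes on this image; use -n <hostname> to target a single node instead):

sdc-oneachnode -c 'svcadm restart vm-agent'      # restart vm-agent on every compute node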

At this stage we have two manatee clusters: a single-node cluster on the headnode that thinks it is the primary, and the cluster on the surviving compute nodes that still has a primary of its own.

Re-cluster binder and manatee

  • Log in to manatee0 on the headnode
zlogin <uuid>
  • Disable services
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/registrar:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/manatee-sitter:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/manatee-backupserver:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/manatee-snapshotter:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable smartdc/application/config-agent:default
  • Find one of the existing binder instances, which should be on a designated compute node, and grab the configuration
zlogin 758a5f4c-6808-4f5e-90b7-69be00323161
[root@758a5f4c-6808-4f5e-90b7-69be00323161 (ash:binder1) ~]# cd /opt/local/etc/zookeeper/
[root@758a5f4c-6808-4f5e-90b7-69be00323161 (ash:binder1) /opt/local/etc/zookeeper]# cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
dataDir=/zookeeper/zookeeper
# the port at which the clients will connect
clientPort=2181
maxClientCnxns=0
server.3=192.168.12.37:2888:3888
server.1=192.168.12.5:2888:3888
server.4=192.168.12.38:2888:3888
  • Log in to the new binder0 instance on the new headnode and disable services
zlogin <uuid>
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# svcadm disable smartdc/application/zookeeper:default
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# svcadm disable manta/application/binder:default
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# svcadm disable config-agent
  • Remove zookeeper data
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# cd /zookeeper/zookeeper
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /zookeeper/zookeeper]# rm -rf version-2/
  • Make sure myid matches the config we are about to put in place
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /zookeeper/zookeeper]# cat myid
1
  • Update the config to include all the binder nodes, for example:
vim /opt/local/etc/zookeeper/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
dataDir=/zookeeper/zookeeper
# the port at which the clients will connect
clientPort=2181
maxClientCnxns=0
server.3=192.168.12.37:2888:3888
server.1=192.168.12.5:2888:3888
server.4=192.168.12.38:2888:3888
  • Enable zookeeper and check it comes up OK
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# svcadm enable zookeeper
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# less $(svcs -L zookeeper)
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# less /var/log/zookeeper/zookeeper.log
  • Enable binder
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# svcadm enable binder
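
A hedged sanity check, run from inside the binder zone, that this ZooKeeper peer has joined the ensemble and that binder is answering DNS (the name queried is a placeholder; substitute your admin DNS domain):

echo stat | nc 127.0.0.1 2181 | grep Mode      # expect Mode: leader or Mode: follower
dig @127.0.0.1 binder.<datacenter>.<dns_domain> +short
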
  • Log out of the binder zone

  • Log in to the manatee0 zone on the headnode and update the config, which resides in /opt/smartdc/manatee/etc

Edit sitter.json and set oneNodeWriteMode to false
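
A hedged sketch of making that change with the json tool, assuming oneNodeWriteMode is a top-level key in sitter.json on this image (inspect the file first if the layout differs):

cd /opt/smartdc/manatee/etc
json -f sitter.json oneNodeWriteMode                       # check the current value
json -I -f sitter.json -e 'this.oneNodeWriteMode = false'  # edit in place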

Enable services

svcadm enable manta/application/manatee-backupserver:default
svcadm enable manta/application/registrar:default
  • Re-build the manatee instance
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) /opt/smartdc/manatee/etc]# manatee-adm rebuild
This operation will remove all local data and rebuild this peer from another cluster peer.  This operation is destructive, and you should only proceed after confirming that this peer's copy of the data is no longer required.  (yes/no)?
prompt: no:  yes


pg-status should now look OK, with the new manatee0 as the async peer:

[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) /opt/smartdc/manatee/etc]# manatee-adm pg-status
ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG
primary  890914ba ok   sync  0/9B68F968    0/9B68F968    0/9B68F5C0    -
sync     bc86d76c ok   async 0/9B68F968    0/9B68F968    0/9B68F5C0    -
async    51d6230a ok   -     -             -             -             0m00s

You should now have a re-clustered manatee and binder with zones on the headnode.

Fix SAPI

Restarting the sapi service results in an unhandled exception and sapi then aborts. To fix this, log into the moray zone, find the instance record for the old sapi zone, and add a new, correct one.
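
The findobjects and putobject commands below are run from inside the moray zone. A minimal sketch of getting there from the headnode global zone (the alias argument may need to be moray0 on your headnode):

sdc-login moray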

  • Find the SAPI service UUID
findobjects sapi_services "(name=sapi)" | less
  • Find the sapi instance
findobjects sapi_instances "(type=vm)"
  • The sapi instance does not have an alias. You can check you have the right one by checking that its service_uuid matches the UUID of the sapi service found above.

  • Now, using the old sapi instance as a template, take its data and create a new object, but using the UUID of the new sapi0 zone in place of the old one:

putobject -d '{ "uuid": "acc53926-b378-48f5-8a1b-598517f88514", "service_uuid": "5de5c5af-1311-4e2c-a89e-71ae77ff87a0", "type": "vm", "exists": true }' sapi_instances acc53926-b378-48f5-8a1b-598517f88514
  • Now restart sapi and this time it should come up OK.

  • sdc-healthcheck should report all OK now, though we do not yet have cloudapi, cns, dockerlogger zones, etc.

Fix SAPI data

SAPI is running, but in a poor state, because we have effectively created a brand new headnode with new core zones and then restored data from the old world. SAPI now has records for instances which no longer exist, and the instances that do exist do not have SAPI records.

To sort out SAPI we need to add records for each existing instance, and remove records that refer to instances that will never return.

A few instances have instance-level configuration, so check each one before re-creating its instance entry in SAPI. Binder, for example, requires ZK_ID metadata.
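
For example, re-creating a binder instance entry with its ZK_ID metadata might look like the sketch below. The UUIDs are placeholders, and the exact metadata keys should be checked against a surviving binder instance's SAPI record; the ZK_ID value must match the myid file in the corresponding binder zone.

sdc-sapi /instances -X POST -d '{
  "uuid": "<new-binder-zone-uuid>",
  "service_uuid": "<binder-service-uuid>",
  "type": "vm",
  "metadata": { "ZK_ID": "1" }
}'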

An example run-through of sorting out the papi service manually:

  • Grab service uuid
[root@headnode (ash) /opt/recover]# sdc-sapi /services | json -a name uuid | grep papi
papi ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72
  • Grab our current live instance
[root@headnode (ash) /opt/recover]# vmadm list | grep papi
0ab00556-4154-49ac-a890-92e0e59c73a9  OS    1024     running           papi0
  • Grab the current instance data for reference; you might want to save this to a file
[root@headnode (ash) /opt/recover]# sdc-sapi /instances?service_uuid=ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72 | json -H
[
  {
    "uuid": "4a1bca56-5aa8-44ce-bae1-b6cd7dbf3e5e",
    "service_uuid": "ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72",
    "params": {
      "alias": "papi0"
    },
    "metadata": {},
    "type": "vm"
  }
]
  • Create an instance entry for our live instance
[root@headnode (ash) /opt/recover]# sdc-sapi /instances -X POST -d '{ "name": "papi", "service_uuid": "ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72", "uuid": "0ab00556-4154-49ac-a890-92e0e59c73a9" }'
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 113
Date: Thu, 13 Apr 2017 11:55:37 GMT
Connection: keep-alive

{
  "uuid": "0ab00556-4154-49ac-a890-92e0e59c73a9",
  "service_uuid": "ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72",
  "type": "vm"
}
  • Delete the old instance
[root@headnode (ash) /opt/recover]# sdc-sapi /instances/4a1bca56-5aa8-44ce-bae1-b6cd7dbf3e5e -X DELETE
HTTP/1.1 204 No Content
Date: Thu, 13 Apr 2017 11:58:06 GMT
Connection: keep-alive

There are some bash scripts to ease working with SAPI for this data fixup task.
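
As an illustration of the sort of helper that is useful here (not the scripts referred to above), a hedged sketch that cross-checks SAPI instance records against VMAPI from the headnode global zone:

#!/bin/bash
# Hedged sketch: list every SAPI instance record and flag ones whose VM no
# longer exists according to VMAPI.
set -o errexit

for uuid in $(sdc-sapi /instances | json -Ha uuid); do
    state=$(sdc-vmapi "/vms/$uuid" 2>/dev/null | json -H state)
    if [ -n "$state" ] && [ "$state" != "destroyed" ]; then
        echo "OK     $uuid ($state)"
    else
        echo "STALE  $uuid has a SAPI record but no live VM"
    fi
done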

When the SAPI info is all current, reboot all core SDC instances other than binder and manatee (a sketch of one way to do this follows).
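
A hedged sketch of rebooting the core zones on the headnode, assuming the core zones carry the smartdc_type=core tag on this image:

# Reboot every core zone on this node except binder and manatee
for uuid in $(vmadm lookup tags.smartdc_type=core); do
    alias=$(vmadm get "$uuid" | json alias)
    case "$alias" in
        binder*|manatee*) continue ;;
    esac
    vmadm reboot "$uuid"
done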

SAPI recovery should now be complete

When SAPI recovery is complete you can reprovision CNS, add external NICs to adminui, etc. CNS instances running on compute nodes should still be up and functioning, but you will want one on the headnode.

Image Recovery

NOTE: At this stage we have not recovered any images in imgapi, so there may be metadata for images but the image files will not exist. You should have a working adminui by now, though. The steps below cover recovering the images.

  • Add an external NIC to imgapi, making it the primary interface.

  • To find the missing images, sdc-login to the imgapi0 zone and run “imgapiadm check-files”. This will list files that are missing.

  • Delete the images with missing files and re-import them. If they are upstream images you can re-import as normal. If they are custom images you will need to add them from wherever they are stored and/or re-create them. (We should make it clear to users that custom images are their responsibility to back up, at least for now.)

Example:

[root@headnode (ash) /opt/recover]# sdc-imgadm delete 03adc520-1665-11e7-a285-0336b790622b
sdc-imgadm: error (InternalError): error deleting image file: Error: ENOENT: no such file or directory, unlink '/data/imgapi/images/03a/03adc520-1665-11e7-a285-0336b790622b/file0'
[root@headnode (ash) /opt/recover]#

[root@headnode (ash) /opt/recover]# sdc-imgadm import 03adc520-1665-11e7-a285-0336b790622b -S https://updates.joyent.com
Imported image 03adc520-1665-11e7-a285-0336b790622b (cns, master-20170331T225040Z-gcac1002, state=active)
[root@headnode (ash) /opt/recover]#
  • You can also use the adminui to import the image if you have it working. (You will need to add an external NIC to adminui first)

  • You will need to do this for every missing image. (Side note: we really should use Manta as the imgapi backend.)

  • You can use sdcadm to create a cloudapi instance (you may need to delete the stale instance entry in SAPI if you have not removed it already); see the sketch after this list.

  • Add external NICS to cloudapi.
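
A hedged sketch of the sdcadm commands for the external NICs and the cloudapi instance mentioned above, assuming these post-setup subcommands are present in the installed sdcadm version:

sdcadm post-setup common-external-nics   # external NICs for adminui and imgapi
sdcadm post-setup cloudapi               # create a cloudapi instance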

Main Triton Recovery Complete.

Headnode hardware failures

Use this procedure when the headnode has suffered a hardware failure but the data is intact, i.e. the motherboard has been replaced or the disks have been installed in a new chassis.

  • Boot the headnode as normal

  • Mount the usb key

sdc-usbkey mount
  • Update the NIC configuration so the MAC addresses are correct for the new hardware.

Now there will be two headnode entries present: one for the old headnode and one for the new. To delete the old headnode entry:

  • Put the DC in maintenance
  • Find the correct server in adminui
  • Actions -> “Forget Server”
  • Confirm it’s the correct server
  • Run sdc-healthcheck / sdcadm health and make sure they’re happy
  • Remove the DC from maintenance
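
A hedged sketch of the maintenance window steps from the command line, assuming sdcadm dc-maint is available in the installed sdcadm version (older versions exposed it under sdcadm experimental):

sdcadm dc-maint start      # put the DC into maintenance
sdcadm health              # after forgetting the old server, check health
sdc-healthcheck
sdcadm dc-maint stop       # take the DC out of maintenance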

NOTE: If you have VMs other than the SDC core zones on the headnode (which is inadvisable), you may need to re-create nic tags with nictagadm before those VMs can be started.
