Triton Disaster Recovery

Useful context

According to the docs, and confirmed by investigation, Triton stores all important persistent state in the manatee cluster, which is a cluster of PostgreSQL instances.

The manatee cluster itself depends on the binder service, which provides the ZooKeeper and DNS services that manatee uses.

Most services store their state in manatee via Moray, which provides a key-value store API backed by manatee. Restoring binder, manatee and moray is therefore the main part of a restore; all other services depend on these three and are almost completely stateless.
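
As a quick sanity check on the architecture above, the state of the surviving manatee cluster can be inspected from inside any manatee zone. A minimal sketch (the exact alias argument to sdc-login may need to be manatee0 or similar on your headnode):

sdc-login manatee            # log into the first zone whose alias matches
manatee-adm pg-status        # show the primary/sync/async peers and replication lag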

Depending on how image storage is configured, in the event of a complete headnode loss images will not be restored, as they are only stored locally on the headnode.

Special measures should be taken to ensure any required images can be recovered from another source.

Total headnode loss recovery with HA binder/manatee

These instructions deal with restoring Triton services after the complete loss of a headnode that cannot be recovered, i.e. the headnode has gone up in flames and the data has been lost. If the issue is just a hardware failure, much simpler recovery methods exist (see the final section).

These instructions are based on notes from working through a headnode recovery to restore Triton services. There may be better ways and procedures may change, but these instructions are from a tested restore as of Triton image 20170330.

Recovery Steps

Clean headnode

If you are testing a recovery, or have some state left and want to start from a fresh headnode, factory reset the headnode:

zfs set smartdc:factoryreset=yes zones/var
zfs set smartdc:factoryreset=yes zones
reboot

NOTE: If nothing appears on COM2 on reboot, make sure the Dell BIOS is configured correctly: Enter system setup -> go to Serial Communication -> make sure Serial1 = COM1 and Serial2 = COM2.

  • Re-image the USB key remotely, following SOP-NNN, with the previously running platform image. Place the relevant answers.json file in the /private directory on the USB stick. The answers.json should be a backup of the answers to all the installer questions from the previous install.

  • When the node reboots, at the GRUB menu select the Live Image option if it is not the default.

  • After booting it will do some rsync work and then present a login prompt. It is still not ready: although it looks to have completed, setup is still running in the background. Leave it be.

  • After install completes you will be told setup is complete and to press enter to continue.

  • The headnode will be up and the compute nodes will appear in CNAPI, but at this stage not much works yet.

  • After the initial bring-up you need to reconfigure any NIC aggregations, as the installer does not handle these.

  • To do this, mount the USB key with sdc-usbkey mount, edit /mnt/usbkey/config to re-add the aggregation settings, save to the usbkey, and reboot. A sketch follows below.
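
A minimal sketch of that edit cycle, assuming the aggregation stanzas from the old config are available to paste back in (the unmount subcommand is an assumption; check sdc-usbkey's help on your image):

sdc-usbkey mount
vi /mnt/usbkey/config      # re-add the aggregation and admin_nic settings from the old config
sdc-usbkey unmount
reboot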

Returning Triton services to operation

No VMs are likely to be listed by sdc-vmapi list; however, compute nodes are listed by sdc-server list. Restarting vm-agent on a compute node re-registers its VMs and they appear again, which is useful if you need to find zones running on other compute nodes.
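
A hedged sketch of restarting vm-agent from the headnode global zone (assuming -c restricts sdc-oneachnode to compute nodes on this image; use -n <hostname> to target a single node instead):

sdc-oneachnode -c 'svcadm restart vm-agent'      # restart vm-agent on every compute node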

At this stage we have two manatee clusters: a single-node cluster on the headnode that thinks it is the primary, and the cluster on the surviving compute nodes that still has a primary of its own.

Re-cluster binder and manatee

  • Log in to manatee0 on the headnode
zlogin <uuid>
  • Disable services
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/registrar:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/manatee-sitter:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/manatee-backupserver:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable manta/application/manatee-snapshotter:default
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) ~]# svcadm disable smartdc/application/config-agent:default
  • Find one of the existing binder instances, which should be on a designated compute node, and grab the configuration
zlogin 758a5f4c-6808-4f5e-90b7-69be00323161
[root@758a5f4c-6808-4f5e-90b7-69be00323161 (ash:binder1) ~]# cd /opt/local/etc/zookeeper/
[root@758a5f4c-6808-4f5e-90b7-69be00323161 (ash:binder1) /opt/local/etc/zookeeper]# cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
dataDir=/zookeeper/zookeeper
# the port at which the clients will connect
clientPort=2181
maxClientCnxns=0
server.3=192.168.12.37:2888:3888
server.1=192.168.12.5:2888:3888
server.4=192.168.12.38:2888:3888
  • Log in to the new binder0 instance on the new headnode and disable services
zlogin <uuid>
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# svcadm disable smartdc/application/zookeeper:default
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# svcadm disable manta/application/binder:default
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# svcadm disable config-agent
  • Remove zookeeper data
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) ~]# cd /zookeeper/zookeeper
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /zookeeper/zookeeper]# rm -rf version-2/
  • Make sure myid matches the config we are about to put in place
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /zookeeper/zookeeper]# cat myid
1
  • Update the config to include all the binder nodes, for example:
vim /opt/local/etc/zookeeper/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
dataDir=/zookeeper/zookeeper
# the port at which the clients will connect
clientPort=2181
maxClientCnxns=0
server.3=192.168.12.37:2888:3888
server.1=192.168.12.5:2888:3888
server.4=192.168.12.38:2888:3888
  • Enable zookeeper and check it comes up OK
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# svcadm enable zookeeper
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# less $(svcs -L zookeeper)
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# less /var/log/zookeeper/zookeeper.log
  • Enable binder
[root@590ba76e-f5b8-4316-bfa9-a4e121ead7f1 (ash:binder0) /opt/local/etc/zookeeper]# svcadm enable binder
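
A hedged sanity check, run from inside the binder zone, that this ZooKeeper peer has joined the ensemble and that binder is answering DNS (the name queried is a placeholder; substitute your admin DNS domain):

echo stat | nc 127.0.0.1 2181 | grep Mode      # expect Mode: leader or Mode: follower
dig @127.0.0.1 binder.<datacenter>.<dns_domain> +short
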
  • Log out of the binder zone

  • Log in to the manatee0 zone on the headnode and update the config, which resides in /opt/smartdc/manatee/etc

Edit sitter.json and set oneNodeWriteMode to false
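
A hedged sketch of making that change with the json tool, assuming oneNodeWriteMode is a top-level key in sitter.json on this image (inspect the file first if the layout differs):

cd /opt/smartdc/manatee/etc
json -f sitter.json oneNodeWriteMode                       # check the current value
json -I -f sitter.json -e 'this.oneNodeWriteMode = false'  # edit in place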

Enable services

svcadm enable manta/application/manatee-backupserver:default
svcadm enable manta/application/registrar:default
  • Re-build the manatee instance
[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) /opt/smartdc/manatee/etc]# manatee-adm rebuild
This operation will remove all local data and rebuild this peer from another cluster peer.  This operation is destructive, and you should only proceed after confirming that this peer's copy of the data is no longer required.  (yes/no)?
prompt: no:  yes


pg-status should now look OK, with the new manatee0 as the async peer:

[root@51d6230a-4451-47a2-8058-e74bcfb2f670 (ash:manatee0) /opt/smartdc/manatee/etc]# manatee-adm pg-status
ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG
primary  890914ba ok   sync  0/9B68F968    0/9B68F968    0/9B68F5C0    -
sync     bc86d76c ok   async 0/9B68F968    0/9B68F968    0/9B68F5C0    -
async    51d6230a ok   -     -             -             -             0m00s

You should now have a re-clustered manatee and binder with zones on the headnode.

Fix SAPI

Restarting the sapi service results in an unhandled exception and sapi then aborts. To fix this, log into the moray zone, find the instance record for the old sapi zone, and add a new, correct one.
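
The findobjects and putobject commands below are run from inside the moray zone. A minimal sketch of getting there from the headnode global zone (the alias argument may need to be moray0 on your headnode):

sdc-login moray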

  • Find the SAPI service UUID
findobjects sapi_services "(name=sapi)" | less
  • Find the sapi instance
findobjects sapi_instances "(type=vm)"
  • The sapi instance does not have an alias. You can check you have the right one by checking that its service_uuid matches the UUID of the sapi service found above.

  • Now, using the old sapi instance as a template, take its data and create a new object, but using the UUID of the new sapi0 zone in place of the old one:

putobject -d '{ "uuid": "acc53926-b378-48f5-8a1b-598517f88514", "service_uuid": "5de5c5af-1311-4e2c-a89e-71ae77ff87a0", "type": "vm", "exists": true }' sapi_instances acc53926-b378-48f5-8a1b-598517f88514
  • Now restart sapi and this time it should come up OK.

  • sdc-healthcheck should report all OK now, though we do not yet have cloudapi, cns, dockerlogger zones, etc.

Fix SAPI data

SAPI is running, but in a poor state, because we have effectively created a brand new headnode with new core zones and then restored data from the old world. SAPI now has records for instances which no longer exist, and the instances that do exist do not have SAPI records.

To sort out SAPI we need to add records for each existing instance, and remove records that refer to instances that will never return.

A few instances have instance-level configuration, so check each one before re-creating its instance entry in SAPI. Binder, for example, requires ZK_ID metadata.
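
For example, re-creating a binder instance entry with its ZK_ID metadata might look like the sketch below. The UUIDs are placeholders, and the exact metadata keys should be checked against a surviving binder instance's SAPI record; the ZK_ID value must match the myid file in the corresponding binder zone.

sdc-sapi /instances -X POST -d '{
  "uuid": "<new-binder-zone-uuid>",
  "service_uuid": "<binder-service-uuid>",
  "type": "vm",
  "metadata": { "ZK_ID": "1" }
}'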

An example run-through of sorting out the papi service manually:

  • Grab service uuid
[root@headnode (ash) /opt/recover]# sdc-sapi /services | json -a name uuid | grep papi
papi ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72
  • Grab our current live instance
[root@headnode (ash) /opt/recover]# vmadm list | grep papi
0ab00556-4154-49ac-a890-92e0e59c73a9  OS    1024     running           papi0
  • Grab the current instance data for reference; you might want to save this to a file
[root@headnode (ash) /opt/recover]# sdc-sapi /instances?service_uuid=ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72 | json -H
[
  {
    "uuid": "4a1bca56-5aa8-44ce-bae1-b6cd7dbf3e5e",
    "service_uuid": "ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72",
    "params": {
      "alias": "papi0"
    },
    "metadata": {},
    "type": "vm"
  }
]
  • Create an instance entry for our live instance
[root@headnode (ash) /opt/recover]# sdc-sapi /instances -X POST -d '{ "name": "papi", "service_uuid": "ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72", "uuid": "0ab00556-4154-49ac-a890-92e0e59c73a9" }'
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 113
Date: Thu, 13 Apr 2017 11:55:37 GMT
Connection: keep-alive

{
  "uuid": "0ab00556-4154-49ac-a890-92e0e59c73a9",
  "service_uuid": "ad8a243d-0ddb-44cf-a1ff-d80ebeda3a72",
  "type": "vm"
}
  • Delete the old instance
[root@headnode (ash) /opt/recover]# sdc-sapi /instances/4a1bca56-5aa8-44ce-bae1-b6cd7dbf3e5e -X DELETE
HTTP/1.1 204 No Content
Date: Thu, 13 Apr 2017 11:58:06 GMT
Connection: keep-alive

There are some bash scripts to ease working with SAPI for this data fixup task.
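
As an illustration of the sort of helper that is useful here (not the scripts referred to above), a hedged sketch that cross-checks SAPI instance records against VMAPI from the headnode global zone:

#!/bin/bash
# Hedged sketch: list every SAPI instance record and flag ones whose VM no
# longer exists according to VMAPI.
set -o errexit

for uuid in $(sdc-sapi /instances | json -Ha uuid); do
    state=$(sdc-vmapi "/vms/$uuid" 2>/dev/null | json -H state)
    if [ -n "$state" ] && [ "$state" != "destroyed" ]; then
        echo "OK     $uuid ($state)"
    else
        echo "STALE  $uuid has a SAPI record but no live VM"
    fi
done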

When the SAPI info is all current, reboot all core SDC instances other than binder and manatee (a sketch of one way to do this follows).
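
A hedged sketch of rebooting the core zones on the headnode, assuming the core zones carry the smartdc_type=core tag on this image:

# Reboot every core zone on this node except binder and manatee
for uuid in $(vmadm lookup tags.smartdc_type=core); do
    alias=$(vmadm get "$uuid" | json alias)
    case "$alias" in
        binder*|manatee*) continue ;;
    esac
    vmadm reboot "$uuid"
done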

SAPI recovery should now be complete

When SAPI recovery is complete you can reprovision CNS, add external NICs to adminui, etc. CNS instances running on compute nodes should still be up and functioning, but you will want one on the headnode.

Image Recovery

NOTE: At this stage we have not recovered any images in imgapi, so there may be metadata for images but the image files will not exist. You should have a working adminui by now, though. The steps below cover recovering the images.

  • Add an external NIC to imgapi, making it the primary interface.

  • To find the missing images, sdc-login to the imgapi0 zone and run “imgapiadm check-files”. This will list files that are missing.

  • Delete the images with missing files and re-import them. If they are upstream images you can re-import as normal. If they are custom images you will need to add them from wherever they are stored and/or re-create them. (We should make it clear to users that custom images are their responsibility to back up, at least for now.)

Example:

[root@headnode (ash) /opt/recover]# sdc-imgadm delete 03adc520-1665-11e7-a285-0336b790622b
sdc-imgadm: error (InternalError): error deleting image file: Error: ENOENT: no such file or directory, unlink '/data/imgapi/images/03a/03adc520-1665-11e7-a285-0336b790622b/file0'
[root@headnode (ash) /opt/recover]#

[root@headnode (ash) /opt/recover]# sdc-imgadm import 03adc520-1665-11e7-a285-0336b790622b -S https://updates.joyent.com
Imported image 03adc520-1665-11e7-a285-0336b790622b (cns, master-20170331T225040Z-gcac1002, state=active)
[root@headnode (ash) /opt/recover]#
  • You can also use the adminui to import the image if you have it working. (You will need to add an external NIC to adminui first)

  • You will need to do this for every missing image. (Side note: we really should use Manta as the imgapi backend.)

  • You can use sdcadm to create a cloudapi instance (you may need to delete the stale instance entry in SAPI if you have not removed it already); see the sketch after this list.

  • Add external NICS to cloudapi.
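
A hedged sketch of the sdcadm commands for the external NICs and the cloudapi instance mentioned above, assuming these post-setup subcommands are present in the installed sdcadm version:

sdcadm post-setup common-external-nics   # external NICs for adminui and imgapi
sdcadm post-setup cloudapi               # create a cloudapi instance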

Main Triton Recovery Complete.

Headnode hardware failures

Use this procedure when the headnode has suffered a hardware failure but the data is intact, i.e. the motherboard has been replaced or the disks have been installed in a new chassis.

  • Boot the headnode as normal

  • Mount the usb key

sdc-usbkey mount
  • Update the NIC configuration so the MAC addresses are correct for the new hardware.

Now there will be two headnode entries present: one for the old headnode and one for the new. To delete the old headnode entry:

  • Put the DC in maintenance
  • Find the correct server in adminui
  • Actions -> “Forget Server”
  • Confirm it’s the correct server
  • Run sdc-healthcheck / sdcadm health and make sure they’re happy
  • Remove the DC from maintenance
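
A hedged sketch of the maintenance window steps from the command line, assuming sdcadm dc-maint is available in the installed sdcadm version (older versions exposed it under sdcadm experimental):

sdcadm dc-maint start      # put the DC into maintenance
sdcadm health              # after forgetting the old server, check health
sdc-healthcheck
sdcadm dc-maint stop       # take the DC out of maintenance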

NOTE: If you have VMs other than the SDC core zones on the headnode (which is inadvisable), you may need to re-create nic tags with nictagadm before those VMs can be started.
