The goal of this process is to upgrade a Manatee cluster of any vintage to Manatee v2. It relies on ZFS send/recv to replicate the data, and is limited to migrating between nodes while in one-node-write mode (ONWM).
Upgrading moray to a forward/backward-compatible version is a prerequisite of this upgrade. The usual process is to disable one moray node, double-check that the stack reconnects correctly, reprovision that node, and then repeat for the other moray nodes.
If there is only one moray node deployed, deploying a second using the new image allows you to upgrade the original node as above.
Pre-work: update the moray image in SAPI:
SDC=$(sdc-sapi /applications?name=sdc | json -Ha uuid)
MORAY_SVC=$(sdc-sapi "/services?name=moray&application_uuid=$SDC" | json -Ha uuid)
sapiadm update $MORAY_SVC params.image_uuid=$MORAY_IMAGE
In the target moray zone:
svcadm disable registrar
svcadm disable "*moray-202*"   # quote the pattern so the shell doesn't glob it
In the headnode GZ:
sapiadm reprovision $TARGET_MORAY $MORAY_IMAGE
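After each reprovision, it's worth confirming the new moray instance is healthy before moving on to the next one. A quick check, using the same tooling as elsewhere in this doc, might be:

```shell
# In the headnode GZ: the reprovisioned zone should be running...
sdc-vmapi /vms/$TARGET_MORAY | json -H state   # expect "running"
# ...and the stack should report moray as healthy again.
sdc-healthcheck | grep moray
```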
It can help to be logged on to the CNs of each manatee node, but most operations will be done in the headnode GZ via sdc-oneachnode. First we'll set up some environment variables:
SDC=$(sdc-sapi /applications?name=sdc | json -Ha uuid)
MANATEE_SVC=$(sdc-sapi "/services?name=manatee&application_uuid=$SDC" | json -Ha uuid)
# NB: sdc-manatee-stat isn't present in the GZ on all installs;
# manatee-stat inside any manatee zone will be. Adjust if required.
MANATEE_STAT=$(sdc-manatee-stat | json | tee initial_manatee_stat.json)
PRIMARY=$(echo $MANATEE_STAT | json sdc.primary.zoneId)
CN_PRIMARY=$(sdc-vmapi /vms/$PRIMARY | json -H server_uuid)
SYNC=$(echo $MANATEE_STAT | json sdc.sync.zoneId)
CN_SYNC=$(sdc-vmapi /vms/$SYNC | json -H server_uuid)
ASYNC=$(echo $MANATEE_STAT | json sdc.async.zoneId)
CN_ASYNC=$(sdc-vmapi /vms/$ASYNC | json -H server_uuid)
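Before going further, sanity-check that all three roles resolved to non-empty zone and server uuids; an empty variable here will make later commands fail in confusing ways:

```shell
# All six values should be populated uuids.
echo "primary=$PRIMARY on $CN_PRIMARY"
echo "sync=$SYNC on $CN_SYNC"
echo "async=$ASYNC on $CN_ASYNC"
```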
Now we will disable the async and sync nodes:
sdc-oneachnode -n $CN_ASYNC "svcadm -z $ASYNC disable manatee-sitter"
sdc-oneachnode -n $CN_SYNC "svcadm -z $SYNC disable manatee-sitter"
Set ONWM on the primary node (inside the primary manatee zone):
svcadm disable config-agent
vim /opt/smartdc/manatee/etc/sitter.json
# search for oneNodeWriteMode and set it to true.
svcadm restart manatee-sitter
In the GZ of the CN of the primary manatee:
MANATEE_UUID=$(vmadm lookup alias=~manatee)
zlogin $MANATEE_UUID "svcadm disable manatee-sitter" < /dev/null
zfs snapshot -r zones/$MANATEE_UUID/data/manatee@backup
zfs send zones/$MANATEE_UUID/data/manatee@backup > ./manatee-backup.zfs
zfs destroy zones/$MANATEE_UUID/data/manatee@backup
zlogin $MANATEE_UUID "svcadm enable manatee-sitter" < /dev/null
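This file backup is the last-resort restore path, so it's worth recording a checksum of the stream now so it can be verified before any later restore. A sketch, assuming sha1sum is available (on SmartOS, digest -a sha1 serves the same purpose):

```shell
# Record a checksum of the backup stream for later verification.
sha1sum manatee-backup.zfs > manatee-backup.zfs.sha1
# Later, before restoring from it:
sha1sum -c manatee-backup.zfs.sha1
```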
In this step, we will set up a manatee node with the new code, but configured to not start manatee-sitter on setup (preventing the creation of any state). We will then destroy its delegated dataset, and set it up to zfs recv in the next step. Using the environment variables above:
sapiadm update $MANATEE_SVC params.image_uuid=$MANATEE_IMAGE
sapiadm update $ASYNC metadata.DISABLE_SITTER=true
sapiadm update $ASYNC metadata.ONE_NODE_WRITE_MODE=true
sapiadm reprovision $ASYNC $MANATEE_IMAGE
Alternatively, we can provision a new manatee node on the same CN as the primary to speed up the zfs send/recv section of the upgrade. In that case, instead of the reprovision steps above, perform the following:
sapiadm update $MANATEE_SVC params.image_uuid=$MANATEE_IMAGE
cat > manatee3.json <<EOF
{
"service_uuid": "$MANATEE_SVC",
"params": {
"alias": "manatee3",
"server_uuid": "$CN_PRIMARY"
},
"metadata": {
"DISABLE_SITTER": true,
"ONE_NODE_WRITE_MODE": true
}
}
EOF
sdc-sapi /instances -X POST -d@manatee3.json
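Either way, you can confirm what SAPI now has for the manatee service; the reprovisioned (or newly created) instance should appear with the expected alias:

```shell
# List manatee instances for the service, with uuid and alias.
sdc-sapi "/instances?service_uuid=$MANATEE_SVC" | json -Ha uuid params.alias
```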
Wait for the reprovision to complete, log into $ASYNC, then:
# ensure that manatee-sitter is indeed disabled
svcs manatee-sitter
# take note of IP address
ifconfig | grep inet
# destroy dataset
ZONE_UUID=$(sysinfo | json UUID)
zfs destroy -r zones/$ZONE_UUID/data/manatee
# listen for the incoming stream (this blocks until the send below completes)
nc -l 1337 | zfs recv zones/$ZONE_UUID/data/manatee
On the old primary (now in ONWM), we will disable manatee, take a snapshot, and send it to the v2 manatee node.
ZONE_UUID=$(sysinfo | json UUID)
IP=XXX # from above.
svcadm disable manatee-sitter
zfs snapshot zones/$ZONE_UUID/data/manatee@migrate
zfs send -v zones/$ZONE_UUID/data/manatee@migrate | nc $IP 1337
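When the send finishes, it's worth confirming on the receiving side that the snapshot arrived before enabling anything:

```shell
# In the new (v2) zone: the @migrate snapshot should now exist locally.
zfs list -t snapshot -r zones/$ZONE_UUID/data/manatee
```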
NB: rolling back after this point is slightly more complicated, see below.
When the zfs send has completed, you can svcadm enable manatee-sitter in the new zone. manatee-stat in the new zone will indicate when the primary comes online, and the stack can be checked via the usual sdc-healthcheck, provisioning tests, and so on.
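Concretely, using only commands already introduced above:

```shell
# In the new (v2) zone, once the zfs recv has completed:
svcadm enable manatee-sitter
# Watch until this reports the new node as primary:
manatee-stat
# Then, from the headnode GZ, check overall stack health:
sdc-healthcheck
```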
Next, bring the cluster out of ONWM and back to a replicated configuration; this is a slight modification of the steps outlined here:
- reprovision the v1 sync manatee
sapiadm reprovision $SYNC $MANATEE_IMAGE
- take ONE_NODE_WRITE_MODE out of SAPI ($MANATEE_V2_PRIMARY is the instance uuid of the v2 primary: $ASYNC, or the new manatee3 instance if you used the alternate provision path):
sapiadm update $MANATEE_V2_PRIMARY metadata.ONE_NODE_WRITE_MODE=false
- Stop the v2 primary's manatee-sitter process (the v2 primary is the reprovisioned $ASYNC zone; substitute the manatee3 zone and CN uuids if you used the alternate provision path):
sdc-oneachnode -n $CN_ASYNC "svcadm -z $ASYNC disable manatee-sitter"
- In the v2 primary zone, run manatee-adm unfreeze.
- Start the v2 primary's manatee-sitter process:
sdc-oneachnode -n $CN_ASYNC "svcadm -z $ASYNC enable manatee-sitter"
The modification takes advantage of the fact that manatee-sitter does not immediately pick up new config, so we can use SAPI to write the new config file, but it will not take effect until after manatee is stopped and restarted.
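A sanity check before bouncing the sitter, using the json tool as elsewhere in this doc (this assumes oneNodeWriteMode is a top-level key in sitter.json; adjust the lookup path if it is nested):

```shell
# In the v2 primary zone: confirm config-agent has written the new value.
# This should print "false" before you restart manatee-sitter.
json oneNodeWriteMode < /opt/smartdc/manatee/etc/sitter.json
```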
We should now have a 2-machine manatee v2 cluster, and can check the stack in the usual way. If that's successful, the final step is to reprovision the old v1 manatee; no other special steps are required for that.
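For example, reusing the variables set up earlier ($PRIMARY is the old v1 primary's instance uuid):

```shell
# Reprovision the old v1 primary onto the v2 image; it will rejoin
# the cluster as a regular peer.
sapiadm reprovision $PRIMARY $MANATEE_IMAGE
```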
At any time before enabling the v2 manatee (step 6), rollback is straightforward:
- svcadm enable config-agent on the v1 primary; wait, then double-check that oneNodeWriteMode: false is in /opt/smartdc/manatee/etc/sitter.json
- svcadm restart manatee-sitter in the v1 primary, and wait for manatee-stat to stabilize
- svcadm enable manatee-sitter in the sync
- sapiadm reprovision $ASYNC $OLD_MANATEE_IMAGE to revive the async (it may also require a rebuild)
After enabling the v2 manatee, rollback requires a little more consideration, as we may have accepted writes that we are not prepared to discard. In this case, a "reverse brain transplant" could be attempted: sending a snapshot of the v2 manatee back to the v1 primary (or perhaps the v1 sync).
If all else fails, we can restore the v1 manatee from its pre-migration backup.
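A sketch of that last-resort restore, mirroring the backup commands above (run in the GZ of the old primary's CN; note this discards the dataset's current contents):

```shell
MANATEE_UUID=$(vmadm lookup alias=~manatee)
zlogin $MANATEE_UUID "svcadm disable manatee-sitter" < /dev/null
# Destroy the current dataset and receive the pre-migration backup.
zfs destroy -r zones/$MANATEE_UUID/data/manatee
zfs recv zones/$MANATEE_UUID/data/manatee < ./manatee-backup.zfs
zlogin $MANATEE_UUID "svcadm enable manatee-sitter" < /dev/null
```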