@jujaga
Last active March 2, 2020 19:00
Migrating Patroni from GlusterFS to NetApp Storage

These scripts help you migrate an existing Patroni cluster from GlusterFS to NetApp storage. The strategy relies on Patroni's ability to use Postgres WAL history archiving to replay transactions and maintain history lineage. Note that this gist makes a few assumptions and is not fully parameterized: it assumes that your Patroni cluster is named "patroni-master" and that you have a working knowledge of OpenShift and patronictl.

High Level Strategy

  1. Add temporary permissive NSP rules to the namespace
  2. Create a new temporary statefulset which will extend the existing statefulset cluster
  3. Force the master to fail over to the new temporary node
  4. Delete the original statefulset and its PVCs
  5. Recreate the original statefulset cluster with the new storage class request
  6. Force the master to fail over to one of the new cluster nodes
  7. Delete the temporary statefulset and its PVC
  8. Remove any temporary NSP rules from the namespace

Steps

Run the following:

export NAMESPACE=<YOURNAMESPACE>

./clone-statefulset-temp.sh $NAMESPACE patroni-master

This will create a temporary statefulset attached to the existing cluster. Follow the instructions printed on screen to force a failover to the temporary pod, and then delete the original statefulset and its PVCs.
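
The failover itself is a manual step. As a rough sketch only, the helper below prints (but does not run) the commands one might use for it. It assumes that `patronictl` is on the PATH inside the Patroni container and can find its configuration there without extra flags; the function name `print_failover_steps` and the fallback namespace `mynamespace` are hypothetical. Statefulset pods are named `<statefulset>-<ordinal>`, so the single temporary pod is `patroni-master-x-0`.

```shell
#!/bin/bash
set -Eeu
set -o pipefail

# Hypothetical helper: prints (does not run) the commands for a manual failover.
# Assumes patronictl works without extra flags inside the pod.
print_failover_steps() {
  local namespace="$1"   # OpenShift namespace
  local via_pod="$2"     # any running cluster pod to exec into
  local candidate="$3"   # pod that should become the new master

  # Inspect the current members and their roles first
  echo "oc -n ${namespace} exec -it ${via_pod} -- patronictl list"
  # Interactive failover; patronictl prompts for anything not supplied
  echo "oc -n ${namespace} exec -it ${via_pod} -- patronictl failover --candidate ${candidate}"
}

print_failover_steps "${NAMESPACE:-mynamespace}" patroni-master-0 patroni-master-x-0
```

Running `patronictl list` again after the failover is a simple way to confirm that the temporary pod now holds the master role before anything is deleted.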

Once everything is running on the temporary statefulset only, run the following:

export NAMESPACE=<YOURNAMESPACE>

./clone-statefulset-restore.sh $NAMESPACE patroni-master-x

This will reverse the process by recreating the original statefulset attached to the existing cluster. Follow the instructions printed on screen to force a failover to one of the re-established pods, and then delete the temporary statefulset and its PVC. You will also want to remove any temporary NSP rules that were added, using the following command:

export NAMESPACE=<YOURNAMESPACE>

oc delete -n $NAMESPACE nsp -l template=networksecurity-template

At the end of this process, the cluster should be running on NetApp block storage, without any downtime for your dependent applications.
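
One way to double-check the result is to look at the storage class of every remaining PVC. The sketch below is not part of the original scripts: `all_on_storage_class` is a hypothetical helper that reads `NAME STORAGECLASS` lines on stdin, and the sample input stands in for real cluster output. The PVC names follow the `<claim template>-<statefulset>-<ordinal>` pattern produced by the scripts.

```shell
#!/bin/bash
set -Eeu
set -o pipefail

# Hypothetical helper: reads "NAME STORAGECLASS" lines on stdin and succeeds
# only if there is at least one line and every line uses the expected class.
all_on_storage_class() {
  local expected="$1"
  awk -v want="$expected" '
    { seen = 1 }
    $2 != want { print "not migrated: " $1 " (" $2 ")"; bad = 1 }
    END { exit (bad || !seen) ? 1 : 0 }
  '
}

# Sample input standing in for real cluster output
printf '%s\n' \
  "postgresql-patroni-master-0 netapp-block-standard" \
  "postgresql-patroni-master-1 netapp-block-standard" \
  "postgresql-patroni-master-2 netapp-block-standard" \
  | all_on_storage_class netapp-block-standard && echo "all PVCs migrated"
```

Against a real namespace, the input could come from `oc -n "$NAMESPACE" get pvc -o 'custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName' --no-headers`, assuming the PVCs carry `spec.storageClassName` (as those created by these scripts do).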

Acknowledgements

A big thanks to @cvarjao for his initial script and pattern idea: https://github.com/cvarjao/platform-services/blob/patroni-supporting-scripts/apps/pgsql/patroni/scripts/clone-statefulset.sh

clone-statefulset-restore.sh

#!/bin/bash
set -Eeu
set -o pipefail
NAMESPACE="$1"
SOURCE_STATEFULSET="$2"
TARGET_STATEFULSET="patroni-master"
function waitUntilAllReady(){
  local NAME="$1"
  local IS_READY=0
  echo "Waiting for ${NAME} to become ready"
  # Poll every 2 seconds until the ready replica count matches the desired count
  while [ "$IS_READY" -eq 0 ]; do
    sleep 2
    oc -n "$NAMESPACE" get "${NAME}" -o 'custom-columns=DESIRED:.spec.replicas,READY:.status.readyReplicas' --no-headers | awk '{if ($1 == $2) exit 0; else exit 1 }' && IS_READY=1 || true
  done
}
echo "Clone from temporary Patroni statefulset back to full instance with 3 replicas"
# Export the temporary statefulset, rename it, scale it to 3 replicas, and request NetApp storage
oc -n "$NAMESPACE" get "statefulset/${SOURCE_STATEFULSET}" --export -o json \
  | jq --arg name "$TARGET_STATEFULSET" '
      .metadata.name = $name
      | .spec.selector.matchLabels."statefulset" = .metadata.name
      | .spec.template.metadata.labels."statefulset" = .metadata.name
      | ( .spec.volumeClaimTemplates[] | .metadata.name ) |= "postgresql"
      | ( .spec.template.spec.volumes[] | .persistentVolumeClaim.claimName ) |= "postgresql"
      | .spec.replicas = 3
      | del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration", .metadata.selfLink)
    ' \
  | jq '( .spec.volumeClaimTemplates[] | .spec.storageClassName ) |= "netapp-block-standard"' \
  | jq '( .spec.volumeClaimTemplates[] | .metadata.annotations."volume.beta.kubernetes.io/storage-class" ) |= "netapp-block-standard"' \
  | oc -n "$NAMESPACE" create -f - --save-config=true
waitUntilAllReady "statefulset/${TARGET_STATEFULSET}"
echo "Get onto cluster pods, and manually 'patronictl failover' to one of the main pods"
echo "Once failover is complete, manually run the following to remove the temporary statefulset. THIS IS DESTRUCTIVE!"
echo "oc delete -n $NAMESPACE statefulset/${SOURCE_STATEFULSET} --wait && oc delete -n $NAMESPACE pvc -l statefulset=${SOURCE_STATEFULSET} --wait"
echo "Finally, run the following to remove temporary NSP overrides to namespace"
echo "oc delete -n $NAMESPACE nsp -l template=networksecurity-template"

clone-statefulset-temp.sh

#!/bin/bash
set -Eeu
set -o pipefail
NAMESPACE="$1"
SOURCE_STATEFULSET="$2"
TARGET_STATEFULSET="${3:-${SOURCE_STATEFULSET}-x}"
function waitUntilAllReady(){
  local NAME="$1"
  local IS_READY=0
  echo "Waiting for ${NAME} to become ready"
  # Poll every 2 seconds until the ready replica count matches the desired count
  while [ "$IS_READY" -eq 0 ]; do
    sleep 2
    oc -n "$NAMESPACE" get "${NAME}" -o 'custom-columns=DESIRED:.spec.replicas,READY:.status.readyReplicas' --no-headers | awk '{if ($1 == $2) exit 0; else exit 1 }' && IS_READY=1 || true
  done
}
echo "Add temporary NSP overrides to namespace"
oc process -n "$NAMESPACE" -f https://raw.githubusercontent.com/wiki/bcgov/nr-get-token/assets/nsp.yaml -p NAMESPACE="$NAMESPACE" -o yaml | oc apply -n "$NAMESPACE" -f -
echo "Clone Patroni statefulset with one replica"
# Export the original statefulset, rename it to the target name, scale it to 1 replica, and request NetApp storage.
# The name comes from TARGET_STATEFULSET so that an optional third argument is honoured by both the clone and the wait below.
oc -n "$NAMESPACE" get "statefulset/${SOURCE_STATEFULSET}" --export -o json \
  | jq --arg name "$TARGET_STATEFULSET" '
      .metadata.name = $name
      | .spec.selector.matchLabels."statefulset" = .metadata.name
      | .spec.template.metadata.labels."statefulset" = .metadata.name
      | ( .spec.volumeClaimTemplates[] | .metadata.name ) |= "postgresql"
      | ( .spec.template.spec.volumes[] | .persistentVolumeClaim.claimName ) |= "postgresql"
      | .spec.replicas = 1
      | del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration", .metadata.selfLink)
    ' \
  | jq '( .spec.volumeClaimTemplates[] | .spec.storageClassName ) |= "netapp-block-standard"' \
  | jq '( .spec.volumeClaimTemplates[] | .metadata.annotations."volume.beta.kubernetes.io/storage-class" ) |= "netapp-block-standard"' \
  | oc -n "$NAMESPACE" create -f - --save-config=true
waitUntilAllReady "statefulset/${TARGET_STATEFULSET}"
echo "Get onto cluster pods, and manually 'patronictl failover' to the temporary pod"
echo "Once failover is complete, manually run the following to remove the original statefulset. THIS IS DESTRUCTIVE!"
echo "oc delete -n $NAMESPACE statefulset/${SOURCE_STATEFULSET} --wait && oc delete -n $NAMESPACE pvc -l statefulset=${SOURCE_STATEFULSET} --wait"
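
The `waitUntilAllReady` loop in both scripts polls indefinitely, so a pod that never becomes ready leaves the script hanging. A possible bounded variant is sketched below; it is not part of the original scripts. The names `replicas_ready`, `wait_until_ready`, and `fake_status` are hypothetical, and `fake_status` stands in for the `oc` query so the sketch runs without a cluster.

```shell
#!/bin/bash
set -Eeu
set -o pipefail

# Succeeds when the first "DESIRED READY" line on stdin has two equal numeric
# fields. An unset readyReplicas is printed as "<none>" and does not match.
replicas_ready() {
  awk 'NR == 1 { ok = ($1 == $2 && $1 ~ /^[0-9]+$/) } END { exit ok ? 0 : 1 }'
}

# Hypothetical bounded variant of waitUntilAllReady: runs the given status
# command every INTERVAL seconds and gives up after MAX_TRIES attempts.
wait_until_ready() {
  local max_tries="$1" interval="$2"
  shift 2
  local try
  for ((try = 1; try <= max_tries; try++)); do
    if "$@" | replicas_ready; then
      return 0
    fi
    sleep "$interval"
  done
  echo "Timed out after ${max_tries} attempts" >&2
  return 1
}

# Stand-in for the oc query so the sketch runs without a cluster
fake_status() { echo "3 3"; }
wait_until_ready 5 0 fake_status && echo "ready"
```

With a real cluster, the status command would be the same query the scripts already use, for example `wait_until_ready 150 2 oc -n "$NAMESPACE" get statefulset/patroni-master -o 'custom-columns=DESIRED:.spec.replicas,READY:.status.readyReplicas' --no-headers`, which gives up after roughly five minutes.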