Ceph Service Recovery Procedure

I've been playing with Rook Ceph and was able to helm uninstall it (all the K8s bits, including the Ceph CRDs) and then install it back again without data loss, while Pods were still using the persistent storage (RBD).
The impact: Akash deployments using persistent storage disks will hang until the Ceph services are restored.

The key locations which need to be preserved are:

/var/lib/rook/* isn't removed when you uninstall the akash-rook helm chart

  • /var/lib/rook/mon-a;
  • /var/lib/rook/rook-ceph;
  • rook-ceph-mon secret;
  • rook-ceph-mon-a original service's IP address;
  • rook-ceph-mon-endpoints (as it needs to be set to the original Ceph Mon service IP);

Important: if you have more than one Ceph Mon, take mon-b, mon-c, ... into account as well (for the service IP only, since the monitors share the same key).
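
If you still have cluster access before the uninstall, a minimal backup sketch of these locations could look like this (the output paths under /root are arbitrary; adjust the mon-a name if your monitor IDs differ):

# back up the on-disk Ceph state (run on the node hosting the monitor)
tar czf /root/rook-var-lib-backup.tar.gz /var/lib/rook/mon-a /var/lib/rook/rook-ceph
# export the K8s objects holding the mon key, the original service IP and the mon endpoints
kubectl -n rook-ceph get secret rook-ceph-mon -o yaml > /root/rook-ceph-mon.secret.yaml
kubectl -n rook-ceph get svc rook-ceph-mon-a -o yaml > /root/rook-ceph-mon-a.svc.yaml
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml > /root/rook-ceph-mon-endpoints.cm.yaml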

The process to restore Ceph after an intentional or unintentional helm uninstall -n akash-services akash-rook:

The key rule: the data on disk is right; the data in K8s is wrong.

1. Reconstruct rook-ceph-mon secret

# mon-a dir is available on the node which was running one of Ceph Monitors (either is good, they share the same key)
MON_KEY="$(cat /var/lib/rook/mon-a/data/keyring | grep key | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"

CLIENT_ADMIN_PASS="$(cat /var/lib/rook/rook-ceph/client.admin.keyring | grep key | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"
CEPH_CLUSTER_FSID="$(grep fsid /var/lib/rook/rook-ceph/rook-ceph.config | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"

kubectl -n rook-ceph patch secret rook-ceph-mon -p '{"data":{"ceph-secret":"'"${CLIENT_ADMIN_PASS}"'", "ceph-username":"Y2xpZW50LmFkbWlu", "fsid":"'"${CEPH_CLUSTER_FSID}"'", "mon-secret":"'"${MON_KEY}"'"}}'
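
To sanity-check the patched secret, you can decode it back and compare against the files on disk; a quick verification sketch (not part of the original procedure):

# the decoded fsid should match the one in rook-ceph.config
kubectl -n rook-ceph get secret rook-ceph-mon -o jsonpath='{.data.fsid}' | base64 -d; echo
grep fsid /var/lib/rook/rook-ceph/rook-ceph.config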

2. Find the original Ceph Mon service IP

You can also see which IPs the RBD clients are connected to by running grep ^ /sys/devices/rbd/*/* on the client nodes.

Find the mon host IP:

# cat /var/lib/rook/rook-ceph/rook-ceph.config
[global]
fsid                           = d0e99a74-3127-4b99-91cc-500b701805ad
mon initial members            = a
mon host                       = [v2:10.233.60.225:3300,v1:10.233.60.225:6789]
...

In my case the Ceph Mon "a" has this IP: 10.233.60.225 (and the new, wrong IP was 10.233.19.44).

On a system with three Ceph Monitors the output will look like this:

mon initial members            = a b c
mon host                       = [v2:10.232.72.99:3300,v1:10.232.72.99:6789],[v2:10.232.172.25:3300,v1:10.232.172.25:6789],[v2:10.232.1.25:3300,v1:10.232.1.25:6789]
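
If you prefer to pull the address out programmatically, a sketch assuming GNU grep (it picks the first IPv4 address from the mon host line, which belongs to mon "a" in the examples above):

MON_A_IP="$(grep 'mon host' /var/lib/rook/rook-ceph/rook-ceph.config | grep -oE '[0-9]+(\.[0-9]+){3}' | head -1)"
echo "${MON_A_IP}"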

3. Patch the Ceph Mon "A" service IP

kubectl -n rook-ceph patch svc rook-ceph-mon-a --type merge -p '{"spec":{"clusterIP":"10.233.60.225","clusterIPs":["10.233.60.225"]}}' --dry-run=client -o yaml | kubectl replace --force -f -

For Mon "B" change -mon-a to -mon-b.
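
To confirm the service now carries the expected address (a quick check using the IP found in step 2):

kubectl -n rook-ceph get svc rook-ceph-mon-a -o jsonpath='{.spec.clusterIP}'; echo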

4. Patch rook-ceph-mon-endpoints ConfigMap

kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml | sed -E 's~(a=)[^ :;]+~\1'10.233.60.225'~' | kubectl apply -f -

For Mon "B" change (a=) to (b=).
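
The result can be verified by printing the ConfigMap's data key; the key name below is what current Rook releases use, so adjust if yours differs:

kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o jsonpath='{.data.data}'; echo
# expected output along the lines of: a=10.233.60.225:6789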

5. Bounce the Ceph Mon

kubectl -n rook-ceph delete pod -l "app=rook-ceph-mon,ceph_daemon_id=a"

For Mon "B" change ceph_daemon_id=a to ceph_daemon_id=b.
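
You can then wait for the recreated mon pod to become Ready before moving on; a convenience sketch (watching kubectl get pod -w works just as well):

kubectl -n rook-ceph wait --for=condition=Ready pod -l "app=rook-ceph-mon,ceph_daemon_id=a" --timeout=120s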

6. Repeat for the rest of Ceph Monitors

Repeat the above steps for Ceph Mon "B", "C", etc., if you have more than one Ceph Mon.
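
For clusters with several monitors, the per-mon steps (3-5) can be scripted. A rough sketch, using the mon "b"/"c" addresses from the three-monitor example above as placeholders (replace them with your own original IPs from rook-ceph.config):

# placeholder IPs -- replace with the original addresses from rook-ceph.config
declare -A MON_IPS=( [b]=10.232.172.25 [c]=10.232.1.25 )
for m in "${!MON_IPS[@]}"; do
  ip="${MON_IPS[$m]}"
  kubectl -n rook-ceph patch svc "rook-ceph-mon-${m}" --type merge \
    -p '{"spec":{"clusterIP":"'"${ip}"'","clusterIPs":["'"${ip}"'"]}}' --dry-run=client -o yaml | kubectl replace --force -f -
  kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml | sed -E "s~(${m}=)[^ :;]+~\1${ip}~" | kubectl apply -f -
  kubectl -n rook-ceph delete pod -l "app=rook-ceph-mon,ceph_daemon_id=${m}"
done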

7. Bounce Ceph Operator & Ceph Tools

kubectl -n rook-ceph delete pod -l "app=rook-ceph-operator"
kubectl -n rook-ceph delete pod -l "app=rook-ceph-tools" 
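
Optionally wait for the operator to come back before checking the cluster state (a convenience sketch):

kubectl -n rook-ceph wait --for=condition=Ready pod -l "app=rook-ceph-operator" --timeout=180s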

8. Check if things look good

kubectl -n rook-ceph get cephclusters
kubectl -n rook-ceph describe cephclusters

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
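# the following ceph commands are run inside the rook-ceph-tools (toolbox) pod shell: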
ceph status
ceph health detail
ceph osd tree
ceph df
ceph osd df
kubectl get events --sort-by='.metadata.creationTimestamp' -A
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-mon,ceph_daemon_id=a" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
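
If you prefer one-shot checks without an interactive shell, the same ceph commands can also be passed straight through kubectl exec, e.g.:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status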

That's all. The Rook Ceph Operator will start the OSDs back up again, and the Akash deployments using persistent storage will automagically unfreeze.
