I've been playing with Rook Ceph and found that I can helm uninstall it (all the K8s bits, including the Ceph CRDs) and install it back again without data loss, while the Pods keep using the persistent storage (the RBDs).
The impact: Akash deployments using persistent storage disks will hang until the Ceph services are restored.
The key locations which need to be preserved are:
- /var/lib/rook/* (this directory isn't removed when you uninstall the akash-rook helm chart), in particular:
  - /var/lib/rook/mon-a
  - /var/lib/rook/rook-ceph
- the rook-ceph-mon secret;
- the rook-ceph-mon-a service's original IP address;
- the rook-ceph-mon-endpoints ConfigMap (as it needs to be set to the original Ceph Mon service IP).

Important: if you have more than one Ceph Mon, take mon-b, mon-c, ... into account as well (for the service IP only, since the monitors share the same key).
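Before an intentional uninstall it is worth snapshotting that on-disk state up front. A minimal sketch (the layout is the default Rook one; when not run on a node, the fallback below fabricates a placeholder tree with fake data so the script runs anywhere):

```shell
# Sketch of a pre-uninstall snapshot. On a real node you would set
# ROOK_DIR=/var/lib/rook; the fallback fabricates a placeholder tree
# (fake data) so the sketch is runnable off-node.
ROOK_DIR="${ROOK_DIR:-}"
if [ -z "$ROOK_DIR" ]; then
  ROOK_DIR="$(mktemp -d)/rook"
  mkdir -p "$ROOK_DIR/mon-a/data" "$ROOK_DIR/rook-ceph"
  echo 'placeholder keyring' > "$ROOK_DIR/mon-a/data/keyring"
  echo 'placeholder config'  > "$ROOK_DIR/rook-ceph/rook-ceph.config"
fi

# Archive the whole tree: mon-a (mon keyring + store) and rook-ceph
# (client.admin keyring plus rook-ceph.config with fsid and mon host IPs).
BACKUP="$(mktemp -d)/rook-backup.tar.gz"
tar czf "$BACKUP" -C "$(dirname "$ROOK_DIR")" "$(basename "$ROOK_DIR")"
tar tzf "$BACKUP"
```

On top of the files, you would normally also dump the K8s side, e.g. kubectl -n rook-ceph get secret rook-ceph-mon -o yaml and the rook-ceph-mon-endpoints ConfigMap, so the original values are at hand.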
The process to restore Ceph after an intentional or unintentional helm uninstall -n akash-services akash-rook:
The key rule: the data on the disk is right; the data in K8s is wrong.
# mon-a dir is available on the node which was running one of Ceph Monitors (either is good, they share the same key)
MON_KEY="$(cat /var/lib/rook/mon-a/data/keyring | grep key | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"
CLIENT_ADMIN_PASS="$(cat /var/lib/rook/rook-ceph/client.admin.keyring | grep key | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"
CEPH_CLUSTER_FSID="$(grep fsid /var/lib/rook/rook-ceph/rook-ceph.config | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"
kubectl -n rook-ceph patch secret rook-ceph-mon -p '{"data":{"ceph-secret":"'"${CLIENT_ADMIN_PASS}"'", "ceph-username":"Y2xpZW50LmFkbWlu", "fsid":"'"${CEPH_CLUSTER_FSID}"'", "mon-secret":"'"${MON_KEY}"'"}}'
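To sanity-check those extraction pipelines before patching the secret, you can run the exact same commands against a throwaway keyring (the key below is a made-up placeholder, not a real Ceph key):

```shell
# Fake keyring with the same layout as /var/lib/rook/mon-a/data/keyring.
KEYRING="$(mktemp)"
cat > "$KEYRING" <<'EOF'
[mon.]
  key = AQBfakefakefakefakefakefakefakefakefakeQQ==
EOF

# Same pipeline as above: last field of the "key" line, newline stripped,
# base64-encoded the way the K8s secret's data field expects it.
MON_KEY="$(cat "$KEYRING" | grep key | awk '{print $NF}' | tr -d '\n' | openssl base64 -A)"

# Round-trip check: decoding must give back the raw key.
printf '%s' "$MON_KEY" | openssl base64 -d -A
```

The same round-trip works for client.admin.keyring and for the fsid line of rook-ceph.config.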
You can also see which IPs the RBD clients are connected to by running grep ^ /sys/devices/rbd/*/* on the client nodes.
Find the mon host IP:
# cat /var/lib/rook/rook-ceph/rook-ceph.config
[global]
fsid = d0e99a74-3127-4b99-91cc-500b701805ad
mon initial members = a
mon host = [v2:10.233.60.225:3300,v1:10.233.60.225:6789]
...
In my case the Ceph Mon "a" has this IP: 10.233.60.225 (and the new, wrong IP was 10.233.19.44).
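That IP can also be pulled out mechanically. A sketch, assuming the config layout shown above (the demo writes a copy of that snippet so it runs off-node; on a node, point CONF at /var/lib/rook/rook-ceph/rook-ceph.config instead):

```shell
# Copy of the example config from above (same fsid and mon host line).
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
[global]
fsid                = d0e99a74-3127-4b99-91cc-500b701805ad
mon initial members = a
mon host            = [v2:10.233.60.225:3300,v1:10.233.60.225:6789]
EOF

# Take the first v1 endpoint of the "mon host" line and drop the port.
MON_IP="$(grep 'mon host' "$CONF" | grep -oE 'v1:[0-9.]+' | head -1 | cut -d: -f2)"
echo "$MON_IP"
```

With several monitors the v1 entries appear in the order of "mon initial members", so the second match belongs to Mon "b", and so on.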
On a system with three Ceph Monitors the output will be the following:
mon initial members = a b c
mon host = [v2:10.232.72.99:3300,v1:10.232.72.99:6789],[v2:10.232.172.25:3300,v1:10.232.172.25:6789],[v2:10.232.1.25:3300,v1:10.232.1.25:6789]
kubectl -n rook-ceph patch svc rook-ceph-mon-a --type merge -p '{"spec":{"clusterIP":"10.233.60.225","clusterIPs":["10.233.60.225"]}}' --dry-run=client -o yaml | kubectl replace --force -f -
For Mon "b", change rook-ceph-mon-a to rook-ceph-mon-b.
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml | sed -E 's~(a=)[^ :;]+~\110.233.60.225~' | kubectl apply -f -
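The substitution itself can be checked offline against a sample of the ConfigMap's data value (the IPs are the example ones from this walkthrough; the regex stops at the colon, so the port and the other mons' entries stay untouched):

```shell
# Sample rook-ceph-mon-endpoints "data" value holding the new (wrong) IP.
old='a=10.233.19.44:6789,b=10.233.19.45:6789'

# The mon "a" rewrite: replace the IP, keep the :port suffix and mon "b".
new="$(printf '%s\n' "$old" | sed -E 's~(a=)[^ :;]+~\110.233.60.225~')"
echo "$new"
```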
For Mon "b", change (a=) to (b=).
kubectl -n rook-ceph delete pod -l "app=rook-ceph-mon,ceph_daemon_id=a"
For Mon "b", change ceph_daemon_id=a to ceph_daemon_id=b.
Repeat the above steps for Ceph Mon "b", "c", ... if you have more than one Ceph Mon.
Restart the Rook operator and the toolbox pods:
kubectl -n rook-ceph delete pod -l "app=rook-ceph-operator"
kubectl -n rook-ceph delete pod -l "app=rook-ceph-tools"
Verify the CephCluster resource:
kubectl -n rook-ceph get cephclusters
kubectl -n rook-ceph describe cephclusters
Exec into the toolbox pod and check the Ceph health:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
ceph status
ceph health detail
ceph osd tree
ceph df
ceph osd df
If something looks off, watch the events and the mon/operator logs:
kubectl get events --sort-by='.metadata.creationTimestamp' -A
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-mon,ceph_daemon_id=a" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
That's all. The Rook Ceph operator will bring the OSDs back up, and the Akash deployments using persistent storage will automagically unfreeze.