@jbw976
Last active August 22, 2021 22:34
OSD removal workaround
Remove the k8s-nvme-01.acme.org node from Rook orchestration
Delete the OSD replica set; this also stops/kills the OSD pod:
kubectl -n rook delete replicaset rook-ceph-osd-k8s-nvme-01.acme.org
Remove the entry for the k8s-nvme-01.acme.org node from the orchestration status map:
kubectl -n rook edit cm rook-ceph-osd-orchestration-status
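If in doubt about which entry to remove, the config map data can be inspected first; the entry should be the one keyed by the node name (an assumption based on this cluster's naming):
kubectl -n rook get cm rook-ceph-osd-orchestration-status -o yaml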
Delete all of the node's OSD config maps, rook-ceph-osd-XX-fs-backup, one per OSD ID (93, 94, 95, 96, 97, 98, 99, 100, 101). Example:
kubectl -n rook delete cm rook-ceph-osd-93-fs-backup
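Since the OSD IDs are contiguous, a shell loop can delete all nine config maps in one go (a sketch, assuming the rook namespace and the naming above):
for i in $(seq 93 101); do kubectl -n rook delete cm rook-ceph-osd-${i}-fs-backup; done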
Delete the node's config map:
kubectl -n rook delete cm rook-ceph-osd-k8s-nvme-01.acme.org-config
In the Rook toolbox, purge all of the node's OSDs (93 through 101) from Ceph's bookkeeping too. Purging an OSD means removing it from the CRUSH map (osd crush rm), deleting its auth key (auth del), and removing the OSD itself (osd rm). For osd.93:
ceph osd crush rm osd.93
ceph auth del osd.93
ceph osd rm 93
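The same three commands can be looped over all of the node's OSD IDs from inside the toolbox (a sketch):
for i in $(seq 93 101); do
  ceph osd crush rm osd.${i}
  ceph auth del osd.${i}
  ceph osd rm ${i}
done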
In the toolbox, remove the entire node from the CRUSH map:
ceph osd crush rm k8s-nvme-01-acme-org
On the node, clean up any local host data path folders (XX again stands for each OSD ID):
rm -fr /var/lib/rook/osdXX
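Or, looping over the nine OSD IDs (a sketch, assuming the default /var/lib/rook data dir path shown above):
for i in $(seq 93 101); do rm -fr /var/lib/rook/osd${i}; done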
Edit the cluster CRD to add the k8s-nvme-01.acme.org node and its disks back in.
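The exact layout depends on the Rook version, but the relevant part of the cluster CRD should be the storage nodes section; a minimal sketch, where the device names (nvme0n1, nvme1n1) are hypothetical examples:
storage:
  nodes:
  - name: "k8s-nvme-01.acme.org"
    devices:              # hypothetical device names for this node's disks
    - name: "nvme0n1"
    - name: "nvme1n1"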
Restart the Rook operator and make sure cluster orchestration runs fully to completion
Wait for the node to be fully added back to the cluster and for the cluster to become healthy and rebalanced
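From the toolbox, recovery progress can be watched until Ceph reports HEALTH_OK and all PGs are active+clean (a sketch):
ceph status
ceph osd tree   # the node and its nine OSDs should reappear here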
chrisjsimpson commented Aug 22, 2021

Restart the Rook operator and make sure cluster orchestration runs fully to completion

# Restart the operator to ensure devices are configured. A new pod will automatically be started when the current operator pod is deleted.
kubectl -n rook-ceph delete pod -l app=rook-ceph-operator

[...] see Ceph common issues
