This procedure removes all OSDs from a selected storage node, in an environment with more than a single storage node in the cluster and enough free disk space.
If you have only a single storage node, then you'll have to remove disk by disk.
If you have only a single disk, then you'll have to remove OSD by OSD, reclaiming the freed disk space in the VG (if that is your case) using this and that hints, or waiting until rook-ceph supports this.
- Scale osdsPerDevice from 3 to 1 and apply the rook-ceph-cluster helm chart;
- Make sure the cluster is healthy:
kubectl -n rook-ceph get cephclusters
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
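For the osdsPerDevice change above, assuming the standard rook-ceph-cluster chart layout where cephClusterSpec mirrors the CephCluster spec (the release name and chart repo below are placeholders, adjust them to yours), the upgrade could look like this:
helm -n rook-ceph upgrade rook-ceph-cluster rook-release/rook-ceph-cluster --reuse-values --set-string cephClusterSpec.storage.config.osdsPerDevice=1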
Scale down the Rook operator so it does not recreate the OSD deployments you are about to remove:
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
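Confirm the operator has actually stopped before proceeding (READY should show 0/0):
kubectl -n rook-ceph get deployment rook-ceph-operator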
This command can be useful if you already know which disks you want to remove from Ceph; it maps each device to its OSD ID (replace the FSID in the /var/lib/rook path with your cluster's):
lsblk -J -n -o NAME /dev/nvme11n1 /dev/nvme10n1 /dev/nvme9n1 /dev/nvme8n1 /dev/nvme7n1 /dev/nvme6n1 | jq -r '.blockdevices[].children[].name' | sed 's/.*block--//g' | sed 's/--/-/g' | xargs -I@ grep ^ /var/lib/rook/rook-ceph/3184888a-5060-4684-a6c2-32509397948a_@/whoami | xargs
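The long FSID in the /var/lib/rook path is cluster-specific; a slightly more portable sketch looks it up via the toolbox first (it runs on the storage node itself and assumes kubectl is usable there and the default /var/lib/rook host path):
TOOLS=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
FSID=$(kubectl -n rook-ceph exec -i $TOOLS -- ceph fsid)
lsblk -J -n -o NAME /dev/<DISK1> /dev/<DISK2> | jq -r '.blockdevices[].children[].name' | sed 's/.*block--//g' | sed 's/--/-/g' | xargs -I@ grep ^ /var/lib/rook/rook-ceph/${FSID}_@/whoami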
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd tree
Say you want to remove all OSDs from a specific node:
$ kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-13 27.94556 host <YOUR-NODE>
8 ssd 2.32880 osd.8 up 1.00000 1.00000
18 ssd 2.32880 osd.18 up 1.00000 1.00000
28 ssd 2.32880 osd.28 up 1.00000 1.00000
38 ssd 2.32880 osd.38 up 1.00000 1.00000
48 ssd 2.32880 osd.48 up 1.00000 1.00000
58 ssd 2.32880 osd.58 up 1.00000 1.00000
68 ssd 2.32880 osd.68 up 1.00000 1.00000
77 ssd 2.32880 osd.77 up 1.00000 1.00000
87 ssd 2.32880 osd.87 up 1.00000 1.00000
97 ssd 2.32880 osd.97 up 1.00000 1.00000
107 ssd 2.32880 osd.107 up 1.00000 1.00000
117 ssd 2.32880 osd.117 up 1.00000 1.00000
Export the list of OSD IDs and mark them out, so Ceph starts draining data off of them:
export OSDS="117 107 97 87 77 68 58 48 38 28 18 8"
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd out $OSDS
Wait until Ceph settles:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
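You can also watch the data drain off the outed OSDs; their use should drop toward zero as PGs move away:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd df tree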
Verify the OSDs are safe to destroy first:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd safe-to-destroy $OSDS
You should see this message: OSD(s) 8,18,... are safe to destroy without reducing data durability.
Stop here if you do not see it; rebalancing might take some time. If the safe-to-destroy message still does not appear, roll back the change by setting the OSDs back in (e.g. ceph osd in $OSDS).
You might need to go disk-by-disk or OSD-by-OSD instead.
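For completeness, the full rollback through the toolbox looks like this:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd in $OSDS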
Verify the OSDs you are going to remove are on the right node:
for i in $OSDS; do kubectl -n rook-ceph get pods -l "app=rook-ceph-osd,osd=$i" -o wide; done
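Alternatively, list all OSD pods at once and eyeball the NODE column:
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide | grep <YOUR-NODE>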
Remove the OSD deployments (this will bring these OSDs down):
for i in $OSDS; do kubectl -n rook-ceph delete deployment -l "app=rook-ceph-osd,osd=$i"; done
Check OSDs are down now:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd tree
If for whatever reason the OSDs are still reported as up even after you have removed the deployments (and with the Ceph operator not running), you can mark them down manually:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd down $OSDS
Purge the OSDs from the cluster (this removes them from the CRUSH map along with their auth entries):
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- sh -c "for i in $OSDS; do ceph osd purge \$i --force; done"
If these were the last OSDs and you aren't planning on using that host ever again, you can remove it from the crush map:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd crush rm <old-host>
Look for the disks with the ceph_bluestore FSTYPE:
lsblk -f
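For example, to narrow the output down to the BlueStore devices only:
lsblk -f | grep ceph_bluestore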
If LVM was used, close the now-inactive LVs (the OSDs have been stopped and purged) and remove their VG / PV:
pvs
vgs
lvs
pvdisplay --noheadings -C -o pv_name,vg_name,pv_attr,vg_attr,lv_attr
lvdisplay --noheadings -C -o lv_name,vg_name,lv_attr
lvdisplay --noheadings -C -o lv_name,vg_name,lv_attr | grep ceph | grep -v -- '-ao-' | while read lv vg attr; do lvremove -y $vg/$lv; done
vgs
vgremove <empty vg name>
pvdisplay --noheadings -C -o pv_name,vg_name,pv_attr
pvremove /dev/<DISK>
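If there are many Ceph VGs and orphaned PVs to clean up, roughly the same can be scripted (a sketch; review the vgs / pvs output first, since this removes every VG with "ceph" in its name and every PV left without a VG):
vgs --noheadings -o vg_name | grep ceph | xargs -r -n1 vgremove -y
pvs --noheadings -o pv_name,vg_name | awk 'NF==1 {print $1}' | xargs -r pvremove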
lsblk -f
Wipe the FS signatures off of the disk so it can be reused by Ceph again, if you wish so:
wipefs -a /dev/<DISK>
Scale the Rook operator back up:
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
After a few moments Ceph will automatically create new OSDs over the previously purged disks (if you have wipefs'ed them or zeroed them with the dd command [the first 50Mi is usually enough]).
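If you prefer dd over wipefs, zeroing the beginning of the disk as mentioned above would look like this (destructive; double-check the device name):
dd if=/dev/zero of=/dev/<DISK> bs=1M count=50 oflag=direct,dsync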
- Wait for the Ceph Operator to settle down:
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" -o jsonpath='{.items[0].metadata.name}') --tail=10 -f
- Check the OSDs and cluster health using the commands from above.
If rook-ceph fails creating a new OSD with an entity osd.<id> exists but key does not match error, then it is likely because you have scaled the Rook operator down too early, leaving ceph auth leftovers.
Simply compare ceph auth ls vs ceph osd tree and ceph auth rm osd.<id> for those that are stale.
Clean the related disks again and bounce the Rook Ceph operator.
If you still have the list of OSDs to remove exported as the OSDS variable, you can automate the stale ceph auth removal using this command:
kubectl -n rook-ceph exec -i $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- sh -c "for i in $OSDS; do ceph auth rm osd.\$i; done"
Ref. rook/rook#10872
For those who are wondering how to run ceph-bluestore-tool in a Kubernetes rook-ceph environment, you could modify the deployment of your OSD to add it as an initContainer:
- name: bluefs-bdev-expand
  image: quay.io/ceph/ceph:v17.2.3
  command:
    - ceph-bluestore-tool
  args:
    - 'bluefs-bdev-expand'
    - '--path'
    - /var/lib/ceph/osd/ceph-<ID>
  resources: {}
  volumeMounts:
    - name: rook-data
      mountPath: /var/lib/rook
    - name: rook-config-override
      readOnly: true
      mountPath: /etc/ceph
    - name: rook-ceph-log
      mountPath: /var/log/ceph
    - name: rook-ceph-crash
      mountPath: /var/lib/ceph/crash
    - name: devices
      mountPath: /dev
    - name: run-udev
      mountPath: /run/udev
    - name: activate-osd
      mountPath: /var/lib/ceph/osd/ceph-<ID>
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  imagePullPolicy: IfNotPresent
  securityContext:
    privileged: true
    runAsUser: 0
    readOnlyRootFilesystem: false
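Keep in mind the Rook operator may revert manual edits to the OSD deployments when it reconciles, so scale it down first and then edit the deployment in place:
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
kubectl -n rook-ceph edit deployment rook-ceph-osd-<ID>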
Refs.