Impact: Akash deployments using persistent storage will temporarily stall, since their I/O to the RBD-mounted devices will be stuck for the duration of the procedure.
Record which nodes the Ceph mons are currently running on; this will be needed in later steps:
kubectl -n rook-ceph get pods -l "app=rook-ceph-mon" -o wide
Example:
$ kubectl -n rook-ceph get pods -l "app=rook-ceph-mon" -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-mon-a-5d987bf6bf-cnwmj 1/1 Running 0 30d 10.233.188.43 k8s-node-5.provider-0.prod.ams1 <none> <none>
rook-ceph-mon-b-5f4d9dd5cb-tr5jr 1/1 Running 0 30d 10.233.40.12 k8s-node-6.provider-0.prod.ams1 <none> <none>
rook-ceph-mon-c-857b7f649-78q54 1/1 Running 0 30d 10.233.232.37 k8s-node-7.provider-0.prod.ams1 <none> <none>
Back up the following Ceph mon resources:
- configmap:
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o json | jq -r 'del(.metadata.resourceVersion, .metadata.uid, .metadata.selfLink, .metadata.creationTimestamp, .metadata.annotations, .metadata.generation, .metadata.ownerReferences)' > ceph-mon-cm.json
- secrets:
kubectl -n rook-ceph get secret rook-ceph-mon -o json | jq -r 'del(.metadata.resourceVersion, .metadata.uid, .metadata.selfLink, .metadata.creationTimestamp, .metadata.annotations, .metadata.generation, .metadata.ownerReferences)' > ceph-mon-secret.json
- service IPs:
kubectl -n rook-ceph get svc -l ceph_daemon_type=mon -o json | jq -r 'del(.items[].metadata.resourceVersion, .items[].metadata.uid, .items[].metadata.selfLink, .items[].metadata.creationTimestamp, .items[].metadata.annotations, .items[].metadata.generation, .items[].metadata.ownerReferences)' > ceph-mon-svc.json
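As a quick sanity check (assuming the files were written to the current directory), confirm each backup file contains the expected object kind:
for f in ceph-mon-cm.json ceph-mon-secret.json ceph-mon-svc.json; do echo "$f: $(jq -r .kind "$f")"; done
This should print ConfigMap, Secret, and List respectively.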
You should have the rook.yaml file with these values, but if you have lost it, you can still see the values using this command:
helm -n akash-services get values akash-rook
Example:
useAllDevices: false
deviceFilter: "^nvme[01]n1"
osdsPerDevice: 5
persistent_storage:
  class: beta3
nodes:
  - name: "k8s-node-5.provider-0.prod.ams1"
    config: ""
  - name: "k8s-node-6.provider-0.prod.ams1"
    config: ""
  - name: "k8s-node-7.provider-0.prod.ams1"
    config: ""
- update deviceFilter to match your disks (you can verify the match as shown below);
- change the storageClass name from beta3 to the one you are planning to use based on this table;
- update osdsPerDevice based on this table;
- add the nodes whose disks you want Ceph to use under the nodes section.
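To confirm the deviceFilter actually matches the intended disks, you can list the block devices on each storage node (a quick check, assuming NVMe drives as in the example above):
lsblk -d -o NAME,SIZE,TYPE,MODEL | grep -E '^nvme[01]n1'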
Note that the downstream akash-rook helm chart had both the default pool size and min_size set to 1, which is not the best production configuration.
Refer to https://docs.akash.network/providers/build-a-cloud-provider/helm-based-provider-persistent-storage-enablement/deploy-persistent-storage for the production configuration.
Given that the failureDomain is host, increasing size and min_size from 1 to higher values might not work for you if you do not have at least 3 hosts with storage dedicated to Ceph. In that case, simply set these values back to 1 until you have figured out your underlying architecture.
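You can inspect the current size / min_size of the pools at any time through the rook-ceph-tools pod (assuming the toolbox is running, as enabled later in this guide):
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool ls detail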
Example:
cat > rook-ceph-cluster.values.yml << 'EOF'
operatorNamespace: rook-ceph

configOverride: |
  [global]
  osd_pool_default_pg_autoscale_mode = on
  osd_pool_default_size = 3
  osd_pool_default_min_size = 2

cephClusterSpec:
  #resources:
  cephVersion:
    # https://quay.io/repository/ceph/ceph?tab=tags&tag=latest
    # IMPORTANT:
    # - the upstream rook-ceph uses ceph v16, however the downstream akash-rook chart brought ceph v17 and you can't downgrade it back to v16.
    # - ceph v17 will be the default in rook-ceph charts v1.10, so stick to v17 manually until then:
    image: quay.io/ceph/ceph:v17.2.0

  mon:
    count: 3
  mgr:
    count: 2

  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^nvme[01]n1"
    config:
      osdsPerDevice: "2"
    nodes:
      - name: "k8s-node-5.provider-0.prod.ams1"
        config:
      - name: "k8s-node-6.provider-0.prod.ams1"
        config:
      - name: "k8s-node-7.provider-0.prod.ams1"
        config:

cephBlockPools:
  - name: akash-deployments
    spec:
      failureDomain: host
      replicated:
        size: 3
      parameters:
        min_size: "2"
        bulk: "true"
    storageClass:
      enabled: true
      name: beta3
      isDefault: true
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      parameters:
        # RBD image format. Defaults to "2".
        imageFormat: "2"
        # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
        imageFeatures: layering
        # The secrets contain Ceph admin credentials.
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
        # Specify the filesystem type of the volume. If not specified, csi-provisioner
        # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
        # in hyperconverged settings where the volume is mounted on the same node as the osds.
        csi.storage.k8s.io/fstype: ext4

  - name: akash-nodes
    spec:
      failureDomain: host
      replicated:
        size: 3
      parameters:
        min_size: "2"
    storageClass:
      enabled: true
      name: akash-nodes
      isDefault: false
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      parameters:
        # RBD image format. Defaults to "2".
        imageFormat: "2"
        # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
        imageFeatures: layering
        # The secrets contain Ceph admin credentials.
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
        # Specify the filesystem type of the volume. If not specified, csi-provisioner
        # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
        # in hyperconverged settings where the volume is mounted on the same node as the osds.
        csi.storage.k8s.io/fstype: ext4

# Do not create default Ceph file systems, object stores
cephFileSystems:
cephObjectStores:

# Spawn rook-ceph-tools, useful for troubleshooting
toolbox:
  enabled: true
  #resources:
EOF
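Before applying it, you can dry-run render the chart with these values to catch YAML mistakes early (a sanity check; assumes the rook-release helm repo is already added, as used in the install commands below):
helm template -n rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster --version 1.9.4 -f rook-ceph-cluster.values.yml > /dev/null && echo OK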
helm uninstall -n akash-services akash-rook
helm uninstall won't remove everything; there are some bits you have to remove manually.
What you need to remove will be shown with this command:
kubectl api-resources --verbs=list --namespaced -o name | grep -v ^events | xargs -r -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph
Example:
$ kubectl api-resources --verbs=list --namespaced -o name | grep -v ^events | xargs -r -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph
NAME DATA AGE
configmap/rook-ceph-mon-endpoints 4 26d
NAME TYPE DATA AGE
secret/rook-ceph-mon kubernetes.io/rook 4 26d
NAME PHASE
cephblockpool.ceph.rook.io/akash-deployments Ready
cephblockpool.ceph.rook.io/akash-nodes Ready
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
cephcluster.ceph.rook.io/rook-ceph /var/lib/rook 3 26d Deleting Deleting the CephCluster HEALTH_OK
It usually comes down to removing the following resources:
kubectl -n rook-ceph delete --wait=false cephblockpool akash-deployments
kubectl -n rook-ceph delete --wait=false cephblockpool akash-nodes
kubectl patch -n rook-ceph CephBlockPool akash-deployments --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph CephBlockPool akash-nodes --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph cm rook-ceph-mon-endpoints --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph secret rook-ceph-mon --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph cephclusters rook-ceph --type merge -p '{"metadata":{"finalizers": []}}'
kubectl get crd -o json | jq -r '.items[].metadata.name' | grep -E 'ceph.rook.io|objectbucket.io' | xargs -r -I@ sh -c "echo == @ ==; kubectl get @ -A"
kubectl get crd -o json | jq -r '.items[].metadata.name' | grep -E 'ceph.rook.io|objectbucket.io' | xargs -r -I@ kubectl delete --wait=false crd @
kubectl delete ns rook-ceph
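Before restoring anything, it is worth confirming that the CRDs and the namespace are actually gone, since finalizers can keep them around for a while; both commands should come back empty (or NotFound):
kubectl get crd | grep -E 'ceph.rook.io|objectbucket.io'
kubectl get ns rook-ceph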
If some pods cannot be removed, but you have checked and are sure there are no underlying containers still running on the target system (crictl ps | grep <containerID>), then you can remove them with force:
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
Now restore the resources you backed up earlier.
- create the rook-ceph namespace first:
kubectl create ns rook-ceph
- configmap:
kubectl apply -f ceph-mon-cm.json
- secrets:
kubectl apply -f ceph-mon-secret.json
- service IPs:
kubectl apply -f ceph-mon-svc.json
If that does not work for some reason, try:
kubectl replace --wait=false --force -f ceph-mon-svc.json
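You can verify the restored objects are back in place:
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints
kubectl -n rook-ceph get secret rook-ceph-mon
kubectl -n rook-ceph get svc -l ceph_daemon_type=mon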
helm upgrade --install --create-namespace -n rook-ceph rook-ceph rook-release/rook-ceph --version 1.9.4
helm upgrade --install --create-namespace -n rook-ceph rook-ceph-cluster \
--set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster --version 1.9.4 -f rook-ceph-cluster.values.yml
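You can watch the rook-ceph pods come back up before proceeding:
kubectl -n rook-ceph get pods -w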
First, check whether the Ceph Mons are running on their respective nodes:
kubectl -n rook-ceph get pods -l 'ceph_daemon_type=mon' -o wide
If their placement differs from the snapshot taken in the first step of this guide, set them back using the following commands (replace the hostnames with yours):
kubectl -n rook-ceph patch deployment rook-ceph-mon-a -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "k8s-node-5.provider-0.prod.ams1"}}}}}'
kubectl -n rook-ceph patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "k8s-node-6.provider-0.prod.ams1"}}}}}'
kubectl -n rook-ceph patch deployment rook-ceph-mon-c -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "k8s-node-7.provider-0.prod.ams1"}}}}}'
Set the .mgr pool size to anything higher than 1:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool set .mgr size 3
This is to make sure the Ceph Operator has no issues updating the OSDs belonging to the .mgr pool's PG, as well as for the sake of higher redundancy/availability.
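To verify the change took effect:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool get .mgr size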
Check the Ceph cluster status and health:
kubectl -n rook-ceph get cephclusters
kubectl -n rook-ceph describe cephclusters
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph health detail
kubectl get events --sort-by='.metadata.creationTimestamp' -A -w
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-mon,ceph_daemon_id=a" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
This label is mandatory and is used by Akash's inventory-operator for discovering the storageClass.
- change beta3 to the storageClass you have picked before;
kubectl label sc akash-nodes akash.network=true
kubectl label sc beta3 akash.network=true
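Verify both storage classes now carry the label (the inventory-operator selects on akash.network=true):
kubectl get sc -l akash.network=true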
helm upgrade --install --create-namespace -n rook-ceph rook-ceph rook-release/rook-ceph --version 1.9.9
helm upgrade --install --create-namespace -n rook-ceph rook-ceph-cluster \
--set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster --version 1.9.9 -f rook-ceph-cluster.values.yml
Check the Ceph cluster status again.
- If ceph commands are not responding, you may need to bounce the Ceph services:
kubectl -n rook-ceph delete pod -l "app=rook-ceph-mon"
kubectl -n rook-ceph delete --wait=false pod -l "app=rook-ceph-tools"
kubectl -n rook-ceph delete --wait=false pod -l "app=rook-ceph-operator"
If you forgot to back up something and have entirely removed your previous akash-rook / rook-ceph installation, you still have a way to recover by following the Ceph Service Recovery Procedure.