Migration from akash-rook to the upstream rook-ceph Helm Charts

Impact: Akash deployments using persistent storage will temporarily stall, because their I/O to the RBD-mounted devices is blocked while the Ceph cluster is migrated.

1. Take a snapshot of Ceph Mon locations

This will be needed in later steps.

kubectl -n rook-ceph get pods -l "app=rook-ceph-mon" -o wide

Example:

$ kubectl -n rook-ceph get pods -l "app=rook-ceph-mon" -o wide
NAME                                                              READY   STATUS      RESTARTS      AGE   IP              NODE                                NOMINATED NODE   READINESS GATES
rook-ceph-mon-a-5d987bf6bf-cnwmj                                  1/1     Running     0             30d   10.233.188.43   k8s-node-5.provider-0.prod.ams1     <none>           <none>
rook-ceph-mon-b-5f4d9dd5cb-tr5jr                                  1/1     Running     0             30d   10.233.40.12    k8s-node-6.provider-0.prod.ams1     <none>           <none>
rook-ceph-mon-c-857b7f649-78q54                                   1/1     Running     0             30d   10.233.232.37   k8s-node-7.provider-0.prod.ams1     <none>           <none>
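
Optionally, save this output to a file so you can compare the mon placement again in step 10 (the file name ceph-mon-placement.txt is just an example):

kubectl -n rook-ceph get pods -l "app=rook-ceph-mon" -o wide > ceph-mon-placement.txt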

2. Backup Ceph Mon configmap, secrets, service IPs

  • configmap:
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o json | jq  -r 'del(.metadata.resourceVersion, .metadata.uid, .metadata.selfLink, .metadata.creationTimestamp, .metadata.annotations, .metadata.generation, .metadata.ownerReferences)' > ceph-mon-cm.json
  • secrets:
kubectl -n rook-ceph get secret rook-ceph-mon -o json | jq  -r 'del(.metadata.resourceVersion, .metadata.uid, .metadata.selfLink, .metadata.creationTimestamp, .metadata.annotations, .metadata.generation, .metadata.ownerReferences)' > ceph-mon-secret.json 
  • service IPs:
kubectl -n rook-ceph get svc -l ceph_daemon_type=mon -o json | jq  -r 'del(.items[].metadata.resourceVersion, .items[].metadata.uid, .items[].metadata.selfLink, .items[].metadata.creationTimestamp, .items[].metadata.annotations, .items[].metadata.generation, .items[].metadata.ownerReferences)' > ceph-mon-svc.json
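
Optionally, run a quick sanity check that all three backup files were written and contain valid JSON (reusing jq, as above):

for f in ceph-mon-cm.json ceph-mon-secret.json ceph-mon-svc.json; do jq -e '.kind' "$f" >/dev/null && echo "$f OK"; done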

3. Save current akash-rook helm-chart values

You should still have the rook.yaml file with these values, but if you have lost it, you can retrieve the values with this command:

helm -n akash-services get values akash-rook

Example:

useAllDevices: false
deviceFilter: "^nvme[01]n1"
osdsPerDevice: 5
persistent_storage:
  class: beta3
nodes:
  - name: "k8s-node-5.provider-0.prod.ams1"
    config: ""
  - name: "k8s-node-6.provider-0.prod.ams1"
    config: ""
  - name: "k8s-node-7.provider-0.prod.ams1"
    config: ""

4. Prepare Ceph Cluster config

  • update deviceFilter to match your disks;
  • change the storageClass name from beta3 to the one you plan to use, based on this table;
  • update osdsPerDevice based on this table;
  • under the nodes section, add the nodes whose disks Ceph should use.

Note that the downstream akash-rook helm chart set both the default pool size and min_size to 1, which is not a recommended production configuration.
Refer to https://docs.akash.network/providers/build-a-cloud-provider/helm-based-provider-persistent-storage-enablement/deploy-persistent-storage for the production configuration.
Because the failureDomain is host, raising size and min_size above 1 will not work unless you have at least 3 hosts with storage dedicated to Ceph. If you do not, set these values back to 1 until your underlying architecture supports higher replication. A verification sketch follows the example below.

Example:

cat > rook-ceph-cluster.values.yml << 'EOF'
operatorNamespace: rook-ceph

configOverride: |
  [global]
  osd_pool_default_pg_autoscale_mode = on
  osd_pool_default_size = 3
  osd_pool_default_min_size = 2

cephClusterSpec:
  #resources:

  cephVersion:
    # https://quay.io/repository/ceph/ceph?tab=tags&tag=latest
    # IMPORTANT:
    # - the upstream rook-ceph chart defaults to ceph v16; however, the downstream akash-rook chart shipped ceph v17 and you cannot downgrade back to v16.
    # - ceph v17 will be the default in rook-ceph charts v1.10, so pin v17 manually until then:
    image: quay.io/ceph/ceph:v17.2.0

  mon:
    count: 3
  mgr:
    count: 2

  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^nvme[01]n1"
    config:
      osdsPerDevice: "2"
    nodes:
    - name: "k8s-node-5.provider-0.prod.ams1"
      config:
    - name: "k8s-node-6.provider-0.prod.ams1"
      config:
    - name: "k8s-node-7.provider-0.prod.ams1"
      config:

cephBlockPools:
  - name: akash-deployments
    spec:
      failureDomain: host
      replicated:
        size: 3
      parameters:
        min_size: "2"
        bulk: "true"
    storageClass:
      enabled: true
      name: beta3
      isDefault: true
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      parameters:
        # RBD image format. Defaults to "2".
        imageFormat: "2"
        # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
        imageFeatures: layering
        # The secrets contain Ceph admin credentials.
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
        # Specify the filesystem type of the volume. If not specified, csi-provisioner
        # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
        # in hyperconverged settings where the volume is mounted on the same node as the osds.
        csi.storage.k8s.io/fstype: ext4

  - name: akash-nodes
    spec:
      failureDomain: host
      replicated:
        size: 3
      parameters:
        min_size: "2"
    storageClass:
      enabled: true
      name: akash-nodes
      isDefault: false
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      parameters:
        # RBD image format. Defaults to "2".
        imageFormat: "2"
        # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
        imageFeatures: layering
        # The secrets contain Ceph admin credentials.
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
        # Specify the filesystem type of the volume. If not specified, csi-provisioner
        # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
        # in hyperconverged settings where the volume is mounted on the same node as the osds.
        csi.storage.k8s.io/fstype: ext4

# Do not create default Ceph file systems, object stores
cephFileSystems:
cephObjectStores:

# Spawn rook-ceph-tools, useful for troubleshooting
toolbox:
  enabled: true
  #resources:
EOF
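
Once the new cluster is installed (steps 8-9), you can confirm from the toolbox pod that the replication settings above (size / min_size) actually took effect. A minimal verification sketch, assuming the rook-ceph-tools pod is running:

TOOLS=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS" -- ceph osd pool ls detail
kubectl -n rook-ceph exec -it "$TOOLS" -- ceph osd pool get akash-deployments size
kubectl -n rook-ceph exec -it "$TOOLS" -- ceph osd pool get akash-deployments min_size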

5. Uninstall the akash-rook helm-chart

helm uninstall -n akash-services akash-rook
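
To confirm the release was removed:

helm -n akash-services list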

6. Uninstall the akash-rook leftovers

helm uninstall won't remove everything; some resources have to be removed manually.

The remaining resources can be listed with this command:

kubectl api-resources --verbs=list --namespaced -o name | grep -v ^events | xargs -r -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph

Example:

$ kubectl api-resources --verbs=list --namespaced -o name | grep -v ^events | xargs -r -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph
NAME                                DATA   AGE
configmap/rook-ceph-mon-endpoints   4      26d
NAME                   TYPE                 DATA   AGE
secret/rook-ceph-mon   kubernetes.io/rook   4      26d
NAME                                           PHASE
cephblockpool.ceph.rook.io/akash-deployments   Ready
cephblockpool.ceph.rook.io/akash-nodes         Ready
NAME                                 DATADIRHOSTPATH   MONCOUNT   AGE   PHASE      MESSAGE                    HEALTH      EXTERNAL
cephcluster.ceph.rook.io/rook-ceph   /var/lib/rook     3          26d   Deleting   Deleting the CephCluster   HEALTH_OK   

It usually comes down to removing the following resources:

kubectl -n rook-ceph delete --wait=false cephblockpool akash-deployments
kubectl -n rook-ceph delete --wait=false cephblockpool akash-nodes

kubectl patch -n rook-ceph CephBlockPool akash-deployments --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph CephBlockPool akash-nodes --type merge -p '{"metadata":{"finalizers": []}}'

kubectl patch -n rook-ceph cm rook-ceph-mon-endpoints --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph secret rook-ceph-mon --type merge -p '{"metadata":{"finalizers": []}}'
kubectl patch -n rook-ceph cephclusters rook-ceph --type merge -p '{"metadata":{"finalizers": []}}'

kubectl get crd -o json | jq -r '.items[].metadata.name' | grep -E 'ceph.rook.io|objectbucket.io' | xargs -r -I@ sh -c "echo == @ ==; kubectl get @ -A"
kubectl get crd -o json | jq -r '.items[].metadata.name' | grep -E 'ceph.rook.io|objectbucket.io' | xargs -r -I@ kubectl delete --wait=false crd @

kubectl delete ns rook-ceph

If some pods cannot be removed, but you have checked that no underlying containers are still running on the target node (crictl ps | grep <containerID>), you can force-delete them:

kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
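
Once these steps complete, the namespace should eventually disappear. A quick way to confirm the cleanup finished:

kubectl get ns rook-ceph

When removal is complete this returns Error from server (NotFound). If the namespace stays stuck in Terminating, re-run the api-resources listing above to see which resources still hold finalizers.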

7. Restore the Ceph Mon configmap / secrets / service IPs

  • create the rook-ceph namespace first:
kubectl create ns rook-ceph
  • configmap:
kubectl apply -f ceph-mon-cm.json
  • secrets:
kubectl apply -f ceph-mon-secret.json
  • service IPs:
kubectl apply -f ceph-mon-svc.json

If that does not work for some reason, try kubectl replace --wait=false --force -f ceph-mon-svc.json
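
The next two steps assume the upstream rook-release Helm repository is already configured on the machine you run Helm from; if it is not, add it first (this is the standard upstream Rook charts repository):

helm repo add rook-release https://charts.rook.io/release
helm repo update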

8. Install the Ceph Operator chart

helm upgrade --install --create-namespace -n rook-ceph rook-ceph rook-release/rook-ceph --version 1.9.4

9. Install the Ceph Cluster chart

helm upgrade --install --create-namespace -n rook-ceph rook-ceph-cluster \
   --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster --version 1.9.4 -f rook-ceph-cluster.values.yml
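
While the cluster deploys, you can watch the mon, mgr, OSD-prepare and OSD pods come up (Ctrl-C to stop watching):

kubectl -n rook-ceph get pods -w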

10. Set the Ceph Mon back to their nodes

First, check whether the Ceph Mons are running on their respective nodes:

kubectl -n rook-ceph get pods -l 'ceph_daemon_type=mon' -o wide

If their placement differs from the snapshot taken in step 1 of this guide, pin them back to their original nodes with the following commands (replace the hostnames with yours):

kubectl -n rook-ceph patch deployment rook-ceph-mon-a -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "k8s-node-5.provider-0.prod.ams1"}}}}}'
kubectl -n rook-ceph patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "k8s-node-6.provider-0.prod.ams1"}}}}}'
kubectl -n rook-ceph patch deployment rook-ceph-mon-c -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "k8s-node-7.provider-0.prod.ams1"}}}}}'

11. Update the .mgr pool size

Set it to anything higher than 1:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool set .mgr size 3

This is to make sure the Ceph Operator has no issues updating the OSDs backing the .mgr pool's PGs, and it also provides higher redundancy/availability.
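
To verify the change took effect, reusing the toolbox pod as in the command above:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph osd pool get .mgr size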

12. Check the Ceph cluster status

kubectl -n rook-ceph get cephclusters
kubectl -n rook-ceph describe cephclusters

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph health detail
kubectl get events --sort-by='.metadata.creationTimestamp' -A -w
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-mon,ceph_daemon_id=a" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f
kubectl -n rook-ceph logs $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" -o jsonpath='{.items[0].metadata.name}') --tail=20 -f

13. Label the storageClass

This label is mandatory; Akash's inventory-operator uses it to discover the storageClass.

  • change beta3 to the storageClass you picked earlier;
kubectl label sc akash-nodes akash.network=true
kubectl label sc beta3 akash.network=true
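
To confirm both storage classes now carry the label the inventory-operator looks for:

kubectl get sc -l akash.network=true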

14. Upgrade Ceph from 1.9.4 to 1.9.9

helm upgrade --install --create-namespace -n rook-ceph rook-ceph rook-release/rook-ceph --version 1.9.9

helm upgrade --install --create-namespace -n rook-ceph rook-ceph-cluster \
   --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster --version 1.9.9 -f rook-ceph-cluster.values.yml

Check the Ceph cluster status again.
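
In addition to the checks from step 12, you can confirm the installed chart versions and the version each running Ceph daemon reports:

helm -n rook-ceph list
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph versions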

Troubleshooting

  • If ceph commands are not responding, you may need to bounce the Ceph services:
kubectl -n rook-ceph delete pod -l "app=rook-ceph-mon"
kubectl -n rook-ceph delete --wait=false pod -l "app=rook-ceph-tools"
kubectl -n rook-ceph delete --wait=false pod -l "app=rook-ceph-operator"

If you forgot to back up something and have entirely removed your previous akash-rook / rook-ceph installation, you can still recover by following the Ceph Service Recovery Procedure.
