OCP Delete Leftover Device Plugin Devices

Links:

https://kubernetes.io/docs/tasks/administer-cluster/extended-resource-node/

kubernetes/kubernetes#53395

NVIDIA/k8s-device-plugin#240

https://kubernetes.io/docs/reference/kubectl/cheatsheet/

https://access.redhat.com/solutions/3986301

This may also help if the object is part of an array:
https://stackoverflow.com/questions/64355902/is-there-a-way-in-kubectl-patch-to-delete-a-specific-object-in-an-array-withou

How To:

Find device paths

[dave@lenovo ~]$ kubectl get nodes -o json | jq -c 'paths|join(".")' | grep nvidia
"items.4.status.capacity.nvidia.com/GA102GL_A40"
"items.4.status.capacity.nvidia.com/GP104_GEFORCE_GTX_1070_TI"
"items.4.status.capacity.nvidia.com/GP104_GEFORCE_GTX_1080"
"items.4.status.capacity.nvidia.com/GP104_GeForce_GTX_1070_Ti"
"items.4.status.capacity.nvidia.com/GP104_GeForce_GTX_1080"
"items.4.status.capacity.nvidia.com/gpu"
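The paths above identify the node only by its list index (items.4). A minimal sketch that prints the node name next to each nvidia.com/ capacity key instead, assuming jq is installed. The function name list_nvidia_capacity is made up for this example:

```shell
# Hypothetical helper: read `kubectl get nodes -o json` on stdin and
# print "<node name><TAB><resource name>" for every nvidia.com/ capacity key.
list_nvidia_capacity() {
  jq -r '
    .items[]
    | .metadata.name as $node
    | .status.capacity
    | keys[]
    | select(startswith("nvidia.com/"))
    | "\($node)\t\(.)"
  '
}

# Usage (needs access to the cluster):
#   kubectl get nodes -o json | list_nvidia_capacity
```

Keeping the jq filter in a function that reads stdin means it can be tried against a saved copy of the node list before pointing it at a live cluster.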

Start a proxy to the cluster and leave it running (the curl commands below go through it)

[dave@lenovo ~]$ oc proxy
Starting to serve on 127.0.0.1:8001

Get the nodes in the cluster

[dave@lenovo ~]$ oc get nodes
NAME                   STATUS                     ROLES           AGE   VERSION
r730ocp3.localdomain   Ready                      master,worker   63d   v1.24.6+5157800
r730ocp4.localdomain   Ready                      master,worker   63d   v1.24.6+5157800
r730ocp5.localdomain   Ready                      master,worker   63d   v1.24.6+5157800
trt2ocp1.localdomain   Ready,SchedulingDisabled   worker          63d   v1.24.6+5157800
trt2ocp2.localdomain   Ready                      worker          63d   v1.24.6+5157800

Describe the node you would like to remove a device from
Note: in this case the device to remove is nvidia.com/GP104_GEFORCE_GTX_1080, a leftover entry registered with incorrect letter case. The output below is shortened.

[dave@lenovo ~]$ oc describe node trt2ocp1.localdomain
Name:               trt2ocp1.localdomain
Roles:              worker
Capacity:
  cpu:                                   48
  devices.kubevirt.io/kvm:               1k
  devices.kubevirt.io/sev:               1k
  devices.kubevirt.io/tun:               1k
  devices.kubevirt.io/vhost-net:         1k
  ephemeral-storage:                     243127276Ki
  hugepages-1Gi:                         0
  hugepages-2Mi:                         0
  memory:                                197792288Ki
  nvidia.com/GP104_GEFORCE_GTX_1080:     0
  nvidia.com/GP104_GeForce_GTX_1070_Ti:  3
  nvidia.com/GP104_GeForce_GTX_1080:     6
  pods:                                  250
Allocatable:
  cpu:                                   47500m
  devices.kubevirt.io/kvm:               1k
  devices.kubevirt.io/sev:               0
  devices.kubevirt.io/tun:               1k
  devices.kubevirt.io/vhost-net:         1k
  ephemeral-storage:                     224066097191
  hugepages-1Gi:                         0
  hugepages-2Mi:                         0
  memory:                                196641312Ki
  nvidia.com/GP104_GEFORCE_GTX_1080:     0
  nvidia.com/GP104_GeForce_GTX_1070_Ti:  3
  nvidia.com/GP104_GeForce_GTX_1080:     6
  pods:                                  250
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                              Requests     Limits
  --------                              --------     ------
  cpu                                   774m (1%)    0 (0%)
  memory                                1988Mi (1%)  0 (0%)
  ephemeral-storage                     0 (0%)       0 (0%)
  hugepages-1Gi                         0 (0%)       0 (0%)
  hugepages-2Mi                         0 (0%)       0 (0%)
  devices.kubevirt.io/kvm               0            0
  devices.kubevirt.io/sev               0            0
  devices.kubevirt.io/tun               0            0
  devices.kubevirt.io/vhost-net         0            0
  nvidia.com/GP104_GEFORCE_GTX_1080     0            0
  nvidia.com/GP104_GeForce_GTX_1070_Ti  0            0
  nvidia.com/GP104_GeForce_GTX_1080     0            0

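Then issue REST commands through the proxy. Notice the ~1 instead of the / in the resource name: JSON Patch paths are JSON Pointers (RFC 6901), where / separates path segments, so a / inside a key has to be escaped as ~1.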

[dave@lenovo ~]$ curl --header "Content-Type: application/json-patch+json" --request PATCH --data '[{"op": "remove", "path": "/status/capacity/nvidia.com~1GP104_GEFORCE_GTX_1080"}]' http://localhost:8001/api/v1/nodes/trt2ocp1.localdomain/status
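
The escaping in the path can be done mechanically rather than by hand. A small sketch, assuming a POSIX shell with sed; the function names json_pointer_escape and remove_capacity_patch are hypothetical. Per RFC 6901, "~" is rewritten to "~0" first and "/" to "~1" second, so that the "~" introduced by the second step is not escaped again:

```shell
# Escape one key for use as a JSON Pointer segment (RFC 6901):
# "~" becomes "~0" first, then "/" becomes "~1".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's|~|~0|g' -e 's|/|~1|g'
}

# Build the JSON Patch body that removes one entry from status.capacity.
remove_capacity_patch() {
  printf '[{"op": "remove", "path": "/status/capacity/%s"}]\n' \
    "$(json_pointer_escape "$1")"
}

# Prints the same body that is passed to curl --data above.
remove_capacity_patch "nvidia.com/GP104_GEFORCE_GTX_1080"
```

The output can be passed to curl with --data "$(remove_capacity_patch "nvidia.com/GP104_GEFORCE_GTX_1080")".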

Check to make sure the device is removed

[dave@lenovo ~]$ oc describe node trt2ocp1.localdomain
Name:               trt2ocp1.localdomain
Roles:              worker
Capacity:
  cpu:                                   48
  devices.kubevirt.io/kvm:               1k
  devices.kubevirt.io/sev:               1k
  devices.kubevirt.io/tun:               1k
  devices.kubevirt.io/vhost-net:         1k
  ephemeral-storage:                     243127276Ki
  hugepages-1Gi:                         0
  hugepages-2Mi:                         0
  memory:                                197792288Ki
  nvidia.com/GP104_GeForce_GTX_1070_Ti:  3
  nvidia.com/GP104_GeForce_GTX_1080:     6
  pods:                                  250
Allocatable:
  cpu:                                   47500m
  devices.kubevirt.io/kvm:               1k
  devices.kubevirt.io/sev:               0
  devices.kubevirt.io/tun:               1k
  devices.kubevirt.io/vhost-net:         1k
  ephemeral-storage:                     224066097191
  hugepages-1Gi:                         0
  hugepages-2Mi:                         0
  memory:                                196641312Ki
  nvidia.com/GP104_GeForce_GTX_1070_Ti:  3
  nvidia.com/GP104_GeForce_GTX_1080:     6
  pods:                                  250
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                              Requests     Limits
  --------                              --------     ------
  cpu                                   774m (1%)    0 (0%)
  memory                                1988Mi (1%)  0 (0%)
  ephemeral-storage                     0 (0%)       0 (0%)
  hugepages-1Gi                         0 (0%)       0 (0%)
  hugepages-2Mi                         0 (0%)       0 (0%)
  devices.kubevirt.io/kvm               0            0
  devices.kubevirt.io/sev               0            0
  devices.kubevirt.io/tun               0            0
  devices.kubevirt.io/vhost-net         0            0
  nvidia.com/GP104_GeForce_GTX_1070_Ti  0            0
  nvidia.com/GP104_GeForce_GTX_1080     0            0
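
Rather than reading the whole describe output, the check can be narrowed to a single key. A sketch, assuming jq is installed; capacity_has_key is a hypothetical name:

```shell
# Hypothetical helper: read one node's JSON on stdin; exit 0 if the given
# key is still present in .status.capacity, non-zero otherwise.
capacity_has_key() {
  jq -e --arg name "$1" '.status.capacity | has($name)' >/dev/null
}

# Usage (needs access to the cluster):
#   if oc get node trt2ocp1.localdomain -o json \
#        | capacity_has_key "nvidia.com/GP104_GEFORCE_GTX_1080"; then
#     echo "still present"
#   else
#     echo "removed"
#   fi
```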

Examples:

curl --header "Content-Type: application/json-patch+json" --request PATCH --data '[{"op": "remove", "path": "/status/capacity/nvidia.com~1GP104_GEFORCE_GTX_1080"}]' http://localhost:8001/api/v1/nodes/trt2ocp2.localdomain/status
curl --header "Content-Type: application/json-patch+json" --request PATCH --data '[{"op": "remove", "path": "/status/capacity/nvidia.com~1GP104_GEFORCE_GTX_1070_TI"}]' http://localhost:8001/api/v1/nodes/trt2ocp2.localdomain/status
curl --header "Content-Type: application/json-patch+json" --request PATCH --data '[{"op": "remove", "path": "/status/capacity/nvidia.com~1gpu"}]' http://localhost:8001/api/v1/nodes/trt2ocp2.localdomain/status
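
When several leftover resources have to be removed from one node, the commands above can be generated in a loop. The sketch below only prints the curl commands so they can be reviewed first; nothing is sent to the cluster. The function name print_remove_commands and the API variable are assumptions for this example, and the address is the one printed by oc proxy earlier:

```shell
# Address of the local proxy started with `oc proxy`.
API="http://localhost:8001"

# Hypothetical helper: print one curl command per resource for a given node.
# Usage: print_remove_commands <node> <resource> [<resource> ...]
print_remove_commands() {
  node="$1"; shift
  for resource in "$@"; do
    # Escape "~" and "/" for the JSON Pointer path (RFC 6901).
    key=$(printf '%s' "$resource" | sed -e 's|~|~0|g' -e 's|/|~1|g')
    printf "curl --header 'Content-Type: application/json-patch+json' --request PATCH --data '[{\"op\": \"remove\", \"path\": \"/status/capacity/%s\"}]' %s/api/v1/nodes/%s/status\n" \
      "$key" "$API" "$node"
  done
}

print_remove_commands trt2ocp2.localdomain \
  nvidia.com/GP104_GEFORCE_GTX_1080 \
  nvidia.com/GP104_GEFORCE_GTX_1070_TI
```

Printing instead of executing is a deliberate choice here: a remove operation on a path that does not exist is rejected by the API server, so it is worth checking the generated names against the describe output before running anything.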