@rssnyder
Last active December 13, 2023 00:45
scaling kubernetes delegates based on tasks

problem statement: scale a kubernetes delegate not based on cpu or memory, but on the number of tasks assigned to the delegates

what we need:

  1. validate that our prometheus metrics can be used by the cluster for scaling:
> kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/harness-delegate-ng/pods/*/io_harness_custom_metric_tasks_currently_executing" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "harness-delegate-ng",
        "name": "lab-5c9b7cf675-cvpw8",
        "apiVersion": "/v1"
      },
      "metricName": "io_harness_custom_metric_tasks_currently_executing",
      "timestamp": "2023-12-12T22:59:34Z",
      "value": "0",
      "selector": null
    }
  ]
}
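a minimal sketch of reading that response programmatically, in case you want to watch the per-pod task count yourself. the sample JSON is trimmed from the output above; `parse_quantity` and `tasks_per_pod` are hypothetical helper names, and the quantity parsing only handles plain numbers and the milli (`m`) suffix, not the full kubernetes quantity grammar.

```python
import json

# Sample MetricValueList, trimmed from the kubectl output above.
SAMPLE = """
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {"kind": "Pod", "name": "lab-5c9b7cf675-cvpw8"},
      "metricName": "io_harness_custom_metric_tasks_currently_executing",
      "value": "0"
    }
  ]
}
"""


def parse_quantity(quantity: str) -> float:
    """Convert a quantity string such as "0", "2" or "500m" to a float.

    Assumption: only plain numbers and the milli suffix are handled.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def tasks_per_pod(metric_list: dict) -> dict:
    """Map each pod name to the number of tasks it is currently executing."""
    return {
        item["describedObject"]["name"]: parse_quantity(item["value"])
        for item in metric_list["items"]
    }


per_pod = tasks_per_pod(json.loads(SAMPLE))
print(per_pod)
```

if `items` comes back empty, the metrics adapter is not exposing the metric for those pods yet and the autoscaler will have nothing to act on.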
  2. limit your delegate to accepting a single task at a time
custom_envs:
- name: DELEGATE_TASK_CAPACITY
  value: "1"
  3. create a HorizontalPodAutoscaler resource to scale your delegate's pods based on the number of running tasks
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lab-hpa
  namespace: harness-delegate-ng
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lab
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: io_harness_custom_metric_tasks_currently_executing
      target:
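        type: AverageValue  # Pods metrics only support an AverageValue target
        averageValue: 0.5   # scale when pods average more than half a task each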
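
to see why a target average of 0.5 makes sense when each delegate takes one task, here is a rough sketch of the replica calculation the autoscaler documents: desired = ceil(current replicas * current average / target average), skipped when the ratio is within a tolerance (0.1 by default), then clamped to min/max. this is a simplification that ignores pod readiness, missing metrics, and stabilization windows; `desired_replicas` is a hypothetical helper, not a kubernetes API.

```python
import math


def desired_replicas(per_pod_tasks, target_average=0.5,
                     min_replicas=1, max_replicas=15, tolerance=0.1):
    """Approximate the HPA replica count for a Pods metric.

    per_pod_tasks: one metric value per running pod (tasks executing).
    Defaults mirror the lab-hpa manifest above; tolerance is the HPA default.
    """
    current = len(per_pod_tasks)
    if current == 0:
        raise ValueError("need at least one pod reporting the metric")

    total = sum(per_pod_tasks)
    # ratio of the current per-pod average to the target average
    ratio = total / (current * target_average)

    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no scaling
    else:
        # same as ceil(current * ratio), written to avoid rounding noise
        desired = math.ceil(total / target_average)

    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas([1]))        # one busy delegate
print(desired_replicas([1, 0]))     # one busy, one idle
print(desired_replicas([0]))        # idle delegate
print(desired_replicas([1] * 8))    # eight busy delegates
```

with a capacity of one task per delegate, a 0.5 target roughly means "keep one idle delegate for every busy one", so there is usually a free pod waiting for the next task.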

experiments:

x5 tasks of 65s each: image

x5 tasks of 120s each: image

x5 tasks of 240s each:

it is at this point, while the 5x240s test is running, that i realize something: when a task starts, it gets a list of delegates that it polls to pick the task up, and delegates that come online after the task has started are never added to that list. so by starting all five tasks at once, they all wait on the single delegate that was alive when they started.

image

now we need to adjust the tests so that there is a second batch of tasks that starts only after the new delegate is online.

once i see that a new delegate is connected, i trigger another batch of tasks. now new pods pick up tasks, the total number of concurrently running tasks increases, and we get some sort of "task based scaling".

image
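
a toy model of the behaviour observed above, only to make the reasoning concrete. it assumes a task snapshots the set of connected delegates at the moment it starts, which is what the experiments suggest, not something confirmed from the harness source. the `Delegate` class, `eligible_delegates`, the pod names, and the timings are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegate:
    name: str
    online_at: float  # seconds after the start of the experiment


def eligible_delegates(task_started_at, delegates):
    """Delegates a task can be picked up by, under the assumed model:
    only those already online when the task started."""
    return [d.name for d in delegates if d.online_at <= task_started_at]


delegates = [
    Delegate("lab-a", online_at=0),    # the original pod
    Delegate("lab-b", online_at=90),   # added by the autoscaler later
]

# first batch: all tasks start at t=0, so only the original pod is a candidate
first_batch = eligible_delegates(0, delegates)

# second batch: started after the new pod connected, so both are candidates
second_batch = eligible_delegates(120, delegates)

print(first_batch, second_batch)
```

under this model, scaling up only helps tasks that start after the new pods have connected, which matches what the second batch of tests showed.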
