rssnyder/readme.md

## readme.md

      
problem statement: scale a kubernetes delegate based on the number of tasks assigned to it, rather than on cpu or memory
what we need:

- a delegate deployed to a cluster
- a prometheus server running
  - delegate metrics should be scraped and sent to prometheus
- the prometheus adapter installed in the cluster
  - this allows us to use prometheus metrics as scaling metrics
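the adapter only exposes series that its rules discover, so the delegate metric needs a rule. a minimal sketch, written as a values file for the `prometheus-community/prometheus-adapter` helm chart. the prometheus url, the `monitoring` namespace, and the `namespace`/`pod` labels on the series are assumptions, not something from this setup; adjust them to match how your delegate is scraped.

```shell
# write a values file for the prometheus-adapter helm chart.
# assumptions (not from the original): prometheus is reachable at
# http://prometheus-server.monitoring.svc:80 and the delegate series carry
# `namespace` and `pod` labels.
cat > adapter-values.yaml <<'EOF'
prometheus:
  url: http://prometheus-server.monitoring.svc
  port: 80
rules:
  custom:
    - seriesQuery: 'io_harness_custom_metric_tasks_currently_executing{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)$"
        as: "${1}"
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
EOF

# then install or upgrade the adapter with it, for example:
#   helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter \
#     --namespace monitoring -f adapter-values.yaml
```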


validate that our prometheus metrics can be used by the cluster for scaling:

> kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/harness-delegate-ng/pods/*/io_harness_custom_metric_tasks_currently_executing" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "harness-delegate-ng",
        "name": "lab-5c9b7cf675-cvpw8",
        "apiVersion": "/v1"
      },
      "metricName": "io_harness_custom_metric_tasks_currently_executing",
      "timestamp": "2023-12-12T22:59:34Z",
      "value": "0",
      "selector": null
    }
  ]
}


limit your delegate to accepting a single task at a time

custom_envs:
- name: DELEGATE_TASK_CAPACITY
  value: "1"


create a HorizontalPodAutoscaler resource to scale the pods of your delegate based on the number of tasks

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lab-hpa
  namespace: harness-delegate-ng
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lab
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: io_harness_custom_metric_tasks_currently_executing
      target:
        # pods metrics only support a per-pod average target
        type: AverageValue
        averageValue: 0.5
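
to see what a per-pod target of 0.5 asks for, here is a rough sketch of the replica math for a Pods metric: the sum of the metric across pods divided by the target, rounded up, then clamped to `minReplicas`/`maxReplicas`. `desired_replicas` is a made-up helper, not part of kubectl. it assumes every pod is ready and reporting, and it ignores the controller's tolerance band and stabilization windows.

```shell
# desired_replicas TOTAL_TASKS TARGET_MILLI MIN MAX
# approximates the HPA result for a Pods metric:
# ceil(total metric value / per-pod target), clamped to MIN..MAX.
# TARGET_MILLI is the target in thousandths (0.5 -> 500) so the
# calculation can stay in integer arithmetic.
desired_replicas() {
  total=$1; target_milli=$2; min=$3; max=$4
  # integer ceiling of (total * 1000) / target_milli
  want=$(( (total * 1000 + target_milli - 1) / target_milli ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

desired_replicas 0 500 1 15   # prints 1  (idle: stays at minReplicas)
desired_replicas 1 500 1 15   # prints 2  (one busy delegate, one spare)
desired_replicas 5 500 1 15   # prints 10
desired_replicas 9 500 1 15   # prints 15 (capped by maxReplicas)
```

with `DELEGATE_TASK_CAPACITY` set to 1, each busy delegate reports at most 1, so one way to read the 0.5 target is "keep about twice as many delegates as running tasks".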

experiments:
x5 tasks of 65s each:

x5 tasks of 120s each:

x5 tasks of 240s each:
while the 5 x 240s test was running, i realized that when a task starts, it gets the list of delegates it will poll to pick up the task, and that list is never updated with delegates that come online afterwards. so when all five tasks start at once, they all wait for the single delegate that was alive when they started.

now we need to adjust the tests so that there is a second batch of tasks that starts only after the new delegate is online.
once i see that a new delegate is connected, i trigger another batch, and now new pods pick up tasks, the total number of concurrently running tasks increases, and we get some sort of "task based scaling"
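
the "wait for a new delegate, then trigger the second batch" step can be scripted. a minimal sketch that polls the deployment's ready pod count, using the `lab` deployment and `harness-delegate-ng` namespace from the manifest above. pod readiness is only a stand-in for "delegate is connected", which is an assumption on my part, and `trigger_batch.sh` is a hypothetical placeholder for however you start your tasks.

```shell
# ready_replicas: number of ready pods in the delegate deployment
ready_replicas() {
  kubectl get deployment lab -n harness-delegate-ng \
    -o jsonpath='{.status.readyReplicas}'
}

# wait_for_scale_up BASELINE [TRIES] [SLEEP_SECONDS]
# polls until more than BASELINE pods are ready; returns 1 on timeout
wait_for_scale_up() {
  baseline=$1; tries=${2:-60}; pause=${3:-5}
  while [ "$tries" -gt 0 ]; do
    ready=$(ready_replicas)
    # readyReplicas is omitted from the status when no pod is ready,
    # so treat an empty result as 0
    if [ "${ready:-0}" -gt "$baseline" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  return 1
}

# usage: start batch one, wait for a second delegate, then start batch two
#   wait_for_scale_up 1 && ./trigger_batch.sh
```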