Last active: September 8, 2023 08:35
Some benefits of multiple nodegroups
Restricting tenant workloads to run on specific nodes can be used to increase isolation in the soft multi-tenancy model. With this approach, tenant-specific workloads are only run on nodes provisioned for the respective tenants. To achieve this isolation, native Kubernetes properties (node affinity, and taints and tolerations) are used to target specific nodes for pod scheduling and to prevent pods from other tenants from being scheduled on the tenant-specific nodes. [1]
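As a quick illustration of the node-affinity half of that approach, a pod can be pinned to tenant nodes with a `requiredDuringSchedulingIgnoredDuringExecution` rule. This is a sketch only; the `tenant` label key and its value are hypothetical and assume the nodes were provisioned with a matching label:

```yaml
# Sketch: assumes tenant nodes carry a label such as tenant=tenant-a.
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: tenant        # hypothetical label key
                operator: In
                values:
                  - tenant-a
  containers:
    - name: app
      image: nginx
```

Taints on the tenant nodes would cover the other half, keeping everyone else's pods off those nodes.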
An EKS nodegroup, or node pool in GKE and AKS, is a group of nodes within a cluster that all share the same configuration. The diagram below [multi-ng] shows one possible use case. Assume we have multiple workloads that require different configurations and have different requirements. In our case we have two nodegroups, `prod` and `management`, which serve different purposes as follows:
* Management: all pods used for cluster operations, such as the Cluster Autoscaler (CA), Prometheus, etc. This nodegroup does not require the same beefy instances that our application needs, which saves us cost.
* Prod: all our production application code will run here, and we expect this nodegroup to scale in and out several times throughout the day.
### Deploying nodegroups
Using `eksctl` we can deploy the following two simple nodegroups with different compute sizes and give them some labels. Note that when using an EKS managed nodegroup you don't have to create a custom label just to identify the group: AWS automatically adds the label `eks.amazonaws.com/nodegroup=<your-ng-name-here>`.
We won't cover how to install or set up the CA; in the example below we are auto-discovering the ASGs, which you can read more about here [4].
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster     # replace with your cluster name
  region: us-west-1
managedNodeGroups:
  - name: management
    instanceType: t3.medium
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    labels:
      nodeType: management
      nodeOwner: ops
  - name: prod
    instanceType: c5.large
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    labels:
      nodeType: prod
      nodeTeam: dev
```
Now we can simply deploy the CA to our management nodegroup by adding a `nodeSelector` [3] to its deployment:
```yaml
nodeSelector:
  nodeType: management
```
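For context, the `nodeSelector` belongs under the pod template spec of the deployment. An abbreviated sketch of where it sits (this is not the full CA manifest; name and labels follow the upstream chart's conventions and are assumptions here):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler     # assumed label
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      nodeSelector:
        nodeType: management      # pin the CA to the management nodegroup
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0   # match your cluster version
```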
Now if we look at our cluster, we can see that the CA was scheduled on a node with the `management` label:
```
❯ kubectl get node -l nodeType=management
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-252.us-west-1.compute.internal   Ready    <none>   99m   v1.25.12-eks-8ccc7ba
❯ kubectl get pods -A -o wide | grep ip-10-0-1-252.us-west-1.compute.internal
kube-system   aws-node-ld6rw                        1/1   Running   0   101m   10.0.1.252   ip-10-0-1-252.us-west-1.compute.internal   <none>   <none>
kube-system   cluster-autoscaler-65b4f87bc7-mqmx9   1/1   Running   0   73m    10.0.0.104   ip-10-0-1-252.us-west-1.compute.internal   <none>   <none>
kube-system   kube-proxy-8xgv8                      1/1   Running   0   101m   10.0.1.252   ip-10-0-1-252.us-west-1.compute.internal   <none>   <none>
```
We can now deploy our application; we need to set the right `nodeSelector` to make sure it's scheduled on the right node type:
```yaml
nodeSelector:
  nodeType: prod
```
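Putting it together, a minimal Deployment for the application might look like the sketch below. The name matches the `nginx-deployment` pods visible in the CA logs further down; the replica count and image are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 5                # assumed; enough pending pods to force a scale-out
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        nodeType: prod       # schedule only on the prod nodegroup
      containers:
        - name: nginx
          image: nginx:1.25  # assumed image tag
```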
We can see that our app is getting deployed correctly, and one interesting thing shows up in the CA logs:
```
I0908 07:05:15.366281       1 auto_scaling_groups.go:418] Extracted autoscaling options from "eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6" ASG tags: map[]
I0908 07:10:47.662171       1 scale_up.go:93] Pod nginx-deployment-7d6dc585d4-7vspz can't be scheduled on eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0908 07:10:47.662184       1 scale_up.go:95] 4 other pods similar to nginx-deployment-7d6dc585d4-7vspz can't be scheduled on eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6
I0908 07:10:47.662237       1 scale_up.go:262] No pod can fit to eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6
```
As shown above, the CA is aware that our app cannot run on the management nodegroup, which is why we are not seeing any scaling on the `management` ASG. It then looks at the other ASGs it has discovered and starts scaling out, as shown below:
```
I0908 07:10:47.662554       1 waste.go:55] Expanding Node Group eks-prod-f6c539c0-de12-8494-3771-47b034529feb would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I0908 07:10:47.662570       1 scale_up.go:282] Best option to resize: eks-prod-f6c539c0-de12-8494-3771-47b034529feb
I0908 07:10:47.662601       1 scale_up.go:286] Estimated 1 nodes needed in eks-prod-f6c539c0-de12-8494-3771-47b034529feb
```
Not only have we isolated which workload runs on which nodegroup, we have also made sure their scaling works properly. This helps our infrastructure avoid downtime: management pods that we rely on won't be scheduled on nodes that are scaling in and being shut down, which would otherwise force them to be rescheduled onto a different node.
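One caveat: `nodeSelector` only keeps workloads on the right nodegroup if every manifest sets it. To also keep pods that forget a selector off the management nodes, the nodegroup can additionally be tainted, with the management pods given a matching toleration [2]. A sketch of what that could look like in the eksctl config (the `dedicated` taint key is a hypothetical choice):

```yaml
managedNodeGroups:
  - name: management
    instanceType: t3.medium
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    labels:
      nodeType: management
      nodeOwner: ops
    taints:
      - key: dedicated          # hypothetical taint key
        value: management
        effect: NoSchedule      # pods without a matching toleration won't land here
```

Management pods such as the CA would then need a matching `tolerations` entry alongside their `nodeSelector`.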
References:
[1] - Isolating tenant workloads to specific nodes
[2] - Taints and Tolerations
[3] - Assigning Pods to Nodes
[4] - Cluster Autoscaler on AWS
[5] - Labels and Selectors