Last active: September 8, 2023 08:35
Some benefits of multiple nodegroups
Restricting tenant workloads to run on specific nodes can be used to increase isolation in the soft multi-tenancy model. With this approach, tenant-specific workloads are only run on nodes provisioned for the respective tenants. To achieve this isolation, native Kubernetes properties (node affinity, and taints and tolerations) are used to target specific nodes for pod scheduling and to prevent pods from other tenants from being scheduled on the tenant-specific nodes. [1]
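As a quick illustration of the node-affinity half of that approach, a pod can be pinned to tenant nodes with a `requiredDuringSchedulingIgnoredDuringExecution` rule. This is a sketch only; the `tenant` label key and its value are hypothetical and assume the nodes were provisioned with a matching label:

```yaml
# Sketch: assumes tenant nodes carry a label such as tenant=tenant-a.
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: tenant        # hypothetical label key
                operator: In
                values:
                  - tenant-a
  containers:
    - name: app
      image: nginx
```

Taints on the tenant nodes would cover the other half, keeping everyone else's pods off those nodes.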
An EKS nodegroup, or node pool in GKE and AKS, is a group of nodes within a cluster that all share the same configuration. The diagram below [multi-ng] shows one possible use case. Assume we have multiple workloads that require different configurations and have different requirements. In our case we have two nodegroups, `prod` and `management`, which serve different purposes as follows:
* Management: all pods used for cluster operations, such as the Cluster Autoscaler (CA), Prometheus, etc. This nodegroup does not require the same beefy instances that our application needs, which saves us cost.
* Prod: all our production application code will run here, and we expect this nodegroup to scale in and out several times throughout the day.
### Deploying nodegroups
Using `eksctl` we can deploy the following two simple nodegroups with different compute sizes and give them some labels. Note that when using an EKS managed nodegroup you don't have to create a custom label just to identify the group: AWS automatically adds the label `eks.amazonaws.com/nodegroup=<your-ng-name-here>`.
We won't cover how to install or set up the CA; in the example below we are auto-discovering the ASGs, which you can read more about here [4].
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster     # replace with your cluster name
  region: us-west-1
managedNodeGroups:
  - name: management
    instanceType: t3.medium
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    labels:
      nodeType: management
      nodeOwner: ops
  - name: prod
    instanceType: c5.large
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    labels:
      nodeType: prod
      nodeTeam: dev
```
Now we can simply deploy the CA to our management nodegroup by adding a `nodeSelector` [3] to its deployment:
```yaml
nodeSelector:
  nodeType: management
```
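For context, the `nodeSelector` belongs under the pod template spec of the deployment. An abbreviated sketch of where it sits (this is not the full CA manifest; name and labels follow the upstream chart's conventions and are assumptions here):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler     # assumed label
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      nodeSelector:
        nodeType: management      # pin the CA to the management nodegroup
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.25.0   # match your cluster version
```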
Now if we look at our cluster, we can see that the CA was scheduled on a node with the `management` label:
```
❯ kubectl get node -l nodeType=management
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-252.us-west-1.compute.internal   Ready    <none>   99m   v1.25.12-eks-8ccc7ba
❯ kubectl get pods -A -o wide | grep ip-10-0-1-252.us-west-1.compute.internal
kube-system   aws-node-ld6rw                        1/1   Running   0   101m   10.0.1.252   ip-10-0-1-252.us-west-1.compute.internal   <none>   <none>
kube-system   cluster-autoscaler-65b4f87bc7-mqmx9   1/1   Running   0   73m    10.0.0.104   ip-10-0-1-252.us-west-1.compute.internal   <none>   <none>
kube-system   kube-proxy-8xgv8                      1/1   Running   0   101m   10.0.1.252   ip-10-0-1-252.us-west-1.compute.internal   <none>   <none>
```
We can now deploy our application; we need to set the right `nodeSelector` to make sure it's scheduled on the right node type:
```yaml
nodeSelector:
  nodeType: prod
```
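Putting it together, a minimal Deployment for the application might look like the sketch below. The name matches the `nginx-deployment` pods visible in the CA logs further down; the replica count and image are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 5                # assumed; enough pending pods to force a scale-out
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        nodeType: prod       # schedule only on the prod nodegroup
      containers:
        - name: nginx
          image: nginx:1.25  # assumed image tag
```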
We can see that our app is getting deployed correctly, and one interesting thing shows up in the CA logs:
```
I0908 07:05:15.366281       1 auto_scaling_groups.go:418] Extracted autoscaling options from "eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6" ASG tags: map[]
I0908 07:10:47.662171       1 scale_up.go:93] Pod nginx-deployment-7d6dc585d4-7vspz can't be scheduled on eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0908 07:10:47.662184       1 scale_up.go:95] 4 other pods similar to nginx-deployment-7d6dc585d4-7vspz can't be scheduled on eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6
I0908 07:10:47.662237       1 scale_up.go:262] No pod can fit to eks-management-d4c539c0-d616-9435-a661-33f8970cf7f6
```
As shown above, the CA is aware that our app cannot run on the management nodegroup, which is why we are not seeing any scaling on the `management` ASG. It then looks at the other ASGs it has discovered and starts scaling out, as shown below:
```
I0908 07:10:47.662554       1 waste.go:55] Expanding Node Group eks-prod-f6c539c0-de12-8494-3771-47b034529feb would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I0908 07:10:47.662570       1 scale_up.go:282] Best option to resize: eks-prod-f6c539c0-de12-8494-3771-47b034529feb
I0908 07:10:47.662601       1 scale_up.go:286] Estimated 1 nodes needed in eks-prod-f6c539c0-de12-8494-3771-47b034529feb
```
Not only have we isolated which workload runs on which nodegroup, we have also made sure their scaling works properly. This helps our infrastructure avoid downtime: management pods that we rely on won't be scheduled on nodes that are scaling in and being shut down, which would otherwise force them to be rescheduled onto a different node.
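One caveat: `nodeSelector` only keeps workloads on the right nodegroup if every manifest sets it. To also keep pods that forget a selector off the management nodes, the nodegroup can additionally be tainted, with the management pods given a matching toleration [2]. A sketch of what that could look like in the eksctl config (the `dedicated` taint key is a hypothetical choice):

```yaml
managedNodeGroups:
  - name: management
    instanceType: t3.medium
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
    labels:
      nodeType: management
      nodeOwner: ops
    taints:
      - key: dedicated          # hypothetical taint key
        value: management
        effect: NoSchedule      # pods without a matching toleration won't land here
```

Management pods such as the CA would then need a matching `tolerations` entry alongside their `nodeSelector`.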
References:
[1] - Isolating tenant workloads to specific nodes
[2] - Taints and Tolerations
[3] - Assigning Pods to Nodes
[4] - Cluster Autoscaler on AWS
[5] - Labels and Selectors