When you decide to segregate your cluster into namespaces, you should also protect against resource misuse. Users should not be able to consume more resources than what was agreed in advance.
Cluster administrators can set constraints that limit the number of objects or the amount of computing resources used in a namespace with quotas and limit ranges.
Containers without limits can lead to resource contention with other containers and unoptimized consumption of computing resources.
Kubernetes has two features for constraining resource utilization: ResourceQuota and LimitRange.
With the LimitRange object, you can define default values for resource requests and limits for individual containers inside namespaces.
Any container created inside that namespace, without request and limit values explicitly specified, is assigned the default values.
Example Code:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default: # this section defines default limits
      cpu: 500m
    defaultRequest: # this section defines default requests
      cpu: 500m
    max: # max and min define the limit range
      cpu: "1"
    min:
      cpu: 100m
    type: Container
With ResourceQuotas, you can limit the total resource consumption of all Pods/containers inside a Namespace.
Defining a resource quota for a namespace limits the total amount of CPU, memory or storage resources that can be consumed by all containers belonging to that namespace.
You can also set quotas for other Kubernetes objects such as the number of Pods in the current namespace.
If you're worried that someone could exploit your cluster and create 20000 ConfigMaps, a ResourceQuota is how you prevent that. Example:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: configmap-quota
  namespace: my-namespace # Change to your namespace
spec:
  hard:
    configmaps: "10" # Maximum of 10 ConfigMaps allowed in the namespace
For CPU and memory limits (in this example, scoped per PriorityClass):
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-high
  spec:
    hard:
      cpu: "1000"
      memory: "200Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-medium
  spec:
    hard:
      cpu: "10"
      memory: "20Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-low
  spec:
    hard:
      cpu: "5"
      memory: "10Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["low"]
If you provision a Deployment without resource requests and limits, the LimitRange will assign the default values configured in the policy. This also allows the HPA to calculate CPU and memory utilization metrics.
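For illustration, assuming the cpu-resource-constraint LimitRange above exists in the namespace, a Deployment whose container omits the resources block still gets the default request and limit applied (the name and image here are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app # hypothetical name, for illustration only
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: nginx:1.25 # example image
        # No resources block: the LimitRange in this namespace injects
        # cpu request 500m and cpu limit 500m when the Pod is created.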
Best Practice: Always define requests and limits to prevent a single pod from consuming excessive resources.
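For reference, a minimal Pod with explicit requests and limits might look like this (names and values are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25 # example image
    resources:
      requests: # what the scheduler reserves for the container
        cpu: 250m
        memory: 128Mi
      limits: # the hard ceiling the container cannot exceed
        cpu: 500m
        memory: 256Mi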
To make the HPA work, you need metrics. For example, if you want to scale based on CPU and memory utilization, make sure the metrics server is installed on the cluster. By default, the HPA checks CPU and memory metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
HPA determines the desired number of replicas using this formula:
desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue))
In the example above, averageUtilization is 50%. Assuming the current CPU utilization is 80%, the formula gives desiredReplicas = ceil(2 * (80 / 50)) = ceil(3.2), which is rounded up to 4.
There are other ways to configure the velocity of scaling. You can add a behavior configuration to the HPA manifest to get different scaling velocities, for example:
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 60
  scaleDown:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600 # i.e., scale down one pod every 10 minutes
The value 900 means that 9 times the current number of pods can be added per period, effectively allowing the replica count to grow to 10 times the current size. All other parameters are left at their default values. If the application starts with 1 pod, it will scale up through the following number of pods: 1 -> 10 -> 100 -> 1000. The scale-down, however, is gradual: 1 pod is removed every 10 minutes.
stabilizationWindowSeconds indicates how long the HPA controller should consider previous recommendations to prevent flapping of the replica count. This configuration helps you avoid acting on false-positive signals when scaling up, and avoids scaling pods down too early when late load spikes are still expected.
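As a sketch, the earlier behavior block could be extended with stabilization windows like this (the window lengths are illustrative, not recommendations):
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60 # look back 60s of recommendations before scaling up
    policies:
    - type: Percent
      value: 900
      periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300 # wait 5 minutes of consistently lower load before scaling down
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600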
HPA also supports custom and external metrics (data sources can be Prometheus, Datadog, AWS CloudWatch, etc.). If you have non-functional requirements, you can use these metrics to scale the application, for example http_requests_per_second (this metric can be exposed by your ingress).
Example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100
A few HPA best practices:
- Use the right metrics for scaling your application:
  - Default metrics: CPU, memory
  - Custom metrics: HTTP request rate (RPS), failed requests per second
  - External metrics: API call rate
- Avoid overly aggressive scaling: you can use stabilization windows to prevent frequent scaling (flapping).
- Combine HPA with readiness and liveness probes: new pods take time to reach the Ready state, so make sure your liveness and readiness probes are configured correctly.
- Set min and max replicas properly.
- Scale using multiple metrics (see the sketch after this list).
- You can use tools like keda.sh for event-driven autoscaling.
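As a sketch of scaling on multiple metrics, the HPA below (names reused from the earlier example, values illustrative) combines CPU utilization with the external request-rate metric; the controller computes a desired replica count for each metric and scales to the largest of them:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource # built-in CPU metric from the metrics server
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: External # external metric, e.g. exposed via a metrics adapter
    external:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100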