Opportunistic Autoscaling with the Kubernetes HPA

Since version 1.6 of Kubernetes, it is possible to use custom metrics to autoscale an application within a cluster.

This means you can expose metrics such as the number of hits on an API or its latency, and scale the serving pods according to that metric instead of CPU load.

Now let's say you are operating a cluster that you rent to your customers via a serverless framework such as Kubeless, Nuclio or Fission. You charge your customers by GB-s used and number of invocations. On your end, however, your cost is running the machines, so you would like to make sure that the cluster is used as much as possible.

You notice over time that the cluster is in fact only used at 60% on average, with peaks at certain times of the day where it runs at 90%. These peaks are really hard to predict because they depend on your customers' business.

You would like to leverage the remainder of the capacity, so that the cluster is permanently used at its maximum. You found that NiceHash would be happy to buy it. Now you need to find a way to dynamically allocate the right amount of resources.

"Opportunistic Scaling"

You create an application that looks into the cluster, extracts the capacity currently being used, and computes the remaining capacity.

This remaining capacity is fed to an autoscaler, which will scale an app to consume it.

As a result:

  • If the "paid" load on your cluster goes UP, the remaining resources go DOWN, and you DOWNSCALE the mining app.
  • If on the contrary your paid load goes DOWN, the remaining capacity goes UP, and you SCALE UP the number of mining containers.

The target is to keep the load as constant as possible around a threshold you define (85%, let's say), thus collecting an average of 25% of otherwise unused power and monetizing it.

It may look like a silly application, but I can tell you that mining pools show a clear load variation when office hours finish, suggesting that business resources are definitely being used for mining at night!

In any case, this is what I define here as Opportunistic Scaling: an application consuming only the resources that are left available, without overruling other apps.

In a real-life scenario, crypto mining effectively adds resources to a compute grid, so it also makes sense beyond the hype and fun. There are plenty of other interesting use cases as well. Among others:

  • Lambda on the edges using a serverless framework (Telco / Cloud Operator)
  • Elastic transcoding (Media Lab / Cloud): Think of what Ikea is doing on workstations but in a compute cluster
  • AI on the edges (Media Lab / Cloud)
  • Caching (CDN)
  • The cool use case you'll share in the comments!

Now let's see how we can do that with Kubernetes.

Using Reversed Custom Metrics

In this blog, we will create a K8s cluster with a custom metrics API on bare metal.

We then create an app that exposes (among other metrics) a metric defined as:

Remaining CPU = (total allocatable CPU in the cluster) - (CPU requested by all applications other than ours)

This metric decreases when the load of the cluster grows, and grows when the load shrinks. It effectively mirrors the requested load on the cluster.
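A rough sketch of how such a metric could be derived with kubectl and jq (this is not the exporter used in this post, and the app=etn-miner label is just a placeholder for however the mining pods are identified):

```sh
# jq helper: convert Kubernetes CPU quantities ("250m" or "4") to cores.
TO_CORES='def cores: if test("m$") then (sub("m$"; "") | tonumber / 1000) else tonumber end;'

# Total allocatable CPU across all nodes.
ALLOCATABLE=$(kubectl get nodes -o json \
  | jq "$TO_CORES [.items[].status.allocatable.cpu | cores] | add")

# CPU requested by every container except those belonging to the miner.
REQUESTED=$(kubectl get pods --all-namespaces -o json \
  | jq "$TO_CORES [.items[]
                   | select(.metadata.labels.app != \"etn-miner\")
                   | .spec.containers[].resources.requests.cpu // \"0\"
                   | cores] | add")

# The value the autoscaler will consume.
echo "cpu_capacity_remaining: $(echo "$ALLOCATABLE - $REQUESTED" | bc)"
```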

Then we will use this metric to configure a Horizontal Pod Autoscaler (HPA) in Kubernetes. This will result in keeping the load in the cluster as high as possible.

Information about this post

This blog post has been sponsored by my friends at Kontron, who graciously allowed me to play with a 6-node cluster of their latest SymKloud platform. Each node has 24 to 32 cores, and some of them have NVIDIA P4 cards.

</insert Kontron Content>

In addition, my friend Ronan Delacroix helped me with the code base and wrote most of the Python needed for this experiment.

</insert Ronan Content>

Setup

Kubernetes Cluster

There are so many solutions out there to build one that we cannot even count them anymore. As usual for my posts on bare metal, I will use the Canonical Distribution of Kubernetes (CDK), in version 1.8.

So we assume that you have a K8s Cluster in version 1.8, with RBAC active, and an admin role.

Important Note: The APIs we will be using here are very unstable and subject to big changes. I really recommend you read the K8s change log to check on them.

  • For example, there was a change in 1.8 to the names of the APIs. If you run a 1.7 cluster, this will impact you.
  • There are also changes in 1.9, where custom.metrics.k8s.io moves from v1alpha1 to v1beta1.

Some configuration details we will see today are done a certain way on CDK and may be slightly different on self-hosted clusters. I will try to mention them whenever possible.

Foreplay

Make sure you have kubectl and Helm installed locally and pointed at your cluster.

RBAC Configuration

NOTE: this applies to CDK, and will apply to GKE when the API Aggregation is GA with K8s 1.9.

On CDK and GKE the default user is not a real admin from an RBAC perspective, so you need to update it before you can create other Cluster Role Bindings that extend your own role.

First, make sure you know your identity.

  • On CDK, you are the user admin:

https://gist.github.com/66aec7469e2ba3d4cb02f00fa885d6cf

  • On GKE:

https://gist.github.com/af3f5fa01b725b1cecbc517d62ed0f02

Now grant yourself the cluster-admin role:

https://gist.github.com/73418507d43ea5ace26f3130db0ff919

You can then check it out with:

https://gist.github.com/0ecd96030f734e0726bf52bb8042b37e
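Since the gists are not inlined here, a minimal sketch of what the grant and the check boil down to (the binding name is arbitrary):

```sh
# Bind the current user (admin on CDK) to the built-in cluster-admin ClusterRole.
kubectl create clusterrolebinding admin-cluster-admin \
  --clusterrole=cluster-admin \
  --user=admin

# Verify that the binding exists and that you now hold full rights.
kubectl get clusterrolebinding admin-cluster-admin -o yaml
kubectl auth can-i '*' '*' --all-namespaces
```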

Helm Install

In order for Helm to operate in a RBAC cluster, we need to add it as a cluster-admin as well:

https://gist.github.com/224fea6e44a3b0db3e6d999007957927
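As a sketch, the usual Helm 2 / Tiller sequence for this looks roughly like the following (the service account and binding names are conventions, not requirements):

```sh
# Give Tiller its own service account and bind it to cluster-admin.
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller-cluster-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller

# Deploy (or re-deploy) Tiller using that service account.
helm init --service-account tiller --upgrade
```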

OK, we are done prepping our cluster.

Note: By default, Helm deploys Tiller without resource constraints. When deliberately saturating a cluster to maximize its usage, this means Tiller will be among the pods that may get evicted when resources are exhausted. If you do not want that to happen, you can edit the manifest and reapply it:

https://gist.github.com/0ea6fefc8a9c306b064f92bedc1ab174

AutoScaling: Preparing the cluster

Introduction & References

First of all, I recommend you have a look at the documentation about extending Kubernetes.

Then also look at the documentation about extending Kubernetes with the Aggregation Layer.

The last theoretical doc is about setting up an API server, and you can find it here.

Once the aggregation layer is active, we will deploy 2 custom APIs: the Metrics Server and a Custom Metrics Adapter.

OK now that you are versed in what we need to do, let's get started.

Configuring the control plane

In order to activate the Aggregation Layer, we must add a few flags to our API Server:

https://gist.github.com/e54ce501402ba9417e5ce7971571aaa8

You will note that we do not activate the flags

https://gist.github.com/6a84b9bf8fc7bc92976ff480f53ac6e4

This is because the proxy in CDK uses a Kubeconfig and not a client certificate.

However, we do enable aggregator routing because the control plane of Kubernetes is not self-hosted, so we fall into the case "If you are not running kube-proxy on a host running the API server, then you must make sure that the system is enabled with the enable-aggregator-routing flag".

Also we added the client-ca-file flag to export the CA of the API server in the cluster.

Now for the Controller Manager, we must tell it to use the HPA, which we do with:

https://gist.github.com/602f7e038bb24236aa0e0823a1f8191b

Note that the last 2 options here are really for demos, to make it quick to observe the results of actions. You may not need to change them for your use case (they default to 3m and 5m).
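For reference, the kube-controller-manager flags involved are most likely along these lines (flag names from the 1.8 docs; the demo values are illustrative, check your release):

```sh
--horizontal-pod-autoscaler-use-rest-clients=true   # read metrics from the aggregated metrics APIs
--horizontal-pod-autoscaler-upscale-delay=1m        # demo value; the default is 3m
--horizontal-pod-autoscaler-downscale-delay=2m      # demo value; the default is 5m
```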

Just to make sure the settings are applied, restart the 2 services with:

https://gist.github.com/d0aa0de080d2fb58e8a4910b59409e1b

This will make Kubernetes create a configmap in the kube-system namespace called extension-apiserver-authentication, which contains all the additional flags we generated and their configuration. You can have a look at it via

https://gist.github.com/1464d853e2d2922a152e607e5a0035b2

Each of the API servers will now need RBAC authorization to read this ConfigMap. Thankfully, K8s automatically creates a role for it:

https://gist.github.com/c97cc5e3522ea4af1378f4101a713a35

Last but not least, you won't need Heapster for now, so make sure it is not there via:

https://gist.github.com/35c2b3c8a456efc47f90df3927c6e966

Initial API State

Before you start having fun with API servers, have a look at the status of your cluster with:

https://gist.github.com/3caeb8727aa59320d2489c5e1c969711
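If you want to run the check by hand, it most likely boils down to:

```sh
# List the API groups/versions currently served by the cluster.
kubectl api-versions

# Once aggregation is enabled, registered extension APIs also show up here.
kubectl get apiservices
```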

At the end of the next sections, you will have 3 more APIs in this list:

  • monitoring.coreos.com/v1, for the Prometheus Operator
  • metrics.k8s.io, for the Metrics Server that collects metrics for CPU and Memory
  • custom.metrics.k8s.io, for the custom metrics you want to expose

Adding the Metrics Server API

There are 2 implementations of the Metrics API (metrics.k8s.io) at this stage: Heapster and the Metrics Server. At the time of this writing, the Metrics Server has a simple deployment method, while Heapster required some work on my end, and I was too lazy to write the code.

We can simply deploy it with

https://gist.github.com/963f11177617eda64168348a0ffe88d0

This manifest contains:

  • the Service Account for the Metrics Server
  • a RoleBinding so that the Metrics Server can read the configmap above
  • a ClusterRoleBinding so that the Metrics Server inherits the system:auth-delegator ClusterRole (you can find documentation about that here)
  • a Deployment and ClusterIP Service for the Metrics Server
  • an APIService object, which is a registration of the new API into the API Server.
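For illustration, the APIService registration for the Metrics Server roughly looks like this sketch (field values are the usual upstream ones; the deployed manifest is authoritative):

```sh
cat <<EOF | kubectl apply -f -
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server        # the ClusterIP Service deployed above
    namespace: kube-system
  insecureSkipTLSVerify: true   # fine for a lab, not for production
  groupPriorityMinimum: 100
  versionPriority: 100
EOF
```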

Now check our APIs again:

https://gist.github.com/33b126d3c79468c7be1be3f535537665

Awesome... But does it really work? Query the endpoint of the API via kubectl to make sure

https://gist.github.com/8cad8bd8cdf157b301ea043c5d39fb41

Good job, NodeMetrics and PodMetrics are exposed. Look into what you can use from there:

https://gist.github.com/088b373562a7b3c4542e36a9380a0c72

And

https://gist.github.com/243688abadd186b30219d57ca3050a39

No big surprise here, you can access the CPU and memory consumption in real time. Refer to the docs for more details about how to query the API.
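In practice, the raw queries are probably along these lines (jq is only there for pretty-printing):

```sh
# CPU and memory usage for every node, straight from the metrics API.
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq .

# Same thing for the pods of a given namespace.
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods | jq .
```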

Installing the Custom Metrics Pipeline

In the previous section we took a shortcut: the metrics pipeline is directly exposable as an aggregated API. Unfortunately, in the case of custom metrics, we must do this in 2 distinct steps.

First of all we must deploy the custom metrics pipeline, which will give us the ability to collect metrics. We use Prometheus for that part as the canonical example of metrics collection system on K8s.

Then we will expose these metrics via a specific API server. For that, we will use the work of Solly (@DirectXMan12), which can be found here.

Prometheus has many installation methods. My personal favorite is the Prometheus Operator. It takes a lot of effort to architect a piece of software using traditional methods; crafting a software model that ties beautifully into the underlying distributed platform is closer to art than anything else.

That is essentially what the Operator is: it models how Prometheus should behave given a set of conditions, then realizes that model in Kubernetes. Wow, good job @CoreOS.

Note that you can create an Operator for anything, and something similar is coming for TensorFlow, judging by the APIs that are appearing... Anyway, let's not get distracted.

Install the Prometheus Operator with:

https://gist.github.com/d28da0eccc0b0a52cadcf2e70dcf8e72

This contains:

  • a Service Account for the operator
  • a ClusterRole and ClusterRoleBinding that are fairly extensive, so that the Operator can deploy Custom Resource Definitions for Prometheus (instances of), Alert Managers and Service Monitors.
  • a Deployment for the Operator pod.

This will let the Operator add the monitoring API as well:

https://gist.github.com/6c7819029b7ace33e39cb4da777e73bf

Now create an instance of Prometheus with:

https://gist.github.com/b2e1625577267dc5134715b138c5b956

The RBAC manifest will allow Prometheus to read the metrics it needs in the cluster and /metrics endpoints of any object (pod or service). The Prometheus manifest defines an instance and a service to expose it as a nodePort (so we can have a look at the UI).

What is important in this second file is the section:

https://gist.github.com/17c49bc59e3aefb80641723e4e0fb932

This essentially dedicates the Prometheus instance to Service Monitors carrying this label (or set of labels). When we define the applications we want to monitor, and how, we will need that information.
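As a sketch, the relevant part of the Prometheus custom resource looks something like this (the label key/value is a placeholder for whatever the manifest actually uses):

```sh
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: opportunistic    # only ServiceMonitors carrying this label are scraped
  resources:
    requests:
      memory: 400Mi
EOF
```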

Note that this is a trivial example of deployment, with no persistent storage or any fancy thingy. If you are contemplating using this for a more production grade usage, you will need to spend some time on this.

OK, now you can connect to the UI and check that everything deployed correctly. It is pretty empty for now...


Installing the Custom Metrics Adapter

Now that we have the ability to collect metrics via our Prometheus pipeline, we want to expose them under the aggregated API.

First of all, you will need some certificates. Joy. This is all documented here. Run the following commands to generate your precious:

https://gist.github.com/405bf8028e99bce99bb1eee1576dd0e4

In order to authenticate our extended API server against the Kubernetes API Server, we have several options:

  • Using a client certificate
  • Using a Kubeconfig file
  • Using BasicAuth or Token authentication

Adding users with certificates in CDK is a project in itself and would deserve its own blog post. If you are interested, ping me in the comments and we can discuss it in DMs. BasicAuth and tokens are easy, but they also require editing /root/cdk/known_tokens.csv or /root/cdk/basic_auth.csv on all masters and restarting the API server daemon everywhere.

So the solution with the least complexity is actually the Kubeconfig file. Thanks to RBAC, the only thing we need to create a new user is a service account, which will give us access to an authentication token, which we can then put into our kubeconfig.

https://gist.github.com/ebe8e4fb6ca71907fab13e00d6e3c58a

You can then create a copy of your .kube/config file and edit the user section to add the custom-api-server:

https://gist.github.com/fb916f7f7481e9dd1133a301ee6d1d7b

Do not forget to also edit the contexts to map to this user instead of admin.
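A sketch of how to extract the token and wire it in, assuming the service account is called custom-api-server and lives in kube-system:

```sh
# Find the secret backing the service account, then decode its token.
SECRET=$(kubectl -n kube-system get serviceaccount custom-api-server \
  -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl -n kube-system get secret "$SECRET" \
  -o jsonpath='{.data.token}' | base64 --decode)

# Add the user to a copy of your kubeconfig; then edit the context in that
# copy so it maps to custom-api-server instead of admin, as described above.
cp ~/.kube/config custom-metrics.kubeconfig
kubectl --kubeconfig=custom-metrics.kubeconfig config set-credentials custom-api-server --token="$TOKEN"
```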

Now edit a cm-values.yaml file for the helm chart:

https://gist.github.com/1fe8613a730d65a6a57166892f8e3ad9

OK you are now ready to download the chart and install it

https://gist.github.com/7fa2058e2c91b51f159fb289ba7be7e3

And we check that the new API is registered in Kubernetes:

https://gist.github.com/3047a2195737c87e159b03417124f65e

Great. Now let us check that everything works properly by querying the K8s endpoint for it:

https://gist.github.com/5c0ad9ac3a931e60a126008311d4cc9a

At this point, if we get a 200 answer and NOT a 404, we are good: our custom metrics API is up and running. If you get a 404, then something did not work properly.
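The check itself is most likely a raw GET against the new group (the version is v1alpha1 on 1.8 and v1beta1 from 1.9 onwards):

```sh
# A 200 with a (possibly empty) resource list means the adapter is wired in.
kubectl get --raw /apis/custom.metrics.k8s.io/v1alpha1 | jq .
```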

Summary

In the long section above, we have done the following:

  1. Installed the new Metrics Server, adding the metrics.k8s.io API to the cluster. This gave us access to an equivalent of Heapster, but exposing metrics under a classic API.
  2. Installed a custom metrics pipeline to be able to collect arbitrary metrics. We did this via Prometheus, using the Operator to create a Prometheus instance.
  3. Installed the Custom Metrics API custom.metrics.k8s.io via the installation of a Prometheus adapter.

For each step, we validated that the cluster worked properly and as intended. Now we need to put it to good use.

Using Custom Metrics

Demo Application: http_requests

First of all, we will test our setup with a very simple application written by @luxas that exposes an http_requests metric on /metrics. You can deploy it with:

https://gist.github.com/738fb8431fbd1bc3e0cd88b54a20e3da

This manifest contains:

  • the Deployment and Service so we can query the application
  • a Service Monitor, which will indicate to the Prometheus instance that it should scrape the metrics of the application
  • a Horizontal Pod Autoscaler (HPA), which consumes the http_requests metric and uses it to scale the application.

Let us look into the HPA for a moment:

https://gist.github.com/b93e819dd3fd4194964a077265568590

As you can see, we have here

  • a Target (our deployment), with a minReplicas and a maxReplicas.
  • a metric of type Pods, which tries to make sure that each pod serves an average of 500m queries per second (slightly above the standard background load from Kubernetes + Prometheus)

This means an application does not need to be scaled on its own metrics: you could potentially target any application's metrics and use them to manage another application. A very powerful principle.

Let us say, for example, that you manage an application based on the principle of decoupled invocation, such as a chat or an order management solution. One day you start getting a peak of requests on the frontend, and the backend does not follow. The queue fills up, and you start experiencing delays in processing the requests. Well, now you can scale the workers that process the queue based on the requests made on the frontend: you reference the object that carries the http_requests metric (the frontend), while the scale target is your worker deployment. It is as simple as that.
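Here is a sketch of what such an HPA could look like, using an Object metric measured on the frontend Service to scale the worker Deployment (all names are made up for the example):

```sh
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: queue-workers
spec:
  scaleTargetRef:
    kind: Deployment
    name: worker             # the deployment we actually scale
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: frontend       # the object whose metric we watch
      metricName: http_requests
      targetValue: 500m      # desired requests per second measured on the frontend
EOF
```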

Now look at how the Custom Metrics API reacts to this (it may take a couple of minutes before this works):

https://gist.github.com/5b2d621dc36c7dc652dac8a67f4582af

And we can then query the service itself:

https://gist.github.com/f8c9150459d36dfff2e6a63d498dcc9e

And we then look at our HPA:

https://gist.github.com/59f008e98ef98854f45c10c921ac8cc9

We can see here that just the requests for status account for 866m. Now deploy a shell app so we can create some load:

https://gist.github.com/2a18fa1787a9018de563975ab186936d

Now prepare 2 shells. In the first one, connect into your container with

https://gist.github.com/515b33d01810145015dee24cec204c47

And in the second one, track the HPA with

https://gist.github.com/efad7d91010d4d53cbd16e7279a83a9b
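From the first shell, a simple loop is enough to generate load; something like this sketch (the service name is an assumption, adjust it to the manifest you deployed):

```sh
# Roughly 10 requests per second for 5 minutes against the demo service.
for i in $(seq 1 3000); do
  curl -s http://sample-metrics-app.default.svc.cluster.local/ > /dev/null
  sleep 0.1
done
```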

This is where Ronan's Python application comes in: it watches the cluster, computes the remaining capacity, and exposes it as a custom metric. In addition, he created a nice UI that presents the values in real time.

This requires a Grafana installation. You can install both apps with:

https://gist.github.com/9334cb37f8152e21a9c6f6c8d0d65c40

These manifests contain

  • Cluster Roles and bindings for collecting metrics
  • Deployments for both Grafana and the python application
  • Nodeports services on ports 30505 (app) and 30902 (Grafana)
  • Config maps to configure both

What is of interest to us in this example is the "cpu_capacity_remaining" metric. As mentioned in the intro, thanks to Kontron I had access to a 184-core cluster. I decided to "reserve" 30 cores, roughly 15% of my capacity, to leave room for load peaks. This gave me an autoscaler looking like this:

https://gist.github.com/6fe140aa88640b1625133094210bca69

You will note I am using Electroneum as my crypto. The reason for this is practical. It is a very new cryptocurrency, with limited mining resources allocated to it right now, which means you can directly measure your impact and see daily returns, which is cool for demos. In case you wonder, as this currency requires a Monero miner, this setup can easily be converted into something more lucrative by pointing it to a real monero pool.

To replicate this blog with your own machines, edit the src/manifest-etn.yaml file according to your own cluster, then deploy with:

https://gist.github.com/edaf158000e16fc6163de8bfa8f61b58

This manifest contains:

  • a Deployment of the miner
  • a Horizontal Pod Autoscaler as seen above
  • a service to expose the UI on port 30500 of nodes.

Now let us check on our HPAs with:

https://gist.github.com/79921139b13a74820c3a342a8082c8d4

Alright, we are all set! Now we can finally check how our application reacts to load.

Opportunistic Autoscaling in motion

In order to saturate our cluster, we reuse our shell-demo application and generate 10 hits per second on the API for 5 minutes. Because the HPA targets only 0.5 hits per second per pod, this will quickly trigger the scale-out:

https://gist.github.com/a37f5c09d1bac99bd09062f7b09a3080

There you go, we can see the new pods coming in. Each new pod requests 4 CPU cores from the cluster. This unbalances the miner's HPA, which counters by releasing miners. Over 5 minutes, our app scales up to 17 replicas, claiming 68 cores from the cluster, cores that are freed by the mining app. After 5 minutes, the load returns to normal and the simple app scales back down from 17 pods to its stable 2 replicas. The HPA for the miner reacts and starts harvesting the capacity again.

This can be seen in the UI on the CPU capacity graph:

[Figure: CPU capacity graph from the UI]

That's it: we have an application that opportunistically adjusts itself to the load created by other applications in the cluster.

Some thoughts about the HPA

While creating this blog, I had a really hard time configuring the HPA to make it stable and convergent rather than completely erratic. One must understand that the HPA in K8s is, so far, pretty dumb. It does not learn from the past; it systematically repeats the same reaction pattern regardless of whether it succeeded or failed before.

Let's say a custom metric is at 150% of its target value: the HPA will then scale the replica count to 150% of its current value. This means that if scaling your application by 1% moves the metric by 2%, you will enter a turbulence zone, with the HPA proving incapable of converging.
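For reference, the reconciliation rule the HPA applies at each iteration is roughly:

```
desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )
```

which only converges if the metric reacts less than proportionally to a change in replica count.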

For our application, this means the mining application was designed so that each replica consumes 1 CPU core, and the max value was set to the maximum we could use for this specific app. Other things were running there consuming ~30 cores, hence a maxReplicas of 150 ≈ 184 - 30.

Long story short: if you do this at home, be thoughtful about your HPA design, and do some experiments. If the HPA does not learn, you certainly should.

References

I would like to thank @Luxas and @DirectXMan12 for inspiring this work, and for the fantastic walkthroughs they wrote here and there that helped me a lot while writing this.
