Deploy Machine Learning Models as serverless functions in Kubernetes using Knative

Deploy BentoML services as serverless functions on Kubernetes using Knative

In this gist we are going to deploy a containerized BentoML service to Kubernetes as a serverless function using Knative.

Prerequisites

  1. A BentoML service that you have already tested locally. Refer to this gist for more information on how to create one as an example.
  2. A containerized BentoML service. Refer to this gist for more information on how to containerize existing BentoML services.
  3. A virtual machine or bare-metal server with an Ubuntu/Debian-based OS and an NVIDIA CUDA-enabled GPU that you can use to deploy Kubernetes and test this on.

I'm doing this on a small desktop I have at home. It has an old GTX 1660 with 6 GB of VRAM. Since the model we are loading is only about 600 MB, this system is enough to run our prompt-engineering service (detailed in the gist from step 2 of the prerequisites).

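Before creating the cluster, it's worth a quick sanity check that the host actually sees the GPU (this assumes the NVIDIA driver is already installed on the host):

# Confirm the driver and the GPU are visible on the host
nvidia-smi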

Overview

We are going to:

  1. Create a Kubernetes cluster
  2. Prepare the cluster by enabling various add-ons: the Metrics Server, MetalLB (for bare-metal load balancing), the NVIDIA GPU Operator, and Knative
  3. Deploy the containerized BentoML application as a serverless function using Knative Serving.

Let's get started.

Step 1: Create a Kubernetes cluster

You can choose any installer/distribution of your choice. I am going to use MicroK8s, a zero-ops, CNCF-certified Kubernetes installer from Canonical.

# Install Microk8s from the snap store
sudo snap install microk8s --classic --channel=1.30

# Add your user to the microk8s group and prepare the kubeconfig directory
sudo usermod -a -G microk8s $USER
mkdir -p ~/.kube
chmod 0700 ~/.kube

# Re-enter the session so the new group membership takes effect
newgrp microk8s

# Wait for the K8s cluster to be ready. Takes less than 1 min
microk8s status --wait-ready

# Alias kubectl
alias kubectl="microk8s kubectl"
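# Optionally persist the alias for future shells (plain convenience; adjust for your shell)
echo 'alias kubectl="microk8s kubectl"' >> ~/.bashrc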

# Check for pods and nodes
kubectl get po,nodes -A

And voila, you have a K8s cluster up and running.

Step 2: Prepare K8s Cluster

# Install K8s metrics server
sudo microk8s enable metrics-server

# This is an optional step. Since I am deploying on a bare-metal server, there is no load balancer. If you are deploying on a cloud provider, this step is not needed.
# Provide a private IP range that doesn't collide with your existing router settings.
sudo microk8s enable metallb
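# MicroK8s also accepts the address pool inline if you want to skip the interactive prompt
# (hypothetical example range; substitute one that is free on your LAN):
# sudo microk8s enable metallb:192.168.1.240-192.168.1.250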

# Install the NVIDIA GPU operator - this will take a few minutes. Wait to make sure
# all NVIDIA GPU operator resources are running/completed.
sudo microk8s enable nvidia

# Install Knative
# Refer documentation for more info:
# https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#verifying-image-signatures
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.1/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.1/serving-core.yaml

# Install networking Layer
# We are going to use Kourier since it's the most lightweight option. You can choose to install Istio or Contour if you'd like.
# Follow the documentation for more information
kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.14.0/kourier.yaml

kubectl patch configmap/config-network \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

# Fetch the external load balancer IP (in this case an IP that MetalLB has provisioned for the Kourier ingress service)
kubectl --namespace kourier-system get service kourier
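# The EXTERNAL-IP column should show an address from the MetalLB pool configured earlier;
# if it stays <pending>, the load balancer has no addresses to hand out.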

# Configure DNS: https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns
# Since in this gist we only care about local testing of the serverless function, we are not going to worry
# about DNS or exposing services outside the cluster, so we set up Magic DNS (sslip.io).
# If you need a routable hostname, simply use the "Real DNS" option from the documentation instead.
# You can additionally use something like Cloudflare Tunnels to securely expose your services without any external
# load balancers.
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.1/serving-default-domain.yaml
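
If you later switch to the "Real DNS" option, the Knative docs configure it by patching the config-domain ConfigMap. A minimal sketch, assuming example.com is a domain you control and have pointed at the Kourier load-balancer IP:

# Mint Knative service hostnames under your own domain instead of sslip.io
kubectl patch configmap/config-domain \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"example.com":""}}'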

With all of the above completed, verify the installation:

kubectl get pods -n knative-serving
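
If you'd rather block until everything is up, the same check can be expressed as a wait (a generic kubectl convenience, not from the Knative docs):

# Block until every Knative Serving pod reports Ready (times out after 5 minutes)
kubectl wait --for=condition=Ready pod --all -n knative-serving --timeout=300s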

With this we have everything we need to deploy our BentoML service as a serverless function. Let's get on with the next step.

Step 3: Deploy the containerized BentoML application as a serverless function using Knative Serving

In this step we are going to define a Knative Service to take advantage of Knative's serverless Serving capabilities. To do so, first create a knative-serving.yaml file in the root of your containerized BentoML application code, copy over the following contents, and make any necessary changes to the image name:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: prompt-enhancer
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: prompt-enhancer-bentoml
        image: <private-repository>/<image-name>:<tag>
        imagePullPolicy: Always
        ports:
        - containerPort: 3000 # Port to route to
        resources:
          limits:
            nvidia.com/gpu: 1
        livenessProbe:
          httpGet:
            path: /healthz
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
          initialDelaySeconds: 15
          periodSeconds: 5
          failureThreshold: 3
          timeoutSeconds: 60

The above uses the container we created in the BentoML build steps to define a serverless function. But before we deploy it, let's make one small addition so that Knative knows about our private Docker registry credentials and can use them to pull private images; without this, the deployment would fail.

# Create the Docker Credentials as a Kubernetes Secret
kubectl create secret docker-registry regcred \
  --docker-server=<private-registry-url> \
  --docker-email=<private-registry-email> \
  --docker-username=<private-registry-user> \
  --docker-password=<private-registry-password>
  
# Patch the default service account so that it uses this new credential
kubectl patch serviceaccount default -p "{\"imagePullSecrets\": [{\"name\": \"regcred\"}]}"
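
To double-check that the patch landed, you can inspect the service account (purely a verification step):

# The default service account should now list regcred under imagePullSecrets
kubectl get serviceaccount default -o yaml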

At this point we are ready to deploy our serverless function. Let's go ahead and do that.

kubectl apply -f knative-serving.yaml

Give it a few minutes. If your models are large, the first deployment will take a while as Kubernetes pulls the image. This is mostly a one-time cost: as long as your container's base doesn't change much and only your application logic does, only the new image layers are downloaded from your container registry on subsequent deployments.
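
While you wait, you can watch the pull and startup progress (this just complements the status check below):

# Watch the pods come up; the first start is dominated by the image pull
kubectl get pods -w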


Check its status periodically:

kubectl get ksvc,po


Once the ksvc reports READY=True, you should be able to start calling your function. Just open a browser and navigate to the URL shown in the kubectl get ksvc output.
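
If you prefer the terminal, here's a hedged sketch using the service name from the manifest above (prompt-enhancer) and the /healthz path the probes already use:

# Grab the URL Knative assigned to the service
URL=$(kubectl get ksvc prompt-enhancer -o jsonpath='{.status.url}')

# Hit the health endpoint; this also wakes the service if it has scaled to zero
curl "$URL/healthz"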

Note: Don't be alarmed when you don't see the pods. They automatically scale down to zero, which is one of the benefits of a serverless function. Start-up times are very fast, so don't worry about calling the function: hit the endpoint and you will see the pods starting up right away.
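
If scale-to-zero is not what you want (for example, to avoid re-loading the model on every cold start), one option is Knative's per-revision autoscaling annotations. A minimal sketch, assuming the same service name as above; the value of 1 is illustrative, not tuned:

# Keep one replica warm by setting a minimum scale on the revision template
# (this creates a new revision of the service)
kubectl patch ksvc prompt-enhancer --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/min-scale":"1"}}}}}'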

This deployment is a basic example. Beyond the brief min-scale hint above, I have not covered Knative autoscaling, metrics collection through Prometheus, or shipping logs to persistent storage so you can see logs across all your pods' lifecycles in one place.
