Automated Model Deployment with BentoML and Kubeflow

One of the things that I've been dissatisfied with so far in our current workflow is that model deployment is not as automated as it could be. This is partly because with model deployment there are several things to consider, such as:

  1. How the model was built which affects how the model would be served
  2. How the served model would be consumed
  3. How to scale workloads
  4. How to monitor the service and implement logging
  5. How the model can be retrained and automatically deployed if it performs better

I've been experimenting with BentoML and Yatai for a few weeks now intending to come up with a proof-of-concept that would be able to address the above points. I've already covered my initial impressions of BentoML/Yatai in a previous post.

While the POC that I will be presenting is by no means complete, I think it proves that BentoML is indeed a very compelling solution to address most of the pain points when it comes to model deployment.

Note that your tech stack will most likely look different. That doesn't matter, because you should be able to find an analogue to what you're already using, except of course for BentoML/Yatai!

Brief Outline

Here's a breakdown of how I approached solving the problem:

  1. Model training with Kubeflow and triggering CI
  2. Building Bentos and Deploying to Yatai via CI
  3. Serving, Monitoring, and Logging with Yatai

Step 1: Model Training with Kubeflow and Triggering CI

This is what the pipeline looks like. Yes, your pipeline would most definitely be more sophisticated, but this is purely for pedagogical purposes. Just imagine that all the data preparation, transformation, model training, evaluation, etc. steps are all wrapped up in the short Train model step.

In the following sections, I'll describe the pertinent parts of each component.

Model Training with Kubeflow

The prerequisite to having automated model deployment is to automate model training in the first place. In our case, this means having a model training pipeline built using Kubeflow. This ensures that model training is repeatable and reproducible. Once the model is trained successfully, it gets checked into a model registry. For this, we're using MLFlow. In addition to getting the model registered, metrics such as accuracy, precision, etc. are also automatically logged.
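The training step itself isn't shown here, but a minimal sketch of what it might log to MLFlow looks roughly like this (the classifier and toy data are stand-ins for the real pipeline):

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

# Stand-in data; in the real pipeline this comes from the upstream preparation steps.
X, y = make_classification(n_samples=1000, n_features=24, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run() as run:
    model = RandomForestClassifier().fit(X_train, y_train)
    preds = model.predict(X_test)
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("precision", precision_score(y_test, preds))
    mlflow.sklearn.log_model(model, artifact_path="model")
    print(run.info.run_id)  # the run_id that gets handed to the CI trigger later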

MLFlow is an important "glue" piece. Registering a model via MLFlow means that we'll be able to retrieve it later on via a unique URL such as:

model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")

where run_id is a unique identifier like 8aa4630a023c4e94b22c6e738127ee7a. This run_id is important because it is what we'll be passing on to the CI. But before that, we'll need to trigger the CI in the first place. If you are not using MLFlow, then even something like storing to S3/GCS could work, as long as you'd be able to download the model via a unique URL later on.

Triggering the CI

The BentoML commands to build the model, push it to the BentoML model registry, and deploy the model via Yatai are all done via CI. I'll cover the details of the command next, but the main thing to note here is that the CI is triggered after a successful model training and that we have the MLFlow Run ID.

In GitLab CI's case, we can trigger a CI job using curl. Note how we're passing in the MLFlow Run ID which would be available as an environment variable when the job executes:

curl --request POST \
  --form token=<token> \
  --form ref=<branch> \
  --form "variables[MLFLOW_RUN_ID]=8aa4630a023c4e94b22c6e738127ee7a" \
  "https://gitlab.com/api/v4/projects/<project-id>/trigger/pipeline"

This is pretty simple to turn into Python code. If you're lazy like me, you can paste your curl command into something like https://curlconverter.com/ and it would helpfully generate the corresponding Python code. I did this so that I could quickly create a Kubeflow component:

def trigger_ci(mlflow_run_id: str, branch: str = "main"):
    import requests

    files = {
        "token": (None, "glptt-8c4769c1b5e6b6ab41858a1f8ca5ccb9b9ba4a68"),
        "ref": (None, branch),
        "variables[MLFLOW_RUN_ID]": (None, mlflow_run_id),
    }

    response = requests.post(
        "https://gitlab.com/api/v4/projects/31589964/trigger/pipeline", files=files
    )

    return response.text
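To actually drop this into the pipeline, the function can be wrapped as a lightweight component. A sketch of that wiring might look like the following; the base image, package list, and pipeline name are my own assumptions, not taken from the original pipeline:

import kfp
import kfp.components as comp

# Wrap the trigger_ci function defined above as a Kubeflow component.
trigger_ci_op = comp.create_component_from_func(
    trigger_ci,
    base_image="python:3.8",
    packages_to_install=["requests"],
)

@kfp.dsl.pipeline(name="train-and-trigger-ci")
def pipeline(mlflow_run_id: str):
    # In the real pipeline, mlflow_run_id would be the output of the training step.
    trigger_ci_op(mlflow_run_id=mlflow_run_id)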

Step 2: Building Bentos and Deploying to Yatai via CI

Here comes the fun part! There are a few pieces to this, but let's start with the .gitlab-ci.yml file, which contains all the relevant bentoml commands:

bentoml-build-and-push:  
  image: "python:3.8"  
  stage: bentoml  
  script:  
    - apt-get update -y  
    - apt-get install -y jq  
    - pip install --upgrade pip  
    - pip install -r $CI_PROJECT_DIR/bento/requirements.txt  
    - export BENTO_MODEL="aml-case"  
    - python $CI_PROJECT_DIR/bento/save_model_to_bentoml.py --model_name ${BENTO_MODEL} --run_id $MLFLOW_RUN_ID  
    - bentoml build -f $CI_PROJECT_DIR/bento/bentofile.yaml  
    - bentoml yatai login --api-token $YATAI_API_TOKEN --endpoint http://yatai.yatai-system.svc.cluster.local  
    - export BENTO_DEPLOYMENT_NAME="${BENTO_MODEL}-${CI_COMMIT_SHORT_SHA}"  
    - bentoml push ${BENTO_MODEL}  
    - export BENTO_MODEL_AND_TAG=`bentoml list ${BENTO_MODEL} -o json | jq '.[0]["tag"]'`  
    - cp $CI_PROJECT_DIR/bento/deployment.yaml ./deployment.yaml  
    - 'sed -i "s/BENTO_MODEL_AND_TAG/$BENTO_MODEL_AND_TAG/g" ./deployment.yaml'  
    - 'sed -i "s/BENTO_DEPLOYMENT_NAME/$BENTO_DEPLOYMENT_NAME/g" ./deployment.yaml'  
    - 'sed -i "s/BENTO_MODEL/$BENTO_MODEL/g" ./deployment.yaml'  
  artifacts:  
    paths:  
      - deployment.yaml  
  only:  
    - triggers  
  
  
bentoml-deploy:  
  image: bitnami/kubectl:latest  
  stage: bentoml  
  script:  
    - kubectl create -f deployment.yaml  
  needs: ["bentoml-build-and-push"]  
  only:  
    - triggers

I'll go through what each of the commands does. The first four commands install jq, which we'll need for JSON manipulation later on, along with the dependencies needed for the Bento Service. This includes things like scikit-learn and mlflow.

Next, we execute save_model_to_bentoml.py, passing along the model_name and the all-important run_id which, if you recall, we passed in earlier when triggering the CI.

Registering the Model into the BentoML Registry

import argparse

import bentoml
import mlflow
import numpy as np
import pandas as pd


def main(model_name: str, run_id: str):
    # Load the trained model from MLFlow using the run_id passed in by the CI trigger.
    model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")

    # Save it into the BentoML model registry.
    saved_model = bentoml.picklable_model.save_model(
        name=model_name, model=model, signatures={"predict_proba": {"batchable": True}}
    )

    # Sanity check: load the model back from the BentoML registry and run it on random input.
    runner = bentoml.picklable_model.get(f"{model_name}:latest").to_runner()
    runner.init_local()

    EXAMPLE_INPUT = np.random.rand(4, 24)
    data = pd.DataFrame(EXAMPLE_INPUT)
    predicted_probs = [x[1] for x in runner.predict_proba.run(data)]

    print(predicted_probs)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name")
    parser.add_argument("--run_id")
    args = parser.parse_args()
    main(model_name=args.model_name, run_id=args.run_id)

The real meat of this script is in the first two statements:

model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")

saved_model = bentoml.picklable_model.save_model(
    name=model_name, model=model, signatures={"predict_proba": {"batchable": True}}
)

Here we load the model from MLFlow using the run_id, then save the model into the BentoML model registry with the model_name that we passed in. Now once we have the model in the BentoML registry, we can start building the Bento Service.

The rest of the script performs a sanity check by making sure that the loaded model in BentoML can work with some random input.

Building the Bento Service

The bentofile.yaml is pretty simple:

service: "bento.service:svc"  
labels:  
  owner: ml-engineering  
  stage: sandbox  
include:  
  - "*.py"  
python:  
  requirements_txt: "./bento/requirements.txt"

bentoml build -f $CI_PROJECT_DIR/bento/bentofile.yaml builds the Bento Service which basically packages everything into a single artifact that we can use to deploy the model later on.
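The service module that bentofile.yaml points at (bento.service:svc) isn't shown in this gist, but a minimal bento/service.py for this setup might look roughly like the following; the endpoint name and output shape are assumptions on my part:

# bento/service.py (sketch)
import bentoml
import numpy as np
import pandas as pd
from bentoml.io import JSON, NumpyNdarray

# "aml-case" matches the BENTO_MODEL exported in the CI job above.
runner = bentoml.picklable_model.get("aml-case:latest").to_runner()

svc = bentoml.Service("aml-case", runners=[runner])

@svc.api(input=NumpyNdarray(), output=JSON())
async def predict_proba(input_arr: np.ndarray) -> dict:
    # Mirror the sanity check: return the probability of the positive class.
    probs = await runner.predict_proba.async_run(pd.DataFrame(input_arr))
    return {"probabilities": [float(row[1]) for row in probs]}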

Deploying the Model via Yatai

Think of Yatai as a central repository for your Bentos. Before we can deploy to Yatai, we'll need to log in to the cluster with the API token:

bentoml yatai login --api-token $YATAI_API_TOKEN --endpoint http://yatai.yatai-system.svc.cluster.local 

Then, you'll need to push the Bento to Yatai:

bentoml push ${BENTO_MODEL}

Here's how it looks in GitLab CI:

Once successful, Yatai will start containerizing the Bento. While that is happening, we can prepare the Bento Deployment.

Preparing the deployment.yaml

The YAML looks like this, with some placeholders so that we can perform string substitution.

apiVersion: serving.yatai.ai/v1alpha2
kind: BentoDeployment
metadata:
  name: BENTO_DEPLOYMENT_NAME
  namespace: ds-models
spec:
  bento_tag: BENTO_MODEL_AND_TAG
  ingress:
    enabled: true
  resources:
    limits:
      cpu: "1"
      memory: "1Gi"
    requests:
      cpu: "500m"
      memory: "512Mi"
  runners:
  - name: BENTO_MODEL
    resources:
      limits:
        cpu: "1"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "512Mi"

We name the deployment using the model name and the Git short SHA, which makes it predictable and also allows us to easily match which deployment came from which Git commit. We then also extract the Bento tag using bentoml list because that would be the Docker image that the BentoDeployment would use. For all these we can use sed and jq:

export BENTO_DEPLOYMENT_NAME="${BENTO_MODEL}-${CI_COMMIT_SHORT_SHA}"
export BENTO_MODEL_AND_TAG=`bentoml list ${BENTO_MODEL} -o json | jq '.[0]["tag"]'`
cp $CI_PROJECT_DIR/bento/deployment.yaml ./deployment.yaml
sed -i "s/BENTO_MODEL_AND_TAG/$BENTO_MODEL_AND_TAG/g" ./deployment.yaml
sed -i "s/BENTO_DEPLOYMENT_NAME/$BENTO_DEPLOYMENT_NAME/g" ./deployment.yaml
sed -i "s/BENTO_MODEL/$BENTO_MODEL/g" ./deployment.yaml

Note here that we are making a copy of the deployment.yaml file (on the third line) because we are going to pass this copy down to the next (and final) step of the CI pipeline.

Deploying the BentoDeployment

Well, the only thing left to do here is to execute kubectl create -f deployment.yaml:

bentoml-deploy:  
  image: bitnami/kubectl:latest  
  stage: bentoml  
  script:  
    - kubectl create -f deployment.yaml  
  needs: ["bentoml-build-and-push"]  
  only:  
    - triggers

Here's how it looks in GitLab CI:

Doesn't this give you warm fuzzy feelings when automation works the way it should? :D. Once this is completed, you'd be able to check the progress in the Yatai UI.

Step 3: Serving, Monitoring, and Logging with Yatai

Let's talk about monitoring and logging with Yatai first, because this tripped me up when I was new to Yatai. It turns out that in order to get monitoring and logging working, you'll need to manually enable them first. To do that, navigate to Clusters in the left menu, click on the cluster (usually called default), then click on the Yatai components menu item. At first you should only see Deployment. Click on Create and select Monitoring in the dropdown. Repeat the process for Logging. Wait a bit and this is what you should see:

Once you have both components enabled, every model you deploy from then on comes with monitoring (via Grafana) and logging (via Loki).

Finally, the payoff! Click on the URL and you'll be brought to a nice-looking Swagger page where you can play with the API to your heart's content:
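For a quick smoke test outside the Swagger UI, and assuming the service exposes a predict_proba endpoint that accepts a JSON array with 24 features per row (as in the sketch earlier), something like the following should work; the host is a placeholder:

curl -X POST "http://<deployment-url>/predict_proba" \
  -H "Content-Type: application/json" \
  -d "$(python3 -c 'import json, random; print(json.dumps([[random.random() for _ in range(24)]]))')"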

What About Model Retraining?

So far, what I've covered addresses the first four points, but not the fifth: model retraining. What happens now is that each time the model retraining pipeline runs, the resulting model gets deployed as a REST API. What is missing here is a "gatekeeper" component that checks whether the model deserves to be deployed in the first place.

Firstly, we can simply schedule the Kubeflow pipeline to run, say, every week. Kubeflow has recurring runs just for this. You don't really have to wait for your drift detection alerts to go off before you start retraining. The underlying assumption here is that when the pipeline executes, fresh data would be available for the model to be re-trained on.
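As a sketch, setting up such a recurring run with the KFP SDK could look something like this (the IDs and cron schedule are placeholders):

import kfp

client = kfp.Client()

# Kubeflow's cron format includes a seconds field; this fires every Monday at 00:00.
client.create_recurring_run(
    experiment_id="<experiment-id>",
    job_name="weekly-model-retraining",
    cron_expression="0 0 0 * * 1",
    pipeline_id="<pipeline-id>",
)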

The second thing you'll need is evaluation metrics to compare a candidate model and the one you have already in production. Obviously, you need to be confident in this set of evaluation metrics because they will decide whether the model gets promoted or not.

One way to do this would be to plug a component right before the Trigger CI component and run the evaluation tests against the deployed model before deciding whether the Trigger CI component should execute.
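As a hypothetical sketch, such a gatekeeper could be little more than a function that compares the candidate run's metric against the production run's before letting the Trigger CI component execute; the metric name and how the production run is looked up are assumptions:

import mlflow

def should_deploy(candidate_run_id: str, production_run_id: str, metric: str = "accuracy") -> bool:
    """Only allow the Trigger CI step to run if the candidate beats production."""
    client = mlflow.tracking.MlflowClient()
    candidate_score = client.get_run(candidate_run_id).data.metrics.get(metric, 0.0)
    production_score = client.get_run(production_run_id).data.metrics.get(metric, 0.0)
    return candidate_score > production_score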

While I haven't implemented this yet, I intend to test out what model retraining looks like first by scheduling a model pipeline run every week and having its evaluation metrics posted via Slack. This would give me confidence that:

a) Model retraining is executing
b) Evaluation metrics are computed ...
c) ... and compared against a deployed model

Once I'm confident, we can then use the evaluation metrics to automatically switch out the production models for the newly trained and better-performing ones. However, that would require a bit more thought and design too! One way would be to simply switch the service in the Ingress over to the best-performing model.

For example, say we have the following ingress:

apiVersion: networking.k8s.io/v1  
kind: Ingress  
metadata:  
  name: aml  
  namespace: ds-models  
spec:  
  rules:  
  - host: aml-case.ds-models.dev.jago.data  
    http:  
      paths:  
      - backend:  
          service:  
            name: "aml-123"  
            port:  
              number: 3000  
        path: /  
        pathType: ImplementationSpecific

In this case, the Ingress points to the aml-123 service. However, you can imagine that pointing to a new version would mean swapping out aml-123 for the new one. This could either be done manually or again via a CI trigger.
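As a sketch of the swap itself, a JSON patch via kubectl would do it; aml-456 here is a hypothetical service belonging to the newly promoted deployment:

kubectl -n ds-models patch ingress aml --type=json \
  -p='[{"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "aml-456"}]'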

Conclusion

Overall, I'm quite pleased with the POC. By using MLFlow, I can register almost any kind of model I want. And with BentoML I can load back any pickle-able model, and it also supports models from all the popular ML libraries/frameworks. It is worth noting that BentoML supports MLFlow too but I was trying to go for a more generalizable solution here.
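For completeness, if you did want to lean on that integration instead of the picklable_model route, BentoML can import a model straight from an MLFlow run URI. A rough sketch, untested in this setup:

import bentoml

run_id = "8aa4630a023c4e94b22c6e738127ee7a"  # example run_id from earlier

# Imports the MLFlow-logged model directly into the BentoML model store.
bento_model = bentoml.mlflow.import_model("aml-case", model_uri=f"runs:/{run_id}/model")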

And hey, if automation gives you warm and fuzzy feelings too, why not consider joining us?
