In this gist we are going to deploy a containerized BentoML service to Kubernetes as a serverless function using Knative.
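For orientation, the end state of this deployment is a Knative Service manifest roughly like the sketch below. This is an assumption-laden sketch, not the exact manifest used later: the service name and image are placeholders, and 3000 is BentoML's default HTTP serving port.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: bentoml-service                # placeholder name
spec:
  template:
    spec:
      containers:
        - image: docker.io/youruser/your-bento:latest  # placeholder: your containerized Bento image
          ports:
            - containerPort: 3000      # BentoML's default HTTP port
          resources:
            limits:
              nvidia.com/gpu: "1"      # request one GPU for the model
```

Knative scales this service down to zero when idle and back up on request, which is what makes the deployment "serverless".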
You will need:
- A BentoML service that you have already tested locally. Refer to this gist for an example of how to create one.
- A containerized BentoML service. Refer to this gist for more information on how to containerize an existing BentoML service.
- A virtual machine or bare-metal server running an Ubuntu/Debian-based OS, with a CUDA-capable NVIDIA GPU, on which you can deploy Kubernetes and test this.
I'm doing this on a small desktop I have at home, which has an old GTX 1660 with 6 GB of VRAM. Since the model we are loading is only 600 MB, this system is enough to run our Prompt Engineering service (detailed in step