- Export the model as model.pkl
- Deploy to Baseten:

```python
import baseten

baseten.deploy_custom(
    model_name='FastAI demo',
    model_class='FastaiModel',
    model_files=['fai_model.py', 'model.pkl'],
)
```
```html
<!DOCTYPE html>
<meta charset="utf-8">
<style>
path {
  cursor: pointer;
  fill: #eee;
  stroke: #666;
  stroke-width: 1.5px;
}
```
```html
<meta charset="utf-8">
<style>
body {
  background-color: #77f2ff;
}
circle {
  fill: none;
  stroke: white;
```
```javascript
import http from "k6/http";
import { group, check } from "k6";

const BASE_URL = "https://app.baseten.stability.ai";
let accept = "application/json";

export const options = {
  tags: {
    host: "https://app.baseten.stability.ai",
  },
  scenarios: {
```
```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
```
This benchmark is designed to simulate how our users typically load and run models on Baseten, without additional optimisations. The goal is to establish a baseline for how users will experience Baseten.
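In practice, a baseline run boils down to timing repeated model calls and reporting latency percentiles. A minimal sketch of that measurement loop; the helper names here are ours, not part of any Baseten or k6 tooling:

```python
import statistics
import time


def measure_latencies(call, n=100):
    """Time n invocations of `call` and return latencies in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(latencies):
    """Compute the summary stats a baseline report would include."""
    ordered = sorted(latencies)

    def pct(p):
        # Nearest-rank percentile over the sorted sample.
        return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]

    return {
        "p50_ms": pct(50),
        "p90_ms": pct(90),
        "p99_ms": pct(99),
        "mean_ms": statistics.fmean(latencies),
    }
```

In the real benchmark `call` would be an HTTP request to the deployed model's predict endpoint; the k6 script below plays that role at load.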
Create a g2-standard-8 VM running the ubuntu-2004-focal-v20230715 image with a 200 GB NVMe disk. This machine has 1x NVIDIA L4 GPU. SSH into the VM and run the following commands to install CUDA:
```shell
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
```

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: no-nvidia-unattended-upgrades
  namespace: gpu-operator
spec:
  selector:
    matchLabels:
      name: no-nvidia-unattended-upgrades
  template:
```