Last active October 11, 2022 08:45
Triton Ensemble
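# Convert a Hugging Face model to ONNX and TensorRT.
# CONVERT_IMAGE is a placeholder: set it to a Docker image that provides convert_model.
# --seq-len gives the minimum, optimal, and maximum sequence lengths.
docker run -it --rm --gpus all \
  -v "$PWD":/project "$CONVERT_IMAGE" \
  bash -c "cd /project && \
    convert_model -m \"philschmid/MiniLM-L6-H384-uncased-sst2\" \
    --backend tensorrt onnx \
    --seq-len 16 128 128"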
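# The conversion writes a triton_models/ folder,
# which we can now serve using Triton.
# TRITON_IMAGE is a placeholder: set it to a Triton Inference Server image.
# Ports: 8000 is HTTP, 8001 is gRPC, 8002 is metrics.
docker run -it --rm --gpus all -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 256m \
  -v "$PWD/triton_models":/models "$TRITON_IMAGE" \
  bash -c "pip install transformers && tritonserver --model-repository=/models"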
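
Once the server container is running, it can help to wait until Triton reports ready before sending requests. The sketch below polls Triton's HTTP readiness endpoint (`/v2/health/ready`) on the HTTP port published above. The function name `wait_for_triton`, its arguments, and the default retry values are hypothetical choices for illustration, and it assumes `curl` is installed on the host.

```shell
# Poll Triton's HTTP readiness endpoint until it answers with a success status.
# Arguments (all optional, hypothetical defaults):
#   $1 base URL of the Triton HTTP endpoint (default: http://localhost:8000)
#   $2 maximum number of attempts           (default: 30)
#   $3 delay between attempts, in seconds   (default: 2)
wait_for_triton() {
  local base_url="${1:-http://localhost:8000}"
  local max_attempts="${2:-30}"
  local delay_seconds="${3:-2}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error statuses,
    # so the branch is taken only when the server reports ready.
    if curl --silent --fail --output /dev/null "${base_url}/v2/health/ready"; then
      echo "Triton is ready after ${attempt} attempt(s)"
      return 0
    fi
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done

  echo "Triton did not become ready after ${max_attempts} attempts" >&2
  return 1
}
```

For example, `wait_for_triton http://localhost:8000 30 2 && echo "safe to send requests"` waits up to roughly a minute before giving up.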