Aishwarya Goel (A.G.I), Inferless (GitHub: AGInfer)

pip install llama-cpp-python
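Once installed, llama-cpp-python exposes a `Llama` class for running GGUF models locally. A minimal sketch, where `MODEL_PATH` is a placeholder that must point at a real GGUF file on disk:

```python
# Minimal sketch of local inference with llama-cpp-python.
MODEL_PATH = "models/model.Q4_K_M.gguf"  # hypothetical path to a GGUF file

def generate(prompt: str, max_tokens: int = 64) -> str:
    # Imported lazily so the sketch stays importable without a model present.
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
    out = llm(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"]

# Usage (requires a GGUF model at MODEL_PATH):
# print(generate("Q: What is llama.cpp? A:"))
```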
curl --location --request POST 'http://localhost:8000/v2/repository/models/nvidia-triton-llm-streaming/load'
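The same explicit-load request can be issued from Python with the standard library; a sketch assuming the host, port, and model name from the curl command above:

```python
import urllib.request

TRITON_URL = "http://localhost:8000"  # assumes Triton's default HTTP port
MODEL_NAME = "nvidia-triton-llm-streaming"

def load_model_url(base: str, model: str) -> str:
    # Triton model-control endpoint: POST /v2/repository/models/<name>/load
    return f"{base}/v2/repository/models/{model}/load"

def load_model(base: str = TRITON_URL, model: str = MODEL_NAME) -> int:
    req = urllib.request.Request(load_model_url(base, model), data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # 200 when the load succeeds

# Usage (requires a running Triton server):
# print(load_model())
```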
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ~/model_repo:/models nvcr.io/nvidia/tritonserver:23.11-py3 tritonserver --model-repository=/models --model-control-mode=explicit
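With `--model-control-mode=explicit`, Triton starts with no models loaded and serves whatever the mounted `~/model_repo` contains on request. The repository must follow Triton's standard layout; a hypothetical tree for the model name used in the load request above (the `model.py` backend choice is an assumption):

```
model_repo/
└── nvidia-triton-llm-streaming/
    ├── config.pbtxt        # model configuration
    └── 1/                  # numeric version directory
        └── model.py        # model file (Python backend assumed here)
```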
pip install "autoawq==0.1.8"
pip install "torch==2.1.2"
AGInfer / nv2
Last active May 31, 2024 17:14
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ~/model_repo:/models nvcr.io/nvidia/tritonserver:23.11-py3 tritonserver --model-repository=/models --model-control-mode=explicit
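Before issuing load requests against a freshly started container, it helps to poll Triton's readiness endpoint. A sketch assuming the port mapping from the docker command above (`-p8000:8000`):

```python
# Check Triton's HTTP health endpoint before loading models.
import urllib.request

def ready_url(base: str = "http://localhost:8000") -> str:
    # GET /v2/health/ready returns 200 once the server accepts requests
    return f"{base}/v2/health/ready"

def is_ready(base: str = "http://localhost:8000") -> bool:
    try:
        with urllib.request.urlopen(ready_url(base), timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

# Usage:
# print(is_ready())
```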
"autoawq==0.1.8"
"torch==2.1.2"