sa-faizal / `triton_server_working_with_vllm.md` (created November 12, 2025)
# Triton Inference Server w/ vLLM on AMD GPUs

> **Note**
>
> This tutorial is largely based on the ROCm blog post *Triton Inference Server with vLLM on AMD GPUs*, with a couple of tweaks and fixes to enable it to run across three MI300X GPUs.

## Table of Contents