Note
This tutorial is largely based on the following ROCm blog post:
Triton Inference Server with vLLM on AMD GPUs
It includes a few tweaks and fixes to enable running across three MI300X GPUs.