
@danielrosehill
Created November 23, 2025 20:06
AMD GPU Acceleration Technologies Explained: ROCm, HIP, Vulkan, OpenCL & More (2025)

A comprehensive guide to understanding the different GPU acceleration technologies you'll encounter in the AMD/ROCm ecosystem, demystifying terms like HIP, Vulkan, OpenCL, MIOpen, and more.


Table of Contents

  1. The ROCm Stack Overview
  2. Core Technologies
  3. Programming Models
  4. Math & Deep Learning Libraries
  5. Cross-Platform Standards
  6. Platform-Specific Solutions
  7. Comparison Tables
  8. Which Should You Use?

The ROCm Stack Overview

ROCm (Radeon Open Compute) is AMD's open-source software platform for GPU-accelerated computing. Think of it as AMD's answer to NVIDIA's CUDA ecosystem.

┌─────────────────────────────────────────────────┐
│        Applications & Frameworks                │
│   (PyTorch, TensorFlow, ONNX Runtime, etc.)    │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│          High-Level Libraries                   │
│  (MIOpen, MIGraphX, rocBLAS, hipBLAS, etc.)    │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│         Programming Layers                      │
│      (HIP, OpenCL, Vulkan, SYCL)               │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│           ROCm Runtime                          │
│   (GPU drivers, kernel modules, firmware)       │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│            AMD GPU Hardware                     │
│    (Radeon RX, Instinct MI, Ryzen AI)          │
└─────────────────────────────────────────────────┘

Key Point: ROCm is the entire platform, not just one component. When you see "ROCm support," it means support for AMD's full GPU compute stack.


Core Technologies

1. HIP (Heterogeneous-compute Interface for Portability)

What it is: HIP is a C++ dialect and runtime API that allows developers to write portable GPU code that works on both AMD and NVIDIA hardware.

Think of it as: A translation layer between CUDA and ROCm.

Key Features:

  • Allows easy porting of CUDA code to AMD GPUs
  • Single codebase can target both AMD (via ROCm) and NVIDIA (via CUDA)
  • Syntactically similar to CUDA (often just find-and-replace changes)
  • Used by PyTorch ROCm (reuses torch.cuda interfaces)

Example Conversion:

// CUDA code
cudaMalloc(&d_array, size);
cudaMemcpy(d_array, h_array, size, cudaMemcpyHostToDevice);

// HIP code (nearly identical)
hipMalloc(&d_array, size);
hipMemcpy(d_array, h_array, size, hipMemcpyHostToDevice);

Execution Model:

  • Host (CPU): Main application runs here
  • Device (GPU): Compute kernels execute in SIMT (Single Instruction, Multiple Threads) model
  • Kernels launched from host, executed on device in parallel
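To make the SIMT model concrete, here is a toy sketch in plain Python (illustrative only, no GPU involved): one kernel function, the "single instruction" stream, is applied across many lanes, each operating on its own data element. The function names are ours, not part of any API.

```python
# Toy model of SIMT execution: one instruction stream, many lanes.
# Real GPUs run wavefronts of 32/64 threads in lockstep; here each
# "lane" simply applies the same kernel body to its own element.

def kernel(lane_id, data):
    """The per-thread body: every lane runs this same code."""
    return data[lane_id] * 2 + 1

def launch(kernel, data):
    """Host-side launch: one logical thread per element."""
    return [kernel(i, data) for i in range(len(data))]

result = launch(kernel, [1, 2, 3, 4])
print(result)  # [3, 5, 7, 9]
```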

When you'll see it: Building GPU-accelerated software from source, compiling libraries like whisper.cpp with WHISPER_HIPBLAS=1


2. hipBLAS / rocBLAS

What it is: Linear algebra libraries for matrix operations on GPUs.

The Difference:

| Library | Description | Backend |
|---------|-------------|---------|
| rocBLAS | AMD's implementation of BLAS (Basic Linear Algebra Subprograms) | AMD-specific, optimized for ROCm |
| hipBLAS | Portability layer that works on both AMD and NVIDIA | Calls rocBLAS on AMD, cuBLAS on NVIDIA |

What they do:

  • Matrix multiplication (GEMM operations)
  • Vector operations
  • Core building blocks for deep learning

Think of it as: The math engine underneath frameworks like PyTorch. When you multiply two tensors, rocBLAS/hipBLAS does the heavy lifting.
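To make "heavy lifting" concrete, here is the semantics of GEMM written out in plain Python. This is only an illustrative sketch of the operation rocBLAS/hipBLAS implement with hardware-tuned kernels; the function name and layout are ours, not the library's API.

```python
# GEMM computes C = alpha * A @ B + beta * C. This naive triple loop
# defines the math; rocBLAS (via Tensile) generates GPU kernels that
# compute the same result orders of magnitude faster.

def gemm(alpha, A, B, beta, C):
    m, k = len(A), len(A[0])
    n = len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            out[i][j] += alpha * acc
    return out

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(gemm(1.0, A, B, 0.0, C))  # [[19.0, 22.0], [43.0, 50.0]]
```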

Internal Tools:

  • Tensile: Code generator that creates optimized GEMM kernels
  • hipBLASLt: Extension for more advanced matrix operations

When you'll see it: Whisper.cpp with HIPBLAS support, building AI models, linear algebra-heavy applications


3. MIOpen

What it is: AMD's library for deep learning primitives (convolutions, pooling, activation functions, etc.)

Think of it as: AMD's equivalent to NVIDIA's cuDNN.

What it does:

  • Implements common neural network operations
  • Optimized convolution algorithms
  • Batch normalization, activation functions, pooling
  • Auto-tuning infrastructure to find fastest algorithms for your GPU
  • Kernel fusion to reduce memory bandwidth usage

When you'll see it: Training deep learning models, running CNNs, image processing with neural networks
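The kernel-fusion idea above can be sketched in plain Python: instead of writing an intermediate result to memory and reading it back, a fused kernel computes both steps in a single pass. This is a toy illustration; MIOpen fuses actual GPU kernels, not Python list traversals.

```python
# Unfused: two passes over the data, with an intermediate buffer
# written out and read back (extra memory bandwidth).
def scale_then_relu_unfused(xs, s):
    tmp = [x * s for x in xs]          # pass 1: writes an intermediate
    return [max(0.0, t) for t in tmp]  # pass 2: reads it back

# Fused: one pass, no intermediate; same result, less memory traffic.
def scale_then_relu_fused(xs, s):
    return [max(0.0, x * s) for x in xs]

xs = [-1.0, 0.5, 2.0]
assert scale_then_relu_unfused(xs, 3.0) == scale_then_relu_fused(xs, 3.0)
print(scale_then_relu_fused(xs, 3.0))  # [0.0, 1.5, 6.0]
```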


4. MIGraphX

What it is: A graph compiler and optimizer for accelerating machine learning inference on AMD GPUs.

Think of it as: Takes your neural network model and optimizes it for fast inference.

What it does:

  • Graph-level optimizations (operator fusion, constant folding, etc.)
  • Generates optimized code by calling MIOpen, rocBLAS, or creating custom HIP kernels
  • Supports ONNX models
  • Integrated with frameworks (ONNX Runtime, Torch MIGraphX)
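Constant folding, one of the graph-level optimizations listed above, can be sketched in a few lines of Python. This is a toy expression-tree version; MIGraphX operates on full ONNX graphs with many more operator types.

```python
# Fold constant subtrees at compile time so they are not recomputed
# on every inference call. Nodes are tuples: ("const", value),
# ("input", name), or ("add"/"mul", left, right).

def fold(node):
    if node[0] not in ("add", "mul"):
        return node                    # leaf: constant or graph input
    op, l, r = node
    l, r = fold(l), fold(r)
    if l[0] == "const" and r[0] == "const":
        v = l[1] + r[1] if op == "add" else l[1] * r[1]
        return ("const", v)            # whole subtree replaced by its value
    return (op, l, r)

# (2 * 3) + x  becomes  6 + x : the constant product is folded away.
graph = ("add", ("mul", ("const", 2), ("const", 3)), ("input", "x"))
print(fold(graph))  # ('add', ('const', 6), ('input', 'x'))
```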

Key Point: As of ROCm 7.1+, AMD recommends using MIGraphX Execution Provider instead of the deprecated ROCm Execution Provider for ONNX Runtime.

When you'll see it: Deploying trained models for inference, ONNX model optimization, production ML deployments


5. Composable Kernel (CK)

What it is: A high-performance kernel development framework for writing optimized GPU kernels using C++ templates.

Think of it as: A power tool for expert developers to write custom, ultra-optimized GPU operations.

What it does:

  • Tile-based programming model aligned with hardware architecture
  • Template-based code generation for different precisions and fusion patterns
  • Used internally by MIGraphX and hipBLASLt
  • Specialized for machine learning tensor operations

2025 Developments:

  • CK-Tile Framework: Tools for analyzing and eliminating LDS (Local Data Share) bank conflicts
  • Improved GEMM (matrix multiplication) kernels
  • Enhanced Scaled-Dot-Product Attention for transformers/LLMs
  • Support for INT8 quantized models (SmoothQuant on MI300X)

When you'll see it: Advanced kernel development, LLM optimization, custom ML operations


6. rocRAND, rocFFT, rocSPARSE, rocSOLVER

What they are: Specialized math libraries for specific domains.

| Library | Purpose | Use Cases |
|---------|---------|-----------|
| rocRAND | Random number generation | Monte Carlo simulations, dropout in neural networks |
| rocFFT | Fast Fourier Transform | Signal processing, frequency analysis, audio processing |
| rocSPARSE | Sparse matrix operations | Graph algorithms, sparse neural networks |
| rocSOLVER | Linear system solvers | Scientific computing, engineering simulations |

Naming Convention:

  • rocLIB (e.g., rocRAND): AMD-optimized implementation for ROCm
  • hipLIB (e.g., hipRAND): Portability layer that works on AMD and NVIDIA

When you'll see them: Scientific computing, audio processing (rocFFT), graph neural networks (rocSPARSE)
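As a reminder of what rocFFT computes, here is the direct O(n²) definition of the DFT in pure Python; rocFFT produces the same transform with O(n log n) FFT kernels on the GPU. This is purely illustrative, not the library's interface.

```python
import cmath

# Direct discrete Fourier transform: the defining O(n^2) sum.
def dft(xs):
    n = len(xs)
    return [sum(xs[k] * cmath.exp(-2j * cmath.pi * freq * k / n) for k in range(n))
            for freq in range(n)]

# A constant signal puts all its energy in frequency bin 0.
out = dft([1.0, 1.0, 1.0, 1.0])
print(round(abs(out[0]), 6), round(abs(out[1]), 6))  # 4.0 0.0
```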


Programming Models

OpenCL (Open Computing Language)

What it is: An open standard for cross-platform parallel programming.

Think of it as: The "open source" alternative to CUDA, works across AMD, NVIDIA, Intel, and even CPUs.

Status in 2025:

  • ⚠️ AMD dropped PAL OpenCL driver for consumers in 2020
  • ✅ Mesa's Rusticl provides modern OpenCL 3.0 implementation
  • ✅ ROCm includes its own OpenCL implementation
  • ⚠️ Generally slower than ROCm/HIP for AMD GPUs

Advantages:

  • Cross-vendor compatibility (AMD, Intel, NVIDIA)
  • Works on CPUs too
  • Mature, stable standard

Disadvantages:

  • Performance often lags behind vendor-specific solutions
  • Less optimized for AMD hardware than HIP/ROCm
  • AMD's consumer support has waned

When you'll see it: Legacy applications, cross-platform tools, portable compute code


Vulkan

What it is: A modern, low-level graphics and compute API.

Think of it as: Primarily a graphics API (like OpenGL successor), but also supports GPU compute.

Key Characteristics:

  • ✅ Excellent hardware compatibility (works on almost all AMD GPUs)
  • ✅ Low-level control for performance optimization
  • ✅ Cross-platform (Linux, Windows, Android, etc.)
  • Surprisingly fast for certain workloads

Performance (2025):

  • In some LLM inference benchmarks, Vulkan outperforms ROCm by up to 50%
  • Whisper inference: Vulkan competitive with or faster than ROCm
  • Better hardware support for consumer Radeon GPUs

Trade-offs:

  • ⚡ Higher power consumption than ROCm
  • Less optimized for AI/ML workloads than specialized tools
  • More complex programming model

When you'll see it: Gaming engines, graphics applications, llama.cpp and whisper.cpp builds with Vulkan enabled, broad AMD GPU compatibility scenarios


SYCL / oneAPI

What it is: A C++ abstraction layer for heterogeneous computing developed by the Khronos Group, promoted by Intel as part of oneAPI.

Think of it as: Write once, run on CPUs, AMD GPUs, NVIDIA GPUs, Intel GPUs, FPGAs.

How it works with AMD:

  • Codeplay provides oneAPI for AMD GPUs (versions 2025.0.0, 2025.1.1, 2025.2.0)
  • Uses DPC++ compiler with HIP backend
  • Requires ROCm installation
  • AdaptiveCpp (formerly hipSYCL) provides another SYCL implementation targeting ROCm

Verification:

# After installing oneAPI for AMD GPUs
source /opt/intel/oneapi/setvars.sh
sycl-ls
# Should show: [hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon...

Advantages:

  • True cross-platform portability
  • Standard C++ (not a proprietary language)
  • Growing ecosystem

Disadvantages:

  • Additional abstraction layer (potential performance overhead)
  • Less mature than CUDA/ROCm for AMD-specific optimization
  • Requires oneAPI toolkit + ROCm

When you'll see it: Cross-platform HPC applications, Intel-promoted AI frameworks, academic research code


Cross-Platform Standards

ONNX Runtime

What it is: Microsoft's cross-platform inference engine for ONNX (Open Neural Network Exchange) models.

AMD GPU Support:

| Execution Provider | Status | Recommendation |
|--------------------|--------|----------------|
| ROCm EP | ⚠️ Deprecated (ROCm 7.0 is the last supported version) | Migrate away |
| MIGraphX EP | ✅ Active, recommended | Use this |

Key Points:

  • ROCm 7.1+ removes ROCm Execution Provider
  • MIGraphX Execution Provider is AMD's recommended path forward
  • Uses MIGraphX graph optimization engine
  • Pre-built binaries available: pip3 install onnxruntime-rocm

Windows Support (2025):

  • ONNX Execution Provider for Windows announced for July 2025
  • Part of AMD's push to bring ROCm to Windows

When you'll see it: Deploying ONNX models, cross-framework inference, production ML systems


Platform-Specific Solutions

DirectML (Windows Only)

What it is: Microsoft's DirectX-based machine learning library for Windows.

Think of it as: The "easy button" for ML on Windows with AMD GPUs.

Characteristics:

  • ✅ Works on any Windows GPU (AMD, NVIDIA, Intel)
  • ✅ Easy to set up (no complex ROCm installation on Windows)
  • ⚠️ Significantly slower than ROCm on Linux
  • 🐌 Example: a task that took 29 seconds with ROCm on Linux took 2 minutes with DirectML on Windows (roughly 4x slower)

2025 Development:

  • AMD announced ROCm for Windows in public preview (Q2 2025)
  • PyTorch on Windows with ROCm now available for Radeon 7000/9000 series
  • DirectML remains easier but ROCm on Windows offers better performance

When to use:

  • Windows users who need "good enough" AMD GPU acceleration
  • Getting started with AMD ML without Linux dual-boot
  • Compatibility across different GPU vendors

When NOT to use:

  • Performance-critical applications (use Linux + ROCm instead)
  • If you can dual-boot Linux (ROCm on Linux is much faster)


OpenVINO (Intel Only)

What it is: Intel's inference optimization toolkit.

AMD GPU Support: ❌ None

Key Point: OpenVINO is Intel-exclusive. It does NOT support AMD GPUs and has no ROCm integration.

Clarification:

  • OpenVINO works on AMD Ryzen CPUs (CPU inference only)
  • No GPU acceleration for AMD hardware
  • For AMD GPU inference, use ROCm/MIGraphX instead

When you'll see it: Intel CPU inference, Intel GPU acceleration, Intel AI inference projects


Comparison Tables

Performance Comparison (LLM Inference, 2025)

| Technology | Performance | Hardware Support | Ease of Use | Energy Efficiency |
|------------|-------------|------------------|-------------|-------------------|
| ROCm | ⭐⭐⭐⭐ High | Professional GPUs primarily | ⭐⭐ Medium | ⭐⭐⭐⭐⭐ Excellent |
| Vulkan | ⭐⭐⭐⭐⭐ Very High* | Nearly all AMD GPUs | ⭐⭐⭐⭐ Good | ⭐⭐ Fair (high power) |
| OpenCL | ⭐⭐⭐ Medium | All AMD GPUs | ⭐⭐⭐⭐ Good | ⭐⭐⭐⭐ Good |
| HIP | ⭐⭐⭐⭐⭐ Very High | ROCm-supported GPUs | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Good |

*In some workloads, Vulkan has shown 50%+ performance advantages over ROCm


When to Use What

| Use Case | Recommended Technology | Why? |
|----------|------------------------|------|
| Deep Learning Training | ROCm + PyTorch/TensorFlow | Best framework support, optimized libraries (MIOpen, rocBLAS) |
| LLM Inference (Professional) | ROCm + MIGraphX | Optimized for inference, graph-level optimization |
| LLM Inference (Consumer GPU) | Vulkan (via llama.cpp) | Better hardware compatibility, competitive performance |
| Whisper STT | ROCm (OpenAI Whisper) or Vulkan (whisper.cpp) | ROCm for fine-tuned models, Vulkan for broad compatibility |
| Cross-Platform Development | HIP or SYCL/oneAPI | Write once, run on AMD and NVIDIA |
| Windows Users (Easy) | DirectML | Easiest setup, works with all GPUs |
| Windows Users (Performance) | ROCm for Windows (2025+) | Better performance than DirectML, native ROCm |
| Scientific Computing | ROCm + rocBLAS/rocFFT/rocSOLVER | Optimized math libraries for HPC |
| ONNX Model Deployment | MIGraphX Execution Provider | Official AMD recommendation, replaces deprecated ROCm EP |
| Graphics + Compute | Vulkan | Unified API for rendering and compute |

Technology Stack Diagrams

AI/ML Inference Stack

┌─────────────────────────────────────────┐
│     Application (Whisper, LLaMA, etc.)  │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│  Framework (PyTorch, ONNX Runtime, etc.)│
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│   Optimization Layer                    │
│   • MIGraphX (inference optimization)   │
│   • Composable Kernel (custom ops)      │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│   Deep Learning Libraries               │
│   • MIOpen (convolutions, etc.)         │
│   • hipBLAS/rocBLAS (linear algebra)    │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│   Runtime Layer                         │
│   • HIP (programming interface)         │
│   • ROCm Runtime (drivers)              │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│   AMD GPU Hardware                      │
└─────────────────────────────────────────┘

Cross-Platform Compatibility Stack

┌─────────────────────────────────────────┐
│        Your Application Code            │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│     Portability Layer (Choose One)      │
│  • HIP (CUDA-like)                      │
│  • SYCL/oneAPI (C++ standard)           │
│  • OpenCL (open standard)               │
│  • Vulkan (graphics + compute)          │
└─────────────────────────────────────────┘
                    ↓
┌──────────────────┬──────────────────────┐
│   AMD Backend    │   NVIDIA Backend     │
│   (ROCm)         │   (CUDA)             │
└──────────────────┴──────────────────────┘

Hardware Support Matrix (2025)

ROCm Support

| GPU Type | ROCm Support | Notes |
|----------|--------------|-------|
| AMD Instinct (MI250X, MI300X, etc.) | ✅ Full support | Primary target, enterprise GPUs |
| Radeon Pro (W6800, W7900, etc.) | ✅ Full support | Professional workstation GPUs |
| Radeon RX 7000 series (7900 XTX, 7700 XT, etc.) | ✅ Supported (Linux + Windows preview) | Consumer RDNA 3, your GPU! |
| Radeon RX 6000 series (6900 XT, etc.) | ⚠️ Limited/EOL | Consumer RDNA 2, short support window |
| Radeon RX 5000 series (5700 XT, etc.) | ❌ Unsupported in ROCm 6+ | Use Vulkan or older ROCm versions |
| Ryzen AI (integrated) | ✅ Limited support | NPU + iGPU acceleration in 2025 |

Vulkan Support

| GPU Type | Vulkan Support |
|----------|----------------|
| All AMD GPUs (RDNA 1/2/3, GCN, etc.) | ✅ Excellent |
| Older Radeon cards (RX 400/500, Vega, etc.) | ✅ Yes (via RADV or AMDVLK) |

Key Takeaway: If your GPU isn't officially supported by ROCm, Vulkan is your best option for GPU acceleration.


Which Should You Use?

For Your Fine-Tuned Whisper Model (RX 7700 XT, gfx1101)

Phase 1: Get it Working

# Use ROCm with OpenAI Whisper (easiest for fine-tuned models)
export HSA_OVERRIDE_GFX_VERSION=11.0.0  # present gfx1101 as gfx1100, the officially supported RDNA 3 target
pip install torch --index-url https://download.pytorch.org/whl/rocm5.7
pip install openai-whisper  # Whisper comes from PyPI, not the PyTorch index
whisper audio.mp3 --model /path/to/finetuned --device cuda  # "cuda" maps to the AMD GPU under ROCm

Phase 2: Optimize Performance

# Option A: whisper.cpp with HIPBLAS (good balance)
git clone https://github.com/ggml-org/whisper.cpp
WHISPER_HIPBLAS=1 make -j

# Option B: Vulkan (if ROCm gives you trouble). Vulkan support is a
# build-time option in whisper.cpp, not a runtime flag:
cmake -B build -DGGML_VULKAN=1
cmake --build build -j

Phase 3: Production Deployment

  • Use Docker with ROCm base image
  • Convert model to CTranslate2 format
  • Deploy on cloud GPU (Modal, RunPod) or local server

Decision Tree

Need AMD GPU Acceleration?
  │
  ├─ Windows?
  │   ├─ Yes → Try ROCm for Windows (2025+) or DirectML (easier but slower)
  │   └─ No → Continue
  │
  ├─ Is your GPU officially supported by ROCm?
  │   ├─ Yes → Use ROCm
  │   │   ├─ Deep Learning? → ROCm + PyTorch/TensorFlow
  │   │   ├─ Inference? → ROCm + MIGraphX
  │   │   └─ Custom Code? → HIP
  │   │
  │   └─ No → Use Vulkan
  │       └─ llama.cpp, whisper.cpp, or other Vulkan-enabled tools
  │
  ├─ Need Cross-Platform?
  │   ├─ CUDA-like API? → HIP
  │   ├─ Modern C++? → SYCL/oneAPI
  │   └─ Maximum Compatibility? → OpenCL (slower but works everywhere)
  │
  └─ Graphics + Compute?
      └─ Use Vulkan

Common Scenarios & Solutions

Scenario 1: "I have a Radeon RX 6900 XT and ROCm doesn't work"

Problem: Consumer RDNA 2 GPUs have limited/deprecated ROCm support.

Solutions:

  1. Use Vulkan (Best option)

    • Works with llama.cpp, whisper.cpp, and many AI tools
    • Often faster than ROCm for inference anyway
  2. Try older ROCm version (5.7 or earlier)

    • May work but won't get updates
  3. Use DirectML (if on Windows)

    • Slower but guaranteed to work

Scenario 2: "I'm porting CUDA code to AMD"

Solution: Use HIP

Process:

  1. Use hipify-perl or hipify-clang to auto-convert CUDA → HIP
  2. Replace remaining CUDA function calls (usually just renaming cuda* to hip*)
  3. Compile with ROCm's hipcc compiler
  4. Test on AMD GPU

Example:

# Automatic conversion
hipify-perl cuda_code.cu > hip_code.cpp

# Manual compilation
hipcc hip_code.cpp -o program
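The find-and-replace nature of the easy cases can be sketched in Python. This is a toy subset of what hipify-perl does; the real tools also rewrite headers, types, and kernel-launch syntax, and the rename table here covers only a handful of calls.

```python
import re

# A miniature hipify: rename a few cuda* API calls and enums to
# their hip counterparts using whole-word regex substitutions.
RENAMES = [
    (r"\bcudaMalloc\b", "hipMalloc"),
    (r"\bcudaMemcpy\b", "hipMemcpy"),
    (r"\bcudaMemcpyHostToDevice\b", "hipMemcpyHostToDevice"),
    (r"\bcudaFree\b", "hipFree"),
]

def hipify(src):
    for pat, repl in RENAMES:
        src = re.sub(pat, repl, src)
    return src

cuda = "cudaMalloc(&d, n); cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);"
print(hipify(cuda))
# hipMalloc(&d, n); hipMemcpy(d, h, n, hipMemcpyHostToDevice);
```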

Scenario 3: "I need to run PyTorch on AMD GPU"

Solution: Install PyTorch with ROCm

Linux:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7

Windows (2025+):

# Public preview available Q2 2025; check pytorch.org for the current ROCm wheel index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

Verify:

import torch
print(torch.cuda.is_available())  # Should return True
print(torch.cuda.get_device_name(0))  # Shows your AMD GPU

Note: PyTorch ROCm uses torch.cuda interfaces (HIP reuses CUDA API names)


Scenario 4: "Which is faster: ROCm or Vulkan?"

Answer: It depends on the workload

ROCm is faster for:

  • Deep learning training
  • Highly optimized AI frameworks (PyTorch, TensorFlow)
  • Matrix operations (uses rocBLAS/hipBLAS)
  • Professional Instinct GPUs

Vulkan can be faster for:

  • LLM inference (llama.cpp benchmarks show 50%+ speedup in some cases)
  • Consumer Radeon GPUs
  • Workloads with broad hardware support requirements
  • Applications not using ROCm-optimized libraries

Energy Efficiency:

  • ROCm: Much better (especially on mobile/laptops)
  • Vulkan: Higher power consumption


Glossary of Terms

| Term | Full Name | What It Is |
|------|-----------|------------|
| ROCm | Radeon Open Compute | AMD's open-source GPU compute platform (entire stack) |
| HIP | Heterogeneous-compute Interface for Portability | C++ API for portable GPU programming (CUDA-like) |
| hipBLAS | HIP Basic Linear Algebra Subprograms | Portability layer for linear algebra (works on AMD/NVIDIA) |
| rocBLAS | ROCm Basic Linear Algebra Subprograms | AMD-optimized BLAS implementation |
| MIOpen | Machine Intelligence Open | AMD's deep learning primitives library (like cuDNN) |
| MIGraphX | Machine Intelligence Graph X | Graph compiler for ML inference optimization |
| CK | Composable Kernel | High-performance kernel development framework |
| rocRAND | ROCm Random Number Generator | Random number generation library |
| rocFFT | ROCm Fast Fourier Transform | FFT library for signal processing |
| rocSPARSE | ROCm Sparse BLAS | Sparse matrix operations |
| rocSOLVER | ROCm Linear Solver | Linear system solver library |
| OpenCL | Open Computing Language | Cross-vendor GPU compute standard |
| Vulkan | - | Modern graphics + compute API |
| SYCL | - | C++ abstraction for heterogeneous computing |
| oneAPI | - | Intel's unified programming model (includes SYCL) |
| DirectML | Direct Machine Learning | Microsoft's ML API for DirectX (Windows) |
| ONNX | Open Neural Network Exchange | Framework-agnostic model format |
| SIMT | Single Instruction, Multiple Threads | GPU execution model |
| GEMM | General Matrix Multiply | Core operation in neural networks |
| LDS | Local Data Share | GPU local memory (like CUDA shared memory) |

2025 Trends & Future Outlook

Major Developments in 2025

  1. ROCm for Windows

    • Public preview released Q2 2025
    • PyTorch + ONNX Runtime support
    • Radeon 7000/9000 series support
    • Closes major gap with NVIDIA's Windows support
  2. MIGraphX as Primary Inference Engine

    • Deprecated ROCm Execution Provider in ONNX Runtime
    • MIGraphX EP is now recommended path
    • Better optimization, faster inference
  3. Vulkan Competitive Performance

    • Vulkan matching or exceeding ROCm in some LLM workloads
    • Better consumer GPU support
    • Growing adoption in AI tools
  4. Composable Kernel Improvements

    • CK-Tile framework for advanced kernel optimization
    • LDS bank conflict analysis tools
    • Better INT8 quantization support
  5. AMD ROCm 7.x Series

    • ROCm 7.0: 4.6x inference performance improvement
    • Enhanced distributed inference
    • Better code portability

What This Means for You

  • Consumer GPUs: Better support, more options (ROCm + Vulkan)
  • Windows Users: Native ROCm coming (finally!)
  • Developers: More mature ecosystem, easier setup
  • Performance: Gap with NVIDIA narrowing (10-30% vs historical 2-3x)


Quick Reference Commands

Check Your Setup

# Check ROCm installation
rocm-smi

# Check PyTorch GPU support
python -c "import torch; print(torch.cuda.is_available())"

# List SYCL devices (if oneAPI installed)
sycl-ls

# Check HIP version
hipconfig --version

# Check rocBLAS version
dpkg -l | grep rocblas

# List Vulkan devices
vulkaninfo | grep -A 2 "GPU"

Environment Variables (Common Issues)

# For gfx1101 (your RX 7700 XT): present it as gfx1100, the officially supported RDNA 3 target
export HSA_OVERRIDE_GFX_VERSION=11.0.0

# ROCm path (if not in default location)
export ROCM_PATH=/opt/rocm
export ROCM_HOME=/opt/rocm

# For better PyTorch performance
export PYTORCH_ROCM_ARCH=gfx1101

# Stability workaround: disable SDMA transfers (helps on some setups)
export HSA_ENABLE_SDMA=0


Summary: The Big Picture

Think of the AMD GPU acceleration ecosystem like this:

┌─────────────────────────────────────────────────────────┐
│  HIGH LEVEL: Use these directly                         │
│  • PyTorch/TensorFlow (with ROCm backend)               │
│  • ONNX Runtime (with MIGraphX EP)                      │
│  • llama.cpp, whisper.cpp (with Vulkan or HIP)         │
└─────────────────────────────────────────────────────────┘
                          ↑
┌─────────────────────────────────────────────────────────┐
│  MID LEVEL: These power the frameworks                  │
│  • MIOpen (deep learning ops)                           │
│  • MIGraphX (inference optimization)                    │
│  • rocBLAS (linear algebra)                             │
│  • rocFFT, rocRAND, etc. (specialized math)             │
└─────────────────────────────────────────────────────────┘
                          ↑
┌─────────────────────────────────────────────────────────┐
│  LOW LEVEL: Programming interfaces                      │
│  • HIP (CUDA-like API)                                  │
│  • Vulkan (graphics + compute)                          │
│  • OpenCL (legacy cross-platform)                       │
│  • SYCL/oneAPI (modern cross-platform)                  │
└─────────────────────────────────────────────────────────┘
                          ↑
┌─────────────────────────────────────────────────────────┐
│  FOUNDATION: ROCm runtime + drivers                     │
└─────────────────────────────────────────────────────────┘

Key Insight: You usually don't interact with ROCm directly. You use:

  1. High-level frameworks (PyTorch, ONNX Runtime) that use...
  2. Optimized libraries (MIOpen, MIGraphX, rocBLAS) that use...
  3. Programming APIs (HIP, Vulkan) that use...
  4. ROCm runtime and drivers

For most users:

  • Training: PyTorch/TensorFlow with ROCm
  • Inference: ONNX Runtime + MIGraphX, or Vulkan-based tools
  • Custom code: HIP (if CUDA-like) or Vulkan (if broader compatibility)

Conclusion

The AMD GPU acceleration ecosystem has matured significantly:

  • Multiple viable options: ROCm, Vulkan, OpenCL, HIP, SYCL
  • Strong performance: within 10-30% of NVIDIA (much better than the historical 2-3x gaps)
  • Better consumer support: RDNA 3 GPUs well-supported, Windows ROCm arriving
  • Growing ecosystem: more frameworks, better tools, active development

Your RX 7700 XT (gfx1101) is well-supported in 2025, with both ROCm and Vulkan providing excellent performance for AI/ML workloads including your fine-tuned Whisper model.


Disclaimer: This gist was generated by Claude Code. While information has been compiled from official sources and reflects the state of AMD GPU acceleration technologies as of 2025, please validate specific technical details, compatibility requirements, and performance claims against official AMD documentation and release notes for your particular use case.

Last Updated: January 2025
