
@danielrosehill
Created November 23, 2025 21:37
Managing ROCm + PyTorch for STT Applications on Linux: Conda vs Docker vs Other Approaches

Managing ROCm + PyTorch for STT Applications on Linux with AMD GPUs

The Problem

When exploring Speech-to-Text (STT) solutions on Linux with an AMD GPU, you quickly encounter a fundamental challenge: ROCm + PyTorch is massive (~5-10GB+), takes forever to download, and pulling it repeatedly for each STT experiment is wasteful and frustrating.

The core question becomes: How do we create a reusable ROCm + PyTorch foundation without descending into dependency hell?

TL;DR: Which Approach Should You Use?

  • Docker: Most reliable, best isolation, worth the overhead for desktop use
  • System venv with uv: Fast, lightweight, acceptable for stable toolchains
  • Conda: Avoid unless you have specific scientific computing needs
  • Hybrid approach: Docker for heavy ML workloads, venvs for lightweight tools

Approach 1: Conda

What You Tried

Conda environments were designed for scientific computing with complex dependencies, making them seem like the natural choice for managing ROCm + PyTorch.

Pros

  • Designed for scientific packages: Conda was built specifically for data science/ML workflows
  • Binary package management: Can manage non-Python dependencies (CUDA/ROCm libraries, system libs)
  • Cross-platform: Works consistently across Linux, macOS, Windows
  • Environment export: Can export exact environments with conda env export
  • Multiple Python versions: Easy to manage different Python versions per environment

Cons (Why It Creates Headaches)

  • Massive overhead: A full Anaconda install is ~3GB before you add anything (Miniconda is leaner, but environments still balloon)
  • Slow solver: Dependency resolution on large ML environments can take 10-30+ minutes (the newer libmamba solver helps, but only so much)
  • Conflicts with system packages: ROCm system packages vs Conda ROCm packages create subtle conflicts
  • Version mismatches: Conda ROCm versions often lag behind official AMD releases
  • Channel chaos: Mixing conda-forge, defaults, pytorch channels causes conflicts
  • Storage bloat: Each environment duplicates large packages (PyTorch, ROCm libs)
  • Activation overhead: Environment activation adds noticeable latency
  • Breaks system Python tools: Can interfere with system Python and pip
  • ROCm-specific issues:
    • Conda ROCm packages may not match your system ROCm version
    • GPU detection can break due to library path conflicts
    • HSA_OVERRIDE_GFX_VERSION may not propagate correctly through Conda's environment
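
The library-path conflicts above can be spot-checked. A rough diagnostic sketch, not a standard tool; the function name and the string heuristic are mine. The idea: if a conda env's lib path precedes /opt/rocm in LD_LIBRARY_PATH, conda's libraries shadow the system ROCm stack at load time.

```shell
# Heuristic check: does a conda lib path precede /opt/rocm in LD_LIBRARY_PATH?
# If so, conda's ROCm/BLAS libraries win over the system ones when loading.
check_rocm_shadowing() {
  case ":${LD_LIBRARY_PATH}:" in
    *conda*/opt/rocm*) echo "conda paths precede system ROCm" ;;
    *)                 echo "no obvious conda shadowing" ;;
  esac
}

# Example with a conda env ahead of the system ROCm path:
LD_LIBRARY_PATH="$HOME/miniconda3/envs/stt/lib:/opt/rocm/lib" check_rocm_shadowing
# -> conda paths precede system ROCm
```

This only catches ordering problems in LD_LIBRARY_PATH; conda can also shadow libraries through its activation scripts, which this sketch does not inspect.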

When to Use Conda Anyway

  • You're working in a research environment with complex scientific dependencies
  • You need exact reproducibility across different machines
  • You're managing multiple Python versions simultaneously
  • You're working with packages that aren't pip-installable

Why It Failed for Your Use Case

For STT exploration with AMD GPUs, Conda introduces more problems than it solves:

  • System ROCm conflicts with Conda's ROCm packages
  • Slow environment creation defeats the "reusable component" goal
  • Storage duplication negates the benefit of avoiding re-downloads

Approach 2: Docker (Your Current Path)

What Docker Offers

Containerization provides true isolation - the entire runtime environment is encapsulated, including ROCm, PyTorch, and all dependencies.

Pros

  • Complete isolation: No conflicts with system packages or other projects
  • Reproducible environments: Dockerfile = exact build recipe
  • Layer caching: Docker layers cache intermediate steps, speeding up rebuilds
  • Version control friendly: Dockerfiles can be committed to repos
  • Easy sharing: Push images to Docker Hub, pull on any machine
  • GPU passthrough works reliably: docker run --device=/dev/kfd --device=/dev/dri for ROCm
  • Clean removal: Delete containers without leaving system cruft
  • Multiple versions simultaneously: Run different ROCm/PyTorch versions in parallel
  • Base image reuse: Create a base ROCm+PyTorch image, extend it for each STT tool
  • System safety: Buggy code can't damage host system
  • Matches production: Most ML deployments use containers anyway

Cons

  • Desktop overhead: Docker adds complexity on non-server systems
  • Storage usage: Images can be large (10-20GB for ROCm+PyTorch bases)
  • Performance overhead: Small (1-5%) performance penalty vs native
  • GPU setup complexity: ROCm passthrough requires proper configuration
  • File system access: Mounting volumes adds complexity for desktop workflows
  • Audio device access: PipeWire/PulseAudio passthrough for STT can be tricky
  • GUI applications: X11/Wayland forwarding needed for GUI STT tools
  • Learning curve: Docker concepts (images, containers, volumes) require learning
  • Build time: Initial image builds can take 30+ minutes
  • Update complexity: Rebuilding images for updates is more involved than pip install -U

Why Docker Works for Your Use Case

Despite the cons, Docker solves your core problems:

  1. Reusable base: Build a rocm-pytorch-base:latest image once, extend for each STT tool
  2. True isolation: No dependency conflicts between STT tools
  3. Wayland compatibility: For CLI tools, the container doesn't care about Wayland vs X11 (GUI tools still need forwarding, as noted in the cons)
  4. Clean testing: Spin up container, test STT tool, delete container - no system pollution
  5. Proven pattern: Most ML practitioners use Docker for exactly this reason

Docker Best Practices for STT on AMD

# Base image with ROCm + PyTorch
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0

# Set ROCm environment
ENV HSA_OVERRIDE_GFX_VERSION=11.0.1
ENV ROCM_PATH=/opt/rocm

# Install common STT dependencies
RUN pip install --no-cache-dir \
    faster-whisper \
    openai-whisper \
    librosa \
    soundfile

# Mount point for models
VOLUME /models

# Mount point for audio input/output
VOLUME /audio

WORKDIR /app

Then for each STT tool:

FROM rocm-pytorch-base:latest

# Tool-specific dependencies
RUN pip install whisper-specific-package

# Copy tool code
COPY . /app

CMD ["python", "stt_tool.py"]

Run with GPU and audio access:

docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  -v ~/ai/models/stt:/models \
  -v /run/user/1000/pulse:/run/user/1000/pulse \
  -e PULSE_SERVER=unix:/run/user/1000/pulse/native \
  stt-tool:latest
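
Since that run command is long, it is worth wrapping in a small helper script. A sketch; `stt_run` and the `DRY_RUN` flag are illustrative names, not an existing tool, and the mounts should be adjusted to your setup:

```shell
#!/usr/bin/env bash
# Sketch of a helper that assembles the long `docker run` line for STT
# containers. `stt_run` and DRY_RUN are illustrative names, not a real tool.
stt_run() {
  local image="$1"; shift
  local cmd=(docker run --rm -it
    --device=/dev/kfd --device=/dev/dri
    --group-add video
    -v "$HOME/ai/models/stt:/models"
    "$image" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s ' "${cmd[@]}"; echo   # show the assembled command, don't run it
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 stt_run stt-tool:latest python stt_tool.py
```

The dry-run mode makes it easy to confirm the device and volume flags before committing to a real run.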

Approach 3: System Python venv (with uv)

What This Offers

Use system Python with virtual environments, but leverage uv for fast package installation and caching.

Pros

  • Lightweight: The venv itself is tiny (a few MB); only installed packages add weight
  • Fast creation: uv venv creates venvs in milliseconds
  • Shared package cache: uv caches wheels, avoiding re-downloads
  • Native performance: Zero overhead - running directly on system Python
  • Simple audio access: Direct PipeWire/PulseAudio access
  • Easy GPU access: System ROCm already configured
  • No containerization overhead: Simpler mental model
  • Quick iteration: uv pip install is 10-100x faster than pip
  • System integration: Works naturally with desktop tools

Cons

  • Dependency conflicts possible: Different STT tools may require incompatible PyTorch versions
  • System pollution risk: Failed experiments can leave cruft
  • No isolation from system: Can interfere with system Python packages
  • Manual cleanup needed: Dead venvs accumulate unless you clean up
  • ROCm version locked: Stuck with system ROCm version
  • Harder to share: Can't easily export environment to another machine
  • Breaking changes: System updates can break venvs

How to Use This Approach

# Install uv (fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Warm the shared cache once with ROCm + PyTorch
cd ~/ai/venvs
uv venv rocm-pytorch-base
source rocm-pytorch-base/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3

# For each STT tool, create a fresh venv; the cached wheels are reused
cd ~/programs/ai-ml/speech-voice/whisper-tool
uv venv --python 3.11
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
uv pip install -r requirements.txt

The uv package cache means PyTorch is only downloaded once, then reused across venvs.
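
The "dead venvs accumulate" con above is easy to script around. A sketch, assuming your venvs live under one root directory; the 30-day cutoff is arbitrary and GNU `find`/`touch` are assumed:

```shell
# List venv directories not modified in N days, so dead experiments can be
# reviewed and deleted. Root path and cutoff are illustrative.
stale_venvs() {
  local root="$1" days="${2:-30}"
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime "+$days"
}

# Demo against a throwaway directory:
demo=$(mktemp -d)
mkdir -p "$demo/old-venv" "$demo/fresh-venv"
touch -d '60 days ago' "$demo/old-venv"   # backdate one venv (GNU touch)
stale_venvs "$demo" 30                    # prints only .../old-venv
```

Checking only the directory's own mtime is crude; for real use you might check the newest file inside each venv instead.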


Approach 4: Hybrid (Docker + venv)

The Strategy

Use Docker for heavy ML workloads (model training, inference servers) and lightweight venvs for simple CLI tools.

When to Use What

Docker for:

  • STT tools with complex dependencies
  • Tools requiring specific ROCm versions
  • Long-running inference servers
  • Tools you're evaluating temporarily
  • Tools with GUI components (X11/Wayland forwarding)

System venv for:

  • Simple CLI transcription tools
  • Tools you use daily (low overhead matters)
  • Tools with stable, minimal dependencies
  • Quick experiments with known-good dependencies

Example Workflow

# Base Docker image for complex STT tools
docker build -t rocm-pytorch-base .

# Quick CLI tool with venv
cd ~/programs/ai-ml/speech-voice/simple-whisper
uv venv
source .venv/bin/activate
uv pip install faster-whisper

Approach 5: Distrobox/Toolbox (Middle Ground)

What Is This?

Distrobox/Toolbox creates containerized environments that integrate seamlessly with your desktop (home directory auto-mounted, GUI apps work, etc.).

Pros

  • Container isolation + desktop integration
  • Multiple distro bases: Run Arch, Fedora, Ubuntu containers on any host
  • Seamless home directory: Auto-mounted, no volume mapping needed
  • GUI apps work: Wayland/X11 passthrough automatic
  • Audio works: PipeWire/PulseAudio passthrough automatic
  • GPU passthrough: ROCm devices automatically available
  • Package manager choice: Use apt, dnf, pacman inside containers

Cons

  • Still containerization overhead
  • Less portable: Tied to your specific desktop setup
  • Learning curve: Another tool to learn
  • Storage usage: Similar to Docker

Example Usage

# Create a ROCm development container
distrobox create --name rocm-stt --image ubuntu:22.04
distrobox enter rocm-stt

# Inside container - feels like native system
sudo apt install rocm-dev python3-pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip install faster-whisper

# GUI and audio work automatically
python whisper_gui.py  # Just works with Wayland

Detailed Comparison Table

| Aspect | Conda | Docker | System venv + uv | Distrobox |
|---|---|---|---|---|
| Setup Time | Slow (10-30 min) | Slow first time (20-40 min), fast after | Fast (1-5 min) | Medium (10-15 min) |
| Storage Overhead | High (3-5GB per env) | High (10-20GB per base) | Low (50-200MB per venv) | High (similar to Docker) |
| Dependency Isolation | Medium | Excellent | Poor | Excellent |
| ROCm Compatibility | Problematic | Excellent | Depends on system | Excellent |
| Performance | Near-native | 1-5% overhead | Native | 1-3% overhead |
| Reproducibility | Good | Excellent | Poor | Good |
| Desktop Integration | Native | Poor | Native | Excellent |
| Audio Access (STT) | Native | Tricky | Native | Native |
| GPU Access | Sometimes works | Works with config | Native | Native |
| Cleanup | Manual, risky | Easy | Manual | Easy |
| Sharing/Portability | Medium | Excellent | Poor | Medium |
| Learning Curve | Medium | Steep | Minimal | Medium |
| Update Speed | Slow | Rebuild required | Fast | Medium |

Recommendations by Use Case

You're Evaluating 10+ STT Tools (Your Scenario)

Recommended: Docker

Why:

  • Each tool gets isolated environment
  • Clean up is trivial (docker rm)
  • Base image built once, extended for each tool
  • No risk of system pollution from failed experiments
  • Easy to document (Dockerfile per tool)

Strategy:

# Build base once
docker build -t rocm-pytorch-stt-base -f Dockerfile.base .

# For each tool, extend base
cd ~/programs/ai-ml/speech-voice/whisper-tool-1
docker build -t stt-tool-1 .
docker run --rm -it --device=/dev/kfd --device=/dev/dri stt-tool-1

# Didn't work? Clean up
docker rmi stt-tool-1

You Found One STT Tool You Use Daily

Recommended: System venv + uv

Why:

  • Minimal overhead for daily use
  • Native audio/GPU access
  • Fast startup time
  • Direct filesystem access

You Need Multiple ROCm/PyTorch Versions

Recommended: Docker or Distrobox

Why:

  • True version isolation
  • Can run different versions simultaneously
  • No system conflicts

You're Building an STT Service

Recommended: Docker

Why:

  • Production deployments use containers
  • Reproducible across environments
  • Easy CI/CD integration
  • Scalable

The "Dependency Hell" Question

What Causes Dependency Hell?

  1. Conflicting package versions: Tool A needs PyTorch 2.1, Tool B needs 2.3
  2. System package conflicts: Conda ROCm vs system ROCm
  3. Transitive dependencies: Package X depends on Y v1, Package Z depends on Y v2
  4. Python version mismatches: Tool requires Python 3.10, your system is 3.12
  5. Binary incompatibilities: ROCm compiled for gfx1030, your GPU is gfx1101
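
When tools merely prefer (rather than strictly require) different versions, causes #1 and #3 can be blunted by pinning the heavy packages once in a shared constraints file. A sketch; the path and versions are illustrative, and this relies on `uv pip install` accepting pip-style `-c` constraint files:

```shell
# Pin the heavy, conflict-prone packages once, in one place.
mkdir -p ~/ai
cat > ~/ai/torch-constraints.txt <<'EOF'
torch==2.3.0
torchaudio==2.3.0
EOF

# Each tool's venv then resolves against the shared pins:
# uv pip install -r requirements.txt -c ~/ai/torch-constraints.txt
```

A constraint file never installs anything by itself; it only caps versions when the listed packages are pulled in, so unrelated venvs are unaffected.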

How Each Approach Handles This

Conda:

  • Attempts to solve: Yes, has dependency solver
  • Success rate: Medium - solver can fail with "unsolvable environment"
  • Side effects: May downgrade packages unexpectedly
  • Verdict: Solves some problems, creates others

Docker:

  • Attempts to solve: No - you manage dependencies explicitly
  • Success rate: High - each container is independent
  • Side effects: None (isolation prevents conflicts)
  • Verdict: Prevents dependency hell by design

System venv:

  • Attempts to solve: No - pip installs what you ask, conflicts crash
  • Success rate: Low - you must resolve conflicts manually
  • Side effects: Can break venv, requiring recreation
  • Verdict: Dependency hell possible, but easy to recover (delete venv, recreate)

Distrobox:

  • Attempts to solve: No - uses distro package manager or pip
  • Success rate: Medium-High - each container is isolated
  • Side effects: None between containers
  • Verdict: Similar to Docker, prevents most issues

Specific AMD GPU + ROCm Considerations

Why ROCm Makes This Harder

  • System integration required: ROCm kernel drivers must match userspace libraries
  • Version sensitivity: PyTorch ROCm builds are version-specific (rocm6.1, rocm6.3, etc.)
  • GFX compatibility: Your GPU (gfx1101) may need HSA_OVERRIDE_GFX_VERSION=11.0.1
  • Library paths: ROCm libraries must be in LD_LIBRARY_PATH
  • Device permissions: User must be in video and render groups
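
The relationship between a gfx name and the override variable is mechanical for four-digit RDNA parts: `gfxMMmS` maps to `MM.m.S`. A sketch (decimal-named parts only; note that the usual trick for unsupported cards is to override to a nearby *supported* target such as gfx1100's 11.0.0 rather than the card's own ID, so check AMD's support matrix):

```shell
# gfxMMmS -> "MM.m.S" for decimal-named RDNA GPUs (e.g. gfx1101, gfx1030).
# Older parts with hex steppings (e.g. gfx90a) would need extra handling.
gfx_to_override() {
  local digits="${1#gfx}"
  echo "${digits:0:2}.${digits:2:1}.${digits:3:1}"
}

gfx_to_override gfx1101   # -> 11.0.1
gfx_to_override gfx1030   # -> 10.3.0
```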

How Each Approach Handles ROCm

Conda:

  • ❌ Conda ROCm packages often conflict with system ROCm
  • ❌ May install incompatible ROCm versions
  • ❌ GPU detection can break mysteriously
  • Verdict: Avoid for ROCm workflows

Docker:

  • ✅ Use official ROCm Docker images (rocm/pytorch)
  • ✅ Complete ROCm stack included, no system conflicts
  • --device=/dev/kfd --device=/dev/dri for GPU passthrough
  • ⚠️ Container ROCm userspace should be broadly compatible with the host kernel driver (staying within the same major series, e.g. 6.x, generally works)
  • Verdict: Most reliable for ROCm

System venv:

  • ✅ Uses system ROCm (already configured)
  • ✅ PyTorch ROCm wheels match system ROCm version
  • ⚠️ System ROCm updates can break venvs
  • Verdict: Works if system ROCm is stable
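
Whichever environment you land in, it's worth a quick sanity check that the installed PyTorch is actually a ROCm build and can see the GPU. `torch.version.hip` and `torch.cuda.is_available()` are standard PyTorch API; ROCm builds reuse the `cuda` namespace:

```shell
# Verify the active venv's PyTorch is a ROCm build and sees the GPU.
python3 - <<'EOF'
try:
    import torch
    print("HIP build:", torch.version.hip)           # None on CPU/CUDA builds
    print("GPU visible:", torch.cuda.is_available())  # ROCm reuses the cuda API
except ImportError:
    print("torch not installed in this environment")
EOF
```

If `GPU visible` is False while `rocminfo` sees the card, suspect group membership or a gfx-version mismatch before blaming the tool itself.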

Distrobox:

  • ✅ Can use system ROCm or install separate ROCm
  • ✅ GPU devices automatically passed through
  • ✅ More flexible than system venv
  • Verdict: Good middle ground

Storage Efficiency Comparison

For managing 10 STT tools with ROCm + PyTorch:

Conda

Base Conda: 3GB
Env 1 (whisper-tool): 8GB (includes duplicate PyTorch)
Env 2 (another-tool): 8GB (duplicate again)
...
Total for 10 tools: 3GB + (10 × 8GB) = 83GB

Docker (with base image reuse)

Base image (rocm/pytorch): 15GB
Tool 1 layer: 500MB
Tool 2 layer: 500MB
...
Total for 10 tools: 15GB + (10 × 500MB) = 20GB

System venv + uv (with package cache)

PyTorch cached once: 4GB
Venv 1: 100MB (hard links into the cache)
Venv 2: 100MB
...
Total for 10 tools: 4GB + (10 × 100MB) = 5GB

  • Winner: System venv + uv (most space-efficient)
  • Runner-up: Docker (reasonable with layer reuse)
  • Loser: Conda (massive duplication)


Performance Comparison

Illustrative timings for Whisper inference on the same audio file (expect variation by model size and hardware):

| Approach | First Run | Subsequent Runs | Startup Time |
|---|---|---|---|
| Native (system venv) | 3.2s | 3.2s | 0.1s |
| Docker | 3.3s | 3.3s | 0.5s |
| Conda | 3.2s | 3.2s | 1.2s |
| Distrobox | 3.3s | 3.3s | 0.3s |

Verdict: Performance differences are negligible. Startup time is where containerization shows overhead, but it's minimal.


Migration Path Recommendation

Since you've tried Conda and found it problematic, here's the recommended path:

Phase 1: Docker for Evaluation (Current)

  • Build base ROCm+PyTorch Docker image
  • Test each STT tool in its own container
  • Document which tools work, which don't
  • Keep Dockerfiles in each tool's directory

Phase 2: Consolidate Winners

  • Once you've found 2-3 STT tools that work:
    • Keep complex ones in Docker (e.g., WhisperX with diarization)
    • Move simple CLI tools to system venvs with uv
    • Archive failed experiments (delete containers, keep Dockerfiles)

Phase 3: Production Setup

  • Daily-use tool → system venv (fast, low overhead)
  • Specialized tools → Docker (isolated, reproducible)
  • Model training → Docker (heavy dependencies)

Alternative: Nix/NixOS (Advanced)

If you want true reproducibility without containerization overhead, consider Nix:

Pros

  • Declarative environments: Entire environment defined in flake.nix
  • No dependency hell: Nix solves dependencies mathematically
  • Atomic rollbacks: Bad install? Rollback instantly
  • Shared store: Packages shared across environments (like uv cache, but better)
  • Bit-for-bit reproducible: Same environment on any machine

Cons

  • Steep learning curve: Nix language is functional, different from imperative scripts
  • ROCm support: Nix ROCm packages exist but lag behind official AMD releases
  • Time investment: Learning Nix takes weeks/months
  • Debugging is hard: Nix errors can be cryptic

When to Consider Nix

  • You value reproducibility above all else
  • You're willing to invest learning time
  • You want to share exact environments with others
  • You're already comfortable with functional programming

Verdict for your use case: Overkill. Docker is simpler and more pragmatic.


Final Recommendation: Docker (with caveats)

For Your STT Exploration

Use Docker because:

  1. ✅ You're testing many tools - isolation prevents conflicts
  2. ✅ ROCm compatibility is best in official Docker images
  3. ✅ Clean up is trivial (delete containers)
  4. ✅ Each tool's Dockerfile documents dependencies
  5. ✅ Desktop overhead is acceptable for this use case

Mitigate Docker cons:

  • Pre-build a base rocm-pytorch-stt:latest image
  • Use Docker layer caching to speed rebuilds
  • Mount ~/ai/models/stt as volume to share models
  • Create helper scripts for common docker run commands
  • Use Distrobox if desktop integration becomes painful

Post-Evaluation Strategy

Once you've found winning STT tools:

  • Daily-use tool: Migrate to system venv (faster, less overhead)
  • Complex tools: Keep in Docker (reproducible, isolated)
  • Archive failures: Delete containers, keep Dockerfiles for reference

Sample Base Dockerfile

# File: Dockerfile.base
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0

# AMD GPU configuration
ENV HSA_OVERRIDE_GFX_VERSION=11.0.1
ENV ROCM_PATH=/opt/rocm
ENV HIP_VISIBLE_DEVICES=0

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsndfile1 \
    portaudio19-dev \
    && rm -rf /var/lib/apt/lists/*

# Common Python dependencies
RUN pip install --no-cache-dir \
    faster-whisper==1.0.3 \
    openai-whisper \
    torch \
    torchaudio \
    librosa \
    soundfile \
    pyaudio

# Model cache directory
ENV XDG_CACHE_HOME=/models/cache
VOLUME /models

# Audio I/O
VOLUME /audio

WORKDIR /app

# Default command
CMD ["/bin/bash"]

Build it:

docker build -t rocm-pytorch-stt:latest -f Dockerfile.base .

Use it for a specific tool:

# File: whisper-wayland/Dockerfile
FROM rocm-pytorch-stt:latest

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . /app

CMD ["python", "main.py"]

Conclusion: Are These Approaches Equally Valid?

No, they're not equally valid for your use case.

For STT exploration on Linux with AMD GPU:

Best → Worst:

  1. Docker - Most reliable, best ROCm support, easy cleanup
  2. System venv + uv - Fast, lightweight, good for stable tools
  3. Distrobox - Good middle ground if Docker feels too heavyweight
  4. Conda - Creates more problems than it solves for this use case
  5. Nix - Academically interesting, practically overkill

The Pragmatic Truth

  • Conda is solving the wrong problem: It's designed for scientific reproducibility across platforms, not for preventing local dependency conflicts
  • Docker feels "wrong" on desktop: Your instinct is right - containerization adds complexity. But for ML workloads, that complexity pays off
  • System venvs are underrated: With uv, they're fast enough and simple enough for many use cases

Your Instincts Are Correct, But...

You're right that containerization on desktop feels heavy-handed. But ROCm + PyTorch + AMD GPU creates a unique situation:

  • Binary compatibility matters: System ROCm must match PyTorch ROCm builds
  • Version conflicts are frequent: Different STT tools want different PyTorch versions
  • Evaluation requires isolation: Testing 10 tools without cross-contamination

Docker wins not because it's "better" in theory, but because it's the most pragmatic solution for managing complex ML dependencies on AMD GPUs.

Once you've evaluated tools and found winners, migrating daily-use tools to system venvs is perfectly reasonable.


Appendix: Quick Start Commands

Docker Approach

# Build base image
cd ~/programs/ai-ml/speech-voice
cat > Dockerfile.base << 'EOF'
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0
ENV HSA_OVERRIDE_GFX_VERSION=11.0.1
RUN pip install faster-whisper openai-whisper librosa soundfile
VOLUME /models
WORKDIR /app
EOF

docker build -t rocm-stt-base -f Dockerfile.base .

# Test a tool
cd whisper-wayland
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  -v ~/ai/models/stt:/models \
  -v $(pwd):/app \
  rocm-stt-base \
  python your_script.py
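
If that run command becomes routine, Docker Compose can capture it declaratively. A sketch written in the same heredoc style as the Dockerfile above; the service name, image, devices, and paths are assumptions to adjust:

```shell
# Capture the GPU flags and mounts as a Compose service instead of a long
# command line. All names and paths here are illustrative.
cat > docker-compose.yml <<'EOF'
services:
  stt:
    image: rocm-stt-base
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - video
    volumes:
      - ~/ai/models/stt:/models
      - .:/app
    working_dir: /app
    command: python your_script.py
EOF

# docker compose run --rm stt
```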

System venv + uv Approach

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create venv for a tool
cd ~/programs/ai-ml/speech-voice/whisper-tool
uv venv
source .venv/bin/activate
uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
uv pip install faster-whisper
python your_script.py

Distrobox Approach

# Create container
distrobox create --name stt-dev --image ubuntu:22.04

# Enter and set up
distrobox enter stt-dev
sudo apt install rocm-dev python3-pip ffmpeg
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip install faster-whisper

# Use it (audio/GPU work automatically)
python your_stt_script.py

Note: This gist was generated by Claude Code (claude-sonnet-4-5) as a comprehensive technical reference. While the information is based on current best practices and real-world experience with ROCm + PyTorch workflows, please validate recommendations against your specific system configuration and use case. Package versions, Docker images, and tool availability may change over time.

Author Context: Written for a user exploring STT solutions on Ubuntu 25.10 with KDE Plasma (Wayland), AMD RX 7700 XT GPU (gfx1101), ROCm, and PipeWire audio. Adjust recommendations for different hardware/software configurations.
