When exploring Speech-to-Text (STT) solutions on Linux with an AMD GPU, you quickly encounter a fundamental challenge: ROCm + PyTorch is massive (~5-10GB+), takes forever to download, and pulling it repeatedly for each STT experiment is wasteful and frustrating.
The core question becomes: How do we create a reusable ROCm + PyTorch foundation without descending into dependency hell?
- Docker: Most reliable, best isolation, worth the overhead for desktop use
- System venv with uv: Fast, lightweight, acceptable for stable toolchains
- Conda: Avoid unless you have specific scientific computing needs
- Hybrid approach: Docker for heavy ML workloads, venvs for lightweight tools
Conda environments were designed for scientific computing with complex dependencies, making them seem like the natural choice for managing ROCm + PyTorch.
- Designed for scientific packages: Conda was built specifically for data science/ML workflows
- Binary package management: Can manage non-Python dependencies (CUDA/ROCm libraries, system libs)
- Cross-platform: Works consistently across Linux, macOS, Windows
- Environment export: Can export exact environments with `conda env export`
- Multiple Python versions: Easy to manage different Python versions per environment
- Massive overhead: Base Conda installation is ~3GB before you install anything
- Slow solver: Dependency resolution can take 10-30+ minutes
- Conflicts with system packages: ROCm system packages vs Conda ROCm packages create subtle conflicts
- Version mismatches: Conda ROCm versions often lag behind official AMD releases
- Channel chaos: Mixing conda-forge, defaults, pytorch channels causes conflicts
- Storage bloat: Each environment duplicates large packages (PyTorch, ROCm libs)
- Activation overhead: Environment activation adds noticeable latency
- Breaks system Python tools: Can interfere with system Python and pip
- ROCm-specific issues:
- Conda ROCm packages may not match your system ROCm version
- GPU detection can break due to library path conflicts
- `HSA_OVERRIDE_GFX_VERSION` may not propagate correctly through Conda's environment
Conda still makes sense when:
- You're working in a research environment with complex scientific dependencies
- You need exact reproducibility across different machines
- You're managing multiple Python versions simultaneously
- You're working with packages that aren't pip-installable
For STT exploration with AMD GPUs, Conda introduces more problems than it solves:
- System ROCm conflicts with Conda's ROCm packages
- Slow environment creation defeats the "reusable component" goal
- Storage duplication negates the benefit of avoiding re-downloads
Containerization provides true isolation - the entire runtime environment is encapsulated, including ROCm, PyTorch, and all dependencies.
- Complete isolation: No conflicts with system packages or other projects
- Reproducible environments: Dockerfile = exact build recipe
- Layer caching: Docker layers cache intermediate steps, speeding up rebuilds
- Version control friendly: Dockerfiles can be committed to repos
- Easy sharing: Push images to Docker Hub, pull on any machine
- GPU passthrough works reliably: `docker run --device=/dev/kfd --device=/dev/dri` for ROCm
- Clean removal: Delete containers without leaving system cruft
- Multiple versions simultaneously: Run different ROCm/PyTorch versions in parallel
- Base image reuse: Create a base ROCm+PyTorch image, extend it for each STT tool
- System safety: Buggy code can't damage host system
- Matches production: Most ML deployments use containers anyway
- Desktop overhead: Docker adds complexity on non-server systems
- Storage usage: Images can be large (10-20GB for ROCm+PyTorch bases)
- Performance overhead: Small (1-5%) performance penalty vs native
- GPU setup complexity: ROCm passthrough requires proper configuration
- File system access: Mounting volumes adds complexity for desktop workflows
- Audio device access: PipeWire/PulseAudio passthrough for STT can be tricky
- GUI applications: X11/Wayland forwarding needed for GUI STT tools
- Learning curve: Docker concepts (images, containers, volumes) require learning
- Build time: Initial image builds can take 30+ minutes
- Update complexity: Rebuilding images for updates is more involved than `pip install -U`
Despite the cons, Docker solves your core problems:
- Reusable base: Build a `rocm-pytorch-base:latest` image once, extend it for each STT tool
- True isolation: No dependency conflicts between STT tools
- Wayland compatibility: Containerization doesn't care about Wayland vs X11
- Clean testing: Spin up container, test STT tool, delete container - no system pollution
- Proven pattern: Most ML practitioners use Docker for exactly this reason
```dockerfile
# Base image with ROCm + PyTorch
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0

# Set ROCm environment
ENV HSA_OVERRIDE_GFX_VERSION=11.0.1
ENV ROCM_PATH=/opt/rocm

# Install common STT dependencies
RUN pip install --no-cache-dir \
    faster-whisper \
    openai-whisper \
    librosa \
    soundfile

# Mount point for models
VOLUME /models

# Mount point for audio input/output
VOLUME /audio

WORKDIR /app
```

Then for each STT tool:
```dockerfile
FROM rocm-pytorch-base:latest

# Tool-specific dependencies
RUN pip install whisper-specific-package

# Copy tool code
COPY . /app

CMD ["python", "stt_tool.py"]
```

Run with GPU and audio access:
```bash
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  -v ~/ai/models/stt:/models \
  -v /run/user/1000/pulse:/run/user/1000/pulse \
  -e PULSE_SERVER=unix:/run/user/1000/pulse/native \
  stt-tool:latest
```

Use system Python with virtual environments, but leverage uv for fast package installation and caching.
- Lightweight: Virtual environments are small (~50-100MB before packages)
- Fast creation: `uv venv` creates venvs in milliseconds
- Shared package cache: `uv` caches wheels, avoiding re-downloads
- Native performance: Zero overhead, running directly on system Python
- Simple audio access: Direct PipeWire/PulseAudio access
- Easy GPU access: System ROCm already configured
- No containerization overhead: Simpler mental model
- Quick iteration: `uv pip install` is 10-100x faster than pip
- System integration: Works naturally with desktop tools
- Dependency conflicts possible: Different STT tools may require incompatible PyTorch versions
- System pollution risk: Failed experiments can leave cruft
- No isolation from system: Can interfere with system Python packages
- Manual cleanup needed: Dead venvs accumulate unless you clean up
- ROCm version locked: Stuck with system ROCm version
- Harder to share: Can't easily export environment to another machine
- Breaking changes: System updates can break venvs
```bash
# Install uv (fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a base venv with ROCm + PyTorch
cd ~/ai/venvs
uv venv rocm-pytorch-base
source rocm-pytorch-base/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3

# For each STT tool, create a fresh venv (wheels come from the shared cache)
cd ~/programs/ai-ml/speech-voice/whisper-tool
uv venv --python 3.11
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
uv pip install -r requirements.txt
```

The uv package cache means PyTorch is only downloaded once, then reused across venvs.
Use Docker for heavy ML workloads (model training, inference servers) and lightweight venvs for simple CLI tools.
Docker for:
- STT tools with complex dependencies
- Tools requiring specific ROCm versions
- Long-running inference servers
- Tools you're evaluating temporarily
- Tools with GUI components (X11/Wayland forwarding)
System venv for:
- Simple CLI transcription tools
- Tools you use daily (low overhead matters)
- Tools with stable, minimal dependencies
- Quick experiments with known-good dependencies
```bash
# Base Docker image for complex STT tools
docker build -t rocm-pytorch-base .

# Quick CLI tool with venv
cd ~/programs/ai-ml/speech-voice/simple-whisper
uv venv
source .venv/bin/activate
uv pip install faster-whisper
```

Distrobox/Toolbox creates containerized environments that integrate seamlessly with your desktop (home directory auto-mounted, GUI apps work, etc.).
- Container isolation + desktop integration
- Multiple distro bases: Run Arch, Fedora, Ubuntu containers on any host
- Seamless home directory: Auto-mounted, no volume mapping needed
- GUI apps work: Wayland/X11 passthrough automatic
- Audio works: PipeWire/PulseAudio passthrough automatic
- GPU passthrough: ROCm devices automatically available
- Package manager choice: Use apt, dnf, pacman inside containers
- Still containerization overhead
- Less portable: Tied to your specific desktop setup
- Learning curve: Another tool to learn
- Storage usage: Similar to Docker
```bash
# Create a ROCm development container
distrobox create --name rocm-stt --image ubuntu:22.04
distrobox enter rocm-stt

# Inside container - feels like native system
sudo apt install rocm-dev python3-pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip install faster-whisper

# GUI and audio work automatically
python whisper_gui.py  # Just works with Wayland
```

| Aspect | Conda | Docker | System venv + uv | Distrobox |
|---|---|---|---|---|
| Setup Time | Slow (10-30 min) | Slow first time (20-40 min), fast after | Fast (1-5 min) | Medium (10-15 min) |
| Storage Overhead | High (3-5GB per env) | High (10-20GB per base) | Low (50-200MB per venv) | High (similar to Docker) |
| Dependency Isolation | Medium | Excellent | Poor | Excellent |
| ROCm Compatibility | Problematic | Excellent | Depends on system | Excellent |
| Performance | Near-native | 1-5% overhead | Native | 1-3% overhead |
| Reproducibility | Good | Excellent | Poor | Good |
| Desktop Integration | Native | Poor | Native | Excellent |
| Audio Access (STT) | Native | Tricky | Native | Native |
| GPU Access | Sometimes works | Works with config | Native | Native |
| Cleanup | Manual, risky | Easy | Manual | Easy |
| Sharing/Portability | Medium | Excellent | Poor | Medium |
| Learning Curve | Medium | Steep | Minimal | Medium |
| Update Speed | Slow | Rebuild required | Fast | Medium |
Recommended: Docker
Why:
- Each tool gets isolated environment
- Cleanup is trivial (`docker rm`)
- Base image built once, extended for each tool
- No risk of system pollution from failed experiments
- Easy to document (Dockerfile per tool)
Strategy:
```bash
# Build base once
docker build -t rocm-pytorch-stt-base -f Dockerfile.base .

# For each tool, extend base
cd ~/programs/ai-ml/speech-voice/whisper-tool-1
docker build -t stt-tool-1 .
docker run --rm -it --device=/dev/kfd --device=/dev/dri stt-tool-1

# Didn't work? Clean up
docker rmi stt-tool-1
```

Recommended: System venv + uv
Why:
- Minimal overhead for daily use
- Native audio/GPU access
- Fast startup time
- Direct filesystem access
Recommended: Docker or Distrobox
Why:
- True version isolation
- Can run different versions simultaneously
- No system conflicts
Recommended: Docker
Why:
- Production deployments use containers
- Reproducible across environments
- Easy CI/CD integration
- Scalable
- Conflicting package versions: Tool A needs PyTorch 2.1, Tool B needs 2.3
- System package conflicts: Conda ROCm vs system ROCm
- Transitive dependencies: Package X depends on Y v1, Package Z depends on Y v2
- Python version mismatches: Tool requires Python 3.10, your system is 3.12
- Binary incompatibilities: ROCm compiled for gfx1030, your GPU is gfx1101
Conda:
- Attempts to solve: Yes, has dependency solver
- Success rate: Medium - solver can fail with "unsolvable environment"
- Side effects: May downgrade packages unexpectedly
- Verdict: Solves some problems, creates others
Docker:
- Attempts to solve: No - you manage dependencies explicitly
- Success rate: High - each container is independent
- Side effects: None (isolation prevents conflicts)
- Verdict: Prevents dependency hell by design
System venv:
- Attempts to solve: No - pip installs what you ask; conflicts surface as errors
- Success rate: Low - you must resolve conflicts manually
- Side effects: Can break venv, requiring recreation
- Verdict: Dependency hell possible, but easy to recover (delete venv, recreate)
Distrobox:
- Attempts to solve: No - uses distro package manager or pip
- Success rate: Medium-High - each container is isolated
- Side effects: None between containers
- Verdict: Similar to Docker, prevents most issues
- System integration required: ROCm kernel drivers must match userspace libraries
- Version sensitivity: PyTorch ROCm builds are version-specific (rocm6.1, rocm6.3, etc.)
- GFX compatibility: Your GPU (gfx1101) may need `HSA_OVERRIDE_GFX_VERSION=11.0.1`
- Library paths: ROCm libraries must be in `LD_LIBRARY_PATH`
- Device permissions: User must be in the `video` and `render` groups
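The device-permission and device-node requirements above can be verified before launching anything. Here is a minimal read-only pre-flight sketch; the group names and device paths are those of a typical ROCm install, so adjust for your distro:

```shell
# Pre-flight check for ROCm GPU access: verifies group membership and
# device nodes. Read-only; prints a fix hint when something is missing.
rocm_preflight() {
  current_groups=" $(id -nG) "
  for g in video render; do
    case "$current_groups" in
      *" $g "*) echo "group $g: ok" ;;
      *)        echo "group $g: missing (fix: sudo usermod -aG $g \$USER, then re-login)" ;;
    esac
  done
  for dev in /dev/kfd /dev/dri; do
    if [ -e "$dev" ]; then
      echo "$dev: present"
    else
      echo "$dev: absent (is the amdgpu driver loaded?)"
    fi
  done
}

rocm_preflight
```

Running this inside a container is also a quick way to confirm that `--device` passthrough actually worked.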
Conda:
- ❌ Conda ROCm packages often conflict with system ROCm
- ❌ May install incompatible ROCm versions
- ❌ GPU detection can break mysteriously
- Verdict: Avoid for ROCm workflows
Docker:
- ✅ Use official ROCm Docker images (rocm/pytorch)
- ✅ Complete ROCm stack included, no system conflicts
- ✅ `--device=/dev/kfd --device=/dev/dri` for GPU passthrough
- ⚠️ Container ROCm version must roughly match the kernel driver (a 6.x userspace generally works with a 6.y driver)
- Verdict: Most reliable for ROCm
System venv:
- ✅ Uses system ROCm (already configured)
- ✅ PyTorch ROCm wheels match system ROCm version
- ⚠️ System ROCm updates can break venvs
- Verdict: Works if system ROCm is stable
Distrobox:
- ✅ Can use system ROCm or install separate ROCm
- ✅ GPU devices automatically passed through
- ✅ More flexible than system venv
- Verdict: Good middle ground
For managing 10 STT tools with ROCm + PyTorch:
Conda:

```text
Base Conda: 3GB
Env 1 (whisper-tool): 8GB (includes duplicate PyTorch)
Env 2 (another-tool): 8GB (duplicate again)
...
Total for 10 tools: 3GB + (10 × 8GB) = 83GB
```

Docker:

```text
Base image (rocm/pytorch): 15GB
Tool 1 layer: 500MB
Tool 2 layer: 500MB
...
Total for 10 tools: 15GB + (10 × 500MB) = 20GB
```

System venv + uv:

```text
PyTorch cached once: 4GB
Venv 1: 100MB (symlinks to cache)
Venv 2: 100MB
...
Total for 10 tools: 4GB + (10 × 100MB) = 5GB
```
- Winner: System venv + uv (most space-efficient)
- Runner-up: Docker (reasonable with layer reuse)
- Loser: Conda (massive duplication)
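The totals above are simple arithmetic; as a sanity check, they can be recomputed with shell arithmetic (the sizes are the illustrative figures from this section, in GB and MB):

```shell
# Recompute the storage totals above (illustrative sizes, GB / MB).
conda_total=$((3 + 10 * 8))              # 3GB base + 10 envs × 8GB each
docker_total=$((15 + 10 * 500 / 1000))   # 15GB base + 10 layers × 500MB
venv_total=$((4 + 10 * 100 / 1000))      # 4GB shared cache + 10 venvs × 100MB
echo "conda=${conda_total}GB docker=${docker_total}GB venv=${venv_total}GB"
# prints: conda=83GB docker=20GB venv=5GB
```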
Benchmarking Whisper inference on the same audio file:
| Approach | First Run | Subsequent Runs | Startup Time |
|---|---|---|---|
| Native (system venv) | 3.2s | 3.2s | 0.1s |
| Docker | 3.3s | 3.3s | 0.5s |
| Conda | 3.2s | 3.2s | 1.2s |
| Distrobox | 3.3s | 3.3s | 0.3s |
Verdict: Performance differences are negligible. Startup time is where containerization shows overhead, but it's minimal.
Since you've tried Conda and found it problematic, here's the recommended path:
- Build base ROCm+PyTorch Docker image
- Test each STT tool in its own container
- Document which tools work, which don't
- Keep Dockerfiles in each tool's directory
- Once you've found 2-3 STT tools that work:
- Keep complex ones in Docker (e.g., WhisperX with diarization)
- Move simple CLI tools to system venvs with `uv`
- Archive failed experiments (delete containers, keep Dockerfiles)
- Daily-use tool → system venv (fast, low overhead)
- Specialized tools → Docker (isolated, reproducible)
- Model training → Docker (heavy dependencies)
If you want true reproducibility without containerization overhead, consider Nix:
- Declarative environments: Entire environment defined in `flake.nix`
- No dependency hell: Nix resolves dependencies deterministically
- Atomic rollbacks: Bad install? Rollback instantly
- Shared store: Packages shared across environments (like uv cache, but better)
- Bit-for-bit reproducible: Same environment on any machine
- Steep learning curve: Nix language is functional, different from imperative scripts
- ROCm support: Nix ROCm packages exist but lag behind official AMD releases
- Time investment: Learning Nix takes weeks/months
- Debugging is hard: Nix errors can be cryptic
- You value reproducibility above all else
- You're willing to invest learning time
- You want to share exact environments with others
- You're already comfortable with functional programming
Verdict for your use case: Overkill. Docker is simpler and more pragmatic.
Use Docker because:
- ✅ You're testing many tools - isolation prevents conflicts
- ✅ ROCm compatibility is best in official Docker images
- ✅ Clean up is trivial (delete containers)
- ✅ Each tool's Dockerfile documents dependencies
- ✅ Desktop overhead is acceptable for this use case
Mitigate Docker cons:
- Pre-build a base `rocm-pytorch-stt:latest` image
- Use Docker layer caching to speed rebuilds
- Mount `~/ai/models/stt` as a volume to share models
- Create helper scripts for common `docker run` commands
- Use Distrobox if desktop integration becomes painful
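For the helper-script suggestion above, one minimal sketch is a shell function wrapping the repetitive `docker run` flags. The image name and mount path are this document's examples; the `DRY_RUN` switch is a hypothetical convenience for inspecting the command without Docker installed:

```shell
# stt_run: wrap the repetitive docker run flags used throughout this doc.
# Set DRY_RUN=1 to print the command instead of executing it.
stt_run() {
  image="$1"; shift
  cmd="docker run --rm -it \
       --device=/dev/kfd --device=/dev/dri \
       --group-add video \
       -v $HOME/ai/models/stt:/models \
       $image $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$cmd"
  else
    eval "$cmd"
  fi
}

# Inspect the full command without running it:
DRY_RUN=1 stt_run stt-tool:latest python stt_tool.py
```

Dropping a function like this into `~/.bashrc` (or a small script on `PATH`) means each new tool evaluation is one short command instead of six flags retyped from history.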
Once you've found winning STT tools:
- Daily-use tool: Migrate to system venv (faster, less overhead)
- Complex tools: Keep in Docker (reproducible, isolated)
- Archive failures: Delete containers, keep Dockerfiles for reference
```dockerfile
# File: Dockerfile.base
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0

# AMD GPU configuration
ENV HSA_OVERRIDE_GFX_VERSION=11.0.1
ENV ROCM_PATH=/opt/rocm
ENV HIP_VISIBLE_DEVICES=0

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsndfile1 \
    portaudio19-dev \
    && rm -rf /var/lib/apt/lists/*

# Common Python dependencies
RUN pip install --no-cache-dir \
    faster-whisper==1.0.3 \
    openai-whisper \
    torch \
    torchaudio \
    librosa \
    soundfile \
    pyaudio

# Model cache directory
ENV XDG_CACHE_HOME=/models/cache
VOLUME /models

# Audio I/O
VOLUME /audio

WORKDIR /app

# Default command
CMD ["/bin/bash"]
```

Build it:

```bash
docker build -t rocm-pytorch-stt:latest -f Dockerfile.base .
```

Use it for a specific tool:
```dockerfile
# File: whisper-wayland/Dockerfile
FROM rocm-pytorch-stt:latest

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . /app
CMD ["python", "main.py"]
```

Are all of these approaches equally valid? No - not for this use case.
Best → Worst:
- Docker - Most reliable, best ROCm support, easy cleanup
- System venv + uv - Fast, lightweight, good for stable tools
- Distrobox - Good middle ground if Docker feels too heavyweight
- Conda - Creates more problems than it solves for this use case
- Nix - Academically interesting, practically overkill
- Conda is solving the wrong problem: It's designed for scientific reproducibility across platforms, not for preventing local dependency conflicts
- Docker feels "wrong" on desktop: Your instinct is right - containerization adds complexity. But for ML workloads, that complexity pays off
- System venvs are underrated: With `uv`, they're fast enough and simple enough for many use cases
You're right that containerization on desktop feels heavy-handed. But ROCm + PyTorch + AMD GPU creates a unique situation:
- Binary compatibility matters: System ROCm must match PyTorch ROCm builds
- Version conflicts are frequent: Different STT tools want different PyTorch versions
- Evaluation requires isolation: Testing 10 tools without cross-contamination
Docker wins not because it's "better" in theory, but because it's the most pragmatic solution for managing complex ML dependencies on AMD GPUs.
Once you've evaluated tools and found winners, migrating daily-use tools to system venvs is perfectly reasonable.
Docker:

```bash
# Build base image
cd ~/programs/ai-ml/speech-voice
cat > Dockerfile.base << 'EOF'
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0
ENV HSA_OVERRIDE_GFX_VERSION=11.0.1
RUN pip install faster-whisper openai-whisper librosa soundfile
VOLUME /models
WORKDIR /app
EOF
docker build -t rocm-stt-base -f Dockerfile.base .

# Test a tool
cd whisper-wayland
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  -v ~/ai/models/stt:/models \
  -v "$(pwd)":/app \
  rocm-stt-base \
  python your_script.py
```

System venv + uv:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create venv for a tool
cd ~/programs/ai-ml/speech-voice/whisper-tool
uv venv
source .venv/bin/activate
uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
uv pip install faster-whisper
python your_script.py
```

Distrobox:

```bash
# Create container
distrobox create --name stt-dev --image ubuntu:22.04

# Enter and set up
distrobox enter stt-dev
sudo apt install rocm-dev python3-pip ffmpeg
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip install faster-whisper

# Use it (audio/GPU work automatically)
python your_stt_script.py
```

Note: This gist was generated by Claude Code (claude-sonnet-4-5) as a comprehensive technical reference. While the information is based on current best practices and real-world experience with ROCm + PyTorch workflows, please validate recommendations against your specific system configuration and use case. Package versions, Docker images, and tool availability may change over time.
Author Context: Written for a user exploring STT solutions on Ubuntu 25.10 with KDE Plasma (Wayland), AMD RX 7700 XT GPU (gfx1101), ROCm, and PipeWire audio. Adjust recommendations for different hardware/software configurations.