@guilt
Last active July 25, 2024 05:09
ROCM Setup Steps
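#!/bin/sh
# build-pytorch.sh: build the PyTorch image on top of the rocm-examples image.
set -e
# Build context, plus the cache directory that the launch script mounts.
mkdir -p pytorch-examples pytorch-cache
docker build pytorch-examples \
    --rm \
    -t rocm-examples-pytorch \
    -f post-rocm-python-ubuntu.Dockerfile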
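#!/bin/sh
# build-rocm.sh: build the base ROCm image with the host's render group ID.
set -e
# Look up the host's render GID; if the group does not exist, fall back to
# the Dockerfile default (109) instead of passing an empty build argument.
LOCAL_GID=$(getent group render | cut -d: -f3)
LOCAL_GID=${LOCAL_GID:-109}
mkdir -p rocm-examples
exec docker build rocm-examples \
    --build-arg GID="${LOCAL_GID}" \
    --rm \
    -t rocm-examples \
    -f hip-libraries-rocm-ubuntu.Dockerfile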
# hip-libraries-rocm-ubuntu.Dockerfile: Ubuntu-based ROCm docker image
FROM ubuntu:20.04
# Base packages that are required for the installation
RUN export DEBIAN_FRONTEND=noninteractive; \
apt-get update -qq \
&& apt-get install --no-install-recommends -y \
ca-certificates \
git \
locales-all \
make \
python3 \
python3-venv \
python3-dev \
ssh \
sudo \
wget \
pkg-config \
glslang-tools \
libvulkan-dev \
vulkan-validationlayers \
libglfw3-dev \
neovim \
&& rm -rf /var/lib/apt/lists/*
ENV LANG=en_US.utf8
# Install ROCM HIP and libraries using the installer script
RUN export DEBIAN_FRONTEND=noninteractive; \
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/focal/amdgpu-install_5.4.50403-1_all.deb \
&& apt-get update -qq \
&& apt-get install -y ./amdgpu-install_5.4.50403-1_all.deb \
&& rm ./amdgpu-install_5.4.50403-1_all.deb \
&& amdgpu-install -y --usecase=hiplibsdk --no-dkms \
&& apt-get install -y libnuma-dev \
&& rm -rf /var/lib/apt/lists/*
# Install CMake
RUN wget https://github.com/Kitware/CMake/releases/download/v3.21.7/cmake-3.21.7-linux-x86_64.sh \
&& mkdir /cmake \
&& sh cmake-3.21.7-linux-x86_64.sh --skip-license --prefix=/cmake \
&& rm cmake-3.21.7-linux-x86_64.sh
ENV PATH="/cmake/bin:/opt/rocm/bin:${PATH}"
RUN echo "/opt/rocm/lib" >> /etc/ld.so.conf.d/rocm.conf \
&& ldconfig
# Use render group as an argument from user
ARG GID=109
# Add the render group and a user with sudo permissions for the container
RUN groupadd --system --gid ${GID} render \
&& useradd -Um -G sudo,video,render developer \
&& echo developer ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/developer \
&& chmod 0440 /etc/sudoers.d/developer
RUN mkdir /workspaces && chown developer:developer /workspaces
WORKDIR /workspaces
VOLUME /workspaces
USER developer
#!/bin/sh
# shellcheck disable=SC2086
set -e
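# Extra docker run flags; defaults to an interactive terminal (-it).
DOCKER_ARGS=${DOCKER_ARGS:--it}
# Command to run in the container; only the first argument is used.
CMD=${1:-bash}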
exec docker run \
--rm \
${DOCKER_ARGS} \
--name rocm-examples-pytorch \
-h rocm-examples-pytorch \
--device /dev/kfd --device /dev/dri \
-v "$(pwd -P)/pytorch-examples":/workspaces/pytorch-examples \
-v "$(pwd -P)/pytorch-cache":/home/developer/.cache \
rocm-examples-pytorch "$CMD"
#!/bin/sh
# shellcheck disable=SC2086
set -e
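# Extra docker run flags; defaults to an interactive terminal (-it).
DOCKER_ARGS=${DOCKER_ARGS:--it}
# Command to run in the container; only the first argument is used.
CMD=${1:-bash}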
exec docker run \
--rm \
${DOCKER_ARGS} \
--name rocm-examples \
-h rocm-examples \
--device /dev/kfd --device /dev/dri \
-v "$(pwd -P)/rocm-examples":/workspaces/rocm-examples \
rocm-examples "$CMD"
# post-rocm-python-ubuntu.Dockerfile: PyTorch image based on the ROCm docker image
FROM rocm-examples:latest
# Set Root User
USER root
# Create VEnv Directory
RUN mkdir -p /venv && chown developer:developer /venv
# Set User
USER developer
# Install VEnv and PyTorch
RUN python3.8 -m venv /venv && \
. /venv/bin/activate && \
python3.8 -m pip install --upgrade \
pip setuptools wheel six && \
python3.8 -m pip install \
--index-url https://download.pytorch.org/whl/rocm5.4.2 \
--pre torch torchvision torchaudio pillow && \
python3.8 -m pip cache purge
VOLUME /venv
CMD ["/venv/bin/python3.8"]

ROCm Setup Steps

ROCm Docker Image

  1. Install Docker and make sure docker ps runs without errors. Add yourself to the docker group if necessary.
  2. Run build-rocm.sh to build a ROCm docker image for your Linux system. The script picks up the render group ID from your Linux distribution; make sure /dev/kfd and /dev/dri are writable by the render group, and add yourself to that group if necessary.
  3. Run launch-rocm.sh if you only want to use ROCm with the docker image you built.
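
The prerequisites in the steps above can be checked before building. The following is a minimal sketch, not part of the gist: the helper names in_group and report_prereqs are hypothetical, and it assumes the standard id, tr and grep utilities. It only reports what it finds and changes nothing.

```shell
#!/bin/sh
# Pre-flight checks before running build-rocm.sh (illustrative helper,
# not part of the gist).

# in_group GROUP [USER]: succeed if USER belongs to GROUP. Without USER it
# checks the groups of the current session, which is what docker will see.
in_group() {
    id -nG ${2:+"$2"} | tr ' ' '\n' | grep -qx -- "$1"
}

# report_prereqs: print one status line per group and device node.
report_prereqs() {
    for g in docker render; do
        if in_group "$g"; then
            echo "ok: member of '$g'"
        else
            echo "missing: not in '$g' (sudo usermod -aG $g \"\$USER\")"
        fi
    done
    for dev in /dev/kfd /dev/dri; do
        if [ -e "$dev" ]; then
            echo "ok: $dev exists"
        else
            echo "missing: $dev"
        fi
    done
}

report_prereqs
```

A group added with usermod only shows up in a new login session, so a "missing" line right after running usermod usually means you still need to log out and back in.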

PyTorch Docker Image

  1. Run build-pytorch.sh if you wish to build a PyTorch image for your Linux system. It is built as a separate docker image, on top of the ROCm docker image you built earlier.
  2. Run launch-pytorch.sh if you wish to run PyTorch with the second image you just built.
  3. Run source /venv/bin/activate within the container and you should be able to run all the cool PyTorch things you need.
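
Once the container is running, a quick way to confirm that PyTorch can actually see the GPU is sketched below. This is an illustration rather than part of the gist: the function name check_torch_gpu is hypothetical, and the default interpreter path is the one the PyTorch Dockerfile above creates. ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API, so torch.cuda.is_available() is the call to check.

```shell
#!/bin/sh
# check_torch_gpu [PYTHON]: report whether the PyTorch installed for PYTHON
# can see a GPU (hypothetical helper, not part of the gist).
# Returns 0 if a GPU is visible, 1 if not (or torch is missing),
# and 2 if the interpreter itself cannot be found.
check_torch_gpu() {
    py=${1:-/venv/bin/python3.8}
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "interpreter not found: $py" >&2
        return 2
    fi
    "$py" -c '
import sys
try:
    import torch
except ImportError:
    sys.exit("torch is not installed for this interpreter")
# torch.version.hip is set on ROCm builds and None otherwise.
print("torch", torch.__version__, "hip", getattr(torch.version, "hip", None))
visible = torch.cuda.is_available()
print("GPU visible:", visible)
sys.exit(0 if visible else 1)
'
}
```

Inside the container, source this snippet and call check_torch_gpu with no arguments. If it reports that no GPU is visible, recheck the /dev/kfd and /dev/dri permissions and the render group ID used at build time.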
@jessecambon

I ended up needing to use a live USB to repair my Pop OS install (I believe one of the initramfs commands I ran messed something up, because I was unable to log in after a reboot). After that I installed ROCM again via these instructions. Then, instead of using Docker Desktop, I installed docker.io via these commands:

sudo apt install docker.io
sudo usermod -aG docker $USER

Docker now only works via sudo, but I was able to get the ROCM-pytorch image to successfully run by running these commands as sudo:

sudo docker pull rocm/pytorch:latest
sudo docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest

For reference this is what /dev/kfd looks like:

$ ls -l /dev/kfd
crw-rw---- 1 root render 235, 0 Jul 24 10:19 /dev/kfd

Haven't tried stable diffusion or running anything in pytorch yet, but at least I'm past the previous error message. Thanks again for your help.

@guilt

guilt commented Jul 24, 2024

You're welcome. Please update that System76 ticket as well with what happened to you. Have a wonderful day.

@jessecambon

I made a PR to add a note to the system76 docs (system76/docs#1242). I can link it in the prior ticket.
