@warmonkey
Last active May 4, 2023 10:04
Use ROCm and PyTorch on AMD integrated graphics (iGPU, Ryzen 7 5825u)

NOT WORKING - WRONG CALCULATION RESULT

The GPU can be detected, but it cannot perform training or inference correctly: the calculation results are wrong.

  1. Install PyTorch with ROCm support
    Follow the official installation guide: https://pytorch.org/get-started/locally/#linux-installation
    Choose [Stable] -> [Linux] -> [Pip] -> [Python] -> [ROCm]. The command should look something like:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

Remember the ROCm version here.
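
A quick sanity check (a minimal sketch, not part of the original gist) to confirm the ROCm build of PyTorch was installed: ROCm wheels report a HIP runtime version, while CUDA or CPU-only wheels report None.

import torch

# ROCm builds expose the HIP runtime version; CUDA/CPU-only builds return None.
print("torch version:", torch.__version__)
print("HIP version:  ", torch.version.hip)
# GPU visibility will only become True after the driver steps below.
print("GPU visible:  ", torch.cuda.is_available())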

  2. Install ROCm drivers
  • Install the downloaded amdgpu-install package
dpkg -i ./amdgpu-install*.deb
  • Run the installation script
amdgpu-install --usecase=graphics,rocm,opencl -y --accept-eula

Note: the Ryzen 7 5825u iGPU architecture is Vega 8, which is supposed to use legacy OpenCL.
If you are using a different AMD GPU or APU, modifications may be required.

  • Add current user to groups
    To access the devices /dev/kfd, /dev/dri/card0 and /dev/dri/renderD*, the current user must be added to the render and video groups.
sudo usermod -a -G render $LOGNAME
sudo usermod -a -G video $LOGNAME

If these groups are not added, only root is allowed to use ROCm (a quick permission check is sketched after this list).

  • Reboot the system
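
After the reboot, a small Python sketch (assuming the render and video groups exist and the standard /dev/kfd and /dev/dri/renderD* device nodes are present) can confirm the group changes took effect:

import getpass, glob, grp, os

user = getpass.getuser()

# The user must appear in both groups for ROCm to work without root.
for group in ("render", "video"):
    members = grp.getgrnam(group).gr_mem
    print(f"{group}: {'OK' if user in members else 'MISSING'} for {user}")

# ROCm talks to the GPU through /dev/kfd and the DRM render nodes.
for dev in ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*")):
    ok = os.access(dev, os.R_OK | os.W_OK)
    print(f"{dev}: {'read/write OK' if ok else 'no access'}")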
  3. Add environment variables in .bashrc
    The Ryzen 7 5825u iGPU is gfx90c, which should be compatible with gfx900, so we force ROCm to treat it as gfx900.
export PYTORCH_ROCM_ARCH=gfx900
export HSA_OVERRIDE_GFX_VERSION=9.0.0
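
The overrides only take effect in shells that source the updated .bashrc, so it is worth confirming they are visible to the Python process before importing torch (a trivial check, not from the original gist):

import os

# Both variables must be set in the environment of the process that imports torch.
for var in ("PYTORCH_ROCM_ARCH", "HSA_OVERRIDE_GFX_VERSION"):
    print(var, "=", os.environ.get(var, "(not set)"))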
  4. Check iGPU status
rocm-smi

From the output, you can see GPU[0].

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
ERROR: GPU[0]	: sclk clock is unsupported
================================================================================
GPU[0]		: Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
0    43.0c           0.003W  None  1200Mhz  0%   auto  Unsupported   43%   0%    
================================================================================
============================= End of ROCm SMI Log ==============================

Also, you can check the OpenCL status:

clinfo

From the output you can see the GPU has been detected.

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3513.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 
  Device Topology:				 PCI[ B#4, D#0, F#0 ]
  Max compute units:				 8
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  5. Test run
import torch

# PyTorch's ROCm backend is exposed through the torch.cuda API
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_properties(i))

Output:

1
_CudaDeviceProperties(name='AMD Radeon Graphics', major=9, minor=0, total_memory=1024MB, multi_processor_count=8)
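
To reproduce the wrong results mentioned at the top, one option (a minimal sketch, not from the original gist) is to run the same matrix multiplication on the CPU and on the iGPU and compare:

import torch

# On a working setup both results agree within floating-point tolerance;
# on this gfx90c-forced-to-gfx900 configuration they reportedly do not.
a = torch.randn(256, 256)
b = torch.randn(256, 256)

cpu_result = a @ b
gpu_result = (a.cuda() @ b.cuda()).cpu()   # "cuda" maps to the ROCm device here

print("max abs difference:", (cpu_result - gpu_result).abs().max().item())
print("results match:", torch.allclose(cpu_result, gpu_result, atol=1e-3))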