
@damico
Created April 10, 2023 18:54
Script for testing PyTorch support with AMD GPUs using ROCM
import torch, grp, pwd, os, subprocess

devices = []
try:
    print("\n\nChecking ROCM support...")
    result = subprocess.run(['rocminfo'], stdout=subprocess.PIPE)
    cmd_str = result.stdout.decode('utf-8')
    cmd_split = cmd_str.split('Agent ')
    for part in cmd_split:
        item_single = part[0:1]
        item_double = part[0:2]
        if item_single.isnumeric() or item_double.isnumeric():
            new_split = cmd_str.split('Agent ' + item_double)
            device = new_split[1].split('Marketing Name:')[0].replace('  Name:                    ', '').replace('\n', '').replace('                  ', '').split('Uuid:')[0].split('*******')[1]
            devices.append(device)
    if len(devices) > 0:
        print('GOOD: ROCM devices found: ', len(devices))
    else:
        print('BAD: No ROCM devices found.')

    print("Checking PyTorch...")
    x = torch.rand(5, 3)
    has_torch = False
    len_x = len(x)
    if len_x == 5:
        has_torch = True
        for i in x:
            if len(i) == 3:
                has_torch = True
            else:
                has_torch = False
    if has_torch:
        print('GOOD: PyTorch is working fine.')
    else:
        print('BAD: PyTorch is NOT working.')

    print("Checking user groups...")
    user = os.getlogin()
    groups = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
    gid = pwd.getpwnam(user).pw_gid
    groups.append(grp.getgrgid(gid).gr_name)
    if 'render' in groups and 'video' in groups:
        print('GOOD: The user', user, 'is in RENDER and VIDEO groups.')
    else:
        print('BAD: The user', user, 'is NOT in RENDER and VIDEO groups. This is necessary in order to PyTorch use HIP resources')

    if torch.cuda.is_available():
        print("GOOD: PyTorch ROCM support found.")
        t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
        print('Testing PyTorch ROCM support...')
        if str(t) == "tensor([5, 5, 5], device='cuda:0')":
            print('Everything fine! You can run PyTorch code inside of: ')
            for device in devices:
                print('---> ', device)
    else:
        print("BAD: PyTorch ROCM support NOT found.")
except:
    print('Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.')
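The split chain above is fragile: it depends on the exact run of spaces that rocminfo prints (collapsing the whitespace breaks the parsing entirely). A regex-based alternative is sketched below; the pattern, helper name, and trimmed sample are mine, not part of the gist, and it anchors on the "Agent N" header so the Name: lines inside ISA Info sections are not picked up.

```python
import re

# Match the Name: line that immediately follows an "Agent N" header and
# its row of asterisks, ignoring Name: entries elsewhere (e.g. ISA Info).
AGENT_NAME_RE = re.compile(r'Agent \d+\s*\n\*+\s*\n\s*Name:\s+(\S.*?)\s*$',
                           re.MULTILINE)

def agent_names(rocminfo_text):
    """Extract the device name of every numbered HSA agent."""
    return AGENT_NAME_RE.findall(rocminfo_text)

# Trimmed sample of real rocminfo output (full dumps appear in the thread).
SAMPLE = """\
*******
Agent 1
*******
  Name:                    AMD Ryzen 5 2600X Six-Core Processor
  Uuid:                    CPU-XX
*******
Agent 2
*******
  Name:                    gfx1032
  Uuid:                    GPU-XX
"""

print(agent_names(SAMPLE))
# → ['AMD Ryzen 5 2600X Six-Core Processor', 'gfx1032']
```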
@lazevedo

Saved me a lot of time, thanks!

@hydrian

hydrian commented Apr 23, 2023

This script didn't find the rocminfo binary, even though it is installed and works for the current user:

hydrian@balor ~/tmp $ which rocminfo
/usr/bin/rocminfo
hydrian@balor ~/tmp $ rocminfo 
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 2600X Six-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 2600X Six-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16310472(0xf8e0c8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16310472(0xf8e0c8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16310472(0xf8e0c8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6600 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29695(0x73ff)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2900                               
  BDFID:                   3328                               
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

@lazevedo

What distro are you running? Just tested again on Ubuntu 22.04 and the rocminfo binary was found:

Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user ... is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of: 
--->  Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz  
--->  gfx1012           

@hydrian

hydrian commented Apr 23, 2023

Mint 21.1 (based on Ubuntu 22.04). ROCm works with GPU support in stable-diffusion.

@StatPal

StatPal commented May 29, 2023

Hi, noob here. My machine says, 'is NOT in RENDER and VIDEO groups.'
But [g.gr_name for g in grp.getgrall()] contains render and video both. Do you suggest any ideas on how I can fix that? Should I just include the user in the groups?

@mconcas

mconcas commented Jun 2, 2023

Hi, noob here. My machine says, 'is NOT in RENDER and VIDEO groups.' But [g.gr_name for g in grp.getgrall()] contains render and video both. Do you suggest any ideas on how I can fix that? Should I just include the user in the groups?

Hello, sure. User should be added to those groups.

@StatPal

StatPal commented Jun 2, 2023

Hi, noob here. My machine says, 'is NOT in RENDER and VIDEO groups.' But [g.gr_name for g in grp.getgrall()] contains render and video both. Do you suggest any ideas on how I can fix that? Should I just include the user in the groups?

Hello, sure. User should be added to those groups.

Thanks for the answer.
I tried that and added the user to those two groups. However, the line t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda') just hangs, and I had to kill the program. Do you think my GPU is not supported?

@userbox020

Hello brother, I'm new to the torch and CUDA world. I downloaded a free AI model and I'm running it with oobabooga and exllama; as far as I understand, they use torch and CUDA to load the model onto the GPU. Do you think it would be possible to modify exllama to run on AMD GPUs?

@userbox020

Also, brother, I'm getting this output from the script:

(amd) mruserbox@guru-X99:/media/10TB_HHD/_AMD$ python test.py


Checking ROCM support...
BAD: No ROCM devices found.
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
BAD: The user mruserbox is NOT in RENDER and VIDEO groups. This is necessary in order to PyTorch use HIP resources
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of: 
(amd) mruserbox@guru-X99:/media/10TB_HHD/_AMD$ rocm-smi


======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK   Fan  Perf  PwrCap  VRAM%  GPU%  
0    56.0c           13.0W   500Mhz  96Mhz  0%   auto  215.0W    0%   0%    
================================================================================
============================= End of ROCm SMI Log ==============================

@userbox020

I already found the problem, brother: I had to add my user to the video and render groups. I did the following:

sudo usermod -a -G video mruserbox
sudo usermod -a -G render mruserbox

Now I get the following output:

(amd) mruserbox@guru-X99:/media/10TB_HHD/_AMD$ python test.py

Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user mruserbox is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.

Now it's not recognizing rocminfo; going to check why.

@userbox020

(amd) mruserbox@guru-X99:/media/10TB_HHD/_AMD$ python test.py


Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user mruserbox is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
A runtime error occurred: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/media/10TB_HHD/_AMD/test.py", line 55, in <module>
    t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@userbox020

Got the error solved! I had to select the right CUDA device on my system. I changed it to device='cuda:1' and now everything passes:

(amd) mruserbox@guru-X99:/media/10TB_HHD/_AMD$ python test.py


Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user mruserbox is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of: 
--->  Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz  
--->  gfx1030     
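An alternative to hard-coding device='cuda:1' is to hide the other agents from the runtime before torch is imported. A minimal sketch, assuming the ROCm stack honors the HIP_VISIBLE_DEVICES environment variable (the index value here is illustrative):

```python
import os

# Must be set before torch initializes the GPU runtime.
# Assumption: ROCm honors HIP_VISIBLE_DEVICES the same way the CUDA
# runtime honors CUDA_VISIBLE_DEVICES.
os.environ["HIP_VISIBLE_DEVICES"] = "1"  # expose only the second GPU

# After this, the chosen GPU appears to torch as 'cuda:0',
# so device='cuda' in the test script picks it up unchanged.
print(os.environ["HIP_VISIBLE_DEVICES"])
```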

@grigio

grigio commented Aug 28, 2023

I don't know whether ROCm is supported or not... gfx1036, Ryzen 7 7700.

python testrocm.py 

Checking ROCM support...
GOOD: ROCM devices found:  1
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 7700 8-Core Processor  
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 7700 8-Core Processor  
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3800                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    15865032(0xf214c8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    15865032(0xf214c8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    15865032(0xf214c8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*** Done ***  
rocm-smi 


========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
ERROR: GPU[0]	: sclk clock is unsupported
====================================================================================
GPU[0]		: get_power_cap, Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr   SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
0    47.0c           37.153W  None  2600Mhz  0%   auto  Unsupported    3%   0%    
====================================================================================
=============================== End of ROCm SMI Log ================================

@winstonma

winstonma commented Sep 1, 2023

Thanks for the script. Actually I came here because I was getting black images from ComfyUI stable-diffusion image generation and no object detection in ultralytics. My laptop used to be able to generate images in ComfyUI, so I suspected a problem from upgrading the AMD driver to v5.6.1. I downgraded the ROCm driver to v5.5.3 and now everything works.

In both cases I passed the PyTorch test, so I guess it would be great if additional PyTorch tests could be added.
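A sketch of one such extra test (the helper name and sizes are my own, not from the gist): a result that is all zeros or contains NaN/Inf has the numeric signature of a "black image", even when tensor creation itself succeeds.

```python
import math

def looks_degenerate(values):
    """True if a result is all zeros or contains NaN/Inf --
    the numeric signature of 'black image' output."""
    if any(math.isnan(v) or math.isinf(v) for v in values):
        return True
    return all(v == 0.0 for v in values)

try:
    import torch
    x = torch.rand(64, 64, device='cuda')
    y = (x @ x).flatten().tolist()  # real work on the GPU, copied back
    print('BAD' if looks_degenerate(y) else 'GOOD')
except Exception:
    # torch or a GPU is unavailable here; demonstrate on sample values
    print('BAD' if looks_degenerate([0.0, 0.0, 0.0]) else 'GOOD')
```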

@iskyo0ps

iskyo0ps commented Oct 8, 2023

The amdgpu driver and the PyTorch install will not add your current account to the render and video groups automatically.

rocminfo needs access to /dev/kfd and /dev/dri, which are owned by the render and video groups:

crw-rw---- 1 root render 235, 0 Oct. 7 17:56 /dev/kfd
drwxr-xr-x   3 root root        120 Oct.  7 17:56 ./
drwxr-xr-x  20 root root       4260 Oct.  7 17:56 ../
drwxr-xr-x   2 root root        100 Oct.  7 17:56 by-path/
crw-rw----+  1 root video  226,   0 Oct.  7 17:56 card0
crw-rw----+  1 root video  226,   1 Oct.  7 17:56 card1
crw-rw----+  1 root render 226, 128 Oct.  7 17:56 renderD128

You should add your current user to these two groups on the Linux command line, such as:
sudo usermod -aG render,video <your_current_user_name>
Use groups <user_name>, id <user_name>, or cat /etc/group to double-check.

Then reboot your Linux machine to make sure the changes take effect.
Run this script again.

(Optional) If necessary, add sudo or root privileges:
sudo usermod -aG render,video,sudo,root <your_current_user_name>
To remove user <usr_name> from the root group:
sudo gpasswd -d <usr_name> root or sudo deluser <usr_name> root
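The permission check above can be automated. A small sketch (the helper name is mine; the renderD128 node number varies per machine, so pass your own paths if needed):

```python
import os

def inaccessible_nodes(paths=('/dev/kfd', '/dev/dri/renderD128')):
    """Return the device nodes that exist but that the current user
    cannot open read/write -- the symptom of missing render/video
    group membership (or of not having logged in again after usermod)."""
    return [p for p in paths
            if os.path.exists(p) and not os.access(p, os.R_OK | os.W_OK)]

blocked = inaccessible_nodes()
print(blocked if blocked else 'all present device nodes are accessible')
```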

@hydrian

hydrian commented Oct 8, 2023

Depending on the distro, the user may need different group memberships. In the Debian distro family, the user needs membership in both the video and render groups.

Also remember, group membership is only applied at user login. So even if you add the group to the user and it shows up in the groups output, the user may not have permission to the group's resources until that user logs in again.
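A quick way to see the mismatch between the session and the account records (the newgrp line is commented out because it replaces your shell):

```shell
# Groups active in the current session (fixed at login time):
id -nG

# Groups on record for the account (take effect at the next login):
getent group render video || true

# To pick up new membership without a full logout, start a subshell:
# newgrp render
```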

@POMXARK

POMXARK commented Dec 19, 2023

If you hit errors, here is a fix:

import torch, grp, pwd, os, subprocess
import getpass
devices = []
try:
	print("\n\nChecking ROCM support...")
	result = subprocess.run(['rocminfo'], stdout=subprocess.PIPE)
	cmd_str = result.stdout.decode('utf-8')
	cmd_split = cmd_str.split('Agent ')
	for part in cmd_split:
		item_single = part[0:1]
		item_double = part[0:2]
		if item_single.isnumeric() or item_double.isnumeric():
			new_split = cmd_str.split('Agent '+item_double)
			device = new_split[1].split('Marketing Name:')[0].replace('  Name:                    ', '').replace('\n','').replace('                  ','').split('Uuid:')[0].split('*******')[1]
			devices.append(device)
	if len(devices) > 0:
		print('GOOD: ROCM devices found: ', len(devices))
	else:
		print('BAD: No ROCM devices found.')

	print("Checking PyTorch...")
	x = torch.rand(5, 3)
	has_torch = False
	len_x = len(x)
	if len_x == 5:
		has_torch = True
		for i in x:
			if len(i) == 3:
				has_torch = True
			else:
				has_torch = False
	if has_torch:
		print('GOOD: PyTorch is working fine.')
	else:
		print('BAD: PyTorch is NOT working.')


	print("Checking user groups...")
	user = getpass.getuser()
	groups = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
	gid = pwd.getpwnam(user).pw_gid
	groups.append(grp.getgrgid(gid).gr_name)
	if 'render' in groups and 'video' in groups:
		print('GOOD: The user', user, 'is in RENDER and VIDEO groups.')
	else:
		print('BAD: The user', user, 'is NOT in RENDER and VIDEO groups. This is necessary in order to PyTorch use HIP resources')

	if torch.cuda.is_available():
		print("GOOD: PyTorch ROCM support found.")
		t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
		print('Testing PyTorch ROCM support...')
		if str(t) == "tensor([5, 5, 5], device='cuda:0')":
			print('Everything fine! You can run PyTorch code inside of: ')
			for device in devices:
				print('---> ', device)
	else:
		print("BAD: PyTorch ROCM support NOT found.")
except:
	print('Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.')

and reinstall PyTorch with the ROCm wheels:


pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

https://pytorch.org/get-started/locally/

(Linux Mint 21)

Checking ROCM support...
GOOD: ROCM devices found: 2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user roman is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of:
---> AMD Ryzen 5 5500U with Radeon Graphics
---> gfx90c

@mcr-ksh

mcr-ksh commented Feb 3, 2024

Useful to check for a HIP build:
if torch.cuda.is_available() and torch.version.hip:
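That one-liner works because on ROCm builds torch.cuda.is_available() is True while torch.version.hip carries a version string, whereas on CUDA builds torch.version.hip is None. A hedged sketch (the helper name and fallback values are mine):

```python
def gpu_backend(cuda_available, hip_version):
    """Classify a torch build: 'rocm' when the HIP version string is set,
    'cuda' when a GPU is available without HIP, 'none' otherwise."""
    if not cuda_available:
        return 'none'
    return 'rocm' if hip_version else 'cuda'

try:
    import torch
    print(gpu_backend(torch.cuda.is_available(),
                      getattr(torch.version, 'hip', None)))
except ImportError:
    # torch not installed here; demonstrate the logic with sample values
    print(gpu_backend(True, '5.6.31061'))
```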
