Ludovic Räss (luraess)

luraess / amd_bench_inbounds.jl
Created February 16, 2023 17:28
2D Laplacian to test bounds-check performance on AMDGPU
using BenchmarkTools, AMDGPU
function diff2D_step_inbounds!(T2, T, Ci, lam, dt, _dx, _dy)
    # global (1-based) grid indices of this work-item
    ix = (workgroupIdx().x - 1) * workgroupDim().x + workitemIdx().x
    iy = (workgroupIdx().y - 1) * workgroupDim().y + workitemIdx().y
    # update interior points only; @inbounds skips the bounds checks under test
    if (ix>1 && ix<size(T2,1) && iy>1 && iy<size(T2,2))
        @inbounds T2[ix,iy] = T[ix,iy] + dt*(Ci[ix,iy]*(
            - ((-lam*(T[ix+1,iy] - T[ix,iy])*_dx) - (-lam*(T[ix,iy] - T[ix-1,iy])*_dx))*_dx
            - ((-lam*(T[ix,iy+1] - T[ix,iy])*_dy) - (-lam*(T[ix,iy] - T[ix,iy-1])*_dy))*_dy ))
    end
    return nothing
end
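The preview shows only the kernel; a minimal launch-and-benchmark sketch follows. The grid size, group size, and physical parameters are assumptions, and note that the meaning of gridsize has changed across AMDGPU.jl versions (here it is taken as the number of workgroups):

nx, ny = 1024, 1024
T  = AMDGPU.rand(Float64, nx, ny)
T2 = copy(T)
Ci = AMDGPU.ones(Float64, nx, ny)
lam, dt, _dx, _dy = 1.0, 1e-4, 1.0, 1.0          # placeholder physics
groupsize = (32, 8)
gridsize  = cld.((nx, ny), groupsize)            # workgroups per dimension (assumed semantics)
@btime AMDGPU.@sync @roc groupsize=$groupsize gridsize=$gridsize diff2D_step_inbounds!($T2, $T, $Ci, $lam, $dt, $_dx, $_dy)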
luraess / alltoall_test_cuda.jl
Last active July 19, 2023 08:24
CUDA-aware MPI test
using MPI
using CUDA
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)   # note: shadows Base.size in this script
dst  = mod(rank+1, size)     # ring topology: send to the next rank...
src  = mod(rank-1, size)     # ...and receive from the previous one
println("rank=$rank, size=$size, dst=$dst, src=$src")
N = 4                        # message length
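The preview stops at the message length; a hedged sketch of how such a test typically continues (buffer names and the use of MPI.Sendrecv! are assumptions — the point is that CuArrays are handed directly to MPI):

send_mesg = CUDA.fill(Float64(rank), N)   # device buffer, no host staging
recv_mesg = CUDA.zeros(Float64, N)
MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, source=src)
println("rank=$rank received $(Array(recv_mesg))")
MPI.Finalize()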
luraess / alltoall_test_cuda_multigpu.jl
Last active July 19, 2023 08:24
CUDA-aware MPI multi-GPU test
using MPI
using CUDA
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
# select device: split the communicator by shared-memory node to get a local rank,
# then bind one GPU per rank (CUDA device ids are 0-based)
comm_l = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_l = MPI.Comm_rank(comm_l)
gpu_id = CUDA.device!(rank_l)
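The preview ends after device selection; from here the script presumably proceeds as in the single-GPU test above. A sketch (all names assumed):

size = MPI.Comm_size(comm)
dst  = mod(rank+1, size)
src  = mod(rank-1, size)
N = 4
send_mesg = CUDA.fill(Float64(rank), N)
recv_mesg = CUDA.zeros(Float64, N)
MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, source=src)
MPI.Finalize()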
luraess / alltoall_test_rocm_multigpu.jl
Last active July 19, 2023 08:18
ROCm-aware (AMDGPU) MPI multi-GPU test
using MPI
using AMDGPU
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
# select device: split the communicator by shared-memory node to get a local rank,
# then bind one GPU per rank (AMDGPU device ids are 1-based)
comm_l = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_l = MPI.Comm_rank(comm_l)
device = AMDGPU.device_id!(rank_l+1)
gpu_id = AMDGPU.device_id(AMDGPU.device())
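As above, a hedged sketch of the exchange that likely follows, with ROCArray buffers passed directly to MPI (names assumed):

size = MPI.Comm_size(comm)
dst  = mod(rank+1, size)
src  = mod(rank-1, size)
N = 4
send_mesg = ROCArray(fill(Float64(rank), N))   # device buffer, no host staging
recv_mesg = AMDGPU.zeros(Float64, N)
MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, source=src)
MPI.Finalize()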
luraess / alltoall_test_rocm.jl
Last active July 19, 2023 08:20
ROCm-aware (AMDGPU) MPI test
using MPI
using AMDGPU
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)   # note: shadows Base.size in this script
dst  = mod(rank+1, size)     # ring neighbors, as in the CUDA test above
src  = mod(rank-1, size)
println("rank=$rank, size=$size, dst=$dst, src=$src")
N = 4                        # message length
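The ROCm test presumably finishes like the CUDA one, only with AMDGPU device buffers (a sketch; names assumed):

send_mesg = ROCArray(fill(Float64(rank), N))
recv_mesg = AMDGPU.zeros(Float64, N)
MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, source=src)
println("rank=$rank received $(Array(recv_mesg))")
MPI.Finalize()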

Running a multi-GPU Julia app on octopus

Startup steps for a Julia CUDA MPI application relying on ParallelStencil.jl and ImplicitGlobalGrid.jl.

GPU cluster config:

  • CUDA 11.0
  • CUDA-aware OpenMPI 3.0.6
  • gcc 8.3

The following steps should enable a successful multi-GPU run:
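The listing is cut off before the steps themselves; a typical sequence on such a setup might look as follows (module names and rank count are assumptions, not from the original):

$ module load cuda openmpi        # assumed module names; match the versions listed above

$ mpirun -np 4 julia --project <app-name>.jl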

luraess / Julia_GPU_octopus.md
Last active May 14, 2021 20:49
Steps to set up and run a Julia GPU code on octopus using the ParallelStencil.jl package.

Running a Julia code on a single GPU on octopus

In the shell, prepare the Julia environment on an octopus node:

$ ssh <username>@octopus.unil.ch

$ ssh -YC nodeXX

$ cd /scratch/<username>/<path-to-folder>
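The preview ends at the working directory; a typical continuation (a sketch, assuming the folder ships a Project.toml) is to instantiate the environment and start Julia:

$ julia --project -e 'using Pkg; Pkg.instantiate()'

$ julia --project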