@dappelha
dappelha / gist:2101e53893c0abaf6273015870a1ec81
Created December 13, 2023 18:35
3D nested loops on GPUs
for (int l = 0; l < ld; l++)
{
    for (int k = 0; k < kd; k++)
    {
        for (int j = 0; j < jd; j++)
        {
            if (l > 0 && l < (ld - 1) && k > 0 && k < (kd - 1) && j > 0 && j < (jd - 1))
            {
                jp = j + 1;
                jm = j - 1;
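The preview above cuts off before the loop body, so what follows is only a minimal C sketch of the same pattern, not the gist's actual computation. It visits the same interior points by tightening the loop bounds instead of testing inside the full-range loops. The helper `idx3`, the row-major layout (j fastest), and the neighbor average along j are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: flatten (l, k, j) into a row-major offset, j fastest.
   The memory layout is an assumption; the gist preview does not show it. */
static size_t idx3(int l, int k, int j, int kd, int jd)
{
    return ((size_t)l * (size_t)kd + (size_t)k) * (size_t)jd + (size_t)j;
}

/* Hypothetical body: average the two j-neighbors at every interior point.
   Loop bounds 1 .. dim-2 select the same points as the guard
   l > 0 && l < ld-1 && k > 0 && k < kd-1 && j > 0 && j < jd-1. */
void smooth_j_interior(const double *in, double *out, int ld, int kd, int jd)
{
    for (int l = 1; l < ld - 1; l++) {
        for (int k = 1; k < kd - 1; k++) {
            for (int j = 1; j < jd - 1; j++) {
                int jp = j + 1;   /* neighbor in +j */
                int jm = j - 1;   /* neighbor in -j */
                out[idx3(l, k, j, kd, jd)] =
                    0.5 * (in[idx3(l, k, jm, kd, jd)] + in[idx3(l, k, jp, kd, jd)]);
            }
        }
    }
}
```

Boundary points of `out` are left untouched. Keeping the full-range loops with an `if` guard, as the gist does, may be preferable when the loop nest is collapsed and mapped onto GPU threads, since every thread then runs the same rectangular iteration space.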
dappelha / loop_unrolling.F90
Created October 25, 2020 03:02
How to trick the compiler into unrolling loops for you, without unrolling them by hand.
! Modern compilers with -O3 usually unroll loops when the start and stop bounds of the loop are known
! at compile time. Here is an example where I use a new secondary loop with fixed bounds to unroll by
! the amount specified in the parameter nunroll. This allows the routine to be general with only a change
! to nunroll (and a recompile) to unroll by a different amount.
program loop_unrolling
  implicit none
  integer :: i, ii, iend, istart
  integer, parameter :: nunroll=2
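The Fortran preview stops before the loops themselves. As a rough illustration of the idea described in the comments, here is a C analog (not the author's code): an outer loop strides by a compile-time constant, and an inner loop with fixed bounds covers each stride. The names `NUNROLL` and `sum_unrolled` are hypothetical, and whether the inner loop is actually unrolled depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical unroll factor; it must be a compile-time constant.
   Change it and recompile to unroll by a different amount. */
#define NUNROLL 2

/* Sum n doubles. The inner loop has fixed, compile-time-known bounds,
   so an optimizing compiler can (but is not required to) unroll it fully. */
double sum_unrolled(const double *a, size_t n)
{
    double total = 0.0;
    size_t main_end = n - (n % NUNROLL);  /* largest multiple of NUNROLL <= n */

    for (size_t i = 0; i < main_end; i += NUNROLL) {
        for (size_t ii = 0; ii < NUNROLL; ii++) {  /* fixed trip count */
            total += a[i + ii];
        }
    }

    /* Remainder loop for the elements that do not fill a whole stride. */
    for (size_t i = main_end; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Inspecting the generated assembly (for example with `-S` at `-O3`) is the only reliable way to confirm that the inner loop was unrolled.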
#!/bin/bash
world_rank=$PMIX_RANK
let local_size=$RANKS_PER_SOCKET
export CUDA_CACHE_PATH=/dev/shm/$USER/nvcache_$PMIX_RANK
executable=$1
shift
if [ "$world_rank" = "$PROFILE_RANK" ]; then
    nvprof -f -o "$PROFILE_PATH" "$executable" "$@"
else
    "$executable" "$@"
fi
dappelha / nvtx_mod.F90
Created June 5, 2017 21:03
Fortran module that provides an interface to the NVIDIA Tools Extension (NVTX) library. This version works with XLF, which requires valid arguments to c_loc.
module nvtx_mod
use iso_c_binding
implicit none
integer,private :: col(7) = [ Z'0000ff00', Z'000000ff', Z'00ffff00', &
Z'00ff00ff', Z'0000ffff', Z'00ff0000', Z'00ffffff']
!character(len=256), private :: tempName
character, private, target :: tempName(256)