Jeff Larkin jefflarkin

## jefflarkin-author-bio.md

      
Last active: April 16, 2024 12:53

### Biography

Jeff Larkin is a Director in NVIDIA's HPC Software team, where he leads a team responsible for HPC programming models and standards as well as technical marketing engineering. He is passionate about the advancement and adoption of parallel programming models for High Performance Computing. He was previously a member of NVIDIA's Developer Technology group, specializing in performance analysis and optimization of high performance computing applications. Jeff is also the chair of the OpenACC technical committee and has worked in both the OpenACC and OpenMP standards bodies. Before joining NVIDIA, Jeff worked in the Cray Supercomputing Center of Excellence, located at Oak Ridge National Laboratory. Jeff holds a B.S. in Computer Science from Furman University and an M.S. in Computer Science from the University of Tennessee, where he was a member of the Innovative Computing Lab.

### Social Links


[LinkedIn](https://www.li


## printer.cfg
#
# Klipper configuration file for Anycubic i3 MEGA S
#
# This config file contains settings of all printer pins (steppers, sensors) for Anycubic i3 mega S with TMC2208 Drivers with stock plug orientation
# Klipper firmware should be compiled for the atmega2560
#
# Config file includes
#  - Original or 2208(2209) rotated by cable drivers
#  - Mesh bed leveling: BLtouch (3DTouch sensor from Triangelab)
#  - Manual meshed bed leveling (commented out)

## cudaCheckError.c
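// Macro for checking CUDA errors following a kernel launch or API call.
// Prints the failing file/line to stderr and exits with a nonzero status.
#include <stdio.h>
#include <stdlib.h>
#define cudaCheckError() {                                          \
 cudaError_t e=cudaGetLastError();                                 \
 if(e!=cudaSuccess) {                                              \
   fprintf(stderr,"Cuda failure %s:%d: '%s'\n",__FILE__,__LINE__,cudaGetErrorString(e)); \
   exit(EXIT_FAILURE); /* was exit(0), which reported success */   \
 }                                                                 \
}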

## gist:658822
:) cat /proc/self/status
Name:   cat
State:  R (running)
SleepAVG:       89%
Tgid:   13668
Pid:    13668
PPid:   24697
TracerPid:      0
Uid:    <removed>
Gid:    <removed>

## 00-intro.md

      
OpenACC Unified memory & async clarifications

Last active: August 10, 2023 15:34, 6 files, 3 comments

### Background

OpenACC defines data according to whether it resides in discrete or shared memory. For discrete memory, specific data operations are specified and implicit data clauses are defined. For shared memory, data clauses may be ignored if they exist, although an implementation may choose to treat them as optimization hints. I have historically thought of these in terms of CUDA Unified/Managed Memory with preferred-location and prefetching hints. A few cases that were brought to my attention are potentially interesting examples of how this thinking may not be sufficient.

### Modifying an allocation during an asynchronous region

I have been made aware of an application that extensively uses the pattern below. A temporary array is allocated locally (in the example below it is an automatic array), and dynamic data lifetimes are used to expose it to the device asynchronously. It is possible that the function would return, deallocating the automatic array, before all operations on that array have com
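
The author's own example is cut off in this preview. As a stand-in, the following is a minimal C sketch of the kind of pattern described, not the application's actual code. The function name `sum_of_squares`, the size `N`, and the use of async queue `1` are all assumptions made for illustration. A local array is exposed to the device with unstructured `enter data`/`exit data` directives on an async queue. The final `wait` is one conservative way to make sure queued work has finished before the array's storage disappears. Without it, the function could return while operations on `tmp` are still pending.

```c
#include <stdio.h>

#define N 1024  /* hypothetical maximum size of the temporary array */

/* Hypothetical routine: squares the input into a local temporary on the
 * device, then reduces it. Compiles as plain C when OpenACC is disabled,
 * because the pragmas are then ignored and the loops run on the host. */
double sum_of_squares(const double *in, int n)
{
    double tmp[N];       /* automatic array: storage is released on return */
    double total = 0.0;

    if (n < 0 || n > N)
        return -1.0;     /* illustrative error value for an invalid size */

    /* Dynamic data lifetime: create the device copy asynchronously. */
    #pragma acc enter data create(tmp[0:n]) async(1)

    #pragma acc parallel loop present(tmp[0:n]) copyin(in[0:n]) async(1)
    for (int i = 0; i < n; ++i)
        tmp[i] = in[i] * in[i];

    #pragma acc parallel loop present(tmp[0:n]) reduction(+:total) async(1)
    for (int i = 0; i < n; ++i)
        total += tmp[i];

    /* End the device lifetime, still on the same async queue. */
    #pragma acc exit data delete(tmp[0:n]) async(1)

    /* Block until everything queued on queue 1 has completed, so that
     * tmp is not deallocated (and total is not read) while device work
     * that references them is still in flight. */
    #pragma acc wait(1)

    return total;
}
```

Calling `sum_of_squares(in, N)` with every element of `in` set to `1.0` returns `1024.0`. Removing the `wait` would reproduce the hazard: the host could leave the function while the kernels and the `exit data` operation are still queued.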

  
## nvtx.F90
! Fortran bindings for a small subset of the NVIDIA Tools Extensions library
module nvtx
  use iso_c_binding
  public :: nvtxrangepusha, nvtxrangepop
  public :: nvtxrangepushaargb
  interface
    ! Annotate the timeline with a message
    ! Parameters:
    ! * string : the message in a string format
    subroutine nvtxrangepusha(string) bind(C, name="nvtxRangePushA")

## nvtx.w
#include <pthread.h>
#include <nvToolsExt.h>
#include <nvToolsExtCudaRt.h>
// Setup event category name
{{fn name MPI_Init}}
  nvtxNameCategoryA(999, "MPI");
  {{callfn}}
  int rank;
  PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
  char name[256];

## nsys_kernel_stats.ipynb

      
NVIDIA Nsight Systems Recipes

Created: January 29, 2020 20:05, 2 files

## 01-mm1_acc.F90
program mm
  use omp_lib
  integer(8), parameter :: N = 4096
  integer(8) :: i,j,k,tmp6,tmp2
  real(8), dimension(N,N) :: A, B, C
  real(8) :: tmp, chk, t0, t1, t2, t3

  t0 = omp_get_wtime()
  !$acc data create(A,B,C)
  !$acc kernels