Lucas C Wilcox lcw

View try.jl
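using GPUifyLoops, Cthulhu, CuArrays, CUDAnative

# Copy B into A. @loop takes the CPU range and, after the `;`,
# the index expression used when the kernel runs on the GPU.
function kernel!(A, B)
    @inbounds @loop for i in (1:size(A,1);
                              (blockIdx().x-1)*blockDim().x + threadIdx().x)
        A[i] = B[i]
    end
    nothing
end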
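In kernel! above, @loop combines two index sources: the CPU range 1:size(A,1) and, on the GPU, the 1-based global thread index (blockIdx().x-1)*blockDim().x + threadIdx().x. The C sketch below emulates that GPU launch on the CPU to show how the index is formed; it is an illustration, not GPUifyLoops code. The function name copy_kernel_emulated and its blockDim parameter are hypothetical, and the explicit i <= n guard is an assumption of this sketch for grids that overshoot the array length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU emulation of launching kernel! on a 1-D grid.
   CUDAnative indices are 1-based, so the global index is
   i = (blockIdx - 1) * blockDim + threadIdx with blockIdx, threadIdx >= 1.
   The grid can overshoot n, so this sketch guards with i <= n. */
static void copy_kernel_emulated(float *A, const float *B, size_t n,
                                 size_t blockDim) {
    assert(blockDim > 0);
    size_t gridDim = (n + blockDim - 1) / blockDim; /* ceil(n / blockDim) */
    for (size_t blockIdx = 1; blockIdx <= gridDim; ++blockIdx) {
        for (size_t threadIdx = 1; threadIdx <= blockDim; ++threadIdx) {
            size_t i = (blockIdx - 1) * blockDim + threadIdx; /* 1-based */
            if (i <= n) {
                A[i - 1] = B[i - 1]; /* shift to 0-based C indexing */
            }
        }
    }
}
```

With n = 10 and blockDim = 4, for instance, the grid has 3 blocks of 4 threads; the threads with i = 11 and 12 fall outside the array and do nothing.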
View big-singlegpu-metrics
==26324== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl
==26324== Profiling result:
==26324== Metric result:
Invocations  Metric Name                        Metric Description                        Min         Max         Avg
Device "Tesla V100-SXM2-16GB (0)"
    Kernel: ptxcall_knl_dof_iteration__6
         55  inst_per_warp                      Instructions per warp                     5.4765e+03  5.6132e+03  5.4946e+03
         55  branch_efficiency                  Branch Efficiency                         99.36%      99.41%      99.40%
         55  warp_execution_efficiency          Warp Execution Efficiency                 81.70%      83.28%      83.09%
         55  warp_nonpred_execution_efficiency  Warp Non-Predicated Execution Efficiency  78.49%      80.00%
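The nvprof metrics in these tables are ratios of hardware counters. As nvprof describes them, branch_efficiency is the share of branches that are non-divergent, warp_execution_efficiency is the average number of active threads per warp divided by the warp width (32), warp_nonpred_execution_efficiency is the same ratio restricted to non-predicated instructions, and inst_per_warp is the average number of instructions executed per warp. An average warp execution efficiency of 83.09% therefore corresponds to about 26.6 of 32 lanes active. The C helpers below restate those definitions; the function and parameter names are hypothetical, and the exact counters nvprof combines internally are assumed rather than taken from this profile.

```c
#include <assert.h>

#define WARP_WIDTH 32.0 /* threads per warp on NVIDIA GPUs, including V100 */

/* Percentage of executed branches that did not diverge within a warp. */
static double branch_efficiency(double branches, double divergent_branches) {
    assert(branches > 0.0);
    return 100.0 * (branches - divergent_branches) / branches;
}

/* Average active threads per executed warp instruction, as a percentage of
   the warp width. thread_insts counts one per active thread per instruction;
   warp_insts counts one per warp-level instruction. Feeding only
   non-predicated counts gives the "nonpred" variant of the metric. */
static double warp_execution_efficiency(double thread_insts, double warp_insts) {
    assert(warp_insts > 0.0);
    return 100.0 * (thread_insts / warp_insts) / WARP_WIDTH;
}

/* Average number of instructions executed by each warp. */
static double inst_per_warp(double warp_insts, double warps_launched) {
    assert(warps_launched > 0.0);
    return warp_insts / warps_launched;
}
```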
View profile.out
❯ nvprof --print-gpu-trace julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d-profiling.jl
==43297== NVPROF is profiling process 43297, command: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d-profiling.jl
[ Info: ----------------------------------------------------
[ Info: ______ _ _____ __ ________
[ Info: | ____| | |_ _| ... | __ |
[ Info: | | | | | | | . | | | |
[ Info: | | | | | | | | | | |__| |
[ Info: | |____| |____ _| |_| | | | | | |
[ Info: | _____|______|_____|_| |_|_| |_|
[ Info:
View metrics.out
==50304== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d-profiling.jl
==50304== Profiling result:
==50304== Metric result:
Invocations  Metric Name                        Metric Description                        Min         Max         Avg
Device "Tesla V100-SXM2-16GB (0)"
    Kernel: ptxcall_update__10
         55  inst_per_warp                      Instructions per warp                     1.3458e+03  1.3458e+03  1.3458e+03
         55  branch_efficiency                  Branch Efficiency                         100.00%     100.00%     100.00%
         55  warp_execution_efficiency          Warp Execution Efficiency                 100.00%     100.00%     100.00%
         55  warp_nonpred_execution_efficiency  Warp Non-Predicated Execution Efficiency  95.54%      95.54%      95.54%
lcw / README.md
Last active Apr 30, 2019
clang vs Julia compiler output
View README.md

This contains compiler output for an example kernel in [Heptapus][0].

The files beginning with volumerhs. are from clang 6.0.1, and the files beginning with volumerhs!. are from CUDAnative and Julia, with versions:

❯ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
View knl.ispc
#define int_floor_div_pos_b(a,b) ( ( (a) - ( ((a)<0) ? ((b)-1) : 0 ) ) / (b) )
task void init_knl_inner(uniform int32 const N, uniform float *uniform a)
{
  if (-1 + -8 * ((uniform int32) taskIndex0) + -1 * (varying int32) programIndex + N >= 0)
    for (uniform int32 i = 0; i <= -1 + N; ++i)
      a[8 * i + (((varying int32) programIndex + ((uniform int32) taskIndex0) * 8) % 8) + 8 * N * (((varying int32) programIndex + ((uniform int32) taskIndex0) * 8) / 8)] = 17.0;
}
export void init_knl(uniform int32 const N, uniform float *uniform a)
{
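In init_knl_inner above, each program instance has a global index g = programIndex + 8*taskIndex0, the if-condition reduces to g <= N - 1, and the store writes element 8*i + g%8 + 8*N*(g/8). The array is therefore laid out in tiles of 8 columns, each tile holding 8*N contiguous floats. The C sketch below is a serial restatement of that loop nest, assuming a gang width of 8 (as the factor 8 in the index suggests); init_knl_serial and init_knl_buffer_len are hypothetical names. It also reuses the int_floor_div_pos_b macro from the file, which rounds toward negative infinity for positive b, unlike C's truncating division.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from knl.ispc: floor(a / b) for b > 0, also correct when a < 0. */
#define int_floor_div_pos_b(a,b) ( ( (a) - ( ((a)<0) ? ((b)-1) : 0 ) ) / (b) )

/* Floats spanned by the tiled layout: ceil(N/8) tiles of 8*N floats each. */
static size_t init_knl_buffer_len(int N) {
    return (size_t)8 * (size_t)N * (size_t)((N + 7) / 8);
}

/* Serial version of init_knl_inner over all tasks and program instances,
   assuming 8 program instances per task. */
static void init_knl_serial(int N, float *a) {
    assert(N >= 0);
    for (int g = 0; g <= N - 1; ++g) {      /* g = programIndex + 8*taskIndex0 */
        for (int i = 0; i <= N - 1; ++i) {
            /* row i, column g%8 of tile g/8 */
            a[8 * i + (g % 8) + 8 * N * (g / 8)] = 17.0f;
        }
    }
}
```

With N = 10, for example, the buffer spans 160 floats but only N*N = 100 are written: the second tile has just two occupied columns (g = 8 and 9).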
View try.cpp
/*******************************************************************
This file has been automatically generated by ispc
DO NOT EDIT THIS FILE DIRECTLY
*******************************************************************/
/* Provide Declarations */
#include <stdarg.h>
#include <setjmp.h>
#include <limits.h>
#include <stdlib.h>
View gist:d941766a23efed5f8621
" ******** Syntax highlighting
if has("syntax")
syntax clear
source $VIMRUNTIME/syntax/fortran.vim
function! TextEnableCodeSnip(filetype,start,end,textSnipHl) abort
let ft=toupper(a:filetype)
let group='textGroup'.ft
if exists('b:current_syntax')
let s:current_syntax=b:current_syntax
View fortran.vim
" ******** Syntax highlighting
if has("syntax")
syntax clear
source $VIMRUNTIME/syntax/fortran.vim
function! TextEnableCodeSnip(filetype,start,end,textSnipHl) abort
let ft=toupper(a:filetype)
let group='textGroup'.ft
if exists('b:current_syntax')
let s:current_syntax=b:current_syntax
View gist:160ab830e61b83ddbd20
--[ MESH INFORMATION ]----------------------------
NODES : 1587
ELEMENTS : 968
MESH INSIDE :
[ -5.000000 , 5.000000 ] x [ -5.000000 , 5.000000 ] x [ -0.500000 , 0.500000 ]
==================================================
OCCA mode: OpenMP
OCCA is using Compiler: g++
with flags : -D__extern_always_inline=inline -O3