❯ nvprof --print-gpu-trace julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d-profiling.jl
==43297== NVPROF is profiling process 43297, command: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d-profiling.jl
[ Info: ----------------------------------------------------
[ Info: ______ _ _____ __ ________
[ Info: | ____| | |_ _| ... | __ |
[ Info: | | | | | | | . | | | |
[ Info: | | | | | | | | | | |__| |
[ Info: | |____| |____ _| |_| | | | | | |
[ Info: | _____|______|_____|_| |_|_| |_|
[ Info:
==50304== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d-profiling.jl
==50304== Profiling result:
==50304== Metric result:
Invocations  Metric Name                        Metric Description                        Min         Max         Avg
Device "Tesla V100-SXM2-16GB (0)"
    Kernel: ptxcall_update__10
55           inst_per_warp                      Instructions per warp                     1.3458e+03  1.3458e+03  1.3458e+03
55           branch_efficiency                  Branch Efficiency                         100.00%     100.00%     100.00%
55           warp_execution_efficiency          Warp Execution Efficiency                 100.00%     100.00%     100.00%
55           warp_nonpred_execution_efficiency  Warp Non-Predicated Execution Efficiency  95.54%      95.54%      95.54%
lcw / README.md
Last active April 30, 2019 21:15
clang vs Julia compiler output

This contains compiler output for an example kernel in [Heptapus][0].

The files beginning with `volumerhs.` are from clang 6.0.1, and the files beginning with `volumerhs!.` are from CUDAnative and Julia, with versions

❯ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
#define int_floor_div_pos_b(a,b) ( ( (a) - ( ((a)<0) ? ((b)-1) : 0 ) ) / (b) )
task void init_knl_inner(uniform int32 const N, uniform float *uniform a)
{
if (-1 + -8 * ((uniform int32) taskIndex0) + -1 * (varying int32) programIndex + N >= 0)
for (uniform int32 i = 0; i <= -1 + N; ++i)
a[8 * i + (((varying int32) programIndex + ((uniform int32) taskIndex0) * 8) % 8) + 8 * N * (((varying int32) programIndex + ((uniform int32) taskIndex0) * 8) / 8)] = 17.0;
}
export void init_knl(uniform int32 const N, uniform float *uniform a)
{
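The `int_floor_div_pos_b` macro at the top of this listing appears to implement floor division for a positive divisor, whereas C's `/` truncates toward zero. The sketch below is only an illustration of that reading: the macro is copied from the listing, while `floor_div_ref` and `macro_matches_reference` are hypothetical helpers introduced here for comparison and are not part of the original gist.

```c
#include <assert.h>

/* Macro copied verbatim from the listing above. */
#define int_floor_div_pos_b(a,b) ( ( (a) - ( ((a)<0) ? ((b)-1) : 0 ) ) / (b) )

/* Hypothetical reference: floor(a / b) built from truncating division.
   Assumes b > 0, matching the "pos_b" in the macro name. */
static int floor_div_ref(int a, int b)
{
    int q = a / b;              /* truncates toward zero */
    if (a % b != 0 && a < 0)
        q -= 1;                 /* step down to the floor for negative a */
    return q;
}

/* Hypothetical check: returns 1 if the macro agrees with the reference
   for every a in [-range, range], and 0 otherwise. */
static int macro_matches_reference(int b, int range)
{
    for (int a = -range; a <= range; ++a)
        if (int_floor_div_pos_b(a, b) != floor_div_ref(a, b))
            return 0;
    return 1;
}
```

For instance, `-9 / 8` evaluates to `-1` in C, while `int_floor_div_pos_b(-9, 8)` evaluates to `(-9 - 7) / 8`, which is `-2`. For non-negative `a` the two agree.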
lcw / try.cpp
Created December 11, 2015 22:48
/*******************************************************************
This file has been automatically generated by ispc
DO NOT EDIT THIS FILE DIRECTLY
*******************************************************************/
/* Provide Declarations */
#include <stdarg.h>
#include <setjmp.h>
#include <limits.h>
#include <stdlib.h>
lcw / fortran.vim
Last active August 29, 2015 14:14
" ******** Syntax highlighting
if has("syntax")
syntax clear
source $VIMRUNTIME/syntax/fortran.vim
function! TextEnableCodeSnip(filetype,start,end,textSnipHl) abort
let ft=toupper(a:filetype)
let group='textGroup'.ft
if exists('b:current_syntax')
let s:current_syntax=b:current_syntax
--[ MESH INFORMATION ]----------------------------
NODES : 1587
ELEMENTS : 968
MESH INSIDE :
[ -5.000000 , 5.000000 ] x [ -5.000000 , 5.000000 ] x [ -0.500000 , 0.500000 ]
==================================================
OCCA mode: OpenMP
OCCA is using Compiler: g++
with flags : -D__extern_always_inline=inline -O3
Sender: LSF System <lsfadmin@compute-12-3>
Subject: Job 70892: <try> Done
Job <try> was submitted from host <login-2-6> by user <lucasw> in cluster <lsfhpc>.
Job was executed on host(s) <2*compute-12-3>, in queue <normal>, as user <lucasw> in cluster <lsfhpc>.
</home/lucasw> was used as the home directory.
</home/lucasw/hello> was used as the working directory.
Started at Mon May 3 10:24:17 2010
Results reported at Mon May 3 10:24:27 2010
Warning: Permanently added 'compute-12-3' (RSA) to the list of known hosts.