//
// Generated by LLVM NVPTX Back-End
//

.version 6.0
.target sm_61
.address_size 64

.extern .func (.param .b32 func_retval0) vprintf
(
diff --git a/Project.toml b/Project.toml
index 104d043..d15c4ad 100644
--- a/Project.toml
+++ b/Project.toml
@@ -5,6 +5,7 @@ version = "0.2.8"
 [deps]
 Cassette = "7057c7e9-c182-5462-911a-8362d720325c"
+Cthulhu = "f68482b8-f384-11e8-15f7-abe071a5a75f"
 Requires = "ae029012-a4dd-5104-9daa-d747884805df"
==64089== Profiling application: julia --project=../../../env/gpu isentropicvortex.jl
==64089== Profiling result:
   Start  Duration     Grid Size  Block Size  Regs*    SSMem*  DSMem*  Device           Context  Stream  Name
      ms        ms                                         KB       B
1.15e+05  1.838327  (125000 1 1)   (125 1 1)     40  0.000000       0  Tesla V100-SXM2        1       7  ptxcall_knl_nodal_update_aux__6 [136]
1.22e+05  5.682147  (125000 1 1)     (5 5 5)    150  14.89844       0  Tesla V100-SXM2        1       7  ptxcall_volumerhs__7 [147]
1.25e+05  9.771470  (125000 1 1)    (25 1 1)    106  0.000000       0  Tesla V100-SXM2        1       7  ptxcall_facerhs__8 [158]
1.25e+05  3.028208  (305176 1 1)   (256 1 1)     16  0.000000       0  Tesla V100-SXM2        1       7  ptxcall_update__9 [169]
1.25e+05  1.840918  (125000 1 1)   (125 1 1)     40  0.000000       0
─────────────────────────────────────────────────────────────────────────────
                                     Time                 Allocations
                             ──────────────────────   ───────────────────────
 Tot / % measured:               573s / 100%             242GiB / 100%
 Section             ncalls     time   %tot     avg     alloc   %tot      avg
─────────────────────────────────────────────────────────────────────────────
 dostep!                100     572s   100%   5.72s    242GiB   100%  2.42GiB
 facerhs!               500     241s  42.2%   483ms    124GiB  51.1%   253MiB
using KernelAbstractions
using GPUifyLoops
using CUDAnative, CuArrays, CUDAdrv

@kernel function transpose_kernel_naive!(b, a)
    I = @index(Global, Cartesian)
    i, j = Tuple(I)
    @inbounds b[i, j] = a[j, i]
end
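The naive kernel reads `a[j, i]` and writes `b[i, j]` for every Cartesian index of `b`. As a minimal sketch for validating its output, a plain-Julia CPU loop with the same indexing can serve as a reference. `transpose_reference!` is a hypothetical helper introduced here, not part of KernelAbstractions or the original gist, and the matrix sizes are arbitrary.

```julia
# Hypothetical CPU reference for transpose_kernel_naive!: each output element
# b[i, j] is read from a[j, i], mirroring the kernel body. Plain Base Julia, no GPU.
function transpose_reference!(b::AbstractMatrix, a::AbstractMatrix)
    # The output must have the transposed shape of the input.
    @assert size(b) == reverse(size(a))
    for I in CartesianIndices(b)
        i, j = Tuple(I)
        @inbounds b[i, j] = a[j, i]
    end
    return b
end

a = rand(Float32, 3, 5)
b = similar(a, 5, 3)          # uninitialized 5x3 output with the same element type
transpose_reference!(b, a)
@assert b == permutedims(a)   # agrees with Base's out-of-place transpose
```

A GPU result copied back to the host could be compared against this reference in the same way.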
==4461== Profiling application: julia kernel_transpose.jl
==4461== Profiling result:
==4461== Metric result:
Invocations  Metric Name           Metric Description                    Min     Max     Avg
Device "GeForce MX150 (0)"
    Kernel: ptxcall___gpu_transpose_kernel_naive__426_1
         10  l2_tex_read_hit_rate  L2 Hit Rate (Texture Reads)         0.08%   0.10%   0.09%
         10  global_hit_rate       Global Hit Rate in unified l1/tex   0.00%   0.00%   0.00%
    Kernel: ptxcall_transpose_cuda__5
         10  l2_tex_read_hit_rate  L2 Hit Rate (Texture Reads)        87.50%  87.50%  87.50%
Invocations  Metric Name           Metric Description                    Min     Max     Avg
Device "Tesla V100-SXM2-16GB (0)"
    Kernel: ptxcall___gpu_transpose_kernel_naive__426_1
         10  l2_tex_read_hit_rate  L2 Hit Rate (Texture Reads)        90.44%  90.50%  90.48%
         10  global_hit_rate       Global Hit Rate in unified l1/tex   0.54%   0.55%   0.54%
    Kernel: ptxcall_transpose_cuda__5
         10  l2_tex_read_hit_rate  L2 Hit Rate (Texture Reads)        50.11%  50.12%  50.12%
         10  global_hit_rate       Global Hit Rate in unified l1/tex  77.75%  77.75%  77.75%
    Kernel: ptxcall___gpu_transpose_kernel_naive_ldg__429_2
         10  l2_tex_read_hit_rate  L2 Hit Rate (Texture Reads)        90.53%  90.56%  90.54%
diff --git a/src/mapreduce.jl b/src/mapreduce.jl
index 14bcfe1..ad2da8c 100644
--- a/src/mapreduce.jl
+++ b/src/mapreduce.jl
@@ -132,14 +132,15 @@ function partial_mapreduce_grid(f, op, neutral, Rreduce, Rother, shuffle, R, As.
 end
 ## COV_EXCL_STOP
-
-NVTX.@range function GPUArrays.mapreducedim!(f, op, R::CuArray{T}, As::AbstractArray...; init=nothing) where T
+---------+-------------------------+-------------------+-----------------+----------+
| FT      | split_explicit_implicit | penalty on linear | remainder_model | result   |
+---------+-------------------------+-------------------+-----------------+----------+
| Float32 | false                   | yes               | -               | stable   |
| Float32 | true                    | yes               | fully discrete  | unstable |
| Float64 | true                    | yes               | fully discrete  | stable   |
| Float64 | false                   | no                | -               | unstable |
| Float64 | true                    | no                | fully discrete  | unstable |
| Float64 | true                    | no                | single flux     | stable   |
┌ Info: volumerhs_orig! details:
│   - 171 registers, max 256 threads
│   - 0 bytes local memory,
│     1.113 KiB shared memory,
└     0 bytes constant memory
Trial(7.074 ms)
┌ Info: volumerhs_ijk! details:
│   - 136 registers, max 384 threads
│   - 0 bytes local memory,
│     1.113 KiB shared memory,