@mwarusz
Created September 14, 2019 00:29
DifferentialEquations Carpenter Kennedy LSRK vs CLIMA
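The title refers to the five-stage, fourth-order 2N-storage low-storage Runge-Kutta scheme of Carpenter and Kennedy. As a reference point for reading the traces, here is a minimal pure-Python sketch of one step: every stage evaluates the right-hand side once and then updates the two registers `dq` and `q` elementwise. The coefficients are the published CK(5,4) values as commonly reproduced in low-storage RK codes. The names `lsrk54_step` and `integrate` are hypothetical and are not the CLIMA or DifferentialEquations.jl API, and matching the RHS evaluation to the `volumerhs`/`facerhs` kernels and the register updates to the elementwise kernels is an assumption.

```python
import math

# Carpenter-Kennedy five-stage, fourth-order 2N-storage coefficients, CK(5,4).
RKA = [0.0,
       -567301805773 / 1357537059087,
       -2404267990393 / 2016746695238,
       -3550918686646 / 2091501179385,
       -1275806237668 / 842570457699]
RKB = [1432997174477 / 9575080441755,
       5161836677717 / 13612068292357,
       1720146321549 / 2090206949498,
       3134564353537 / 4481467310338,
       2277821191437 / 14882151754819]
RKC = [0.0,
       1432997174477 / 9575080441755,
       2526269341429 / 6820363962896,
       2006345519317 / 3224310063776,
       2802321613138 / 2924317926251]


def lsrk54_step(f, q, t, dt):
    """One step of the 2N recurrence; f(q, t) returns the RHS as a list."""
    dq = [0.0] * len(q)
    for a, b, c in zip(RKA, RKB, RKC):
        rhs = f(q, t + c * dt)                          # one RHS evaluation per stage
        dq = [a * d + dt * r for d, r in zip(dq, rhs)]  # dq <- A_s * dq + dt * f(q)
        q = [x + b * d for x, d in zip(q, dq)]          # q  <- q + B_s * dq
    return q


def integrate(f, q0, t0, t1, nsteps):
    """Take nsteps equal steps of lsrk54_step from t0 to t1."""
    dt = (t1 - t0) / nsteps
    q, t = list(q0), t0
    for _ in range(nsteps):
        q = lsrk54_step(f, q, t, dt)
        t += dt
    return q


# Usage: y' = cos(t) * y, y(0) = 1, with exact solution y(t) = exp(sin t).
def growth(q, t):
    return [math.cos(t) * x for x in q]


exact = math.exp(math.sin(1.0))
for n in (10, 20, 40):
    err = abs(integrate(growth, [1.0], 0.0, 1.0, n)[0] - exact)
    print(f"steps={n:3d}  error={err:.3e}")
```

The list comprehensions allocate new lists, so this sketch reproduces the 2N recurrence but not its in-place memory footprint. The first trace below shows five repeating kernel groups per step, which is consistent with the five stages here.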
==64089== Profiling application: julia --project=../../../env/gpu isentropicvortex.jl
==64089== Profiling result:
Start (ms) Duration (ms) Grid Size Block Size Regs* SSMem* (KB) DSMem* (B) Device Context Stream Name
1.15e+05 1.838327 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [136]
1.22e+05 5.682147 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [147]
1.25e+05 9.771470 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [158]
1.25e+05 3.028208 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [169]
1.25e+05 1.840918 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [172]
1.25e+05 5.693187 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [175]
1.25e+05 9.794285 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [178]
1.25e+05 3.027984 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [181]
1.25e+05 1.836919 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [184]
1.25e+05 5.683426 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [187]
1.25e+05 9.822958 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [190]
1.25e+05 3.027536 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [193]
1.25e+05 1.836534 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [196]
1.25e+05 5.681379 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [199]
1.25e+05 9.815149 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [202]
1.25e+05 3.028432 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [205]
1.25e+05 1.837719 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [208]
1.25e+05 5.670722 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [211]
1.25e+05 9.800878 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [214]
1.25e+05 3.027280 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [217]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
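To compare the runs, it helps to total the kernel time per name. The sketch below uses hypothetical helpers (`parse_row` and `summarize` are not part of nvprof) that read rows in the layout above. It assumes the duration is the second whitespace-separated field and that kernel rows end with the kernel name followed by a bracketed numeric id; rows named `[CUDA memcpy ...]`, as in the second trace's layout, are also accepted. The sample rows are copied from the first two stages of the trace above.

```python
from collections import defaultdict

TRACE = """\
1.15e+05 1.838327 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [136]
1.22e+05 5.682147 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [147]
1.25e+05 9.771470 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [158]
1.25e+05 3.028208 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [169]
1.25e+05 1.840918 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [172]
1.25e+05 5.693187 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [175]
1.25e+05 9.794285 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [178]
1.25e+05 3.027984 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [181]
"""


def parse_row(line):
    """Return (start_ms, duration_ms, name) for one GPU-trace row."""
    tokens = line.split()
    start_ms, duration_ms = float(tokens[0]), float(tokens[1])
    if "[CUDA " in line:
        # Memory operations are named like "[CUDA memcpy DtoD]" and carry no id.
        name = line[line.index("[CUDA "):].strip()
    else:
        # Kernel rows end with "<kernel name> [<numeric id>]".
        name = tokens[-2]
    return start_ms, duration_ms, name


def summarize(lines):
    """Group rows by name and return {name: (count, total_ms, mean_ms)}."""
    durations = defaultdict(list)
    for line in lines:
        _, duration_ms, name = parse_row(line)
        durations[name].append(duration_ms)
    return {n: (len(d), sum(d), sum(d) / len(d)) for n, d in durations.items()}


summary = summarize(TRACE.strip().splitlines())
for name, (count, total, mean) in sorted(summary.items()):
    print(f"{name:34s} n={count} total={total:8.3f} ms mean={mean:7.3f} ms")
# Per-stage cost, taking one update launch per stage (assumption).
stages = summary["ptxcall_update__9"][0]
print(f"per stage: {sum(v[1] for v in summary.values()) / stages:.3f} ms")
```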
==63415== Profiling application: julia --project=../../../env/gpu isentropicvortex.jl
==63415== Profiling result:
Start (ms) Duration (ms) Grid Size Block Size Regs* SSMem* (KB) DSMem* (B) Size (MB) Throughput (GB/s) SrcMemType DstMemType Device Context Stream Name
1.17e+05 2.11e-03 - - - - - 596.0464 2.7560e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 1.86e-03 - - - - - 596.0464 3.1362e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 0.697884 (305176 1 1) (256 1 1) 16 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous19_5 [137]
1.17e+05 0.698172 (305176 1 1) (256 1 1) 16 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous19_5 [141]
1.20e+05 1.840342 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [153]
1.26e+05 5.436324 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [164]
1.29e+05 9.773101 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [175]
1.30e+05 2.223572 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [185]
1.33e+05 2.221684 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [195]
1.33e+05 1.552184 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [205]
1.33e+05 1.841430 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [208]
1.33e+05 5.445603 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [211]
1.33e+05 9.763756 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [214]
1.33e+05 2.224500 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [216]
1.33e+05 2.221909 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [218]
1.33e+05 1.551479 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [220]
1.33e+05 1.837687 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [223]
1.33e+05 5.431971 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [226]
1.33e+05 9.784428 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [229]
1.33e+05 2.222004 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [231]
1.33e+05 2.220980 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [233]
1.33e+05 1.551128 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [235]
1.33e+05 1.836439 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [238]
1.33e+05 5.433379 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [241]
1.33e+05 9.779564 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [244]
1.33e+05 2.221908 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [246]
1.33e+05 2.220436 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [248]
1.33e+05 1.552280 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [250]
1.33e+05 1.836950 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [253]
1.33e+05 5.434180 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [256]
1.33e+05 9.783756 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [259]
1.33e+05 2.221620 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [261]
1.33e+05 2.220756 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [263]
1.33e+05 1.839318 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [266]
1.33e+05 5.433060 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [269]
1.33e+05 9.755404 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [272]
1.33e+05 1.550648 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [274]
1.33e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.33e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
SrcMemType: The type of source memory accessed by memory operation/copy
DstMemType: The type of destination memory accessed by memory operation/copy
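The launch sizes and the copy size in both traces are mutually consistent under one inferred problem size: 125000 elements with 5x5x5 = 125 nodes each (matching the `(125000 1 1)` grid and `(5 5 5)` blocks of `volumerhs`) and 5 Float64 fields per node. The field count and the element type are assumptions inferred from the numbers, not stated in the gist. The arithmetic below checks that a 1-D launch of 256-thread blocks over one state array needs 305176 blocks, that one array is 596.0464 MB when MB is read as 2^20 bytes, and that the reported 4.1341e+05 GB/s for a 1.41e-03 ms copy matches that size within display rounding when GB is read as 2^30 bytes.

```python
import math

# Assumed problem size, inferred from the trace numbers (not stated in the gist).
N_ELEM = 125_000            # grid size of the volumerhs launches
NODES_PER_ELEM = 5 * 5 * 5  # block size (5 5 5) of the volumerhs launches
N_FIELDS = 5                # assumption: five state variables per node
BYTES_PER_VALUE = 8         # assumption: Float64 state

n_entries = N_ELEM * NODES_PER_ELEM * N_FIELDS
blocks_256 = math.ceil(n_entries / 256)   # 1-D launch, one thread per entry
size_bytes = n_entries * BYTES_PER_VALUE
size_mib = size_bytes / 2**20             # compare with the "Size" column

# Throughput implied by a 1.41e-03 ms copy, in units of 2**30 bytes per second.
duration_s = 1.41e-03 * 1e-3
implied_gib_s = size_bytes / duration_s / 2**30

print(n_entries, blocks_256, round(size_mib, 4), round(implied_gib_s))
```

The implied copy rate is several hundred times the V100's nominal HBM2 bandwidth of roughly 900 GB/s, so the DtoD durations should be interpreted with caution.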