Created
September 14, 2019 00:29
DifferentialEquations Carpenter Kennedy LSRK vs CLIMA
==64089== Profiling application: julia --project=../../../env/gpu isentropicvortex.jl
==64089== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Device Context Stream Name
ms ms KB B
1.15e+05 1.838327 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [136]
1.22e+05 5.682147 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [147]
1.25e+05 9.771470 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [158]
1.25e+05 3.028208 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [169]
1.25e+05 1.840918 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [172]
1.25e+05 5.693187 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [175]
1.25e+05 9.794285 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [178]
1.25e+05 3.027984 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [181]
1.25e+05 1.836919 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [184]
1.25e+05 5.683426 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [187]
1.25e+05 9.822958 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [190]
1.25e+05 3.027536 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [193]
1.25e+05 1.836534 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [196]
1.25e+05 5.681379 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [199]
1.25e+05 9.815149 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [202]
1.25e+05 3.028432 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [205]
1.25e+05 1.837719 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [208]
1.25e+05 5.670722 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [211]
1.25e+05 9.800878 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [214]
1.25e+05 3.027280 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [217]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
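
To read a per-kernel cost off a trace like the one above, the kernel rows can be grouped by name and averaged. The following is a minimal sketch, not part of the original gist: the regular expression and the helper name `mean_duration_ms` are illustrative assumptions, and the sample holds only the first four kernel rows of the trace.

```python
import re
from collections import defaultdict

# Matches a kernel row: start, duration, (grid), (block), ..., kernel name, [id].
# Memcpy rows show "-" instead of a grid size, so they deliberately do not match.
ROW = re.compile(
    r"^\s*\S+\s+(?P<dur>\S+)\s+\([\d ]+\)\s+\([\d ]+\)"
    r".*\s(?P<name>\S+)\s+\[\d+\]"
)

def mean_duration_ms(trace_text):
    """Return {kernel name: mean duration in ms} for the kernel rows in trace_text."""
    durations = defaultdict(list)
    for line in trace_text.splitlines():
        m = ROW.match(line)
        if m:
            durations[m.group("name")].append(float(m.group("dur")))
    return {name: sum(v) / len(v) for name, v in durations.items()}

# First four kernel rows of the trace above (one pass through the four kernels).
sample = """\
1.15e+05 1.838327 (125000 1 1) (125 1 1) 40 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [136]
1.22e+05 5.682147 (125000 1 1) (5 5 5) 150 14.89844 0 Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [147]
1.25e+05 9.771470 (125000 1 1) (25 1 1) 106 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [158]
1.25e+05 3.028208 (305176 1 1) (256 1 1) 16 0.000000 0 Tesla V100-SXM2 1 7 ptxcall_update__9 [169]
"""

means = mean_duration_ms(sample)
for name, ms in means.items():
    print(f"{name}: {ms:.3f} ms")
print(f"sum over the four kernels: {sum(means.values()):.3f} ms")
```

Feeding the whole trace to `mean_duration_ms` instead of `sample` averages over all five repetitions of each kernel.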
==63415== Profiling application: julia --project=../../../env/gpu isentropicvortex.jl
==63415== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput SrcMemType DstMemType Device Context Stream Name
ms ms KB B MB GB/s
1.17e+05 2.11e-03 - - - - - 596.0464 2.7560e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 1.86e-03 - - - - - 596.0464 3.1362e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.17e+05 0.697884 (305176 1 1) (256 1 1) 16 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous19_5 [137]
1.17e+05 0.698172 (305176 1 1) (256 1 1) 16 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous19_5 [141]
1.20e+05 1.840342 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [153]
1.26e+05 5.436324 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [164]
1.29e+05 9.773101 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [175]
1.30e+05 2.223572 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [185]
1.33e+05 2.221684 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [195]
1.33e+05 1.552184 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [205]
1.33e+05 1.841430 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [208]
1.33e+05 5.445603 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [211]
1.33e+05 9.763756 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [214]
1.33e+05 2.224500 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [216]
1.33e+05 2.221909 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [218]
1.33e+05 1.551479 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [220]
1.33e+05 1.837687 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [223]
1.33e+05 5.431971 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [226]
1.33e+05 9.784428 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [229]
1.33e+05 2.222004 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [231]
1.33e+05 2.220980 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [233]
1.33e+05 1.551128 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [235]
1.33e+05 1.836439 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [238]
1.33e+05 5.433379 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [241]
1.33e+05 9.779564 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [244]
1.33e+05 2.221908 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [246]
1.33e+05 2.220436 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [248]
1.33e+05 1.552280 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [250]
1.33e+05 1.836950 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [253]
1.33e+05 5.434180 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [256]
1.33e+05 9.783756 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [259]
1.33e+05 2.221620 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_9 [261]
1.33e+05 2.220756 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_10 [263]
1.33e+05 1.839318 (125000 1 1) (125 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_nodal_update_aux__6 [266]
1.33e+05 5.433060 (125000 1 1) (5 5 5) 150 14.89844 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__7 [269]
1.33e+05 9.755404 (125000 1 1) (25 1 1) 106 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__8 [272]
1.33e+05 1.550648 (305176 1 1) (256 1 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous23_11 [274]
1.33e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
1.33e+05 1.41e-03 - - - - - 596.0464 4.1341e+05 Device Device Tesla V100-SXM2 1 7 [CUDA memcpy DtoD]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
SrcMemType: The type of source memory accessed by memory operation/copy
DstMemType: The type of destination memory accessed by memory operation/copy
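
The two traces differ mainly in how the state update is launched: the first has a single `ptxcall_update__9` kernel per pass, while this one has three `ptxcall_anonymous23_*` kernels. The sketch below compares those costs using durations copied from the first full pass of each trace. It assumes, from the kernel names only (the gist does not say so), that the first trace is the CLIMA LSRK run and this one the DifferentialEquations run, and that the three anonymous kernels together do the work of the single update kernel. The variable names are illustrative.

```python
# Durations in ms, copied from the first full pass of each trace.
# Assumption: the first trace is CLIMA's LSRK, the second DifferentialEquations'.
first_rhs = [1.838327, 5.682147, 9.771470]    # update_aux, volumerhs, facerhs
first_update = [3.028208]                     # ptxcall_update__9

second_rhs = [1.840342, 5.436324, 9.773101]   # update_aux, volumerhs, facerhs
second_update = [2.223572, 2.221684, 1.552184]  # ptxcall_anonymous23_9/_10/_11

update_ratio = sum(second_update) / sum(first_update)
first_pass = sum(first_rhs) + sum(first_update)
second_pass = sum(second_rhs) + sum(second_update)

print(f"update kernels: {sum(first_update):.3f} ms vs {sum(second_update):.3f} ms "
      f"(ratio {update_ratio:.2f})")
print(f"whole pass:     {first_pass:.3f} ms vs {second_pass:.3f} ms "
      f"(ratio {second_pass / first_pass:.2f})")
```

Under these assumptions the right-hand-side kernels cost about the same in both runs, so the difference per pass comes almost entirely from the update step, which takes roughly twice as long when split across three launches.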