Last active
June 27, 2019 16:35
==26324== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl | |
==26324== Profiling result: | |
==26324== Metric result: | |
Invocations Metric Name Metric Description Min Max Avg | |
Device "Tesla V100-SXM2-16GB (0)" | |
Kernel: ptxcall_knl_dof_iteration__6 | |
55 inst_per_warp Instructions per warp 5.4765e+03 5.6132e+03 5.4946e+03 | |
55 branch_efficiency Branch Efficiency 99.36% 99.41% 99.40% | |
55 warp_execution_efficiency Warp Execution Efficiency 81.70% 83.28% 83.09% | |
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 78.49% 80.00% 79.82% | |
55 inst_replay_overhead Instruction Replay Overhead 0.000109 0.000130 0.000118 | |
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 gld_transactions_per_request Global Load Transactions Per Request 8.151501 8.153335 8.152286 | |
55 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500 | |
55 shared_store_transactions Shared Store Transactions 0 0 0 | |
55 shared_load_transactions Shared Load Transactions 0 0 0 | |
55 local_load_transactions Local Load Transactions 0 0 0 | |
55 local_store_transactions Local Store Transactions 0 0 0 | |
55 gld_transactions Global Load Transactions 27545551 27551751 27548203 | |
55 gst_transactions Global Store Transactions 21043200 21043200 21043200 | |
55 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
55 sysmem_write_transactions System Memory Write Transactions 5 5 5 | |
55 l2_read_transactions L2 Read Transactions 24038187 24044532 24040579 | |
55 l2_write_transactions L2 Write Transactions 21644021 27583323 22303554 | |
55 dram_read_transactions Device Memory Read Transactions 24088233 24108071 24095624 | |
55 dram_write_transactions Device Memory Write Transactions 19803820 25722448 20348244 | |
55 global_hit_rate Global Hit Rate in unified l1/tex 33.92% 33.99% 33.96% | |
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
55 gld_requested_throughput Requested Global Load Throughput 264.96GB/s 266.85GB/s 265.71GB/s | |
55 gst_requested_throughput Requested Global Store Throughput 192.70GB/s 194.07GB/s 193.24GB/s | |
55 gld_throughput Global Load Throughput 276.48GB/s 278.49GB/s 277.26GB/s | |
55 gst_throughput Global Store Throughput 211.20GB/s 212.71GB/s 211.79GB/s | |
55 local_memory_overhead Local Memory Overhead 28.76% 28.86% 28.82% | |
55 tex_cache_hit_rate Unified Cache Hit Rate 9.48% 9.48% 9.48% | |
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 5.76% 5.77% 5.76% | |
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 52.04% 54.54% 54.07% | |
55 dram_read_throughput Device Memory Read Throughput 241.83GB/s 243.49GB/s 242.51GB/s | |
55 dram_write_throughput Device Memory Write Throughput 198.85GB/s 258.73GB/s 204.80GB/s | |
55 tex_cache_throughput Unified cache to SM throughput 314.32GB/s 316.56GB/s 315.20GB/s | |
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 241.26GB/s 242.98GB/s 241.93GB/s | |
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 211.20GB/s 212.71GB/s 211.79GB/s | |
55 l2_read_throughput L2 Throughput (Reads) 241.27GB/s 243.01GB/s 241.96GB/s | |
55 l2_write_throughput L2 Throughput (Writes) 217.52GB/s 277.72GB/s 224.48GB/s | |
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 sysmem_write_throughput System Memory Write Throughput 52.620KB/s 52.995KB/s 52.767KB/s | |
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 gld_efficiency Global Memory Load Efficiency 95.82% 95.84% 95.83% | |
55 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24% | |
55 tex_cache_transactions Unified cache to SM transactions 7829345 7829548 7829430 | |
55 flop_count_dp Floating Point Operations(Double Precision) 1.0046e+10 1.0128e+10 1.0080e+10 | |
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 1770104174 1784708150 1776181149 | |
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 3711665054 3742410100 3724452773 | |
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 852212114 858881290 854984057 | |
55 flop_count_sp Floating Point Operations(Single Precision) 341925060 345009700 343206466 | |
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 170962530 172504850 171603233 | |
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 258009914 260217020 258927146 | |
55 inst_executed Instructions Executed 557541520 1708968387 1092544931 | |
55 inst_issued Instructions Issued 557604028 571098173 559392226 | |
55 dram_utilization Device Memory Utilization Mid (6) Mid (6) Mid (6) | |
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 29.21% 31.42% 30.88% | |
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 36.90% 37.57% 37.21% | |
55 stall_memory_dependency Issue Stall Reasons (Data Request) 4.94% 5.47% 5.04% | |
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
55 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00% | |
55 stall_other Issue Stall Reasons (Other) 1.36% 1.44% 1.38% | |
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.08% 0.13% 0.11% | |
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 16.17% 17.33% 16.45% | |
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
55 inst_fp_32 FP Instructions(Single) 1760554252 1775126460 1766610702 | |
55 inst_fp_64 FP Instructions(Double) 6644709152 6699407810 6667458562 | |
55 inst_integer Integer Instructions 4070222578 4098159680 4081836506 | |
55 inst_bit_convert Bit-Convert Instructions 39414878 39771200 39563266 | |
55 inst_control Control-Flow Instructions 1360825906 1371675230 1365338011 | |
55 inst_compute_ld_st Load/Store Instructions 182400000 182400000 182400000 | |
55 inst_misc Misc Instructions 253012128 253772630 253329151 | |
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
55 issue_slots Issue Slots 557604028 571098173 559392226 | |
55 cf_issued Issued Control-Flow Instructions 52742751 54076013 52917540 | |
55 cf_executed Executed Control-Flow Instructions 52742751 54076013 52917540 | |
55 ldst_issued Issued Load/Store Instructions 8711390 8784932 8721076 | |
55 ldst_executed Executed Load/Store Instructions 8711390 8784932 8721076 | |
55 atomic_transactions Atomic Transactions 0 0 0 | |
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 24038018 24038162 24038080 | |
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 1.16% 1.25% 1.20% | |
55 stall_not_selected Issue Stall Reasons (Not Selected) 7.61% 8.15% 7.74% | |
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 21043200 21043200 21043200 | |
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
55 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
55 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
55 nvlink_transmit_throughput NVLink Transmit Throughput 378.87KB/s 381.57KB/s 379.92KB/s | |
55 nvlink_receive_throughput NVLink Receive Throughput 284.15KB/s 286.17KB/s 284.94KB/s | |
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 291 | |
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
55 inst_fp_16 HP Instructions(Half) 0 0 0 | |
55 ipc Executed IPC 0.527755 1.571853 1.045619 | |
55 issued_ipc Issued IPC 1.533870 1.576269 1.545610 | |
55 issue_slot_utilization Issue Slot Utilization 38.35% 39.41% 38.64% | |
55 sm_efficiency Multiprocessor Activity 99.16% 99.87% 99.62% | |
55 achieved_occupancy Achieved Occupancy 0.238010 0.241668 0.238866 | |
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 2.565442 2.695913 2.595737 | |
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
55 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1) | |
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1) | |
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (2) Low (2) Low (2) | |
55 double_precision_fu_utilization Double-Precision Function Unit Utilization High (7) High (8) High (7) | |
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.09% 0.74% 0.68% | |
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 5.32% 43.71% 39.87% | |
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_facerhs__10 | |
55 inst_per_warp Instructions per warp 1.0689e+04 1.0689e+04 1.0689e+04 | |
55 branch_efficiency Branch Efficiency 99.85% 99.85% 99.85% | |
55 warp_execution_efficiency Warp Execution Efficiency 78.06% 78.06% 78.06% | |
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 75.71% 75.71% 75.71% | |
55 inst_replay_overhead Instruction Replay Overhead 0.000559 0.000605 0.000568 | |
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 gld_transactions_per_request Global Load Transactions Per Request 14.274780 14.276180 14.275466 | |
55 gst_transactions_per_request Global Store Transactions Per Request 14.000000 14.000000 14.000000 | |
55 shared_store_transactions Shared Store Transactions 0 0 0 | |
55 shared_load_transactions Shared Load Transactions 0 0 0 | |
55 local_load_transactions Local Load Transactions 0 0 0 | |
55 local_store_transactions Local Store Transactions 0 0 0 | |
55 gld_transactions Global Load Transactions 407356986 407396944 407376568 | |
55 gst_transactions Global Store Transactions 38707200 38707200 38707200 | |
55 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
55 sysmem_write_transactions System Memory Write Transactions 5 6 5 | |
55 l2_read_transactions L2 Read Transactions 354201890 354346464 354278868 | |
55 l2_write_transactions L2 Write Transactions 45282231 77881708 48844727 | |
55 dram_read_transactions Device Memory Read Transactions 420972702 426390178 422150792 | |
55 dram_write_transactions Device Memory Write Transactions 38751602 70370623 39512013 | |
55 global_hit_rate Global Hit Rate in unified l1/tex 20.82% 20.88% 20.85% | |
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
55 gld_requested_throughput Requested Global Load Throughput 266.67GB/s 268.87GB/s 267.63GB/s | |
55 gst_requested_throughput Requested Global Store Throughput 26.243GB/s 26.460GB/s 26.338GB/s | |
55 gld_throughput Global Load Throughput 618.67GB/s 623.77GB/s 620.91GB/s | |
55 gst_throughput Global Store Throughput 58.785GB/s 59.270GB/s 58.996GB/s | |
55 local_memory_overhead Local Memory Overhead 10.13% 10.19% 10.16% | |
55 tex_cache_hit_rate Unified Cache Hit Rate 12.91% 12.95% 12.93% | |
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 9.00% 10.13% 9.90% | |
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 83.33% 84.64% 84.46% | |
55 dram_read_throughput Device Memory Read Throughput 640.08GB/s 647.73GB/s 643.43GB/s | |
55 dram_write_throughput Device Memory Write Throughput 58.855GB/s 106.90GB/s 60.223GB/s | |
55 tex_cache_throughput Unified cache to SM throughput 625.66GB/s 631.17GB/s 628.12GB/s | |
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 538.09GB/s 542.44GB/s 539.99GB/s | |
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 58.785GB/s 59.270GB/s 58.996GB/s | |
55 l2_read_throughput L2 Throughput (Reads) 538.09GB/s 542.55GB/s 539.98GB/s | |
55 l2_write_throughput L2 Throughput (Writes) 68.839GB/s 118.51GB/s 74.448GB/s | |
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 sysmem_write_throughput System Memory Write Throughput 7.9619KB/s 9.5918KB/s 8.1064KB/s | |
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 gld_efficiency Global Memory Load Efficiency 43.10% 43.10% 43.10% | |
55 gst_efficiency Global Memory Store Efficiency 44.64% 44.64% 44.64% | |
55 tex_cache_transactions Unified cache to SM transactions 102968110 103124514 103026715 | |
55 flop_count_dp Floating Point Operations(Double Precision) 5625653248 5625653248 5625653248 | |
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 473419776 473419776 473419776 | |
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 2147493888 2147493888 2147493888 | |
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 857245696 857245696 857245696 | |
55 flop_count_sp Floating Point Operations(Single Precision) 86224896 86224896 86224896 | |
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 22988800 22988800 22988800 | |
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 40247296 40247296 40247296 | |
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 129388544 129388544 129388544 | |
55 inst_executed Instructions Executed 374604800 820912128 536898373 | |
55 inst_issued Instructions Issued 374814277 374831507 374817604 | |
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9) | |
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 1.02% 1.11% 1.07% | |
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 3.80% 4.07% 4.04% | |
55 stall_memory_dependency Issue Stall Reasons (Data Request) 90.20% 91.09% 90.47% | |
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
55 stall_sync Issue Stall Reasons (Synchronization) 0.02% 0.02% 0.02% | |
55 stall_other Issue Stall Reasons (Other) 0.14% 0.15% 0.15% | |
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.02% 0.03% 0.02% | |
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.20% 0.22% 0.21% | |
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
55 inst_fp_32 FP Instructions(Single) 443373568 443373568 443373568 | |
55 inst_fp_64 FP Instructions(Double) 3581839360 3581839360 3581839360 | |
55 inst_integer Integer Instructions 3621040128 3621040128 3621040128 | |
55 inst_bit_convert Bit-Convert Instructions 80494592 80494592 80494592 | |
55 inst_control Control-Flow Instructions 466252800 466252800 466252800 | |
55 inst_compute_ld_st Load/Store Instructions 782540800 782540800 782540800 | |
55 inst_misc Misc Instructions 164288512 164288512 164288512 | |
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
55 issue_slots Issue Slots 374814277 374831507 374817604 | |
55 cf_issued Issued Control-Flow Instructions 20450304 20450304 20450304 | |
55 cf_executed Executed Control-Flow Instructions 20450304 20450304 20450304 | |
55 ldst_issued Issued Load/Store Instructions 32146432 32146432 32146432 | |
55 ldst_executed Executed Load/Store Instructions 32146432 32146432 32146432 | |
55 atomic_transactions Atomic Transactions 0 0 0 | |
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 354209772 354360212 354287459 | |
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 3.55% 4.11% 3.86% | |
55 stall_not_selected Issue Stall Reasons (Not Selected) 0.14% 0.15% 0.15% | |
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 38707200 38707200 38707200 | |
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1536 1158 | |
55 nvlink_total_data_received NVLink Total Data Received 864 1152 869 | |
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
55 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
55 nvlink_transmit_throughput NVLink Transmit Throughput 57.329KB/s 76.485KB/s 57.883KB/s | |
55 nvlink_receive_throughput NVLink Receive Throughput 42.996KB/s 57.363KB/s 43.412KB/s | |
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 289 | |
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
55 inst_fp_16 HP Instructions(Half) 0 0 0 | |
55 ipc Executed IPC 0.152637 0.317017 0.254990 | |
55 issued_ipc Issued IPC 0.147365 0.157554 0.156123 | |
55 issue_slot_utilization Issue Slot Utilization 3.68% 3.94% 3.90% | |
55 sm_efficiency Multiprocessor Activity 99.88% 99.93% 99.91% | |
55 achieved_occupancy Achieved Occupancy 0.123504 0.123624 0.123553 | |
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.158293 0.169265 0.167624 | |
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1) | |
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1) | |
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.03% 0.03% 0.03% | |
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 3.37% 3.69% 3.63% | |
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_update__11 | |
55 inst_per_warp Instructions per warp 1.3460e+03 1.3460e+03 1.3460e+03 | |
55 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00% | |
55 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00% | |
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 95.54% 95.54% 95.54% | |
55 inst_replay_overhead Instruction Replay Overhead 0.000146 0.000193 0.000167 | |
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 gld_transactions_per_request Global Load Transactions Per Request 8.000000 8.000000 8.000000 | |
55 gst_transactions_per_request Global Store Transactions Per Request 8.000000 8.000000 8.000000 | |
55 shared_store_transactions Shared Store Transactions 0 0 0 | |
55 shared_load_transactions Shared Load Transactions 0 0 0 | |
55 local_load_transactions Local Load Transactions 0 0 0 | |
55 local_store_transactions Local Store Transactions 0 0 0 | |
55 gld_transactions Global Load Transactions 43200000 43200000 43200000 | |
55 gst_transactions Global Store Transactions 28800000 28800000 28800000 | |
55 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
55 sysmem_write_transactions System Memory Write Transactions 5 6 5 | |
55 l2_read_transactions L2 Read Transactions 28800152 28801812 28800821 | |
55 l2_write_transactions L2 Write Transactions 28800043 35600263 29308075 | |
55 dram_read_transactions Device Memory Read Transactions 28800013 28800685 28800153 | |
55 dram_write_transactions Device Memory Write Transactions 28796256 35596759 29780024 | |
55 global_hit_rate Global Hit Rate in unified l1/tex 60.00% 60.00% 60.00% | |
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
55 gld_requested_throughput Requested Global Load Throughput 532.33GB/s 537.47GB/s 533.93GB/s | |
55 gst_requested_throughput Requested Global Store Throughput 354.89GB/s 358.31GB/s 355.95GB/s | |
55 gld_throughput Global Load Throughput 532.33GB/s 537.47GB/s 533.93GB/s | |
55 gst_throughput Global Store Throughput 354.89GB/s 358.31GB/s 355.95GB/s | |
55 local_memory_overhead Local Memory Overhead 50.00% 50.00% 50.00% | |
55 tex_cache_hit_rate Unified Cache Hit Rate 20.00% 20.00% 20.00% | |
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 0.00% 0.00% 0.00% | |
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 100.00% 100.00% 100.00% | |
55 dram_read_throughput Device Memory Read Throughput 354.89GB/s 358.31GB/s 355.95GB/s | |
55 dram_write_throughput Device Memory Write Throughput 355.02GB/s 439.90GB/s 368.06GB/s | |
55 tex_cache_throughput Unified cache to SM throughput 621.05GB/s 627.05GB/s 622.91GB/s | |
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 354.89GB/s 358.32GB/s 355.95GB/s | |
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 354.89GB/s 358.31GB/s 355.95GB/s | |
55 l2_read_throughput L2 Throughput (Reads) 354.90GB/s 358.32GB/s 355.96GB/s | |
55 l2_write_throughput L2 Throughput (Writes) 355.15GB/s 439.76GB/s 362.23GB/s | |
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 sysmem_write_throughput System Memory Write Throughput 64.604KB/s 77.609KB/s 65.033KB/s | |
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 gld_efficiency Global Memory Load Efficiency 100.00% 100.00% 100.00% | |
55 gst_efficiency Global Memory Store Efficiency 100.00% 100.00% 100.00% | |
55 tex_cache_transactions Unified cache to SM transactions 12600000 12600000 12600000 | |
55 flop_count_dp Floating Point Operations(Double Precision) 230400000 230400000 230400000 | |
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 0 0 0 | |
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 57600000 57600000 57600000 | |
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 115200000 115200000 115200000 | |
55 flop_count_sp Floating Point Operations(Single Precision) 0 0 0 | |
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0 | |
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 230400000 230400000 230400000 | |
55 inst_executed Instructions Executed 505800000 2422800000 1412018181 | |
55 inst_issued Instructions Issued 505874007 505898401 505885585 | |
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9) | |
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 2.99% 3.55% 3.40% | |
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 17.22% 19.50% 19.23% | |
55 stall_memory_dependency Issue Stall Reasons (Data Request) 38.25% 45.38% 39.02% | |
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
55 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00% | |
55 stall_other Issue Stall Reasons (Other) 4.07% 4.60% 4.54% | |
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.06% 0.15% 0.09% | |
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 13.76% 15.57% 15.37% | |
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
55 inst_fp_32 FP Instructions(Single) 230400000 230400000 230400000 | |
55 inst_fp_64 FP Instructions(Double) 172800000 172800000 172800000 | |
55 inst_integer Integer Instructions 1.0869e+10 1.0869e+10 1.0869e+10 | |
55 inst_bit_convert Bit-Convert Instructions 460800000 460800000 460800000 | |
55 inst_control Control-Flow Instructions 1324800000 1324800000 1324800000 | |
55 inst_compute_ld_st Load/Store Instructions 288000000 288000000 288000000 | |
55 inst_misc Misc Instructions 1612800000 1612800000 1612800000 | |
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
55 issue_slots Issue Slots 505874007 505898401 505885585 | |
55 cf_issued Issued Control-Flow Instructions 52200000 52200000 52200000 | |
55 cf_executed Executed Control-Flow Instructions 52200000 52200000 52200000 | |
55 ldst_issued Issued Load/Store Instructions 12600000 12600000 12600000 | |
55 ldst_executed Executed Load/Store Instructions 12600000 12600000 12600000 | |
55 atomic_transactions Atomic Transactions 0 0 0 | |
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 28800004 28800880 28800135 | |
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 1.49% 1.71% 1.62% | |
55 stall_not_selected Issue Stall Reasons (Not Selected) 14.98% 16.96% 16.74% | |
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 28800000 28800000 28800000 | |
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
55 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
55 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
55 nvlink_transmit_throughput NVLink Transmit Throughput 465.16KB/s 469.65KB/s 466.55KB/s | |
55 nvlink_receive_throughput NVLink Receive Throughput 348.87KB/s 352.23KB/s 349.91KB/s | |
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288 | |
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
55 inst_fp_16 HP Instructions(Half) 0 0 0 | |
55 ipc Executed IPC 0.440216 1.791187 1.091220 | |
55 issued_ipc Issued IPC 1.592313 1.792162 1.768414 | |
55 issue_slot_utilization Issue Slot Utilization 39.81% 44.80% 44.21% | |
55 sm_efficiency Multiprocessor Activity 96.33% 99.77% 97.42% | |
55 achieved_occupancy Achieved Occupancy 0.487766 0.488726 0.488296 | |
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 5.755249 6.469818 6.375840 | |
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1) | |
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 special_fu_utilization Special Function Unit Utilization Low (2) Low (2) Low (2) | |
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (3) Mid (4) Low (3) | |
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00% | |
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.07% 1.23% 1.15% | |
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_volumeviscterms__7 | |
55 inst_per_warp Instructions per warp 1.1718e+03 1.1718e+03 1.1718e+03 | |
55 branch_efficiency Branch Efficiency 97.87% 97.87% 97.87% | |
55 warp_execution_efficiency Warp Execution Efficiency 97.09% 97.09% 97.09% | |
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 94.54% 94.54% 94.54% | |
55 inst_replay_overhead Instruction Replay Overhead 0.000197 0.000267 0.000227 | |
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 2.607649 2.644755 2.634227 | |
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 2.022991 2.026069 2.025061 | |
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 gld_transactions_per_request Global Load Transactions Per Request 8.111274 8.113203 8.112146 | |
55 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500 | |
55 shared_store_transactions Shared Store Transactions 3884142 3890052 3888117 | |
55 shared_load_transactions Shared Load Transactions 72096267 73122198 72831114 | |
55 local_load_transactions Local Load Transactions 0 0 0 | |
55 local_store_transactions Local Store Transactions 0 0 0 | |
55 gld_transactions Global Load Transactions 40491482 40501110 40495835 | |
55 gst_transactions Global Store Transactions 34195200 34195200 34195200 | |
55 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
55 sysmem_write_transactions System Memory Write Transactions 5 6 5 | |
55 l2_read_transactions L2 Read Transactions 38923164 38932455 38927048 | |
55 l2_write_transactions L2 Write Transactions 36022613 42494947 36690644 | |
55 dram_read_transactions Device Memory Read Transactions 39352535 39435874 39366649 | |
55 dram_write_transactions Device Memory Write Transactions 32906254 39357004 33330275 | |
55 global_hit_rate Global Hit Rate in unified l1/tex 15.42% 15.47% 15.45% | |
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
55 gld_requested_throughput Requested Global Load Throughput 391.93GB/s 396.48GB/s 393.40GB/s | |
55 gst_requested_throughput Requested Global Store Throughput 314.51GB/s 318.16GB/s 315.69GB/s | |
55 gld_throughput Global Load Throughput 408.24GB/s 412.93GB/s 409.75GB/s | |
55 gst_throughput Global Store Throughput 344.71GB/s 348.71GB/s 346.00GB/s | |
55 local_memory_overhead Local Memory Overhead 13.60% 13.65% 13.63% | |
55 tex_cache_hit_rate Unified Cache Hit Rate 4.58% 4.61% 4.60% | |
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.59% 6.67% 6.65% | |
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 26.22% 26.25% 26.24% | |
55 dram_read_throughput Device Memory Read Throughput 396.99GB/s 401.40GB/s 398.32GB/s | |
55 dram_write_throughput Device Memory Write Throughput 332.12GB/s 397.35GB/s 337.25GB/s | |
55 tex_cache_throughput Unified cache to SM throughput 2730.0GB/s 2761.5GB/s 2740.1GB/s | |
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 392.36GB/s 396.92GB/s 393.86GB/s | |
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 344.71GB/s 348.71GB/s 346.00GB/s | |
55 l2_read_throughput L2 Throughput (Reads) 392.45GB/s 396.96GB/s 393.88GB/s | |
55 l2_write_throughput L2 Throughput (Writes) 363.26GB/s 429.92GB/s 371.25GB/s | |
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 sysmem_write_throughput System Memory Write Throughput 52.851KB/s 63.516KB/s 53.241KB/s | |
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_load_throughput Shared Memory Load Throughput 2922.9GB/s 2982.0GB/s 2947.7GB/s | |
55 shared_store_throughput Shared Memory Store Throughput 156.86GB/s 158.63GB/s 157.36GB/s | |
55 gld_efficiency Global Memory Load Efficiency 96.00% 96.02% 96.01% | |
55 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24% | |
55 tex_cache_transactions Unified cache to SM transactions 67696565 67712818 67700699 | |
55 flop_count_dp Floating Point Operations(Double Precision) 2688000000 2688000065 2688000001 | |
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 134400000 134400000 134400000 | |
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 1152000000 1152000025 1152000000 | |
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 249600000 249600015 249600000 | |
55 flop_count_sp Floating Point Operations(Single Precision) 19200000 19200000 19200000 | |
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 9600000 9600000 9600000 | |
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 19200000 19200005 19200000 | |
55 inst_executed Instructions Executed 170342400 359962845 246190102 | |
55 inst_issued Instructions Issued 170375507 170387140 170380878 | |
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9) | |
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.21% 0.40% 0.30% | |
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 5.08% 5.78% 5.39% | |
55 stall_memory_dependency Issue Stall Reasons (Data Request) 44.66% 48.88% 46.80% | |
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
55 stall_sync Issue Stall Reasons (Synchronization) 10.67% 12.36% 11.25% | |
55 stall_other Issue Stall Reasons (Other) 0.43% 0.48% 0.47% | |
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.01% 0.04% 0.02% | |
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 10.00% 11.42% 10.60% | |
55 shared_efficiency Shared Memory Efficiency 32.35% 32.79% 32.47% | |
55 inst_fp_32 FP Instructions(Single) 57600000 57600005 57600000 | |
55 inst_fp_64 FP Instructions(Double) 1536000000 1536000050 1536000000 | |
55 inst_integer Integer Instructions 1992960000 1992960045 1992960000 | |
55 inst_bit_convert Bit-Convert Instructions 0 0 0 | |
55 inst_control Control-Flow Instructions 132480000 132480025 132480000 | |
55 inst_compute_ld_st Load/Store Instructions 1203840000 1203840000 1203840000 | |
55 inst_misc Misc Instructions 305280000 305280015 305280000 | |
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
55 issue_slots Issue Slots 170375507 170387140 170380878 | |
55 cf_issued Issued Control-Flow Instructions 6988800 6988835 6988800 | |
55 cf_executed Executed Control-Flow Instructions 6988800 6988835 6988800 | |
55 ldst_issued Issued Load/Store Instructions 41164800 41164805 41164800 | |
55 ldst_executed Executed Load/Store Instructions 41164800 41164805 41164800 | |
55 atomic_transactions Atomic Transactions 0 0 0 | |
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 38920483 38937843 38925673 | |
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 22.79% 26.80% 23.94% | |
55 stall_not_selected Issue Stall Reasons (Not Selected) 1.19% 1.27% 1.24% | |
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 34195200 34195200 34195200 | |
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
55 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
55 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
55 nvlink_transmit_throughput NVLink Transmit Throughput 380.53KB/s 384.94KB/s 381.95KB/s | |
55 nvlink_receive_throughput NVLink Receive Throughput 285.40KB/s 288.71KB/s 286.46KB/s | |
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288 | |
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
55 inst_fp_16 HP Instructions(Half) 0 0 0 | |
55 ipc Executed IPC 0.403988 0.492832 0.469417 | |
55 issued_ipc Issued IPC 0.432154 0.478131 0.469992 | |
55 issue_slot_utilization Issue Slot Utilization 10.80% 11.95% 11.75% | |
55 sm_efficiency Multiprocessor Activity 99.71% 99.89% 99.77% | |
55 achieved_occupancy Achieved Occupancy 0.488048 0.490340 0.488675 | |
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.781870 0.872176 0.855160 | |
55 shared_utilization Shared Memory Utilization Low (1) Low (2) Low (1) | |
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1) | |
55 tex_utilization Unified Cache Utilization Low (2) Low (2) Low (2) | |
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (2) Low (2) Low (2) | |
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1) | |
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (2) Low (2) Low (2) | |
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.02% 0.04% 0.04% | |
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 4.72% 11.76% 11.18% | |
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_anonymous19_2 | |
1 inst_per_warp Instructions per warp 137.000000 137.000000 137.000000 | |
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00% | |
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00% | |
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 95.71% 95.71% 95.71% | |
1 inst_replay_overhead Instruction Replay Overhead 0.000345 0.000345 0.000345 | |
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
1 gld_transactions_per_request Global Load Transactions Per Request 0.000000 0.000000 0.000000 | |
1 gst_transactions_per_request Global Store Transactions Per Request 8.000000 8.000000 8.000000 | |
1 shared_store_transactions Shared Store Transactions 0 0 0 | |
1 shared_load_transactions Shared Load Transactions 0 0 0 | |
1 local_load_transactions Local Load Transactions 0 0 0 | |
1 local_store_transactions Local Store Transactions 0 0 0 | |
1 gld_transactions Global Load Transactions 0 0 0 | |
1 gst_transactions Global Store Transactions 14400000 14400000 14400000 | |
1 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
1 sysmem_write_transactions System Memory Write Transactions 5 5 5 | |
1 l2_read_transactions L2 Read Transactions 96 96 96 | |
1 l2_write_transactions L2 Write Transactions 14400042 14400042 14400042 | |
1 dram_read_transactions Device Memory Read Transactions 67 67 67 | |
1 dram_write_transactions Device Memory Write Transactions 14381944 14381944 14381944 | |
1 global_hit_rate Global Hit Rate in unified l1/tex 0.00% 0.00% 0.00% | |
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
1 gld_requested_throughput Requested Global Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 gst_requested_throughput Requested Global Store Throughput 831.58GB/s 831.58GB/s 831.58GB/s | |
1 gld_throughput Global Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 gst_throughput Global Store Throughput 831.58GB/s 831.58GB/s 831.58GB/s | |
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00% | |
1 tex_cache_hit_rate Unified Cache Hit Rate 0.00% 0.00% 0.00% | |
1 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 0.00% 0.00% 0.00% | |
1 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 0.00% 0.00% 0.00% | |
1 dram_read_throughput Device Memory Read Throughput 3.9620MB/s 3.9620MB/s 3.9620MB/s | |
1 dram_write_throughput Device Memory Write Throughput 830.53GB/s 830.53GB/s 830.53GB/s | |
1 tex_cache_throughput Unified cache to SM throughput 415.79GB/s 415.79GB/s 415.79GB/s | |
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 0.00000B/s 0.00000B/s 0.00000B/s | |
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 831.58GB/s 831.58GB/s 831.58GB/s | |
1 l2_read_throughput L2 Throughput (Reads) 5.6769MB/s 5.6769MB/s 5.6769MB/s | |
1 l2_write_throughput L2 Throughput (Writes) 831.58GB/s 831.58GB/s 831.58GB/s | |
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 sysmem_write_throughput System Memory Write Throughput 302.77KB/s 302.77KB/s 302.76KB/s | |
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 gld_efficiency Global Memory Load Efficiency 0.00% 0.00% 0.00% | |
1 gst_efficiency Global Memory Store Efficiency 100.00% 100.00% 100.00% | |
1 tex_cache_transactions Unified cache to SM transactions 1800000 1800000 1800000 | |
1 flop_count_dp Floating Point Operations(Double Precision) 0 0 0 | |
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 0 0 0 | |
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 0 0 0 | |
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 0 0 0 | |
1 flop_count_sp Floating Point Operations(Single Precision) 0 0 0 | |
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0 | |
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 0 0 0 | |
1 inst_executed Instructions Executed 246600000 246600000 246600000 | |
1 inst_issued Instructions Issued 54022714 54022714 54022714 | |
1 dram_utilization Device Memory Utilization Max (10) Max (10) Max (10) | |
1 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 1.05% 1.05% 1.05% | |
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 14.43% 14.43% 14.43% | |
1 stall_memory_dependency Issue Stall Reasons (Data Request) 0.00% 0.00% 0.00% | |
1 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00% | |
1 stall_other Issue Stall Reasons (Other) 0.72% 0.72% 0.72% | |
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.05% 0.05% 0.05% | |
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 23.67% 23.67% 23.67% | |
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
1 inst_fp_32 FP Instructions(Single) 0 0 0 | |
1 inst_fp_64 FP Instructions(Double) 0 0 0 | |
1 inst_integer Integer Instructions 1094400000 1094400000 1094400000 | |
1 inst_bit_convert Bit-Convert Instructions 0 0 0 | |
1 inst_control Control-Flow Instructions 57600000 57600000 57600000 | |
1 inst_compute_ld_st Load/Store Instructions 57600000 57600000 57600000 | |
1 inst_misc Misc Instructions 403200000 403200000 403200000 | |
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
1 issue_slots Issue Slots 54022714 54022714 54022714 | |
1 cf_issued Issued Control-Flow Instructions 5400000 5400000 5400000 | |
1 cf_executed Executed Control-Flow Instructions 5400000 5400000 5400000 | |
1 ldst_issued Issued Load/Store Instructions 5400000 5400000 5400000 | |
1 ldst_executed Executed Load/Store Instructions 5400000 5400000 5400000 | |
1 atomic_transactions Atomic Transactions 0 0 0 | |
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 0 0 0 | |
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 57.73% 57.73% 57.73% | |
1 stall_not_selected Issue Stall Reasons (Not Selected) 2.36% 2.36% 2.36% | |
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 14400000 14400000 14400000 | |
1 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
1 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
1 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
1 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
1 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
1 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
1 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
1 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
1 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
1 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
1 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
1 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
1 nvlink_transmit_throughput NVLink Transmit Throughput 2.1288MB/s 2.1288MB/s 2.1288MB/s | |
1 nvlink_receive_throughput NVLink Receive Throughput 1.5966MB/s 1.5966MB/s 1.5966MB/s | |
1 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288 | |
1 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
1 inst_fp_16 HP Instructions(Half) 0 0 0 | |
1 ipc Executed IPC 0.864052 0.864052 0.864052 | |
1 issued_ipc Issued IPC 0.864350 0.864350 0.864350 | |
1 issue_slot_utilization Issue Slot Utilization 21.61% 21.61% 21.61% | |
1 sm_efficiency Multiprocessor Activity 99.01% 99.01% 99.01% | |
1 achieved_occupancy Achieved Occupancy 0.798799 0.798799 0.798799 | |
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 1.875142 1.875142 1.875142 | |
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
1 l2_utilization L2 Cache Utilization Low (2) Low (2) Low (2) | |
1 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
1 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
1 special_fu_utilization Special Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (3) Low (3) Low (3) | |
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00% | |
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.00% 0.00% 0.00% | |
1 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
1 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
1 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
1 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
1 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_faceviscterms__8 | |
55 inst_per_warp Instructions per warp 4.2483e+03 4.2485e+03 4.2483e+03 | |
55 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00% | |
55 warp_execution_efficiency Warp Execution Efficiency 78.12% 78.12% 78.12% | |
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 76.13% 76.13% 76.13% | |
55 inst_replay_overhead Instruction Replay Overhead 0.000810 0.000912 0.000839 | |
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 gld_transactions_per_request Global Load Transactions Per Request 14.819581 14.820493 14.820037 | |
55 gst_transactions_per_request Global Store Transactions Per Request 14.000000 14.000000 14.000000 | |
55 shared_store_transactions Shared Store Transactions 0 0 0 | |
55 shared_load_transactions Shared Load Transactions 0 0 0 | |
55 local_load_transactions Local Load Transactions 0 0 0 | |
55 local_store_transactions Local Store Transactions 0 0 0 | |
55 gld_transactions Global Load Transactions 197672816 197684988 197678899 | |
55 gst_transactions Global Store Transactions 83865600 83865600 83865600 | |
55 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
55 sysmem_write_transactions System Memory Write Transactions 5 6 5 | |
55 l2_read_transactions L2 Read Transactions 156680398 156721484 156702202 | |
55 l2_write_transactions L2 Write Transactions 95473249 135327413 97980501 | |
55 dram_read_transactions Device Memory Read Transactions 204242200 210993489 204889957 | |
55 dram_write_transactions Device Memory Write Transactions 83858031 118426841 85473470 | |
55 global_hit_rate Global Hit Rate in unified l1/tex 37.72% 37.80% 37.76% | |
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
55 gld_requested_throughput Requested Global Load Throughput 149.93GB/s 154.94GB/s 151.37GB/s | |
55 gst_requested_throughput Requested Global Store Throughput 69.643GB/s 71.972GB/s 70.311GB/s | |
55 gld_throughput Global Load Throughput 367.71GB/s 379.99GB/s 371.23GB/s | |
55 gst_throughput Global Store Throughput 156.00GB/s 161.22GB/s 157.50GB/s | |
55 local_memory_overhead Local Memory Overhead 27.10% 27.21% 27.16% | |
55 tex_cache_hit_rate Unified Cache Hit Rate 15.16% 15.17% 15.16% | |
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 4.76% 5.44% 5.30% | |
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 76.41% 83.99% 83.00% | |
55 dram_read_throughput Device Memory Read Throughput 380.36GB/s 398.25GB/s 384.77GB/s | |
55 dram_write_throughput Device Memory Write Throughput 156.02GB/s 221.58GB/s 160.52GB/s | |
55 tex_cache_throughput Unified cache to SM throughput 339.31GB/s 350.64GB/s 342.54GB/s | |
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 291.45GB/s 301.23GB/s 294.28GB/s | |
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 156.00GB/s 161.22GB/s 157.50GB/s | |
55 l2_read_throughput L2 Throughput (Reads) 291.49GB/s 301.24GB/s 294.28GB/s | |
55 l2_write_throughput L2 Throughput (Writes) 177.75GB/s 252.38GB/s 184.00GB/s | |
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 sysmem_write_throughput System Memory Write Throughput 9.7520KB/s 11.723KB/s 9.8809KB/s | |
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 gld_efficiency Global Memory Load Efficiency 40.77% 40.78% 40.77% | |
55 gst_efficiency Global Memory Store Efficiency 44.64% 44.64% 44.64% | |
55 tex_cache_transactions Unified cache to SM transactions 45587424 45621133 45599946 | |
55 flop_count_dp Floating Point Operations(Double Precision) 1331763200 1331763824 1331763211 | |
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 126156800 126156800 126156800 | |
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 447948800 447949040 447948804 | |
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 309708800 309708944 309708802 | |
55 flop_count_sp Floating Point Operations(Single Precision) 22937600 22937600 22937600 | |
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 11468800 11468800 11468800 | |
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 34406400 34406448 34406400 | |
55 inst_executed Instructions Executed 146194432 326281952 237869460 | |
55 inst_issued Instructions Issued 146313068 146329688 146317066 | |
55 dram_utilization Device Memory Utilization High (7) High (8) High (7) | |
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.06% 0.08% 0.07% | |
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 0.68% 0.73% 0.72% | |
55 stall_memory_dependency Issue Stall Reasons (Data Request) 96.77% 97.27% 97.02% | |
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
55 stall_sync Issue Stall Reasons (Synchronization) 0.01% 0.01% 0.01% | |
55 stall_other Issue Stall Reasons (Other) 0.04% 0.04% 0.04% | |
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.00% 0.01% 0.01% | |
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.08% 0.09% 0.09% | |
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
55 inst_fp_32 FP Instructions(Single) 91750400 91750448 91750400 | |
55 inst_fp_64 FP Instructions(Double) 883814400 883814880 883814408 | |
55 inst_integer Integer Instructions 1818393600 1818394128 1818393609 | |
55 inst_bit_convert Bit-Convert Instructions 0 0 0 | |
55 inst_control Control-Flow Instructions 143590400 143590640 143590404 | |
55 inst_compute_ld_st Load/Store Instructions 483225600 483225600 483225600 | |
55 inst_misc Misc Instructions 188057600 188057696 188057601 | |
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
55 issue_slots Issue Slots 146313068 146329688 146317066 | |
55 cf_issued Issued Control-Flow Instructions 7123968 7124304 7123974 | |
55 cf_executed Executed Control-Flow Instructions 7123968 7124304 7123974 | |
55 ldst_issued Issued Load/Store Instructions 20173824 20173872 20173824 | |
55 ldst_executed Executed Load/Store Instructions 20173824 20173872 20173824 | |
55 atomic_transactions Atomic Transactions 0 0 0 | |
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 156680451 156720029 156702577 | |
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 1.78% 2.21% 1.97% | |
55 stall_not_selected Issue Stall Reasons (Not Selected) 0.07% 0.08% 0.08% | |
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 83865600 83865600 83865600 | |
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1536 1158 | |
55 nvlink_total_data_received NVLink Total Data Received 864 1152 869 | |
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
55 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
55 nvlink_transmit_throughput NVLink Transmit Throughput 70.217KB/s 94.164KB/s 71.319KB/s | |
55 nvlink_receive_throughput NVLink Receive Throughput 52.662KB/s 70.623KB/s 53.489KB/s | |
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 291 | |
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
55 inst_fp_16 HP Instructions(Half) 0 0 0 | |
55 ipc Executed IPC 0.069900 0.146818 0.106376 | |
55 issued_ipc Issued IPC 0.069839 0.075799 0.074879 | |
55 issue_slot_utilization Issue Slot Utilization 1.75% 1.89% 1.87% | |
55 sm_efficiency Multiprocessor Activity 99.67% 99.91% 99.77% | |
55 achieved_occupancy Achieved Occupancy 0.369551 0.370181 0.369719 | |
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.086572 0.094097 0.092725 | |
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
55 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1) | |
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1) | |
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.01% 0.01% 0.01% | |
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.82% 1.07% 1.04% | |
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_volumerhs__9 | |
55 inst_per_warp Instructions per warp 3.0973e+03 3.0973e+03 3.0973e+03 | |
55 branch_efficiency Branch Efficiency 99.25% 99.25% 99.25% | |
55 warp_execution_efficiency Warp Execution Efficiency 97.42% 97.42% 97.42% | |
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 94.33% 94.33% 94.33% | |
55 inst_replay_overhead Instruction Replay Overhead 0.000128 0.000144 0.000135 | |
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 2.551209 2.623022 2.589007 | |
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 2.060532 2.078258 2.069457 | |
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
55 gld_transactions_per_request Global Load Transactions Per Request 8.278712 8.280053 8.279455 | |
55 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500 | |
55 shared_store_transactions Shared Store Transactions 11552168 11651547 11602203 | |
55 shared_load_transactions Shared Load Transactions 141071636 145042598 143161714 | |
55 local_load_transactions Local Load Transactions 0 0 0 | |
55 local_store_transactions Local Store Transactions 0 0 0 | |
55 gld_transactions Global Load Transactions 117623940 117642992 117634499 | |
55 gst_transactions Global Store Transactions 15782400 15782400 15782400 | |
55 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
55 sysmem_write_transactions System Memory Write Transactions 5 6 5 | |
55 l2_read_transactions L2 Read Transactions 105764705 105811683 105791451 | |
55 l2_write_transactions L2 Write Transactions 16132844 24093821 16863459 | |
55 dram_read_transactions Device Memory Read Transactions 98968200 99165170 98983348 | |
55 dram_write_transactions Device Memory Write Transactions 14771447 22709177 15065175 | |
55 global_hit_rate Global Hit Rate in unified l1/tex 13.71% 13.78% 13.75% | |
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
55 gld_requested_throughput Requested Global Load Throughput 694.81GB/s 704.11GB/s 699.34GB/s | |
55 gst_requested_throughput Requested Global Store Throughput 90.236GB/s 91.443GB/s 90.824GB/s | |
55 gld_throughput Global Load Throughput 737.15GB/s 747.03GB/s 741.95GB/s | |
55 gst_throughput Global Store Throughput 98.898GB/s 100.22GB/s 99.543GB/s | |
55 local_memory_overhead Local Memory Overhead 5.30% 5.38% 5.34% | |
55 tex_cache_hit_rate Unified Cache Hit Rate 11.49% 11.53% 11.51% | |
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 12.19% 12.38% 12.34% | |
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 28.86% 30.14% 29.82% | |
55 dram_read_throughput Device Memory Read Throughput 620.24GB/s 628.55GB/s 624.31GB/s | |
55 dram_write_throughput Device Memory Write Throughput 92.615GB/s 142.94GB/s 95.019GB/s | |
55 tex_cache_throughput Unified cache to SM throughput 3618.8GB/s 3667.2GB/s 3642.4GB/s | |
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 662.85GB/s 671.76GB/s 667.22GB/s | |
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 98.898GB/s 100.22GB/s 99.543GB/s | |
55 l2_read_throughput L2 Throughput (Reads) 662.91GB/s 671.84GB/s 667.25GB/s | |
55 l2_write_throughput L2 Throughput (Writes) 101.13GB/s 151.39GB/s 106.36GB/s | |
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 sysmem_write_throughput System Memory Write Throughput 32.854KB/s 39.922KB/s 33.548KB/s | |
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
55 shared_load_throughput Shared Memory Load Throughput 3558.5GB/s 3679.2GB/s 3611.8GB/s | |
55 shared_store_throughput Shared Memory Store Throughput 290.21GB/s 295.79GB/s 292.71GB/s | |
55 gld_efficiency Global Memory Load Efficiency 94.25% 94.27% 94.26% | |
55 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24% | |
55 tex_cache_transactions Unified cache to SM transactions 144366492 144381582 144374878 | |
55 flop_count_dp Floating Point Operations(Double Precision) 5700777984 5700777984 5700777984 | |
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 149393408 149393408 149393408 | |
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 2028079104 2028079104 2028079104 | |
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 1495226368 1495226368 1495226368 | |
55 flop_count_sp Floating Point Operations(Single Precision) 36026368 36026368 36026368 | |
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 9600000 9600000 9600000 | |
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 16826368 16826368 16826368 | |
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 54039552 54039552 54039552 | |
55 inst_executed Instructions Executed 390794240 951492608 655851650 | |
55 inst_issued Instructions Issued 390844085 390853161 390847224 | |
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9) | |
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 1.96% 2.34% 2.18% | |
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 14.21% 15.27% 14.85% | |
55 stall_memory_dependency Issue Stall Reasons (Data Request) 39.86% 42.88% 40.73% | |
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
55 stall_sync Issue Stall Reasons (Synchronization) 13.02% 14.81% 13.69% | |
55 stall_other Issue Stall Reasons (Other) 0.72% 0.76% 0.75% | |
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.02% 0.07% 0.03% | |
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 15.68% 17.29% 16.30% | |
55 shared_efficiency Shared Memory Efficiency 26.73% 27.44% 27.06% | |
55 inst_fp_32 FP Instructions(Single) 156163072 156163072 156163072 | |
55 inst_fp_64 FP Instructions(Double) 3701498880 3701498880 3701498880 | |
55 inst_integer Integer Instructions 4432111616 4432111616 4432111616 | |
55 inst_bit_convert Bit-Convert Instructions 33652736 33652736 33652736 | |
55 inst_control Control-Flow Instructions 458880000 458880000 458880000 | |
55 inst_compute_ld_st Load/Store Instructions 2413440000 2413440000 2413440000 | |
55 inst_misc Misc Instructions 688093184 688093184 688093184 | |
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
55 issue_slots Issue Slots 390844085 390853161 390847224 | |
55 cf_issued Issued Control-Flow Instructions 23408640 23408640 23408640 | |
55 cf_executed Executed Control-Flow Instructions 23408640 23408640 23408640 | |
55 ldst_issued Issued Load/Store Instructions 81408000 81408000 81408000 | |
55 ldst_executed Executed Load/Store Instructions 81408000 81408000 81408000 | |
55 atomic_transactions Atomic Transactions 0 0 0 | |
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 105760813 105812749 105787169 | |
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 9.28% 10.79% 9.76% | |
55 stall_not_selected Issue Stall Reasons (Not Selected) 1.62% 1.71% 1.69% | |
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 15782400 15782400 15782400 | |
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
55 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
55 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
55 nvlink_transmit_throughput NVLink Transmit Throughput 236.55KB/s 239.71KB/s 238.09KB/s | |
55 nvlink_receive_throughput NVLink Receive Throughput 177.41KB/s 179.78KB/s 178.57KB/s | |
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288 | |
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
55 inst_fp_16 HP Instructions(Half) 0 0 0 | |
55 ipc Executed IPC 0.571525 0.738231 0.662514 | |
55 issued_ipc Issued IPC 0.633453 0.682593 0.675725 | |
55 issue_slot_utilization Issue Slot Utilization 15.84% 17.06% 16.89% | |
55 sm_efficiency Multiprocessor Activity 99.65% 99.91% 99.85% | |
55 achieved_occupancy Achieved Occupancy 0.247745 0.247841 0.247791 | |
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.873599 0.941228 0.931401 | |
55 shared_utilization Shared Memory Utilization Low (1) Low (2) Low (1) | |
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1) | |
55 tex_utilization Unified Cache Utilization Low (3) Low (3) Low (3) | |
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (2) Low (2) Low (2) | |
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1) | |
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (2) Low (3) Low (2) | |
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.02% 0.05% 0.05% | |
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 5.70% 15.52% 14.83% | |
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_knl_reverse_indefinite_stack_integral__4 | |
56 inst_per_warp Instructions per warp 2.3300e+03 2.3300e+03 2.3300e+03 | |
56 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00% | |
56 warp_execution_efficiency Warp Execution Efficiency 78.12% 78.12% 78.12% | |
56 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 77.18% 77.18% 77.18% | |
56 inst_replay_overhead Instruction Replay Overhead 0.002229 0.005157 0.003383 | |
56 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
56 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
56 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
56 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
56 gld_transactions_per_request Global Load Transactions Per Request 6.640783 6.641474 6.641229 | |
56 gst_transactions_per_request Global Store Transactions Per Request 7.000000 7.000000 7.000000 | |
56 shared_store_transactions Shared Store Transactions 0 0 0 | |
56 shared_load_transactions Shared Load Transactions 0 0 0 | |
56 local_load_transactions Local Load Transactions 0 0 0 | |
56 local_store_transactions Local Store Transactions 0 0 0 | |
56 gld_transactions Global Load Transactions 2556861 2557127 2557032 | |
56 gst_transactions Global Store Transactions 2688000 2688000 2688000 | |
56 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
56 sysmem_write_transactions System Memory Write Transactions 5 5 5 | |
56 l2_read_transactions L2 Read Transactions 2469356 2471389 2470013 | |
56 l2_write_transactions L2 Write Transactions 2911900 3747163 3026850 | |
56 dram_read_transactions Device Memory Read Transactions 2473053 2476371 2473825 | |
56 dram_write_transactions Device Memory Write Transactions 2616351 3431287 2709292 | |
56 global_hit_rate Global Hit Rate in unified l1/tex 18.61% 18.64% 18.62% | |
56 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
56 gld_requested_throughput Requested Global Load Throughput 221.00GB/s 224.94GB/s 222.62GB/s | |
56 gst_requested_throughput Requested Global Store Throughput 220.41GB/s 224.34GB/s 222.03GB/s | |
56 gld_throughput Global Load Throughput 234.84GB/s 239.02GB/s 236.55GB/s | |
56 gst_throughput Global Store Throughput 246.86GB/s 251.26GB/s 248.67GB/s | |
56 local_memory_overhead Local Memory Overhead 17.22% 17.26% 17.24% | |
56 tex_cache_hit_rate Unified Cache Hit Rate 4.19% 4.20% 4.19% | |
56 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.90% 6.94% 6.92% | |
56 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 28.57% 28.57% 28.57% | |
56 dram_read_throughput Device Memory Read Throughput 227.18GB/s 231.22GB/s 228.85GB/s | |
56 dram_write_throughput Device Memory Write Throughput 241.23GB/s 318.21GB/s 250.64GB/s | |
56 tex_cache_throughput Unified cache to SM throughput 338.51GB/s 344.54GB/s 340.99GB/s | |
56 l2_tex_read_throughput L2 Throughput (Texture Reads) 226.79GB/s 230.82GB/s 228.45GB/s | |
56 l2_tex_write_throughput L2 Throughput (Texture Writes) 246.86GB/s 251.26GB/s 248.67GB/s | |
56 l2_read_throughput L2 Throughput (Reads) 226.79GB/s 230.95GB/s 228.50GB/s | |
56 l2_write_throughput L2 Throughput (Writes) 267.45GB/s 348.88GB/s 280.02GB/s | |
56 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 sysmem_write_throughput System Memory Write Throughput 481.49KB/s 490.07KB/s 485.02KB/s | |
56 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 gld_efficiency Global Memory Load Efficiency 94.11% 94.12% 94.11% | |
56 gst_efficiency Global Memory Store Efficiency 89.29% 89.29% 89.29% | |
56 tex_cache_transactions Unified cache to SM transactions 921370 921595 921482 | |
56 flop_count_dp Floating Point Operations(Double Precision) 9600000 9600000 9600000 | |
56 flop_count_dp_add Floating Point Operations(Double Precision Add) 9600000 9600000 9600000 | |
56 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 0 0 0 | |
56 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 0 0 0 | |
56 flop_count_sp Floating Point Operations(Single Precision) 0 0 0 | |
56 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
56 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0 | |
56 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
56 flop_count_sp_special Floating Point Operations(Single Precision Special) 0 0 0 | |
56 inst_executed Instructions Executed 1730560 2385920 2034834 | |
56 inst_issued Instructions Issued 1734417 1739485 1736259 | |
56 dram_utilization Device Memory Utilization Mid (6) High (7) Mid (6) | |
56 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
56 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.01% 0.04% 0.02% | |
56 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 1.09% 1.21% 1.18% | |
56 stall_memory_dependency Issue Stall Reasons (Data Request) 98.45% 98.70% 98.54% | |
56 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
56 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00% | |
56 stall_other Issue Stall Reasons (Other) 0.01% 0.01% 0.01% | |
56 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.04% 0.18% 0.10% | |
56 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.00% 0.02% 0.01% | |
56 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
56 inst_fp_32 FP Instructions(Single) 0 0 0 | |
56 inst_fp_64 FP Instructions(Double) 9600000 9600000 9600000 | |
56 inst_integer Integer Instructions 13900800 13900800 13900800 | |
56 inst_bit_convert Bit-Convert Instructions 0 0 0 | |
56 inst_control Control-Flow Instructions 204800 204800 204800 | |
56 inst_compute_ld_st Load/Store Instructions 19225600 19225600 19225600 | |
56 inst_misc Misc Instructions 204800 204800 204800 | |
56 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
56 issue_slots Issue Slots 1734417 1739485 1736259 | |
56 cf_issued Issued Control-Flow Instructions 13312 13312 13312 | |
56 cf_executed Executed Control-Flow Instructions 13312 13312 13312 | |
56 ldst_issued Issued Load/Store Instructions 772096 772096 772096 | |
56 ldst_executed Executed Load/Store Instructions 772096 772096 772096 | |
56 atomic_transactions Atomic Transactions 0 0 0 | |
56 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
56 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
56 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
56 l2_tex_read_transactions L2 Transactions (Texture Reads) 2469232 2469614 2469428 | |
56 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.10% 0.13% 0.11% | |
56 stall_not_selected Issue Stall Reasons (Not Selected) 0.02% 0.03% 0.02% | |
56 l2_tex_write_transactions L2 Transactions (Texture Writes) 2688000 2688000 2688000 | |
56 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
56 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
56 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
56 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
56 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
56 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
56 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
56 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
56 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
56 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
56 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
56 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
56 nvlink_transmit_throughput NVLink Transmit Throughput 3.3855MB/s 3.4458MB/s 3.4103MB/s | |
56 nvlink_receive_throughput NVLink Receive Throughput 2.5391MB/s 2.5844MB/s 2.5577MB/s | |
56 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 289 | |
56 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
56 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
56 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
56 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
56 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
56 inst_fp_16 HP Instructions(Half) 0 0 0 | |
56 ipc Executed IPC 0.041735 0.057263 0.050605 | |
56 issued_ipc Issued IPC 0.041797 0.046456 0.045358 | |
56 issue_slot_utilization Issue Slot Utilization 1.04% 1.16% 1.13% | |
56 sm_efficiency Multiprocessor Activity 92.21% 97.30% 95.95% | |
56 achieved_occupancy Achieved Occupancy 0.197203 0.197455 0.197328 | |
56 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.044212 0.050012 0.048370 | |
56 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
56 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1) | |
56 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
56 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
56 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
56 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
56 special_fu_utilization Special Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
56 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
56 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
56 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
56 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
56 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00% | |
56 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.31% 0.39% 0.37% | |
56 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
56 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
56 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
56 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
56 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_initauxstate__1 | |
1 inst_per_warp Instructions per warp 340.790000 340.790000 340.790000 | |
1 branch_efficiency Branch Efficiency 99.92% 99.92% 99.92% | |
1 warp_execution_efficiency Warp Execution Efficiency 97.61% 97.61% 97.61% | |
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 94.36% 94.36% 94.36% | |
1 inst_replay_overhead Instruction Replay Overhead 0.000505 0.000505 0.000505 | |
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
1 gld_transactions_per_request Global Load Transactions Per Request 8.230789 8.230789 8.230789 | |
1 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500 | |
1 shared_store_transactions Shared Store Transactions 0 0 0 | |
1 shared_load_transactions Shared Load Transactions 0 0 0 | |
1 local_load_transactions Local Load Transactions 0 0 0 | |
1 local_store_transactions Local Store Transactions 0 0 0 | |
1 gld_transactions Global Load Transactions 17699488 17699488 17699488 | |
1 gst_transactions Global Store Transactions 21043200 21043200 21043200 | |
1 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
1 sysmem_write_transactions System Memory Write Transactions 5 5 5 | |
1 l2_read_transactions L2 Read Transactions 16914451 16914451 16914451 | |
1 l2_write_transactions L2 Write Transactions 21870131 21870131 21870131 | |
1 dram_read_transactions Device Memory Read Transactions 17061180 17061180 17061180 | |
1 dram_write_transactions Device Memory Write Transactions 19995184 19995184 19995184 | |
1 global_hit_rate Global Hit Rate in unified l1/tex 41.03% 41.03% 41.03% | |
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
1 gld_requested_throughput Requested Global Load Throughput 332.35GB/s 332.35GB/s 332.35GB/s | |
1 gst_requested_throughput Requested Global Store Throughput 379.83GB/s 379.83GB/s 379.83GB/s | |
1 gld_throughput Global Load Throughput 350.15GB/s 350.15GB/s 350.15GB/s | |
1 gst_throughput Global Store Throughput 416.30GB/s 416.30GB/s 416.30GB/s | |
1 local_memory_overhead Local Memory Overhead 39.77% 39.77% 39.77% | |
1 tex_cache_hit_rate Unified Cache Hit Rate 4.08% 4.08% 4.08% | |
1 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.81% 6.81% 6.81% | |
1 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 53.96% 53.96% 53.96% | |
1 dram_read_throughput Device Memory Read Throughput 337.52GB/s 337.52GB/s 337.52GB/s | |
1 dram_write_throughput Device Memory Write Throughput 395.56GB/s 395.56GB/s 395.56GB/s | |
1 tex_cache_throughput Unified cache to SM throughput 409.10GB/s 409.10GB/s 409.10GB/s | |
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 334.61GB/s 334.61GB/s 334.61GB/s | |
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 416.30GB/s 416.30GB/s 416.30GB/s | |
1 l2_read_throughput L2 Throughput (Reads) 334.62GB/s 334.62GB/s 334.62GB/s | |
1 l2_write_throughput L2 Throughput (Writes) 432.66GB/s 432.66GB/s 432.66GB/s | |
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 sysmem_write_throughput System Memory Write Throughput 103.72KB/s 103.72KB/s 103.72KB/s | |
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
1 gld_efficiency Global Memory Load Efficiency 94.92% 94.92% 94.92% | |
1 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24% | |
1 tex_cache_transactions Unified cache to SM transactions 5169878 5169878 5169878 | |
1 flop_count_dp Floating Point Operations(Double Precision) 76108800 76108800 76108800 | |
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 22118400 22118400 22118400 | |
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 21888000 21888000 21888000 | |
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 10214400 10214400 10214400 | |
1 flop_count_sp Floating Point Operations(Single Precision) 2918400 2918400 2918400 | |
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 1459200 1459200 1459200 | |
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 1459200 1459200 1459200 | |
1 inst_executed Instructions Executed 104690688 104690688 104690688 | |
1 inst_issued Instructions Issued 36424653 36424653 36424653 | |
1 dram_utilization Device Memory Utilization High (9) High (9) High (9) | |
1 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.16% 0.16% 0.16% | |
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 2.04% 2.04% 2.04% | |
1 stall_memory_dependency Issue Stall Reasons (Data Request) 22.26% 22.26% 22.26% | |
1 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00% | |
1 stall_other Issue Stall Reasons (Other) 0.11% 0.11% 0.11% | |
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.04% 0.04% 0.04% | |
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.12% 0.12% 0.12% | |
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00% | |
1 inst_fp_32 FP Instructions(Single) 52992000 52992000 52992000 | |
1 inst_fp_64 FP Instructions(Double) 84480000 84480000 84480000 | |
1 inst_integer Integer Instructions 663628800 663628800 663628800 | |
1 inst_bit_convert Bit-Convert Instructions 4377600 4377600 4377600 | |
1 inst_control Control-Flow Instructions 41318400 41318400 41318400 | |
1 inst_compute_ld_st Load/Store Instructions 152755200 152755200 152755200 | |
1 inst_misc Misc Instructions 105600000 105600000 105600000 | |
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
1 issue_slots Issue Slots 36424653 36424653 36424653 | |
1 cf_issued Issued Control-Flow Instructions 2575360 2575360 2575360 | |
1 cf_executed Executed Control-Flow Instructions 2575360 2575360 2575360 | |
1 ldst_issued Issued Load/Store Instructions 5222400 5222400 5222400 | |
1 ldst_executed Executed Load/Store Instructions 5222400 5222400 5222400 | |
1 atomic_transactions Atomic Transactions 0 0 0 | |
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 16914169 16914169 16914169 | |
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 74.77% 74.77% 74.77% | |
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.50% 0.50% 0.50% | |
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 21043200 21043200 21043200 | |
1 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
1 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
1 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
1 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
1 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
1 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
1 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
1 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
1 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
1 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
1 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
1 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
1 nvlink_transmit_throughput NVLink Transmit Throughput 746.78KB/s 746.78KB/s 746.78KB/s | |
1 nvlink_receive_throughput NVLink Receive Throughput 560.09KB/s 560.09KB/s 560.09KB/s | |
1 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288 | |
1 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
1 inst_fp_16 HP Instructions(Half) 0 0 0 | |
1 ipc Executed IPC 0.210375 0.210375 0.210375 | |
1 issued_ipc Issued IPC 0.231426 0.231426 0.231426 | |
1 issue_slot_utilization Issue Slot Utilization 5.79% 5.79% 5.79% | |
1 sm_efficiency Multiprocessor Activity 99.51% 99.51% 99.51% | |
1 achieved_occupancy Achieved Occupancy 0.519094 0.519094 0.519094 | |
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.412945 0.412945 0.412945 | |
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0) | |
1 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1) | |
1 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
1 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1) | |
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.01% 0.01% 0.01% | |
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.35% 0.35% 0.35% | |
1 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
1 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
1 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
1 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
1 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
Kernel: ptxcall_knl_indefinite_stack_integral__3 | |
56 inst_per_warp Instructions per warp 1.4196e+04 1.4196e+04 1.4196e+04 | |
56 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00% | |
56 warp_execution_efficiency Warp Execution Efficiency 78.12% 78.12% 78.12% | |
56 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 76.59% 76.59% 76.59% | |
56 inst_replay_overhead Instruction Replay Overhead 0.001369 0.001759 0.001572 | |
56 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 1.972167 1.981350 1.976729 | |
56 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 2.006250 2.021094 2.010676 | |
56 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000 | |
56 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000 | |
56 gld_transactions_per_request Global Load Transactions Per Request 6.533448 6.533963 6.533725 | |
56 gst_transactions_per_request Global Store Transactions Per Request 7.000000 7.000000 7.000000 | |
56 shared_store_transactions Shared Store Transactions 10272 10348 10294 | |
56 shared_load_transactions Shared Load Transactions 1969012 1978180 1973566 | |
56 local_load_transactions Local Load Transactions 0 0 0 | |
56 local_store_transactions Local Store Transactions 0 0 0 | |
56 gld_transactions Global Load Transactions 7559983 7560579 7560304 | |
56 gst_transactions Global Store Transactions 2688000 2688000 2688000 | |
56 sysmem_read_transactions System Memory Read Transactions 0 0 0 | |
56 sysmem_write_transactions System Memory Write Transactions 5 6 5 | |
56 l2_read_transactions L2 Read Transactions 7384278 7391794 7387169 | |
56 l2_write_transactions L2 Write Transactions 2872413 3760617 2962915 | |
56 dram_read_transactions Device Memory Read Transactions 7831373 7833632 7832140 | |
56 dram_write_transactions Device Memory Write Transactions 2537417 3394663 2648592 | |
56 global_hit_rate Global Hit Rate in unified l1/tex 13.47% 13.48% 13.48% | |
56 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00% | |
56 gld_requested_throughput Requested Global Load Throughput 425.73GB/s 431.69GB/s 429.07GB/s | |
56 gst_requested_throughput Requested Global Store Throughput 141.78GB/s 143.77GB/s 142.90GB/s | |
56 gld_throughput Global Load Throughput 446.64GB/s 452.90GB/s 450.14GB/s | |
56 gst_throughput Global Store Throughput 158.80GB/s 161.02GB/s 160.04GB/s | |
56 local_memory_overhead Local Memory Overhead 11.96% 11.97% 11.96% | |
56 tex_cache_hit_rate Unified Cache Hit Rate 6.41% 6.42% 6.41% | |
56 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.39% 6.40% 6.39% | |
56 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 25.71% 25.71% 25.71% | |
56 dram_read_throughput Device Memory Read Throughput 462.72GB/s 469.21GB/s 466.33GB/s | |
56 dram_write_throughput Device Memory Write Throughput 150.04GB/s 202.05GB/s 157.70GB/s | |
56 tex_cache_throughput Unified cache to SM throughput 1059.0GB/s 1073.8GB/s 1067.3GB/s | |
56 l2_tex_read_throughput L2 Throughput (Texture Reads) 436.24GB/s 442.37GB/s 439.66GB/s | |
56 l2_tex_write_throughput L2 Throughput (Texture Writes) 158.80GB/s 161.02GB/s 160.04GB/s | |
56 l2_read_throughput L2 Throughput (Reads) 436.23GB/s 442.37GB/s 439.83GB/s | |
56 l2_write_throughput L2 Throughput (Writes) 170.65GB/s 223.56GB/s 176.41GB/s | |
56 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 sysmem_write_throughput System Memory Write Throughput 309.73KB/s 374.94KB/s 313.27KB/s | |
56 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s | |
56 shared_load_throughput Shared Memory Load Throughput 466.37GB/s 473.74GB/s 470.02GB/s | |
56 shared_store_throughput Shared Memory Store Throughput 2.4311GB/s 2.4726GB/s 2.4518GB/s | |
56 gld_efficiency Global Memory Load Efficiency 95.32% 95.32% 95.32% | |
56 gst_efficiency Global Memory Store Efficiency 89.29% 89.29% 89.29% | |
56 tex_cache_transactions Unified cache to SM transactions 4481199 4481734 4481505 | |
56 flop_count_dp Floating Point Operations(Double Precision) 124800000 124800000 124800000 | |
56 flop_count_dp_add Floating Point Operations(Double Precision Add) 0 0 0 | |
56 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 48000000 48000000 48000000 | |
56 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 28800000 28800000 28800000 | |
56 flop_count_sp Floating Point Operations(Single Precision) 0 0 0 | |
56 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0 | |
56 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0 | |
56 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0 | |
56 flop_count_sp_special Floating Point Operations(Single Precision Special) 0 0 0 | |
56 inst_executed Instructions Executed 9183232 14536704 11381979 | |
56 inst_issued Instructions Issued 9195807 9199313 9197578 | |
56 dram_utilization Device Memory Utilization High (8) High (8) High (8) | |
56 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1) | |
56 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.23% 1.53% 0.84% | |
56 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 7.12% 8.90% 7.60% | |
56 stall_memory_dependency Issue Stall Reasons (Data Request) 87.51% 90.75% 89.64% | |
56 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00% | |
56 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.01% 0.01% | |
56 stall_other Issue Stall Reasons (Other) 0.23% 0.29% 0.25% | |
56 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.14% 0.29% 0.21% | |
56 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.19% 0.24% 0.21% | |
56 shared_efficiency Shared Memory Efficiency 6.12% 6.14% 6.13% | |
56 inst_fp_32 FP Instructions(Single) 0 0 0 | |
56 inst_fp_64 FP Instructions(Double) 76800000 76800000 76800000 | |
56 inst_integer Integer Instructions 70988800 70988800 70988800 | |
56 inst_bit_convert Bit-Convert Instructions 0 0 0 | |
56 inst_control Control-Flow Instructions 4147200 4147200 4147200 | |
56 inst_compute_ld_st Load/Store Instructions 63616000 63616000 63616000 | |
56 inst_misc Misc Instructions 11648000 11648000 11648000 | |
56 inst_inter_thread_communication Inter-Thread Instructions 0 0 0 | |
56 issue_slots Issue Slots 9195807 9199313 9197578 | |
56 cf_issued Issued Control-Flow Instructions 256000 256000 256000 | |
56 cf_executed Executed Control-Flow Instructions 256000 256000 256000 | |
56 ldst_issued Issued Load/Store Instructions 2555904 2555904 2555904 | |
56 ldst_executed Executed Load/Store Instructions 2555904 2555904 2555904 | |
56 atomic_transactions Atomic Transactions 0 0 0 | |
56 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000 | |
56 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s | |
56 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0 | |
56 l2_tex_read_transactions L2 Transactions (Texture Reads) 7383769 7384688 7384316 | |
56 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.99% 1.20% 1.10% | |
56 stall_not_selected Issue Stall Reasons (Not Selected) 0.13% 0.17% 0.14% | |
56 l2_tex_write_transactions L2 Transactions (Texture Writes) 2688000 2688000 2688000 | |
56 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152 | |
56 nvlink_total_data_received NVLink Total Data Received 864 864 864 | |
56 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0 | |
56 nvlink_user_data_received NVLink User Data Received 0 0 0 | |
56 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00% | |
56 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00% | |
56 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0 | |
56 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0 | |
56 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0 | |
56 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0 | |
56 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0 | |
56 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0 | |
56 nvlink_transmit_throughput NVLink Transmit Throughput 2.1778MB/s 2.2083MB/s 2.1949MB/s | |
56 nvlink_receive_throughput NVLink Receive Throughput 1.6333MB/s 1.6562MB/s 1.6462MB/s | |
56 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288 | |
56 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0 | |
56 flop_count_hp Floating Point Operations(Half Precision) 0 0 0 | |
56 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0 | |
56 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0 | |
56 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0 | |
56 inst_fp_16 HP Instructions(Half) 0 0 0 | |
56 ipc Executed IPC 0.156469 0.255796 0.214442 | |
56 issued_ipc Issued IPC 0.159007 0.175934 0.169374 | |
56 issue_slot_utilization Issue Slot Utilization 3.98% 4.40% 4.23% | |
56 sm_efficiency Multiprocessor Activity 86.37% 91.45% 88.69% | |
56 achieved_occupancy Achieved Occupancy 0.109154 0.117179 0.113866 | |
56 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.166729 0.207185 0.179912 | |
56 shared_utilization Shared Memory Utilization Low (1) Low (1) Low (1) | |
56 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1) | |
56 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1) | |
56 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1) | |
56 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1) | |
56 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
56 special_fu_utilization Special Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
56 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0) | |
56 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
56 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1) | |
56 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00% | |
56 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00% | |
56 flop_dp_efficiency FLOP Efficiency(Peak Double) 2.81% 3.40% 3.15% | |
56 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0) | |
56 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1) | |
56 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00% | |
56 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00% | |
56 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00% | |
==23586== NVPROF is profiling process 23586, command: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl | |
[ Info: ------------------------------------------------------ | |
[ Info: ______ _ _____ __ ________ | |
[ Info: | ____| | |_ _| ... | __ | | |
[ Info: | | | | | | | . | | | | | |
[ Info: | | | | | | | | | | |__| | | |
[ Info: | |____| |____ _| |_| | | | | | | | |
[ Info: | _____|______|_____|_| |_|_| |_| | |
[ Info: | |
[ Info: ------------------------------------------------------ | |
[ Info: Dycoms | |
[ Info: Resolution: | |
[ Info: (Δx, Δy, Δz) = (3.00e+01, 3.00e+01, 5.00e+00) | |
[ Info: (Nex, Ney, Nez) = (32, 32, 75) | |
[ Info: DoF = 57600000 | |
[ Info: Minimum necessary memory to run this test: 3.84 GBs | |
[ Info: Time step dt: 2.50e-03 | |
[ Info: End time t : 2.50e-02 | |
[ Info: ------------------------------------------------------ | |
┌ Info: Starting... | |
└ norm(Q) = 5.5625443922177753e+09 | |
┌ Info: Update | |
│ simtime = 2.5000000000000001e-03 | |
└ runtime = 00:00:15 | |
┌ Info: Finished... | |
└ norm(Q) = 5.5624841917912407e+09 | |
───────────────────────────────────────────────────────────────────────────── | |
Time Allocations | |
────────────────────── ─────────────────────── | |
Tot / % measured: 220s / 83.4% 29.9GiB / 87.0% | |
Section ncalls time %tot avg alloc %tot avg | |
───────────────────────────────────────────────────────────────────────────── | |
IC init 1 116s 63.4% 116s 12.5GiB 47.9% 12.5GiB | |
Grid init 1 20.2s 11.0% 20.2s 6.63GiB 25.5% 6.63GiB | |
solve 1 15.4s 8.41% 15.4s 1.76GiB 6.79% 1.76GiB | |
Space Disc init 1 13.2s 7.19% 13.2s 2.09GiB 8.05% 2.09GiB | |
Topo init 1 12.2s 6.65% 12.2s 1.30GiB 4.99% 1.30GiB | |
initial integral 1 4.36s 2.38% 4.36s 507MiB 1.91% 507MiB | |
Time stepping init 1 1.72s 0.94% 1.72s 1.25GiB 4.82% 1.25GiB | |
─────────────────────────────────────────────────────────────────────────────
==23586== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl | |
==23586== Profiling result: | |
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput SrcMemType DstMemType Device Context Stream Name | |
us us KB B GB GB/s | |
6.01e+07 1.44e+05 - - - - - 1.072884 7.468494 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.14e+07 6.03e+04 - - - - - 0.429153 7.117702 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.20e+07 351.6790 - - - - - 3.43e-03 9.762390 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.25e+07 1.17e+04 - - - - - 0.085831 7.322073 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.26e+07 1.19e+04 - - - - - 0.085831 7.218210 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.33e+07 1.952000 - - - - - 1.86e-07 0.095422 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.33e+07 1.568000 - - - - - 1.86e-07 0.118791 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
6.38e+07 7.28e+05 - - - - - 1.072884 1.473239 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
7.64e+07 1.50e+03 (76800 1 1) (125 1 1) 54 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_initauxstate__1 [62] | |
7.71e+07 7.52e+05 - - - - - 1.072884 1.426476 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
7.84e+07 7.52e+05 - - - - - 1.072884 1.426073 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
7.93e+07 2.99e+05 - - - - - 0.429153 1.436141 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
7.97e+07 4.02e+05 - - - - - 0.572205 1.424482 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
1.93e+08 5.86e+04 - - - - - 0.429153 7.326053 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD] | |
1.94e+08 518.8450 (225000 1 1) (256 1 1) 16 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous19_2 [81] | |
1.94e+08 7.35e+05 - - - - - 1.072884 1.460133 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
1.95e+08 2.94e+05 - - - - - 0.429153 1.462129 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
1.99e+08 498.3020 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [96] | |
2.00e+08 322.5580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [107] | |
2.05e+08 3.47e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [124] | |
2.05e+08 501.6300 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [127] | |
2.05e+08 317.9180 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [130] | |
2.07e+08 2.91e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [141] | |
2.10e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [152] | |
2.12e+08 4.68e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [163] | |
2.14e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [174] | |
2.15e+08 2.50e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [185] | |
2.15e+08 3.48e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [188] | |
2.15e+08 505.7900 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [191] | |
2.15e+08 321.8220 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [194] | |
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [197] | |
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [200] | |
2.15e+08 4.68e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [203] | |
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [206] | |
2.15e+08 2.49e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [209] | |
2.15e+08 3.47e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [212] | |
2.15e+08 497.6300 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [215] | |
2.15e+08 319.7740 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [218] | |
2.15e+08 2.91e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [221] | |
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [224] | |
2.15e+08 4.68e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [227] | |
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [230] | |
2.15e+08 2.49e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [233] | |
2.15e+08 3.47e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [236] | |
2.15e+08 500.4780 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [239] | |
2.15e+08 327.6460 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [242] | |
2.15e+08 2.91e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [245] | |
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [248] | |
2.15e+08 4.67e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [251] | |
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [254] | |
2.15e+08 2.43e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [257] | |
2.15e+08 3.16e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [260] | |
2.15e+08 498.7170 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [263] | |
2.15e+08 316.3830 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [266] | |
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [269] | |
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [272] | |
2.15e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [275] | |
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [278] | |
2.15e+08 2.43e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [281] | |
2.15e+08 3.15e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [284] | |
2.15e+08 496.1260 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [287] | |
2.15e+08 315.7430 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [290] | |
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [293] | |
2.15e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [296] | |
2.15e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [299] | |
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [302] | |
2.15e+08 2.43e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [305] | |
2.15e+08 3.16e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [308] | |
2.15e+08 497.0860 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [311] | |
2.15e+08 316.4790 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [314] | |
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [317] | |
2.15e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [320] | |
2.15e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [323] | |
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [326] | |
2.15e+08 2.42e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [329] | |
2.15e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [332] | |
2.15e+08 497.9820 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [335] | |
2.15e+08 316.2860 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [338] | |
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [341] | |
2.15e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [344] | |
2.15e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [347] | |
2.15e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [350] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [353] | |
2.16e+08 2.97e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [356] | |
2.16e+08 493.4380 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [359] | |
2.16e+08 314.9420 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [362] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [365] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [368] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [371] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [374] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [377] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [380] | |
2.16e+08 496.5410 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [383] | |
2.16e+08 316.1270 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [386] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [389] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [392] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [395] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [398] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [401] | |
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [404] | |
2.16e+08 494.9100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [407] | |
2.16e+08 314.0460 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [410] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [413] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [416] | |
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [419] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [422] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [425] | |
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [428] | |
2.16e+08 499.0060 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [431] | |
2.16e+08 315.3900 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [434] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [437] | |
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [440] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [443] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [446] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [449] | |
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [452] | |
2.16e+08 499.8370 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [455] | |
2.16e+08 315.6150 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [458] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [461] | |
2.16e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [464] | |
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [467] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [470] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [473] | |
2.16e+08 3.00e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [476] | |
2.16e+08 495.3260 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [479] | |
2.16e+08 314.9750 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [482] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [485] | |
2.16e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [488] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [491] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [494] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [497] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [500] | |
2.16e+08 496.3810 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [503] | |
2.16e+08 315.8710 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [506] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [509] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [512] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [515] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [518] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [521] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [524] | |
2.16e+08 497.4700 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [527] | |
2.16e+08 321.9820 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [530] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [533] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [536] | |
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [539] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [542] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [545] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [548] | |
2.16e+08 495.6460 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [551] | |
2.16e+08 314.9420 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [554] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [557] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [560] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [563] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [566] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [569] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [572] | |
2.16e+08 494.5580 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [575] | |
2.16e+08 319.4540 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [578] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [581] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [584] | |
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [587] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [590] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [593] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [596] | |
2.16e+08 494.4930 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [599] | |
2.16e+08 315.3590 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [602] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [605] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [608] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [611] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [614] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [617] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [620] | |
2.16e+08 495.4210 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [623] | |
2.16e+08 316.4470 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [626] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [629] | |
2.16e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [632] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [635] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [638] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [641] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [644] | |
2.16e+08 496.3490 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [647] | |
2.16e+08 314.6550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [650] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [653] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [656] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [659] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [662] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [665] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [668] | |
2.16e+08 494.7500 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [671] | |
2.16e+08 317.1500 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [674] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [677] | |
2.16e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [680] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [683] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [686] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [689] | |
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [692] | |
2.16e+08 494.9740 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [695] | |
2.16e+08 311.7100 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [698] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [701] | |
2.16e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [704] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [707] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [710] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [713] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [716] | |
2.16e+08 492.5420 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [719] | |
2.16e+08 316.9580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [722] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [725] | |
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [728] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [731] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [734] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [737] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [740] | |
2.16e+08 497.5970 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [743] | |
2.16e+08 317.0230 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [746] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [749] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [752] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [755] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [758] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [761] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [764] | |
2.16e+08 495.5170 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [767] | |
2.16e+08 313.7270 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [770] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [773] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [776] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [779] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [782] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [785] | |
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [788] | |
2.16e+08 494.9100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [791] | |
2.16e+08 322.0460 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [794] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [797] | |
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [800] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [803] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [806] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [809] | |
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [812] | |
2.16e+08 494.9420 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [815] | |
2.16e+08 314.8140 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [818] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [821] | |
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [824] | |
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [827] | |
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [830] | |
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [833] | |
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [836] | |
2.16e+08 495.3890 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [839] | |
2.16e+08 319.9990 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [842] | |
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [845] | |
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [848] | |
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [851] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [854] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [857] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [860] | |
2.17e+08 495.7100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [863] | |
2.17e+08 316.1580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [866] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [869] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [872] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [875] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [878] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [881] | |
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [884] | |
2.17e+08 500.9580 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [887] | |
2.17e+08 315.5820 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [890] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [893] | |
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [896] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [899] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [902] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [905] | |
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [908] | |
2.17e+08 495.6780 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [911] | |
2.17e+08 317.7580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [914] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [917] | |
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [920] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [923] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [926] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [929] | |
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [932] | |
2.17e+08 496.0610 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [935] | |
2.17e+08 323.8070 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [938] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [941] | |
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [944] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [947] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [950] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [953] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [956] | |
2.17e+08 494.3330 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [959] | |
2.17e+08 315.3270 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [962] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [965] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [968] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [971] | |
2.17e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [974] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [977] | |
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [980] | |
2.17e+08 496.5410 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [983] | |
2.17e+08 318.9430 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [986] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [989] | |
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [992] | |
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [995] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [998] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1001] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1004] | |
2.17e+08 494.0450 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1007] | |
2.17e+08 315.0070 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1010] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1013] | |
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1016] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1019] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1022] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1025] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1028] | |
2.17e+08 496.6060 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1031] | |
2.17e+08 318.8140 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1034] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1037] | |
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1040] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1043] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1046] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1049] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1052] | |
2.17e+08 495.2930 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1055] | |
2.17e+08 315.6790 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1058] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1061] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1064] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1067] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1070] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1073] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1076] | |
2.17e+08 496.0300 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1079] | |
2.17e+08 318.7820 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1082] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1085] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1088] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1091] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1094] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1097] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1100] | |
2.17e+08 494.5580 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1103] | |
2.17e+08 318.3980 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1106] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1109] | |
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1112] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1115] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1118] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1121] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1124] | |
2.17e+08 497.7250 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1127] | |
2.17e+08 317.1510 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1130] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1133] | |
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1136] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1139] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1142] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1145] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1148] | |
2.17e+08 497.3420 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1151] | |
2.17e+08 316.2550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1154] | |
2.17e+08 2.93e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1157] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1160] | |
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1163] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1166] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1169] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1172] | |
2.17e+08 496.9900 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1175] | |
2.17e+08 318.9740 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1178] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1181] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1184] | |
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1187] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1190] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1193] | |
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1196] | |
2.17e+08 495.1010 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1199] | |
2.17e+08 315.4550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1202] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1205] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1208] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1211] | |
2.17e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1214] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1217] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1220] | |
2.17e+08 497.9810 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1223] | |
2.17e+08 314.6550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1226] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1229] | |
2.17e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1232] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1235] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1238] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1241] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1244] | |
2.17e+08 495.1340 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1247] | |
2.17e+08 317.2470 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1250] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1253] | |
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1256] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1259] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1262] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1265] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1268] | |
2.17e+08 495.7100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1271] | |
2.17e+08 317.5660 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1274] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1277] | |
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1280] | |
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1283] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1286] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1289] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1292] | |
2.17e+08 493.8860 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1295] | |
2.17e+08 315.1980 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1298] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1301] | |
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1304] | |
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1307] | |
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1310] | |
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1313] | |
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1316] | |
2.17e+08 493.7890 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1319] | |
2.17e+08 316.7350 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1322] | |
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1325] | |
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1328] | |
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1331] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1334] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1337] | |
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1340] | |
2.18e+08 498.8140 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1343] | |
2.18e+08 314.4620 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1346] | |
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1349] | |
2.18e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1352] | |
2.18e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1355] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1358] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1361] | |
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1364] | |
2.18e+08 495.4860 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1367] | |
2.18e+08 316.1580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1370] | |
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1373] | |
2.18e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1376] | |
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1379] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1382] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1385] | |
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1388] | |
2.18e+08 496.5730 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1391] | |
2.18e+08 318.2710 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1394] | |
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1397] | |
2.18e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1400] | |
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1403] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1406] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1409] | |
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1412] | |
2.18e+08 497.2140 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1415] | |
2.18e+08 319.9340 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1418] | |
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1421] | |
2.18e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1424] | |
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1427] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1430] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1433] | |
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1436] | |
2.18e+08 497.4370 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1439] | |
2.18e+08 315.6470 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1442] | |
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1445] | |
2.18e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1448] | |
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1451] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1454] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1457] | |
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1460] | |
2.18e+08 495.1010 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1463] | |
2.18e+08 315.1030 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1466] | |
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1469] | |
2.18e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1472] | |
2.18e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1475] | |
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1478] | |
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1481] | |
2.18e+08 2.96e+05 - - - - - 0.429153 1.448729 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH] | |
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows. | |
SSMem: Static shared memory allocated per CUDA block. | |
DSMem: Dynamic shared memory allocated per CUDA block. | |
SrcMemType: The type of source memory accessed by memory operation/copy | |
DstMemType: The type of destination memory accessed by memory operation/copy | |