@lcw
Last active June 27, 2019 16:35
==26324== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl
==26324== Profiling result:
==26324== Metric result:
Invocations Metric Name Metric Description Min Max Avg
Device "Tesla V100-SXM2-16GB (0)"
Kernel: ptxcall_knl_dof_iteration__6
55 inst_per_warp Instructions per warp 5.4765e+03 5.6132e+03 5.4946e+03
55 branch_efficiency Branch Efficiency 99.36% 99.41% 99.40%
55 warp_execution_efficiency Warp Execution Efficiency 81.70% 83.28% 83.09%
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 78.49% 80.00% 79.82%
55 inst_replay_overhead Instruction Replay Overhead 0.000109 0.000130 0.000118
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 gld_transactions_per_request Global Load Transactions Per Request 8.151501 8.153335 8.152286
55 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500
55 shared_store_transactions Shared Store Transactions 0 0 0
55 shared_load_transactions Shared Load Transactions 0 0 0
55 local_load_transactions Local Load Transactions 0 0 0
55 local_store_transactions Local Store Transactions 0 0 0
55 gld_transactions Global Load Transactions 27545551 27551751 27548203
55 gst_transactions Global Store Transactions 21043200 21043200 21043200
55 sysmem_read_transactions System Memory Read Transactions 0 0 0
55 sysmem_write_transactions System Memory Write Transactions 5 5 5
55 l2_read_transactions L2 Read Transactions 24038187 24044532 24040579
55 l2_write_transactions L2 Write Transactions 21644021 27583323 22303554
55 dram_read_transactions Device Memory Read Transactions 24088233 24108071 24095624
55 dram_write_transactions Device Memory Write Transactions 19803820 25722448 20348244
55 global_hit_rate Global Hit Rate in unified l1/tex 33.92% 33.99% 33.96%
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
55 gld_requested_throughput Requested Global Load Throughput 264.96GB/s 266.85GB/s 265.71GB/s
55 gst_requested_throughput Requested Global Store Throughput 192.70GB/s 194.07GB/s 193.24GB/s
55 gld_throughput Global Load Throughput 276.48GB/s 278.49GB/s 277.26GB/s
55 gst_throughput Global Store Throughput 211.20GB/s 212.71GB/s 211.79GB/s
55 local_memory_overhead Local Memory Overhead 28.76% 28.86% 28.82%
55 tex_cache_hit_rate Unified Cache Hit Rate 9.48% 9.48% 9.48%
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 5.76% 5.77% 5.76%
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 52.04% 54.54% 54.07%
55 dram_read_throughput Device Memory Read Throughput 241.83GB/s 243.49GB/s 242.51GB/s
55 dram_write_throughput Device Memory Write Throughput 198.85GB/s 258.73GB/s 204.80GB/s
55 tex_cache_throughput Unified cache to SM throughput 314.32GB/s 316.56GB/s 315.20GB/s
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 241.26GB/s 242.98GB/s 241.93GB/s
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 211.20GB/s 212.71GB/s 211.79GB/s
55 l2_read_throughput L2 Throughput (Reads) 241.27GB/s 243.01GB/s 241.96GB/s
55 l2_write_throughput L2 Throughput (Writes) 217.52GB/s 277.72GB/s 224.48GB/s
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 sysmem_write_throughput System Memory Write Throughput 52.620KB/s 52.995KB/s 52.767KB/s
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 gld_efficiency Global Memory Load Efficiency 95.82% 95.84% 95.83%
55 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24%
55 tex_cache_transactions Unified cache to SM transactions 7829345 7829548 7829430
55 flop_count_dp Floating Point Operations(Double Precision) 1.0046e+10 1.0128e+10 1.0080e+10
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 1770104174 1784708150 1776181149
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 3711665054 3742410100 3724452773
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 852212114 858881290 854984057
55 flop_count_sp Floating Point Operations(Single Precision) 341925060 345009700 343206466
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 170962530 172504850 171603233
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 258009914 260217020 258927146
55 inst_executed Instructions Executed 557541520 1708968387 1092544931
55 inst_issued Instructions Issued 557604028 571098173 559392226
55 dram_utilization Device Memory Utilization Mid (6) Mid (6) Mid (6)
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 29.21% 31.42% 30.88%
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 36.90% 37.57% 37.21%
55 stall_memory_dependency Issue Stall Reasons (Data Request) 4.94% 5.47% 5.04%
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
55 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
55 stall_other Issue Stall Reasons (Other) 1.36% 1.44% 1.38%
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.08% 0.13% 0.11%
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 16.17% 17.33% 16.45%
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
55 inst_fp_32 FP Instructions(Single) 1760554252 1775126460 1766610702
55 inst_fp_64 FP Instructions(Double) 6644709152 6699407810 6667458562
55 inst_integer Integer Instructions 4070222578 4098159680 4081836506
55 inst_bit_convert Bit-Convert Instructions 39414878 39771200 39563266
55 inst_control Control-Flow Instructions 1360825906 1371675230 1365338011
55 inst_compute_ld_st Load/Store Instructions 182400000 182400000 182400000
55 inst_misc Misc Instructions 253012128 253772630 253329151
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
55 issue_slots Issue Slots 557604028 571098173 559392226
55 cf_issued Issued Control-Flow Instructions 52742751 54076013 52917540
55 cf_executed Executed Control-Flow Instructions 52742751 54076013 52917540
55 ldst_issued Issued Load/Store Instructions 8711390 8784932 8721076
55 ldst_executed Executed Load/Store Instructions 8711390 8784932 8721076
55 atomic_transactions Atomic Transactions 0 0 0
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 24038018 24038162 24038080
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 1.16% 1.25% 1.20%
55 stall_not_selected Issue Stall Reasons (Not Selected) 7.61% 8.15% 7.74%
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 21043200 21043200 21043200
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
55 nvlink_total_data_received NVLink Total Data Received 864 864 864
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
55 nvlink_user_data_received NVLink User Data Received 0 0 0
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
55 nvlink_transmit_throughput NVLink Transmit Throughput 378.87KB/s 381.57KB/s 379.92KB/s
55 nvlink_receive_throughput NVLink Receive Throughput 284.15KB/s 286.17KB/s 284.94KB/s
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 291
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
55 inst_fp_16 HP Instructions(Half) 0 0 0
55 ipc Executed IPC 0.527755 1.571853 1.045619
55 issued_ipc Issued IPC 1.533870 1.576269 1.545610
55 issue_slot_utilization Issue Slot Utilization 38.35% 39.41% 38.64%
55 sm_efficiency Multiprocessor Activity 99.16% 99.87% 99.62%
55 achieved_occupancy Achieved Occupancy 0.238010 0.241668 0.238866
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 2.565442 2.695913 2.595737
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
55 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1)
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (2) Low (2) Low (2)
55 double_precision_fu_utilization Double-Precision Function Unit Utilization High (7) High (8) High (7)
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.09% 0.74% 0.68%
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 5.32% 43.71% 39.87%
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_facerhs__10
55 inst_per_warp Instructions per warp 1.0689e+04 1.0689e+04 1.0689e+04
55 branch_efficiency Branch Efficiency 99.85% 99.85% 99.85%
55 warp_execution_efficiency Warp Execution Efficiency 78.06% 78.06% 78.06%
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 75.71% 75.71% 75.71%
55 inst_replay_overhead Instruction Replay Overhead 0.000559 0.000605 0.000568
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 gld_transactions_per_request Global Load Transactions Per Request 14.274780 14.276180 14.275466
55 gst_transactions_per_request Global Store Transactions Per Request 14.000000 14.000000 14.000000
55 shared_store_transactions Shared Store Transactions 0 0 0
55 shared_load_transactions Shared Load Transactions 0 0 0
55 local_load_transactions Local Load Transactions 0 0 0
55 local_store_transactions Local Store Transactions 0 0 0
55 gld_transactions Global Load Transactions 407356986 407396944 407376568
55 gst_transactions Global Store Transactions 38707200 38707200 38707200
55 sysmem_read_transactions System Memory Read Transactions 0 0 0
55 sysmem_write_transactions System Memory Write Transactions 5 6 5
55 l2_read_transactions L2 Read Transactions 354201890 354346464 354278868
55 l2_write_transactions L2 Write Transactions 45282231 77881708 48844727
55 dram_read_transactions Device Memory Read Transactions 420972702 426390178 422150792
55 dram_write_transactions Device Memory Write Transactions 38751602 70370623 39512013
55 global_hit_rate Global Hit Rate in unified l1/tex 20.82% 20.88% 20.85%
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
55 gld_requested_throughput Requested Global Load Throughput 266.67GB/s 268.87GB/s 267.63GB/s
55 gst_requested_throughput Requested Global Store Throughput 26.243GB/s 26.460GB/s 26.338GB/s
55 gld_throughput Global Load Throughput 618.67GB/s 623.77GB/s 620.91GB/s
55 gst_throughput Global Store Throughput 58.785GB/s 59.270GB/s 58.996GB/s
55 local_memory_overhead Local Memory Overhead 10.13% 10.19% 10.16%
55 tex_cache_hit_rate Unified Cache Hit Rate 12.91% 12.95% 12.93%
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 9.00% 10.13% 9.90%
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 83.33% 84.64% 84.46%
55 dram_read_throughput Device Memory Read Throughput 640.08GB/s 647.73GB/s 643.43GB/s
55 dram_write_throughput Device Memory Write Throughput 58.855GB/s 106.90GB/s 60.223GB/s
55 tex_cache_throughput Unified cache to SM throughput 625.66GB/s 631.17GB/s 628.12GB/s
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 538.09GB/s 542.44GB/s 539.99GB/s
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 58.785GB/s 59.270GB/s 58.996GB/s
55 l2_read_throughput L2 Throughput (Reads) 538.09GB/s 542.55GB/s 539.98GB/s
55 l2_write_throughput L2 Throughput (Writes) 68.839GB/s 118.51GB/s 74.448GB/s
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 sysmem_write_throughput System Memory Write Throughput 7.9619KB/s 9.5918KB/s 8.1064KB/s
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 gld_efficiency Global Memory Load Efficiency 43.10% 43.10% 43.10%
55 gst_efficiency Global Memory Store Efficiency 44.64% 44.64% 44.64%
55 tex_cache_transactions Unified cache to SM transactions 102968110 103124514 103026715
55 flop_count_dp Floating Point Operations(Double Precision) 5625653248 5625653248 5625653248
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 473419776 473419776 473419776
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 2147493888 2147493888 2147493888
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 857245696 857245696 857245696
55 flop_count_sp Floating Point Operations(Single Precision) 86224896 86224896 86224896
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 22988800 22988800 22988800
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 40247296 40247296 40247296
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 129388544 129388544 129388544
55 inst_executed Instructions Executed 374604800 820912128 536898373
55 inst_issued Instructions Issued 374814277 374831507 374817604
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9)
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 1.02% 1.11% 1.07%
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 3.80% 4.07% 4.04%
55 stall_memory_dependency Issue Stall Reasons (Data Request) 90.20% 91.09% 90.47%
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
55 stall_sync Issue Stall Reasons (Synchronization) 0.02% 0.02% 0.02%
55 stall_other Issue Stall Reasons (Other) 0.14% 0.15% 0.15%
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.02% 0.03% 0.02%
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.20% 0.22% 0.21%
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
55 inst_fp_32 FP Instructions(Single) 443373568 443373568 443373568
55 inst_fp_64 FP Instructions(Double) 3581839360 3581839360 3581839360
55 inst_integer Integer Instructions 3621040128 3621040128 3621040128
55 inst_bit_convert Bit-Convert Instructions 80494592 80494592 80494592
55 inst_control Control-Flow Instructions 466252800 466252800 466252800
55 inst_compute_ld_st Load/Store Instructions 782540800 782540800 782540800
55 inst_misc Misc Instructions 164288512 164288512 164288512
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
55 issue_slots Issue Slots 374814277 374831507 374817604
55 cf_issued Issued Control-Flow Instructions 20450304 20450304 20450304
55 cf_executed Executed Control-Flow Instructions 20450304 20450304 20450304
55 ldst_issued Issued Load/Store Instructions 32146432 32146432 32146432
55 ldst_executed Executed Load/Store Instructions 32146432 32146432 32146432
55 atomic_transactions Atomic Transactions 0 0 0
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 354209772 354360212 354287459
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 3.55% 4.11% 3.86%
55 stall_not_selected Issue Stall Reasons (Not Selected) 0.14% 0.15% 0.15%
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 38707200 38707200 38707200
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1536 1158
55 nvlink_total_data_received NVLink Total Data Received 864 1152 869
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
55 nvlink_user_data_received NVLink User Data Received 0 0 0
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
55 nvlink_transmit_throughput NVLink Transmit Throughput 57.329KB/s 76.485KB/s 57.883KB/s
55 nvlink_receive_throughput NVLink Receive Throughput 42.996KB/s 57.363KB/s 43.412KB/s
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 289
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
55 inst_fp_16 HP Instructions(Half) 0 0 0
55 ipc Executed IPC 0.152637 0.317017 0.254990
55 issued_ipc Issued IPC 0.147365 0.157554 0.156123
55 issue_slot_utilization Issue Slot Utilization 3.68% 3.94% 3.90%
55 sm_efficiency Multiprocessor Activity 99.88% 99.93% 99.91%
55 achieved_occupancy Achieved Occupancy 0.123504 0.123624 0.123553
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.158293 0.169265 0.167624
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1)
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.03% 0.03% 0.03%
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 3.37% 3.69% 3.63%
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_update__11
55 inst_per_warp Instructions per warp 1.3460e+03 1.3460e+03 1.3460e+03
55 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
55 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 95.54% 95.54% 95.54%
55 inst_replay_overhead Instruction Replay Overhead 0.000146 0.000193 0.000167
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 gld_transactions_per_request Global Load Transactions Per Request 8.000000 8.000000 8.000000
55 gst_transactions_per_request Global Store Transactions Per Request 8.000000 8.000000 8.000000
55 shared_store_transactions Shared Store Transactions 0 0 0
55 shared_load_transactions Shared Load Transactions 0 0 0
55 local_load_transactions Local Load Transactions 0 0 0
55 local_store_transactions Local Store Transactions 0 0 0
55 gld_transactions Global Load Transactions 43200000 43200000 43200000
55 gst_transactions Global Store Transactions 28800000 28800000 28800000
55 sysmem_read_transactions System Memory Read Transactions 0 0 0
55 sysmem_write_transactions System Memory Write Transactions 5 6 5
55 l2_read_transactions L2 Read Transactions 28800152 28801812 28800821
55 l2_write_transactions L2 Write Transactions 28800043 35600263 29308075
55 dram_read_transactions Device Memory Read Transactions 28800013 28800685 28800153
55 dram_write_transactions Device Memory Write Transactions 28796256 35596759 29780024
55 global_hit_rate Global Hit Rate in unified l1/tex 60.00% 60.00% 60.00%
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
55 gld_requested_throughput Requested Global Load Throughput 532.33GB/s 537.47GB/s 533.93GB/s
55 gst_requested_throughput Requested Global Store Throughput 354.89GB/s 358.31GB/s 355.95GB/s
55 gld_throughput Global Load Throughput 532.33GB/s 537.47GB/s 533.93GB/s
55 gst_throughput Global Store Throughput 354.89GB/s 358.31GB/s 355.95GB/s
55 local_memory_overhead Local Memory Overhead 50.00% 50.00% 50.00%
55 tex_cache_hit_rate Unified Cache Hit Rate 20.00% 20.00% 20.00%
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 0.00% 0.00% 0.00%
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 100.00% 100.00% 100.00%
55 dram_read_throughput Device Memory Read Throughput 354.89GB/s 358.31GB/s 355.95GB/s
55 dram_write_throughput Device Memory Write Throughput 355.02GB/s 439.90GB/s 368.06GB/s
55 tex_cache_throughput Unified cache to SM throughput 621.05GB/s 627.05GB/s 622.91GB/s
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 354.89GB/s 358.32GB/s 355.95GB/s
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 354.89GB/s 358.31GB/s 355.95GB/s
55 l2_read_throughput L2 Throughput (Reads) 354.90GB/s 358.32GB/s 355.96GB/s
55 l2_write_throughput L2 Throughput (Writes) 355.15GB/s 439.76GB/s 362.23GB/s
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 sysmem_write_throughput System Memory Write Throughput 64.604KB/s 77.609KB/s 65.033KB/s
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 gld_efficiency Global Memory Load Efficiency 100.00% 100.00% 100.00%
55 gst_efficiency Global Memory Store Efficiency 100.00% 100.00% 100.00%
55 tex_cache_transactions Unified cache to SM transactions 12600000 12600000 12600000
55 flop_count_dp Floating Point Operations(Double Precision) 230400000 230400000 230400000
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 0 0 0
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 57600000 57600000 57600000
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 115200000 115200000 115200000
55 flop_count_sp Floating Point Operations(Single Precision) 0 0 0
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 230400000 230400000 230400000
55 inst_executed Instructions Executed 505800000 2422800000 1412018181
55 inst_issued Instructions Issued 505874007 505898401 505885585
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9)
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 2.99% 3.55% 3.40%
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 17.22% 19.50% 19.23%
55 stall_memory_dependency Issue Stall Reasons (Data Request) 38.25% 45.38% 39.02%
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
55 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
55 stall_other Issue Stall Reasons (Other) 4.07% 4.60% 4.54%
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.06% 0.15% 0.09%
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 13.76% 15.57% 15.37%
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
55 inst_fp_32 FP Instructions(Single) 230400000 230400000 230400000
55 inst_fp_64 FP Instructions(Double) 172800000 172800000 172800000
55 inst_integer Integer Instructions 1.0869e+10 1.0869e+10 1.0869e+10
55 inst_bit_convert Bit-Convert Instructions 460800000 460800000 460800000
55 inst_control Control-Flow Instructions 1324800000 1324800000 1324800000
55 inst_compute_ld_st Load/Store Instructions 288000000 288000000 288000000
55 inst_misc Misc Instructions 1612800000 1612800000 1612800000
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
55 issue_slots Issue Slots 505874007 505898401 505885585
55 cf_issued Issued Control-Flow Instructions 52200000 52200000 52200000
55 cf_executed Executed Control-Flow Instructions 52200000 52200000 52200000
55 ldst_issued Issued Load/Store Instructions 12600000 12600000 12600000
55 ldst_executed Executed Load/Store Instructions 12600000 12600000 12600000
55 atomic_transactions Atomic Transactions 0 0 0
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 28800004 28800880 28800135
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 1.49% 1.71% 1.62%
55 stall_not_selected Issue Stall Reasons (Not Selected) 14.98% 16.96% 16.74%
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 28800000 28800000 28800000
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
55 nvlink_total_data_received NVLink Total Data Received 864 864 864
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
55 nvlink_user_data_received NVLink User Data Received 0 0 0
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
55 nvlink_transmit_throughput NVLink Transmit Throughput 465.16KB/s 469.65KB/s 466.55KB/s
55 nvlink_receive_throughput NVLink Receive Throughput 348.87KB/s 352.23KB/s 349.91KB/s
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
55 inst_fp_16 HP Instructions(Half) 0 0 0
55 ipc Executed IPC 0.440216 1.791187 1.091220
55 issued_ipc Issued IPC 1.592313 1.792162 1.768414
55 issue_slot_utilization Issue Slot Utilization 39.81% 44.80% 44.21%
55 sm_efficiency Multiprocessor Activity 96.33% 99.77% 97.42%
55 achieved_occupancy Achieved Occupancy 0.487766 0.488726 0.488296
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 5.755249 6.469818 6.375840
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1)
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 special_fu_utilization Special Function Unit Utilization Low (2) Low (2) Low (2)
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (3) Mid (4) Low (3)
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00%
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.07% 1.23% 1.15%
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_volumeviscterms__7
55 inst_per_warp Instructions per warp 1.1718e+03 1.1718e+03 1.1718e+03
55 branch_efficiency Branch Efficiency 97.87% 97.87% 97.87%
55 warp_execution_efficiency Warp Execution Efficiency 97.09% 97.09% 97.09%
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 94.54% 94.54% 94.54%
55 inst_replay_overhead Instruction Replay Overhead 0.000197 0.000267 0.000227
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 2.607649 2.644755 2.634227
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 2.022991 2.026069 2.025061
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 gld_transactions_per_request Global Load Transactions Per Request 8.111274 8.113203 8.112146
55 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500
55 shared_store_transactions Shared Store Transactions 3884142 3890052 3888117
55 shared_load_transactions Shared Load Transactions 72096267 73122198 72831114
55 local_load_transactions Local Load Transactions 0 0 0
55 local_store_transactions Local Store Transactions 0 0 0
55 gld_transactions Global Load Transactions 40491482 40501110 40495835
55 gst_transactions Global Store Transactions 34195200 34195200 34195200
55 sysmem_read_transactions System Memory Read Transactions 0 0 0
55 sysmem_write_transactions System Memory Write Transactions 5 6 5
55 l2_read_transactions L2 Read Transactions 38923164 38932455 38927048
55 l2_write_transactions L2 Write Transactions 36022613 42494947 36690644
55 dram_read_transactions Device Memory Read Transactions 39352535 39435874 39366649
55 dram_write_transactions Device Memory Write Transactions 32906254 39357004 33330275
55 global_hit_rate Global Hit Rate in unified l1/tex 15.42% 15.47% 15.45%
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
55 gld_requested_throughput Requested Global Load Throughput 391.93GB/s 396.48GB/s 393.40GB/s
55 gst_requested_throughput Requested Global Store Throughput 314.51GB/s 318.16GB/s 315.69GB/s
55 gld_throughput Global Load Throughput 408.24GB/s 412.93GB/s 409.75GB/s
55 gst_throughput Global Store Throughput 344.71GB/s 348.71GB/s 346.00GB/s
55 local_memory_overhead Local Memory Overhead 13.60% 13.65% 13.63%
55 tex_cache_hit_rate Unified Cache Hit Rate 4.58% 4.61% 4.60%
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.59% 6.67% 6.65%
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 26.22% 26.25% 26.24%
55 dram_read_throughput Device Memory Read Throughput 396.99GB/s 401.40GB/s 398.32GB/s
55 dram_write_throughput Device Memory Write Throughput 332.12GB/s 397.35GB/s 337.25GB/s
55 tex_cache_throughput Unified cache to SM throughput 2730.0GB/s 2761.5GB/s 2740.1GB/s
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 392.36GB/s 396.92GB/s 393.86GB/s
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 344.71GB/s 348.71GB/s 346.00GB/s
55 l2_read_throughput L2 Throughput (Reads) 392.45GB/s 396.96GB/s 393.88GB/s
55 l2_write_throughput L2 Throughput (Writes) 363.26GB/s 429.92GB/s 371.25GB/s
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 sysmem_write_throughput System Memory Write Throughput 52.851KB/s 63.516KB/s 53.241KB/s
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_load_throughput Shared Memory Load Throughput 2922.9GB/s 2982.0GB/s 2947.7GB/s
55 shared_store_throughput Shared Memory Store Throughput 156.86GB/s 158.63GB/s 157.36GB/s
55 gld_efficiency Global Memory Load Efficiency 96.00% 96.02% 96.01%
55 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24%
55 tex_cache_transactions Unified cache to SM transactions 67696565 67712818 67700699
55 flop_count_dp Floating Point Operations(Double Precision) 2688000000 2688000065 2688000001
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 134400000 134400000 134400000
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 1152000000 1152000025 1152000000
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 249600000 249600015 249600000
55 flop_count_sp Floating Point Operations(Single Precision) 19200000 19200000 19200000
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 9600000 9600000 9600000
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 19200000 19200005 19200000
55 inst_executed Instructions Executed 170342400 359962845 246190102
55 inst_issued Instructions Issued 170375507 170387140 170380878
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9)
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.21% 0.40% 0.30%
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 5.08% 5.78% 5.39%
55 stall_memory_dependency Issue Stall Reasons (Data Request) 44.66% 48.88% 46.80%
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
55 stall_sync Issue Stall Reasons (Synchronization) 10.67% 12.36% 11.25%
55 stall_other Issue Stall Reasons (Other) 0.43% 0.48% 0.47%
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.01% 0.04% 0.02%
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 10.00% 11.42% 10.60%
55 shared_efficiency Shared Memory Efficiency 32.35% 32.79% 32.47%
55 inst_fp_32 FP Instructions(Single) 57600000 57600005 57600000
55 inst_fp_64 FP Instructions(Double) 1536000000 1536000050 1536000000
55 inst_integer Integer Instructions 1992960000 1992960045 1992960000
55 inst_bit_convert Bit-Convert Instructions 0 0 0
55 inst_control Control-Flow Instructions 132480000 132480025 132480000
55 inst_compute_ld_st Load/Store Instructions 1203840000 1203840000 1203840000
55 inst_misc Misc Instructions 305280000 305280015 305280000
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
55 issue_slots Issue Slots 170375507 170387140 170380878
55 cf_issued Issued Control-Flow Instructions 6988800 6988835 6988800
55 cf_executed Executed Control-Flow Instructions 6988800 6988835 6988800
55 ldst_issued Issued Load/Store Instructions 41164800 41164805 41164800
55 ldst_executed Executed Load/Store Instructions 41164800 41164805 41164800
55 atomic_transactions Atomic Transactions 0 0 0
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 38920483 38937843 38925673
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 22.79% 26.80% 23.94%
55 stall_not_selected Issue Stall Reasons (Not Selected) 1.19% 1.27% 1.24%
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 34195200 34195200 34195200
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
55 nvlink_total_data_received NVLink Total Data Received 864 864 864
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
55 nvlink_user_data_received NVLink User Data Received 0 0 0
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
55 nvlink_transmit_throughput NVLink Transmit Throughput 380.53KB/s 384.94KB/s 381.95KB/s
55 nvlink_receive_throughput NVLink Receive Throughput 285.40KB/s 288.71KB/s 286.46KB/s
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
55 inst_fp_16 HP Instructions(Half) 0 0 0
55 ipc Executed IPC 0.403988 0.492832 0.469417
55 issued_ipc Issued IPC 0.432154 0.478131 0.469992
55 issue_slot_utilization Issue Slot Utilization 10.80% 11.95% 11.75%
55 sm_efficiency Multiprocessor Activity 99.71% 99.89% 99.77%
55 achieved_occupancy Achieved Occupancy 0.488048 0.490340 0.488675
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.781870 0.872176 0.855160
55 shared_utilization Shared Memory Utilization Low (1) Low (2) Low (1)
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1)
55 tex_utilization Unified Cache Utilization Low (2) Low (2) Low (2)
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (2) Low (2) Low (2)
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (2) Low (2) Low (2)
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.02% 0.04% 0.04%
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 4.72% 11.76% 11.18%
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_anonymous19_2
1 inst_per_warp Instructions per warp 137.000000 137.000000 137.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 95.71% 95.71% 95.71%
1 inst_replay_overhead Instruction Replay Overhead 0.000345 0.000345 0.000345
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 0.000000 0.000000 0.000000
1 gst_transactions_per_request Global Store Transactions Per Request 8.000000 8.000000 8.000000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 0 0 0
1 gst_transactions Global Store Transactions 14400000 14400000 14400000
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 5 5 5
1 l2_read_transactions L2 Read Transactions 96 96 96
1 l2_write_transactions L2 Write Transactions 14400042 14400042 14400042
1 dram_read_transactions Device Memory Read Transactions 67 67 67
1 dram_write_transactions Device Memory Write Transactions 14381944 14381944 14381944
1 global_hit_rate Global Hit Rate in unified l1/tex 0.00% 0.00% 0.00%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gst_requested_throughput Requested Global Store Throughput 831.58GB/s 831.58GB/s 831.58GB/s
1 gld_throughput Global Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gst_throughput Global Store Throughput 831.58GB/s 831.58GB/s 831.58GB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 0.00% 0.00% 0.00%
1 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 0.00% 0.00% 0.00%
1 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 0.00% 0.00% 0.00%
1 dram_read_throughput Device Memory Read Throughput 3.9620MB/s 3.9620MB/s 3.9620MB/s
1 dram_write_throughput Device Memory Write Throughput 830.53GB/s 830.53GB/s 830.53GB/s
1 tex_cache_throughput Unified cache to SM throughput 415.79GB/s 415.79GB/s 415.79GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 831.58GB/s 831.58GB/s 831.58GB/s
1 l2_read_throughput L2 Throughput (Reads) 5.6769MB/s 5.6769MB/s 5.6769MB/s
1 l2_write_throughput L2 Throughput (Writes) 831.58GB/s 831.58GB/s 831.58GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 302.77KB/s 302.77KB/s 302.76KB/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 0.00% 0.00% 0.00%
1 gst_efficiency Global Memory Store Efficiency 100.00% 100.00% 100.00%
1 tex_cache_transactions Unified cache to SM transactions 1800000 1800000 1800000
1 flop_count_dp Floating Point Operations(Double Precision) 0 0 0
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 0 0 0
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 0 0 0
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 0 0 0
1 flop_count_sp Floating Point Operations(Single Precision) 0 0 0
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 0 0 0
1 inst_executed Instructions Executed 246600000 246600000 246600000
1 inst_issued Instructions Issued 54022714 54022714 54022714
1 dram_utilization Device Memory Utilization Max (10) Max (10) Max (10)
1 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 1.05% 1.05% 1.05%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 14.43% 14.43% 14.43%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 0.00% 0.00% 0.00%
1 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 0.72% 0.72% 0.72%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.05% 0.05% 0.05%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 23.67% 23.67% 23.67%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 0 0 0
1 inst_fp_64 FP Instructions(Double) 0 0 0
1 inst_integer Integer Instructions 1094400000 1094400000 1094400000
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 57600000 57600000 57600000
1 inst_compute_ld_st Load/Store Instructions 57600000 57600000 57600000
1 inst_misc Misc Instructions 403200000 403200000 403200000
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 54022714 54022714 54022714
1 cf_issued Issued Control-Flow Instructions 5400000 5400000 5400000
1 cf_executed Executed Control-Flow Instructions 5400000 5400000 5400000
1 ldst_issued Issued Load/Store Instructions 5400000 5400000 5400000
1 ldst_executed Executed Load/Store Instructions 5400000 5400000 5400000
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 0 0 0
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 57.73% 57.73% 57.73%
1 stall_not_selected Issue Stall Reasons (Not Selected) 2.36% 2.36% 2.36%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 14400000 14400000 14400000
1 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
1 nvlink_total_data_received NVLink Total Data Received 864 864 864
1 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
1 nvlink_user_data_received NVLink User Data Received 0 0 0
1 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
1 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
1 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
1 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
1 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
1 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
1 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
1 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
1 nvlink_transmit_throughput NVLink Transmit Throughput 2.1288MB/s 2.1288MB/s 2.1288MB/s
1 nvlink_receive_throughput NVLink Receive Throughput 1.5966MB/s 1.5966MB/s 1.5966MB/s
1 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288
1 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 ipc Executed IPC 0.864052 0.864052 0.864052
1 issued_ipc Issued IPC 0.864350 0.864350 0.864350
1 issue_slot_utilization Issue Slot Utilization 21.61% 21.61% 21.61%
1 sm_efficiency Multiprocessor Activity 99.01% 99.01% 99.01%
1 achieved_occupancy Achieved Occupancy 0.798799 0.798799 0.798799
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 1.875142 1.875142 1.875142
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 l2_utilization L2 Cache Utilization Low (2) Low (2) Low (2)
1 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
1 special_fu_utilization Special Function Unit Utilization Idle (0) Idle (0) Idle (0)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (3) Low (3) Low (3)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.00% 0.00% 0.00%
1 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
1 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
1 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
1 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
1 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_faceviscterms__8
55 inst_per_warp Instructions per warp 4.2483e+03 4.2485e+03 4.2483e+03
55 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
55 warp_execution_efficiency Warp Execution Efficiency 78.12% 78.12% 78.12%
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 76.13% 76.13% 76.13%
55 inst_replay_overhead Instruction Replay Overhead 0.000810 0.000912 0.000839
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 gld_transactions_per_request Global Load Transactions Per Request 14.819581 14.820493 14.820037
55 gst_transactions_per_request Global Store Transactions Per Request 14.000000 14.000000 14.000000
55 shared_store_transactions Shared Store Transactions 0 0 0
55 shared_load_transactions Shared Load Transactions 0 0 0
55 local_load_transactions Local Load Transactions 0 0 0
55 local_store_transactions Local Store Transactions 0 0 0
55 gld_transactions Global Load Transactions 197672816 197684988 197678899
55 gst_transactions Global Store Transactions 83865600 83865600 83865600
55 sysmem_read_transactions System Memory Read Transactions 0 0 0
55 sysmem_write_transactions System Memory Write Transactions 5 6 5
55 l2_read_transactions L2 Read Transactions 156680398 156721484 156702202
55 l2_write_transactions L2 Write Transactions 95473249 135327413 97980501
55 dram_read_transactions Device Memory Read Transactions 204242200 210993489 204889957
55 dram_write_transactions Device Memory Write Transactions 83858031 118426841 85473470
55 global_hit_rate Global Hit Rate in unified l1/tex 37.72% 37.80% 37.76%
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
55 gld_requested_throughput Requested Global Load Throughput 149.93GB/s 154.94GB/s 151.37GB/s
55 gst_requested_throughput Requested Global Store Throughput 69.643GB/s 71.972GB/s 70.311GB/s
55 gld_throughput Global Load Throughput 367.71GB/s 379.99GB/s 371.23GB/s
55 gst_throughput Global Store Throughput 156.00GB/s 161.22GB/s 157.50GB/s
55 local_memory_overhead Local Memory Overhead 27.10% 27.21% 27.16%
55 tex_cache_hit_rate Unified Cache Hit Rate 15.16% 15.17% 15.16%
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 4.76% 5.44% 5.30%
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 76.41% 83.99% 83.00%
55 dram_read_throughput Device Memory Read Throughput 380.36GB/s 398.25GB/s 384.77GB/s
55 dram_write_throughput Device Memory Write Throughput 156.02GB/s 221.58GB/s 160.52GB/s
55 tex_cache_throughput Unified cache to SM throughput 339.31GB/s 350.64GB/s 342.54GB/s
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 291.45GB/s 301.23GB/s 294.28GB/s
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 156.00GB/s 161.22GB/s 157.50GB/s
55 l2_read_throughput L2 Throughput (Reads) 291.49GB/s 301.24GB/s 294.28GB/s
55 l2_write_throughput L2 Throughput (Writes) 177.75GB/s 252.38GB/s 184.00GB/s
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 sysmem_write_throughput System Memory Write Throughput 9.7520KB/s 11.723KB/s 9.8809KB/s
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 gld_efficiency Global Memory Load Efficiency 40.77% 40.78% 40.77%
55 gst_efficiency Global Memory Store Efficiency 44.64% 44.64% 44.64%
55 tex_cache_transactions Unified cache to SM transactions 45587424 45621133 45599946
55 flop_count_dp Floating Point Operations(Double Precision) 1331763200 1331763824 1331763211
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 126156800 126156800 126156800
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 447948800 447949040 447948804
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 309708800 309708944 309708802
55 flop_count_sp Floating Point Operations(Single Precision) 22937600 22937600 22937600
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 11468800 11468800 11468800
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 34406400 34406448 34406400
55 inst_executed Instructions Executed 146194432 326281952 237869460
55 inst_issued Instructions Issued 146313068 146329688 146317066
55 dram_utilization Device Memory Utilization High (7) High (8) High (7)
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.06% 0.08% 0.07%
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 0.68% 0.73% 0.72%
55 stall_memory_dependency Issue Stall Reasons (Data Request) 96.77% 97.27% 97.02%
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
55 stall_sync Issue Stall Reasons (Synchronization) 0.01% 0.01% 0.01%
55 stall_other Issue Stall Reasons (Other) 0.04% 0.04% 0.04%
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.00% 0.01% 0.01%
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.08% 0.09% 0.09%
55 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
55 inst_fp_32 FP Instructions(Single) 91750400 91750448 91750400
55 inst_fp_64 FP Instructions(Double) 883814400 883814880 883814408
55 inst_integer Integer Instructions 1818393600 1818394128 1818393609
55 inst_bit_convert Bit-Convert Instructions 0 0 0
55 inst_control Control-Flow Instructions 143590400 143590640 143590404
55 inst_compute_ld_st Load/Store Instructions 483225600 483225600 483225600
55 inst_misc Misc Instructions 188057600 188057696 188057601
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
55 issue_slots Issue Slots 146313068 146329688 146317066
55 cf_issued Issued Control-Flow Instructions 7123968 7124304 7123974
55 cf_executed Executed Control-Flow Instructions 7123968 7124304 7123974
55 ldst_issued Issued Load/Store Instructions 20173824 20173872 20173824
55 ldst_executed Executed Load/Store Instructions 20173824 20173872 20173824
55 atomic_transactions Atomic Transactions 0 0 0
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 156680451 156720029 156702577
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 1.78% 2.21% 1.97%
55 stall_not_selected Issue Stall Reasons (Not Selected) 0.07% 0.08% 0.08%
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 83865600 83865600 83865600
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1536 1158
55 nvlink_total_data_received NVLink Total Data Received 864 1152 869
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
55 nvlink_user_data_received NVLink User Data Received 0 0 0
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
55 nvlink_transmit_throughput NVLink Transmit Throughput 70.217KB/s 94.164KB/s 71.319KB/s
55 nvlink_receive_throughput NVLink Receive Throughput 52.662KB/s 70.623KB/s 53.489KB/s
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 291
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
55 inst_fp_16 HP Instructions(Half) 0 0 0
55 ipc Executed IPC 0.069900 0.146818 0.106376
55 issued_ipc Issued IPC 0.069839 0.075799 0.074879
55 issue_slot_utilization Issue Slot Utilization 1.75% 1.89% 1.87%
55 sm_efficiency Multiprocessor Activity 99.67% 99.91% 99.77%
55 achieved_occupancy Achieved Occupancy 0.369551 0.370181 0.369719
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.086572 0.094097 0.092725
55 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
55 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1)
55 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.01% 0.01% 0.01%
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.82% 1.07% 1.04%
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_volumerhs__9
55 inst_per_warp Instructions per warp 3.0973e+03 3.0973e+03 3.0973e+03
55 branch_efficiency Branch Efficiency 99.25% 99.25% 99.25%
55 warp_execution_efficiency Warp Execution Efficiency 97.42% 97.42% 97.42%
55 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 94.33% 94.33% 94.33%
55 inst_replay_overhead Instruction Replay Overhead 0.000128 0.000144 0.000135
55 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 2.551209 2.623022 2.589007
55 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 2.060532 2.078258 2.069457
55 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
55 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
55 gld_transactions_per_request Global Load Transactions Per Request 8.278712 8.280053 8.279455
55 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500
55 shared_store_transactions Shared Store Transactions 11552168 11651547 11602203
55 shared_load_transactions Shared Load Transactions 141071636 145042598 143161714
55 local_load_transactions Local Load Transactions 0 0 0
55 local_store_transactions Local Store Transactions 0 0 0
55 gld_transactions Global Load Transactions 117623940 117642992 117634499
55 gst_transactions Global Store Transactions 15782400 15782400 15782400
55 sysmem_read_transactions System Memory Read Transactions 0 0 0
55 sysmem_write_transactions System Memory Write Transactions 5 6 5
55 l2_read_transactions L2 Read Transactions 105764705 105811683 105791451
55 l2_write_transactions L2 Write Transactions 16132844 24093821 16863459
55 dram_read_transactions Device Memory Read Transactions 98968200 99165170 98983348
55 dram_write_transactions Device Memory Write Transactions 14771447 22709177 15065175
55 global_hit_rate Global Hit Rate in unified l1/tex 13.71% 13.78% 13.75%
55 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
55 gld_requested_throughput Requested Global Load Throughput 694.81GB/s 704.11GB/s 699.34GB/s
55 gst_requested_throughput Requested Global Store Throughput 90.236GB/s 91.443GB/s 90.824GB/s
55 gld_throughput Global Load Throughput 737.15GB/s 747.03GB/s 741.95GB/s
55 gst_throughput Global Store Throughput 98.898GB/s 100.22GB/s 99.543GB/s
55 local_memory_overhead Local Memory Overhead 5.30% 5.38% 5.34%
55 tex_cache_hit_rate Unified Cache Hit Rate 11.49% 11.53% 11.51%
55 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 12.19% 12.38% 12.34%
55 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 28.86% 30.14% 29.82%
55 dram_read_throughput Device Memory Read Throughput 620.24GB/s 628.55GB/s 624.31GB/s
55 dram_write_throughput Device Memory Write Throughput 92.615GB/s 142.94GB/s 95.019GB/s
55 tex_cache_throughput Unified cache to SM throughput 3618.8GB/s 3667.2GB/s 3642.4GB/s
55 l2_tex_read_throughput L2 Throughput (Texture Reads) 662.85GB/s 671.76GB/s 667.22GB/s
55 l2_tex_write_throughput L2 Throughput (Texture Writes) 98.898GB/s 100.22GB/s 99.543GB/s
55 l2_read_throughput L2 Throughput (Reads) 662.91GB/s 671.84GB/s 667.25GB/s
55 l2_write_throughput L2 Throughput (Writes) 101.13GB/s 151.39GB/s 106.36GB/s
55 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 sysmem_write_throughput System Memory Write Throughput 32.854KB/s 39.922KB/s 33.548KB/s
55 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
55 shared_load_throughput Shared Memory Load Throughput 3558.5GB/s 3679.2GB/s 3611.8GB/s
55 shared_store_throughput Shared Memory Store Throughput 290.21GB/s 295.79GB/s 292.71GB/s
55 gld_efficiency Global Memory Load Efficiency 94.25% 94.27% 94.26%
55 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24%
55 tex_cache_transactions Unified cache to SM transactions 144366492 144381582 144374878
55 flop_count_dp Floating Point Operations(Double Precision) 5700777984 5700777984 5700777984
55 flop_count_dp_add Floating Point Operations(Double Precision Add) 149393408 149393408 149393408
55 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 2028079104 2028079104 2028079104
55 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 1495226368 1495226368 1495226368
55 flop_count_sp Floating Point Operations(Single Precision) 36026368 36026368 36026368
55 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
55 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 9600000 9600000 9600000
55 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 16826368 16826368 16826368
55 flop_count_sp_special Floating Point Operations(Single Precision Special) 54039552 54039552 54039552
55 inst_executed Instructions Executed 390794240 951492608 655851650
55 inst_issued Instructions Issued 390844085 390853161 390847224
55 dram_utilization Device Memory Utilization High (9) Max (10) High (9)
55 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
55 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 1.96% 2.34% 2.18%
55 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 14.21% 15.27% 14.85%
55 stall_memory_dependency Issue Stall Reasons (Data Request) 39.86% 42.88% 40.73%
55 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
55 stall_sync Issue Stall Reasons (Synchronization) 13.02% 14.81% 13.69%
55 stall_other Issue Stall Reasons (Other) 0.72% 0.76% 0.75%
55 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.02% 0.07% 0.03%
55 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 15.68% 17.29% 16.30%
55 shared_efficiency Shared Memory Efficiency 26.73% 27.44% 27.06%
55 inst_fp_32 FP Instructions(Single) 156163072 156163072 156163072
55 inst_fp_64 FP Instructions(Double) 3701498880 3701498880 3701498880
55 inst_integer Integer Instructions 4432111616 4432111616 4432111616
55 inst_bit_convert Bit-Convert Instructions 33652736 33652736 33652736
55 inst_control Control-Flow Instructions 458880000 458880000 458880000
55 inst_compute_ld_st Load/Store Instructions 2413440000 2413440000 2413440000
55 inst_misc Misc Instructions 688093184 688093184 688093184
55 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
55 issue_slots Issue Slots 390844085 390853161 390847224
55 cf_issued Issued Control-Flow Instructions 23408640 23408640 23408640
55 cf_executed Executed Control-Flow Instructions 23408640 23408640 23408640
55 ldst_issued Issued Load/Store Instructions 81408000 81408000 81408000
55 ldst_executed Executed Load/Store Instructions 81408000 81408000 81408000
55 atomic_transactions Atomic Transactions 0 0 0
55 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
55 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
55 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
55 l2_tex_read_transactions L2 Transactions (Texture Reads) 105760813 105812749 105787169
55 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 9.28% 10.79% 9.76%
55 stall_not_selected Issue Stall Reasons (Not Selected) 1.62% 1.71% 1.69%
55 l2_tex_write_transactions L2 Transactions (Texture Writes) 15782400 15782400 15782400
55 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
55 nvlink_total_data_received NVLink Total Data Received 864 864 864
55 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
55 nvlink_user_data_received NVLink User Data Received 0 0 0
55 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
55 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
55 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
55 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
55 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
55 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
55 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
55 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
55 nvlink_transmit_throughput NVLink Transmit Throughput 236.55KB/s 239.71KB/s 238.09KB/s
55 nvlink_receive_throughput NVLink Receive Throughput 177.41KB/s 179.78KB/s 178.57KB/s
55 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288
55 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
55 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
55 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
55 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
55 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
55 inst_fp_16 HP Instructions(Half) 0 0 0
55 ipc Executed IPC 0.571525 0.738231 0.662514
55 issued_ipc Issued IPC 0.633453 0.682593 0.675725
55 issue_slot_utilization Issue Slot Utilization 15.84% 17.06% 16.89%
55 sm_efficiency Multiprocessor Activity 99.65% 99.91% 99.85%
55 achieved_occupancy Achieved Occupancy 0.247745 0.247841 0.247791
55 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.873599 0.941228 0.931401
55 shared_utilization Shared Memory Utilization Low (1) Low (2) Low (1)
55 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1)
55 tex_utilization Unified Cache Utilization Low (3) Low (3) Low (3)
55 ldst_fu_utilization Load/Store Function Unit Utilization Low (2) Low (2) Low (2)
55 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
55 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
55 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
55 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
55 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (2) Low (3) Low (2)
55 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
55 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.02% 0.05% 0.05%
55 flop_dp_efficiency FLOP Efficiency(Peak Double) 5.70% 15.52% 14.83%
55 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
55 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
55 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
55 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
55 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_knl_reverse_indefinite_stack_integral__4
56 inst_per_warp Instructions per warp 2.3300e+03 2.3300e+03 2.3300e+03
56 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
56 warp_execution_efficiency Warp Execution Efficiency 78.12% 78.12% 78.12%
56 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 77.18% 77.18% 77.18%
56 inst_replay_overhead Instruction Replay Overhead 0.002229 0.005157 0.003383
56 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
56 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
56 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
56 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
56 gld_transactions_per_request Global Load Transactions Per Request 6.640783 6.641474 6.641229
56 gst_transactions_per_request Global Store Transactions Per Request 7.000000 7.000000 7.000000
56 shared_store_transactions Shared Store Transactions 0 0 0
56 shared_load_transactions Shared Load Transactions 0 0 0
56 local_load_transactions Local Load Transactions 0 0 0
56 local_store_transactions Local Store Transactions 0 0 0
56 gld_transactions Global Load Transactions 2556861 2557127 2557032
56 gst_transactions Global Store Transactions 2688000 2688000 2688000
56 sysmem_read_transactions System Memory Read Transactions 0 0 0
56 sysmem_write_transactions System Memory Write Transactions 5 5 5
56 l2_read_transactions L2 Read Transactions 2469356 2471389 2470013
56 l2_write_transactions L2 Write Transactions 2911900 3747163 3026850
56 dram_read_transactions Device Memory Read Transactions 2473053 2476371 2473825
56 dram_write_transactions Device Memory Write Transactions 2616351 3431287 2709292
56 global_hit_rate Global Hit Rate in unified l1/tex 18.61% 18.64% 18.62%
56 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
56 gld_requested_throughput Requested Global Load Throughput 221.00GB/s 224.94GB/s 222.62GB/s
56 gst_requested_throughput Requested Global Store Throughput 220.41GB/s 224.34GB/s 222.03GB/s
56 gld_throughput Global Load Throughput 234.84GB/s 239.02GB/s 236.55GB/s
56 gst_throughput Global Store Throughput 246.86GB/s 251.26GB/s 248.67GB/s
56 local_memory_overhead Local Memory Overhead 17.22% 17.26% 17.24%
56 tex_cache_hit_rate Unified Cache Hit Rate 4.19% 4.20% 4.19%
56 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.90% 6.94% 6.92%
56 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 28.57% 28.57% 28.57%
56 dram_read_throughput Device Memory Read Throughput 227.18GB/s 231.22GB/s 228.85GB/s
56 dram_write_throughput Device Memory Write Throughput 241.23GB/s 318.21GB/s 250.64GB/s
56 tex_cache_throughput Unified cache to SM throughput 338.51GB/s 344.54GB/s 340.99GB/s
56 l2_tex_read_throughput L2 Throughput (Texture Reads) 226.79GB/s 230.82GB/s 228.45GB/s
56 l2_tex_write_throughput L2 Throughput (Texture Writes) 246.86GB/s 251.26GB/s 248.67GB/s
56 l2_read_throughput L2 Throughput (Reads) 226.79GB/s 230.95GB/s 228.50GB/s
56 l2_write_throughput L2 Throughput (Writes) 267.45GB/s 348.88GB/s 280.02GB/s
56 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 sysmem_write_throughput System Memory Write Throughput 481.49KB/s 490.07KB/s 485.02KB/s
56 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 gld_efficiency Global Memory Load Efficiency 94.11% 94.12% 94.11%
56 gst_efficiency Global Memory Store Efficiency 89.29% 89.29% 89.29%
56 tex_cache_transactions Unified cache to SM transactions 921370 921595 921482
56 flop_count_dp Floating Point Operations(Double Precision) 9600000 9600000 9600000
56 flop_count_dp_add Floating Point Operations(Double Precision Add) 9600000 9600000 9600000
56 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 0 0 0
56 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 0 0 0
56 flop_count_sp Floating Point Operations(Single Precision) 0 0 0
56 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
56 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0
56 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
56 flop_count_sp_special Floating Point Operations(Single Precision Special) 0 0 0
56 inst_executed Instructions Executed 1730560 2385920 2034834
56 inst_issued Instructions Issued 1734417 1739485 1736259
56 dram_utilization Device Memory Utilization Mid (6) High (7) Mid (6)
56 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
56 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.01% 0.04% 0.02%
56 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 1.09% 1.21% 1.18%
56 stall_memory_dependency Issue Stall Reasons (Data Request) 98.45% 98.70% 98.54%
56 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
56 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
56 stall_other Issue Stall Reasons (Other) 0.01% 0.01% 0.01%
56 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.04% 0.18% 0.10%
56 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.00% 0.02% 0.01%
56 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
56 inst_fp_32 FP Instructions(Single) 0 0 0
56 inst_fp_64 FP Instructions(Double) 9600000 9600000 9600000
56 inst_integer Integer Instructions 13900800 13900800 13900800
56 inst_bit_convert Bit-Convert Instructions 0 0 0
56 inst_control Control-Flow Instructions 204800 204800 204800
56 inst_compute_ld_st Load/Store Instructions 19225600 19225600 19225600
56 inst_misc Misc Instructions 204800 204800 204800
56 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
56 issue_slots Issue Slots 1734417 1739485 1736259
56 cf_issued Issued Control-Flow Instructions 13312 13312 13312
56 cf_executed Executed Control-Flow Instructions 13312 13312 13312
56 ldst_issued Issued Load/Store Instructions 772096 772096 772096
56 ldst_executed Executed Load/Store Instructions 772096 772096 772096
56 atomic_transactions Atomic Transactions 0 0 0
56 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
56 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
56 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
56 l2_tex_read_transactions L2 Transactions (Texture Reads) 2469232 2469614 2469428
56 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.10% 0.13% 0.11%
56 stall_not_selected Issue Stall Reasons (Not Selected) 0.02% 0.03% 0.02%
56 l2_tex_write_transactions L2 Transactions (Texture Writes) 2688000 2688000 2688000
56 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
56 nvlink_total_data_received NVLink Total Data Received 864 864 864
56 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
56 nvlink_user_data_received NVLink User Data Received 0 0 0
56 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
56 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
56 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
56 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
56 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
56 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
56 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
56 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
56 nvlink_transmit_throughput NVLink Transmit Throughput 3.3855MB/s 3.4458MB/s 3.4103MB/s
56 nvlink_receive_throughput NVLink Receive Throughput 2.5391MB/s 2.5844MB/s 2.5577MB/s
56 nvlink_total_response_data_received NVLink Total Response Data Received 288 384 289
56 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
56 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
56 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
56 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
56 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
56 inst_fp_16 HP Instructions(Half) 0 0 0
56 ipc Executed IPC 0.041735 0.057263 0.050605
56 issued_ipc Issued IPC 0.041797 0.046456 0.045358
56 issue_slot_utilization Issue Slot Utilization 1.04% 1.16% 1.13%
56 sm_efficiency Multiprocessor Activity 92.21% 97.30% 95.95%
56 achieved_occupancy Achieved Occupancy 0.197203 0.197455 0.197328
56 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.044212 0.050012 0.048370
56 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
56 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1)
56 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
56 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
56 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
56 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
56 special_fu_utilization Special Function Unit Utilization Idle (0) Idle (0) Idle (0)
56 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
56 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
56 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1)
56 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
56 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00%
56 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.31% 0.39% 0.37%
56 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
56 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
56 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
56 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
56 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_initauxstate__1
1 inst_per_warp Instructions per warp 340.790000 340.790000 340.790000
1 branch_efficiency Branch Efficiency 99.92% 99.92% 99.92%
1 warp_execution_efficiency Warp Execution Efficiency 97.61% 97.61% 97.61%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 94.36% 94.36% 94.36%
1 inst_replay_overhead Instruction Replay Overhead 0.000505 0.000505 0.000505
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 8.230789 8.230789 8.230789
1 gst_transactions_per_request Global Store Transactions Per Request 8.562500 8.562500 8.562500
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 17699488 17699488 17699488
1 gst_transactions Global Store Transactions 21043200 21043200 21043200
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 5 5 5
1 l2_read_transactions L2 Read Transactions 16914451 16914451 16914451
1 l2_write_transactions L2 Write Transactions 21870131 21870131 21870131
1 dram_read_transactions Device Memory Read Transactions 17061180 17061180 17061180
1 dram_write_transactions Device Memory Write Transactions 19995184 19995184 19995184
1 global_hit_rate Global Hit Rate in unified l1/tex 41.03% 41.03% 41.03%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 332.35GB/s 332.35GB/s 332.35GB/s
1 gst_requested_throughput Requested Global Store Throughput 379.83GB/s 379.83GB/s 379.83GB/s
1 gld_throughput Global Load Throughput 350.15GB/s 350.15GB/s 350.15GB/s
1 gst_throughput Global Store Throughput 416.30GB/s 416.30GB/s 416.30GB/s
1 local_memory_overhead Local Memory Overhead 39.77% 39.77% 39.77%
1 tex_cache_hit_rate Unified Cache Hit Rate 4.08% 4.08% 4.08%
1 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.81% 6.81% 6.81%
1 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 53.96% 53.96% 53.96%
1 dram_read_throughput Device Memory Read Throughput 337.52GB/s 337.52GB/s 337.52GB/s
1 dram_write_throughput Device Memory Write Throughput 395.56GB/s 395.56GB/s 395.56GB/s
1 tex_cache_throughput Unified cache to SM throughput 409.10GB/s 409.10GB/s 409.10GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 334.61GB/s 334.61GB/s 334.61GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 416.30GB/s 416.30GB/s 416.30GB/s
1 l2_read_throughput L2 Throughput (Reads) 334.62GB/s 334.62GB/s 334.62GB/s
1 l2_write_throughput L2 Throughput (Writes) 432.66GB/s 432.66GB/s 432.66GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 103.72KB/s 103.72KB/s 103.72KB/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 94.92% 94.92% 94.92%
1 gst_efficiency Global Memory Store Efficiency 91.24% 91.24% 91.24%
1 tex_cache_transactions Unified cache to SM transactions 5169878 5169878 5169878
1 flop_count_dp Floating Point Operations(Double Precision) 76108800 76108800 76108800
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 22118400 22118400 22118400
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 21888000 21888000 21888000
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 10214400 10214400 10214400
1 flop_count_sp Floating Point Operations(Single Precision) 2918400 2918400 2918400
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 1459200 1459200 1459200
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 1459200 1459200 1459200
1 inst_executed Instructions Executed 104690688 104690688 104690688
1 inst_issued Instructions Issued 36424653 36424653 36424653
1 dram_utilization Device Memory Utilization High (9) High (9) High (9)
1 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.16% 0.16% 0.16%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 2.04% 2.04% 2.04%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 22.26% 22.26% 22.26%
1 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 0.11% 0.11% 0.11%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.04% 0.04% 0.04%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.12% 0.12% 0.12%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 52992000 52992000 52992000
1 inst_fp_64 FP Instructions(Double) 84480000 84480000 84480000
1 inst_integer Integer Instructions 663628800 663628800 663628800
1 inst_bit_convert Bit-Convert Instructions 4377600 4377600 4377600
1 inst_control Control-Flow Instructions 41318400 41318400 41318400
1 inst_compute_ld_st Load/Store Instructions 152755200 152755200 152755200
1 inst_misc Misc Instructions 105600000 105600000 105600000
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 36424653 36424653 36424653
1 cf_issued Issued Control-Flow Instructions 2575360 2575360 2575360
1 cf_executed Executed Control-Flow Instructions 2575360 2575360 2575360
1 ldst_issued Issued Load/Store Instructions 5222400 5222400 5222400
1 ldst_executed Executed Load/Store Instructions 5222400 5222400 5222400
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 16914169 16914169 16914169
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 74.77% 74.77% 74.77%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.50% 0.50% 0.50%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 21043200 21043200 21043200
1 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
1 nvlink_total_data_received NVLink Total Data Received 864 864 864
1 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
1 nvlink_user_data_received NVLink User Data Received 0 0 0
1 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
1 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
1 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
1 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
1 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
1 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
1 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
1 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
1 nvlink_transmit_throughput NVLink Transmit Throughput 746.78KB/s 746.78KB/s 746.78KB/s
1 nvlink_receive_throughput NVLink Receive Throughput 560.09KB/s 560.09KB/s 560.09KB/s
1 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288
1 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 ipc Executed IPC 0.210375 0.210375 0.210375
1 issued_ipc Issued IPC 0.231426 0.231426 0.231426
1 issue_slot_utilization Issue Slot Utilization 5.79% 5.79% 5.79%
1 sm_efficiency Multiprocessor Activity 99.51% 99.51% 99.51%
1 achieved_occupancy Achieved Occupancy 0.519094 0.519094 0.519094
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.412945 0.412945 0.412945
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 l2_utilization L2 Cache Utilization Low (1) Low (1) Low (1)
1 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.01% 0.01% 0.01%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 0.35% 0.35% 0.35%
1 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
1 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
1 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
1 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
1 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
Kernel: ptxcall_knl_indefinite_stack_integral__3
56 inst_per_warp Instructions per warp 1.4196e+04 1.4196e+04 1.4196e+04
56 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
56 warp_execution_efficiency Warp Execution Efficiency 78.12% 78.12% 78.12%
56 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 76.59% 76.59% 76.59%
56 inst_replay_overhead Instruction Replay Overhead 0.001369 0.001759 0.001572
56 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 1.972167 1.981350 1.976729
56 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 2.006250 2.021094 2.010676
56 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
56 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
56 gld_transactions_per_request Global Load Transactions Per Request 6.533448 6.533963 6.533725
56 gst_transactions_per_request Global Store Transactions Per Request 7.000000 7.000000 7.000000
56 shared_store_transactions Shared Store Transactions 10272 10348 10294
56 shared_load_transactions Shared Load Transactions 1969012 1978180 1973566
56 local_load_transactions Local Load Transactions 0 0 0
56 local_store_transactions Local Store Transactions 0 0 0
56 gld_transactions Global Load Transactions 7559983 7560579 7560304
56 gst_transactions Global Store Transactions 2688000 2688000 2688000
56 sysmem_read_transactions System Memory Read Transactions 0 0 0
56 sysmem_write_transactions System Memory Write Transactions 5 6 5
56 l2_read_transactions L2 Read Transactions 7384278 7391794 7387169
56 l2_write_transactions L2 Write Transactions 2872413 3760617 2962915
56 dram_read_transactions Device Memory Read Transactions 7831373 7833632 7832140
56 dram_write_transactions Device Memory Write Transactions 2537417 3394663 2648592
56 global_hit_rate Global Hit Rate in unified l1/tex 13.47% 13.48% 13.48%
56 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
56 gld_requested_throughput Requested Global Load Throughput 425.73GB/s 431.69GB/s 429.07GB/s
56 gst_requested_throughput Requested Global Store Throughput 141.78GB/s 143.77GB/s 142.90GB/s
56 gld_throughput Global Load Throughput 446.64GB/s 452.90GB/s 450.14GB/s
56 gst_throughput Global Store Throughput 158.80GB/s 161.02GB/s 160.04GB/s
56 local_memory_overhead Local Memory Overhead 11.96% 11.97% 11.96%
56 tex_cache_hit_rate Unified Cache Hit Rate 6.41% 6.42% 6.41%
56 l2_tex_read_hit_rate L2 Hit Rate (Texture Reads) 6.39% 6.40% 6.39%
56 l2_tex_write_hit_rate L2 Hit Rate (Texture Writes) 25.71% 25.71% 25.71%
56 dram_read_throughput Device Memory Read Throughput 462.72GB/s 469.21GB/s 466.33GB/s
56 dram_write_throughput Device Memory Write Throughput 150.04GB/s 202.05GB/s 157.70GB/s
56 tex_cache_throughput Unified cache to SM throughput 1059.0GB/s 1073.8GB/s 1067.3GB/s
56 l2_tex_read_throughput L2 Throughput (Texture Reads) 436.24GB/s 442.37GB/s 439.66GB/s
56 l2_tex_write_throughput L2 Throughput (Texture Writes) 158.80GB/s 161.02GB/s 160.04GB/s
56 l2_read_throughput L2 Throughput (Reads) 436.23GB/s 442.37GB/s 439.83GB/s
56 l2_write_throughput L2 Throughput (Writes) 170.65GB/s 223.56GB/s 176.41GB/s
56 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 sysmem_write_throughput System Memory Write Throughput 309.73KB/s 374.94KB/s 313.27KB/s
56 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
56 shared_load_throughput Shared Memory Load Throughput 466.37GB/s 473.74GB/s 470.02GB/s
56 shared_store_throughput Shared Memory Store Throughput 2.4311GB/s 2.4726GB/s 2.4518GB/s
56 gld_efficiency Global Memory Load Efficiency 95.32% 95.32% 95.32%
56 gst_efficiency Global Memory Store Efficiency 89.29% 89.29% 89.29%
56 tex_cache_transactions Unified cache to SM transactions 4481199 4481734 4481505
56 flop_count_dp Floating Point Operations(Double Precision) 124800000 124800000 124800000
56 flop_count_dp_add Floating Point Operations(Double Precision Add) 0 0 0
56 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 48000000 48000000 48000000
56 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 28800000 28800000 28800000
56 flop_count_sp Floating Point Operations(Single Precision) 0 0 0
56 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
56 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 0 0 0
56 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
56 flop_count_sp_special Floating Point Operations(Single Precision Special) 0 0 0
56 inst_executed Instructions Executed 9183232 14536704 11381979
56 inst_issued Instructions Issued 9195807 9199313 9197578
56 dram_utilization Device Memory Utilization High (8) High (8) High (8)
56 sysmem_utilization System Memory Utilization Low (1) Low (1) Low (1)
56 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.23% 1.53% 0.84%
56 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 7.12% 8.90% 7.60%
56 stall_memory_dependency Issue Stall Reasons (Data Request) 87.51% 90.75% 89.64%
56 stall_texture Issue Stall Reasons (Texture) 0.00% 0.00% 0.00%
56 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.01% 0.01%
56 stall_other Issue Stall Reasons (Other) 0.23% 0.29% 0.25%
56 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.14% 0.29% 0.21%
56 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.19% 0.24% 0.21%
56 shared_efficiency Shared Memory Efficiency 6.12% 6.14% 6.13%
56 inst_fp_32 FP Instructions(Single) 0 0 0
56 inst_fp_64 FP Instructions(Double) 76800000 76800000 76800000
56 inst_integer Integer Instructions 70988800 70988800 70988800
56 inst_bit_convert Bit-Convert Instructions 0 0 0
56 inst_control Control-Flow Instructions 4147200 4147200 4147200
56 inst_compute_ld_st Load/Store Instructions 63616000 63616000 63616000
56 inst_misc Misc Instructions 11648000 11648000 11648000
56 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
56 issue_slots Issue Slots 9195807 9199313 9197578
56 cf_issued Issued Control-Flow Instructions 256000 256000 256000
56 cf_executed Executed Control-Flow Instructions 256000 256000 256000
56 ldst_issued Issued Load/Store Instructions 2555904 2555904 2555904
56 ldst_executed Executed Load/Store Instructions 2555904 2555904 2555904
56 atomic_transactions Atomic Transactions 0 0 0
56 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
56 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
56 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
56 l2_tex_read_transactions L2 Transactions (Texture Reads) 7383769 7384688 7384316
56 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.99% 1.20% 1.10%
56 stall_not_selected Issue Stall Reasons (Not Selected) 0.13% 0.17% 0.14%
56 l2_tex_write_transactions L2 Transactions (Texture Writes) 2688000 2688000 2688000
56 nvlink_total_data_transmitted NVLink Total Data Transmitted 1152 1152 1152
56 nvlink_total_data_received NVLink Total Data Received 864 864 864
56 nvlink_user_data_transmitted NVLink User Data Transmitted 0 0 0
56 nvlink_user_data_received NVLink User Data Received 0 0 0
56 nvlink_overhead_data_transmitted NVLink Overhead Data Transmitted 1.00% 1.00% 1.00%
56 nvlink_overhead_data_received NVLink Overhead Data Received 1.00% 1.00% 1.00%
56 nvlink_total_nratom_data_transmitted NVLink Total Nratom Data Transmitted 0 0 0
56 nvlink_user_nratom_data_transmitted NVLink User Nratom Data Transmitted 0 0 0
56 nvlink_total_ratom_data_transmitted NVLink Total Ratom Data Transmitted 0 0 0
56 nvlink_user_ratom_data_transmitted NVLink User Ratom Data Transmitted 0 0 0
56 nvlink_total_write_data_transmitted NVLink Total Write Data Transmitted 0 0 0
56 nvlink_user_write_data_transmitted NVLink User Write Data Transmitted 0 0 0
56 nvlink_transmit_throughput NVLink Transmit Throughput 2.1778MB/s 2.2083MB/s 2.1949MB/s
56 nvlink_receive_throughput NVLink Receive Throughput 1.6333MB/s 1.6562MB/s 1.6462MB/s
56 nvlink_total_response_data_received NVLink Total Response Data Received 288 288 288
56 nvlink_user_response_data_received NVLink User Response Data Received 0 0 0
56 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
56 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
56 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
56 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
56 inst_fp_16 HP Instructions(Half) 0 0 0
56 ipc Executed IPC 0.156469 0.255796 0.214442
56 issued_ipc Issued IPC 0.159007 0.175934 0.169374
56 issue_slot_utilization Issue Slot Utilization 3.98% 4.40% 4.23%
56 sm_efficiency Multiprocessor Activity 86.37% 91.45% 88.69%
56 achieved_occupancy Achieved Occupancy 0.109154 0.117179 0.113866
56 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.166729 0.207185 0.179912
56 shared_utilization Shared Memory Utilization Low (1) Low (1) Low (1)
56 l2_utilization L2 Cache Utilization Low (1) Low (2) Low (1)
56 tex_utilization Unified Cache Utilization Low (1) Low (1) Low (1)
56 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
56 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
56 tex_fu_utilization Texture Function Unit Utilization Idle (0) Idle (0) Idle (0)
56 special_fu_utilization Special Function Unit Utilization Idle (0) Idle (0) Idle (0)
56 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
56 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
56 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (1) Low (1) Low (1)
56 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
56 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00%
56 flop_dp_efficiency FLOP Efficiency(Peak Double) 2.81% 3.40% 3.15%
56 sysmem_read_utilization System Memory Read Utilization Idle (0) Idle (0) Idle (0)
56 sysmem_write_utilization System Memory Write Utilization Low (1) Low (1) Low (1)
56 nvlink_data_transmission_efficiency NVLink Data Transmission Efficiency 0.00% 0.00% 0.00%
56 nvlink_data_receive_efficiency NVLink Data Receive Efficiency 0.00% 0.00% 0.00%
56 stall_sleeping Issue Stall Reasons (Sleeping) 0.00% 0.00% 0.00%
==23586== NVPROF is profiling process 23586, command: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl
[ Info: ------------------------------------------------------
[ Info: ______ _ _____ __ ________
[ Info: | ____| | |_ _| ... | __ |
[ Info: | | | | | | | . | | | |
[ Info: | | | | | | | | | | |__| |
[ Info: | |____| |____ _| |_| | | | | | |
[ Info: | _____|______|_____|_| |_|_| |_|
[ Info:
[ Info: ------------------------------------------------------
[ Info: Dycoms
[ Info: Resolution:
[ Info: (Δx, Δy, Δz) = (3.00e+01, 3.00e+01, 5.00e+00)
[ Info: (Nex, Ney, Nez) = (32, 32, 75)
[ Info: DoF = 57600000
[ Info: Minimum necessary memory to run this test: 3.84 GBs
[ Info: Time step dt: 2.50e-03
[ Info: End time t : 2.50e-02
[ Info: ------------------------------------------------------
┌ Info: Starting...
└ norm(Q) = 5.5625443922177753e+09
┌ Info: Update
│ simtime = 2.5000000000000001e-03
└ runtime = 00:00:15
┌ Info: Finished...
└ norm(Q) = 5.5624841917912407e+09
─────────────────────────────────────────────────────────────────────────────
                                      Time                  Allocations
                             ──────────────────────   ───────────────────────
Tot / % measured:                 220s / 83.4%            29.9GiB / 87.0%
Section              ncalls     time   %tot     avg     alloc   %tot      avg
─────────────────────────────────────────────────────────────────────────────
IC init                   1     116s  63.4%    116s   12.5GiB  47.9%  12.5GiB
Grid init                 1    20.2s  11.0%   20.2s   6.63GiB  25.5%  6.63GiB
solve                     1    15.4s  8.41%   15.4s   1.76GiB  6.79%  1.76GiB
Space Disc init           1    13.2s  7.19%   13.2s   2.09GiB  8.05%  2.09GiB
Topo init                 1    12.2s  6.65%   12.2s   1.30GiB  4.99%  1.30GiB
initial integral          1    4.36s  2.38%   4.36s    507MiB  1.91%   507MiB
Time stepping init        1    1.72s  0.94%   1.72s   1.25GiB  4.82%  1.25GiB
─────────────────────────────────────────────────────────────────────────────
==23586== Profiling application: julia --project=env/gpu test/DGmethods/compressible_Navier_Stokes/dycoms3d.jl
==23586== Profiling result:
Start (us) Duration (us) Grid Size Block Size Regs* SSMem* (KB) DSMem* (B) Size (GB) Throughput (GB/s) SrcMemType DstMemType Device Context Stream Name
6.01e+07 1.44e+05 - - - - - 1.072884 7.468494 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.14e+07 6.03e+04 - - - - - 0.429153 7.117702 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.20e+07 351.6790 - - - - - 3.43e-03 9.762390 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.25e+07 1.17e+04 - - - - - 0.085831 7.322073 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.26e+07 1.19e+04 - - - - - 0.085831 7.218210 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.33e+07 1.952000 - - - - - 1.86e-07 0.095422 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.33e+07 1.568000 - - - - - 1.86e-07 0.118791 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
6.38e+07 7.28e+05 - - - - - 1.072884 1.473239 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
7.64e+07 1.50e+03 (76800 1 1) (125 1 1) 54 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_initauxstate__1 [62]
7.71e+07 7.52e+05 - - - - - 1.072884 1.426476 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
7.84e+07 7.52e+05 - - - - - 1.072884 1.426073 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
7.93e+07 2.99e+05 - - - - - 0.429153 1.436141 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
7.97e+07 4.02e+05 - - - - - 0.572205 1.424482 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
1.93e+08 5.86e+04 - - - - - 0.429153 7.326053 Pageable Device Tesla V100-SXM2 1 7 [CUDA memcpy HtoD]
1.94e+08 518.8450 (225000 1 1) (256 1 1) 16 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_anonymous19_2 [81]
1.94e+08 7.35e+05 - - - - - 1.072884 1.460133 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
1.95e+08 2.94e+05 - - - - - 0.429153 1.462129 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
1.99e+08 498.3020 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [96]
2.00e+08 322.5580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [107]
2.05e+08 3.47e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [124]
2.05e+08 501.6300 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [127]
2.05e+08 317.9180 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [130]
2.07e+08 2.91e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [141]
2.10e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [152]
2.12e+08 4.68e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [163]
2.14e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [174]
2.15e+08 2.50e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [185]
2.15e+08 3.48e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [188]
2.15e+08 505.7900 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [191]
2.15e+08 321.8220 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [194]
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [197]
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [200]
2.15e+08 4.68e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [203]
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [206]
2.15e+08 2.49e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [209]
2.15e+08 3.47e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [212]
2.15e+08 497.6300 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [215]
2.15e+08 319.7740 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [218]
2.15e+08 2.91e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [221]
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [224]
2.15e+08 4.68e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [227]
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [230]
2.15e+08 2.49e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [233]
2.15e+08 3.47e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [236]
2.15e+08 500.4780 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [239]
2.15e+08 327.6460 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [242]
2.15e+08 2.91e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [245]
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [248]
2.15e+08 4.67e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [251]
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [254]
2.15e+08 2.43e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [257]
2.15e+08 3.16e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [260]
2.15e+08 498.7170 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [263]
2.15e+08 316.3830 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [266]
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [269]
2.15e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [272]
2.15e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [275]
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [278]
2.15e+08 2.43e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [281]
2.15e+08 3.15e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [284]
2.15e+08 496.1260 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [287]
2.15e+08 315.7430 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [290]
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [293]
2.15e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [296]
2.15e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [299]
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [302]
2.15e+08 2.43e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [305]
2.15e+08 3.16e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [308]
2.15e+08 497.0860 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [311]
2.15e+08 316.4790 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [314]
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [317]
2.15e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [320]
2.15e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [323]
2.15e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [326]
2.15e+08 2.42e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [329]
2.15e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [332]
2.15e+08 497.9820 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [335]
2.15e+08 316.2860 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [338]
2.15e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [341]
2.15e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [344]
2.15e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [347]
2.15e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [350]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [353]
2.16e+08 2.97e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [356]
2.16e+08 493.4380 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [359]
2.16e+08 314.9420 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [362]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [365]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [368]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [371]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [374]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [377]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [380]
2.16e+08 496.5410 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [383]
2.16e+08 316.1270 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [386]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [389]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [392]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [395]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [398]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [401]
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [404]
2.16e+08 494.9100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [407]
2.16e+08 314.0460 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [410]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [413]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [416]
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [419]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [422]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [425]
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [428]
2.16e+08 499.0060 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [431]
2.16e+08 315.3900 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [434]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [437]
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [440]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [443]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [446]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [449]
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [452]
2.16e+08 499.8370 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [455]
2.16e+08 315.6150 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [458]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [461]
2.16e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [464]
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [467]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [470]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [473]
2.16e+08 3.00e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [476]
2.16e+08 495.3260 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [479]
2.16e+08 314.9750 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [482]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [485]
2.16e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [488]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [491]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [494]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [497]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [500]
2.16e+08 496.3810 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [503]
2.16e+08 315.8710 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [506]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [509]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [512]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [515]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [518]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [521]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [524]
2.16e+08 497.4700 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [527]
2.16e+08 321.9820 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [530]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [533]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [536]
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [539]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [542]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [545]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [548]
2.16e+08 495.6460 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [551]
2.16e+08 314.9420 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [554]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [557]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [560]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [563]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [566]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [569]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [572]
2.16e+08 494.5580 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [575]
2.16e+08 319.4540 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [578]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [581]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [584]
2.16e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [587]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [590]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [593]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [596]
2.16e+08 494.4930 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [599]
2.16e+08 315.3590 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [602]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [605]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [608]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [611]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [614]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [617]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [620]
2.16e+08 495.4210 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [623]
2.16e+08 316.4470 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [626]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [629]
2.16e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [632]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [635]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [638]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [641]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [644]
2.16e+08 496.3490 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [647]
2.16e+08 314.6550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [650]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [653]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [656]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [659]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [662]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [665]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [668]
2.16e+08 494.7500 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [671]
2.16e+08 317.1500 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [674]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [677]
2.16e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [680]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [683]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [686]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [689]
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [692]
2.16e+08 494.9740 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [695]
2.16e+08 311.7100 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [698]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [701]
2.16e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [704]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [707]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [710]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [713]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [716]
2.16e+08 492.5420 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [719]
2.16e+08 316.9580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [722]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [725]
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [728]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [731]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [734]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [737]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [740]
2.16e+08 497.5970 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [743]
2.16e+08 317.0230 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [746]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [749]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [752]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [755]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [758]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [761]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [764]
2.16e+08 495.5170 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [767]
2.16e+08 313.7270 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [770]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [773]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [776]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [779]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [782]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [785]
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [788]
2.16e+08 494.9100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [791]
2.16e+08 322.0460 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [794]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [797]
2.16e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [800]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [803]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [806]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [809]
2.16e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [812]
2.16e+08 494.9420 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [815]
2.16e+08 314.8140 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [818]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [821]
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [824]
2.16e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [827]
2.16e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [830]
2.16e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [833]
2.16e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [836]
2.16e+08 495.3890 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [839]
2.16e+08 319.9990 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [842]
2.16e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [845]
2.16e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [848]
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [851]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [854]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [857]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [860]
2.17e+08 495.7100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [863]
2.17e+08 316.1580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [866]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [869]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [872]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [875]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [878]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [881]
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [884]
2.17e+08 500.9580 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [887]
2.17e+08 315.5820 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [890]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [893]
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [896]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [899]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [902]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [905]
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [908]
2.17e+08 495.6780 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [911]
2.17e+08 317.7580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [914]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [917]
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [920]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [923]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [926]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [929]
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [932]
2.17e+08 496.0610 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [935]
2.17e+08 323.8070 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [938]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [941]
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [944]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [947]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [950]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [953]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [956]
2.17e+08 494.3330 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [959]
2.17e+08 315.3270 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [962]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [965]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [968]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [971]
2.17e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [974]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [977]
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [980]
2.17e+08 496.5410 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [983]
2.17e+08 318.9430 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [986]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [989]
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [992]
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [995]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [998]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1001]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1004]
2.17e+08 494.0450 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1007]
2.17e+08 315.0070 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1010]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1013]
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1016]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1019]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1022]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1025]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1028]
2.17e+08 496.6060 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1031]
2.17e+08 318.8140 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1034]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1037]
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1040]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1043]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1046]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1049]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1052]
2.17e+08 495.2930 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1055]
2.17e+08 315.6790 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1058]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1061]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1064]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1067]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1070]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1073]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1076]
2.17e+08 496.0300 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1079]
2.17e+08 318.7820 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1082]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1085]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1088]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1091]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1094]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1097]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1100]
2.17e+08 494.5580 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1103]
2.17e+08 318.3980 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1106]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1109]
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1112]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1115]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1118]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1121]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1124]
2.17e+08 497.7250 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1127]
2.17e+08 317.1510 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1130]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1133]
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1136]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1139]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1142]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1145]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1148]
2.17e+08 497.3420 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1151]
2.17e+08 316.2550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1154]
2.17e+08 2.93e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1157]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1160]
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1163]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1166]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1169]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1172]
2.17e+08 496.9900 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1175]
2.17e+08 318.9740 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1178]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1181]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1184]
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1187]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1190]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1193]
2.17e+08 2.98e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1196]
2.17e+08 495.1010 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1199]
2.17e+08 315.4550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1202]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1205]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1208]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1211]
2.17e+08 1.94e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1214]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1217]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1220]
2.17e+08 497.9810 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1223]
2.17e+08 314.6550 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1226]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1229]
2.17e+08 1.60e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1232]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1235]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1238]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1241]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1244]
2.17e+08 495.1340 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1247]
2.17e+08 317.2470 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1250]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1253]
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1256]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1259]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1262]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1265]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1268]
2.17e+08 495.7100 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1271]
2.17e+08 317.5660 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1274]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1277]
2.17e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1280]
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1283]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1286]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1289]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1292]
2.17e+08 493.8860 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1295]
2.17e+08 315.1980 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1298]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1301]
2.17e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1304]
2.17e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1307]
2.17e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1310]
2.17e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1313]
2.17e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1316]
2.17e+08 493.7890 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1319]
2.17e+08 316.7350 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1322]
2.17e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1325]
2.17e+08 1.58e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1328]
2.17e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1331]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1334]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1337]
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1340]
2.18e+08 498.8140 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1343]
2.18e+08 314.4620 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1346]
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1349]
2.18e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1352]
2.18e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1355]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1358]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1361]
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1364]
2.18e+08 495.4860 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1367]
2.18e+08 316.1580 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1370]
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1373]
2.18e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1376]
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1379]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1382]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1385]
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1388]
2.18e+08 496.5730 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1391]
2.18e+08 318.2710 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1394]
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1397]
2.18e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1400]
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1403]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1406]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1409]
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1412]
2.18e+08 497.2140 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1415]
2.18e+08 319.9340 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1418]
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1421]
2.18e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1424]
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1427]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1430]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1433]
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1436]
2.18e+08 497.4370 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1439]
2.18e+08 315.6470 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1442]
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1445]
2.18e+08 1.59e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1448]
2.18e+08 4.70e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1451]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1454]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1457]
2.18e+08 2.99e+03 (76800 1 1) (125 1 1) 102 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_dof_iteration__6 [1460]
2.18e+08 495.1010 (1024 1 1) (5 5 1) 188 0.195313 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_indefinite_stack_integral__3 [1463]
2.18e+08 315.1030 (1024 1 1) (5 5 1) 32 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_knl_reverse_indefinite_stack_integral__4 [1466]
2.18e+08 2.92e+03 (76800 1 1) (5 5 5) 64 6.054688 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumeviscterms__7 [1469]
2.18e+08 1.61e+04 (76800 1 1) (25 1 1) 76 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_faceviscterms__8 [1472]
2.18e+08 4.69e+03 (76800 1 1) (5 5 5) 118 17.77344 0 - - - - Tesla V100-SXM2 1 7 ptxcall_volumerhs__9 [1475]
2.18e+08 1.95e+04 (76800 1 1) (25 1 1) 190 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_facerhs__10 [1478]
2.18e+08 2.41e+03 (56250 1 1) (1024 1 1) 40 0.000000 0 - - - - Tesla V100-SXM2 1 7 ptxcall_update__11 [1481]
2.18e+08 2.96e+05 - - - - - 0.429153 1.448729 Device Pageable Tesla V100-SXM2 1 7 [CUDA memcpy DtoH]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
SrcMemType: The type of source memory accessed by memory operation/copy
DstMemType: The type of destination memory accessed by memory operation/copy