The following CUPTI events can be profiled on an NVIDIA Volta V100 (32 GB) GPU in a DGX Station:
active_cycles_pm
active_warps_pm
shared_ld_transactions
shared_st_transactions
elapsed_cycles_sm
elapsed_cycles_pm
inst_executed_fp16_pipe_s0
inst_executed_fp16_pipe_s1
inst_executed_fp16_pipe_s2
inst_executed_fp16_pipe_s3
inst_executed_fp64_pipe_s0
inst_executed_fp64_pipe_s1
inst_executed_fp64_pipe_s2
inst_executed_fp64_pipe_s3
inst_executed_fma_pipe_s0
inst_executed_fma_pipe_s1
inst_executed_fma_pipe_s2
inst_executed_fma_pipe_s3
shared_ld_bank_conflict
shared_st_bank_conflict
tensor_pipe_active_cycles_s0
tensor_pipe_active_cycles_s1
tensor_pipe_active_cycles_s2
tensor_pipe_active_cycles_s3
fb_subp0_read_sectors
fb_subp1_read_sectors
fb_subp0_write_sectors
fb_subp1_write_sectors
active_cycles
active_warps
prof_trigger_00
prof_trigger_01
prof_trigger_02
prof_trigger_03
prof_trigger_04
prof_trigger_05
prof_trigger_06
prof_trigger_07
inst_issued0
inst_issued1
inst_executed
thread_inst_executed
not_predicated_off_thread_inst_executed
generic_load
generic_store
atom_count
global_atom_cas
gred_count
global_load
global_store
local_load
local_store
shared_atom
shared_atom_cas
shared_load
shared_store
warps_launched
sm_cta_launched
l2_subp0_write_sector_misses
l2_subp1_write_sector_misses
l2_subp0_read_sector_misses
l2_subp1_read_sector_misses
l2_subp0_read_tex_sector_queries
l2_subp0_write_tex_sector_queries
l2_subp0_read_tex_hit_sectors
l2_subp0_write_tex_hit_sectors
l2_subp0_read_sysmem_sector_queries
l2_subp0_write_sysmem_sector_queries
l2_subp0_total_read_sector_queries
l2_subp0_total_write_sector_queries
l2_subp1_read_tex_sector_queries
l2_subp1_write_tex_sector_queries
l2_subp1_read_tex_hit_sectors
l2_subp1_write_tex_hit_sectors
l2_subp1_read_sysmem_sector_queries
l2_subp1_write_sysmem_sector_queries
l2_subp1_total_read_sector_queries
l2_subp1_total_write_sector_queries
pcie_tx_active_pulse
pcie_rx_active_pulse
elapsed_cycles_sys
active_cycles_sys
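
Many of the events above come in per-unit variants: a `_s0` to `_s3` suffix on the pipe counters, and a `subp0`/`subp1` part in the `fb_` and `l2_` counters. When reading collected values, it can be handy to fold those variants into one total per event family. The following is a minimal Python sketch of that idea. The helper names `base_name` and `aggregate` and the sample counter values are hypothetical, and treating the sum of the variants as the overall count is an assumption, not something the list itself states.

```python
import re
from collections import defaultdict

# Matches the per-unit part of an event name as it appears in the list:
# a trailing "_s<N>" suffix, or a "subp<N>_" part that follows an underscore.
_UNIT = re.compile(r"_s\d+$|(?<=_)subp\d+_")


def base_name(event: str) -> str:
    """Strip the per-unit part so sibling events share one key."""
    return _UNIT.sub("", event)


def aggregate(counts: dict) -> dict:
    """Sum counter values over all per-unit variants of each event.

    Assumption: adding the variants gives a meaningful total.
    """
    totals = defaultdict(int)
    for event, value in counts.items():
        totals[base_name(event)] += value
    return dict(totals)


# Hypothetical counter values, for illustration only.
sample = {
    "fb_subp0_read_sectors": 10,
    "fb_subp1_read_sectors": 5,
    "inst_executed_fma_pipe_s0": 1,
    "inst_executed_fma_pipe_s1": 2,
    "inst_executed_fma_pipe_s2": 3,
    "inst_executed_fma_pipe_s3": 4,
    "active_cycles": 7,
}
totals = aggregate(sample)
```

Events without a per-unit part, such as `active_cycles` or `elapsed_cycles_sm`, pass through under their original names.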