@Mark-Simulacrum
Created September 21, 2019 15:59
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
branch-misses OR cpu/branch-misses/ [Kernel PMU event]
bus-cycles OR cpu/bus-cycles/ [Kernel PMU event]
cache-misses OR cpu/cache-misses/ [Kernel PMU event]
cache-references OR cpu/cache-references/ [Kernel PMU event]
cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
cstate_core/c3-residency/ [Kernel PMU event]
cstate_core/c6-residency/ [Kernel PMU event]
cstate_core/c7-residency/ [Kernel PMU event]
cstate_pkg/c2-residency/ [Kernel PMU event]
cstate_pkg/c3-residency/ [Kernel PMU event]
cstate_pkg/c6-residency/ [Kernel PMU event]
cstate_pkg/c7-residency/ [Kernel PMU event]
instructions OR cpu/instructions/ [Kernel PMU event]
mem-loads OR cpu/mem-loads/ [Kernel PMU event]
mem-stores OR cpu/mem-stores/ [Kernel PMU event]
msr/aperf/ [Kernel PMU event]
msr/mperf/ [Kernel PMU event]
msr/smi/ [Kernel PMU event]
msr/tsc/ [Kernel PMU event]
power/energy-cores/ [Kernel PMU event]
power/energy-gpu/ [Kernel PMU event]
power/energy-pkg/ [Kernel PMU event]
power/energy-ram/ [Kernel PMU event]
ref-cycles OR cpu/ref-cycles/ [Kernel PMU event]
topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event]
topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event]
topdown-slots-issued OR cpu/topdown-slots-issued/ [Kernel PMU event]
topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event]
topdown-total-slots OR cpu/topdown-total-slots/ [Kernel PMU event]
uncore_cbox_0/clockticks/ [Kernel PMU event]
uncore_cbox_1/clockticks/ [Kernel PMU event]
uncore_cbox_2/clockticks/ [Kernel PMU event]
uncore_cbox_3/clockticks/ [Kernel PMU event]
uncore_imc/data_reads/ [Kernel PMU event]
uncore_imc/data_writes/ [Kernel PMU event]
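
The symbolic events above all follow one line shape: one or more names joined by `OR`, then the event type in square brackets. A minimal sketch of grouping such lines by type is shown below. It assumes exactly that shape; the function names `parse_event_line` and `group_by_kind` are hypothetical helpers for illustration and are not part of perf.

```python
import re

# Matches lines such as "cpu-cycles OR cycles [Hardware event]".
# Assumption: the bracketed type is the last thing on the line.
EVENT_LINE = re.compile(r"^(?P<names>.+?)\s+\[(?P<kind>[^\]]+)\]$")


def parse_event_line(line):
    """Return (names, kind) for one listing line, or None if it does not match."""
    match = EVENT_LINE.match(line.strip())
    if match is None:
        return None
    names = [name.strip() for name in match.group("names").split(" OR ")]
    return names, match.group("kind")


def group_by_kind(lines):
    """Map each event type to the first (canonical) name of every event of that type."""
    groups = {}
    for line in lines:
        parsed = parse_event_line(line)
        if parsed is None:
            continue  # topic headers and description lines are skipped
        names, kind = parsed
        groups.setdefault(kind, []).append(names[0])
    return groups


sample = [
    "cpu-cycles OR cycles [Hardware event]",
    "page-faults OR faults [Software event]",
    "msr/tsc/ [Kernel PMU event]",
    "cache:",
]
groups = group_by_kind(sample)
```

Lines that do not fit the shape, such as the `cache:` topic header, are ignored rather than reported, which keeps the sketch usable on the whole listing.
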
cache:
l1d.replacement
[L1D data line replacements]
l1d_pend_miss.fb_full
[Cycles a demand request was blocked due to Fill Buffers unavailability]
l1d_pend_miss.pending
[L1D miss outstanding duration in cycles]
l1d_pend_miss.pending_cycles
[Cycles with L1D load Misses outstanding]
l1d_pend_miss.pending_cycles_any
[Cycles with L1D load Misses outstanding from any thread on physical
core]
l1d_pend_miss.request_fb_full
[Number of times a request needed a FB entry but there was no entry
available for it. That is the FB unavailability was dominant reason
for blocking the request. A request includes cacheable/uncacheable
demands that is load, store or SW prefetch. HWP are e]
l2_demand_rqsts.wb_hit
[Not rejected writebacks that hit L2 cache]
l2_lines_in.all
[L2 cache lines filling L2]
l2_lines_in.e
[L2 cache lines in E state filling L2]
l2_lines_in.i
[L2 cache lines in I state filling L2]
l2_lines_in.s
[L2 cache lines in S state filling L2]
l2_lines_out.demand_clean
[Clean L2 cache lines evicted by demand]
l2_lines_out.demand_dirty
[Dirty L2 cache lines evicted by demand]
l2_rqsts.all_code_rd
[L2 code requests]
l2_rqsts.all_demand_data_rd
[Demand Data Read requests Spec update: HSD78]
l2_rqsts.all_demand_miss
[Demand requests that miss L2 cache Spec update: HSD78]
l2_rqsts.all_demand_references
[Demand requests to L2 cache Spec update: HSD78]
l2_rqsts.all_pf
[Requests from L2 hardware prefetchers]
l2_rqsts.all_rfo
[RFO requests to L2 cache]
l2_rqsts.code_rd_hit
[L2 cache hits when fetching instructions, code reads]
l2_rqsts.code_rd_miss
[L2 cache misses when fetching instructions]
l2_rqsts.demand_data_rd_hit
[Demand Data Read requests that hit L2 cache Spec update: HSD78]
l2_rqsts.demand_data_rd_miss
[Demand Data Read miss L2, no rejects Spec update: HSD78]
l2_rqsts.l2_pf_hit
[L2 prefetch requests that hit L2 cache]
l2_rqsts.l2_pf_miss
[L2 prefetch requests that miss L2 cache]
l2_rqsts.miss
[All requests that miss L2 cache Spec update: HSD78]
l2_rqsts.references
[All L2 requests Spec update: HSD78]
l2_rqsts.rfo_hit
[RFO requests that hit L2 cache]
l2_rqsts.rfo_miss
[RFO requests that miss L2 cache]
l2_trans.all_pf
[L2 or L3 HW prefetches that access L2 cache]
l2_trans.all_requests
[Transactions accessing L2 pipe]
l2_trans.code_rd
[L2 cache accesses when fetching instructions]
l2_trans.demand_data_rd
[Demand Data Read requests that access L2 cache]
l2_trans.l1d_wb
[L1D writebacks that access L2 cache]
l2_trans.l2_fill
[L2 fill requests that access L2 cache]
l2_trans.l2_wb
[L2 writebacks that access L2 cache]
l2_trans.rfo
[RFO requests that access L2 cache]
lock_cycles.cache_lock_duration
[Cycles when L1D is locked]
longest_lat_cache.miss
[Core-originated cacheable demand requests missed L3]
longest_lat_cache.reference
[Core-originated cacheable demand requests that refer to L3]
mem_load_uops_l3_hit_retired.xsnp_hit
[Retired load uops which data sources were L3 and cross-core snoop hits
in on-pkg core cache Spec update: HSD29, HSD25, HSM26, HSM30. Supports
address when precise (Precise event)]
mem_load_uops_l3_hit_retired.xsnp_hitm
[Retired load uops which data sources were HitM responses from shared
L3 Spec update: HSD29, HSD25, HSM26, HSM30. Supports address when
precise (Precise event)]
mem_load_uops_l3_hit_retired.xsnp_miss
[Retired load uops which data sources were L3 hit and cross-core snoop
missed in on-pkg core cache Spec update: HSD29, HSD25, HSM26, HSM30.
Supports address when precise (Precise event)]
mem_load_uops_l3_hit_retired.xsnp_none
[Retired load uops which data sources were hits in L3 without snoops
required Spec update: HSD74, HSD29, HSD25, HSM26, HSM30. Supports
address when precise (Precise event)]
mem_load_uops_l3_miss_retired.local_dram
[Data from local DRAM either Snoop not needed or Snoop Miss (RspI) Spec
update: HSD74, HSD29, HSD25, HSM30. Supports address when precise
(Precise event)]
mem_load_uops_retired.hit_lfb
[Retired load uops which data sources were load uops missed L1 but hit
FB due to preceding miss to the same cache line with data not ready
Spec update: HSM30. Supports address when precise (Precise event)]
mem_load_uops_retired.l1_hit
[Retired load uops with L1 cache hits as data sources Spec update:
HSD29, HSM30. Supports address when precise (Precise event)]
mem_load_uops_retired.l1_miss
[Retired load uops misses in L1 cache as data sources Spec update:
HSM30. Supports address when precise (Precise event)]
mem_load_uops_retired.l2_hit
[Retired load uops with L2 cache hits as data sources Spec update:
HSD76, HSD29, HSM30. Supports address when precise (Precise event)]
mem_load_uops_retired.l2_miss
[Miss in mid-level (L2) cache. Excludes Unknown data-source Spec
update: HSD29, HSM30. Supports address when precise (Precise event)]
mem_load_uops_retired.l3_hit
[Retired load uops which data sources were data hits in L3 without
snoops required Spec update: HSD74, HSD29, HSD25, HSM26, HSM30.
Supports address when precise (Precise event)]
mem_load_uops_retired.l3_miss
[Miss in last-level (L3) cache. Excludes Unknown data-source Spec
update: HSD74, HSD29, HSD25, HSM26, HSM30. Supports address when
precise (Precise event)]
mem_uops_retired.all_loads
[All retired load uops Spec update: HSD29, HSM30. Supports address when
precise (Precise event)]
mem_uops_retired.all_stores
[All retired store uops Spec update: HSD29, HSM30. Supports address
when precise (Precise event)]
mem_uops_retired.lock_loads
[Retired load uops with locked access Spec update: HSD76, HSD29, HSM30.
Supports address when precise (Precise event)]
mem_uops_retired.split_loads
[Retired load uops that split across a cacheline boundary Spec update:
HSD29, HSM30. Supports address when precise (Precise event)]
mem_uops_retired.split_stores
[Retired store uops that split across a cacheline boundary Spec update:
HSD29, HSM30. Supports address when precise (Precise event)]
mem_uops_retired.stlb_miss_loads
[Retired load uops that miss the STLB Spec update: HSD29, HSM30.
Supports address when precise (Precise event)]
mem_uops_retired.stlb_miss_stores
[Retired store uops that miss the STLB Spec update: HSD29, HSM30.
Supports address when precise (Precise event)]
offcore_requests.all_data_rd
[Demand and prefetch data reads]
offcore_requests.demand_code_rd
[Cacheable and noncacheable code read requests]
offcore_requests.demand_data_rd
[Demand Data Read requests sent to uncore Spec update: HSD78]
offcore_requests.demand_rfo
[Demand RFO requests including regular RFOs, locks, ItoM]
offcore_requests_buffer.sq_full
[Offcore requests buffer cannot take more entries for this thread core]
offcore_requests_outstanding.all_data_rd
[Offcore outstanding cacheable Core Data Read transactions in
SuperQueue (SQ), queue to uncore Spec update: HSD62, HSD61]
offcore_requests_outstanding.cycles_with_data_rd
[Cycles when offcore outstanding cacheable Core Data Read transactions
are present in SuperQueue (SQ), queue to uncore Spec update: HSD62,
HSD61]
offcore_requests_outstanding.cycles_with_demand_data_rd
[Cycles when offcore outstanding Demand Data Read transactions are
present in SuperQueue (SQ), queue to uncore Spec update: HSD78, HSD62,
HSD61]
offcore_requests_outstanding.cycles_with_demand_rfo
[Offcore outstanding demand rfo reads transactions in SuperQueue (SQ),
queue to uncore, every cycle Spec update: HSD62, HSD61]
offcore_requests_outstanding.demand_code_rd
[Offcore outstanding code reads transactions in SuperQueue (SQ), queue
to uncore, every cycle Spec update: HSD62, HSD61]
offcore_requests_outstanding.demand_data_rd
[Offcore outstanding Demand Data Read transactions in uncore queue Spec
update: HSD78, HSD62, HSD61]
offcore_requests_outstanding.demand_data_rd_ge_6
[Cycles with at least 6 offcore outstanding Demand Data Read
transactions in uncore queue Spec update: HSD78, HSD62, HSD61]
offcore_requests_outstanding.demand_rfo
[Offcore outstanding RFO store transactions in SuperQueue (SQ), queue
to uncore Spec update: HSD62, HSD61]
offcore_response
[Offcore response can be programmed only with a specific pair of event
select and counter MSR, and with specific event codes and predefined
mask bit value in a dedicated MSR to specify attributes of the offcore
transaction]
offcore_response.all_code_rd.l3_hit.hit_other_core_no_fwd
[Counts all demand & prefetch code reads that hit in the L3 and the
snoops to sibling cores hit in either E/S state and the line is not
forwarded]
offcore_response.all_data_rd.l3_hit.hit_other_core_no_fwd
[Counts all demand & prefetch data reads that hit in the L3 and the
snoops to sibling cores hit in either E/S state and the line is not
forwarded]
offcore_response.all_data_rd.l3_hit.hitm_other_core
[Counts all demand & prefetch data reads that hit in the L3 and the
snoop to one of the sibling cores hits the line in M state and the
line is forwarded]
offcore_response.all_reads.l3_hit.hit_other_core_no_fwd
[Counts all data/code/rfo reads (demand & prefetch) that hit in the L3
and the snoops to sibling cores hit in either E/S state and the line
is not forwarded]
offcore_response.all_reads.l3_hit.hitm_other_core
[Counts all data/code/rfo reads (demand & prefetch) that hit in the L3
and the snoop to one of the sibling cores hits the line in M state and
the line is forwarded]
offcore_response.all_requests.l3_hit.any_response
[Counts all requests that hit in the L3]
offcore_response.all_rfo.l3_hit.hit_other_core_no_fwd
[Counts all demand & prefetch RFOs that hit in the L3 and the snoops to
sibling cores hit in either E/S state and the line is not forwarded]
offcore_response.all_rfo.l3_hit.hitm_other_core
[Counts all demand & prefetch RFOs that hit in the L3 and the snoop to
one of the sibling cores hits the line in M state and the line is
forwarded]
offcore_response.demand_code_rd.l3_hit.hit_other_core_no_fwd
[Counts all demand code reads that hit in the L3 and the snoops to
sibling cores hit in either E/S state and the line is not forwarded]
offcore_response.demand_code_rd.l3_hit.hitm_other_core
[Counts all demand code reads that hit in the L3 and the snoop to one
of the sibling cores hits the line in M state and the line is
forwarded]
offcore_response.demand_data_rd.l3_hit.hit_other_core_no_fwd
[Counts demand data reads that hit in the L3 and the snoops to sibling
cores hit in either E/S state and the line is not forwarded]
offcore_response.demand_data_rd.l3_hit.hitm_other_core
[Counts demand data reads that hit in the L3 and the snoop to one of
the sibling cores hits the line in M state and the line is forwarded]
offcore_response.demand_rfo.l3_hit.hit_other_core_no_fwd
[Counts all demand data writes (RFOs) that hit in the L3 and the snoops
to sibling cores hit in either E/S state and the line is not forwarded]
offcore_response.demand_rfo.l3_hit.hitm_other_core
[Counts all demand data writes (RFOs) that hit in the L3 and the snoop
to one of the sibling cores hits the line in M state and the line is
forwarded]
offcore_response.pf_l2_code_rd.l3_hit.any_response
[Counts all prefetch (that bring data to LLC only) code reads that hit
in the L3]
offcore_response.pf_l2_data_rd.l3_hit.any_response
[Counts prefetch (that bring data to L2) data reads that hit in the L3]
offcore_response.pf_l2_rfo.l3_hit.any_response
[Counts all prefetch (that bring data to L2) RFOs that hit in the L3]
offcore_response.pf_l3_code_rd.l3_hit.any_response
[Counts prefetch (that bring data to LLC only) code reads that hit in
the L3]
offcore_response.pf_l3_data_rd.l3_hit.any_response
[Counts all prefetch (that bring data to LLC only) data reads that hit
in the L3]
offcore_response.pf_l3_rfo.l3_hit.any_response
[Counts all prefetch (that bring data to LLC only) RFOs that hit in the
L3]
sq_misc.split_lock
[Split locks in SQ]
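
The topic sections use a different layout from the symbolic events: a `topic:` header, then an event name on its own line, then a bracketed description that may wrap over several lines. A rough sketch of pairing names with their descriptions follows. It assumes that a description always starts with `[` and that the first line ending in `]` closes it; `parse_described_events` is a hypothetical helper name, not a perf interface.

```python
def parse_described_events(lines):
    """Map event name -> (topic, description) for the name/[description] layout."""
    events = {}
    topic = None
    name = None
    parts = None  # description fragments while inside an open [...] block
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if parts is not None:
            parts.append(line)       # continuation of a wrapped description
        elif line.startswith("["):
            parts = [line[1:]]       # description opens; drop the leading "["
        elif line.endswith(":"):
            topic = line[:-1]        # topic header such as "cache:"
            continue
        else:
            name = line              # event name on its own line
            continue
        if parts[-1].endswith("]"):
            parts[-1] = parts[-1][:-1]  # drop the closing "]"
            if name is not None:
                events[name] = (topic, " ".join(parts))
            name, parts = None, None
    return events


sample = [
    "cache:",
    "l1d.replacement",
    "[L1D data line replacements]",
    "l1d_pend_miss.pending_cycles_any",
    "[Cycles with L1D load Misses outstanding from any thread on physical",
    "core]",
]
events = parse_described_events(sample)
```

Wrapped lines are rejoined with single spaces, so a description can be searched as one string regardless of where the listing broke it.
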
floating point:
avx_insts.all
[Approximate counts of AVX & AVX2 256-bit instructions, including
non-arithmetic instructions, loads, and stores. May count non-AVX
instructions that employ 256-bit operations, including (but not
necessarily limited to) rep string instructions that use 256-bit loads
and stores for optimized performance, XSAVE* and XRSTOR*, and
operations that transition the x87 FPU data registers between x87 and
MMX]
fp_assist.any
[Cycles with any input/output SSE or FP assist]
fp_assist.simd_input
[Number of SIMD FP assists due to input values]
fp_assist.simd_output
[Number of SIMD FP assists due to Output values]
fp_assist.x87_input
[Number of X87 assists due to input value]
fp_assist.x87_output
[Number of X87 assists due to output value]
other_assists.avx_to_sse
[Number of transitions from AVX-256 to legacy SSE when penalty
applicable Spec update: HSD56, HSM57]
other_assists.sse_to_avx
[Number of transitions from SSE to AVX-256 when penalty applicable Spec
update: HSD56, HSM57]
frontend:
dsb2mite_switches.penalty_cycles
[Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles]
icache.hit
[Number of Instruction Cache, Streaming Buffer and Victim Cache Reads.
both cacheable and noncacheable, including UC fetches]
icache.ifdata_stall
[Cycles where a code fetch is stalled due to L1 instruction-cache miss]
icache.ifetch_stall
[Cycles where a code fetch is stalled due to L1 instruction-cache miss]
icache.misses
[Number of Instruction Cache, Streaming Buffer and Victim Cache Misses.
Includes Uncacheable accesses]
idq.all_dsb_cycles_4_uops
[Cycles Decode Stream Buffer (DSB) is delivering 4 Uops]
idq.all_dsb_cycles_any_uops
[Cycles Decode Stream Buffer (DSB) is delivering any Uop]
idq.all_mite_cycles_4_uops
[Cycles MITE is delivering 4 Uops]
idq.all_mite_cycles_any_uops
[Cycles MITE is delivering any Uop]
idq.dsb_cycles
[Cycles when uops are being delivered to Instruction Decode Queue (IDQ)
from Decode Stream Buffer (DSB) path]
idq.dsb_uops
[Uops delivered to Instruction Decode Queue (IDQ) from the Decode
Stream Buffer (DSB) path]
idq.empty
[Instruction Decode Queue (IDQ) empty cycles Spec update: HSD135]
idq.mite_all_uops
[Uops delivered to Instruction Decode Queue (IDQ) from MITE path]
idq.mite_cycles
[Cycles when uops are being delivered to Instruction Decode Queue (IDQ)
from MITE path]
idq.mite_uops
[Uops delivered to Instruction Decode Queue (IDQ) from MITE path]
idq.ms_cycles
[Cycles when uops are being delivered to Instruction Decode Queue (IDQ)
while Microcode Sequencer (MS) is busy]
idq.ms_dsb_cycles
[Cycles when uops initiated by Decode Stream Buffer (DSB) are being
delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer
(MS) is busy]
idq.ms_dsb_occur
[Deliveries to Instruction Decode Queue (IDQ) initiated by Decode
Stream Buffer (DSB) while Microcode Sequencer (MS) is busy]
idq.ms_dsb_uops
[Uops initiated by Decode Stream Buffer (DSB) that are being delivered
to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is
busy]
idq.ms_mite_uops
[Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ)
while Microcode Sequencer (MS) is busy]
idq.ms_switches
[Number of switches from DSB (Decode Stream Buffer) or MITE (legacy
decode pipeline) to the Microcode Sequencer]
idq.ms_uops
[Uops delivered to Instruction Decode Queue (IDQ) while Microcode
Sequencer (MS) is busy]
idq_uops_not_delivered.core
[Uops not delivered to Resource Allocation Table (RAT) per thread when
backend of the machine is not stalled Spec update: HSD135]
idq_uops_not_delivered.cycles_0_uops_deliv.core
[Cycles per thread when 4 or more uops are not delivered to Resource
Allocation Table (RAT) when backend of the machine is not stalled Spec
update: HSD135]
idq_uops_not_delivered.cycles_fe_was_ok
[Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT)
was stalling FE Spec update: HSD135]
idq_uops_not_delivered.cycles_le_1_uop_deliv.core
[Cycles per thread when 3 or more uops are not delivered to Resource
Allocation Table (RAT) when backend of the machine is not stalled Spec
update: HSD135]
idq_uops_not_delivered.cycles_le_2_uop_deliv.core
[Cycles with less than 2 uops delivered by the front end Spec update:
HSD135]
idq_uops_not_delivered.cycles_le_3_uop_deliv.core
[Cycles with less than 3 uops delivered by the front end Spec update:
HSD135]
memory:
hle_retired.aborted
[Number of times an HLE execution aborted due to any reasons (multiple
categories may count as one) (Precise event)]
hle_retired.aborted_misc1
[Number of times an HLE execution aborted due to various memory events
(e.g., read/write capacity and conflicts)]
hle_retired.aborted_misc2
[Number of times an HLE execution aborted due to uncommon conditions]
hle_retired.aborted_misc3
[Number of times an HLE execution aborted due to HLE-unfriendly
instructions]
hle_retired.aborted_misc4
[Number of times an HLE execution aborted due to incompatible memory
type Spec update: HSD65]
hle_retired.aborted_misc5
[Number of times an HLE execution aborted due to none of the previous 4
categories (e.g. interrupts)]
hle_retired.commit
[Number of times an HLE execution successfully committed]
hle_retired.start
[Number of times an HLE execution started]
machine_clears.memory_ordering
[Counts the number of machine clears due to memory order conflicts]
mem_trans_retired.load_latency_gt_128
[Loads with latency value being above 128 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_16
[Loads with latency value being above 16 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_256
[Loads with latency value being above 256 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_32
[Loads with latency value being above 32 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_4
[Loads with latency value being above 4 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_512
[Loads with latency value being above 512 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_64
[Loads with latency value being above 64 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
mem_trans_retired.load_latency_gt_8
[Loads with latency value being above 8 Spec update: HSD76, HSD25,
HSM26 (Must be precise)]
misalign_mem_ref.loads
[Speculative cache line split load uops dispatched to L1 cache]
misalign_mem_ref.stores
[Speculative cache line split STA uops dispatched to L1 cache]
offcore_response.all_code_rd.l3_miss.any_response
[Counts all demand & prefetch code reads that miss in the L3]
offcore_response.all_code_rd.l3_miss.local_dram
[Counts all demand & prefetch code reads that miss the L3 and the data
is returned from local dram]
offcore_response.all_data_rd.l3_miss.any_response
[Counts all demand & prefetch data reads that miss in the L3]
offcore_response.all_data_rd.l3_miss.local_dram
[Counts all demand & prefetch data reads that miss the L3 and the data
is returned from local dram]
offcore_response.all_reads.l3_miss.any_response
[Counts all data/code/rfo reads (demand & prefetch) that miss in the L3]
offcore_response.all_reads.l3_miss.local_dram
[Counts all data/code/rfo reads (demand & prefetch) that miss the L3
and the data is returned from local dram]
offcore_response.all_requests.l3_miss.any_response
[Counts all requests that miss in the L3]
offcore_response.all_rfo.l3_miss.any_response
[Counts all demand & prefetch RFOs that miss in the L3]
offcore_response.all_rfo.l3_miss.local_dram
[Counts all demand & prefetch RFOs that miss the L3 and the data is
returned from local dram]
offcore_response.demand_code_rd.l3_miss.any_response
[Counts all demand code reads that miss in the L3]
offcore_response.demand_code_rd.l3_miss.local_dram
[Counts all demand code reads that miss the L3 and the data is returned
from local dram]
offcore_response.demand_data_rd.l3_miss.any_response
[Counts demand data reads that miss in the L3]
offcore_response.demand_data_rd.l3_miss.local_dram
[Counts demand data reads that miss the L3 and the data is returned
from local dram]
offcore_response.demand_rfo.l3_miss.any_response
[Counts all demand data writes (RFOs) that miss in the L3]
offcore_response.demand_rfo.l3_miss.local_dram
[Counts all demand data writes (RFOs) that miss the L3 and the data is
returned from local dram]
offcore_response.pf_l2_code_rd.l3_miss.any_response
[Counts all prefetch (that bring data to LLC only) code reads that miss
in the L3]
offcore_response.pf_l2_data_rd.l3_miss.any_response
[Counts prefetch (that bring data to L2) data reads that miss in the L3]
offcore_response.pf_l2_rfo.l3_miss.any_response
[Counts all prefetch (that bring data to L2) RFOs that miss in the L3]
offcore_response.pf_l3_code_rd.l3_miss.any_response
[Counts prefetch (that bring data to LLC only) code reads that miss in
the L3]
offcore_response.pf_l3_data_rd.l3_miss.any_response
[Counts all prefetch (that bring data to LLC only) data reads that miss
in the L3]
offcore_response.pf_l3_rfo.l3_miss.any_response
[Counts all prefetch (that bring data to LLC only) RFOs that miss in
the L3]
rtm_retired.aborted
[Number of times an RTM execution aborted due to any reasons (multiple
categories may count as one) (Precise event)]
rtm_retired.aborted_misc1
[Number of times an RTM execution aborted due to various memory events
(e.g. read/write capacity and conflicts)]
rtm_retired.aborted_misc2
[Number of times an RTM execution aborted due to various memory events
(e.g., read/write capacity and conflicts)]
rtm_retired.aborted_misc3
[Number of times an RTM execution aborted due to HLE-unfriendly
instructions]
rtm_retired.aborted_misc4
[Number of times an RTM execution aborted due to incompatible memory
type Spec update: HSD65]
rtm_retired.aborted_misc5
[Number of times an RTM execution aborted due to none of the previous 4
categories (e.g. interrupt)]
rtm_retired.commit
[Number of times an RTM execution successfully committed]
rtm_retired.start
[Number of times an RTM execution started]
tx_exec.misc1
[Counts the number of times a class of instructions that may cause a
transactional abort was executed. Since this is the count of
execution, it may not always cause a transactional abort]
tx_exec.misc2
[Counts the number of times a class of instructions (e.g., vzeroupper)
that may cause a transactional abort was executed inside a
transactional region]
tx_exec.misc3
[Counts the number of times an instruction execution caused the
transactional nest count supported to be exceeded]
tx_exec.misc4
[Counts the number of times an XBEGIN instruction was executed inside an
HLE transactional region]
tx_exec.misc5
[Counts the number of times an HLE XACQUIRE instruction was executed
inside an RTM transactional region]
tx_mem.abort_capacity_write
[Number of times a transactional abort was signaled due to a data
capacity limitation for transactional writes]
tx_mem.abort_conflict
[Number of times a transactional abort was signaled due to a data
conflict on a transactionally accessed address]
tx_mem.abort_hle_elision_buffer_mismatch
[Number of times an HLE transactional execution aborted due to XRELEASE
lock not satisfying the address and value requirements in the elision
buffer]
tx_mem.abort_hle_elision_buffer_not_empty
[Number of times an HLE transactional execution aborted due to
NoAllocatedElisionBuffer being non-zero]
tx_mem.abort_hle_elision_buffer_unsupported_alignment
[Number of times an HLE transactional execution aborted due to an
unsupported read alignment from the elision buffer]
tx_mem.abort_hle_store_to_elided_lock
[Number of times an HLE transactional region aborted due to a non
XRELEASE prefixed instruction writing to an elided lock in the elision
buffer]
tx_mem.hle_elision_buffer_full
[Number of times HLE lock could not be elided due to
ElisionBufferAvailable being zero]
other:
cpl_cycles.ring0
[Unhalted core cycles when the thread is in ring 0]
cpl_cycles.ring0_trans
[Number of intervals between processor halts while thread is in ring 0]
cpl_cycles.ring123
[Unhalted core cycles when thread is in rings 1, 2, or 3]
lock_cycles.split_lock_uc_lock_duration
[Cycles when L1 and L2 are locked due to UC or split lock]
pipeline:
arith.divider_uops
[Any uop executed by the Divider. (This includes all divide uops, sqrt,
...)]
baclears.any
[Counts the total number when the front end is resteered, mainly when
the BPU cannot provide a correct prediction and this is corrected by
other branch handling mechanisms at the front end]
br_inst_exec.all_branches
[Speculative and retired branches]
br_inst_exec.all_conditional
[Speculative and retired macro-conditional branches]
br_inst_exec.all_direct_jmp
[Speculative and retired macro-unconditional branches excluding calls
and indirects]
br_inst_exec.all_direct_near_call
[Speculative and retired direct near calls]
br_inst_exec.all_indirect_jump_non_call_ret
[Speculative and retired indirect branches excluding calls and returns]
br_inst_exec.all_indirect_near_return
[Speculative and retired indirect return branches]
br_inst_exec.nontaken_conditional
[Not taken macro-conditional branches]
br_inst_exec.taken_conditional
[Taken speculative and retired macro-conditional branches]
br_inst_exec.taken_direct_jump
[Taken speculative and retired macro-conditional branch instructions
excluding calls and indirects]
br_inst_exec.taken_direct_near_call
[Taken speculative and retired direct near calls]
br_inst_exec.taken_indirect_jump_non_call_ret
[Taken speculative and retired indirect branches excluding calls and
returns]
br_inst_exec.taken_indirect_near_call
[Taken speculative and retired indirect calls]
br_inst_exec.taken_indirect_near_return
[Taken speculative and retired indirect branches with return mnemonic]
br_inst_retired.all_branches
[All (macro) branch instructions retired]
br_inst_retired.all_branches_pebs
[All (macro) branch instructions retired (Must be precise)]
br_inst_retired.conditional
[Conditional branch instructions retired (Precise event)]
br_inst_retired.far_branch
[Far branch instructions retired]
br_inst_retired.near_call
[Direct and indirect near call instructions retired (Precise event)]
br_inst_retired.near_return
[Return instructions retired (Precise event)]
br_inst_retired.near_taken
[Taken branch instructions retired (Precise event)]
br_inst_retired.not_taken
[Not taken branch instructions retired]
br_misp_exec.all_branches
[Speculative and retired mispredicted macro conditional branches]
br_misp_exec.all_conditional
[Speculative and retired mispredicted macro conditional branches]
br_misp_exec.all_indirect_jump_non_call_ret
[Mispredicted indirect branches excluding calls and returns]
br_misp_exec.nontaken_conditional
[Not taken speculative and retired mispredicted macro conditional
branches]
br_misp_exec.taken_conditional
[Taken speculative and retired mispredicted macro conditional branches]
br_misp_exec.taken_indirect_jump_non_call_ret
[Taken speculative and retired mispredicted indirect branches excluding
calls and returns]
br_misp_exec.taken_indirect_near_call
[Taken speculative and retired mispredicted indirect calls]
br_misp_exec.taken_return_near
[Taken speculative and retired mispredicted indirect branches with
return mnemonic]
br_misp_retired.all_branches
[All mispredicted macro branch instructions retired]
br_misp_retired.all_branches_pebs
[Mispredicted macro branch instructions retired (Must be precise)]
br_misp_retired.conditional
[Mispredicted conditional branch instructions retired (Precise event)]
br_misp_retired.near_taken
[Number of near branch instructions retired that were mispredicted and
taken (Precise event)]
cpu_clk_thread_unhalted.one_thread_active
[Count XClk pulses when this thread is unhalted and the other thread is
halted]
cpu_clk_thread_unhalted.ref_xclk
[Reference cycles when the thread is unhalted (counts at 100 MHz rate)]
cpu_clk_thread_unhalted.ref_xclk_any
[Reference cycles when at least one thread on the physical core is
unhalted (counts at 100 MHz rate)]
cpu_clk_unhalted.one_thread_active
[Count XClk pulses when this thread is unhalted and the other thread is
halted]
cpu_clk_unhalted.ref_tsc
[Reference cycles when the core is not in halt state]
cpu_clk_unhalted.ref_xclk
[Reference cycles when the thread is unhalted (counts at 100 MHz rate)]
cpu_clk_unhalted.ref_xclk_any
[Reference cycles when at least one thread on the physical core is
unhalted (counts at 100 MHz rate)]
cpu_clk_unhalted.thread
[Core cycles when the thread is not in halt state]
cpu_clk_unhalted.thread_any
[Core cycles when at least one thread on the physical core is not in
halt state]
cpu_clk_unhalted.thread_p
[Thread cycles when thread is not in halt state]
cpu_clk_unhalted.thread_p_any
[Core cycles when at least one thread on the physical core is not in
halt state]
cycle_activity.cycles_l1d_pending
[Cycles with pending L1 cache miss loads]
cycle_activity.cycles_l2_pending
[Cycles with pending L2 cache miss loads Spec update: HSD78]
cycle_activity.cycles_ldm_pending
[Cycles with pending memory loads]
cycle_activity.cycles_no_execute
[Total execution stalls]
cycle_activity.stalls_l1d_pending
[Execution stalls due to L1 data cache misses]
cycle_activity.stalls_l2_pending
[Execution stalls due to L2 cache misses]
cycle_activity.stalls_ldm_pending
[Execution stalls due to memory subsystem]
ild_stall.iq_full
[Stall cycles because IQ is full]
ild_stall.lcp
[Stalls caused by changing prefix length of the instruction]
inst_retired.any
[Instructions retired from execution Spec update: HSD140, HSD143]
inst_retired.any_p
[Number of instructions retired. General Counter - architectural event
Spec update: HSD11, HSD140]
inst_retired.prec_dist
[Precise instruction retired event with HW to reduce effect of PEBS
shadow in IP distribution Spec update: HSD140 (Must be precise)]
inst_retired.x87
[FP operations retired. X87 FP operations that have no exceptions:
Counts also flows that have several X87 or flows that use X87 uops in
the exception handling]
int_misc.recovery_cycles
[Number of cycles waiting for the checkpoints in Resource Allocation
Table (RAT) to be recovered after Nuke due to all other cases except
JEClear (e.g. whenever a ucode assist is needed like SSE exception,
memory disambiguation, etc...)]
int_misc.recovery_cycles_any
[Core cycles the allocator was stalled due to recovery from earlier
clear event for any thread running on the physical core (e.g.
misprediction or memory nuke)]
ld_blocks.no_sr
[The number of times that split load operations are temporarily blocked
because all resources for handling the split accesses are in use]
ld_blocks.store_forward
[Loads blocked by overlapping with store buffer that cannot be
forwarded]
ld_blocks_partial.address_alias
[False dependencies in MOB due to partial compare on address]
load_hit_pre.hw_pf
[Not software-prefetch load dispatches that hit FB allocated for
hardware prefetch]
load_hit_pre.sw_pf
[Not software-prefetch load dispatches that hit FB allocated for
software prefetch]
lsd.cycles_4_uops
[Cycles 4 Uops delivered by the LSD, but didn't come from the decoder]
lsd.cycles_active
[Cycles Uops delivered by the LSD, but didn't come from the decoder]
lsd.uops
[Number of Uops delivered by the LSD]
machine_clears.count
[Number of machine clears (nukes) of any type]
machine_clears.cycles
[Cycles there was a Nuke. Account for both thread-specific and All
Thread Nukes]
machine_clears.maskmov
[This event counts the number of executed Intel AVX masked load
operations that refer to an illegal address range with the mask bits
set to 0]
machine_clears.smc
[Self-modifying code (SMC) detected]
move_elimination.int_eliminated
[Number of integer Move Elimination candidate uops that were eliminated]
move_elimination.int_not_eliminated
[Number of integer Move Elimination candidate uops that were not
eliminated]
move_elimination.simd_eliminated
[Number of SIMD Move Elimination candidate uops that were eliminated]
move_elimination.simd_not_eliminated
[Number of SIMD Move Elimination candidate uops that were not
eliminated]
other_assists.any_wb_assist
[Number of times any microcode assist is invoked by HW upon uop
writeback]
resource_stalls.any
[Resource-related stall cycles Spec update: HSD135]
resource_stalls.rob
[Cycles stalled due to re-order buffer full]
resource_stalls.rs
[Cycles stalled due to no eligible RS entry available]
resource_stalls.sb
[Cycles stalled due to no store buffers available. (not including
draining from sync)]
rob_misc_events.lbr_inserts
[Count cases of saving new LBR]
rs_events.empty_cycles
[Cycles when Reservation Station (RS) is empty for the thread]
rs_events.empty_end
[Counts end of periods where the Reservation Station (RS) was empty.
Could be useful to precisely locate Frontend Latency Bound issues]
uops_dispatched_port.port_0
[Cycles per thread when uops are executed in port 0]
uops_dispatched_port.port_1
[Cycles per thread when uops are executed in port 1]
uops_dispatched_port.port_2
[Cycles per thread when uops are executed in port 2]
uops_dispatched_port.port_3
[Cycles per thread when uops are executed in port 3]
uops_dispatched_port.port_4
[Cycles per thread when uops are executed in port 4]
uops_dispatched_port.port_5
[Cycles per thread when uops are executed in port 5]
uops_dispatched_port.port_6
[Cycles per thread when uops are executed in port 6]
uops_dispatched_port.port_7
[Cycles per thread when uops are executed in port 7]
uops_executed.core
[Number of uops executed on the core Spec update: HSD30, HSM31]
uops_executed.core_cycles_ge_1
[Cycles at least 1 micro-op is executed from any thread on physical
core Spec update: HSD30, HSM31]
uops_executed.core_cycles_ge_2
[Cycles at least 2 micro-ops are executed from any thread on physical
core Spec update: HSD30, HSM31]
uops_executed.core_cycles_ge_3
[Cycles at least 3 micro-ops are executed from any thread on physical
core Spec update: HSD30, HSM31]
uops_executed.core_cycles_ge_4
[Cycles at least 4 micro-ops are executed from any thread on physical
core Spec update: HSD30, HSM31]
uops_executed.core_cycles_none
[Cycles with no micro-ops executed from any thread on physical core
Spec update: HSD30, HSM31]
uops_executed.cycles_ge_1_uop_exec
[Cycles where at least 1 uop was executed per-thread Spec update:
HSD144, HSD30, HSM31]
uops_executed.cycles_ge_2_uops_exec
[Cycles where at least 2 uops were executed per-thread Spec update:
HSD144, HSD30, HSM31]
uops_executed.cycles_ge_3_uops_exec
[Cycles where at least 3 uops were executed per-thread Spec update:
HSD144, HSD30, HSM31]
uops_executed.cycles_ge_4_uops_exec
[Cycles where at least 4 uops were executed per-thread Spec update:
HSD144, HSD30, HSM31]
uops_executed.stall_cycles
[Counts number of cycles no uops were dispatched to be executed on this
thread Spec update: HSD144, HSD30, HSM31]
uops_executed_port.port_0
[Cycles per thread when uops are executed in port 0]
uops_executed_port.port_0_core
[Cycles per core when uops are executed in port 0]
uops_executed_port.port_1
[Cycles per thread when uops are executed in port 1]
uops_executed_port.port_1_core
[Cycles per core when uops are executed in port 1]
uops_executed_port.port_2
[Cycles per thread when uops are executed in port 2]
uops_executed_port.port_2_core
[Cycles per core when uops are dispatched to port 2]
uops_executed_port.port_3
[Cycles per thread when uops are executed in port 3]
uops_executed_port.port_3_core
[Cycles per core when uops are dispatched to port 3]
uops_executed_port.port_4
[Cycles per thread when uops are executed in port 4]
uops_executed_port.port_4_core
[Cycles per core when uops are executed in port 4]
uops_executed_port.port_5
[Cycles per thread when uops are executed in port 5]
uops_executed_port.port_5_core
[Cycles per core when uops are executed in port 5]
uops_executed_port.port_6
[Cycles per thread when uops are executed in port 6]
uops_executed_port.port_6_core
[Cycles per core when uops are executed in port 6]
uops_executed_port.port_7
[Cycles per thread when uops are executed in port 7]
uops_executed_port.port_7_core
[Cycles per core when uops are dispatched to port 7]
uops_issued.any
[Uops that Resource Allocation Table (RAT) issues to Reservation
Station (RS)]
uops_issued.core_stall_cycles
[Cycles when Resource Allocation Table (RAT) does not issue Uops to
Reservation Station (RS) for all threads]
uops_issued.flags_merge
[Number of flags-merge uops being allocated. Such uops considered perf
sensitive; added by GSR u-arch]
uops_issued.single_mul
[Number of Multiply packed/scalar single precision uops allocated]
uops_issued.slow_lea
[Number of slow LEA uops being allocated. A uop is generally considered
SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if
as a result of LEA instruction or not]
uops_issued.stall_cycles
[Cycles when Resource Allocation Table (RAT) does not issue Uops to
Reservation Station (RS) for the thread]
uops_retired.all
[Actually retired uops Supports address when precise (Precise event)]
uops_retired.core_stall_cycles
[Cycles without actually retired uops]
uops_retired.retire_slots
[Retirement slots used (Precise event)]
uops_retired.stall_cycles
[Cycles without actually retired uops]
uops_retired.total_cycles
[Cycles with less than 10 actually retired uops]
uncore:
unc_arb_coh_trk_occupancy.all
[Unit: uncore_arb Each cycle count number of valid entries in Coherency
Tracker queue from allocation till deallocation. Aperture requests
(snoops) appear as NC decoded internally and become coherent (snoop
L3, access memory)]
unc_arb_coh_trk_requests.all
[Unit: uncore_arb Number of entries allocated. Account for Any type:
e.g. Snoop, Core aperture, etc]
unc_arb_trk_occupancy.all
[Unit: uncore_arb Each cycle count number of all Core outgoing valid
entries. Such entry is defined as valid from its allocation till
first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and
non-coherent traffic]
unc_arb_trk_requests.all
[Unit: uncore_arb Total number of Core outgoing entries allocated.
Accounts for Coherent and non-coherent traffic]
unc_arb_trk_requests.writes
[Unit: uncore_arb Number of Writes allocated - any write transactions:
full/partial writes and evictions]
unc_cbo_cache_lookup.any_es
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in E or S-state]
unc_cbo_cache_lookup.any_i
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in I-state]
unc_cbo_cache_lookup.any_m
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in M-state]
unc_cbo_cache_lookup.any_mesi
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in MESI-state]
unc_cbo_cache_lookup.extsnp_es
[Unit: uncore_cbox L3 Lookup external snoop request that access cache
and found line in E or S-state]
unc_cbo_cache_lookup.extsnp_i
[Unit: uncore_cbox L3 Lookup external snoop request that access cache
and found line in I-state]
unc_cbo_cache_lookup.extsnp_m
[Unit: uncore_cbox L3 Lookup external snoop request that access cache
and found line in M-state]
unc_cbo_cache_lookup.extsnp_mesi
[Unit: uncore_cbox L3 Lookup external snoop request that access cache
and found line in MESI-state]
unc_cbo_cache_lookup.read_es
[Unit: uncore_cbox L3 Lookup read request that access cache and found
line in E or S-state]
unc_cbo_cache_lookup.read_i
[Unit: uncore_cbox L3 Lookup read request that access cache and found
line in I-state]
unc_cbo_cache_lookup.read_m
[Unit: uncore_cbox L3 Lookup read request that access cache and found
line in M-state]
unc_cbo_cache_lookup.read_mesi
[Unit: uncore_cbox L3 Lookup read request that access cache and found
line in any MESI-state]
unc_cbo_cache_lookup.write_es
[Unit: uncore_cbox L3 Lookup write request that access cache and found
line in E or S-state]
unc_cbo_cache_lookup.write_i
[Unit: uncore_cbox L3 Lookup write request that access cache and found
line in I-state]
unc_cbo_cache_lookup.write_m
[Unit: uncore_cbox L3 Lookup write request that access cache and found
line in M-state]
unc_cbo_cache_lookup.write_mesi
[Unit: uncore_cbox L3 Lookup write request that access cache and found
line in MESI-state]
unc_cbo_xsnp_response.hit_eviction
[Unit: uncore_cbox A cross-core snoop resulted from L3 Eviction which
hits a non-modified line in some processor core]
unc_cbo_xsnp_response.hit_external
[Unit: uncore_cbox An external snoop hits a non-modified line in some
processor core]
unc_cbo_xsnp_response.hit_xcore
[Unit: uncore_cbox A cross-core snoop initiated by this Cbox due to
processor core memory request which hits a non-modified line in some
processor core]
unc_cbo_xsnp_response.hitm_eviction
[Unit: uncore_cbox A cross-core snoop resulted from L3 Eviction which
hits a modified line in some processor core]
unc_cbo_xsnp_response.hitm_external
[Unit: uncore_cbox An external snoop hits a modified line in some
processor core]
unc_cbo_xsnp_response.hitm_xcore
[Unit: uncore_cbox A cross-core snoop initiated by this Cbox due to
processor core memory request which hits a modified line in some
processor core]
unc_cbo_xsnp_response.miss_eviction
[Unit: uncore_cbox A cross-core snoop resulted from L3 Eviction which
misses in some processor core]
unc_cbo_xsnp_response.miss_external
[Unit: uncore_cbox An external snoop misses in some processor core]
unc_cbo_xsnp_response.miss_xcore
[Unit: uncore_cbox A cross-core snoop initiated by this Cbox due to
processor core memory request which misses in some processor core]
virtual memory:
dtlb_load_misses.miss_causes_a_walk
[Load misses in all DTLB levels that cause page walks]
dtlb_load_misses.pde_cache_miss
[DTLB demand load misses with low part of linear-to-physical address
translation missed]
dtlb_load_misses.stlb_hit
[Load operations that miss the first DTLB level but hit the second and
do not cause page walks]
dtlb_load_misses.stlb_hit_2m
[Load misses that miss the DTLB and hit the STLB (2M)]
dtlb_load_misses.stlb_hit_4k
[Load misses that miss the DTLB and hit the STLB (4K)]
dtlb_load_misses.walk_completed
[Demand load Miss in all translation lookaside buffer (TLB) levels
causes a page walk that completes of any page size]
dtlb_load_misses.walk_completed_1g
[Load miss in all TLB levels causes a page walk that completes. (1G)]
dtlb_load_misses.walk_completed_2m_4m
[Demand load Miss in all translation lookaside buffer (TLB) levels
causes a page walk that completes (2M/4M)]
dtlb_load_misses.walk_completed_4k
[Demand load Miss in all translation lookaside buffer (TLB) levels
causes a page walk that completes (4K)]
dtlb_load_misses.walk_duration
[Cycles when PMH is busy with page walks]
dtlb_store_misses.miss_causes_a_walk
[Store misses in all DTLB levels that cause page walks]
dtlb_store_misses.pde_cache_miss
[DTLB store misses with low part of linear-to-physical address
translation missed]
dtlb_store_misses.stlb_hit
[Store operations that miss the first TLB level but hit the second and
do not cause page walks]
dtlb_store_misses.stlb_hit_2m
[Store misses that miss the DTLB and hit the STLB (2M)]
dtlb_store_misses.stlb_hit_4k
[Store misses that miss the DTLB and hit the STLB (4K)]
dtlb_store_misses.walk_completed
[Store misses in all DTLB levels that cause completed page walks]
dtlb_store_misses.walk_completed_1g
[Store misses in all DTLB levels that cause completed page walks. (1G)]
dtlb_store_misses.walk_completed_2m_4m
[Store misses in all DTLB levels that cause completed page walks
(2M/4M)]
dtlb_store_misses.walk_completed_4k
[Store miss in all TLB levels causes a page walk that completes. (4K)]
dtlb_store_misses.walk_duration
[Cycles when PMH is busy with page walks]
ept.walk_cycles
[Cycle count for an Extended Page table walk]
itlb.itlb_flush
[Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages]
itlb_misses.miss_causes_a_walk
[Misses at all ITLB levels that cause page walks]
itlb_misses.stlb_hit
[Operations that miss the first ITLB level but hit the second and do
not cause any page walks]
itlb_misses.stlb_hit_2m
[Code misses that miss the ITLB and hit the STLB (2M)]
itlb_misses.stlb_hit_4k
[Code misses that miss the ITLB and hit the STLB (4K)]
itlb_misses.walk_completed
[Misses in all ITLB levels that cause completed page walks]
itlb_misses.walk_completed_1g
[Code miss in all TLB levels causes a page walk that completes. (1G)]
itlb_misses.walk_completed_2m_4m
[Code miss in all TLB levels causes a page walk that completes. (2M/4M)]
itlb_misses.walk_completed_4k
[Code miss in all TLB levels causes a page walk that completes. (4K)]
itlb_misses.walk_duration
[Cycles when PMH is busy with page walks]
page_walker_loads.dtlb_l1
[Number of DTLB page walker hits in the L1+FB]
page_walker_loads.dtlb_l2
[Number of DTLB page walker hits in the L2]
page_walker_loads.dtlb_l3
[Number of DTLB page walker hits in the L3 + XSNP Spec update: HSD25]
page_walker_loads.dtlb_memory
[Number of DTLB page walker hits in Memory Spec update: HSD25]
page_walker_loads.ept_dtlb_l1
[Counts the number of Extended Page Table walks from the DTLB that hit
in the L1 and FB]
page_walker_loads.ept_dtlb_l2
[Counts the number of Extended Page Table walks from the DTLB that hit
in the L2]
page_walker_loads.ept_dtlb_l3
[Counts the number of Extended Page Table walks from the DTLB that hit
in the L3]
page_walker_loads.ept_dtlb_memory
[Counts the number of Extended Page Table walks from the DTLB that hit
in memory]
page_walker_loads.ept_itlb_l1
[Counts the number of Extended Page Table walks from the ITLB that hit
in the L1 and FB]
page_walker_loads.ept_itlb_l2
[Counts the number of Extended Page Table walks from the ITLB that hit
in the L2]
page_walker_loads.ept_itlb_l3
[Counts the number of Extended Page Table walks from the ITLB that hit
in the L3]
page_walker_loads.ept_itlb_memory
[Counts the number of Extended Page Table walks from the ITLB that hit
in memory]
page_walker_loads.itlb_l1
[Number of ITLB page walker hits in the L1+FB]
page_walker_loads.itlb_l2
[Number of ITLB page walker hits in the L2]
page_walker_loads.itlb_l3
[Number of ITLB page walker hits in the L3 + XSNP Spec update: HSD25]
page_walker_loads.itlb_memory
[Number of ITLB page walker hits in Memory Spec update: HSD25]
tlb_flush.dtlb_thread
[DTLB flush attempts of the thread-specific entries]
tlb_flush.stlb_any
[STLB flush attempts]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
mem:<addr>[/len][:access] [Hardware breakpoint]
Metric Groups:
DSB:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
Frontend:
IFetch_Line_Utilization
[Rough Estimation of fraction of fetched lines bytes that were likely consumed by program instructions]
Frontend_Bandwidth:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
Memory_BW:
MLP
[Memory-Level-Parallelism (average number of L1 miss demand loads when there is at least 1 such miss)]
Memory_Bound:
Load_Miss_Real_Latency
[Actual Average Latency for L1 data-cache miss demand loads]
MLP
[Memory-Level-Parallelism (average number of L1 miss demand loads when there is at least 1 such miss)]
Memory_Lat:
Load_Miss_Real_Latency
[Actual Average Latency for L1 data-cache miss demand loads]
Pipeline:
CPI
[Cycles Per Instruction (threaded)]
ILP
[Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)]
UPI
[Uops Per Instruction]
Ports_Utilization:
ILP
[Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)]
Power:
C2_Pkg_Residency
[C2 residency percent per package]
C3_Core_Residency
[C3 residency percent per core]
C3_Pkg_Residency
[C3 residency percent per package]
C6_Core_Residency
[C6 residency percent per core]
C6_Pkg_Residency
[C6 residency percent per package]
C7_Core_Residency
[C7 residency percent per core]
C7_Pkg_Residency
[C7 residency percent per package]
Turbo_Utilization
[Average Frequency Utilization relative to nominal frequency]
SMT:
CORE_CLKS
[Core actual clocks when any thread is active on the physical core]
CoreIPC
[Instructions Per Cycle (per physical core)]
SMT_2T_Utilization
[Fraction of cycles where both hardware threads were active]
Summary:
CLKS
[Per-thread actual clocks when the logical processor is active. This is called 'Clockticks' in VTune]
CPI
[Cycles Per Instruction (threaded)]
CPU_Utilization
[Average CPU Utilization]
Instructions
[Total number of retired Instructions]
Kernel_Utilization
[Fraction of cycles spent in Kernel mode]
SMT_2T_Utilization
[Fraction of cycles where both hardware threads were active]
TLB:
Page_Walks_Utilization
[Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses]
TopDownL1:
IPC
[Instructions Per Cycle (per logical thread)]
SLOTS
[Total issue-pipeline slots]
Unknown_Branches:
BAClear_Cost
[Average Branch Address Clear Cost (fraction of cycles)]