Created September 21, 2019 15:59
branch-instructions OR branches [Hardware event] | |
branch-misses [Hardware event] | |
bus-cycles [Hardware event] | |
cache-misses [Hardware event] | |
cache-references [Hardware event] | |
cpu-cycles OR cycles [Hardware event] | |
instructions [Hardware event] | |
ref-cycles [Hardware event] | |
alignment-faults [Software event] | |
bpf-output [Software event] | |
context-switches OR cs [Software event] | |
cpu-clock [Software event] | |
cpu-migrations OR migrations [Software event] | |
dummy [Software event] | |
emulation-faults [Software event] | |
major-faults [Software event] | |
minor-faults [Software event] | |
page-faults OR faults [Software event] | |
task-clock [Software event] | |
L1-dcache-load-misses [Hardware cache event] | |
L1-dcache-loads [Hardware cache event] | |
L1-dcache-stores [Hardware cache event] | |
L1-icache-load-misses [Hardware cache event] | |
LLC-load-misses [Hardware cache event] | |
LLC-loads [Hardware cache event] | |
LLC-store-misses [Hardware cache event] | |
LLC-stores [Hardware cache event] | |
branch-load-misses [Hardware cache event] | |
branch-loads [Hardware cache event] | |
dTLB-load-misses [Hardware cache event] | |
dTLB-loads [Hardware cache event] | |
dTLB-store-misses [Hardware cache event] | |
dTLB-stores [Hardware cache event] | |
iTLB-load-misses [Hardware cache event] | |
iTLB-loads [Hardware cache event] | |
node-load-misses [Hardware cache event] | |
node-loads [Hardware cache event] | |
node-store-misses [Hardware cache event] | |
node-stores [Hardware cache event] | |
branch-instructions OR cpu/branch-instructions/ [Kernel PMU event] | |
branch-misses OR cpu/branch-misses/ [Kernel PMU event] | |
bus-cycles OR cpu/bus-cycles/ [Kernel PMU event] | |
cache-misses OR cpu/cache-misses/ [Kernel PMU event] | |
cache-references OR cpu/cache-references/ [Kernel PMU event] | |
cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event] | |
cstate_core/c3-residency/ [Kernel PMU event] | |
cstate_core/c6-residency/ [Kernel PMU event] | |
cstate_core/c7-residency/ [Kernel PMU event] | |
cstate_pkg/c2-residency/ [Kernel PMU event] | |
cstate_pkg/c3-residency/ [Kernel PMU event] | |
cstate_pkg/c6-residency/ [Kernel PMU event] | |
cstate_pkg/c7-residency/ [Kernel PMU event] | |
instructions OR cpu/instructions/ [Kernel PMU event] | |
mem-loads OR cpu/mem-loads/ [Kernel PMU event] | |
mem-stores OR cpu/mem-stores/ [Kernel PMU event] | |
msr/aperf/ [Kernel PMU event] | |
msr/mperf/ [Kernel PMU event] | |
msr/smi/ [Kernel PMU event] | |
msr/tsc/ [Kernel PMU event] | |
power/energy-cores/ [Kernel PMU event] | |
power/energy-gpu/ [Kernel PMU event] | |
power/energy-pkg/ [Kernel PMU event] | |
power/energy-ram/ [Kernel PMU event] | |
ref-cycles OR cpu/ref-cycles/ [Kernel PMU event] | |
topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event] | |
topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event] | |
topdown-slots-issued OR cpu/topdown-slots-issued/ [Kernel PMU event] | |
topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event] | |
topdown-total-slots OR cpu/topdown-total-slots/ [Kernel PMU event] | |
uncore_cbox_0/clockticks/ [Kernel PMU event] | |
uncore_cbox_1/clockticks/ [Kernel PMU event] | |
uncore_cbox_2/clockticks/ [Kernel PMU event] | |
uncore_cbox_3/clockticks/ [Kernel PMU event] | |
uncore_imc/data_reads/ [Kernel PMU event] | |
uncore_imc/data_writes/ [Kernel PMU event] | |
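The symbolic entries above share one shape: one or more aliases joined by `OR`, followed by the event type in square brackets. As a minimal sketch of reading that shape programmatically (the function name `parse_event_line` is hypothetical and not part of perf), something like the following could split one such line into its aliases and type. It assumes the layout seen in this listing and is not a general parser for every `perf list` version.

```python
import re

# Matches lines such as "cpu-cycles OR cycles [Hardware event]".
# The name part must start with a non-space character, so indented
# description lines like "  [L1D data line replacements]" do not match.
_EVENT_LINE = re.compile(r"^\s*(?P<names>\S.*?)\s+\[(?P<kind>[^\]]+)\]")


def parse_event_line(line):
    """Return (aliases, kind) for one symbolic-event line, or None."""
    match = _EVENT_LINE.match(line)
    if match is None:
        return None
    aliases = [name.strip() for name in match.group("names").split(" OR ")]
    return aliases, match.group("kind")


if __name__ == "__main__":
    print(parse_event_line("cpu-cycles OR cycles [Hardware event]"))
    print(parse_event_line("msr/tsc/ [Kernel PMU event]"))
```

Lines that carry only a category header (`cache:`) or only a bracketed description return `None`, so the vendor-specific sections further down are skipped by this sketch.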
cache: | |
l1d.replacement | |
[L1D data line replacements] | |
l1d_pend_miss.fb_full | |
[Cycles a demand request was blocked due to Fill Buffers unavailability] | |
l1d_pend_miss.pending | |
[L1D miss outstanding duration in cycles] | |
l1d_pend_miss.pending_cycles | |
[Cycles with L1D load Misses outstanding] | |
l1d_pend_miss.pending_cycles_any | |
[Cycles with L1D load Misses outstanding from any thread on physical | |
core] | |
l1d_pend_miss.request_fb_full | |
[Number of times a request needed a FB entry but there was no entry | |
available for it. That is the FB unavailability was dominant reason | |
for blocking the request. A request includes cacheable/uncacheable | |
demands that is load, store or SW prefetch. HWP are e] | |
l2_demand_rqsts.wb_hit | |
[Not rejected writebacks that hit L2 cache] | |
l2_lines_in.all | |
[L2 cache lines filling L2] | |
l2_lines_in.e | |
[L2 cache lines in E state filling L2] | |
l2_lines_in.i | |
[L2 cache lines in I state filling L2] | |
l2_lines_in.s | |
[L2 cache lines in S state filling L2] | |
l2_lines_out.demand_clean | |
[Clean L2 cache lines evicted by demand] | |
l2_lines_out.demand_dirty | |
[Dirty L2 cache lines evicted by demand] | |
l2_rqsts.all_code_rd | |
[L2 code requests] | |
l2_rqsts.all_demand_data_rd | |
[Demand Data Read requests Spec update: HSD78] | |
l2_rqsts.all_demand_miss | |
[Demand requests that miss L2 cache Spec update: HSD78] | |
l2_rqsts.all_demand_references | |
[Demand requests to L2 cache Spec update: HSD78] | |
l2_rqsts.all_pf | |
[Requests from L2 hardware prefetchers] | |
l2_rqsts.all_rfo | |
[RFO requests to L2 cache] | |
l2_rqsts.code_rd_hit | |
[L2 cache hits when fetching instructions, code reads] | |
l2_rqsts.code_rd_miss | |
[L2 cache misses when fetching instructions] | |
l2_rqsts.demand_data_rd_hit | |
[Demand Data Read requests that hit L2 cache Spec update: HSD78] | |
l2_rqsts.demand_data_rd_miss | |
[Demand Data Read miss L2, no rejects Spec update: HSD78] | |
l2_rqsts.l2_pf_hit | |
[L2 prefetch requests that hit L2 cache] | |
l2_rqsts.l2_pf_miss | |
[L2 prefetch requests that miss L2 cache] | |
l2_rqsts.miss | |
[All requests that miss L2 cache Spec update: HSD78] | |
l2_rqsts.references | |
[All L2 requests Spec update: HSD78] | |
l2_rqsts.rfo_hit | |
[RFO requests that hit L2 cache] | |
l2_rqsts.rfo_miss | |
[RFO requests that miss L2 cache] | |
l2_trans.all_pf | |
[L2 or L3 HW prefetches that access L2 cache] | |
l2_trans.all_requests | |
[Transactions accessing L2 pipe] | |
l2_trans.code_rd | |
[L2 cache accesses when fetching instructions] | |
l2_trans.demand_data_rd | |
[Demand Data Read requests that access L2 cache] | |
l2_trans.l1d_wb | |
[L1D writebacks that access L2 cache] | |
l2_trans.l2_fill | |
[L2 fill requests that access L2 cache] | |
l2_trans.l2_wb | |
[L2 writebacks that access L2 cache] | |
l2_trans.rfo | |
[RFO requests that access L2 cache] | |
lock_cycles.cache_lock_duration | |
[Cycles when L1D is locked] | |
longest_lat_cache.miss | |
[Core-originated cacheable demand requests missed L3] | |
longest_lat_cache.reference | |
[Core-originated cacheable demand requests that refer to L3] | |
mem_load_uops_l3_hit_retired.xsnp_hit | |
[Retired load uops which data sources were L3 and cross-core snoop hits | |
in on-pkg core cache Spec update: HSD29, HSD25, HSM26, HSM30. Supports | |
address when precise (Precise event)] | |
mem_load_uops_l3_hit_retired.xsnp_hitm | |
[Retired load uops which data sources were HitM responses from shared | |
L3 Spec update: HSD29, HSD25, HSM26, HSM30. Supports address when | |
precise (Precise event)] | |
mem_load_uops_l3_hit_retired.xsnp_miss | |
[Retired load uops which data sources were L3 hit and cross-core snoop | |
missed in on-pkg core cache Spec update: HSD29, HSD25, HSM26, HSM30. | |
Supports address when precise (Precise event)] | |
mem_load_uops_l3_hit_retired.xsnp_none | |
[Retired load uops which data sources were hits in L3 without snoops | |
required Spec update: HSD74, HSD29, HSD25, HSM26, HSM30. Supports | |
address when precise (Precise event)] | |
mem_load_uops_l3_miss_retired.local_dram | |
[Data from local DRAM either Snoop not needed or Snoop Miss (RspI) Spec | |
update: HSD74, HSD29, HSD25, HSM30. Supports address when precise | |
(Precise event)] | |
mem_load_uops_retired.hit_lfb | |
[Retired load uops which data sources were load uops missed L1 but hit | |
FB due to preceding miss to the same cache line with data not ready | |
Spec update: HSM30. Supports address when precise (Precise event)] | |
mem_load_uops_retired.l1_hit | |
[Retired load uops with L1 cache hits as data sources Spec update: | |
HSD29, HSM30. Supports address when precise (Precise event)] | |
mem_load_uops_retired.l1_miss | |
[Retired load uops misses in L1 cache as data sources Spec update: | |
HSM30. Supports address when precise (Precise event)] | |
mem_load_uops_retired.l2_hit | |
[Retired load uops with L2 cache hits as data sources Spec update: | |
HSD76, HSD29, HSM30. Supports address when precise (Precise event)] | |
mem_load_uops_retired.l2_miss | |
[Miss in mid-level (L2) cache. Excludes Unknown data-source Spec | |
update: HSD29, HSM30. Supports address when precise (Precise event)] | |
mem_load_uops_retired.l3_hit | |
[Retired load uops which data sources were data hits in L3 without | |
snoops required Spec update: HSD74, HSD29, HSD25, HSM26, HSM30. | |
Supports address when precise (Precise event)] | |
mem_load_uops_retired.l3_miss | |
[Miss in last-level (L3) cache. Excludes Unknown data-source Spec | |
update: HSD74, HSD29, HSD25, HSM26, HSM30. Supports address when | |
precise (Precise event)] | |
mem_uops_retired.all_loads | |
[All retired load uops Spec update: HSD29, HSM30. Supports address when | |
precise (Precise event)] | |
mem_uops_retired.all_stores | |
[All retired store uops Spec update: HSD29, HSM30. Supports address | |
when precise (Precise event)] | |
mem_uops_retired.lock_loads | |
[Retired load uops with locked access Spec update: HSD76, HSD29, HSM30. | |
Supports address when precise (Precise event)] | |
mem_uops_retired.split_loads | |
[Retired load uops that split across a cacheline boundary Spec update: | |
HSD29, HSM30. Supports address when precise (Precise event)] | |
mem_uops_retired.split_stores | |
[Retired store uops that split across a cacheline boundary Spec update: | |
HSD29, HSM30. Supports address when precise (Precise event)] | |
mem_uops_retired.stlb_miss_loads | |
[Retired load uops that miss the STLB Spec update: HSD29, HSM30. | |
Supports address when precise (Precise event)] | |
mem_uops_retired.stlb_miss_stores | |
[Retired store uops that miss the STLB Spec update: HSD29, HSM30. | |
Supports address when precise (Precise event)] | |
offcore_requests.all_data_rd | |
[Demand and prefetch data reads] | |
offcore_requests.demand_code_rd | |
[Cacheable and noncacheable code read requests] | |
offcore_requests.demand_data_rd | |
[Demand Data Read requests sent to uncore Spec update: HSD78] | |
offcore_requests.demand_rfo | |
[Demand RFO requests including regular RFOs, locks, ItoM] | |
offcore_requests_buffer.sq_full | |
[Offcore requests buffer cannot take more entries for this thread core] | |
offcore_requests_outstanding.all_data_rd | |
[Offcore outstanding cacheable Core Data Read transactions in | |
SuperQueue (SQ), queue to uncore Spec update: HSD62, HSD61] | |
offcore_requests_outstanding.cycles_with_data_rd | |
[Cycles when offcore outstanding cacheable Core Data Read transactions | |
are present in SuperQueue (SQ), queue to uncore Spec update: HSD62, | |
HSD61] | |
offcore_requests_outstanding.cycles_with_demand_data_rd | |
[Cycles when offcore outstanding Demand Data Read transactions are | |
present in SuperQueue (SQ), queue to uncore Spec update: HSD78, HSD62, | |
HSD61] | |
offcore_requests_outstanding.cycles_with_demand_rfo | |
[Offcore outstanding demand rfo reads transactions in SuperQueue (SQ), | |
queue to uncore, every cycle Spec update: HSD62, HSD61] | |
offcore_requests_outstanding.demand_code_rd | |
[Offcore outstanding code reads transactions in SuperQueue (SQ), queue | |
to uncore, every cycle Spec update: HSD62, HSD61] | |
offcore_requests_outstanding.demand_data_rd | |
[Offcore outstanding Demand Data Read transactions in uncore queue Spec | |
update: HSD78, HSD62, HSD61] | |
offcore_requests_outstanding.demand_data_rd_ge_6 | |
[Cycles with at least 6 offcore outstanding Demand Data Read | |
transactions in uncore queue Spec update: HSD78, HSD62, HSD61] | |
offcore_requests_outstanding.demand_rfo | |
[Offcore outstanding RFO store transactions in SuperQueue (SQ), queue | |
to uncore Spec update: HSD62, HSD61] | |
offcore_response | |
[Offcore response can be programmed only with a specific pair of event | |
select and counter MSR, and with specific event codes and predefined | |
mask bit value in a dedicated MSR to specify attributes of the offcore | |
transaction] | |
offcore_response.all_code_rd.l3_hit.hit_other_core_no_fwd | |
[Counts all demand & prefetch code reads that hit in the L3 and the | |
snoops to sibling cores hit in either E/S state and the line is not | |
forwarded] | |
offcore_response.all_data_rd.l3_hit.hit_other_core_no_fwd | |
[Counts all demand & prefetch data reads that hit in the L3 and the | |
snoops to sibling cores hit in either E/S state and the line is not | |
forwarded] | |
offcore_response.all_data_rd.l3_hit.hitm_other_core | |
[Counts all demand & prefetch data reads that hit in the L3 and the | |
snoop to one of the sibling cores hits the line in M state and the | |
line is forwarded] | |
offcore_response.all_reads.l3_hit.hit_other_core_no_fwd | |
[Counts all data/code/rfo reads (demand & prefetch) that hit in the L3 | |
and the snoops to sibling cores hit in either E/S state and the line | |
is not forwarded] | |
offcore_response.all_reads.l3_hit.hitm_other_core | |
[Counts all data/code/rfo reads (demand & prefetch) that hit in the L3 | |
and the snoop to one of the sibling cores hits the line in M state and | |
the line is forwarded] | |
offcore_response.all_requests.l3_hit.any_response | |
[Counts all requests that hit in the L3] | |
offcore_response.all_rfo.l3_hit.hit_other_core_no_fwd | |
[Counts all demand & prefetch RFOs that hit in the L3 and the snoops to | |
sibling cores hit in either E/S state and the line is not forwarded] | |
offcore_response.all_rfo.l3_hit.hitm_other_core | |
[Counts all demand & prefetch RFOs that hit in the L3 and the snoop to | |
one of the sibling cores hits the line in M state and the line is | |
forwarded] | |
offcore_response.demand_code_rd.l3_hit.hit_other_core_no_fwd | |
[Counts all demand code reads that hit in the L3 and the snoops to | |
sibling cores hit in either E/S state and the line is not forwarded] | |
offcore_response.demand_code_rd.l3_hit.hitm_other_core | |
[Counts all demand code reads that hit in the L3 and the snoop to one | |
of the sibling cores hits the line in M state and the line is | |
forwarded] | |
offcore_response.demand_data_rd.l3_hit.hit_other_core_no_fwd | |
[Counts demand data reads that hit in the L3 and the snoops to sibling | |
cores hit in either E/S state and the line is not forwarded] | |
offcore_response.demand_data_rd.l3_hit.hitm_other_core | |
[Counts demand data reads that hit in the L3 and the snoop to one of | |
the sibling cores hits the line in M state and the line is forwarded] | |
offcore_response.demand_rfo.l3_hit.hit_other_core_no_fwd | |
[Counts all demand data writes (RFOs) that hit in the L3 and the snoops | |
to sibling cores hit in either E/S state and the line is not forwarded] | |
offcore_response.demand_rfo.l3_hit.hitm_other_core | |
[Counts all demand data writes (RFOs) that hit in the L3 and the snoop | |
to one of the sibling cores hits the line in M state and the line is | |
forwarded] | |
offcore_response.pf_l2_code_rd.l3_hit.any_response | |
[Counts all prefetch (that bring data to LLC only) code reads that hit | |
in the L3] | |
offcore_response.pf_l2_data_rd.l3_hit.any_response | |
[Counts prefetch (that bring data to L2) data reads that hit in the L3] | |
offcore_response.pf_l2_rfo.l3_hit.any_response | |
[Counts all prefetch (that bring data to L2) RFOs that hit in the L3] | |
offcore_response.pf_l3_code_rd.l3_hit.any_response | |
[Counts prefetch (that bring data to LLC only) code reads that hit in | |
the L3] | |
offcore_response.pf_l3_data_rd.l3_hit.any_response | |
[Counts all prefetch (that bring data to LLC only) data reads that hit | |
in the L3] | |
offcore_response.pf_l3_rfo.l3_hit.any_response | |
[Counts all prefetch (that bring data to LLC only) RFOs that hit in the | |
L3] | |
sq_misc.split_lock | |
[Split locks in SQ] | |
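Any of the event names in this section can be passed to `perf stat -e` as a comma-separated list. The sketch below builds such a command line and derives an L2 miss ratio from two of the counters listed above (`l2_rqsts.miss` and `l2_rqsts.references`). The helper names and the `./workload` binary are hypothetical, and the ratio is only an illustration of combining two counts, not a metric defined by perf itself.

```python
def perf_stat_argv(events, command):
    """Build an argv list for `perf stat` counting `events` while running `command`."""
    if not events:
        raise ValueError("at least one event name is required")
    # -e takes a comma-separated event list; "--" separates perf's own
    # options from the workload and its arguments.
    return ["perf", "stat", "-e", ",".join(events), "--", *command]


def miss_ratio(misses, references):
    """Fraction of requests that missed; 0.0 when nothing was counted."""
    if references == 0:
        return 0.0
    return misses / references


if __name__ == "__main__":
    # "./workload" is a placeholder for the program being measured.
    argv = perf_stat_argv(["l2_rqsts.references", "l2_rqsts.miss"], ["./workload"])
    print(" ".join(argv))
    # Counts here are made-up inputs, not measurements.
    print(miss_ratio(25, 100))
```

The argv list could be handed to `subprocess.run` on a machine where perf exposes these events; availability depends on the CPU model and kernel.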
floating point: | |
avx_insts.all | |
[Approximate counts of AVX & AVX2 256-bit instructions, including | |
non-arithmetic instructions, loads, and stores. May count non-AVX | |
instructions that employ 256-bit operations, including (but not | |
necessarily limited to) rep string instructions that use 256-bit loads | |
and stores for optimized performance, XSAVE* and XRSTOR*, and | |
operations that transition the x87 FPU data registers between x87 and | |
MMX] | |
fp_assist.any | |
[Cycles with any input/output SSE or FP assist] | |
fp_assist.simd_input | |
[Number of SIMD FP assists due to input values] | |
fp_assist.simd_output | |
[Number of SIMD FP assists due to Output values] | |
fp_assist.x87_input | |
[Number of X87 assists due to input value] | |
fp_assist.x87_output | |
[Number of X87 assists due to output value] | |
other_assists.avx_to_sse | |
[Number of transitions from AVX-256 to legacy SSE when penalty | |
applicable Spec update: HSD56, HSM57] | |
other_assists.sse_to_avx | |
[Number of transitions from SSE to AVX-256 when penalty applicable Spec | |
update: HSD56, HSM57] | |
frontend: | |
dsb2mite_switches.penalty_cycles | |
[Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles] | |
icache.hit | |
[Number of Instruction Cache, Streaming Buffer and Victim Cache Reads, | |
both cacheable and noncacheable, including UC fetches] | |
icache.ifdata_stall | |
[Cycles where a code fetch is stalled due to L1 instruction-cache miss] | |
icache.ifetch_stall | |
[Cycles where a code fetch is stalled due to L1 instruction-cache miss] | |
icache.misses | |
[Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. | |
Includes Uncacheable accesses] | |
idq.all_dsb_cycles_4_uops | |
[Cycles Decode Stream Buffer (DSB) is delivering 4 Uops] | |
idq.all_dsb_cycles_any_uops | |
[Cycles Decode Stream Buffer (DSB) is delivering any Uop] | |
idq.all_mite_cycles_4_uops | |
[Cycles MITE is delivering 4 Uops] | |
idq.all_mite_cycles_any_uops | |
[Cycles MITE is delivering any Uop] | |
idq.dsb_cycles | |
[Cycles when uops are being delivered to Instruction Decode Queue (IDQ) | |
from Decode Stream Buffer (DSB) path] | |
idq.dsb_uops | |
[Uops delivered to Instruction Decode Queue (IDQ) from the Decode | |
Stream Buffer (DSB) path] | |
idq.empty | |
[Instruction Decode Queue (IDQ) empty cycles Spec update: HSD135] | |
idq.mite_all_uops | |
[Uops delivered to Instruction Decode Queue (IDQ) from MITE path] | |
idq.mite_cycles | |
[Cycles when uops are being delivered to Instruction Decode Queue (IDQ) | |
from MITE path] | |
idq.mite_uops | |
[Uops delivered to Instruction Decode Queue (IDQ) from MITE path] | |
idq.ms_cycles | |
[Cycles when uops are being delivered to Instruction Decode Queue (IDQ) | |
while Microcode Sequencer (MS) is busy] | |
idq.ms_dsb_cycles | |
[Cycles when uops initiated by Decode Stream Buffer (DSB) are being | |
delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer | |
(MS) is busy] | |
idq.ms_dsb_occur | |
[Deliveries to Instruction Decode Queue (IDQ) initiated by Decode | |
Stream Buffer (DSB) while Microcode Sequencer (MS) is busy] | |
idq.ms_dsb_uops | |
[Uops initiated by Decode Stream Buffer (DSB) that are being delivered | |
to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is | |
busy] | |
idq.ms_mite_uops | |
[Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) | |
while Microcode Sequenser (MS) is busy] | |
idq.ms_switches | |
[Number of switches from DSB (Decode Stream Buffer) or MITE (legacy | |
decode pipeline) to the Microcode Sequencer] | |
idq.ms_uops | |
[Uops delivered to Instruction Decode Queue (IDQ) while Microcode | |
Sequencer (MS) is busy] | |
idq_uops_not_delivered.core | |
[Uops not delivered to Resource Allocation Table (RAT) per thread when | |
backend of the machine is not stalled Spec update: HSD135] | |
idq_uops_not_delivered.cycles_0_uops_deliv.core | |
[Cycles per thread when 4 or more uops are not delivered to Resource | |
Allocation Table (RAT) when backend of the machine is not stalled Spec | |
update: HSD135] | |
idq_uops_not_delivered.cycles_fe_was_ok | |
[Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) | |
was stalling FE Spec update: HSD135] | |
idq_uops_not_delivered.cycles_le_1_uop_deliv.core | |
[Cycles per thread when 3 or more uops are not delivered to Resource | |
Allocation Table (RAT) when backend of the machine is not stalled Spec | |
update: HSD135] | |
idq_uops_not_delivered.cycles_le_2_uop_deliv.core | |
[Cycles with less than 2 uops delivered by the front end Spec update: | |
HSD135] | |
idq_uops_not_delivered.cycles_le_3_uop_deliv.core | |
[Cycles with less than 3 uops delivered by the front end Spec update: | |
HSD135] | |
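The `idq_uops_not_delivered.core` count is commonly turned into a "frontend bound" fraction by dividing it by the total number of issue slots. The sketch below assumes a pipeline width of 4 slots per cycle, which matches the "4 uops" wording in the descriptions above, and takes the cycle count as an input (for example from `cpu-cycles`). The function name is hypothetical and the formula is a simplified illustration, not output produced by perf.

```python
PIPELINE_WIDTH = 4  # assumed issue slots per cycle on this core


def frontend_bound(uops_not_delivered_core, cycles, width=PIPELINE_WIDTH):
    """Fraction of issue slots left empty because the front end delivered no uop."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    total_slots = width * cycles
    return uops_not_delivered_core / total_slots


if __name__ == "__main__":
    # Made-up counts: 1000 undelivered uop slots over 1000 cycles.
    print(frontend_bound(1000, 1000))
```

A value near 0 suggests the front end keeps the allocator fed; a larger value points toward fetch or decode as a limiter, subject to the HSD135 erratum noted on these events.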
memory: | |
hle_retired.aborted | |
[Number of times an HLE execution aborted due to any reasons (multiple | |
categories may count as one) (Precise event)] | |
hle_retired.aborted_misc1 | |
[Number of times an HLE execution aborted due to various memory events | |
(e.g., read/write capacity and conflicts)] | |
hle_retired.aborted_misc2 | |
[Number of times an HLE execution aborted due to uncommon conditions] | |
hle_retired.aborted_misc3 | |
[Number of times an HLE execution aborted due to HLE-unfriendly | |
instructions] | |
hle_retired.aborted_misc4 | |
[Number of times an HLE execution aborted due to incompatible memory | |
type Spec update: HSD65] | |
hle_retired.aborted_misc5 | |
[Number of times an HLE execution aborted due to none of the previous 4 | |
categories (e.g. interrupts)] | |
hle_retired.commit | |
[Number of times an HLE execution successfully committed] | |
hle_retired.start | |
[Number of times an HLE execution started] | |
machine_clears.memory_ordering | |
[Counts the number of machine clears due to memory order conflicts] | |
mem_trans_retired.load_latency_gt_128 | |
[Loads with latency value being above 128 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_16 | |
[Loads with latency value being above 16 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_256 | |
[Loads with latency value being above 256 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_32 | |
[Loads with latency value being above 32 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_4 | |
[Loads with latency value being above 4 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_512 | |
[Loads with latency value being above 512 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_64 | |
[Loads with latency value being above 64 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
mem_trans_retired.load_latency_gt_8 | |
[Loads with latency value being above 8 Spec update: HSD76, HSD25, | |
HSM26 (Must be precise)] | |
misalign_mem_ref.loads | |
[Speculative cache line split load uops dispatched to L1 cache] | |
misalign_mem_ref.stores | |
[Speculative cache line split STA uops dispatched to L1 cache] | |
offcore_response.all_code_rd.l3_miss.any_response | |
[Counts all demand & prefetch code reads that miss in the L3] | |
offcore_response.all_code_rd.l3_miss.local_dram | |
[Counts all demand & prefetch code reads that miss the L3 and the data | |
is returned from local dram] | |
offcore_response.all_data_rd.l3_miss.any_response | |
[Counts all demand & prefetch data reads that miss in the L3] | |
offcore_response.all_data_rd.l3_miss.local_dram | |
[Counts all demand & prefetch data reads that miss the L3 and the data | |
is returned from local dram] | |
offcore_response.all_reads.l3_miss.any_response | |
[Counts all data/code/rfo reads (demand & prefetch) that miss in the L3] | |
offcore_response.all_reads.l3_miss.local_dram | |
[Counts all data/code/rfo reads (demand & prefetch) that miss the L3 | |
and the data is returned from local dram] | |
offcore_response.all_requests.l3_miss.any_response | |
[Counts all requests that miss in the L3] | |
offcore_response.all_rfo.l3_miss.any_response | |
[Counts all demand & prefetch RFOs that miss in the L3] | |
offcore_response.all_rfo.l3_miss.local_dram | |
[Counts all demand & prefetch RFOs that miss the L3 and the data is | |
returned from local dram] | |
offcore_response.demand_code_rd.l3_miss.any_response | |
[Counts all demand code reads that miss in the L3] | |
offcore_response.demand_code_rd.l3_miss.local_dram | |
[Counts all demand code reads that miss the L3 and the data is returned | |
from local dram] | |
offcore_response.demand_data_rd.l3_miss.any_response | |
[Counts demand data reads that miss in the L3] | |
offcore_response.demand_data_rd.l3_miss.local_dram | |
[Counts demand data reads that miss the L3 and the data is returned | |
from local dram] | |
offcore_response.demand_rfo.l3_miss.any_response | |
[Counts all demand data writes (RFOs) that miss in the L3] | |
offcore_response.demand_rfo.l3_miss.local_dram | |
[Counts all demand data writes (RFOs) that miss the L3 and the data is | |
returned from local dram] | |
offcore_response.pf_l2_code_rd.l3_miss.any_response | |
[Counts all prefetch (that bring data to LLC only) code reads that miss | |
in the L3] | |
offcore_response.pf_l2_data_rd.l3_miss.any_response | |
[Counts prefetch (that bring data to L2) data reads that miss in the L3] | |
offcore_response.pf_l2_rfo.l3_miss.any_response | |
[Counts all prefetch (that bring data to L2) RFOs that miss in the L3] | |
offcore_response.pf_l3_code_rd.l3_miss.any_response | |
[Counts prefetch (that bring data to LLC only) code reads that miss in | |
the L3] | |
offcore_response.pf_l3_data_rd.l3_miss.any_response | |
[Counts all prefetch (that bring data to LLC only) data reads that miss | |
in the L3] | |
offcore_response.pf_l3_rfo.l3_miss.any_response | |
[Counts all prefetch (that bring data to LLC only) RFOs that miss in | |
the L3] | |
rtm_retired.aborted | |
[Number of times an RTM execution aborted due to any reasons (multiple | |
categories may count as one) (Precise event)] | |
rtm_retired.aborted_misc1 | |
[Number of times an RTM execution aborted due to various memory events | |
(e.g. read/write capacity and conflicts)] | |
rtm_retired.aborted_misc2 | |
[Number of times an RTM execution aborted due to various memory events | |
(e.g., read/write capacity and conflicts)] | |
rtm_retired.aborted_misc3 | |
[Number of times an RTM execution aborted due to HLE-unfriendly | |
instructions] | |
rtm_retired.aborted_misc4 | |
[Number of times an RTM execution aborted due to incompatible memory | |
type Spec update: HSD65] | |
rtm_retired.aborted_misc5 | |
[Number of times an RTM execution aborted due to none of the previous 4 | |
categories (e.g. interrupt)] | |
rtm_retired.commit | |
[Number of times an RTM execution successfully committed] | |
rtm_retired.start | |
[Number of times an RTM execution started] | |
tx_exec.misc1 | |
[Counts the number of times a class of instructions that may cause a | |
transactional abort was executed. Since this is the count of | |
execution, it may not always cause a transactional abort] | |
tx_exec.misc2 | |
[Counts the number of times a class of instructions (e.g., vzeroupper) | |
that may cause a transactional abort was executed inside a | |
transactional region] | |
tx_exec.misc3 | |
[Counts the number of times an instruction execution caused the | |
transactional nest count supported to be exceeded] | |
tx_exec.misc4 | |
[Counts the number of times a XBEGIN instruction was executed inside an | |
HLE transactional region] | |
tx_exec.misc5 | |
[Counts the number of times an HLE XACQUIRE instruction was executed | |
inside an RTM transactional region] | |
tx_mem.abort_capacity_write | |
[Number of times a transactional abort was signaled due to a data | |
capacity limitation for transactional writes] | |
tx_mem.abort_conflict | |
[Number of times a transactional abort was signaled due to a data | |
conflict on a transactionally accessed address] | |
tx_mem.abort_hle_elision_buffer_mismatch | |
[Number of times an HLE transactional execution aborted due to XRELEASE | |
lock not satisfying the address and value requirements in the elision | |
buffer] | |
tx_mem.abort_hle_elision_buffer_not_empty | |
[Number of times an HLE transactional execution aborted due to | |
NoAllocatedElisionBuffer being non-zero] | |
tx_mem.abort_hle_elision_buffer_unsupported_alignment | |
[Number of times an HLE transactional execution aborted due to an | |
unsupported read alignment from the elision buffer] | |
tx_mem.abort_hle_store_to_elided_lock | |
[Number of times a HLE transactional region aborted due to a non | |
XRELEASE prefixed instruction writing to an elided lock in the elision | |
buffer] | |
tx_mem.hle_elision_buffer_full | |
[Number of times HLE lock could not be elided due to | |
ElisionBufferAvailable being zero] | |
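The `*_retired.start`, `*_retired.commit`, and `*_retired.aborted` counters above lend themselves to a simple abort-rate calculation. The following is a minimal sketch under the assumption that every started transactional region eventually either commits or aborts; the function name and the sample counts are hypothetical.

```python
def abort_rate(started, aborted):
    """Share of started transactional executions that aborted."""
    if started < 0 or aborted < 0:
        raise ValueError("counts must be non-negative")
    if started == 0:
        return 0.0  # nothing ran transactionally, so there is no rate to report
    return aborted / started


if __name__ == "__main__":
    # Made-up values standing in for rtm_retired.start and rtm_retired.aborted.
    print(abort_rate(started=200, aborted=50))
```

The same helper applies to the `hle_retired.*` counters; the `aborted_misc1` through `aborted_misc5` events can then be used to break an elevated rate down by cause.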
other: | |
cpl_cycles.ring0 | |
[Unhalted core cycles when the thread is in ring 0] | |
cpl_cycles.ring0_trans | |
[Number of intervals between processor halts while thread is in ring 0] | |
cpl_cycles.ring123 | |
[Unhalted core cycles when thread is in rings 1, 2, or 3] | |
lock_cycles.split_lock_uc_lock_duration | |
[Cycles when L1 and L2 are locked due to UC or split lock] | |
pipeline: | |
arith.divider_uops | |
[Any uop executed by the Divider. (This includes all divide uops, sqrt, | |
...)] | |
baclears.any | |
[Counts the total number when the front end is resteered, mainly when | |
the BPU cannot provide a correct prediction and this is corrected by | |
other branch handling mechanisms at the front end] | |
br_inst_exec.all_branches | |
[Speculative and retired branches] | |
br_inst_exec.all_conditional | |
[Speculative and retired macro-conditional branches] | |
br_inst_exec.all_direct_jmp | |
[Speculative and retired macro-unconditional branches excluding calls | |
and indirects] | |
br_inst_exec.all_direct_near_call | |
[Speculative and retired direct near calls] | |
br_inst_exec.all_indirect_jump_non_call_ret | |
[Speculative and retired indirect branches excluding calls and returns] | |
br_inst_exec.all_indirect_near_return | |
[Speculative and retired indirect return branches] | |
br_inst_exec.nontaken_conditional | |
[Not taken macro-conditional branches] | |
br_inst_exec.taken_conditional | |
[Taken speculative and retired macro-conditional branches] | |
br_inst_exec.taken_direct_jump | |
[Taken speculative and retired macro-conditional branch instructions | |
excluding calls and indirects] | |
br_inst_exec.taken_direct_near_call | |
[Taken speculative and retired direct near calls] | |
br_inst_exec.taken_indirect_jump_non_call_ret | |
[Taken speculative and retired indirect branches excluding calls and | |
returns] | |
br_inst_exec.taken_indirect_near_call | |
[Taken speculative and retired indirect calls] | |
br_inst_exec.taken_indirect_near_return | |
[Taken speculative and retired indirect branches with return mnemonic] | |
br_inst_retired.all_branches | |
[All (macro) branch instructions retired] | |
br_inst_retired.all_branches_pebs | |
[All (macro) branch instructions retired (Must be precise)] | |
br_inst_retired.conditional | |
[Conditional branch instructions retired (Precise event)] | |
br_inst_retired.far_branch | |
[Far branch instructions retired] | |
br_inst_retired.near_call | |
[Direct and indirect near call instructions retired (Precise event)] | |
br_inst_retired.near_return | |
[Return instructions retired (Precise event)] | |
br_inst_retired.near_taken | |
[Taken branch instructions retired (Precise event)] | |
br_inst_retired.not_taken | |
[Not taken branch instructions retired] | |
br_misp_exec.all_branches | |
[Speculative and retired mispredicted branches of all types] | |
br_misp_exec.all_conditional | |
[Speculative and retired mispredicted macro conditional branches] | |
br_misp_exec.all_indirect_jump_non_call_ret | |
[Mispredicted indirect branches excluding calls and returns] | |
br_misp_exec.nontaken_conditional | |
[Not taken speculative and retired mispredicted macro conditional | |
branches] | |
br_misp_exec.taken_conditional | |
[Taken speculative and retired mispredicted macro conditional branches] | |
br_misp_exec.taken_indirect_jump_non_call_ret | |
[Taken speculative and retired mispredicted indirect branches excluding | |
calls and returns] | |
br_misp_exec.taken_indirect_near_call | |
[Taken speculative and retired mispredicted indirect calls] | |
br_misp_exec.taken_return_near | |
[Taken speculative and retired mispredicted indirect branches with | |
return mnemonic] | |
br_misp_retired.all_branches | |
[All mispredicted macro branch instructions retired] | |
br_misp_retired.all_branches_pebs | |
[Mispredicted macro branch instructions retired (Must be precise)] | |
br_misp_retired.conditional | |
[Mispredicted conditional branch instructions retired (Precise event)] | |
br_misp_retired.near_taken | |
[Number of near branch instructions retired that were mispredicted and | |
taken (Precise event)] | |
cpu_clk_thread_unhalted.one_thread_active | |
[Count XClk pulses when this thread is unhalted and the other thread is | |
halted] | |
cpu_clk_thread_unhalted.ref_xclk | |
[Reference cycles when the thread is unhalted (counts at 100 MHz rate)] | |
cpu_clk_thread_unhalted.ref_xclk_any | |
[Reference cycles when at least one thread on the physical core is | |
unhalted (counts at 100 MHz rate)] | |
cpu_clk_unhalted.one_thread_active | |
[Count XClk pulses when this thread is unhalted and the other thread is | |
halted] | |
cpu_clk_unhalted.ref_tsc | |
[Reference cycles when the core is not in halt state] | |
cpu_clk_unhalted.ref_xclk | |
[Reference cycles when the thread is unhalted (counts at 100 MHz rate)] | |
cpu_clk_unhalted.ref_xclk_any | |
[Reference cycles when at least one thread on the physical core is | |
unhalted (counts at 100 MHz rate)] | |
cpu_clk_unhalted.thread | |
[Core cycles when the thread is not in halt state] | |
cpu_clk_unhalted.thread_any | |
[Core cycles when at least one thread on the physical core is not in | |
halt state] | |
cpu_clk_unhalted.thread_p | |
[Thread cycles when thread is not in halt state] | |
cpu_clk_unhalted.thread_p_any | |
[Core cycles when at least one thread on the physical core is not in | |
halt state] | |
cycle_activity.cycles_l1d_pending | |
[Cycles with pending L1 cache miss loads] | |
cycle_activity.cycles_l2_pending | |
[Cycles with pending L2 cache miss loads Spec update: HSD78] | |
cycle_activity.cycles_ldm_pending | |
[Cycles with pending memory loads] | |
cycle_activity.cycles_no_execute | |
[Total execution stalls] | |
cycle_activity.stalls_l1d_pending | |
[Execution stalls due to L1 data cache misses] | |
cycle_activity.stalls_l2_pending | |
[Execution stalls due to L2 cache misses] | |
cycle_activity.stalls_ldm_pending | |
[Execution stalls due to memory subsystem] | |
ild_stall.iq_full | |
[Stall cycles because IQ is full] | |
ild_stall.lcp | |
[Stalls caused by changing prefix length of the instruction] | |
inst_retired.any | |
[Instructions retired from execution Spec update: HSD140, HSD143] | |
inst_retired.any_p | |
[Number of instructions retired. General Counter - architectural event | |
Spec update: HSD11, HSD140] | |
inst_retired.prec_dist | |
[Precise instruction retired event with HW to reduce effect of PEBS | |
shadow in IP distribution Spec update: HSD140 (Must be precise)] | |
inst_retired.x87 | |
[FP operations retired. X87 FP operations that have no exceptions: | |
Counts also flows that have several X87 or flows that use X87 uops in | |
the exception handling] | |
int_misc.recovery_cycles | |
[Number of cycles waiting for the checkpoints in Resource Allocation | |
Table (RAT) to be recovered after Nuke due to all other cases except | |
JEClear (e.g. whenever a ucode assist is needed like SSE exception, | |
memory disambiguation, etc...)] | |
int_misc.recovery_cycles_any | |
[Core cycles the allocator was stalled due to recovery from earlier | |
clear event for any thread running on the physical core (e.g. | |
misprediction or memory nuke)] | |
ld_blocks.no_sr | |
[The number of times that split load operations are temporarily blocked | |
because all resources for handling the split accesses are in use] | |
ld_blocks.store_forward | |
[Loads blocked by overlapping with store buffer that cannot be | |
forwarded] | |
ld_blocks_partial.address_alias | |
[False dependencies in MOB due to partial compare on address] | |
load_hit_pre.hw_pf | |
[Not software-prefetch load dispatches that hit FB allocated for | |
hardware prefetch] | |
load_hit_pre.sw_pf | |
[Not software-prefetch load dispatches that hit FB allocated for | |
software prefetch] | |
lsd.cycles_4_uops | |
[Cycles when 4 Uops were delivered by the LSD rather than by the decoder] | |
lsd.cycles_active | |
[Cycles when Uops were delivered by the LSD rather than by the decoder] | |
lsd.uops | |
[Number of Uops delivered by the LSD] | |
machine_clears.count | |
[Number of machine clears (nukes) of any type] | |
machine_clears.cycles | |
[Cycles there was a Nuke. Account for both thread-specific and All | |
Thread Nukes] | |
machine_clears.maskmov | |
[This event counts the number of executed Intel AVX masked load | |
operations that refer to an illegal address range with the mask bits | |
set to 0] | |
machine_clears.smc | |
[Self-modifying code (SMC) detected] | |
move_elimination.int_eliminated | |
[Number of integer Move Elimination candidate uops that were eliminated] | |
move_elimination.int_not_eliminated | |
[Number of integer Move Elimination candidate uops that were not | |
eliminated] | |
move_elimination.simd_eliminated | |
[Number of SIMD Move Elimination candidate uops that were eliminated] | |
move_elimination.simd_not_eliminated | |
[Number of SIMD Move Elimination candidate uops that were not | |
eliminated] | |
other_assists.any_wb_assist | |
[Number of times any microcode assist is invoked by HW upon uop | |
writeback] | |
resource_stalls.any | |
[Resource-related stall cycles Spec update: HSD135] | |
resource_stalls.rob | |
[Cycles stalled due to re-order buffer full] | |
resource_stalls.rs | |
[Cycles stalled due to no eligible RS entry available] | |
resource_stalls.sb | |
[Cycles stalled due to no store buffers available (not including | |
draining from sync)] | |
rob_misc_events.lbr_inserts | |
[Count cases of saving new LBR] | |
rs_events.empty_cycles | |
[Cycles when Reservation Station (RS) is empty for the thread] | |
rs_events.empty_end | |
[Counts end of periods where the Reservation Station (RS) was empty. | |
Could be useful to precisely locate Frontend Latency Bound issues] | |
uops_dispatched_port.port_0 | |
[Cycles per thread when uops are executed in port 0] | |
uops_dispatched_port.port_1 | |
[Cycles per thread when uops are executed in port 1] | |
uops_dispatched_port.port_2 | |
[Cycles per thread when uops are executed in port 2] | |
uops_dispatched_port.port_3 | |
[Cycles per thread when uops are executed in port 3] | |
uops_dispatched_port.port_4 | |
[Cycles per thread when uops are executed in port 4] | |
uops_dispatched_port.port_5 | |
[Cycles per thread when uops are executed in port 5] | |
uops_dispatched_port.port_6 | |
[Cycles per thread when uops are executed in port 6] | |
uops_dispatched_port.port_7 | |
[Cycles per thread when uops are executed in port 7] | |
uops_executed.core | |
[Number of uops executed on the core Spec update: HSD30, HSM31] | |
uops_executed.core_cycles_ge_1 | |
[Cycles at least 1 micro-op is executed from any thread on physical | |
core Spec update: HSD30, HSM31] | |
uops_executed.core_cycles_ge_2 | |
[Cycles at least 2 micro-ops are executed from any thread on physical | |
core Spec update: HSD30, HSM31] | |
uops_executed.core_cycles_ge_3 | |
[Cycles at least 3 micro-ops are executed from any thread on physical | |
core Spec update: HSD30, HSM31] | |
uops_executed.core_cycles_ge_4 | |
[Cycles at least 4 micro-ops are executed from any thread on physical | |
core Spec update: HSD30, HSM31] | |
uops_executed.core_cycles_none | |
[Cycles with no micro-ops executed from any thread on physical core | |
Spec update: HSD30, HSM31] | |
uops_executed.cycles_ge_1_uop_exec | |
[Cycles where at least 1 uop was executed per-thread Spec update: | |
HSD144, HSD30, HSM31] | |
uops_executed.cycles_ge_2_uops_exec | |
[Cycles where at least 2 uops were executed per-thread Spec update: | |
HSD144, HSD30, HSM31] | |
uops_executed.cycles_ge_3_uops_exec | |
[Cycles where at least 3 uops were executed per-thread Spec update: | |
HSD144, HSD30, HSM31] | |
uops_executed.cycles_ge_4_uops_exec | |
[Cycles where at least 4 uops were executed per-thread Spec update: | |
HSD144, HSD30, HSM31] | |
uops_executed.stall_cycles | |
[Counts number of cycles no uops were dispatched to be executed on this | |
thread Spec update: HSD144, HSD30, HSM31] | |
uops_executed_port.port_0 | |
[Cycles per thread when uops are executed in port 0] | |
uops_executed_port.port_0_core | |
[Cycles per core when uops are executed in port 0] | |
uops_executed_port.port_1 | |
[Cycles per thread when uops are executed in port 1] | |
uops_executed_port.port_1_core | |
[Cycles per core when uops are executed in port 1] | |
uops_executed_port.port_2 | |
[Cycles per thread when uops are executed in port 2] | |
uops_executed_port.port_2_core | |
[Cycles per core when uops are dispatched to port 2] | |
uops_executed_port.port_3 | |
[Cycles per thread when uops are executed in port 3] | |
uops_executed_port.port_3_core | |
[Cycles per core when uops are dispatched to port 3] | |
uops_executed_port.port_4 | |
[Cycles per thread when uops are executed in port 4] | |
uops_executed_port.port_4_core | |
[Cycles per core when uops are executed in port 4] | |
uops_executed_port.port_5 | |
[Cycles per thread when uops are executed in port 5] | |
uops_executed_port.port_5_core | |
[Cycles per core when uops are executed in port 5] | |
uops_executed_port.port_6 | |
[Cycles per thread when uops are executed in port 6] | |
uops_executed_port.port_6_core | |
[Cycles per core when uops are executed in port 6] | |
uops_executed_port.port_7 | |
[Cycles per thread when uops are executed in port 7] | |
uops_executed_port.port_7_core | |
[Cycles per core when uops are dispatched to port 7] | |
uops_issued.any | |
[Uops that Resource Allocation Table (RAT) issues to Reservation | |
Station (RS)] | |
uops_issued.core_stall_cycles | |
[Cycles when Resource Allocation Table (RAT) does not issue Uops to | |
Reservation Station (RS) for all threads] | |
uops_issued.flags_merge | |
[Number of flags-merge uops being allocated. Such uops considered perf | |
sensitive; added by GSR u-arch] | |
uops_issued.single_mul | |
[Number of Multiply packed/scalar single precision uops allocated] | |
uops_issued.slow_lea | |
[Number of slow LEA uops being allocated. A uop is generally considered | |
SlowLea if it has 3 sources (e.g. 2 sources + immediate), regardless of | |
whether it results from an LEA instruction] | |
uops_issued.stall_cycles | |
[Cycles when Resource Allocation Table (RAT) does not issue Uops to | |
Reservation Station (RS) for the thread] | |
uops_retired.all | |
[Actually retired uops Supports address when precise (Precise event)] | |
uops_retired.core_stall_cycles | |
[Cycles without actually retired uops] | |
uops_retired.retire_slots | |
[Retirement slots used (Precise event)] | |
uops_retired.stall_cycles | |
[Cycles without actually retired uops] | |
uops_retired.total_cycles | |
[Cycles with less than 10 actually retired uops] | |
uncore: | |
unc_arb_coh_trk_occupancy.all | |
[Unit: uncore_arb Each cycle count number of valid entries in Coherency | |
Tracker queue from allocation till deallocation. Aperture requests | |
(snoops) appear as NC decoded internally and become coherent (snoop | |
L3, access memory)] | |
unc_arb_coh_trk_requests.all | |
[Unit: uncore_arb Number of entries allocated. Account for Any type: | |
e.g. Snoop, Core aperture, etc] | |
unc_arb_trk_occupancy.all | |
[Unit: uncore_arb Each cycle count number of all Core outgoing valid | |
entries. Such entry is defined as valid from its allocation till | |
first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and | |
non-coherent traffic] | |
unc_arb_trk_requests.all | |
[Unit: uncore_arb Total number of Core outgoing entries allocated. | |
Accounts for Coherent and non-coherent traffic] | |
unc_arb_trk_requests.writes | |
[Unit: uncore_arb Number of Writes allocated - any write transactions: | |
full/partials writes and evictions] | |
unc_cbo_cache_lookup.any_es | |
[Unit: uncore_cbox L3 Lookup any request that access cache and found | |
line in E or S-state] | |
unc_cbo_cache_lookup.any_i | |
[Unit: uncore_cbox L3 Lookup any request that access cache and found | |
line in I-state] | |
unc_cbo_cache_lookup.any_m | |
[Unit: uncore_cbox L3 Lookup any request that access cache and found | |
line in M-state] | |
unc_cbo_cache_lookup.any_mesi | |
[Unit: uncore_cbox L3 Lookup any request that access cache and found | |
line in MESI-state] | |
unc_cbo_cache_lookup.extsnp_es | |
[Unit: uncore_cbox L3 Lookup external snoop request that access cache | |
and found line in E or S-state] | |
unc_cbo_cache_lookup.extsnp_i | |
[Unit: uncore_cbox L3 Lookup external snoop request that access cache | |
and found line in I-state] | |
unc_cbo_cache_lookup.extsnp_m | |
[Unit: uncore_cbox L3 Lookup external snoop request that access cache | |
and found line in M-state] | |
unc_cbo_cache_lookup.extsnp_mesi | |
[Unit: uncore_cbox L3 Lookup external snoop request that access cache | |
and found line in MESI-state] | |
unc_cbo_cache_lookup.read_es | |
[Unit: uncore_cbox L3 Lookup read request that access cache and found | |
line in E or S-state] | |
unc_cbo_cache_lookup.read_i | |
[Unit: uncore_cbox L3 Lookup read request that access cache and found | |
line in I-state] | |
unc_cbo_cache_lookup.read_m | |
[Unit: uncore_cbox L3 Lookup read request that access cache and found | |
line in M-state] | |
unc_cbo_cache_lookup.read_mesi | |
[Unit: uncore_cbox L3 Lookup read request that access cache and found | |
line in any MESI-state] | |
unc_cbo_cache_lookup.write_es | |
[Unit: uncore_cbox L3 Lookup write request that access cache and found | |
line in E or S-state] | |
unc_cbo_cache_lookup.write_i | |
[Unit: uncore_cbox L3 Lookup write request that access cache and found | |
line in I-state] | |
unc_cbo_cache_lookup.write_m | |
[Unit: uncore_cbox L3 Lookup write request that access cache and found | |
line in M-state] | |
unc_cbo_cache_lookup.write_mesi | |
[Unit: uncore_cbox L3 Lookup write request that access cache and found | |
line in MESI-state] | |
unc_cbo_xsnp_response.hit_eviction | |
[Unit: uncore_cbox A cross-core snoop resulted from L3 Eviction which | |
hits a non-modified line in some processor core] | |
unc_cbo_xsnp_response.hit_external | |
[Unit: uncore_cbox An external snoop hits a non-modified line in some | |
processor core] | |
unc_cbo_xsnp_response.hit_xcore | |
[Unit: uncore_cbox A cross-core snoop initiated by this Cbox due to | |
processor core memory request which hits a non-modified line in some | |
processor core] | |
unc_cbo_xsnp_response.hitm_eviction | |
[Unit: uncore_cbox A cross-core snoop resulted from L3 Eviction which | |
hits a modified line in some processor core] | |
unc_cbo_xsnp_response.hitm_external | |
[Unit: uncore_cbox An external snoop hits a modified line in some | |
processor core] | |
unc_cbo_xsnp_response.hitm_xcore | |
[Unit: uncore_cbox A cross-core snoop initiated by this Cbox due to | |
processor core memory request which hits a modified line in some | |
processor core] | |
unc_cbo_xsnp_response.miss_eviction | |
[Unit: uncore_cbox A cross-core snoop resulted from L3 Eviction which | |
misses in some processor core] | |
unc_cbo_xsnp_response.miss_external | |
[Unit: uncore_cbox An external snoop misses in some processor core] | |
unc_cbo_xsnp_response.miss_xcore | |
[Unit: uncore_cbox A cross-core snoop initiated by this Cbox due to | |
processor core memory request which misses in some processor core] | |
virtual memory: | |
dtlb_load_misses.miss_causes_a_walk | |
[Load misses in all DTLB levels that cause page walks] | |
dtlb_load_misses.pde_cache_miss | |
[DTLB demand load misses with low part of linear-to-physical address | |
translation missed] | |
dtlb_load_misses.stlb_hit | |
[Load operations that miss the first DTLB level but hit the second and | |
do not cause page walks] | |
dtlb_load_misses.stlb_hit_2m | |
[Load misses that miss the DTLB and hit the STLB (2M)] | |
dtlb_load_misses.stlb_hit_4k | |
[Load misses that miss the DTLB and hit the STLB (4K)] | |
dtlb_load_misses.walk_completed | |
[Demand load Miss in all translation lookaside buffer (TLB) levels | |
causes a page walk that completes of any page size] | |
dtlb_load_misses.walk_completed_1g | |
[Load miss in all TLB levels causes a page walk that completes. (1G)] | |
dtlb_load_misses.walk_completed_2m_4m | |
[Demand load Miss in all translation lookaside buffer (TLB) levels | |
causes a page walk that completes (2M/4M)] | |
dtlb_load_misses.walk_completed_4k | |
[Demand load Miss in all translation lookaside buffer (TLB) levels | |
causes a page walk that completes (4K)] | |
dtlb_load_misses.walk_duration | |
[Cycles when PMH is busy with page walks] | |
dtlb_store_misses.miss_causes_a_walk | |
[Store misses in all DTLB levels that cause page walks] | |
dtlb_store_misses.pde_cache_miss | |
[DTLB store misses with low part of linear-to-physical address | |
translation missed] | |
dtlb_store_misses.stlb_hit | |
[Store operations that miss the first TLB level but hit the second and | |
do not cause page walks] | |
dtlb_store_misses.stlb_hit_2m | |
[Store misses that miss the DTLB and hit the STLB (2M)] | |
dtlb_store_misses.stlb_hit_4k | |
[Store misses that miss the DTLB and hit the STLB (4K)] | |
dtlb_store_misses.walk_completed | |
[Store misses in all DTLB levels that cause completed page walks] | |
dtlb_store_misses.walk_completed_1g | |
[Store misses in all DTLB levels that cause completed page walks. (1G)] | |
dtlb_store_misses.walk_completed_2m_4m | |
[Store misses in all DTLB levels that cause completed page walks | |
(2M/4M)] | |
dtlb_store_misses.walk_completed_4k | |
[Store miss in all TLB levels causes a page walk that completes. (4K)] | |
dtlb_store_misses.walk_duration | |
[Cycles when PMH is busy with page walks] | |
ept.walk_cycles | |
[Cycle count for an Extended Page table walk] | |
itlb.itlb_flush | |
[Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages] | |
itlb_misses.miss_causes_a_walk | |
[Misses at all ITLB levels that cause page walks] | |
itlb_misses.stlb_hit | |
[Operations that miss the first ITLB level but hit the second and do | |
not cause any page walks] | |
itlb_misses.stlb_hit_2m | |
[Code misses that miss the ITLB and hit the STLB (2M)] | |
itlb_misses.stlb_hit_4k | |
[Code misses that miss the ITLB and hit the STLB (4K)] | |
itlb_misses.walk_completed | |
[Misses in all ITLB levels that cause completed page walks] | |
itlb_misses.walk_completed_1g | |
[Code miss in all TLB levels causes a page walk that completes. (1G)] | |
itlb_misses.walk_completed_2m_4m | |
[Code miss in all TLB levels causes a page walk that completes. (2M/4M)] | |
itlb_misses.walk_completed_4k | |
[Code miss in all TLB levels causes a page walk that completes. (4K)] | |
itlb_misses.walk_duration | |
[Cycles when PMH is busy with page walks] | |
page_walker_loads.dtlb_l1 | |
[Number of DTLB page walker hits in the L1+FB] | |
page_walker_loads.dtlb_l2 | |
[Number of DTLB page walker hits in the L2] | |
page_walker_loads.dtlb_l3 | |
[Number of DTLB page walker hits in the L3 + XSNP Spec update: HSD25] | |
page_walker_loads.dtlb_memory | |
[Number of DTLB page walker hits in Memory Spec update: HSD25] | |
page_walker_loads.ept_dtlb_l1 | |
[Counts the number of Extended Page Table walks from the DTLB that hit | |
in the L1 and FB] | |
page_walker_loads.ept_dtlb_l2 | |
[Counts the number of Extended Page Table walks from the DTLB that hit | |
in the L2] | |
page_walker_loads.ept_dtlb_l3 | |
[Counts the number of Extended Page Table walks from the DTLB that hit | |
in the L3] | |
page_walker_loads.ept_dtlb_memory | |
[Counts the number of Extended Page Table walks from the DTLB that hit | |
in memory] | |
page_walker_loads.ept_itlb_l1 | |
[Counts the number of Extended Page Table walks from the ITLB that hit | |
in the L1 and FB] | |
page_walker_loads.ept_itlb_l2 | |
[Counts the number of Extended Page Table walks from the ITLB that hit | |
in the L2] | |
page_walker_loads.ept_itlb_l3 | |
[Counts the number of Extended Page Table walks from the ITLB that hit | |
in the L3] | |
page_walker_loads.ept_itlb_memory | |
[Counts the number of Extended Page Table walks from the ITLB that hit | |
in memory] | |
page_walker_loads.itlb_l1 | |
[Number of ITLB page walker hits in the L1+FB] | |
page_walker_loads.itlb_l2 | |
[Number of ITLB page walker hits in the L2] | |
page_walker_loads.itlb_l3 | |
[Number of ITLB page walker hits in the L3 + XSNP Spec update: HSD25] | |
page_walker_loads.itlb_memory | |
[Number of ITLB page walker hits in Memory Spec update: HSD25] | |
tlb_flush.dtlb_thread | |
[DTLB flush attempts of the thread-specific entries] | |
tlb_flush.stlb_any | |
[STLB flush attempts] | |
rNNN [Raw hardware event descriptor] | |
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] | |
mem:<addr>[/len][:access] [Hardware breakpoint] | |
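The two raw descriptor forms listed above can be illustrated with a small Python sketch that builds an `rNNN` string and a `cpu/.../` string from an event-select byte and a unit-mask byte. It assumes the x86 layout in which the unit mask sits in the byte directly above the event select; the helper names `raw_event` and `pmu_event` are hypothetical, and the values 0xA8 and 0x01 are illustrative only and are not derived from this listing.

```python
def raw_event(event_select, umask):
    """Build an rNNN descriptor (assumed x86 layout: umask in bits 8-15)."""
    for value in (event_select, umask):
        if not 0 <= value <= 0xFF:
            raise ValueError("event select and umask must each fit in one byte")
    return f"r{(umask << 8) | event_select:x}"


def pmu_event(event_select, umask, pmu="cpu", modifier=""):
    """Build the pmu/term=value,.../modifier form of the same event."""
    return f"{pmu}/event={event_select:#x},umask={umask:#x}/{modifier}"


if __name__ == "__main__":
    # Illustrative values only; check the vendor documentation for real codes.
    print(raw_event(0xA8, 0x01))
    print(pmu_event(0xA8, 0x01, modifier="u"))
```

Either string could then be passed to `perf stat -e`; the symbolic names in the listing are usually the more readable choice when they are available.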
Metric Groups: | |
DSB: | |
DSB_Coverage | |
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] | |
Frontend: | |
IFetch_Line_Utilization | |
[Rough Estimation of fraction of fetched lines bytes that were likely consumed by program instructions] | |
Frontend_Bandwidth: | |
DSB_Coverage | |
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] | |
Memory_BW: | |
MLP | |
[Memory-Level-Parallelism (average number of L1 miss demand loads when there is at least 1 such miss)] | |
Memory_Bound: | |
Load_Miss_Real_Latency | |
[Actual Average Latency for L1 data-cache miss demand loads] | |
MLP | |
[Memory-Level-Parallelism (average number of L1 miss demand loads when there is at least 1 such miss)] | |
Memory_Lat: | |
Load_Miss_Real_Latency | |
[Actual Average Latency for L1 data-cache miss demand loads] | |
Pipeline: | |
CPI | |
[Cycles Per Instruction (threaded)] | |
ILP | |
[Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)] | |
UPI | |
[Uops Per Instruction] | |
Ports_Utilization: | |
ILP | |
[Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)] | |
Power: | |
C2_Pkg_Residency | |
[C2 residency percent per package] | |
C3_Core_Residency | |
[C3 residency percent per core] | |
C3_Pkg_Residency | |
[C3 residency percent per package] | |
C6_Core_Residency | |
[C6 residency percent per core] | |
C6_Pkg_Residency | |
[C6 residency percent per package] | |
C7_Core_Residency | |
[C7 residency percent per core] | |
C7_Pkg_Residency | |
[C7 residency percent per package] | |
Turbo_Utilization | |
[Average Frequency Utilization relative to nominal frequency] | |
SMT: | |
CORE_CLKS | |
[Core actual clocks when any thread is active on the physical core] | |
CoreIPC | |
[Instructions Per Cycle (per physical core)] | |
SMT_2T_Utilization | |
[Fraction of cycles where both hardware threads were active] | |
Summary: | |
CLKS | |
[Per-thread actual clocks when the logical processor is active. This is called 'Clockticks' in VTune] | |
CPI | |
[Cycles Per Instruction (threaded)] | |
CPU_Utilization | |
[Average CPU Utilization] | |
Instructions | |
[Total number of retired Instructions] | |
Kernel_Utilization | |
[Fraction of cycles spent in Kernel mode] | |
SMT_2T_Utilization | |
[Fraction of cycles where both hardware threads were active] | |
TLB: | |
Page_Walks_Utilization | |
[Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses] | |
TopDownL1: | |
IPC | |
[Instructions Per Cycle (per logical thread)] | |
SLOTS | |
[Total issue-pipeline slots] | |
Unknown_Branches: | |
BAClear_Cost | |
[Average Branch Address Clear Cost (fraction of cycles)] |
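Several of the Pipeline and TopDownL1 metrics above are simple ratios of raw counters. The Python sketch below shows how IPC, CPI and UPI could be computed by hand from counts collected with `perf stat`. The choice of input events (`cpu_clk_unhalted.thread`, `inst_retired.any`, `uops_retired.retire_slots`) is an assumption based on the metric descriptions, not perf's actual metric definitions, and the function name `derived_metrics` is hypothetical.

```python
def derived_metrics(counts):
    """Compute IPC, CPI and UPI from a dict of raw event counts.

    Assumed inputs (taken from event names in the listing):
      cpu_clk_unhalted.thread   - unhalted core cycles for the thread
      inst_retired.any          - retired instructions
      uops_retired.retire_slots - retirement slots used
    """
    cycles = counts["cpu_clk_unhalted.thread"]
    instructions = counts["inst_retired.any"]
    uops = counts["uops_retired.retire_slots"]
    if cycles <= 0 or instructions <= 0:
        raise ValueError("cycle and instruction counts must be positive")
    return {
        "IPC": instructions / cycles,   # Instructions Per Cycle
        "CPI": cycles / instructions,   # Cycles Per Instruction
        "UPI": uops / instructions,     # Uops Per Instruction
    }


if __name__ == "__main__":
    # Made-up counts, purely to show the arithmetic.
    sample = {
        "cpu_clk_unhalted.thread": 2000,
        "inst_retired.any": 4000,
        "uops_retired.retire_slots": 5000,
    }
    print(derived_metrics(sample))
```

Where the installed perf supports it, `perf stat -M <metric>` reports these values directly, which avoids depending on the assumed formulas.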