@travisdowns
Created June 11, 2019 05:15
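Two runs: the first records all the hit events plus fb_hit, the second all the miss events plus fb_hit.
The event column headers are all truncated to the same "mem_lo", sorry: the mem_lo columns appear in the same
order as the "Resolved and programmed event" lines, i.e. l1_hit, l2_hit, l3_hit, fb_hit in the first run and
l1_miss, l2_miss, l3_miss, fb_hit in the second. All values are per cache line (CL).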
$ ./uarch-bench.sh --timer=perf --test-name='memory/bandwidth/load/load-bandwidth-256b*' --extra-events=mem_load_retired.l1_hit,mem_load_retired.l2_hit,mem_load_retired.l3_hit,mem_load_retired.fb_hit
USE_LIBPFC=1 USE_PERF_TIMER=1
make: Nothing to be done for 'all'.
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: perf
Welcome to uarch-bench (fcc39c8-dirty)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 HLE AVX2 BMI2 ERMS RTM MPX RDSEED ADX CLFLUSHOPT INTEL_PT
Pinned to CPU 0
Programmed cycles event, caps: R:1 UT:1 ZT:1 index: 0x40000002
Resolved and programmed event 'mem_load_retired.l1_hit' to 'cpu/config=0x1d1/', caps: R:1 UT:1 ZT:1 index: 0x1
Resolved and programmed event 'mem_load_retired.l2_hit' to 'cpu/config=0x2d1/', caps: R:1 UT:1 ZT:1 index: 0x2
Resolved and programmed event 'mem_load_retired.l3_hit' to 'cpu/config=0x4d1/', caps: R:1 UT:1 ZT:1 index: 0x3
Resolved and programmed event 'mem_load_retired.fb_hit' to 'cpu/config=0x40d1/', caps: R:1 UT:1 ZT:1 index: 0x4
Running benchmarks groups using timer perf
** Running group memory/bandwidth/load : Linear AVX2 loads **
Benchmark Cycles mem_lo mem_lo mem_lo mem_lo
4-KiB 256-bit linear load BW per CL 1.22 2.03 0.00 0.00 0.00
8-KiB 256-bit linear load BW per CL 1.16 2.02 0.00 0.00 0.00
16-KiB 256-bit linear load BW per CL 1.07 2.01 0.00 0.00 0.00
32-KiB 256-bit linear load BW per CL 1.07 1.97 0.02 0.00 0.02
64-KiB 256-bit linear load BW per CL 2.01 0.02 1.00 0.00 0.98
128-KiB 256-bit linear load BW per CL 1.98 0.00 1.00 0.00 1.00
256-KiB 256-bit linear load BW per CL 2.00 0.03 1.00 0.00 0.97
512-KiB 256-bit linear load BW per CL 4.18 0.00 0.00 1.00 1.00
1024-KiB 256-bit linear load BW per CL 4.18 0.00 0.00 1.00 1.00
2048-KiB 256-bit linear load BW per CL 4.18 0.00 0.00 1.00 1.00
4096-KiB 256-bit linear load BW per CL 4.18 0.00 0.00 1.00 1.00
8192-KiB 256-bit linear load BW per CL 9.14 0.01 0.00 0.57 0.99
16384-KiB 256-bit linear load BW per CL 11.93 0.01 0.00 0.30 1.00
32768-KiB 256-bit linear load BW per CL 13.64 0.01 0.00 0.13 1.00
65536-KiB 256-bit linear load BW per CL 14.51 0.01 0.00 0.05 1.00
Finished in 18780 ms (memory/bandwidth/load)
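
The "Resolved and programmed event" lines above map each event name to a raw perf config value. As a minimal sketch, assuming the usual Intel core PMU layout (event select in bits 0-7 of config, umask in bits 8-15), the values can be split like this. The `decode` helper and `LOG_LINES` list are illustrative names, not part of uarch-bench.

```python
import re

# The four "Resolved and programmed event" lines from the first run, copied verbatim.
LOG_LINES = [
    "Resolved and programmed event 'mem_load_retired.l1_hit' to 'cpu/config=0x1d1/', caps: R:1 UT:1 ZT:1 index: 0x1",
    "Resolved and programmed event 'mem_load_retired.l2_hit' to 'cpu/config=0x2d1/', caps: R:1 UT:1 ZT:1 index: 0x2",
    "Resolved and programmed event 'mem_load_retired.l3_hit' to 'cpu/config=0x4d1/', caps: R:1 UT:1 ZT:1 index: 0x3",
    "Resolved and programmed event 'mem_load_retired.fb_hit' to 'cpu/config=0x40d1/', caps: R:1 UT:1 ZT:1 index: 0x4",
]

# Pulls the event name and the hex config value out of one log line.
PATTERN = re.compile(r"event '([^']+)' to 'cpu/config=(0x[0-9a-fA-F]+)/'")


def decode(line):
    """Return (name, event_select, umask), or None if the line does not match.

    Assumption: event select is config bits 0-7 and umask is config bits 8-15.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    name = match.group(1)
    config = int(match.group(2), 16)
    return name, config & 0xFF, (config >> 8) & 0xFF


for line in LOG_LINES:
    name, event, umask = decode(line)
    print(f"{name}: event=0x{event:02x} umask=0x{umask:02x}")
```

Under that assumption all four events share event select 0xd1 and differ only in the umask bit (0x01, 0x02, 0x04, 0x40).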
$ ./uarch-bench.sh --timer=perf --test-name='memory/bandwidth/load/load-bandwidth-256b*' --extra-events=mem_load_retired.l1_miss,mem_load_retired.l2_miss,mem_load_retired.l3_miss,mem_load_retired.fb_hit
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: perf
Welcome to uarch-bench (fcc39c8-dirty)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 HLE AVX2 BMI2 ERMS RTM MPX RDSEED ADX CLFLUSHOPT INTEL_PT
Pinned to CPU 0
Programmed cycles event, caps: R:1 UT:1 ZT:1 index: 0x40000002
Resolved and programmed event 'mem_load_retired.l1_miss' to 'cpu/config=0x8d1/', caps: R:1 UT:1 ZT:1 index: 0x1
Resolved and programmed event 'mem_load_retired.l2_miss' to 'cpu/config=0x10d1/', caps: R:1 UT:1 ZT:1 index: 0x2
Resolved and programmed event 'mem_load_retired.l3_miss' to 'cpu/config=0x20d1/', caps: R:1 UT:1 ZT:1 index: 0x3
Resolved and programmed event 'mem_load_retired.fb_hit' to 'cpu/config=0x40d1/', caps: R:1 UT:1 ZT:1 index: 0x4
Running benchmarks groups using timer perf
** Running group memory/bandwidth/load : Linear AVX2 loads **
Benchmark Cycles mem_lo mem_lo mem_lo mem_lo
4-KiB 256-bit linear load BW per CL 0.88 0.00 0.00 0.00 0.00
8-KiB 256-bit linear load BW per CL 1.08 0.00 0.00 0.00 0.00
16-KiB 256-bit linear load BW per CL 1.15 -0.00 -0.00 -0.00 -0.00
32-KiB 256-bit linear load BW per CL 1.07 0.02 0.00 0.00 0.02
64-KiB 256-bit linear load BW per CL 2.01 1.00 0.00 0.00 1.01
128-KiB 256-bit linear load BW per CL 2.00 1.00 0.00 0.00 0.98
256-KiB 256-bit linear load BW per CL 2.00 1.00 0.00 0.00 0.97
512-KiB 256-bit linear load BW per CL 4.17 1.00 1.00 0.00 1.00
1024-KiB 256-bit linear load BW per CL 4.17 1.00 1.00 0.00 1.00
2048-KiB 256-bit linear load BW per CL 4.17 1.00 1.00 0.00 1.00
4096-KiB 256-bit linear load BW per CL 4.17 1.00 1.00 0.00 1.00
8192-KiB 256-bit linear load BW per CL 9.25 0.99 0.99 0.44 1.00
16384-KiB 256-bit linear load BW per CL 11.91 1.00 1.00 0.70 1.00
32768-KiB 256-bit linear load BW per CL 13.59 1.00 1.00 0.86 1.00
65536-KiB 256-bit linear load BW per CL 14.44 1.00 1.00 0.95 1.00
Finished in 18782 ms (memory/bandwidth/load)
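
One way to sanity-check the two tables: each 64-byte cache line takes two 256-bit loads, so if every retired load is counted by exactly one of l1_hit, l2_hit, l3_hit, l3_miss or fb_hit (an assumption, not something the output states), those five columns should sum to about 2 per line. The sketch below checks that for a few rows. It mixes the hit columns from the first run with l3_miss from the second, so small run-to-run differences are expected. The names `HIT_RUN`, `L3_MISS` and `accounted` are illustrative.

```python
# Assumption: 64-byte cache lines and 32-byte (256-bit) loads, so 2 loads per line.
LOADS_PER_LINE = 64 // 32

# size in KiB -> (l1_hit, l2_hit, l3_hit, fb_hit), copied from the first run.
HIT_RUN = {
    32: (1.97, 0.02, 0.00, 0.02),
    64: (0.02, 1.00, 0.00, 0.98),
    512: (0.00, 0.00, 1.00, 1.00),
    8192: (0.01, 0.00, 0.57, 0.99),
    65536: (0.01, 0.00, 0.05, 1.00),
}

# size in KiB -> l3_miss, copied from the second run.
L3_MISS = {32: 0.00, 64: 0.00, 512: 0.00, 8192: 0.44, 65536: 0.95}


def accounted(size_kib):
    """Sum of all hit columns plus l3_miss for one buffer size, per cache line."""
    return round(sum(HIT_RUN[size_kib]) + L3_MISS[size_kib], 2)


for size_kib in HIT_RUN:
    total = accounted(size_kib)
    print(f"{size_kib:>6} KiB: {total:.2f} loads accounted for, expected about {LOADS_PER_LINE}")
```

For these rows the sum stays within a few hundredths of 2. That matches the picture of one load per line going out to L2, L3 or memory while the other hits the fill buffer once the buffer no longer fits in L1.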