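Two runs: the first has all the hit events plus fb_hit, the second all the miss events plus fb_hit.
The column names are all the same (the event names get truncated to mem_lo), sorry: the four mem_lo
columns are the events in the same order as the "Resolved and programmed event" lines, i.e.
l1_hit, l2_hit, l3_hit, fb_hit in the first run and l1_miss, l2_miss, l3_miss, fb_hit in the second.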
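The "Resolved and programmed event" lines also print each event's raw perf encoding (`cpu/config=0x...`). Here is a small sketch that decodes those values. It assumes the usual Intel core PMU layout that perf uses for raw events (event select in bits 0-7 of config, unit mask in bits 8-15). The `EVENTS` table and `decode` helper are hypothetical names for illustration and are not part of uarch-bench.

```python
# Sketch: split the raw 'cpu/config=0x...' values printed by uarch-bench
# into the Intel event-select byte (bits 0-7) and unit mask (bits 8-15).
# Assumption: standard Intel core PMU raw layout as used by perf.

# Raw configs copied from the "Resolved and programmed event" lines.
EVENTS = {
    "mem_load_retired.l1_hit": 0x1D1,
    "mem_load_retired.l2_hit": 0x2D1,
    "mem_load_retired.l3_hit": 0x4D1,
    "mem_load_retired.l1_miss": 0x8D1,
    "mem_load_retired.l2_miss": 0x10D1,
    "mem_load_retired.l3_miss": 0x20D1,
    "mem_load_retired.fb_hit": 0x40D1,
}


def decode(config: int) -> tuple[int, int]:
    """Return (event_select, umask) for a raw Intel core PMU config value."""
    event_select = config & 0xFF
    umask = (config >> 8) & 0xFF
    return event_select, umask


if __name__ == "__main__":
    for name, config in EVENTS.items():
        event, umask = decode(config)
        print(f"{name:26} event=0x{event:02X} umask=0x{umask:02X}")
```

Every column turns out to be the same event select, 0xD1 (MEM_LOAD_RETIRED), with a different one-hot umask. That is why only the event order, not the truncated header, tells the columns apart.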
$ ./uarch-bench.sh --timer=perf --test-name='memory/bandwidth/load/load-bandwidth-256b*' --extra-events=mem_load_retired.l1_hit,mem_load_retired.l2_hit,mem_load_retired.l3_hit,mem_load_retired.fb_hit
USE_LIBPFC=1 USE_PERF_TIMER=1
make: Nothing to be done for 'all'.
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: perf
Welcome to uarch-bench (fcc39c8-dirty)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 HLE AVX2 BMI2 ERMS RTM MPX RDSEED ADX CLFLUSHOPT INTEL_PT
Pinned to CPU 0
Programmed cycles event, caps: R:1 UT:1 ZT:1 index: 0x40000002
Resolved and programmed event 'mem_load_retired.l1_hit' to 'cpu/config=0x1d1/', caps: R:1 UT:1 ZT:1 index: 0x1
Resolved and programmed event 'mem_load_retired.l2_hit' to 'cpu/config=0x2d1/', caps: R:1 UT:1 ZT:1 index: 0x2
Resolved and programmed event 'mem_load_retired.l3_hit' to 'cpu/config=0x4d1/', caps: R:1 UT:1 ZT:1 index: 0x3
Resolved and programmed event 'mem_load_retired.fb_hit' to 'cpu/config=0x40d1/', caps: R:1 UT:1 ZT:1 index: 0x4
Running benchmarks groups using timer perf
** Running group memory/bandwidth/load : Linear AVX2 loads **
Benchmark                               Cycles  mem_lo  mem_lo  mem_lo  mem_lo
4-KiB 256-bit linear load BW per CL       1.22    2.03    0.00    0.00    0.00
8-KiB 256-bit linear load BW per CL       1.16    2.02    0.00    0.00    0.00
16-KiB 256-bit linear load BW per CL      1.07    2.01    0.00    0.00    0.00
32-KiB 256-bit linear load BW per CL      1.07    1.97    0.02    0.00    0.02
64-KiB 256-bit linear load BW per CL      2.01    0.02    1.00    0.00    0.98
128-KiB 256-bit linear load BW per CL     1.98    0.00    1.00    0.00    1.00
256-KiB 256-bit linear load BW per CL     2.00    0.03    1.00    0.00    0.97
512-KiB 256-bit linear load BW per CL     4.18    0.00    0.00    1.00    1.00
1024-KiB 256-bit linear load BW per CL    4.18    0.00    0.00    1.00    1.00
2048-KiB 256-bit linear load BW per CL    4.18    0.00    0.00    1.00    1.00
4096-KiB 256-bit linear load BW per CL    4.18    0.00    0.00    1.00    1.00
8192-KiB 256-bit linear load BW per CL    9.14    0.01    0.00    0.57    0.99
16384-KiB 256-bit linear load BW per CL  11.93    0.01    0.00    0.30    1.00
32768-KiB 256-bit linear load BW per CL  13.64    0.01    0.00    0.13    1.00
65536-KiB 256-bit linear load BW per CL  14.51    0.01    0.00    0.05    1.00
Finished in 18780 ms (memory/bandwidth/load)
$ ./uarch-bench.sh --timer=perf --test-name='memory/bandwidth/load/load-bandwidth-256b*' --extra-events=mem_load_retired.l1_miss,mem_load_retired.l2_miss,mem_load_retired.l3_miss,mem_load_retired.fb_hit
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: perf
Welcome to uarch-bench (fcc39c8-dirty)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 HLE AVX2 BMI2 ERMS RTM MPX RDSEED ADX CLFLUSHOPT INTEL_PT
Pinned to CPU 0
Programmed cycles event, caps: R:1 UT:1 ZT:1 index: 0x40000002
Resolved and programmed event 'mem_load_retired.l1_miss' to 'cpu/config=0x8d1/', caps: R:1 UT:1 ZT:1 index: 0x1
Resolved and programmed event 'mem_load_retired.l2_miss' to 'cpu/config=0x10d1/', caps: R:1 UT:1 ZT:1 index: 0x2
Resolved and programmed event 'mem_load_retired.l3_miss' to 'cpu/config=0x20d1/', caps: R:1 UT:1 ZT:1 index: 0x3
Resolved and programmed event 'mem_load_retired.fb_hit' to 'cpu/config=0x40d1/', caps: R:1 UT:1 ZT:1 index: 0x4
Running benchmarks groups using timer perf
** Running group memory/bandwidth/load : Linear AVX2 loads **
Benchmark                               Cycles  mem_lo  mem_lo  mem_lo  mem_lo
4-KiB 256-bit linear load BW per CL       0.88    0.00    0.00    0.00    0.00
8-KiB 256-bit linear load BW per CL       1.08    0.00    0.00    0.00    0.00
16-KiB 256-bit linear load BW per CL      1.15   -0.00   -0.00   -0.00   -0.00
32-KiB 256-bit linear load BW per CL      1.07    0.02    0.00    0.00    0.02
64-KiB 256-bit linear load BW per CL      2.01    1.00    0.00    0.00    1.01
128-KiB 256-bit linear load BW per CL     2.00    1.00    0.00    0.00    0.98
256-KiB 256-bit linear load BW per CL     2.00    1.00    0.00    0.00    0.97
512-KiB 256-bit linear load BW per CL     4.17    1.00    1.00    0.00    1.00
1024-KiB 256-bit linear load BW per CL    4.17    1.00    1.00    0.00    1.00
2048-KiB 256-bit linear load BW per CL    4.17    1.00    1.00    0.00    1.00
4096-KiB 256-bit linear load BW per CL    4.17    1.00    1.00    0.00    1.00
8192-KiB 256-bit linear load BW per CL    9.25    0.99    0.99    0.44    1.00
16384-KiB 256-bit linear load BW per CL  11.91    1.00    1.00    0.70    1.00
32768-KiB 256-bit linear load BW per CL  13.59    1.00    1.00    0.86    1.00
65536-KiB 256-bit linear load BW per CL  14.44    1.00    1.00    0.95    1.00
Finished in 18782 ms (memory/bandwidth/load)
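The Cycles column is cycles per cache line ("BW per CL"). For a rough feel of the bandwidth, here is a worked conversion of a few first-run rows into bytes per cycle and approximate GB/s. It assumes 64-byte cache lines and that the core stays at the nominal 2.6 GHz because turbo is disabled. The constants and helper names are hypothetical, and this is only arithmetic on the numbers above, not something uarch-bench reports.

```python
# Sketch: convert "cycles per cache line" into bytes/cycle and approximate GB/s.
# Assumptions (not reported by uarch-bench): 64-byte cache lines, and the core
# running at the nominal 2.6 GHz of the i7-6700HQ since turbo is disabled.

LINE_BYTES = 64
NOMINAL_HZ = 2.6e9

# (buffer size in KiB, cycles per cache line), copied from the first run.
FIRST_RUN = [
    (32, 1.07),     # fits in L1
    (256, 2.00),    # fits in L2
    (4096, 4.18),   # fits in L3
    (65536, 14.51), # mostly from DRAM
]


def bytes_per_cycle(cycles_per_line: float) -> float:
    """Bytes delivered per core cycle for a given cycles-per-line figure."""
    return LINE_BYTES / cycles_per_line


def gb_per_second(cycles_per_line: float, hz: float = NOMINAL_HZ) -> float:
    """Approximate bandwidth in GB/s (1e9 bytes) at a fixed core frequency."""
    return bytes_per_cycle(cycles_per_line) * hz / 1e9


if __name__ == "__main__":
    for kib, cycles in FIRST_RUN:
        print(f"{kib:>6} KiB: {bytes_per_cycle(cycles):5.1f} B/cycle, "
              f"~{gb_per_second(cycles):6.1f} GB/s")
```

Under these assumptions the L1-sized buffers come out near 60 bytes/cycle (close to Skylake's two 32-byte loads per cycle), the L2 sizes at about 32, the L3 sizes at about 15, and the 64 MiB case at roughly 4.4 bytes/cycle.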