@travisdowns
Last active September 26, 2018 21:30
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) W-2104 CPU @ 3.20GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: libpfc
Reloading pfc.ko kernel module
USE_LIBPFC=1
sudo sh -c "echo 2 > /sys/bus/event_source/devices/cpu/rdpmc"
! lsmod | grep -q pfc || sudo rmmod pfc
sudo insmod libpfc/pfc.ko
Welcome to uarch-bench (e9437bd-dirty)
Supported CPU features: SSE3 PCLMULQDQ VMX SMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ BMI1 HLE AVX2 BMI2 ERMS RTM MPX PQE AVX512F AVX512DQ RDSEED ADX CLFLUSHOPT CLWB INTEL_PT AVX512CD AVX512BW AVX512VL
Pinned to CPU 0
lipfc init OK
Running benchmarks groups using timer libpfc
** Inverse throughput for load/16-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/32-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
** Inverse throughput for load/64-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/128-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
48 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/256-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
32 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/512-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
------ STORE --------
** Inverse throughput for store/16-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0
** Inverse throughput for store/32-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0
** Inverse throughput for store/64-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
** Inverse throughput for store/128-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
** Inverse throughput for store/256-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
48 : 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
** Inverse throughput for store/512-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
16 : 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
32 : 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
48 : 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
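
Note on reading the tables above: in every table except load/16-bit (which reads 1.0 at all offsets), the slower entries are exactly the offsets where the access would extend past byte 63 of a 64-byte region. The sketch below reproduces that pattern, assuming 64-byte cache lines and that the slowdown comes from line-crossing accesses; the log itself does not state the cause, and the helper names are made up for illustration.

```python
# Hypothetical helper, not part of uarch-bench: predicts which offsets in the
# tables above would be line-crossing accesses, assuming 64-byte cache lines.
CACHE_LINE_BYTES = 64  # assumption; not stated anywhere in the log


def crosses_line(offset: int, size_bits: int, line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True if an access of size_bits starting at offset spans two lines."""
    size_bytes = size_bits // 8
    return (offset % line_bytes) + size_bytes > line_bytes


def crossing_offsets(size_bits: int) -> list:
    """All offsets within one line at which the access would cross."""
    return [o for o in range(CACHE_LINE_BYTES) if crosses_line(o, size_bits)]


if __name__ == "__main__":
    for bits in (16, 32, 64, 128, 256, 512):
        offs = crossing_offsets(bits)
        print(f"{bits}-bit: first crossing offset {offs[0]}, {len(offs)} crossing offsets")
```

Under these assumptions the first crossing offset is 61 for 32-bit, 57 for 64-bit, 49 for 128-bit, 33 for 256-bit, and 1 for 512-bit accesses, which matches where the load tables switch from 0.5 to 1.0 and the store tables from 1.0 to 2.0.
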
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: libpfc
Reloading pfc.ko kernel module
USE_LIBPFC=1
sudo sh -c "echo 2 > /sys/bus/event_source/devices/cpu/rdpmc"
! lsmod | grep -q pfc || sudo rmmod pfc
sudo insmod libpfc/pfc.ko
Welcome to uarch-bench (0a51d90-dirty)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 HLE AVX2 BMI2 ERMS RTM MPX RDSEED ADX CLFLUSHOPT INTEL_PT
libpfm4 initialized successfully
Event 'skl::MEM_INST_RETIRED.SPLIT_LOADS' resolved to 'skl::MEM_INST_RETIRED:SPLIT_LOADS:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0, short name: 'MEM_IN' with code 0x5341d0
Pinned to CPU 0
lipfc init OK
Running benchmarks groups using timer libpfc
** Running group memory/load-serial : Serial loads from fixed-size regions **
Benchmark Cycles MEM_IN
16-KiB serial loads 4.00 0.00
24-KiB serial loads 4.00 0.00
30-KiB serial loads 4.00 0.00
31-KiB serial loads 4.00 0.00
32-KiB serial loads 4.04 0.00
33-KiB serial loads 6.08 0.00
34-KiB serial loads 8.14 0.00
35-KiB serial loads 10.07 0.00
40-KiB serial loads 11.99 0.00
48-KiB serial loads 12.00 0.00
56-KiB serial loads 12.00 0.00
64-KiB serial loads 12.00 0.00
80-KiB serial loads 11.99 0.00
96-KiB serial loads 11.99 0.00
112-KiB serial loads 12.00 0.00
128-KiB serial loads 12.00 0.00
196-KiB serial loads 12.00 0.00
252-KiB serial loads 12.02 0.00
256-KiB serial loads 12.02 0.00
260-KiB serial loads 12.92 0.00
384-KiB serial loads 28.13 0.00
512-KiB serial loads 30.28 0.00
1024-KiB serial loads 34.04 0.00
2048-KiB serial loads 35.54 0.00
4096-KiB serial loads 36.26 0.00
8192-KiB serial loads 103.14 0.00
16384-KiB serial loads 141.98 0.00
32768-KiB serial loads 96.49 0.00
65536-KiB serial loads 94.72 0.00
131072-KiB serial loads 135.22 0.00
262144-KiB serial loads 163.91 0.00
** Running group memory/load-serial-crossing : Cacheline crossing loads from fixed-size regions **
Benchmark Cycles MEM_IN
8-KiB serial loads 11.00 1.00
16-KiB serial loads 11.00 1.00
32-KiB serial loads 11.08 1.00
64-KiB serial loads 22.31 1.00
128-KiB serial loads 24.16 1.00
256-KiB serial loads 24.79 1.00
512-KiB serial loads 40.31 1.00
1024-KiB serial loads 43.80 1.00
2048-KiB serial loads 45.36 1.00
4096-KiB serial loads 46.18 1.00
8192-KiB serial loads 137.21 1.00
16384-KiB serial loads 188.85 1.00
32768-KiB serial loads 211.99 1.00
65536-KiB serial loads 219.83 1.00
131072-KiB serial loads 203.40 1.00
262144-KiB serial loads 212.85 1.00
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) W-2104 CPU @ 3.20GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: libpfc
Reloading pfc.ko kernel module
USE_LIBPFC=1
sudo sh -c "echo 2 > /sys/bus/event_source/devices/cpu/rdpmc"
[sudo] password for travis:
! lsmod | grep -q pfc || sudo rmmod pfc
sudo insmod libpfc/pfc.ko
Welcome to uarch-bench (97c09a3)
Supported CPU features: SSE3 PCLMULQDQ VMX SMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ BMI1 HLE AVX2 BMI2 ERMS RTM MPX PQE AVX512F AVX512DQ RDSEED ADX CLFLUSHOPT CLWB INTEL_PT AVX512CD AVX512BW AVX512VL
libpfm4 initialized successfully
WARNING: Event 'skl::MEM_INST_RETIRED.SPLIT_LOADS' could not be resolved and will be ignored. Reason: event not found
Use --list-events to list available events.
Pinned to CPU 0
lipfc init OK
Running benchmarks groups using timer libpfc
** Running group memory/load-serial : Serial loads from fixed-size regions **
Benchmark Cycles
16-KiB serial loads 4.00
24-KiB serial loads 4.01
30-KiB serial loads 4.01
31-KiB serial loads 4.01
32-KiB serial loads 4.01
33-KiB serial loads 6.66
34-KiB serial loads 9.23
35-KiB serial loads 11.65
40-KiB serial loads 13.96
48-KiB serial loads 13.98
56-KiB serial loads 14.00
64-KiB serial loads 13.99
80-KiB serial loads 13.99
96-KiB serial loads 14.00
112-KiB serial loads 14.01
128-KiB serial loads 14.00
196-KiB serial loads 14.00
252-KiB serial loads 14.00
256-KiB serial loads 14.00
260-KiB serial loads 14.00
384-KiB serial loads 14.01
512-KiB serial loads 14.00
1024-KiB serial loads 14.13
2048-KiB serial loads 76.74
4096-KiB serial loads 71.73
8192-KiB serial loads 76.11
16384-KiB serial loads 81.42
32768-KiB serial loads 86.46
65536-KiB serial loads 87.68
131072-KiB serial loads 94.23
262144-KiB serial loads 95.59
** Running group memory/load-serial-crossing : Cacheline crossing loads from fixed-size regions **
Benchmark Cycles
8-KiB serial loads 11.00
16-KiB serial loads 11.00
32-KiB serial loads 11.20
64-KiB serial loads 22.24
128-KiB serial loads 23.97
256-KiB serial loads 24.50
512-KiB serial loads 24.80
1024-KiB serial loads 24.97
2048-KiB serial loads 78.63
4096-KiB serial loads 86.56
8192-KiB serial loads 110.50
16384-KiB serial loads 252.92
32768-KiB serial loads 279.98
65536-KiB serial loads 292.90
131072-KiB serial loads 311.31
262144-KiB serial loads 313.89