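Linux build
Linux 5.14.15 configured with make defconfig; make was run twice and the second run is the one measured.
[J] is the energy in joules read from /sys/class/powercap/intel-rapl:0/energy_uj (this comes from the CPU's built-in sensor, so AMD and Intel may measure against different baselines and the values may not be directly comparable).
Save something like the following as rapl-run.py: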
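#!/usr/bin/env python3
# Run a command and print the energy it consumed according to RAPL.
import subprocess
import sys

ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"


def main():
    args = sys.argv[1:]
    if len(args) == 0:
        raise Exception("no arg")
    with open(ENERGY_FILE, "r") as f:
        val = int(f.readline())    # counter (microjoules) before the run
        subprocess.call(args)      # run the command being measured
        f.seek(0)                  # rewind so the sysfs file is read again
        val2 = int(f.readline())   # counter (microjoules) after the run
    delta = val2 - val             # counter wraparound is not handled
    print("%f [J]" % (delta / 1e6))


if __name__ == '__main__':
    main()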
Then run it as:
$ rapl-run.py perf stat make -j $(expr $(nproc) '*' 2)
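The script above subtracts two raw readings of energy_uj. That counter wraps around once it reaches the value reported in max_energy_range_uj (in the same sysfs directory), and a wrap during the build would produce a negative delta. The following is a minimal sketch of a wraparound-aware delta. The helper names read_uj and energy_delta_uj are made up for illustration, and it assumes the counter wraps at most once during the run.

```python
#!/usr/bin/env python3
"""Wraparound-aware energy delta (illustrative sketch, not part of rapl-run.py)."""

RAPL_DIR = "/sys/class/powercap/intel-rapl:0"


def read_uj(name):
    """Read one integer value (microjoules) from a file under RAPL_DIR."""
    with open(f"{RAPL_DIR}/{name}") as f:
        return int(f.readline())


def energy_delta_uj(before, after, max_range):
    """Return the energy consumed between two readings.

    Assumption: the counter wrapped at most once between the readings.
    """
    delta = after - before
    if delta < 0:
        # The counter passed max_range and restarted near zero.
        delta += max_range
    return delta
```

In rapl-run.py this could replace the plain subtraction, for example delta = energy_delta_uj(val, val2, read_uj("max_energy_range_uj")).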
Ryzen
- 111.841197130 seconds time elapsed
- 9396.697417 [J]
- 0.92 insn per cycle
i7
- 68.777557306 seconds time elapsed
- 9956.462915 [J]
- 1.09 insn per cycle
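The Ryzen 7 3700X finishes in 111.8 s, whereas the i7-12700K finishes in 68.8 s.
Measured this way, the i7 consumes slightly more energy for the whole build (9956 J vs. 9397 J).
The i7 has the better IPC (1.09 vs. 0.92).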
Running on the 8 P-cores only takes 74.5 s and 11341 [J]:
- 74.519648981 seconds time elapsed
- 11340.825215 [J]
- 1.11 insn per cycle
Running on the 4 E-cores only:
- 347.804299616 seconds time elapsed
- 7742.243791 [J]
- 1.15 insn per cycle
The E-cores have the highest IPC.
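The joule figures above are total energy for the build, not power. Dividing by the elapsed time gives the average power draw, which makes the four runs easier to compare. This is a small sketch using only the numbers listed above; the function name average_power_w is just for illustration, and the cross-vendor caveat about the RAPL sensor still applies.

```python
# (label, elapsed seconds, energy in joules), copied from the results above
runs = [
    ("Ryzen 7 3700X", 111.841197130, 9396.697417),
    ("i7-12700K, all cores", 68.777557306, 9956.462915),
    ("i7-12700K, P-cores only", 74.519648981, 11340.825215),
    ("i7-12700K, E-cores only", 347.804299616, 7742.243791),
]


def average_power_w(seconds, joules):
    """Average power in watts: energy divided by elapsed time."""
    return joules / seconds


for label, seconds, joules in runs:
    watts = average_power_w(seconds, joules)
    print(f"{label:24s} {seconds:7.1f} s {joules:8.0f} J {watts:6.1f} W")
```

By this calculation the averages come out at roughly 84 W for the Ryzen, 145 W for the i7 on all cores, 152 W on P-cores only, and 22 W on E-cores only. The E-core run uses the least energy overall but takes about five times as long.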
# Ryzen 7 3700X
Performance counter stats for 'make -j32':
1,645,905.15 msec task-clock # 14.716 CPUs utilized
374,169 context-switches # 227.333 /sec
38,120 cpu-migrations # 23.161 /sec
53,201,666 page-faults # 32.324 K/sec
6,186,487,258,458 cycles # 3.759 GHz (83.79%)
584,145,164,956 stalled-cycles-frontend # 9.44% frontend cycles idle (83.77%)
387,679,849,583 stalled-cycles-backend # 6.27% backend cycles idle (83.74%)
5,664,603,284,033 instructions # 0.92 insn per cycle
# 0.10 stalled cycles per insn (83.77%)
1,197,485,237,223 branches # 727.554 M/sec (83.78%)
35,326,280,086 branch-misses # 2.95% of all branches (83.78%)
111.841197130 seconds time elapsed
1492.347945000 seconds user
142.398664000 seconds sys
9396.697417 [J]
# i7-12700K
Performance counter stats for 'make -j 40':
1,165,372.81 msec task-clock # 16.944 CPUs utilized
290,107 context-switches # 248.939 /sec
39,203 cpu-migrations # 33.640 /sec
53,182,055 page-faults # 45.635 K/sec
5,203,504,969,685 cycles # 4.465 GHz
5,650,225,661,659 instructions # 1.09 insn per cycle
1,192,600,178,609 branches # 1.023 G/sec
30,412,406,301 branch-misses # 2.55% of all branches
68.777557306 seconds time elapsed
1074.093308000 seconds user
90.981504000 seconds sys
9956.462915 [J]
# P-cores only
# $ numactl -C 0-15 rapl-run.py perf stat make -j 32
Performance counter stats for 'make -j 32':
1,081,639.26 msec task-clock # 14.515 CPUs utilized
247,198 context-switches # 228.540 /sec
39,099 cpu-migrations # 36.148 /sec
53,195,128 page-faults # 49.180 K/sec
5,069,304,612,802 cycles # 4.687 GHz
5,650,121,874,431 instructions # 1.11 insn per cycle
1,192,553,666,047 branches # 1.103 G/sec
29,885,232,531 branch-misses # 2.51% of all branches
74.519648981 seconds time elapsed
1002.042991000 seconds user
79.129940000 seconds sys
11340.825215 [J]
# E-cores only
# $ numactl -C 16-19 rapl-run.py perf stat make -j 8
Performance counter stats for 'make -j 8':
1,360,440.69 msec task-clock # 3.912 CPUs utilized
224,271 context-switches # 164.852 /sec
19,907 cpu-migrations # 14.633 /sec
53,176,928 page-faults # 39.088 K/sec
4,895,967,057,617 cycles # 3.599 GHz
5,649,308,720,014 instructions # 1.15 insn per cycle
1,192,383,821,310 branches # 876.469 M/sec
33,308,624,261 branch-misses # 2.79% of all branches
347.804299616 seconds time elapsed
1264.263234000 seconds user
96.071435000 seconds sys
7742.243791 [J]
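The "insn per cycle" figures that perf prints are simply instructions divided by cycles. As a check, the sketch below recomputes them from the raw counters in the four perf outputs above; the dictionary layout and the function name ipc are illustrative only.

```python
# label -> (cycles, instructions), copied from the perf stat outputs above
counters = {
    "Ryzen 7 3700X": (6_186_487_258_458, 5_664_603_284_033),
    "i7-12700K, all cores": (5_203_504_969_685, 5_650_225_661_659),
    "i7-12700K, P-cores only": (5_069_304_612_802, 5_650_121_874_431),
    "i7-12700K, E-cores only": (4_895_967_057_617, 5_649_308_720_014),
}


def ipc(cycles, instructions):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


for label, (cycles, instructions) in counters.items():
    print(f"{label:24s} {ipc(cycles, instructions):.2f} insn per cycle")
```

The instruction counts are nearly identical across all four runs (about 5.65 trillion), so the IPC differences come almost entirely from the cycle counts.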