@tanakamura
Last active November 7, 2021 21:34
linux build

The kernel is Linux 5.14.15 configured with make defconfig. make was run twice, and the results below are from the second run.

[J] is the energy in joules obtained by reading /sys/class/powercap/intel-rapl:0/energy_uj. This comes from the CPU's built-in sensor, so AMD and Intel may not measure on the same basis.

A script like the following was saved as rapl-run.py:

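#!/usr/bin/env python3
import sys
import subprocess

# Package-0 RAPL energy counter, in microjoules
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

def main():
    args = sys.argv[1:]
    if len(args) == 0:
        raise Exception("no arg")

    with open(RAPL_ENERGY, "r") as f:
        # Counter value before the command runs
        val = int(f.readline())

        subprocess.call(args)

        # Rewind and re-read the same sysfs file for the value afterwards
        f.seek(0)
        val2 = int(f.readline())

    # Energy used while the command ran (counter wraparound is not handled)
    delta = val2 - val

    print("%f [J]"%(delta/1e6))


if __name__ == '__main__':
    main()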
It was run as follows, with twice as many make jobs as logical CPUs:

 $ rapl-run.py perf stat make -j $(expr $(nproc) '*' 2)
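rapl-run.py takes a plain difference of two counter readings. The powercap counter is finite and wraps around, so a sufficiently long run could in principle yield a negative delta. The following is a minimal sketch of a wraparound-aware variant, not the script used for the numbers here. It assumes the same zone exposes max_energy_range_uj and that the counter wraps at most once during the run; the helper names energy_delta_uj, read_uj and measure are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed RAPL zone: the same package-0 zone that rapl-run.py reads.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before, after, max_range):
    """Microjoules consumed between two readings, allowing for one wrap."""
    if after >= before:
        return after - before
    # The counter passed max_range and restarted near zero.
    return (max_range - before) + after


def read_uj(name):
    """Read one integer attribute (in microjoules) from the RAPL zone."""
    return int((RAPL_ZONE / name).read_text())


def measure(args):
    """Run args as a command and return (exit status, joules consumed)."""
    max_range = read_uj("max_energy_range_uj")
    before = read_uj("energy_uj")
    status = subprocess.call(args)
    after = read_uj("energy_uj")
    return status, energy_delta_uj(before, after, max_range) / 1e6
```

Calling measure(["perf", "stat", "make", "-j", "40"]) would return the exit status and the energy in joules; printing the second value with "%f [J]" reproduces the output format of rapl-run.py.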

Ryzen

  • 111.841197130 seconds time elapsed
  • 9396.697417 [J]
  • 0.92 insn per cycle

i7

  • 68.777557306 seconds time elapsed
  • 9956.462915 [J]
  • 1.09 insn per cycle

The Ryzen 7 3700X finishes in 111.8 seconds, whereas the i7-12700K finishes in 68.8 seconds.

Measured this way, the i7 uses slightly more energy for the build (9956 J versus 9397 J).

IPC is better on the i7.
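Because the two builds take different amounts of time, total energy and average power tell different stories. As a rough illustration, the sketch below divides the measured energy by the elapsed time for each CPU and computes the speedup. The figures are copied from the results above; the dictionary layout and the function name average_power_w are only for this example.

```python
# Elapsed time and energy copied from the Ryzen and i7 results above.
runs = {
    "Ryzen 7 3700X": {"seconds": 111.841197130, "joules": 9396.697417},
    "i7-12700K": {"seconds": 68.777557306, "joules": 9956.462915},
}


def average_power_w(run):
    """Average package power in watts: energy divided by elapsed time."""
    return run["joules"] / run["seconds"]


for name, run in runs.items():
    print(f"{name}: {average_power_w(run):.1f} W average")

# How much faster the i7 build is, and how much more energy it uses.
speedup = runs["Ryzen 7 3700X"]["seconds"] / runs["i7-12700K"]["seconds"]
energy_ratio = runs["i7-12700K"]["joules"] / runs["Ryzen 7 3700X"]["joules"]
print(f"i7 speedup: {speedup:.2f}x, energy ratio: {energy_ratio:.2f}x")
```

By this estimate the i7 draws roughly 145 W on average against roughly 84 W for the Ryzen, while using only about 6% more energy in total because it finishes about 1.6 times sooner. The caveat about the two vendors' sensors possibly using different bases still applies.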

Running on only the 8 P-cores takes 74.5 seconds and 11341 [J]:

  • 74.519648981 seconds time elapsed
  • 11340.825215 [J]
  • 1.11 insn per cycle

Running on only the 4 E-cores:

  • 347.804299616 seconds time elapsed
  • 7742.243791 [J]
  • 1.15 insn per cycle
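The three i7 configurations trade time against energy quite differently. The sketch below computes average power and the energy-delay product (energy multiplied by elapsed time) for each. The energy-delay product is a commonly used combined metric introduced here for illustration; it is not something the measurements above report. All figures are copied from the results above.

```python
# (elapsed seconds, joules) for the three i7-12700K configurations above.
configs = {
    "all cores": (68.777557306, 9956.462915),
    "P-cores only": (74.519648981, 11340.825215),
    "E-cores only": (347.804299616, 7742.243791),
}


def summarize(seconds, joules):
    """Return (average watts, energy-delay product in joule-seconds)."""
    return joules / seconds, joules * seconds


for name, (seconds, joules) in configs.items():
    watts, edp = summarize(seconds, joules)
    print(f"{name}: {watts:.1f} W average, EDP {edp:.3e} J*s")

# Configuration with the lowest energy, and with the lowest energy-delay product.
least_energy = min(configs, key=lambda name: configs[name][1])
best_edp = min(configs, key=lambda name: summarize(*configs[name])[1])
print(f"least energy: {least_energy}, best EDP: {best_edp}")
```

On these numbers the E-cores use the least energy at about 22 W average, but take about five times longer, so the all-core run comes out ahead once time is weighed in.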

The E-cores have the highest IPC.

# Ryzen 7 3700X

 Performance counter stats for 'make -j32':

      1,645,905.15 msec task-clock                #   14.716 CPUs utilized          
           374,169      context-switches          #  227.333 /sec                   
            38,120      cpu-migrations            #   23.161 /sec                   
        53,201,666      page-faults               #   32.324 K/sec                  
 6,186,487,258,458      cycles                    #    3.759 GHz                      (83.79%)
   584,145,164,956      stalled-cycles-frontend   #    9.44% frontend cycles idle     (83.77%)
   387,679,849,583      stalled-cycles-backend    #    6.27% backend cycles idle      (83.74%)
 5,664,603,284,033      instructions              #    0.92  insn per cycle         
                                                  #    0.10  stalled cycles per insn  (83.77%)
 1,197,485,237,223      branches                  #  727.554 M/sec                    (83.78%)
    35,326,280,086      branch-misses             #    2.95% of all branches          (83.78%)

     111.841197130 seconds time elapsed

    1492.347945000 seconds user
     142.398664000 seconds sys


9396.697417 [J]

# i7-12700K

 Performance counter stats for 'make -j 40':

      1,165,372.81 msec task-clock                #   16.944 CPUs utilized
           290,107      context-switches          #  248.939 /sec
            39,203      cpu-migrations            #   33.640 /sec
        53,182,055      page-faults               #   45.635 K/sec
 5,203,504,969,685      cycles                    #    4.465 GHz
 5,650,225,661,659      instructions              #    1.09  insn per cycle
 1,192,600,178,609      branches                  #    1.023 G/sec
    30,412,406,301      branch-misses             #    2.55% of all branches

      68.777557306 seconds time elapsed

    1074.093308000 seconds user
      90.981504000 seconds sys

9956.462915 [J]

# P-cores only
# $ numactl -C 0-15 rapl-run.py perf stat make -j 32

 Performance counter stats for 'make -j 32':

      1,081,639.26 msec task-clock                #   14.515 CPUs utilized
           247,198      context-switches          #  228.540 /sec
            39,099      cpu-migrations            #   36.148 /sec
        53,195,128      page-faults               #   49.180 K/sec
 5,069,304,612,802      cycles                    #    4.687 GHz
 5,650,121,874,431      instructions              #    1.11  insn per cycle
 1,192,553,666,047      branches                  #    1.103 G/sec
    29,885,232,531      branch-misses             #    2.51% of all branches

      74.519648981 seconds time elapsed

    1002.042991000 seconds user
      79.129940000 seconds sys


11340.825215 [J]

# E-cores only
# $ numactl -C 16-19 rapl-run.py perf stat make -j 8

 Performance counter stats for 'make -j 8':

      1,360,440.69 msec task-clock                #    3.912 CPUs utilized
           224,271      context-switches          #  164.852 /sec
            19,907      cpu-migrations            #   14.633 /sec
        53,176,928      page-faults               #   39.088 K/sec
 4,895,967,057,617      cycles                    #    3.599 GHz
 5,649,308,720,014      instructions              #    1.15  insn per cycle
 1,192,383,821,310      branches                  #  876.469 M/sec
    33,308,624,261      branch-misses             #    2.79% of all branches

     347.804299616 seconds time elapsed

    1264.263234000 seconds user
      96.071435000 seconds sys


7742.243791 [J]
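The "insn per cycle" figures can be re-derived from the raw counters as instructions divided by cycles. The sketch below does this for the four runs, using the cycle and instruction counts copied from the perf stat output above. The percentages in parentheses on the Ryzen run suggest its counters were multiplexed, so those counts are presumably scaled estimates.

```python
# (cycles, instructions, IPC as printed by perf) for each run above.
counters = {
    "Ryzen 7 3700X": (6_186_487_258_458, 5_664_603_284_033, 0.92),
    "i7-12700K": (5_203_504_969_685, 5_650_225_661_659, 1.09),
    "P-cores only": (5_069_304_612_802, 5_650_121_874_431, 1.11),
    "E-cores only": (4_895_967_057_617, 5_649_308_720_014, 1.15),
}


def ipc(cycles, instructions):
    """Instructions retired per clock cycle."""
    return instructions / cycles


for name, (cycles, instructions, reported) in counters.items():
    print(f"{name}: IPC {ipc(cycles, instructions):.4f} (perf printed {reported})")
```

The instruction counts are nearly identical across all four runs (about 5.65 trillion), so the IPC differences come almost entirely from the cycle counts.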