Takashi Nakamura (tanakamura)

tanakamura / pl2.md
Last active November 8, 2021 12:47
============= LATENCY ==============================================================================
                              instruction |     IPC         (   rel[%]),     CPI         (   rel[%])
------------------------------------------+---------------------------------------------------------
    m128                            addps |    0.50-0.25    ( 100.0[%]),    2.00-4.00    ( -50.0[%])
    m128                           aesdec |    0.33-0.14    ( 133.4[%]),    3.00-7.00    ( -57.1[%])
    m128                       aesdeclast |    0.33-0.14    ( 133.4[%]),    3.00-7.00    ( -57.1[%])
    m128                           aesenc |    0.33-0.14    ( 133.3[%]),    3.00-7.00    ( -57.1[%])
    m128                       aesenclast |    0.33-0.14    ( 133.4[%]),    3.00-7.00    ( -57.1[%])
    m128                          blendps |    1.00-1.00    (   0.1[%]),    1.00-1.00    (  -0.1[%])
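The rel[%] figures in the latency table look consistent with (first value / second value − 1) × 100, applied to the two numbers in each cell. The gist does not state this formula, so treat it as an inference. IPC is the reciprocal of CPI, so the IPC column's rel[%] also follows from the CPI values. For aesenc, that gives (7/3 − 1) × 100 ≈ 133.3. The short Python check below uses exact values where the table prints rounded ones.

```python
def rel_percent(first: float, second: float) -> float:
    """Relative change of `first` with respect to `second`, in percent.

    Inferred formula for the rel[%] columns; not stated in the source.
    """
    return (first / second - 1.0) * 100.0


# addps: IPC 0.50 vs 0.25, CPI 2.00 vs 4.00
print(f"addps  IPC rel: {rel_percent(0.50, 0.25):.1f}")  # 100.0
print(f"addps  CPI rel: {rel_percent(2.00, 4.00):.1f}")  # -50.0

# aesenc: CPI 3 vs 7, so IPC is 1/3 vs 1/7 (the table rounds these to 0.33 and 0.14)
print(f"aesenc IPC rel: {rel_percent(1 / 3, 1 / 7):.1f}")  # 133.3
print(f"aesenc CPI rel: {rel_percent(3.00, 7.00):.1f}")  # -57.1
```

The table shows small discrepancies, such as 133.4 versus 133.3 or 0.1 for blendps. These presumably come from the unrounded measurements behind the printed two-decimal values.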

Linux build

Starting from a Linux-5.14.15 tree configured with make defconfig, make was run twice; the figures are from the second run.

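[J] is the energy in joules obtained by reading /sys/class/powercap/intel-rapl:0/energy_uj. Because this value comes from a sensor built into the CPU, AMD and Intel may measure it on different bases.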
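The measurement described above comes down to reading the energy counter before and after the workload and taking the difference. Below is a minimal sketch under a few assumptions:

- Python 3.
- Read access to the powercap files. On many recent kernels, energy_uj is readable only by root.
- The counter wraps at most once per run.

This is not the author's rapl-run.py, whose contents are not shown here. The helper names, the use of max_energy_range_uj for wraparound, and the file name rapl_sketch.py are all assumptions of this sketch.

```python
from __future__ import annotations

import subprocess
import sys
import time

# Package 0 RAPL domain, as named in the text above.
RAPL_DIR = "/sys/class/powercap/intel-rapl:0"


def read_counter(name: str) -> int:
    """Read an integer attribute (in microjoules) from the RAPL sysfs directory."""
    with open(f"{RAPL_DIR}/{name}") as f:
        return int(f.read().strip())


def energy_delta_joules(before_uj: int, after_uj: int, max_range_uj: int) -> float:
    """Convert two energy_uj readings to joules, tolerating one wraparound.

    Assumption: the counter runs from 0 up to max_energy_range_uj and then
    restarts at 0, so a negative difference means it wrapped once.
    """
    delta = after_uj - before_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1_000_000


def run_measured(cmd: list[str]) -> tuple[float, float]:
    """Run cmd to completion; return (elapsed seconds, energy in joules)."""
    max_range = read_counter("max_energy_range_uj")
    e0 = read_counter("energy_uj")
    t0 = time.monotonic()
    subprocess.run(cmd, check=True)
    t1 = time.monotonic()
    e1 = read_counter("energy_uj")
    return t1 - t0, energy_delta_joules(e0, e1, max_range)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: sudo python3 rapl_sketch.py make
    elapsed, joules = run_measured(sys.argv[1:])
    print(f"{elapsed:.2f} [s]  {joules:.1f} [J]")
```

For example, sudo python3 rapl_sketch.py make prints the wall-clock time and the package-0 energy for that command. Only the intel-rapl:0 counter named above is read, so other sockets or domains are not included.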

With a script like the following saved as rapl-run.py,

ooo ratio : 1.398357
ostimer: clock_gettime
userland_timer: cntvct
perf_counter: no
Qualcomm Snapdragon 710
==== idiv32-realtime ====
-> : divider_bit
|   |   1|   2|   3|   4|   5|   6|   7|   8|   9|  10|  11|  12|  13|  14|  15|  16|  17|  18|  19|  20|  21|  22|  23|  24|  25|  26|  27|  28|  29|  30|  31|  32
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | 2.9| 2.9| 7.2| 2.9| 2.9| 2.9| 2.9| 3.2| 3.0| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9| 2.9|14.2| 4.0| 2.9
|                   | result
----------------------------
| ROB               | 389
| INT PRF           | 384
| FP PRF            | 372
| INT(multi chain)  | 32
| FP(multi chain)   | 25
| INT(single chain) | 32
| FP(single chain)  | 25
v : test_name
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: yes
Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
==== fpu ====
|              | nsec/call
---------------------------
| denormal_add | 1.28803
| normal_add   | 1.04813
| denormal_mul | 1.05132
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: yes
AMD Ryzen 7 3700X 8-Core Processor
==== libc ====
|                     | nsec/call
-----------------------------------
| atoi_99999          |  14.19927
| fflush_stdout       |   5.74030
| sscanf_double_99999 | 122.22827
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: yes
AMD Ryzen 7 3700X 8-Core Processor
==== libc ====
|                     | nsec/call
-----------------------------------
| atoi_99999          |  17.43726
| fflush_stdout       |  11.08052
| sscanf_double_99999 | 121.75406
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: yes
Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
==== cache-bandwidth-1t ====
<copy>
|      | GiB/s
--------------------
| 3072 | 187.16192
| 4096 | 159.48263