@fwsGonzo
Last active July 18, 2024 07:28
-= Binary Translated =-
$ ./rvlinux ../binaries/STREAM/stream-tuned-rv64gvb
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 4 bytes per array element.
-------------------------------------------------------------
Array size = 20000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14130 microseconds.
(= 14130 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           35281.1     0.004607     0.004535     0.004767
Scale:          33712.6     0.004784     0.004746     0.004818
Add:            32693.1     0.007415     0.007341     0.007539
Triad:          31716.7     0.007608     0.007567     0.007661
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-06 on all three arrays
-------------------------------------------------------------
>>> Program exited, exit code = 0 (0x0)
Instructions executed: 1305098370 Runtime: 388.267ms Insn/s: 3361mi/s
Pages in use: 20 (80 kB virtual memory, total 364 kB)
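STREAM reports each kernel's best rate from its minimum time, skipping the first iteration, as the banner above says. The sketch below reproduces the Best Rate column of this binary-translated run from its Min time column. It assumes the byte counts used by the reference `stream.c` (two arrays of traffic per element for Copy and Scale, three for Add and Triad) and 1 MB = 10^6 bytes. The helper names are hypothetical.

```python
# Sketch of STREAM's "Best Rate" arithmetic (helper names are hypothetical).
# Assumption: like the reference stream.c, Copy/Scale count two arrays of
# traffic per element and Add/Triad count three; 1 MB = 10**6 bytes.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def best_time(times_s):
    """Minimum kernel time, skipping the first (warm-up) iteration."""
    return min(times_s[1:])


def best_rate_mb_s(kernel, elem_bytes, n_elems, min_time_s):
    """Reported bandwidth in MB/s for one kernel."""
    total_bytes = ARRAYS_TOUCHED[kernel] * elem_bytes * n_elems
    return 1e-6 * total_bytes / min_time_s


# Min times from the binary-translated run: 4-byte elements, 20,000,000 each.
MIN_TIMES = {"Copy": 0.004535, "Scale": 0.004746, "Add": 0.007341, "Triad": 0.007567}

for kernel, t in MIN_TIMES.items():
    print(f"{kernel + ':':<8}{best_rate_mb_s(kernel, 4, 20_000_000, t):10.1f}")
```

The same function covers the 8-byte, 10,000,000-element run further down. Pass `elem_bytes=8` and `n_elems=10_000_000` instead.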
-= Interpreted =-
$ ./rvlinux ../binaries/STREAM/build/stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 4 bytes per array element.
-------------------------------------------------------------
Array size = 20000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 96551 microseconds.
(= 96551 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           18262.5     0.010872     0.008761     0.012888
Scale:           9456.8     0.017738     0.016919     0.022430
Add:            10730.6     0.022911     0.022366     0.024496
Triad:           7793.5     0.032249     0.030795     0.036608
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-06 on all three arrays
-------------------------------------------------------------
>>> Program exited, exit code = 0 (0x0)
Instructions executed: 1177805251 Runtime: 1489.893ms Insn/s: 791mi/s
Pages in use: 233 (932 kB virtual memory, total 1978 kB)
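In both summaries, `Insn/s` appears to be instructions executed divided by the reported runtime, and the printed numbers match that reading. The two runs use different binaries (`stream-tuned-rv64gvb` vs `build/stream`) and execute different instruction counts. Comparing instruction throughput is therefore probably fairer than comparing raw runtimes. The sketch below recomputes both values and their ratio. The helper name is hypothetical.

```python
# Sketch: recompute "Insn/s" from the two rvlinux summaries above.
# Assumption: Insn/s = instructions executed / Runtime (the printed numbers agree).


def mega_insn_per_s(instructions, runtime_ms):
    """Millions of guest instructions executed per second of runtime."""
    return instructions / (runtime_ms / 1000.0) / 1e6


binary_translated = mega_insn_per_s(1_305_098_370, 388.267)
interpreted = mega_insn_per_s(1_177_805_251, 1489.893)

print(f"binary translated: {binary_translated:.0f} M insn/s")
print(f"interpreted:       {interpreted:.0f} M insn/s")
print(f"throughput ratio:  {binary_translated / interpreted:.2f}x")
```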
fwsGonzo (gist author) commented Nov 12, 2022

STREAM with libtcc as the JIT compiler:

$ ./rvlinux ../binaries/stream-rv64gc 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1023 microseconds.
Each test below will take on the order of 14335 microseconds.
   (= 14 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           26041.7     0.006827     0.006144     0.007168
Scale:          10416.7     0.015929     0.015360     0.016384
Add:            12335.5     0.020594     0.019456     0.021504
Triad:          11160.7     0.021959     0.021504     0.022528
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
>>> Program exited, exit code = 0 (0x0)
Runtime: 805.521ms   (Use --accurate for instruction counting)
Pages in use: 110 (440 kB virtual memory, total 710 kB)
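STREAM derives the "clock ticks" line by dividing the estimated test time by the measured timer granularity, and it recommends at least 20 ticks per test. The check below assumes integer truncation, as the printed values suggest. It shows that only this libtcc run falls below the threshold (14335 / 1023 gives 14 ticks). The coarse timer is also visible in the results: every Min and Max time in this run is a whole multiple of 1.024 ms (for example, 0.006144 s = 6 x 1.024 ms). The rates from this run are therefore likely coarser than they look. The helper name and the `runs` table are hypothetical.

```python
# Sketch of STREAM's timer-resolution sanity check (helper name is hypothetical).
# Assumption: ticks = estimated test time / timer granularity, truncated to int,
# which matches the "(= N clock ticks)" lines printed in the three runs.
MIN_TICKS = 20  # STREAM's suggested minimum number of ticks per test


def clock_ticks(test_us, granularity_us):
    return int(test_us / granularity_us)


# (estimated test time in us, timer granularity in us) from each run's banner
runs = {
    "binary translated": (14130, 1),
    "interpreted": (96551, 1),
    "libtcc JIT": (14335, 1023),
}

for name, (test_us, gran_us) in runs.items():
    ticks = clock_ticks(test_us, gran_us)
    verdict = "ok" if ticks >= MIN_TICKS else "too few ticks, timer too coarse"
    print(f"{name}: {ticks} ticks ({verdict})")
```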
