
@sithhell
Created February 26, 2019 10:57
heller@g2:~/programming/hpx/build/debug$ gcc -DSTREAM_ARRAY_SIZE=10000000 -O3 -fopenmp stream.c && perf stat -e cache-misses,cache-references ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 818 microseconds.
(= 818 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          156103.4     0.001033     0.001025     0.001056
Scale:         138055.7     0.001178     0.001159     0.001243
Add:           155440.5     0.001555     0.001544     0.001582
Triad:         155129.1     0.001571     0.001547     0.001590
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

 Performance counter stats for './a.out':

       108,483,660      cache-misses              #   85.497 % of all cache refs
       126,885,550      cache-references

       0.137512927 seconds time elapsed

heller@g2:~/programming/hpx/build/debug$ gcc -DSTREAM_ARRAY_SIZE=5000000 -O3 -fopenmp stream.c && perf stat -e cache-misses,cache-references ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 5000000 (elements), Offset = 0 (elements)
Memory per array = 38.1 MiB (= 0.0 GiB).
Total memory required = 114.4 MiB (= 0.1 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 381 microseconds.
(= 381 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          183859.9     0.000450     0.000435     0.000472
Scale:         153007.0     0.000550     0.000523     0.000618
Add:           172901.6     0.000712     0.000694     0.000747
Triad:         182625.7     0.000673     0.000657     0.000692
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

 Performance counter stats for './a.out':

        44,022,735      cache-misses              #   69.735 % of all cache refs
        63,128,670      cache-references

       0.072526769 seconds time elapsed

heller@g2:~/programming/hpx/build/debug$ gcc -DSTREAM_ARRAY_SIZE=2000000 -O3 -fopenmp stream.c && perf stat -e cache-misses,cache-references ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 2000000 (elements), Offset = 0 (elements)
Memory per array = 15.3 MiB (= 0.0 GiB).
Total memory required = 45.8 MiB (= 0.0 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 103 microseconds.
(= 103 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          367719.8     0.000105     0.000087     0.000117
Scale:         293693.1     0.000126     0.000109     0.000136
Add:           300039.6     0.000185     0.000160     0.000237
Triad:         287199.1     0.000188     0.000167     0.000201
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

 Performance counter stats for './a.out':

         1,873,756      cache-misses              #    8.191 % of all cache refs
        22,875,897      cache-references

       0.032436497 seconds time elapsed

heller@g2:~/programming/hpx/build/debug$