Created February 26, 2019 10:57
heller@g2:~/programming/hpx/build/debug$ gcc -DSTREAM_ARRAY_SIZE=10000000 -O3 -fopenmp stream.c && perf stat -e cache-misses,cache-references ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 818 microseconds.
   (= 818 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          156103.4     0.001033     0.001025     0.001056
Scale:         138055.7     0.001178     0.001159     0.001243
Add:           155440.5     0.001555     0.001544     0.001582
Triad:         155129.1     0.001571     0.001547     0.001590
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

 Performance counter stats for './a.out':

       108,483,660      cache-misses              #   85.497 % of all cache refs
       126,885,550      cache-references

       0.137512927 seconds time elapsed

heller@g2:~/programming/hpx/build/debug$ gcc -DSTREAM_ARRAY_SIZE=5000000 -O3 -fopenmp stream.c && perf stat -e cache-misses,cache-references ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 5000000 (elements), Offset = 0 (elements)
Memory per array = 38.1 MiB (= 0.0 GiB).
Total memory required = 114.4 MiB (= 0.1 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 381 microseconds.
   (= 381 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          183859.9     0.000450     0.000435     0.000472
Scale:         153007.0     0.000550     0.000523     0.000618
Add:           172901.6     0.000712     0.000694     0.000747
Triad:         182625.7     0.000673     0.000657     0.000692
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

 Performance counter stats for './a.out':

        44,022,735      cache-misses              #   69.735 % of all cache refs
        63,128,670      cache-references

       0.072526769 seconds time elapsed

heller@g2:~/programming/hpx/build/debug$ gcc -DSTREAM_ARRAY_SIZE=2000000 -O3 -fopenmp stream.c && perf stat -e cache-misses,cache-references ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 2000000 (elements), Offset = 0 (elements)
Memory per array = 15.3 MiB (= 0.0 GiB).
Total memory required = 45.8 MiB (= 0.0 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 103 microseconds.
   (= 103 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          367719.8     0.000105     0.000087     0.000117
Scale:         293693.1     0.000126     0.000109     0.000136
Add:           300039.6     0.000185     0.000160     0.000237
Triad:         287199.1     0.000188     0.000167     0.000201
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

 Performance counter stats for './a.out':

         1,873,756      cache-misses              #    8.191 % of all cache refs
        22,875,897      cache-references

       0.032436497 seconds time elapsed

heller@g2:~/programming/hpx/build/debug$
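
A note on reading the numbers above: the "Best Rate MB/s" column and the perf miss percentage can both be recomputed from values already in the log. The sketch below is not part of the original session; the helper names `stream_rate_mb_s` and `miss_ratio_pct` are made up for illustration. It assumes the standard STREAM accounting, where Copy and Scale touch two arrays per element and Add and Triad touch three, at 8 bytes per element as the log reports.

```python
# Hypothetical helpers for sanity-checking the log; not taken from stream.c.

# Arrays read or written per element by each STREAM kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_ELEMENT = 8  # "This system uses 8 bytes per array element."


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Bandwidth in MB/s (10^6 bytes/s) from the array size and the min time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * array_size
    return 1.0e-6 * bytes_moved / min_time_s


def miss_ratio_pct(cache_misses, cache_references):
    """Cache misses as a percentage of cache references, as perf prints it."""
    return 100.0 * cache_misses / cache_references


if __name__ == "__main__":
    # (array size, Copy min time, cache-misses, cache-references) per run above.
    runs = [
        (10_000_000, 0.001025, 108_483_660, 126_885_550),
        (5_000_000, 0.000435, 44_022_735, 63_128_670),
        (2_000_000, 0.000087, 1_873_756, 22_875_897),
    ]
    for size, copy_min, misses, refs in runs:
        rate = stream_rate_mb_s("Copy", size, copy_min)
        pct = miss_ratio_pct(misses, refs)
        print(f"N={size}: Copy ~ {rate:.1f} MB/s, miss ratio ~ {pct:.3f} %")
```

The recomputed Copy rates land close to the reported ones but not exactly on them, because the log prints the minimum time to only six decimals. The miss ratio falls from about 85.5 % to about 8.2 % as the arrays shrink, which would be consistent with the smallest working set (45.8 MiB in total) largely fitting in cache, although the log itself does not state the machine's cache sizes.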