Last active
March 7, 2019 16:22
-
-
Save robertmryan/5de117b045b8c682727e673498926306 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
void matrixProduct3(int threads) { | |
int i, j, k, aik; | |
omp_set_num_threads(threads); | |
kdebug_signpost_start(3, threads, 0, 0, 3); | |
double start = omp_get_wtime(); | |
memset(c, n * n, sizeof(int)); | |
#pragma omp parallel for private(aik, j, k) default(none) | |
for (i = 0; i < n; i++) { | |
for (k = 0; k < n; k++) { | |
aik = a[i][k]; | |
for (j = 0; j < n; j++) { | |
c[i][j] += aik * b[k][j]; | |
} | |
} | |
} | |
double elapsed = omp_get_wtime() - start; | |
kdebug_signpost_end(3, threads, 0, 0, 3); | |
printf("%0.6f ", elapsed); | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Note that changed order of the
for
loops fromi
-j
-k
toi
-k
-j
(as suggested by Alain) to reduce cache sloshing. This has significant impact. That having been said, both your original algorithm and thei
-k
-j
rendition benefited similarly on a Intel i9 with 6 physical cores.It’s not really material, but notice that
i
doesn’t need to be inprivate
list. OpenMP automatically does that for us.Anyway, here are my results with 1000x1000 matrices, in seconds, where
n
is the number of threads, the serial is algorithm with noomp parallel
construct at all, theoriginal
column is your original algorithm, andi
-k
-j
is the above algorithm. Not a material difference between the algorithms in my configuration, but you still might want to try it because it might make a difference in edge cases. But my i9 6 physical core CPU also saw material improvements going from 1 to 6 threads: