@robertmryan
Last active March 7, 2019 16:22
void matrixProduct3(int threads) {
    int i, j, k, aik;
    omp_set_num_threads(threads);
    kdebug_signpost_start(3, threads, 0, 0, 3);
    double start = omp_get_wtime();
    memset(c, 0, n * n * sizeof(int));   // zero the n×n result matrix
    // a, b, c, and n are file-scope, so default(none) requires listing them
    #pragma omp parallel for private(aik, j, k) shared(a, b, c, n) default(none)
    for (i = 0; i < n; i++) {
        for (k = 0; k < n; k++) {
            aik = a[i][k];               // hoist a[i][k] out of the inner loop
            for (j = 0; j < n; j++) {
                c[i][j] += aik * b[k][j];
            }
        }
    }
    double elapsed = omp_get_wtime() - start;
    kdebug_signpost_end(3, threads, 0, 0, 3);
    printf("%0.6f ", elapsed);
}
robertmryan commented Mar 7, 2019

Note that I changed the order of the for loops from i-j-k to i-k-j (as suggested by Alain) to reduce cache sloshing. This has a significant impact. That said, both your original algorithm and the i-k-j rendition benefited similarly from parallelization on an Intel i9 with 6 physical cores.

It’s not really material, but note that i doesn’t need to be in the private list: OpenMP automatically makes the loop variable of a parallel for private.

Anyway, here are my results with 1000×1000 matrices, in seconds, where n is the number of threads, serial is the algorithm with no omp parallel construct at all, original is your original algorithm, and i-k-j is the algorithm above. The two algorithms scaled similarly in my configuration, but the i-k-j ordering is still worth trying because it might make a difference in edge cases. My 6-physical-core i9 also saw material improvements going from 1 to 6 threads:

 n   serial  original   i-k-j
--- -------- -------- --------
 1: 0.927087 0.774813 0.145663 
 2: 0.803497 0.387473 0.082006 
 3: 0.836431 0.271751 0.055481 
 4: 0.771813 0.208345 0.044307 
 5: 0.764092 0.190866 0.040347 
 6: 0.817030 0.165456 0.030246 
 7: 0.777482 0.198867 0.034163 
 8: 0.848919 0.265043 0.034415 
 9: 0.834285 0.266590 0.035344 
10: 0.812723 0.307124 0.030036 
11: 0.812753 0.409849 0.049907 
12: 0.868123 0.559404 0.030342 
