@arlandism
Last active April 6, 2021 02:28
// Benchmark log, 10000 tests per run, in the order the changes were applied.
//
// Unoptimized:
//   0.81s to run 10000 tests (80750.60ns per test)
// Method call 1x refactor:
//   0.71s to run 10000 tests (70700.00ns per test)
// Removing use of `get_vec_element` and fetching data upfront:
//   0.22s to run 10000 tests (21686.00ns per test)
// Loop unrolling using 5 accumulators:
//   0.16s to run 10000 tests (16187.50ns per test)
// Loop unrolling using 6 accumulators:
//   0.16s to run 10000 tests (16145.60ns per test)
// Variable renaming of temporary acc values:
//   0.16s to run 10000 tests (16272.20ns per test)
typedef long data_t;

// vec_ptr and vec_length() are not defined here; they are expected to come
// from the surrounding test harness.
data_t dotproduct(vec_ptr u, vec_ptr v) {
    data_t u_val1, v_val1, u_val2, v_val2, u_val3, v_val3, u_val4, v_val4, u_val5, v_val5;
    // Five independent accumulators, so the additions do not form one long
    // dependency chain. This is the 5-accumulator version from the log above.
    data_t sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0;

    // Read the length and the raw data pointers once, outside the loop.
    long l = vec_length(u);
    data_t *u_data = u->data;
    data_t *v_data = v->data;
    long i = 0;

    // Main loop: five elements per iteration.
    // We can assume both vectors are the same length.
    for (; i < l - 4; i += 5) {
        u_val1 = u_data[i];
        v_val1 = v_data[i];
        sum1 += u_val1 * v_val1;

        u_val2 = u_data[i + 1];
        v_val2 = v_data[i + 1];
        sum2 += u_val2 * v_val2;

        u_val3 = u_data[i + 2];
        v_val3 = v_data[i + 2];
        sum3 += u_val3 * v_val3;

        u_val4 = u_data[i + 3];
        v_val4 = v_data[i + 3];
        sum4 += u_val4 * v_val4;

        u_val5 = u_data[i + 4];
        v_val5 = v_data[i + 4];
        sum5 += u_val5 * v_val5;
    }

    // Tail loop: the remaining 0 to 4 elements.
    for (; i < l; i++) {
        u_val1 = u_data[i];
        v_val1 = v_data[i];
        sum1 += u_val1 * v_val1;
    }

    return sum1 + sum2 + sum3 + sum4 + sum5;
}
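
The log above compares an accessor-based loop against the unrolled one, but the gist does not include the vector type or the original baseline. The following is a minimal sketch for checking that the two approaches agree, including on lengths that are not a multiple of 5. Everything not in the gist is an assumption made for illustration: the `vec_rec` layout (only the `data` field is confirmed by the code above), the helpers `new_vec` and `free_vec`, the bounds-checked `get_vec_element` signature, the reconstructed `dotproduct_baseline`, and the checker `count_mismatches`. `dotproduct_unrolled` is a condensed copy of the function above, without the temporaries, so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>

typedef long data_t;

/* Assumed layout: the gist only shows that vec_ptr exposes a `data` field. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical helper: allocate a zero-filled vector of `len` elements. */
static vec_ptr new_vec(long len) {
    vec_ptr v = malloc(sizeof(vec_rec));
    if (v == NULL) return NULL;
    v->len = len;
    v->data = len > 0 ? calloc((size_t)len, sizeof(data_t)) : NULL;
    if (len > 0 && v->data == NULL) { free(v); return NULL; }
    return v;
}

/* Hypothetical helper: release a vector created by new_vec. */
static void free_vec(vec_ptr v) {
    if (v != NULL) { free(v->data); free(v); }
}

static long vec_length(vec_ptr v) { return v->len; }

/* Assumed accessor: bounds-checks, stores the element, returns 1 on success. */
static int get_vec_element(vec_ptr v, long index, data_t *dest) {
    if (index < 0 || index >= v->len) return 0;
    *dest = v->data[index];
    return 1;
}

/* Reconstructed baseline (an assumption): length and elements are fetched
   through function calls on every iteration. */
static data_t dotproduct_baseline(vec_ptr u, vec_ptr v) {
    data_t sum = 0, u_val = 0, v_val = 0;
    for (long i = 0; i < vec_length(u); i++) {
        get_vec_element(u, i, &u_val);
        get_vec_element(v, i, &v_val);
        sum += u_val * v_val;
    }
    return sum;
}

/* Condensed copy of the 5-accumulator version above. */
static data_t dotproduct_unrolled(vec_ptr u, vec_ptr v) {
    data_t s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0;
    long l = vec_length(u);
    data_t *a = u->data, *b = v->data;
    long i = 0;
    for (; i < l - 4; i += 5) {
        s1 += a[i] * b[i];
        s2 += a[i + 1] * b[i + 1];
        s3 += a[i + 2] * b[i + 2];
        s4 += a[i + 3] * b[i + 3];
        s5 += a[i + 4] * b[i + 4];
    }
    for (; i < l; i++) s1 += a[i] * b[i];  /* leftover 0 to 4 elements */
    return s1 + s2 + s3 + s4 + s5;
}

/* Counts the lengths in [0, max_len] for which the two versions disagree.
   Returns -1 if an allocation fails. */
static int count_mismatches(long max_len) {
    int mismatches = 0;
    for (long n = 0; n <= max_len; n++) {
        vec_ptr u = new_vec(n), v = new_vec(n);
        if (u == NULL || v == NULL) { free_vec(u); free_vec(v); return -1; }
        for (long i = 0; i < n; i++) {
            u->data[i] = i + 1;  /* 1, 2, ..., n */
            v->data[i] = n - i;  /* n, n-1, ..., 1 */
        }
        if (dotproduct_baseline(u, v) != dotproduct_unrolled(u, v)) mismatches++;
        free_vec(u);
        free_vec(v);
    }
    return mismatches;
}
```

Calling `count_mismatches(12)` from a small `main` covers the empty vector, lengths shorter than one unrolled step, exact multiples of 5, and every possible tail size. A result of 0 means the unrolled loop and the accessor-based loop agree on all of them.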