-
-
Save davidavdav/a2332f6620c3bb259d51 to your computer and use it in GitHub Desktop.
## weirdness in innerloop optimization | |
## similar to sumabs2() | |
function mydot{T}(x::Array{T}) | |
s = zero(T) | |
for xx in x | |
s += xx*xx | |
end | |
s | |
end | |
function f!(x::Matrix, y::Matrix, n, usemydot=false) | |
nx, d = size(x) | |
r = similar(x, size(x,1), n) | |
yi = similar(x, d) | |
for k=1:n | |
a = 1.0 | |
b = 1.0 | |
rand!(y) # flush caches, y is some function of x | |
y += x # this takes sone time but is not the focus here | |
for i=1:nx | |
for j=1:d | |
yi[j] = y[i,j] | |
end | |
## This should be the critical lines | |
if usemydot | |
s = mydot(yi) | |
else | |
s = sumabs2(yi) | |
end | |
r[i,k] = a + b*s | |
end | |
end | |
r | |
end | |
## typical usage | |
x = randn(10000, 60) | |
y = randn(10000, 60) | |
n = 100 | |
function prof(usemydot=false) | |
Profile.clear() | |
@profile f!(x, y, n, usemydot) | |
Profile.print(format=:flat) | |
end |
davidavdav
commented
Nov 8, 2014
You likely have a bug on line 22. +=
isn't a in-place operator, but a short syntax for y = y+x
. That way you only modify y
once.
By the way, I don't get a significant difference between the two tests on master nor release-0.3.
Is there any way to not do this? You're killing the CPU cache by having the inner loop on the 2nd index because of column-major order.
for i=1:nx
for j=1:d
yi[j] = y[i,j]
end
@ivarne, this is not about line 22. These lines are just there as a placeholder for some more complicated calculations that would distract from the innerloop that is following. Basically y = somefunction(x) right there. I have included it because it might influence the cache.
OK, so if you don't see a difference in performance, then there might be something going on with my set-up. I'll try on the linux servers rather than my laptop. I always compile everything from source. I've tested this for release-0.3 and master myself, with the same results.
As a result of a discussion on julia-users, I replaced NumericExtensions.sumsq() by Base.sumabs2, but my timing results are the same.
@waTeim. I realize the row-vector yi is breaking column major order, the structure of y results from some larger in-place BLAS trmm! operation, it might be possible to flip the dimensions there.