@antoine-levitt
Created August 21, 2019 18:36
using LinearAlgebra   # norm, transpose!
using BenchmarkTools  # @btime
import LinearAlgebra.BLAS
const libblas = Base.libblas_name
const liblapack = Base.liblapack_name
import LinearAlgebra: BlasReal, BlasComplex, BlasFloat, BlasInt, DimensionMismatch, checksquare, stride1, chkstride1, axpy!
import Libdl
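# Out-of-place transpose B = 1.0 * A' through the domatcopy_ BLAS extension
# (provided by OpenBLAS; not part of the reference BLAS).
function blas_transpose!(B, A)
    ccall((BLAS.@blasfunc("domatcopy_"), libblas), Cvoid,
          (Ref{UInt8}, Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt}, Ref{Float64},
           Ptr{Float64}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt}),
          'C',          # column-major storage
          'T',          # transpose
          size(A, 1),   # rows of A
          size(A, 2),   # columns of A
          1.0,          # scaling factor alpha
          A,
          size(A, 1),   # leading dimension of A
          B,
          size(B, 1))   # leading dimension of B
    B
end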
BLAS.set_num_threads(1) # does not seem to matter anyway
for i in 1:4
    N = rand(500:1000)
    M = rand(500:1000)
    # M = N # uncomment for square
    println("$M $N")
    A = randn(M, N)
    B = zeros(N, M)
    @btime blas_transpose!($B, $A)
    C = copy(B)   # keep the BLAS result for comparison
    B .= 0
    @btime transpose!($B, $A)
    @assert norm(C - B) < 1e-10   # both methods must agree
end
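
For readers unfamiliar with the `domatcopy_` arguments, here is a minimal plain-Julia sketch of the operation the call above requests: with column-major storage (`'C'`) and the transpose flag (`'T'`), it computes `B = alpha * A'`. The function name `naive_omatcopy_transpose!` is hypothetical and is not part of the gist or of any library; it is meant only as a reference for checking results on small inputs, not as a performance candidate.

```julia
# Hypothetical reference implementation (not from the gist):
# B[j, i] = alpha * A[i, j], i.e. a scaled out-of-place transpose.
function naive_omatcopy_transpose!(B, A, alpha=1.0)
    m, n = size(A)
    size(B) == (n, m) || throw(DimensionMismatch("B must be $n x $m"))
    @inbounds for j in 1:n, i in 1:m
        B[j, i] = alpha * A[i, j]
    end
    return B
end

A = [1.0 2.0 3.0; 4.0 5.0 6.0]   # 2 x 3 input
B = zeros(3, 2)                  # 3 x 2 output
naive_omatcopy_transpose!(B, A)
```

On a small matrix like this one, the result can be compared against `permutedims(A)` or against `blas_transpose!` to confirm that the row, column, and leading-dimension arguments are passed in the right order.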
@antoine-levitt

On a Haswell (each block prints `M N`, then the `blas_transpose!` time, then the `transpose!` time):

887 773
  920.267 μs (0 allocations: 0 bytes)
  932.760 μs (0 allocations: 0 bytes)
748 913
  847.834 μs (0 allocations: 0 bytes)
  953.112 μs (0 allocations: 0 bytes)
952 925
  1.365 ms (0 allocations: 0 bytes)
  1.376 ms (0 allocations: 0 bytes)
835 983
  1.119 ms (0 allocations: 0 bytes)
  1.276 ms (0 allocations: 0 bytes)

On a Sandy Bridge:

969 803
  4.901 ms (0 allocations: 0 bytes)
  1.438 ms (0 allocations: 0 bytes)
575 900
  2.123 ms (0 allocations: 0 bytes)
  804.264 μs (0 allocations: 0 bytes)
785 625
  3.029 ms (0 allocations: 0 bytes)
  786.346 μs (0 allocations: 0 bytes)
808 940
  4.826 ms (0 allocations: 0 bytes)
  1.371 ms (0 allocations: 0 bytes)
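
To make the comparison easier to read, the timings above can be reduced to a ratio per matrix size. The sketch below simply copies the eight pairs of numbers from the output (converted to microseconds) and divides the `blas_transpose!` time by the `transpose!` time; the names `haswell`, `sandybridge`, and `slowdown` are illustrative only.

```julia
# Timings copied from the results above, in microseconds:
# (blas_transpose!, transpose!) for each of the four random sizes.
haswell     = [(920.267, 932.760), (847.834, 953.112), (1365.0, 1376.0), (1119.0, 1276.0)]
sandybridge = [(4901.0, 1438.0), (2123.0, 804.264), (3029.0, 786.346), (4826.0, 1371.0)]

# Ratio > 1 means the BLAS call was slower than LinearAlgebra's transpose!.
slowdown(timings) = [blas / native for (blas, native) in timings]

println("Haswell:      ", round.(slowdown(haswell), digits=2))
println("Sandy Bridge: ", round.(slowdown(sandybridge), digits=2))
```

With these numbers the ratio stays between roughly 0.88 and 0.99 on Haswell (the BLAS call is marginally faster), while on Sandy Bridge it ranges from roughly 2.6 to 3.9 (the BLAS call is several times slower). These are single runs on random sizes, so the ratios are indicative only.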
