@jfeist
Last active January 5, 2020 10:29
using Pkg
Pkg.activate(".")

using TensorCast
using CuArrays
using BenchmarkTools

# Limit each benchmark to roughly 0.5 s of sampling.
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 0.5

# Benchmark @cast on arrays of type T (Array on the CPU, CuArray on the GPU).
# CuArrays.@sync waits for the GPU to finish so that @btime measures the full operation.
function test_cast(T)
    println("T = $T")
    N = 100
    subN = 10:50
    A = T(zeros(N,N,N))
    B = T(rand(N))
    C = T(rand(N,N))
    D = T(rand(N,N,N,N))

    # C indexed in stored order, then with its two indices swapped.
    println("transpose (2D)")
    @btime CuArrays.@sync @cast $A[i1, i2, i3] = $B[i2] * $C[i1, i3]
    @btime CuArrays.@sync @cast $A[i1, i2, i3] = $B[i2] * $C[i3, i1]

    # A indexed in stored order, then with its three indices cyclically permuted.
    println("permute (3D)")
    @btime CuArrays.@sync @cast $D[i1, i2, i3, i4] = $B[i2] * $A[i1, i3, i4]
    @btime CuArrays.@sync @cast $D[i1, i2, i3, i4] = $B[i2] * $A[i3, i4, i1]

    # E is a lazy adjoint wrapper around C, so E[i3, i1] addresses C[i1, i3].
    println("adjoint (2D)")
    E = C'
    @btime CuArrays.@sync @cast $A[i1, i2, i3] = $B[i2] * $E[i3, i1]
    @btime CuArrays.@sync @cast $A[i1, i2, i3] = $B[i2] * $E[i1, i3]

    # Same casts as in the transpose case, applied to views of sub-blocks.
    println("view (2D)")
    F = @view A[subN,subN,subN]
    G = @view B[subN]
    H = @view C[subN,subN]
    @btime CuArrays.@sync @cast $F[i1, i2, i3] = $G[i2] * $H[i1, i3]
    @btime CuArrays.@sync @cast $F[i1, i2, i3] = $G[i2] * $H[i3, i1]
    nothing
end

test_cast(Array)
println()
test_cast(CuArray)
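
For readers unfamiliar with the index notation, the two "transpose (2D)" casts can be sketched with nothing but Base Julia broadcasting. This is an illustration of what the expressions compute, not the code that TensorCast actually generates (that equivalence is an assumption here); the small size N = 4 and the names A1, A2, Ct, and ref are chosen only for this example.

```julia
# Small CPU-only sketch; N, A1, A2, Ct and ref are illustrative names.
N = 4
B = rand(N)
C = rand(N, N)
A1 = zeros(N, N, N)
A2 = zeros(N, N, N)

# A1[i1, i2, i3] = B[i2] * C[i1, i3]
# Place B along dimension 2 and C along dimensions 1 and 3, then broadcast.
A1 .= reshape(B, 1, N, 1) .* reshape(C, N, 1, N)

# A2[i1, i2, i3] = B[i2] * C[i3, i1]
# C's indices appear in swapped order, so it is transposed before reshaping.
Ct = permutedims(C)              # Ct[i1, i3] == C[i3, i1]
A2 .= reshape(B, 1, N, 1) .* reshape(Ct, N, 1, N)

# Reference result from an explicit comprehension, for comparison.
ref = [B[i2] * C[i3, i1] for i1 in 1:N, i2 in 1:N, i3 in 1:N]
println(A2 ≈ ref)
```

The second form needs an extra transposition step (here an explicit copy via permutedims), which is the kind of overhead the paired benchmarks above are meant to expose.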

Example output from the script above (run on a GeForce GTX 1080 Ti):

Activating environment at `~/.julia/dev/TensorCast/Project.toml`
T = Array
transpose (2D)
  457.101 μs (7 allocations: 240 bytes)
  474.047 μs (10 allocations: 78.45 KiB)
permute (3D)
  102.921 ms (7 allocations: 256 bytes)
  99.391 ms (20 allocations: 7.63 MiB)
adjoint (2D)
  427.734 μs (7 allocations: 240 bytes)
  482.036 μs (9 allocations: 78.44 KiB)
view (2D)
  190.244 μs (6 allocations: 144 bytes)
  56.392 μs (10 allocations: 13.58 KiB)

T = CuArray
transpose (2D)
  103.190 μs (64 allocations: 3.28 KiB)
  143.827 μs (65 allocations: 3.59 KiB)
permute (3D)
  7.104 ms (64 allocations: 3.53 KiB)
  11.389 ms (99 allocations: 7.28 KiB)
adjoint (2D)
  89.853 μs (63 allocations: 3.27 KiB)
  145.562 μs (65 allocations: 3.59 KiB)
view (2D)
  38.042 μs (86 allocations: 7.20 KiB)
  38.880 μs (88 allocations: 7.30 KiB)
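
To compare the two back ends, the CPU/GPU ratio can be computed directly from the numbers above. The snippet below is a small sketch that uses only the first timing of each pair, copied from the example output and converted to microseconds; the names cpu_us, gpu_us, and speedups are made up for this illustration, and the ratios apply only to this one run on this hardware.

```julia
# First timing of each pair from the example output, in microseconds.
cpu_us = Dict("transpose" => 457.101, "permute" => 102.921e3,
              "adjoint"   => 427.734, "view"    => 190.244)
gpu_us = Dict("transpose" => 103.190, "permute" => 7.104e3,
              "adjoint"   => 89.853,  "view"    => 38.042)

# Ratio of CPU time to GPU time for each benchmark.
speedups = Dict(name => cpu_us[name] / gpu_us[name] for name in keys(cpu_us))

for name in ("transpose", "permute", "adjoint", "view")
    println(rpad(name, 10), round(speedups[name]; digits=1), "x")
end
```

In this run the largest ratio belongs to the "permute (3D)" case, which also operates on by far the largest array (N^4 elements).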