
@Sixzero
Last active November 18, 2022 18:59
Function size vs speed
using BenchmarkTools
using CUDA
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 1.00
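# Kernel: each thread handles one row I and updates the first 10 columns of c.
# @nexprs unrolls the update at macro-expansion time, so the compiled kernel
# grows with the repeat count (presumably the "function size" in the title).
function test_sum(a, b, c)
    I = (blockIdx().x - 1) * blockDim().x + threadIdx().x  # global 1-based thread index
    if I > 1000  # 2 blocks × 512 threads = 1024 threads, but only 1000 rows exist
        return
    end
    Base.Cartesian.@nexprs 10 i -> begin
        c[I, i] += b[I] + a[I] + i
    end
    return
end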
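# Sketch (CPU only, no GPU needed): Base.Cartesian.@nexprs pastes its body N
# times with `i` replaced by the literals 1..N, so test_sum contains 10
# separate `c[I, i] += ...` statements rather than a loop.
# `x_demo` is a hypothetical variable used only for this illustration.
x_demo = 0
Base.Cartesian.@nexprs 4 i -> (x_demo += i)  # expands to x_demo += 1; x_demo += 2; x_demo += 3; x_demo += 4
@show x_demo                                  # prints x_demo = 10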
N = 100
a = CuArray(fill(1.0f0, 1000))        # length-1000 Float32 inputs on the GPU
b = CuArray(fill(1.0f0, 1000))
c = CuArray(fill(1.0f0, 1000, 200));  # 1000×200 output; the kernel touches only columns 1:10
#%%
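# Compile the kernel once for these argument types (the third CuDeviceArray
# parameter, 1, is the global-memory address space); the returned kernel object
# can then be launched repeatedly without going through @cuda each time.
# The first @time below may still include one-time host-side overhead.
_test_sum = cufunction(test_sum, Tuple{CuDeviceArray{Float32,1,1},
                                       CuDeviceArray{Float32,1,1},
                                       CuDeviceArray{Float32,2,1}})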
@time CUDA.@sync _test_sum(a, b, c, threads=512, blocks=2)
@time CUDA.@sync _test_sum(a, b, c, threads=512, blocks=2)
@time CUDA.@sync _test_sum(a, b, c, threads=512, blocks=2)
@time CUDA.@sync _test_sum(a, b, c, threads=512, blocks=2)
@time CUDA.@sync _test_sum(a, b, c, threads=512, blocks=2)
# @cuda compiles (cached after the first call) and launches in one step.
@time CUDA.@sync @cuda threads=512 blocks=2 test_sum(a, b, c)
@time CUDA.@sync @cuda threads=512 blocks=2 test_sum(a, b, c)
@show c[1:1], (N * N / 2 + 200) * 4
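# Sketch (CPU only): a hypothetical host-side reference, `test_sum_cpu!`, that
# mirrors test_sum's arithmetic so the printed c[1:1] can be checked by hand.
# Each launch adds b[I] + a[I] + 1 = 3 to c[I, 1]; the seven launches above
# (five via _test_sum, two via @cuda) give 1 + 7 * 3 = 22.
# `a_h`, `b_h`, `c_h` are hypothetical host copies so the GPU arrays are untouched.
function test_sum_cpu!(a, b, c)
    for I in 1:min(1000, length(a))  # same bound as the `I > 1000` guard
        Base.Cartesian.@nexprs 10 i -> begin
            c[I, i] += b[I] + a[I] + i
        end
    end
    return c
end
a_h = fill(1.0f0, 1000)
b_h = fill(1.0f0, 1000)
c_h = fill(1.0f0, 1000, 200)
for _ in 1:7
    test_sum_cpu!(a_h, b_h, c_h)
end
@show c_h[1:1]  # prints c_h[1:1] = Float32[22.0]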
# (@benchmark CUDA.@sync @cuda threads = 512 blocks = 2 test_sum($a, $b, $c)) |> display
(@benchmark CUDA.@sync _test_sum($a, $b, $c, threads=512, blocks=2)) |> display
;