Last active May 30, 2018
Benchmarks for Separable Convolutions in NNlib.jl

NOTE: All times are in seconds and cover only the forward pass. Each reported time is the mean over ten iterations.
NOTE: A separable convolution here means a depthwise convolution followed by a pointwise convolution. The reverse order is also a popular alternative, but it might end up taking more time.
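
A rough multiply-accumulate (MAC) count helps show why the depthwise-then-pointwise ordering is expected to be cheaper than a standard convolution, and why the reverse ordering can cost more when the channel count grows. The sketch below is in Python rather than Julia. It assumes a 3x3 kernel, stride 1, and "same" padding. The kernel size is not stated in these notes, so it and the function names are hypothetical.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_out * c_in * k * k


def separable_macs(h, w, c_in, c_out, k=3, depth_multiplier=1):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    c_mid = c_in * depth_multiplier
    depthwise = h * w * c_mid * k * k   # one k x k filter per intermediate channel
    pointwise = h * w * c_mid * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def pointwise_first_macs(h, w, c_in, c_out, k=3):
    """MACs for the reverse order: 1 x 1 pointwise first, then depthwise k x k."""
    pointwise = h * w * c_in * c_out    # channels are expanded to c_out first
    depthwise = h * w * c_out * k * k   # depthwise then runs on c_out channels
    return pointwise + depthwise


if __name__ == "__main__":
    # Per-output-pixel counts for a 32 -> 64 channel layer (assumed 3x3 kernel).
    for name, fn in [("standard", conv_macs),
                     ("depthwise -> pointwise", separable_macs),
                     ("pointwise -> depthwise", pointwise_first_macs)]:
        print(name, fn(1, 1, 32, 64))
```

Under these assumptions the 32-to-64 channel case costs 18432 MACs per output pixel for the standard convolution, 2336 for depthwise-then-pointwise, and 2624 for the reverse order. Operation counts are only a rough guide: the measured times in the table differ by far less than these counts would suggest.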
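
| Input Dimensions | Output Dimensions | Separable Convolutions (s) | Convolutions (s) |
|---|---|---|---|
| 32x32x3x1 | 32x32x32x1 | 0.000673 | 0.000910 |
| 32x32x3x1 | 32x32x64x1 | 0.001390 | 0.001698 |
| 32x32x3x1 | 32x32x128x1 | 0.001609 | 0.002453 |
| 32x32x3x1 | 32x32x256x1 | 0.002118 | 0.003738 |
| 224x224x3x1 | 224x224x32x1 | 0.051545 | 0.059246 |
| 224x224x3x1 | 224x224x64x1 | 0.055120 | 0.068597 |
| 224x224x3x1 | 224x224x128x1 | 0.070255 | 0.092127 |
| 224x224x3x1 | 224x224x256x1 | 0.108135 | 0.162596 |
| 224x224x32x1 | 224x224x64x1 | 0.309428 | 0.368941 |
| 224x224x32x1 | 224x224x128x1 | 0.323922 | 0.391956 |
| 224x224x32x1 | 224x224x256x1 | 0.367167 | 0.406467 |
| 512x512x3x1 | 512x512x32x1 | 0.100630 | 0.113811 |
| 512x512x3x1 | 512x512x64x1 | 0.181873 | 0.247644 |
| 512x512x3x1 | 512x512x128x1 | 0.196968 | 0.260188 |
| 512x512x3x1 | 512x512x256x1 | 0.361108 | 0.545992 |
| 512x512x32x1 | 512x512x64x1 | 1.258738 | 1.386912 |
| 512x512x32x1 | 512x512x128x1 | 1.311868 | 1.579326 |
| 512x512x32x1 | 512x512x256x1 | 1.443842 | 2.055894 |
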
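
The ratio of the two timing columns makes the table easier to compare across rows. The sketch below, in Python for convenience, copies the reported times and computes the speedup of the separable path over the standard convolution for each row. The `rows` list and the `speedup` helper are illustrative names, not part of the benchmark code.

```python
# (input dims, output dims, separable time in s, standard convolution time in s),
# copied from the table above.
rows = [
    ("32x32x3x1", "32x32x32x1", 0.000673, 0.000910),
    ("32x32x3x1", "32x32x64x1", 0.001390, 0.001698),
    ("32x32x3x1", "32x32x128x1", 0.001609, 0.002453),
    ("32x32x3x1", "32x32x256x1", 0.002118, 0.003738),
    ("224x224x3x1", "224x224x32x1", 0.051545, 0.059246),
    ("224x224x3x1", "224x224x64x1", 0.055120, 0.068597),
    ("224x224x3x1", "224x224x128x1", 0.070255, 0.092127),
    ("224x224x3x1", "224x224x256x1", 0.108135, 0.162596),
    ("224x224x32x1", "224x224x64x1", 0.309428, 0.368941),
    ("224x224x32x1", "224x224x128x1", 0.323922, 0.391956),
    ("224x224x32x1", "224x224x256x1", 0.367167, 0.406467),
    ("512x512x3x1", "512x512x32x1", 0.100630, 0.113811),
    ("512x512x3x1", "512x512x64x1", 0.181873, 0.247644),
    ("512x512x3x1", "512x512x128x1", 0.196968, 0.260188),
    ("512x512x3x1", "512x512x256x1", 0.361108, 0.545992),
    ("512x512x32x1", "512x512x64x1", 1.258738, 1.386912),
    ("512x512x32x1", "512x512x128x1", 1.311868, 1.579326),
    ("512x512x32x1", "512x512x256x1", 1.443842, 2.055894),
]


def speedup(separable_s, standard_s):
    """How many times faster the separable path is than the standard one."""
    return standard_s / separable_s


speedups = [(inp, out, speedup(sep, std)) for inp, out, sep, std in rows]

if __name__ == "__main__":
    for inp, out, ratio in speedups:
        print(f"{inp:>14} -> {out:<14} {ratio:.2f}x")
```

On these numbers the separable path is faster in every row, with ratios ranging from roughly 1.1x to 1.8x.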
NOTE: The benchmarks call depthwiseconv2d! and conv2d! with all arrays preallocated. The depth multiplier is kept at 1 in every case, since that is the most popular use case.
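
The measurement setup described in these notes (preallocated outputs, depth multiplier 1, mean over ten forward passes) can be sketched as follows. This is a NumPy illustration, not the NNlib.jl code that produced the numbers. The function names, the 3x3 kernel, the (H, W, C) layout without a batch dimension, and the use of cross-correlation (no kernel flip) are all assumptions of the sketch. Only the output buffers are preallocated here; NumPy still creates temporaries internally.

```python
import time

import numpy as np


def depthwise_conv2d_(out, x, w):
    """Depthwise 'same' convolution with depth multiplier 1, written into `out`.

    x, out: (H, W, C); w: (k, k, C) with k odd. Stride 1, zero padding.
    """
    k = w.shape[0]
    p = k // 2
    height, width, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out.fill(0.0)
    for i in range(k):
        for j in range(k):
            # Each channel is scaled by its own filter tap; channels never mix.
            out += xp[i:i + height, j:j + width, :] * w[i, j, :]
    return out


def pointwise_conv2d_(out, x, w):
    """1x1 convolution: x (H, W, C_in) times w (C_in, C_out), written into `out`."""
    np.matmul(x, w, out=out)
    return out


def mean_time(fn, iterations=10):
    """Mean wall-clock seconds per call over `iterations` calls."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


# Smallest configuration from the table: 32x32x3 -> 32x32x32 (kernel size assumed).
height, width, c_in, c_out, k = 32, 32, 3, 32, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((height, width, c_in))
w_depth = rng.standard_normal((k, k, c_in))
w_point = rng.standard_normal((c_in, c_out))

# Preallocate the intermediate and final outputs once, outside the timed region.
mid = np.empty((height, width, c_in))
y = np.empty((height, width, c_out))


def forward():
    depthwise_conv2d_(mid, x, w_depth)   # spatial filtering, per channel
    pointwise_conv2d_(y, mid, w_point)   # channel mixing


print("mean forward time (s):", mean_time(forward))
```

Timings from this sketch are not comparable to the table, since the implementation, language, and memory layout all differ.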

```
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i3-6006U CPU @ 2.00GHz
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
```