@pentschev
Created April 27, 2015 18:47
FFT Benchmarks Comparing In-place and Out-of-place Performance on FFTW, cuFFT and clFFT

Description

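Each benchmark transforms 10 batches of 2-dimensional, single-precision complex matrices, with sizes ranging from 128x128 to 4096x4096. The CPU results stop at 2048x2048. Every timing below is the total for 10 runs of a complex-to-complex (C2C) FFT, in milliseconds.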

CUDA Results

NVIDIA Tesla K20

Matrix dimensions: 128x128
In-place C2C FFT time for 10 runs: 0.538662 ms
Out-of-place C2C FFT time for 10 runs: 0.47247 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 0.693121 ms

Matrix dimensions: 256x256
In-place C2C FFT time for 10 runs: 1.57807 ms
Out-of-place C2C FFT time for 10 runs: 1.57037 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 2.22042 ms

Matrix dimensions: 512x512
In-place C2C FFT time for 10 runs: 7.67981 ms
Out-of-place C2C FFT time for 10 runs: 7.6587 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 10.1425 ms

Matrix dimensions: 1024x1024
In-place C2C FFT time for 10 runs: 30.556 ms
Out-of-place C2C FFT time for 10 runs: 30.5772 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 40.4102 ms

Matrix dimensions: 2048x2048
In-place C2C FFT time for 10 runs: 121.884 ms
Out-of-place C2C FFT time for 10 runs: 121.994 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 161.105 ms

Matrix dimensions: 4096x4096
In-place C2C FFT time for 10 runs: 523.779 ms
Out-of-place C2C FFT time for 10 runs: 523.995 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 681.312 ms
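The cost of the extra buffer copy is easier to judge as a percentage of the plain out-of-place time. The following is a minimal sketch that does this arithmetic on the CUDA numbers above. The dictionary name `cuda_k20` and the helper `copy_overhead_percent` are made up for illustration and are not part of the original benchmark code.

```python
# Times in ms for 10 runs, copied from the CUDA results on the Tesla K20.
# Each entry: matrix side length -> (out-of-place, buffer copy + out-of-place).
cuda_k20 = {
    128: (0.47247, 0.693121),
    256: (1.57037, 2.22042),
    512: (7.6587, 10.1425),
    1024: (30.5772, 40.4102),
    2048: (121.994, 161.105),
    4096: (523.995, 681.312),
}


def copy_overhead_percent(out_of_place_ms, copy_plus_out_of_place_ms):
    """Extra time of the copy variant relative to plain out-of-place, in percent."""
    return 100.0 * (copy_plus_out_of_place_ms - out_of_place_ms) / out_of_place_ms


# Relative overhead of the buffer copy for every matrix size.
overheads = {n: copy_overhead_percent(*times) for n, times in cuda_k20.items()}

for n, pct in overheads.items():
    print(f"{n}x{n}: +{pct:.1f}%")
```

On these numbers the relative overhead shrinks as the matrices grow, from roughly 47% at 128x128 to roughly 30% at 4096x4096.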

OpenCL Results

NVIDIA Tesla K20

Matrix dimensions: 128x128
In-place C2C FFT time for 10 runs: 1.17162 ms
Out-of-place C2C FFT time for 10 runs: 1.14539 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 1.15534 ms

Matrix dimensions: 256x256
In-place C2C FFT time for 10 runs: 4.63034 ms
Out-of-place C2C FFT time for 10 runs: 4.59747 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 5.03566 ms

Matrix dimensions: 512x512
In-place C2C FFT time for 10 runs: 17.0226 ms
Out-of-place C2C FFT time for 10 runs: 17.0692 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 19.2896 ms

Matrix dimensions: 1024x1024
In-place C2C FFT time for 10 runs: 115.695 ms
Out-of-place C2C FFT time for 10 runs: 115.745 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 125.27 ms

Matrix dimensions: 2048x2048
In-place C2C FFT time for 10 runs: 550.308 ms
Out-of-place C2C FFT time for 10 runs: 550.103 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 589.519 ms

Matrix dimensions: 4096x4096
In-place C2C FFT time for 10 runs: 4183.45 ms
Out-of-place C2C FFT time for 10 runs: 4184.5 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 4335.56 ms
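Since both the CUDA and the OpenCL runs were taken on the same Tesla K20, the in-place timings can be compared directly. The sketch below computes how many times longer the OpenCL run took at each size. The variable names are illustrative only, and the comparison assumes both runs processed identical workloads, which the source does not state explicitly.

```python
# In-place times in ms for 10 runs on the Tesla K20, copied from the
# CUDA and OpenCL results above. Keys are the matrix side length.
cuda_in_place = {
    128: 0.538662, 256: 1.57807, 512: 7.67981,
    1024: 30.556, 2048: 121.884, 4096: 523.779,
}
opencl_in_place = {
    128: 1.17162, 256: 4.63034, 512: 17.0226,
    1024: 115.695, 2048: 550.308, 4096: 4183.45,
}

# Ratio > 1 means the OpenCL run was slower than the CUDA run.
slowdown = {n: opencl_in_place[n] / cuda_in_place[n] for n in cuda_in_place}

for n, ratio in slowdown.items():
    print(f"{n}x{n}: OpenCL / CUDA = {ratio:.2f}x")
```

On these numbers the OpenCL run is slower at every size, by a factor between roughly 2x and 8x, with the largest gap at 4096x4096.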

Intel CPU @ 2.6 GHz, 32 cores, unknown processor type (probably an engineering sample)

Matrix dimensions: 128x128
In-place C2C FFT time for 10 runs: 560.412 ms
Out-of-place C2C FFT time for 10 runs: 519.319 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 423.556 ms

Matrix dimensions: 256x256
In-place C2C FFT time for 10 runs: 3644.53 ms
Out-of-place C2C FFT time for 10 runs: 3531.74 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 3595.32 ms

Matrix dimensions: 512x512
In-place C2C FFT time for 10 runs: 7494.55 ms
Out-of-place C2C FFT time for 10 runs: 7497.52 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 7627.94 ms

Matrix dimensions: 1024x1024
In-place C2C FFT time for 10 runs: 46827 ms
Out-of-place C2C FFT time for 10 runs: 46500.3 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 46336.7 ms

Matrix dimensions: 2048x2048
In-place C2C FFT time for 10 runs: 779622 ms
Out-of-place C2C FFT time for 10 runs: 778734 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 778423 ms

Single-Threaded CPU Results

Intel CPU @ 2.6 GHz, 32 cores, unknown processor type (probably an engineering sample)

Matrix dimensions: 128x128
In-place C2C FFT time for 10 runs: 45.5113 ms
Out-of-place C2C FFT time for 10 runs: 40.2385 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 50.8627 ms

Matrix dimensions: 256x256
In-place C2C FFT time for 10 runs: 217.364 ms
Out-of-place C2C FFT time for 10 runs: 208.23 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 250.482 ms

Matrix dimensions: 512x512
In-place C2C FFT time for 10 runs: 1013.16 ms
Out-of-place C2C FFT time for 10 runs: 1001.12 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 1181.05 ms

Matrix dimensions: 1024x1024
In-place C2C FFT time for 10 runs: 4766.51 ms
Out-of-place C2C FFT time for 10 runs: 5092.67 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 5761.17 ms

Matrix dimensions: 2048x2048
In-place C2C FFT time for 10 runs: 56153.4 ms
Out-of-place C2C FFT time for 10 runs: 54032 ms
Buffer Copy + Out-of-place C2C FFT time for 10 runs: 72128.2 ms
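One way to sanity-check how these timings scale is to compare the measured growth per doubling of the side length against the textbook cost model for a 2D FFT of an N x N matrix, which is proportional to N^2 log N. Doubling N then multiplies the cost by 4 * log2(2N) / log2(N). The sketch below applies this to the single-threaded in-place timings. The model is an assumption that ignores cache and memory effects, and the names `expected_growth` and `growth` are illustrative only.

```python
import math

# Single-threaded CPU in-place times in ms for 10 runs, copied from above.
# Each entry: (matrix side length, time in ms).
single_thread_in_place = [
    (128, 45.5113),
    (256, 217.364),
    (512, 1013.16),
    (1024, 4766.51),
    (2048, 56153.4),
]


def expected_growth(n):
    """Growth factor of N^2 * log(N) when the side length doubles from n to 2n."""
    return 4.0 * math.log2(2 * n) / math.log2(n)


# One tuple per doubling step: (from size, to size, observed factor, model factor).
growth = []
for (n, t), (n2, t2) in zip(single_thread_in_place, single_thread_in_place[1:]):
    growth.append((n, n2, t2 / t, expected_growth(n)))

for n, n2, observed, expected in growth:
    print(f"{n} -> {n2}: observed {observed:.2f}x, model {expected:.2f}x")
```

On these numbers the first three doublings stay within about 6% of the model (about 4.7x observed against about 4.5x expected), while the step from 1024x1024 to 2048x2048 grows by nearly 12x.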
