Skip to content

Instantly share code, notes, and snippets.

@geggo
Created February 1, 2017 22:49
Show Gist options
  • Save geggo/6bbc597439ffbec596353cf699fb025c to your computer and use it in GitHub Desktop.
Save geggo/6bbc597439ffbec596353cf699fb025c to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@SyamGadde
Copy link

SyamGadde commented Feb 2, 2017

This summarizes the issues nicely. Thanks.

I've been wavering between thinking real in-place should look similar to out-of-place (Option 2, what you currently have in commit geggo/gpyfft@2c07fa8) and real in-place should look similar to complex in-place (Option 1, what I have in SyamGadde/gpyfft@f39da92). But your option seems simpler. I've modified my test script (below) and it works well with your current commit. I think I can incorporate some of my bounds/stride checking to make it more idiot-proof (I include myself among them).

I do think having convenience functions to create the padded arrays would also be very helpful to the user, and a separate enqueue_real function would eliminate some of the ambiguity in usage -- the main enqueue function could still have the full functionality, but the wrapper could remove some of the guesswork (sending real=True appropriately, maybe even returning the correct output view).

Test script:

# works with geggo/gpyfft:2c07fa8e7674757c58646c5f11cf710163bd4fac

import numpy as np
import pyopencl as cl
import pyopencl.array as cla
import random
from gpyfft.fft import FFT
context = cl.create_some_context()
queue = cl.CommandQueue(context) 
in_type = np.float32
out_type = np.complex64

for order in ('F', 'C'):
    for axes in ((0,1), (1,0)):
        fastestdim = int(order == 'C')
        if axes[0] == fastestdim:
            print "=== THESE SHOULD ALL WORK ==="
        else:
            # if real-transform axis is not the fastest dimension then
            # all bets are off.
            continue
        for it in range(20):
            m = random.randrange(2, 12, 2)
            n = random.randrange(2, 12, 2)
            print "order=%s axes=%s (%d x %d) " % (order, axes, m, n),
            shape = [m,n]
            padshape = list(shape)
            if axes[0] == fastestdim:
                # first transformed axis (real-transform) is the fastest dim
                # need (N // 2) + 1 elements total
                padshape[fastestdim] += 2
            else:
                # need 2 * N elements total, but still doesn't work some times
                padshape[fastestdim] *= 2
            outslice = [slice(0, m), slice(0, n)]
            outslice[axes[0]] = slice(0, (outslice[axes[0]].stop // 2) + 1)
            outslice = tuple(outslice)
            x_in = np.zeros(padshape, dtype=in_type, order=order)
            x_in[:m, :n] = np.arange(1, (m * n) + 1, dtype=in_type).reshape([m,n], order=order)
            x_in_gpu = cla.to_device(queue, x_in)
            x_out_gpu = x_in_gpu.view(np.complex64)
            x_sub = x_in_gpu[:m, :n]
            #print "x_in_gpu.shape=%s" % (x_in_gpu.shape,)
            #print "x_in_gpu.strides=%s" % (x_in_gpu.strides,)
            #print "x_out_gpu.shape=%s" % (x_out_gpu.shape,)
            #print "x_sub.shape=%s" % (x_sub.shape,)
            #print "outslice=%s" % (outslice,)
            transform = FFT(context, queue, x_sub, axes=axes, out_array=x_out_gpu)
            event, = transform.enqueue()
            event.wait()
            x_out_gpu = x_in_gpu.view(np.complex64)
            result_host = x_out_gpu.get()[outslice]
            result_numpy = np.fft.rfftn(x_in[:m, :n], axes=axes[::-1])
            #print "x_in (zeros are padding)\n", x_in
            #print "gpyfft: fft(x_in)\n", result_host
            #print "numpy: rfft(x_in)\n", result_numpy
            if not np.allclose(result_host, result_numpy, atol=np.max(x_in)*1e-05):
                print "FAIL FORWARD: max abs diff: %s" % (np.max(np.abs(result_numpy - result_host)),)
                continue
    
            transform = FFT(context, queue, x_out_gpu[outslice], axes=axes, out_array=x_sub, real=True)
            event, = transform.enqueue(forward=False)
            event.wait()
            result_host2 = x_in_gpu.get()[:m, :n]
            #print "x_in[:m,:n]\n", x_in[:m, :n]
            #print "gpyfft: ifft(fft(x_in))\n", result_host2
            if not np.allclose(x_in[:m, :n], result_host2, atol=np.max(x_in)*1e-05):
                print "FAIL REVERSE: max abs diff: %s" % (np.max(np.abs(x_in[:m, :n] - result_host2)),)
                continue

            print "SUCCESS"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment