import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
import torch
from pycuda.compiler import SourceModule

# "Magic line": allocating any CUDA tensor forces torch to initialize its own
# CUDA context so it can coexist with the one pycuda.autoinit created.
x = torch.cuda.FloatTensor(8)

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")

# Wraps a torch CUDA tensor so pycuda can use its device pointer
# directly as a kernel argument, without any copy.
class Holder(drv.PointerHolderBase):
    def __init__(self, t):
        super(Holder, self).__init__()
        self.t = t
        self.gpudata = t.data_ptr()

    def get_pointer(self):
        return self.t.data_ptr()

a = np.random.randn(400).astype(np.float32)
b = np.random.randn(400).astype(np.float32)
a = torch.from_numpy(a).cuda()
b = torch.from_numpy(b).cuda()
dest = torch.Tensor(a.size()).cuda()

# One block of 400 threads; each thread multiplies one element.
multiply_them(
    Holder(dest),
    Holder(a),
    Holder(b),
    block=(400, 1, 1), grid=(1, 1))

torch.cuda.synchronize()
print(dest - a * b)  # should print all zeros
This single gist has been a lifesaver; I'd been using pycuda and torch together for a while without understanding the cryptic CUDA bugs that resulted. I would only add that torch.cuda.init() can be called instead of the "magic line" x = torch.cuda.FloatTensor(8). As far as I can tell, that line simply forces torch to initialize its CUDA context.
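For completeness, a minimal sketch of that substitution (torch.cuda.init() is part of the public torch.cuda API):

import pycuda.autoinit
import torch

torch.cuda.init()  # replaces x = torch.cuda.FloatTensor(8); just initializes torch's CUDA context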
@benjamindkilleen @Emerald01 @WhatAShot
Thanks for pointing out the core issue and providing the insightful discussion. I found that using import pycuda.autoprimaryctx instead of import pycuda.autoinit does the trick, with the help of these links: issue 285 and the pycuda.autoprimaryctx docs. From the documentation:

The module pycuda.autoprimaryctx is similar to pycuda.autoinit, except that it retains the device primary context instead of creating a new context in pycuda.tools.make_default_context().
A modified version of this gist can be found as a fork.
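A sketch of the substitution, assuming a pycuda release recent enough to ship autoprimaryctx (the rest of the gist is unchanged):

import pycuda.autoprimaryctx  # retain the device primary context instead of creating a new one
import torch

torch.cuda.init()

Since torch's CUDA runtime calls also go through the device primary context, both libraries then share one context and the ordering tricks above become unnecessary.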
I have written an easy helper class for multi-dimensional pytorch tensor access here.
What happens if I want to integrate this with a neural network? I want to call it in the forward pass, with some conv2d layers applied before and after. How can I do that, and how do I determine the block size, etc.?
@tjyuyao
@RoyAmoyal You can use pytorch's autograd.Function API and implement the forward and backward passes as separate pycuda kernels.
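A rough sketch of that pattern, reusing the Holder class from the gist above. The element-wise square kernel and all names here are my own example, not code from the thread; 256 threads per block is just a common default, with the grid sized to cover the tensor:

import numpy as np
import torch
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void square_fwd(float *out, const float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] * x[i];
}
__global__ void square_bwd(float *gx, const float *gout, const float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) gx[i] = 2.0f * x[i] * gout[i];
}
""")
square_fwd = mod.get_function("square_fwd")
square_bwd = mod.get_function("square_bwd")

def _launch(kernel, *tensors, n):
    # Cover n elements with 256-thread blocks.
    block = (256, 1, 1)
    grid = ((n + block[0] - 1) // block[0], 1)
    kernel(*[Holder(t) for t in tensors], np.int32(n), block=block, grid=grid)

class PycudaSquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        x = x.contiguous()
        ctx.save_for_backward(x)
        out = torch.empty_like(x)
        _launch(square_fwd, out, x, n=x.numel())
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        gx = torch.empty_like(x)
        _launch(square_bwd, gx, grad_out.contiguous(), x, n=x.numel())
        return gx

It then slots between ordinary layers like any other op, e.g. y = conv2(PycudaSquare.apply(conv1(x))). In the simple case both pycuda launches go to the default stream, which torch also uses, so no extra synchronization is needed.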
@Emerald01
Facing the same issue here; I wish there were an elegant way to convert a GPUArray (pycuda) to a pytorch tensor and vice versa.
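One workaround (not zero-copy, just a device-to-device copy; the shapes and dtypes below are my own example):

import numpy as np
import pycuda.autoprimaryctx
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import torch

torch.cuda.init()

# GPUArray -> torch tensor: allocate a matching tensor, then copy on-device.
ga = gpuarray.to_gpu(np.random.randn(400).astype(np.float32))
t = torch.empty(ga.shape, dtype=torch.float32, device="cuda")
drv.memcpy_dtod(t.data_ptr(), ga.ptr, ga.nbytes)

# torch tensor -> GPUArray: same idea in the other direction.
t2 = torch.randn(400, device="cuda").contiguous()
ga2 = gpuarray.empty(tuple(t2.shape), dtype=np.float32)
drv.memcpy_dtod(ga2.ptr, t2.data_ptr(), t2.numel() * t2.element_size())

A zero-copy view should also be possible by handing the tensor's data_ptr() to a GPUArray as its gpudata, but then you are responsible for keeping the tensor alive for the GPUArray's whole lifetime.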
Has anyone managed to get this working with 2D blocks?
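For what it's worth, 2D launches work the same way; here is a sketch reusing the Holder class from the gist, with my own element-wise add kernel. Map the x thread index to columns and y to rows, bounds-check both, and flatten against the contiguous row-major layout:

import numpy as np
import torch
import pycuda.autoinit
from pycuda.compiler import SourceModule

x = torch.cuda.FloatTensor(8)  # same context-initialization trick as above

mod = SourceModule("""
__global__ void add2d(float *dest, const float *a, const float *b, int rows, int cols)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows && c < cols)
    {
        int i = r * cols + c;  // row-major, matches a contiguous torch tensor
        dest[i] = a[i] + b[i];
    }
}
""")
add2d = mod.get_function("add2d")

rows, cols = 512, 300
a = torch.randn(rows, cols, device="cuda").contiguous()
b = torch.randn(rows, cols, device="cuda").contiguous()
dest = torch.empty_like(a)

block = (16, 16, 1)
grid = ((cols + block[0] - 1) // block[0], (rows + block[1] - 1) // block[1])

add2d(Holder(dest), Holder(a), Holder(b),
      np.int32(rows), np.int32(cols), block=block, grid=grid)
torch.cuda.synchronize()
print((dest - (a + b)).abs().max())  # expect 0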