sonots/gist:bdce8e970227eeb7bb0d29d0fa03452c

Last active February 5, 2018 16:35


nvvp with thrust::reduce


gistfile1.txt

Is it asynchronous or synchronous? How can we run it asynchronously?

sonots commented Jan 6, 2018 (edited)

https://github.com/thrust/thrust/blob/master/examples/sum.cu

int sum = thrust::reduce(d_vec.begin(), d_vec.end(), init, binary_op);
int sum2 = thrust::reduce(d_vec.begin(), d_vec.end(), init, binary_op);

cudaMalloc -> reduce kernel -> cudaDeviceSynchronize -> cudaMemcpyAsync (DtoH) -> cudaFree.

So it looks like thrust::reduce blocks the CPU. Q. Is there any way to run it asynchronously?

sonots commented Jan 6, 2018 (edited)

A. https://github.com/thrust/thrust/blob/master/examples/cuda/async_reduce.cu (cudaStream is extra)

ref. NVIDIA/thrust#827
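For illustration, here is a minimal sketch of one way to keep the calling thread from blocking: run the ordinary, blocking thrust::reduce on a separate host thread with std::async and collect the result through a std::future. As far as I can tell the linked async_reduce.cu includes a variant along these lines, but the code below is my own reduced version, not a copy of it. The input data (1024 ones) and the variable names are assumptions made only for this example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

#include <cstdio>
#include <future>

int main()
{
  // Hypothetical input for this sketch: 1024 ones.
  thrust::device_vector<int> d_vec(1024, 1);

  int init = 0;
  thrust::plus<int> binary_op;

  // Copy the iterators so the lambda captures them by value
  // instead of referring to d_vec itself.
  thrust::device_vector<int>::iterator first = d_vec.begin();
  thrust::device_vector<int>::iterator last  = d_vec.end();

  // thrust::reduce still blocks, but only the helper thread waits on it.
  std::future<int> future_sum = std::async(std::launch::async, [=] {
    return thrust::reduce(first, last, init, binary_op);
  });

  // ... the calling thread is free to do other work here ...

  // get() waits until the helper thread has finished the reduction.
  int sum = future_sum.get();
  std::printf("sum = %d\n", sum);

  return 0;
}
```

Note that this only moves the wait to another host thread. The reduction itself still goes through the same malloc, kernel, synchronize, copy and free steps seen in the profile, and d_vec must stay alive until future_sum.get() returns.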

sonots commented Feb 5, 2018

max_element

cudaMalloc -> kernel -> cudaDeviceSynchronize -> cudaMalloc -> kernel -> cudaMemcpyAsync (DtoH) -> cudaFree -> cudaFree -> cudaMemcpyAsync (DtoH)
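A small sketch of a thrust::max_element call may help when reading this trace. The gist does not show the calling code, so the following is an assumption: thrust::max_element returns an iterator into device memory, and reading the value through that iterator needs its own device-to-host copy, which could plausibly account for the final cudaMemcpyAsync (DtoH) after the cudaFree calls. The four input values are made up for the example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/extrema.h>

#include <cstdio>

int main()
{
  // Hypothetical input for this sketch.
  thrust::device_vector<int> d_vec(4);
  d_vec[0] = 3;
  d_vec[1] = 7;
  d_vec[2] = 1;
  d_vec[3] = 5;

  // The result is an iterator pointing into device memory, not the value.
  thrust::device_vector<int>::iterator it =
      thrust::max_element(d_vec.begin(), d_vec.end());

  // The position is plain iterator arithmetic on the host.
  int index = static_cast<int>(it - d_vec.begin());

  // Dereferencing the device iterator copies one element back to the host.
  int value = *it;

  std::printf("max = %d at index %d\n", value, index);  // max = 7 at index 1

  return 0;
}
```

If only the position is needed, skipping the dereference avoids that last copy on the host side. The synchronization inside max_element itself is unaffected.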
