
@sonots
Last active February 5, 2018 16:35
nvvp with thrust::reduce
Is it asynchronous or synchronous? How can we run it asynchronously?

sonots commented Jan 6, 2018

https://github.com/thrust/thrust/blob/master/examples/sum.cu

```cuda
// Two back-to-back reductions over the same device_vector, based on examples/sum.cu
int sum = thrust::reduce(d_vec.begin(), d_vec.end(), init, binary_op);
int sum2 = thrust::reduce(d_vec.begin(), d_vec.end(), init, binary_op);
```

[nvvp timeline screenshot]

The profile shows: cudaMalloc -> reduce kernel -> cudaDeviceSynchronize -> cudaMemcpyAsync (DtoH) -> cudaFree.

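So it looks like thrust::reduce blocks the CPU. That is expected: it returns its result by value on the host, so it has to wait for the kernel and copy the result back before it can return. Q. Is there any way to run it asynchronously?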
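The thread leaves this open. One common workaround (not something tried here, so treat it as a sketch under assumptions) is to call CUB directly instead of thrust::reduce: cub::DeviceReduce::Sum takes a cudaStream_t and writes the result to device memory, so the host only waits when it actually reads the value. The sketch assumes int data, a plain sum (binary_op = plus, init = 0), and that CUB is available (it ships with the CUDA Toolkit since 11.0). enqueue_sum and sum_1_to_n are hypothetical names. It also allocates CUB's scratch space once up front, because the profile shows cudaMalloc/cudaFree around every call and cudaFree in particular synchronizes the device.

```cuda
#include <cassert>
#include <cstddef>
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// Enqueue a sum of d_in[0..n) on `stream` and return immediately.
// d_temp must already hold temp_bytes of device memory, so this call does no
// cudaMalloc/cudaFree; the result is written to *d_out and stays on the device.
cudaError_t enqueue_sum(void* d_temp, size_t temp_bytes, const int* d_in,
                        int* d_out, int n, cudaStream_t stream) {
  return cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n, stream);
}

// Hypothetical driver: sums 1..n, blocking the host only at the very end.
int sum_1_to_n(int n) {
  // --- setup: these calls may block, but they happen once, before the hot path ---
  thrust::device_vector<int> d_in(n);
  thrust::sequence(d_in.begin(), d_in.end(), 1);  // 1, 2, ..., n
  thrust::device_vector<int> d_out(1);
  const int* in = thrust::raw_pointer_cast(d_in.data());
  int* out = thrust::raw_pointer_cast(d_out.data());

  size_t temp_bytes = 0;
  // With a null temp pointer, CUB only reports how much scratch space it needs.
  cub::DeviceReduce::Sum(nullptr, temp_bytes, in, out, n);
  thrust::device_vector<unsigned char> d_temp(temp_bytes);

  int* h_out = nullptr;
  cudaMallocHost(&h_out, sizeof(int));  // pinned, so the DtoH copy can be async
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cudaDeviceSynchronize();              // make sure all setup work has finished

  // --- hot path: the kernel launch and the copy are only enqueued ---
  cudaError_t err = enqueue_sum(thrust::raw_pointer_cast(d_temp.data()),
                                temp_bytes, in, out, n, stream);
  assert(err == cudaSuccess);
  (void)err;
  cudaMemcpyAsync(h_out, out, sizeof(int), cudaMemcpyDeviceToHost, stream);
  // ... independent host work could run here while the GPU reduces ...
  cudaStreamSynchronize(stream);        // block only when the value is needed

  int result = *h_out;
  cudaStreamDestroy(stream);
  cudaFreeHost(h_out);
  return result;
}
```

For a general binary_op and init, cub::DeviceReduce::Reduce takes the reduction operator and the initial value as extra arguments before the stream; the same query/allocate/enqueue pattern applies.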


sonots commented Jan 6, 2018


sonots commented Feb 5, 2018

thrust::max_element

[nvvp timeline screenshot]

The profile shows: cudaMalloc -> kernel -> cudaDeviceSynchronize -> cudaMalloc -> kernel -> cudaMemcpyAsync (DtoH) -> cudaFree -> cudaFree -> cudaMemcpyAsync (DtoH)
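The same idea may carry over to thrust::max_element: cub::DeviceReduce::ArgMax writes the index and value of the maximum to device memory on a given stream, so the host does not have to wait for it. A minimal sketch, assuming float data and the cub::KeyValuePair<int, T> output form of ArgMax found in CUB 1.x/2.x (newer CCCL releases may deprecate this overload, so check the docs for your version). argmax_async is a hypothetical name.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

// Hypothetical helper: returns the index of the largest element of h_in.
// The argmax runs on its own stream; the host blocks only at cudaStreamSynchronize.
int argmax_async(const std::vector<float>& h_in) {
  assert(!h_in.empty());
  const int n = static_cast<int>(h_in.size());
  thrust::device_vector<float> d_in(h_in.begin(), h_in.end());  // setup HtoD copy
  const float* in = thrust::raw_pointer_cast(d_in.data());

  using Pair = cub::KeyValuePair<int, float>;  // key = index, value = element
  Pair* d_out = nullptr;
  Pair* h_out = nullptr;
  cudaMalloc(&d_out, sizeof(Pair));
  cudaMallocHost(&h_out, sizeof(Pair));        // pinned, so the DtoH copy is async

  // Size query with a null temp pointer, then a single up-front allocation.
  size_t temp_bytes = 0;
  cub::DeviceReduce::ArgMax(nullptr, temp_bytes, in, d_out, n);
  void* d_temp = nullptr;
  cudaMalloc(&d_temp, temp_bytes);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Enqueue the reduction and the copy; neither call waits for the GPU.
  cub::DeviceReduce::ArgMax(d_temp, temp_bytes, in, d_out, n, stream);
  cudaMemcpyAsync(h_out, d_out, sizeof(Pair), cudaMemcpyDeviceToHost, stream);
  // ... independent host work could run here ...
  cudaStreamSynchronize(stream);               // wait only when the index is needed

  int index = h_out->key;
  cudaStreamDestroy(stream);
  cudaFree(d_temp);
  cudaFreeHost(h_out);
  cudaFree(d_out);
  return index;
}
```

For example, argmax_async({3.f, 7.f, 1.f, 5.f}) should return 1. When several elements tie for the maximum, CUB documents that the smallest index is returned, which matches the first-occurrence behavior of thrust::max_element.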
