kaushikcfd/PyOpenCL_threads.md

## PyOpenCL_threads.md

I was trying to understand the benefits of changing the local work-group size.
Here are the results for a simple program that takes a random array, doubles it, and returns it. It also records the time needed for the whole operation.
The following plot shows the results for the various devices.
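Below is a minimal sketch of such a benchmark in PyOpenCL. It assumes the kernel is a plain elementwise doubling and that "the whole operation" means upload, kernel launch and download. The kernel name `double_it` and the helper `candidate_local_sizes` are hypothetical names chosen for illustration. The script loops over every platform and device PyOpenCL can see and prints each platform's name, which also shows which runtime (for example PoCL or Intel's) produced a given measurement. It then tries power-of-two local sizes that divide the global size.

```python
import time

# OpenCL C kernel (hypothetical name): work-item i doubles element i.
KERNEL_SRC = """
__kernel void double_it(__global const float *a, __global float *out)
{
    size_t gid = get_global_id(0);
    out[gid] = 2.0f * a[gid];
}
"""


def candidate_local_sizes(global_size, max_local_size):
    """Hypothetical helper: powers of two that divide global_size and fit.

    OpenCL 1.x requires the global size to be a multiple of the local size.
    """
    sizes = []
    size = 1
    while size <= max_local_size:
        if global_size % size == 0:
            sizes.append(size)
        size *= 2
    return sizes


def benchmark(n=2**20):
    # Imported here so the pure-Python helper above works without them.
    import numpy as np
    import pyopencl as cl

    a = np.random.rand(n).astype(np.float32)
    out = np.empty_like(a)
    mf = cl.mem_flags

    for platform in cl.get_platforms():
        for device in platform.get_devices():
            ctx = cl.Context([device])
            queue = cl.CommandQueue(ctx)
            knl = cl.Program(ctx, KERNEL_SRC).build().double_it
            # The per-kernel limit can be lower than device.max_work_group_size.
            max_local = knl.get_work_group_info(
                cl.kernel_work_group_info.WORK_GROUP_SIZE, device)

            print(f"{platform.name} / {device.name}")
            for local in candidate_local_sizes(n, max_local):
                start = time.perf_counter()
                # Time the whole operation: upload, kernel, download.
                a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
                out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
                knl(queue, (n,), (local,), a_buf, out_buf)
                cl.enqueue_copy(queue, out, out_buf)  # blocking by default
                queue.finish()
                elapsed = time.perf_counter() - start
                assert np.allclose(out, 2 * a)
                print(f"  local size {local:5d}: {elapsed * 1e3:8.3f} ms")


if __name__ == "__main__":
    benchmark()
```

Run it as a script. The first measurement on each device may include one-time setup cost, so a warm-up launch before the timed loop would likely make the numbers more comparable.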

There are a few things I don't understand:

1. What exactly is a thread?
2. Why does the speedup increase as the number of threads increases on the CPU?
3. What is the difference between PoCL and Intel? I think one of them is Intel's own implementation, and PoCL is an implementation of OpenCL for CPUs.
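In OpenCL terms, a "thread" is a work-item: one execution of the kernel. Work-items are grouped into work-groups of `local_size` items, and for a 1-D range with no global offset each work-item's indices satisfy `get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)`. Work-items in the same work-group can share `__local` memory and synchronize with `barrier()`. The pure-Python sketch below only enumerates that index mapping; `work_item_ids` is a hypothetical helper, not part of PyOpenCL.

```python
def work_item_ids(global_size, local_size):
    """Hypothetical helper: list (global_id, group_id, local_id) triples.

    Mirrors get_global_id(0), get_group_id(0) and get_local_id(0) for a
    1-D NDRange, assuming no global offset.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    ids = []
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            ids.append((global_id, group_id, local_id))
    return ids


for gid, grp, lid in work_item_ids(8, 4):
    print(f"global {gid} = group {grp} * 4 + local {lid}")
```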

Finally, I don't know how to abort a program that is running on a GPU.