#include <stdexcept>

namespace linalg
{
// Always return CPU T type???
template <class T>
T dot(SGVector<T> a, SGVector<T> b)
{
#ifdef ENABLE_GPU // cmake -DENABLE_GPU
#ifdef HAVE_VIENNACL
    if (a.onGPU() && b.onGPU())
        // May need transfer back to CPU.
        // But the back transfer cannot be explicit???
        return viennacl::linalg::inner_prod(*a.GPUptr, *b.GPUptr);
    else
        throw std::runtime_error("dot: both vectors must be on the GPU");
#else
    throw std::runtime_error("dot: ENABLE_GPU is set but ViennaCL is not available");
#endif
#else // Eigen3
    typedef Eigen::Matrix<T, Eigen::Dynamic, 1> VectorXt;
    Eigen::Map<VectorXt> vec_a = a;
    Eigen::Map<VectorXt> vec_b = b;
    return vec_a.dot(vec_b);
#endif
}
}
#include <memory>
#include <stdexcept>

template <class T>
class SGVector : public SGReferencedData
{
// If ViennaCL is available: declare a pointer to GPU memory,
// so only one GPU object is created for one SGVector.
#ifdef HAVE_VIENNACL
    typedef viennacl::backend::mem_handle VCLMemoryArray;

public:
    std::shared_ptr<VCLMemoryArray> GPUptr;

    bool onGPU() const
    {
        return GPUptr != nullptr;
    }
// Cannot do memory transfer without ViennaCL
#else
public:
    bool onGPU() const
    {
        throw std::runtime_error("GPU support requires ViennaCL");
    }
#endif

public:
    // Transfer method. Cannot be static because of GPUptr?
    void to_GPU()
    {
#ifdef HAVE_VIENNACL
        if (onGPU())
            return;
        try
        {
            // Allocate the handle before creating device memory through it.
            GPUptr = std::make_shared<VCLMemoryArray>();
            viennacl::backend::memory_create(*GPUptr, sizeof(T) * vlen, viennacl::context());
            viennacl::backend::memory_write(*GPUptr, 0, sizeof(T) * vlen, vector);
        }
        catch (...)
        {
            GPUptr.reset();
            throw std::runtime_error("transfer to GPU failed");
        }
#else
        throw std::runtime_error("GPU support requires ViennaCL");
#endif
    }
};
@OXPHOS OK, so the compile-time macros in the linalg case are not something we want. The problem is that this still follows the old approach, whereas you want to go with what @sanuj is creating, i.e. dynamic libraries.
So what you actually want for the linalg part is:
template <class T>
T dot(SGVector<T> a, SGVector<T> b)
{
    if (a.onGPU() && b.onGPU()) {
        if (this->hasGPUBackend()) {
            // do the GPU backend dot product
            // you shouldn't care whether it's ViennaCL or some other GPU backend.
            return this->gpu_backend->dot(*a.GPUptr, *b.GPUptr);
        } else {
            // either throw a RuntimeException or transfer the data back to the CPU
            throw RuntimeException("user did not register GPU backend");
        }
    } else {
        // take care that both operands are on the same backend
        // do the non-GPU default backend:
        // this should actually be implemented in a separate class's function as well, and just called here,
        // like:
        return this->cpu_backend->dot(a.CPUptr, b.CPUptr);
    }
}
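To make the runtime-dispatch idea above concrete, here is a minimal self-contained sketch. Everything in it is an assumption for illustration: Vec stands in for SGVector, and Backend, CpuBackend, Linalg, register_gpu_backend and has_gpu_backend are hypothetical names, not Shogun API. The point is only that the choice of backend is made through a pointer set at runtime, with no preprocessor flag in the dispatch code.

```cpp
#include <memory>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for SGVector: host data plus a location flag.
struct Vec
{
    std::vector<double> data;
    bool on_gpu;
};

// Hypothetical backend interface; any GPU library could sit behind it.
struct Backend
{
    virtual ~Backend() = default;
    virtual double dot(const Vec& a, const Vec& b) const = 0;
};

// Default CPU backend.
struct CpuBackend : Backend
{
    double dot(const Vec& a, const Vec& b) const override
    {
        if (a.data.size() != b.data.size())
            throw std::invalid_argument("length mismatch");
        return std::inner_product(a.data.begin(), a.data.end(), b.data.begin(), 0.0);
    }
};

class Linalg
{
public:
    // Called at runtime, e.g. after a GPU plugin has been loaded.
    void register_gpu_backend(std::unique_ptr<Backend> backend)
    {
        gpu_ = std::move(backend);
    }

    bool has_gpu_backend() const { return gpu_ != nullptr; }

    double dot(const Vec& a, const Vec& b) const
    {
        if (a.on_gpu != b.on_gpu)
            throw std::runtime_error("operands live on different backends");
        if (a.on_gpu)
        {
            if (!has_gpu_backend())
                throw std::runtime_error("user did not register GPU backend");
            return gpu_->dot(a, b);
        }
        return cpu_.dot(a, b);
    }

private:
    CpuBackend cpu_;
    std::unique_ptr<Backend> gpu_;
};
```

In this sketch a vector flagged as on the GPU fails loudly when no GPU backend was registered; transferring it back to the CPU instead would be the other option mentioned in the comment.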
@karlnapf @vigsterkr Thanks for your comments! I was not completely clear about the difference, but here are some of my thoughts after reading yours:
- Flags for onGPU() in dot.h can be removed easily. But in class SGVector, the GPU pointer and transferring data to the GPU currently require ViennaCL.
  - I can use OpenCL instead?
  - Require hasGPUBackend() to transfer data to the GPU in the first place?
- I cannot see how to completely avoid, e.g., the ifdef HAVE_VIENNACL flag. Otherwise, how would the class/methods know whether there are available backends? I assume this.hasGPUBackend() is still like:
bool hasGPUBackend()
{
#ifdef HAVE_VIENNACL
    return true;
#else
    return false;
#endif
}
One way to do this is to have a base class (which is what appears in the interfaces).
The factory decides at runtime which of the subclasses is returned (based on HAVE_VIENNACL, for example).
In the linalg code, we then check the flag, do a static cast, and voilà, we can access the memory data structure, without having HAVE_VIENNACL in the interface or hard-requiring it at all.
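A rough sketch of the base class, factory and static cast described above, under stated assumptions: GPUMemoryBase, NoGPUMemory, FakeDeviceMemory, make_gpu_memory and gpu_dot are all hypothetical names, and a std::vector plays the role of the device buffer that a ViennaCL-backed subclass would actually own. The availability decision is passed in as a plain bool here; where that value comes from (a build option, a loaded plugin) is left open.

```cpp
#include <memory>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical base class: the only type that appears in public interfaces.
struct GPUMemoryBase
{
    virtual ~GPUMemoryBase() = default;
    virtual bool on_gpu() const = 0;
};

// Fallback when no GPU backend is available: holds no device memory.
struct NoGPUMemory : GPUMemoryBase
{
    bool on_gpu() const override { return false; }
};

// Stand-in for a backend-specific subclass. A std::vector plays the
// device buffer so the sketch runs without any GPU library.
struct FakeDeviceMemory : GPUMemoryBase
{
    std::vector<double> device_buffer;

    explicit FakeDeviceMemory(const std::vector<double>& host)
        : device_buffer(host)
    {
    }

    bool on_gpu() const override { return true; }
};

// Factory: picks the subclass. backend_available is an assumed input.
std::unique_ptr<GPUMemoryBase> make_gpu_memory(
    const std::vector<double>& host, bool backend_available)
{
    if (backend_available)
        return std::unique_ptr<GPUMemoryBase>(new FakeDeviceMemory(host));
    return std::unique_ptr<GPUMemoryBase>(new NoGPUMemory());
}

// Backend-side code: check the flag first, then downcast to reach the buffer.
double gpu_dot(const GPUMemoryBase& a, const GPUMemoryBase& b)
{
    if (!a.on_gpu() || !b.on_gpu())
        throw std::runtime_error("both operands must be on the GPU");
    const auto& da = static_cast<const FakeDeviceMemory&>(a);
    const auto& db = static_cast<const FakeDeviceMemory&>(b);
    if (da.device_buffer.size() != db.device_buffer.size())
        throw std::invalid_argument("length mismatch");
    return std::inner_product(da.device_buffer.begin(), da.device_buffer.end(),
                              db.device_buffer.begin(), 0.0);
}
```

The static cast is only safe because on_gpu() is checked first and, in this sketch, exactly one subclass reports true. With several GPU subclasses the flag would have to identify the concrete type.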
I guess that's the only way it works as we expect it to. I'd suggest just one tiny thing: maybe we don't want to mess with the SGVector class. Maybe have another interface, keeping SGVector untouched. Provide a few constructors and things work fine and dandy. The SGMatrix thing can then work with the linear operator interface that we already have. We'll just have another subclass that does the GPU matrix thing.
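A very small sketch of what "another interface, keeping SGVector untouched" could look like. CPUVector and GPUVector are hypothetical names; CPUVector is just a std::vector standing in for SGVector, and the "device" storage is ordinary host memory so the example runs anywhere.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the untouched CPU vector type (SGVector in this thread).
using CPUVector = std::vector<double>;

// Hypothetical separate GPU vector type with its own constructors.
class GPUVector
{
public:
    // Construct from host data: this is where the upload would happen.
    explicit GPUVector(const CPUVector& host) : device_(host) {}

    // Explicit transfer back to the host.
    CPUVector to_cpu() const { return device_; }

    std::size_t size() const { return device_.size(); }

private:
    std::vector<double> device_; // placeholder for real device memory
};
```

With this shape every transfer is visible at the call site as a constructor call or a to_cpu() call, and the CPU type needs no GPU members at all.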
I don't think this really resembles what we had discussed on IRC. Remember, we wanted to avoid statically linked calls (that is, calls decided by the preprocessor).
@lambday and @vigsterkr let's comment a bit.