nudles/readme.md

To capture changing scalar values, we need to store them in Tensors so that the graph records them.


If we store the value in GPU memory like other Tensors, we need a cudaMemcpy to read the scalar value back. However, this cudaMemcpy may run in parallel with another operator that takes the scalar value as input, so that operator may read the value before the copy has finished.
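The hazard can be sketched with a toy model in plain Python. Nothing here touches a real GPU: `device`, `host`, `memcpy_d2h`, and `run` are hypothetical stand-ins, and the two possible interleavings are replayed explicitly instead of relying on real concurrency.

```python
# Toy model only: plain dicts stand in for device and host memory.
device = {"alpha": 2.0}   # the scalar, stored in a Tensor on the GPU
host = {"a": 0.0}         # host-side value, stale until the copy finishes


def memcpy_d2h():
    """Stand-in for cudaMemcpy: copy alpha from device to host."""
    host["a"] = device["alpha"]


def axpy(x, y):
    """Compute a * x + y using whatever value `a` holds right now."""
    a = host["a"]
    return [a * xi + yi for xi, yi in zip(x, y)]


def run(order):
    """Replay the two operators in the given order and return axpy's result."""
    host["a"] = 0.0  # reset to the stale value before each replay
    x, y = [1.0, 2.0], [10.0, 20.0]
    result = None
    for op in order:
        if op == "memcpy":
            memcpy_d2h()
        else:
            result = axpy(x, y)
    return result


good = run(["memcpy", "axpy"])  # copy finishes before axpy reads a
bad = run(["axpy", "memcpy"])   # axpy reads a before the copy lands
```

With the copy first, axpy sees alpha = 2.0; in the other order it silently uses the stale 0.0, which is why the two operators must be ordered.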


For example, axpy takes a scalar argument such as a float: axpy(float alpha, ...).
If alpha is stored in a Tensor, we need to read its value from GPU memory and pass it to axpy.
alpha = Tensor()
axpy(alpha, ...)  # Python code

axpy(Tensor alpha, ...)  # C++ code
   submit a cudaMemcpy operator to read alpha's value into a
   submit the axpy operator/kernel and pass a into it
   manually add a dependency of the axpy operator on the cudaMemcpy operator (needs special handling in the graph)
   
   Or can we just put the cudaMemcpy and the axpy kernel into a single operator,
      which submits cudaMemcpy and cublas_axpy to the same CUDA stream so that they execute in sequence?
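
The single-operator option can be sketched as follows. This is a minimal toy model, not the actual implementation: `Stream`, `Tensor`, and `axpy_op` are hypothetical names, and the stream is modelled as a FIFO queue that runs work in submission order, which is the ordering guarantee the idea relies on.

```python
from collections import deque


class Stream:
    """Toy stand-in for a CUDA stream: work runs in submission order."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, fn):
        self.queue.append(fn)

    def synchronize(self):
        # Drain the queue front to back, like a stream executing its work.
        while self.queue:
            self.queue.popleft()()


class Tensor:
    """Toy tensor: a list of floats standing in for GPU memory."""

    def __init__(self, data):
        self.data = list(data)


def axpy_op(stream, alpha, x, y):
    """One operator that enqueues the copy and the kernel on one stream."""
    scratch = {}

    def copy_alpha():
        # Stand-in for cudaMemcpy: read the scalar out of the tensor.
        scratch["a"] = alpha.data[0]

    def axpy_kernel():
        # Stand-in for cublas_axpy: y <- a * x + y.
        a = scratch["a"]
        y.data = [a * xi + yi for xi, yi in zip(x.data, y.data)]

    stream.enqueue(copy_alpha)   # always runs first
    stream.enqueue(axpy_kernel)  # always runs second, sees the copied value


stream = Stream()
alpha = Tensor([0.5])
x = Tensor([2.0, 4.0])
y = Tensor([1.0, 1.0])

axpy_op(stream, alpha, x, y)  # record the operator
alpha.data[0] = 3.0           # the scalar changes after recording
stream.synchronize()          # execution reads the current value of alpha
```

Because both steps sit in the same queue, no extra dependency edge is needed in the graph, and the value of alpha is read at execution time instead of at recording time. One caveat the toy model ignores: in real CUDA a scalar passed by value is read on the host when the kernel is launched, so the host would still have to wait for the copy, or cuBLAS would have to read alpha from device memory directly (`CUBLAS_POINTER_MODE_DEVICE`).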