- To capture changing scalar values, we need to store them in Tensors so that the graph records them.
- If we store the value in GPU memory like other Tensors, then we need a cudaMemcpy to access the scalar value. However, this cudaMemcpy may run in parallel with the operator that takes the scalar value as input, so that operator could read the value before the copy has finished.
- For example, axpy takes a scalar argument of a type such as float: `axpy(float alpha, ...)`. If alpha is stored in a tensor, then we need to get the value from GPU memory and pass it to axpy.

      alpha = Tensor()
      axpy(alpha, ...)         # python code
      axpy(Tensor alpha, ...)  # cpp code

  Inside the cpp axpy, one option is to:
  - submit a cudaMemcpy operator to get alpha's value into a host variable a
  - submit the axpy operator/kernel and pass a into it
  - add the dependency of the axpy operator on the cudaMemcpy operator manually (needs special processing on the graph)

  Or can we just put the cudaMemcpy and the axpy kernel into a single operator, which puts cudaMemcpy and cublas_axpy into the same CUDA stream to execute them in sequence?
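
The single-operator idea can be sketched in plain Python. This is only a toy model under stated assumptions, not the actual implementation: `ScalarTensor`, `Stream`, and `axpy_op` are hypothetical names, `device_value` stands in for GPU memory, and no CUDA or cuBLAS call is made. It shows that when the copy and the axpy are queued on the same in-order stream, the axpy always sees the value alpha holds at execution time, with no extra dependency added to the graph.

```python
from collections import deque


class ScalarTensor:
    """Hypothetical one-element tensor; device_value stands in for GPU memory."""

    def __init__(self, value):
        self.device_value = float(value)


class Stream:
    """Toy stand-in for a CUDA stream: work items run in submission order."""

    def __init__(self):
        self._queue = deque()

    def submit(self, fn):
        self._queue.append(fn)

    def synchronize(self):
        # Run every queued item, oldest first.
        while self._queue:
            self._queue.popleft()()


def axpy_op(alpha, x, y, stream):
    """Single operator: the scalar copy and the axpy share one stream."""
    host = {}

    def copy_alpha():
        # Stands in for the cudaMemcpy that reads alpha from GPU memory.
        host["a"] = alpha.device_value

    def run_axpy():
        # Stands in for the cuBLAS axpy call: y = a * x + y.
        a = host["a"]
        for i in range(len(y)):
            y[i] += a * x[i]

    # Same stream, so run_axpy cannot start before copy_alpha has finished.
    stream.submit(copy_alpha)
    stream.submit(run_axpy)


alpha = ScalarTensor(2.0)
x = [1.0, 2.0, 3.0]
y = [0.0, 0.0, 0.0]
stream = Stream()

axpy_op(alpha, x, y, stream)   # recorded while alpha is 2.0
alpha.device_value = 0.5       # scalar changes before execution
stream.synchronize()
print(y)
```

Because the operator holds a reference to the tensor rather than a copied float, the update to alpha made after submission is the value the axpy uses. That is the behaviour the first point asks for when capturing changing scalars.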
Last active
June 3, 2020 12:08
Scalar Tensor