@nudles
Last active June 3, 2020 12:08
Scalar Tensor
  • To capture scalar values that change, we need to store them in Tensors so that the graph can record them.

  • If we store the value in GPU memory like other Tensors, then we need a cudaMemcpy to access the scalar value. However, this cudaMemcpy may run in parallel with another operator that takes the scalar value as input, so that operator could read the value before the copy has finished.

  • For example, axpy takes a scalar argument of a type such as float: axpy(float alpha, ...). If alpha is stored in a Tensor, then we need to fetch the value from GPU memory and pass it to axpy.

    alpha = Tensor()
    axpy(alpha, ...)  # Python code
    
    axpy(Tensor alpha, ...)  # C++ code
       submit a cudaMemcpy operator to copy alpha's value into a host variable a
       submit the axpy operator/kernel and pass a into it
       manually add a dependency from the axpy operator to the cudaMemcpy operator (needs special processing on the graph)
       
       Or can we just put cudaMemcpy and the axpy kernel into a single operator,
          which issues cudaMemcpy and cublas_axpy on the same CUDA stream so that they execute in sequence?
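The single-operator idea can be illustrated with a toy model in plain Python. This is only a sketch of the ordering argument, not real CUDA code: `Stream`, `DeviceScalar`, and `fused_axpy` are hypothetical names invented for the illustration, and the "stream" is just an in-order queue standing in for a CUDA stream. The point it shows is that when the copy and the axpy are submitted together by one operator on one stream, no extra dependency edge in the graph is needed.

```python
from collections import deque


class Stream:
    """Toy stand-in for a CUDA stream: work items run in submission order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, fn):
        self._queue.append(fn)

    def synchronize(self):
        # Run everything queued so far, strictly first-in first-out.
        while self._queue:
            self._queue.popleft()()


class DeviceScalar:
    """Toy stand-in for a one-element Tensor stored in GPU memory."""

    def __init__(self, value):
        self.device_value = value


def fused_axpy(stream, alpha, x, y):
    """Hypothetical single operator: copy alpha to the host, then run axpy."""
    host = {}  # plays the role of the host variable `a`

    def copy_alpha():
        # Stands in for cudaMemcpy (device to host).
        host["a"] = alpha.device_value

    def run_axpy():
        # Stands in for the axpy kernel: y = a * x + y.
        a = host["a"]
        for i in range(len(y)):
            y[i] += a * x[i]

    # Both steps go onto the same stream, so the copy always runs first.
    stream.enqueue(copy_alpha)
    stream.enqueue(run_axpy)


stream = Stream()
alpha = DeviceScalar(2.0)
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

fused_axpy(stream, alpha, x, y)
stream.synchronize()
print(y)  # [12.0, 24.0, 36.0]
```

Because the scalar is read from `alpha` each time the operator executes, a later change to the value stored in the Tensor is picked up by the next `fused_axpy` call without rebuilding anything, which is the behaviour the first bullet asks for. The model ignores when a real library reads a host-side scalar, so it says nothing about whether the copy must be synchronous in practice.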