$ nvcc simpleCallback.cu -O2 -o simpleCallback
$ nvprof -f -o simpleCallback.nvvp ./simpleCallback | grep elapsed
No callback: elapsed time = 1.534s
One callback: elapsed time = 1.498s
Two callback: elapsed time = 3.718s
Four callback: elapsed time = 5.194s
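As the number of callbacks increases, the run gets slower: one callback is on par with the baseline, but two callbacks take about 2.4x as long and four about 3.4x.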