~/local/pytorch $ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.663 ms, stddev: 0.036 ms ; time per iteration: 0.003 ms
Profiling enabled , tensor size 1x1
avg. time: 20.641 ms, stddev: 0.643 ms ; time per iteration: 0.081 ms
Profiling disabled, tensor size 32x32
avg. time: 1.256 ms, stddev: 0.055 ms ; time per iteration: 0.005 ms
Profiling enabled , tensor size 32x32
avg. time: 21.430 ms, stddev: 0.626 ms ; time per iteration: 0.084 ms
Profiling disabled, tensor size 128x128
avg. time: 9.208 ms, stddev: 0.424 ms ; time per iteration: 0.036 ms
Profiling enabled , tensor size 128x128
avg. time: 29.457 ms, stddev: 1.884 ms ; time per iteration: 0.115 ms
Profiling disabled, tensor size 256x256
avg. time: 16.352 ms, stddev: 0.346 ms ; time per iteration: 0.064 ms
Profiling enabled , tensor size 256x256
avg. time: 40.883 ms, stddev: 3.686 ms ; time per iteration: 0.160 ms
Profiling disabled, tensor size 512x512
avg. time: 119.053 ms, stddev: 1.903 ms ; time per iteration: 0.465 ms
Profiling enabled , tensor size 512x512
avg. time: 138.641 ms, stddev: 6.631 ms ; time per iteration: 0.542 ms
Payload: parallel; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.287 ms, stddev: 0.054 ms ; time per iteration: 0.001 ms
Profiling enabled , tensor size 1x1
avg. time: 19.083 ms, stddev: 0.748 ms ; time per iteration: 0.075 ms
Profiling disabled, tensor size 32x32
avg. time: 0.492 ms, stddev: 0.118 ms ; time per iteration: 0.002 ms
Profiling enabled , tensor size 32x32
avg. time: 20.473 ms, stddev: 2.186 ms ; time per iteration: 0.080 ms
Profiling disabled, tensor size 128x128
avg. time: 6.842 ms, stddev: 0.951 ms ; time per iteration: 0.027 ms
Profiling enabled , tensor size 128x128
avg. time: 27.044 ms, stddev: 3.586 ms ; time per iteration: 0.106 ms
Profiling disabled, tensor size 256x256
avg. time: 15.606 ms, stddev: 1.528 ms ; time per iteration: 0.061 ms
Profiling enabled , tensor size 256x256
avg. time: 35.512 ms, stddev: 4.111 ms ; time per iteration: 0.139 ms
Profiling disabled, tensor size 512x512
avg. time: 82.051 ms, stddev: 6.189 ms ; time per iteration: 0.321 ms
Profiling enabled , tensor size 512x512
avg. time: 103.030 ms, stddev: 8.640 ms ; time per iteration: 0.402 ms
---
without TLS optimization:
~/local/pytorch $ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.508 ms, stddev: 0.028 ms ; time per iteration: 0.002 ms
Profiling enabled , tensor size 1x1
avg. time: 19.410 ms, stddev: 0.543 ms ; time per iteration: 0.076 ms
Profiling disabled, tensor size 32x32
avg. time: 0.918 ms, stddev: 0.032 ms ; time per iteration: 0.004 ms
Profiling enabled , tensor size 32x32
avg. time: 20.344 ms, stddev: 0.825 ms ; time per iteration: 0.079 ms
Profiling disabled, tensor size 128x128
avg. time: 9.021 ms, stddev: 0.387 ms ; time per iteration: 0.035 ms
Profiling enabled , tensor size 128x128
avg. time: 29.780 ms, stddev: 0.967 ms ; time per iteration: 0.116 ms
Profiling disabled, tensor size 256x256
avg. time: 15.854 ms, stddev: 0.608 ms ; time per iteration: 0.062 ms
Profiling enabled , tensor size 256x256
avg. time: 39.132 ms, stddev: 5.546 ms ; time per iteration: 0.153 ms
Profiling disabled, tensor size 512x512
avg. time: 119.083 ms, stddev: 10.857 ms ; time per iteration: 0.465 ms
Profiling enabled , tensor size 512x512
avg. time: 139.171 ms, stddev: 3.472 ms ; time per iteration: 0.544 ms
Payload: parallel; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.275 ms, stddev: 0.066 ms ; time per iteration: 0.001 ms
Profiling enabled , tensor size 1x1
avg. time: 20.333 ms, stddev: 1.327 ms ; time per iteration: 0.079 ms
Profiling disabled, tensor size 32x32
avg. time: 0.478 ms, stddev: 0.115 ms ; time per iteration: 0.002 ms
Profiling enabled , tensor size 32x32
avg. time: 20.322 ms, stddev: 0.373 ms ; time per iteration: 0.079 ms
Profiling disabled, tensor size 128x128
avg. time: 6.615 ms, stddev: 0.931 ms ; time per iteration: 0.026 ms
Profiling enabled , tensor size 128x128
avg. time: 26.535 ms, stddev: 2.230 ms ; time per iteration: 0.104 ms
Profiling disabled, tensor size 256x256
avg. time: 15.361 ms, stddev: 1.943 ms ; time per iteration: 0.060 ms
Profiling enabled , tensor size 256x256
avg. time: 34.873 ms, stddev: 1.484 ms ; time per iteration: 0.136 ms
Profiling disabled, tensor size 512x512
avg. time: 84.417 ms, stddev: 6.605 ms ; time per iteration: 0.330 ms
Profiling enabled , tensor size 512x512
avg. time: 102.702 ms, stddev: 6.269 ms ; time per iteration: 0.401 ms
---
trunk:
~/local/pytorch $ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.527 ms, stddev: 0.010 ms ; time per iteration: 0.002 ms
Profiling enabled , tensor size 1x1
avg. time: 21.159 ms, stddev: 3.329 ms ; time per iteration: 0.083 ms
Profiling disabled, tensor size 32x32
avg. time: 0.927 ms, stddev: 0.008 ms ; time per iteration: 0.004 ms
Profiling enabled , tensor size 32x32
avg. time: 20.517 ms, stddev: 0.528 ms ; time per iteration: 0.080 ms
Profiling disabled, tensor size 128x128
avg. time: 8.595 ms, stddev: 0.744 ms ; time per iteration: 0.034 ms
Profiling enabled , tensor size 128x128
avg. time: 30.312 ms, stddev: 1.679 ms ; time per iteration: 0.118 ms
Profiling disabled, tensor size 256x256
avg. time: 18.019 ms, stddev: 2.982 ms ; time per iteration: 0.070 ms
Profiling enabled , tensor size 256x256
avg. time: 38.106 ms, stddev: 0.649 ms ; time per iteration: 0.149 ms
Profiling disabled, tensor size 512x512
avg. time: 117.698 ms, stddev: 2.885 ms ; time per iteration: 0.460 ms
Profiling enabled , tensor size 512x512
avg. time: 137.435 ms, stddev: 5.857 ms ; time per iteration: 0.537 ms
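---
The cost of enabling the profiler can be read off each pair of results above as the difference between the "enabled" and "disabled" averages. The following is a minimal sketch of that arithmetic in Python, assuming the log format shown above. The `parse` and `overhead` helpers and the `SAMPLE` string are illustrative names and are not part of `profiler_bench.py`. Dividing by the iteration count assumes that "time per iteration" is the average time divided by the 256 iterations, which is consistent with the printed values.

```python
import re

# A few lines copied from the trunk "loop" run above.
# Everything below is an illustrative sketch, not code from profiler_bench.py.
SAMPLE = """\
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.527 ms, stddev: 0.010 ms ; time per iteration: 0.002 ms
Profiling enabled , tensor size 1x1
avg. time: 21.159 ms, stddev: 3.329 ms ; time per iteration: 0.083 ms
Profiling disabled, tensor size 512x512
avg. time: 117.698 ms, stddev: 2.885 ms ; time per iteration: 0.460 ms
Profiling enabled , tensor size 512x512
avg. time: 137.435 ms, stddev: 5.857 ms ; time per iteration: 0.537 ms
"""

PAYLOAD = re.compile(r"Payload: (\w+); (\d+) iterations")
# The log prints "enabled ," with a space before the comma, hence \s*.
HEADER = re.compile(r"Profiling (enabled|disabled)\s*, tensor size (\d+x\d+)")
TIMING = re.compile(r"avg\. time: ([\d.]+) ms")


def parse(text):
    """Return ({(payload, size, mode): avg_ms}, {payload: iterations})."""
    averages, iterations = {}, {}
    payload = mode = size = None
    for line in text.splitlines():
        line = line.strip()
        m = PAYLOAD.match(line)
        if m:
            payload = m.group(1)
            iterations[payload] = int(m.group(2))
            continue
        m = HEADER.match(line)
        if m:
            mode, size = m.groups()
            continue
        m = TIMING.match(line)
        if m and payload is not None and mode is not None:
            averages[(payload, size, mode)] = float(m.group(1))
    return averages, iterations


def overhead(averages, iterations):
    """Return {(payload, size): (extra_ms, extra_ms_per_iteration)}."""
    out = {}
    for (payload, size, mode), enabled_ms in averages.items():
        if mode != "enabled":
            continue
        disabled_ms = averages.get((payload, size, "disabled"))
        if disabled_ms is None:
            continue  # no baseline to compare against
        extra = enabled_ms - disabled_ms
        out[(payload, size)] = (extra, extra / iterations[payload])
    return out


if __name__ == "__main__":
    averages, iterations = parse(SAMPLE)
    for (payload, size), (extra, per_iter) in overhead(averages, iterations).items():
        print(f"{payload} {size}: +{extra:.3f} ms total, +{per_iter:.4f} ms per iteration")
```

On this trunk sample the added cost is about 20 ms per run at both 1x1 and 512x512 (roughly 0.08 ms per iteration), so the overhead looks close to constant in absolute terms while its relative share shrinks as the tensors grow. Feeding the three full runs above through the same helpers would allow a side-by-side comparison, keeping in mind that the reported standard deviations are of the same order as many of the differences between runs.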