@ilia-cher
Created December 30, 2020 22:26
before:
~/local/pytorch (flops_warnings)$ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop, 256 iterations; timer min. runtime = 10
Profiling disabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fcc4ca50490>
payload()
Median: 688.50 us
IQR: 7.70 us (684.29 to 691.99)
145 measurements, 100 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
benchmarks/profiler_benchmark/profiler_bench.py:12: UserWarning: Calculating flops for aten::mm requires mat1_size and mat2_size in saved arguments. (Triggered internally at ../torch/csrc/autograd/profiler_utils.cpp:149.)
x = torch.mm(x, x)
<torch.utils.benchmark.utils.common.Measurement object at 0x7fcc9b7e6910>
payload()
Median: 30.51 ms
IQR: 1.35 ms (29.47 to 30.82)
329 measurements, 1 runs per measurement, 1 thread
after:
~/local/pytorch (flops_warnings)$ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop, 256 iterations; timer min. runtime = 10
Profiling disabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f0b60208d60>
payload()
Median: 668.95 us
IQR: 20.97 us (657.36 to 678.33)
150 measurements, 100 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f0baefdf250>
payload()
Median: 18.52 ms
IQR: 0.65 ms (18.17 to 18.82)
540 measurements, 1 runs per measurement, 1 thread
without 46506:
~/local/pytorch (flops_warnings)$ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop, 256 iterations; timer min. runtime = 10
Profiling disabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa78761bd60>
payload()
Median: 700.84 us
IQR: 22.55 us (694.64 to 717.19)
142 measurements, 100 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa7d63c2910>
payload()
Median: 19.32 ms
IQR: 0.50 ms (19.10 to 19.60)
515 measurements, 1 runs per measurement, 1 thread
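
To make the three runs easier to compare, here is a minimal sketch that turns the reported medians into a slowdown factor and an absolute overhead per payload() call. The numbers are copied from the logs above; the names MEDIANS_US, profiler_overhead and OVERHEAD are made up for this illustration and are not part of profiler_bench.py.

```python
# Median runtimes copied from the logs above, converted to microseconds.
# "disabled" = profiling disabled, "enabled" = profiling enabled.
MEDIANS_US = {
    "before": {"disabled": 688.50, "enabled": 30.51e3},
    "after": {"disabled": 668.95, "enabled": 18.52e3},
    "without 46506": {"disabled": 700.84, "enabled": 19.32e3},
}


def profiler_overhead(disabled, enabled):
    """Hypothetical helper: relative and absolute cost of enabling the profiler."""
    return {
        "slowdown": enabled / disabled,  # how many times slower with profiling on
        "extra_us": enabled - disabled,  # added time per payload() call, in us
    }


OVERHEAD = {label: profiler_overhead(**m) for label, m in MEDIANS_US.items()}

for label, o in OVERHEAD.items():
    print(
        f"{label}: {o['slowdown']:.1f}x slower, "
        f"+{o['extra_us'] / 1e3:.2f} ms per payload() call"
    )
```

This only compares medians; the IQRs printed in the logs show how much run-to-run noise sits around each figure.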