ilia-cher/gist:5189904a08c4a12b24de1bf2ff9f1296

## gistfile1.txt
before:

~/local/pytorch  (flops_warnings)$ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fcc4ca50490>
payload()
  Median: 688.50 us
  IQR:    7.70 us (684.29 to 691.99)
  145 measurements, 100 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
benchmarks/profiler_benchmark/profiler_bench.py:12: UserWarning: Calculating flops for aten::mm requires mat1_size and mat2_size in saved arguments. (Triggered internally at  ../torch/csrc/autograd/profiler_utils.cpp:149.)
  x = torch.mm(x, x)
<torch.utils.benchmark.utils.common.Measurement object at 0x7fcc9b7e6910>
payload()
  Median: 30.51 ms
  IQR:    1.35 ms (29.47 to 30.82)
  329 measurements, 1 runs per measurement, 1 thread

after:

~/local/pytorch  (flops_warnings)$ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f0b60208d60>
payload()
  Median: 668.95 us
  IQR:    20.97 us (657.36 to 678.33)
  150 measurements, 100 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f0baefdf250>
payload()
  Median: 18.52 ms
  IQR:    0.65 ms (18.17 to 18.82)
  540 measurements, 1 runs per measurement, 1 thread

without 46506:

~/local/pytorch  (flops_warnings)$ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa78761bd60>
payload()
  Median: 700.84 us
  IQR:    22.55 us (694.64 to 717.19)
  142 measurements, 100 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: False, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa7d63c2910>
payload()
  Median: 19.32 ms
  IQR:    0.50 ms (19.10 to 19.60)
  515 measurements, 1 runs per measurement, 1 thread
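The three runs can be compared by dividing each "Profiling enabled" median by the matching "Profiling disabled" median, and by comparing the enabled-profiling medians across runs. This is a minimal sketch that assumes that ratio is a reasonable proxy for profiler overhead; the ratio is our own framing and is not something `profiler_bench.py` reports. The medians are copied from the output above, and the helper names (`overhead_ratio`, `speedup`) are hypothetical.

```python
# Medians copied from the benchmark output above, converted to microseconds.
# Run label -> (profiling disabled median, profiling enabled median)
MEDIANS_US = {
    "before": (688.50, 30_510.0),
    "after": (668.95, 18_520.0),
    "without 46506": (700.84, 19_320.0),
}


def overhead_ratio(disabled_us: float, enabled_us: float) -> float:
    """How many times slower the payload runs with profiling enabled (hypothetical helper)."""
    return enabled_us / disabled_us


def speedup(old_us: float, new_us: float) -> float:
    """How many times faster `new` is than `old`; > 1 means faster (hypothetical helper)."""
    return old_us / new_us


if __name__ == "__main__":
    for label, (off, on) in MEDIANS_US.items():
        print(f"{label:>14}: {overhead_ratio(off, on):5.1f}x slower with profiling enabled")
    before_on = MEDIANS_US["before"][1]
    after_on = MEDIANS_US["after"][1]
    print(f"profiling-enabled speedup, before -> after: {speedup(before_on, after_on):.2f}x")
```

With these numbers, the profiling-enabled median drops from 30.51 ms to 18.52 ms, roughly a 1.65x speedup, while the profiling-disabled medians stay within about 5% of each other.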