ilia-cher/gist:a5a9eb6b68504542a3cad5150fc39b1a

## gistfile1.txt
python benchmarks/profiler_benchmark/profiler_bench.py --with_cuda
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: True, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fb85eb71b80>
payload()
  Median: 3.20 ms
  IQR:    0.16 ms (3.11 to 3.27)
  3127 measurements, 1 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: True, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fb85eb71970>
payload()
  Median: 55.22 ms
  IQR:    1.35 ms (54.48 to 55.84)
  179 measurements, 1 runs per measurement, 1 thread

python benchmarks/profiler_benchmark/profiler_bench.py --with_cuda
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: True, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f51cfc2cb80>
payload()
  Median: 3.40 ms
  IQR:    0.17 ms (3.31 to 3.48)
  2931 measurements, 1 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: True, use kineto: False, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f51cfc2c970>
payload()
  Median: 56.25 ms
  IQR:    1.24 ms (55.60 to 56.84)
  179 measurements, 1 runs per measurement, 1 thread


python benchmarks/profiler_benchmark/profiler_bench.py --with_cuda --use_kineto
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f5de8259b80>
payload()
  Median: 3.34 ms
  IQR:    0.16 ms (3.26 to 3.42)
  2987 measurements, 1 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f5eba5700d0>
payload()
  Median: 71.56 ms
  IQR:    1.72 ms (70.57 to 72.29)
  141 measurements, 1 runs per measurement, 1 thread

python benchmarks/profiler_benchmark/profiler_bench.py --with_cuda --use_kineto
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fc5d7128b80>
payload()
  Median: 5.12 ms
  IQR:    1.92 ms (3.50 to 5.42)
  2201 measurements, 1 runs per measurement, 1 thread
  WARNING: Interquartile range is 37.4% of the median measurement.
           This suggests significant environmental influence.
Profiling enabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fc5d00a4310>
payload()
  Median: 77.10 ms
  IQR:    3.58 ms (75.78 to 79.36)
  129 measurements, 1 runs per measurement, 1 thread


python benchmarks/profiler_benchmark/profiler_bench.py --with_cuda --use_kineto --cuda_only
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fe133b99b80>
payload()
  Median: 3.32 ms
  IQR:    0.18 ms (3.23 to 3.41)
  3004 measurements, 1 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7fe12c0b6eb0>
payload()
  Median: 20.00 ms
  IQR:    4.01 ms (17.03 to 21.04)
  517 measurements, 1 runs per measurement, 1 thread
  WARNING: Interquartile range is 20.1% of the median measurement.
           This could indicate system fluctuation.

python benchmarks/profiler_benchmark/profiler_bench.py --with_cuda --use_kineto --cuda_only
Payload: loop, 256 iterations; timer min. runtime = 10

Profiling disabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f27930a9b80>
payload()
  Median: 3.53 ms
  IQR:    0.18 ms (3.44 to 3.62)
  2824 measurements, 1 runs per measurement, 1 thread
Profiling enabled, tensor size 1x1, use cuda: True, use kineto: True, with stacks: False, use script: False
<torch.utils.benchmark.utils.common.Measurement object at 0x7f27845dae80>
payload()
  Median: 20.67 ms
  IQR:    0.70 ms (20.33 to 21.03)
  482 measurements, 1 runs per measurement, 1 thread
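
To make the runs above easier to compare, the following is a minimal sketch that turns the reported medians into a slowdown factor and an added cost per loop iteration. It is not part of profiler_bench.py. The dictionary labels and the `overhead` helper are hypothetical names introduced here, and the per-iteration figure assumes each `payload()` call executes the 256 loop iterations named in the "Payload" line.

```python
# Medians in ms copied from the log above: (profiling disabled, profiling enabled).
# The dictionary keys are illustrative labels, not output of the benchmark script.
medians_ms = {
    "--with_cuda (use kineto: False)": [(3.20, 55.22), (3.40, 56.25)],
    "--with_cuda --use_kineto": [(3.34, 71.56), (5.12, 77.10)],
    "--with_cuda --use_kineto --cuda_only": [(3.32, 20.00), (3.53, 20.67)],
}

# Assumption: one payload() call runs this many loop iterations
# ("Payload: loop, 256 iterations").
ITERATIONS = 256


def overhead(disabled_ms, enabled_ms, iterations=ITERATIONS):
    """Return (slowdown factor, added microseconds per loop iteration)."""
    slowdown = enabled_ms / disabled_ms
    added_us_per_iter = (enabled_ms - disabled_ms) * 1000.0 / iterations
    return slowdown, added_us_per_iter


for flags, runs in medians_ms.items():
    for disabled_ms, enabled_ms in runs:
        slowdown, added_us = overhead(disabled_ms, enabled_ms)
        print(f"{flags}: {slowdown:.1f}x slower, +{added_us:.0f} us per iteration")
```

The second `--use_kineto` run reported an IQR warning on its baseline (median 5.12 ms), so its slowdown factor is less reliable than the others; the absolute difference in ms is the safer number to compare there.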