ilia-cher

## gist:f925ace8ed45d9dff400abf2ea176a31
=================================== FAILURES ===================================
_____________________ TestOperators.test_quantized_linear ______________________

mod = <torch._C.Function object at 0x7f93329439e8>
inputs = (tensor([[0.9231, 0.3115, 0.7289, 0.8404, 0.4845, 0.9304, 0.4694, 0.5023, 0.0949,
         0.8016, 0.9406, 0.9025],
  ...9277, 0.5878, 0.3807, 0.4685, 0.2934,
        0.2472, 0.2436, 0.8439, 0.9898, 0.7428, 0.9442, 0.7064, 0.7519, 0.7751]))
running_what = 'trace'

## gist:f1e094dae47fe23e55e7672ac4dcda2f
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install --cmake
./build/bin/record_function_benchmark

> time1_after = c(622.15, 611.69, 607.03, 612.22, 637.45)
> time1_before = c(653.35, 620.83, 659.3, 636.34, 600.21)
>
> time16_after = c(1257.72, 1153.95, 1245.65, 1172.81, 1209.96)
> time16_before = c(1210.11, 1204.22, 1230.56, 1244.74, 1147.77)
>
> mean(time1_after)
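
The R session above compares five timing samples before and after a change, for what appear to be the 1-thread and 16-thread cases. A minimal Python sketch of the same comparison follows. The sample values are copied from the session; the helper names `summarize` and `relative_change` are hypothetical, and the units are not stated in the source.

```python
from statistics import mean, stdev

# Timing samples copied verbatim from the R session (units not stated there).
timings = {
    "time1_before": [653.35, 620.83, 659.3, 636.34, 600.21],
    "time1_after": [622.15, 611.69, 607.03, 612.22, 637.45],
    "time16_before": [1210.11, 1204.22, 1230.56, 1244.74, 1147.77],
    "time16_after": [1257.72, 1153.95, 1245.65, 1172.81, 1209.96],
}


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    return mean(samples), stdev(samples)


def relative_change(before, after):
    """Relative change of the mean: negative means 'after' is faster."""
    return (mean(after) - mean(before)) / mean(before)


for name, samples in timings.items():
    m, s = summarize(samples)
    print(f"{name}: mean={m:.2f} stdev={s:.2f}")

for case in ("time1", "time16"):
    change = relative_change(timings[f"{case}_before"], timings[f"{case}_after"])
    print(f"{case}: relative change of mean = {change:+.2%}")
```

With only five samples per group, the standard deviation is worth reading next to the means before drawing any conclusion about a speedup or regression.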

## gist:00a92f2c52cf72e420d981ce0727e62e
~/local/pytorch  $ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100

Profiling disabled, tensor size 1x1
        avg. time: 0.663 ms, stddev: 0.036 ms ;  time per iteration: 0.003 ms

Profiling enabled , tensor size 1x1
        avg. time: 20.641 ms, stddev: 0.643 ms ;  time per iteration: 0.081 ms

Profiling disabled, tensor size 32x32

## gist:3f37d54c3b2afb24d6776858e6860f69
Profiling CPU Resnet model
---------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                         Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CPU Mem          Self CPU Mem     CUDA Mem         Self CUDA Mem    Number of Calls  Input Shapes
---------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
empty                        0.69%            505.080us        0.69%            505.080us        4.106us          94.79 Mb         94.79 Mb         0 b              0 b              123              []
resize_                      0.01%            6.

## gist:2e147732299bb5521335de6e572c1a2d
~/local/pytorch  (rec_fn_bench_models)$ PYTHONPATH="$(pwd)/benchmarks/experimental_components" python benchmarks/record_function_benchmark/record_function_bench.py
Benchmarking RecordFunction overhead for resnet50_jit
Running warmup... finished
Benchmarking with RecordFunction, 1 threads ... finished
<utils.common.Measurement object at 0x7fd828431668>
Record function overhead: with_rec_fn
resnet50_jit
  Median: 6.80 s
  IQR:    1.01 s (6.32 to 7.34)
  9 measurements, 1 runs per measurement, 1 thread

## gist:2cb249f5b5e3382dda5cb71ff855c2db
~/local/pytorch  (rec_fn_bench_models)$ PYTHONPATH="$(pwd)/benchmarks/experimental_components" python benchmarks/record_function_benchmark/record_function_bench.py
Benchmarking RecordFunction overhead for lstm_jit
Running warmup... finished
Benchmarking with RecordFunction, 1 threads ... finished
<utils.common.Measurement object at 0x7fb23fb58c18>
Record function overhead: with_rec_fn
lstm_jit
  Median: 620.98 ms
  IQR:    26.02 ms (608.89 to 634.91)
  194 measurements, 1 runs per measurement, 1 thread

## gist:2baffdd98951ee2a5f2da56a04fe15d0
#!/usr/bin/env python3
import torch
import torch.nn as nn
import torch.cuda.profiler as profiler
N = 32
I = 128
H = 256
O = 1024
# 2 Layer MLP
model = torch.nn.Sequential(

## gist:bd13a3ab0c4905ce58b0079df66711fa
---------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------------------------------------------------
Name                                           Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Source Location
---------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------------------------------------------------
aten::mkldnn_convolution                       62.18%           33.268ms         62.45%           33.409ms         33.409ms         1                /data/users/iliacher/pytorch/torch/nn/modules/conv.py(416): _conv_forward


## gist:1ef60eb686ec49af8e405c03e6967e7f
python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100

Profiling disabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
        avg. time: 0.903 ms, stddev: 0.024 ms
        time per iteration: 0.004 ms

Profiling enabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
        avg. time: 14.326 ms, stddev: 1.697 ms
        time per iteration: 0.056 ms
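
The per-iteration figures in this output look consistent with dividing the average time by the 256 loop iterations. That relationship is inferred from the printed numbers, not from the benchmark source. A small sketch under that assumption follows; `per_iteration_ms` and `overhead_factor` are hypothetical helper names.

```python
# Iteration count taken from the "Payload: loop; 256 iterations" line.
ITERATIONS = 256


def per_iteration_ms(avg_ms, iterations=ITERATIONS):
    """Average time per loop iteration, assuming avg_ms covers the whole loop."""
    return avg_ms / iterations


def overhead_factor(enabled_ms, disabled_ms):
    """How many times slower the run is with profiling enabled."""
    return enabled_ms / disabled_ms


# Average times copied from the 1x1 tensor runs above.
disabled_ms = 0.903
enabled_ms = 14.326

print(f"disabled: {per_iteration_ms(disabled_ms):.3f} ms per iteration")
print(f"enabled:  {per_iteration_ms(enabled_ms):.3f} ms per iteration")
print(f"overhead: {overhead_factor(enabled_ms, disabled_ms):.1f}x")
```

For a 1x1 tensor the payload itself is tiny, so the ratio mostly reflects fixed per-op profiling cost rather than what a realistic model would see.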

## gist:79870d9be7fb5454c9059b19710fd58d
trunk:
python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100

Profiling disabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
        avg. time: 0.881 ms, stddev: 0.045 ms
        time per iteration: 0.003 ms

Profiling enabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
        avg. time: 13.236 ms, stddev: 2.367 ms