=================================== FAILURES ===================================
_____________________ TestOperators.test_quantized_linear ______________________
mod = <torch._C.Function object at 0x7f93329439e8>
inputs = (tensor([[0.9231, 0.3115, 0.7289, 0.8404, 0.4845, 0.9304, 0.4694, 0.5023, 0.0949,
0.8016, 0.9406, 0.9025],
...9277, 0.5878, 0.3807, 0.4685, 0.2934,
0.2472, 0.2436, 0.8439, 0.9898, 0.7428, 0.9442, 0.7064, 0.7519, 0.7751]))
running_what = 'trace'
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install --cmake
./build/bin/record_function_benchmark
> time1_after = c(622.15, 611.69, 607.03, 612.22, 637.45)
> time1_before = c(653.35, 620.83, 659.3, 636.34, 600.21)
>
> time16_after = c(1257.72, 1153.95, 1245.65, 1172.81, 1209.96)
> time16_before = c(1210.11, 1204.22, 1230.56, 1244.74, 1147.77)
>
> mean(time1_after)
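The R session above averages five timing samples per configuration. A minimal Python sketch of the same before/after comparison follows. The sample values are copied from the transcript; the units and the meaning of the `1` and `16` suffixes are not stated there, and the `relative_change` helper is a hypothetical name introduced only for illustration.

```python
from statistics import mean

# Samples copied verbatim from the R transcript (units not stated there).
time1_after = [622.15, 611.69, 607.03, 612.22, 637.45]
time1_before = [653.35, 620.83, 659.3, 636.34, 600.21]
time16_after = [1257.72, 1153.95, 1245.65, 1172.81, 1209.96]
time16_before = [1210.11, 1204.22, 1230.56, 1244.74, 1147.77]


def relative_change(before, after):
    """Hypothetical helper: change of the mean, relative to the 'before' mean."""
    return (mean(after) - mean(before)) / mean(before)


change_1 = relative_change(time1_before, time1_after)
change_16 = relative_change(time16_before, time16_after)

print(f"time1:  {mean(time1_before):.2f} -> {mean(time1_after):.2f} ({change_1:+.2%})")
print(f"time16: {mean(time16_before):.2f} -> {mean(time16_after):.2f} ({change_16:+.2%})")
```

With only five samples per group, a difference of means of this kind is indicative at best; a significance test would be needed before drawing conclusions.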
~/local/pytorch $ python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1
avg. time: 0.663 ms, stddev: 0.036 ms ; time per iteration: 0.003 ms
Profiling enabled , tensor size 1x1
avg. time: 20.641 ms, stddev: 0.643 ms ; time per iteration: 0.081 ms
Profiling disabled, tensor size 32x32
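The "time per iteration" figures in the output above are consistent with dividing the average time by the 256 iterations of the payload loop. The sketch below checks that arithmetic; it assumes this is how the benchmark derives the value (the script itself is not shown here), and `time_per_iteration_ms` is a hypothetical helper name.

```python
ITERATIONS = 256  # "Payload: loop; 256 iterations" in the output above


def time_per_iteration_ms(avg_ms, iterations=ITERATIONS):
    """Hypothetical helper: average time divided by the iteration count."""
    return avg_ms / iterations


disabled = time_per_iteration_ms(0.663)   # profiling disabled, 1x1 tensor
enabled = time_per_iteration_ms(20.641)   # profiling enabled, 1x1 tensor

print(f"disabled: {disabled:.3f} ms, enabled: {enabled:.3f} ms")
```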
Profiling CPU Resnet model
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Name                         Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      CPU Mem           Self CPU Mem      CUDA Mem          Self CUDA Mem     Number of Calls   Input Shapes
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
empty                        0.69%             505.080us         0.69%             505.080us         4.106us           94.79 Mb          94.79 Mb          0 b               0 b               123               []
resize_                      0.01%             6.
~/local/pytorch (rec_fn_bench_models)$ PYTHONPATH="$(pwd)/benchmarks/experimental_components" python benchmarks/record_function_benchmark/record_function_bench.py
Benchmarking RecordFunction overhead for resnet50_jit
Running warmup... finished
Benchmarking with RecordFunction, 1 threads ... finished
<utils.common.Measurement object at 0x7fd828431668>
Record function overhead: with_rec_fn
resnet50_jit
Median: 6.80 s
IQR: 1.01 s (6.32 to 7.34)
9 measurements, 1 runs per measurement, 1 thread
~/local/pytorch (rec_fn_bench_models)$ PYTHONPATH="$(pwd)/benchmarks/experimental_components" python benchmarks/record_function_benchmark/record_function_bench.py
Benchmarking RecordFunction overhead for lstm_jit
Running warmup... finished
Benchmarking with RecordFunction, 1 threads ... finished
<utils.common.Measurement object at 0x7fb23fb58c18>
Record function overhead: with_rec_fn
lstm_jit
Median: 620.98 ms
IQR: 26.02 ms (608.89 to 634.91)
194 measurements, 1 runs per measurement, 1 thread
#!/usr/bin/env python3
import torch
import torch.nn as nn
import torch.cuda.profiler as profiler
N = 32
I = 128
H = 256
O = 1024
# 2 Layer MLP
model = torch.nn.Sequential(
---------------------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------------------------------------------------
Name                                           Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      Number of Calls   Source Location
---------------------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------------------------------------------------
aten::mkldnn_convolution                       62.18%            33.268ms          62.45%            33.409ms          33.409ms          1                 /data/users/iliacher/pytorch/torch/nn/modules/conv.py(416): _conv_forward
python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
avg. time: 0.903 ms, stddev: 0.024 ms
time per iteration: 0.004 ms
Profiling enabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
avg. time: 14.326 ms, stddev: 1.697 ms
time per iteration: 0.056 ms
trunk:
python benchmarks/profiler_benchmark/profiler_bench.py
Payload: loop; 256 iterations, N = 100
Profiling disabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
avg. time: 0.881 ms, stddev: 0.045 ms
time per iteration: 0.003 ms
Profiling enabled, tensor size 1x1, use cuda: False, with stacks: False, use script: False
avg. time: 13.236 ms, stddev: 2.367 ms
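One way to compare the two runs above is the profiler overhead factor, that is, the enabled average divided by the disabled average. The sketch below computes it from the reported averages. The label `first_run` is an assumption, since only the second run is marked `trunk:` in the output, and `overhead_factor` is a hypothetical helper name.

```python
# Average times in ms, copied from the two runs above (1x1 tensor, CPU).
runs = {
    "first_run": {"disabled": 0.903, "enabled": 14.326},  # label is an assumption
    "trunk": {"disabled": 0.881, "enabled": 13.236},
}


def overhead_factor(run):
    """Hypothetical helper: how many times slower the loop is with profiling on."""
    return run["enabled"] / run["disabled"]


factors = {name: overhead_factor(run) for name, run in runs.items()}

for name, factor in factors.items():
    print(f"{name}: profiling makes the loop about {factor:.1f}x slower")
```

The two enabled averages differ by about 1.1 ms, which is smaller than either reported standard deviation (1.697 ms and 2.367 ms), so these numbers alone may not show a meaningful difference between the runs.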