@mingfeima
Last active February 14, 2020 00:40
Log of the torch.cat() performance regression

Traces #30806 (torch.cat() performance regression).

benchmark_all_test results, run from the operator_benchmark directory with:

python -m benchmark_all_test --operators cat --tag_filter all

With commit 7b50e76255aebbbcdae702ee1f00d07d86b0112b applied:

(pytorch-mingfei) [mingfeim@mlt-skx090 operator_benchmark]$ python -m benchmark_all_test --operators cat --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M1_N1_K1_dim0_cpu
# Input: M: 1, N: 1, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 5.464

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0_cpu
# Input: M: 256, N: 512, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 19.216

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M512_N512_K2_dim1_cpu
# Input: M: 512, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 25.436

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim0_cpu
# Input: M: 128, N: 128, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 8.929

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim1_cpu
# Input: M: 128, N: 128, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 10.152

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim2_cpu
# Input: M: 128, N: 128, K: 1, dim: 2, device: cpu
Forward Execution Time (us) : 24.093

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K2_dim0_cpu
# Input: M: 128, N: 128, K: 2, dim: 0, device: cpu
Forward Execution Time (us) : 12.196

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K2_dim1_cpu
# Input: M: 128, N: 128, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 14.418

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K2_dim2_cpu
# Input: M: 128, N: 128, K: 2, dim: 2, device: cpu
Forward Execution Time (us) : 246.521

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K1_dim0_cpu
# Input: M: 128, N: 1024, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 19.673

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K1_dim1_cpu
# Input: M: 128, N: 1024, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 18.370

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K1_dim2_cpu
# Input: M: 128, N: 1024, K: 1, dim: 2, device: cpu
Forward Execution Time (us) : 50.611

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K2_dim0_cpu
# Input: M: 128, N: 1024, K: 2, dim: 0, device: cpu
Forward Execution Time (us) : 21.160

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K2_dim1_cpu
# Input: M: 128, N: 1024, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 22.290

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K2_dim2_cpu
# Input: M: 128, N: 1024, K: 2, dim: 2, device: cpu
Forward Execution Time (us) : 275.948
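Comparing runs is easier if each log is turned into a case-name to time mapping instead of being read by eye. A minimal sketch, assuming the output keeps exactly the format shown above (a "# Name:" line followed later by a "Forward Execution Time (us) :" line); parse_benchmark_log is a hypothetical helper, not part of operator_benchmark.

```python
import re

# The two lines of each entry that matter, in the log format shown above.
NAME_RE = re.compile(r"^# Name: (\S+)")
TIME_RE = re.compile(r"^Forward Execution Time \(us\) : ([0-9.]+)")


def parse_benchmark_log(text):
    """Return {case_name: forward_time_us} from operator_benchmark output."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        name_match = NAME_RE.match(line)
        if name_match:
            current = name_match.group(1)
            continue
        time_match = TIME_RE.match(line)
        if time_match and current is not None:
            results[current] = float(time_match.group(1))
            current = None  # each name is paired with exactly one time
    return results


if __name__ == "__main__":
    sample = """
# Name: cat_M1_N1_K1_dim0_cpu
# Input: M: 1, N: 1, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 5.464

# Name: cat_M128_N1024_K2_dim2_cpu
# Input: M: 128, N: 1024, K: 2, dim: 2, device: cpu
Forward Execution Time (us) : 275.948
"""
    print(parse_benchmark_log(sample))
```

Saving each run's stdout to a file and feeding both files through the parser gives two dicts that can be compared key by key.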

After reverting commit 7b50e76255aebbbcdae702ee1f00d07d86b0112b:

(pytorch-mingfei) [mingfeim@mlt-skx090 operator_benchmark]$ python -m benchmark_all_test --operators cat --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M1_N1_K1_dim0_cpu
# Input: M: 1, N: 1, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 3.267

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0_cpu
# Input: M: 256, N: 512, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 122.176

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M512_N512_K2_dim1_cpu
# Input: M: 512, N: 512, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 263.905

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim0_cpu
# Input: M: 128, N: 128, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 7.064

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim1_cpu
# Input: M: 128, N: 128, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 10.002

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim2_cpu
# Input: M: 128, N: 128, K: 1, dim: 2, device: cpu
Forward Execution Time (us) : 555.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K2_dim0_cpu
# Input: M: 128, N: 128, K: 2, dim: 0, device: cpu
Forward Execution Time (us) : 11.476

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K2_dim1_cpu
# Input: M: 128, N: 128, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 15.089

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K2_dim2_cpu
# Input: M: 128, N: 128, K: 2, dim: 2, device: cpu
Forward Execution Time (us) : 569.621

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K1_dim0_cpu
# Input: M: 128, N: 1024, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 97.550

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K1_dim1_cpu
# Input: M: 128, N: 1024, K: 1, dim: 1, device: cpu
Forward Execution Time (us) : 65.711

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K1_dim2_cpu
# Input: M: 128, N: 1024, K: 1, dim: 2, device: cpu
Forward Execution Time (us) : 4446.804

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K2_dim0_cpu
# Input: M: 128, N: 1024, K: 2, dim: 0, device: cpu
Forward Execution Time (us) : 272.241

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K2_dim1_cpu
# Input: M: 128, N: 1024, K: 2, dim: 1, device: cpu
Forward Execution Time (us) : 126.997

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N1024_K2_dim2_cpu
# Input: M: 128, N: 1024, K: 2, dim: 2, device: cpu
Forward Execution Time (us) : 4529.495
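Putting the two runs side by side, the ratio of the reverted time to the with-commit time shows which cases moved and by how much. A minimal sketch: the numbers are copied verbatim from the two logs above, and slowdown_ratios is an illustrative helper name, not part of the benchmark suite.

```python
# Forward execution times (us) copied from the two runs above.
WITH_COMMIT = {
    "cat_M1_N1_K1_dim0_cpu": 5.464,
    "cat_M256_N512_K1_dim0_cpu": 19.216,
    "cat_M512_N512_K2_dim1_cpu": 25.436,
    "cat_M128_N128_K1_dim0_cpu": 8.929,
    "cat_M128_N128_K1_dim1_cpu": 10.152,
    "cat_M128_N128_K1_dim2_cpu": 24.093,
    "cat_M128_N128_K2_dim0_cpu": 12.196,
    "cat_M128_N128_K2_dim1_cpu": 14.418,
    "cat_M128_N128_K2_dim2_cpu": 246.521,
    "cat_M128_N1024_K1_dim0_cpu": 19.673,
    "cat_M128_N1024_K1_dim1_cpu": 18.370,
    "cat_M128_N1024_K1_dim2_cpu": 50.611,
    "cat_M128_N1024_K2_dim0_cpu": 21.160,
    "cat_M128_N1024_K2_dim1_cpu": 22.290,
    "cat_M128_N1024_K2_dim2_cpu": 275.948,
}
REVERTED = {
    "cat_M1_N1_K1_dim0_cpu": 3.267,
    "cat_M256_N512_K1_dim0_cpu": 122.176,
    "cat_M512_N512_K2_dim1_cpu": 263.905,
    "cat_M128_N128_K1_dim0_cpu": 7.064,
    "cat_M128_N128_K1_dim1_cpu": 10.002,
    "cat_M128_N128_K1_dim2_cpu": 555.969,
    "cat_M128_N128_K2_dim0_cpu": 11.476,
    "cat_M128_N128_K2_dim1_cpu": 15.089,
    "cat_M128_N128_K2_dim2_cpu": 569.621,
    "cat_M128_N1024_K1_dim0_cpu": 97.550,
    "cat_M128_N1024_K1_dim1_cpu": 65.711,
    "cat_M128_N1024_K1_dim2_cpu": 4446.804,
    "cat_M128_N1024_K2_dim0_cpu": 272.241,
    "cat_M128_N1024_K2_dim1_cpu": 126.997,
    "cat_M128_N1024_K2_dim2_cpu": 4529.495,
}


def slowdown_ratios(before, after):
    """Return {case: after / before}; values > 1 mean the case got slower."""
    return {name: after[name] / before[name] for name in before if name in after}


if __name__ == "__main__":
    ratios = slowdown_ratios(WITH_COMMIT, REVERTED)
    # Print the largest slowdowns first.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:32s} {ratio:7.2f}x")
```

Computed from these logs, 11 of the 15 cases are slower after the revert, the largest gap being cat_M128_N1024_K1_dim2_cpu (50.611 us vs 4446.804 us, roughly 88x), while cat_M1_N1_K1_dim0_cpu and the small M128_N128 dim0/dim1 cases change little or get slightly faster.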

Follow-ups

  1. Check the FB test machine setup: environment variables (OMP_NUM_THREADS, KMP_AFFINITY), and whether the benchmark runs with or without jemalloc.
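This follow-up amounts to re-running the same command under controlled settings. A minimal sketch of such a launcher, assuming a Linux machine: it sets OMP_NUM_THREADS and KMP_AFFINITY and optionally preloads jemalloc through LD_PRELOAD. The jemalloc path, the KMP_AFFINITY value, and the configuration list are illustrative assumptions, not the settings of the FB test machine, which this log does not record.

```python
import os
import shlex
import subprocess
import sys

# Benchmark command used in this log; run from the operator_benchmark directory.
BENCH_CMD = ["python", "-m", "benchmark_all_test",
             "--operators", "cat", "--tag_filter", "all"]

# Hypothetical jemalloc location (Debian/Ubuntu layout); adjust per machine.
JEMALLOC_LIB = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"


def build_env(omp_threads, kmp_affinity=None, jemalloc_lib=None, base=None):
    """Copy `base` (default: os.environ) and set the threading/allocator knobs."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    # KMP_AFFINITY is read by Intel/LLVM OpenMP runtimes; GNU libgomp ignores it.
    if kmp_affinity is None:
        env.pop("KMP_AFFINITY", None)
    else:
        env["KMP_AFFINITY"] = kmp_affinity
    # Preloading jemalloc replaces the default malloc for the benchmark process;
    # without it, drop any inherited LD_PRELOAD so the run uses the default allocator.
    if jemalloc_lib is None:
        env.pop("LD_PRELOAD", None)
    else:
        env["LD_PRELOAD"] = jemalloc_lib
    return env


# Hypothetical settings matrix: thread count, affinity, with/without jemalloc.
CONFIGS = [
    {"omp_threads": 1},
    {"omp_threads": os.cpu_count() or 1,
     "kmp_affinity": "granularity=fine,compact,1,0"},
    {"omp_threads": os.cpu_count() or 1,
     "kmp_affinity": "granularity=fine,compact,1,0",
     "jemalloc_lib": JEMALLOC_LIB},
]

if __name__ == "__main__":
    execute = "--run" in sys.argv[1:]  # dry run (print only) unless --run is given
    for cfg in CONFIGS:
        print("#", cfg)
        print("$", shlex.join(BENCH_CMD))
        if execute:
            subprocess.run(BENCH_CMD, env=build_env(**cfg), check=True)
```

By default the script only prints each configuration; pass --run to execute the benchmark once per configuration. Note that os.cpu_count() counts logical CPUs, so on hyper-threaded machines the physical core count may be a better value for OMP_NUM_THREADS.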