vfdev vfdev-5

:octocat:
.\ |
View GitHub Profile
@vfdev-5
vfdev-5 / 20230329-181023-pr_vs_nightly-speedup.md
Last active March 29, 2023 16:34
PyTorch: improved performance of vectorized interpolate for the uint8 RGB case
Description:
- 20230329-174512-pr
Torch version: 2.1.0a0+gitd6e220c
Torch config: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201703
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - Build settings: BUILD_TYPE=Release, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -f
@vfdev-5
vfdev-5 / interpolation-code-notes.md
Last active February 9, 2023 11:43
Vectorized pytorch interpolate uint8

RGBA Image resizing with a vectorized algorithm

Horizontal pass vectorized algorithm on RGBA data

Input data is stored as

input = [r[0], g[0], b[0], a[0], r[1], g[1], b[1], a[1], r[2], g[2], b[2], a[2], ...]

Weights are float values computed for each output pixel and rescaled to uint16:
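The preview stops before the rescaling formula, so what follows is only a minimal sketch of the general idea: an interleaved RGBA buffer and float weights turned into fixed-point integers. The helper names and the choice of 8 fractional bits are assumptions made for illustration and are not taken from the gist.

```python
# Illustrative sketch only. `interleave_rgba`, `rescale_weights`, `apply_weights`
# and the 8-bit precision are hypothetical; the gist's real formula is not shown.

def interleave_rgba(r, g, b, a):
    """Pack per-channel lists into the layout [r0, g0, b0, a0, r1, g1, ...]."""
    out = []
    for pixel in zip(r, g, b, a):
        out.extend(pixel)
    return out

def rescale_weights(weights, precision_bits=8):
    """Convert float weights to integers with `precision_bits` fractional bits."""
    scale = 1 << precision_bits
    return [int(round(w * scale)) for w in weights]

def apply_weights(pixels, int_weights, channel, num_channels=4, precision_bits=8):
    """Integer weighted sum of one channel over consecutive pixels."""
    acc = 1 << (precision_bits - 1)  # rounding offset before the final shift
    for i, w in enumerate(int_weights):
        acc += pixels[i * num_channels + channel] * w
    # Shift back to the 0..255 range and clamp to uint8
    return max(0, min(255, acc >> precision_bits))

data = interleave_rgba([10, 20], [30, 40], [50, 60], [255, 255])
w = rescale_weights([0.25, 0.75])       # two input pixels feed one output pixel
red = apply_weights(data, w, channel=0)
print(data, w, red)
```

Keeping the weights as integers lets the inner loop stay in integer arithmetic, which is what makes the horizontal pass amenable to SIMD; the float result here would be 0.25 * 10 + 0.75 * 20 = 17.5, and the rounding offset turns that into 18.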

@vfdev-5
vfdev-5 / check_per_sample_grads_on_tvmodels.py
Last active February 17, 2022 18:04
functorch per-sample grads checks vs pytorch (torchvision models)
import torch
import torch.nn as nn
import torchvision
import torchvision.models as models
from functorch.version import __version__ as ft_version
from functorch import make_functional_with_buffers, grad, vmap
tested_models = []
for model_name in models.__dict__:
@vfdev-5
vfdev-5 / check_combine_state_for_ensemble_on_tv_det_models.py
Last active February 17, 2022 17:07
functorch combine_state_for_ensemble + vmap checks vs for-loop pytorch computations (torchvision models)
import torch
import torch.nn as nn
import torchvision
import torchvision.models.detection as tv_models
import functorch
from functorch import combine_state_for_ensemble, vmap
tested_models = []
@vfdev-5
vfdev-5 / check_make_functional_on_transformers.log
Created February 16, 2022 18:17
functorch make_functional + grad checks vs pytorch computed grads (NLP HF transformers)
Torch: 1.12.0.dev20220215+cu111
transformers: 4.16.2
Functorch: 0.2.0a0+c9d03e8
-- Check bert-base-cased model
-- Check gpt2 model
-- Check facebook/bart-large model
@vfdev-5
vfdev-5 / check_make_functional_on_tv_det_models.log
Last active February 16, 2022 18:16
functorch make_functional + grad checks vs pytorch computed grads (torchvision models)
Torch: 1.12.0.dev20220215+cu111
torchvision: 0.13.0.dev20220215+cu111
Functorch: 0.2.0a0+c9d03e8
-- Check fasterrcnn_resnet50_fpn model
-- Check fasterrcnn_mobilenet_v3_large_320_fpn model
-- Check fasterrcnn_mobilenet_v3_large_fpn model
-- Check maskrcnn_resnet50_fpn model
-- Check keypointrcnn_resnet50_fpn model
from pathlib import Path
import PIL
from PIL import Image
import torch
import torch.nn as nn
import torch.utils.benchmark as benchmark
import torchvision
import torchvision.transforms as T
@vfdev-5
vfdev-5 / b1.log
Last active October 25, 2021 09:25
Benchmark torchvision transforms on PIL vs Tensor
Torch config: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
@vfdev-5
vfdev-5 / bench.py
Last active January 19, 2022 18:53
Benchmark interpolation bilinear with anti-alias, https://github.com/pytorch/pytorch/pull/65142
import argparse
import PIL
from PIL import Image
import torch
import torch.utils.benchmark as benchmark
# Original image size: 906, 438
sizes = [
(320, 196),
@vfdev-5
vfdev-5 / pr-mode-nearest-exact.log
Last active November 8, 2021 11:10
PyTorch vs OpenCV vs Scikit-Image vs Scipy vs Pillow vs TF nearest interpolation comparison with resize and rescale ops
pytorch: 1.11.0a0+git12d4b58
skimage: 0.19.0.dev0
opencv: 4.5.4-dev
scipy: 1.7.2
Pillow: 8.4.0
TensorFlow: 2.7.0
------ Check resize op ------
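For context on what such a comparison exercises, here is a small sketch of two common ways to map an output index to a source index in nearest-neighbour resizing. It assumes the usual distinction between a corner-based rule, `floor(i * scale)`, and a pixel-center rule, `floor((i + 0.5) * scale)`; the function names are hypothetical and the code does not call any of the libraries listed above.

```python
import math

def nearest_index(dst_index, in_size, out_size):
    """Corner-based rule: floor(dst * scale), clamped to the last input index."""
    scale = in_size / out_size
    return min(int(math.floor(dst_index * scale)), in_size - 1)

def nearest_exact_index(dst_index, in_size, out_size):
    """Pixel-center rule: floor((dst + 0.5) * scale), clamped likewise."""
    scale = in_size / out_size
    return min(int(math.floor((dst_index + 0.5) * scale)), in_size - 1)

in_size, out_size = 5, 2
corner = [nearest_index(i, in_size, out_size) for i in range(out_size)]
center = [nearest_exact_index(i, in_size, out_size) for i in range(out_size)]
print(corner, center)
```

When downscaling 5 samples to 2, the corner-based rule picks source indices 0 and 2, while the pixel-center rule picks 1 and 3, so the two conventions can return different pixels for the same resize request.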