vfdev vfdev-5

Reproduce issue: pytorch/pytorch#54415

git clone git@github.com:Algomorph/NeuralTracking.git
cd NeuralTracking/
git checkout 6b8a10b2536eda26f9b613c3521b2fe4894602ad

nano .git/config
vfdev-5 / example_1.py
Last active April 14, 2023 08:48
PyTorch Distributed playground
# Run it
# torchrun --nproc_per_node=4 example_1.py
import os
import time
import torch
import torch.distributed as dist
def pprint(rank, msg):
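torchrun starts one process per --nproc_per_node and gives each process its identity through environment variables such as RANK, LOCAL_RANK and WORLD_SIZE. The gist's own pprint body is cut off in this preview. Below is a hypothetical, dependency-free sketch of reading those variables and of a rank-prefixed print helper. It assumes the script runs under torchrun and falls back to a single process otherwise.

```python
import os


def dist_env():
    """Read the identity torchrun exports to each worker process.

    Falls back to a single process (rank 0, local rank 0, world size 1)
    when the script is started with plain `python` instead of torchrun.
    """
    rank = int(os.environ.get("RANK", "0"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return rank, local_rank, world_size


def rank_print(rank, msg):
    """Hypothetical rank-aware print: prefix each message with the rank so
    interleaved output from several processes stays readable."""
    line = f"[rank {rank}] {msg}"
    print(line, flush=True)
    return line


if __name__ == "__main__":
    rank, local_rank, world_size = dist_env()
    rank_print(rank, f"local_rank={local_rank}, world_size={world_size}")
```

Launched with torchrun --nproc_per_node=4, each of the four processes would print its own prefixed line.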
vfdev-5 / README.md
Created November 24, 2020 10:10
PyTorch C++ dev with xeus-cling in Jupyter

How to set up an interactive C++ interpreter with the PyTorch C++ library in Jupyter using xeus-cling

Requirements

  • installed Jupyter notebook
  • installed conda, e.g. in /opt/conda
  • built PyTorch
    • from source
    • libtorch
    • installed via conda (this may cause problems with _GLIBCXX_USE_CXX11_ABI)
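Because a conda build may not use the _GLIBCXX_USE_CXX11_ABI setting you expect, it can help to check which ABI the installed PyTorch was compiled with before loading libtorch in the interpreter. This is a small sketch that assumes torch may or may not be importable in the current Python environment. torch.compiled_with_cxx11_abi() is a PyTorch function; the helper name is illustrative.

```python
def cxx11_abi_flag():
    """Return the -D_GLIBCXX_USE_CXX11_ABI define matching the installed
    PyTorch build, or None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    # torch.compiled_with_cxx11_abi() reports how libtorch was compiled
    return f"-D_GLIBCXX_USE_CXX11_ABI={int(torch.compiled_with_cxx11_abi())}"


if __name__ == "__main__":
    print(cxx11_abi_flag())
```

How the resulting define is passed to the C++ kernel depends on how xeus-cling is configured.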
vfdev-5 / benchmark.py
Created September 6, 2020 20:36 — forked from n2cholas/benchmark.py
Benchmarking ignite master branch vs metrics_impl on metrics.
'''
To run the CPU benchmark: `CUDA_VISIBLE_DEVICES="" python benchmark.py --name cpu`
To run the GPU benchmark: `CUDA_VISIBLE_DEVICES=0 python benchmark.py --name cuda`
To run the distributed benchmark: `python -u -m torch.distributed.launch --nproc_per_node=2 --use_env benchmark.py --name dist`
'''
import argparse
import time
import math
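The three run modes in the docstring differ mainly in environment and launcher. CUDA_VISIBLE_DEVICES="" hides all GPUs, and torch.distributed.launch with --use_env passes the local rank through the LOCAL_RANK environment variable instead of a --local_rank argument. The sketch below shows one way a script might map --name to a device string under those assumptions. The helper pick_device and the one-GPU-per-process "cuda:<local_rank>" convention are illustrative and not taken from the gist.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Metrics benchmark (sketch)")
    parser.add_argument("--name", choices=["cpu", "cuda", "dist"], required=True)
    return parser.parse_args(argv)


def pick_device(name, env=None):
    """Map the --name run mode to a device string (hypothetical helper)."""
    env = os.environ if env is None else env
    if name == "cpu":
        return "cpu"
    if name == "cuda":
        return "cuda:0"
    # name == "dist": with --use_env the launcher sets LOCAL_RANK per process,
    # assumed here to select one GPU per process
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"
```

With --nproc_per_node=2, the two processes would then pick cuda:0 and cuda:1.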
vfdev-5 / notes.md
Created July 10, 2020 15:34
PyTorch tests notes

How to run PyTorch tests

Distributed module

  • Run all distributed tests
python test/run_test.py -pt -vi distributed/test_distributed
  • Run a single test
vfdev-5 / tech-share.md
Created July 6, 2020 15:59
PyTorch Tech Share (July 06 2020) - Simple PyTorch distributed computation functionality testing with `pytest-xdist`.

This is one of many possible approaches to testing custom distributed computation functionality by emulating multiple processes.

What is a "distributed setting" in PyTorch?

  • Communication between the N processes of an application
    • send/receive tensors
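pytest-xdist runs tests in worker processes and identifies each worker through the PYTEST_XDIST_WORKER (e.g. "gw0", "gw1") and PYTEST_XDIST_WORKER_COUNT environment variables. One way to emulate N distributed processes is to derive a rank and world size from these variables. This is a minimal sketch under that assumption. The function name is hypothetical, and the fixtures used in the talk may differ.

```python
import os


def xdist_rank_and_world_size(env=None):
    """Derive (rank, world_size) from pytest-xdist worker variables.

    Without xdist (plain `pytest`), behave as a single process.
    """
    env = os.environ if env is None else env
    worker = env.get("PYTEST_XDIST_WORKER")  # e.g. "gw2"
    if worker is None:
        return 0, 1
    rank = int(worker[len("gw"):])
    world_size = int(env.get("PYTEST_XDIST_WORKER_COUNT", "1"))
    return rank, world_size
```

With pytest -n 4, the workers would get ranks 0 to 3. These could then be passed to torch.distributed.init_process_group(backend="gloo", init_method=..., rank=rank, world_size=world_size) with a shared rendezvous such as a file:// path. This only works if every worker actually reaches that rendezvous.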
vfdev-5 / readme.md
Last active October 5, 2021 09:39
Remote debugging python code with VSCode

My two cents on debugging: if you have ssh access to some of your nodes, especially the one where an experiment (xp) failed, you can also debug the code remotely. This may seem a bit tricky, but it can help to check the pipeline. I tested it with VSCode.

  • ssh to the machine and into the environment (Docker container, etc.)

Using debugpy

  1. Install debugpy
pip install debugpy
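After installation, a common pattern is to have the remote script open a debugpy server and wait for VSCode to attach. debugpy.listen and debugpy.wait_for_client are part of debugpy. The opt-in environment variable DEBUGPY_PORT is a hypothetical name chosen for this minimal sketch, so that the listener only starts when requested.

```python
import os


def maybe_start_debugpy(env=None, host="localhost", wait=True):
    """Start a debugpy server when DEBUGPY_PORT (a hypothetical opt-in
    variable) is set; return the port, or None when debugging is off."""
    env = os.environ if env is None else env
    port = env.get("DEBUGPY_PORT")
    if not port:
        return None
    import debugpy  # only required when debugging is actually requested

    debugpy.listen((host, int(port)))
    if wait:
        debugpy.wait_for_client()  # block until VSCode attaches
    return int(port)
```

As an example workflow, start the script on the remote machine with DEBUGPY_PORT=5678, open a tunnel from your machine with ssh -L 5678:localhost:5678 user@remote-host, then use a VSCode debug configuration with "request": "attach" and "connect": {"host": "localhost", "port": 5678}. Inside a Docker container, you may need host="0.0.0.0" and a published port instead of localhost.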
vfdev-5 / Dockerfile
Created April 28, 2020 21:28
Torch XLA Image, CPU
FROM python:3.6-buster
# - Install gsutil to copy wheels from gcr
# - Install openblas
# - Download torch & xla
# - Setup Python 3.6 env for Torch XLA wheels
# - Install torch & xla
RUN apt-get update && \
apt-get install -y apt-transport-https ca-certificates gnupg curl && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && \
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
vfdev-5 / check_torch_cuda_amp_CycleGAN_on_random.py
Created April 12, 2020 22:29
Reproduce loss=NaN in CycleGAN with torch.cuda.amp
#!/usr/bin/env python
# coding: utf-8
import torch
print(torch.__version__)
import ignite
print(ignite.__file__)
print(ignite.__version__)
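When tracking down a loss that becomes NaN under torch.cuda.amp, it helps to stop at the first non-finite value and record the step and which loss failed. The sketch below is framework-free and works on plain Python floats, e.g. the values of loss.item() for each CycleGAN loss. The loss names are hypothetical. ignite provides a related handler, TerminateOnNan, but this function is an illustration and not its implementation.

```python
import math


def check_losses(step, losses):
    """Raise FloatingPointError if any named loss at this step is NaN or inf.

    `losses` maps names to plain floats, e.g. {"loss_G": loss_G.item(), ...}.
    """
    bad = {name: value for name, value in losses.items() if not math.isfinite(value)}
    if bad:
        raise FloatingPointError(f"non-finite loss at step {step}: {bad}")
```

Calling check_losses after each iteration shows the first step where a generator or discriminator loss went non-finite.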
vfdev-5 / benchmark_engine.sh
Last active January 13, 2020 16:44
Helper scripts to benchmark and check ignite's engine on MNIST, CIFAR10 tasks
#!/bin/bash
# Tests configuration:
if [ -z "$version" ]; then
    export version="v0.2.1"
    # export version="master"
    # export version="engine_refactor"
    echo "Setup version: $version"
fi