vfdev vfdev-5

## notes.md

      
Last active April 15, 2021 15:05

Repro https://github.com/pytorch/pytorch/issues/54415

Reproduce issue: pytorch/pytorch#54415


Download the data from https://easyupload.io/4ut2q7 and extract it.

```bash
git clone git@github.com:Algomorph/NeuralTracking.git
cd NeuralTracking/
git checkout 6b8a10b2536eda26f9b613c3521b2fe4894602ad

nano .git/config
```


## example_1.py
```python
# Run it
# torchrun --nproc_per_node=4 example_1.py

import os
import time
import torch
import torch.distributed as dist


def pprint(rank, msg):
    ...  # body not shown in the gist preview
```
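The preview stops at `pprint`. A minimal sketch of what a script launched this way can look like, assuming the gloo backend and a hypothetical `format_msg`/`pprint` pair that prefixes every line with the rank (the gist's own `pprint` body is not shown, so this is not the author's code):

```python
import os

import torch
import torch.distributed as dist


def format_msg(rank: int, msg: str) -> str:
    # Hypothetical helper: prefix messages with the rank so interleaved output stays readable.
    return f"[rank {rank}] {msg}"


def pprint(rank: int, msg: str) -> None:
    # Hypothetical stand-in for the gist's pprint; flush so lines appear promptly.
    print(format_msg(rank, msg), flush=True)


def main() -> float:
    # torchrun exports RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT before the script starts,
    # so these defaults only apply to a plain `python example_1.py` (a one-process group).
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    dist.init_process_group(backend="gloo")  # env:// rendezvous, works on CPU-only machines
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    t = torch.tensor([float(rank + 1)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # in-place sum over all ranks
    pprint(rank, f"world_size={world_size}, all_reduce sum={t.item()}")

    dist.destroy_process_group()
    return t.item()


if __name__ == "__main__":
    main()
```

With `torchrun --nproc_per_node=4`, ranks 0 to 3 contribute 1, 2, 3 and 4, so every process should report a sum of 10.0.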

## README.md

      
Created November 24, 2020 10:10

PyTorch C++ dev with xeus-cling in Jupyter

How to set up an interactive C++ interpreter with the PyTorch C++ library in Jupyter using xeus-cling

Requirements:

- installed Jupyter Notebook
- installed conda, e.g. in `/opt/conda`
- built PyTorch, either
  - from source, or
  - libtorch, or
  - installed via conda (which can have an issue with `_GLIBCXX_USE_CXX11_ABI`)
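Since the ABI and the include/library paths must match the installed PyTorch, a hedged Python sketch can print them from the torch package itself. `torch.compiled_with_cxx11_abi()`, `cpp_extension.include_paths()` and `cpp_extension.library_paths()` are real PyTorch functions; the helper names and the assumed library names (`libc10`, `libtorch`) are illustrative and should be checked against `torch/lib` for your build:

```python
import torch
from torch.utils import cpp_extension


def cling_setup_lines():
    """Build #pragma lines for a xeus-cling cell from the installed torch package (hypothetical helper)."""
    lines = []
    for inc in cpp_extension.include_paths():
        lines.append(f'#pragma cling add_include_path("{inc}")')
    for lib in cpp_extension.library_paths():
        lines.append(f'#pragma cling add_library_path("{lib}")')
    # Assumed library names; newer builds also split code into e.g. libtorch_cpu.
    for name in ("libc10", "libtorch"):
        lines.append(f'#pragma cling load("{name}")')
    return lines


def abi_flag():
    """Compiler define matching the C++ ABI this torch build was compiled with (hypothetical helper)."""
    return f"-D_GLIBCXX_USE_CXX11_ABI={int(torch.compiled_with_cxx11_abi())}"


if __name__ == "__main__":
    print(abi_flag())
    print("\n".join(cling_setup_lines()))
```

The printed pragmas can be pasted into the first notebook cell. How the ABI define reaches the interpreter (for example through the kernel's command-line flags) depends on your xeus-cling setup and is an assumption here.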


## benchmark.py
```python
'''
To run the CPU benchmark: `CUDA_VISIBLE_DEVICES="" python benchmark.py --name cpu`
To run the GPU benchmark: `CUDA_VISIBLE_DEVICES=0 python benchmark.py --name cuda`
To run the distributed benchmark: `python -u -m torch.distributed.launch --nproc_per_node=2 --use_env benchmark.py --name dist`
'''


import argparse
import time
import math
```
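The docstring implies that `--name` selects the device. A minimal sketch of that dispatch, assuming hypothetical `parse_args`, `pick_device` and `time_it` helpers (the gist's actual body is not shown). With `--use_env`, `torch.distributed.launch` passes the local rank through the `LOCAL_RANK` environment variable instead of a `--local_rank` argument:

```python
import argparse
import os
import time


def parse_args(argv=None):
    # Hypothetical CLI mirroring the three runs listed in the docstring.
    parser = argparse.ArgumentParser(description="benchmark sketch")
    parser.add_argument("--name", choices=["cpu", "cuda", "dist"], required=True)
    return parser.parse_args(argv)


def pick_device(name):
    if name == "cpu":
        return "cpu"
    if name == "cuda":
        # CUDA_VISIBLE_DEVICES=0 exposes a single GPU, which the process sees as index 0.
        return "cuda:0"
    # "dist": one process per GPU; --use_env puts the local rank in LOCAL_RANK.
    return f"cuda:{int(os.environ.get('LOCAL_RANK', '0'))}"


def time_it(fn, repeat=5):
    # Best-of-N wall-clock time; the minimum is less sensitive to background noise.
    # For GPU work, fn must call torch.cuda.synchronize() before returning,
    # because CUDA kernels are launched asynchronously.
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)
```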

## notes.md

      
Created July 10, 2020 15:34

PyTorch tests notes

How to run tests of PyTorch

Distributed module


Run all distributed tests

```bash
python test/run_test.py -pt -vi distributed/test_distributed
```

Run a single test


## tech-share.md

      
Created July 6, 2020 15:59

PyTorch Tech Share (July 06 2020) - Simple PyTorch distributed computation functionality testing with `pytest-xdist`.

This is one of many possible approaches to testing custom distributed computation functionality
by emulating multiple processes.
What is a "distributed setting" in PyTorch?


Communication between the N processes of an application

send/receive tensors
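To make "send/receive tensors" concrete, a minimal sketch of point-to-point communication with `torch.distributed.send`/`recv`, assuming the gloo backend, a fixed local TCP port (29501, arbitrary) and `torch.multiprocessing.spawn` to start the processes instead of pytest-xdist:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def make_payload(n):
    # The tensor rank 0 sends; receivers rebuild it to check what arrived.
    return torch.arange(n, dtype=torch.float32)


def worker(rank, world_size, port):
    # mp.spawn passes the process index as the first argument; it is used as the rank.
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
    )
    if rank == 0:
        payload = make_payload(4)
        for dst in range(1, world_size):
            dist.send(payload, dst=dst)  # blocking point-to-point send
    else:
        buf = torch.zeros(4)
        dist.recv(buf, src=0)  # blocks until rank 0's tensor arrives
        assert torch.equal(buf, make_payload(4)), f"rank {rank} got {buf}"
        print(f"rank {rank} received {buf.tolist()}", flush=True)
    dist.destroy_process_group()


def run(world_size=2, port=29501):
    # An exception raised in any worker is re-raised in the parent by mp.spawn.
    mp.spawn(worker, args=(world_size, port), nprocs=world_size, join=True)


if __name__ == "__main__":
    run()
```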


## readme.md

      
Last active October 5, 2021 09:39

Remote debugging python code with VSCode

My two cents on debugging: if you have SSH access to some of your nodes, especially the one where an experiment failed, you can also debug the code remotely. This may seem a bit tricky, but it can help to check the pipeline. I tested it with VSCode.

SSH into the machine and enter the environment (Docker container, etc.)

Using debugpy


Install debugpy

```bash
pip install debugpy
```
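A minimal sketch of the debugpy side, assuming a hypothetical `DEBUGPY_PORT` environment variable as the on/off switch so the same script runs normally when it is unset. `debugpy.listen` and `debugpy.wait_for_client` are debugpy's own API; the helper name and the variable are made up for illustration:

```python
import os


def maybe_wait_for_debugger(env_var="DEBUGPY_PORT", host="127.0.0.1"):
    """Start a debugpy server only if the (hypothetical) env var is set."""
    port = os.environ.get(env_var)
    if not port:
        return None
    import debugpy  # imported lazily so normal runs do not need debugpy installed

    debugpy.listen((host, int(port)))
    print(f"debugpy listening on {host}:{port}, waiting for VSCode to attach...", flush=True)
    debugpy.wait_for_client()  # blocks until the IDE connects
    return int(port)


if __name__ == "__main__":
    maybe_wait_for_debugger()
    # ... the pipeline code to debug goes here ...
```

On the node, run e.g. `DEBUGPY_PORT=5678 python train.py` (`train.py` is a placeholder), forward the port with `ssh -L 5678:127.0.0.1:5678 user@node`, then attach from VSCode with a Python "attach" launch configuration whose `connect` points to `localhost:5678` (add `pathMappings` if the remote paths differ from your local checkout). Without touching the code, `python -m debugpy --listen 127.0.0.1:5678 --wait-for-client train.py` does the same. Inside a Docker container, 127.0.0.1 is only reachable from within the container, so you may need to listen on `0.0.0.0` and publish the port.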


## Dockerfile
```dockerfile
FROM python:3.6-buster

# - Install gsutil to copy wheels from gcr
# - Install openblas
# - Download torch & xla
# - Set up Python 3.6 env for Torch XLA wheels
# - Install torch & xla
RUN apt-get update && \
  apt-get install -y apt-transport-https ca-certificates gnupg curl && \
  echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && \
```

## check_torch_cuda_amp_CycleGAN_on_random.py
```python
#!/usr/bin/env python
# coding: utf-8

import torch
print(torch.__version__)

import ignite
print(ignite.__file__)
print(ignite.__version__)
```
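The preview only prints versions. As a hedged sketch of the kind of check the file name suggests, here is a mixed-precision training step on random tensors with `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler` (newer releases expose the same functionality under `torch.amp`). The tiny conv model is an illustrative stand-in for a CycleGAN generator, not the gist's model. Without CUDA, both AMP helpers are created with `enabled=False`, so the loop falls back to plain fp32:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def amp_step(model, optimizer, scaler, x, y, use_amp):
    optimizer.zero_grad()
    # With use_amp=True, eligible ops in the forward pass run in float16.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = F.mse_loss(model(x), y)
    # Scale the loss to avoid fp16 gradient underflow; a pass-through when the scaler is disabled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients and skips the step if they contain inf/NaN
    scaler.update()
    return loss.item()


device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"
torch.manual_seed(0)

# Illustrative image-to-image model: 3 channels in, 3 channels out.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 3, kernel_size=3, padding=1),
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# Random inputs and targets instead of a real dataset.
x = torch.rand(4, 3, 32, 32, device=device)
y = torch.rand(4, 3, 32, 32, device=device)

losses = [amp_step(model, optimizer, scaler, x, y, use_amp) for _ in range(5)]
print(losses)
```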

## benchmark_engine.sh
```bash
#!/bin/bash

# Tests configuration:
if [ -z "$version" ]; then
    export version="v0.2.1"
#    export version="master"
#    export version="engine_refactor"
    echo "Setup version: $version"
fi
```