
Andrei Pokrovsky (andrei-pokrovsky)

@andrei-pokrovsky
andrei-pokrovsky / gpu_profile.py
Created February 24, 2018 00:26 — forked from MInner/gpu_profile.py
A script to generate a per-line GPU memory usage trace. For more meaningful results, set `CUDA_LAUNCH_BLOCKING=1`.
import datetime
import linecache
import os
import pynvml3
import torch
print_tensor_sizes = True
last_tensor_sizes = set()
gpu_profile_fn = f'{datetime.datetime.now():%d-%b-%y-%H:%M:%S}-gpu_mem_prof.txt'
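The preview stops before the tracing logic, but the underlying idea can be sketched briefly: install a line-level trace function that queries NVML after every executed line, with synchronous kernel launches so allocations are attributed to the right line. The snippet below is a minimal sketch of that approach using the `pynvml` bindings, with illustrative names; it is not the gist itself.

import os
import sys

import pynvml  # NVML bindings (pip install nvidia-ml-py)

# Synchronous kernel launches make per-line attribution meaningful.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

pynvml.nvmlInit()
_handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def trace_gpu_mem(frame, event, arg):
    # Report used GPU memory after every executed line of the traced code.
    if event == "line":
        used_mb = pynvml.nvmlDeviceGetMemoryInfo(_handle).used / 1024**2
        print(f"{frame.f_code.co_filename}:{frame.f_lineno} {used_mb:.1f} MiB")
    return trace_gpu_mem

sys.settrace(trace_gpu_mem)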
@andrei-pokrovsky
andrei-pokrovsky / pytorch_visualize.py
Created May 17, 2018 23:09 — forked from hyqneuron/pytorch_visualize.py
PyTorch graph visualization
import torch
import torch.nn as nn
from torch.nn import Parameter
from torch.autograd import Variable, Function
from collections import defaultdict
import graphviz
"""
This is a rather distorted implementation of graph visualization in PyTorch.
from graphviz import Digraph
import torch
from torch.autograd import Variable, Function
def iter_graph(root, callback):
queue = [root]
seen = set()
while queue:
fn = queue.pop()
if fn in seen:
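As a rough illustration of what the gist builds up, the sketch below walks an autograd graph backwards from a tensor's `grad_fn` via `next_functions` and renders it with graphviz. It is a minimal stand-in with an illustrative function name (`make_dot`), not the gist's implementation, and it needs the Graphviz binaries installed for rendering.

import torch
from graphviz import Digraph

def make_dot(output):
    # Walk the autograd graph backwards from the output tensor's grad_fn,
    # adding one graphviz node per Function and one edge per dependency.
    dot = Digraph()
    seen = set()
    queue = [output.grad_fn]
    while queue:
        fn = queue.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        dot.node(str(id(fn)), type(fn).__name__)
        for next_fn, _ in fn.next_functions:
            if next_fn is not None:
                dot.edge(str(id(next_fn)), str(id(fn)))
                queue.append(next_fn)
    return dot

x = torch.randn(4, requires_grad=True)
y = (x * 2).sum()
make_dot(y).render("autograd_graph", format="png")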
@andrei-pokrovsky
andrei-pokrovsky / cudaMallocBench.cu
Created July 6, 2018 02:16 — forked from sonots/cudaMallocBench.cu
Benchmark of cudaMalloc. Allocates 1 MB of memory in total, using several block sizes.
#include <sys/time.h>
#include <cuda_runtime.h>
#include <stdio.h>
inline double seconds()
{
struct timeval tp;
struct timezone tzp;
int i = gettimeofday(&tp, &tzp);
return ((double)tp.tv_sec + (double)tp.tv_usec * 1.e-6);
}
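For a quick, rough comparison without compiling the CUDA source, the same measurement can be approximated from Python through `ctypes` and the CUDA runtime library. This is only a sketch under assumed details (runtime library name, block sizes), not the gist's benchmark, and it times the round trip through ctypes rather than bare cudaMalloc.

import ctypes
import time

# The CUDA runtime library name may differ on your system (e.g. a versioned .so).
cudart = ctypes.CDLL("libcudart.so")

def bench(block_size, total=1 << 20):
    # Allocate `total` bytes in chunks of `block_size`, timing the cudaMalloc calls.
    ptrs = []
    start = time.perf_counter()
    for _ in range(total // block_size):
        ptr = ctypes.c_void_p()
        assert cudart.cudaMalloc(ctypes.byref(ptr), ctypes.c_size_t(block_size)) == 0
        ptrs.append(ptr)
    elapsed = time.perf_counter() - start
    for ptr in ptrs:
        cudart.cudaFree(ptr)
    return elapsed

for block in (1 << 10, 1 << 14, 1 << 20):
    print(f"block={block:>8} bytes: {bench(block) * 1e3:.3f} ms")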
@andrei-pokrovsky
andrei-pokrovsky / git_notes.md
Created January 4, 2022 22:24 — forked from jaygooby/git_notes.md
Git, you bloody git

Do a pickaxe-style search through all your stashes:

git stash list -S "the thing" -p

A treasure trove of git scripts

John Wiegley has a ton of useful git scripts, which he's collected from around the net: https://github.com/jwiegley/git-scripts

@andrei-pokrovsky
andrei-pokrovsky / nsight.sh
Created June 28, 2022 09:33 — forked from mcarilli/nsight.sh
Favorite Nsight Systems profiling commands for PyTorch scripts
# This isn't supposed to run as a bash script; I named it ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
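The pattern the comments describe can be sketched as follows: wrap regions of interest in NVTX ranges and start the CUDA profiler only after warm-up, so a capture driven by cudaProfilerApi records just those steps. The model, sizes, and step counts below are placeholders; the actual nsys command lines are in the full gist.

import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.randn(64, 1024, device="cuda")

for step in range(20):
    if step == 10:
        torch.cuda.profiler.start()  # begin capture after warm-up iterations
    torch.cuda.nvtx.range_push(f"step {step}")
    torch.cuda.nvtx.range_push("forward")
    loss = model(data).sum()
    torch.cuda.nvtx.range_pop()
    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()
    optimizer.step()
    optimizer.zero_grad()
    torch.cuda.nvtx.range_pop()

torch.cuda.profiler.stop()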
@andrei-pokrovsky
andrei-pokrovsky / embed_python.cc
Created July 11, 2022 07:20 — forked from willblatt/embed_python.cc
Example of embedding Python in a C++ application
/*
VC++ Directories
----------------
Include Directories: C:\Anaconda3\include
Library Directories: C:\Anaconda3\libs
*/
#include <iostream>
@andrei-pokrovsky
andrei-pokrovsky / ipc_demo.py
Created July 14, 2022 15:04 — forked from lebedov/ipc_demo.py
Demonstrate how to pass IPC handles to GPU data between processes in Python
#!/usr/bin/env python
"""
Demonstrate how to pass IPC handles to GPU data between processes in Python.
"""
import ctypes
import numpy as np
import multiprocessing as mp
import zmq
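The imports shown suggest the gist exchanges raw IPC handles by hand over zmq. As a higher-level alternative sketch, `torch.multiprocessing` performs the same CUDA IPC handle exchange automatically when a CUDA tensor is put on a queue; the example below is an illustration of that route, not the gist's code.

import torch
import torch.multiprocessing as mp

def consumer(queue):
    # The tensor arrives via a CUDA IPC handle and aliases the parent's GPU memory.
    tensor = queue.get()
    tensor += 1  # in-place update, visible to the parent process
    torch.cuda.synchronize()

if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when passing CUDA tensors between processes
    shared = torch.zeros(4, device="cuda")
    queue = mp.Queue()
    proc = mp.Process(target=consumer, args=(queue,))
    proc.start()
    queue.put(shared)  # the parent must keep `shared` alive while the child uses it
    proc.join()
    print(shared)  # tensor([1., 1., 1., 1.], device='cuda:0')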
@andrei-pokrovsky
andrei-pokrovsky / .tmux.conf
Created July 15, 2022 10:16 — forked from paulodeleo/.tmux.conf
Tmux configuration to enable mouse scrolling and mouse pane selection, taken from: http://brainscraps.wikia.com/wiki/Extreme_Multitasking_with_tmux_and_PuTTY
# Make mouse useful in copy mode
setw -g mode-mouse on
# Allow mouse to select which pane to use
set -g mouse-select-pane on
# Allow mouse dragging to resize panes
set -g mouse-resize-pane on
# Allow mouse to select windows
@andrei-pokrovsky
andrei-pokrovsky / commands.md
Created August 16, 2022 14:35 — forked from mcarilli/commands.md
Single- and multi-process profiling workflow with nvprof and NVVP (Nsight Systems coming soon...)

Ordinary launch commands (no profiling):

Single-process:

python main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/

Multi-process:

python -m torch.distributed.launch  --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/