Marek Kolodziej mkolod

  • San Francisco Bay Area, CA
#include <chrono>
#include <iostream>
#include <vector>
#include <thread>
__global__ void do_nothing(int time_us, int clock_rate) {
  clock_t start = clock64();
  clock_t end;
  for (;;) {
    end = clock64();
mkolod / redirect_streams_and_cuda_checks.cu
Last active September 27, 2020 04:12
Redirect Streams and CUDA checks
#include <csignal>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <unistd.h>
#include <limits.h>
#include <sstream>
#include <stdexcept>

ImageNet validation set fix:

  1. The training set is organized in directories, with each directory matching a class, e.g. "n01751748" matching "sea snake." However, the valset is a flat dir of JPEGs. The ImageNet labels provided in the devkit for the validation set (ILSVRC2012_validation_ground_truth.txt) are not consistent with the ordering used by PyTorch/TF/Keras/MXNet/Caffe, etc. for pre-trained models. For example, in the above ground truth label file, "sea snake" is 490, but in PyTorch/TF, it's 65.
    Proof:
  2. Untar the valset file; you will get a flat dir of JPEGs.
  3. Pull in the unflattening script into the directory where the val images were unpa
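The index mismatch described in item 1 can be illustrated with a small sketch. It assumes the framework-side index is the 0-based position of a class directory in sorted order (this is how torchvision's `ImageFolder` assigns indices). The devkit numbering and the three-class table below are made-up toy values, not the real ImageNet tables, and `devkit_to_framework` is a hypothetical helper name.

```python
# Toy illustration: the same class gets a different integer label depending on
# which ordering a tool uses. The numbering below is hypothetical toy data.

# Hypothetical devkit-style numbering (1-based, arbitrary order).
devkit_id_to_wnid = {1: "n02119789", 2: "n01751748", 3: "n01440764"}

# Framework-style numbering: 0-based index into the sorted directory names.
sorted_wnids = sorted(devkit_id_to_wnid.values())
wnid_to_framework_idx = {wnid: i for i, wnid in enumerate(sorted_wnids)}


def devkit_to_framework(devkit_id):
    """Translate a devkit label into the sorted-directory index."""
    return wnid_to_framework_idx[devkit_id_to_wnid[devkit_id]]


for devkit_id, wnid in devkit_id_to_wnid.items():
    print(devkit_id, wnid, devkit_to_framework(devkit_id))
```

Labels read from a ground-truth file would need a translation of this kind before being compared with the predictions of a pre-trained model.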
import torch.cuda
import torch.cuda.memory as cumem
import sys
import ctypes as C
GB = 1 << 30
def get_cuda_memory():
    handle = C.cdll.LoadLibrary("libcudart.so")
    free, total = C.c_long(), C.c_long()
" Use Vim settings, rather than Vi settings (much better!).
" This must be first, because it changes other options as a side effect.
set nocompatible
" TODO: this may not be in the correct place. It is intended to allow overriding <Leader>.
" source ~/.vimrc.before if it exists.
if filereadable(expand("~/.vimrc.before"))
  source ~/.vimrc.before
endif
mkolod / profile.py
Created December 13, 2018 00:12 — forked from dojoteef/profile.py
A CUDA memory profiler for pytorch
'''
Memory profiling utilities
'''
import gc
import inspect
import linecache
import os.path
import sys
import time
import threading
import onnx
import sys
name = sys.argv[1]
model = onnx.load(name)
onnx.checker.check_model(model)
print(onnx.helper.printable_graph(model.graph))
RUN wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
RUN chmod +x cuda_10.2.89_440.33.01_linux.run && \
    ./cuda_10.2.89_440.33.01_linux.run --silent --toolkit
# After installing CUDA, add the env vars below; otherwise you'll get a CUDA runtime
# version/driver version mismatch, even if you're using nvidia-docker.
ENV LIBRARY_PATH=/usr/local/cuda/lib64/stubs
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
# precondition: d is a power of 2
def mod_pow2(n, d):
    return n & (d - 1)

n, d = 6, 4
print(n, "mod", d, "is", mod_pow2(n, d))
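One way to build confidence in the bit-mask trick is to compare it against the `%` operator. The sketch below redefines `mod_pow2` so that it is self-contained. The `is_pow2` helper and the explicit precondition check are not part of the original snippet; they are assumptions added for illustration.

```python
def is_pow2(d):
    """Hypothetical helper: True if d is a positive power of two."""
    return d > 0 and (d & (d - 1)) == 0


def mod_pow2(n, d):
    """Compute n mod d for a power-of-two d using a bit mask."""
    if not is_pow2(d):
        raise ValueError("d must be a power of 2")
    return n & (d - 1)


# Compare against % for non-negative n and several power-of-two divisors.
for d in (1, 2, 4, 8, 1024):
    for n in range(100):
        assert mod_pow2(n, d) == n % d

print(mod_pow2(6, 4))  # 6 = 0b110, mask = 0b011, result = 0b010 = 2
```

The mask `d - 1` keeps only the low bits of `n`, which is why the shortcut requires `d` to be a power of two.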
#include <cstdlib>
#include <iostream>
#include <sys/mman.h>
template<typename T>
T* pinned_malloc(size_t num_elem) {
  size_t num_bytes = num_elem * sizeof(T);
  T *ptr = static_cast<T *>(malloc(num_bytes));
  if (ptr == nullptr) {