@y0ast
y0ast / subset_imagenet.py
Last active Mar 22, 2021
Obtain a dataset with a subset of ImageNet *classes* in PyTorch with minimal changes.
import os
from torchvision import datasets, transforms
from torchvision.datasets.folder import IMG_EXTENSIONS, default_loader

def get_imagenet(root, num_classes=100):
    class SubDatasetFolder(datasets.DatasetFolder):
        def _find_classes(self, dir):
            classes = [d.name for d in os.scandir(dir) if d.is_dir()]
            classes.sort()
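
The preview above stops right after the class folders are sorted. As a minimal sketch of how a class subset could then be chosen, assuming the subset is simply the first `num_classes` folder names in sorted order (the truncated preview does not confirm this), a standalone helper might look like the following. The name `find_class_subset` is hypothetical and is not part of torchvision or of the gist.

```python
import os


def find_class_subset(directory, num_classes):
    """Return (classes, class_to_idx) for the first `num_classes` class folders.

    Hypothetical helper for illustration; not part of torchvision.
    """
    # Each immediate subdirectory is treated as one class, as in the preview.
    classes = sorted(d.name for d in os.scandir(directory) if d.is_dir())
    # Assumption: the subset is the first `num_classes` names in sorted order.
    classes = classes[:num_classes]
    # Map each kept class name to a contiguous integer label.
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx
```

A `DatasetFolder` subclass could return this pair from its class-discovery hook, so that only files under the kept folders are indexed. The name of that hook differs between torchvision versions (`_find_classes` in the preview above, `find_classes` in later releases), so check the installed version.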
y0ast / train_cifar.py
Last active Apr 7, 2021
Getting high accuracy on CIFAR-10 is not straightforward. This self-contained script gets to 94% accuracy with a minimal setup. You can download a model trained with this script from: https://files.joo.st/cifar_model.pt
import argparse
from tqdm import tqdm
import torch
import torch.nn.functional as F
from torchvision import models, datasets, transforms
def get_CIFAR10(root="./"):
y0ast / Faster MNIST.md
Last active Apr 28, 2021
Train 2-3x faster on MNIST with much less CPU usage by making a few simple changes to the MNIST dataset class that PyTorch provides.

The PyTorch MNIST dataset is SLOW by default, because it wants to conform to the usual interface of returning a PIL image. This is unnecessary if you just want a normalized MNIST and are not interested in image transforms (such as rotation, cropping). By folding the normalization into the dataset initialization you can save your CPU and speed up training by 2-3x.

The bottleneck when training on MNIST with a GPU and a small-ish model is the CPU. In fact, even with six dataloader workers on a six-core i7, GPU utilization is only ~5-10%. Using FastMNIST increases GPU utilization to ~20-25% and reduces CPU utilization to near zero. On my particular model, throughput at batch size 64 went from ~150 to ~500 steps per second.

Instead of the default MNIST dataset, use this:

import torch
from torchvision.datasets import MNIST
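
The preview above is cut off after the imports. To illustrate the idea described earlier (normalize once at dataset construction, then return plain tensors), here is a minimal sketch rather than the gist's actual code. The class name `PreNormalizedDataset` is hypothetical, the data is synthetic so that nothing has to be downloaded, and 0.1307 / 0.3081 are the commonly quoted MNIST mean and standard deviation.

```python
import torch
from torch.utils.data import Dataset


class PreNormalizedDataset(Dataset):
    """Hypothetical in-memory dataset that normalizes once, at construction."""

    def __init__(self, images_uint8, targets, mean, std, device="cpu"):
        # images_uint8: (N, H, W) uint8 tensor, the way raw MNIST pixels are stored.
        # Add a channel dimension and scale to [0, 1].
        data = images_uint8.unsqueeze(1).float().div(255)
        # Normalize the whole tensor once instead of once per sample.
        data = data.sub(mean).div(std)
        # Optionally keep everything on the training device up front.
        self.data = data.to(device)
        self.targets = targets.to(device)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        # Plain tensor indexing: no PIL conversion, no per-item transform.
        return self.data[index], self.targets[index]


# Synthetic stand-in for MNIST so the sketch runs without a download.
images = torch.randint(0, 256, (8, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (8,))
dataset = PreNormalizedDataset(images, labels, mean=0.1307, std=0.3081)
sample, label = dataset[0]
```

Because every item is already a normalized tensor, fetching one is a cheap index operation, which is where the CPU savings would come from. If the tensors are moved to a GPU in the constructor, loading them in the main process (a `DataLoader` with `num_workers=0`) is the safe choice.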
y0ast / Tutorial.md
Last active Nov 23, 2015
Tutorial for using Torch7 on Amazon EC2 GPUs

There used to be a tutorial here for using Torch7 on EC2, but it's now outdated. It is best to use an EC2 image that already has Torch7 and CUDA stuff preinstalled.