@WNoxchi
Last active May 27, 2018 08:41
A baseline / warmup notebook for L.Smith & J.Howard's training idea, using MNIST.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MNIST Test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"WNixalo 2018/5/19-20;25-27\n",
"\n",
"Making sure I have a working baseline for the MNIST dataset. See [forum thread](http://forums.fast.ai/t/research-collaboration-opportunity-with-leslie-smith/16454/) for motivation. PyTorch version: `0.3.1.post2`\n",
"\n",
"- For a walkthrough on converting binary IDX files to NumPy arrays, see [idx-to-numpy.ipynb](https://github.com/WNoxchi/Kaukasos/blob/master/research/idx-to-numpy.ipynb)\n",
"\n",
"- For a walkthrough debugging several issues with dataloading, see [mnist-dataloader-issue.ipynb](https://github.com/WNoxchi/Kaukasos/blob/master/research/mnist-dataloader-issue.ipynb)\n",
"\n",
"This notebook is in large part a practice stage for a research-oriented work flow.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%reload_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torchvision\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import numpy as np\n",
"from pathlib import Path\n",
"import os\n",
"import struct # for IDX conversion\n",
"import gzip # for IDX conversion\n",
"from urllib.request import urlretrieve # for IDX conversion\n",
"\n",
"from fastai.conv_learner import * # if you want to use fastai Learner"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"PATH = Path('data/mnist')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"bs = 64\n",
"sz = 28"
]
},
{
"cell_type": "code",
"execution_count": 358,
"metadata": {},
"outputs": [],
"source": [
"def plot_loss(learner, val=None):\n",
" \"\"\"Plots iterations vs loss and learning rate. Plots training or validation.\"\"\"\n",
" lrs = learner.sched.lrs\n",
" x_axis = range(len(lrs))\n",
" loss = learner.sched.losses\n",
" min_loss = min(loss)\n",
" \n",
" fig,ax = plt.subplots(figsize=(14,7))\n",
" ax.set_xlim(left=-20, right=x_axis[-1]+20)\n",
" ax.plot(x_axis, loss, label='loss')\n",
" ax.plot(x_axis, lrs, label='learning rate', color='firebrick');\n",
" ax.set_xlabel('Iterations')\n",
" ax.set_ylabel('Loss & LR')\n",
" \n",
" # Validation Loss\n",
" if val is not None:\n",
" ep_end = len(lrs) // len(val)\n",
" ax.scatter(range(ep_end-1, len(lrs), ep_end), val, c='r', s=20, label='val loss')\n",
" # Minimum Loss\n",
" ax.axhline(y=min_loss, c='r', alpha=0.9, label='Min loss', lw=0.5)\n",
" idx = np.argmin(loss)\n",
" yscal = 1 / (ax.get_ylim()[1] - ax.get_ylim()[0])\n",
" yrltv = (min_loss - ax.get_ylim()[0]) * yscal\n",
" ax.axvline(x=x_axis[idx], ymin=0.5*yrltv, ymax=1.5*yrltv, c='r', alpha=0.9, lw=0.5)\n",
" # 150% Minimum Loss\n",
" idx = np.where(np.array(loss) <= 1.5*min_loss)[0]\n",
" idx = idx[0] if len(idx != 0) else None\n",
" if idx is not None: ax.axvline(x=x_axis[idx], c='slateblue', alpha=0.9, label='50% above Min Loss', lw=0.5)\n",
" # 50% Maximum Loss\n",
" idx = np.where(np.array(loss) <= 0.5*max(loss))[0]\n",
" idx = idx[0] if len(idx != 0) else None\n",
" if idx is not None: ax.axvline(x=x_axis[idx], c='teal', alpha=0.9, label='50% of Max Loss', lw=0.5)\n",
" \n",
" fig.legend(bbox_to_anchor=(0.82,0.82), loc=\"upper right\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 PyTorch method:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic method for creating a DataLoader in PyTorch. Adapted from [their tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html?highlight=mnist#) and an older [notebook](https://github.com/WNoxchi/Kaukasos/blob/master/PyTorch/practice-mnist.ipynb). \n",
"- **NOTE** the [normalization values are largely arbitrary](https://discuss.pytorch.org/t/normalization-in-the-mnist-example/457/7?u=wnixalo)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# torchvision datasets are PIL.Image images of range [0,1]. Must trsfm them \n",
"# to Tensors of normalized range [-1,1]\n",
"transform = torchvision.transforms.Compose(\n",
" [torchvision.transforms.ToTensor(),\n",
" torchvision.transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# see: https://gist.github.com/kevinzakka/d33bf8d6c7f06a9d8c76d97a7879f5cb\n",
"# frm: https://github.com/pytorch/pytorch/issues/1106\n",
"\n",
"trainset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,\n",
" transform=transform)\n",
"validset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,\n",
" transform=transform)\n",
"testset = torchvision.datasets.MNIST(root=PATH, train=False, download=True,\n",
" transform=transform)\n",
"p_val = 0.15\n",
"n_val = int(p_val * len(trainset))\n",
"idxs = np.arange(len(trainset))\n",
"np.random.shuffle(idxs)\n",
"train_idxs, valid_idxs = idxs[n_val:], idxs[:n_val]\n",
"train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idxs)\n",
"valid_sampler = torch.utils.data.sampler.SequentialSampler(valid_idxs)\n",
"\n",
"trainloader = torch.utils.data.DataLoader(trainset, batch_size=bs,\n",
" sampler=train_sampler, num_workers=2)\n",
"validloader = torch.utils.data.DataLoader(validset, batch_size=bs,\n",
" sampler=valid_sampler, num_workers=2)\n",
"testloader = torch.utils.data.DataLoader(testset, batch_size=bs, num_workers=2)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"classes = [str(i) for i in range(10)]; classes"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"#### 1.1.1 Aside: DataLoaders – PyTorch & fastai:"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"- See [mnist-dataloader-issue.ipynb](https://github.com/WNoxchi/Kaukasos/blob/master/research/mnist-dataloader-issue.ipynb) for an in depth dive.\n",
"\n",
"The FastAI DataLoader shares some similarities in construction with the PyTorch one. The logic defining pytorch's DataLoader [in the PyTorch source code](https://pytorch.org/docs/master/_modules/torch/utils/data/dataloader.html#DataLoader):\n",
"```\n",
"if batch_sampler is None:\n",
" if sampler is None:\n",
" if shuffle:\n",
" sampler = RandomSampler(dataset)\n",
" else:\n",
" sampler = SequentialSampler(dataset)\n",
" batch_sampler = BatchSampler(sampler, batch_size, drop_last)\n",
"```\n",
"is the same as [that in fast.ai's](https://github.com/fastai/fastai/blob/master/fastai/dataloader.py#L24-43)\n",
"\n",
"```\n",
"if batch_sampler is None:\n",
" if sampler is None:\n",
" sampler = RandomSampler(dataset) if shuffle else SequentialSampler(dataset)\n",
" batch_sampler = BatchSampler(sampler, batch_size, drop_last)\n",
"```\n",
"\n",
"So now I'm not confused about not using a batch sampler when building a pytorch dataloader, although I see one in fastai's DataLoader –– that's because pytorch does it too."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Custom Method (for Fast AI Model Data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This loads and converts the MNIST IDX files into NumPy arrays. For MNIST data this looks to be about 45 MB for the images. This way allows for easy use of FastAI's ModelData class, and thus its (extremely useful) Learner abstraction and all other capabilities that come with it. The arrays can be loaded via: `ImageClassifierData.from_arrays(..)`"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def download_mnist(path=Path('data/mnist')):\n",
" os.makedirs(path, exist_ok=True)\n",
" urls = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',\n",
" 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',\n",
" 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',\n",
" 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',]\n",
" for url in urls:\n",
" fname = url.split('/')[-1]\n",
" if not os.path.exists(path/fname): urlretrieve(url, path/fname)\n",
"\n",
"def read_IDX(fname):\n",
" \"\"\"see: https://gist.github.com/tylerneylon/ce60e8a06e7506ac45788443f7269e40\"\"\"\n",
" with gzip.open(fname) as f:\n",
" zero, data_type, dims = struct.unpack('>HBB', f.read(4))\n",
" shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))\n",
" return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"download_mnist()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['train-images-idx3-ubyte.gz',\n",
" 't10k-labels-idx1-ubyte.gz',\n",
" 'train-labels-idx1-ubyte.gz',\n",
" 't10k-images-idx3-ubyte.gz']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fnames = [o for o in os.listdir(PATH) if 'ubyte.gz' in o] # could just use glob\n",
"fnames"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# thanks to: https://stackoverflow.com/a/14849322\n",
"trn_x_idx = [i for i,s in enumerate(fnames) if 'train-imag' in s][0]\n",
"trn_y_idx = [i for i,s in enumerate(fnames) if 'train-lab' in s][0]\n",
"# test data:\n",
"tst_x_idx = [i for i,s in enumerate(fnames) if 't10k-imag' in s][0]\n",
"tst_y_idx = [i for i,s in enumerate(fnames) if 't10k-lab' in s][0]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# load entire IDX files into memory as ndarrays\n",
"train_x_array = read_IDX(PATH/fnames[trn_x_idx])\n",
"train_y_array = read_IDX(PATH/fnames[trn_y_idx])\n",
"# test data:\n",
"test_x_array = read_IDX(PATH/fnames[tst_x_idx])\n",
"test_y_array = read_IDX(PATH/fnames[tst_y_idx])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(44.86083984375, 0.057220458984375)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# size of numpy arrays in MBs\n",
"train_x_array.nbytes / 2**20, train_y_array.nbytes / 2**20"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3 Fast AI Model Data object"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`inception_stats` have the same Normalization that the pytorch transform above uses for its dataloader. I don't do any data augmentation besides that normalization. I also use the same train/val indices from the pytorch dataloader – to ensure my pytorch model and fastai learner are working on the same data.\n",
"\n",
"Additionally in order to use pretrained models I'm going to concatenate the dataset to have 3 channels instead of 1 by copying dimensions. Another option is to forego a pretrained model and use a fresh resnet set to have only 1 input channel."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"tfms = tfms_from_stats(inception_stats, sz=sz)\n",
"# `inception_stats` are: ([0.5,0.5,0.5],[0.5,0.5,0.5])\n",
"# see: https://github.com/fastai/fastai/blob/master/fastai/transforms.py#L695"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# using same trn/val indices as pytorch dataloader\n",
"valid_x_array, valid_y_array = train_x_array[valid_idxs], train_y_array[valid_idxs]\n",
"train_x_array, train_y_array = train_x_array[train_idxs], train_y_array[train_idxs]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# stack dims for 3 channels\n",
"train_x_array = np.stack((train_x_array, train_x_array, train_x_array), axis=-1)\n",
"valid_x_array = np.stack((valid_x_array, valid_x_array, valid_x_array), axis=-1)\n",
"test_x_array = np.stack((test_x_array, test_x_array, test_x_array), axis=-1)\n",
"# convert labels to np.int8\n",
"train_y_array = train_y_array.astype(np.int8)\n",
"valid_y_array = valid_y_array.astype(np.int8)\n",
"test_y_array = test_y_array.astype(np.int8)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"model_data = ImageClassifierData.from_arrays(PATH, \n",
" (train_x_array, train_y_array), (valid_x_array, valid_y_array),\n",
" bs=bs, tfms=tfms, num_workers=2, test=(test_x_array, test_y_array))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Architecture"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I want to have a \"solid\" simple ConvNet to use throughout these experiments. This model will include a large field-of-view input conv layer followed by several conv layers. Each conv layer uses BatchNorm and Leaky ReLU (I don't know if this is better than ReLU, but it *sounds* like a good'ish idea to me). The model's head uses an AdaptiveConcat Pooling layer (Fast AI invention that concatenates two adaptive average and max pooling layers) leading to a Linear layer. This model doesn't use dropout (I'll add that if it looks like it needs it)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"class AdaptiveConcatPool2d(nn.Module):\n",
" \"\"\"fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py\"\"\"\n",
" def __init__(self, sz=None):\n",
" super().__init__()\n",
" sz = sz or (1,1)\n",
" self.ap = torch.nn.AdaptiveAvgPool2d(sz)\n",
" self.mp = torch.nn.AdaptiveAvgPool2d(sz)\n",
" def forward(self, x):\n",
" return torch.cat([self.mp(x), self.ap(x)], 1)\n",
" \n",
"class Flatten(nn.Module):\n",
" \"\"\"fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py\"\"\"\n",
" def __init__(self):\n",
" super().__init__()\n",
" def forward(self, x):\n",
" return x.view(x.size(0), -1)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"class ConvBNLayer(nn.Module):\n",
" \"\"\"conv layer with batchnorm\"\"\"\n",
" def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0):\n",
" super().__init__()\n",
" self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride)\n",
" self.bn = nn.BatchNorm2d(ch_out, momentum=0.1) # mom at default 0.1\n",
" self.lrelu = nn.LeakyReLU(0.01, inplace=True) # neg slope at default 0.01\n",
" def forward(self, x): return self.lrelu(self.bn(self.conv(x)))\n",
"\n",
"class ConvNet(nn.Module):\n",
" # see ref: https://github.com/fastai/fastai/blob/master/fastai/models/darknet.py\n",
" def __init__(self, ch_in=1):\n",
" super().__init__()\n",
" self.conv0 = ConvBNLayer(ch_in, 16, kernel_size=7, stride=1, padding=2) # large FoV Conv\n",
" self.conv1 = ConvBNLayer(16, 32)\n",
" self.conv2 = ConvBNLayer(32, 64)\n",
" self.conv3 = ConvBNLayer(64, 128)\n",
" self.neck = nn.Sequential(*[AdaptiveConcatPool2d(1), Flatten()])\n",
" self.head = nn.Sequential(*[nn.BatchNorm2d(256), \n",
" nn.Dropout(p=0.25),\n",
" nn.Linear(256, 10)]) \n",
" def forward(self, x):\n",
" x = self.conv0(x)\n",
" x = self.conv1(x)\n",
" x = self.conv2(x)\n",
" x = self.conv3(x)\n",
" x = self.neck(x)\n",
" x = self.head(x)\n",
" return F.log_softmax(x, dim=-1)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"convnet = ConvNet()"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"#### 2.0.1 Aside: Discovering AdaptiveConcatPool doubles input tensor length"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {
"hidden": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"> <ipython-input-204-3df4356516d4>(24)forward()\n",
"-> x = self.conv0(x)\n",
"(Pdb) n\n",
"> <ipython-input-204-3df4356516d4>(25)forward()\n",
"-> x = self.conv1(x)\n",
"(Pdb) n\n",
"> <ipython-input-204-3df4356516d4>(26)forward()\n",
"-> x = self.conv2(x)\n",
"(Pdb) n\n",
"> <ipython-input-204-3df4356516d4>(27)forward()\n",
"-> x = self.conv3(x)\n",
"(Pdb) n\n",
"> <ipython-input-204-3df4356516d4>(28)forward()\n",
"-> x = self.neck(x)\n",
"(Pdb) x.shape # sanity check\n",
"torch.Size([64, 128, 16, 16])\n",
"(Pdb) AdaptiveConcatPool2d(1)(x).shape\n",
"torch.Size([64, 256, 1, 1])\n",
"(Pdb) q\n"
]
}
],
"source": [
"x,y = next(iter(trainloader))\n",
"x,y = Variable(x), Variable(y)\n",
"convnet(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Fast AI Learner\n",
"\n",
"I'll use two fast.ai learners: the basic convnet defined above that the pytorch model will also use, and a resnet18. I'll also use an ImageNet-pretrained resnet18 to see if that helps at all. If `.pretrained` is not called, you will need to either use `ConvnetBuilder` or define a custom head yourself. **NOTE** also that the standard pytorch ResNet model has a 7x7 ouput pooling layer by default, which may restrict your model's performance if it's not replaced (such as with ConvnetBuilder).\n",
"\n",
"The non-pretrained learner's will need their conv layers unfrozen to train them."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(10, False, False)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_data.c, model_data.is_multi, model_data.is_reg"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"resnet_model = ConvnetBuilder(resnet18, model_data.c, model_data.is_multi, model_data.is_reg, pretrained=False)\n",
"\n",
"resnet_learner = ConvLearner(model_data, resnet_model)\n",
"custom_learner = ConvLearner.from_model_data(ConvNet(ch_in=3), model_data)\n",
"pt_res_learner = ConvLearner.pretrained(resnet18, model_data, metrics=[accuracy]) ## NOTE: metrics=[accuracy] not needed - is default"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"#### 2.1.1 Aside: Layers"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Again, the learners' conv layers are initially frozen:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"True in [[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"By default only the 'head' classification layer is trainable:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[[False, False, False, False, False, False],\n",
" [False, False, False, False],\n",
" [True, True, True, True, True, True, True, True]]"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Construct the custom learner with ConvnetBuilder in order to make it's layers iterable:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"hidden": true
},
"outputs": [
{
"ename": "TypeError",
"evalue": "'ConvBNLayer' object is not iterable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-66-e14f1b642468>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlayer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrainable\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mlayer\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mlayer_group\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mlayer_group\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mcustom_learner\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_layer_groups\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-66-e14f1b642468>\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlayer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrainable\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mlayer\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mlayer_group\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mlayer_group\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mcustom_learner\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_layer_groups\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: 'ConvBNLayer' object is not iterable"
]
}
],
"source": [
"[[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"<fastai.core.BasicModel at 0x133b41c50>"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_learner.models"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"<fastai.conv_learner.ConvnetBuilder at 0x13087b4e0>"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"resnet_learner.models"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"# custom_learner"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"hidden": true,
"scrolled": true
},
"outputs": [],
"source": [
"# resnet_learner"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"hidden": true,
"scrolled": true
},
"outputs": [],
"source": [
"# pt_res_learner"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.1.2 Recap: Models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll be comparing 4 models:\n",
"1. **`convnet`** a 1-input channel custom CNN trained in straight PyTorch\n",
"2. **`custom_learner`** a 3-input channel custom CNN trained with Fast AI\n",
"3. **`resnet_learner`** a 3-input channel fresh ResNet18 trained with Fast AI\n",
"4. **`pt_res_learner`** a 3-input channel pretrained (ImageNet) ResNet18 trained with Fast AI.\n",
"\n",
"Perhaps it'd be a good idea to replace the fresh ResNet18's input layer with a 1-channel input to compare it directly to the custom CNN. That's for a future run if I or anyone chooses to do so."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Loss Function\n",
"\n",
"[`torch.nn.CrossEntropyLoss`](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/loss.py#L701)\n",
"\n",
"Do `nn.functional.` loss functions go in the architecture, and `nn.` loss functions become criterion? [Huh, interesting. It calls `nn.functional.`](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/loss.py#L778)."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE\n",
"optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)"
]
},
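{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `nll(log_softmax) <=> CE` comment above can be written out as a short derivation. This is a sketch in notation introduced here for illustration: $z \\in \\mathbb{R}^{C}$ are the logits for one sample, $C$ is the number of classes, and $y$ is the true class index.\n",
"\n",
"$$\\log \\mathrm{softmax}(z)_k = z_k - \\log \\sum_{j=1}^{C} e^{z_j}$$\n",
"\n",
"$$\\mathrm{NLL}\\big(\\log \\mathrm{softmax}(z),\\, y\\big) = -\\log \\mathrm{softmax}(z)_y = -z_y + \\log \\sum_{j=1}^{C} e^{z_j} = \\mathrm{CE}(z, y)$$\n",
"\n",
"So when the architecture already ends in `log_softmax`, `NLLLoss` on its output should give the same per-sample value as `CrossEntropyLoss` on the raw logits."
]
},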
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Fast.ai Learners:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_learner.crit"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"resnet_learner.crit"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pt_res_learner.crit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As far as I know, training in base PyTorch is tedious, so I'll do a sanity check of it first, then do all my training with fast.ai. For reference, see §4: Training or §9.1: Train ConvNet & ConvNetMod in [this notebook](https://github.com/WNoxchi/Kaukasos/blob/master/PyTorch/practice-mnist.ipynb).\n",
"\n",
"There are ways to implement learning-rate scheduling and other advanced techniques in PyTorch, but by that point, unless you're doing it for practice or to test a new module, *that's what fast.ai is for*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1 base PyTorch"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"797"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(trainloader) # ceil(51,000 / bs) batches"
]
},
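{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of the `ceil(51,000 / bs)` comment above, assuming a batch size of 64. That value is an assumption: the training batch size isn't shown in this part of the notebook, but the test batch in §5.1 has shape `[64,1,28,28]`.\n",
"\n",
"$$\\left\\lceil \\frac{51000}{64} \\right\\rceil = \\lceil 796.875 \\rceil = 797$$\n",
"\n",
"This matches the 797 batches reported by `len(trainloader)`."
]
},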
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The train / valid phases could be improved further, for example with learning-rate scheduling and automatic saving of the best weights (see: [pytorch tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html?highlight=dataloaders#load-data)), but that's what fast.ai is for; I'll practice those in the future. Also, since the fastai library is still pending an update to PyTorch 0.4, I'm on 0.3.1, where `torch.set_grad_enabled` can't be used for inference mode. Instead I follow the advice in this [pytorch forum thread](https://discuss.pytorch.org/t/resolved-validation-loss/3501). For now:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<torch.optim.sgd.SGD at 0x7f54e1448550>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"optimizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE 1**: the criterion and optimizer need to be initialized *after* the model is sent to the GPU (if it is sent there at all). See [pytorch thread](https://discuss.pytorch.org/t/effect-of-calling-model-cuda-after-constructing-an-optimizer/15165).\n",
"\n",
"**NOTE 2**: `Variable.volatile = True` can only be set on a leaf Variable, i.e. immediately after the Variable is created. It is used here so that the validation pass does *not* affect the gradients. I got the error when trying to set `.volatile = True` after sending the validation data to the GPU (`torch.FloatTensor` $\\rightarrow$ `torch.cuda.FloatTensor`). See [pytorch thread](https://discuss.pytorch.org/t/runtimeerror-volatile-can-only-be-set-on-leaf-variables/15338/2?u=wnixalo)."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"def train(model=None, crit=None, trainloader=None, valloader=None, num_epochs=1, verbose=True):\n",
"    # if verbose:\n",
"    #     displays = 5\n",
"    #     display_step = max(len(dataloader) // displays, 1)\n",
"    t0 = time.time()\n",
"\n",
"    dataloaders = {'train':trainloader}\n",
"    if valloader: dataloaders['valid'] = valloader\n",
"\n",
"#     model.to('cuda:0' if torch.cuda.is_available() else 'cpu') # pytorch >= 0.4\n",
"    to_gpu(model)\n",
"    # fall back to NLLLoss if no criterion is passed in\n",
"    if crit is None: crit = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE\n",
"    # optimize the parameters of the model passed in (not the global convnet)\n",
"    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)\n",
"\n",
"    # epoch w/ train & val phases\n",
"    for epoch in range(num_epochs):\n",
"        print(f'Epoch {epoch+1}/{num_epochs}\\n{\"-\"*10}')\n",
"\n",
"        for phase in dataloaders:\n",
"            running_loss = 0.0\n",
"            running_correct = 0\n",
"\n",
"            for i,datum in enumerate(dataloaders[phase]):\n",
"                inputs, labels = datum\n",
"                inputs, labels = torch.autograd.Variable(inputs), torch.autograd.Variable(labels)\n",
"\n",
"                # zero param gradients\n",
"                optimizer.zero_grad()\n",
"\n",
"                # (forward) track history if train\n",
"                # with torch.set_grad_enabled(phase=='train'): # pytorch >= 0.4\n",
"                # (for pytorch >= 0.4, indent everything from here to optimizer.step() under that context)\n",
"                if phase == 'valid': # pytorch 0.3.1\n",
"                    inputs.volatile=True\n",
"                    labels.volatile=True\n",
"                # send data to gpu\n",
"                inputs, labels = to_gpu(inputs), to_gpu(labels) # pytorch < 0.4\n",
"                outputs = model(inputs)\n",
"                loss = crit(outputs, labels)\n",
"                _, preds= torch.max(outputs, 1) # for accuracy metric\n",
"\n",
"                # backward & optimize if train\n",
"                if phase == 'train':\n",
"                    loss.backward()\n",
"                    optimizer.step()\n",
"\n",
"                # stats\n",
"#                 pdb.set_trace()\n",
"                running_loss += loss.data[0]\n",
"                running_correct += torch.sum(preds == V(labels.data)) # wrap in V; pytorch 0.3.1\n",
"\n",
"            epoch_loss = running_loss / len(dataloaders[phase])\n",
"#             if phase == 'valid': pdb.set_trace()\n",
"            # NOTE: len(dataloaders[phase]) is the number of batches, not the number of samples\n",
"            epoch_acc = float(running_correct.double() / len(dataloaders[phase])) # ? pytorch 0.3.1 reqs float conversion?\n",
"#             pdb.set_trace()\n",
"            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')\n",
"\n",
"    time_elapsed = time.time() - t0\n",
"    print(f'Training Time {num_epochs} Epochs: {time_elapsed:.3f}s')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Manual PyTorch train / val training phases. See: [pytorch tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html?highlight=validation#training-the-model)\n",
"\n",
"*(forward) track history only if in train:*\n",
"```\n",
"with torch.set_grad_enabled(phase == 'train'):\n",
"    outputs = model(inputs)\n",
"    _, preds = torch.max(outputs, 1)\n",
"    loss = criterion(outputs, labels)\n",
"```\n",
"*backward + optimize only if in training phase*\n",
"```\n",
"    if phase == 'train':\n",
"        loss.backward()\n",
"        optimizer.step()\n",
"```"
]
},
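{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE**: I think I'm doing something wrong with the validation phase. The printed accuracies are not real accuracies: `epoch_acc` divides the correct count by `len(dataloaders[phase])`, which is the number of batches, not the number of samples. Multiplying the GPU-run values back out (0.2334 × 797 ≈ 186 for train; 0.4610 × 141 ≈ 65 for valid, assuming 9,000 validation images in batches of 64) gives counts below 256, which suggests the running count may also be wrapping around as an 8-bit integer. The loss is averaged per batch, so it is not affected in the same way. References on saving: [Saving](https://discuss.pytorch.org/t/saving-and-loading-a-model-in-pytorch/2610/7?u=wnixalo), [PyTorch Docs on Saving](https://pytorch.org/docs/master/notes/serialization.html)."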
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/1\n",
"----------\n",
"train Loss: 0.1861 Acc: 0.2334\n",
"valid Loss: 0.0878 Acc: 0.4610\n",
"Training Time 1 Epochs: 17.535s\n"
]
}
],
"source": [
"train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Previous run on CPU:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/1\n",
"----------\n",
"train Loss: 0.1932 Acc: 0.0540\n",
"valid Loss: 0.0766 Acc: 0.7518\n",
"Training Time 1 Epochs: 230.497s\n"
]
}
],
"source": [
"# train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"torch.save(convnet.state_dict(), 'convnet_mnist_base.pth')"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"convnet.load_state_dict(torch.load('convnet_mnist_base.pth'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2 with Fast AI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 4.2.1 Finding Learning Rates\n",
"\n",
"To keep things simple, I won't be using [1-Cycle](http://forums.fast.ai/t/the-1cycle-policy-an-experiment-that-investigate-super-convergence-phenomenon-described-in-leslie-smiths-research/14737), [Progressive Resizing](http://www.fast.ai/2018/04/30/dawnbench-fastai/#imagenet), or much in the way of [Cyclical Learning Rates](https://arxiv.org/abs/1506.01186). That could be a topic for later runs."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('int8')"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_data.trn_ds.get1item(0)[1].dtype"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d071c93c053f4c3281953b9cfce1c7e5",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 84%|████████▍ | 673/797 [00:17<00:03, 37.97it/s, loss=1.06] "
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"custom_learner.lr_find()\n",
"custom_learner.sched.plot()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"custom_learner.sched.plot_lr()"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# next(iter(model_data.get_dl(model_data.trn_ds, False)))"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a32d11af2ced4545ac7dd8303ad23f6a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 82%|████████▏ | 653/797 [00:14<00:03, 44.88it/s, loss=2.78] "
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"resnet_learner.lr_find()\n",
"resnet_learner.sched.plot()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3674e09939a94f9498f87922ff980c74",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 82%|████████▏ | 653/797 [00:14<00:03, 45.07it/s, loss=3.6] "
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\r",
" 82%|████████▏ | 653/797 [00:25<00:05, 25.36it/s, loss=3.6]"
]
}
],
"source": [
"pt_res_learner.lr_find()\n",
"pt_res_learner.sched.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll use `1e-2` as the `lr` for all of them."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"lrs = 1e-2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 4.2.2 `custom_learner`"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[True, True, True, True, True, True]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# checking all conv layers are being trained:\n",
"[layer.trainable for layer in custom_learner.models.get_layer_groups()]"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2439d63c387f45ed8be876294f4c041c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.088194 0.068054 0.980333 \n",
"CPU times: user 20.2 s, sys: 7.77 s, total: 28 s\n",
"Wall time: 22.8 s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.06805]), 0.9803333334392972]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time custom_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_metrics(custom_learner)"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"#### 4.2.2.1 *Aside*: Fast.ai Automatic LR scaling:"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Just noticed this very useful feature. Even at very stripped-down settings, fastai still 'revs' the learning rate up at the start of training and back down before the end:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"custom_learner.sched.plot_lr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 4.2.3 `resnet_learner`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[False, False, True]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"resnet_learner.unfreeze()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[True, True, True]"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "61f5573668f64e8d9e40eaa80d2798be",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.087478 0.05272 0.983444 \n",
"CPU times: user 39.5 s, sys: 15.5 s, total: 55.1 s\n",
"Wall time: 49.7 s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.05272]), 0.9834444443914625]"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time resnet_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_metrics(resnet_learner)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 4.2.4 `pt_res_learner`"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dcff30fd68cf48e2a0f98180960a23fa",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.554677 0.58673 0.891556 \n",
"CPU times: user 19.6 s, sys: 6.02 s, total: 25.6 s\n",
"Wall time: 20.4 s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.58673]), 0.8915555556085375]"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# only training classifier head\n",
"%time pt_res_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5546770245500905"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# min(pt_res_learner.sched.losses)\n",
"pt_res_learner.sched.losses[-1]"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.5867299038039313]"
]
},
"execution_count": 200,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pt_res_learner.sched.val_losses"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_metrics(pt_res_learner)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1 PyTorch convnet"
]
},
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [],
"source": [
"x,y = next(iter(testloader)) # shape: ([64,1,28,28]; [64])\n",
"out = convnet(V(x)) # shape: ([64, 10])"
]
},
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [],
"source": [
"_, preds = torch.max(out.data, 1)"
]
},
{
"cell_type": "code",
"execution_count": 184,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(7, 7), (2, 2), (1, 1), (0, 0), (4, 4), (1, 1), (4, 4), (9, 9), (5, 5)]"
]
},
"execution_count": 184,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(zip(preds[:9], y[:9]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cool, even with that little training it's able to get a lot right."
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [],
"source": [
"def test_pytorch(model, dataloader):\n",
"    \"\"\"evaluation script. Returns tuple: (list of predictions, ratio correct)\"\"\"\n",
"    correct = 0\n",
"    total = 0\n",
"\n",
"    predictions = []\n",
"\n",
"    for batch in dataloader:\n",
"        images, labels = batch ## could also go w: testloader.dataset.test_labels\n",
"        images, labels = to_gpu(images), to_gpu(labels)\n",
"        outputs = model(Variable(images)) # use the model passed in, not the global convnet\n",
"        _, preds = torch.max(outputs.data, 1)\n",
"        total += labels.size(0)\n",
"        correct += (preds == labels).sum()\n",
"\n",
"        predictions.extend(preds)\n",
"\n",
"    return predictions, correct/total"
]
},
{
"cell_type": "code",
"execution_count": 364,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9744444444444444"
]
},
"execution_count": 364,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"preds, val_acc = test_pytorch(convnet, validloader)\n",
"val_acc"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9783"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"preds, test_acc = test_pytorch(convnet, testloader)\n",
"test_acc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"97-98% accuracy on test set. Just checking:"
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(7, 7), (2, 2), (1, 1), (0, 0), (4, 4), (1, 1), (4, 4), (9, 9), (5, 5)]"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"_,y = next(iter(testloader))\n",
"list(zip(preds[:9], y[:9]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2 `custom_learner`"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9819"
]
},
"execution_count": 191,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get output predictions\n",
"log_preds = custom_learner.predict(is_test=True)\n",
"# compare top-scoring preds against dataset\n",
"np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"#### 5.2.1 Aside: (untrained) `custom_learner` Sanity Checks:"
]
},
{
"cell_type": "code",
"execution_count": 195,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"## 2-3 ways to do the same thing\n",
"# log_preds_dl = custom_learner.predict_dl(testloader) # make sure the number of channels is correct before trying this; haven't tested\n",
"log_preds_dl = custom_learner.predict_dl(model_data.test_dl)\n",
"log_preds = custom_learner.predict(is_test=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"I had some confusion here. You *do* take the max as the top prediction; to get the actual probabilities, you exponentiate, since the output is a log softmax."
]
},
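{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"To spell that out (a sketch, with symbols introduced here for illustration): let $\\ell_k$ be the log-softmax output for class $k$ and $z$ the underlying logits over $C$ classes. Then the class probabilities are\n",
"\n",
"$$p_k = e^{\\ell_k} = \\frac{e^{z_k}}{\\sum_{j=1}^{C} e^{z_j}}, \\qquad \\sum_{k=1}^{C} p_k = 1$$\n",
"\n",
"Because $\\exp$ is strictly increasing, $\\arg\\max_k \\ell_k = \\arg\\max_k p_k$, so taking `np.argmax` directly on the log predictions picks the same class as taking it on the probabilities."
]
},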
{
"cell_type": "code",
"execution_count": 196,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"((10000, 10), (10000, 10))"
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_preds_dl.shape, log_preds.shape # same shape"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([ True])"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.unique(log_preds_dl == log_preds) # same values"
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" 7\n",
" 2\n",
" 1\n",
"⋮ \n",
" 4\n",
" 5\n",
" 6\n",
"[torch.LongTensor of size 10000]"
]
},
"execution_count": 232,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"0.0892"
]
},
"execution_count": 236,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.equal(testloader.dataset.test_labels, np.argmax(log_preds, axis=1)).sum() / len(testloader.dataset.test_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"The untrained CNN gets sub-random (< 10%) accuracy. No surprise: it almost always guesses '5', and only occasionally '4':"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"({4, 5}, array([5, 5, 5, ..., 5, 5, 5]))"
]
},
"execution_count": 242,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(np.argmax(log_preds, axis=1)), np.argmax(log_preds, axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.3 `resnet_learner`"
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9863"
]
},
"execution_count": 192,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_preds = resnet_learner.predict(is_test=True)\n",
"np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.4 `pt_res_learner`"
]
},
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8923"
]
},
"execution_count": 193,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_preds = pt_res_learner.predict(is_test=True)\n",
"np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Further Training & Testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Seeing how far I can go (simply) before overfitting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.1 `custom_learner`:"
]
},
{
"cell_type": "code",
"execution_count": 273,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0f4832822b874563bd6aa4e31f06b1f0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=2), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.067517 0.049205 0.986222 \n",
" 1 0.050665 0.043011 0.987444 \n",
"CPU times: user 41.1 s, sys: 15.1 s, total: 56.1 s\n",
"Wall time: 45.4 s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.04301]), 0.9874444445504083]"
]
},
"execution_count": 273,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# prev trn/val loss & valacc: 0.088194 0.068054 0.980333 \n",
"%time custom_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 336,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "aa0d00dbf7b74e7d8ac02bb897802266",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=4), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.043123 0.036729 0.989444 \n",
" 1 0.043052 0.033036 0.989778 \n",
" 2 0.033544 0.030643 0.990889 \n",
" 3 0.043682 0.030089 0.990556 \n",
"CPU times: user 1min 22s, sys: 30.8 s, total: 1min 53s\n",
"Wall time: 1min 31s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.03009]), 0.9905555556615193]"
]
},
"execution_count": 336,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time custom_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 361,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_loss(custom_learner, val=custom_learner.sched.val_losses)"
]
},
{
"cell_type": "code",
"execution_count": 338,
"metadata": {},
"outputs": [],
"source": [
"custom_learner.save('customcnn_mnist_acc_99056')"
]
},
{
"cell_type": "code",
"execution_count": 339,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9892"
]
},
"execution_count": 339,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_preds = custom_learner.predict(is_test=True)\n",
"np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I think that's good enough for an MNIST warm up."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.2 `resnet_learner`:"
]
},
{
"cell_type": "code",
"execution_count": 342,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f1adc23e26b74dfdaa0e8c858a41e77f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=2), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.063101 0.038616 0.988444 \n",
" 1 0.041075 0.034616 0.990222 \n",
"CPU times: user 1min 20s, sys: 30.1 s, total: 1min 50s\n",
"Wall time: 1min 39s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.03462]), 0.9902222221692403]"
]
},
"execution_count": 342,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time resnet_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 343,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e10107b5aba643c9bd83aac54ffc8742",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=4), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.039452 0.030857 0.990667 \n",
" 1 0.032786 0.028692 0.992111 \n",
" 2 0.024677 0.029187 0.991778 \n",
" 3 0.02215 0.028211 0.991333 \n",
"CPU times: user 2min 39s, sys: 1min 1s, total: 3min 41s\n",
"Wall time: 3min 19s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.02821]), 0.9913333334392972]"
]
},
"execution_count": 343,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time resnet_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 360,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_loss(resnet_learner, val=resnet_learner.sched.val_losses)"
]
},
{
"cell_type": "code",
"execution_count": 345,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9931"
]
},
"execution_count": 345,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_preds = resnet_learner.predict(is_test=True)\n",
"np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.3 `pt_res_learner`:"
]
},
{
"cell_type": "code",
"execution_count": 346,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3386eb97d80943b294383c1c63790887",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=2), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.499828 0.521596 0.908778 \n",
" 1 0.456638 0.385642 0.914556 \n",
"CPU times: user 39.5 s, sys: 12.2 s, total: 51.7 s\n",
"Wall time: 40.9 s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.38564]), 0.9145555555025736]"
]
},
"execution_count": 346,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time pt_res_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 347,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "053282fc8c4b45d3b822fa0ebb3ae8f3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(IntProgress(value=0, description='Epoch', max=4), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch trn_loss val_loss accuracy \n",
" 0 0.450119 0.430365 0.917333 \n",
" 1 0.435357 0.407292 0.922667 \n",
" 2 0.412722 0.429438 0.923556 \n",
" 3 0.411739 0.334759 0.925889 \n",
"CPU times: user 1min 18s, sys: 24.3 s, total: 1min 43s\n",
"Wall time: 1min 21s\n"
]
},
{
"data": {
"text/plain": [
"[array([0.33476]), 0.9258888889948527]"
]
},
"execution_count": 347,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time pt_res_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)"
]
},
{
"cell_type": "code",
"execution_count": 359,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_loss(pt_res_learner, val=pt_res_learner.sched.val_losses)"
]
},
{
"cell_type": "code",
"execution_count": 365,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9236"
]
},
"execution_count": 365,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_preds = pt_res_learner.predict(is_test=True)\n",
"np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Comparisons & Thoughts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With single-epoch test set accuracies already in the 90%s, I'm not sure how useful a standard-regime baseline with MNIST will be.\n",
"\n",
"What has been *extremely* valuable was the practice setting this up has been. With pytorch, with fastai callbacks, with data processing, and a lot else. This'll hopefully make the next experiments with CIFAR-10 and ImageNet much smoother and to the point.\n",
"\n",
"### **Stats**:\n",
"\n",
"- The custom CNN model **`convnet`** in a simple pytorch training loop achieved a **97.83**% test - accuracy after 1 epoch. I think I wrote the validation procedure wrong (current Pytorch documentation is for version 0.4; I'm working with 0.3.1), nonetheless a val loss of **0.0878** was recorded after 1 epoch.\n",
"\n",
"- The custom CNN learner **`custom_learner`** achieved a **98.92**% test accuracy after 7 epochs of training, **98.19**% after only 1. Validation Loss (ep 7,1): **0.030089, 0.068054**\n",
"\n",
"- The fresh ResNet18 learner **`resnet_learner`** achieved a **99.31**% test accuracy after 7, and **98.63**% after 1. Validation Loss (ep 7,1): **0.028211, 0.05272**\n",
"\n",
"- The pretrained ResNet18 learner **`pt_res_learner`** (training only the classifier head) achieved a **92.36**% test accuracy after 7, and **89.23**% after 1. Validation Loss (ep 7,1): **0.334759, 0.58673**\n",
"\n",
"No model overfit, and only the fresh ResNet18 learner had a training loss better than validation. All learners appeared to be beginning to bottom-out in validation loss roughly around **0.3**, maintaining the default Cosine Annealing learning-rate schedule fastai uses.\n",
"\n",
"In looking up what default LR scheduler fastai uses: apparently *fastai has a built-in [`SaveBestModel` callback](https://github.com/fastai/fastai/tree/master/fastai/sgdr.py#L331)* in sgdr.py.\n",
"\n",
"|model/learner|1-epoch val loss|7-epoch val loss|1-epoch test accuracy|7-epoch test accuracy|\n",
"|-|:-|-|-|-|\n",
"|`convnet`|0.0878|–|97.83%|–|\n",
"|`custom_learner`|0.068054|0.030089|98.19%|98.92%|\n",
"|`resnet_learner`|**0.05272**|**0.028211**|**98.63**%|**99.31**%|\n",
"|`pt_res_learner`|0.58673|0.334759|89.23%|92.36%|"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}