@nervanazoo
Created February 9, 2016 21:24
neon all cnn cifar10 implementation

Model

This is an implementation of a deep convolutional neural network inspired by the paper "Striving for Simplicity: The All Convolutional Net" by Springenberg, Dosovitskiy, Brox, and Riedmiller (2014).
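
To make the architecture easier to follow, here is a minimal sketch that traces the feature-map shape through the Conv layers listed in cifar10_allcnn.py. The kernel sizes, channel counts, padding, and strides are copied from that script; the helper names (`conv_out`, `trace_shapes`) are illustrative, and the floor-based output-size formula is the usual convolution convention, assumed here rather than taken from the neon source.

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution (assumed floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1


# (kernel, output channels, padding, stride) for each Conv layer in the script.
# Dropout layers are omitted because they do not change the shape.
CONV_SPECS = [
    (3, 96, 1, 1), (3, 96, 1, 1), (3, 96, 1, 2),
    (3, 192, 1, 1), (3, 192, 1, 1), (3, 192, 1, 2),
    (3, 192, 1, 1), (1, 192, 0, 1), (1, 16, 0, 1),
]


def trace_shapes(size=32, channels=3):
    """Return the (channels, height, width) shape after the input and each Conv."""
    shapes = [(channels, size, size)]
    for kernel, out_channels, pad, stride in CONV_SPECS:
        size = conv_out(size, kernel, pad, stride)
        shapes.append((out_channels, size, size))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)
```

Under these assumptions the last Conv layer produces a 16 x 8 x 8 map, so `Pooling(8, op="avg")` reduces it to one value per channel, and the 16 channels line up with the `nclass=16` passed to `ArrayIterator`.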

Model script

The model run script is included below (cifar10_allcnn.py).

Trained weights

The trained weights file can be downloaded from AWS (cifar10_allcnn_e350.p)

Performance

This model achieves 89.5% top-1 accuracy on the validation data set, using ZCA-whitened, global-contrast-normalized data without crops or flips. This is the same performance we achieve running the same model configuration and data through Caffe.
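
For readers unfamiliar with the preprocessing named above, the following is a minimal NumPy sketch of global contrast normalization followed by ZCA whitening. It illustrates the general technique only; the function names, the `eps` constants, and the exact formulas are assumptions, and whether `load_cifar10` uses the same formulas or constants is not confirmed here.

```python
import numpy as np


def global_contrast_normalize(X, scale=1.0, eps=1e-8):
    """Give each row (one flattened image) zero mean and unit standard deviation."""
    X = X - X.mean(axis=1, keepdims=True)
    norms = np.sqrt((X ** 2).mean(axis=1, keepdims=True))
    return scale * X / np.maximum(norms, eps)


def zca_fit(X, eps=1e-2):
    """Estimate the feature mean and a ZCA whitening matrix from training rows."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eps regularizes small eigenvalues so the inverse square root stays bounded.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W


def zca_apply(X, mean, W):
    """Whiten rows of X with a previously fitted mean and matrix."""
    return (X - mean) @ W


# Toy data standing in for flattened images (500 samples, 48 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 48))

X_gcn = global_contrast_normalize(X)
mean, W = zca_fit(X_gcn)
X_white = zca_apply(X_gcn, mean, W)
```

In practice the mean and whitening matrix would be fitted on the training images and then applied unchanged to the validation images.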

Instructions

This script was tested with the neon commit SHA e7ab2c2e2.
Make sure that your local repo is synced to this commit and run the installation procedure before proceeding.

If neon is installed into a virtualenv, make sure that it is activated before running the commands below. Also, the commands below use the GPU backend by default so add -b cpu if you are running on a system without a compatible GPU.

To test the model performance on the validation data set use the following command:

python cifar10_allcnn.py --model_file cifar10_allcnn_e350.p -eval 1

To train the model from scratch for 350 epochs, use the command:

python cifar10_allcnn.py -b gpu -e 350 -s cifar10_allcnn_trained.p
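
Over those 350 epochs the script's optimizer is configured with `Schedule(step_config=[200, 250, 300], change=0.1)` and a default initial learning rate of 0.05. The sketch below shows what such a step schedule would look like, assuming the rate is multiplied by `change` once for each listed epoch that has been reached; the function name and the exact epoch at which each drop takes effect are assumptions, not taken from the neon source.

```python
import bisect


def stepped_learning_rate(epoch, base_lr=0.05, step_epochs=(200, 250, 300), change=0.1):
    """Learning rate at a given 0-based epoch under an assumed step schedule."""
    # Number of step boundaries that are <= epoch.
    steps_passed = bisect.bisect_right(step_epochs, epoch)
    return base_lr * change ** steps_passed


for epoch in (0, 199, 200, 250, 300, 349):
    print(epoch, stepped_learning_rate(epoch))
```

Under this reading, training runs at 0.05 for the first 200 epochs and ends at roughly 0.00005 for the final 50.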

Additional options are available for features such as saving checkpoints and displaying logging information; use the --help option for details.

Benchmarks

Machine and GPU specs:

Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
Ubuntu 14.04.2 LTS
GPU: GeForce GTX TITAN X
CUDA Driver Version 7.0

The run times for the fprop and bprop passes are given in the table below. The same model configuration is used in neon and Caffe. In each framework, 50 iterations are timed and only the mean value is reported.

-------------------------------------------
|    Func     | neon (mean) | caffe (mean)|
-------------------------------------------
| fprop       |    14 ms    |    19 ms    |
| bprop       |    34 ms    |    65 ms    |
| update      |     3 ms    |    *        | 
| iteration   |    51 ms    |    85 ms    |
-------------------------------------------
* caffe update operation may be included in bprop or iteration time but is not individually timed.
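
The measurement described above (mean over 50 timed iterations) can be sketched with a small generic harness. This is not the benchmarking code used for the table; `mean_time_ms` and the warm-up count are hypothetical, and on a GPU backend the timed callable would also need to wait for the device to finish before returning.

```python
import time


def mean_time_ms(fn, iterations=50, warmup=5):
    """Mean wall-clock time of fn() in milliseconds over `iterations` calls."""
    # Warm-up calls are an assumption; they keep one-time setup out of the mean.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / iterations


# Consistency check on the neon column of the table (values in ms).
neon_ms = {"fprop": 14, "bprop": 34, "update": 3}
neon_iteration_ms = sum(neon_ms.values())

print(neon_iteration_ms)
print(mean_time_ms(lambda: sum(range(1000)), iterations=10, warmup=1))
```

The neon column is internally consistent: fprop, bprop, and update sum to the reported 51 ms per iteration. The Caffe fprop and bprop times sum to 84 ms against a reported 85 ms, which fits the footnote about the untimed update step, though rounding could also account for the difference.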

Citation

Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller.
Striving for Simplicity: The All Convolutional Net. 
arXiv preprint arXiv:1412.6806, 2014.

cifar10_allcnn.py

#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2015 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------
"""
AllCNN style convnet on CIFAR10 data.
Reference:
Striving for Simplicity: the All Convolutional Net `[Springenberg2015]`_
.. _[Springenberg2015]: http://arxiv.org/pdf/1412.6806.pdf
"""
from neon.initializers import Gaussian
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.layers import Conv, Dropout, Activation, Pooling, GeneralizedCost
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import ArrayIterator, load_cifar10
from neon.callbacks.callbacks import Callbacks
from neon.util.argparser import NeonArgparser
# parse the command line arguments
parser = NeonArgparser(__doc__)
parser.add_argument("--learning_rate", default=0.05, help="initial learning rate")
parser.add_argument("--weight_decay", default=0.001, help="weight decay")
parser.add_argument('--deconv', action='store_true',
                    help='save visualization data from deconvolution')
args = parser.parse_args()
# hyperparameters
num_epochs = args.epochs
(X_train, y_train), (X_test, y_test), nclass = load_cifar10(path=args.data_dir,
                                                            normalize=False,
                                                            contrast_normalize=True,
                                                            whiten=True)
# really 10 classes, pad to nearest power of 2 to match conv output
train_set = ArrayIterator(X_train, y_train, nclass=16, lshape=(3, 32, 32))
valid_set = ArrayIterator(X_test, y_test, nclass=16, lshape=(3, 32, 32))
init_uni = Gaussian(scale=0.05)
opt_gdm = GradientDescentMomentum(learning_rate=float(args.learning_rate), momentum_coef=0.9,
                                  wdecay=float(args.weight_decay),
                                  schedule=Schedule(step_config=[200, 250, 300], change=0.1))
relu = Rectlin()
conv = dict(init=init_uni, batch_norm=False, activation=relu)
convp1 = dict(init=init_uni, batch_norm=False, activation=relu, padding=1)
convp1s2 = dict(init=init_uni, batch_norm=False, activation=relu, padding=1, strides=2)
layers = [Dropout(keep=.8),
          Conv((3, 3, 96), **convp1),
          Conv((3, 3, 96), **convp1),
          Conv((3, 3, 96), **convp1s2),
          Dropout(keep=.5),
          Conv((3, 3, 192), **convp1),
          Conv((3, 3, 192), **convp1),
          Conv((3, 3, 192), **convp1s2),
          Dropout(keep=.5),
          Conv((3, 3, 192), **convp1),
          Conv((1, 1, 192), **conv),
          Conv((1, 1, 16), **conv),
          Pooling(8, op="avg"),
          Activation(Softmax())]
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
model = Model(layers=layers)
if args.model_file:
    import os
    assert os.path.exists(args.model_file), '%s not found' % args.model_file
    model.load_params(args.model_file)
# configure callbacks
callbacks = Callbacks(model, eval_set=valid_set, **args.callback_args)
if args.deconv:
    callbacks.add_deconv_callback(train_set, valid_set)
model.fit(train_set, optimizer=opt_gdm, num_epochs=num_epochs, cost=cost, callbacks=callbacks)
print('Misclassification error = %.1f%%' % (model.eval(valid_set, metric=Misclassification())*100))

ferzik85 commented Oct 20, 2016

Could someone explain in more detail why we need to pad to the nearest power of 2 to match the conv output? I mean, why not use
train_set = ArrayIterator(X_train, y_train, nclass=10, lshape=(3, 32, 32))
valid_set = ArrayIterator(X_test, y_test, nclass=10, lshape=(3, 32, 32))
and replace Conv((1, 1, 16), **conv) with Conv((1, 1, 10), **conv)?

@guruprad

I have the same question as @ferzik85