nervanazoo/cifar10_allcnn.py

## readme.md

      
## Model

This is an implementation of a deep convolutional neural network model inspired by the paper
Springenberg, Dosovitskiy, Brox, Riedmiller 2014.

### Model script

The model run script is included below
(cifar10_allcnn.py).
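
As a rough aid for reading the layer list in that script, the sketch below traces the feature-map shapes through the network using the standard output-size formula `(size + 2 * padding - kernel) // stride + 1`. The helper names (`conv_out`, `trace_shapes`) are illustrative and not part of neon; the layer parameters are copied from the script, and the Dropout layers are omitted because they do not change shapes.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling window (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, padding, stride, output channels), mirroring the Conv layers in the script
CONV_LAYERS = [
    (3, 1, 1, 96), (3, 1, 1, 96), (3, 1, 2, 96),
    (3, 1, 1, 192), (3, 1, 1, 192), (3, 1, 2, 192),
    (3, 1, 1, 192), (1, 0, 1, 192), (1, 0, 1, 16),
]


def trace_shapes(size=32):
    """Return the (channels, height, width) shape after each conv layer and the final pool."""
    shapes = []
    channels = 3
    for kernel, padding, stride, channels in CONV_LAYERS:
        size = conv_out(size, kernel, padding, stride)
        shapes.append((channels, size, size))
    # Pooling(8, op="avg"): the 8x8 window covers the whole remaining map,
    # so the result is one value per channel whatever the pooling stride is.
    size = conv_out(size, kernel=8, stride=8)
    shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)
```

Under these assumptions the two stride-2 convolutions halve the 32x32 input to 16x16 and then 8x8, and the final average pool leaves 16 values per image. That matches the `nclass=16` padding used for the data iterators in the script.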

### Trained weights

The trained weights file can be downloaded from AWS
(cifar10_allcnn_e350.p)

### Performance

This model achieves 89.5% top-1 accuracy on the validation data set, using ZCA-whitened,
global-contrast-normalized data without crops or flips.  This matches the accuracy we obtain running the
same model configuration and data through Caffe.
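
For readers unfamiliar with the preprocessing named above, here is a minimal NumPy sketch of global contrast normalization followed by ZCA whitening. It illustrates the general technique only; it is not the code behind neon's `load_cifar10`, and the function names, the `scale` parameter, and the `eps` values are assumptions chosen for the example.

```python
import numpy as np


def global_contrast_normalize(X, scale=1.0, eps=1e-8):
    """Give each row (one flattened image) zero mean and an RMS value of `scale`."""
    X = X - X.mean(axis=1, keepdims=True)
    rms = np.sqrt((X ** 2).mean(axis=1, keepdims=True))
    return scale * X / np.maximum(rms, eps)


def zca_whiten(X, eps=1e-5):
    """Decorrelate the features of X (rows are samples) with a ZCA transform.

    Returns the whitened data plus the mean and matrix needed to apply the
    same transform to held-out data.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eps keeps near-zero eigenvalues from blowing up the inverse square root
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, mean, W


# Toy stand-in for flattened images: 500 samples with 12 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
X_gcn = global_contrast_normalize(X)
X_white, mean, W = zca_whiten(X_gcn)
print(X_white.shape)
```

In practice the mean and whitening matrix would be fitted on the training set and reused unchanged on the validation set, for example `(X_valid_gcn - mean) @ W`.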

### Instructions

This script was tested with the
neon commit SHA e7ab2c2e2.

Make sure that your local repo is synced to this commit and run the
installation procedure before proceeding.
If neon is installed into a virtualenv, make sure that it is activated before running the commands below.  Also, the commands below use the GPU backend by default so add -b cpu if you are running on a system without a compatible GPU.

To test the model performance on the validation data set, use the following command:

    python cifar10_allcnn.py --model_file cifar10_allcnn_e350.p -eval 1

To train the model from scratch for 350 epochs, use the command:

    python cifar10_allcnn.py -b gpu -e 350 -s cifar10_allcnn_trained.p

Additional options are available to add features like saving checkpoints and displaying logging information;
use the --help option for details.
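
Over a 350-epoch run, the script's optimizer uses `Schedule(step_config=[200, 250, 300], change=0.1)` with an initial learning rate of 0.05. The sketch below shows one plausible reading of that configuration: the rate is multiplied by `change` each time an epoch listed in `step_config` is reached. The function name and the choice to treat the boundary epoch as inclusive are assumptions made for illustration, not a statement of neon's exact behavior.

```python
def stepped_learning_rate(epoch, base_lr=0.05, step_config=(200, 250, 300), change=0.1):
    """Learning rate at `epoch`, assuming one multiplicative step per boundary reached."""
    steps_passed = sum(1 for boundary in step_config if epoch >= boundary)
    return base_lr * change ** steps_passed


# Print the rate on either side of each assumed boundary
for epoch in (0, 199, 200, 250, 300, 349):
    print(epoch, stepped_learning_rate(epoch))
```

Under this reading, the rate stays at its initial value for the first 200 epochs and has been reduced by a factor of 1000 for the last 50.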
### Benchmarks

Machine and GPU specs:
Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
Ubuntu 14.04.2 LTS
GPU: GeForce GTX TITAN X
CUDA Driver Version 7.0

The run times for the fprop and bprop passes are given in the table below.  The same model configuration is used in neon and Caffe.  50 iterations are timed in each framework and only the mean value is reported.
-------------------------------------------
|    Func     | neon (mean) | caffe (mean)|
-------------------------------------------
| fprop       |    14 ms    |    19 ms    |
| bprop       |    34 ms    |    65 ms    |
| update      |     3 ms    |    *        | 
| iteration   |    51 ms    |    85 ms    |
-------------------------------------------
* caffe update operation may be included in bprop or iteration time but is not individually timed.
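
The measurement procedure described above (time 50 iterations, report the mean) can be sketched as follows. This is a generic harness, not the code used to produce the table; `mean_runtime_ms`, `dummy_step`, and the warm-up count are hypothetical names and choices for the example.

```python
import time


def mean_runtime_ms(fn, iterations=50, warmup=5):
    """Call `fn` repeatedly and return its mean wall-clock time in milliseconds."""
    # Warm-up calls are an assumption of this sketch; they keep one-time
    # setup costs out of the timed loop.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / iterations


def dummy_step():
    """Stand-in workload; a real benchmark would run fprop, bprop or update here."""
    sum(i * i for i in range(10000))


print("mean: %.3f ms" % mean_runtime_ms(dummy_step))
```

When timing GPU work, the device has to be synchronized before the clock is read, otherwise only the kernel launch is measured. Note that in the table the neon iteration time equals the sum of its three phases (14 + 34 + 3 = 51 ms).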

### Citation

Jost Tobias Springenberg,  Alexey Dosovitskiy, Thomas Brox and Martin A. Riedmiller. 
Striving for Simplicity: The All Convolutional Net. 
arXiv preprint arXiv:1412.6806, 2014.


## cifar10_allcnn.py
#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2015 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

"""
AllCNN style convnet on CIFAR10 data.
Reference:
    Striving for Simplicity: the All Convolutional Net `[Springenberg2015]`_
..  _[Springenberg2015]: http://arxiv.org/pdf/1412.6806.pdf
"""

from neon.initializers import Gaussian
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.layers import Conv, Dropout, Activation, Pooling, GeneralizedCost
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import ArrayIterator, load_cifar10
from neon.callbacks.callbacks import Callbacks
from neon.util.argparser import NeonArgparser

# parse the command line arguments
parser = NeonArgparser(__doc__)
parser.add_argument("--learning_rate", type=float, default=0.05, help="initial learning rate")
parser.add_argument("--weight_decay", type=float, default=0.001, help="weight decay")
parser.add_argument('--deconv', action='store_true',
                    help='save visualization data from deconvolution')
args = parser.parse_args()

# hyperparameters
num_epochs = args.epochs

(X_train, y_train), (X_test, y_test), nclass = load_cifar10(path=args.data_dir,
                                                            normalize=False,
                                                            contrast_normalize=True,
                                                            whiten=True)

# really 10 classes, pad to nearest power of 2 to match conv output
train_set = ArrayIterator(X_train, y_train, nclass=16, lshape=(3, 32, 32))
valid_set = ArrayIterator(X_test, y_test, nclass=16, lshape=(3, 32, 32))

init_uni = Gaussian(scale=0.05)
opt_gdm = GradientDescentMomentum(learning_rate=float(args.learning_rate), momentum_coef=0.9,
                                  wdecay=float(args.weight_decay),
                                  schedule=Schedule(step_config=[200, 250, 300], change=0.1))

relu = Rectlin()
conv = dict(init=init_uni, batch_norm=False, activation=relu)
convp1 = dict(init=init_uni, batch_norm=False, activation=relu, padding=1)
convp1s2 = dict(init=init_uni, batch_norm=False, activation=relu, padding=1, strides=2)

layers = [Dropout(keep=.8),
          Conv((3, 3, 96), **convp1),
          Conv((3, 3, 96), **convp1),
          Conv((3, 3, 96), **convp1s2),
          Dropout(keep=.5),
          Conv((3, 3, 192), **convp1),
          Conv((3, 3, 192), **convp1),
          Conv((3, 3, 192), **convp1s2),
          Dropout(keep=.5),
          Conv((3, 3, 192), **convp1),
          Conv((1, 1, 192), **conv),
          Conv((1, 1, 16), **conv),
          Pooling(8, op="avg"),
          Activation(Softmax())]

cost = GeneralizedCost(costfunc=CrossEntropyMulti())

model = Model(layers=layers)

if args.model_file:
    import os
    assert os.path.exists(args.model_file), '%s not found' % args.model_file
    model.load_params(args.model_file)

# configure callbacks
callbacks = Callbacks(model, eval_set=valid_set, **args.callback_args)

if args.deconv:
    callbacks.add_deconv_callback(train_set, valid_set)

model.fit(train_set, optimizer=opt_gdm, num_epochs=num_epochs, cost=cost, callbacks=callbacks)
print('Misclassification error = %.1f%%' % (model.eval(valid_set, metric=Misclassification())*100))