
@nervanazoo
Created February 9, 2016 21:18
neon vgg implementation

Model

Here we have ported the weights for the 16- and 19-layer VGG models (configurations D and E) from the Caffe model zoo.
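The two configurations differ only in the number of 3x3 convolutions in the three deepest stages. The loop in vgg_neon.py builds both from one list of feature-map sizes. The sketch below tallies the resulting weight layers and trainable parameters, assuming (as in the script) 3x3 convolutions with stride 1 and padding 1, 2x2 pooling with stride 2, a 224x224x3 input, and three fully connected layers. The names `VGG_STAGES` and `count_layers_and_params` are illustrative, not part of neon.

```python
# Conv stages for VGG configurations D and E as (feature maps, conv layers),
# mirroring the loop in vgg_neon.py
VGG_STAGES = {
    'D': [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)],
    'E': [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)],
}
FC_SIZES = [4096, 4096, 1000]  # two hidden FC layers + 1000-way classifier


def count_layers_and_params(version, in_channels=3, image_size=224):
    """Return (weight layers, trainable parameters) for a VGG configuration."""
    n_layers, n_params = 0, 0
    channels, size = in_channels, image_size
    for nofm, n_conv in VGG_STAGES[version]:
        for _ in range(n_conv):
            # 3x3 kernel weights plus one bias per output feature map;
            # padding 1 with stride 1 keeps the spatial size unchanged
            n_params += 3 * 3 * channels * nofm + nofm
            channels = nofm
            n_layers += 1
        size //= 2  # 2x2 pooling with stride 2 halves height and width
    fan_in = channels * size * size  # flattened 7x7x512 map for a 224 input
    for nout in FC_SIZES:
        n_params += fan_in * nout + nout  # weights + biases
        fan_in = nout
        n_layers += 1
    return n_layers, n_params


for version in ('D', 'E'):
    print(version, count_layers_and_params(version))
```

This gives (16, 138357544) for D and (19, 143667240) for E, which is where the "16-layer" and "19-layer" names come from: only convolutional and fully connected layers are counted.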

Model script

The model run script is included below (vgg_neon.py). It can easily be adapted for fine-tuning the network, but we have focused on inference here because a successful training protocol may require details beyond what is available from the Caffe model zoo.

Trained weights

The trained weight files can be downloaded from AWS using the following links: VGG_D.p and VGG_E.p.

Performance

Accuracy

Testing the image classification performance of the two models on the ILSVRC 2012 validation data set gives the results in the table below:

 ------------------------------
|         |       Accuracy     |
| Model   |  Top 1   |  Top 5  |
 ------------------------------
| VGG D   |  69.2 %  | 88.9 %  |
| VGG E   |  69.3 %  | 88.8 %  |
 ------------------------------

These results are computed at a single scale, using one 224x224 crop of each image. They are comparable to the classification accuracy we obtained by running the 16- and 19-layer VGG models from the Caffe model zoo in Caffe.
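The script's validation loader uses scale_range=(256, 256), inner_size=224 and do_transforms=False. Our reading of those settings, which is an assumption rather than something stated above, is that each image is resized so its shorter side is 256 pixels and a single 224x224 crop is taken from the center. The hypothetical helper below sketches that geometry; it is not neon's loader code.

```python
def center_crop_box(width, height, short_side=256, crop=224):
    """Resize so the shorter side equals short_side, then center a crop x crop box.

    Returns (resized_width, resized_height, left, top), where (left, top) is the
    top-left corner of the crop in the resized image.
    """
    scale = short_side / float(min(width, height))
    new_w = int(round(width * scale))
    new_h = int(round(height * scale))
    left = (new_w - crop) // 2  # equal margins on the left and right
    top = (new_h - crop) // 2   # equal margins above and below
    return new_w, new_h, left, top


# A 500x375 landscape image is resized to 341x256 and cropped at (58, 16)
print(center_crop_box(500, 375))
```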

Speed

We ran speed benchmarks on these models using neon. These results use a batch size of 64 with 3x224x224 input images. The results are in the tables below:

VGG D

 ----------------------
|    Func  |    Time   |
 ----------------------
| fprop    |   366 ms  |
| bprop    |   767 ms  |
| update   |    19 ms  |
 ----------------------
| Total    |  1152 ms  |
 ----------------------

VGG E

 -----------------------
|    Func  |    Time    |
 -----------------------
| fprop    |    452 ms  |
| bprop    |    940 ms  | 
| update   |     20 ms  |
 -----------------------
| Total    |   1412 ms  |
 -----------------------

The tables above give the run times for the forward pass (fprop), the backward pass (bprop), and the parameter update. The Total row is the combined runtime of all three functions, i.e. one full training iteration. All times are per minibatch of 64 images of shape 224x224x3. The model was run 12 times; the first two passes were discarded and the remaining 10 were used to compute the benchmark results.
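The timing protocol described above can be sketched as follows, assuming the reported figure is the mean of the 10 kept runs. `benchmark` and `images_per_second` are hypothetical helpers written for Python 3, not neon functions. For GPU work, `fn` must block until the kernels have finished (for example by synchronizing the device); otherwise only the kernel launch time is measured.

```python
import time


def benchmark(fn, n_runs=12, n_warmup=2, timer=time.perf_counter):
    """Call fn n_runs times and return the mean time in ms of the non-warm-up runs."""
    times_ms = []
    for i in range(n_runs):
        start = timer()
        fn()
        elapsed_ms = (timer() - start) * 1000.0
        if i >= n_warmup:  # discard the first n_warmup passes
            times_ms.append(elapsed_ms)
    return sum(times_ms) / len(times_ms)


def images_per_second(batch_size, ms_per_batch):
    """Convert a per-minibatch time in ms into throughput."""
    return batch_size * 1000.0 / ms_per_batch


# Throughput implied by the Total rows of the tables above
print(images_per_second(64, 1152))  # VGG D, about 55.6 images/s
print(images_per_second(64, 1412))  # VGG E, about 45.3 images/s
```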

System specs:

Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
Ubuntu 14.04
GPU: GeForce GTX TITAN X
CUDA Driver Version 7.0

Instructions

Make sure that your local neon repo is checked out at the commit listed under Version compatibility below, and run the installation procedure before proceeding. To run this model script on the ILSVRC2012 dataset, the data must be in the neon macrobatch format; follow the instructions in the neon documentation for setting up the datasets.

If neon is installed into a virtualenv, make sure that it is activated before running the commands below.

To run the evaluation of the model:

# for the 16-layer VGG D model
python vgg_neon.py --vgg_version D --model_file VGG_D.p -w path/to/dataset/batches -z 64 --caffe

# for the 19-layer VGG E model
python vgg_neon.py --vgg_version E --model_file VGG_E.p -w path/to/dataset/batches -z 64 --caffe

Note that the --caffe option is needed to match the dropout implementation used by Caffe.

The batch size is set to 64 in the examples above because with larger batch sizes the model may not fit in the memory of some GPUs; use a smaller batch size if necessary. The script can also be adapted for fine-tuning; see the neon user manual for details.
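Activation memory grows linearly with the batch size, which is why large batches can exhaust GPU memory. The rough sketch below counts only the forward output values of each conv, pooling and fully connected layer, stored as 32-bit floats, under the same layer assumptions as the script (3x3 convs with padding 1, 2x2 pooling with stride 2, 224x224 input). It deliberately ignores gradients, weights, optimizer state and library workspace, so real usage during training is considerably higher. The function names are hypothetical.

```python
# (feature maps, conv layers) per stage, mirroring the loop in vgg_neon.py
VGG_STAGES = {
    'D': [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)],
    'E': [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)],
}


def forward_activations(version, image_size=224):
    """Number of forward output values per image (input image not counted)."""
    total = 0
    size = image_size
    for nofm, n_conv in VGG_STAGES[version]:
        total += n_conv * size * size * nofm  # conv outputs keep the spatial size
        size //= 2
        total += size * size * nofm  # pooling output at half resolution
    total += 4096 + 4096 + 1000  # fully connected layer outputs
    return total


def activation_gib(version, batch_size, bytes_per_value=4):
    """Rough forward activation memory in GiB for one minibatch."""
    return forward_activations(version) * batch_size * bytes_per_value / 2.0 ** 30


for bs in (32, 64, 128):
    print(bs, round(activation_gib('D', bs), 2), round(activation_gib('E', bs), 2))
```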

Version compatibility

Neon version: commit SHA e7ab2c2e2.

Citation

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556

License

For the model weight files, please abide by the license posted with the Caffe weight files: http://creativecommons.org/licenses/by-nc/4.0/ (non-commercial use only)

#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2015 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------
"""
Simplified version of VGG model D and E
Based on manuscript:
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556
"""
from neon.util.argparser import NeonArgparser
from neon.backends import gen_backend
from neon.initializers import Constant, GlorotUniform, Xavier
from neon.layers import Conv, Dropout, Pooling, GeneralizedCost, Affine
from neon.optimizers import GradientDescentMomentum, Schedule, MultiOptimizer
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, TopKMisclassification
from neon.models import Model
from neon.data import ImageLoader
from neon.callbacks.callbacks import Callbacks
# parse the command line arguments
parser = NeonArgparser(__doc__)
parser.add_argument('--vgg_version', default='D', choices=['D', 'E'],
                    help='vgg model type')
parser.add_argument('--subset_pct', type=float, default=100,
                    help='subset of training dataset to use (percentage)')
args = parser.parse_args()
img_set_options = dict(repo_dir=args.data_dir, inner_size=224,
                       subset_pct=args.subset_pct, dtype=args.datatype)
# the training loader is not used for evaluation; it is kept for fine-tuning
train = ImageLoader(set_name='train', scale_range=(256, 384),
                    shuffle=True, **img_set_options)
# validation images: single scale, no random transforms
test = ImageLoader(set_name='validation', scale_range=(256, 256), do_transforms=False,
                   shuffle=False, **img_set_options)
init1 = Xavier(local=True)
initfc = GlorotUniform()
relu = Rectlin()
conv_params = {'init': init1,
               'strides': 1,
               'padding': 1,
               'bias': Constant(0),
               'activation': relu}
# Set up the model layers
layers = []
# set up 3x3 conv stacks with different feature map sizes, each followed by pooling:
# VGG D uses 2, 2, 3, 3, 3 convs per stage (13 conv + 3 FC = 16 weight layers)
# VGG E uses 2, 2, 4, 4, 4 convs per stage (16 conv + 3 FC = 19 weight layers)
for nofm in [64, 128, 256, 512, 512]:
    layers.append(Conv((3, 3, nofm), **conv_params))
    layers.append(Conv((3, 3, nofm), **conv_params))
    if nofm > 128:
        layers.append(Conv((3, 3, nofm), **conv_params))
        if args.vgg_version == 'E':
            layers.append(Conv((3, 3, nofm), **conv_params))
    layers.append(Pooling(2, strides=2))
layers.append(Affine(nout=4096, init=initfc, bias=Constant(0), activation=relu))
layers.append(Dropout(keep=0.5))
layers.append(Affine(nout=4096, init=initfc, bias=Constant(0), activation=relu))
layers.append(Dropout(keep=0.5))
layers.append(Affine(nout=1000, init=initfc, bias=Constant(0), activation=Softmax()))
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
model = Model(layers=layers)
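# configure callbacks (only used if the script is extended to train or fine-tune)
top5 = TopKMisclassification(k=5)
callbacks = Callbacks(model, eval_set=test, metric=top5, **args.callback_args)
# load the ported weights and evaluate on the validation set;
# mets holds [loss, top-1 misclassification, top-5 misclassification]
model.load_params(args.model_file)
mets = model.eval(test, metric=top5)
print('Loss = %f' % mets[0])
print('Top 1 Accuracy = %.1f' % ((1.0 - mets[1]) * 100.0))
print('Top 5 Accuracy = %.1f' % ((1.0 - mets[2]) * 100.0))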
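The script reports accuracy as 1 minus the misclassification rates returned by model.eval with TopKMisclassification(k=5). As a plain-Python sketch of what those two rates measure (the function name is hypothetical and this is not neon's implementation), an image counts as a top-1 error when the highest-scoring class is wrong, and as a top-k error when the true label is not among the k highest-scoring classes:

```python
def topk_misclassification(scores, labels, k=5):
    """Return (top-1 error rate, top-k error rate) for per-class score rows."""
    top1_err = 0
    topk_err = 0
    for row, label in zip(scores, labels):
        # class indices sorted from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if ranked[0] != label:
            top1_err += 1
        if label not in ranked[:k]:
            topk_err += 1
    n = float(len(labels))
    return top1_err / n, topk_err / n


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 2, 1]
top1, top2 = topk_misclassification(scores, labels, k=2)
# convert error rates to accuracy percentages the same way the script does
print('Top 1 Accuracy = %.1f' % ((1.0 - top1) * 100.0))
print('Top 2 Accuracy = %.1f' % ((1.0 - top2) * 100.0))
```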