ILSVRC-2014 model (VGG team) with 16 weight layers

## Information

name: 16-layer model from the arXiv paper: "Very Deep Convolutional Networks for Large-Scale Image Recognition"

caffemodel: VGG_ILSVRC_16_layers

caffemodel_url: http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel

license: see http://www.robots.ox.ac.uk/~vgg/research/very_deep/

caffe_version: trained using a custom Caffe-based framework

gist_id: 211839e770f7b538e2d8

## Description

The model is an improved version of the 16-layer model used by the VGG team in the ILSVRC-2014 competition. The details can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556

Please cite the paper if you use the model.

In the paper, the model is denoted as the configuration D trained with scale jittering. The input images should be zero-centered by mean pixel (rather than mean image) subtraction. Namely, the following BGR values should be subtracted: [103.939, 116.779, 123.68].
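
The mean-pixel subtraction described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' preprocessing code: the function name `preprocess_rgb` and the nested-list image layout are assumptions made for the example, while the three BGR mean values are the ones quoted above.

```python
# Mean pixel values from the model description, in BGR order.
VGG_MEAN_BGR = (103.939, 116.779, 123.68)

def preprocess_rgb(image):
    """Convert an H x W image of (R, G, B) tuples (0-255) into three
    zero-centered planes ordered B, G, R (channel-first layout)."""
    planes = []
    # Pick the blue, then green, then red component of every pixel.
    for channel, mean in zip((2, 1, 0), VGG_MEAN_BGR):
        planes.append([[pixel[channel] - mean for pixel in row] for row in image])
    return planes

# A tiny 1 x 2 "image": one pure-red pixel and one mid-gray pixel.
demo = preprocess_rgb([[(255, 0, 0), (128, 128, 128)]])
```

The same constant is subtracted at every spatial position, which is what distinguishes mean-pixel from mean-image subtraction.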

## Caffe compatibility

The models are currently supported by the dev branch of Caffe, but are not yet compatible with master. An example of how to use the models in MATLAB can be found in matlab/caffe/matcaffe_demo_vgg.m.

## ILSVRC-2012 performance

Using dense single-scale evaluation (the smallest image side rescaled to 384), the top-5 classification error on the validation set of ILSVRC-2012 is 8.1% (see Table 3 in the arXiv paper).
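
As a small illustration of "the smallest image side rescaled to 384", the sketch below computes the target size for an aspect-preserving resize. The helper name `rescale_smallest_side` and the rounding to whole pixels are assumptions made for this example; the evaluation code used for the reported numbers may round differently.

```python
def rescale_smallest_side(width, height, target=384):
    """Return (new_width, new_height) so that the smaller side equals
    `target` while the aspect ratio is preserved (rounded to whole pixels)."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)

# Example: a 500 x 375 image evaluated at scale 384.
size = rescale_smallest_side(500, 375)
```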

Using dense multi-scale evaluation (the smallest image side rescaled to 256, 384, and 512), the top-5 classification error is 7.5% on the validation set and 7.4% on the test set of ILSVRC-2012 (see Table 4 in the arXiv paper).
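
For readers unfamiliar with the metric, the top-5 error quoted above is the fraction of images whose true label is not among the five highest-scoring classes. A minimal sketch follows; the function name `top_k_error` and the toy scores are hypothetical and only serve to illustrate the definition.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)

# Toy scores over 7 classes; the true label is class 0 in both rows.
scores = [
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.0],  # class 0 is ranked 6th: a miss
    [0.7, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # class 0 is ranked 1st: a hit
]
error = top_k_error(scores, [0, 0])
```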

name: "VGG_ILSVRC_16_layers"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3_2"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "conv3_3"
  name: "conv3_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_3"
  top: "conv3_3"
  name: "relu3_3"
  type: RELU
}
layers {
  bottom: "conv3_3"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "conv4_1"
  name: "conv4_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_1"
  top: "conv4_1"
  name: "relu4_1"
  type: RELU
}
layers {
  bottom: "conv4_1"
  top: "conv4_2"
  name: "conv4_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_2"
  top: "conv4_2"
  name: "relu4_2"
  type: RELU
}
layers {
  bottom: "conv4_2"
  top: "conv4_3"
  name: "conv4_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_3"
  top: "conv4_3"
  name: "relu4_3"
  type: RELU
}
layers {
  bottom: "conv4_3"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool4"
  top: "conv5_1"
  name: "conv5_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_1"
  top: "conv5_1"
  name: "relu5_1"
  type: RELU
}
layers {
  bottom: "conv5_1"
  top: "conv5_2"
  name: "conv5_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_2"
  top: "conv5_2"
  name: "relu5_2"
  type: RELU
}
layers {
  bottom: "conv5_2"
  top: "conv5_3"
  name: "conv5_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_3"
  top: "conv5_3"
  name: "relu5_3"
  type: RELU
}
layers {
  bottom: "conv5_3"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool5"
  top: "fc6"
  name: "fc6"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "drop6"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc6"
  top: "fc7"
  name: "fc7"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "drop7"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc7"
  top: "fc8"
  name: "fc8"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 1000
  }
}
layers {
  bottom: "fc8"
  top: "prob"
  name: "prob"
  type: SOFTMAX
}
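
As a rough cross-check of the definition above, the following sketch walks the layer list and counts weight layers and learnable parameters for a 3 x 224 x 224 input. It assumes every convolution and inner-product layer has a bias term (the prototxt does not disable it); the names `CONV_STACK`, `FC_OUTPUTS` and `count_parameters` are illustrative and not part of the model files.

```python
# Output channels of each 3x3 convolution; "pool" marks a 2x2, stride-2 max pool.
CONV_STACK = [64, 64, "pool", 128, 128, "pool", 256, 256, 256, "pool",
              512, 512, 512, "pool", 512, 512, 512, "pool"]
FC_OUTPUTS = [4096, 4096, 1000]  # fc6, fc7, fc8

def count_parameters(channels=3, side=224):
    """Return (weight_layers, pool5_side, total_parameters)."""
    weight_layers = 0
    params = 0
    for item in CONV_STACK:
        if item == "pool":
            side //= 2  # kernel_size 2 with stride 2 halves each spatial dimension
        else:
            # A 3x3 kernel with pad 1 keeps the spatial size; one bias per output.
            params += 3 * 3 * channels * item + item
            channels = item
            weight_layers += 1
    pool5_side = side
    features = channels * side * side  # flattened pool5 output fed to fc6
    for outputs in FC_OUTPUTS:
        params += features * outputs + outputs
        features = outputs
        weight_layers += 1
    return weight_layers, pool5_side, params

layers, pool5_side, total = count_parameters()
```

Under these assumptions the walk gives 16 weight layers, a 7 x 7 pool5 map and about 138 million parameters, most of them in fc6.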
roseperrone Nov 30, 2014

Where can I find the train_val.prototxt and solver.prototxt?

EvanWeiner Dec 9, 2014

Is this supported in Caffe Master branch yet?

Also - any Python examples of using the .caffemodel?

dabilied Dec 16, 2014

@EvanWeiner you can write a py script under the matcaffe_demo_vgg_mean_pix.m file in dev; it runs well on master.

EvanWeiner Dec 19, 2014

@dabilied, thank you. What do you mean by write a script "under"?

cristinasegalin Jan 3, 2015

How can I avoid the protobuf warning about the message being too long to read? I cannot read the VGG_19_layers model.

whereaswhile Jan 11, 2015

@cristinasegalin you can change the upper limit in the following line:
coded_input->SetTotalBytesLimit(1073741824, 536870912);
but this function itself is limited by the maximum integer size
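
To put the two numbers in the quoted line into perspective, here is a small arithmetic sketch. It assumes the arguments are 32-bit signed integers, which is the usual case but is not stated in the comment; the constant names are only labels chosen for this example.

```python
TOTAL_BYTES_LIMIT = 1073741824  # first argument in the line quoted above
WARNING_THRESHOLD = 536870912   # second argument in the line quoted above
INT32_MAX = 2**31 - 1           # largest 32-bit signed integer (assumed int width)

def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20

limit_mib = to_mib(TOTAL_BYTES_LIMIT)      # the hard limit is 1 GiB
warning_mib = to_mib(WARNING_THRESHOLD)    # the warning fires at 512 MiB
headroom = INT32_MAX / TOTAL_BYTES_LIMIT   # the limit can at most be about doubled
```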

jeppe88 Jan 28, 2015

Is there a train_val.prototxt and solver.prototxt file available?

andresromero Feb 17, 2015

Does someone have the train_val.prototxt file?

karpathy Mar 1, 2015

I also came looking for the train_val file for convenience, which doesn't seem to be available. I wrote this one for ImageNet, in case it's helpful to others: http://cs.stanford.edu/people/karpathy/vgg_train_val.prototxt

Note that I've zeroed out all blobs_lr everywhere - presumably you want to build on this.
Set up this way and running this with

./build/tools/caffe train \
--solver=models/vgg/solver.prototxt \
--weights models/vgg/VGG_ILSVRC_16_layers.caffemodel

I get, on validation set:
Test net output #0: accuracy@1 = 0.683579
Test net output #1: accuracy@5 = 0.884442
Test net output #2: loss/loss = 1.30089 (* 1 = 1.30089 loss)

(i.e. top5 val error is 11.5% with no bells and whistles and single forward pass).
Swapping out the path to val lmdb to train lmdb, we can see also on training set:
Test net output #0: accuracy@1 = 0.77418
Test net output #1: accuracy@5 = 0.938281
Test net output #2: loss/loss = 0.852522 (* 1 = 0.852522 loss)

jmendozais Apr 6, 2015

Has anyone been able to train this model on a GPU with 4 GB of RAM? I have only managed to train it with a batch size (bs) of 10; a larger bs gives me an out-of-memory error. I am running the model on an EC2 g2 instance.
Thanks in advance !!!

wuxinhong Apr 8, 2015

@karpathy
could you give your VGG's solver.prototxt?

databig Jun 15, 2015

@karpathy
could you give your VGG's solver.prototxt?

jongsony Jul 17, 2015

@karpathy
could you give your VGG's solver.prototxt?

Linzert Jul 30, 2015

@karpathy When I fine-tune the way you described, I get the error message below:
Check failed: ShapeEquals(proto) shape mismatch(reshape not set).
Could you tell me whether you modified anything else?

Silver-Shen Jul 31, 2015

I am trying to start training from VGG-A (according to the paper, training should go from configuration A to configuration D (VGG16) or configuration E (VGG19)), but the loss still does not decrease. Has anybody met the same problem?

safwanwshah Aug 24, 2015

It looks like the weight and bias values are not initialized in the file provided by karpathy.

YutingZhang Nov 3, 2015

@karpathy
I got pretty much the same validation accuracy as you. However, their paper ( http://arxiv.org/pdf/1409.1556.pdf ) claims an accuracy of 100-27.0=73.0 (Table 3 - D).
I converted the fully connected layers to convolutional layers, convolved the whole network (224x224 input) over the 256x256 images, and fused the predictions by sum-pooling. Am I missing anything?
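
For reference, the size of the class score map produced by such a fully convolutional conversion can be worked out from the prototxt above. This is a sketch assuming square inputs, with fc6 applied as a 7x7 "valid" convolution over pool5; the helper name `score_map_side` is made up for this example.

```python
def score_map_side(image_side, stride=32, fc6_kernel=7):
    """Side length of the class score map when fc6 is applied as a 7x7
    convolution to the pool5 output of a square image."""
    pool5_side = image_side // stride    # five 2x2, stride-2 poolings: 2**5 = 32
    return pool5_side - fc6_kernel + 1   # "valid" convolution with stride 1

# Score map sizes for a 224 crop and for 256 and 384 full images.
sides = {s: score_map_side(s) for s in (224, 256, 384)}
```

With a 256x256 input the score map is only 2x2, so the pooled prediction combines four overlapping 224x224 windows.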

Venkatesh-Murthy Nov 23, 2015

I also tested it and got the same numbers as reported by @karpathy. I think what is missing is the dense multi-scale evaluation procedure.

mingminzhen Nov 24, 2015

Why can't I find matlab/caffe/matcaffe_demo_vgg.m? Could someone paste the code?

coldmanck Dec 12, 2015

@mingminzhen: Maybe you can refer to this
@Linzert: Did you change the last layer's name to fine-tune the network?

1292765944 Mar 5, 2016

The VGG-16 caffemodel cannot be loaded by Caffe?
The error is below:
F0305 21:19:47.159997 28198 upgrade_proto.cpp:75] Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: ../data/model/VGG16-NET/VGG_ILSVRC_16_layers.caffemodel
*** Check failure stack trace: ***
@ 0x7f34ac35b61c google::LogMessage::Fail()
@ 0x7f34ac35b568 google::LogMessage::SendToLog()
@ 0x7f34ac35af6a google::LogMessage::Flush()
@ 0x7f34ac35df01 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f34ac827ebe caffe::ReadNetParamsFromBinaryFileOrDie()
@ 0x7f34ac72aa47 caffe::Net<>::CopyTrainedLayersFromBinaryProto()
@ 0x7f34ac72aab6 caffe::Net<>::CopyTrainedLayersFrom()
@ 0x406907 Classifier::Classifier()
@ 0x4082d7 main
@ 0x7f34ab886ec5 (unknown)
@ 0x406739 (unknown)

szm2015 May 14, 2016

Hi
I want to extract features from my own data set using this network. I have extracted features using the Caffe reference model as explained in http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html and it worked just fine. I need to make an imagenet_val.prototxt like that of the Caffe reference model, which is exactly like its deploy.prototxt but with the following lines:
name: "CaffeNet"
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "examples/fe/file_list.txt"
    batch_size: 10
    new_height: 256
    new_width: 256
  }
}

at the beginning instead of:
name: "CaffeNet"
input: "data"
input_shape {
  dim: 10
  dim: 3
  dim: 227
  dim: 227
}

So, following this example, I made an imagenet_val.prototxt file (available at https://gist.github.com/szm2015/37ce9f126d69bbafa5267f29b3e63336) out of the deploy.prototxt provided here. But when I ran the feature extraction command (./extract_features models/VGG_ILSVRC_16_layers/VGG_ILSVRC_16_layers.caffemodel examples/fe/imagenet_val.prototxt fc7 examples/fe/features 58 leveldb GPU) I got this error:
Unknown bottom blob 'data' (layer 'conv1_1', bottom index 0)

I searched, and it seemed that the problem is the old notation mixing with the new one (layers instead of layer, DATA instead of "Data", etc.), so I fixed it, but I still get the same error (the gist provided is the fixed one).

I appreciate any help in advance and sorry if the comment is too long!

toshi-k May 22, 2016

I have a question about the license.
On the VGG web site, the VGG model is provided under "CC BY 4.0" (commercial use is allowed).
http://www.robots.ox.ac.uk/~vgg/research/very_deep/

On this page, however, the VGG model is provided under "CC BY-NC 4.0" (non-commercial use only).
Why is there a difference, although the caffemodel_URL is exactly the same?

mrgloom Sep 18, 2016

Here is a working example of VGG-16 that I have trained using NVIDIA DIGITS with the Caffe backend.
https://github.com/mrgloom/kaggle-dogs-vs-cats-solution/tree/master/learning_from_scratch/Models/VGG-16

@konglingchoa

Thanks

RafaRuiz Mar 9, 2018

Link is broken
