
Last active September 19, 2023 14:36
ILSVRC-2014 model (VGG team) with 16 weight layers


name: 16-layer model from the arXiv paper: "Very Deep Convolutional Networks for Large-Scale Image Recognition"

caffemodel: VGG_ILSVRC_16_layers


license: see

caffe_version: trained using a custom Caffe-based framework

gist_id: 211839e770f7b538e2d8


The model is an improved version of the 16-layer model used by the VGG team in the ILSVRC-2014 competition. The details can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman

Please cite the paper if you use the model.

In the paper, the model is denoted as configuration D trained with scale jittering. The input images should be zero-centered by mean pixel (rather than mean image) subtraction. Namely, the following BGR values should be subtracted: [103.939, 116.779, 123.68].
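A minimal NumPy sketch of this preprocessing, assuming the input is an HxWx3 RGB array with values in the 0 to 255 range that has already been resized/cropped to the 224x224 network input. Because the mean values above are in BGR order, an RGB image has its channels reversed before subtraction. The function name `preprocess_rgb` is illustrative and not part of Caffe.

```python
import numpy as np

# Mean pixel in BGR order, copied from the model description above.
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_rgb(image_rgb: np.ndarray) -> np.ndarray:
    """Turn an HxWx3 RGB image (values 0..255) into a 3xHxW float32 array for VGG-16.

    Assumes the image is already resized/cropped to the network input size.
    """
    if image_rgb.ndim != 3 or image_rgb.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    bgr = image_rgb[:, :, ::-1].astype(np.float32)  # reverse channels: RGB -> BGR (copies)
    bgr -= VGG_MEAN_BGR  # subtract the mean pixel, broadcast over every (row, col)
    return np.ascontiguousarray(bgr.transpose(2, 0, 1))  # HWC -> CHW, the "data" blob layout


# Usage: stack preprocessed images along a new first axis to form an N x 3 x 224 x 224 batch.
batch = np.stack([preprocess_rgb(np.full((224, 224, 3), 128, dtype=np.uint8))])
```

Only the mean pixel is subtracted, so no per-channel scaling or mean image is involved.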

Caffe compatibility

The models are currently supported by the dev branch of Caffe but are not yet compatible with master. An example of how to use the models in Matlab can be found in matlab/caffe/matcaffe_demo_vgg.m.

ILSVRC-2012 performance

Using dense single-scale evaluation (the smallest image side rescaled to 384), the top-5 classification error on the validation set of ILSVRC-2012 is 8.1% (see Table 3 in the arXiv paper).
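In dense evaluation the fully connected layers are run as convolutions (fc6 as a 7x7 convolution, fc7 and fc8 as 1x1 convolutions), so a rescaled image larger than 224 pixels yields a spatial map of class scores that is then averaged. The Python sketch below only computes that geometry (no weights), assuming Caffe's rounded-up pooling output size; the helper names are hypothetical.

```python
import math


def rescale_smallest_side(height: int, width: int, target: int) -> tuple[int, int]:
    """Isotropic rescale so that the smaller image side equals `target` (aspect ratio kept)."""
    scale = target / min(height, width)
    return round(height * scale), round(width * scale)


def caffe_pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output size of an unpadded Caffe pooling layer (Caffe rounds up here)."""
    return math.ceil((size - kernel) / stride) + 1


def score_map_size(size: int) -> int:
    """Class-score map size along one axis when fc6/fc7/fc8 run as convolutions."""
    for _ in range(5):
        # 3x3 convolutions with pad 1 keep the size; each of pool1..pool5 roughly halves it.
        size = caffe_pool_out(size)
    if size < 7:
        raise ValueError("input too small for the 7x7 fc6 window")
    return size - 7 + 1  # fc6 as a 7x7 convolution (stride 1, no padding); fc7/fc8 are 1x1


h, w = rescale_smallest_side(256, 512, 384)
print((h, w), score_map_size(h), score_map_size(w))  # (384, 768) 6 18
```

A 224x224 input gives a 1x1 map, which reduces to the ordinary single-vector output of the deploy net.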

Using dense multi-scale evaluation (the smallest image side rescaled to 256, 384, and 512), the top-5 classification error is 7.5% on the validation set and 7.4% on the test set of ILSVRC-2012 (see Table 4 in the arXiv paper).
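For the multi-scale numbers, each scale's score map is reduced to one class-posterior vector by spatial averaging, and the per-scale vectors are averaged; the top-5 error then counts images whose true label is not among the five highest-scoring classes. A NumPy sketch, assuming each map is the C x H x W softmax output (the `prob` blob) for one rescaled image. The paper also averages in horizontally flipped images, which this sketch leaves out; the function names are hypothetical.

```python
import numpy as np


def image_posterior(score_maps: list[np.ndarray]) -> np.ndarray:
    """Average class posteriors over spatial positions, then over scales.

    Each element of `score_maps` is a C x H x W map for one scale; H and W may differ.
    """
    per_scale = [m.mean(axis=(1, 2)) for m in score_maps]  # spatial average -> C-vector
    return np.mean(per_scale, axis=0)  # average across scales


def top5_error(posteriors: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of images (rows of `posteriors`) whose label is not in the top 5 classes."""
    top5 = np.argsort(-posteriors, axis=1)[:, :5]
    hits = (top5 == labels[:, None]).any(axis=1)
    return float(1.0 - hits.mean())
```

Averaging (rather than summing) over positions keeps scales with larger maps from dominating the combined posterior.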

name: "VGG_ILSVRC_16_layers"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3_2"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "conv3_3"
  name: "conv3_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_3"
  top: "conv3_3"
  name: "relu3_3"
  type: RELU
}
layers {
  bottom: "conv3_3"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "conv4_1"
  name: "conv4_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_1"
  top: "conv4_1"
  name: "relu4_1"
  type: RELU
}
layers {
  bottom: "conv4_1"
  top: "conv4_2"
  name: "conv4_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_2"
  top: "conv4_2"
  name: "relu4_2"
  type: RELU
}
layers {
  bottom: "conv4_2"
  top: "conv4_3"
  name: "conv4_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_3"
  top: "conv4_3"
  name: "relu4_3"
  type: RELU
}
layers {
  bottom: "conv4_3"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool4"
  top: "conv5_1"
  name: "conv5_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_1"
  top: "conv5_1"
  name: "relu5_1"
  type: RELU
}
layers {
  bottom: "conv5_1"
  top: "conv5_2"
  name: "conv5_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_2"
  top: "conv5_2"
  name: "relu5_2"
  type: RELU
}
layers {
  bottom: "conv5_2"
  top: "conv5_3"
  name: "conv5_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_3"
  top: "conv5_3"
  name: "relu5_3"
  type: RELU
}
layers {
  bottom: "conv5_3"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool5"
  top: "fc6"
  name: "fc6"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "drop6"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc6"
  top: "fc7"
  name: "fc7"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "drop7"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc7"
  top: "fc8"
  name: "fc8"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 1000
  }
}
layers {
  bottom: "fc8"
  top: "prob"
  name: "prob"
  type: SOFTMAX
}
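As a sanity check on the network definition above, the Python sketch below walks the same layer sequence (13 convolutions with 3x3 kernels and pad 1, 5 max-pools with 2x2 kernels and stride 2, then 3 inner-product layers; ReLU, dropout and softmax have no parameters) and counts weights plus biases for a 3x224x224 input. The list encoding and function names are hypothetical; the code only does shape arithmetic and does not read the caffemodel.

```python
import math


def _stage(width: int, convs: int) -> list:
    """One VGG stage: `convs` 3x3 convolutions with `width` outputs, then a max-pool."""
    return [("conv", width)] * convs + [("pool",)]


# ("conv", n): 3x3 conv, pad 1, stride 1 | ("pool",): 2x2 max pool, stride 2 | ("fc", n): inner product
VGG16 = (_stage(64, 2) + _stage(128, 2) + _stage(256, 3) + _stage(512, 3) + _stage(512, 3)
         + [("fc", 4096), ("fc", 4096), ("fc", 1000)])


def count_params(spec, channels: int = 3, size: int = 224):
    """Return (weights + biases, (channels, spatial size)) after running through `spec`."""
    total = 0
    for layer in spec:
        kind = layer[0]
        if kind == "conv":
            out = layer[1]
            total += 3 * 3 * channels * out + out  # 3x3 filters over all input channels + biases
            channels = out  # pad 1 keeps the spatial size
        elif kind == "pool":
            size = math.ceil((size - 2) / 2) + 1  # Caffe pooling rounds the output size up
        elif kind == "fc":
            out = layer[1]
            total += channels * size * size * out + out  # connected to the whole input blob
            channels, size = out, 1
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return total, (channels, size)


print(count_params(VGG16[:18]))  # conv1_1 .. pool5 -> (14714688, (512, 7))
print(count_params(VGG16))       # whole net -> (138357544, (1000, 1))
```

The count shows that fc6 alone (512 x 7 x 7 inputs to 4096 outputs) holds roughly three quarters of the parameters.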

databig commented Jun 15, 2015

Could you share your VGG solver.prototxt?


Linzert commented Jul 30, 2015

@karpathy When I fine-tune the way you described, I get the error message below:
Check failed: ShapeEquals(proto) shape mismatch(reshape not set).
Could you tell me whether you modified anything else?


I am trying to start training from VGG-A, since the paper indicates that Configuration A should be trained first and then used to initialize Configuration D (VGG16) or Configuration E (VGG19). However, the loss still does not decrease. Has anybody run into the same problem?


It looks like the weights and bias values are not initialized in the file provided by karpathy.


I got pretty much the same validation accuracy as you. However, their paper claims an accuracy of 100 - 27.0 = 73.0 (Table 3, D).
I converted the fully connected layers to convolutional layers, ran the whole network (designed for 224x224 input) convolutionally over the 256x256 images, and fused the predictions by sum-pooling. Am I missing anything?
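The conversion described in this comment amounts to reshaping each inner-product weight matrix into convolution filters. A minimal NumPy sketch, assuming the layer input is flattened in C (row-major) order from (channels, height, width), which is how Caffe lays out blobs; for fc6 this would mean reshaping the 4096 x 25088 matrix to 4096 x 512 x 7 x 7. Function names are hypothetical, and the naive loops are for illustration only.

```python
import numpy as np


def fc_to_conv_weights(w_fc: np.ndarray, channels: int, kernel: int) -> np.ndarray:
    """Reshape an (out, channels*kernel*kernel) inner-product weight matrix into
    (out, channels, kernel, kernel) convolution filters.

    Assumes the layer input was flattened in C order from (channels, height, width).
    """
    out, fan_in = w_fc.shape
    if fan_in != channels * kernel * kernel:
        raise ValueError("fan-in must equal channels * kernel * kernel")
    return w_fc.reshape(out, channels, kernel, kernel)


def dense_apply(w_conv: np.ndarray, bias: np.ndarray, feature_map: np.ndarray) -> np.ndarray:
    """Slide the filters over a (channels, H, W) map with stride 1 and no padding.

    Returns an (out, H - k + 1, W - k + 1) score map.
    """
    out, _, k, _ = w_conv.shape
    _, h, w = feature_map.shape
    result = np.empty((out, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = feature_map[:, i:i + k, j:j + k]
            # Contract the (channels, k, k) axes: the same dot product the FC layer computes.
            result[:, i, j] = np.tensordot(w_conv, patch, axes=3) + bias
    return result


# Usage with toy sizes: a 4-output FC layer over a 2 x 3 x 3 input.
rng = np.random.default_rng(0)
w_fc, b = rng.standard_normal((4, 18)), rng.standard_normal(4)
x = rng.standard_normal((2, 3, 3))
w_conv = fc_to_conv_weights(w_fc, channels=2, kernel=3)
print(np.allclose(dense_apply(w_conv, b, x)[:, 0, 0], w_fc @ x.ravel() + b))  # True
```

On a map exactly the size of the original input the converted layer reproduces the FC output; on a larger map (for example a 12 x 12 pool5 map from a 384-pixel side) it produces a grid of scores instead of a single vector.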


I tested it as well and got the same numbers as reported by @karpathy. I think what is missing is the dense multi-scale evaluation procedure.


Why can't I find matlab/caffe/matcaffe_demo_vgg.m? Could someone paste the code?


@mingminzhen: Maybe you can refer to this.
@Linzert: Did you change the last layer's name when fine-tuning the network?


Why can't the VGG-16 caffemodel be loaded by Caffe?
The error is below:
F0305 21:19:47.159997 28198 upgrade_proto.cpp:75] Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: ../data/model/VGG16-NET/VGG_ILSVRC_16_layers.caffemodel
*** Check failure stack trace: ***
@ 0x7f34ac35b61c google::LogMessage::Fail()
@ 0x7f34ac35b568 google::LogMessage::SendToLog()
@ 0x7f34ac35af6a google::LogMessage::Flush()
@ 0x7f34ac35df01 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f34ac827ebe caffe::ReadNetParamsFromBinaryFileOrDie()
@ 0x7f34ac72aa47 caffe::Net<>::CopyTrainedLayersFromBinaryProto()
@ 0x7f34ac72aab6 caffe::Net<>::CopyTrainedLayersFrom()
@ 0x406907 Classifier::Classifier()
@ 0x4082d7 main
@ 0x7f34ab886ec5 (unknown)
@ 0x406739 (unknown)


szm-R commented May 14, 2016

I want to extract features from my own dataset using this network. I have already extracted features with the Caffe reference model (following the linked example), and it worked just fine. I need to make an imagenet_val.prototxt like the one for the Caffe reference model, which is exactly like its deploy.prototxt but with the following lines:
name: "CaffeNet"
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "examples/fe/file_list.txt"
    batch_size: 10
    new_height: 256
    new_width: 256
  }
}

at the beginning, instead of:
name: "CaffeNet"
input: "data"
input_shape {
  dim: 10
  dim: 3
  dim: 227
  dim: 227
}

So, following this example, I made an imagenet_val.prototxt file out of the deploy.prototxt provided here. But when I ran the feature extraction command (./extract_features models/VGG_ILSVRC_16_layers/VGG_ILSVRC_16_layers.caffemodel examples/fe/imagenet_val.prototxt fc7 examples/fe/features 58 leveldb GPU), I got this error:
Unknown bottom blob 'data' (layer 'conv1_1', bottom index 0)

I searched, and it seemed that the problem is the old notation mixed with the new one (layers instead of layer, DATA instead of "Data", etc.), so I fixed it, but I still get the same error (the gist provided is the fixed one).

Thanks in advance for any help, and sorry if the comment is too long!


toshi-k commented May 22, 2016

I have a question about the license.
On the VGG website, the VGG model is provided under "CC BY 4.0" (commercial use is allowed).

On this page, however, the VGG model is provided under "CC BY-NC 4.0" (non-commercial use only).
Why is there a difference, even though the caffemodel_URL is exactly the same?


mrgloom commented Sep 18, 2016

Here is a working example of VGG-16 that I trained using NVIDIA DIGITS with the Caffe backend.


RafaRuiz commented Mar 9, 2018

Link is broken


So I want to use the Keras implementation of this model, which links to this page.

Just to clarify, I understand the model was trained on pixel-wise mean-centered images.

But should the input images be in RGB format or BGR format?

Section 2.1 of the paper says "RGB" but the description section on this page uses the term "BGR".


I have the same question. Have you resolved it?


pablodz commented Jan 18, 2021
