CNN_S model from the BMVC-2014 paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets"

## Information

name: CNN_S model from the BMVC-2014 paper: "Return of the Devil in the Details: Delving Deep into Convolutional Nets"

mean_file_mat: http://www.robots.ox.ac.uk/~vgg/software/deep_eval/releases/bvlc/VGG_mean.mat

mean_file_proto: http://www.robots.ox.ac.uk/~vgg/software/deep_eval/releases/bvlc/VGG_mean.binaryproto

caffemodel: VGG_CNN_S

caffemodel_url: http://www.robots.ox.ac.uk/~vgg/software/deep_eval/releases/bvlc/VGG_CNN_S.caffemodel

license: non-commercial use only

caffe_version: trained using a custom Caffe-based framework

gist_id: fd8800eeb36e276cd6f9

## Description

The CNN_S model is trained on the ILSVRC-2012 dataset. The details can be found in the following BMVC-2014 paper:

Return of the Devil in the Details: Delving Deep into Convolutional Nets
K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman
British Machine Vision Conference, 2014 (arXiv:1405.3531)

Please cite the paper if you use the model.

The model is trained on 224x224 crops sampled from images rescaled so that their smallest side is 256 (preserving the aspect ratio). The released BGR mean image should be subtracted from the 224x224 crops.
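For illustration, here is a minimal preprocessing sketch along these lines, assuming OpenCV and NumPy; the per-channel BGR values are only a common approximation of the released mean image (the exact values should be taken from VGG_mean), and the helper name is purely illustrative.

# Minimal preprocessing sketch (not part of the original release). Assumes OpenCV
# and NumPy; the per-channel BGR mean is an approximation of the released mean image.
import cv2
import numpy as np

def preprocess(path, mean_bgr=(103.939, 116.779, 123.68)):
    img = cv2.imread(path).astype(np.float32)      # OpenCV loads images in BGR order
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)                      # rescale so the smallest side is 256
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2     # central 224x224 crop
    crop = img[top:top + 224, left:left + 224]
    crop -= np.array(mean_bgr, dtype=np.float32)   # subtract the (approximate) BGR mean
    return crop.transpose(2, 0, 1)                 # HWC -> CHW, as expected by Caffe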

Further details can be found in the paper and on the project website: http://www.robots.ox.ac.uk/~vgg/research/deep_eval/

## Note

The model is stored in a different format than the one released at http://www.robots.ox.ac.uk/~vgg/software/deep_eval/ to make it compatible with BVLC Caffe and BGR images (the network weights are the same). The class order is also different; the one used here corresponds to synsets.txt in http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
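For example, a class index predicted by the network can be mapped to its WordNet synset ID with a few lines of Python; the snippet below assumes synsets.txt from that archive has been extracted into the working directory.

# Illustrative only: map a predicted class index to its ILSVRC-2012 synset ID,
# assuming synsets.txt (from caffe_ilsvrc12.tar.gz) is in the working directory.
with open("synsets.txt") as f:
    synsets = [line.strip() for line in f]

class_index = 281              # hypothetical index returned by the network's argmax
print(synsets[class_index])    # prints the corresponding WordNet synset ID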

## ILSVRC-2012 performance

Using 10 test crops (corners, centre, and horizontal flips), the top-5 classification error on the validation set of ILSVRC-2012 is 13.1%.

Using a single central crop, the top-5 classification error on the validation set of ILSVRC-2012 is 15.4%.

The details of the evaluation can be found in the paper.
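For illustration, a minimal NumPy sketch of the 10-crop scheme (four corner crops, the centre crop, and their horizontal flips) is given below; it assumes the input is a mean-subtracted CHW array with both spatial sides at least 224 (e.g. the rescaled image from the preprocessing sketch above, before the central crop).

# Illustrative 10-crop generation: 4 corner crops + the centre crop, plus their
# horizontal flips. Assumes `img` is a CHW float array with both sides >= 224.
import numpy as np

def ten_crops(img, size=224):
    _, h, w = img.shape
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [img[:, y:y + size, x:x + size] for y, x in offsets]
    crops += [c[:, :, ::-1] for c in crops]        # horizontal flips
    return np.stack(crops)                         # shape: (10, 3, 224, 224)

The per-crop class posteriors are then averaged to obtain the multi-crop prediction. The deploy-time network definition follows.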

name: "VGG_CNN_S"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: CONVOLUTION
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
  }
}
layers {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: RELU
}
layers {
  bottom: "conv1"
  top: "norm1"
  name: "norm1"
  type: LRN
  lrn_param {
    local_size: 5
    alpha: 0.0005
    beta: 0.75
    k: 2
  }
}
layers {
  bottom: "norm1"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 3
  }
}
layers {
  bottom: "pool1"
  top: "conv2"
  name: "conv2"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    kernel_size: 5
  }
}
layers {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: RELU
}
layers {
  bottom: "conv2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3"
  name: "conv3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3"
  top: "conv3"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv3"
  top: "conv4"
  name: "conv4"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4"
  top: "conv4"
  name: "relu4"
  type: RELU
}
layers {
  bottom: "conv4"
  top: "conv5"
  name: "conv5"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5"
  top: "conv5"
  name: "relu5"
  type: RELU
}
layers {
  bottom: "conv5"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 3
  }
}
layers {
  bottom: "pool5"
  top: "fc6"
  name: "fc6"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "drop6"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc6"
  top: "fc7"
  name: "fc7"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "drop7"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc7"
  top: "fc8"
  name: "fc8"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 1000
  }
}
layers {
  bottom: "fc8"
  top: "prob"
  name: "prob"
  type: SOFTMAX
}
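For reference, here is a minimal pycaffe sketch of running this deploy network on a single preprocessed crop; the prototxt and caffemodel file names are placeholders, and preprocess refers to the illustrative helper sketched earlier.

# Illustrative pycaffe usage; the prototxt/caffemodel file names are placeholders
# and `preprocess` is the illustrative helper from the preprocessing sketch above.
import numpy as np
import caffe

net = caffe.Net("VGG_CNN_S_deploy.prototxt", "VGG_CNN_S.caffemodel", caffe.TEST)
net.blobs["data"].reshape(1, 3, 224, 224)          # single-image batch
net.blobs["data"].data[0] = preprocess("example.jpg")
out = net.forward()
prob = out["prob"][0]                              # 1000-way class posterior
print("top-1 class index:", int(np.argmax(prob)))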
@wuxinhong

Could you give us your train_val.prototxt?

@DCurro

DCurro commented Nov 3, 2016

@wuxinhong: Here is what I used. You asked this a long time ago, but I'm posting it in case anybody still wants it.

name: "VGG-CNN-S"

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "VGG_mean.binaryproto"
  }
  image_data_param {
    source: "your-train-list.txt"
    root_folder: "data/"
    batch_size: 128
  }
}

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "VGG_mean.binaryproto"
  }
  image_data_param {
    source: "your-val-list.txt"
    root_folder: "data/"
    batch_size: 128
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }   
    bias_filler {
      type: "constant"
      value: 0
    }   
  }
}

layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}
layer {
  bottom: "conv1"
  top: "norm1"
  name: "norm1"
  type: "LRN"
  lrn_param {
    local_size: 5
    alpha: 0.0005
    beta: 0.75
    k: 2
  }
}
layer {
  bottom: "norm1"
  top: "pool1"
  name: "pool1"
  type: "Pooling"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 3
  }
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }   
    bias_filler {
      type: "constant"
      value: 0
    }   
  }
}

layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}
layer {
  bottom: "conv2"
  top: "pool2"
  name: "pool2"
  type: "Pooling"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }   
    bias_filler {
      type: "constant"
      value: 0
    }   
  }
}

layer {
  bottom: "conv3"
  top: "conv3"
  name: "relu3"
  type: "ReLU"
}

layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }   
    bias_filler {
      type: "constant"
      value: 0
    }   
  }
}

layer {
  bottom: "conv4"
  top: "conv4"
  name: "relu4"
  type: "ReLU"
}

layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }   
    bias_filler {
      type: "constant"
      value: 0
    }   
  }
}

layer {
  bottom: "conv5"
  top: "conv5"
  name: "relu5"
  type: "ReLU"
}
layer {
  bottom: "conv5"
  top: "pool5"
  name: "pool5"
  type: "Pooling"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 3
  }
}

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }   
    bias_filler {
      type: "constant"
      value: 1
    }   
  }
}

layer {
  bottom: "fc6"
  top: "fc6"
  name: "relu6"
  type: "ReLU"
}
layer {
  bottom: "fc6"
  top: "fc6"
  name: "drop6"
  type: "Dropout"
  dropout_param {
    dropout_ratio: 0.5
  }
}


layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }   
    bias_filler {
      type: "constant"
      value: 1
    }   
  }
}

layer {
  bottom: "fc7"
  top: "fc7"
  name: "relu7"
  type: "ReLU"
}
layer {
  bottom: "fc7"
  top: "fc7"
  name: "drop7"
  type: "Dropout"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.005
    }   
    bias_filler {
      type: "constant"
      value: 1
    }   
  }
}

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  accuracy_param {
    top_k: 5
  }
  include {
    phase: TEST
  }
}

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

@abhaydoke09

In the above model, the input images are given directly rather than in LMDB format, right?

@kevinkit

kevinkit commented Jan 3, 2017

Yes and no: the images are not stored in LMDB; they are referenced from a text file, with one line per image of the form path/to/image LABEL, where LABEL is an integer.
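For example, such a list file could be generated with a short script like the one below (the directory layout, one sub-folder per class under data/, and the alphabetical label assignment are assumptions).

# Illustrative only: write an ImageData-style list file, assuming one sub-folder
# per class under data/ (the root_folder in the prototxt above) and that the
# alphabetical class order defines the integer labels.
import os

root = "data"
classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
with open("your-train-list.txt", "w") as out:
    for label, cls in enumerate(classes):
        for fname in sorted(os.listdir(os.path.join(root, cls))):
            # each line: <path relative to root_folder> <integer label>
            out.write("%s/%s %d\n" % (cls, fname, label))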
