Last active February 5, 2023 13:02
Network in Network Imagenet Model


name: Network in Network Imagenet Model

caffemodel: nin_imagenet.caffemodel

caffemodel_url:

license: BSD

caffe_commit: pull request yet to be merged

gist_id: d802a5849de39225bcc6


This is a 4-layer Network in Network (NIN) model trained on the ImageNet dataset.

Replacing the fully connected layers with a global average pooling layer greatly reduces the number of parameters. The resulting snapshot is 29MB, about one eighth the size of AlexNet's, which is about 230MB.
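As a rough cross-check of the 29MB figure, the parameters of the convolution layers in this gist's train_val.prototxt can be counted directly. This is a minimal sketch that assumes 3 input channels (RGB) and 4-byte float32 weights; neither is stated in the gist.

```python
# Parameter count for the convolution layers defined in train_val.prototxt.
# Assumptions (not stated in the gist): 3 input channels (RGB) and
# 4-byte float32 weights.

# (name, input channels, output channels, kernel size)
LAYERS = [
    ("conv1", 3, 96, 11),
    ("cccp1", 96, 96, 1),
    ("cccp2", 96, 96, 1),
    ("conv2", 96, 256, 5),
    ("cccp3", 256, 256, 1),
    ("cccp4", 256, 256, 1),
    ("conv3", 256, 384, 3),
    ("cccp5", 384, 384, 1),
    ("cccp6", 384, 384, 1),
    ("conv4", 384, 1024, 3),
    ("cccp7", 1024, 1024, 1),
    ("cccp8", 1024, 1000, 1),
]


def count_params(c_in, c_out, k):
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * k * k + c_out


total = sum(count_params(c_in, c_out, k) for _, c_in, c_out, k in LAYERS)
size_mib = total * 4 / 2**20

print(f"parameters: {total:,}")
print(f"float32 size: {size_mib:.1f} MiB")
```

Under these assumptions the count comes to about 7.6 million parameters, or roughly 29 MiB of float32 weights, which is consistent with the snapshot size quoted above.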

The top-1 accuracy of this model on the validation set is 59.36%, slightly better than AlexNet. (Averaging the predictions over 10 crops, i.e. 4 corners + 1 center, each with its mirror image, should give slightly higher accuracy.)
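The 10-crop averaging mentioned above can be sketched with NumPy as follows. This is an illustration rather than the author's evaluation code: the 256x256 input size is an assumption, and `predict` is a hypothetical callable standing in for a forward pass of the network.

```python
import numpy as np


def ten_crop(image, crop=224):
    """Return 10 crops of a (C, H, W) array: 4 corners + center, plus mirrors."""
    _, h, w = image.shape
    if h < crop or w < crop:
        raise ValueError("image is smaller than the crop size")
    tops = [0, 0, h - crop, h - crop, (h - crop) // 2]
    lefts = [0, w - crop, 0, w - crop, (w - crop) // 2]
    crops = [image[:, t:t + crop, l:l + crop] for t, l in zip(tops, lefts)]
    mirrored = [c[:, :, ::-1] for c in crops]  # horizontal flip of each crop
    return np.stack(crops + mirrored)


def average_predictions(predict, image, crop=224):
    """Average class probabilities over the 10 crops.

    `predict` is a hypothetical callable mapping an (N, C, crop, crop) batch
    to an (N, num_classes) array of probabilities.
    """
    return predict(ten_crop(image, crop)).mean(axis=0)


# Example with a dummy image; 256x256 is an assumed input size.
dummy = np.random.rand(3, 256, 256).astype(np.float32)
batch = ten_crop(dummy)
print(batch.shape)
```

The first five entries of the batch are the unmirrored crops and the last five are their horizontal flips, in the same order.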

Training is also much faster than for AlexNet because the model converges more quickly. It takes 4-5 days to train on a GTX Titan.



net: "models/nin_imagenet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 200000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/nin_imagenet/nin_imagenet_train"
solver_mode: GPU
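
To make the solver's learning-rate schedule concrete: with Caffe's "step" policy, the rate is base_lr * gamma ^ floor(iter / stepsize). The short sketch below evaluates that formula for the values in the solver settings above; the function name `step_lr` is illustrative only.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=200000):
    """Learning rate under the "step" policy: base_lr * gamma^floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# The rate drops by a factor of 10 at iterations 200000 and 400000,
# so training up to max_iter = 450000 runs through three rate levels.
for it in (0, 199999, 200000, 399999, 400000, 449999):
    print(f"iter {it:>6}: lr = {step_lr(it):.6f}")
```
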
name: "nin_imagenet"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/linmin/IMAGENET-LMDB/imagenet-train-lmdb"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 224
    mirror: true
    mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/linmin/IMAGENET-LMDB/imagenet-val-lmdb"
    backend: LMDB
    batch_size: 89
  }
  transform_param {
    crop_size: 224
    mirror: false
    mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1"
  top: "conv1"
  name: "relu0"
  type: RELU
}
# cccp layers are 1x1 convolutions
layers {
  bottom: "conv1"
  top: "cccp1"
  name: "cccp1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp1"
  top: "cccp1"
  name: "relu1"
  type: RELU
}
layers {
  bottom: "cccp1"
  top: "cccp2"
  name: "cccp2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp2"
  top: "cccp2"
  name: "relu2"
  type: RELU
}
layers {
  bottom: "cccp2"
  top: "pool0"
  name: "pool0"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool0"
  top: "conv2"
  name: "conv2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2"
  top: "conv2"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv2"
  top: "cccp3"
  name: "cccp3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp3"
  top: "cccp3"
  name: "relu5"
  type: RELU
}
layers {
  bottom: "cccp3"
  top: "cccp4"
  name: "cccp4"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp4"
  top: "cccp4"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "cccp4"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3"
  name: "conv3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3"
  top: "conv3"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "conv3"
  top: "cccp5"
  name: "cccp5"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp5"
  top: "cccp5"
  name: "relu8"
  type: RELU
}
layers {
  bottom: "cccp5"
  top: "cccp6"
  name: "cccp6"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp6"
  top: "cccp6"
  name: "relu9"
  type: RELU
}
layers {
  bottom: "cccp6"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "pool3"
  name: "drop"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "pool3"
  top: "conv4"
  name: "conv4-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1024
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4"
  top: "conv4"
  name: "relu10"
  type: RELU
}
layers {
  bottom: "conv4"
  top: "cccp7"
  name: "cccp7-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1024
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp7"
  top: "cccp7"
  name: "relu11"
  type: RELU
}
layers {
  bottom: "cccp7"
  top: "cccp8"
  name: "cccp8-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1000
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp8"
  top: "cccp8"
  name: "relu12"
  type: RELU
}
# With 224x224 crops the incoming map is 6x6, so this 6x6 average pool
# covers the whole map.
layers {
  bottom: "cccp8"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 6
    stride: 1
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "pool4"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "pool4"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}
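
Whether the final 6x6 average pool acts as global average pooling can be checked by tracing the spatial size through the layers above. The sketch below uses Caffe's usual output-size rules (convolution rounds down, pooling rounds up) and skips the 1x1 cccp layers, ReLU and dropout, since they do not change the spatial size. The helper names are illustrative.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Convolution output size (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1):
    """Pooling output size without padding (rounds up)."""
    return math.ceil((size - kernel) / stride) + 1


# (name, kind, kernel, stride, pad), taken from the layer definitions above.
SPATIAL_LAYERS = [
    ("conv1", "conv", 11, 4, 0),
    ("pool0", "pool", 3, 2, 0),
    ("conv2", "conv", 5, 1, 2),
    ("pool2", "pool", 3, 2, 0),
    ("conv3", "conv", 3, 1, 1),
    ("pool3", "pool", 3, 2, 0),
    ("conv4", "conv", 3, 1, 1),
    ("pool4", "pool", 6, 1, 0),
]

size = 224  # crop_size from the data layers
trace = {"data": size}
for name, kind, kernel, stride, pad in SPATIAL_LAYERS:
    if kind == "conv":
        size = conv_out(size, kernel, stride, pad)
    else:
        size = pool_out(size, kernel, stride)
    trace[name] = size

for name, s in trace.items():
    print(f"{name}: {s}x{s}")
```

For 224x224 crops the map entering pool4 is 6x6, so the 6x6 kernel covers it entirely and the output is 1x1 per class. For other input sizes the fixed kernel would no longer cover the whole map.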

Has anyone else trained any other Network In Network (NIN) models? Or is this the only one?


mrgloom commented Oct 15, 2016

layers {
  bottom: "cccp8"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 6
    stride: 1
  }
}

This seems to be an old-format Caffe .prototxt. Do we now need to specify global_pooling: true?
As far as I can see, NIN uses a global average pooling layer, not just average pooling. [link to paper] (global average pooling)

layer {
  name: "pool4"
  type: "Pooling"
  bottom: "cccp8"
  top: "pool4"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}


moyix commented Oct 9, 2017

Hi @mavenlin,

I noticed that the SHA1 of the caffe model does not match what's listed here (the SHA1 listed here is 8e89c8fcd46e02780e16c867a5308e7bb7af0803 but the SHA1 of the downloaded model is 2794deb2aada04f667894b7d6d929371b4689ea9). Maybe this should be fixed so that people can be sure their download was successful and they're getting the correct model?
