
@mavenlin
Last active January 30, 2023 06:40
Network in Network CIFAR10

Info

name: Network in Network CIFAR10 Model

caffemodel: cifar10_nin.caffemodel

caffemodel_url: https://www.dropbox.com/s/blrajqirr1p31v0/cifar10_nin.caffemodel?dl=1

license: BSD

sha1: 8e89c8fcd46e02780e16c867a5308e7bb7af0803

caffe_commit: c69b3b49084b503e23b95dc387329975245949c2

gist_id: e56253735ef32c3c296d

Description

This model is a 3-layer Network in Network model trained on the CIFAR10 dataset.

The accuracy of this model on the validation set is 89.6%. Detailed descriptions are in the paper Network in Network.

The preprocessed CIFAR10 data is downloadable in leveldb format here:

License

The data used to train this model comes from http://www.cs.toronto.edu/~kriz/cifar.html. Please follow the license there if you use this data.
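
# solver.prototxt: training hyperparameters for the CIFAR10 NIN model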

net: "train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.1
momentum: 0.9
weight_decay: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 120000
snapshot: 10000
snapshot_prefix: "cifar10_nin"
solver_mode: GPU
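
# train_test.prototxt: network definition in the legacy Caffe "layers" format.
# Each blobs_lr / weight_decay pair gives the learning-rate and decay
# multipliers for the layer's weight and bias blobs, respectively.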
name: "CIFAR10_full"
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "cifar-train-leveldb"
    batch_size: 128
  }
  include: { phase: TRAIN }
}
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "cifar-test-leveldb"
    batch_size: 100
  }
  include: { phase: TEST }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "cccp1"
  type: CONVOLUTION
  bottom: "conv1"
  top: "cccp1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 160
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp1"
  type: RELU
  bottom: "cccp1"
  top: "cccp1"
}
layers {
  name: "cccp2"
  type: CONVOLUTION
  bottom: "cccp1"
  top: "cccp2"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp2"
  type: RELU
  bottom: "cccp2"
  top: "cccp2"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "cccp2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "drop3"
  type: DROPOUT
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "cccp3"
  type: CONVOLUTION
  bottom: "conv2"
  top: "cccp3"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp3"
  type: RELU
  bottom: "cccp3"
  top: "cccp3"
}
layers {
  name: "cccp4"
  type: CONVOLUTION
  bottom: "cccp3"
  top: "cccp4"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp4"
  type: RELU
  bottom: "cccp4"
  top: "cccp4"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "cccp4"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "pool2"
  top: "pool2"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  blobs_lr: 1.
  blobs_lr: 2.
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "cccp5"
  type: CONVOLUTION
  bottom: "conv3"
  top: "cccp5"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp5"
  type: RELU
  bottom: "cccp5"
  top: "cccp5"
}
layers {
  name: "cccp6"
  type: CONVOLUTION
  bottom: "cccp5"
  top: "cccp6"
  blobs_lr: 0.1
  blobs_lr: 0.1
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 10
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp6"
  type: RELU
  bottom: "cccp6"
  top: "cccp6"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "cccp6"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 8
    stride: 1
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "pool3"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pool3"
  bottom: "label"
  top: "loss"
}
@jasonustc

@zashani I downloaded the pre-processed data, but I think it's in leveldb format. The problem is that when I extracted the files to a folder and ran this model, it seems the model cannot read data from the folder (it appears empty). Has anybody run into this problem?

@wwnigel

wwnigel commented Aug 5, 2015

@jasonustc

Have you solved the problem? I am facing the same problem, i.e. the leveldb data in the extracted folder cannot be read by train_test.prototxt.
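
A quick sanity check I use to see whether the extracted folder is a readable leveldb at all. This is a minimal sketch, assuming the py-leveldb bindings and caffe's compiled python protos are installed; the database path is my own and should be adjusted:

import leveldb                      # py-leveldb bindings
from caffe.proto import caffe_pb2   # caffe's compiled Datum proto

# Open the extracted database directory (path is an assumption; use yours).
db = leveldb.LevelDB('cifar-test-leveldb')

datum = caffe_pb2.Datum()
count = 0
for key, value in db.RangeIter():
    datum.ParseFromString(bytes(value))  # each value is a serialized Datum
    count += 1
print('%d records; last datum shape: %dx%dx%d'
      % (count, datum.channels, datum.height, datum.width))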

@PeterPan1990

@DannaShavit, I also got 50% accuracy with the pretrained cifar10 model; did you find a solution?

@PeterPan1990

@RyanLiuNtust, would you like to share your experience reproducing the results on cifar10? I am stuck at 50% accuracy. I have sent you an email, thanks!

@mfigurnov

@PeterPan1990 maybe your problems reproducing the results are caused by issue BVLC/caffe#2688

Here is a model definition with a workaround for the issue (the pool1 layer is forced to use the Caffe engine): https://gist.github.com/mfigurnov/4736f2f4a6e1676d074d
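
For reference, the workaround amounts to pinning the pooling engine in the layer definition, along these lines (a sketch of the relevant layer only; see the linked gist for the full file):

layers {
  name: "pool1"
  type: POOLING
  bottom: "cccp2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
    engine: CAFFE  # avoid the cuDNN max-pooling path discussed in BVLC/caffe#2688
  }
}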

@diaomin

diaomin commented Sep 10, 2015

@jasonustc Hello jasonustc, have you solved the problem? I am facing the same problem: the model cannot read the given pre-processed data.

@Coldmooon

@diaomin Have you changed net: "train_test.prototxt" to net: "train_val.prototxt" in the solver.prototxt file? The solver.prototxt points to train_test.prototxt, but the network file provided is actually named train_val.prototxt; see the line below.
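
net: "train_val.prototxt"  # instead of net: "train_test.prototxt"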

@hengck23

@diaomin , @PeterPan1990
I would like to share my experience repeating the experimental results for network-in-network on cifar 10 [1]. In summary, the paper's results are correct and can be reproduced, with some caveats.

There are two open-source implementations:

1) caffe implementation

  • the dataset (caffe leveldb format) can be downloaded from https://gist.github.com/mavenlin/e56253735ef32c3c296d
  • the files are "cifar-test-leveldb" and "cifar-train-leveldb"
  • my initial version of Windows caffe (I forget the version) could not read this.
  • I had to change my leveldb version to https://github.com/maxd/leveldbwin.
  • Then I verified that testing "cifar10_nin.caffemodel" on "cifar-test-leveldb" does indeed produce an accuracy of 89.4%.
  • However, I could not get the same results when training my own model with the network and solver prototxt files provided (I suspect the issue may be the version of caffe used).
  • I tried changing the base learning rate from 0.1 to 0.01.
  • I also tried renaming the "top" and "bottom" blob names of layers to prevent "in-place" operations like relu and dropout.
  • I also tried switching to the caffe engine for max pooling.
  • I got test accuracy stuck at 10% (loss 2.30).
  • I can get test accuracy near 60% after playing with the solver parameters.

2) convnet implementation

  • there is a convnet implementation by the author at https://github.com/mavenlin/cuda-convnet
  • the dataset can be downloaded too; it is in python format and the file is "cifar-10-py-colmajor".
  • I verified that this python data is the same as the caffe leveldb version (up to small differences of around 1e-7).
  • I compiled the convnet code (on Windows) and can get the same results as claimed, near 89%.

3) my implementation

  • I wrote my own C++ code (mainly by hacking into caffe and copying their layer source code into mine).
  • I set up the same architecture using the caffe network and solver prototxt files.
  • I can get around 88% accuracy.

4) Others

Some references that may be useful:

[1] "Network In Network" - M. Lin, Q. Chen, S. Yan, ICLR 2014.
[2] "Empirical Evaluation of Rectified Activations in Convolutional Network" - B. Xu, N. Wang, T. Chen, M. Li, arXiv 2015.

@Coldmooon

Coldmooon commented Oct 13, 2015

@mfigurnov Great, thanks! This solved my problem.

I've reproduced the paper's result twice and got accuracies of 89.47% (learning rate 0.1) and 87.2% (learning rate lowered to 0.01).

@kgl-prml

@hengck23 Using the current caffe version, I also cannot reproduce the results. After reading about your experience, I am quite curious why the current caffe implementation cannot do this. Does that mean there may be some bugs in caffe?

@kgl-prml

@Coldmooon How did you reproduce the paper's result? Did you just use the current caffe implementation, or did you perform additional work?

@mollahosseini

@kgl-prml, we followed the paper outline, applying global contrast normalization and ZCA whitening, and we were able to reproduce the 10.4% error rate on cifar-10.
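
Roughly, that preprocessing looks like the following numpy sketch. The scale and eps values here are common defaults, not necessarily the exact ones we used:

import numpy as np

def global_contrast_normalize(X, scale=55.0, eps=1e-8):
    # X: (n_images, n_pixels) float array, one flattened image per row
    X = X - X.mean(axis=1, keepdims=True)          # zero-center each image
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return scale * X / np.maximum(norms, eps)      # rescale to a fixed norm

def zca_fit(X, eps=0.01):
    # Fit the whitening transform on the training set only.
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T.dot(Xc) / Xc.shape[0]
    U, S, _ = np.linalg.svd(cov)
    W = U.dot(np.diag(1.0 / np.sqrt(S + eps))).dot(U.T)  # ZCA whitening matrix
    return mean, W

# Apply the same mean and W to both train and test data:
#   X_white = (X - mean).dot(W)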

@mollahosseini

Has anybody been able to reproduce the paper's results on cifar-100? We could get the paper's result on cifar-10 (10% error rate), but the same network doesn't converge to less than 75% error rate on cifar-100! Do you have any suggestions?

@rockstone533

Hi, @hengck23. I tried to reproduce NIN with caffe and got nearly 88% accuracy following your advice. Now I have some questions.
I want to use the network architecture to test a new image in python:
1. How should I implement GCN? I guess subtract the mean and divide by the std; is that right?
2. Is the dropout layer needed when testing a new image?
By the way, do you know how to visualize the feature maps as shown in the paper? The patch size seems smaller than the input.

@tingtinglu

I also want to implement this paper (NIN), but I am new to caffe and deep learning. I downloaded the pre-trained model and the cifar-test-leveldb file from the author's website (https://gist.github.com/mavenlin/e56253735ef32c3c296d), but my caffe cannot read the cifar-test-leveldb. What is wrong here, and how can I run this experiment easily? Thanks so much!

@Perseus14

@mavenlin Can you upload the deploy.prototxt, or show how to convert train_val.prototxt to deploy.prototxt? I don't know the input params. I want to extract the model weights and biases without downloading the entire dataset.
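
In the meantime, here is my guess at a deploy version. This is a sketch assuming a single 3x32x32 CIFAR10 image as input (the input dims come from CIFAR10's image size) and is unverified against the released caffemodel:

name: "CIFAR10_NIN_deploy"
input: "data"
input_dim: 1   # batch size
input_dim: 3   # channels
input_dim: 32  # height
input_dim: 32  # width
# ...then copy every layer from conv1 through pool3 unchanged, dropping the two
# DATA layers and the ACCURACY / SOFTMAX_LOSS layers, and append:
layers {
  name: "prob"
  type: SOFTMAX
  bottom: "pool3"
  top: "prob"
}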

@happyzhouch

@hengck23
Hello, I also have the problem that I can't read the downloaded data. I see you changed your leveldb; can you tell me how you changed it? I don't understand this step: "I have to change my leveldb version to this https://github.com/maxd/leveldbwin".

@happyzhouch

@hengck23
How do I use leveldbwin? I can't understand it. I am new to caffe and sincerely hope to get a solution from you.

@ectg

ectg commented Feb 13, 2017

@Perseus14, did you find the deploy.prototxt?

@sayadyaghoobi

I want to finetune the NIN model for just 3 classes, but I don't know which layer must be changed. I tried renaming cccp6 but it didn't work. Does anyone have an idea? Please share.

@sayadyaghoobi

I'm trying to train NIN on my own data without pretrained weights; instead I'm starting from random weights and using the NIN network. My data has 3 classes, and I changed the num_output of cccp6 from 10 to 3. When I run it, I get a constant accuracy of 0.3368 in every testing phase. Does anyone have an idea what causes this error?
I didn't change anything except cccp6's output from 10 to 3. Thanks very much.
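
For reference, this is what I tried for the last layer: renaming it so that caffe re-initializes it instead of copying the pretrained 10-class weights when finetuning. A sketch only; the layer name and learning rates are my own choices:

layers {
  name: "cccp6_3class"  # new name => weights are re-initialized when finetuning
  type: CONVOLUTION
  bottom: "cccp5"
  top: "cccp6_3class"
  blobs_lr: 10   # optionally learn the new layer faster than the pretrained ones
  blobs_lr: 10
  convolution_param {
    num_output: 3  # one 1x1 output map per class
    kernel_size: 1
    weight_filler { type: "gaussian" std: 0.05 }
    bias_filler { type: "constant" value: 0 }
  }
}
# Remember to point the later layers (relu_cccp6 and pool3) at "cccp6_3class" too.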
