GoogLeNet in Keras

Here is a Keras model of GoogLeNet (a.k.a Inception V1). I created it by converting the GoogLeNet model from Caffe.

GoogLeNet paper:

Going deeper with convolutions.
Szegedy, Christian, et al. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Requirements

The code now runs with Python 3.6, Keras 2.2.4, and either Theano 1.0.4 or TensorFlow 1.14.0. You will also need to install the following:

pip install pillow numpy imageio

To switch to the Theano backend, change your ~/.keras/keras.json file to

{"epsilon": 1e-07, "floatx": "float32", "backend": "theano", "image_data_format": "channels_first"}

Or, for the TensorFlow backend,

{"epsilon": 1e-07, "floatx": "float32", "backend": "tensorflow", "image_data_format": "channels_first"}

Note that in either case, the code requires the channels_first option for image_data_format.
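
The two configurations above differ only in the backend field. As a minimal sketch, the file could also be generated from Python. The helper names build_keras_config and write_keras_config are made up for this example and are not part of Keras, and the write step overwrites any existing keras.json:

```python
import json
import os


def build_keras_config(backend="tensorflow"):
    """Return the keras.json contents described in the README."""
    if backend not in ("theano", "tensorflow"):
        raise ValueError("backend must be 'theano' or 'tensorflow'")
    return {
        "epsilon": 1e-07,
        "floatx": "float32",
        "backend": backend,
        # required by this GoogLeNet code for both backends
        "image_data_format": "channels_first",
    }


def write_keras_config(backend="tensorflow", path=None):
    """Write the config to `path` (default: ~/.keras/keras.json)."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(build_keras_config(backend), f)
    return path
```

Keras reads this file when it is first imported, so the file has to be written before the import.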

Running the Demo (googlenet.py)

To create a GoogLeNet model, call the following from within Python:

from googlenet import create_googlenet
model = create_googlenet()

googlenet.py also contains a demo image classification. To run the demo, you will need to download the pre-trained weights (googlenet_weights.h5), the class labels (synset_words.txt), and a test image (cat.jpg). Once these are downloaded and moved to the working directory, you can run googlenet.py from the terminal:

$ python googlenet.py

which will output the predicted class label for the image.

googlenet.py

from __future__ import print_function
import imageio
from PIL import Image
import numpy as np
import keras
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D, Dropout, Flatten, Concatenate, Reshape, Activation
from keras.models import Model
from keras.regularizers import l2
from keras.optimizers import SGD
from pool_helper import PoolHelper
from lrn import LRN
if keras.backend.backend() == 'tensorflow':
    from keras import backend as K
    import tensorflow as tf
    from keras.utils.conv_utils import convert_kernel


def create_googlenet(weights_path=None):
    # creates GoogLeNet a.k.a. Inception v1 (Szegedy, 2015)
    input = Input(shape=(3, 224, 224))
    input_pad = ZeroPadding2D(padding=(3, 3))(input)
    conv1_7x7_s2 = Conv2D(64, (7,7), strides=(2,2), padding='valid', activation='relu', name='conv1/7x7_s2', kernel_regularizer=l2(0.0002))(input_pad)
    conv1_zero_pad = ZeroPadding2D(padding=(1, 1))(conv1_7x7_s2)
    pool1_helper = PoolHelper()(conv1_zero_pad)
    pool1_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid', name='pool1/3x3_s2')(pool1_helper)
    pool1_norm1 = LRN(name='pool1/norm1')(pool1_3x3_s2)

    conv2_3x3_reduce = Conv2D(64, (1,1), padding='same', activation='relu', name='conv2/3x3_reduce', kernel_regularizer=l2(0.0002))(pool1_norm1)
    conv2_3x3 = Conv2D(192, (3,3), padding='same', activation='relu', name='conv2/3x3', kernel_regularizer=l2(0.0002))(conv2_3x3_reduce)
    conv2_norm2 = LRN(name='conv2/norm2')(conv2_3x3)
    conv2_zero_pad = ZeroPadding2D(padding=(1, 1))(conv2_norm2)
    pool2_helper = PoolHelper()(conv2_zero_pad)
    pool2_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid', name='pool2/3x3_s2')(pool2_helper)

    inception_3a_1x1 = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_3a/1x1', kernel_regularizer=l2(0.0002))(pool2_3x3_s2)
    inception_3a_3x3_reduce = Conv2D(96, (1,1), padding='same', activation='relu', name='inception_3a/3x3_reduce', kernel_regularizer=l2(0.0002))(pool2_3x3_s2)
    inception_3a_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_3a_3x3_reduce)
    inception_3a_3x3 = Conv2D(128, (3,3), padding='valid', activation='relu', name='inception_3a/3x3', kernel_regularizer=l2(0.0002))(inception_3a_3x3_pad)
    inception_3a_5x5_reduce = Conv2D(16, (1,1), padding='same', activation='relu', name='inception_3a/5x5_reduce', kernel_regularizer=l2(0.0002))(pool2_3x3_s2)
    inception_3a_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_3a_5x5_reduce)
    inception_3a_5x5 = Conv2D(32, (5,5), padding='valid', activation='relu', name='inception_3a/5x5', kernel_regularizer=l2(0.0002))(inception_3a_5x5_pad)
    inception_3a_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_3a/pool')(pool2_3x3_s2)
    inception_3a_pool_proj = Conv2D(32, (1,1), padding='same', activation='relu', name='inception_3a/pool_proj', kernel_regularizer=l2(0.0002))(inception_3a_pool)
    inception_3a_output = Concatenate(axis=1, name='inception_3a/output')([inception_3a_1x1,inception_3a_3x3,inception_3a_5x5,inception_3a_pool_proj])

    inception_3b_1x1 = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_3b/1x1', kernel_regularizer=l2(0.0002))(inception_3a_output)
    inception_3b_3x3_reduce = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_3b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_3a_output)
    inception_3b_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_3b_3x3_reduce)
    inception_3b_3x3 = Conv2D(192, (3,3), padding='valid', activation='relu', name='inception_3b/3x3', kernel_regularizer=l2(0.0002))(inception_3b_3x3_pad)
    inception_3b_5x5_reduce = Conv2D(32, (1,1), padding='same', activation='relu', name='inception_3b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_3a_output)
    inception_3b_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_3b_5x5_reduce)
    inception_3b_5x5 = Conv2D(96, (5,5), padding='valid', activation='relu', name='inception_3b/5x5', kernel_regularizer=l2(0.0002))(inception_3b_5x5_pad)
    inception_3b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_3b/pool')(inception_3a_output)
    inception_3b_pool_proj = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_3b/pool_proj', kernel_regularizer=l2(0.0002))(inception_3b_pool)
    inception_3b_output = Concatenate(axis=1, name='inception_3b/output')([inception_3b_1x1,inception_3b_3x3,inception_3b_5x5,inception_3b_pool_proj])

    inception_3b_output_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_3b_output)
    pool3_helper = PoolHelper()(inception_3b_output_zero_pad)
    pool3_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid', name='pool3/3x3_s2')(pool3_helper)

    inception_4a_1x1 = Conv2D(192, (1,1), padding='same', activation='relu', name='inception_4a/1x1', kernel_regularizer=l2(0.0002))(pool3_3x3_s2)
    inception_4a_3x3_reduce = Conv2D(96, (1,1), padding='same', activation='relu', name='inception_4a/3x3_reduce', kernel_regularizer=l2(0.0002))(pool3_3x3_s2)
    inception_4a_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_4a_3x3_reduce)
    inception_4a_3x3 = Conv2D(208, (3,3), padding='valid', activation='relu', name='inception_4a/3x3', kernel_regularizer=l2(0.0002))(inception_4a_3x3_pad)
    inception_4a_5x5_reduce = Conv2D(16, (1,1), padding='same', activation='relu', name='inception_4a/5x5_reduce', kernel_regularizer=l2(0.0002))(pool3_3x3_s2)
    inception_4a_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_4a_5x5_reduce)
    inception_4a_5x5 = Conv2D(48, (5,5), padding='valid', activation='relu', name='inception_4a/5x5', kernel_regularizer=l2(0.0002))(inception_4a_5x5_pad)
    inception_4a_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4a/pool')(pool3_3x3_s2)
    inception_4a_pool_proj = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_4a/pool_proj', kernel_regularizer=l2(0.0002))(inception_4a_pool)
    inception_4a_output = Concatenate(axis=1, name='inception_4a/output')([inception_4a_1x1,inception_4a_3x3,inception_4a_5x5,inception_4a_pool_proj])

    loss1_ave_pool = AveragePooling2D(pool_size=(5,5), strides=(3,3), name='loss1/ave_pool')(inception_4a_output)
    loss1_conv = Conv2D(128, (1,1), padding='same', activation='relu', name='loss1/conv', kernel_regularizer=l2(0.0002))(loss1_ave_pool)
    loss1_flat = Flatten()(loss1_conv)
    loss1_fc = Dense(1024, activation='relu', name='loss1/fc', kernel_regularizer=l2(0.0002))(loss1_flat)
    loss1_drop_fc = Dropout(rate=0.7)(loss1_fc)
    loss1_classifier = Dense(1000, name='loss1/classifier', kernel_regularizer=l2(0.0002))(loss1_drop_fc)
    loss1_classifier_act = Activation('softmax')(loss1_classifier)

    inception_4b_1x1 = Conv2D(160, (1,1), padding='same', activation='relu', name='inception_4b/1x1', kernel_regularizer=l2(0.0002))(inception_4a_output)
    inception_4b_3x3_reduce = Conv2D(112, (1,1), padding='same', activation='relu', name='inception_4b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4a_output)
    inception_4b_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_4b_3x3_reduce)
    inception_4b_3x3 = Conv2D(224, (3,3), padding='valid', activation='relu', name='inception_4b/3x3', kernel_regularizer=l2(0.0002))(inception_4b_3x3_pad)
    inception_4b_5x5_reduce = Conv2D(24, (1,1), padding='same', activation='relu', name='inception_4b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4a_output)
    inception_4b_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_4b_5x5_reduce)
    inception_4b_5x5 = Conv2D(64, (5,5), padding='valid', activation='relu', name='inception_4b/5x5', kernel_regularizer=l2(0.0002))(inception_4b_5x5_pad)
    inception_4b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4b/pool')(inception_4a_output)
    inception_4b_pool_proj = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_4b/pool_proj', kernel_regularizer=l2(0.0002))(inception_4b_pool)
    inception_4b_output = Concatenate(axis=1, name='inception_4b/output')([inception_4b_1x1,inception_4b_3x3,inception_4b_5x5,inception_4b_pool_proj])

    inception_4c_1x1 = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_4c/1x1', kernel_regularizer=l2(0.0002))(inception_4b_output)
    inception_4c_3x3_reduce = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_4c/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4b_output)
    inception_4c_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_4c_3x3_reduce)
    inception_4c_3x3 = Conv2D(256, (3,3), padding='valid', activation='relu', name='inception_4c/3x3', kernel_regularizer=l2(0.0002))(inception_4c_3x3_pad)
    inception_4c_5x5_reduce = Conv2D(24, (1,1), padding='same', activation='relu', name='inception_4c/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4b_output)
    inception_4c_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_4c_5x5_reduce)
    inception_4c_5x5 = Conv2D(64, (5,5), padding='valid', activation='relu', name='inception_4c/5x5', kernel_regularizer=l2(0.0002))(inception_4c_5x5_pad)
    inception_4c_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4c/pool')(inception_4b_output)
    inception_4c_pool_proj = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_4c/pool_proj', kernel_regularizer=l2(0.0002))(inception_4c_pool)
    inception_4c_output = Concatenate(axis=1, name='inception_4c/output')([inception_4c_1x1,inception_4c_3x3,inception_4c_5x5,inception_4c_pool_proj])

    inception_4d_1x1 = Conv2D(112, (1,1), padding='same', activation='relu', name='inception_4d/1x1', kernel_regularizer=l2(0.0002))(inception_4c_output)
    inception_4d_3x3_reduce = Conv2D(144, (1,1), padding='same', activation='relu', name='inception_4d/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4c_output)
    inception_4d_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_4d_3x3_reduce)
    inception_4d_3x3 = Conv2D(288, (3,3), padding='valid', activation='relu', name='inception_4d/3x3', kernel_regularizer=l2(0.0002))(inception_4d_3x3_pad)
    inception_4d_5x5_reduce = Conv2D(32, (1,1), padding='same', activation='relu', name='inception_4d/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4c_output)
    inception_4d_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_4d_5x5_reduce)
    inception_4d_5x5 = Conv2D(64, (5,5), padding='valid', activation='relu', name='inception_4d/5x5', kernel_regularizer=l2(0.0002))(inception_4d_5x5_pad)
    inception_4d_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4d/pool')(inception_4c_output)
    inception_4d_pool_proj = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_4d/pool_proj', kernel_regularizer=l2(0.0002))(inception_4d_pool)
    inception_4d_output = Concatenate(axis=1, name='inception_4d/output')([inception_4d_1x1,inception_4d_3x3,inception_4d_5x5,inception_4d_pool_proj])

    loss2_ave_pool = AveragePooling2D(pool_size=(5,5), strides=(3,3), name='loss2/ave_pool')(inception_4d_output)
    loss2_conv = Conv2D(128, (1,1), padding='same', activation='relu', name='loss2/conv', kernel_regularizer=l2(0.0002))(loss2_ave_pool)
    loss2_flat = Flatten()(loss2_conv)
    loss2_fc = Dense(1024, activation='relu', name='loss2/fc', kernel_regularizer=l2(0.0002))(loss2_flat)
    loss2_drop_fc = Dropout(rate=0.7)(loss2_fc)
    loss2_classifier = Dense(1000, name='loss2/classifier', kernel_regularizer=l2(0.0002))(loss2_drop_fc)
    loss2_classifier_act = Activation('softmax')(loss2_classifier)

    inception_4e_1x1 = Conv2D(256, (1,1), padding='same', activation='relu', name='inception_4e/1x1', kernel_regularizer=l2(0.0002))(inception_4d_output)
    inception_4e_3x3_reduce = Conv2D(160, (1,1), padding='same', activation='relu', name='inception_4e/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4d_output)
    inception_4e_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_4e_3x3_reduce)
    inception_4e_3x3 = Conv2D(320, (3,3), padding='valid', activation='relu', name='inception_4e/3x3', kernel_regularizer=l2(0.0002))(inception_4e_3x3_pad)
    inception_4e_5x5_reduce = Conv2D(32, (1,1), padding='same', activation='relu', name='inception_4e/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4d_output)
    inception_4e_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_4e_5x5_reduce)
    inception_4e_5x5 = Conv2D(128, (5,5), padding='valid', activation='relu', name='inception_4e/5x5', kernel_regularizer=l2(0.0002))(inception_4e_5x5_pad)
    inception_4e_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4e/pool')(inception_4d_output)
    inception_4e_pool_proj = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_4e/pool_proj', kernel_regularizer=l2(0.0002))(inception_4e_pool)
    inception_4e_output = Concatenate(axis=1, name='inception_4e/output')([inception_4e_1x1,inception_4e_3x3,inception_4e_5x5,inception_4e_pool_proj])

    inception_4e_output_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_4e_output)
    pool4_helper = PoolHelper()(inception_4e_output_zero_pad)
    pool4_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid', name='pool4/3x3_s2')(pool4_helper)

    inception_5a_1x1 = Conv2D(256, (1,1), padding='same', activation='relu', name='inception_5a/1x1', kernel_regularizer=l2(0.0002))(pool4_3x3_s2)
    inception_5a_3x3_reduce = Conv2D(160, (1,1), padding='same', activation='relu', name='inception_5a/3x3_reduce', kernel_regularizer=l2(0.0002))(pool4_3x3_s2)
    inception_5a_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_5a_3x3_reduce)
    inception_5a_3x3 = Conv2D(320, (3,3), padding='valid', activation='relu', name='inception_5a/3x3', kernel_regularizer=l2(0.0002))(inception_5a_3x3_pad)
    inception_5a_5x5_reduce = Conv2D(32, (1,1), padding='same', activation='relu', name='inception_5a/5x5_reduce', kernel_regularizer=l2(0.0002))(pool4_3x3_s2)
    inception_5a_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_5a_5x5_reduce)
    inception_5a_5x5 = Conv2D(128, (5,5), padding='valid', activation='relu', name='inception_5a/5x5', kernel_regularizer=l2(0.0002))(inception_5a_5x5_pad)
    inception_5a_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_5a/pool')(pool4_3x3_s2)
    inception_5a_pool_proj = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_5a/pool_proj', kernel_regularizer=l2(0.0002))(inception_5a_pool)
    inception_5a_output = Concatenate(axis=1, name='inception_5a/output')([inception_5a_1x1,inception_5a_3x3,inception_5a_5x5,inception_5a_pool_proj])

    inception_5b_1x1 = Conv2D(384, (1,1), padding='same', activation='relu', name='inception_5b/1x1', kernel_regularizer=l2(0.0002))(inception_5a_output)
    inception_5b_3x3_reduce = Conv2D(192, (1,1), padding='same', activation='relu', name='inception_5b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_5a_output)
    inception_5b_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_5b_3x3_reduce)
    inception_5b_3x3 = Conv2D(384, (3,3), padding='valid', activation='relu', name='inception_5b/3x3', kernel_regularizer=l2(0.0002))(inception_5b_3x3_pad)
    inception_5b_5x5_reduce = Conv2D(48, (1,1), padding='same', activation='relu', name='inception_5b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_5a_output)
    inception_5b_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_5b_5x5_reduce)
    inception_5b_5x5 = Conv2D(128, (5,5), padding='valid', activation='relu', name='inception_5b/5x5', kernel_regularizer=l2(0.0002))(inception_5b_5x5_pad)
    inception_5b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_5b/pool')(inception_5a_output)
    inception_5b_pool_proj = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_5b/pool_proj', kernel_regularizer=l2(0.0002))(inception_5b_pool)
    inception_5b_output = Concatenate(axis=1, name='inception_5b/output')([inception_5b_1x1,inception_5b_3x3,inception_5b_5x5,inception_5b_pool_proj])

    pool5_7x7_s1 = AveragePooling2D(pool_size=(7,7), strides=(1,1), name='pool5/7x7_s2')(inception_5b_output)
    loss3_flat = Flatten()(pool5_7x7_s1)
    pool5_drop_7x7_s1 = Dropout(rate=0.4)(loss3_flat)
    loss3_classifier = Dense(1000, name='loss3/classifier', kernel_regularizer=l2(0.0002))(pool5_drop_7x7_s1)
    loss3_classifier_act = Activation('softmax', name='prob')(loss3_classifier)

    googlenet = Model(inputs=input, outputs=[loss1_classifier_act,loss2_classifier_act,loss3_classifier_act])

    if weights_path:
        googlenet.load_weights(weights_path)

    if keras.backend.backend() == 'tensorflow':
        # convert the convolutional kernels for tensorflow
        ops = []
        for layer in googlenet.layers:
            if layer.__class__.__name__ == 'Conv2D':
                original_w = K.get_value(layer.kernel)
                converted_w = convert_kernel(original_w)
                ops.append(tf.assign(layer.kernel, converted_w).op)
        K.get_session().run(ops)

    return googlenet


if __name__ == "__main__":
    img = imageio.imread('cat.jpg', pilmode='RGB')
    img = np.array(Image.fromarray(img).resize((224, 224))).astype(np.float32)
    img[:, :, 0] -= 123.68
    img[:, :, 1] -= 116.779
    img[:, :, 2] -= 103.939
    img[:,:,[0,1,2]] = img[:,:,[2,1,0]]
    img = img.transpose((2, 0, 1))
    img = np.expand_dims(img, axis=0)

    # Test pretrained model
    model = create_googlenet('googlenet_weights.h5')
    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd, loss='categorical_crossentropy')
    out = model.predict(img) # note: the model has three outputs
    labels = np.loadtxt('synset_words.txt', str, delimiter='\t')
    predicted_label = np.argmax(out[2])
    predicted_class_name = labels[predicted_label]
    print('Predicted Class: ', predicted_label, ', Class Name: ', predicted_class_name)

lrn.py

from keras.layers.core import Layer
from keras import backend as K
if K.backend() == 'theano':
    import theano.tensor as T
elif K.backend() == 'tensorflow':
    import tensorflow as tf
else:
    raise NotImplementedError


class LRN(Layer):

    def __init__(self, alpha=0.0001, k=1, beta=0.75, n=5, **kwargs):
        self.alpha = alpha
        self.k = k
        self.beta = beta
        self.n = n
        super(LRN, self).__init__(**kwargs)

    def call(self, x, mask=None):
        b, ch, r, c = x.shape
        half_n = self.n // 2 # half the local region
        input_sqr = K.square(x) # square the input
        if K.backend() == 'theano':
            # make an empty tensor with zero pads along channel dimension
            zeros = T.alloc(0., b, ch + 2*half_n, r, c)
            # set the center to be the squared input
            input_sqr = T.set_subtensor(zeros[:, half_n:half_n+ch, :, :], input_sqr)
        else:
            input_sqr = tf.pad(input_sqr, [[0, 0], [half_n, half_n], [0, 0], [0, 0]])
        scale = self.k # offset for the scale
        norm_alpha = self.alpha / self.n # normalized alpha
        for i in range(self.n):
            scale += norm_alpha * input_sqr[:, i:i+ch, :, :]
        scale = scale ** self.beta
        x = x / scale
        return x

    def get_config(self):
        config = {"alpha": self.alpha,
                  "k": self.k,
                  "beta": self.beta,
                  "n": self.n}
        base_config = super(LRN, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

pool_helper.py

from keras.layers.core import Layer


class PoolHelper(Layer):

    def __init__(self, **kwargs):
        super(PoolHelper, self).__init__(**kwargs)

    def call(self, x, mask=None):
        return x[:,:,1:,1:]

    def get_config(self):
        config = {}
        base_config = super(PoolHelper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

AnubhavCR7 commented Jul 15, 2020

Hi, can anyone tell me how I can chop off the fully connected layer of the Inception v1 model by passing an include_top argument in the function definition (as in Keras)? I want to use only the convolutional blocks of the model. Also, can anyone give me the link to the pre-trained ImageNet weights file suitable for this model?
Regards.

swghosh commented Jul 15, 2020

Hi @AnubhavCR7,

As this is a custom implementation, the include_top=True|False argument is not provided (at least as of now), unlike the networks that are part of keras.applications, which come with a consistent API.

As a workaround, for using the convolutional base only, I'd recommend using the following code snippet:

googlenet = create_googlenet('googlenet_weights.h5')
base_conv_model = keras.Model(inputs=googlenet.input, outputs=googlenet.get_layer('inception_5b/output').output)

# in case you're looking to train a custom classifier on top,
# a little additional code is required
base_conv_model.trainable = False
new_classifier = keras.Sequential([
     base_conv_model,
     keras.layers.GlobalAveragePooling2D(),
     keras.layers.Dropout(rate=0.4),
     keras.layers.Dense(new_num_classes, activation='softmax')
])

new_classifier.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
new_classifier.fit(X, y)

Please make sure that the images in X (above code) are (ref: https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14#file-googlenet-py-L186-L193):

  • mean-normalized using [R: 123.68, G: 116.779, B: 103.939]
  • in BGR format
  • in channels-first format, i.e. NCHW, not NHWC!

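An easy way to handle the mean subtraction and the BGR conversion in a single function call (the images must already be in the layout given by Keras's image_data_format setting) would be: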

X = keras.applications.imagenet_utils.preprocess_input(images, mode='caffe')
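
For reference, the three requirements listed above can also be written out by hand. This is a rough NumPy sketch that mirrors the preprocessing in the demo part of googlenet.py. The function name preprocess_rgb_hwc is hypothetical, and the sketch assumes a single RGB image in height x width x channel layout:

```python
import numpy as np

# Per-channel means in RGB order, as used in googlenet.py
RGB_MEANS = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def preprocess_rgb_hwc(img):
    """Turn one (H, W, 3) RGB image into a (1, 3, H, W) BGR float32 batch."""
    x = np.asarray(img, dtype=np.float32)
    x = x - RGB_MEANS            # subtract the mean of each RGB channel
    x = x[:, :, ::-1]            # reorder channels: RGB -> BGR
    x = x.transpose((2, 0, 1))   # HWC -> CHW (channels first)
    return np.ascontiguousarray(x[np.newaxis])  # add the batch axis
```

The image still has to be resized to 224 x 224 beforehand, as the demo does with PIL.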

PS: some parts of the code might not be compatible with the latest version of Keras (v2.4 as of writing this comment), a.k.a. tf.keras (the new default), and might need additional changes to work with it.

And, regarding the pre-trained (ImageNet) weights for GoogLeNet, a Google Drive link is provided by the author at the very start of this gist's README.

googlenet.py also contains a demo image classification. To run the demo, you will need to install the pre-trained weights and the class labels. You will also need this test image.

AnubhavCR7 commented Jul 16, 2020

@swghosh Thank you for your response. I was actually thinking of fine-tuning the GoogLeNet model on my own custom dataset. I am using TensorFlow 1.15.0, Keras 2.3.1 and Python 3.7.7. Will the modifications you mentioned (mean normalization, BGR format, channels-first format, etc.) still be required?

And one more thing: if I want to train the last 2 conv blocks of the base model, I should start training from layer number 102, right?
Regards.

swghosh commented Jul 16, 2020

The data pre-processing steps (mean normalization, BGR, channels first) are a must and will still be required. Pre-processing should work right away; prefer keras.applications.imagenet_utils.preprocess_input(image, mode='caffe'). It also works with keras.preprocessing.image.ImageDataGenerator; something like gen = ImageDataGenerator(preprocessing_function=lambda x: preprocess_input(x, mode='caffe')) should suffice.

Also, since you are using TF v1.x + Keras v2.3.x, I hope you won't run into any errors (if you do, feel free to ping here with the error details).

Lastly, for fine-tuning the last 2 conv blocks: yes, you could choose the layer number from the codebase (PS: I didn't check the layer number for the last 2 conv blocks, please re-verify if required), or you may use layer names instead, which is a bit cleaner.
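
Selecting layers by name could look like the sketch below. The helper set_trainable_by_prefix is hypothetical (not a Keras function). It only relies on each layer having .name and .trainable attributes, as Keras layers do:

```python
def set_trainable_by_prefix(layers, trainable_prefixes):
    """Leave a layer trainable only if its name starts with one of the prefixes.

    `layers` is any iterable of objects with `.name` and `.trainable`
    attributes, for example `model.layers` in Keras.
    Returns the names of the layers that remain trainable.
    """
    prefixes = tuple(trainable_prefixes)
    kept = []
    for layer in layers:
        layer.trainable = layer.name.startswith(prefixes)
        if layer.trainable:
            kept.append(layer.name)
    return kept
```

With the model from this gist, a call such as set_trainable_by_prefix(googlenet.layers, ['inception_5a/', 'inception_5b/', 'loss3/']) would presumably leave only the last two inception blocks and the main classifier trainable (the prefixes come from the layer names in googlenet.py). The unnamed padding layers get frozen too, which should not matter since they have no weights. The model has to be compiled again afterwards for the change to take effect.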

AnubhavCR7 commented Jul 23, 2020

@swghosh Thank you. The weights file for GoogLeNet also contains the weights of the top (classification) layer. So if I try to load that weights file into the GoogLeNet architecture with its top layer chopped off, i.e. with only the conv base present, that should not create any problem, right?

swghosh commented Jul 23, 2020

It won't work with the top layer chopped off, as Keras expects the number of layers in the weights file to exactly equal the number of layers in the architecture into which you are loading the weights. Please load the weights into the complete architecture as given in the create_googlenet function, then remove the unnecessary layers using:

base_conv_model = keras.Model(inputs=googlenet.input, outputs=googlenet.get_layer('inception_5b/output').output)

as shared above.

AnubhavCR7 commented Jul 24, 2020

@swghosh
Ok, I see. I am using the TensorFlow backend with the "channels_last" data format, but the code requires the "channels_first" data format. Setting the Keras image data format to "channels_first" at the very beginning of the code should solve this problem, right?

swghosh commented Jul 24, 2020

Right. (read in comments above)

AnubhavCR7 commented Jul 24, 2020

if keras.backend.image_data_format() == 'channels_last':
    shape = (224, 224, 3)
else:
    shape = (3, 224, 224)

And then, in the very first line of the model architecture, where the input shape is specified: input = Input(shape=shape)

This snippet should work I guess. Regards.

3dimaging commented Aug 4, 2020

Thank you for your nice work! The code works very well for me.

mansi-aggarwal-2504 commented Sep 8, 2020

out = model.predict(img) # note: the model has three outputs

What do these three outputs denote? Also, when I retrieve the predicted class using out[2], I get the same predicted class for all inputs.
Can someone help me with this?

swghosh commented Sep 8, 2020

@mansi-aggarwal-2504, I'm not sure why out[2] is producing the same predicted class for all images. That does seem like a problem (if the pre-trained weights are loaded correctly).

PS: out[2] is the softmax prediction from the actual classifier (primary classifier of our interest) whereas out[0] and out[1] are outputs of auxiliary classifiers just used to make sure that the model converges faster during training. There is little to no use of the auxiliary classifiers at inference time.
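
To look at more than the arg-max of the main output (for example when every input seems to land in the same class), the top few probabilities can be listed. A small sketch, assuming outputs is the three-element list returned by model.predict and labels is a sequence of class names. The helper name top_k_main is made up for this example:

```python
import numpy as np


def top_k_main(outputs, labels, k=5):
    """Return (label, probability) pairs for the k best classes of one image.

    outputs: list [aux1, aux2, main] as returned by model.predict;
    each entry has shape (batch, num_classes). Only the main
    classifier (index 2) and the first image in the batch are used.
    """
    probs = np.asarray(outputs[2])[0]
    order = np.argsort(probs)[::-1][:k]   # indices by descending probability
    return [(labels[i], float(probs[i])) for i in order]
```

If the listed probabilities are nearly identical across very different inputs, the inputs or the weights are the more likely suspects than the arg-max step.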

However, if the pre-trained weights are loaded correctly and valid images (containing objects from the 1000 ImageNet classes) are used for inference, the problem of getting the same predicted class (even from the auxiliary outputs) should not occur.

If you think that this is a problem with the pre-trained weights or the implementation here, feel free to post a Colab notebook link with the minimal code required to replicate the issue. I'll look into it and try to figure it out.

mansi-aggarwal-2504 commented Sep 8, 2020

@swghosh thank you for your reply. Actually, I am not using the pre-trained model. I am using the architecture to train a model with two classes, and I am getting the same predicted class for all inputs.
Also, I will try to provide my code snippet soon. Thank you.

swghosh commented Sep 9, 2020

UPDATE (09/09/2020)

TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning.

A cleaner and easier alternative to the codebase and pre-trained weights provided here would be to use InceptionV1 from TensorFlow Hub, as that also allows for interoperability with the latest Keras (tf.keras) and TensorFlow 2.x frameworks. Typically, it should serve the same purpose as this gist.

Imagenet Inception V1 on TensorFlow Hub: https://tfhub.dev/google/imagenet/inception_v1/classification/1

mikechen66 commented Sep 12, 2020

After a lot of trial and error, I could set up the LRN layer even with the TensorFlow backend. The main obstacle was that the latest Keras API lacks an LRN layer. Also, the keras.backend.* set of functions didn't help much and only produced incomplete-shape errors.

Setting batch_input_shape for the old LRN2D class also does not appear to work, and produces the same error: "Cannot convert a partially known TensorShape to a Tensor".

The workaround is actually pretty simple and based on use of keras.layers.Lambda with the tf.nn.local_response_normalization function.

googlenet_custom_layers.py

from keras.layers.core import Layer, Lambda
import tensorflow as tf

# wraps tf.nn.local_response_normalization
# in keras.layers.Lambda so
# we can have a custom keras layer as
# a class that will perform LRN ops
class LRN(Lambda):

    def __init__(self, alpha=0.0001, beta=0.75, depth_radius=5, **kwargs):
        # using parameter defaults as per GoogLeNet
        params = {
            "alpha": alpha,
            "beta": beta,
            "depth_radius": depth_radius
        }
        # construct a function for use with Keras Lambda
        lrn_fn = lambda inputs: tf.nn.local_response_normalization(inputs, **params)

        # pass the function to Keras Lambda
        super().__init__(lrn_fn, **kwargs)

# this layer is also required by GoogLeNet (same as above)
class PoolHelper(Layer):
    
    def __init__(self, **kwargs):
        super(PoolHelper, self).__init__(**kwargs)
    
    def call(self, x, mask=None):
        return x[:,:,1:,1:]
    
    def get_config(self):
        config = {}
        base_config = super(PoolHelper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Hope this helps!

It generated the same error as the original script with the TensorFlow backend.

If I change the input shape from (3, 224, 224) to (224, 224, 3):
input = Input(shape=(224, 224, 3))

it raises the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 28, 28, 64), (None, 28, 28, 128), (None, 28, 28, 32), (None, 28, 28, 32)]

Or, if the shape is left unchanged:
input = Input(shape=(3, 224, 224))

it raises the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1, 28, 64), (None, 1, 28, 128), (None, 1, 28, 32), (None, 1, 28, 32)]
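
The shapes in the first error message look consistent with a channel-axis mix-up rather than with the LRN layer itself. Here is a minimal NumPy sketch of that reading. It assumes the model concatenates along axis 1 (the channels_first convention) while the tensors arrive in channels_last layout; the array sizes are copied from the error message, with a batch size of 1 standing in for None.

```python
import numpy as np

# Branch outputs with the shapes from the first error message (batch of 1).
branches = [np.zeros((1, 28, 28, c)) for c in (64, 128, 32, 32)]

# Concatenating on axis 1 fails: every non-concat axis must match,
# but the last axis (channels) differs between branches.
try:
    np.concatenate(branches, axis=1)
    axis1_ok = True
except ValueError:
    axis1_ok = False

# Concatenating on the last axis stacks the channels as intended.
merged = np.concatenate(branches, axis=-1)
print(axis1_ok, merged.shape)
```

If this is the cause, either the backend has to be configured for channels_first or every concat axis in the model has to point at the channel dimension actually in use.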

swghosh commented Sep 12, 2020

@mikechen66

After a lot of trial and error, I could set up the LRN layer even with the TensorFlow backend. The main obstacle was that the latest Keras API lacks the LRN layer. Also, the keras.backend.* functions didn't help much and only produced incomplete shape errors.
Setting the batch_input_shape for the old LRN2D class also does not appear to work, and produces the same error "Cannot convert a partially known TensorShape to a Tensor".
The workaround is actually pretty simple: use keras.layers.Lambda with the tf.nn.local_response_normalization function.
googlenet_custom_layers.py

from keras.layers.core import Layer, Lambda
import tensorflow as tf

# wraps tf.nn.local_response_normalization
# in keras.layers.Lambda so
# we can have a custom keras layer as
# a class that will perform LRN ops
class LRN(Lambda):

    def __init__(self, alpha=0.0001, beta=0.75, depth_radius=5, **kwargs):
        # using parameter defaults as per GoogLeNet
        params = {
            "alpha": alpha,
            "beta": beta,
            "depth_radius": depth_radius
        }
        # construct a function for use with Keras Lambda
        lrn_fn = lambda inputs: tf.nn.local_response_normalization(inputs, **params)

        # pass the function to Keras Lambda
        super().__init__(lrn_fn, **kwargs)

# this layer is also required by GoogLeNet (same as above)
class PoolHelper(Layer):
    
    def __init__(self, **kwargs):
        super(PoolHelper, self).__init__(**kwargs)
    
    def call(self, x, mask=None):
        return x[:,:,1:,1:]
    
    def get_config(self):
        config = {}
        base_config = super(PoolHelper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Hope this helps!

It generated the same error as the original script.
ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1, 28, 64), (None, 1, 28, 128), (None, 1, 28, 32), (None, 1, 28, 32)]

https://gist.github.com/joelouismarino/a2ede9ab3928f999575423b9887abd14#gistcomment-3451596

Prefer the LRN layer definition provided in this Gist (instead of my custom code), as it supports TensorFlow well (at least older versions), so that you can correctly use the pre-trained weights provided here.

In case you're training your model from scratch, feel free to use a Keras Lambda layer wrapper around tf.nn.local_response_normalization.
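
For anyone unsure what such a wrapper computes, here is a NumPy sketch of the documented formula behind tf.nn.local_response_normalization: each value is divided by (bias + alpha * sum of squares over neighbouring channels) ** beta, with the window taken along the last axis. The function names and the transpose helper are my own illustration, not part of the gist; the helper is an assumption about how one might apply a last-axis LRN to channels_first data.

```python
import numpy as np

def lrn_channels_last(x, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75):
    """Sketch of local response normalization across the last (channel) axis."""
    channels = x.shape[-1]
    squared = np.square(x)
    out = np.empty_like(x, dtype=np.float64)
    for d in range(channels):
        # Window of neighbouring channels, clipped at the edges.
        lo = max(0, d - depth_radius)
        hi = min(channels, d + depth_radius + 1)
        sqr_sum = squared[..., lo:hi].sum(axis=-1)
        out[..., d] = x[..., d] / (bias + alpha * sqr_sum) ** beta
    return out

def lrn_channels_first(x, **kwargs):
    """Hypothetical helper: (N, C, H, W) -> (N, H, W, C), normalize, and back."""
    nhwc = np.transpose(x, (0, 2, 3, 1))
    return np.transpose(lrn_channels_last(nhwc, **kwargs), (0, 3, 1, 2))

x = np.ones((1, 3, 2, 2))      # channels_first toy input with 3 channels
y = lrn_channels_first(x)
print(y.shape)
```

The point of the second helper is that the normalization window runs over whatever sits on the last axis, so channels_first tensors need their channel axis moved there first.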

To repeat the docs: this implementation requires the channels_first format in all cases; channels_last won't work.

3dimaging commented Sep 12, 2020

I have tried both the "tensorflow" and "theano" backends to train the model from scratch. I didn't see any issue.

mikechen66 commented Sep 13, 2020

Thanks for your reply. I use Keras 2.4.3 and TensorFlow 2.3, so version incompatibility may be the cause of the issue. After updating the following lines of code to suit this environment, I can run the script and get the correct classification. But I get the wrong total parameter count when running googlenet.summary().

1. Modify the script to adapt to TensorFlow 2.3 and Keras 2.4.3

Predicted Class: 282 , Class Name: n02123159 tiger cat

I make the following modification.

Modify the import statements

if keras.backend.backend() == 'tensorflow':
    # -from keras import backend as K
    from keras import backend
    # -import tensorflow as tf
    import tensorflow.compat.v1 as tf
    tf.compat.v1.disable_eager_execution()
    from keras.utils.conv_utils import convert_kernel

Set up the GPU to avoid the runtime error: Could not create cuDNN handle...

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

Delete the following lines of code highlighted with "# -" and add the new lines of code below the highlighted code.

    if keras.backend.backend() == 'tensorflow':
        # convert the convolutional kernels for tensorflow
        ops = []
        for layer in googlenet.layers:
            if layer.__class__.__name__ == 'Conv2D':
                # -original_w = K.get_value(layer.kernel)
                original_w = keras.backend.get_value(layer.kernel)
                converted_w = convert_kernel(original_w)
                # -ops.append(tf.assign(layer.kernel, converted_w).op)
                ops.append(tf.compat.v1.assign(layer.kernel, converted_w).op)
        # -K.get_session().run(ops)
        tf.compat.v1.keras.backend.get_session().run(ops)

2. Total parameter count (wrong)

I get a total of 13,378,280 parameters, but the original GoogLeNet (Inception v1) has about 5.79 million parameters in total. What explains the large gap between the two numbers?

I obtained this number by adding the following three lines of code and deleting everything from "if weights_path..." through the main section.

input = Input(shape=(3, 224, 224))
googlenet = create_googlenet(input)
googlenet.summary()
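
One possible accounting for the gap, offered as a sketch rather than a verified diagnosis. It assumes each auxiliary head follows the paper's description: a 1x1 convolution with 128 filters on a 4x4 feature map, flattened into a 1024-unit dense layer, followed by a 1000-way classifier. The input channel counts are the filter sums of the inception_4a and inception_4d branches, and the helper names are made up for this example.

```python
def dense_params(n_in, n_out):
    # Weights plus one bias per output unit.
    return n_in * n_out + n_out

def conv1x1_params(c_in, c_out):
    # A 1x1 convolution has one weight per (input, output) channel pair.
    return c_in * c_out + c_out

def aux_head_params(c_in, num_classes=1000):
    # Assumed layout: 1x1 conv (128) -> flatten 4x4x128 -> fc 1024 -> classifier.
    conv = conv1x1_params(c_in, 128)
    fc = dense_params(4 * 4 * 128, 1024)
    classifier = dense_params(1024, num_classes)
    return conv + fc + classifier

aux_4a = aux_head_params(192 + 208 + 48 + 64)   # channels leaving inception_4a
aux_4d = aux_head_params(112 + 288 + 64 + 64)   # channels leaving inception_4d
reported_total = 13_378_280                      # from googlenet.summary() above
main_branch = reported_total - aux_4a - aux_4d
print(aux_4a, aux_4d, main_branch)  # prints 3188840 3190888 6998552
```

Under these assumptions the two auxiliary heads account for roughly 6.4 million of the reported parameters, and the paper notes that the auxiliary networks are discarded at inference time, so a count without them lands much closer to the published figure.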

Reference:
Google Inception v1 Paper: Page: 5/9
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

Cheers

mikechen66 commented Sep 14, 2020

I

3dimaging commented Sep 14, 2020

Mikechen66 is right! The parameter count is far higher than reported: I got 10,309,430 parameters. However, the googlenet model from the keras repository gives me 5,587,394 parameters, which is close to the reported number.

debaditya-unimelb commented Sep 18, 2020

@swghosh Thank you for the code, I really appreciate it. However, when I try an input image size other than 224 x 224 (the new image size is 742 x 224), I get matrix multiplication errors like the following:

Matrix size-incompatible: In[0]: [50,7168], In[1]: [7680,1024] [[{{node loss1/fcNew/MatMul}}]]

Here loss1/fcNew is a newly added layer with 1024 units.

I was able to get rid of the error when I deleted all the pool_helper layers from the network. Any thoughts on why? What exactly does the pool_helper layer do?

debaditya-unimelb commented Sep 18, 2020

@swghosh Also, I noticed that I could perhaps remove all the ZeroPadding2D layers and give the following convolutional layer padding='same' instead, and the code would still work. Would it be wrong to do so? If not, what would the difference be?
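
The arithmetic behind both questions can be sketched in a few lines. This assumes the gist pads by 3 before the 7x7 stride-2 conv1 and pads by 1 before each 3x3 stride-2 pool, with PoolHelper (x[:,:,1:,1:]) then cropping the first row and column; the helper names below are made up for the example, and the 'same' rule is the usual TensorFlow one (output = ceil(n / stride), extra pixel of padding at the end).

```python
import math

def valid_out(n, k, s):
    # Output length of a 'valid' convolution or pooling along one axis.
    return (n - k) // s + 1

def same_pads(n, k, s):
    # (before, after) padding that 'same' applies along one axis.
    out = math.ceil(n / s)
    total = max((out - 1) * s + k - n, 0)
    return total // 2, total - total // 2

# conv1: explicit symmetric padding of 3, then a 'valid' 7x7 stride-2 conv.
conv1_explicit = valid_out(224 + 2 * 3, 7, 2)
conv1_same = same_pads(224, 7, 2)

# pool1: pad 1 on each side, crop the first row/column, then 'valid' 3x3 stride 2.
pool1_cropped = valid_out(112 + 2 - 1, 3, 2)
pool1_same = same_pads(112, 3, 2)

print(conv1_explicit, conv1_same)   # prints 112 (2, 3)
print(pool1_cropped, pool1_same)    # prints 56 (0, 1)
```

Read this way, pad-then-crop leaves padding only on the bottom/right edge, which is the same layout 'same' produces for the pooling layer. For conv1 the output size is identical, but 'same' pads (2, 3) instead of (3, 3), so the windows shift by one pixel; that should not matter when training from scratch, but outputs may differ when loading the converted Caffe weights.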

mikechen66 commented Oct 5, 2020

I changed the whole code to comply with TensorFlow 2.x as follows. Note that users need to generate new GoogLeNet weights in the h5 format to match the TensorFlow data_format used here.

1. Change the data_format setting based on TensorFlow 2.x

Change "axis=1" to "axis=-1" to comply with TensorFlow. This corresponds to "channel_dim = -1 if K.image_data_format() == 'channels_last'". It is very annoying to keep switching image_data_format to 'channels_first' in keras.json, because most applications are developed with TensorFlow, where channels_last is the default and the most common setting in machine learning.

2. Remove flatten layers

The flatten layers indirectly generate a huge number of parameters in the dense layers of both the main and auxiliary classifiers. After removing the expensive flatten layers, I get the correct total size of 9+ million parameters, which is very close to the total size that Google has announced.

3. Change the padding parameter

Change padding='valid' to padding='same' in the Inception section to comply with the official GoogLeNet paper.

4. Delete the zero-padding layers

ZeroPadding2D is not needed in the TF implementation below.

5. GoogLeNet weights

The original GoogLeNet weights cannot be used in the following script because the data_format is incompatible. Therefore, users need to generate new GoogLeNet weights in the h5 format to match the TensorFlow layout (height, width, channels).

6. The script is modified as follows.

import imageio
from PIL import Image
import numpy as np
import tensorflow as tf

import keras
from keras.models import Model
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, AveragePooling2D, \
    Dropout, Flatten, Concatenate, Reshape, Activation

from keras.regularizers import l2
from keras.optimizers import SGD
from lrn import LRN 
from keras import backend


# Set up the GPU to avoid the runtime error: Could not create cuDNN handle...
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Define the Googlenet class 
class Googlenet(object):

    # Use a static method so the model can be built without instantiating the class
    @staticmethod
    # Build GoogLeNet (Inception v1)
    def build(input_shape, num_classes):

        input = Input(shape=input_shape)
 
        conv1_7x7_s2 = Conv2D(64, kernel_size=(7,7), strides=(2,2), padding='same', activation='relu', name='conv1/7x7_s2', kernel_regularizer=l2(0.0002))(input)
        pool1_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same', name='pool1/3x3_s2')(conv1_7x7_s2)
        pool1_norm1 = LRN(name='pool1/norm1')( pool1_3x3_s2)
        conv2_3x3_reduce = Conv2D(64, kernel_size=(1,1), padding='valid', activation='relu', name='conv2/3x3_reduce', kernel_regularizer=l2(0.0002))(pool1_norm1)
        conv2_3x3 = Conv2D(192, kernel_size=(3,3), padding='same', activation='relu', name='conv2/3x3', kernel_regularizer=l2(0.0002))(conv2_3x3_reduce)
        conv2_norm2 = LRN(name='conv2/norm2')(conv2_3x3)
        pool2_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same', name='pool2/3x3_s2')(conv2_norm2)

        inception_3a_1x1 = Conv2D(64, kernel_size=(1,1), padding='same', activation='relu', name='inception_3a/1x1', kernel_regularizer=l2(0.0002))(pool2_3x3_s2)
        inception_3a_3x3_reduce = Conv2D(96, kernel_size=(1,1), padding='same', activation='relu', name='inception_3a/3x3_reduce', kernel_regularizer=l2(0.0002))(pool2_3x3_s2)
        inception_3a_3x3 = Conv2D(128, kernel_size=(3,3), padding='same', activation='relu', name='inception_3a/3x3', kernel_regularizer=l2(0.0002))(inception_3a_3x3_reduce)
        inception_3a_5x5_reduce = Conv2D(16, kernel_size=(1,1), padding='same', activation='relu', name='inception_3a/5x5_reduce', kernel_regularizer=l2(0.0002))(pool2_3x3_s2)
        inception_3a_5x5 = Conv2D(32, kernel_size=(5,5), padding='same', activation='relu', name='inception_3a/5x5', kernel_regularizer=l2(0.0002))(inception_3a_5x5_reduce)
        inception_3a_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_3a/pool')(pool2_3x3_s2)
        inception_3a_pool_proj = Conv2D(32, kernel_size=(1,1), padding='same', activation='relu', name='inception_3a/pool_proj', kernel_regularizer=l2(0.0002))(inception_3a_pool)
        inception_3a_output = Concatenate(axis=-1, name='inception_3a/output')([inception_3a_1x1, inception_3a_3x3, inception_3a_5x5, inception_3a_pool_proj])

        inception_3b_1x1 = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_3b/1x1', kernel_regularizer=l2(0.0002))(inception_3a_output)
        inception_3b_3x3_reduce = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_3b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_3a_output)
        inception_3b_3x3 = Conv2D(192, kernel_size=(3,3), padding='same', activation='relu', name='inception_3b/3x3', kernel_regularizer=l2(0.0002))(inception_3b_3x3_reduce)
        inception_3b_5x5_reduce = Conv2D(32, kernel_size=(1,1), padding='same', activation='relu', name='inception_3b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_3a_output)
        inception_3b_5x5 = Conv2D(96, kernel_size=(5,5), padding='same', activation='relu', name='inception_3b/5x5', kernel_regularizer=l2(0.0002))(inception_3b_5x5_reduce)
        inception_3b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_3b/pool')(inception_3a_output)
        inception_3b_pool_proj = Conv2D(64, kernel_size=(1,1), padding='same', activation='relu', name='inception_3b/pool_proj', kernel_regularizer=l2(0.0002))(inception_3b_pool)
        inception_3b_output = Concatenate(axis=-1, name='inception_3b/output')([inception_3b_1x1, inception_3b_3x3, inception_3b_5x5, inception_3b_pool_proj])

        # Downsample between stages 3 and 4 (28x28 -> 14x14), as in the paper's architecture table
        pool3_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same', name='pool3/3x3_s2')(inception_3b_output)

        inception_4a_1x1 = Conv2D(192, kernel_size=(1,1), padding='same', activation='relu', name='inception_4a/1x1', kernel_regularizer=l2(0.0002))(pool3_3x3_s2)
        inception_4a_3x3_reduce = Conv2D(96, kernel_size=(1,1), padding='same', activation='relu', name='inception_4a/3x3_reduce', kernel_regularizer=l2(0.0002))(pool3_3x3_s2)
        inception_4a_3x3 = Conv2D(208, kernel_size=(3,3), padding='same', activation='relu', name='inception_4a/3x3', kernel_regularizer=l2(0.0002))(inception_4a_3x3_reduce)
        inception_4a_5x5_reduce = Conv2D(16, kernel_size=(1,1), padding='same', activation='relu', name='inception_4a/5x5_reduce', kernel_regularizer=l2(0.0002))(pool3_3x3_s2)
        inception_4a_5x5 = Conv2D(48, kernel_size=(5,5), padding='same', activation='relu', name='inception_4a/5x5', kernel_regularizer=l2(0.0002))(inception_4a_5x5_reduce)
        inception_4a_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4a/pool')(pool3_3x3_s2)
        inception_4a_pool_proj = Conv2D(64, kernel_size=(1,1), padding='same', activation='relu', name='inception_4a/pool_proj', kernel_regularizer=l2(0.0002))(inception_4a_pool)
        inception_4a_output = Concatenate(axis=-1, name='inception_4a/output')([inception_4a_1x1, inception_4a_3x3, inception_4a_5x5, inception_4a_pool_proj])

        loss1_ave_pool = AveragePooling2D(pool_size=(5,5), strides=(3,3), name='loss1/ave_pool')(inception_4a_output)
        loss1_conv = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='loss1/conv', kernel_regularizer=l2(0.0002))(loss1_ave_pool)
        loss1_fc = Dense(1024, activation='relu', name='loss1/fc', kernel_regularizer=l2(0.0002))(loss1_conv)
        loss1_drop_fc = Dropout(rate=0.7)(loss1_fc)
        loss1_classifier = Dense(num_classes, name='loss1/classifier', kernel_regularizer=l2(0.0002))(loss1_drop_fc)
        loss1_classifier_act = Activation('softmax')(loss1_classifier)

        inception_4b_1x1 = Conv2D(160, kernel_size=(1,1), padding='same', activation='relu', name='inception_4b/1x1', kernel_regularizer=l2(0.0002))(inception_4a_output)
        inception_4b_3x3_reduce = Conv2D(112, kernel_size=(1,1), padding='same', activation='relu', name='inception_4b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4a_output)
        inception_4b_3x3 = Conv2D(224, kernel_size=(3,3), padding='same', activation='relu', name='inception_4b/3x3', kernel_regularizer=l2(0.0002))(inception_4b_3x3_reduce)
        inception_4b_5x5_reduce = Conv2D(24, kernel_size=(1,1), padding='same', activation='relu', name='inception_4b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4a_output)
        inception_4b_5x5 = Conv2D(64, kernel_size=(5,5), padding='same', activation='relu', name='inception_4b/5x5', kernel_regularizer=l2(0.0002))(inception_4b_5x5_reduce)
        inception_4b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4b/pool')(inception_4a_output)
        inception_4b_pool_proj = Conv2D(64, kernel_size=(1,1), padding='same', activation='relu', name='inception_4b/pool_proj', kernel_regularizer=l2(0.0002))(inception_4b_pool)
        inception_4b_output = Concatenate(axis=-1, name='inception_4b/output')([inception_4b_1x1, inception_4b_3x3, inception_4b_5x5, inception_4b_pool_proj])
        
        inception_4c_1x1 = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_4c/1x1', kernel_regularizer=l2(0.0002))(inception_4b_output)
        inception_4c_3x3_reduce = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_4c/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4b_output)
        inception_4c_3x3 = Conv2D(256, kernel_size=(3,3), padding='same', activation='relu', name='inception_4c/3x3', kernel_regularizer=l2(0.0002))(inception_4c_3x3_reduce)
        inception_4c_5x5_reduce = Conv2D(24, kernel_size=(1,1), padding='same', activation='relu', name='inception_4c/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4b_output)
        inception_4c_5x5 = Conv2D(64, kernel_size=(5,5), padding='same', activation='relu', name='inception_4c/5x5', kernel_regularizer=l2(0.0002))(inception_4c_5x5_reduce)
        inception_4c_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4c/pool')(inception_4b_output)
        inception_4c_pool_proj = Conv2D(64, kernel_size=(1,1), padding='same', activation='relu', name='inception_4c/pool_proj', kernel_regularizer=l2(0.0002))(inception_4c_pool)
        inception_4c_output = Concatenate(axis=-1, name='inception_4c/output')([inception_4c_1x1, inception_4c_3x3, inception_4c_5x5, inception_4c_pool_proj])

        inception_4d_1x1 = Conv2D(112, kernel_size=(1,1), padding='same', activation='relu', name='inception_4d/1x1', kernel_regularizer=l2(0.0002))(inception_4c_output)
        inception_4d_3x3_reduce = Conv2D(144, kernel_size=(1,1), padding='same', activation='relu', name='inception_4d/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4c_output)
        inception_4d_3x3 = Conv2D(288, kernel_size=(3,3), padding='same', activation='relu', name='inception_4d/3x3', kernel_regularizer=l2(0.0002))(inception_4d_3x3_reduce)
        inception_4d_5x5_reduce = Conv2D(32, kernel_size=(1,1), padding='same', activation='relu', name='inception_4d/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4c_output)
        inception_4d_5x5 = Conv2D(64, kernel_size=(5,5), padding='same', activation='relu', name='inception_4d/5x5', kernel_regularizer=l2(0.0002))(inception_4d_5x5_reduce)
        inception_4d_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4d/pool')(inception_4c_output)
        inception_4d_pool_proj = Conv2D(64, kernel_size=(1,1), padding='same', activation='relu', name='inception_4d/pool_proj', kernel_regularizer=l2(0.0002))(inception_4d_pool)
        inception_4d_output = Concatenate(axis=-1, name='inception_4d/output')([inception_4d_1x1, inception_4d_3x3, inception_4d_5x5, inception_4d_pool_proj])
    
        loss2_ave_pool = AveragePooling2D(pool_size=(5,5), strides=(3,3), name='loss2/ave_pool')(inception_4d_output)
        loss2_conv = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='loss2/conv', kernel_regularizer=l2(0.0002))(loss2_ave_pool)
        loss2_fc = Dense(1024, activation='relu', name='loss2/fc', kernel_regularizer=l2(0.0002))(loss2_conv)
        loss2_drop_fc = Dropout(rate=0.7)(loss2_fc)
        loss2_classifier = Dense(num_classes, name='loss2/classifier', kernel_regularizer=l2(0.0002))(loss2_drop_fc)
        loss2_classifier_act = Activation('softmax')(loss2_classifier)


        inception_4e_1x1 = Conv2D(256, kernel_size=(1,1), padding='same', activation='relu', name='inception_4e/1x1', kernel_regularizer=l2(0.0002))(inception_4d_output)
        inception_4e_3x3_reduce = Conv2D(160, kernel_size=(1,1), padding='same', activation='relu', name='inception_4e/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_4d_output)
        inception_4e_3x3 = Conv2D(320, kernel_size=(3,3), padding='same', activation='relu', name='inception_4e/3x3', kernel_regularizer=l2(0.0002))(inception_4e_3x3_reduce)
        inception_4e_5x5_reduce = Conv2D(32, kernel_size=(1,1), padding='same', activation='relu', name='inception_4e/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_4d_output)
        inception_4e_5x5 = Conv2D(128, kernel_size=(5,5), padding='same', activation='relu', name='inception_4e/5x5', kernel_regularizer=l2(0.0002))(inception_4e_5x5_reduce)
        inception_4e_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_4e/pool')(inception_4d_output)
        inception_4e_pool_proj = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_4e/pool_proj', kernel_regularizer=l2(0.0002))(inception_4e_pool)
        inception_4e_output = Concatenate(axis=-1, name='inception_4e/output')([inception_4e_1x1, inception_4e_3x3, inception_4e_5x5, inception_4e_pool_proj])


        # Downsample between stages 4 and 5 (14x14 -> 7x7), as in the paper's architecture table
        pool4_3x3_s2 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same', name='pool4/3x3_s2')(inception_4e_output)

        inception_5a_1x1 = Conv2D(256, kernel_size=(1,1), padding='same', activation='relu', name='inception_5a/1x1', kernel_regularizer=l2(0.0002))(pool4_3x3_s2)
        inception_5a_3x3_reduce = Conv2D(160, kernel_size=(1,1), padding='same', activation='relu', name='inception_5a/3x3_reduce', kernel_regularizer=l2(0.0002))(pool4_3x3_s2)
        inception_5a_3x3 = Conv2D(320, kernel_size=(3,3), padding='same', activation='relu', name='inception_5a/3x3', kernel_regularizer=l2(0.0002))(inception_5a_3x3_reduce)
        inception_5a_5x5_reduce = Conv2D(32, kernel_size=(1,1), padding='same', activation='relu', name='inception_5a/5x5_reduce', kernel_regularizer=l2(0.0002))(pool4_3x3_s2)
        inception_5a_5x5 = Conv2D(128, kernel_size=(5,5), padding='same', activation='relu', name='inception_5a/5x5', kernel_regularizer=l2(0.0002))(inception_5a_5x5_reduce)
        inception_5a_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_5a/pool')(pool4_3x3_s2)
        inception_5a_pool_proj = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_5a/pool_proj', kernel_regularizer=l2(0.0002))(inception_5a_pool)
        inception_5a_output = Concatenate(axis=-1, name='inception_5a/output')([inception_5a_1x1, inception_5a_3x3, inception_5a_5x5, inception_5a_pool_proj])


        inception_5b_1x1 = Conv2D(384, kernel_size=(1,1), padding='same', activation='relu', name='inception_5b/1x1', kernel_regularizer=l2(0.0002))(inception_5a_output)
        inception_5b_3x3_reduce = Conv2D(192, kernel_size=(1,1), padding='same', activation='relu', name='inception_5b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_5a_output)
        inception_5b_3x3 = Conv2D(384, kernel_size=(3,3), padding='same', activation='relu', name='inception_5b/3x3', kernel_regularizer=l2(0.0002))(inception_5b_3x3_reduce)
        inception_5b_5x5_reduce = Conv2D(48, kernel_size=(1,1), padding='same', activation='relu', name='inception_5b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_5a_output)
        inception_5b_5x5 = Conv2D(128, kernel_size=(5,5), padding='same', activation='relu', name='inception_5b/5x5', kernel_regularizer=l2(0.0002))(inception_5b_5x5_reduce)
        inception_5b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_5b/pool')(inception_5a_output)
        inception_5b_pool_proj = Conv2D(128, kernel_size=(1,1), padding='same', activation='relu', name='inception_5b/pool_proj', kernel_regularizer=l2(0.0002))(inception_5b_pool)
        inception_5b_output = Concatenate(axis=-1, name='inception_5b/output')([inception_5b_1x1, inception_5b_3x3, inception_5b_5x5, inception_5b_pool_proj])


        pool5_7x7_s1 = AveragePooling2D(pool_size=(7,7), strides=(1,1), name='pool5/7x7_s2')(inception_5b_output)
        pool5_drop_7x7_s1 = Dropout(rate=0.4)(pool5_7x7_s1)
        loss3_classifier = Dense(num_classes, name='loss3/classifier', kernel_regularizer=l2(0.0002))(pool5_drop_7x7_s1)
        loss3_classifier_act = Activation('softmax', name='prob')(loss3_classifier)

        inception_v1 = Model(inputs=input, outputs=[loss1_classifier_act, loss2_classifier_act, loss3_classifier_act])

        return inception_v1


if __name__ == "__main__":

    input_shape = (224, 224, 3)
    num_classes = 1000

    inception_v1 = Googlenet.build(input_shape, num_classes)

    inception_v1.summary()
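
A note on what removing Flatten does to the dense layers: when a Keras Dense layer receives an input of rank greater than 2, it contracts only the last axis, so the kernel shrinks and the output keeps its spatial axes. The NumPy sketch below mimics that with a plain matrix product. The 4x4x128 feature map is an assumed auxiliary-head shape taken from the paper's description, and the zero arrays are placeholders rather than real weights.

```python
import numpy as np

feature_map = np.zeros((1, 4, 4, 128))        # (batch, height, width, channels)

# Without Flatten: the kernel only spans the channel axis.
kernel = np.zeros((128, 1024))
no_flatten = feature_map @ kernel              # spatial axes are kept
no_flatten_params = kernel.size + 1024         # weights + biases

# With Flatten: every spatial position gets its own weights.
flat = feature_map.reshape(1, -1)
flat_kernel = np.zeros((flat.shape[1], 1024))
with_flatten = flat @ flat_kernel
with_flatten_params = flat_kernel.size + 1024

print(no_flatten.shape, no_flatten_params)
print(with_flatten.shape, with_flatten_params)
```

So the smaller parameter count comes with outputs that are still 4-D; anyone training with this script would probably want a Flatten or global pooling step before the final softmax so that each head emits one probability vector per image.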

mikechen66 commented Oct 5, 2020

Even though the plain Inception v1 model above spells out every layer in detail, I prefer the simplified model below, which has 6+ million parameters in total (the auxiliary classifiers are removed). Since swghosh provided googlenet_custom_layers.py, I have renamed it to lrn.py and use it as a library.

import tensorflow as tf 
from tensorflow.keras.layers import Input, Conv2D, Dense, Dropout, MaxPooling2D, AveragePooling2D
from tensorflow.keras.layers import concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from lrn import LRN 

# Set up the GPU to avoid the runtime error: Could not create cuDNN handle...
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

def googlenet(input_shape, num_classes):

    input = Input(shape=input_shape)

    conv1_7x7 = Conv2D(filters=64, kernel_size=(7,7), strides=(2,2), padding='same', activation='relu', 
                       kernel_regularizer=l2(0.01))(input)
    maxpool1_3x3 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same')(conv1_7x7)
    pool1_norm1 = LRN()(maxpool1_3x3)
    conv2_3x3_reduce = Conv2D(filters=64, kernel_size=(1,1),  strides=(1,1), padding='valid', activation='relu', 
                       kernel_regularizer=l2(0.01))(pool1_norm1)
    conv2_3x3 = Conv2D(filters=192, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu', 
                       kernel_regularizer=l2(0.01))(conv2_3x3_reduce)
    conv2_norm2 = LRN()(conv2_3x3)
    maxpool2_3x3 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same')(conv2_norm2)

    inception_3a = inception(input=maxpool2_3x3, axis=3, params=[(64,),(96,128),(16,32),(32,)])
    inception_3b = inception(input=inception_3a, axis=3, params=[(128,),(128,192),(32,96),(64,)])
    maxpool3_3x3 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same')(inception_3b)

    inception_4a = inception(input=maxpool3_3x3, axis=3, params=[(192,),(96,208),(16,48),(64,)])
    inception_4b = inception(input=inception_4a, axis=3, params=[(160,),(112,224),(24,64),(64,)])
    inception_4c = inception(input=inception_4b, axis=3, params=[(128,),(128,256),(24,64),(64,)])
    inception_4d = inception(input=inception_4c, axis=3, params=[(112,),(144,288),(32,64),(64,)])
    inception_4e = inception(input=inception_4d, axis=3, params=[(256,),(160,320),(32,128),(128,)])
    maxpool4_3x3 = MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same')(inception_4e)

    inception_5a = inception(input=maxpool4_3x3, axis=3, params=[(256,),(160,320),(32,128),(128,)])
    inception_5b = inception(input=inception_5a, axis=3, params=[(384,),(192,384),(48,128),(128,)]) 
    avgpool1_7x7 = AveragePooling2D(pool_size=(7,7), strides=(7,7), padding='same')(inception_5b)

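    drop = Dropout(rate=0.4)(avgpool1_7x7)
    # Flatten the (1, 1, 1024) pooled map so the classifier outputs (batch, num_classes)
    flat = tf.keras.layers.Flatten()(drop)
    linear = Dense(num_classes, activation='softmax', kernel_regularizer=l2(0.01))(flat)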
    
    model = Model(inputs=input, outputs=linear)

    return model 

def inception(input, axis, params):

    # Unpack the filter counts for the four parallel branches
    [branch1, branch2, branch3, branch4] = params

    conv_11 = Conv2D(filters=branch1[0], kernel_size=(1,1), padding='same', activation='relu', 
                     kernel_regularizer=l2(0.01))(input)

    conv_12 = Conv2D(filters=branch2[0], kernel_size=(1,1), padding='same', activation='relu', 
                     kernel_regularizer=l2(0.01))(input)
    conv_22 = Conv2D(filters=branch2[1], kernel_size=(3,3), padding='same', activation='relu', 
                     kernel_regularizer=l2(0.01))(conv_12)

    conv_13 = Conv2D(filters=branch3[0], kernel_size=(1,1), padding='same', activation='relu', 
                     kernel_regularizer=l2(0.01))(input)
    conv_23 = Conv2D(filters=branch3[1], kernel_size=(5,5), padding='same', activation='relu', 
                     kernel_regularizer=l2(0.01))(conv_13)

    maxpool_14 = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same')(input)
    maxpool_proj_24 = Conv2D(filters=branch4[0], kernel_size=(1,1), strides=(1,1), padding='same', 
                             activation='relu', kernel_regularizer=l2(0.01))(maxpool_14)

    inception_output = concatenate([conv_11, conv_22, conv_23, maxpool_proj_24], axis=axis)

    return inception_output

if __name__ == "__main__":

    input_shape = (224, 224, 3)
    num_classes = 1000

    # Build the model
    model = googlenet(input_shape, num_classes)

    model.summary()
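
As a sanity check on the counts that model.summary() prints, the parameters of a single inception block can be tallied by hand. The sketch below is plain Python and rests on two assumptions that are not stated in the code above: every Conv2D carries a bias (the Keras default), and inception_4a receives 480 input channels (the output width of inception_3b in the paper). The helper names conv_params, inception_params and inception_out_channels are made up for illustration.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2D convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def inception_params(in_channels, params):
    """Trainable parameters of one inception block.

    `params` uses the same layout as the inception() function above:
    [(1x1,), (3x3 reduce, 3x3), (5x5 reduce, 5x5), (pool projection,)].
    """
    branch1, branch2, branch3, branch4 = params
    total = conv_params(1, in_channels, branch1[0])
    total += conv_params(1, in_channels, branch2[0])
    total += conv_params(3, branch2[0], branch2[1])
    total += conv_params(1, in_channels, branch3[0])
    total += conv_params(5, branch3[0], branch3[1])
    # The max pooling layer has no weights; only its 1x1 projection does.
    total += conv_params(1, in_channels, branch4[0])
    return total


def inception_out_channels(params):
    """Channels after concatenating the four branch outputs."""
    branch1, branch2, branch3, branch4 = params
    return branch1[0] + branch2[1] + branch3[1] + branch4[0]


# Assumed input width for inception_4a: 480 channels.
params_4a = [(192,), (96, 208), (16, 48), (64,)]
print(inception_params(480, params_4a))    # 376176
print(inception_out_channels(params_4a))   # 512
```

If the number printed by model.summary() for the layers of one block differs from this tally, the input width or the bias assumption is the first thing to check.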

Cheers!

babloogpb1 commented Oct 29, 2020

Thanks for your reply. I use Keras 2.4.3 and TensorFlow 2.3, so a version incompatibility may be the cause of the issue. After updating the lines of code below for this environment, the script runs and classifies the image correctly. However, googlenet.summary() reports a total parameter count that looks wrong.

1. Modify the script to adapt to TensorFlow 2.3 and Keras 2.4.3

Predicted Class: 282 , Class Name: n02123159 tiger cat

I made the following modifications.

Modify the import statements

if keras.backend.backend() == 'tensorflow':
    # -from keras import backend as K
    from keras import backend
    # -import tensorflow as tf
    import tensorflow.compat.v1 as tf
    tf.compat.v1.disable_eager_execution()
    from keras.utils.conv_utils import convert_kernel

Set up the GPU to avoid the runtime error: Could not create cuDNN handle...

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

In the code below, lines marked with "# -" are the original lines to delete; the line immediately after each one is its replacement.

    if keras.backend.backend() == 'tensorflow':
        # convert the convolutional kernels for tensorflow
        ops = []
        for layer in googlenet.layers:
            if layer.__class__.__name__ == 'Conv2D':
                # -original_w = K.get_value(layer.kernel)
                original_w = keras.backend.get_value(layer.kernel)
                converted_w = convert_kernel(original_w)
                # -ops.append(tf.assign(layer.kernel, converted_w).op)
                ops.append(tf.compat.v1.assign(layer.kernel, converted_w).op)
        # -K.get_session().run(ops)
        tf.compat.v1.keras.backend.get_session().run(ops)

2. Total parameter count (wrong)

I get a total parameter count of 13,378,280, but the original GoogLeNet (Inception v1) has a little over 5.79 million parameters in total. What explains this large gap?

To get this number, I added the following three lines of code and deleted everything from "if weights_path..." through the main section.

input = Input(shape=(3, 224, 224))
googlenet = create_googlenet(input)
googlenet.summary()

Reference:
Google Inception v1 Paper: Page: 5/9
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
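
One possible contributor to the gap can be estimated with plain arithmetic. The paper describes two auxiliary classifier heads, attached after inception (4a) and (4d): average pooling down to a 4x4 map, a 1x1 convolution with 128 filters, a fully connected layer with 1024 units, and a 1000-way classifier. The sketch below assumes that the summarized model includes both heads and that every layer has a bias; neither assumption is verified against the script here. The helper names dense_params and aux_head_params are hypothetical.

```python
def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_features + 1) * units


def aux_head_params(in_channels, num_classes=1000):
    """Parameters of one auxiliary head as described in the paper (assumed layout)."""
    conv = (in_channels + 1) * 128             # 1x1 convolution, 128 filters
    fc = dense_params(4 * 4 * 128, 1024)       # flattened 4x4x128 map -> 1024 units
    classifier = dense_params(1024, num_classes)
    return conv + fc + classifier


# Inception (4a) outputs 512 channels and (4d) outputs 528 channels.
aux_total = aux_head_params(512) + aux_head_params(528)
reported_total = 13378280  # total from googlenet.summary() quoted above

print(aux_total)                    # 6379728
print(reported_total - aux_total)   # 6998552
```

Under these assumptions the two heads would account for roughly 6.4 million parameters. The paper's table may also count parameters differently (for example, without biases), which this sketch does not check.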

Cheers

Thank you for the help!
