@fchollet
Last active November 28, 2023 07:12
Updated to the Keras 2.0 API.
'''This script goes along with the blog post
"Building powerful image classification models using very little data"
from blog.keras.io.
It uses data that can be downloaded at:
https://www.kaggle.com/c/dogs-vs-cats/data
In our setup, we:
- created a data/ folder
- created train/ and validation/ subfolders inside data/
- created cats/ and dogs/ subfolders inside train/ and validation/
- put the cat pictures index 0-999 in data/train/cats
- put the cat pictures index 1000-1400 in data/validation/cats
- put the dog pictures index 12500-13499 in data/train/dogs
- put the dog pictures index 13500-13900 in data/validation/dogs
So that we have 1000 training examples for each class, and 400 validation examples for each class.
In summary, this is our directory structure:
```
data/
    train/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
    validation/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
```
'''
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
# dimensions of our images.
img_width, img_height = 150, 150
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
model.save_weights('first_try.h5')
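
As a quick check on the step counts passed to `fit_generator` in the script, the integer divisions work out as shown in this sketch. It is only arithmetic on the constants defined in the script; the name `leftover_train` is introduced here for illustration and is not part of the original.

```python
# Constants copied from the script.
nb_train_samples = 2000
nb_validation_samples = 800
batch_size = 16

# Number of batches drawn per epoch for training and for validation.
steps_per_epoch = nb_train_samples // batch_size
validation_steps = nb_validation_samples // batch_size

# Illustrative only: samples that would be skipped in an epoch
# if the sample count were not a multiple of the batch size.
leftover_train = nb_train_samples % batch_size

print(steps_per_epoch, validation_steps, leftover_train)  # 125 50 0
```

With a different dataset size, `nb_train_samples` and `nb_validation_samples` would need to be changed to match the number of images actually on disk.
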
@ConnorKevin

hello!

I wonder whether the file type must be .h5 in the line "model.save_weights('first_try.h5')"?
I also have the same question as @MasterWas.

Thanks !!

@jitendersaini

Hi, I generated the confusion matrix on the prediction results and got [[2500 0] [2500 0]]. I don't think that's up to the mark. I'm training on dogs and cats; there are 20000 training images and 5000 validation images.
Thanks

@ypirkani

ypirkani commented Jun 3, 2020

Hi, I generated the confusion matrix on the prediction results and got [[2500 0] [2500 0]]. I don't think that's up to the mark. I'm training on dogs and cats; there are 20000 training images and 5000 validation images.
Thanks

Can you tell me how you generated the confusion matrix?

@alif2499

alif2499 commented Jul 2, 2020

@austinchencym
Hi
Actually, I got no more than 70% when I increased the size of the dataset. However, it does not look stable.
Any news on your side? How can you use one-hot encoding based on the example?

Hello. Did you implement one-hot encoding in your code? If so, could you please help me out? And is it a must to use one-hot encoding in multi-class image classification?

Thank you

@alif2499

alif2499 commented Jul 2, 2020

I wonder, don't we need the 'label' ? There is no such vector "y_train" being used in the code?

I have the same question. Can anyone please clarify?

Thanks

@xxwtiancai

Accuracy not rising over 0.5000

I ran into the same problem! Did you solve it?

@Tech-49

Tech-49 commented Dec 29, 2020

Hi, I am doing both Python and classification for the first time. Can anyone tell me whether it is necessary to keep the index as part of the image name, or will it work as long as each image has a unique name?

@rvencu

rvencu commented Apr 26, 2021

I wonder, don't we need the 'label' ? There is no such vector "y_train" being used in the code?

I have the same question. Can anyone please clarify?

Thanks

This line of code takes care of the label (i.e. the name of the subdirectory is the label)

train_datagen.flow_from_directory
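
A minimal sketch of that idea, using only the standard library rather than Keras itself: the subdirectory names are sorted and mapped to integer labels. The helper `infer_class_indices` is hypothetical and only mimics the behaviour; in Keras the resulting mapping should be available as `train_generator.class_indices`.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Hypothetical helper: map each subdirectory name to an integer label."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway layout with the same class folders as data/train.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("dogs", "cats"):
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)  # {'cats': 0, 'dogs': 1}
```

With `class_mode='binary'`, each batch then carries a 0 or 1 per image according to this mapping, which is why no separate `y_train` appears in the script.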

@yousrakateb

How can I test this code on new images?
Thank you

@diouck

diouck commented Jun 2, 2021

Hello, excellent tutorial. While trying the code I came across this error. Do you have any idea what causes it? It's the same code.
Thank you

Found 21 images belonging to 2 classes.
/home/abdou/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:1905: UserWarning: `Model.predict_generator` is deprecated and will be removed in a future version. Please use `Model.predict`, which supports generators.
  warnings.warn('`Model.predict_generator` is deprecated and '
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 125.0 batches). You may need to use the repeat() function when building your dataset.

TypeError Traceback (most recent call last)
in ()
24
25
---> 26 save_bottlebeck_features()
27 train_top_model()

in save_bottlebeck_features()
26 generator, nb_train_samples / batch_size)
27 np.save(open('bottleneck_features_train.npy', 'w'),
---> 28 bottleneck_features_train)
29
30 generator = datagen.flow_from_directory(

<array_function internals> in save(*args, **kwargs)

/home/abdou/.local/lib/python3.6/site-packages/numpy/lib/npyio.py in save(file, arr, allow_pickle, fix_imports)
527 arr = np.asanyarray(arr)
528 format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 529 pickle_kwargs=dict(fix_imports=fix_imports))
530
531

/home/abdou/.local/lib/python3.6/site-packages/numpy/lib/format.py in write_array(fp, array, version, allow_pickle, pickle_kwargs)
646 """
647 _check_version(version)
--> 648 _write_array_header(fp, header_data_from_array_1_0(array), version)
649
650 if array.itemsize == 0:

/home/abdou/.local/lib/python3.6/site-packages/numpy/lib/format.py in _write_array_header(fp, d, version)
426 else:
427 header = _wrap_header(header, version)
--> 428 fp.write(header)
429
430 def write_array_header_1_0(fp, d):

TypeError: write() argument must be str, not bytes
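
The final `TypeError` is consistent with the file being opened in text mode (`'w'`) in the `np.save(open('bottleneck_features_train.npy', 'w'), ...)` call shown in the traceback, while binary data is being written to it. A minimal sketch that reproduces the same class of error with the standard library only (the file name `example.bin` and the payload are made up for illustration):

```python
import os
import tempfile

payload = b"\x93NUMPY"  # illustrative bytes standing in for binary array data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")

    # Text mode: writing bytes raises TypeError, as in the traceback.
    try:
        with open(path, "w") as fp:
            fp.write(payload)
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = str(exc)

    # Binary mode: the same write succeeds and returns the byte count.
    with open(path, "wb") as fp:
        written = fp.write(payload)

print(text_mode_error)
print(written)
```

In the bottleneck script this would presumably mean opening the file with `'wb'`, or passing the file name directly to `np.save`.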

@fellipeassuncao

@diouck I think that your dataset is too small to use with this CNN. Try classifying with more samples, and update your packages!

@gulf1324

data/
    train/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
    validation/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...

'''
Showing the directory structure as an image is so helpful, including the simple examples! Thank you so much, this helped me a lot!!

@kimtiago

Damn the Getúlio Vargas Foundation for demanding this example in a high-level public tender

https://conhecimento.fgv.br/sites/default/files/concursos/auditor-fiscal-da-receita-estadual-tecnologia-da-informacaocns003-tipo-1.pdf

page 7

we will never forget it...

@CESI2Jaafar

I got this error. Could you please explain in a bit more detail how to solve this problem?
Found 0 images belonging to 0 classes.
Found 0 images belonging to 0 classes.
:70: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
model.fit_generator(

ValueError Traceback (most recent call last)
in <cell line: 70>()
68 class_mode='binary')
69
---> 70 model.fit_generator(
71 train_generator,
72 steps_per_epoch=nb_train_samples // batch_size,

2 frames
/usr/local/lib/python3.9/dist-packages/keras/preprocessing/image.py in __getitem__(self, idx)
101 def __getitem__(self, idx):
102 if idx >= len(self):
--> 103 raise ValueError(
104 "Asked to retrieve element {idx}, "
105 "but the Sequence "

ValueError: Asked to retrieve element 0, but the Sequence has length 0
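
`Found 0 images belonging to 0 classes` suggests that no class subfolders (and so no images) were found under `train_data_dir` and `validation_data_dir`; relative paths such as `data/train` are resolved against the current working directory. A small standard-library sketch for checking the layout before training follows. The helper name `count_images_per_class` and the extension list are assumptions for illustration, not part of Keras.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed list


def count_images_per_class(directory):
    """Hypothetical helper: count image files in each class subfolder."""
    if not os.path.isdir(directory):
        return {}
    counts = {}
    for name in sorted(os.listdir(directory)):
        class_dir = os.path.join(directory, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir)
                if f.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts


# Demonstration on a temporary layout shaped like data/train.
with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "data", "train")
    os.makedirs(os.path.join(train_dir, "cats"))
    os.makedirs(os.path.join(train_dir, "dogs"))
    open(os.path.join(train_dir, "cats", "cat001.jpg"), "w").close()
    found = count_images_per_class(train_dir)
    missing = count_images_per_class(os.path.join(root, "wrong", "path"))

print(found)    # {'cats': 1, 'dogs': 0}
print(missing)  # {}
```

An empty result for the real `data/train` path would point to a wrong path or a missing `cats/` and `dogs/` layout rather than to a problem in the model code.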

@CESI2Jaafar

RuntimeError Traceback (most recent call last)
in <cell line: 54>()
52 model = Sequential()
53
---> 54 model.fit(
55 train_generator,
56 steps_per_epoch=2000,

1 frames
/usr/local/lib/python3.9/dist-packages/keras/engine/training.py in _assert_compile_was_called(self)
3683 # (i.e. whether the model is built and its inputs/outputs are set).
3684 if not self._is_compiled:
-> 3685 raise RuntimeError(
3686 "You must compile your model before "
3687 "training/testing. "

RuntimeError: You must compile your model before training/testing. Use model.compile(optimizer, loss).
