Be able to use the multi-gpu on Keras 2.2.4
"""
Mask R-CNN
Multi-GPU Support for Keras.

Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla

Ideas and small code snippets from these sources:
"""

import tensorflow as tf
import keras.backend as K
import keras.layers as KL
import keras.models as KM


class ParallelModel(KM.Model):
    """Subclasses the standard Keras Model and adds multi-GPU support.
    It works by creating a copy of the model on each GPU. Then it slices
    the inputs and sends a slice to each copy of the model, and then
    merges the outputs together and applies the loss on the combined
    outputs.
    """

    def __init__(self, keras_model, gpu_count):
        """Class constructor.
        keras_model: The Keras model to parallelize
        gpu_count: Number of GPUs. Must be > 1
        """
        self.inner_model = keras_model
        self.gpu_count = gpu_count
        merged_outputs = self.make_parallel()
        super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,
                                            outputs=merged_outputs)

    def __getattribute__(self, attrname):
        """Redirect loading and saving methods to the inner model. That's where
        the weights are stored."""
        if 'load' in attrname or 'save' in attrname:
            return getattr(self.inner_model, attrname)
        return super(ParallelModel, self).__getattribute__(attrname)

    def summary(self, *args, **kwargs):
        """Override summary() to display summaries of both the wrapper
        and inner models."""
        super(ParallelModel, self).summary(*args, **kwargs)
        self.inner_model.summary(*args, **kwargs)

    def make_parallel(self):
        """Creates a new wrapper model that consists of multiple replicas of
        the original model placed on different GPUs.
        """
        # Slice inputs. Slice inputs on the CPU to avoid sending a copy
        # of the full inputs to all GPUs. Saves on bandwidth and memory.
        input_slices = {name: tf.split(x, self.gpu_count)
                        for name, x in zip(self.inner_model.input_names,
                                           self.inner_model.inputs)}

        output_names = self.inner_model.output_names
        # One list per model output, filled with one tensor per GPU
        outputs_all = []
        for i in range(len(self.inner_model.outputs)):
            outputs_all.append([])

        # Run the model call() on each GPU to place the ops there
        for i in range(self.gpu_count):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('tower_%d' % i):
                    # Run a slice of inputs through this replica
                    zipped_inputs = zip(self.inner_model.input_names,
                                        self.inner_model.inputs)
                    inputs = [
                        KL.Lambda(lambda s: input_slices[name][i],
                                  output_shape=lambda s: (None,) + s[1:])(tensor)
                        for name, tensor in zipped_inputs]
                    # Create the model replica and get the outputs
                    outputs = self.inner_model(inputs)
                    if not isinstance(outputs, list):
                        outputs = [outputs]
                    # Save the outputs for merging back together later
                    for l, o in enumerate(outputs):
                        outputs_all[l].append(o)

        # Merge outputs on CPU
        with tf.device('/cpu:0'):
            merged = []
            for outputs, name in zip(outputs_all, output_names):
                # Concatenate or average outputs?
                # Outputs usually have a batch dimension and we concatenate
                # across it. If they don't, then the output is likely a loss
                # or a metric value that gets averaged across the batch.
                # Keras expects losses and metrics to be scalars.
                if K.int_shape(outputs[0]) == ():
                    # Average
                    m = KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)(outputs)
                else:
                    # Concatenate
                    m = KL.Concatenate(axis=0, name=name)(outputs)
                merged.append(m)
        return merged


if __name__ == "__main__":
    # Testing code below. It creates a simple model to train on MNIST and
    # tries to run it on 2 GPUs. It saves the graph so it can be viewed
    # in TensorBoard. Run it as:
    # python3

    import os
    import numpy as np
    import keras.optimizers
    from keras.datasets import mnist
    from keras.preprocessing.image import ImageDataGenerator

    # Assumed value, taken from the "2 GPUs" note above
    GPU_COUNT = 2

    # Root directory of the project
    ROOT_DIR = os.path.abspath("../")

    # Directory to save logs and trained model
    MODEL_DIR = os.path.join(ROOT_DIR, "logs")

    def build_model(x_train, num_classes):
        # Reset default graph. Keras leaves old ops in the graph,
        # which are ignored for execution but clutter graph
        # visualization in TensorBoard.
        tf.reset_default_graph()

        inputs = KL.Input(shape=x_train.shape[1:], name="input_image")
        x = KL.Conv2D(32, (3, 3), activation='relu', padding="same")(inputs)
        x = KL.Conv2D(64, (3, 3), activation='relu', padding="same")(x)
        x = KL.MaxPooling2D(pool_size=(2, 2), name="pool1")(x)
        x = KL.Flatten(name="flat1")(x)
        x = KL.Dense(128, activation='relu', name="dense1")(x)
        x = KL.Dense(num_classes, activation='softmax', name="dense2")(x)

        return KM.Model(inputs, x, "digit_classifier_model")

    # Load MNIST Data
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = np.expand_dims(x_train, -1).astype('float32') / 255
    x_test = np.expand_dims(x_test, -1).astype('float32') / 255

    print('x_train shape:', x_train.shape)
    print('x_test shape:', x_test.shape)

    # Build data generator and model
    datagen = ImageDataGenerator()
    model = build_model(x_train, 10)

    # Add multi-GPU support.
    model = ParallelModel(model, GPU_COUNT)

    optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, clipnorm=5.0)

    # Assumed loss: MNIST labels are integer class ids, not one-hot vectors
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=optimizer, metrics=['accuracy'])

    # Train
    # Assumed call: fit_generator matches the generator arguments below
    model.fit_generator(
        datagen.flow(x_train, y_train, batch_size=64),
        steps_per_epoch=50, epochs=10, verbose=1,
        validation_data=(x_test, y_test),
        # Assumed callback: writes the graph to MODEL_DIR for TensorBoard
        callbacks=[keras.callbacks.TensorBoard(log_dir=MODEL_DIR,
                                               write_graph=True)]
    )

Hello, I've implemented this function to use multi-GPU with Keras 2.2.4, but I still have trouble when I try to train the model. I got an error:

MaybeEncodingError: Error sending result: '([array([[[[-123.7, -116.8, -103.9],

I don't really know how to fix this. Do you have any idea? Thanks

Hey Beno,

Sorry, I haven't seen that error before and I'm not sure where to look to troubleshoot it.

What is the main change you made to parallel_model compared to the one in the Mask R-CNN implementation?
Has that code worked for you with Keras 2.2.4?
If so, what are the specs in your and
GPU > 1? workers > 1, use_multiprocessing=True?
Thank you

The end of the error is Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
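
For anyone reading later: that Reason text looks like the message `struct` raises when a value does not fit in a signed 32-bit `'i'` field. As far as I know, `multiprocessing` on Python versions before 3.8 packs the byte length of each message it sends between processes this way, so a single result larger than about 2 GiB cannot be sent back from a worker. Here is a minimal sketch of that limit. The helper names and the batch dimensions are made up for illustration and are not taken from the code above.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit 'i' field can hold


def pack_length_header(n_bytes):
    """Pack a payload size as a signed 32-bit big-endian integer."""
    return struct.pack("!i", n_bytes)


def try_pack(n_bytes):
    """Return None if the size fits, otherwise the struct.error raised."""
    try:
        pack_length_header(n_bytes)
        return None
    except struct.error as exc:
        return exc


# Hypothetical batch: 64 float32 images of 3000 x 3000 x 3 (4 bytes each).
batch_bytes = 64 * 3000 * 3000 * 3 * 4

print("batch exceeds int32 limit:", batch_bytes > INT32_MAX)
print("packing INT32_MAX:", try_pack(INT32_MAX))
print("packing the batch size:", try_pack(batch_bytes))
```

If this is what is happening, smaller batches or images, or turning off multiprocessing in the data generator, would be the first things I would try, but that is a guess from the error text alone.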

Also, in def __init__, add super(ParallelModel, self).__init__()

This seems odd to me. I'm using Python 3.5.2.

From memory, I am using Python 3.6
When I am back in the office tomorrow I will do a dump out of conda and pip for the versions of the packages that I am using.

Ok cool.

I've upgraded to Python 3.6, but I get the same errors. I'm wondering if it came from or config.py.
Can you confirm the code works with Keras 2.2.4?
Anyway, thank you for your help.

Great, it's working now. I have upgraded to Python 3.6 with Keras 2.2.4 and the tensorflow and tensorflow-gpu packages. I have also increased the size of the cluster I'm using (that was the key element).
Thank you for your time.

No worries,

But for anyone else that may come across this issue in the future, I was using Python 3.6.8 and this is the Conda and Pip dump:


zcunyi commented Apr 19, 2021

Hello, I've implemented this function to use multi-GPU with Keras 2.2.4, but I still have trouble when I try to train the model. I got an error:

AttributeError: 'Model' object has no attribute 'input_names'

I don't really know how to fix this. Do you have any idea? Thanks
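
One possible workaround for the `input_names` error, offered as a sketch and not a confirmed fix: the message suggests the installed Keras `Model` does not expose `input_names`, which `make_parallel()` reads. A small helper could use the attribute when it exists and otherwise derive names from the input tensors. `get_input_names` is a hypothetical helper, and the assumption that each input tensor has a `name` like `input_image:0` may not hold on every version. The stand-in objects below only mimic the two attributes involved, so the snippet runs without Keras installed.

```python
from types import SimpleNamespace


def get_input_names(model):
    """Return the model's input names, with a fallback for versions
    where `input_names` is missing (assumption: tensors carry a name)."""
    names = getattr(model, "input_names", None)
    if names:
        return list(names)
    # Fallback: strip the ":0" style suffix from each input tensor name.
    return [tensor.name.split(":")[0] for tensor in model.inputs]


# Stand-ins for a model with and without the `input_names` attribute.
with_attr = SimpleNamespace(input_names=["input_image"], inputs=[])
without_attr = SimpleNamespace(inputs=[SimpleNamespace(name="input_image:0")])

print(get_input_names(with_attr))
print(get_input_names(without_attr))
```

In `make_parallel()` the two uses of `self.inner_model.input_names` would then become `get_input_names(self.inner_model)`. Other attributes such as `output_names` may need the same treatment.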

