@jamesmishra
Created October 16, 2020 00:17
Calculating Keras model memory usage
import tensorflow as tf


def keras_model_memory_usage_in_bytes(model, *, batch_size: int):
    """
    Return the estimated memory usage of a given Keras model in bytes.
    This includes the model weights and layers, but excludes the dataset.
    The model shapes are multiplied by the batch size, but the weights are not.
    Args:
        model: A Keras model.
        batch_size: The batch size you intend to run the model with. If you
            have already specified the batch size in the model itself, then
            pass `1` as the argument here.
    Returns:
        An estimate of the Keras model's memory usage in bytes.
    """
    default_dtype = tf.keras.backend.floatx()
    shapes_mem_count = 0
    internal_model_mem_count = 0
    for layer in model.layers:
        # Nested models are estimated recursively.
        if isinstance(layer, tf.keras.Model):
            internal_model_mem_count += keras_model_memory_usage_in_bytes(
                layer, batch_size=batch_size
            )
        # Start from the element size in bytes, then multiply by every
        # known output dimension (the batch dimension is usually None).
        single_layer_mem = tf.as_dtype(layer.dtype or default_dtype).size
        out_shape = layer.output_shape
        # Multi-output layers report a list; only the first output is counted.
        if isinstance(out_shape, list):
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    # Weights are converted to bytes: element count times element size.
    trainable_bytes = sum(
        tf.keras.backend.count_params(p) * tf.as_dtype(p.dtype).size
        for p in model.trainable_weights
    )
    non_trainable_bytes = sum(
        tf.keras.backend.count_params(p) * tf.as_dtype(p.dtype).size
        for p in model.non_trainable_weights
    )

    total_memory = (
        batch_size * shapes_mem_count
        + internal_model_mem_count
        + trainable_bytes
        + non_trainable_bytes
    )
    return total_memory
@jamesmishra
Author

> [...] I don't think I ever found a simple way to manually compute the output shape for all layers (it has been a long time since I looked at this, so I may be wrong on this point).

@Bidski, I've come to the same conclusion since I last replied to you.

In general, any Keras layer can create an arbitrary number of tensors in the layer's `__init__()`, `build()`, and `call()` methods. These tensors will not appear in the layer's output shape, so my `keras_model_memory_usage_in_bytes()` will consistently underestimate a model's actual memory usage.
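To make the undercount concrete, here is a minimal arithmetic sketch in plain Python. The layer is hypothetical: it is assumed to widen each sample to `hidden` features inside `call()` and then project back to `out_features`. All sizes and the helper name `activation_bytes` are made up for illustration.

```python
BYTES_PER_FLOAT32 = 4


def activation_bytes(batch_size, features, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes needed to hold one (batch_size, features) tensor."""
    return batch_size * features * bytes_per_element


# Hypothetical sizes, chosen only for illustration.
batch_size, hidden, out_features = 32, 4096, 128

# What an output-shape-based estimate sees: the final (32, 128) tensor.
visible = activation_bytes(batch_size, out_features)

# What call() would also materialize: the (32, 4096) intermediate tensor.
hidden_intermediate = activation_bytes(batch_size, hidden)

print(visible)              # 16384
print(hidden_intermediate)  # 524288
```

In this made-up case the intermediate tensor is 32 times larger than the output that the estimate accounts for.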

However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.
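A rough sketch of that filtering step, assuming the estimate is a lower bound on real usage. The helper `drop_oversized` and its parameters are hypothetical names, not part of the gist; `estimate_bytes` stands in for any callable that returns a byte estimate for a candidate.

```python
from typing import Callable, Iterable, List, TypeVar

Candidate = TypeVar("Candidate")


def drop_oversized(
    candidates: Iterable[Candidate],
    estimate_bytes: Callable[[Candidate], int],
    gpu_memory_bytes: int,
) -> List[Candidate]:
    """Keep candidates whose lower-bound estimate fits in the budget.

    A candidate that is dropped cannot fit, because its real usage is at
    least the estimate. A candidate that is kept may still be too large.
    """
    return [c for c in candidates if estimate_bytes(c) <= gpu_memory_bytes]


# Stand-in candidates: (name, precomputed estimate in bytes).
GIB = 1024 ** 3
search_space = [("small", 2 * GIB), ("large", 20 * GIB)]
kept = drop_oversized(search_space, lambda c: c[1], 16 * GIB)
print([name for name, _ in kept])  # ['small']
```

With real models, `estimate_bytes` could be something like `lambda m: keras_model_memory_usage_in_bytes(m, batch_size=32)`.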

@Bidski

Bidski commented Oct 6, 2021

> However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.

Unfortunately, an underestimate doesn't fit the use case I had in mind: I wanted to know whether a particular model would fit on a particular GPU (and what batch size would result in "optimal" memory usage). Since we have no real idea how much we are underestimating by, it is impossible to answer this question.
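What a lower-bound estimate can still give is a ceiling on the batch size. The sketch below assumes the estimate has the form `batch_size * activation_bytes_per_sample + weight_bytes`, as in the gist's `total_memory` expression; the function name and the example numbers are hypothetical. Batch sizes above the returned value cannot fit, while batch sizes at or below it may or may not fit.

```python
def batch_size_upper_bound(
    activation_bytes_per_sample: int,
    weight_bytes: int,
    gpu_memory_bytes: int,
) -> int:
    """Largest batch size whose lower-bound estimate stays within budget."""
    if activation_bytes_per_sample <= 0:
        raise ValueError("activation_bytes_per_sample must be positive")
    remaining = gpu_memory_bytes - weight_bytes
    if remaining < activation_bytes_per_sample:
        # Even a batch of one exceeds the budget under the estimate.
        return 0
    return remaining // activation_bytes_per_sample


# Made-up numbers: 1 MB of activations per sample, 100 MB of weights, 8 GB GPU.
bound = batch_size_upper_bound(1_000_000, 100_000_000, 8_000_000_000)
print(bound)  # 7900
```

This does not answer the question of which batch size is optimal; it only rules out batch sizes that are certain to fail.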
