@jamesmishra
Created October 16, 2020 00:17
Calculating Keras model memory usage
import tensorflow as tf


def keras_model_memory_usage_in_bytes(model, *, batch_size: int):
    """
    Return the estimated memory usage of a given Keras model in bytes.
    This includes the model weights and layers, but excludes the dataset.

    The model shapes are multiplied by the batch size, but the weights
    are not.

    Args:
        model: A Keras model.
        batch_size: The batch size you intend to run the model with. If
            you have already specified the batch size in the model
            itself, then pass `1` as the argument here.
    Returns:
        An estimate of the Keras model's memory usage in bytes.
    """
    default_dtype = tf.keras.backend.floatx()
    shapes_mem_count = 0
    internal_model_mem_count = 0
    for layer in model.layers:
        if isinstance(layer, tf.keras.Model):
            internal_model_mem_count += keras_model_memory_usage_in_bytes(
                layer, batch_size=batch_size
            )
        single_layer_mem = tf.as_dtype(layer.dtype or default_dtype).size
        out_shape = layer.output_shape
        if isinstance(out_shape, list):
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    trainable_count = sum(
        tf.keras.backend.count_params(p) for p in model.trainable_weights
    )
    non_trainable_count = sum(
        tf.keras.backend.count_params(p) for p in model.non_trainable_weights
    )

    total_memory = (
        batch_size * shapes_mem_count
        + internal_model_mem_count
        + trainable_count
        + non_trainable_count
    )
    return total_memory
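To make the arithmetic concrete, here is a framework-free sketch of the same estimate for a hypothetical two-layer dense network. The layer sizes and helper names are invented for illustration, and unlike the gist's formula this sketch also scales the parameter counts by the dtype size:

```python
# Hypothetical network: input (batch, 784) -> Dense(256) -> Dense(10),
# all float32 (4 bytes per element). Activation and parameter counts are
# computed by hand; this mirrors the shape of the gist's formula, not
# TensorFlow itself.
BYTES_PER_FLOAT32 = 4


def dense_param_count(n_in, n_out):
    """Weights plus biases for one fully connected layer."""
    return n_in * n_out + n_out


def estimate_bytes(batch_size):
    # Per-sample output elements of each layer (input, hidden, output).
    activation_elems = 784 + 256 + 10
    params = dense_param_count(784, 256) + dense_param_count(256, 10)
    # Activations scale with the batch size; parameters do not.
    return (batch_size * activation_elems + params) * BYTES_PER_FLOAT32


print(estimate_bytes(32))  # 948520
```

Even on this toy network, the parameters (about 200k) dominate at small batch sizes, while the batch-scaled activations dominate as the batch grows.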
@Bidski commented Oct 21, 2020

@jamesmishra, regarding TensorFlow issue 36327: I'm not sure how much you will be able to figure out from this, but here is the output of model.summary(). The input shape is (12, 256, 512, 3), and the network takes two inputs of that same shape.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
LeftRetina (Retina)          multiple                  0         
_________________________________________________________________
RightRetina (Retina)         multiple                  0         
_________________________________________________________________
features (UnaryFeatures)     multiple                  161248    
_________________________________________________________________
cost_volume (CostVolume)     multiple                  0         
_________________________________________________________________
reg_down0 (RegulariserDownBl multiple                  55424     
_________________________________________________________________
reg_down1 (RegulariserDownBl multiple                  27776     
_________________________________________________________________
reg_down2 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down3 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down4 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down5 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down6 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down7 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down8 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down9 (RegulariserDownBl multiple                  110848    
_________________________________________________________________
reg_down10 (RegulariserDownB multiple                  110848    
_________________________________________________________________
reg_down11 (RegulariserDownB multiple                  221696    
_________________________________________________________________
reg_down12 (RegulariserDownB multiple                  442880    
_________________________________________________________________
reg_down13 (RegulariserDownB multiple                  442880    
_________________________________________________________________
reg_up0 (RegulariserUpBlock) multiple                  221440    
_________________________________________________________________
reg_up1 (RegulariserUpBlock) multiple                  110848    
_________________________________________________________________
reg_up2 (RegulariserUpBlock) multiple                  110848    
_________________________________________________________________
reg_up3 (RegulariserUpBlock) multiple                  55424     
_________________________________________________________________
output (Conv3DTranspose)     multiple                  864       
=================================================================
Total params: 2,848,960
Trainable params: 2,845,376
Non-trainable params: 3,584

@jamesmishra (Author)

Sorry for the delay, @Bidski. Are you building this model by directly subclassing tf.keras.Model?

In these cases, it is likely that input and output shapes are not accurately computed until the model is called. This is probably why all of your output shapes are "multiple".

see: tensorflow/tensorflow#29132 and tensorflow/tensorflow#25036

If you have a way of calculating the shapes of every layer without actually allocating memory on the GPU, then we should be able to integrate that into my function in this Gist.

@jizhang02

I have a question about internal_model_mem_count: it seems to always be 0. What does it mean, and why is it computed? Thank you.

@jamesmishra (Author)

@jizhang02,

You can treat an entire Keras model as a single layer in a larger model. e.g.:

model_a = tf.keras.Sequential([
    tf.keras.Input((10,)),
    tf.keras.layers.Activation("linear")
])

model_b = tf.keras.Sequential([
    tf.keras.Input((10,)),
    model_a,
    tf.keras.layers.Activation("softmax")
])

Because of this, keras_model_memory_usage_in_bytes() recursively applies itself to nested model layers, and it tracks the approximate memory usage of nested models in the internal_model_mem_count variable.
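The recursion can be illustrated without TensorFlow. The toy Layer/Model classes below are invented stand-ins; they count output elements the same way internal_model_mem_count accumulates the bytes of nested models:

```python
# Toy illustration of the recursion: a "model" is a list of layers, and a
# layer may itself be a model (as with model_a nested inside model_b above).
class Layer:
    def __init__(self, output_elems):
        self.output_elems = output_elems  # elements in this layer's output


class Model(Layer):
    def __init__(self, layers):
        self.layers = layers
        super().__init__(output_elems=layers[-1].output_elems)


def activation_elems(model):
    total = 0
    for layer in model.layers:
        if isinstance(layer, Model):
            # Recurse into nested models, as the gist does when it adds to
            # internal_model_mem_count.
            total += activation_elems(layer)
        else:
            total += layer.output_elems
    return total


inner = Model([Layer(10), Layer(10)])
outer = Model([Layer(10), inner, Layer(10)])
print(activation_elems(outer))  # 40: inner contributes both of its layers
```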

@Bidski commented Oct 4, 2021

> Sorry for the delay, @Bidski. Are you building this model by directly subclassing tf.keras.Model?

Yes, direct subclassing of tf.keras.Model.

> In these cases, it is likely that input and output shapes are not accurately computed until the model is called. This is probably why all of your output shapes are "multiple".

This may be a different issue entirely, but I have a different model that is also subclassed from tf.keras.Model. After loading the trained model and printing the summary, the output shapes are still listed as "multiple". The model and all of its layers have been called, since it is a fully trained model, but it was also just loaded from disk, so it's possible this information isn't saved in the model.

> see: tensorflow/tensorflow#29132 and tensorflow/tensorflow#25036

I may be missing something here, but it seems that both of these issues basically say the "best" option is to call the network with dummy data, which actually allocates memory for all of the layers?

> If you have a way of calculating the shapes of every layer without actually allocating memory on the GPU, then we should be able to integrate that into my function in this Gist.

I mean, all of the layers are basically combinations of TensorFlow ops (convolutions, dense layers, reshapes, etc.), so it should be possible to calculate all of the output shapes at instantiation time if you know the input shapes. That said, I don't think I ever found a simple way to manually compute the output shape for every layer type (it has been a long time since I looked at this, so I may be wrong on this point).
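For standard ops that shape arithmetic is indeed mechanical. As a sketch (not tied to Keras's actual implementation), the spatial output of a 2-D convolution follows the usual formulas for "valid" and "same" padding:

```python
import math


def conv2d_output_shape(h, w, kernel, stride=1, padding="valid"):
    """Spatial output shape of a 2-D convolution.

    "valid": floor((n - kernel) / stride) + 1
    "same":  ceil(n / stride)
    """
    if padding == "valid":
        out = lambda n: (n - kernel) // stride + 1
    elif padding == "same":
        out = lambda n: math.ceil(n / stride)
    else:
        raise ValueError(f"unknown padding: {padding}")
    return out(h), out(w)


# A stride-2 "same" convolution over the 256x512 input discussed above.
print(conv2d_output_shape(256, 512, kernel=3, stride=2, padding="same"))
# (128, 256)
```

Chaining such per-op rules through a subclassed model is exactly the bookkeeping that Keras does not expose before the model is called.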

@jamesmishra (Author)

> [...] I don't think I ever found a simple way to manually compute the output shape for all layers (it has been a long time since I looked at this, so I may be wrong on this point).

@Bidski, I've come to the same conclusion since I last replied to you.

In general, a Keras layer can create an arbitrary number of tensors in its __init__(), build(), and call() methods. These tensors will not appear in the layer's output shape, so my keras_model_memory_usage_in_bytes() will consistently underestimate a model's actual memory usage.

However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.

@Bidski commented Oct 6, 2021

> However, I still find an underestimate to be useful. When I am automatically generating models during a hyperparameter search, I can skip over models that are 100% guaranteed to be too large for my GPUs.

Unfortunately, an underestimate doesn't fit the use I had in mind: I wanted to know whether a particular model would fit on a particular GPU, and what batch size would give "optimal" memory usage. Since we have no real idea how much we are underestimating by, it is impossible to answer that question.
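For what it's worth, if one treats the estimate as a lower bound, inverting it gives an upper bound on the batch size worth trying. The function and numbers below are hypothetical, and real usage adds framework overhead the estimate misses:

```python
def max_batch_size(gpu_bytes, weight_bytes, activation_bytes_per_sample):
    """Largest batch size whose estimated footprint fits the budget.

    Inverts: total = weight_bytes + batch * activation_bytes_per_sample.
    Because the estimate is a lower bound, the true maximum may be smaller.
    """
    if weight_bytes >= gpu_bytes:
        return 0
    return (gpu_bytes - weight_bytes) // activation_bytes_per_sample


# Hypothetical numbers: 8 GiB GPU, 500 MiB of weights, 40 MiB per sample.
GiB, MiB = 2**30, 2**20
print(max_batch_size(8 * GiB, 500 * MiB, 40 * MiB))  # 192
```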
