How to deploy a tensorflow model on Heroku with tensorflow serving

After spending minutes or hours playing around with all the wonderful examples available, for instance on the Google AI Hub, one may want to deploy one model or another online.

This article presents a fast and neat way of doing it with Tensorflow Serving and Heroku.

Introduction

There is a gap between being able to train and test a single model in a single notebook, using for instance Google Colab, and deploying a model to production that can handle updates, batch and async predictions, etc.

Fortunately, Google has publicly released its own framework for managing the whole lifecycle of a model, from data storage to serving to logging, etc. The Tensorflow Extended (TFX) documentation is worth reading for any data scientist or software engineer looking for information about deploying models and real applications in general.

This simple REST API example shows, for instance, how to train a simple classifier and eventually run inference using the Tensorflow Serving REST API.

This short blog post goes one step further to show how to deploy the model on Heroku so that it is available online and can be consumed by another API.

About serving a model

Serving a model means using it to make predictions of some kind: it receives requests with inputs and returns a hopefully relevant answer. From the serving point of view, the model is a black box that should reliably return an output.

In other words, when thinking about serving the model, you should not have to rebuild it. The serving layer is model agnostic: if you serve a computer vision classifier, its goal is to receive the original picture to be classified and to return a score per label. Consequently, the cropping, preprocessing, etc. steps should be encapsulated into the model at the time it is built for serving, i.e. at the end of training.

Indeed, in a production setting, each trained model is a potential candidate for becoming the new state of the art on your problem. Everything that happens between the raw data received during serving and the final prediction should be stored in the same object so that anyone (and especially someone other than the data scientist who trained the model) can safely consume it without common mistakes such as missing pixel normalization, wrong cropping, etc.

Using tensorflow

The Tensorflow library exposes the saved_model API, which is especially designed for packaging a model into a binary, cross-platform format that can later be used everywhere without trouble. The signatures parameter allows for defining several routes and ops to be performed on the model, for instance from the corresponding REST API.

The notebook built for serving from the Keras-FewShotLearning repo is a good example of how to use tf.function to create routes (signatures) that can then easily be called with the tensorflow serving API.

For instance, given the preprocessing used during training:
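The exact function lives in the repo's notebook; a minimal sketch of such a preprocessing step, with an assumed image size and a simple pixel normalization, could look like:

```python
import tensorflow as tf


def preprocessing(image_bytes):
    # Decode an encoded image (as received by the serving layer) into a float tensor.
    # The target size and the /255 normalization are assumptions for this sketch.
    image = tf.io.decode_image(image_bytes, channels=3, expand_animations=False)
    image = tf.image.resize(image, (224, 224))
    return tf.cast(image, tf.float32) / 255.0
```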

one can add the following tf.function:
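The snippet itself is embedded in the gist; a sketch of such signatures, wrapping an already trained Keras model (simply called model here) together with the preprocessing function above, could be:

```python
# Names, shapes and dtypes below are assumptions for the sketch, not the exact repo code.
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def preprocess_signature(images_bytes):
    # Only decode and preprocess the batch of encoded images
    return tf.map_fn(preprocessing, images_bytes, fn_output_signature=tf.float32)


@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def default_signature(images_bytes):
    # Full pipeline: decode, preprocess, then return a score per class
    return model(preprocess_signature(images_bytes))
```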

and export the model as follows:
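With the signatures above, the export could boil down to something like this (the model name and version directory are examples; tensorflow serving expects one numbered sub-directory per version):

```python
tf.saved_model.save(
    model,
    export_dir="siamese_nets_classifier/1",  # example path: <model_name>/<version>
    signatures={
        "serving_default": default_signature,
        "preprocessing": preprocess_signature,
    },
)
```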

Doing so, the default signature will return a score per class when receiving a full base64-encoded image, while calling the "preprocessing" signature will only return the preprocessed image.

The request_served_model.py notebook from the Keras-FewShotLearning repo then shows how to run tensorflow serving with Docker and how to request the different signatures, for instance:
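As a sketch (model name, paths and image file are examples), running the official image and hitting one of the signatures from the command line could look like:

```bash
# Serve the exported SavedModel locally with the official tensorflow/serving image
docker run -d -p 8501:8501 \
    -v "$(pwd)/siamese_nets_classifier:/models/siamese_nets_classifier" \
    -e MODEL_NAME=siamese_nets_classifier \
    tensorflow/serving

# Request the "preprocessing" signature with a base64-encoded image
# (base64 -w0 is the GNU flag; use plain base64 on macOS)
curl -X POST http://localhost:8501/v1/models/siamese_nets_classifier:predict \
    -H "Content-Type: application/json" \
    -d '{"signature_name": "preprocessing", "instances": [{"b64": "'"$(base64 -w0 image.jpg)"'"}]}'
```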

Heroku

So far we have run the model locally. The serving is done from within a Docker container (namely from the tensorflow/serving image). Using any container-orchestration system like docker-compose or Kubernetes will allow you to clearly separate the models served with tensorflow from the app or any other services. You will also benefit from all the good work from the Google team, for example hot reloading of the model as soon as a new version is pushed into the target directory.
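For instance, a minimal docker-compose service keeping the serving container on its own could look like this (names and paths are examples):

```yaml
# docker-compose.yml — sketch of a dedicated tensorflow serving service
version: "3"
services:
  serving:
    image: tensorflow/serving
    ports:
      - "8501:8501"
    volumes:
      - ./siamese_nets_classifier:/models/siamese_nets_classifier
    environment:
      - MODEL_NAME=siamese_nets_classifier
```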

However, if you want to deploy this container on Heroku, you will face a final small difficulty that I am going to alleviate: Tensorflow Serving serves the REST API over port 8501, but Heroku assigns a random port when it runs the dyno and exposes it through the $PORT environment variable.

The custom Dockerfile is as follows:
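The original file is embedded in the gist; a sketch of such a Dockerfile, starting from the official serving image (model name and entrypoint path are examples), could be:

```dockerfile
FROM tensorflow/serving

# Ship the exported SavedModel inside the image
COPY siamese_nets_classifier /models/siamese_nets_classifier
ENV MODEL_NAME=siamese_nets_classifier

# Override the default entrypoint with one that honours Heroku's $PORT
COPY entrypoint.sh /usr/bin/entrypoint.sh
RUN chmod +x /usr/bin/entrypoint.sh
ENTRYPOINT ["/usr/bin/entrypoint.sh"]
```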

with the entrypoint slightly modified to take the $PORT env variable into account:
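A sketch of that entrypoint, mirroring the stock tf_serving_entrypoint.sh of the image but binding the REST API to $PORT (falling back to 8501 when run locally):

```bash
#!/bin/bash
# entrypoint.sh — same as the default entrypoint, except for the REST API port
tensorflow_model_server \
    --rest_api_port=${PORT:-8501} \
    --model_name=${MODEL_NAME} \
    --model_base_path=/models/${MODEL_NAME} \
    "$@"
```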

Et voilà! Your model can now be requested from anywhere on earth. If you want to call it from a static website, you may face a CORS issue, but that is another story.

Conclusion

In this tutorial I have presented how to use the Tensorflow Extended framework to build, deploy and serve a tensorflow model with a highly efficient API (it is from Google, after all). I would love to hear from you about other benefits of using TFX, tricks and more.

Don't forget to follow me for updates and other tensorflow related articles!
