How to deploy a tensorflow model on Heroku with tensorflow serving

After spending minutes or hours playing around with all the wonderful examples available, for instance on the Google AI Hub, one may want to deploy one model or another online.

This article presents a fast and neat way of doing it with Tensorflow Serving and Heroku.

Introduction

There is a gap between being able to train and test a single model in a single notebook, using for instance Google Colab, and deploying a model to production that can handle updates, batch and async predictions, etc.

Fortunately, Google has publicly released its own framework for managing the whole lifecycle of a model, from data storage to serving to logging, etc. The Tensorflow Extended (TFX) documentation is worth reading for any data scientist or software engineer looking for information about deploying models and real applications in general.

This simple REST API example shows, for instance, how to train a simple classifier and eventually run inference using the Tensorflow Serving REST API.

This short blog post goes one step further to show how to deploy the model on Heroku so that it is available online and can be consumed by another API.

About serving a model

Serving a model means using it to make predictions of some kind: it receives requests with inputs and returns a hopefully relevant answer. From the serving point of view, the model is a black box that should reliably return an output.

In other words, when thinking about serving the model, you should not have to rebuild it. The serving layer is model agnostic: if you serve a computer vision classifier, its goal is to receive the original picture to be classified and to return a score per label. Consequently, the cropping, preprocessing, etc. steps should be encapsulated into the model at the time it is built for serving, i.e. at the end of training.

Indeed, in a production setting, each trained model is a potential candidate for becoming the new state of the art on your problem. Everything that happens between the raw data received during serving and the final prediction should be stored in the same object so that anyone (and especially someone other than the data scientist who trained the model) can safely consume it without common mistakes such as missing pixel normalization, wrong cropping, etc.

Using tensorflow

The Tensorflow library exposes the saved_model API, which is especially designed for packaging a model into a binary, cross-platform format that can later be used everywhere without trouble. The signatures parameter allows for defining several routes and ops to be performed on the model, for instance from the corresponding REST API.

The notebook built for serving from the Keras-FewShotLearning repo is a good example of how to use tf.function to create routes (signatures) that can then easily be called with the tensorflow serving API.

For instance, given the preprocessing used during training:
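The exact function lives in the repo's notebook; a minimal sketch of such a preprocessing step, with an assumed image size and a simple pixel normalization, could look like:

```python
import tensorflow as tf


def preprocessing(image_bytes):
    # Decode an encoded image (as received by the serving layer) into a float tensor.
    # The target size and the /255 normalization are assumptions for this sketch.
    image = tf.io.decode_image(image_bytes, channels=3, expand_animations=False)
    image = tf.image.resize(image, (224, 224))
    return tf.cast(image, tf.float32) / 255.0
```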

one can add the following tf.function:
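The snippet itself is embedded in the gist; a sketch of such signatures, wrapping an already trained Keras model (simply called model here) together with the preprocessing function above, could be:

```python
# Names, shapes and dtypes below are assumptions for the sketch, not the exact repo code.
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def preprocess_signature(images_bytes):
    # Only decode and preprocess the batch of encoded images
    return tf.map_fn(preprocessing, images_bytes, fn_output_signature=tf.float32)


@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def default_signature(images_bytes):
    # Full pipeline: decode, preprocess, then return a score per class
    return model(preprocess_signature(images_bytes))
```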

and export the model as follows:
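With the signatures above, the export could boil down to something like this (the model name and version directory are examples; tensorflow serving expects one numbered sub-directory per version):

```python
tf.saved_model.save(
    model,
    export_dir="siamese_nets_classifier/1",  # example path: <model_name>/<version>
    signatures={
        "serving_default": default_signature,
        "preprocessing": preprocess_signature,
    },
)
```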

Doing so, the default signature will return a score per class when receiving a full base64-encoded image, while calling the "preprocessing" signature will only return the preprocessed image.

The request_served_model.py notebook from the Keras-FewShotLearning repo then shows how to run tensorflow serving with Docker and how to request the different signatures, for instance:
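As a sketch (model name, paths and image file are examples), running the official image and hitting one of the signatures from the command line could look like:

```bash
# Serve the exported SavedModel locally with the official tensorflow/serving image
docker run -d -p 8501:8501 \
    -v "$(pwd)/siamese_nets_classifier:/models/siamese_nets_classifier" \
    -e MODEL_NAME=siamese_nets_classifier \
    tensorflow/serving

# Request the "preprocessing" signature with a base64-encoded image
# (base64 -w0 is the GNU flag; use plain base64 on macOS)
curl -X POST http://localhost:8501/v1/models/siamese_nets_classifier:predict \
    -H "Content-Type: application/json" \
    -d '{"signature_name": "preprocessing", "instances": [{"b64": "'"$(base64 -w0 image.jpg)"'"}]}'
```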

Heroku

So far we have run the model locally. The serving is done from within a Docker container (namely from the tensorflow/serving image). Using any container-orchestration system like docker-compose or Kubernetes will allow you to clearly separate the models served with tensorflow from the app or any other services. You will also benefit from all the good work from the Google team, for example hot reloading of the model as soon as a new version is pushed into the target directory.
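For instance, a minimal docker-compose service keeping the serving container on its own could look like this (names and paths are examples):

```yaml
# docker-compose.yml — sketch of a dedicated tensorflow serving service
version: "3"
services:
  serving:
    image: tensorflow/serving
    ports:
      - "8501:8501"
    volumes:
      - ./siamese_nets_classifier:/models/siamese_nets_classifier
    environment:
      - MODEL_NAME=siamese_nets_classifier
```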

However, if you want to deploy this container on Heroku, you will face a final small difficulty that I am going to alleviate: Tensorflow Serving serves the REST API over port 8501, but Heroku assigns a random port when it runs the dyno and exposes it through the $PORT environment variable.

The custom Dockerfile is as follows:
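The original file is embedded in the gist; a sketch of such a Dockerfile, starting from the official serving image (model name and entrypoint path are examples), could be:

```dockerfile
FROM tensorflow/serving

# Ship the exported SavedModel inside the image
COPY siamese_nets_classifier /models/siamese_nets_classifier
ENV MODEL_NAME=siamese_nets_classifier

# Override the default entrypoint with one that honours Heroku's $PORT
COPY entrypoint.sh /usr/bin/entrypoint.sh
RUN chmod +x /usr/bin/entrypoint.sh
ENTRYPOINT ["/usr/bin/entrypoint.sh"]
```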

with the entrypoint slightly modified to take the $PORT env variable into account:
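A sketch of that entrypoint, mirroring the stock tf_serving_entrypoint.sh of the image but binding the REST API to $PORT (falling back to 8501 when run locally):

```bash
#!/bin/bash
# entrypoint.sh — same as the default entrypoint, except for the REST API port
tensorflow_model_server \
    --rest_api_port=${PORT:-8501} \
    --model_name=${MODEL_NAME} \
    --model_base_path=/models/${MODEL_NAME} \
    "$@"
```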

Et voilà! Your model can now be requested from anywhere on earth. If you want to call it from a static website, you may face a CORS issue, but that is another story.

Conclusion

In this tutorial I have presented how to use the Tensorflow Extended framework to build, deploy and serve a tensorflow model with a highly efficient API (it is from Google, after all). I would love to hear from you about other benefits of using TFX, tricks and more.

Don't forget to follow me for updates and other tensorflow related articles!
