@junwan01

junwan01/blog.md Secret

Created April 22, 2019 05:19

Introduction

Google TensorFlow is a popular machine learning toolkit. It includes TF Serving, which serves saved ML models from a Docker image that exposes both RESTful and gRPC APIs.

Here is an introduction to gRPC. TF Serving's gRPC APIs are defined in protobuf files (for example model serving, among others), and provide slightly more functionality than the RESTful API. With these .proto files, you can generate the necessary client source code for various languages and integrate model serving into your own application.

If you just want a Java client library to use, you can simply download the jar file under lib/ and use it.

If you want to build it yourself, to use more recent .proto releases or for other reasons, then read on ...

Build your own Java client

Step 1. Get TensorFlow protobuf files

Check out the tensorflow and tensorflow_serving projects somewhere:

https://gist.github.com/97e6bfb826b357ff9e145badc51b3342

Gather all the .proto files and organize them into a new Java project.

The libraries we checked out contain many files, but we only need some .proto files in order to compile our gRPC Java client. Let's make a project to host the source .proto files and future .java files.

Our end goal is to get all the .proto files required, directly or indirectly, by the tensorflow_serving/apis/*_service.proto files. However, I am not aware of any tool that can start from a few .proto files, trace through their import statements, and list all the other .proto files required. So I figured out which files are needed by trial and error: compile the resulting Java classes and add files until there are no more 'no class def found' complaints. Alternatively, one could simply include all .proto files from tensorflow_serving/ and tensorflow/, but that results in a much bigger Java package.

For the tensorflow and tensorflow_serving releases mentioned above, the .proto files from the following directories are enough (they still include some unnecessary ones):
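That said, tracing imports is simple enough to sketch by hand. The following is a minimal, unofficial sketch, not part of the TensorFlow or protobuf tooling: the class ProtoImportTracer and its method names are hypothetical. It starts from a few entry .proto files, follows their import statements with a regular expression, and lists every file it can find under the given root directories. It assumes import paths are relative to one of the roots, and it does not understand block comments, so treat its output as a starting point rather than a definitive list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the .proto files reachable from some entry files via imports. */
final class ProtoImportTracer {

    // Matches lines such as: import "a/b.proto";  or  import public "a/b.proto";
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+(?:public\\s+|weak\\s+)?\"([^\"]+)\"\\s*;");

    /** Returns the root-relative paths of all reachable .proto files found under the roots. */
    static Set<String> trace(List<Path> roots, List<String> entryFiles) throws IOException {
        Set<String> seen = new HashSet<>();    // every import path visited, found or not
        Set<String> found = new TreeSet<>();   // sorted paths that exist under a root
        Deque<String> pending = new ArrayDeque<>(entryFiles);
        while (!pending.isEmpty()) {
            String relative = pending.pop();
            if (!seen.add(relative)) {
                continue;                      // already handled; this also breaks cycles
            }
            Path file = resolve(roots, relative);
            if (file == null) {
                continue;                      // not under any root, e.g. google/protobuf/*.proto
            }
            found.add(relative);
            for (String line : Files.readAllLines(file)) {
                Matcher m = IMPORT.matcher(line);
                if (m.find()) {
                    pending.push(m.group(1));
                }
            }
        }
        return found;
    }

    // Returns the first existing file for the relative path, or null if no root contains it.
    private static Path resolve(List<Path> roots, String relative) {
        for (Path root : roots) {
            Path candidate = root.resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ProtoImportTracer <root1,root2,...> <entry.proto>...");
            return;
        }
        List<Path> roots = new ArrayList<>();
        for (String root : args[0].split(",")) {
            roots.add(Paths.get(root));
        }
        List<String> entries = Arrays.asList(args).subList(1, args.length);
        trace(roots, entries).forEach(System.out::println);
    }
}
```

To use it, pass the checkout directories as a comma-separated list of roots, followed by the entry files as root-relative paths (for example the *_service.proto files under tensorflow_serving/apis/). Imports that cannot be found under any root are skipped silently, which is the expected outcome for the google/protobuf/ files that ship with protobuf itself.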

https://gist.github.com/a5e8cfa7730c71ebd041542233bbe3fa

Let's pick out only the .proto files while keeping the directory structure, which the import statements in these .proto files assume. Put the .proto files into the corresponding directories of the new project, under src/main/proto/: https://gist.github.com/c17cd243fef4678901fa9b2ed8ce7ce1

Note: The .proto files in these directories can change between releases: new files can be added, and file contents can change. So the 4 directories above may end up containing .proto files that require other .proto files from outside directories, in which case we have to expand the set of .proto files to include.
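For readers who prefer to stay in Java for this step, here is a minimal sketch of the same idea: walk one directory of a checkout, keep only the .proto files, and copy them to the target while preserving their paths relative to the checkout root. The class ProtoCopier and its method are hypothetical names chosen for illustration, and this is not necessarily how the linked snippet does it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: copies only .proto files, keeping their directory structure. */
final class ProtoCopier {

    /**
     * Copies every .proto file under sourceRoot/subDir into targetRoot, preserving the
     * path relative to sourceRoot so that import statements keep resolving.
     * Returns the number of files copied.
     */
    static int copyProtos(Path sourceRoot, String subDir, Path targetRoot) throws IOException {
        List<Path> protos;
        try (Stream<Path> walk = Files.walk(sourceRoot.resolve(subDir))) {
            protos = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".proto"))
                    .collect(Collectors.toList());
        }
        for (Path proto : protos) {
            Path target = targetRoot.resolve(sourceRoot.relativize(proto));
            Files.createDirectories(target.getParent());
            Files.copy(proto, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return protos.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ProtoCopier <checkout-root> <sub-dir> <target-dir>");
            return;
        }
        int copied = copyProtos(Paths.get(args[0]), args[1], Paths.get(args[2]));
        System.out.println("copied " + copied + " .proto files");
    }
}
```

Run it once per source directory, with src/main/proto as the target directory each time.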

Step 2. Generate the Java files

Now we have a project with only .proto files under src/main/proto/. Let's compile them into Java source files.

Option 1 - build with maven

The build can be automated with Maven. The key dependencies declared in the pom file are: https://gist.github.com/bdd071370386b18261c31c5684971db9

Additionally, use the protobuf-maven-plugin, which compiles .proto files to .java files. It also generates an extra *Grpc.java service stub file for each *_service.proto file:

https://gist.github.com/5ed3720b333015d32a06d0770cf59917

Here is the documentation of this plugin, including the list of available goals. You can see that it can compile .proto files to Java, C++, C#, JavaScript, or Python.

A few notes:

  • The compile-custom goal in the above pom will generate the *Grpc.java files, which are essential for the Java client, so keep it in your goal list.
  • The pre-compiled protoc executable that the plugin uses on Linux is compiled against glibc, so it may not run correctly on Linux systems without glibc, e.g. Alpine Linux. So don't use a build server based on Alpine Linux. See more details.
  • The os-maven-plugin extension is used to provide ${os.detected.classifier}, in order to pull the correct executable for the build server/OS. Eclipse IDE may require some special handling.

Option 2 - Compile manually

It takes extra steps to build manually. You may want to do this to keep the resulting .java files static, as part of your source code, instead of generating them dynamically on every build. This reduces build complexity.

1. Build the grpc-java plugin

Check out the grpc-java repo and build the plugin (protoc-gen-grpc-java), which is used later in protoc calls to generate the Java implementation of the gRPC client.

https://gist.github.com/4192b57d3bab57ca504e2e1a98150d33

2. Install protobuf utilities to compile .proto files

For Mac, it is like this: https://gist.github.com/c1f834929830304d7464994d96f2fe9d

Compiling manually reduces the Java build complexity: you can treat the resulting .java files as source code and start from there, ignoring any .proto business going forward. However, every time you need to update for any reason, you will need to repeat the steps above.

Sample Java client code to make gRPC calls

https://gist.github.com/bd3f043d9ba077ba8d3f8fc80f7e81ca

Additional engineering considerations:

  • Creating a channel is an expensive operation, so channels should be cached and reused.
  • Protobuf classes are dumb data holders, used for serialization and communication. You should build separate application-specific object models that wrap these protobuf classes to provide additional behavior. Don't extend the protobuf classes for this purpose. See Protobuf Java Tutorial.
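
As an illustration of the channel-caching advice, here is a minimal sketch of a cache that creates at most one channel per host and port and hands out the same instance afterwards. ChannelCache is a hypothetical class, not part of grpc-java, and it is kept generic so the sketch runs without gRPC on the classpath. In a real client the type parameter would be io.grpc.ManagedChannel, and the factory would be something like target -> ManagedChannelBuilder.forTarget(target).usePlaintext().build(), assuming a plaintext connection is acceptable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: creates each channel once and reuses it for later calls. */
final class ChannelCache<C> {

    private final Map<String, C> channels = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    /** The factory receives a "host:port" target string and builds a new channel for it. */
    ChannelCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached channel for host:port, creating it on first use. */
    C get(String host, int port) {
        // computeIfAbsent on ConcurrentHashMap invokes the factory at most once per key.
        return channels.computeIfAbsent(host + ":" + port, factory);
    }

    public static void main(String[] args) {
        // Stand-in factory for the demo; a real one would build a gRPC ManagedChannel.
        ChannelCache<StringBuilder> cache =
                new ChannelCache<>(target -> new StringBuilder("channel to " + target));
        // Example host and port only; use the address your TF Serving container exposes.
        StringBuilder first = cache.get("localhost", 8500);
        StringBuilder second = cache.get("localhost", 8500);
        System.out.println(first == second); // same instance is reused
    }
}
```

With real gRPC channels, the application should also shut the cached channels down when it exits.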

penzhang commented Jun 4, 2019

Very useful blog!
