Google TensorFlow is a popular machine learning toolkit. It includes TF Serving, which serves saved ML models from a Docker image that exposes RESTful and gRPC APIs.
Here is an introduction to gRPC. TF Serving's gRPC APIs are defined in protobuf files (for example model serving, among others) and provide slightly more functionality than the RESTful API. With these .proto files, you can generate the necessary client source code for various languages and integrate model serving into your own application.
If you just want a Java client library to use, you can simply download the jar file under lib/ and use it.
If you want to build it yourself, whether to use more recent .proto releases or for other reasons, then read on ...
https://gist.github.com/97e6bfb826b357ff9e145badc51b3342
The libraries we checked out contain many files, but we only need some .proto files in order to compile our gRPC Java client. Let's make a project to host the source .proto files and future .java files.
Our end goal is to collect all the .proto files required, directly or indirectly, by the tensorflow_serving/apis/*_service.proto files. However, I am not aware of any tool that can start from a few .proto files, trace through their import statements, and list all the other .proto files required. So I worked out which files are needed by trial and error: compile the generated Java classes repeatedly until there are no more 'no class def found' complaints. Alternatively, one could simply include all .proto files from tensorflow_serving/ and tensorflow/, but that results in a much bigger Java package.
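The import tracing described above could be approximated with a small program. The following is only a minimal sketch, not a tool used for this project: the class name ProtoImportTracer and its methods are made up for illustration, it finds import statements with a simple regular expression (so an import inside a block comment would be picked up by mistake), and it assumes import paths are relative to one of the given root directories.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the .proto files reachable through import statements. */
class ProtoImportTracer {
    // Matches lines such as: import "a/b.proto";  or  import public "a/b.proto";
    private static final Pattern IMPORT = Pattern.compile(
            "^\\s*import\\s+(?:public\\s+|weak\\s+)?\"([^\"]+)\"\\s*;", Pattern.MULTILINE);

    /** Returns the paths named by the import statements of one .proto source. */
    public static List<String> parseImports(String protoSource) {
        List<String> imports = new ArrayList<>();
        Matcher m = IMPORT.matcher(protoSource);
        while (m.find()) {
            imports.add(m.group(1));
        }
        return imports;
    }

    /**
     * Follows imports transitively from the entry points. The loader returns the
     * text of a .proto file, or null if it cannot be found; such names are still
     * listed in the result but not followed any further.
     */
    public static Set<String> trace(Collection<String> entryPoints,
                                    Function<String, String> loader) {
        Set<String> seen = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(entryPoints);
        while (!pending.isEmpty()) {
            String file = pending.pop();
            if (!seen.add(file)) {
                continue; // already visited, which also guards against import cycles
            }
            String source = loader.apply(file);
            if (source != null) {
                pending.addAll(parseImports(source));
            }
        }
        return seen;
    }

    /** Usage: java ProtoImportTracer <root1>:<root2>... <entry.proto>... */
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: ProtoImportTracer <roots> <entry.proto>...");
            return;
        }
        String[] roots = args[0].split(File.pathSeparator);
        Function<String, String> loader = name -> {
            for (String root : roots) {
                Path candidate = Paths.get(root, name);
                if (Files.isRegularFile(candidate)) {
                    try {
                        return new String(Files.readAllBytes(candidate), StandardCharsets.UTF_8);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            }
            return null;
        };
        List<String> entries = Arrays.asList(args).subList(1, args.length);
        trace(entries, loader).forEach(System.out::println);
    }
}
```

The first argument is a list of root directories separated by the platform path separator (for example the two checkout directories), and the remaining arguments are entry files such as tensorflow_serving/apis/prediction_service.proto. Names that cannot be found under any root, such as the google/protobuf/ files that ship with protoc, are printed but not followed.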
For the above-mentioned tensorflow and tensorflow_serving releases, the .proto files from the following directories are sufficient (they still include some unnecessary ones):
https://gist.github.com/a5e8cfa7730c71ebd041542233bbe3fa
Let's pick out only the .proto files while keeping the directory structure, which the import statements in these .proto files assume. Put the .proto files into the corresponding directories of the new project, under src/main/proto/:
https://gist.github.com/c17cd243fef4678901fa9b2ed8ce7ce1
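For readers who would rather script this step in Java, the same idea (copy only .proto files, keep each file's path relative to the checkout root) can be sketched as below. This is an illustrative sketch only: the class ProtoCopier and its method are hypothetical names, and the directory arguments are whatever your own checkout and project layout use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: copies only .proto files, preserving relative paths. */
class ProtoCopier {
    /**
     * Copies every *.proto file found under sourceRoot/subDir into targetRoot,
     * keeping its path relative to sourceRoot so that import statements still
     * resolve. Returns the number of files copied.
     */
    public static int copyProtos(Path sourceRoot, String subDir, Path targetRoot) {
        try (Stream<Path> paths = Files.walk(sourceRoot.resolve(subDir))) {
            List<Path> protos = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".proto"))
                    .collect(Collectors.toList());
            for (Path proto : protos) {
                // e.g. <sourceRoot>/a/b/x.proto -> <targetRoot>/a/b/x.proto
                Path dest = targetRoot.resolve(sourceRoot.relativize(proto));
                Files.createDirectories(dest.getParent());
                Files.copy(proto, dest, StandardCopyOption.REPLACE_EXISTING);
            }
            return protos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling copyProtos once per source directory, with targetRoot pointing at the project's src/main/proto/ directory, leaves a tree that contains nothing but .proto files.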
Note: The .proto files in these directories can change between releases: new files can be added, and file contents can change. So the above 4 directories may come to contain .proto files that require other .proto files from outside directories, in which case we have to expand the set of .proto files to include.
Now we have a project with only .proto files under src/main/proto/. Let's compile them into Java source files.
The build can be automated with Maven. The key dependencies declared in the pom file are:
https://gist.github.com/bdd071370386b18261c31c5684971db9
Additionally, use the protobuf-maven-plugin, which compiles .proto files to .java files. It also generates an extra *Grpc.java service stub file for each *_service.proto file:
https://gist.github.com/5ed3720b333015d32a06d0770cf59917
Here is the documentation of this plugin, including the list of available goals. As you can see, it can compile .proto files to Java, C++, C#, JavaScript, or Python.
A few notes:
- The compile-custom goal in the above pom generates the *Grpc.java files, which are essential for the Java client, so keep it in your goal list.
- The plugin includes a pre-compiled protoc executable for Linux that is compiled against glibc, so it may not run correctly on Linux systems without glibc, e.g. Alpine Linux. So don't use a build server based on Alpine Linux. See more details.
- The os-maven-plugin extension is used to provide ${os.detected.classifier}, in order to pull the correct executable for the build server/OS. Eclipse IDE may require some special handling.
It takes extra steps to build manually. You may want to do this to keep the generated .java files static, as part of your source code, instead of generating them dynamically on each build, which reduces build complexity.
Check out the grpc-java repo and build the plugin (protoc-gen-grpc-java), which will be used later in protoc calls to generate the Java implementation of the gRPC client.
https://gist.github.com/4192b57d3bab57ca504e2e1a98150d33
For Mac, it is like this: https://gist.github.com/c1f834929830304d7464994d96f2fe9d
Compiling manually reduces Java build complexity: you can treat the resulting .java files as source code and start from there, completely ignoring any .proto business going forward. However, every time you need to update them for any reason, you will need to repeat the above steps.
https://gist.github.com/bd3f043d9ba077ba8d3f8fc80f7e81ca
Additional engineering considerations:
- Creating a channel is an expensive operation, so channels should be cached and reused.
- Protobuf classes are dumb data holders used for serialization and communication. You should build separate application-specific object models that wrap these protobuf classes to provide additional behavior. Don't extend the protobuf classes for this purpose. See Protobuf Java Tutorial.
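The channel caching advice can be sketched as follows. This is a minimal illustration under stated assumptions, not the client code from this project: ChannelCache is a hypothetical class, and it is written generically so that it compiles without grpc-java on the classpath. In a real client the type parameter would be io.grpc.ManagedChannel and the factory could be something like target -> ManagedChannelBuilder.forTarget(target).usePlaintext().build().

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper: keeps one channel per target (for example "host:port")
 * so that the expensive creation step happens only once per target.
 */
final class ChannelCache<C> {
    private final ConcurrentMap<String, C> channels = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    /** The factory creates a new channel for a target; it is assumed to be expensive. */
    public ChannelCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached channel for the target, creating it on first use. */
    public C get(String target) {
        // computeIfAbsent invokes the factory at most once per key, even under contention.
        return channels.computeIfAbsent(target, factory);
    }

    /** Number of channels created so far. */
    public int size() {
        return channels.size();
    }
}
```

Stubs built on top of a cached channel are cheap to create, so only the channel itself needs to be shared. With real gRPC channels you would also want to shut the cached channels down when the application exits.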