R is the main language of the Data Science team at Billie, so it is only natural that we use it at every stage of our analytical workflow, including deployment in production. The amazing Plumber package allows R functions to be served as a RESTful service using nothing more than special comments, and does most of the heavy lifting for us:
https://gist.github.com/1b962840d67b9973032c8644ba92caf7
Out of the box, Plumber handles requests in JSON format or as character strings, and responses in a handful of formats including JSON, PDF and JPEG. This is enough for a lot of applications, but for the volumes of data we deal with, JSON is too slow and inefficient. Enter Protocol Buffers.
Protocol Buffers is a technology developed at Google to serialize structured data and let different services communicate in a fast, simple and efficient manner. Broadly speaking, it replaces JSON or XML as the serialization method. Those formats are a good fit for applications in which, for example, human readability is necessary, but once you start sending gigabytes of data between APIs, all those brackets and colons become a huge burden. Protocol Buffers removes all that cruft, resulting in a lean(er), binary message.
To use it, you first start with a schema defining what variables are contained in the message, their type, and their order:
https://gist.github.com/8c7fcd776eda77a24265741a72a03e72
You write this definition once for your data, and use it on both ends of the wire: the API sending the request encodes the data based on it, and the target API will quickly decode the message. Additionally, because the schema is statically typed, it works as a contract between the two ends of the wire, and ensures consistency. No more typecasting JSON strings to the correct types!
Most languages have an implementation for Protocol Buffers, R being no exception in the form of the RProtoBuf package, which has good documentation with useful examples. To our delight, Plumber also allows setting custom filters (handling inputs) and serializers (handling outputs), which means we could extend Plumber to speak ProtoBuf 🤘.
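To make the encode/decode round trip concrete, here is a minimal sketch using RProtoBuf. The `example.TestRequest` message and its fields `id` and `values` are hypothetical, chosen only for illustration; they are not necessarily the schema from the gist above.

```r
library(RProtoBuf)

# Hypothetical schema, written to a temporary file so the sketch is self-contained.
proto_file <- tempfile(fileext = ".proto")
writeLines(c(
  'syntax = "proto2";',
  "package example;",
  "message TestRequest {",
  "  required int32 id = 1;",
  "  repeated double values = 2;",
  "}"
), proto_file)

# Register the message descriptors from the schema with RProtoBuf.
readProtoFiles(proto_file)

# Sender side: build a message and serialize it to a raw (binary) vector.
request <- new(P("example.TestRequest"), id = 1L, values = c(0.5, 1.5, 2.5))
payload <- serialize(request, NULL)

# Receiver side: decode the raw vector using the same descriptor.
decoded <- read(P("example.TestRequest"), payload)
decoded$id
decoded$values
```

Both sides only need the same `.proto` file; the typed fields come back as an integer and a numeric vector without any manual casting.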
After browsing Stack Overflow for inspiration, and exploring Plumber’s source code and docs, we found that:
- Plumber holds the original, raw request in `req$rook.input`
- `req` is an environment, which means it can hold our ProtoBuf message, e.g. as `req$protobuf`
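Put together, these two observations suggest a filter along the following lines. This is a rough sketch of the idea, not protopretzel's actual implementation: the helper `parse_message_type()` is a hypothetical name, and the code assumes the message type travels in the Content-Type header as `messagetype=...` and that the matching `.proto` file has already been loaded.

```r
# Hypothetical helper: extract the message type from a header such as
# "application/x-protobuf; messagetype=example.TestRequest".
parse_message_type <- function(content_type) {
  if (is.null(content_type)) return(NA_character_)
  hit <- regmatches(
    content_type,
    regexec("messagetype=([A-Za-z0-9_.]+)", content_type)
  )[[1]]
  if (length(hit) < 2) NA_character_ else hit[2]
}

#* Decode ProtoBuf request bodies before they reach the endpoints.
#* @filter protobuf
function(req, res) {
  message_type <- parse_message_type(req$HTTP_CONTENT_TYPE)
  if (!is.na(message_type)) {
    # Assumes the descriptors were registered with RProtoBuf::readProtoFiles().
    req$rook.input$rewind()
    raw_body <- req$rook.input$read()
    # Store the decoded message on the request environment for the endpoint.
    req$protobuf <- RProtoBuf::read(RProtoBuf::P(message_type), raw_body)
  }
  plumber::forward()
}
```

Requests without a `messagetype` pass through untouched, so JSON endpoints in the same API would keep working.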
After some experimentation we got information flowing in the correct format, and protopretzel was born 🥨: an R package that extends Plumber with a working filter and serializer for Protocol Buffers. The package is under active development, and even though it already handles our use cases, the API might still change, so consider it experimental.
This is how you get Protocol Buffers support in Plumber using protopretzel:
- Write a `.proto` descriptor file. If you already have one, that's also great!
- Add our new serializer (before the call creating the API object), and our new filter:
https://gist.github.com/f9295978fd3e11d6301cc840ada5166f
- Tag your functions with `#* @serializer Protobuf` so they use the new serializer, and modify your functions so that they return an `RProtoBuf` object:
https://gist.github.com/c14d168cee7fe4c2e0b1afb6917de095
- Start your API.
- Send a request including the type of message in the header, so that the filter knows which type of message to unserialize:
https://gist.github.com/52a1706c4b44b751636fcba8acf920cf
The response header will include the `messagetype` of the output function in the same format, e.g. `Content-Type: application/x-protobuf; messagetype=example.TestResponse`.
- Done! Plumber is now receiving, handling and responding with binary ProtoBuf messages 😄🥇
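For testing from R itself, the request step can be sketched with httr. The function names, URL, port and endpoint below are hypothetical, and the sketch assumes the API expects the header format described above.

```r
# Build the Content-Type header value that carries the message type.
protobuf_content_type <- function(message_type) {
  paste0("application/x-protobuf; messagetype=", message_type)
}

# POST a serialized ProtoBuf message (a raw vector) to a Plumber endpoint.
send_protobuf <- function(url, payload, request_type) {
  httr::POST(
    url,
    body = payload,  # raw vectors are sent as-is
    httr::content_type(protobuf_content_type(request_type))
  )
}

# Usage, assuming an API is listening locally and the descriptors are loaded:
# response <- send_protobuf("http://localhost:8000/echo", payload, "example.TestRequest")
# result <- RProtoBuf::read(
#   RProtoBuf::P("example.TestResponse"),
#   httr::content(response, as = "raw")
# )
```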
For a simple implementation, and more examples, take a look at the protopretzel-playground repository.
There is still a lot of room for optimization and improvement, and we very much welcome PRs and issues. Above all, we hope protopretzel will be useful to a lot of R users out there! 🥨