Written by Itay Bittan.
Most of our project's code was written in C. Here are some of the pitfalls we ran into:
-
While working with separate modules/sub-projects for each component of the system, we had to maintain a shared place for most of the struct definitions, so that data could be moved from one module to another (for example, via TCP sockets). It turned out that this is not a simple task: it led to a lot of dependencies between sub-projects, redundant header file includes, and sometimes even struct duplications.
-
For persistence, some C structs were written into binary files (for example, query metadata). During the project's lifecycle these structs changed (most of the time by gaining more fields). Trying to read data from the old binary files (which matched the old struct layouts) produced malformed, incorrect data and sometimes even caused segmentation faults and system crashes.
-
Starting to develop in other languages, such as C# (for our Windows desktop application), Python (end-to-end testing), and JavaScript (our node.js web server), produced ugly code that was really hard to maintain (changes to a struct in the main project had to be applied manually to the C# code as well).
-
Debugging - printing structs in C is not an easy task; we had to create a unique print function for each one in order to view its content.
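The binary-persistence pitfall is easy to reproduce. Here is a minimal sketch using Python's `struct` module (the field layouts are hypothetical, not our real structs): once a field is added to the layout, files written by the old code no longer match the new format.

```python
import struct

# Hypothetical "query metadata" record, version 1: id, hits
OLD_FMT = "<ii"
# Version 2 inserted a field in the middle: id, flags, hits
NEW_FMT = "<iii"

# A file written by the old code would contain bytes like these:
old_bytes = struct.pack(OLD_FMT, 42, 1000)

# Code built against the new layout tries to read them back:
try:
    record = struct.unpack(NEW_FMT, old_bytes)
except struct.error as e:
    # The sizes no longer match, so the read blows up -- the
    # binary-file equivalent of our malformed data and crashes.
    print("read failed: %s" % e)
```

In C the failure mode is worse: `fread` into the new struct silently fills the fields with shifted bytes, which is exactly the malformed data (and occasional segfault) described above.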
It turns out that there are a lot of great solutions for our needs: BSON (used by MongoDB), MessagePack (which supports a lot of programming languages), Avro (Apache), Thrift (also Apache), and of course protocol buffers. Most of them are quite well known, and I found the rest when I came across this tweet.
Finally, we chose protocol buffers, mainly for their simplicity (and of course because they fit our needs).
We created one central folder called `proto` for all of the `*.proto` files. Each build of the project, which triggered our `compile.sh` bash script, generated the relevant files (to support all of the programming languages we are using).
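For illustration, a minimal `Table.proto` in that folder might look like this (a hypothetical schema in the proto2 syntax of that era, not the project's real definition; the `protobuf` package matches the `protobuf::Table` type used in the C++ example below):

```protobuf
// proto/Table.proto (hypothetical example)
package protobuf;

message Table {
    required string name = 1;
    repeated string columns = 2;
    optional int32 row_count = 3;
}
```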
Now, regarding the pitfalls above:
-
Backward compatibility - by definition. See the Extending a Protocol Buffer section of the documentation for the rules that need to be followed.
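For example, a new field can be added safely as long as it gets a fresh tag number and is optional (proto2 syntax, hypothetical message):

```protobuf
// Version 2 of a hypothetical Table message. Old serialized data
// still parses: new readers see the new field as unset, and old
// readers simply skip the unknown field.
message Table {
    required string name = 1;      // existing tag numbers must never change
    repeated string columns = 2;
    optional int32 row_count = 3;
    optional string owner = 4;     // new in v2: fresh tag, optional
}
```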
-
Easy debugging - in the next sections I'll demonstrate how to print a buffer's data in each language.
-
Cross-language support - protocol buffers officially support C++, C#, Go, Java, and Python. A quick search turned up support for other languages in open-source projects; for our node.js web server we used the protobuf.js project.
-
Generate `*.pb.cc` and `*.pb.h` files for the C++ modules:

```bash
function generateCPlusPlusProtocMessages() {
    # Remove old auto generated files
    rm -rf ./proto/protoc/*.pb.cc
    rm -rf ./proto/protoc/*.pb.h
    echo -e "Generate protocol buffers messages (C++).. \c"
    # Get all *.proto files (pattern quoted so the shell doesn't expand the glob)
    PROTO=$(find ./proto/ -path . -prune -o -name '*.proto' -print)
    for FULL_NAME in $PROTO
    do
        SRC_DIR=$(dirname "$FULL_NAME")
        # Create the protoc directory if it does not exist
        mkdir -p "$SRC_DIR/protoc"
        # Generate *.pb.cc and *.pb.h files
        PROTO_RES=$(protoc -I="$SRC_DIR" --cpp_out="$SRC_DIR/protoc" "$FULL_NAME" 2>&1)
        if [ "$PROTO_RES" != "" ]; then
            echo -e "\ncpp protoc error: $PROTO_RES"
            exit 1
        fi
    done
    echo "Done"
}
```
C++ code example: let's say we have a file called `Table.proto`, and files called `MyTable.cpp`/`MyTable.h` with our code:

```cpp
// MyTable.h
#include <Table.pb.h>
```

```cpp
// MyTable.cpp
#include <MyTable.h>

MyTable::MyTable(const protobuf::Table& iTable) {
    // TODO: fill object members from the protobuf data
    // print the buffer's data
    cout << iTable.DebugString();
}

google::protobuf::Message* MyTable::ToProtoc() {
    protobuf::Table* myTable = new protobuf::Table();
    // TODO: add all table fields
    return myTable;
}
```
-
Generate `*_pb2.py` files for the Python end-to-end test code:

```bash
function generatePythonProtocMessages() {
    # Remove old auto generated files
    rm -rf ./proto/protoc/*_pb2.py
    echo -e "Generate protocol buffers messages (Python).. \c"
    # Get all *.proto files (pattern quoted so the shell doesn't expand the glob)
    PROTO=$(find ./proto/ -path . -prune -o -name '*.proto' -print)
    for FULL_NAME in $PROTO
    do
        SRC_DIR=$(dirname "$FULL_NAME")
        # Create the protoc directory if it does not exist
        mkdir -p "$SRC_DIR/protoc"
        # Generate *_pb2.py files
        PROTO_RES=$(protoc -I="$SRC_DIR" --python_out="$SRC_DIR/protoc" "$FULL_NAME" 2>&1)
        if [ "$PROTO_RES" != "" ]; then
            echo -e "\npython protoc error: $PROTO_RES"
            exit 1
        fi
    done
    echo "Done"
}
```
Python code example:
```python
import Table_pb2

# load the buffer from a file (opened in binary mode,
# since serialized protobuf data is binary)
f = open('filename', 'rb')
buffer = f.read()
try:
    table = Table_pb2.Table()
    table.ParseFromString(buffer)
    print table
except Exception as inst:
    print inst
```
-
Generate an `All.proto` file which imports all the others:

```protobuf
// All.proto
//Auto generated file - DO NOT EDIT!
import "Table.proto";
import "File1.proto";
import "File2.proto";
```

```bash
function generateAllProtoFile() {
    FILE='./proto/All.proto'
    echo "//Auto generated file - DO NOT EDIT!" > $FILE
    # Get all *.proto files (pattern quoted so the shell doesn't expand the glob)
    PROTO=$(find ./proto/ -path . -prune -o -name '*.proto' -print)
    for FULL_NAME in $PROTO
    do
        SRC_FILE=$(basename "$FULL_NAME")
        if [ "$SRC_FILE" != "All.proto" ]; then
            echo "import \"$SRC_FILE\";" >> $FILE
        fi
    done
}
```
Javascript code example:
```javascript
const fs = require('fs');
var ProtoBuf = require("protobufjs"),
    builder = ProtoBuf.loadProtoFile("./proto/All.proto"),
    protobuf = builder.build("protobuf");

var buffer = fs.readFileSync('table.data');
var newProto = protobuf.Table.decode64(buffer);
console.log(JSON.parse(newProto.encodeJSON()));
```
Finally, add calls to these functions at the beginning of the `compile.sh` script:

```bash
# Generate protocol buffers *.pb.cc *.pb.h files
generateCPlusPlusProtocMessages
# Generate protocol buffers *_pb2.py files
generatePythonProtocMessages
# Generate the All.proto file (for the web server)
generateAllProtoFile
```
-
Protobuf documentation (compile.sh) - thanks to @poponiv, we found a great open-source project called protoc-gen-doc which auto-generates documentation from our `./proto/*.proto` files. Because of some OS issues we currently run it with docker:

```dockerfile
# ProtobufDockerfile
FROM ubuntu:14.04
RUN apt-get -yqq update && \
    apt-get -yqq install \
        build-essential \
        curl \
        dh-autoreconf \
        git \
        pkg-config \
        qt5-default \
        unzip && \
    # Install protobuf
    git clone https://github.com/google/protobuf.git /protobuf && \
    cd /protobuf && \
    git checkout v2.6.1 && \
    ./autogen.sh && \
    ./configure && \
    make && \
    make install && \
    ldconfig && \
    # Install protoc-gen-doc
    git clone https://github.com/estan/protoc-gen-doc.git /protoc-gen-doc && \
    cd /protoc-gen-doc && \
    qmake && \
    make && \
    cp protoc-gen-doc /usr/local/bin && \
    # Cleanup
    apt-get -yqq remove git curl unzip dh-autoreconf && \
    apt-get -yqq clean && \
    rm -rf /protobuf /protoc-gen-doc && \
    cd / && \
    mkdir doc
ADD proto/gendoc /
#WORKDIR /proto
#ENTRYPOINT ["protoc"]
CMD ["/gendoc"]
```
Create the docker image with:

```shell
docker build -t proto -f ProtobufDockerfile .
```

Run the docker container with:

```shell
docker run --rm -v $PWD/proto:/proto proto
```
- Inheritance - not supported in protocol buffers :-(
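A common workaround is composition: instead of inheriting, embed a shared message inside each message that needs its fields (hypothetical messages for illustration):

```protobuf
// Shared fields live in their own message...
message BaseMetadata {
    optional string created_by = 1;
    optional int64 created_at = 2;
}

// ...and every message that would have "inherited" them embeds it.
message Table {
    optional BaseMetadata meta = 1;
    optional string name = 2;
}
```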