Protocol Buffers summary

Written by Itay Bittan.

Google Protocol Buffers

Background / Motivation

Most of our project's code was written in C. Here are some of the pitfalls we suffered from:

  1. While working with separate modules/sub-projects for each component of the system, we had to maintain a shared place for most of the struct definitions so that data could be moved from one module to another (for example, via TCP sockets). This turned out not to be a simple task; it led to many dependencies between sub-projects, redundant header file includes and sometimes even struct duplication.

  2. For persistency, some C structs were written into binary files (for example, query metadata). During the project's lifecycle these structs changed (most of the time by gaining more fields). Trying to read data from the old binary files (which matched the old structs) produced malformed, incorrect data and sometimes even caused segmentation faults and system crashes.

  3. Starting to develop in other languages, such as C# (for our Windows desktop application), Python (end-to-end testing) and JavaScript (our node.js web server), produced ugly code that was really hard to maintain (changes to a struct in the main project had to be applied manually to the C# code as well).

  4. Debugging - printing structs in C is not an easy task; we had to create a unique print function for each one in order to view its content.

The solution

It turns out that there are a lot of great solutions for our needs: BSON (used by MongoDB), MessagePack (which supports many programming languages), Avro (Apache), Thrift (also Apache) and of course Protocol Buffers. Most of them are quite well known, and I found the rest when I came across this tweet.

Finally, we chose Protocol Buffers, mainly for its simplicity (and of course because it fits our needs).

Our design

We created one central folder called proto for all of the *.proto files. On each build of the project, which triggers our compile.sh bash script, we generate the relevant files (to support all of the programming languages we are using).

Now, regarding the pitfalls above:

  1. Backward compatibility - supported by definition. See the Extending a Protocol Buffer section of the official documentation for the rules that need to be followed (a short sketch follows below).
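
    A minimal sketch of what this buys us in practice (Python, assuming a generated Table_pb2 module; the fields name and row_count are hypothetical, with row_count added in a later revision of Table.proto):

      import Table_pb2

      # "Old" writer: built before row_count existed, so it only sets name.
      old = Table_pb2.Table()
      old.name = 'users'
      data = old.SerializeToString()

      # "New" reader: its Table definition already contains row_count.
      # Parsing the old bytes still works; the new field simply reports
      # that it was never set and falls back to its default value.
      new = Table_pb2.Table()
      new.ParseFromString(data)
      print(new.name)                    # 'users'
      print(new.HasField('row_count'))   # False - field absent in the old data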

  2. Easy debugging - in the next section I'll demonstrate how to print a buffer's content in each language.

  3. Cross-language - officially supports C++, C#, Go, Java and Python; a quick search found support for other languages in open-source projects. For our node.js web server we used the protobuf.js project.

    1. Generate *.pb.cc and *.pb.h files for the C++ modules:

      function generateCPlusPlusProtocMessages() {
          # Remove old auto-generated files
          rm -rf ./proto/protoc/*.pb.cc
          rm -rf ./proto/protoc/*.pb.h

          echo -e "Generate protocol buffers messages (C++).. \c"

          # Get all *.proto files (quote the pattern so the shell does not expand it)
          PROTO=$(find ./proto/ -path . -prune -o -name "*.proto" -print)

          for FULL_NAME in $PROTO
          do
              SRC_DIR=$(dirname "$FULL_NAME")

              # Create the protoc directory if it does not exist
              mkdir -p $SRC_DIR/protoc

              # Generate *.pb.cc *.pb.h files
              PROTO_RES=`protoc -I=$SRC_DIR --cpp_out=$SRC_DIR/protoc $FULL_NAME 2>&1`

              if [ "$PROTO_RES" != "" ]; then
                  echo -e "\ncpp protoc error: $PROTO_RES"
                  exit 1
              fi
          done
          echo "Done"
      }

      C++ code example: let's say we have a file called Table.proto, and files called MyTable.cpp/MyTable.h with our code:

      // MyTable.h
      #include <Table.pb.h>

      // MyTable.cpp
      #include <MyTable.h>
      #include <iostream>

      MyTable::MyTable(const protobuf::Table& iTable)
      {
          // TODO: fill object members from the protobuf data

          // Print the buffer's content
          std::cout << iTable.DebugString();
      }

      google::protobuf::Message* MyTable::ToProtoc()
      {
          protobuf::Table* myTable = new protobuf::Table();

          // TODO: set all table fields

          return myTable;
      }
    2. Generate *_pb2.py files for the Python end-to-end test code:

      function generatePythonProtocMessages() {
          # Remove old auto-generated files
          rm -rf ./proto/protoc/*_pb2.py

          echo -e "Generate protocol buffers messages (Python).. \c"

          # Get all *.proto files (quote the pattern so the shell does not expand it)
          PROTO=$(find ./proto/ -path . -prune -o -name "*.proto" -print)

          for FULL_NAME in $PROTO
          do
              SRC_DIR=$(dirname "$FULL_NAME")

              # Create the protoc directory if it does not exist
              mkdir -p $SRC_DIR/protoc

              # Generate *_pb2.py files
              PROTO_RES=`protoc -I=$SRC_DIR --python_out=$SRC_DIR/protoc $FULL_NAME 2>&1`

              if [ "$PROTO_RES" != "" ]; then
                  echo -e "\npython protoc error: $PROTO_RES"
                  exit 1
              fi
          done
          echo "Done"
      }

      Python code example:

      import Table_pb2

      # Load the serialized buffer from a file (open in binary mode)
      f = open('filename', 'rb')
      buffer = f.read()
      f.close()

      try:
          table = Table_pb2.Table()
          table.ParseFromString(buffer)
          print(table)
      except Exception as inst:
          print(inst)
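
      A minimal sketch of the write side (the name field is hypothetical, and the table.data output path is an assumption): persisting a message as a binary file like this is what replaced the raw C-struct dumps described in pitfall 2.

      import Table_pb2

      # Build a message and persist it to a binary file
      table = Table_pb2.Table()
      table.name = 'users'  # hypothetical field

      with open('table.data', 'wb') as f:
          f.write(table.SerializeToString())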
    3. Generate an All.proto file which includes all the others:

      // All.proto
      // Auto generated file - DO NOT EDIT!

      import "Table.proto";
      import "File1.proto";
      import "File2.proto";
      
      
      function generateAllProtoFile() {
          FILE='./proto/All.proto'
          echo "//Auto generated file - DO NOT EDIT!" > $FILE

          # Get all *.proto files (quote the pattern so the shell does not expand it)
          PROTO=$(find ./proto/ -path . -prune -o -name "*.proto" -print)

          for FULL_NAME in $PROTO
          do
              SRC_FILE=$(basename "$FULL_NAME")
              if [ "$SRC_FILE" != "All.proto" ]; then
                  echo "import \"$SRC_FILE\";" >> $FILE
              fi
          done
      }

      JavaScript code example:

      const fs = require('fs');
      var ProtoBuf = require("protobufjs"),
          builder = ProtoBuf.loadProtoFile("./proto/All.proto"),
          protobuf = builder.build("protobuf");
      
      var buffer = fs.readFileSync('table.data');
      var newProto = protobuf.Table.decode64(buffer);
      console.log(JSON.parse(newProto.encodeJSON()));

      Finally, add these function calls at the beginning of the compile.sh script:

      # Generate protocol buffers *.pb.cc *.pb.h files
      generateCPlusPlusProtocMessages

      # Generate protocol buffers *_pb2.py files
      generatePythonProtocMessages

      # Generate All.proto file (for the web server)
      generateAllProtoFile
  4. Protobuf documentation (compile.sh) - thanks to @poponiv, we found a great open-source project called protoc-gen-doc, which auto-generates documentation from our ./proto/*.proto files. Because of some OS issues we currently run it with Docker:

    # ProtobufDockerfile
    
    FROM ubuntu:14.04
    
    RUN apt-get -yqq update && \
        apt-get -yqq install \
          build-essential \
          curl \
          dh-autoreconf \
          git \
          pkg-config \
          qt5-default \
          unzip && \
        # Install protobuf
        git clone https://github.com/google/protobuf.git /protobuf && \
        cd /protobuf && \
          git checkout v2.6.1 && \
          ./autogen.sh && \
          ./configure && \
          make && \
          make install && \
          ldconfig && \
        # Install protoc-gen-doc
        git clone https://github.com/estan/protoc-gen-doc.git /protoc-gen-doc && \
        cd /protoc-gen-doc && \
          qmake && \
          make && \
          cp protoc-gen-doc /usr/local/bin && \
        # Cleanup
        apt-get -yqq remove git curl unzip dh-autoreconf && \
        apt-get -yqq clean && \
        rm -rf /protobuf /protoc-gen-doc && \
        cd / && \
        mkdir doc
    ADD proto/gendoc /
    #WORKDIR /proto
    #ENTRYPOINT ["protoc"]
    CMD ["/gendoc"]
    

    Create docker image with: docker build -t proto -f ProtobufDockerfile .

    Run docker container with: docker run --rm -v $PWD/proto:/proto proto

Cons

  • Inheritance - not supported in Protocol Buffers :-(