@arganzheng
Created January 14, 2014 01:52
title: Protocol Buffers Study Notes
layout: post

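Background (Why Google Invented PB)

The "A bit of history" section of the official documentation explains why PB was created.

At first it was mainly meant to solve version compatibility and cross-language calls: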

Protocol buffers were designed to solve many of these problems:

  • New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
  • Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)

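However, users still had to write the parsing code themselves, so code auto-generation was added later. Support for persistent data storage and for RPC interface definitions was added as well.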

As the system evolved, it acquired a number of other features and uses:

  • Automatically-generated serialization and deserialization code avoided the need for hand parsing.
  • In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
  • Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

Language Guide

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2; // Which page number do we want?
  optional int32 result_per_page = 3 [default = 10]; // Number of results to return per page.
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];
}

A field can carry one of the following three modifiers:

  • required: a well-formed message must have exactly one of this field.
  • optional: a well-formed message can have zero or one of this field (but not more than one).
  • repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved. For example: repeated int32 samples = 4 [packed=true];
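
As a rough illustration of what [packed=true] changes on the wire, here is a minimal Python sketch. It assumes the standard protobuf wire format (a key of field_number << 3 | wire_type, then base-128 varints). The helper names encode_varint, encode_unpacked and encode_packed are made up for this note and are not part of any protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_unpacked(field_number, values):
    """Unpacked repeated field: every element carries its own key."""
    key = encode_varint((field_number << 3) | 0)  # wire type 0 = varint
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field_number, values):
    """Packed repeated field: one key, one length, then all elements back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return key + encode_varint(len(payload)) + payload


samples = [3, 270, 86942]
print(encode_unpacked(4, samples).hex())
print(encode_packed(4, samples).hex())
```

The packed form pays for the key once, so it is shorter whenever the field holds more than a couple of elements.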

Features

  • Supports comments, in C/C++ style.
  • Supports validation of required fields.
  • Supports optional fields and default values, e.g. optional int32 result_per_page = 3 [default = 10];

If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For bools, the default value is false. For numeric types, the default value is zero. For enums, the default value is the first value listed in the enum's type definition.

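  • Supports lists, via the repeated modifier.

  • Supports enums, but these are C++-style enums, not as powerful as Java enums.

  • Builder pattern.

  • Supports user-defined types. Map and Set are not supported.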

    message SearchResponse {
      repeated Result result = 1;
    }

    message Result {
      required string url = 1;
      optional string title = 2;
      repeated string snippets = 3;
    }

  • Supports external references (similar to #include of a header file in C).

You can use definitions from other .proto files by importing them. To import another .proto's definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

  • Supports nested type definitions, similar to nested classes in Java.

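  • int32, uint32, int64, uint64, and bool are all compatible: a field can be changed from one of these types to another without breaking compatibility. If a parsed number does not fit the new type, PB truncates it, as a C++ cast to that type would.

  • Supports Extensions: an odd feature; it does not seem particularly necessary to me.

  • Supports namespaces (package).

  • Supports service definitions (Defining Services).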

    service SearchService {
      rpc Search (SearchRequest) returns (SearchResponse);
    }
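
The truncation mentioned in the compatibility bullet above (int32, uint32, int64, uint64, bool) can be sketched in Python. This is only a model of what a C++ cast does to a value that does not fit the target type; the function names are hypothetical and nothing here calls a protobuf library.

```python
def cast_to_uint32(value):
    """Keep only the low 32 bits, like static_cast<uint32_t> in C++."""
    return value & 0xFFFFFFFF


def cast_to_int32(value):
    """Keep the low 32 bits and reinterpret them as a two's-complement int32."""
    low32 = value & 0xFFFFFFFF
    if low32 & 0x80000000:  # bit 31 set: the int32 value is negative
        return low32 - (1 << 32)
    return low32


def cast_to_bool(value):
    """Any non-zero integer reads back as true."""
    return value != 0


# A 64-bit value written by a newer schema, read through an int32 field.
written = (1 << 32) + 5
print(cast_to_int32(written))   # the upper 32 bits are dropped
print(cast_to_uint32(-1))       # all 32 low bits set
print(cast_to_bool(2))
```

So changing a field between these types keeps old and new readers parsing, but a reader with the narrower type silently loses the high bits.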

Types

1. Scalar Value Types

| .proto Type | Notes | C++ Type | Java Type | Python Type[2] |
|---|---|---|---|---|
| double | | double | double | float |
| float | | float | float | float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] |
| uint32 | Uses variable-length encoding. | uint32 | int[1] | int/long[3] |
| uint64 | Uses variable-length encoding. | uint64 | long[1] | int/long[3] |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[3] |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[1] | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[1] | int/long[3] |
| sfixed32 | Always four bytes. | int32 | int | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long[3] |
| bool | | bool | boolean | boolean |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode[4] |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str |

You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.
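
The size notes in the type table can be checked with a small Python sketch. It assumes the standard wire format: base-128 varints, ZigZag mapping for sint32, and little-endian fixed32. The function names are made up for illustration and do not come from a protobuf library.

```python
import struct


def encode_varint(value):
    """Encode an unsigned integer (up to 64 bits) as a base-128 varint."""
    if value < 0:
        raise ValueError("mask negative values to 64 bits before encoding")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit
        else:
            out.append(low7)
            return bytes(out)


def encode_int32(value):
    """int32: a negative value is sign-extended to 64 bits before varint encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value):
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value):
    """sint32: ZigZag first, then varint."""
    return encode_varint(zigzag32(value))


def encode_fixed32(value):
    """fixed32: always four little-endian bytes."""
    return struct.pack("<I", value)


for v in (1, -1, -300):
    print(v, len(encode_int32(v)), len(encode_sint32(v)))

# Four varint bytes carry 4 * 7 = 28 payload bits, hence the 2^28 threshold.
for v in (2**28 - 1, 2**28):
    print(v, len(encode_varint(v)), len(encode_fixed32(v)))
```

A negative int32 always costs ten bytes as a varint, while sint32 costs a byte or two for small magnitudes. From 2^28 upward a uint32 varint needs five bytes, one more than fixed32.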

[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[2] In all cases, setting values to a field will perform type checking to make sure it is valid.

[3] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[4] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).
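
Footnote [1] can be made concrete with a short Python sketch. It uses the standard struct module to reinterpret the same four bytes as unsigned and as signed, which models how a uint32 field value appears in a Java int. The helper names are hypothetical.

```python
import struct


def as_java_int(unsigned_value):
    """Reinterpret the 32 bits of a uint32 as a signed int32 (what a Java int would hold)."""
    return struct.unpack("<i", struct.pack("<I", unsigned_value))[0]


def as_unsigned(java_int):
    """Recover the unsigned reading from the signed one by masking to 32 bits."""
    return java_int & 0xFFFFFFFF


for value in (5, 2**31, 2**32 - 1):
    signed = as_java_int(value)
    print(value, signed, as_unsigned(signed))
```

Values below 2^31 look the same either way. Larger ones show up as negative numbers on the Java side, and the caller has to mask them to get the unsigned reading back.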

2.
