arganzheng/protocal-buffer-study.md

## protocal-buffer-study.md

| title | layout |
| --- | --- |
| Protocol Buffers study notes | post |
  
## Background (Why Google Invented PB)

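The "A bit of history" section of the official documentation explains why PB was created. At first it was mainly meant to solve version-compatibility problems and to let data be used from multiple languages: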

> Protocol buffers were designed to solve many of these problems:
>
> - New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
> - Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)

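Users still had to write the parsing code by hand, though, so automatic code generation was added later, together with support for using PB as a persistent storage format and for defining RPC interfaces.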

> As the system evolved, it acquired a number of other features and uses:
>
> - Automatically-generated serialization and deserialization code avoided the need for hand parsing.
> - In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
> - Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

## Language Guide

```
message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;    // Which page number do we want?
  optional int32 result_per_page = 3 [default = 10];  // Number of results to return per page.
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];
}
```
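To make the `required` / `optional` / `[default = ...]` semantics of this message concrete, here is a minimal hand-written Python analogue. It is not protoc output: the class layout and the method names `set_query`, `set_page_number`, `has_page_number` and `is_initialized` are hypothetical (real generated Python classes use plain attribute access plus `HasField()` and `IsInitialized()`). It assumes proto2 behaviour: an unset field reads back as its declared default, or as the type-specific default if none is declared.

```python
from enum import IntEnum
from typing import Optional


class Corpus(IntEnum):
    UNIVERSAL = 0
    WEB = 1
    IMAGES = 2
    LOCAL = 3
    NEWS = 4
    PRODUCTS = 5
    VIDEO = 6


class SearchRequest:
    """Hand-written analogue of the proto2 SearchRequest (hypothetical, not generated code)."""

    def __init__(self) -> None:
        # None means "not set"; reads then fall back to a default.
        self._query: Optional[str] = None
        self._page_number: Optional[int] = None
        self._result_per_page: Optional[int] = None
        self._corpus: Optional[Corpus] = None

    def set_query(self, value: str) -> None:
        self._query = value

    def set_page_number(self, value: int) -> None:
        self._page_number = value

    def has_page_number(self) -> bool:
        # optional fields track presence separately from their value
        return self._page_number is not None

    @property
    def query(self) -> str:
        # No explicit default: the type-specific default for strings is "".
        return "" if self._query is None else self._query

    @property
    def page_number(self) -> int:
        # No explicit default: numeric types default to 0.
        return 0 if self._page_number is None else self._page_number

    @property
    def result_per_page(self) -> int:
        # Explicit [default = 10] from the .proto file.
        return 10 if self._result_per_page is None else self._result_per_page

    @property
    def corpus(self) -> Corpus:
        # Explicit [default = UNIVERSAL], which is also the first enum value.
        return Corpus.UNIVERSAL if self._corpus is None else self._corpus

    def is_initialized(self) -> bool:
        # "required": a well-formed message must have query set.
        return self._query is not None


req = SearchRequest()
print(req.is_initialized())                    # False: required query missing
print(req.page_number, req.has_page_number())  # 0 False
print(req.result_per_page, req.corpus.name)    # 10 UNIVERSAL
req.set_query("protobuf")
print(req.is_initialized())                    # True
```

Keeping "is it set?" separate from "what is its value?" is what lets proto2 tell an explicitly written `0` apart from an absent field.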

A field can have one of the following three modifiers:

- `required`: a well-formed message must have exactly one of this field.
- `optional`: a well-formed message can have zero or one of this field (but not more than one).
- `repeated`: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved. For example: `repeated int32 samples = 4 [packed=true];`
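The `[packed=true]` option in the `repeated` example changes only the wire layout. The sketch below encodes the field both ways in plain Python, assuming the standard encoding rules: a field key is `(field_number << 3) | wire_type` written as a varint; without `packed` (the proto2 default) every element carries its own key with wire type 0; with `packed`, one key with wire type 2 is followed by the payload length and then all elements back to back. The function names are illustrative, not part of any protobuf library, and only non-negative values are handled.

```python
from typing import List


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least-significant group first.

    The high bit of each byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Field key = (field_number << 3) | wire_type, itself varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)


def encode_unpacked_varints(field_number: int, values: List[int]) -> bytes:
    """Unpacked repeated field: each element has its own key (wire type 0 = varint)."""
    return b"".join(encode_key(field_number, 0) + encode_varint(v) for v in values)


def encode_packed_varints(field_number: int, values: List[int]) -> bytes:
    """Packed repeated field: one key (wire type 2 = length-delimited),
    the payload length in bytes, then the varint elements back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    return encode_key(field_number, 2) + encode_varint(len(payload)) + payload


samples = [3, 270, 86942]  # values for `repeated int32 samples = 4`
print(encode_unpacked_varints(4, samples).hex())  # 2003208e02209ea705 (9 bytes)
print(encode_packed_varints(4, samples).hex())    # 2206038e029ea705 (8 bytes)
```

The saving is one key byte per element minus the length prefix, so packing pays off as soon as a field holds more than a couple of values.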

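### Features

- Comments, using C/C++-style `//` and `/* ... */` syntax.
- Required-field validation (`required`).
- Optional fields and default values, e.g. `optional int32 result_per_page = 3 [default = 10];`

  > If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For bools, the default value is false. For numeric types, the default value is zero. For enums, the default value is the first value listed in the enum's type definition.

- Lists, via the `repeated` modifier.
- Enums, but C++-style enums, not the more powerful enums Java has.
- The builder pattern.
- Custom (message) types. Maps and sets are not supported (protobuf 3.0 and later do add `map<K, V>` fields; sets are still absent).

  ```
  message SearchResponse {
    repeated Result result = 1;
  }

  message Result {
    required string url = 1;
    optional string title = 2;
    repeated string snippets = 3;
  }
  ```

- Imports of other .proto files (similar to `#include` of header files in C):

  > You can use definitions from other .proto files by importing them. To import another .proto's definitions, you add an import statement to the top of your file:

  ```
  import "myproject/other_protos.proto";
  ```

- Nested type definitions, similar to nested classes in Java.
- `int32`, `uint32`, `int64`, `uint64`, and `bool` are all compatible: a field can be changed from one of these types to another without breaking forward or backward compatibility. If a parsed number does not fit the field's type, PB truncates it, with the same effect as a cast in C++ (e.g. a 64-bit number read as an `int32` is truncated to 32 bits).
- Extensions: an odd feature; I'm not sure it is really necessary.
- Namespaces (`package`).
- Service definitions (Defining Services):

  ```
  service SearchService {
    rpc Search (SearchRequest) returns (SearchResponse);
  }
  ```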
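The truncation rule for the compatible integer types can be sketched in a few lines of Python. This assumes the C++ cast semantics the official guide refers to: keep the low 32 bits, and for a signed target reinterpret them as two's complement; for `bool`, any non-zero value reads back as true. The helper names `as_int32`, `as_uint32` and `as_bool` are hypothetical; real runtimes apply these conversions internally while parsing.

```python
def as_int32(value: int) -> int:
    """Like a C++ cast to int32: keep the low 32 bits, reinterpret as signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def as_uint32(value: int) -> int:
    """Like a C++ cast to uint32: keep only the low 32 bits."""
    return value & 0xFFFFFFFF


def as_bool(value: int) -> bool:
    """Like a C++ cast to bool: any non-zero value becomes true."""
    return value != 0


# An int64 field changed to int32: large values lose their high bits.
print(as_int32(5_000_000_000))  # 705032704
print(as_int32(2**31))          # -2147483648
# An int32 field changed to uint32: negative values wrap around.
print(as_uint32(-1))            # 4294967295
print(as_bool(2))               # True
```

So the schema change itself is safe, but values outside the new type's range are silently altered rather than rejected.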


## Types

### 1. Scalar Value Types

| .proto Type | Notes | C++ Type | Java Type | Python Type[2] |
| --- | --- | --- | --- | --- |
| double | | double | double | float |
| float | | float | float | float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] |
| uint32 | Uses variable-length encoding. | uint32 | int[1] | int/long[3] |
| uint64 | Uses variable-length encoding. | uint64 | long[1] | int/long[3] |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[3] |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2<sup>28</sup>. | uint32 | int[1] | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2<sup>56</sup>. | uint64 | long[1] | int/long[3] |
| sfixed32 | Always four bytes. | int32 | int | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long[3] |
| bool | | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode[4] |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str |

You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.

[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[2] In all cases, setting values to a field will perform type checking to make sure it is valid.

[3] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[4] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).
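The encoding notes in the table (why `int32` is wasteful for negative numbers, why `sint32` fixes that, and where `fixed32`/`fixed64` start to win) can be checked with a short Python sketch. It assumes the standard rules: a varint carries 7 payload bits per byte; negative `int32`/`int64` values are written as their 64-bit two's-complement pattern; `sint32` first applies the ZigZag mapping `(n << 1) ^ (n >> 31)`. The helpers `varint_size` and `zigzag32` are illustrative names, not library functions.

```python
def varint_size(value: int) -> int:
    """Number of bytes in the base-128 varint encoding of value.

    Negative int32/int64 values are written as their 64-bit two's-complement
    bit pattern, so they always need 10 bytes.
    """
    value &= (1 << 64) - 1  # reinterpret as an unsigned 64-bit integer
    size = 1
    while value >= 0x80:    # each byte carries 7 payload bits
        value >>= 7
        size += 1
    return size


def zigzag32(n: int) -> int:
    """ZigZag mapping used by sint32: 0->0, -1->1, 1->2, -2->3, ...

    Assumes n fits in a signed 32-bit integer. Python's >> is arithmetic,
    so n >> 31 is 0 for n >= 0 and -1 (all one bits) for n < 0.
    """
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


# Negative numbers: bytes as int32 vs bytes as sint32
for v in (1, -1, -300):
    print(v, varint_size(v), varint_size(zigzag32(v)))
# 1 1 1
# -1 10 1
# -300 10 2

# uint32 needs 5 bytes from 2**28 on, while fixed32 is always 4
print(varint_size(2**28 - 1), varint_size(2**28))  # 4 5
# uint64 needs 9 bytes from 2**56 on, while fixed64 is always 8
print(varint_size(2**56 - 1), varint_size(2**56))  # 8 9
```

ZigZag interleaves negative and positive values so that numbers of small magnitude map to small unsigned values, which keeps their varints short regardless of sign.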