Skip to content

Instantly share code, notes, and snippets.

@niku
Last active Jun 1, 2019
Embed
What would you like to do?
Slide for Erlang & Elixir Fest 2019 -- https://elixir-fest.jp/

Protocol Buffers implementation with using Elixir

Agenda

  • How to parse a binary 00001000_10010110_00000001 to %{1 => 150}
  • How to do by using Elixir

Me

  • https://github.com/niku
  • Living in Sapporo, Hokkaido
  • Work at Farmnote
    • A Software and Hardware company to solve agriculutual issues particularly hard management
  • Enjoy sapporo-beam
    • A community to talk about ErlangVM
    • Almost every Thursday
    • Since 2014

BEAM bitstring and binary basis

Bitstring and binary in BEAM

descriptionrepresentationa kind of bitstring?a kind of binary?
bitstringa sequence of zero or more bits<<10::size(4)>>YesNo
binaryA bitstring its size is divisible by 8<<30>>**Yes**Yes

Manupirations

TypeHow toResult
Making a binary as you readbinary**=:binary.encode_unsigned(0b10101100_00000010)=**<<172, 2>>
bitstring**=<<0::1, 1::1, 0::1, 1::1>>=**<<5::size(4)>>
Pattern matchingbinary**=<<x, rest::binary>>=** = <<1, 2, 3>>x is 1, rest is <<2, 3>>
bitstring**=<<x::1, rest::bitstring>>=** = <<1, 2, 3>>x is 0, rest is <<2, 4, 3::size(7)>>
Concatenationbinary**=<<1>> <> <<127, 233>>=**<<1, 127, 233>>
bitstringa = <<1::1, 0::1>>; b = <<0::1, 1::1>>; **=<<a::bitstring, b::bitstring>>=**<<9::size(4)>>
Conversion to integerbinary**=:binary.decode_unsigned(<<1, 255>>)=**511
bitstringa = <<1::1, 0::1>>; size = bit_size(a); <<x::integer-size(size)>> = a; **=x=**2

ProtocolBuffers(PB)

  • A mechanism for serializing structured data like XML
    • simpler
    • 3 to 10 times smaller
    • 20 to 100 times faster
  • By default gRPC uses protocol buffers

Overview

StructuredData:
%MyMessage{a: 150}

Proto:
message MyMessage {
  optional int32 a = 1;
}

Binary:
00001000_10010110_00000001

[Serialize]              [Deserialize]

StructuredData            Binary
              \__Binary         \__StructuredData
              /                 /
       +-----+           +-----+
       |Proto|           |Proto|
       +-----+           +-----+

Binary

            00001000_10010110_00000001_......
msb 0       ^||||||| |        |
key number   ^^^^||| |        |
value type       ^^^ |        |
msb 1                ^        |
msb 0                         ^
drop msb              0010110  0000001
reverse               0000001  0010110
concatenate            000000_10010110
part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+

Value part

                     00001000_10010110_00000001_..........
         msb 0       ^||||||| |        |
         key number   ^^^^||| |        |
+---+    value type       ^^^ |        |
| H | => msb 1                ^        |
| E | => msb 0                         ^
|   | => drop msb              0010110  0000001
| R | => reverse               0000001  0010110
| E | => concatenate            000000_10010110
+---+    part        +-key--+ +----value------+
         kv pair     +------------------------+
         kv pair                                +--------+
         message     +-----------------------------------+

Specification

Varint

  • A type of a PB’s value
  • Represents an integer
  • Has variable-length (1 ~ 16bytes)

MSB(Most Significant Bit)

  • To divide to a varint chunk
  • If msb is 1, next byte is also a part of varint
  • If msb is 0, next byte is the last byte of varint
                                                      10010110_00000001_...
                                                      |        |
1. MSB is 1. So next byte is also a part of varint   -+        |
2. MSB is 0. So this byte is the last byte of varint ----------+

Varint chunk:
10010110 00000001
                                                      10010110_10000001_00000011_...
                                                      |        |        |
1. MSB is 1. So next byte is also a part of varint   -+        |        |
2. MSB is 1. So next byte is also a part of varint   ----------+        |
3. MSB is 0. So this byte is the last byte of varint -------------------+

Varint chunk:
10010110 10000001 00000011

Converting varint binary to integer

10010110_10000001_00000011_..........

10010110_10000001_00000011 1. Divide to a varint chunk by using msb
 0010110  0000001  0000011 2. Drop msbs
 0000011  0000001  0010110 3. Reverse order by each 7bit
   00000_11000000_10010110 4. Concatenate bitsrings
         ||       |  | ||  5. Cast as an integer
         ||       |  | ||
         ||       |  | |+ 2
         ||       |  | +- 4
         ||       |  +--- 16
         ||       +------ 128
         |+-------------- 16384
         +--------------- 32768
32768 + 16384 + 128 + 16 + 4 + 2 = 49302

Elixir

Dividing to a varint chunk - Pattern matching is awesome

b = :binary.encode_unsigned(0b10010110_10000001_00000011)
<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>> = b

msb # => 1
rest_7bits # => <<22::size(7)>>
rest_binary # => <<129, 3>>
10010110_10000001_00000011
+                          msb         1
 +-----+                   rest_7bits  0010110
         +---------------+ rest_binary 10000001_00000011
defmodule MsbBinary do
  def scan(<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) do
    IO.inspect(msb: msb, rest_7bits: rest_7bits, rest_binary: rest_binary)
  end
end
binary = :binary.encode_unsigned(0b10010110_10000001_00000011)
MsbBinary.scan(binary)
[msb: 1, rest_7bits: <<22::size(7)>>, rest_binary: <<129, 3>>]

Dividing to a varint chunk - Recursion is also awesome

defmodule MsbBinary do
  def scan(<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
    IO.inspect(msb: 0, rest_7bits: rest_7bits, rest_binary: rest_binary)
  end
  def scan(<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
    IO.inspect(msb: 1, rest_7bits: rest_7bits, rest_binary: rest_binary)
    scan(rest_binary) # Do recursive
  end
end
binary = :binary.encode_unsigned(0b10010110_10000001_00000011)
MsbBinary.scan(binary)
[msb: 1, rest_7bits: <<22::size(7)>>, rest_binary: <<129, 3>>]
[msb: 1, rest_7bits: <<1::size(7)>>,  rest_binary: <<3>>]
[msb: 0, rest_7bits: <<3::size(7)>>,  rest_binary: ""]

Scanning varint

10010110_10000001_00000011_..........

10010110_10000001_00000011 1. Divide to a varint chunk by using msb
 0010110  0000001  0000011 2. Drop msbs
 0000011  0000001  0010110 3. Reverse order by each 7bit
   00000_11000000_10010110 4. Concatenate bitsrings
                           5. Cast as an integer
defmodule MsbBinary do
  def scan(binary) when is_binary(binary) do
    do_scan(<<>>, binary)
  end
  defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
    IO.inspect(progress: progress, msb: 0, rest_7bits: rest_7bits, rest_binary: rest_binary)
    {<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
  end
  defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
    IO.inspect(progress: progress, msb: 1, rest_7bits: rest_7bits, rest_binary: rest_binary)
    do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
  end
end
binary = :binary.encode_unsigned(0b10010110_10000001_00000011)
MsbBinary.scan(binary)
[progress: "",                 msb: 1, rest_7bits: <<22::size(7)>>, rest_binary: <<129, 3, 5, 64>>]
[progress: <<22::size(7)>>,    msb: 1, rest_7bits: <<1::size(7)>>,  rest_binary: <<3, 5, 64>>]
[progress: <<2, 22::size(6)>>, msb: 0, rest_7bits: <<3::size(7)>>,  rest_binary: <<5, 64>>]
{<<6, 4, 22::size(5)>>, <<5, 64>>} # Result

Converting varint

defmodule MsbBinary do
  def scan(binary) when is_binary(binary) do
    do_scan(<<>>, binary)
  end
  defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
    {<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
  end
  defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
    do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
  end
end

defmodule Varint do
  def as_int32(b) when is_bitstring(b) do
    size = bit_size(b)
    <<i::integer-size(size)>> = b
    i
  end
end

binary = :binary.encode_unsigned(0b10010110_00000001)
{chunk, _rest} = MsbBinary.scan(binary) # => {<<2, 22::size(6)>>, ""}
Varint.as_int32(chunk) # => 150

You’ve been able to understand the part of figure

                     00001000_10010110_00000001_..........
         msb 0       ^||||||| |        |
         key number   ^^^^||| |        |
+---+    value type       ^^^ |        |
| H | => msb 1                ^        |
| E | => msb 0                         ^
|   | => drop msb              0010110  0000001
| R | => reverse               0000001  0010110
| E | => concatenate            000000_10010110
+---+    part        +-key--+ +----value------+
         kv pair     +------------------------+
         kv pair                                +--------+
         message     +-----------------------------------+

Key part

+---+                00001000_10010110_00000001_..........
| H | => msb 0       ^||||||| |        |
| E | => key number   ^^^^||| |        |
| R | => value type       ^^^ |        |
| E |    msb 1                ^        |
+---+    msb 0                         ^
         drop msb              0010110  0000001
         reverse               0000001  0010110
         concatenate            000000_10010110
         part        +-key--+ +----value------+
         kv pair     +------------------------+
         kv pair                                +--------+
         message     +-----------------------------------+

Specification

Scaning key part

  • Same as varint
    • Taking while msb is 1
    • The msb which is the last byte of key part is 0
  • Parsing value is a little bit different

Key part has two information

message MyMessage {
  optional int32 a = 1;
}
  • A type of the value
    • an integer between from 0 to 5
    • In the example, the type of value which is bind by key a is int32 due to int32 a
  • A value of the key
    • an integer greater than 0
    • In the example, when key a is serialized, it represented by 1 due to a = 1

Getting a type of the value

  • The last 3 bits of key part
  • number 3 and 4 are depricated
numberMeaningUsed For
0Varintint32, int64, uint32, uint64, sint32, sint64, bool, enum
164-bitfixed64, sfixed64, double
2Length-delimitedstring, bytes, embedded messages, packed repeated fields
532-bitfixed32, sfixed32, float

Getting a value of the key

  • The last 3 bits are “type of the value”
  • Parsing as a varint except above

Parsing key part

            00001000_10010110_........
msb 0       ^|||||||
key number   ^^^^|||
value type       ^^^
part        +-key--+ +----value-------
typebitstringintegermeaning
A value of the key00011The Value of the key is 1
A type of the value0000Type of the value is Varint

Elixir

defmodule MsbBinary do
  def scan(binary) when is_binary(binary) do
    do_scan(<<>>, binary)
  end
  defp do_scan(progress <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
    {<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
  end
  defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
    do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
  end
end

defmodule Key do
  @wire_type_size 3
  @wire_types %{
    0 => :varint,
    1 => :"64bit",
    2 => :length_delimited,
    3 => :start_group,
    4 => :end_group,
    5 => :"32bit"
  }
  def parse(b) when is_bitstring(b) do
    total_size = bit_size(b)
    key_size = total_size - @wire_type_size
    <<key_no::size(key_size), wire_type_no::size(@wire_type_size)>> = b
    {key_no, @wire_types[wire_type_no]}
  end
end

binary = :binary.encode_unsigned(0b00001000_10010110_00000001)
{chunk, _rest} = MsbBinary.scan(binary) # => {<<8::size(7)>>, <<150, 1>>}
Key.parse(chunk) # => {1, :varint}

You’ve been able to understand the part of figure

+---+                00001000_10010110_00000001_..........
| H | => msb 0       ^||||||| |        |
| E | => key number   ^^^^||| |        |
| R | => value type       ^^^ |        |
| E |    msb 1                ^        |
+---+    msb 0                         ^
         drop msb              0010110  0000001
         reverse               0000001  0010110
         concatenate            000000_10010110
         part        +-key--+ +----value------+
         kv pair     +------------------------+
         kv pair                                +--------+
         message     +-----------------------------------+

Conclusion

You know BEAM basis about bitstring and binary

  • Difference between bitstring and binary
  • Make a binary as you read
  • Bitstring pattern matching
  • Concatenate bitstrings
  • Convert bitstring to integer

You almost know what this figure represents

            00001000_10010110_00000001_......
msb 0       ^||||||| |        |
key number   ^^^^||| |        |
value type       ^^^ |        |
msb 1                ^        |
msb 0                         ^
drop msb              0010110  0000001
reverse               0000001  0010110
concatenate            000000_10010110
part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+

You can explain

  • 00001000_10010110_00000001 is parsed to %{1 => 150}
defmodule MsbBinary do
  def scan(binary) when is_binary(binary) do
    do_scan(<<>>, binary)
  end
  defp do_scan(progress <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
    {<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
  end
  defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
    do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
  end
end
defmodule Varint do
  def as_int32(b) when is_bitstring(b) do
    size = bit_size(b)
    <<i::integer-size(size)>> = b
    i
  end
end
defmodule Key do
  @wire_type_size 3
  @wire_types %{
    0 => :varint,
    1 => :"64bit",
    2 => :length_delimited,
    3 => :start_group,
    4 => :end_group,
    5 => :"32bit"
  }
  def parse(b) when is_bitstring(b) do
    total_size = bit_size(b)
    key_size = total_size - @wire_type_size
    <<key_no::size(key_size), wire_type_no::size(@wire_type_size)>> = b
    {key_no, @wire_types[wire_type_no]}
  end
end
binary = :binary.encode_unsigned(0b00001000_10010110_00000001)
{key_part, rest} = MsbBinary.scan(binary)
{key_no, wire_type} = Key.parse(key_part) # => {1, :varint}
{value_part, _} = MsbBinary.scan(rest)
value = Varint.as_int32(value_part) # => 150

Appendix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment