{{ message }}

Instantly share code, notes, and snippets.

# niku/slide.org

Last active Jun 1, 2019
Slide for Erlang & Elixir Fest 2019 -- https://elixir-fest.jp/

# Agenda

• How to parse a binary `00001000_10010110_00000001` to `%{1 => 150}`
• How to do by using Elixir

# Me

• https://github.com/niku
• Living in Sapporo, Hokkaido
• Work at Farmnote
• A Software and Hardware company to solve agriculutual issues particularly hard management
• Enjoy sapporo-beam
• A community to talk about ErlangVM
• Almost every Thursday
• Since 2014

# BEAM bitstring and binary basis

## Bitstring and binary in BEAM

descriptionrepresentationa kind of bitstring?a kind of binary?
bitstringa sequence of zero or more bits`<<10::size(4)>>`YesNo
binaryA bitstring its size is divisible by 8`<<30>>`**Yes**Yes

## Manupirations

TypeHow toResult
Making a binary as you readbinary**=:binary.encode_unsigned(0b10101100_00000010)=**`<<172, 2>>`
bitstring**=<<0::1, 1::1, 0::1, 1::1>>=**`<<5::size(4)>>`
Pattern matchingbinary**=<<x, rest::binary>>=** = `<<1, 2, 3>>``x` is `1`, `rest` is `<<2, 3>>`
bitstring**=<<x::1, rest::bitstring>>=** = `<<1, 2, 3>>``x` is `0`, `rest` is `<<2, 4, 3::size(7)>>`
Concatenationbinary**=<<1>> <> <<127, 233>>=**`<<1, 127, 233>>`
bitstring`a` = `<<1::1, 0::1>>;` `b` = `<<0::1, 1::1>>;` **=<<a::bitstring, b::bitstring>>=**`<<9::size(4)>>`
Conversion to integerbinary**=:binary.decode_unsigned(<<1, 255>>)=**511
bitstring`a` = `<<1::1, 0::1>>;` `size` = `bit_size(a);` `<<x::integer-size(size)>>` = `a;` **=x=**2

# ProtocolBuffers(PB)

• A mechanism for serializing structured data like XML
• simpler
• 3 to 10 times smaller
• 20 to 100 times faster
• By default gRPC uses protocol buffers

## Overview

```StructuredData:
%MyMessage{a: 150}

Proto:
message MyMessage {
optional int32 a = 1;
}

Binary:
00001000_10010110_00000001

[Serialize]              [Deserialize]

StructuredData            Binary
\__Binary         \__StructuredData
/                 /
+-----+           +-----+
|Proto|           |Proto|
+-----+           +-----+
```

## Binary

```            00001000_10010110_00000001_......
msb 0       ^||||||| |        |
key number   ^^^^||| |        |
value type       ^^^ |        |
msb 1                ^        |
msb 0                         ^
drop msb              0010110  0000001
reverse               0000001  0010110
concatenate            000000_10010110
part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+
```

# Value part

```                     00001000_10010110_00000001_..........
msb 0       ^||||||| |        |
key number   ^^^^||| |        |
+---+    value type       ^^^ |        |
| H | => msb 1                ^        |
| E | => msb 0                         ^
|   | => drop msb              0010110  0000001
| R | => reverse               0000001  0010110
| E | => concatenate            000000_10010110
+---+    part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+
```

## Specification

### Varint

• A type of a PB’s value
• Represents an integer
• Has variable-length (1 ~ 16bytes)

### MSB(Most Significant Bit)

• To divide to a varint chunk
• If msb is `1`, next byte is also a part of varint
• If msb is `0`, next byte is the last byte of varint
```                                                      10010110_00000001_...
|        |
1. MSB is 1. So next byte is also a part of varint   -+        |
2. MSB is 0. So this byte is the last byte of varint ----------+

Varint chunk:
10010110 00000001
```
```                                                      10010110_10000001_00000011_...
|        |        |
1. MSB is 1. So next byte is also a part of varint   -+        |        |
2. MSB is 1. So next byte is also a part of varint   ----------+        |
3. MSB is 0. So this byte is the last byte of varint -------------------+

Varint chunk:
10010110 10000001 00000011
```

### Converting varint binary to integer

```10010110_10000001_00000011_..........

10010110_10000001_00000011 1. Divide to a varint chunk by using msb
0010110  0000001  0000011 2. Drop msbs
0000011  0000001  0010110 3. Reverse order by each 7bit
00000_11000000_10010110 4. Concatenate bitsrings
||       |  | ||  5. Cast as an integer
||       |  | ||
||       |  | |+ 2
||       |  | +- 4
||       |  +--- 16
||       +------ 128
|+-------------- 16384
+--------------- 32768
32768 + 16384 + 128 + 16 + 4 + 2 = 49302
```

## Elixir

### Dividing to a varint chunk - Pattern matching is awesome

```b = :binary.encode_unsigned(0b10010110_10000001_00000011)
<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>> = b

msb # => 1
rest_7bits # => <<22::size(7)>>
rest_binary # => <<129, 3>>```
```10010110_10000001_00000011
+                          msb         1
+-----+                   rest_7bits  0010110
+---------------+ rest_binary 10000001_00000011
```
```defmodule MsbBinary do
def scan(<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) do
IO.inspect(msb: msb, rest_7bits: rest_7bits, rest_binary: rest_binary)
end
end
binary = :binary.encode_unsigned(0b10010110_10000001_00000011)
MsbBinary.scan(binary)```
`[msb: 1, rest_7bits: <<22::size(7)>>, rest_binary: <<129, 3>>]`

### Dividing to a varint chunk - Recursion is also awesome

```defmodule MsbBinary do
def scan(<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
IO.inspect(msb: 0, rest_7bits: rest_7bits, rest_binary: rest_binary)
end
def scan(<<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
IO.inspect(msb: 1, rest_7bits: rest_7bits, rest_binary: rest_binary)
scan(rest_binary) # Do recursive
end
end
binary = :binary.encode_unsigned(0b10010110_10000001_00000011)
MsbBinary.scan(binary)```
```[msb: 1, rest_7bits: <<22::size(7)>>, rest_binary: <<129, 3>>]
[msb: 1, rest_7bits: <<1::size(7)>>,  rest_binary: <<3>>]
[msb: 0, rest_7bits: <<3::size(7)>>,  rest_binary: ""]```

### Scanning varint

```10010110_10000001_00000011_..........

10010110_10000001_00000011 1. Divide to a varint chunk by using msb
0010110  0000001  0000011 2. Drop msbs
0000011  0000001  0010110 3. Reverse order by each 7bit
00000_11000000_10010110 4. Concatenate bitsrings
5. Cast as an integer
```
```defmodule MsbBinary do
def scan(binary) when is_binary(binary) do
do_scan(<<>>, binary)
end
defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
IO.inspect(progress: progress, msb: 0, rest_7bits: rest_7bits, rest_binary: rest_binary)
{<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
end
defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
IO.inspect(progress: progress, msb: 1, rest_7bits: rest_7bits, rest_binary: rest_binary)
do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
end
end
binary = :binary.encode_unsigned(0b10010110_10000001_00000011)
MsbBinary.scan(binary)```
```[progress: "",                 msb: 1, rest_7bits: <<22::size(7)>>, rest_binary: <<129, 3, 5, 64>>]
[progress: <<22::size(7)>>,    msb: 1, rest_7bits: <<1::size(7)>>,  rest_binary: <<3, 5, 64>>]
[progress: <<2, 22::size(6)>>, msb: 0, rest_7bits: <<3::size(7)>>,  rest_binary: <<5, 64>>]
{<<6, 4, 22::size(5)>>, <<5, 64>>} # Result```

### Converting varint

```defmodule MsbBinary do
def scan(binary) when is_binary(binary) do
do_scan(<<>>, binary)
end
defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
{<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
end
defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
end
end

defmodule Varint do
def as_int32(b) when is_bitstring(b) do
size = bit_size(b)
<<i::integer-size(size)>> = b
i
end
end

binary = :binary.encode_unsigned(0b10010110_00000001)
{chunk, _rest} = MsbBinary.scan(binary) # => {<<2, 22::size(6)>>, ""}
Varint.as_int32(chunk) # => 150```

## You’ve been able to understand the part of figure

```                     00001000_10010110_00000001_..........
msb 0       ^||||||| |        |
key number   ^^^^||| |        |
+---+    value type       ^^^ |        |
| H | => msb 1                ^        |
| E | => msb 0                         ^
|   | => drop msb              0010110  0000001
| R | => reverse               0000001  0010110
| E | => concatenate            000000_10010110
+---+    part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+
```

# Key part

```+---+                00001000_10010110_00000001_..........
| H | => msb 0       ^||||||| |        |
| E | => key number   ^^^^||| |        |
| R | => value type       ^^^ |        |
| E |    msb 1                ^        |
+---+    msb 0                         ^
drop msb              0010110  0000001
reverse               0000001  0010110
concatenate            000000_10010110
part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+
```

## Specification

### Scaning key part

• Same as varint
• Taking while msb is `1`
• The msb which is the last byte of key part is `0`
• Parsing value is a little bit different

### Key part has two information

```message MyMessage {
optional int32 a = 1;
}```
• A type of the value
• an integer between from 0 to 5
• In the example, the type of value which is bind by key a is int32 due to `int32 a`
• A value of the key
• an integer greater than `0`
• In the example, when key `a` is serialized, it represented by 1 due to `a = 1`

### Getting a type of the value

• The last 3 bits of key part
• number 3 and 4 are depricated
numberMeaningUsed For
0Varintint32, int64, uint32, uint64, sint32, sint64, bool, enum
164-bitfixed64, sfixed64, double
2Length-delimitedstring, bytes, embedded messages, packed repeated fields
532-bitfixed32, sfixed32, float

### Getting a value of the key

• The last 3 bits are “type of the value”
• Parsing as a varint except above

### Parsing key part

```            00001000_10010110_........
msb 0       ^|||||||
key number   ^^^^|||
value type       ^^^
part        +-key--+ +----value-------
```
typebitstringintegermeaning
A value of the key00011The Value of the key is 1
A type of the value0000Type of the value is Varint

## Elixir

```defmodule MsbBinary do
def scan(binary) when is_binary(binary) do
do_scan(<<>>, binary)
end
defp do_scan(progress <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
{<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
end
defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
end
end

defmodule Key do
@wire_type_size 3
@wire_types %{
0 => :varint,
1 => :"64bit",
2 => :length_delimited,
3 => :start_group,
4 => :end_group,
5 => :"32bit"
}
def parse(b) when is_bitstring(b) do
total_size = bit_size(b)
key_size = total_size - @wire_type_size
<<key_no::size(key_size), wire_type_no::size(@wire_type_size)>> = b
{key_no, @wire_types[wire_type_no]}
end
end

binary = :binary.encode_unsigned(0b00001000_10010110_00000001)
{chunk, _rest} = MsbBinary.scan(binary) # => {<<8::size(7)>>, <<150, 1>>}
Key.parse(chunk) # => {1, :varint}```

## You’ve been able to understand the part of figure

```+---+                00001000_10010110_00000001_..........
| H | => msb 0       ^||||||| |        |
| E | => key number   ^^^^||| |        |
| R | => value type       ^^^ |        |
| E |    msb 1                ^        |
+---+    msb 0                         ^
drop msb              0010110  0000001
reverse               0000001  0010110
concatenate            000000_10010110
part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+
```

# Conclusion

## You know BEAM basis about bitstring and binary

• Difference between bitstring and binary
• Make a binary as you read
• Bitstring pattern matching
• Concatenate bitstrings
• Convert bitstring to integer

## You almost know what this figure represents

```            00001000_10010110_00000001_......
msb 0       ^||||||| |        |
key number   ^^^^||| |        |
value type       ^^^ |        |
msb 1                ^        |
msb 0                         ^
drop msb              0010110  0000001
reverse               0000001  0010110
concatenate            000000_10010110
part        +-key--+ +----value------+
kv pair     +------------------------+
kv pair                                +--------+
message     +-----------------------------------+
```

## You can explain

• `00001000_10010110_00000001` is parsed to `%{1 => 150}`
```defmodule MsbBinary do
def scan(binary) when is_binary(binary) do
do_scan(<<>>, binary)
end
defp do_scan(progress <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 0 do
{<<rest_7bits::bitstring, progress::bitstring>>, rest_binary}
end
defp do_scan(progress, <<msb::1, rest_7bits::bitstring-(7), rest_binary::binary>>) when msb == 1 do
do_scan(<<rest_7bits::bitstring, progress::bitstring>>, rest_binary)
end
end
defmodule Varint do
def as_int32(b) when is_bitstring(b) do
size = bit_size(b)
<<i::integer-size(size)>> = b
i
end
end
defmodule Key do
@wire_type_size 3
@wire_types %{
0 => :varint,
1 => :"64bit",
2 => :length_delimited,
3 => :start_group,
4 => :end_group,
5 => :"32bit"
}
def parse(b) when is_bitstring(b) do
total_size = bit_size(b)
key_size = total_size - @wire_type_size
<<key_no::size(key_size), wire_type_no::size(@wire_type_size)>> = b
{key_no, @wire_types[wire_type_no]}
end
end
binary = :binary.encode_unsigned(0b00001000_10010110_00000001)
{key_part, rest} = MsbBinary.scan(binary)
{key_no, wire_type} = Key.parse(key_part) # => {1, :varint}
{value_part, _} = MsbBinary.scan(rest)
value = Varint.as_int32(value_part) # => 150```