@tyre
Last active August 29, 2015 14:22

Binary/Bitstring Matching

Introduction

Binary matching is a powerful feature in Elixir for extracting information from binaries and for pattern matching on them. This article gives a short overview of the options available when matching binaries and demonstrates a few common use cases.

Uses

Binary matching can be used by itself to extract information from binaries:

iex> <<"Hello, ", place::binary>> = "Hello, World"
"Hello, World"
iex> place
"World"

Or as a part of function definitions to pattern match:

defmodule ImageTyper do
  # Leading bytes that identify each file format.
  @png_signature <<137::size(8), 80::size(8), 78::size(8), 71::size(8),
                   13::size(8), 10::size(8), 26::size(8), 10::size(8)>>
  @jpg_signature <<255::size(8), 216::size(8)>>

  # The remainder is unused, so it is bound to an underscored name.
  def type(<<@png_signature, _rest::binary>>), do: :png
  def type(<<@jpg_signature, _rest::binary>>), do: :jpg
  def type(_), do: :unknown
end
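
A quick way to check these clauses is to call them with hand-built headers. The sketch below repeats the module under the hypothetical name ImageTyperSketch so that it runs on its own; the sample inputs are made-up headers followed by filler bytes, not real image files.

```elixir
defmodule ImageTyperSketch do
  # Same signatures as above; the module name is hypothetical.
  @png_signature <<137, 80, 78, 71, 13, 10, 26, 10>>
  @jpg_signature <<255, 216>>

  def type(<<@png_signature, _rest::binary>>), do: :png
  def type(<<@jpg_signature, _rest::binary>>), do: :jpg
  def type(_), do: :unknown
end

# Made-up inputs: a signature followed by arbitrary filler bytes.
fake_png = <<137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13>>
fake_jpg = <<255, 216, 255, 224>>

IO.inspect(ImageTyperSketch.type(fake_png))      # :png
IO.inspect(ImageTyperSketch.type(fake_jpg))      # :jpg
IO.inspect(ImageTyperSketch.type("plain text"))  # :unknown
```

Clauses are tried from top to bottom, so the catch-all clause has to come last.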

Types

There are 9 types used in binary matching:

integer, float, bits (alias for bitstring), bitstring, binary, bytes (alias for binary), utf8, utf16, utf32

When no type is specified, the default is integer.
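
To make the types concrete, here is a minimal sketch of three of them in use. The variable names and sample values are only illustrative.

```elixir
# No type given: each segment is matched as an 8-bit unsigned integer.
<<a, b>> = <<1, 2>>
IO.inspect({a, b})           # {1, 2}

# utf8 matches one code point, however many bytes it occupies.
word = "\u00E9" <> "a"
<<first::utf8, rest::binary>> = word
IO.inspect(first)            # 233
IO.inspect(rest)             # "a"

# bitstring (or bits) can hold data that is not a whole number of bytes.
<<flag::size(1), tail::bitstring>> = <<1::size(1), 5::size(3)>>
IO.inspect(flag)             # 1
IO.inspect(bit_size(tail))   # 3
```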

Unit and Size

The length of the match is equal to the unit (a number of bits) times the size (the number of repeated segments of length unit).

Type      Default Unit
integer   1 bit
float     1 bit
binary    8 bits
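
The unit-times-size rule can be checked with a few small matches. This is a sketch with arbitrary sample bytes.

```elixir
# size(2) with unit(8) matches 2 * 8 = 16 bits.
<<x::integer-size(2)-unit(8), rest::binary>> = <<1, 2, 3>>
IO.inspect(x)          # 258, that is 1 * 256 + 2
IO.inspect(rest)       # <<3>>

# binary has a default unit of 8 bits, so size(2) means two bytes.
<<head::binary-size(2), tail::binary>> = "abcdef"
IO.inspect(head)       # "ab"
IO.inspect(tail)       # "cdef"

# integer has a default unit of 1 bit, so size(4) means four bits.
<<hi::size(4), lo::size(4)>> = <<0xAB>>
IO.inspect({hi, lo})   # {10, 11}
```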

Sizes for types are a bit more nuanced. The default size for integers is 8.

For floats, the default size is 64. For floats, size * unit must result in 32 or 64, corresponding to IEEE 754 binary32 and binary64, respectively. (Erlang/OTP 24 and later also accept 16, for binary16.)
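
As a small sketch of the two float widths, the value 1.5 is used because it is exactly representable in both binary32 and binary64 and therefore round-trips unchanged.

```elixir
# Default float size is 64 bits (binary64).
encoded64 = <<1.5::float>>
IO.inspect(byte_size(encoded64))   # 8

# An explicit size of 32 gives binary32.
encoded32 = <<1.5::float-size(32)>>
IO.inspect(byte_size(encoded32))   # 4

# Matching uses the same specifiers as construction.
<<a::float>> = encoded64
<<b::float-size(32)>> = encoded32
IO.inspect({a, b})                 # {1.5, 1.5}
```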

For binaries, the default is the size of the binary. Only the last binary in a binary match can use the default size. All others must have their size specified explicitly, even if the match is unambiguous.

For example:

iex> <<name::binary, " the ", species::binary>> = <<"Frank the Walrus">>
** (CompileError): a binary field without size is only allowed at the end of a binary pattern
iex> <<name::binary-size(5), " the ", species::binary>> = <<"Frank the Walrus">>
"Frank the Walrus"
iex> {name, species}
{"Frank", "Walrus"}
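
An explicit size does not have to be a literal: it can also come from a variable bound earlier in the same pattern. The sketch below assumes a hypothetical wire format (one length byte followed by that many payload bytes) and a hypothetical module name FrameSketch.

```elixir
defmodule FrameSketch do
  # Hypothetical format: one length byte, then that many payload bytes.
  def split(<<len::size(8), payload::binary-size(len), rest::binary>>) do
    {:ok, payload, rest}
  end

  # Anything too short for its declared length falls through to here.
  def split(_), do: :error
end

IO.inspect(FrameSketch.split(<<5, "Frank", " the Walrus">>))
# {:ok, "Frank", " the Walrus"}

IO.inspect(FrameSketch.split(<<9, "short">>))
# :error
```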

Modifiers

Some types have associated modifiers to clear up ambiguity in byte representation. The following table lists each modifier and the types it applies to:

Modifier             Relevant Type(s)
signed               integer
unsigned (default)   integer
little               integer, float, utf16, utf32
big (default)        integer, float, utf16, utf32
native               integer, float, utf16, utf32

Sign

Integers can be signed or unsigned, defaulting to unsigned.

iex> <<int::integer>> = <<-100>>
<<156>>
iex> int
156
iex> <<int::integer-signed>> = <<-100>>
<<156>>
iex> int
-100
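
The sign modifier changes only how the bits are interpreted, not the bits themselves. A brief sketch with arbitrary sample bytes:

```elixir
# Constructing with -100 or 156 yields the same byte (256 - 100 = 156).
IO.inspect(<<-100>> == <<156>>)   # true

# The same 8 bits read two ways.
<<unsigned::integer-unsigned>> = <<255>>
<<signed::integer-signed>> = <<255>>
IO.inspect({unsigned, signed})    # {255, -1}

# Signedness applies to wider integers as well: 0xFFFE is -2 in 16 bits.
<<wide::integer-signed-size(16)>> = <<255, 254>>
IO.inspect(wide)                  # -2
```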

Endianness

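Elixir has three options for endianness: big, little, and native. The default is big. native is determined by the VM at startup and follows the byte order of the host machine; on a little-endian host, as in the session below, it behaves like little.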

iex> <<number::little-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
256
iex> <<number::big-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
1
iex> <<number::native-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
256
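
Because the native result depends on the host, a portable check can compare it against System.endianness/0 rather than a fixed number. This is a sketch; the variable names are illustrative.

```elixir
bytes = <<0, 1>>

<<little::little-integer-size(16)>> = bytes
<<big::big-integer-size(16)>> = bytes
<<native::native-integer-size(16)>> = bytes

# System.endianness/0 reports the byte order the VM is running with.
expected =
  case System.endianness() do
    :little -> little
    :big -> big
  end

IO.inspect({little, big})        # {256, 1}
IO.inspect(native == expected)   # true
```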