@jsvisa
Created May 20, 2016 12:19
Bad UTF8 in Elixir
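defmodule Check do
  # Hand-rolled UTF-8 validity check; the standard library's String.valid?/1 does the same job.
  # Byte ranges follow RFC 3629: no overlong forms, no surrogates, nothing above U+10FFFF.
  def is_utf8?(""), do: true

  # 1 byte: 0*** ****
  def is_utf8?(<<char::size(8), rest::binary>>) when char <= 0x7F, do: is_utf8?(rest)

  # 2 bytes: 110* ****, 10** **** (0xC0 and 0xC1 could only start overlong forms)
  def is_utf8?(<<f::size(8), s::size(8), rest::binary>>)
      when f in 0xC2..0xDF and s in 0x80..0xBF,
      do: is_utf8?(rest)

  # 3 bytes: 1110 ****, 10** ****, 10** **** (0xE0 80..9F is overlong, 0xED A0..BF is a surrogate)
  def is_utf8?(<<f::size(8), s::size(8), t::size(8), rest::binary>>)
      when t in 0x80..0xBF and
             ((f == 0xE0 and s in 0xA0..0xBF) or
                (f in 0xE1..0xEC and s in 0x80..0xBF) or
                (f == 0xED and s in 0x80..0x9F) or
                (f in 0xEE..0xEF and s in 0x80..0xBF)),
      do: is_utf8?(rest)

  # 4 bytes: 1111 0***, then three 10** **** bytes (U+10000..U+10FFFF only)
  def is_utf8?(<<f::size(8), s::size(8), t::size(8), u::size(8), rest::binary>>)
      when t in 0x80..0xBF and u in 0x80..0xBF and
             ((f == 0xF0 and s in 0x90..0xBF) or
                (f in 0xF1..0xF3 and s in 0x80..0xBF) or
                (f == 0xF4 and s in 0x80..0x8F)),
      do: is_utf8?(rest)

  # Anything else (stray continuation byte, truncated sequence, invalid lead byte) is not UTF-8.
  def is_utf8?(_rest), do: false
end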