Skip to content

Instantly share code, notes, and snippets.

@muratpurc
Created September 22, 2011 10:30
Show Gist options
  • Save muratpurc/1234502 to your computer and use it in GitHub Desktop.
Save muratpurc/1234502 to your computer and use it in GitHub Desktop.
PHP: UTF-8 encoding detection
/**
* Checks, if passed string is utf-8 encoded or not.
*
* @param string $string String to check
* @return int Number of found UTF-8 character
*/
function mp_detectUTF8($string)
{
return preg_match('%(?:
[\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
|\xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
|\xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
|\xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
|[\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
|\xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)+%xs', $string
);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment