@GromNaN
Created December 1, 2011 22:54
How to detect a UTF-8 string in PHP
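<?php
/**
 * Checks whether $string contains at least one well-formed multibyte UTF-8 sequence.
 *
 * The pattern is unanchored and has no ASCII branch, so it does not validate
 * the whole string: pure ASCII input gives 0, and invalid bytes elsewhere in
 * the string do not prevent a match.
 *
 * @return int|false 1 on a match, 0 on no match, false on a PCRE error
 */
function isUtf8($string)
{
    return preg_match('%(?:'
        . '[\xC2-\xDF][\x80-\xBF]'              // non-overlong 2-byte
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'         // 3-byte, excluding overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'  // straight 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'         // 3-byte, excluding surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'      // planes 1-3
        . '|[\xF1-\xF3][\x80-\xBF]{3}'          // planes 4-15
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'      // plane 16
        . ')+%xs', $string);
}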
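
The pattern above is unanchored and has no ASCII branch, so it reports whether a string contains at least one well-formed multibyte sequence, not whether the whole string is valid UTF-8. If a whole-string check is what is wanted, a minimal sketch could look like the following. `isValidUtf8` is a hypothetical name, not part of the gist, and the check relies on PCRE's own UTF-8 validation, which the `u` modifier triggers.

```php
<?php
// Hypothetical helper, not part of the gist: returns true only when every
// byte of $string belongs to a well-formed UTF-8 sequence (ASCII included).
function isValidUtf8(string $string): bool
{
    // With the "u" modifier PCRE validates the subject as UTF-8 before
    // matching. The empty pattern matches any valid subject (result 1);
    // an invalid subject makes preg_match() return false instead.
    return preg_match('//u', $string) === 1;
}

var_dump(isValidUtf8('plain ASCII'));   // ASCII is valid UTF-8
var_dump(isValidUtf8("\xC3\x93"));      // "Ó" encoded as UTF-8
var_dump(isValidUtf8("\xD3"));          // lone lead byte, not valid
```

Note that a check like this only says the bytes are well-formed UTF-8. It cannot tell whether the text was actually written in UTF-8, because short strings in other 8-bit encodings can happen to be well-formed too.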

cay89 commented Apr 25, 2018

It works perfectly, thanks!


wloges commented Sep 13, 2019

Not always perfectly. Try isUtf8('SPÓŁDZIELNIA').
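
One possible reading of this report, assuming the string reached the function encoded as ISO-8859-2 (or Windows-1250) rather than UTF-8: in those encodings "Ó" is the byte 0xD3 and "Ł" is 0xA3, and that byte pair happens to be a well-formed 2-byte UTF-8 sequence. The sketch below reproduces that with hand-written byte strings. The variable names are illustrative, and only the 2-byte branch of the pattern is used.

```php
<?php
// Assumption: the input was ISO-8859-2, where "Ó" is 0xD3 and "Ł" is 0xA3.
$latin2 = "SP\xD3\xA3DZIELNIA";

// The same word encoded as UTF-8: "Ó" is C3 93 and "Ł" is C5 81.
$utf8 = "SP\xC3\x93\xC5\x81DZIELNIA";

// Only the 2-byte branch of the gist's pattern; enough for this example.
$twoByte = '%[\xC2-\xDF][\x80-\xBF]%';

// D3 A3 decodes as U+04E3 in UTF-8, so the ISO-8859-2 bytes match as well.
$latin2Matches = preg_match($twoByte, $latin2) === 1;
$utf8Matches   = preg_match($twoByte, $utf8) === 1;

var_dump($latin2Matches, $utf8Matches); // both are true
```

Under that assumption the result is a false positive that byte-pattern checks cannot avoid, since the ISO-8859-2 bytes are themselves well-formed UTF-8.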
