@ericstone57
Created December 4, 2014 03:30
sanitize utf8mb4 characters
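/**
 * Sanitize utf8mb4 characters: replace every 4-byte UTF-8 sequence with a
 * space so the string can be stored in a 3-byte MySQL "utf8" column.
 */
function _utf8_4byte_to_3byte($input) {
  if (!empty($input)) {
    // Lead-byte bit patterns and the masks that isolate their marker bits.
    $utf8_2byte = 0xC0 /*1100 0000*/; $utf8_2byte_bmask = 0xE0 /*1110 0000*/;
    $utf8_3byte = 0xE0 /*1110 0000*/; $utf8_3byte_bmask = 0xF0 /*1111 0000*/;
    $utf8_4byte = 0xF0 /*1111 0000*/; $utf8_4byte_bmask = 0xF8 /*1111 1000*/;
    $sanitized = "";
    $len = strlen($input);
    for ($i = 0; $i < $len; ++$i) {
      $mb_char = $input[$i]; // Potentially the lead byte of a multibyte sequence
      $byte = ord($mb_char);
      if (($byte & $utf8_2byte_bmask) == $utf8_2byte) {
        // 2-byte sequence. substr() avoids reading past the end of truncated input.
        $mb_char = substr($input, $i, 2);
        $i += 1;
      }
      elseif (($byte & $utf8_3byte_bmask) == $utf8_3byte) {
        // 3-byte sequence: copy the lead byte and both continuation bytes.
        $mb_char = substr($input, $i, 3);
        $i += 2;
      }
      elseif (($byte & $utf8_4byte_bmask) == $utf8_4byte) {
        // 4-byte sequence: replace with a space to avoid a MySQL exception
        // (use '?' instead for a visible placeholder) and skip its 3 continuation bytes.
        $mb_char = ' ';
        $i += 3;
      }
      $sanitized .= $mb_char;
    }
    $input = $sanitized;
  }
  return $input;
}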
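
/**
 * Usage sketch (not part of the original gist). The sample strings are
 * hypothetical; "\xF0\x9F\x98\x80" is the 4-byte UTF-8 encoding of U+1F600.
 * Assumes the input is well-formed UTF-8.
 */
// The four emoji bytes collapse to a single space: "Hi  !" (two spaces).
$clean = _utf8_4byte_to_3byte("Hi \xF0\x9F\x98\x80!");
// The 2-byte "é" and the 3-byte "€" pass through unchanged.
$same = _utf8_4byte_to_3byte("caf\xC3\xA9 \xE2\x82\xAC");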