
@hakre
Created April 10, 2012 14:05
Unicode Codepoint to UTF-8
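<?php
/**
 * Encode a single Unicode code point as UTF-8.
 *
 * @see Unicode 6.0.0, Ch. 2 General Structure; RFC 3629
 * @param int|string $codepoint e.g. 0xC9 or "U+00C9"
 * @return string the UTF-8 byte sequence (1 to 4 bytes)
 * @throws InvalidArgumentException if the value is not a valid Unicode scalar value
 */
function unicodeCodePointToUTF8($codepoint)
{
    // Accept the "U+XXXX" notation and convert it to an integer.
    if (is_string($codepoint)) {
        if (sscanf($codepoint, 'U+%x', $value) !== 1) {
            throw new InvalidArgumentException(sprintf('Expected "U+XXXX" notation, got "%s".', $codepoint));
        }
        $codepoint = $value;
    }
    if ($codepoint < 0) {
        throw new InvalidArgumentException('Lower than 0x00.');
    }
    // The Unicode codespace (and UTF-8 as restricted by RFC 3629) ends at U+10FFFF.
    if ($codepoint > 0x10FFFF) {
        throw new InvalidArgumentException('Larger than 0x10FFFF.');
    }
    // Surrogates are reserved for UTF-16 and must not be encoded in UTF-8.
    if (0xD800 <= $codepoint && $codepoint <= 0xDFFF) {
        throw new InvalidArgumentException(sprintf('High and low surrogate halves are invalid Unicode code points (U+D800 through U+DFFF, is U+%04X).', $codepoint));
    }
    // 1 byte: 0xxxxxxx
    if ($codepoint <= 0x7F) {
        return chr($codepoint);
    }
    // 2 bytes: 110xxxxx 10xxxxxx
    if ($codepoint <= 0x7FF) {
        return chr(0xC0 | (($codepoint >> 6) & 0x1F))
            . chr(0x80 | ($codepoint & 0x3F));
    }
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if ($codepoint <= 0xFFFF) {
        return chr(0xE0 | (($codepoint >> 12) & 0x0F))
            . chr(0x80 | (($codepoint >> 6) & 0x3F))
            . chr(0x80 | ($codepoint & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | (($codepoint >> 18) & 0x07))
        . chr(0x80 | (($codepoint >> 12) & 0x3F))
        . chr(0x80 | (($codepoint >> 6) & 0x3F))
        . chr(0x80 | ($codepoint & 0x3F));
}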
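To see what the shifts and masks in each branch do, here is a hand-computed sketch of the 2-byte and 3-byte cases for U+00C9 (É) and U+20AC (€). The variable names are illustrative only. Each `>>` moves the next group of payload bits into place, the mask (`& 0x1F`, `& 0x0F`, `& 0x3F`) keeps only as many bits as that byte's payload field holds, and `|` adds the fixed prefix bits (110, 1110 or 10).

```php
<?php
// Worked example of the 2- and 3-byte branches, computed step by step.

// U+00C9 (É) is above 0x7F but not above 0x7FF: 2-byte form 110xxxxx 10xxxxxx.
$cp = 0xC9;                                   // binary 11001001
$eAcute = chr(0xC0 | (($cp >> 6) & 0x1F))     // 0xC0 | 0b00011  = 0xC3
        . chr(0x80 | ($cp & 0x3F));           // 0x80 | 0b001001 = 0x89

// U+20AC (€) is above 0x7FF but not above 0xFFFF: 3-byte form 1110xxxx 10xxxxxx 10xxxxxx.
$cp = 0x20AC;                                 // binary 0010 0000 1010 1100
$euro = chr(0xE0 | (($cp >> 12) & 0x0F))      // 0xE0 | 0x2  = 0xE2
      . chr(0x80 | (($cp >> 6) & 0x3F))       // 0x80 | 0x02 = 0x82
      . chr(0x80 | ($cp & 0x3F));             // 0x80 | 0x2C = 0xAC

echo bin2hex($eAcute), ' ', bin2hex($euro), PHP_EOL; // prints "c389 e282ac"
```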
@mirhmousavi

Good snippet. You can also use the json_encode function.
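As far as I can tell, the JSON function that actually turns a code point into UTF-8 bytes is json_decode(), which decodes a JSON `\uXXXX` escape; json_encode() goes the other way. The following is a minimal sketch of that idea, assuming PHP 7 or later with the JSON extension (always available since PHP 8.0). `codePointToUTF8ViaJson()` is a hypothetical name, not part of the original gist. Code points above U+FFFF have to be written as a UTF-16 surrogate pair inside JSON, so the sketch computes that pair first.

```php
<?php
/**
 * Hypothetical helper: encode a code point as UTF-8 by letting json_decode()
 * interpret a JSON "\uXXXX" escape.
 */
function codePointToUTF8ViaJson(int $codepoint): string
{
    if ($codepoint < 0 || $codepoint > 0x10FFFF || ($codepoint >= 0xD800 && $codepoint <= 0xDFFF)) {
        throw new InvalidArgumentException(sprintf('Invalid code point: %d.', $codepoint));
    }
    if ($codepoint <= 0xFFFF) {
        // BMP code point: a single \uXXXX escape.
        $escape = sprintf('\\u%04x', $codepoint);
    } else {
        // Supplementary code point: JSON needs a UTF-16 surrogate pair.
        $offset = $codepoint - 0x10000;
        $escape = sprintf('\\u%04x\\u%04x', 0xD800 | ($offset >> 10), 0xDC00 | ($offset & 0x3FF));
    }
    // json_decode() returns the decoded JSON string as UTF-8.
    return json_decode('"' . $escape . '"');
}

echo bin2hex(codePointToUTF8ViaJson(0x1F600)), PHP_EOL; // prints "f09f9880"
```

This trades the explicit bit arithmetic for a round trip through the JSON parser, which is shorter but hides the encoding rules and is likely slower.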
