@asika32764
Created August 5, 2014 02:48
<?php
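/**
 * Return the Unicode code point of the UTF-8 character that starts at byte $offset.
 * $offset is advanced to the next character, or set to -1 once the end of the string is reached.
 * Assumes well-formed UTF-8; bytes that cannot start a multibyte sequence are returned unchanged.
 */
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    if ($code >= 192 && $code < 248) { // multibyte lead byte; otherwise 0xxxxxxx or an invalid lead byte
        if ($code < 224) $bytesnumber = 2; // 110xxxxx
        else if ($code < 240) $bytesnumber = 3; // 1110xxxx
        else $bytesnumber = 4; // 11110xxx
        // Strip the length-marker bits from the lead byte.
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        // Each continuation byte contributes its low 6 bits.
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset++;
            $code2 = ord(substr($string, $offset, 1)) - 128; // 10xxxxxx
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}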
// $offset is passed by reference because UTF-8 characters vary in byte length:
// the function advances it to the start of the next character, which makes it easy to iterate over a string:
$text = "abcàê߀abc";
$offset = 0;
while ($offset >= 0) {
    echo $offset.": ".ordutf8($text, $offset)."\n";
}
/* prints (byte offset: code point):
0: 97
1: 98
2: 99
3: 224
5: 234
7: 223
9: 8364
12: 97
13: 98
14: 99
*/
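// A minimal cross-check sketch, not part of the original gist. It assumes PHP >= 7.2 with the
// mbstring extension loaded; the helper name codepoints_builtin() is hypothetical.
// It collects code points with the built-in mb_ord(), which should agree with a hand-rolled
// decoder on well-formed UTF-8 input.
function codepoints_builtin($text) {
    $points = array();
    // The /u modifier makes PCRE split on UTF-8 characters rather than on single bytes.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $points[] = mb_ord($char, 'UTF-8');
    }
    return $points;
}
echo implode(" ", codepoints_builtin("abcàê߀abc"))."\n";
// prints: 97 98 99 224 234 223 8364 97 98 99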