@aadl
Created December 12, 2009 17:21
<?php
/*
Converts III's bracketed Unicode codes (e.g. {u02BB}) into UTF-8 encoded characters.
We have found this more reliable than using the pre-encoded values from the interface. The XRecord gives the bracketed output.
This presumes your III database is stored in Unicode. III can convert your database to Unicode for you; call the helpdesk.
If you see codes like {231} without the 'u', then either the database hasn't been converted or the values are being entered in the old format.
Example Input: al-I{u02BB}tir{u0101}f{u0101}t
Example Output: al-Iʻtirāfāt
*/
$matches = array();
$string = "al-I{u02BB}tir{u0101}f{u0101}t"; // our example string
print "Input: $string\n";
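preg_match_all('/\{u([0-9a-fA-F]{4})\}/', $string, $matches, PREG_SET_ORDER); // find all the {uXXXX} codes, capturing the four hex digits
foreach ($matches as $match) {
    // $match[0] is the full code (e.g. "{u02BB}"), $match[1] is just the hex digits (e.g. "02BB")
    $code = hexdec($match[1]); // convert the hex digits to decimal; passing the braces and 'u' to hexdec() is deprecated as of PHP 7.4
    $character = html_entity_decode("&#$code;", ENT_NOQUOTES, 'UTF-8'); // decode the decimal entity into a UTF-8 char
    $string = str_replace($match[0], $character, $string); // replace every occurrence of the code with the UTF-8 char
}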
print "Output: $string\n";
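
/*
A possible variation, offered as a sketch rather than as part of the original script:
the same conversion wrapped in a reusable function, so it can be applied to any field value.
The function name iii_brackets_to_utf8 is hypothetical. It assumes the same {uXXXX} format
described above and uses preg_replace_callback to replace each code in a single pass.
*/
function iii_brackets_to_utf8($text) {
    return preg_replace_callback(
        '/\{u([0-9a-fA-F]{4})\}/',
        function ($m) {
            $code = hexdec($m[1]); // the captured hex digits as a decimal code point
            return html_entity_decode("&#$code;", ENT_NOQUOTES, 'UTF-8'); // decimal entity to UTF-8 char
        },
        $text
    );
}
print "Function output: " . iii_brackets_to_utf8("al-I{u02BB}tir{u0101}f{u0101}t") . "\n";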
?>