Created
November 16, 2011 20:04
xmlentities() implemented in PHP (provides functionality similar to htmlentities())
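<?php
function xmlentities($s)
{
    // Cached across calls so the table is built only once per request.
    static $patterns = null;
    static $replacements = null;
    static $translation = null;
    if ($translation === null) {
        // Use the same encoding here as in the htmlentities() call below.
        $translation = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
        foreach ($translation as $k => $v) {
            // $v is a named entity such as &eacute;; quote it for use as a regex.
            $patterns[] = '/' . preg_quote($v, '/') . '/';
            // mb_ord() (PHP 7.2+, mbstring) returns the full code point;
            // ord() would return only the first byte of a multi-byte character.
            $replacements[] = '&#' . mb_ord($k, 'UTF-8') . ';';
        }
    }
    // Convert to named entities, then swap each one for its numeric reference.
    return preg_replace($patterns, $replacements, htmlentities($s, ENT_QUOTES, 'UTF-8'));
}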
Thanks for the pointers, Wez!
I hadn't thought about the table needing to be built each time, but I was aware of the number of regex replaces per call. This was some source I had sitting around on my blog from a long time ago, and I was just moving it here in preparation for updating my blog.
I'll update the function to include the static variables to save on table creation, but do you have any recommendations on making this a less expensive call?
I'd try to avoid needing to do this in the first place; htmlspecialchars encodes just the characters that are special to XML and HTML. If you have UTF-8 text, I would just declare the XML doc with the right charset/encoding attribute in the <?xml tag and then use the natural UTF-8 text.
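A minimal sketch of that suggestion, assuming the input is already valid UTF-8. The helper name xml_escape() and the sample <note> element are hypothetical and not part of the gist; htmlspecialchars() with ENT_XML1 (PHP 5.4+) escapes only the XML-special characters and leaves all other text as it is.

```php
<?php
// Hypothetical helper (not part of the gist): escape only the XML-special
// characters and leave every other UTF-8 character untouched.
function xml_escape($s)
{
    // ENT_QUOTES escapes both quote styles; ENT_XML1 emits &apos; for the single quote.
    return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Declare the encoding in the XML declaration so non-ASCII text needs no entities.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<note>' . xml_escape('Café & "crème" <brûlée>') . '</note>';

echo $xml, "\n";
```

With this approach there is no translation table to build and no per-entity replacement pass, at the cost of requiring consumers to read the document as UTF-8.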
Just happened to notice this; since it's likely that you're calling this more than once per page load, consider caching the table building:
also: this seems rather expensive; adds 101 regex replaces per call
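One possible way to reduce that cost, offered as a sketch and not as the gist author's code: build a single entity-to-numeric-reference map once and apply it with one strtr() pass, in place of one regex replace per entity. The function name xmlentities_strtr() is hypothetical, and the sketch assumes PHP 7.2+ with the mbstring extension for mb_ord().

```php
<?php
// Hypothetical variant (not the gist author's code): a single strtr() pass
// replaces the per-entity preg_replace() calls.
function xmlentities_strtr($s)
{
    static $map = null;
    if ($map === null) {
        $map = array();
        // Build the map once: named entity => numeric character reference.
        $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
        foreach ($table as $char => $entity) {
            $map[$entity] = '&#' . mb_ord($char, 'UTF-8') . ';';
        }
    }
    // strtr() with an array scans the string once and never re-scans replaced text.
    return strtr(htmlentities($s, ENT_QUOTES, 'UTF-8'), $map);
}

echo xmlentities_strtr('é & <'), "\n"; // prints "&#233; &#38; &#60;"
```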