@ramsey
Created November 16, 2011 20:04
xmlentities() implemented in PHP (provides functionality similar to htmlentities())
<?php
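/**
 * Encode a UTF-8 string for XML by converting HTML named entities
 * to numeric character references.
 */
function xmlentities($s)
{
    // Build the pattern and replacement tables once and reuse them across calls.
    static $patterns = null;
    static $replacements = null;
    static $translation = null;
    if ($translation === null) {
        // Request the UTF-8 table explicitly so it matches the htmlentities() call below.
        $translation = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
        foreach ($translation as $k => $v) {
            $patterns[] = "/$v/";
            // mb_ord() (PHP 7.2+, mbstring) returns the full code point;
            // ord() would return only the first byte of a multibyte character.
            $replacements[] = '&#' . mb_ord($k, 'UTF-8') . ';';
        }
    }
    return preg_replace($patterns, $replacements, htmlentities($s, ENT_QUOTES, 'UTF-8'));
}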
@wez

wez commented Nov 18, 2011

Just happened to notice this; since it's likely that you're calling this more than once per page load, consider caching the table building:

function xmlentities($s) {
    static $patterns = null;
    static $reps = null;
    static $tbl = null;
    if ($tbl === null) {
        $tbl = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
        foreach ($tbl as $k => $v) {
            $patterns[] = "/$v/";
            $reps[] = '&#' . ord($k) . ';';
        }
    }
    return preg_replace($patterns, $reps, htmlentities($s, ENT_QUOTES, 'UTF-8'));
}

Also, this seems rather expensive: it adds 101 regex replacements per call.
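
One possible way to cut that cost, offered as a sketch rather than anything proposed in this thread, is to map each raw character straight to its numeric reference and apply the whole map in a single strtr() pass. The name xmlentities_strtr() is hypothetical, and the sketch assumes PHP 7.2+ with the mbstring extension (for mb_ord()) and valid UTF-8 input.

```php
<?php
// Hypothetical alternative to xmlentities(): one strtr() pass instead of
// one preg_replace() per entity. Assumes PHP 7.2+ with mbstring.
function xmlentities_strtr(string $s): string
{
    static $map = null;
    if ($map === null) {
        $map = [];
        // Keys are raw UTF-8 characters, values are HTML 4.01 named entities.
        $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        foreach ($table as $char => $entity) {
            // Map the character itself to a numeric character reference.
            $map[$char] = '&#' . mb_ord($char, 'UTF-8') . ';';
        }
    }
    // strtr() with an array never rescans text it has already replaced,
    // so the "&" inside an emitted "&#38;" is not encoded a second time.
    return strtr($s, $map);
}
```

Because this version skips htmlentities() entirely, bytes that are not valid UTF-8 pass through unchanged instead of being rejected, so it is only a fit when the input is already known to be clean.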

@ramsey
Author

ramsey commented Nov 18, 2011

Thanks for the pointers, Wez!

I hadn't thought about the table needing to be built each time, but I was aware of the number of regex replaces per call. This was some source I had sitting around on my blog from a long time ago, and I was just moving it here in preparation for updating my blog.

I'll update the function to include the static variables to save on table creation, but do you have any recommendations on making this a less expensive call?

@wez

wez commented Nov 19, 2011

I'd try to avoid needing to do this in the first place; htmlspecialchars encodes just the characters that are special to XML and HTML. If you have UTF-8 text, I would just declare the right encoding in the document's <?xml ... ?> declaration and then use the natural UTF-8 text.
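
A minimal sketch of that suggestion might look like the following. The sample string and the title element are made up for illustration, and the ENT_XML1 flag assumes PHP 5.4 or later.

```php
<?php
// Hypothetical input, used only for illustration.
$title = 'Café "Zoë" & <friends>';

// With ENT_XML1, htmlspecialchars() escapes only the characters that are
// special in XML; the accented letters stay as plain UTF-8 text.
$escaped = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');

// Declare the encoding up front so parsers decode the UTF-8 bytes correctly.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<title>' . $escaped . '</title>' . "\n";

echo $xml;
```

With this approach no entity table or per-call regex work is needed at all; the only requirement is that the string really is UTF-8 and that the declared encoding matches.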
