@loilo
Last active July 19, 2023 02:50
Modify HTML Using PHP

Instead of relying on error-prone regular expressions and string manipulation, we can use PHP's built-in DOM extension to modify HTML.

The manipulate_html() function from this snippet takes a string of HTML code, passes each of its DOM nodes to a callback where the node can be inspected and modified, and returns the modified HTML code.

The following example modifies all images in an HTML snippet to use lazy loading:

manipulate_html('<img src="foo.jpg">', function (DOMNode $node) {
    if ($node->nodeName === 'img') {
        $node->setAttribute('loading', 'lazy');
    }
});

// Returns '<img src="foo.jpg" loading="lazy">'
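
The callback is invoked for every node in the document, not only for elements, which is why the example checks nodeName before calling setAttribute() (a method that exists on DOMElement only). The following is a minimal sketch of that behavior. It uses DOMDocument directly instead of manipulate_html(), and collect_node_names() is a hypothetical helper introduced purely for illustration:

```php
<?php
// Hypothetical helper (not part of the snippet): collect the names of all
// nodes below $parent in the order a depth-first walk visits them.
function collect_node_names(DOMNode $parent): array
{
    $names = [];

    foreach ($parent->childNodes as $node) {
        $names[] = $node->nodeName;

        if ($node->hasChildNodes()) {
            $names = array_merge($names, collect_node_names($node));
        }
    }

    return $names;
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<p>Hi <img src="foo.jpg"><!-- note --></p>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Text and comment nodes are listed next to the elements
print_r(collect_node_names($dom));
```

The list contains '#text' and '#comment' entries alongside 'p' and 'img'. A stricter guard for a callback would be `$node instanceof DOMElement && $node->tagName === 'img'`, which also lets static analysis tools know that setAttribute() is available.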

This is just a single element, but you can test this snippet with any website:

manipulate_html(
    file_get_contents('https://www.php.net/'),
    function (DOMNode $node) {
        // ...
    }
);
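
Real pages usually contain non-ASCII text, so the way the document is decoded matters. The sketch below mirrors the loading steps of manipulate_html() without modifying any nodes, to show a UTF-8 string surviving the round trip. round_trip_utf8() is a hypothetical name, and the exact serialization (raw characters versus entities) may depend on the installed libxml2 version, so no fixed output is claimed here:

```php
<?php
// Hypothetical helper (not part of the snippet): parse and re-serialize HTML
// with the UTF-8 hint, leaving all nodes untouched.
function round_trip_utf8(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);

    // The processing instruction asks libxml to read the input as UTF-8
    $dom->loadHTML(
        '<?xml encoding="utf-8" ?>' . $html,
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    // Drop the processing instruction so it does not appear in the output
    foreach (iterator_to_array($dom->childNodes) as $child) {
        if ($child->nodeType === XML_PI_NODE) {
            $dom->removeChild($child);
        }
    }

    $dom->encoding = 'UTF-8';
    $result = trim($dom->saveHTML());

    // Discard collected parse errors and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $result;
}

echo round_trip_utf8('<p>Grüße aus Köln</p>'), PHP_EOL;
```

This assumes the PHP source file itself is saved as UTF-8.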
<?php

/**
 * Recursively invoke $callback on every descendant of $domNode.
 */
function walk_dom(DOMNode $domNode, callable $callback): void
{
    // Iterate over a snapshot: childNodes is a live list, so a callback that
    // adds or removes nodes could otherwise cause siblings to be skipped
    foreach (iterator_to_array($domNode->childNodes) as $node) {
        $callback($node);

        if ($node->hasChildNodes()) {
            walk_dom($node, $callback);
        }
    }
}

/**
 * Parse $html, pass every DOM node to $callback and return the resulting HTML.
 */
function manipulate_html(string $html, callable $callback): string
{
    $dom = new DOMDocument();

    // Don't spread warnings when encountering malformed HTML
    $previousXmlErrorBehavior = libxml_use_internal_errors(true);

    // Use XML processing instruction to properly interpret document as UTF-8
    $dom->loadHTML(
        '<?xml encoding="utf-8" ?>' . $html,
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    // Remove the processing instruction again (snapshot of the live list)
    foreach (iterator_to_array($dom->childNodes) as $item) {
        if ($item->nodeType === XML_PI_NODE) {
            $dom->removeChild($item);
        }
    }

    $dom->encoding = 'UTF-8';

    walk_dom($dom, $callback);

    // Turn DOM back into HTML and remove leading/trailing whitespace
    $result = trim($dom->saveHTML());

    // Discard collected errors and restore previous XML error behavior
    libxml_clear_errors();
    libxml_use_internal_errors($previousXmlErrorBehavior);

    return $result;
}
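
A callback passed to manipulate_html() may remove nodes, not just change attributes. Because childNodes is a live list, removing nodes while iterating it directly can skip siblings. Copying the list with iterator_to_array() first avoids that. The following is a minimal standalone sketch of the pattern, where remove_elements() is a hypothetical helper and not part of the snippet:

```php
<?php
// Hypothetical helper (not part of the snippet): remove every element named
// $name below $parent, iterating over a snapshot of the live childNodes list.
function remove_elements(DOMNode $parent, string $name): void
{
    foreach (iterator_to_array($parent->childNodes) as $node) {
        if ($node->nodeName === $name) {
            $parent->removeChild($node);
        } elseif ($node->hasChildNodes()) {
            remove_elements($node, $name);
        }
    }
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<div><b>1</b><b>2</b><b>3</b></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

remove_elements($dom, 'b');

// Prints "<div></div>": all three <b> elements are gone
echo trim($dom->saveHTML()), PHP_EOL;
```

The snapshot costs one array copy per level of the tree, which is usually negligible compared to parsing the document.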