@xeoncross
Last active November 8, 2017 15:58
Sample of parsing an HTML fragment and using libxml + XPath in PHP
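<?php
$html = '<div><img src="rabbit.jpg">';

// Buffer libxml parse errors instead of emitting PHP warnings for sloppy HTML.
libxml_use_internal_errors(true);

$doc = new \DOMDocument;
// loadHTML() rejects an empty string, so fall back to a minimal document.
$doc->loadHTML($html ?: '<html></html>');
// Discard the buffered parse errors so they do not accumulate.
libxml_clear_errors();

$path = '//div';
$xp = new \DOMXPath($doc);
$nodes = $xp->query($path);

foreach ($nodes as $node) {
    print "Removing: " . print_r($node, true) . "\n";
    // Uncomment to actually remove the matched node:
    // $node->parentNode->removeChild($node);
}

// Save, stripping the doctype and the <html>/<body> wrappers that loadHTML() adds
$html = $doc->saveHTML();
print preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '', $html);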
Removing: DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] =>
    [nodeName] => div
    [nodeValue] =>
    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] =>
    [nextSibling] =>
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] =>
    [prefix] =>
    [localName] => div
    [baseURI] =>
    [textContent] =>
)
<div><img src="rabbit.jpg"></div>
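
The script above only reports the matched nodes, because the `removeChild()` call is commented out. As a rough sketch of what enabling it might look like, the helper below wraps the same steps in a function. The name `remove_matching_nodes()` and the sample input are hypothetical and not part of the original gist. It copies the query result into an array before removing anything, and it restores the previous libxml error setting when done.

```php
<?php
// Hypothetical helper (not in the original gist): removes every node that
// matches $xpath from an HTML fragment and returns the remaining markup.
function remove_matching_nodes(string $html, string $xpath): string
{
    // Buffer parse errors, remembering the caller's previous setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new \DOMDocument;
    $doc->loadHTML($html ?: '<html></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new \DOMXPath($doc);

    // Snapshot the matches so removals cannot disturb the iteration.
    foreach (iterator_to_array($xp->query($xpath)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Same wrapper-stripping regex as the script above.
    $out = preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '', $doc->saveHTML());

    return trim($out);
}

// Assumed sample input: drop the image, keep the paragraph.
print remove_matching_nodes('<div><img src="rabbit.jpg"><p>Keep me</p></div>', '//img') . "\n";
```

Because the regex strips every `html` and `body` tag, this approach assumes the fragment itself never contains those tags.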