@jakzal
Last active January 22, 2024 13:18
Removing nodes with DomCrawler
<?php

<<<CONFIG
packages:
    - "symfony/dom-crawler: ~2.3"
    - "symfony/css-selector: ~2.3"
CONFIG;

use Symfony\Component\DomCrawler\Crawler;

$html = <<<HTML
<html>
    <div class="content">
        <h2 class="gamma">Excerpt</h2>
        <p>...content html...</p>
    </div>
    <div class="content">
        <h2 class="gamma">Excerpt</h2>
        <p>...more content html...</p>
    </div>
</html>
HTML;

$crawler = new Crawler($html, 'http://localhost');

// remove all h2 nodes inside .content
$crawler->filter('html .content h2')->each(function (Crawler $crawler) {
    foreach ($crawler as $node) {
        $node->parentNode->removeChild($node);
    }
});

// output .content nodes with h2 removed
$crawler->filter('html .content')->each(function (Crawler $crawler) {
    echo $crawler->html();
});
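
For readers who want to try the same removal without installing the Symfony components, here is a minimal sketch using only PHP's bundled DOM extension. It is not part of the gist: the XPath expression is a hand-written approximation of the `.content h2` selector, the variable names are illustrative, and it assumes ext-dom is enabled. The `<body>` wrapper is added only to keep the markup well formed for `loadHTML()`.

```php
<?php
// Sketch only: plain ext-dom instead of DomCrawler (assumes the DOM extension is enabled).
$html = <<<HTML
<html>
<body>
<div class="content">
<h2 class="gamma">Excerpt</h2>
<p>...content html...</p>
</div>
<div class="content">
<h2 class="gamma">Excerpt</h2>
<p>...more content html...</p>
</div>
</body>
</html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Hand-written XPath approximations of the CSS selectors ".content" and ".content h2".
$contentQuery = '//*[contains(concat(" ", normalize-space(@class), " "), " content ")]';
$headingQuery = $contentQuery . '//h2';

// Copy the matches into an array so the loop iterates over a fixed list,
// then detach each heading from its parent.
foreach (iterator_to_array($xpath->query($headingQuery)) as $node) {
    $node->parentNode->removeChild($node);
}

// Count the h2 elements left in the document after the removal.
$remaining = $xpath->query('//h2')->length;

// Print each .content element. Unlike Crawler::html(), saveHTML($node)
// includes the element's own opening and closing tags.
foreach ($xpath->query($contentQuery) as $div) {
    echo $document->saveHTML($div), PHP_EOL;
}
```

The removal step is the same `parentNode->removeChild()` call the gist uses; only the way the nodes are selected differs.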
@jakzal

jakzal commented Apr 1, 2015

Run this with melody:

melody run https://gist.github.com/jakzal/8dd52d3df9a49c1e5922

@howtomakeaturn

this helps me today. thanks!

@Verron

Verron commented Jul 19, 2016

Thanks for this. Awesome example.

@Insolita

+1 Thanks

@yog-strina

Thank you, man with a glorious beer

@Exadra37

Exadra37 commented Aug 1, 2017

👍

@NinoSkopac

much appreciated

@NinoSkopac

Since each() gets one node at a time, this has the same effect:

$crawler->filter('html .content h2')->each(function (Crawler $crawler) {
    $node = $crawler->getNode(0);
    $node->parentNode->removeChild($node);
});

IMO this is more readable

@broiniac

broiniac commented Oct 29, 2021

> Since each() gets one node at a time, this has the same effect

👍 Nice catch

@ajmeese7

> Since each() gets one node at a time, this has the same effect:
>
> $crawler->filter('html .content h2')->each(function (Crawler $crawler) {
>     $node = $crawler->getNode(0);
>     $node->parentNode->removeChild($node);
> });
>
> IMO this is more readable

@NinoSkopac I am aware that it has been several years since your message, but for anyone stumbling across this thread as I have, be aware that you can't use this method in WebDriver mode now. You get the following error:

Uncaught InvalidArgumentException: The "getNode" method cannot be used in WebDriver mode. Use "getElement" instead.

@eduance

eduance commented Jan 22, 2024

You're amazing thanks!!