@marinarioagalliu
Last active January 28, 2021 16:09
$crawler = $client->request('GET', $urlToCrawl);
$crawler->filter('#rg292169')->each(function ($node) {
    if ($node->filter('.LinkUrl')->count() || $node->filter('.PageLink_Sx')->count()) {
        //echo "has link";
        // Collect all parsed links, recursing into each new one.
        // $node is passed in with use() so the inner closure can see it.
        $node->filter('a')->each(function ($link) use ($node) {
            $url = $link->attr('href');
            // If I call filter() again here, for example:
            $node->filter('a > span')->each(function ($span) {
                // is this treated as a new request or not?
            });
            // attr() returns null when the href attribute is missing.
            if ($url !== null && strlen($url) > 1 && !$this->linkExists($url)) {
                $this->linkArray[] = $url;
                sleep(5); // pause before crawling the next link
                $this->recordParseLinks($url);
            }
        });
    }
});