@franz-josef-kaiser
Last active August 29, 2015 14:13
Fetch a remote file and list all of its links using PHP's \DOMDocument
<?php
// @author Jack (Original Author) incl. error handling
// @link http://codegolf.stackexchange.com/a/44340/11940
// Collect libxml errors internally; remember the previous setting so it can be restored later
$libxml_previous_state = libxml_use_internal_errors( true );
$source = 'http://stroustrup.com/C++.html';
$dom = new \DOMDocument;
// Dumb error suppression (@) is not needed here, as libxml errors are handled above
// @$dom->loadHTMLFile( $source );
$dom->loadHTMLFile( $source );
$anchors = $dom->getElementsByTagName('a');
foreach ( $anchors as $a ) {
    $url = $a->getAttribute('href');
    // Only print absolute http(s) links
    if ( in_array( parse_url( $url, PHP_URL_SCHEME ), [ 'http', 'https', ], true ) ) {
        echo "{$url}<br>";
    }
}
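// Discard the collected libxml errors (inspect them with libxml_get_errors() first if needed)
libxml_clear_errors();
// Restore the previous setting so we do not mess up global error handling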
libxml_use_internal_errors( $libxml_previous_state );
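
The same save, collect, and restore pattern can be wrapped in a function that returns the libxml errors instead of silently clearing them. What follows is only a sketch under a few assumptions: `extract_http_links()` is a hypothetical helper name that is not part of the original gist, it parses an HTML string with `loadHTML()` rather than fetching a URL (so it can be tried without network access), and it compares the scheme case-insensitively, which the original does not do.

```php
<?php
// Hypothetical helper (not in the original gist): returns the absolute
// http(s) links found in $html together with any libxml parse errors.
function extract_http_links( string $html ): array {
    // Collect libxml errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors( true );

    $dom   = new \DOMDocument;
    $links = [];

    // Guard against an empty string, which loadHTML() rejects.
    if ( '' !== $html && $dom->loadHTML( $html ) ) {
        foreach ( $dom->getElementsByTagName( 'a' ) as $a ) {
            $url    = $a->getAttribute( 'href' );
            $scheme = parse_url( $url, PHP_URL_SCHEME );
            // parse_url() yields null or false when there is no usable scheme.
            if ( is_string( $scheme ) && in_array( strtolower( $scheme ), [ 'http', 'https' ], true ) ) {
                $links[] = $url;
            }
        }
    }

    // Hand the errors to the caller, then clear the buffer and restore state.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    return [ 'links' => $links, 'errors' => $errors ];
}

// Usage: relative and mailto: links are skipped.
$result = extract_http_links(
    '<a href="http://example.com/">a</a><a href="/relative">b</a><a href="mailto:x@example.com">c</a>'
);
foreach ( $result['links'] as $link ) {
    echo $link, PHP_EOL;
}
```

Returning the errors leaves it to the caller to decide whether malformed markup is worth logging, while the global libxml setting is still restored on every path.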
@datibbaw

Now that this is no longer in the context of code golf, please read this answer as well :)

@franz-josef-kaiser

@datibbaw Didn't see your comment in the notifications. Anyway: Thanks for the heads up :)
