@ghalusa
Created May 16, 2013 11:36
Parsing anchors in rendered HTML with PHP
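<?php
// Fetch the rendered HTML of the page whose anchors will be parsed.
$url = "http://www.imdb.com/movies-in-theaters/";
$input = @file_get_contents($url);
if ($input === false) {
    die("Could not access file: $url");
}
// Group 1: optional opening double quote; group 2: href value; group 3: link text.
// Unquoted hrefs also match, but single-quoted values keep their quotes.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
// s: "." also matches newlines; i: case-insensitive; U: quantifiers are lazy
// by default (a trailing "?" makes them greedy).
if (preg_match_all("/$regexp/siU", $input, $matches)) {
    // $matches[2] = array of link addresses
    // $matches[3] = array of link text, including any inner HTML
    echo "<pre>";
    var_dump($matches[2]);
    echo "</pre>";
}
?>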
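A regular expression can miss or mangle anchors whose markup varies (single-quoted values, attributes in unusual order, nested tags). A commonly used alternative is PHP's built-in DOM extension. The sketch below is not part of the original gist. The helper name `extractLinks` and the sample markup are hypothetical. It assumes ext-dom is available and that the input is ASCII or declares its charset, since `DOMDocument::loadHTML` otherwise treats the input as ISO-8859-1.

```php
<?php
// Hypothetical helper (not part of the original gist): returns each anchor's
// href and plain-text content using DOMDocument instead of a regex.
function extractLinks(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip anchors without an href attribute, e.g. <a name="top">.
        if (!$anchor->hasAttribute('href')) {
            continue;
        }
        $links[] = [
            'href' => $anchor->getAttribute('href'),
            // textContent drops inner tags, unlike $matches[3] in the regex version.
            'text' => trim($anchor->textContent),
        ];
    }
    return $links;
}

// Hypothetical sample markup: one double-quoted href with nested markup,
// one single-quoted href, and one anchor with no href at all.
$sample = '<p><a href="/movies/one">Movie <b>One</b></a>'
        . " <a href='/movies/two'>Movie Two</a> <a name=\"top\">Top</a></p>";

foreach (extractLinks($sample) as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

Run from the command line, this prints `/movies/one => Movie One` and `/movies/two => Movie Two`. To apply it to the live page, pass the string returned by `file_get_contents($url)` instead of `$sample`.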