@pixelsoul
Created February 12, 2019 04:05
Parse robots.txt file for sitemaps
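<?php
// Return the sitemap URLs declared in a robots.txt file.
function robotsFile($url){
    // file_get_contents() returns false on failure; @ silences the warning
    // so the failure can be handled here instead.
    $file = @file_get_contents($url);
    if ($file === false) {
        return array();
    }
    // Match "Sitemap:" at the start of a line, case-insensitively, with
    // any amount of spacing before the URL.
    $pattern = '/^[ \t]*Sitemap:[ \t]*(\S+)/im';
    preg_match_all($pattern, $file, $match);
    // $match[1] already holds the captured URLs as a list.
    return $match[1];
}
print_r(robotsFile("https://www.cnn.com/robots.txt"));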
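The function above fetches and parses in one step, which makes it hard to try out without a network connection. As a rough sketch, the parsing can be split into its own helper that works on a string. The name `extractSitemaps`, the sample content, and the de-duplication step are assumptions for illustration and are not part of the original gist.

```php
<?php
// Hypothetical helper (not part of the original gist): extract sitemap URLs
// from robots.txt content that has already been loaded into a string.
function extractSitemaps(string $content): array
{
    // Match "Sitemap:" at the start of a line, case-insensitively, and
    // capture the run of non-whitespace characters that follows.
    preg_match_all('/^[ \t]*Sitemap:[ \t]*(\S+)/im', $content, $matches);
    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($matches[1]));
}

// Made-up sample content, for illustration only.
$sample = "User-agent: *\n"
    . "Disallow: /private/\n"
    . "Sitemap: https://example.com/sitemap.xml\n"
    . "sitemap: https://example.com/news.xml\n"
    . "Sitemap: https://example.com/sitemap.xml\n";

print_r(extractSitemaps($sample));
```

With this split, a caller could pass in the result of `file_get_contents()` after checking that it is not `false`, and the parsing can be exercised against fixed strings.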