@widelec-BB
Created September 7, 2018 11:26
Gets the full URLs from a directory index page generated by an HTTP server.
#!/usr/bin/php
<?php
/*
** Gets the full URLs from a directory index page
** generated by an HTTP server.
** Usage: ./get_full_urls.php url_to_directory_listing
*/
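if (count($argv) < 2 || !$argv[1]) {
    // Report usage errors on stderr and exit with a non-zero status.
    fwrite(STDERR, "usage: {$argv[0]} url\n");
    exit(1);
}

$html = file_get_contents($argv[1]);
if ($html === false) {
    fwrite(STDERR, "Failed to fetch given url\n");
    exit(1);
}

// Normalize the base URL so it ends with exactly one slash.
$baseUrl = rtrim($argv[1], '/') . '/';

// Directory listings are often not valid HTML; collect parser warnings
// internally instead of printing them between the output lines.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a/@href') as $attr) {
    $url = $attr->value;
    // Skip links to the current and parent directory.
    if (in_array($url, array('..', '.', '../', './', '/../', '/./'), true)) {
        continue;
    }
    echo $baseUrl . $url . "\n";
}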