@kyletaylored
Created April 19, 2019 14:25
Sitemap XML De-duplicator
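<?php
// Load the sitemap; simplexml_load_file() returns false on failure.
$xml = simplexml_load_file("sitemap.xml");
if ($xml === false) {
    die("Error: Cannot load sitemap.xml");
}
$xmltxt = fopen("xml.txt", "w") or die("Error: Cannot open xml.txt for writing");
$storage = [];
$count = $skipped = 0;

// Normalize and remove duplicates. Entries are compared by lowercased
// URL path only, so scheme, host, and query string are ignored.
foreach ($xml as $sxe) {
    $count++;
    $loc = strtolower(trim((string) $sxe->loc));
    // parse_url() returns null (no path) or false (malformed URL); treat both as "/".
    $path = parse_url($loc, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }
    print "Processing $count: $path" . PHP_EOL;
    if (!isset($storage[$path])) {
        // First time this path is seen: remember it and keep the entry.
        $storage[$path] = true;
        fwrite($xmltxt, $sxe->asXML() . PHP_EOL);
    } else {
        $skipped++;
    }
}
fclose($xmltxt);

// Multiply before rounding, and guard against an empty sitemap.
$total = $count > 0 ? round($skipped / $count * 100) : 0;
print "Skipped $skipped of $count ($total%)" . PHP_EOL;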
@kyletaylored
Author

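Removes duplicate entries from a sitemap.xml file. Two entries count as duplicates when their lowercased URL paths match; the kept entries are written to xml.txt.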
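
To make it easier to see which URLs the script would treat as duplicates, here is a minimal sketch of the comparison key. The helper name `sitemap_dedupe_key` and the `example.com` URLs are hypothetical and not part of the gist. The sketch also assumes a URL with no path should be keyed as `/`.

```php
<?php
// Hypothetical helper (not part of the gist): approximates the script's
// normalization so you can check which URLs collapse to the same key.
function sitemap_dedupe_key(string $url): string
{
    // Lowercase the whole URL, then keep only the path component.
    $path = parse_url(strtolower(trim($url)), PHP_URL_PATH);
    // parse_url() returns null when there is no path and false on a
    // malformed URL; this sketch assumes both should map to "/".
    return (is_string($path) && $path !== '') ? $path : '/';
}

// Illustrative URLs only.
$urls = [
    'https://example.com/About',
    'https://example.com/about?ref=nav',
    'https://www.example.com/about',
    'https://example.com/contact',
];

$seen = [];
foreach ($urls as $url) {
    $key = sitemap_dedupe_key($url);
    $status = isset($seen[$key]) ? 'duplicate' : 'kept';
    $seen[$key] = true;
    print "$status: $url" . PHP_EOL;
}
```

With these inputs the first and last URLs are kept and the middle two are reported as duplicates, because differences in letter case, query string, and host do not affect the key. If your sitemap spans several hosts or relies on query strings to distinguish pages, you would likely want to include those parts in the key.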