@hamletbatista
Created February 27, 2019 21:33
Read Sitemap URLs from XML Sitemap Index
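from bs4 import BeautifulSoup
import requests

sitemap_index_url = "https://www.searchenginejournal.com/sitemap_index.xml"

# Maps each sitemap URL to its last-modified timestamp (None if absent).
sitemap_index = {}

r = requests.get(sitemap_index_url)
r.raise_for_status()  # fail early on HTTP errors
xml = r.text

# The "xml" parser requires the lxml package to be installed.
soup = BeautifulSoup(xml, "xml")
sitemapTags = soup.find_all("sitemap")
print("The number of sitemaps is {0}".format(len(sitemapTags)))

for sitemap in sitemapTags:
    # Search inside the current <sitemap> element only; findNext() could
    # pick up a tag that belongs to a later entry.
    lastmod = sitemap.find("lastmod")
    sitemap_index[sitemap.find("loc").text] = lastmod.text if lastmod is not None else None

print(sitemap_index)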
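The same mapping can be built without third-party packages. The following is a minimal sketch, not part of the original gist, that parses a sitemap index with the standard library's `xml.etree.ElementTree`. The function name `parse_sitemap_index` and the `example.com` sample document are hypothetical and used only for illustration. The namespace URI is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap and sitemap index files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap_index(xml_text):
    """Return {sitemap URL: lastmod or None} for a sitemap index document.

    Hypothetical helper for illustration; not part of the original gist.
    """
    root = ET.fromstring(xml_text)
    index = {}
    for sitemap in root.findall("sm:sitemap", SITEMAP_NS):
        loc = sitemap.findtext("sm:loc", default=None, namespaces=SITEMAP_NS)
        if loc is None:
            continue  # skip malformed entries that have no <loc>
        lastmod = sitemap.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        index[loc.strip()] = lastmod.strip() if lastmod else None
    return index


# Made-up sample document (example.com), so the sketch runs offline.
SAMPLE = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/post-sitemap1.xml</loc>
    <lastmod>2019-02-27T21:33:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/page-sitemap.xml</loc>
  </sitemap>
</sitemapindex>"""

result = parse_sitemap_index(SAMPLE)
print("The number of sitemaps is {0}".format(len(result)))
print(result)
```

To run it against a live index, pass in the text of the HTTP response (for example `r.text` from the snippet above) instead of `SAMPLE`. If that text begins with an XML declaration naming an encoding, passing the raw bytes (`r.content`) avoids any ambiguity.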