Skip to content

Instantly share code, notes, and snippets.

@hamletbatista
Created February 27, 2019 21:51
Show Gist options
  • Save hamletbatista/110fddcd4a1c3f9fb52ef57448485abc to your computer and use it in GitHub Desktop.
Save hamletbatista/110fddcd4a1c3f9fb52ef57448485abc to your computer and use it in GitHub Desktop.
Read URLs from XML Sitemap
sitemaps = {}
for (sitemap_url, lasmod) in sitemap_index.items():
if(sitemap_url.find("post") > 0):
print(sitemap_url)
if 1: # for testing
r = requests.get(sitemap_url)
xml = r.text
soup = BeautifulSoup(xml)
URLTags = soup.find_all("url")
print("The number of URLs are {0}".format(len(URLTags)))
for URL in URLTags:
sitemaps[URL.findNext("loc").text] = URL.findNext("lastmod").text
#print(xml_sitemap) #for testing
#break #for testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment