@makmac213
Created March 10, 2016 18:38
Sitemap Status Crawler
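import requests
# BeautifulSoup 4; the original imported the Python 2-only BeautifulSoup 3 package
from bs4 import BeautifulSoup

# Fetch the sitemap and collect every <url> entry
resp = requests.get('http://www.ofwguru.com/sitemap.xml')
soup = BeautifulSoup(resp.content, 'html.parser')
urls = soup.find_all('url')
for url in urls:
    # Each <url> entry carries the page address in its <loc> child
    loc = url.find('loc').string
    resp = requests.get(loc)
    print(loc, resp.status_code)
    # log urls that are not 200 status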
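
The closing comment leaves the logging of non-200 URLs unimplemented. What follows is a minimal sketch of one way to do it, not the gist author's method. The helper names (extract_locs, fetch_status, find_non_200), the logger name, and the 10-second timeout are all assumptions made for illustration. It also swaps in the standard library's xml.etree.ElementTree for parsing, and takes the fetcher as a parameter so the filtering can be tried without touching the network.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("sitemap_status")


def extract_locs(sitemap_xml):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    locs = []
    for element in root.iter():
        # Tags look like '{namespace}loc' when the sitemap declares a namespace.
        if element.tag.rsplit('}', 1)[-1] == 'loc' and element.text:
            locs.append(element.text.strip())
    return locs


def fetch_status(url):
    """Default fetcher: GET the URL with requests and return the status code."""
    import requests  # imported here so the helpers above work without it
    return requests.get(url, timeout=10).status_code  # timeout is an assumption


def find_non_200(locs, fetch=fetch_status):
    """Return (url, status) pairs for every URL whose status is not 200."""
    failures = []
    for loc in locs:
        status = fetch(loc)
        if status != 200:
            logger.warning("%s returned %s", loc, status)
            failures.append((loc, status))
    return failures
```

With the script above, this could be called as find_non_200(extract_locs(resp.content)) on the sitemap response, after logging.basicConfig() has been set up so the warnings are printed. Note that extract_locs collects every <loc> element, so a sitemap with image or video extensions would need a stricter filter.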