@tinnet
Created January 10, 2013 11:11
Small Python (2.7) script that checks .csv files full of URLs for their current status code (for example, to verify that you fixed the issues Google Webmaster Tools is reporting)
from __future__ import print_function

import argparse
import csv

import requests

_EPILOG = """
Script takes a list of .csv files, tries to guess their format (separator),
then checks for a field called 'URL', tries to fetch that url and prints
the response code back out (with the history of codes attached if there were
redirects)."""


def find_url(row):
    # Accept the common spellings of the URL column; returns None if none is present.
    if 'URL' in row:
        return row['URL']
    if 'url' in row:
        return row['url']
    if 'uri' in row:
        return row['uri']


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Checks urls from .csv files for their HTTP response codes',
        epilog=_EPILOG)
    parser.add_argument('files', type=str, metavar='CSVFILE', nargs='+')
    args = parser.parse_args()
    for file in args.files:
        # 'rU' is the Python 2 universal-newlines mode.
        with open(file, 'rU') as csvfile:
            # Guess the separator from the first 1024 characters, then rewind.
            dialect = csv.Sniffer().sniff(csvfile.read(1024))
            csvfile.seek(0)
            print('"URL";"STATUSCODE";"HISTORY"')
            for row in csv.DictReader(csvfile, dialect=dialect):
                url = find_url(row)
                if not url:
                    # Skip rows without a usable URL instead of crashing in requests.get().
                    continue
                r = requests.get(url)
                # r.history holds the redirect responses that led to the final one.
                print('"{}";{};"{}"'.format(url, r.status_code, [h.status_code for h in r.history]))
tinnet commented Jan 10, 2013

Requirements

requests (http://docs.python-requests.org/en/latest/)

INPUT

This is just an example; any file that Python's csv module can read and that contains a 'URL' column (the script also accepts 'url' or 'uri') is fine.

URL;STATUSCODE;MESSAGE;DATE;CATEGORY
http://example.net/some/bad/request;400;;12/15/12;Other
http://example.net/some/missing/page;404;;12/15/12;Other
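To illustrate how the separator detection behaves on input like the above, here is a minimal sketch that runs csv.Sniffer and csv.DictReader on an in-memory copy of the sample instead of a file. The names SAMPLE and read_urls are made up for this illustration and are not part of the script.

```python
import csv

# In-memory copy of the example input above (hypothetical constant name).
SAMPLE = (
    "URL;STATUSCODE;MESSAGE;DATE;CATEGORY\n"
    "http://example.net/some/bad/request;400;;12/15/12;Other\n"
    "http://example.net/some/missing/page;404;;12/15/12;Other\n"
)


def read_urls(text):
    """Guess the CSV dialect of `text` and return (delimiter, list of URLs)."""
    # Same idea as the script: sniff the first 1024 characters only.
    dialect = csv.Sniffer().sniff(text[:1024])
    # DictReader accepts any iterable of lines, so a list of strings works.
    reader = csv.DictReader(text.splitlines(), dialect=dialect)
    return dialect.delimiter, [row['URL'] for row in reader]


delimiter, urls = read_urls(SAMPLE)
print(delimiter)
print(urls)
```

If the sniffer guesses wrong on unusual files, csv.Sniffer().sniff() also takes an optional delimiters argument that restricts the candidate separators, for example ';,'.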

OUTPUT

"URL";"STATUSCODE";"HISTORY"
"http://example.net/some/bad/request";400;"[]"
"http://example.net/some/missing/page";200;"[302, 301]"
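The following sketch shows how one output line is assembled from a response and its redirect history, without touching the network. FakeResponse is a hypothetical stand-in that only mimics the two attributes of a requests response the script reads (status_code and history), and format_result is a name invented for this illustration.

```python
from __future__ import print_function

from collections import namedtuple

# Hypothetical stand-in for a requests response: only the two attributes
# the script uses are modelled here.
FakeResponse = namedtuple('FakeResponse', ['status_code', 'history'])


def format_result(url, response):
    """Build one output line: quoted URL, final status, quoted redirect codes."""
    history = [h.status_code for h in response.history]
    return '"{}";{};"{}"'.format(url, response.status_code, history)


# A page that was reached through two redirects (302, then 301).
redirected = FakeResponse(200, [FakeResponse(302, []), FakeResponse(301, [])])
line = format_result('http://example.net/some/missing/page', redirected)
print(line)
```

Because the history is formatted as a plain Python list, the codes are separated by a comma and a space, and a URL that was not redirected gets an empty list.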
