@StewartJohn
Created July 2, 2019 16:06
Check the HTTP status of a CSV list of URLs
import csv

import urllib3

rows = []
http = urllib3.PoolManager()
# Load the CSV list of URLs to check (the URL is read from the first column).
with open('urls.csv', 'r', newline='') as infile:
    csv_reader = csv.reader(infile)
    for blog in csv_reader:
        if not blog:
            continue  # skip blank rows, which have no URL to check
        # Check each blog URL in the list for its HTTP status.
        try:
            # Retry a URL at most 3 times, do not preload the response body,
            # and record a redirect's status instead of following it.
            resp = http.request('GET', blog[0], retries=3, preload_content=False, redirect=False)
            rows.append([blog[0], resp.status])
            resp.release_conn()
        # A domain-not-found error (or any other request failure) lands here.
        except Exception:
            rows.append([blog[0], "no connection"])
# newline='' stops the csv module from writing blank lines between rows on Windows.
with open('urlStatus.csv', 'w', newline='') as output:
    csv_writer = csv.writer(output)
    csv_writer.writerows(rows)
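
Once urlStatus.csv has been written, a quick tally of the results can help spot broken URLs. The following is only a minimal sketch, not part of the original gist: the helper name summarize_statuses is hypothetical, and it assumes the two-column layout written above (URL, then status or "no connection"). Statuses come back from the CSV as strings.

```python
import csv
from collections import Counter


def summarize_statuses(path):
    """Count how many URLs were recorded with each status.

    Hypothetical helper: assumes each row is [url, status], as written
    by the script above. Rows with fewer than two columns are ignored.
    """
    with open(path, 'r', newline='') as infile:
        return Counter(row[1] for row in csv.reader(infile) if len(row) >= 2)
```

Calling summarize_statuses('urlStatus.csv').most_common() would list the statuses from most to least frequent, so a large "no connection" or "404" count stands out.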