@theSage21
Last active October 1, 2015 10:11
Check whether the content of any URL in a list has changed. Useful for websites with no RSS feed.
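import os
import time
from hashlib import md5

from requests import get
from requests.exceptions import RequestException

urls = ['http://ststephens.edu/',
        'http://cmi.ac.in/']

# Load the signatures saved by the previous run, one per line and in the
# same order as `urls`. Lines read from a file keep their trailing newline,
# so strip it or the comparison below never matches.
try:
    with open('.website_signatures', 'r') as fl:
        old_signature = [line.strip() for line in fl]
except IOError:
    old_signature = []
# Pad with empty signatures so newly added URLs are reported as changed.
old_signature += [''] * (len(urls) - len(old_signature))

# Wait for an internet connection, doubling the delay after each failure.
wait = 1
while True:
    try:
        html = [get(u, timeout=30).text.encode('utf-8') for u in urls]
    except RequestException:
        time.sleep(wait)
        wait = wait * 2
    else:
        break

new_signature = [md5(h).hexdigest() for h in html]
# A page has changed when its new hash differs from the stored one.
changes = [old_sig != new_sig
           for old_sig, new_sig in zip(old_signature, new_signature)]

msg = ''
for changed, link in zip(changes, urls):
    if changed:
        msg += 'Changed : '
    else:
        msg += 'Not Changed : '
    msg += link + '\n'
os.system('notify-send "Websites with changed contents" "{}"'.format(msg))

# Save the new signatures for the next run.
with open('.website_signatures', 'w') as fl:
    fl.writelines(sig + '\n' for sig in new_signature)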
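
The script above matches stored hashes to URLs by position, so reordering or editing the `urls` list can pair a hash with the wrong site. Below is a minimal sketch of one possible alternative, assuming the signatures were kept in a dict keyed by URL instead. The function names `signature` and `changed_urls` are hypothetical and not part of the gist, and the byte strings stand in for downloaded pages so the example runs offline.

```python
from hashlib import md5


def signature(content):
    """Return the MD5 hex digest of a page body given as bytes."""
    return md5(content).hexdigest()


def changed_urls(old, new):
    """List the URLs whose signature in `new` differs from `old`.

    A URL that is missing from `old` counts as changed.
    """
    return [url for url, sig in new.items() if old.get(url) != sig]


# Signatures from a pretend previous run and a pretend current run.
old = {'http://ststephens.edu/': signature(b'<html>v1</html>'),
       'http://cmi.ac.in/': signature(b'<html>v1</html>')}
new = {'http://ststephens.edu/': signature(b'<html>v1</html>'),
       'http://cmi.ac.in/': signature(b'<html>v2</html>')}

print(changed_urls(old, new))  # ['http://cmi.ac.in/']
```

Storing such a dict would need a different file layout than the one-hash-per-line file used above, for example one `url hash` pair per line.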