@nmostafavi
Last active March 1, 2021 17:12
Python script that checks a page for changes, then notifies via IFTTT

diffchecker.py

A simple script you can run with crontab to monitor for changes to a webpage. If any changes are detected, it notifies an IFTTT endpoint. From there, you can customize it to do whatever you want (e.g. push notification, call your friend, email your enemy, etc.)

Set up IFTTT

  1. Get your webhook API key from https://maker.ifttt.com/. You'll need an account for this.
  2. Create a new recipe. Choose Webhooks as the source service, and Receive a web request as the trigger.
  3. Choose an Event Name. For example, page_updated.
  4. Configure the "action service" to your liking. I usually use the Notifications service to send a push notification to my phone. Your action can use the Value1 "ingredient" to refer to the URL being monitored, e.g. if you'd like the push notification to include the URL being monitored for quick reference.
  5. Save your recipe, double-check that it's enabled, and move on to setting up the script.
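The request that the recipe waits for can be sketched as below. This is a minimal illustration, not part of the original script: the helper name build_webhook_request and the WEBHOOK_TEMPLATE constant are hypothetical, and the URL layout is copied from the webhookurl variable in diffchecker.py. It only builds the request, so nothing is sent until you pass the result to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request

# URL layout taken from the script's webhookurl variable.
WEBHOOK_TEMPLATE = "https://maker.ifttt.com/trigger/{event}/with/key/{key}"


def build_webhook_request(event_name, key, monitored_url):
    """Build (but do not send) the POST request that triggers the recipe."""
    webhook_url = WEBHOOK_TEMPLATE.format(
        event=urllib.parse.quote(event_name, safe=""),
        key=urllib.parse.quote(key, safe=""),
    )
    # value1 is what the action sees as the Value1 ingredient.
    payload = urllib.parse.urlencode({"value1": monitored_url}).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(webhook_url, data=payload)
```

To test a recipe by hand, build a request with your own event name and key and pass it to urllib.request.urlopen(request, timeout=15).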

Set up the script

There are two variables to update, both marked "edit me" in the script:

  1. url: the URL of the page you would like to monitor, and
  2. webhookurl: the IFTTT Webhook URL. Replace [[event_name]] and [[key]] with the event name and API key from the IFTTT setup above.
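A forgotten placeholder is an easy mistake here, so a small check like the following could be run before the first real use. It is only a sketch: the helper name unfilled_placeholders is hypothetical and does not appear in the original script.

```python
import re

# Matches the [[name]] placeholders used in the script's webhookurl template.
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")


def unfilled_placeholders(webhook_url):
    """Return the names of any [[...]] placeholders still present in the URL."""
    return PLACEHOLDER.findall(webhook_url)
```

An empty list means both placeholders were replaced. Anything else names the pieces that still need editing.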

Set up crontab

You don't have to use crontab, but I am running this on a Raspberry Pi with it. Here is an example crontab configuration, which assumes you've placed the configured script in the home directory on a Raspberry Pi and made it executable (chmod +x diffchecker.py), since cron invokes the file directly:

# m h  dom mon dow   command
* * * * * /home/pi/diffchecker.py  # Checks webpage for changes once every minute

And you're all set!

The first time the script runs, it will create a temp file (diffchecker_temp) and trigger your IFTTT endpoint. Subsequent runs will compare the webpage against the copy saved in that temp file. So, if you only want to test the IFTTT endpoint, you can delete the temp file and run the script manually.
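The compare-then-overwrite behavior described above can be isolated into one function. This is a sketch under stated assumptions: the name check_for_change is hypothetical, and unlike the script it treats a missing snapshot as "no previous version" (None) rather than an empty string, so the first run always counts as a change.

```python
import os


def check_for_change(current, snapshot_path):
    """Compare current text to the saved snapshot, then overwrite the snapshot.

    Returns True when the text differs from the saved copy, or when no
    snapshot exists yet (the first run).
    """
    previous = None
    if os.path.exists(snapshot_path):
        with open(snapshot_path, "r", encoding="utf-8") as f:
            previous = f.read()
    # Always save the latest version so the next run compares against it.
    with open(snapshot_path, "w", encoding="utf-8") as f:
        f.write(current)
    return current != previous
```

Deleting the snapshot file makes the next call return True again, which mirrors the manual test described above.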

#!/usr/bin/env python3
import urllib.request
import urllib.parse
import os
import re

url = "https://www.example.com/page-to-monitor-for-changes.html" # edit me
webhookurl = "https://maker.ifttt.com/trigger/[[event_name]]/with/key/[[key]]" # edit me
tempfile = "diffchecker_temp"

# Download latest version of the page
req = urllib.request.Request(url)
req.add_header("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0")
response = urllib.request.urlopen(req)
current = response.read().decode("utf-8")

# Strip windows-style newlines for consistency between in-memory and on-disk versions of the file
current = current.replace("\r\n", "\n")

# Fetch previous version from disk
previous = ""
if os.path.exists(tempfile):
    with open(tempfile, "r") as f:
        previous = f.read()

# Save new version to disk
with open(tempfile, "w") as f:
    f.write(current)

# Alert if different
if current != previous:
    print("Change detected!")
    # Trigger IFTTT
    payload = urllib.parse.urlencode({"value1": url}).encode("ascii")
    with urllib.request.urlopen(webhookurl, payload) as f:
        print(f.read().decode("utf-8"))
else:
    print("No change.")

diffchecker2.py

#!/usr/bin/env python3
import argparse
import difflib
import hashlib
import re
import time
import urllib.request
import urllib.parse

# Third-party dependency: install the beautifulsoup4 package.
from bs4 import BeautifulSoup

TIMEOUT = 15 # seconds


def fetch_url(url):
    """Download the page. Returns (True, html) on success, (False, None) otherwise."""
    request = urllib.request.Request(url)
    request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0')
    try:
        response = urllib.request.urlopen(request, timeout=TIMEOUT)
        data = response.read().decode('utf-8')
    except Exception as e:
        print('Error: ' + str(e))
        return False, None
    if response.status != 200:
        print('HTTP Response: ' + str(response.status))
        return False, None
    return True, data


def notify(url, webhookurl):
    """Trigger the IFTTT webhook, retrying up to 10 times on failure."""
    payload = urllib.parse.urlencode({'value1': url}).encode('ascii')
    for _ in range(10):
        try:
            with urllib.request.urlopen(webhookurl, payload, timeout=TIMEOUT) as f:
                print(f.read().decode('utf-8'))
                return
        except Exception as e:
            print('Error: ' + str(e))


def main(args):
    url = args.url
    # The event name is hard-coded to page_updated here.
    webhookurl = 'https://maker.ifttt.com/trigger/page_updated/with/key/' + args.api_key

    # Generate a hash so we can run multiple instances of this script from the same folder without file name clashes.
    url_hash = hashlib.sha1(url.encode('utf-8')).hexdigest()[:10]

    previous_html = ''
    previous_text = ''

    # By keeping track of all previously-seen revisions, this script is robust against server caching issues in which
    # multiple successive reloads may return a stale, older version of the webpage.
    seen = {}

    while True:
        timestamp = time.strftime('%Y-%m-%d %H%M%S')
        prefix = timestamp + ' ' + url_hash

        # Download latest version of the page
        success, current_html = fetch_url(url)
        if not success:
            time.sleep(TIMEOUT)
            continue

        # Strip html and collapse newlines.
        soup = BeautifulSoup(current_html, 'html.parser')
        current_text = soup.get_text()
        current_text = current_text.replace('\r', '')
        current_text = re.sub('\n+', '\n', current_text)

        # Alert if different
        if current_text not in seen:
            print(prefix + ' Change detected!')
            # Trigger IFTTT
            notify(url, webhookurl)
            # Save before/after snapshot to disk
            with open(prefix + ' before.html', 'w', encoding='utf-8') as f:
                f.write(previous_html)
            with open(prefix + ' after.html', 'w', encoding='utf-8') as f:
                f.write(current_html)
            with open(prefix + ' before.txt', 'w', encoding='utf-8') as f:
                f.write(previous_text)
            with open(prefix + ' after.txt', 'w', encoding='utf-8') as f:
                f.write(current_text)
            with open(prefix + ' diff.txt', 'w', encoding='utf-8') as f:
                diff = ''.join(difflib.context_diff(previous_text.splitlines(keepends=True), current_text.splitlines(keepends=True)))
                f.write(diff)
        else:
            print(prefix + ' No change.')

        previous_html = current_html
        previous_text = current_text
        seen[current_text] = timestamp
        time.sleep(args.poll_interval)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--url', action='store', dest='url', help='URL of the webpage to check for updates.', required=True)
    parser.add_argument('--api-key', action='store', dest='api_key', help='IFTTT Webhooks API key. This can be found on the IFTTT website: Click "Documentation" on https://ifttt.com/maker_webhooks/ to obtain your key.', required=True)
    parser.add_argument('--poll-interval', action='store', dest='poll_interval', type=int, help='Poll interval in seconds, i.e. how long to wait between page checks. (Default: 60 seconds)', default=60, required=False)
    args = parser.parse_args()
    main(args)
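The reasoning behind tracking every previously seen revision, rather than only the last one, can be shown with a small standalone sketch. The function name detect_new_revisions is hypothetical and the snapshot strings are made up for illustration. It assumes the same rule as the polling script: a revision counts as a change only the first time its text appears.

```python
def detect_new_revisions(snapshots):
    """Yield (index, text) for each snapshot not seen earlier in the sequence."""
    seen = set()
    for index, text in enumerate(snapshots):
        if text not in seen:
            seen.add(text)
            yield index, text


# A server that flips between a stale copy (v1) and the fresh one (v2).
polls = ["v1", "v1", "v2", "v1", "v2", "v3"]
new_indices = [index for index, _ in detect_new_revisions(polls)]
print(new_indices)  # [0, 2, 5]
```

Comparing each poll only against the one before it would also fire at indices 3 and 4, when the stale copy comes and goes. The trade-off is that a page which genuinely reverts to an earlier version will not trigger a notification.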
@nmostafavi
Author

Added a new script, diffchecker2.py, which is basically the same thing, but uses polling instead of relying on crontab.
