Still very much a WIP but it gets the job done.
pip install -r requirements.txt
- Replace `WATCH_URL` and `BASE_URL` in `scraper.py`
python3 scraper.py output_file
You may run into errors installing the `lxml` dependency. If so, refer to the lxml installation guide for troubleshooting.
- Send the `WATCH_URL` as a command line arg
- Parse `BASE_URL` from the `WATCH_URL`
- Compare the `data-repost-of` attribute to `data-pid` to filter duplicates
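
The three items above read as planned changes, so here is a rough sketch of how they might look using only the standard library. Nothing in it is taken from `scraper.py`: the function names `base_url_from`, `filter_reposts`, and `parse_args` are hypothetical, the positional argument order is a guess, and it assumes `data-repost-of` holds the `data-pid` of the original post.

```python
import argparse
from urllib.parse import urlsplit


def base_url_from(watch_url):
    """Return the scheme://host part of an absolute URL."""
    parts = urlsplit(watch_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {watch_url!r}")
    return f"{parts.scheme}://{parts.netloc}"


def filter_reposts(posts):
    """Keep one entry per listing, treating a repost and its original as one.

    `posts` is an iterable of attribute mappings (assumed shape), such as
    the `attrib` mapping lxml exposes on each element.
    """
    seen = set()
    unique = []
    for attrs in posts:
        # Assumption: data-repost-of, when present, is the original's data-pid.
        key = attrs.get("data-repost-of") or attrs.get("data-pid")
        if key is not None and key in seen:
            continue  # already kept the original or an earlier repost
        if key is not None:
            seen.add(key)
        unique.append(attrs)
    return unique


def parse_args(argv=None):
    """Read WATCH_URL and the output file from the command line."""
    parser = argparse.ArgumentParser(description="Scrape a watch URL.")
    parser.add_argument("watch_url", help="page to scrape")
    parser.add_argument("output_file", help="where to write results")
    return parser.parse_args(argv)
```

With something like this in place, the script could be called as `python3 scraper.py WATCH_URL output_file`, and `BASE_URL` would no longer need to be edited by hand.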