@mandemeskel
Last active February 5, 2021 17:51
#!/bin/sh
# get the start date and time
start_datetime=$(date '+%m_%d_%Y_%H_%M_%S')
echo "${start_datetime} - starting spider ${SPIDER_NAME} - debug: ${DEBUG}"
# go to the spider directory, aborting if it cannot be entered
cd "$SPIDER_PATH" || exit 1
# prevent Click, which pipenv relies on, from aborting due to missing locale info
# (see https://click.palletsprojects.com/en/7.x/python3/)
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
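# run the spider, sending both stdout and stderr to a timestamped log file
# (`> file 2>&1` is used because `&>` is a bash extension that POSIX sh does not define)
"$PIPENV" run scrapy crawl "$SPIDER_NAME" -a debug="$DEBUG" > "logs/log_${start_datetime}.txt" 2>&1
status=$?
# get the end date and time
end_datetime=$(date '+%m_%d_%Y_%H_%M_%S')
# report the exit status of the crawl instead of always claiming success
if [ "$status" -eq 0 ]; then
    echo "${end_datetime} - spider finished successfully"
else
    echo "${end_datetime} - spider exited with status ${status}, see logs/log_${start_datetime}.txt"
fi
exit "$status"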
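# To check the script before scheduling it, it can be run by hand under a
# near-empty environment, which roughly approximates what cron provides. This is
# only a sketch: the exact variables cron sets vary by implementation (typically
# SHELL, HOME, LOGNAME and a minimal PATH such as /usr/bin:/bin). The values
# below are the ones used in the crontab entry.
#   env -i HOME="$HOME" PATH=/usr/bin:/bin \
#     PIPENV=/Library/Frameworks/Python.framework/Versions/3.6/bin/pipenv \
#     SPIDER_PATH=/Users/michael/repos/spiders \
#     SPIDER_NAME=multi_subject_spider DEBUG=True \
#     /bin/sh /Users/michael/repos/crawl.sh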
# use `crontab -e` and add this line to run crawl.sh at 1:30 AM local time every day
# (crawl.sh must be executable, e.g. `chmod +x /Users/michael/repos/crawl.sh`)
30 1 * * * export PIPENV=/Library/Frameworks/Python.framework/Versions/3.6/bin/pipenv SPIDER_PATH=/Users/michael/repos/spiders SPIDER_NAME=multi_subject_spider DEBUG=True && /Users/michael/repos/crawl.sh >> /Users/michael/repos/spiders/logs/cron_log.txt 2>&1
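# How the schedule fields of the entry above read (standard five-field cron syntax):
#   30  1  *  *  *
#   |   |  |  |  +-- day of week   (* = every day of the week)
#   |   |  |  +----- month         (* = every month)
#   |   |  +-------- day of month  (* = every day of the month)
#   |   +----------- hour          (1 = 1 AM on a 24-hour clock, local time)
#   +--------------- minute        (30)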