Crawl courseware PDFs from a school website.
#!/usr/bin/env bash
# params: the index URL and an auth cookie header, taken from the command line
URL="$1"      # e.g. 'https://courses.example.edu/infosec/index.html'
COOKIE="$2"   # e.g. 'Cookie: session=...'
# prepare a scratch directory under /tmp (-p so reruns don't fail)
cd /tmp && mkdir -p infosec && cd infosec
# fetch links to PDFs from the HTML index:
# 1. grab the HTML behind the auth wall
# 2. extract the URLs pointing to PDFs
# 3. strip the surrounding quotes with sed
curl "$URL" -H "$COOKIE" --compressed \
  | grep -Eo '"https[^"]*\.pdf"' \
  | sed 's/"//g' \
  > links_to_courseware_pdfs.log
# download the courseware in parallel with wget, passing the same auth cookie
parallel --gnu wget --header "$COOKIE" {} < links_to_courseware_pdfs.log
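The grep/sed extraction step can be sanity-checked on its own before pointing the script at a real site. The HTML fragment and URLs below are made-up placeholders, not anything from the actual courseware index:

```shell
# Self-contained check of the link-extraction pipeline.
# The HTML fragment and URLs are hypothetical placeholders.
html='<a href="https://example.edu/notes/week1.pdf">W1</a> <a href="https://example.edu/index.html">Home</a>'
printf '%s\n' "$html" \
  | grep -Eo '"https[^"]*\.pdf"' \
  | sed 's/"//g'
# prints: https://example.edu/notes/week1.pdf
```

Note that `[^"]*` keeps the match inside a single quoted attribute, and the escaped `\.pdf` avoids matching names like `xpdf` where the dot would otherwise act as a wildcard.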