@avinash
Created March 12, 2021 07:09
Download exam papers from gceguide.com
# Step 1: Download the HTML file corresponding to a year
wget "https://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/2020/" -O 2020.html
# Step 2: Use sed, tr, grep and awk to reduce the HTML file to the names of the PDF files, then generate one wget command per file and run the commands with bash
sed 's/<li class=.file.>/£/g' 2020.html | tr '£' '\n' | grep pdf | sed 's/^.*class=.name.>//g' | sed 's/<.*$//g' | awk '{ print "wget \"https://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/2020/" $0 "\"" }' | bash
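# Alternative to Step 2 (a sketch, not part of the original gist): the same idea without the '£' placeholder and without piping generated commands into bash.
# It assumes, as the sed patterns above imply, that each file name appears in the page as class="name">NAME.pdf< and that grep supports -o (GNU and BSD grep do).
# The function names extract_pdf_names and download_pdfs are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a directory-listing page on stdin and
# print one PDF file name per line.
# Assumption: names appear as class="name">NAME.pdf< in the HTML.
extract_pdf_names() {
  # -o prints only the matched text; '.' stands in for either quote style,
  # mirroring the class=.name. pattern used in Step 2.
  grep -o 'class=.name.>[^<]*\.pdf' | sed 's/^class=.name.>//'
}

# Hypothetical helper: download every PDF named in an index file.
#   $1 = base URL ending in '/', $2 = path to the saved HTML page
download_pdfs() {
  local base_url="$1" index_file="$2"
  extract_pdf_names < "$index_file" | while IFS= read -r name; do
    # Quoting the URL keeps the parentheses in the path away from the shell.
    wget "${base_url}${name}"
  done
}

# Usage (left commented out so that sourcing this file downloads nothing):
# download_pdfs "https://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/2020/" 2020.html
```

# Calling wget from a read loop passes each file name as data, so an unexpected character in a name cannot be executed as a shell command.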