@xiaohk
Last active March 7, 2017 02:04
Download all available data files for the book "Regression Analysis by Example" (5th edition)
"""
To download the data for the 4th edition instead, change every '5' in the URL strings
below to '4' (e.g. '/data5' -> '/data4'). The two editions contain almost the same
data sets, but the 4th edition's files are less cleanly formatted, so the 5th edition
is recommended.
"""
import urllib.error
import urllib.request

import requests
import regex as re  # third-party 'regex' module; the stdlib 're' has no 'overlapped' argument

URL = "http://www1.aucegypt.edu/faculty/hadi/RABE5/data5/"

# Get the valid data locations by scraping file names from the download page
web = requests.get("http://www1.aucegypt.edu/faculty/hadi/RABE5/#Download").text
pages = re.findall(r'P\d+-*\d*.?\.txt', web, overlapped=True)

# Download the data into the current directory
for p in set(pages):
    try:
        urllib.request.urlretrieve(URL + p, "./" + p)
        print(p + " downloaded")
    except urllib.error.HTTPError:
        # Skip names that matched the pattern but do not exist on the server
        pass
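
The file-name extraction step can be tried offline before hitting the server. The sketch below is only an illustration: `SAMPLE_HTML` and the file names in it are hypothetical stand-ins for the real download page, `extract_data_files` is a helper name introduced here, and the pattern is a slightly stricter variant of the script's (an optional `-digits` part and an optional single letter instead of `.?`). It uses the standard-library `re` module, so it does not need the `overlapped` argument.

```python
import re

# Hypothetical HTML fragment; the real download page's markup may differ.
SAMPLE_HTML = """
<a href="data5/P005.txt">P005</a>
<a href="data5/P014a.txt">P014a</a>
<a href="data5/P060-61.txt">P060-61</a>
<a href="data5/P005.txt">P005 (repeated link)</a>
"""

# Assumed naming scheme: 'P', digits, optional '-digits', optional letter, '.txt'.
FILE_PATTERN = re.compile(r"P\d+(?:-\d+)?[A-Za-z]?\.txt")


def extract_data_files(html):
    """Return the sorted, de-duplicated data file names found in html."""
    return sorted(set(FILE_PATTERN.findall(html)))


files = extract_data_files(SAMPLE_HTML)
print(files)
```

If the real page uses names outside this assumed scheme, the looser original pattern may still be the safer choice, since unmatched URLs are skipped by the `HTTPError` handler anyway.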