@Btibert3
Created January 18, 2022 00:53
Moneypuck dataset import using Python for RapidMiner
# SOURCE (adapted from): https://stackoverflow.com/a/46676405/155406
import io
import zipfile

import pandas
import requests

# Use requests to download the zip archive, then pandas to read the CSV inside it.
# The site blocks certain requests but, ironically, allows wget from Google Colab.
def rm_main():
    # get the data
    URL = "https://peter-tanner.com/moneypuck/downloads/shots_2021.zip"
    r = requests.get(URL)
    r.raise_for_status()  # fail early if the download was blocked or failed
    zf = zipfile.ZipFile(io.BytesIO(r.content))
    # find the first CSV file in the zip
    match = [s for s in zf.namelist() if s.endswith(".csv")][0]
    # no rows are skipped: pandas reads the first row of the CSV as the header
    df = pandas.read_csv(zf.open(match), low_memory=True)
    return df
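
The zip-handling step can be tried offline, without touching the download. The following is a minimal sketch, not part of the original gist: the helper name `first_csv_member`, the archive contents, and the column names `col_a`/`col_b` are all made up for illustration. It builds a small archive in memory and returns the first member whose name ends in `.csv`, raising a clear error instead of an `IndexError` when none is found.

```python
import io
import zipfile


def first_csv_member(zip_bytes):
    """Return (name, raw bytes) of the first .csv member in a zip archive.

    Hypothetical helper for illustration; not part of the original gist.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Match on the file extension, case-insensitively.
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv file found in the archive")
        name = csv_names[0]
        return name, zf.read(name)


# Build a tiny archive in memory (made-up contents) to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", "not a csv")
    zf.writestr("sample.csv", "col_a,col_b\n1,2\n")

name, data = first_csv_member(buf.getvalue())
print(name, len(data))
```

The returned bytes can then be handed to pandas as `pandas.read_csv(io.BytesIO(data))`, which is the same in-memory approach `rm_main` takes with `zf.open(match)`.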