To take full advantage of the workshop, you'll need:
- PySpark installed (anything more recent than 2.3 should be fine)
- Jupyter installed
- Pandas and Arrow installed
- All of the above set up so they can talk to each other
- One or more datasets
You can clone this repository to get the notebook and slides. Some things may still change until Saturday, such as uploading and updating the compiled slides, but the notebook is essentially finished.
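
If you want to check the requirements above before the workshop, something like the following sketch may help. It is not part of the workshop material. The helper names `installed_versions` and `arrow_roundtrip` are made up for illustration, and the PyPI distribution names (`notebook` for Jupyter, `pyarrow` for Arrow) are assumptions about how you installed things.

```python
from importlib import metadata

# Assumed PyPI distribution names; adjust if you use e.g. jupyterlab instead.
PACKAGES = ("pyspark", "notebook", "pandas", "pyarrow")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def arrow_roundtrip(n_rows=5):
    """Start a local Spark session and pull a tiny DataFrame into pandas via Arrow.

    Imports PySpark lazily, so the version check above works without it.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("workshop-check")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # The Arrow switch was renamed in Spark 3.0.
        major = int(spark.version.split(".")[0])
        key = (
            "spark.sql.execution.arrow.pyspark.enabled"
            if major >= 3
            else "spark.sql.execution.arrow.enabled"
        )
        spark.conf.set(key, "true")
        pdf = spark.range(n_rows).toPandas()
        return len(pdf)
    finally:
        spark.stop()
```

Calling `installed_versions()` in a notebook cell shows which packages are visible to the Jupyter kernel. If all four have a version, `arrow_roundtrip()` should return the number of rows it converted, which suggests PySpark, Pandas and Arrow can talk to each other.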