Title: Diving into Open Data with IPython Notebook & Pandas
Anyone who is interested in working with data and hasn't used IPython
Notebook & pandas together before.
The goal is to show people how Python can be used as a practical and fun tool for working with data, as an alternative to R. After this talk, attendees will have a good idea of the power of IPython Notebook and pandas. They'll also be able to use both for some simple data analysis, because the slides double as a practice sheet for playing around with the data on their own.
I'll walk you through Python's best tools for getting a grip on data:
IPython Notebook and pandas. I'll show you how to read in data, clean
it up, graph it, and draw some conclusions, using some open data about
the number of cyclists on Montréal's bike paths as an example.
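As a rough sketch of the "read in data, clean it up" steps, the snippet below parses a small made-up sample. The column names (`path_a`, `path_b`), the semicolon separator, and the day-first date format are assumptions for illustration, not the layout of the actual Montréal file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the cyclist-count file: semicolon-separated,
# day-first dates, and one missing sensor reading.
raw_csv = """Date;path_a;path_b
01/01/2012;35;38
02/01/2012;83;68
03/01/2012;;104
04/01/2012;144;116
"""

# Read the text into a DataFrame.
bikes = pd.read_csv(io.StringIO(raw_csv), sep=";")

# Fix the date formatting: parse day/month/year strings into timestamps
# and use them as the index.
bikes["Date"] = pd.to_datetime(bikes["Date"], format="%d/%m/%Y")
bikes = bikes.set_index("Date")

# Remove rows that contain null values.
clean = bikes.dropna()
print(clean)
```

In a notebook with matplotlib installed, `clean.plot()` would then graph both columns against the date index.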
Using the example of some cyclist sensor data from Montréal, I'll
explain how to
- clean up data (fix date formatting issues, remove null values, ...)
- graph the data
- scrape weather data from the weather office website and look at the
relationship between temperature & cyclists
- aggregate the data to find out how many people bike on weekdays vs
- talk about possible directions to take the project (make a model
Here's an approximate outline.
- Who am I? Why do I use IPython & pandas? (2 minutes)
- What is IPython Notebook? Short demo. (5 minutes)
- What is pandas? What are its advantages over straight NumPy? (5 minutes)
- Installation tips (use Anaconda!) & how to start the notebook. (1 minute)
- Importing data into a dataframe. What's a dataframe? Plotting the
data (3 minutes)
- Indexing and slicing dataframes (2 minutes)
- Using groupby & aggregate to get weekday counts (3 minutes)
- Resampling weather data (2 minutes)
- More slicing to zoom in on unpopular days (2 minutes)
- Questions (5 minutes)
Total: 30 minutes
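The groupby, resampling, and slicing steps in the outline could look roughly like the sketch below. All of the numbers are invented for illustration (two weeks of daily counts and two days of hourly temperatures), and the threshold of 60 cyclists for an "unpopular" day is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical daily cyclist counts for two full weeks.
# 2012-01-02 is a Monday, so each weekday appears exactly twice.
days = pd.date_range("2012-01-02", periods=14, freq="D")
counts = pd.Series(
    [100, 110, 120, 130, 140, 50, 40,
     200, 210, 220, 230, 240, 60, 30],
    index=days,
    name="cyclists",
)

# Group by day of week (0 = Monday ... 6 = Sunday) and sum the counts.
weekday_totals = counts.groupby(counts.index.weekday).sum()

# Hypothetical hourly temperatures for two days, resampled to daily means.
hours = pd.date_range("2012-01-02", periods=48, freq=pd.Timedelta(hours=1))
temps = pd.Series([-5.0] * 24 + [1.0] * 24, index=hours, name="temperature")
daily_temp = temps.resample("D").mean()

# Boolean slicing to zoom in on unpopular days (threshold is arbitrary).
quiet_days = counts[counts < 60]

print(weekday_totals)
print(daily_temp)
print(quiet_days)
```

With this made-up data, the weekend groups come out far lower than the weekday groups, which is the kind of pattern the aggregation step is meant to surface.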
I gave this talk at PyCon Canada in August and it was very well
received -- people told me that it showed them how to do things they
didn't know were possible, and that it was really accessible for
Python beginners. I've also given versions of this talk at Montréal
I'm also planning to submit a 1-hour tutorial on IPython Notebook &
pandas to PyData in NYC in November, so provided that gets accepted,
I'll have even more practice talking about these tools.
This talk was accepted! =)