@ka-pr
Created November 14, 2019 13:38
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Render our plots inline\n",
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# pd.set_option('display.mpl_style', 'default') # Make the graphs a bit prettier\n",
"# plt.rcParams['figure.figsize'] = (15, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading data from a CSV file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can read data from a CSV file using the `read_csv` function. By default, it assumes that the fields are comma-separated.\n",
"\n",
"We're going to be looking at some cyclist data from Montréal. Here's the [original page](http://donnees.ville.montreal.qc.ca/dataset/velos-comptage) (in French), but it's already included in this repository. We're using the data from 2012.\n",
"\n",
"This dataset lists how many people were on each of 7 different bike paths in Montréal on each day."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The file is not UTF-8, so a plain read_csv call raises UnicodeDecodeError.\n",
"# Passing an encoding lets it load, but the default ',' separator is still wrong here.\n",
"broken_df = pd.read_csv('data/bikes.csv', encoding='latin1')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Look at the first 3 rows\n",
"broken_df[:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll notice that this is totally broken! `read_csv` has a bunch of options that will let us fix that, though. Here we'll\n",
"\n",
"* Change the column separator to a `;`\n",
"* Set the encoding to `'latin1'` (the default is `'utf8'`)\n",
"* Parse the dates in the 'Date' column\n",
"* Tell it that our dates have the day first instead of the month first\n",
"* Set the index to be the 'Date' column"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Berri 1</th>\n",
" <th>Brébeuf (données non disponibles)</th>\n",
" <th>Côte-Sainte-Catherine</th>\n",
" <th>Maisonneuve 1</th>\n",
" <th>Maisonneuve 2</th>\n",
" <th>du Parc</th>\n",
" <th>Pierre-Dupuy</th>\n",
" <th>Rachel1</th>\n",
" <th>St-Urbain (données non disponibles)</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2012-01-01</th>\n",
" <td>35</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>38</td>\n",
" <td>51</td>\n",
" <td>26</td>\n",
" <td>10</td>\n",
" <td>16</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-01-02</th>\n",
" <td>83</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>68</td>\n",
" <td>153</td>\n",
" <td>53</td>\n",
" <td>6</td>\n",
" <td>43</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-01-03</th>\n",
" <td>135</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>104</td>\n",
" <td>248</td>\n",
" <td>89</td>\n",
" <td>3</td>\n",
" <td>58</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Berri 1 Brébeuf (données non disponibles) Côte-Sainte-Catherine \\\n",
"Date \n",
"2012-01-01 35 NaN 0 \n",
"2012-01-02 83 NaN 1 \n",
"2012-01-03 135 NaN 2 \n",
"\n",
" Maisonneuve 1 Maisonneuve 2 du Parc Pierre-Dupuy Rachel1 \\\n",
"Date \n",
"2012-01-01 38 51 26 10 16 \n",
"2012-01-02 68 153 53 6 43 \n",
"2012-01-03 104 248 89 3 58 \n",
"\n",
" St-Urbain (données non disponibles) \n",
"Date \n",
"2012-01-01 NaN \n",
"2012-01-02 NaN \n",
"2012-01-03 NaN "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fixed_df = pd.read_csv('data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')\n",
"fixed_df[:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Selecting a column"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you read a CSV, you get a kind of object called a `DataFrame`, which is made up of rows and columns. You get columns out of a DataFrame the same way you get elements out of a dictionary.\n",
"\n",
"Here's an example:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Date\n",
"2012-01-01 35\n",
"2012-01-02 83\n",
"2012-01-03 135\n",
"2012-01-04 144\n",
"2012-01-05 197\n",
" ... \n",
"2012-11-01 2405\n",
"2012-11-02 1582\n",
"2012-11-03 844\n",
"2012-11-04 966\n",
"2012-11-05 2247\n",
"Name: Berri 1, Length: 310, dtype: int64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fixed_df['Berri 1']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also look at some basic statistics of each column."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Berri 1</th>\n",
" <th>Brébeuf (données non disponibles)</th>\n",
" <th>Côte-Sainte-Catherine</th>\n",
" <th>Maisonneuve 1</th>\n",
" <th>Maisonneuve 2</th>\n",
" <th>du Parc</th>\n",
" <th>Pierre-Dupuy</th>\n",
" <th>Rachel1</th>\n",
" <th>St-Urbain (données non disponibles)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>310.000000</td>\n",
" <td>0.0</td>\n",
" <td>310.000000</td>\n",
" <td>310.000000</td>\n",
" <td>310.000000</td>\n",
" <td>310.000000</td>\n",
" <td>310.000000</td>\n",
" <td>310.000000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>2985.048387</td>\n",
" <td>NaN</td>\n",
" <td>1233.351613</td>\n",
" <td>1983.325806</td>\n",
" <td>3510.261290</td>\n",
" <td>1862.983871</td>\n",
" <td>1054.306452</td>\n",
" <td>2873.483871</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>2169.271062</td>\n",
" <td>NaN</td>\n",
" <td>944.643188</td>\n",
" <td>1450.715170</td>\n",
" <td>2484.959789</td>\n",
" <td>1332.543266</td>\n",
" <td>1064.029205</td>\n",
" <td>2039.315504</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>32.000000</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>33.000000</td>\n",
" <td>47.000000</td>\n",
" <td>18.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>596.000000</td>\n",
" <td>NaN</td>\n",
" <td>243.250000</td>\n",
" <td>427.000000</td>\n",
" <td>831.000000</td>\n",
" <td>474.750000</td>\n",
" <td>53.250000</td>\n",
" <td>731.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>3128.000000</td>\n",
" <td>NaN</td>\n",
" <td>1269.000000</td>\n",
" <td>2019.500000</td>\n",
" <td>3688.500000</td>\n",
" <td>1822.500000</td>\n",
" <td>704.000000</td>\n",
" <td>3223.500000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>4973.250000</td>\n",
" <td>NaN</td>\n",
" <td>2003.000000</td>\n",
" <td>3168.250000</td>\n",
" <td>5731.750000</td>\n",
" <td>3069.000000</td>\n",
" <td>1818.500000</td>\n",
" <td>4717.250000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7077.000000</td>\n",
" <td>NaN</td>\n",
" <td>3124.000000</td>\n",
" <td>4999.000000</td>\n",
" <td>8222.000000</td>\n",
" <td>4510.000000</td>\n",
" <td>4386.000000</td>\n",
" <td>6595.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Berri 1 Brébeuf (données non disponibles) Côte-Sainte-Catherine \\\n",
"count 310.000000 0.0 310.000000 \n",
"mean 2985.048387 NaN 1233.351613 \n",
"std 2169.271062 NaN 944.643188 \n",
"min 32.000000 NaN 0.000000 \n",
"25% 596.000000 NaN 243.250000 \n",
"50% 3128.000000 NaN 1269.000000 \n",
"75% 4973.250000 NaN 2003.000000 \n",
"max 7077.000000 NaN 3124.000000 \n",
"\n",
" Maisonneuve 1 Maisonneuve 2 du Parc Pierre-Dupuy Rachel1 \\\n",
"count 310.000000 310.000000 310.000000 310.000000 310.000000 \n",
"mean 1983.325806 3510.261290 1862.983871 1054.306452 2873.483871 \n",
"std 1450.715170 2484.959789 1332.543266 1064.029205 2039.315504 \n",
"min 33.000000 47.000000 18.000000 0.000000 0.000000 \n",
"25% 427.000000 831.000000 474.750000 53.250000 731.000000 \n",
"50% 2019.500000 3688.500000 1822.500000 704.000000 3223.500000 \n",
"75% 3168.250000 5731.750000 3069.000000 1818.500000 4717.250000 \n",
"max 4999.000000 8222.000000 4510.000000 4386.000000 6595.000000 \n",
"\n",
" St-Urbain (données non disponibles) \n",
"count 0.0 \n",
"mean NaN \n",
"std NaN \n",
"min NaN \n",
"25% NaN \n",
"50% NaN \n",
"75% NaN \n",
"max NaN "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fixed_df.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.count()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.std()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.min()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.max()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.corr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plotting a column"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just add `.plot()` to the end! How could it be easier? =)\n",
"\n",
"We can see that, unsurprisingly, not many people are biking in January, February, and March."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df['Berri 1'].plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df['Berri 1'].plot(figsize=(10,4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also plot all the columns just as easily. We'll make it a little bigger, too.\n",
"You can see that it's more squished together, but all the bike paths behave basically the same -- if it's a bad day for cyclists, it's a bad day everywhere."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.plot(figsize=(15, 5))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.rolling(window=15).mean().plot(figsize=(15,5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if we want to know the cumulative number of people who have biked the path instead of just the daily number?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# DataFrame.cumsum computes the running total down each column\n",
"cumulative_df = fixed_df.cumsum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cumulative_df.plot(figsize=(15,5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if we wanted to know the distribution of visits?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2d95e58208>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92f1d0b8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92eca320>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92ef17f0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92e98cf8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92ec82b0>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92e6e828>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92e15dd8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f2d92e15e10>]],\n",
" dtype=object)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 936x936 with 9 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fixed_df.hist(figsize=(13,13))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see here that most bike paths get very few visitors on most days."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fixed_df.columns"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
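
The `read_csv` options used in the notebook can be tried without `data/bikes.csv`. The sketch below is only an illustration under stated assumptions: `raw_bytes` is a hypothetical three-row sample built to mimic the file's layout (`;` separators, day-first dates, Latin-1 bytes), reusing the first three values shown for `Berri 1` and `Côte-Sainte-Catherine`. The names `raw_bytes`, `sample_df`, `berri`, `cumulative`, and `decode_error` are made up for this example.

```python
import io

import pandas as pd

# Hypothetical sample in the same shape as data/bikes.csv:
# ';' separators, day-first dates, Latin-1 encoded bytes.
raw_bytes = (
    "Date;Berri 1;Côte-Sainte-Catherine\n"
    "01/01/2012;35;0\n"
    "02/01/2012;83;1\n"
    "03/01/2012;135;2\n"
).encode("latin1")

# Decoding the same bytes as UTF-8 fails, which is the kind of error
# a plain pd.read_csv call runs into on a Latin-1 file.
decode_error = None
try:
    raw_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc

# The same options the notebook passes to read_csv.
sample_df = pd.read_csv(
    io.BytesIO(raw_bytes),
    sep=";",               # fields are separated by semicolons
    encoding="latin1",     # the bytes are Latin-1, not UTF-8
    parse_dates=["Date"],  # turn the 'Date' strings into timestamps
    dayfirst=True,         # 02/01/2012 means 2 January, not 1 February
    index_col="Date",      # use the parsed dates as the index
)

# Column selection works like a dictionary lookup.
berri = sample_df["Berri 1"]

# Running total down each column.
cumulative = sample_df.cumsum()

print(sample_df)
print(cumulative)
```

Reading from `io.BytesIO` rather than a path keeps the example self-contained; with the real file, the first argument would be `'data/bikes.csv'` as in the notebook.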