@mattharrison
Last active February 14, 2024 20:58
Idiomatic Pandas: 5 tips for better pandas code
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Idiomatic Pandas\n",
"## 5 Tips for Better Pandas Code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## About Matt Harrison @\\_\\_mharrison\\_\\_\n",
"\n",
"* Author of Pandas 1.x Cookbook, Machine Learning Pocket Reference, and Learning the Pandas Library.\n",
"* Corporate trainer at MetaSnake. Has taught pandas to thousands of students."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://d31ezp3r8jwmks.cloudfront.net/bqr8mz52iamflifihoc7086xc79c\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Practice this on your data!\n",
"* Four two-hour sessions\n",
"* Coming soon!\n",
"* Follow on Twitter @\\_\\_mharrison\\_\\_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outline\n",
"\n",
"* Load Data\n",
"* Types\n",
"* Chaining\n",
"* Mutation\n",
"* Apply\n",
"* Aggregation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from IPython.display import display\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.3.2'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.__version__"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"pd.options.display.min_rows = 20"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/matt/envs/menv/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3418: DtypeWarning: Columns (68,70,71,72,73,74,76,79) have mixed types.Specify dtype option on import or set low_memory=False.\n",
" exec(code_obj, self.user_global_ns, self.user_ns)\n"
]
}
],
"source": [
"autos = pd.read_csv('https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip')"
]
},
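{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `DtypeWarning` above says pandas inferred more than one type for some columns while reading the file in chunks. The cell below is a minimal sketch of one possible way to avoid that, not something from the original talk: passing `low_memory=False` asks `read_csv` to infer each column from the whole file. The names `url` and `autos_full` are hypothetical and are not used elsewhere in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Sketch only: same dataset URL as the cell above.\n",
"url = 'https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip'\n",
"# low_memory=False parses the file in one pass, so each column gets a single inferred dtype.\n",
"autos_full = pd.read_csv(url, low_memory=False)\n",
"# Columns 68 and 70 were among those flagged in the warning.\n",
"autos_full.iloc[:, [68, 70]].dtypes"
]
},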
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>barrels08</th>\n",
" <th>barrelsA08</th>\n",
" <th>charge120</th>\n",
" <th>charge240</th>\n",
" <th>city08</th>\n",
" <th>city08U</th>\n",
" <th>cityA08</th>\n",
" <th>cityA08U</th>\n",
" <th>cityCD</th>\n",
" <th>cityE</th>\n",
" <th>...</th>\n",
" <th>mfrCode</th>\n",
" <th>c240Dscr</th>\n",
" <th>charge240b</th>\n",
" <th>c240bDscr</th>\n",
" <th>createdOn</th>\n",
" <th>modifiedOn</th>\n",
" <th>startStop</th>\n",
" <th>phevCity</th>\n",
" <th>phevHwy</th>\n",
" <th>phevComb</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.695714</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>19</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>29.964545</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>9</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>12.207778</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>23</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>29.964545</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>10</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17.347895</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>17</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>14.982273</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>21</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>13.184400</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>22</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>13.733750</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>23</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>12.677308</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>23</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>13.184400</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>23</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>16.480500</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>18</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>12.677308</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>23</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>13.733750</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>21</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>11.771786</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>24</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>13.184400</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>21</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>14.982273</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>19</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>14.330870</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>20</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>15.695714</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>18</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>15.695714</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>18</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>18.311667</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>16</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 83 columns</p>\n",
"</div>"
],
"text/plain": [
" barrels08 barrelsA08 charge120 charge240 city08 city08U cityA08 \\\n",
"0 15.695714 0.0 0.0 0.0 19 0.0 0 \n",
"1 29.964545 0.0 0.0 0.0 9 0.0 0 \n",
"2 12.207778 0.0 0.0 0.0 23 0.0 0 \n",
"3 29.964545 0.0 0.0 0.0 10 0.0 0 \n",
"4 17.347895 0.0 0.0 0.0 17 0.0 0 \n",
"5 14.982273 0.0 0.0 0.0 21 0.0 0 \n",
"6 13.184400 0.0 0.0 0.0 22 0.0 0 \n",
"7 13.733750 0.0 0.0 0.0 23 0.0 0 \n",
"8 12.677308 0.0 0.0 0.0 23 0.0 0 \n",
"9 13.184400 0.0 0.0 0.0 23 0.0 0 \n",
"... ... ... ... ... ... ... ... \n",
"41134 16.480500 0.0 0.0 0.0 18 0.0 0 \n",
"41135 12.677308 0.0 0.0 0.0 23 0.0 0 \n",
"41136 13.733750 0.0 0.0 0.0 21 0.0 0 \n",
"41137 11.771786 0.0 0.0 0.0 24 0.0 0 \n",
"41138 13.184400 0.0 0.0 0.0 21 0.0 0 \n",
"41139 14.982273 0.0 0.0 0.0 19 0.0 0 \n",
"41140 14.330870 0.0 0.0 0.0 20 0.0 0 \n",
"41141 15.695714 0.0 0.0 0.0 18 0.0 0 \n",
"41142 15.695714 0.0 0.0 0.0 18 0.0 0 \n",
"41143 18.311667 0.0 0.0 0.0 16 0.0 0 \n",
"\n",
" cityA08U cityCD cityE ... mfrCode c240Dscr charge240b c240bDscr \\\n",
"0 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"1 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"2 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"3 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"4 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"5 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"6 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"7 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"8 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"9 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"... ... ... ... ... ... ... ... ... \n",
"41134 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41135 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41136 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41137 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41138 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41139 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41140 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41141 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41142 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"41143 0.0 0.0 0.0 ... NaN NaN 0.0 NaN \n",
"\n",
" createdOn modifiedOn startStop \\\n",
"0 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"1 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"2 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"3 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"4 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"5 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"6 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"7 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"8 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"9 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"... ... ... ... \n",
"41134 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41135 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41136 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41137 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41138 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41139 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41140 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41141 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41142 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"41143 Tue Jan 01 00:00:00 EST 2013 Tue Jan 01 00:00:00 EST 2013 NaN \n",
"\n",
" phevCity phevHwy phevComb \n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"5 0 0 0 \n",
"6 0 0 0 \n",
"7 0 0 0 \n",
"8 0 0 0 \n",
"9 0 0 0 \n",
"... ... ... ... \n",
"41134 0 0 0 \n",
"41135 0 0 0 \n",
"41136 0 0 0 \n",
"41137 0 0 0 \n",
"41138 0 0 0 \n",
"41139 0 0 0 \n",
"41140 0 0 0 \n",
"41141 0 0 0 \n",
"41142 0 0 0 \n",
"41143 0 0 0 \n",
"\n",
"[41144 rows x 83 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['barrels08', 'barrelsA08', 'charge120', 'charge240', 'city08',\n",
" 'city08U', 'cityA08', 'cityA08U', 'cityCD', 'cityE', 'cityUF', 'co2',\n",
" 'co2A', 'co2TailpipeAGpm', 'co2TailpipeGpm', 'comb08', 'comb08U',\n",
" 'combA08', 'combA08U', 'combE', 'combinedCD', 'combinedUF', 'cylinders',\n",
" 'displ', 'drive', 'engId', 'eng_dscr', 'feScore', 'fuelCost08',\n",
" 'fuelCostA08', 'fuelType', 'fuelType1', 'ghgScore', 'ghgScoreA',\n",
" 'highway08', 'highway08U', 'highwayA08', 'highwayA08U', 'highwayCD',\n",
" 'highwayE', 'highwayUF', 'hlv', 'hpv', 'id', 'lv2', 'lv4', 'make',\n",
" 'model', 'mpgData', 'phevBlended', 'pv2', 'pv4', 'range', 'rangeCity',\n",
" 'rangeCityA', 'rangeHwy', 'rangeHwyA', 'trany', 'UCity', 'UCityA',\n",
" 'UHighway', 'UHighwayA', 'VClass', 'year', 'youSaveSpend', 'guzzler',\n",
" 'trans_dscr', 'tCharger', 'sCharger', 'atvType', 'fuelType2', 'rangeA',\n",
" 'evMotor', 'mfrCode', 'c240Dscr', 'charge240b', 'c240bDscr',\n",
" 'createdOn', 'modifiedOn', 'startStop', 'phevCity', 'phevHwy',\n",
" 'phevComb'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Types\n",
"Getting the types right keeps the analysis correct and can reduce memory use."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', \n",
" 'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"city08 int64\n",
"comb08 int64\n",
"highway08 int64\n",
"cylinders float64\n",
"displ float64\n",
"drive object\n",
"eng_dscr object\n",
"fuelCost08 int64\n",
"make object\n",
"model object\n",
"trany object\n",
"range int64\n",
"createdOn object\n",
"year int64\n",
"dtype: object"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos[cols].dtypes"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index 128\n",
"city08 329152\n",
"comb08 329152\n",
"highway08 329152\n",
"cylinders 329152\n",
"displ 329152\n",
"drive 3028369\n",
"eng_dscr 2135693\n",
"fuelCost08 329152\n",
"make 2606267\n",
"model 2813134\n",
"trany 2933276\n",
"range 329152\n",
"createdOn 3497240\n",
"year 329152\n",
"dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos[cols].memory_usage(deep=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"19647323"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos[cols].memory_usage(deep=True).sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ints"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>18.369045</td>\n",
" <td>20.616396</td>\n",
" <td>24.504667</td>\n",
" <td>2362.335942</td>\n",
" <td>0.793506</td>\n",
" <td>2001.535266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>7.905886</td>\n",
" <td>7.674535</td>\n",
" <td>7.730364</td>\n",
" <td>654.981925</td>\n",
" <td>13.041592</td>\n",
" <td>11.142414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>6.000000</td>\n",
" <td>7.000000</td>\n",
" <td>9.000000</td>\n",
" <td>500.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1984.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>15.000000</td>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>1900.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1991.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>24.000000</td>\n",
" <td>2350.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>20.000000</td>\n",
" <td>23.000000</td>\n",
" <td>28.000000</td>\n",
" <td>2700.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>150.000000</td>\n",
" <td>136.000000</td>\n",
" <td>124.000000</td>\n",
" <td>7400.000000</td>\n",
" <td>370.000000</td>\n",
" <td>2020.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 fuelCost08 range \\\n",
"count 41144.000000 41144.000000 41144.000000 41144.000000 41144.000000 \n",
"mean 18.369045 20.616396 24.504667 2362.335942 0.793506 \n",
"std 7.905886 7.674535 7.730364 654.981925 13.041592 \n",
"min 6.000000 7.000000 9.000000 500.000000 0.000000 \n",
"25% 15.000000 17.000000 20.000000 1900.000000 0.000000 \n",
"50% 17.000000 20.000000 24.000000 2350.000000 0.000000 \n",
"75% 20.000000 23.000000 28.000000 2700.000000 0.000000 \n",
"max 150.000000 136.000000 124.000000 7400.000000 370.000000 \n",
"\n",
" year \n",
"count 41144.000000 \n",
"mean 2001.535266 \n",
"std 11.142414 \n",
"min 1984.000000 \n",
"25% 1991.000000 \n",
"50% 2002.000000 \n",
"75% 2011.000000 \n",
"max 2020.000000 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos[cols].select_dtypes(int).describe()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>18.369045</td>\n",
" <td>20.616396</td>\n",
" <td>24.504667</td>\n",
" <td>2362.335942</td>\n",
" <td>0.793506</td>\n",
" <td>2001.535266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>7.905886</td>\n",
" <td>7.674535</td>\n",
" <td>7.730364</td>\n",
" <td>654.981925</td>\n",
" <td>13.041592</td>\n",
" <td>11.142414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>6.000000</td>\n",
" <td>7.000000</td>\n",
" <td>9.000000</td>\n",
" <td>500.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1984.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>15.000000</td>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>1900.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1991.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>24.000000</td>\n",
" <td>2350.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>20.000000</td>\n",
" <td>23.000000</td>\n",
" <td>28.000000</td>\n",
" <td>2700.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>150.000000</td>\n",
" <td>136.000000</td>\n",
" <td>124.000000</td>\n",
" <td>7400.000000</td>\n",
" <td>370.000000</td>\n",
" <td>2020.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 fuelCost08 range \\\n",
"count 41144.000000 41144.000000 41144.000000 41144.000000 41144.000000 \n",
"mean 18.369045 20.616396 24.504667 2362.335942 0.793506 \n",
"std 7.905886 7.674535 7.730364 654.981925 13.041592 \n",
"min 6.000000 7.000000 9.000000 500.000000 0.000000 \n",
"25% 15.000000 17.000000 20.000000 1900.000000 0.000000 \n",
"50% 17.000000 20.000000 24.000000 2350.000000 0.000000 \n",
"75% 20.000000 23.000000 28.000000 2700.000000 0.000000 \n",
"max 150.000000 136.000000 124.000000 7400.000000 370.000000 \n",
"\n",
" year \n",
"count 41144.000000 \n",
"mean 2001.535266 \n",
"std 11.142414 \n",
"min 1984.000000 \n",
"25% 1991.000000 \n",
"50% 2002.000000 \n",
"75% 2011.000000 \n",
"max 2020.000000 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# chaining\n",
"(autos\n",
" [cols]\n",
" .select_dtypes(int)\n",
" .describe()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"iinfo(min=-128, max=127, dtype=int8)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# can comb08 be an int8?\n",
"np.iinfo(np.int8)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"iinfo(min=-32768, max=32767, dtype=int16)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.iinfo(np.int16)"
]
},
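{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `np.iinfo` limits above can be compared directly against the column maxima from the `describe()` output. The cell below is a small sketch of that check, not part of the original talk; the `maxima` Series is hypothetical and simply copies the three `max` values shown earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Maxima copied by hand from the describe() output above (illustration only).\n",
"maxima = pd.Series({'city08': 150, 'comb08': 136, 'highway08': 124})\n",
"\n",
"for int_type in (np.int8, np.int16):\n",
"    info = np.iinfo(int_type)\n",
"    # True where the column maximum lies inside the representable range.\n",
"    fits = maxima.between(info.min, info.max)\n",
"    print(np.dtype(int_type).name, fits.to_dict())"
]
},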
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>highway08</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>24.504667</td>\n",
" <td>2362.335942</td>\n",
" <td>0.793506</td>\n",
" <td>2001.535266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>7.730364</td>\n",
" <td>654.981925</td>\n",
" <td>13.041592</td>\n",
" <td>11.142414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>9.000000</td>\n",
" <td>500.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1984.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>20.000000</td>\n",
" <td>1900.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1991.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>24.000000</td>\n",
" <td>2350.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>28.000000</td>\n",
" <td>2700.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>124.000000</td>\n",
" <td>7400.000000</td>\n",
" <td>370.000000</td>\n",
" <td>2020.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" highway08 fuelCost08 range year\n",
"count 41144.000000 41144.000000 41144.000000 41144.000000\n",
"mean 24.504667 2362.335942 0.793506 2001.535266\n",
"std 7.730364 654.981925 13.041592 11.142414\n",
"min 9.000000 500.000000 0.000000 1984.000000\n",
"25% 20.000000 1900.000000 0.000000 1991.000000\n",
"50% 24.000000 2350.000000 0.000000 2002.000000\n",
"75% 28.000000 2700.000000 0.000000 2011.000000\n",
"max 124.000000 7400.000000 370.000000 2020.000000"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# chaining\n",
"(autos\n",
" [cols]\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16' })\n",
" .select_dtypes([int, 'int8'])\n",
" .describe()\n",
")"
]
},
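{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what a narrower integer type saves, memory can be measured before and after `.astype`. The cell below is a minimal sketch on a made-up frame; the `demo` data is invented for illustration and is not taken from `autos`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Invented data: 1,000 rows of int64 values that fit comfortably in smaller types.\n",
"demo = pd.DataFrame({'highway08': np.full(1000, 24, dtype='int64'),\n",
"                     'city08': np.full(1000, 18, dtype='int64')})\n",
"before = demo.memory_usage(deep=True)\n",
"after = demo.astype({'highway08': 'int8', 'city08': 'int16'}).memory_usage(deep=True)\n",
"# Bytes per column side by side: int64 stores 8 bytes per value, int16 stores 2, int8 stores 1.\n",
"pd.DataFrame({'before': before, 'after': after})"
]
},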
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>18.369045</td>\n",
" <td>20.616396</td>\n",
" <td>24.504667</td>\n",
" <td>2362.335942</td>\n",
" <td>0.793506</td>\n",
" <td>2001.535266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>7.905886</td>\n",
" <td>7.674535</td>\n",
" <td>7.730364</td>\n",
" <td>654.981925</td>\n",
" <td>13.041592</td>\n",
" <td>11.142414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>6.000000</td>\n",
" <td>7.000000</td>\n",
" <td>9.000000</td>\n",
" <td>500.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1984.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>15.000000</td>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>1900.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1991.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>24.000000</td>\n",
" <td>2350.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>20.000000</td>\n",
" <td>23.000000</td>\n",
" <td>28.000000</td>\n",
" <td>2700.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>150.000000</td>\n",
" <td>136.000000</td>\n",
" <td>124.000000</td>\n",
" <td>7400.000000</td>\n",
" <td>370.000000</td>\n",
" <td>2020.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 fuelCost08 range \\\n",
"count 41144.000000 41144.000000 41144.000000 41144.000000 41144.000000 \n",
"mean 18.369045 20.616396 24.504667 2362.335942 0.793506 \n",
"std 7.905886 7.674535 7.730364 654.981925 13.041592 \n",
"min 6.000000 7.000000 9.000000 500.000000 0.000000 \n",
"25% 15.000000 17.000000 20.000000 1900.000000 0.000000 \n",
"50% 17.000000 20.000000 24.000000 2350.000000 0.000000 \n",
"75% 20.000000 23.000000 28.000000 2700.000000 0.000000 \n",
"max 150.000000 136.000000 124.000000 7400.000000 370.000000 \n",
"\n",
" year \n",
"count 41144.000000 \n",
"mean 2001.535266 \n",
"std 11.142414 \n",
"min 1984.000000 \n",
"25% 1991.000000 \n",
"50% 2002.000000 \n",
"75% 2011.000000 \n",
"max 2020.000000 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# chaining\n",
"# use 'integer' to see all int-like columns\n",
"(autos\n",
" [cols]\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16'})\n",
" .select_dtypes(['integer']) # see https://numpy.org/doc/stable/reference/arrays.scalars.html\n",
" .describe()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"18124995"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# chaining\n",
"(autos\n",
" [cols]\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16'})\n",
" .memory_usage(deep=True)\n",
" .sum() # was 19,647,323\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Floats"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4.0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12.0</td>\n",
" <td>4.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>8.0</td>\n",
" <td>5.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>4.0</td>\n",
" <td>1.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>4.0</td>\n",
" <td>1.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>4.0</td>\n",
" <td>1.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4.0</td>\n",
" <td>1.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>4.0</td>\n",
" <td>1.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>4.0</td>\n",
" <td>2.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" cylinders displ\n",
"0 4.0 2.0\n",
"1 12.0 4.9\n",
"2 4.0 2.2\n",
"3 8.0 5.2\n",
"4 4.0 2.2\n",
"5 4.0 1.8\n",
"6 4.0 1.8\n",
"7 4.0 1.6\n",
"8 4.0 1.6\n",
"9 4.0 1.8\n",
"... ... ...\n",
"41134 4.0 2.1\n",
"41135 4.0 1.9\n",
"41136 4.0 1.9\n",
"41137 4.0 1.9\n",
"41138 4.0 1.9\n",
"41139 4.0 2.2\n",
"41140 4.0 2.2\n",
"41141 4.0 2.2\n",
"41142 4.0 2.2\n",
"41143 4.0 2.2\n",
"\n",
"[41144 rows x 2 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(autos\n",
" [cols]\n",
" .select_dtypes('float')\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 40938.000000\n",
"mean 5.717084\n",
"std 1.755517\n",
"min 2.000000\n",
"25% 4.000000\n",
"50% 6.000000\n",
"75% 6.000000\n",
"max 16.000000\n",
"Name: cylinders, dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# surprise! cylinders looks int-like but is stored as float64\n",
"autos.cylinders.describe()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4.0 15938\n",
"6.0 14284\n",
"8.0 8801\n",
"5.0 771\n",
"12.0 626\n",
"3.0 279\n",
"NaN 206\n",
"10.0 170\n",
"2.0 59\n",
"16.0 10\n",
"Name: cylinders, dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# oops! missing values\n",
"autos.cylinders.value_counts(dropna=False)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>eng_dscr</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>trany</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>7138</th>\n",
" <td>81</td>\n",
" <td>85</td>\n",
" <td>91</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>800</td>\n",
" <td>Nissan</td>\n",
" <td>Altra EV</td>\n",
" <td>NaN</td>\n",
" <td>90</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7139</th>\n",
" <td>81</td>\n",
" <td>72</td>\n",
" <td>64</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>900</td>\n",
" <td>Toyota</td>\n",
" <td>RAV4 EV</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8143</th>\n",
" <td>81</td>\n",
" <td>72</td>\n",
" <td>64</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>900</td>\n",
" <td>Toyota</td>\n",
" <td>RAV4 EV</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8144</th>\n",
" <td>74</td>\n",
" <td>65</td>\n",
" <td>58</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1000</td>\n",
" <td>Ford</td>\n",
" <td>Th!nk</td>\n",
" <td>NaN</td>\n",
" <td>29</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8146</th>\n",
" <td>45</td>\n",
" <td>39</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>1700</td>\n",
" <td>Ford</td>\n",
" <td>Explorer USPS Electric</td>\n",
" <td>NaN</td>\n",
" <td>38</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8147</th>\n",
" <td>84</td>\n",
" <td>75</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>900</td>\n",
" <td>Nissan</td>\n",
" <td>Hyper-Mini</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9212</th>\n",
" <td>87</td>\n",
" <td>78</td>\n",
" <td>69</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>850</td>\n",
" <td>Toyota</td>\n",
" <td>RAV4 EV</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9213</th>\n",
" <td>45</td>\n",
" <td>39</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>1700</td>\n",
" <td>Ford</td>\n",
" <td>Explorer USPS Electric</td>\n",
" <td>NaN</td>\n",
" <td>38</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10329</th>\n",
" <td>87</td>\n",
" <td>78</td>\n",
" <td>69</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>850</td>\n",
" <td>Toyota</td>\n",
" <td>RAV4 EV</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21413</th>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>4-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>1750</td>\n",
" <td>Subaru</td>\n",
" <td>RX Turbo</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34407</th>\n",
" <td>73</td>\n",
" <td>72</td>\n",
" <td>71</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>900</td>\n",
" <td>BYD</td>\n",
" <td>e6</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>187</td>\n",
" <td>Wed Mar 13 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34408</th>\n",
" <td>118</td>\n",
" <td>108</td>\n",
" <td>97</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>600</td>\n",
" <td>Nissan</td>\n",
" <td>Leaf (62 kW-hr battery pack)</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>226</td>\n",
" <td>Wed Mar 13 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34409</th>\n",
" <td>114</td>\n",
" <td>104</td>\n",
" <td>94</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>650</td>\n",
" <td>Nissan</td>\n",
" <td>Leaf SV/SL (62 kW-hr battery pack)</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>215</td>\n",
" <td>Wed Mar 13 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34538</th>\n",
" <td>74</td>\n",
" <td>74</td>\n",
" <td>73</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>All-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>900</td>\n",
" <td>Audi</td>\n",
" <td>e-tron</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>204</td>\n",
" <td>Tue Apr 16 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34561</th>\n",
" <td>80</td>\n",
" <td>76</td>\n",
" <td>72</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>4-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>850</td>\n",
" <td>Jaguar</td>\n",
" <td>I-Pace</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>234</td>\n",
" <td>Thu May 02 00:00:00 EDT 2019</td>\n",
" <td>2020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34563</th>\n",
" <td>138</td>\n",
" <td>131</td>\n",
" <td>124</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>500</td>\n",
" <td>Tesla</td>\n",
" <td>Model 3 Standard Range</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>220</td>\n",
" <td>Thu May 02 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34564</th>\n",
" <td>140</td>\n",
" <td>133</td>\n",
" <td>124</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>500</td>\n",
" <td>Tesla</td>\n",
" <td>Model 3 Standard Range Plus</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>240</td>\n",
" <td>Thu May 02 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34565</th>\n",
" <td>115</td>\n",
" <td>111</td>\n",
" <td>107</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>All-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>600</td>\n",
" <td>Tesla</td>\n",
" <td>Model S Long Range</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>370</td>\n",
" <td>Thu May 02 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34566</th>\n",
" <td>104</td>\n",
" <td>104</td>\n",
" <td>104</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>All-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>650</td>\n",
" <td>Tesla</td>\n",
" <td>Model S Performance (19\" Wheels)</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>345</td>\n",
" <td>Thu May 02 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34567</th>\n",
" <td>98</td>\n",
" <td>97</td>\n",
" <td>96</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>All-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>700</td>\n",
" <td>Tesla</td>\n",
" <td>Model S Performance (21\" Wheels)</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>325</td>\n",
" <td>Thu May 02 00:00:00 EDT 2019</td>\n",
" <td>2019</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>206 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ drive \\\n",
"7138 81 85 91 NaN NaN NaN \n",
"7139 81 72 64 NaN NaN 2-Wheel Drive \n",
"8143 81 72 64 NaN NaN 2-Wheel Drive \n",
"8144 74 65 58 NaN NaN NaN \n",
"8146 45 39 33 NaN NaN 2-Wheel Drive \n",
"8147 84 75 66 NaN NaN NaN \n",
"9212 87 78 69 NaN NaN 2-Wheel Drive \n",
"9213 45 39 33 NaN NaN 2-Wheel Drive \n",
"10329 87 78 69 NaN NaN 2-Wheel Drive \n",
"21413 22 24 28 NaN NaN 4-Wheel Drive \n",
"... ... ... ... ... ... ... \n",
"34407 73 72 71 NaN NaN Front-Wheel Drive \n",
"34408 118 108 97 NaN NaN Front-Wheel Drive \n",
"34409 114 104 94 NaN NaN Front-Wheel Drive \n",
"34538 74 74 73 NaN NaN All-Wheel Drive \n",
"34561 80 76 72 NaN NaN 4-Wheel Drive \n",
"34563 138 131 124 NaN NaN Rear-Wheel Drive \n",
"34564 140 133 124 NaN NaN Rear-Wheel Drive \n",
"34565 115 111 107 NaN NaN All-Wheel Drive \n",
"34566 104 104 104 NaN NaN All-Wheel Drive \n",
"34567 98 97 96 NaN NaN All-Wheel Drive \n",
"\n",
" eng_dscr fuelCost08 make model \\\n",
"7138 NaN 800 Nissan Altra EV \n",
"7139 NaN 900 Toyota RAV4 EV \n",
"8143 NaN 900 Toyota RAV4 EV \n",
"8144 NaN 1000 Ford Th!nk \n",
"8146 NaN 1700 Ford Explorer USPS Electric \n",
"8147 NaN 900 Nissan Hyper-Mini \n",
"9212 NaN 850 Toyota RAV4 EV \n",
"9213 NaN 1700 Ford Explorer USPS Electric \n",
"10329 NaN 850 Toyota RAV4 EV \n",
"21413 NaN 1750 Subaru RX Turbo \n",
"... ... ... ... ... \n",
"34407 NaN 900 BYD e6 \n",
"34408 NaN 600 Nissan Leaf (62 kW-hr battery pack) \n",
"34409 NaN 650 Nissan Leaf SV/SL (62 kW-hr battery pack) \n",
"34538 NaN 900 Audi e-tron \n",
"34561 NaN 850 Jaguar I-Pace \n",
"34563 NaN 500 Tesla Model 3 Standard Range \n",
"34564 NaN 500 Tesla Model 3 Standard Range Plus \n",
"34565 NaN 600 Tesla Model S Long Range \n",
"34566 NaN 650 Tesla Model S Performance (19\" Wheels) \n",
"34567 NaN 700 Tesla Model S Performance (21\" Wheels) \n",
"\n",
" trany range createdOn year \n",
"7138 NaN 90 Tue Jan 01 00:00:00 EST 2013 2000 \n",
"7139 NaN 88 Tue Jan 01 00:00:00 EST 2013 2000 \n",
"8143 NaN 88 Tue Jan 01 00:00:00 EST 2013 2001 \n",
"8144 NaN 29 Tue Jan 01 00:00:00 EST 2013 2001 \n",
"8146 NaN 38 Tue Jan 01 00:00:00 EST 2013 2001 \n",
"8147 NaN 33 Tue Jan 01 00:00:00 EST 2013 2001 \n",
"9212 NaN 95 Tue Jan 01 00:00:00 EST 2013 2002 \n",
"9213 NaN 38 Tue Jan 01 00:00:00 EST 2013 2002 \n",
"10329 NaN 95 Tue Jan 01 00:00:00 EST 2013 2003 \n",
"21413 Manual 5-spd 0 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"... ... ... ... ... \n",
"34407 Automatic (A1) 187 Wed Mar 13 00:00:00 EDT 2019 2019 \n",
"34408 Automatic (A1) 226 Wed Mar 13 00:00:00 EDT 2019 2019 \n",
"34409 Automatic (A1) 215 Wed Mar 13 00:00:00 EDT 2019 2019 \n",
"34538 Automatic (A1) 204 Tue Apr 16 00:00:00 EDT 2019 2019 \n",
"34561 Automatic (A1) 234 Thu May 02 00:00:00 EDT 2019 2020 \n",
"34563 Automatic (A1) 220 Thu May 02 00:00:00 EDT 2019 2019 \n",
"34564 Automatic (A1) 240 Thu May 02 00:00:00 EDT 2019 2019 \n",
"34565 Automatic (A1) 370 Thu May 02 00:00:00 EDT 2019 2019 \n",
"34566 Automatic (A1) 345 Thu May 02 00:00:00 EDT 2019 2019 \n",
"34567 Automatic (A1) 325 Thu May 02 00:00:00 EDT 2019 2019 \n",
"\n",
"[206 rows x 14 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# where are they missing?\n",
"(autos\n",
" [cols]\n",
" .query('cylinders.isna()')\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>18.369045</td>\n",
" <td>20.616396</td>\n",
" <td>24.504667</td>\n",
" <td>5.688460</td>\n",
" <td>3.277904</td>\n",
" <td>2362.335942</td>\n",
" <td>0.793506</td>\n",
" <td>2001.535266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>7.905886</td>\n",
" <td>7.674535</td>\n",
" <td>7.730364</td>\n",
" <td>1.797009</td>\n",
" <td>1.373415</td>\n",
" <td>654.981925</td>\n",
" <td>13.041592</td>\n",
" <td>11.142414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>6.000000</td>\n",
" <td>7.000000</td>\n",
" <td>9.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>500.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1984.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>15.000000</td>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>4.000000</td>\n",
" <td>2.200000</td>\n",
" <td>1900.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1991.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>24.000000</td>\n",
" <td>6.000000</td>\n",
" <td>3.000000</td>\n",
" <td>2350.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>20.000000</td>\n",
" <td>23.000000</td>\n",
" <td>28.000000</td>\n",
" <td>6.000000</td>\n",
" <td>4.300000</td>\n",
" <td>2700.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>150.000000</td>\n",
" <td>136.000000</td>\n",
" <td>124.000000</td>\n",
" <td>16.000000</td>\n",
" <td>8.400000</td>\n",
" <td>7400.000000</td>\n",
" <td>370.000000</td>\n",
" <td>2020.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"count 41144.000000 41144.000000 41144.000000 41144.000000 41144.000000 \n",
"mean 18.369045 20.616396 24.504667 5.688460 3.277904 \n",
"std 7.905886 7.674535 7.730364 1.797009 1.373415 \n",
"min 6.000000 7.000000 9.000000 0.000000 0.000000 \n",
"25% 15.000000 17.000000 20.000000 4.000000 2.200000 \n",
"50% 17.000000 20.000000 24.000000 6.000000 3.000000 \n",
"75% 20.000000 23.000000 28.000000 6.000000 4.300000 \n",
"max 150.000000 136.000000 124.000000 16.000000 8.400000 \n",
"\n",
" fuelCost08 range year \n",
"count 41144.000000 41144.000000 41144.000000 \n",
"mean 2362.335942 0.793506 2001.535266 \n",
"std 654.981925 13.041592 11.142414 \n",
"min 500.000000 0.000000 1984.000000 \n",
"25% 1900.000000 0.000000 1991.000000 \n",
"50% 2350.000000 0.000000 2002.000000 \n",
"75% 2700.000000 0.000000 2011.000000 \n",
"max 7400.000000 370.000000 2020.000000 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# chaining - fill missing cylinders and displ with 0, store cylinders as int8\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0))\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', \n",
" 'fuelCost08': 'int16', 'range': 'int16', 'year': 'int16', })\n",
" .describe()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>40938.000000</td>\n",
" <td>40940.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" <td>41144.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>18.369045</td>\n",
" <td>20.616396</td>\n",
" <td>24.504667</td>\n",
" <td>5.717084</td>\n",
" <td>3.294238</td>\n",
" <td>2362.335942</td>\n",
" <td>0.793506</td>\n",
" <td>2001.535266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>7.905886</td>\n",
" <td>7.674535</td>\n",
" <td>7.730364</td>\n",
" <td>1.755517</td>\n",
" <td>1.357151</td>\n",
" <td>654.981925</td>\n",
" <td>13.041592</td>\n",
" <td>11.142414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>6.000000</td>\n",
" <td>7.000000</td>\n",
" <td>9.000000</td>\n",
" <td>2.000000</td>\n",
" <td>0.000000</td>\n",
" <td>500.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1984.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>15.000000</td>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>4.000000</td>\n",
" <td>2.200000</td>\n",
" <td>1900.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1991.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.000000</td>\n",
" <td>20.000000</td>\n",
" <td>24.000000</td>\n",
" <td>6.000000</td>\n",
" <td>3.000000</td>\n",
" <td>2350.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>20.000000</td>\n",
" <td>23.000000</td>\n",
" <td>28.000000</td>\n",
" <td>6.000000</td>\n",
" <td>4.300000</td>\n",
" <td>2700.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>150.000000</td>\n",
" <td>136.000000</td>\n",
" <td>124.000000</td>\n",
" <td>16.000000</td>\n",
" <td>8.400000</td>\n",
" <td>7400.000000</td>\n",
" <td>370.000000</td>\n",
" <td>2020.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"count 41144.000000 41144.000000 41144.000000 40938.000000 40940.000000 \n",
"mean 18.369045 20.616396 24.504667 5.717084 3.294238 \n",
"std 7.905886 7.674535 7.730364 1.755517 1.357151 \n",
"min 6.000000 7.000000 9.000000 2.000000 0.000000 \n",
"25% 15.000000 17.000000 20.000000 4.000000 2.200000 \n",
"50% 17.000000 20.000000 24.000000 6.000000 3.000000 \n",
"75% 20.000000 23.000000 28.000000 6.000000 4.300000 \n",
"max 150.000000 136.000000 124.000000 16.000000 8.400000 \n",
"\n",
" fuelCost08 range year \n",
"count 41144.000000 41144.000000 41144.000000 \n",
"mean 2362.335942 0.793506 2001.535266 \n",
"std 654.981925 13.041592 11.142414 \n",
"min 500.000000 0.000000 1984.000000 \n",
"25% 1900.000000 0.000000 1991.000000 \n",
"50% 2350.000000 0.000000 2002.000000 \n",
"75% 2700.000000 0.000000 2011.000000 \n",
"max 7400.000000 370.000000 2020.000000 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autos[cols].describe()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use np.finfo to inspect a float type's range and resolution\n",
"np.finfo(np.float16)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>eng_dscr</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>trany</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>19</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4</td>\n",
" <td>2.000000</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>12</td>\n",
" <td>4.898438</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(GUZZLER)</td>\n",
" <td>3850</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>23</td>\n",
" <td>27</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1550</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>8</td>\n",
" <td>5.199219</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>3850</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>19</td>\n",
" <td>23</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2700</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>25</td>\n",
" <td>29</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1700</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1750</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1600</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>23</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1700</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.099609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2100</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>1600</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>1750</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>1500</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>1700</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>20</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1850</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"0 19 21 25 4 2.000000 \n",
"1 9 11 14 12 4.898438 \n",
"2 23 27 33 4 2.199219 \n",
"3 10 11 12 8 5.199219 \n",
"4 17 19 23 4 2.199219 \n",
"5 21 22 24 4 1.799805 \n",
"6 22 25 29 4 1.799805 \n",
"7 23 24 26 4 1.599609 \n",
"8 23 26 31 4 1.599609 \n",
"9 23 25 30 4 1.799805 \n",
"... ... ... ... ... ... \n",
"41134 18 20 24 4 2.099609 \n",
"41135 23 26 33 4 1.900391 \n",
"41136 21 24 30 4 1.900391 \n",
"41137 24 28 33 4 1.900391 \n",
"41138 21 25 32 4 1.900391 \n",
"41139 19 22 26 4 2.199219 \n",
"41140 20 23 28 4 2.199219 \n",
"41141 18 21 24 4 2.199219 \n",
"41142 18 21 24 4 2.199219 \n",
"41143 16 18 21 4 2.199219 \n",
"\n",
" drive eng_dscr fuelCost08 make \\\n",
"0 Rear-Wheel Drive (FFS) 2000 Alfa Romeo \n",
"1 Rear-Wheel Drive (GUZZLER) 3850 Ferrari \n",
"2 Front-Wheel Drive (FFS) 1550 Dodge \n",
"3 Rear-Wheel Drive NaN 3850 Dodge \n",
"4 4-Wheel or All-Wheel Drive (FFS,TRBO) 2700 Subaru \n",
"5 Front-Wheel Drive (FFS) 1900 Subaru \n",
"6 Front-Wheel Drive (FFS) 1700 Subaru \n",
"7 Front-Wheel Drive (FFS) 1750 Toyota \n",
"8 Front-Wheel Drive (FFS) 1600 Toyota \n",
"9 Front-Wheel Drive (FFS) 1700 Toyota \n",
"... ... ... ... ... \n",
"41134 Front-Wheel Drive (FFS) 2100 Saab \n",
"41135 Front-Wheel Drive (TBI) (FFS) 1600 Saturn \n",
"41136 Front-Wheel Drive (MFI) (FFS) 1750 Saturn \n",
"41137 Front-Wheel Drive (TBI) (FFS) 1500 Saturn \n",
"41138 Front-Wheel Drive (MFI) (FFS) 1700 Saturn \n",
"41139 Front-Wheel Drive (FFS) 1900 Subaru \n",
"41140 Front-Wheel Drive (FFS) 1850 Subaru \n",
"41141 4-Wheel or All-Wheel Drive (FFS) 2000 Subaru \n",
"41142 4-Wheel or All-Wheel Drive (FFS) 2000 Subaru \n",
"41143 4-Wheel or All-Wheel Drive (FFS,TRBO) 2900 Subaru \n",
"\n",
" model trany range \\\n",
"0 Spider Veloce 2000 Manual 5-spd 0 \n",
"1 Testarossa Manual 5-spd 0 \n",
"2 Charger Manual 5-spd 0 \n",
"3 B150/B250 Wagon 2WD Automatic 3-spd 0 \n",
"4 Legacy AWD Turbo Manual 5-spd 0 \n",
"5 Loyale Automatic 3-spd 0 \n",
"6 Loyale Manual 5-spd 0 \n",
"7 Corolla Automatic 3-spd 0 \n",
"8 Corolla Manual 5-spd 0 \n",
"9 Corolla Automatic 4-spd 0 \n",
"... ... ... ... \n",
"41134 900 Manual 5-spd 0 \n",
"41135 SL Automatic 4-spd 0 \n",
"41136 SL Automatic 4-spd 0 \n",
"41137 SL Manual 5-spd 0 \n",
"41138 SL Manual 5-spd 0 \n",
"41139 Legacy Automatic 4-spd 0 \n",
"41140 Legacy Manual 5-spd 0 \n",
"41141 Legacy AWD Automatic 4-spd 0 \n",
"41142 Legacy AWD Manual 5-spd 0 \n",
"41143 Legacy AWD Turbo Automatic 4-spd 0 \n",
"\n",
" createdOn year \n",
"0 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"1 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"2 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"3 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"4 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"5 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"6 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"7 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"8 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"9 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"... ... ... \n",
"41134 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41135 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41136 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41137 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41138 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41139 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41140 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41141 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41142 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41143 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"\n",
"[41144 rows x 14 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# chaining - convert displ to float16\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'))\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16'})\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"17590123"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# new memory usage\n",
"(autos\n",
" .loc[:, cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'))\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16'})\n",
" .memory_usage(deep=True)\n",
" .sum() # was 19,647,323\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Objects"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>drive</th>\n",
" <th>eng_dscr</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>trany</th>\n",
" <th>createdOn</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(GUZZLER)</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" drive eng_dscr make \\\n",
"0 Rear-Wheel Drive (FFS) Alfa Romeo \n",
"1 Rear-Wheel Drive (GUZZLER) Ferrari \n",
"2 Front-Wheel Drive (FFS) Dodge \n",
"3 Rear-Wheel Drive NaN Dodge \n",
"4 4-Wheel or All-Wheel Drive (FFS,TRBO) Subaru \n",
"5 Front-Wheel Drive (FFS) Subaru \n",
"6 Front-Wheel Drive (FFS) Subaru \n",
"7 Front-Wheel Drive (FFS) Toyota \n",
"8 Front-Wheel Drive (FFS) Toyota \n",
"9 Front-Wheel Drive (FFS) Toyota \n",
"... ... ... ... \n",
"41134 Front-Wheel Drive (FFS) Saab \n",
"41135 Front-Wheel Drive (TBI) (FFS) Saturn \n",
"41136 Front-Wheel Drive (MFI) (FFS) Saturn \n",
"41137 Front-Wheel Drive (TBI) (FFS) Saturn \n",
"41138 Front-Wheel Drive (MFI) (FFS) Saturn \n",
"41139 Front-Wheel Drive (FFS) Subaru \n",
"41140 Front-Wheel Drive (FFS) Subaru \n",
"41141 4-Wheel or All-Wheel Drive (FFS) Subaru \n",
"41142 4-Wheel or All-Wheel Drive (FFS) Subaru \n",
"41143 4-Wheel or All-Wheel Drive (FFS,TRBO) Subaru \n",
"\n",
" model trany createdOn \n",
"0 Spider Veloce 2000 Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"1 Testarossa Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"2 Charger Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"3 B150/B250 Wagon 2WD Automatic 3-spd Tue Jan 01 00:00:00 EST 2013 \n",
"4 Legacy AWD Turbo Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"5 Loyale Automatic 3-spd Tue Jan 01 00:00:00 EST 2013 \n",
"6 Loyale Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"7 Corolla Automatic 3-spd Tue Jan 01 00:00:00 EST 2013 \n",
"8 Corolla Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"9 Corolla Automatic 4-spd Tue Jan 01 00:00:00 EST 2013 \n",
"... ... ... ... \n",
"41134 900 Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41135 SL Automatic 4-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41136 SL Automatic 4-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41137 SL Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41138 SL Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41139 Legacy Automatic 4-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41140 Legacy Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41141 Legacy AWD Automatic 4-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41142 Legacy AWD Manual 5-spd Tue Jan 01 00:00:00 EST 2013 \n",
"41143 Legacy AWD Turbo Automatic 4-spd Tue Jan 01 00:00:00 EST 2013 \n",
"\n",
"[41144 rows x 6 columns]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(autos\n",
" [cols]\n",
" .select_dtypes(object)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Front-Wheel Drive 14236\n",
"Rear-Wheel Drive 13831\n",
"4-Wheel or All-Wheel Drive 6648\n",
"All-Wheel Drive 3015\n",
"4-Wheel Drive 1460\n",
"NaN 1189\n",
"2-Wheel Drive 507\n",
"Part-time 4-Wheel Drive 258\n",
"Name: drive, dtype: int64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# looks categorical\n",
"(autos.drive.value_counts(dropna=False))"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>eng_dscr</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>trany</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>7138</th>\n",
" <td>81</td>\n",
" <td>85</td>\n",
" <td>91</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>800</td>\n",
" <td>Nissan</td>\n",
" <td>Altra EV</td>\n",
" <td>NaN</td>\n",
" <td>90</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8144</th>\n",
" <td>74</td>\n",
" <td>65</td>\n",
" <td>58</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1000</td>\n",
" <td>Ford</td>\n",
" <td>Th!nk</td>\n",
" <td>NaN</td>\n",
" <td>29</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8147</th>\n",
" <td>84</td>\n",
" <td>75</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>900</td>\n",
" <td>Nissan</td>\n",
" <td>Hyper-Mini</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18217</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4.0</td>\n",
" <td>2.0</td>\n",
" <td>NaN</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18218</th>\n",
" <td>20</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4.0</td>\n",
" <td>1.5</td>\n",
" <td>NaN</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Bertone</td>\n",
" <td>X1/9</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18219</th>\n",
" <td>13</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>8.0</td>\n",
" <td>5.7</td>\n",
" <td>NaN</td>\n",
" <td>(350 V8) (FFS)</td>\n",
" <td>2800</td>\n",
" <td>Chevrolet</td>\n",
" <td>Corvette</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18220</th>\n",
" <td>13</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>8.0</td>\n",
" <td>5.7</td>\n",
" <td>NaN</td>\n",
" <td>(350 V8) (FFS)</td>\n",
" <td>2800</td>\n",
" <td>Chevrolet</td>\n",
" <td>Corvette</td>\n",
" <td>Manual 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18221</th>\n",
" <td>15</td>\n",
" <td>17</td>\n",
" <td>20</td>\n",
" <td>6.0</td>\n",
" <td>3.0</td>\n",
" <td>NaN</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2500</td>\n",
" <td>Nissan</td>\n",
" <td>300ZX</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18222</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>6.0</td>\n",
" <td>3.0</td>\n",
" <td>NaN</td>\n",
" <td>(FFS)</td>\n",
" <td>2350</td>\n",
" <td>Nissan</td>\n",
" <td>300ZX</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18223</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>22</td>\n",
" <td>6.0</td>\n",
" <td>3.0</td>\n",
" <td>NaN</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2350</td>\n",
" <td>Nissan</td>\n",
" <td>300ZX</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20063</th>\n",
" <td>13</td>\n",
" <td>15</td>\n",
" <td>19</td>\n",
" <td>8.0</td>\n",
" <td>5.0</td>\n",
" <td>NaN</td>\n",
" <td>(FFS) CA model</td>\n",
" <td>2800</td>\n",
" <td>Mercury</td>\n",
" <td>Grand Marquis Wagon</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20064</th>\n",
" <td>13</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>8.0</td>\n",
" <td>5.0</td>\n",
" <td>NaN</td>\n",
" <td>(GM-OLDS) CA model</td>\n",
" <td>2800</td>\n",
" <td>Oldsmobile</td>\n",
" <td>Custom Cruiser Wagon</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20065</th>\n",
" <td>14</td>\n",
" <td>16</td>\n",
" <td>19</td>\n",
" <td>8.0</td>\n",
" <td>5.0</td>\n",
" <td>NaN</td>\n",
" <td>(GM-CHEV) CA model</td>\n",
" <td>2650</td>\n",
" <td>Pontiac</td>\n",
" <td>Parisienne Wagon</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20387</th>\n",
" <td>14</td>\n",
" <td>14</td>\n",
" <td>15</td>\n",
" <td>4.0</td>\n",
" <td>2.4</td>\n",
" <td>NaN</td>\n",
" <td>(FFS) CA model</td>\n",
" <td>3000</td>\n",
" <td>Nissan</td>\n",
" <td>Pickup Cab Chassis</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1984</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21129</th>\n",
" <td>14</td>\n",
" <td>16</td>\n",
" <td>21</td>\n",
" <td>8.0</td>\n",
" <td>3.5</td>\n",
" <td>NaN</td>\n",
" <td>GUZZLER FFS,TURBO</td>\n",
" <td>3250</td>\n",
" <td>Lotus</td>\n",
" <td>Esprit V8</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23029</th>\n",
" <td>79</td>\n",
" <td>85</td>\n",
" <td>94</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Lead Acid</td>\n",
" <td>800</td>\n",
" <td>GMC</td>\n",
" <td>EV1</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>55</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23030</th>\n",
" <td>35</td>\n",
" <td>37</td>\n",
" <td>39</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NiMH</td>\n",
" <td>1750</td>\n",
" <td>GMC</td>\n",
" <td>EV1</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>105</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23032</th>\n",
" <td>49</td>\n",
" <td>48</td>\n",
" <td>46</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1400</td>\n",
" <td>Honda</td>\n",
" <td>EV Plus</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>81</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23037</th>\n",
" <td>49</td>\n",
" <td>48</td>\n",
" <td>46</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1400</td>\n",
" <td>Honda</td>\n",
" <td>EV Plus</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>81</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23040</th>\n",
" <td>102</td>\n",
" <td>98</td>\n",
" <td>94</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>650</td>\n",
" <td>MINI</td>\n",
" <td>MiniE</td>\n",
" <td>Automatic (A1)</td>\n",
" <td>100</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1189 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ drive eng_dscr \\\n",
"7138 81 85 91 NaN NaN NaN NaN \n",
"8144 74 65 58 NaN NaN NaN NaN \n",
"8147 84 75 66 NaN NaN NaN NaN \n",
"18217 18 21 25 4.0 2.0 NaN (FFS) \n",
"18218 20 22 26 4.0 1.5 NaN (FFS) \n",
"18219 13 15 20 8.0 5.7 NaN (350 V8) (FFS) \n",
"18220 13 15 20 8.0 5.7 NaN (350 V8) (FFS) \n",
"18221 15 17 20 6.0 3.0 NaN (FFS,TRBO) \n",
"18222 16 18 20 6.0 3.0 NaN (FFS) \n",
"18223 16 18 22 6.0 3.0 NaN (FFS,TRBO) \n",
"... ... ... ... ... ... ... ... \n",
"20063 13 15 19 8.0 5.0 NaN (FFS) CA model \n",
"20064 13 15 20 8.0 5.0 NaN (GM-OLDS) CA model \n",
"20065 14 16 19 8.0 5.0 NaN (GM-CHEV) CA model \n",
"20387 14 14 15 4.0 2.4 NaN (FFS) CA model \n",
"21129 14 16 21 8.0 3.5 NaN GUZZLER FFS,TURBO \n",
"23029 79 85 94 NaN NaN NaN Lead Acid \n",
"23030 35 37 39 NaN NaN NaN NiMH \n",
"23032 49 48 46 NaN NaN NaN NaN \n",
"23037 49 48 46 NaN NaN NaN NaN \n",
"23040 102 98 94 NaN NaN NaN NaN \n",
"\n",
" fuelCost08 make model trany range \\\n",
"7138 800 Nissan Altra EV NaN 90 \n",
"8144 1000 Ford Th!nk NaN 29 \n",
"8147 900 Nissan Hyper-Mini NaN 33 \n",
"18217 2000 Alfa Romeo Spider Veloce 2000 Manual 5-spd 0 \n",
"18218 1900 Bertone X1/9 Manual 5-spd 0 \n",
"18219 2800 Chevrolet Corvette Automatic 4-spd 0 \n",
"18220 2800 Chevrolet Corvette Manual 4-spd 0 \n",
"18221 2500 Nissan 300ZX Automatic 4-spd 0 \n",
"18222 2350 Nissan 300ZX Automatic 4-spd 0 \n",
"18223 2350 Nissan 300ZX Manual 5-spd 0 \n",
"... ... ... ... ... ... \n",
"20063 2800 Mercury Grand Marquis Wagon Automatic 4-spd 0 \n",
"20064 2800 Oldsmobile Custom Cruiser Wagon Automatic 4-spd 0 \n",
"20065 2650 Pontiac Parisienne Wagon Automatic 4-spd 0 \n",
"20387 3000 Nissan Pickup Cab Chassis Manual 5-spd 0 \n",
"21129 3250 Lotus Esprit V8 Manual 5-spd 0 \n",
"23029 800 GMC EV1 Automatic (A1) 55 \n",
"23030 1750 GMC EV1 Automatic (A1) 105 \n",
"23032 1400 Honda EV Plus Automatic (A1) 81 \n",
"23037 1400 Honda EV Plus Automatic (A1) 81 \n",
"23040 650 MINI MiniE Automatic (A1) 100 \n",
"\n",
" createdOn year \n",
"7138 Tue Jan 01 00:00:00 EST 2013 2000 \n",
"8144 Tue Jan 01 00:00:00 EST 2013 2001 \n",
"8147 Tue Jan 01 00:00:00 EST 2013 2001 \n",
"18217 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"18218 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"18219 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"18220 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"18221 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"18222 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"18223 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"... ... ... \n",
"20063 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"20064 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"20065 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"20387 Tue Jan 01 00:00:00 EST 2013 1984 \n",
"21129 Tue Jan 01 00:00:00 EST 2013 2002 \n",
"23029 Tue Jan 01 00:00:00 EST 2013 1999 \n",
"23030 Tue Jan 01 00:00:00 EST 2013 1999 \n",
"23032 Tue Jan 01 00:00:00 EST 2013 1999 \n",
"23037 Tue Jan 01 00:00:00 EST 2013 1998 \n",
"23040 Tue Jan 01 00:00:00 EST 2013 2008 \n",
"\n",
"[1189 rows x 14 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# where are the values missing for drive?\n",
"(autos\n",
" [cols]\n",
" .query('drive.isna()'))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"12093275"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# drive and make (in .astype) to category\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .memory_usage(deep=True)\n",
" .sum() # was 19,647,323\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Automatic 4-spd 11047\n",
"Manual 5-spd 8361\n",
"Automatic 3-spd 3151\n",
"Automatic (S6) 3106\n",
"Manual 6-spd 2757\n",
"Automatic 5-spd 2203\n",
"Automatic (S8) 1665\n",
"Automatic 6-spd 1619\n",
"Manual 4-spd 1483\n",
"Automatic (S5) 833\n",
"Automatic (variable gear ratios) 826\n",
"Automatic 7-spd 724\n",
"Automatic 8-spd 433\n",
"Automatic (AM-S7) 424\n",
"Automatic (S7) 327\n",
"Automatic 9-spd 293\n",
"Automatic (AM7) 245\n",
"Automatic (S4) 233\n",
"Automatic (AV-S6) 208\n",
"Automatic (A1) 201\n",
"Automatic (AM6) 151\n",
"Automatic (AV-S7) 139\n",
"Automatic (S10) 124\n",
"Automatic (AM-S6) 116\n",
"Manual 7-spd 114\n",
"Automatic (S9) 86\n",
"Manual 3-spd 77\n",
"Automatic (AM-S8) 60\n",
"Automatic (AV-S8) 47\n",
"Automatic 10-spd 25\n",
"Manual 4-spd Doubled 17\n",
"Automatic (AM5) 14\n",
"NaN 11\n",
"Automatic (AV-S10) 11\n",
"Automatic (AM8) 6\n",
"Automatic (AM-S9) 3\n",
"Automatic (L3) 2\n",
"Automatic (L4) 2\n",
"Name: trany, dtype: int64"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# let's inspect trany\n",
"# looks like it has two pieces of information embedded in column\n",
"(autos.trany.value_counts(dropna=False))"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"10631047"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add automatic, speeds from trany, then drop trany\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d)+').fillna('20').astype('int8')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany'])\n",
" .memory_usage(deep=True)\n",
" .sum() # was 19,647,323\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dates"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/matt/envs/menv/lib/python3.8/site-packages/dateutil/parser/_parser.py:1213: UnknownTimezoneWarning: tzname EST identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.\n",
" warnings.warn(\"tzname {tzname} identified but not understood. \"\n",
"/home/matt/envs/menv/lib/python3.8/site-packages/dateutil/parser/_parser.py:1213: UnknownTimezoneWarning: tzname EDT identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.\n",
" warnings.warn(\"tzname {tzname} identified but not understood. \"\n"
]
},
{
"data": {
"text/plain": [
"7462959"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add createdOn\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d+)', expand=False).fillna('20').astype('int8'),\n",
" createdOn=pd.to_datetime(autos.createdOn).dt.tz_localize('America/New_York')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany'])\n",
" .memory_usage(deep=True)\n",
" .sum() # was 19,647,323\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0 Tue Jan 01 00:00:00 EST 2013\n",
"1 Tue Jan 01 00:00:00 EST 2013\n",
"2 Tue Jan 01 00:00:00 EST 2013\n",
"3 Tue Jan 01 00:00:00 EST 2013\n",
"4 Tue Jan 01 00:00:00 EST 2013\n",
"5 Tue Jan 01 00:00:00 EST 2013\n",
"6 Tue Jan 01 00:00:00 EST 2013\n",
"7 Tue Jan 01 00:00:00 EST 2013\n",
"8 Tue Jan 01 00:00:00 EST 2013\n",
"9 Tue Jan 01 00:00:00 EST 2013\n",
" ... \n",
"41134 Tue Jan 01 00:00:00 EST 2013\n",
"41135 Tue Jan 01 00:00:00 EST 2013\n",
"41136 Tue Jan 01 00:00:00 EST 2013\n",
"41137 Tue Jan 01 00:00:00 EST 2013\n",
"41138 Tue Jan 01 00:00:00 EST 2013\n",
"41139 Tue Jan 01 00:00:00 EST 2013\n",
"41140 Tue Jan 01 00:00:00 EST 2013\n",
"41141 Tue Jan 01 00:00:00 EST 2013\n",
"41142 Tue Jan 01 00:00:00 EST 2013\n",
"41143 Tue Jan 01 00:00:00 EST 2013\n",
"Name: createdOn, Length: 41144, dtype: object"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# dateutil doesn't recognize the EST/EDT abbreviations; the raw column is still plain strings\n",
"autos[cols].createdOn"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"NaN 16153\n",
"(FFS) 8827\n",
"SIDI 5526\n",
"(FFS) CA model 926\n",
"(FFS) (MPFI) 734\n",
"FFV 701\n",
"(FFS,TRBO) 666\n",
"(350 V8) (FFS) 411\n",
"(GUZZLER) (FFS) 366\n",
"SOHC 354\n",
" ... \n",
"B234L/R4 (FFS,TRBO) 1\n",
"GUZZLER V8 FFS,TURBO 1\n",
"4.6M FFS MPFI 1\n",
"CNG FFS 1\n",
"POLICE FFS MPFI 1\n",
"B308E5 FFS,TURBO 1\n",
"5.4E-R FFS MPFI 1\n",
"V-6 FFS 1\n",
"(GUZZLER) (FFS) (S-CHARGE) 1\n",
"R-ENG (FFS,TRBO) 1\n",
"Name: eng_dscr, Length: 558, dtype: int64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# fix the date warnings by swapping EST/EDT for UTC offsets, then move on to eng_dscr\n",
"# http://www.fueleconomy.gov/feg/findacarhelp.shtml#trany\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d+)', expand=False).fillna('20').astype('int8'),\n",
" createdOn=pd.to_datetime(autos.createdOn.replace({' EDT': '-04:00',\n",
" ' EST': '-05:00'}, regex=True), utc=True).dt.tz_convert('America/New_York')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany'])\n",
" .eng_dscr\n",
" .value_counts(dropna=False)\n",
")"
]
},
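{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the same fix on two hypothetical strings laid out like `createdOn`: swap the `EST`/`EDT` abbreviations for numeric UTC offsets, parse everything as UTC, then convert to `America/New_York`. The sample strings are assumptions for illustration, and the call simply mirrors the one used in the chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# toy strings (an assumption for illustration) in the same layout as createdOn\n",
"raw = pd.Series(['Tue Jan 01 00:00:00 EST 2013', 'Mon Jul 01 00:00:00 EDT 2013'])\n",
"\n",
"# offsets are unambiguous, so no timezone-name lookup is needed while parsing\n",
"(pd.to_datetime(raw.replace({' EDT': '-04:00', ' EST': '-05:00'}, regex=True), utc=True)\n",
" .dt.tz_convert('America/New_York')\n",
")"
]
},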
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"6701302"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add ffs (feedback fuel system), drop eng_dscr\n",
"(autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d+)', expand=False).fillna('20').astype('int8'),\n",
" createdOn=pd.to_datetime(autos.createdOn.replace({' EDT': '-04:00',\n",
" ' EST': '-05:00'}, regex=True), utc=True).dt.tz_convert('America/New_York'),\n",
" ffs=autos.eng_dscr.str.contains('FFS')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany', 'eng_dscr'])\n",
" .memory_usage(deep=True)\n",
" .sum() # was 19,647,323\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"lines_to_next_cell": 0,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" <th>automatic</th>\n",
" <th>speeds</th>\n",
" <th>ffs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>19</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4</td>\n",
" <td>2.000000</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>12</td>\n",
" <td>4.898438</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>3850</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>23</td>\n",
" <td>27</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1550</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>8</td>\n",
" <td>5.199219</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>3850</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>19</td>\n",
" <td>23</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2700</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>25</td>\n",
" <td>29</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1750</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1600</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>23</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.099609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>2100</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1600</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1750</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1500</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>20</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1850</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 15 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"0 19 21 25 4 2.000000 \n",
"1 9 11 14 12 4.898438 \n",
"2 23 27 33 4 2.199219 \n",
"3 10 11 12 8 5.199219 \n",
"4 17 19 23 4 2.199219 \n",
"5 21 22 24 4 1.799805 \n",
"6 22 25 29 4 1.799805 \n",
"7 23 24 26 4 1.599609 \n",
"8 23 26 31 4 1.599609 \n",
"9 23 25 30 4 1.799805 \n",
"... ... ... ... ... ... \n",
"41134 18 20 24 4 2.099609 \n",
"41135 23 26 33 4 1.900391 \n",
"41136 21 24 30 4 1.900391 \n",
"41137 24 28 33 4 1.900391 \n",
"41138 21 25 32 4 1.900391 \n",
"41139 19 22 26 4 2.199219 \n",
"41140 20 23 28 4 2.199219 \n",
"41141 18 21 24 4 2.199219 \n",
"41142 18 21 24 4 2.199219 \n",
"41143 16 18 21 4 2.199219 \n",
"\n",
" drive fuelCost08 make \\\n",
"0 Rear-Wheel Drive 2000 Alfa Romeo \n",
"1 Rear-Wheel Drive 3850 Ferrari \n",
"2 Front-Wheel Drive 1550 Dodge \n",
"3 Rear-Wheel Drive 3850 Dodge \n",
"4 4-Wheel or All-Wheel Drive 2700 Subaru \n",
"5 Front-Wheel Drive 1900 Subaru \n",
"6 Front-Wheel Drive 1700 Subaru \n",
"7 Front-Wheel Drive 1750 Toyota \n",
"8 Front-Wheel Drive 1600 Toyota \n",
"9 Front-Wheel Drive 1700 Toyota \n",
"... ... ... ... \n",
"41134 Front-Wheel Drive 2100 Saab \n",
"41135 Front-Wheel Drive 1600 Saturn \n",
"41136 Front-Wheel Drive 1750 Saturn \n",
"41137 Front-Wheel Drive 1500 Saturn \n",
"41138 Front-Wheel Drive 1700 Saturn \n",
"41139 Front-Wheel Drive 1900 Subaru \n",
"41140 Front-Wheel Drive 1850 Subaru \n",
"41141 4-Wheel or All-Wheel Drive 2000 Subaru \n",
"41142 4-Wheel or All-Wheel Drive 2000 Subaru \n",
"41143 4-Wheel or All-Wheel Drive 2900 Subaru \n",
"\n",
" model range createdOn year automatic \\\n",
"0 Spider Veloce 2000 0 2013-01-01 00:00:00-05:00 1985 False \n",
"1 Testarossa 0 2013-01-01 00:00:00-05:00 1985 False \n",
"2 Charger 0 2013-01-01 00:00:00-05:00 1985 False \n",
"3 B150/B250 Wagon 2WD 0 2013-01-01 00:00:00-05:00 1985 True \n",
"4 Legacy AWD Turbo 0 2013-01-01 00:00:00-05:00 1993 False \n",
"5 Loyale 0 2013-01-01 00:00:00-05:00 1993 True \n",
"6 Loyale 0 2013-01-01 00:00:00-05:00 1993 False \n",
"7 Corolla 0 2013-01-01 00:00:00-05:00 1993 True \n",
"8 Corolla 0 2013-01-01 00:00:00-05:00 1993 False \n",
"9 Corolla 0 2013-01-01 00:00:00-05:00 1993 True \n",
"... ... ... ... ... ... \n",
"41134 900 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41135 SL 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41136 SL 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41137 SL 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41138 SL 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41139 Legacy 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41140 Legacy 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41141 Legacy AWD 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41142 Legacy AWD 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41143 Legacy AWD Turbo 0 2013-01-01 00:00:00-05:00 1993 True \n",
"\n",
" speeds ffs \n",
"0 5 True \n",
"1 5 False \n",
"2 5 True \n",
"3 3 NaN \n",
"4 5 True \n",
"5 3 True \n",
"6 5 True \n",
"7 3 True \n",
"8 5 True \n",
"9 4 True \n",
"... ... ... \n",
"41134 5 True \n",
"41135 4 True \n",
"41136 4 True \n",
"41137 5 True \n",
"41138 5 True \n",
"41139 4 True \n",
"41140 5 True \n",
"41141 4 True \n",
"41142 5 True \n",
"41143 4 True \n",
"\n",
"[41144 rows x 15 columns]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a glorious function\n",
"def tweak_autos(autos):\n",
" cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', \n",
" 'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']\n",
" return (autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d+)', expand=False).fillna('20').astype('int8'),\n",
" createdOn=pd.to_datetime(autos.createdOn.replace({' EDT': '-04:00',\n",
" ' EST': '-05:00'}, regex=True), utc=True).dt.tz_convert('America/New_York'),\n",
" ffs=autos.eng_dscr.str.contains('FFS')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16',\n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany', 'eng_dscr'])\n",
" )\n",
"\n",
"tweak_autos(autos)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Chain\n",
"\n",
"Chaining is also called \"flow\" programming. Rather than creating intermediate variables, take advantage of the fact that most pandas operations return a new object, and call the next method directly on that result.\n",
"\n",
"The chain should read like a recipe of ordered steps.\n",
"\n",
"(BTW, this is actually what we did above.)\n",
"\n",
"<div class='alert alert-warning'>\n",
" Hint: Leverage <tt>.pipe</tt> if you can't find a way to chain 😉🐼💪\n",
"</div>"
]
},
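{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the `.pipe` hint, assuming a hypothetical helper `add_mpg_gap` and a toy frame that borrows two column names from `autos`. `.pipe` passes the current DataFrame as the first argument to any function, so a custom step can stay inside the chain instead of breaking it into intermediate variables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def add_mpg_gap(df, high='highway08', low='city08'):\n",
"    # hypothetical helper: a custom step with no built-in chainable method\n",
"    return df.assign(mpg_gap=df[high] - df[low])\n",
"\n",
"# toy frame (an assumption for illustration) reusing two column names from autos\n",
"toy = pd.DataFrame({'city08': [19, 9, 23], 'highway08': [25, 14, 33]})\n",
"\n",
"(toy\n",
" .query('city08 > 10')     # built-in step\n",
" .pipe(add_mpg_gap)        # custom step, still part of the chain\n",
" .sort_values('mpg_gap', ascending=False)\n",
")"
]
},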
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" <th>automatic</th>\n",
" <th>speeds</th>\n",
" <th>ffs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>19</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4</td>\n",
" <td>2.000000</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>12</td>\n",
" <td>4.898438</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>3850</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>23</td>\n",
" <td>27</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1550</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>8</td>\n",
" <td>5.199219</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>3850</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>19</td>\n",
" <td>23</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2700</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>25</td>\n",
" <td>29</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1750</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1600</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>23</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.099609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>2100</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1600</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1750</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1500</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>20</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1850</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 15 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"0 19 21 25 4 2.000000 \n",
"1 9 11 14 12 4.898438 \n",
"2 23 27 33 4 2.199219 \n",
"3 10 11 12 8 5.199219 \n",
"4 17 19 23 4 2.199219 \n",
"5 21 22 24 4 1.799805 \n",
"6 22 25 29 4 1.799805 \n",
"7 23 24 26 4 1.599609 \n",
"8 23 26 31 4 1.599609 \n",
"9 23 25 30 4 1.799805 \n",
"... ... ... ... ... ... \n",
"41134 18 20 24 4 2.099609 \n",
"41135 23 26 33 4 1.900391 \n",
"41136 21 24 30 4 1.900391 \n",
"41137 24 28 33 4 1.900391 \n",
"41138 21 25 32 4 1.900391 \n",
"41139 19 22 26 4 2.199219 \n",
"41140 20 23 28 4 2.199219 \n",
"41141 18 21 24 4 2.199219 \n",
"41142 18 21 24 4 2.199219 \n",
"41143 16 18 21 4 2.199219 \n",
"\n",
" drive fuelCost08 make \\\n",
"0 Rear-Wheel Drive 2000 Alfa Romeo \n",
"1 Rear-Wheel Drive 3850 Ferrari \n",
"2 Front-Wheel Drive 1550 Dodge \n",
"3 Rear-Wheel Drive 3850 Dodge \n",
"4 4-Wheel or All-Wheel Drive 2700 Subaru \n",
"5 Front-Wheel Drive 1900 Subaru \n",
"6 Front-Wheel Drive 1700 Subaru \n",
"7 Front-Wheel Drive 1750 Toyota \n",
"8 Front-Wheel Drive 1600 Toyota \n",
"9 Front-Wheel Drive 1700 Toyota \n",
"... ... ... ... \n",
"41134 Front-Wheel Drive 2100 Saab \n",
"41135 Front-Wheel Drive 1600 Saturn \n",
"41136 Front-Wheel Drive 1750 Saturn \n",
"41137 Front-Wheel Drive 1500 Saturn \n",
"41138 Front-Wheel Drive 1700 Saturn \n",
"41139 Front-Wheel Drive 1900 Subaru \n",
"41140 Front-Wheel Drive 1850 Subaru \n",
"41141 4-Wheel or All-Wheel Drive 2000 Subaru \n",
"41142 4-Wheel or All-Wheel Drive 2000 Subaru \n",
"41143 4-Wheel or All-Wheel Drive 2900 Subaru \n",
"\n",
" model range createdOn year automatic \\\n",
"0 Spider Veloce 2000 0 2013-01-01 00:00:00-05:00 1985 False \n",
"1 Testarossa 0 2013-01-01 00:00:00-05:00 1985 False \n",
"2 Charger 0 2013-01-01 00:00:00-05:00 1985 False \n",
"3 B150/B250 Wagon 2WD 0 2013-01-01 00:00:00-05:00 1985 True \n",
"4 Legacy AWD Turbo 0 2013-01-01 00:00:00-05:00 1993 False \n",
"5 Loyale 0 2013-01-01 00:00:00-05:00 1993 True \n",
"6 Loyale 0 2013-01-01 00:00:00-05:00 1993 False \n",
"7 Corolla 0 2013-01-01 00:00:00-05:00 1993 True \n",
"8 Corolla 0 2013-01-01 00:00:00-05:00 1993 False \n",
"9 Corolla 0 2013-01-01 00:00:00-05:00 1993 True \n",
"... ... ... ... ... ... \n",
"41134 900 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41135 SL 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41136 SL 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41137 SL 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41138 SL 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41139 Legacy 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41140 Legacy 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41141 Legacy AWD 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41142 Legacy AWD 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41143 Legacy AWD Turbo 0 2013-01-01 00:00:00-05:00 1993 True \n",
"\n",
" speeds ffs \n",
"0 5 True \n",
"1 5 False \n",
"2 5 True \n",
"3 3 NaN \n",
"4 5 True \n",
"5 3 True \n",
"6 5 True \n",
"7 3 True \n",
"8 5 True \n",
"9 4 True \n",
"... ... ... \n",
"41134 5 True \n",
"41135 4 True \n",
"41136 4 True \n",
"41137 5 True \n",
"41138 5 True \n",
"41139 4 True \n",
"41140 5 True \n",
"41141 4 True \n",
"41142 5 True \n",
"41143 4 True \n",
"\n",
"[41144 rows x 15 columns]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def tweak_autos(autos):\n",
" cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', \n",
" 'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']\n",
" return (autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d+)', expand=False).fillna('20').astype('int8'),\n",
" createdOn=pd.to_datetime(autos.createdOn.replace({' EDT': '-04:00',\n",
" ' EST': '-05:00'}, regex=True), utc=True).dt.tz_convert('America/New_York'),\n",
" ffs=autos.eng_dscr.str.contains('FFS')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany', 'eng_dscr'])\n",
" )\n",
"\n",
"tweak_autos(autos)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-39-c7c24d2ec7ba>:5: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['cylinders'] = cyls2\n",
"/home/matt/envs/menv/lib/python3.8/site-packages/pandas/core/generic.py:5516: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" self[name] = value\n",
"<ipython-input-39-c7c24d2ec7ba>:11: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['automatic'] = autos.trany.str.contains('Auto')\n",
"<ipython-input-39-c7c24d2ec7ba>:15: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['speeds'] = speedint\n",
"/home/matt/envs/menv/lib/python3.8/site-packages/dateutil/parser/_parser.py:1213: UnknownTimezoneWarning: tzname EST identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.\n",
" warnings.warn(\"tzname {tzname} identified but not understood. \"\n",
"/home/matt/envs/menv/lib/python3.8/site-packages/dateutil/parser/_parser.py:1213: UnknownTimezoneWarning: tzname EDT identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.\n",
" warnings.warn(\"tzname {tzname} identified but not understood. \"\n",
"<ipython-input-39-c7c24d2ec7ba>:17: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access\n",
" a1.ffs=autos.eng_dscr.str.contains('FFS')\n",
"<ipython-input-39-c7c24d2ec7ba>:18: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['highway08'] = autos.highway08.astype('int8')\n",
"<ipython-input-39-c7c24d2ec7ba>:19: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['city08'] = autos.city08.astype('int8')\n",
"<ipython-input-39-c7c24d2ec7ba>:20: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['comb08'] = autos.comb08.astype('int16')\n",
"<ipython-input-39-c7c24d2ec7ba>:21: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['fuelCost08'] = autos.fuelCost08.astype('int16')\n",
"<ipython-input-39-c7c24d2ec7ba>:22: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['range'] = autos.range.astype('int16')\n",
"<ipython-input-39-c7c24d2ec7ba>:23: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" a1['make'] = autos.make.astype('category')\n"
]
}
],
"source": [
"# compare chain to this mess\n",
"a1 = autos[cols]\n",
"cyls = autos.cylinders.fillna(0)\n",
"cyls2 = cyls.astype('int8')\n",
"a1['cylinders'] = cyls2\n",
"displ = a1.displ\n",
"displ2 = displ.fillna(0)\n",
"displ3 = displ2.astype('float16')\n",
"a1.displ = displ3\n",
"a1.drive = autos.drive.fillna('Other').astype('category')\n",
"a1['automatic'] = autos.trany.str.contains('Auto') \n",
"speed = autos.trany.str.extract(r'(\\d)+')\n",
"speedfill = speed.fillna('20')\n",
"speedint = speedfill.astype('int8')\n",
"a1['speeds'] = speedint\n",
"a1.createdOn=pd.to_datetime(autos.createdOn).dt.tz_localize('America/New_York')\n",
"a1.ffs=autos.eng_dscr.str.contains('FFS')\n",
"a1['highway08'] = autos.highway08.astype('int8')\n",
"a1['city08'] = autos.city08.astype('int8')\n",
"a1['comb08'] = autos.comb08.astype('int16')\n",
"a1['fuelCost08'] = autos.fuelCost08.astype('int16')\n",
"a1['range'] = autos.range.astype('int16')\n",
"a1['make'] = autos.make.astype('category')\n",
"a3 = a1.drop(columns=['trany', 'eng_dscr'])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>eng_dscr</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>trany</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" <th>automatic</th>\n",
" <th>speeds</th>\n",
" <th>ffs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>19</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4</td>\n",
" <td>2.000000</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>12</td>\n",
" <td>4.898438</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(GUZZLER)</td>\n",
" <td>3850</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>23</td>\n",
" <td>27</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1550</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>8</td>\n",
" <td>5.199219</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>3850</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>19</td>\n",
" <td>23</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2700</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>25</td>\n",
" <td>29</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1700</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1750</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1600</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>23</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1700</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.099609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2100</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>1600</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>1750</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>1500</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>1700</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>20</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1850</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 17 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"0 19 21 25 4 2.000000 \n",
"1 9 11 14 12 4.898438 \n",
"2 23 27 33 4 2.199219 \n",
"3 10 11 12 8 5.199219 \n",
"4 17 19 23 4 2.199219 \n",
"5 21 22 24 4 1.799805 \n",
"6 22 25 29 4 1.799805 \n",
"7 23 24 26 4 1.599609 \n",
"8 23 26 31 4 1.599609 \n",
"9 23 25 30 4 1.799805 \n",
"... ... ... ... ... ... \n",
"41134 18 20 24 4 2.099609 \n",
"41135 23 26 33 4 1.900391 \n",
"41136 21 24 30 4 1.900391 \n",
"41137 24 28 33 4 1.900391 \n",
"41138 21 25 32 4 1.900391 \n",
"41139 19 22 26 4 2.199219 \n",
"41140 20 23 28 4 2.199219 \n",
"41141 18 21 24 4 2.199219 \n",
"41142 18 21 24 4 2.199219 \n",
"41143 16 18 21 4 2.199219 \n",
"\n",
" drive eng_dscr fuelCost08 make \\\n",
"0 Rear-Wheel Drive (FFS) 2000 Alfa Romeo \n",
"1 Rear-Wheel Drive (GUZZLER) 3850 Ferrari \n",
"2 Front-Wheel Drive (FFS) 1550 Dodge \n",
"3 Rear-Wheel Drive NaN 3850 Dodge \n",
"4 4-Wheel or All-Wheel Drive (FFS,TRBO) 2700 Subaru \n",
"5 Front-Wheel Drive (FFS) 1900 Subaru \n",
"6 Front-Wheel Drive (FFS) 1700 Subaru \n",
"7 Front-Wheel Drive (FFS) 1750 Toyota \n",
"8 Front-Wheel Drive (FFS) 1600 Toyota \n",
"9 Front-Wheel Drive (FFS) 1700 Toyota \n",
"... ... ... ... ... \n",
"41134 Front-Wheel Drive (FFS) 2100 Saab \n",
"41135 Front-Wheel Drive (TBI) (FFS) 1600 Saturn \n",
"41136 Front-Wheel Drive (MFI) (FFS) 1750 Saturn \n",
"41137 Front-Wheel Drive (TBI) (FFS) 1500 Saturn \n",
"41138 Front-Wheel Drive (MFI) (FFS) 1700 Saturn \n",
"41139 Front-Wheel Drive (FFS) 1900 Subaru \n",
"41140 Front-Wheel Drive (FFS) 1850 Subaru \n",
"41141 4-Wheel or All-Wheel Drive (FFS) 2000 Subaru \n",
"41142 4-Wheel or All-Wheel Drive (FFS) 2000 Subaru \n",
"41143 4-Wheel or All-Wheel Drive (FFS,TRBO) 2900 Subaru \n",
"\n",
" model trany range createdOn \\\n",
"0 Spider Veloce 2000 Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"1 Testarossa Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"2 Charger Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"3 B150/B250 Wagon 2WD Automatic 3-spd 0 2013-01-01 00:00:00-05:00 \n",
"4 Legacy AWD Turbo Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"5 Loyale Automatic 3-spd 0 2013-01-01 00:00:00-05:00 \n",
"6 Loyale Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"7 Corolla Automatic 3-spd 0 2013-01-01 00:00:00-05:00 \n",
"8 Corolla Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"9 Corolla Automatic 4-spd 0 2013-01-01 00:00:00-05:00 \n",
"... ... ... ... ... \n",
"41134 900 Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"41135 SL Automatic 4-spd 0 2013-01-01 00:00:00-05:00 \n",
"41136 SL Automatic 4-spd 0 2013-01-01 00:00:00-05:00 \n",
"41137 SL Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"41138 SL Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"41139 Legacy Automatic 4-spd 0 2013-01-01 00:00:00-05:00 \n",
"41140 Legacy Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"41141 Legacy AWD Automatic 4-spd 0 2013-01-01 00:00:00-05:00 \n",
"41142 Legacy AWD Manual 5-spd 0 2013-01-01 00:00:00-05:00 \n",
"41143 Legacy AWD Turbo Automatic 4-spd 0 2013-01-01 00:00:00-05:00 \n",
"\n",
" year automatic speeds ffs \n",
"0 1985 False 5 True \n",
"1 1985 False 5 False \n",
"2 1985 False 5 True \n",
"3 1985 True 3 NaN \n",
"4 1993 False 5 True \n",
"5 1993 True 3 True \n",
"6 1993 False 5 True \n",
"7 1993 True 3 True \n",
"8 1993 False 5 True \n",
"9 1993 True 4 True \n",
"... ... ... ... ... \n",
"41134 1993 False 5 True \n",
"41135 1993 True 4 True \n",
"41136 1993 True 4 True \n",
"41137 1993 False 5 True \n",
"41138 1993 False 5 True \n",
"41139 1993 True 4 True \n",
"41140 1993 False 5 True \n",
"41141 1993 True 4 True \n",
"41142 1993 False 5 True \n",
"41143 1993 True 4 True \n",
"\n",
"[41144 rows x 17 columns]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" <th>automatic</th>\n",
" <th>speeds</th>\n",
" <th>ffs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>19</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4</td>\n",
" <td>2.000000</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>12</td>\n",
" <td>4.898438</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>3850</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>23</td>\n",
" <td>27</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1550</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>8</td>\n",
" <td>5.199219</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>3850</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1985</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>19</td>\n",
" <td>23</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2700</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>25</td>\n",
" <td>29</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1750</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>31</td>\n",
" <td>4</td>\n",
" <td>1.599609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1600</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>23</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.799805</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.099609</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>2100</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1600</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1750</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>33</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1500</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>1.900391</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1700</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>20</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>1850</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>4</td>\n",
" <td>2.199219</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>2900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>0</td>\n",
" <td>2013-01-01 00:00:00-05:00</td>\n",
" <td>1993</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 15 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"0 19 21 25 4 2.000000 \n",
"1 9 11 14 12 4.898438 \n",
"2 23 27 33 4 2.199219 \n",
"3 10 11 12 8 5.199219 \n",
"4 17 19 23 4 2.199219 \n",
"5 21 22 24 4 1.799805 \n",
"6 22 25 29 4 1.799805 \n",
"7 23 24 26 4 1.599609 \n",
"8 23 26 31 4 1.599609 \n",
"9 23 25 30 4 1.799805 \n",
"... ... ... ... ... ... \n",
"41134 18 20 24 4 2.099609 \n",
"41135 23 26 33 4 1.900391 \n",
"41136 21 24 30 4 1.900391 \n",
"41137 24 28 33 4 1.900391 \n",
"41138 21 25 32 4 1.900391 \n",
"41139 19 22 26 4 2.199219 \n",
"41140 20 23 28 4 2.199219 \n",
"41141 18 21 24 4 2.199219 \n",
"41142 18 21 24 4 2.199219 \n",
"41143 16 18 21 4 2.199219 \n",
"\n",
" drive fuelCost08 make \\\n",
"0 Rear-Wheel Drive 2000 Alfa Romeo \n",
"1 Rear-Wheel Drive 3850 Ferrari \n",
"2 Front-Wheel Drive 1550 Dodge \n",
"3 Rear-Wheel Drive 3850 Dodge \n",
"4 4-Wheel or All-Wheel Drive 2700 Subaru \n",
"5 Front-Wheel Drive 1900 Subaru \n",
"6 Front-Wheel Drive 1700 Subaru \n",
"7 Front-Wheel Drive 1750 Toyota \n",
"8 Front-Wheel Drive 1600 Toyota \n",
"9 Front-Wheel Drive 1700 Toyota \n",
"... ... ... ... \n",
"41134 Front-Wheel Drive 2100 Saab \n",
"41135 Front-Wheel Drive 1600 Saturn \n",
"41136 Front-Wheel Drive 1750 Saturn \n",
"41137 Front-Wheel Drive 1500 Saturn \n",
"41138 Front-Wheel Drive 1700 Saturn \n",
"41139 Front-Wheel Drive 1900 Subaru \n",
"41140 Front-Wheel Drive 1850 Subaru \n",
"41141 4-Wheel or All-Wheel Drive 2000 Subaru \n",
"41142 4-Wheel or All-Wheel Drive 2000 Subaru \n",
"41143 4-Wheel or All-Wheel Drive 2900 Subaru \n",
"\n",
" model range createdOn year automatic \\\n",
"0 Spider Veloce 2000 0 2013-01-01 00:00:00-05:00 1985 False \n",
"1 Testarossa 0 2013-01-01 00:00:00-05:00 1985 False \n",
"2 Charger 0 2013-01-01 00:00:00-05:00 1985 False \n",
"3 B150/B250 Wagon 2WD 0 2013-01-01 00:00:00-05:00 1985 True \n",
"4 Legacy AWD Turbo 0 2013-01-01 00:00:00-05:00 1993 False \n",
"5 Loyale 0 2013-01-01 00:00:00-05:00 1993 True \n",
"6 Loyale 0 2013-01-01 00:00:00-05:00 1993 False \n",
"7 Corolla 0 2013-01-01 00:00:00-05:00 1993 True \n",
"8 Corolla 0 2013-01-01 00:00:00-05:00 1993 False \n",
"9 Corolla 0 2013-01-01 00:00:00-05:00 1993 True \n",
"... ... ... ... ... ... \n",
"41134 900 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41135 SL 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41136 SL 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41137 SL 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41138 SL 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41139 Legacy 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41140 Legacy 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41141 Legacy AWD 0 2013-01-01 00:00:00-05:00 1993 True \n",
"41142 Legacy AWD 0 2013-01-01 00:00:00-05:00 1993 False \n",
"41143 Legacy AWD Turbo 0 2013-01-01 00:00:00-05:00 1993 True \n",
"\n",
" speeds ffs \n",
"0 5 True \n",
"1 5 False \n",
"2 5 True \n",
"3 3 NaN \n",
"4 5 True \n",
"5 3 True \n",
"6 5 True \n",
"7 3 True \n",
"8 5 True \n",
"9 4 True \n",
"... ... ... \n",
"41134 5 True \n",
"41135 4 True \n",
"41136 4 True \n",
"41137 5 True \n",
"41138 5 True \n",
"41139 4 True \n",
"41140 5 True \n",
"41141 4 True \n",
"41142 5 True \n",
"41143 4 True \n",
"\n",
"[41144 rows x 15 columns]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# easy to debug\n",
"# - assign to var (df3)\n",
"# - comment out\n",
"# - pipe to display\n",
"\n",
"\n",
"from IPython.display import display\n",
"\n",
"def get_var(df, var_name):\n",
" globals()[var_name] = df\n",
" return df\n",
"\n",
"def tweak_autos(autos):\n",
" return (autos\n",
" [cols]\n",
" # create var \n",
" .pipe(get_var, 'df3')\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d)+').fillna('20').astype('int8'), \n",
" createdOn=pd.to_datetime(autos.createdOn.replace({' EDT': '-04:00',\n",
" ' EST': '-05:00'}, regex=True), utc=True).dt.tz_convert('America/New_York'),\n",
" ffs=autos.eng_dscr.str.contains('FFS')\n",
" )\n",
" # debug pipe \n",
" .pipe(lambda df: display(df) or df)\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16', \n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany', 'eng_dscr'])\n",
" )\n",
"\n",
"tweak_autos(autos)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>drive</th>\n",
" <th>eng_dscr</th>\n",
" <th>fuelCost08</th>\n",
" <th>make</th>\n",
" <th>model</th>\n",
" <th>trany</th>\n",
" <th>range</th>\n",
" <th>createdOn</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>19</td>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>4.0</td>\n",
" <td>2.0</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Alfa Romeo</td>\n",
" <td>Spider Veloce 2000</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>14</td>\n",
" <td>12.0</td>\n",
" <td>4.9</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>(GUZZLER)</td>\n",
" <td>3850</td>\n",
" <td>Ferrari</td>\n",
" <td>Testarossa</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>23</td>\n",
" <td>27</td>\n",
" <td>33</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1550</td>\n",
" <td>Dodge</td>\n",
" <td>Charger</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>8.0</td>\n",
" <td>5.2</td>\n",
" <td>Rear-Wheel Drive</td>\n",
" <td>NaN</td>\n",
" <td>3850</td>\n",
" <td>Dodge</td>\n",
" <td>B150/B250 Wagon 2WD</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>19</td>\n",
" <td>23</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2700</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>24</td>\n",
" <td>4.0</td>\n",
" <td>1.8</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>25</td>\n",
" <td>29</td>\n",
" <td>4.0</td>\n",
" <td>1.8</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1700</td>\n",
" <td>Subaru</td>\n",
" <td>Loyale</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>4.0</td>\n",
" <td>1.6</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1750</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 3-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>31</td>\n",
" <td>4.0</td>\n",
" <td>1.6</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1600</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>23</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>4.0</td>\n",
" <td>1.8</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1700</td>\n",
" <td>Toyota</td>\n",
" <td>Corolla</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41134</th>\n",
" <td>18</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4.0</td>\n",
" <td>2.1</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2100</td>\n",
" <td>Saab</td>\n",
" <td>900</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41135</th>\n",
" <td>23</td>\n",
" <td>26</td>\n",
" <td>33</td>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>1600</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41136</th>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>1750</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41137</th>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>33</td>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(TBI) (FFS)</td>\n",
" <td>1500</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41138</th>\n",
" <td>21</td>\n",
" <td>25</td>\n",
" <td>32</td>\n",
" <td>4.0</td>\n",
" <td>1.9</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(MFI) (FFS)</td>\n",
" <td>1700</td>\n",
" <td>Saturn</td>\n",
" <td>SL</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41139</th>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>26</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41140</th>\n",
" <td>20</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>Front-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>1850</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41141</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41142</th>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>24</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS)</td>\n",
" <td>2000</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD</td>\n",
" <td>Manual 5-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41143</th>\n",
" <td>16</td>\n",
" <td>18</td>\n",
" <td>21</td>\n",
" <td>4.0</td>\n",
" <td>2.2</td>\n",
" <td>4-Wheel or All-Wheel Drive</td>\n",
" <td>(FFS,TRBO)</td>\n",
" <td>2900</td>\n",
" <td>Subaru</td>\n",
" <td>Legacy AWD Turbo</td>\n",
" <td>Automatic 4-spd</td>\n",
" <td>0</td>\n",
" <td>Tue Jan 01 00:00:00 EST 2013</td>\n",
" <td>1993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>41144 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"0 19 21 25 4.0 2.0 \n",
"1 9 11 14 12.0 4.9 \n",
"2 23 27 33 4.0 2.2 \n",
"3 10 11 12 8.0 5.2 \n",
"4 17 19 23 4.0 2.2 \n",
"5 21 22 24 4.0 1.8 \n",
"6 22 25 29 4.0 1.8 \n",
"7 23 24 26 4.0 1.6 \n",
"8 23 26 31 4.0 1.6 \n",
"9 23 25 30 4.0 1.8 \n",
"... ... ... ... ... ... \n",
"41134 18 20 24 4.0 2.1 \n",
"41135 23 26 33 4.0 1.9 \n",
"41136 21 24 30 4.0 1.9 \n",
"41137 24 28 33 4.0 1.9 \n",
"41138 21 25 32 4.0 1.9 \n",
"41139 19 22 26 4.0 2.2 \n",
"41140 20 23 28 4.0 2.2 \n",
"41141 18 21 24 4.0 2.2 \n",
"41142 18 21 24 4.0 2.2 \n",
"41143 16 18 21 4.0 2.2 \n",
"\n",
" drive eng_dscr fuelCost08 make \\\n",
"0 Rear-Wheel Drive (FFS) 2000 Alfa Romeo \n",
"1 Rear-Wheel Drive (GUZZLER) 3850 Ferrari \n",
"2 Front-Wheel Drive (FFS) 1550 Dodge \n",
"3 Rear-Wheel Drive NaN 3850 Dodge \n",
"4 4-Wheel or All-Wheel Drive (FFS,TRBO) 2700 Subaru \n",
"5 Front-Wheel Drive (FFS) 1900 Subaru \n",
"6 Front-Wheel Drive (FFS) 1700 Subaru \n",
"7 Front-Wheel Drive (FFS) 1750 Toyota \n",
"8 Front-Wheel Drive (FFS) 1600 Toyota \n",
"9 Front-Wheel Drive (FFS) 1700 Toyota \n",
"... ... ... ... ... \n",
"41134 Front-Wheel Drive (FFS) 2100 Saab \n",
"41135 Front-Wheel Drive (TBI) (FFS) 1600 Saturn \n",
"41136 Front-Wheel Drive (MFI) (FFS) 1750 Saturn \n",
"41137 Front-Wheel Drive (TBI) (FFS) 1500 Saturn \n",
"41138 Front-Wheel Drive (MFI) (FFS) 1700 Saturn \n",
"41139 Front-Wheel Drive (FFS) 1900 Subaru \n",
"41140 Front-Wheel Drive (FFS) 1850 Subaru \n",
"41141 4-Wheel or All-Wheel Drive (FFS) 2000 Subaru \n",
"41142 4-Wheel or All-Wheel Drive (FFS) 2000 Subaru \n",
"41143 4-Wheel or All-Wheel Drive (FFS,TRBO) 2900 Subaru \n",
"\n",
" model trany range \\\n",
"0 Spider Veloce 2000 Manual 5-spd 0 \n",
"1 Testarossa Manual 5-spd 0 \n",
"2 Charger Manual 5-spd 0 \n",
"3 B150/B250 Wagon 2WD Automatic 3-spd 0 \n",
"4 Legacy AWD Turbo Manual 5-spd 0 \n",
"5 Loyale Automatic 3-spd 0 \n",
"6 Loyale Manual 5-spd 0 \n",
"7 Corolla Automatic 3-spd 0 \n",
"8 Corolla Manual 5-spd 0 \n",
"9 Corolla Automatic 4-spd 0 \n",
"... ... ... ... \n",
"41134 900 Manual 5-spd 0 \n",
"41135 SL Automatic 4-spd 0 \n",
"41136 SL Automatic 4-spd 0 \n",
"41137 SL Manual 5-spd 0 \n",
"41138 SL Manual 5-spd 0 \n",
"41139 Legacy Automatic 4-spd 0 \n",
"41140 Legacy Manual 5-spd 0 \n",
"41141 Legacy AWD Automatic 4-spd 0 \n",
"41142 Legacy AWD Manual 5-spd 0 \n",
"41143 Legacy AWD Turbo Automatic 4-spd 0 \n",
"\n",
" createdOn year \n",
"0 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"1 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"2 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"3 Tue Jan 01 00:00:00 EST 2013 1985 \n",
"4 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"5 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"6 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"7 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"8 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"9 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"... ... ... \n",
"41134 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41135 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41136 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41137 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41138 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41139 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41140 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41141 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41142 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"41143 Tue Jan 01 00:00:00 EST 2013 1993 \n",
"\n",
"[41144 rows x 14 columns]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# inspect intermediate data frame\n",
"df3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Don't Mutate\n",
"\n",
"> \"you are missing the point, inplace rarely actually does something inplace, you are thinking that you are saving memory but you are not.\"\n",
">\n",
"> **jreback** - Pandas core dev\n",
"\n",
"\n",
"\n",
"https://github.com/pandas-dev/pandas/issues/16529#issuecomment-676518136\n",
"\n",
"* In general, no performance benefits\n",
"* Prohibits chaining\n",
"* ``SettingWithCopyWarning`` fun\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Don't Apply (if you can)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"def tweak_autos(autos):\n",
" return (autos\n",
" [cols]\n",
" .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),\n",
" displ=autos.displ.fillna(0).astype('float16'),\n",
" drive=autos.drive.fillna('Other').astype('category'),\n",
" automatic=autos.trany.str.contains('Auto'),\n",
" speeds=autos.trany.str.extract(r'(\\d)+').fillna('20').astype('int8'),\n",
" createdOn=pd.to_datetime(autos.createdOn.replace({' EDT': '-04:00',\n",
" ' EST': '-05:00'}, regex=True), utc=True).dt.tz_convert('America/New_York'),\n",
" ffs=autos.eng_dscr.str.contains('FFS')\n",
" )\n",
" .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16',\n",
" 'range': 'int16', 'year': 'int16', 'make': 'category'})\n",
" .drop(columns=['trany', 'eng_dscr'])\n",
" )\n",
"\n",
"\n",
"autos2 = tweak_autos(autos)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0 12.379737\n",
"1 26.135000\n",
"2 10.226739\n",
"3 23.521500\n",
"4 13.836176\n",
"5 11.200714\n",
"6 10.691591\n",
"7 10.226739\n",
"8 10.226739\n",
"9 10.226739\n",
" ... \n",
"41134 13.067500\n",
"41135 10.226739\n",
"41136 11.200714\n",
"41137 9.800625\n",
"41138 11.200714\n",
"41139 12.379737\n",
"41140 11.760750\n",
"41141 13.067500\n",
"41142 13.067500\n",
"41143 14.700938\n",
"Name: city08, Length: 41144, dtype: float64"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# try to me more Euro-centric\n",
"def to_lper100km(val):\n",
" return 235.215 / val\n",
"autos2.city08.apply(to_lper100km)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0 12.379737\n",
"1 26.135000\n",
"2 10.226739\n",
"3 23.521500\n",
"4 13.836176\n",
"5 11.200714\n",
"6 10.691591\n",
"7 10.226739\n",
"8 10.226739\n",
"9 10.226739\n",
" ... \n",
"41134 13.067500\n",
"41135 10.226739\n",
"41136 11.200714\n",
"41137 9.800625\n",
"41138 11.200714\n",
"41139 12.379737\n",
"41140 11.760750\n",
"41141 13.067500\n",
"41142 13.067500\n",
"41143 14.700938\n",
"Name: city08, Length: 41144, dtype: float64"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# this gives the sames results\n",
"235.215 / autos2.city08 "
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.07 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"autos2.city08.apply(to_lper100km)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"110 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"235.215 / autos2.city08 "
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"56.54545454545455"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# ~50x slower!\n",
"6_220 / 110"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"def is_american(val):\n",
" return val in {'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'}"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"772 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"autos2.make.apply(is_american)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"630 µs ± 32.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"autos2.make.isin({'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'})"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"autos3 = autos2.assign(make=autos2.make.astype(str))"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.42 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"# converted to string\n",
"autos3.make.isin({'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'})"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.88 ms ± 56.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"autos3.make.apply(is_american)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"def country(val):\n",
" if val in {'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'}:\n",
" return 'US'\n",
" return 'Other'"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.97 ms ± 68.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"# Might be ok for strings, since they are not vectorized...\n",
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.82 ms ± 224 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"values = {'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'}\n",
"(autos2\n",
" .assign(country='US')\n",
" .assign(country=lambda df_:df_.country.where(df_.make.isin(values), 'Other'))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.91 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"(autos2\n",
" .assign(country=np.select([autos2.make.isin({'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'})], \n",
" ['US'], 'Other'))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.91 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"(autos2\n",
" .assign(country=np.where(autos2.make.isin({'Chevrolet', 'Ford', 'Dodge', 'GMC', 'Tesla'}), \n",
" 'US', 'Other'))\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Master Aggregation\n",
"\n",
"Let's compare mileage by country by year...🤔"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>speeds</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1984</th>\n",
" <td>17.982688</td>\n",
" <td>19.881874</td>\n",
" <td>23.075356</td>\n",
" <td>5.385438</td>\n",
" <td>3.164062</td>\n",
" <td>2313.543788</td>\n",
" <td>0.000000</td>\n",
" <td>3.928208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1985</th>\n",
" <td>17.878307</td>\n",
" <td>19.808348</td>\n",
" <td>23.042328</td>\n",
" <td>5.375661</td>\n",
" <td>3.164062</td>\n",
" <td>2334.509112</td>\n",
" <td>0.000000</td>\n",
" <td>3.924750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1986</th>\n",
" <td>17.665289</td>\n",
" <td>19.550413</td>\n",
" <td>22.699174</td>\n",
" <td>5.425620</td>\n",
" <td>3.183594</td>\n",
" <td>2354.049587</td>\n",
" <td>0.000000</td>\n",
" <td>3.984298</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1987</th>\n",
" <td>17.310345</td>\n",
" <td>19.228549</td>\n",
" <td>22.445068</td>\n",
" <td>5.412189</td>\n",
" <td>3.173828</td>\n",
" <td>2403.648757</td>\n",
" <td>0.000000</td>\n",
" <td>4.037690</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1988</th>\n",
" <td>17.333628</td>\n",
" <td>19.328319</td>\n",
" <td>22.702655</td>\n",
" <td>5.461947</td>\n",
" <td>3.195312</td>\n",
" <td>2387.035398</td>\n",
" <td>0.000000</td>\n",
" <td>4.129204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1989</th>\n",
" <td>17.143972</td>\n",
" <td>19.125759</td>\n",
" <td>22.465742</td>\n",
" <td>5.488291</td>\n",
" <td>3.208984</td>\n",
" <td>2433.434519</td>\n",
" <td>0.000000</td>\n",
" <td>4.166522</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1990</th>\n",
" <td>17.033395</td>\n",
" <td>19.000928</td>\n",
" <td>22.337662</td>\n",
" <td>5.496289</td>\n",
" <td>3.216797</td>\n",
" <td>2436.178108</td>\n",
" <td>0.000000</td>\n",
" <td>4.238404</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1991</th>\n",
" <td>16.848940</td>\n",
" <td>18.825972</td>\n",
" <td>22.253534</td>\n",
" <td>5.598940</td>\n",
" <td>3.267578</td>\n",
" <td>2490.856890</td>\n",
" <td>0.000000</td>\n",
" <td>4.301237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992</th>\n",
" <td>16.805531</td>\n",
" <td>18.862623</td>\n",
" <td>22.439786</td>\n",
" <td>5.623550</td>\n",
" <td>3.275391</td>\n",
" <td>2494.736842</td>\n",
" <td>0.000000</td>\n",
" <td>4.318466</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1993</th>\n",
" <td>16.998170</td>\n",
" <td>19.104300</td>\n",
" <td>22.780421</td>\n",
" <td>5.602928</td>\n",
" <td>3.248047</td>\n",
" <td>2454.620311</td>\n",
" <td>0.000000</td>\n",
" <td>4.339433</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1994</th>\n",
" <td>16.918534</td>\n",
" <td>19.012220</td>\n",
" <td>22.725051</td>\n",
" <td>5.704684</td>\n",
" <td>3.333984</td>\n",
" <td>2461.507128</td>\n",
" <td>0.000000</td>\n",
" <td>4.332994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1995</th>\n",
" <td>16.569804</td>\n",
" <td>18.797311</td>\n",
" <td>22.671148</td>\n",
" <td>5.892451</td>\n",
" <td>3.472656</td>\n",
" <td>2497.828335</td>\n",
" <td>0.000000</td>\n",
" <td>4.356774</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1996</th>\n",
" <td>17.289780</td>\n",
" <td>19.584735</td>\n",
" <td>23.569211</td>\n",
" <td>5.627426</td>\n",
" <td>3.234375</td>\n",
" <td>2375.032342</td>\n",
" <td>0.000000</td>\n",
" <td>4.364812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997</th>\n",
" <td>17.135171</td>\n",
" <td>19.429134</td>\n",
" <td>23.451444</td>\n",
" <td>5.666667</td>\n",
" <td>3.226562</td>\n",
" <td>2405.511811</td>\n",
" <td>0.000000</td>\n",
" <td>4.402887</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1998</th>\n",
" <td>17.113300</td>\n",
" <td>19.518473</td>\n",
" <td>23.546798</td>\n",
" <td>5.633005</td>\n",
" <td>3.201172</td>\n",
" <td>2382.635468</td>\n",
" <td>0.229064</td>\n",
" <td>4.419951</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999</th>\n",
" <td>17.272300</td>\n",
" <td>19.611502</td>\n",
" <td>23.552817</td>\n",
" <td>5.667840</td>\n",
" <td>3.189453</td>\n",
" <td>2392.194836</td>\n",
" <td>0.570423</td>\n",
" <td>4.421362</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000</th>\n",
" <td>17.221429</td>\n",
" <td>19.526190</td>\n",
" <td>23.414286</td>\n",
" <td>5.713095</td>\n",
" <td>3.201172</td>\n",
" <td>2429.702381</td>\n",
" <td>0.348810</td>\n",
" <td>4.508333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2001</th>\n",
" <td>17.275521</td>\n",
" <td>19.479693</td>\n",
" <td>23.328211</td>\n",
" <td>5.720088</td>\n",
" <td>3.193359</td>\n",
" <td>2448.463227</td>\n",
" <td>0.261251</td>\n",
" <td>4.660812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2002</th>\n",
" <td>16.893333</td>\n",
" <td>19.168205</td>\n",
" <td>23.030769</td>\n",
" <td>5.827692</td>\n",
" <td>3.263672</td>\n",
" <td>2479.794872</td>\n",
" <td>0.136410</td>\n",
" <td>4.757949</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2003</th>\n",
" <td>16.780651</td>\n",
" <td>19.000958</td>\n",
" <td>22.836207</td>\n",
" <td>5.942529</td>\n",
" <td>3.357422</td>\n",
" <td>2525.574713</td>\n",
" <td>0.090996</td>\n",
" <td>4.911877</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2004</th>\n",
" <td>16.740642</td>\n",
" <td>19.067736</td>\n",
" <td>23.064171</td>\n",
" <td>5.957219</td>\n",
" <td>3.394531</td>\n",
" <td>2512.566845</td>\n",
" <td>0.000000</td>\n",
" <td>4.976827</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2005</th>\n",
" <td>16.851630</td>\n",
" <td>19.193825</td>\n",
" <td>23.297599</td>\n",
" <td>5.944254</td>\n",
" <td>3.400391</td>\n",
" <td>2518.610635</td>\n",
" <td>0.000000</td>\n",
" <td>5.192110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2006</th>\n",
" <td>16.626812</td>\n",
" <td>18.959239</td>\n",
" <td>23.048913</td>\n",
" <td>6.100543</td>\n",
" <td>3.548828</td>\n",
" <td>2539.175725</td>\n",
" <td>0.000000</td>\n",
" <td>5.315217</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2007</th>\n",
" <td>16.605684</td>\n",
" <td>18.978686</td>\n",
" <td>23.083481</td>\n",
" <td>6.166075</td>\n",
" <td>3.628906</td>\n",
" <td>2535.923623</td>\n",
" <td>0.000000</td>\n",
" <td>5.610124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2008</th>\n",
" <td>16.900590</td>\n",
" <td>19.276327</td>\n",
" <td>23.455771</td>\n",
" <td>6.192923</td>\n",
" <td>3.638672</td>\n",
" <td>2536.436394</td>\n",
" <td>0.084246</td>\n",
" <td>5.773378</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2009</th>\n",
" <td>17.334459</td>\n",
" <td>19.735642</td>\n",
" <td>24.017736</td>\n",
" <td>6.122466</td>\n",
" <td>3.625000</td>\n",
" <td>2427.027027</td>\n",
" <td>0.000000</td>\n",
" <td>6.043074</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010</th>\n",
" <td>18.105500</td>\n",
" <td>20.588819</td>\n",
" <td>24.947701</td>\n",
" <td>5.965735</td>\n",
" <td>3.501953</td>\n",
" <td>2351.082056</td>\n",
" <td>0.000000</td>\n",
" <td>6.271416</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011</th>\n",
" <td>18.669027</td>\n",
" <td>21.011504</td>\n",
" <td>25.169912</td>\n",
" <td>5.980531</td>\n",
" <td>3.521484</td>\n",
" <td>2333.982301</td>\n",
" <td>0.259292</td>\n",
" <td>6.560177</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012</th>\n",
" <td>19.362847</td>\n",
" <td>21.819444</td>\n",
" <td>26.105035</td>\n",
" <td>5.910590</td>\n",
" <td>3.460938</td>\n",
" <td>2289.973958</td>\n",
" <td>0.782118</td>\n",
" <td>6.706597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013</th>\n",
" <td>20.661318</td>\n",
" <td>23.125000</td>\n",
" <td>27.504223</td>\n",
" <td>5.762669</td>\n",
" <td>3.328125</td>\n",
" <td>2210.768581</td>\n",
" <td>1.255068</td>\n",
" <td>6.896959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td>21.033469</td>\n",
" <td>23.531429</td>\n",
" <td>27.978776</td>\n",
" <td>5.745306</td>\n",
" <td>3.289062</td>\n",
" <td>2198.040816</td>\n",
" <td>1.405714</td>\n",
" <td>6.985306</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td>21.445830</td>\n",
" <td>24.038971</td>\n",
" <td>28.586906</td>\n",
" <td>5.635230</td>\n",
" <td>3.205078</td>\n",
" <td>2148.869836</td>\n",
" <td>2.208106</td>\n",
" <td>7.035853</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016</th>\n",
" <td>22.591918</td>\n",
" <td>25.150555</td>\n",
" <td>29.606973</td>\n",
" <td>5.463550</td>\n",
" <td>3.054688</td>\n",
" <td>2091.204437</td>\n",
" <td>4.546751</td>\n",
" <td>7.080032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017</th>\n",
" <td>22.761021</td>\n",
" <td>25.249033</td>\n",
" <td>29.554524</td>\n",
" <td>5.453210</td>\n",
" <td>3.025391</td>\n",
" <td>2096.558391</td>\n",
" <td>4.336427</td>\n",
" <td>7.225058</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018</th>\n",
" <td>22.564732</td>\n",
" <td>25.019345</td>\n",
" <td>29.273065</td>\n",
" <td>5.438988</td>\n",
" <td>2.992188</td>\n",
" <td>2103.980655</td>\n",
" <td>3.519345</td>\n",
" <td>7.017113</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019</th>\n",
" <td>23.318147</td>\n",
" <td>25.627942</td>\n",
" <td>29.664389</td>\n",
" <td>5.368261</td>\n",
" <td>2.964844</td>\n",
" <td>2093.545938</td>\n",
" <td>5.565680</td>\n",
" <td>7.136674</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020</th>\n",
" <td>22.679426</td>\n",
" <td>25.267943</td>\n",
" <td>29.617225</td>\n",
" <td>5.071770</td>\n",
" <td>2.644531</td>\n",
" <td>2023.444976</td>\n",
" <td>2.282297</td>\n",
" <td>7.746411</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ fuelCost08 \\\n",
"year \n",
"1984 17.982688 19.881874 23.075356 5.385438 3.164062 2313.543788 \n",
"1985 17.878307 19.808348 23.042328 5.375661 3.164062 2334.509112 \n",
"1986 17.665289 19.550413 22.699174 5.425620 3.183594 2354.049587 \n",
"1987 17.310345 19.228549 22.445068 5.412189 3.173828 2403.648757 \n",
"1988 17.333628 19.328319 22.702655 5.461947 3.195312 2387.035398 \n",
"1989 17.143972 19.125759 22.465742 5.488291 3.208984 2433.434519 \n",
"1990 17.033395 19.000928 22.337662 5.496289 3.216797 2436.178108 \n",
"1991 16.848940 18.825972 22.253534 5.598940 3.267578 2490.856890 \n",
"1992 16.805531 18.862623 22.439786 5.623550 3.275391 2494.736842 \n",
"1993 16.998170 19.104300 22.780421 5.602928 3.248047 2454.620311 \n",
"1994 16.918534 19.012220 22.725051 5.704684 3.333984 2461.507128 \n",
"1995 16.569804 18.797311 22.671148 5.892451 3.472656 2497.828335 \n",
"1996 17.289780 19.584735 23.569211 5.627426 3.234375 2375.032342 \n",
"1997 17.135171 19.429134 23.451444 5.666667 3.226562 2405.511811 \n",
"1998 17.113300 19.518473 23.546798 5.633005 3.201172 2382.635468 \n",
"1999 17.272300 19.611502 23.552817 5.667840 3.189453 2392.194836 \n",
"2000 17.221429 19.526190 23.414286 5.713095 3.201172 2429.702381 \n",
"2001 17.275521 19.479693 23.328211 5.720088 3.193359 2448.463227 \n",
"2002 16.893333 19.168205 23.030769 5.827692 3.263672 2479.794872 \n",
"2003 16.780651 19.000958 22.836207 5.942529 3.357422 2525.574713 \n",
"2004 16.740642 19.067736 23.064171 5.957219 3.394531 2512.566845 \n",
"2005 16.851630 19.193825 23.297599 5.944254 3.400391 2518.610635 \n",
"2006 16.626812 18.959239 23.048913 6.100543 3.548828 2539.175725 \n",
"2007 16.605684 18.978686 23.083481 6.166075 3.628906 2535.923623 \n",
"2008 16.900590 19.276327 23.455771 6.192923 3.638672 2536.436394 \n",
"2009 17.334459 19.735642 24.017736 6.122466 3.625000 2427.027027 \n",
"2010 18.105500 20.588819 24.947701 5.965735 3.501953 2351.082056 \n",
"2011 18.669027 21.011504 25.169912 5.980531 3.521484 2333.982301 \n",
"2012 19.362847 21.819444 26.105035 5.910590 3.460938 2289.973958 \n",
"2013 20.661318 23.125000 27.504223 5.762669 3.328125 2210.768581 \n",
"2014 21.033469 23.531429 27.978776 5.745306 3.289062 2198.040816 \n",
"2015 21.445830 24.038971 28.586906 5.635230 3.205078 2148.869836 \n",
"2016 22.591918 25.150555 29.606973 5.463550 3.054688 2091.204437 \n",
"2017 22.761021 25.249033 29.554524 5.453210 3.025391 2096.558391 \n",
"2018 22.564732 25.019345 29.273065 5.438988 2.992188 2103.980655 \n",
"2019 23.318147 25.627942 29.664389 5.368261 2.964844 2093.545938 \n",
"2020 22.679426 25.267943 29.617225 5.071770 2.644531 2023.444976 \n",
"\n",
" range speeds \n",
"year \n",
"1984 0.000000 3.928208 \n",
"1985 0.000000 3.924750 \n",
"1986 0.000000 3.984298 \n",
"1987 0.000000 4.037690 \n",
"1988 0.000000 4.129204 \n",
"1989 0.000000 4.166522 \n",
"1990 0.000000 4.238404 \n",
"1991 0.000000 4.301237 \n",
"1992 0.000000 4.318466 \n",
"1993 0.000000 4.339433 \n",
"1994 0.000000 4.332994 \n",
"1995 0.000000 4.356774 \n",
"1996 0.000000 4.364812 \n",
"1997 0.000000 4.402887 \n",
"1998 0.229064 4.419951 \n",
"1999 0.570423 4.421362 \n",
"2000 0.348810 4.508333 \n",
"2001 0.261251 4.660812 \n",
"2002 0.136410 4.757949 \n",
"2003 0.090996 4.911877 \n",
"2004 0.000000 4.976827 \n",
"2005 0.000000 5.192110 \n",
"2006 0.000000 5.315217 \n",
"2007 0.000000 5.610124 \n",
"2008 0.084246 5.773378 \n",
"2009 0.000000 6.043074 \n",
"2010 0.000000 6.271416 \n",
"2011 0.259292 6.560177 \n",
"2012 0.782118 6.706597 \n",
"2013 1.255068 6.896959 \n",
"2014 1.405714 6.985306 \n",
"2015 2.208106 7.035853 \n",
"2016 4.546751 7.080032 \n",
"2017 4.336427 7.225058 \n",
"2018 3.519345 7.017113 \n",
"2019 5.565680 7.136674 \n",
"2020 2.282297 7.746411 "
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(autos2\n",
" .groupby('year')\n",
" .mean()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>comb08</th>\n",
" <th>speeds</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1984</th>\n",
" <td>19.881874</td>\n",
" <td>3.928208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1985</th>\n",
" <td>19.808348</td>\n",
" <td>3.924750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1986</th>\n",
" <td>19.550413</td>\n",
" <td>3.984298</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1987</th>\n",
" <td>19.228549</td>\n",
" <td>4.037690</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1988</th>\n",
" <td>19.328319</td>\n",
" <td>4.129204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1989</th>\n",
" <td>19.125759</td>\n",
" <td>4.166522</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1990</th>\n",
" <td>19.000928</td>\n",
" <td>4.238404</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1991</th>\n",
" <td>18.825972</td>\n",
" <td>4.301237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992</th>\n",
" <td>18.862623</td>\n",
" <td>4.318466</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1993</th>\n",
" <td>19.104300</td>\n",
" <td>4.339433</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1994</th>\n",
" <td>19.012220</td>\n",
" <td>4.332994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1995</th>\n",
" <td>18.797311</td>\n",
" <td>4.356774</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1996</th>\n",
" <td>19.584735</td>\n",
" <td>4.364812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997</th>\n",
" <td>19.429134</td>\n",
" <td>4.402887</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1998</th>\n",
" <td>19.518473</td>\n",
" <td>4.419951</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999</th>\n",
" <td>19.611502</td>\n",
" <td>4.421362</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000</th>\n",
" <td>19.526190</td>\n",
" <td>4.508333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2001</th>\n",
" <td>19.479693</td>\n",
" <td>4.660812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2002</th>\n",
" <td>19.168205</td>\n",
" <td>4.757949</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2003</th>\n",
" <td>19.000958</td>\n",
" <td>4.911877</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2004</th>\n",
" <td>19.067736</td>\n",
" <td>4.976827</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2005</th>\n",
" <td>19.193825</td>\n",
" <td>5.192110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2006</th>\n",
" <td>18.959239</td>\n",
" <td>5.315217</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2007</th>\n",
" <td>18.978686</td>\n",
" <td>5.610124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2008</th>\n",
" <td>19.276327</td>\n",
" <td>5.773378</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2009</th>\n",
" <td>19.735642</td>\n",
" <td>6.043074</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010</th>\n",
" <td>20.588819</td>\n",
" <td>6.271416</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011</th>\n",
" <td>21.011504</td>\n",
" <td>6.560177</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012</th>\n",
" <td>21.819444</td>\n",
" <td>6.706597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013</th>\n",
" <td>23.125000</td>\n",
" <td>6.896959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td>23.531429</td>\n",
" <td>6.985306</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td>24.038971</td>\n",
" <td>7.035853</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016</th>\n",
" <td>25.150555</td>\n",
" <td>7.080032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017</th>\n",
" <td>25.249033</td>\n",
" <td>7.225058</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018</th>\n",
" <td>25.019345</td>\n",
" <td>7.017113</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019</th>\n",
" <td>25.627942</td>\n",
" <td>7.136674</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020</th>\n",
" <td>25.267943</td>\n",
" <td>7.746411</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" comb08 speeds\n",
"year \n",
"1984 19.881874 3.928208\n",
"1985 19.808348 3.924750\n",
"1986 19.550413 3.984298\n",
"1987 19.228549 4.037690\n",
"1988 19.328319 4.129204\n",
"1989 19.125759 4.166522\n",
"1990 19.000928 4.238404\n",
"1991 18.825972 4.301237\n",
"1992 18.862623 4.318466\n",
"1993 19.104300 4.339433\n",
"1994 19.012220 4.332994\n",
"1995 18.797311 4.356774\n",
"1996 19.584735 4.364812\n",
"1997 19.429134 4.402887\n",
"1998 19.518473 4.419951\n",
"1999 19.611502 4.421362\n",
"2000 19.526190 4.508333\n",
"2001 19.479693 4.660812\n",
"2002 19.168205 4.757949\n",
"2003 19.000958 4.911877\n",
"2004 19.067736 4.976827\n",
"2005 19.193825 5.192110\n",
"2006 18.959239 5.315217\n",
"2007 18.978686 5.610124\n",
"2008 19.276327 5.773378\n",
"2009 19.735642 6.043074\n",
"2010 20.588819 6.271416\n",
"2011 21.011504 6.560177\n",
"2012 21.819444 6.706597\n",
"2013 23.125000 6.896959\n",
"2014 23.531429 6.985306\n",
"2015 24.038971 7.035853\n",
"2016 25.150555 7.080032\n",
"2017 25.249033 7.225058\n",
"2018 25.019345 7.017113\n",
"2019 25.627942 7.136674\n",
"2020 25.267943 7.746411"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# select the columns on the groupby object before aggregating,\n",
"# so only those columns are computed\n",
"(autos2\n",
" .groupby('year')\n",
" [['comb08', 'speeds']]\n",
" .mean()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.02 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"# faster: select the columns first, then aggregate\n",
"(autos2\n",
" .groupby('year')\n",
" [['comb08', 'speeds']]\n",
" .mean()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.52 ms ± 49.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"# slower: aggregate every column, then select two of them\n",
"(autos2\n",
" .groupby('year')\n",
" .mean()\n",
" [['comb08', 'speeds']]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7f16802a4fd0>]"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"# 'pandas1book' is a custom style sheet (not shipped with matplotlib)\n",
"plt.style.use('pandas1book')\n",
"sns.set_context('talk')\n",
"plt.plot(range(10))"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='year'>"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"(autos2\n",
" .groupby('year')\n",
" [['comb08', 'speeds']]\n",
" .mean()\n",
" .plot()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='year'>"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"(autos2\n",
" .groupby('year')\n",
" [['comb08', 'speeds']]\n",
" .mean()\n",
" #.median()\n",
" #.quantile(.3)\n",
" #.std()\n",
" #.var()\n",
" .plot()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>city08</th>\n",
" <th>comb08</th>\n",
" <th>highway08</th>\n",
" <th>cylinders</th>\n",
" <th>displ</th>\n",
" <th>fuelCost08</th>\n",
" <th>range</th>\n",
" <th>speeds</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th>country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1984</th>\n",
" <th>Other</th>\n",
" <td>19.384615</td>\n",
" <td>21.417330</td>\n",
" <td>24.847038</td>\n",
" <td>4.908046</td>\n",
" <td>2.691406</td>\n",
" <td>2118.125553</td>\n",
" <td>0.000000</td>\n",
" <td>3.969054</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>16.079232</td>\n",
" <td>17.797119</td>\n",
" <td>20.669868</td>\n",
" <td>6.033613</td>\n",
" <td>3.808594</td>\n",
" <td>2578.871549</td>\n",
" <td>0.000000</td>\n",
" <td>3.872749</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1985</th>\n",
" <th>Other</th>\n",
" <td>19.284768</td>\n",
" <td>21.373068</td>\n",
" <td>24.816777</td>\n",
" <td>4.871965</td>\n",
" <td>2.636719</td>\n",
" <td>2141.997792</td>\n",
" <td>0.000000</td>\n",
" <td>3.958057</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>16.275472</td>\n",
" <td>18.025157</td>\n",
" <td>21.020126</td>\n",
" <td>5.949686</td>\n",
" <td>3.765625</td>\n",
" <td>2553.899371</td>\n",
" <td>0.000000</td>\n",
" <td>3.886792</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1986</th>\n",
" <th>Other</th>\n",
" <td>19.167183</td>\n",
" <td>21.213622</td>\n",
" <td>24.650155</td>\n",
" <td>4.804954</td>\n",
" <td>2.537109</td>\n",
" <td>2149.148607</td>\n",
" <td>0.000000</td>\n",
" <td>4.069659</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>15.945035</td>\n",
" <td>17.645390</td>\n",
" <td>20.464539</td>\n",
" <td>6.136525</td>\n",
" <td>3.925781</td>\n",
" <td>2588.741135</td>\n",
" <td>0.000000</td>\n",
" <td>3.886525</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1987</th>\n",
" <th>Other</th>\n",
" <td>18.633381</td>\n",
" <td>20.710414</td>\n",
" <td>24.186876</td>\n",
" <td>4.825963</td>\n",
" <td>2.583984</td>\n",
" <td>2227.318117</td>\n",
" <td>0.000000</td>\n",
" <td>4.142653</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>15.611722</td>\n",
" <td>17.326007</td>\n",
" <td>20.208791</td>\n",
" <td>6.164835</td>\n",
" <td>3.931641</td>\n",
" <td>2630.036630</td>\n",
" <td>0.000000</td>\n",
" <td>3.902930</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1988</th>\n",
" <th>Other</th>\n",
" <td>18.668224</td>\n",
" <td>20.814642</td>\n",
" <td>24.437695</td>\n",
" <td>4.819315</td>\n",
" <td>2.531250</td>\n",
" <td>2207.476636</td>\n",
" <td>0.000000</td>\n",
" <td>4.205607</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>15.577869</td>\n",
" <td>17.372951</td>\n",
" <td>20.420082</td>\n",
" <td>6.307377</td>\n",
" <td>4.066406</td>\n",
" <td>2623.258197</td>\n",
" <td>0.000000</td>\n",
" <td>4.028689</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2016</th>\n",
" <th>Other</th>\n",
" <td>21.903749</td>\n",
" <td>24.439716</td>\n",
" <td>28.866261</td>\n",
" <td>5.493414</td>\n",
" <td>2.992188</td>\n",
" <td>2127.608916</td>\n",
" <td>1.017224</td>\n",
" <td>7.296859</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>25.061818</td>\n",
" <td>27.701818</td>\n",
" <td>32.265455</td>\n",
" <td>5.356364</td>\n",
" <td>3.277344</td>\n",
" <td>1960.545455</td>\n",
" <td>17.214545</td>\n",
" <td>6.301818</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2017</th>\n",
" <th>Other</th>\n",
" <td>22.423795</td>\n",
" <td>24.910521</td>\n",
" <td>29.208456</td>\n",
" <td>5.431662</td>\n",
" <td>2.919922</td>\n",
" <td>2114.110128</td>\n",
" <td>1.243854</td>\n",
" <td>7.474926</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>24.003623</td>\n",
" <td>26.496377</td>\n",
" <td>30.829710</td>\n",
" <td>5.532609</td>\n",
" <td>3.419922</td>\n",
" <td>2031.884058</td>\n",
" <td>15.731884</td>\n",
" <td>6.304348</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2018</th>\n",
" <th>Other</th>\n",
" <td>22.310442</td>\n",
" <td>24.779868</td>\n",
" <td>29.042333</td>\n",
" <td>5.396990</td>\n",
" <td>2.886719</td>\n",
" <td>2121.448730</td>\n",
" <td>1.135466</td>\n",
" <td>7.391345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>23.526690</td>\n",
" <td>25.925267</td>\n",
" <td>30.145907</td>\n",
" <td>5.597865</td>\n",
" <td>3.390625</td>\n",
" <td>2037.900356</td>\n",
" <td>12.537367</td>\n",
" <td>5.601423</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2019</th>\n",
" <th>Other</th>\n",
" <td>23.084221</td>\n",
" <td>25.456922</td>\n",
" <td>29.560503</td>\n",
" <td>5.315586</td>\n",
" <td>2.839844</td>\n",
" <td>2093.659245</td>\n",
" <td>2.581801</td>\n",
" <td>7.545983</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>24.169014</td>\n",
" <td>26.250000</td>\n",
" <td>30.042254</td>\n",
" <td>5.559859</td>\n",
" <td>3.419922</td>\n",
" <td>2093.133803</td>\n",
" <td>16.419014</td>\n",
" <td>5.647887</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2020</th>\n",
" <th>Other</th>\n",
" <td>22.579487</td>\n",
" <td>25.174359</td>\n",
" <td>29.543590</td>\n",
" <td>5.148718</td>\n",
" <td>2.693359</td>\n",
" <td>2050.256410</td>\n",
" <td>2.446154</td>\n",
" <td>7.743590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>24.071429</td>\n",
" <td>26.571429</td>\n",
" <td>30.642857</td>\n",
" <td>4.000000</td>\n",
" <td>1.978516</td>\n",
" <td>1650.000000</td>\n",
" <td>0.000000</td>\n",
" <td>7.785714</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>74 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 cylinders displ \\\n",
"year country \n",
"1984 Other 19.384615 21.417330 24.847038 4.908046 2.691406 \n",
" US 16.079232 17.797119 20.669868 6.033613 3.808594 \n",
"1985 Other 19.284768 21.373068 24.816777 4.871965 2.636719 \n",
" US 16.275472 18.025157 21.020126 5.949686 3.765625 \n",
"1986 Other 19.167183 21.213622 24.650155 4.804954 2.537109 \n",
" US 15.945035 17.645390 20.464539 6.136525 3.925781 \n",
"1987 Other 18.633381 20.710414 24.186876 4.825963 2.583984 \n",
" US 15.611722 17.326007 20.208791 6.164835 3.931641 \n",
"1988 Other 18.668224 20.814642 24.437695 4.819315 2.531250 \n",
" US 15.577869 17.372951 20.420082 6.307377 4.066406 \n",
"... ... ... ... ... ... \n",
"2016 Other 21.903749 24.439716 28.866261 5.493414 2.992188 \n",
" US 25.061818 27.701818 32.265455 5.356364 3.277344 \n",
"2017 Other 22.423795 24.910521 29.208456 5.431662 2.919922 \n",
" US 24.003623 26.496377 30.829710 5.532609 3.419922 \n",
"2018 Other 22.310442 24.779868 29.042333 5.396990 2.886719 \n",
" US 23.526690 25.925267 30.145907 5.597865 3.390625 \n",
"2019 Other 23.084221 25.456922 29.560503 5.315586 2.839844 \n",
" US 24.169014 26.250000 30.042254 5.559859 3.419922 \n",
"2020 Other 22.579487 25.174359 29.543590 5.148718 2.693359 \n",
" US 24.071429 26.571429 30.642857 4.000000 1.978516 \n",
"\n",
" fuelCost08 range speeds \n",
"year country \n",
"1984 Other 2118.125553 0.000000 3.969054 \n",
" US 2578.871549 0.000000 3.872749 \n",
"1985 Other 2141.997792 0.000000 3.958057 \n",
" US 2553.899371 0.000000 3.886792 \n",
"1986 Other 2149.148607 0.000000 4.069659 \n",
" US 2588.741135 0.000000 3.886525 \n",
"1987 Other 2227.318117 0.000000 4.142653 \n",
" US 2630.036630 0.000000 3.902930 \n",
"1988 Other 2207.476636 0.000000 4.205607 \n",
" US 2623.258197 0.000000 4.028689 \n",
"... ... ... ... \n",
"2016 Other 2127.608916 1.017224 7.296859 \n",
" US 1960.545455 17.214545 6.301818 \n",
"2017 Other 2114.110128 1.243854 7.474926 \n",
" US 2031.884058 15.731884 6.304348 \n",
"2018 Other 2121.448730 1.135466 7.391345 \n",
" US 2037.900356 12.537367 5.601423 \n",
"2019 Other 2093.659245 2.581801 7.545983 \n",
" US 2093.133803 16.419014 5.647887 \n",
"2020 Other 2050.256410 2.446154 7.743590 \n",
" US 1650.000000 0.000000 7.785714 \n",
"\n",
"[74 rows x 8 columns]"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add a country column derived from make, then group by year and country\n",
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
" .groupby(['year', 'country'])\n",
" .mean()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th colspan=\"3\" halign=\"left\">city08</th>\n",
" <th colspan=\"3\" halign=\"left\">comb08</th>\n",
" <th colspan=\"3\" halign=\"left\">highway08</th>\n",
" <th>cylinders</th>\n",
" <th>...</th>\n",
" <th>displ</th>\n",
" <th colspan=\"3\" halign=\"left\">fuelCost08</th>\n",
" <th colspan=\"3\" halign=\"left\">range</th>\n",
" <th colspan=\"3\" halign=\"left\">speeds</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>second_to_last</th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>second_to_last</th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>second_to_last</th>\n",
" <th>min</th>\n",
" <th>...</th>\n",
" <th>second_to_last</th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>second_to_last</th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>second_to_last</th>\n",
" <th>min</th>\n",
" <th>mean</th>\n",
" <th>second_to_last</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th>country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1984</th>\n",
" <th>Other</th>\n",
" <td>7</td>\n",
" <td>19.384615</td>\n",
" <td>14</td>\n",
" <td>8</td>\n",
" <td>21.417330</td>\n",
" <td>14</td>\n",
" <td>9</td>\n",
" <td>24.847038</td>\n",
" <td>15</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>2.400391</td>\n",
" <td>1050</td>\n",
" <td>2118.125553</td>\n",
" <td>3000</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3.969054</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>8</td>\n",
" <td>16.079232</td>\n",
" <td>15</td>\n",
" <td>9</td>\n",
" <td>17.797119</td>\n",
" <td>17</td>\n",
" <td>10</td>\n",
" <td>20.669868</td>\n",
" <td>19</td>\n",
" <td>4</td>\n",
" <td>...</td>\n",
" <td>4.101562</td>\n",
" <td>1200</td>\n",
" <td>2578.871549</td>\n",
" <td>2500</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3.872749</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1985</th>\n",
" <th>Other</th>\n",
" <td>7</td>\n",
" <td>19.284768</td>\n",
" <td>19</td>\n",
" <td>8</td>\n",
" <td>21.373068</td>\n",
" <td>20</td>\n",
" <td>9</td>\n",
" <td>24.816777</td>\n",
" <td>22</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>2.000000</td>\n",
" <td>1000</td>\n",
" <td>2141.997792</td>\n",
" <td>2100</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3.958057</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>8</td>\n",
" <td>16.275472</td>\n",
" <td>14</td>\n",
" <td>10</td>\n",
" <td>18.025157</td>\n",
" <td>15</td>\n",
" <td>10</td>\n",
" <td>21.020126</td>\n",
" <td>17</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>3.699219</td>\n",
" <td>1000</td>\n",
" <td>2553.899371</td>\n",
" <td>2800</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3.886792</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1986</th>\n",
" <th>Other</th>\n",
" <td>6</td>\n",
" <td>19.167183</td>\n",
" <td>10</td>\n",
" <td>7</td>\n",
" <td>21.213622</td>\n",
" <td>11</td>\n",
" <td>9</td>\n",
" <td>24.650155</td>\n",
" <td>12</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>4.199219</td>\n",
" <td>900</td>\n",
" <td>2149.148607</td>\n",
" <td>3850</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>4.069659</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>9</td>\n",
" <td>15.945035</td>\n",
" <td>16</td>\n",
" <td>10</td>\n",
" <td>17.645390</td>\n",
" <td>17</td>\n",
" <td>11</td>\n",
" <td>20.464539</td>\n",
" <td>19</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>4.300781</td>\n",
" <td>900</td>\n",
" <td>2588.741135</td>\n",
" <td>2500</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3.886525</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1987</th>\n",
" <th>Other</th>\n",
" <td>6</td>\n",
" <td>18.633381</td>\n",
" <td>12</td>\n",
" <td>7</td>\n",
" <td>20.710414</td>\n",
" <td>12</td>\n",
" <td>9</td>\n",
" <td>24.186876</td>\n",
" <td>12</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>2.400391</td>\n",
" <td>900</td>\n",
" <td>2227.318117</td>\n",
" <td>3500</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>4.142653</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>8</td>\n",
" <td>15.611722</td>\n",
" <td>12</td>\n",
" <td>9</td>\n",
" <td>17.326007</td>\n",
" <td>13</td>\n",
" <td>10</td>\n",
" <td>20.208791</td>\n",
" <td>14</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>2.800781</td>\n",
" <td>900</td>\n",
" <td>2630.036630</td>\n",
" <td>3250</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3.902930</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">1988</th>\n",
" <th>Other</th>\n",
" <td>6</td>\n",
" <td>18.668224</td>\n",
" <td>12</td>\n",
" <td>7</td>\n",
" <td>20.814642</td>\n",
" <td>12</td>\n",
" <td>10</td>\n",
" <td>24.437695</td>\n",
" <td>12</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>2.400391</td>\n",
" <td>950</td>\n",
" <td>2207.476636</td>\n",
" <td>3500</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>4.205607</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>8</td>\n",
" <td>15.577869</td>\n",
" <td>14</td>\n",
" <td>9</td>\n",
" <td>17.372951</td>\n",
" <td>14</td>\n",
" <td>10</td>\n",
" <td>20.420082</td>\n",
" <td>15</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>2.800781</td>\n",
" <td>900</td>\n",
" <td>2623.258197</td>\n",
" <td>3000</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>4.028689</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2016</th>\n",
" <th>Other</th>\n",
" <td>10</td>\n",
" <td>21.903749</td>\n",
" <td>28</td>\n",
" <td>12</td>\n",
" <td>24.439716</td>\n",
" <td>30</td>\n",
" <td>13</td>\n",
" <td>28.866261</td>\n",
" <td>32</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>2.000000</td>\n",
" <td>550</td>\n",
" <td>2127.608916</td>\n",
" <td>1700</td>\n",
" <td>0</td>\n",
" <td>1.017224</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>7.296859</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>11</td>\n",
" <td>25.061818</td>\n",
" <td>91</td>\n",
" <td>12</td>\n",
" <td>27.701818</td>\n",
" <td>93</td>\n",
" <td>16</td>\n",
" <td>32.265455</td>\n",
" <td>94</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>550</td>\n",
" <td>1960.545455</td>\n",
" <td>700</td>\n",
" <td>0</td>\n",
" <td>17.214545</td>\n",
" <td>200</td>\n",
" <td>1</td>\n",
" <td>6.301818</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2017</th>\n",
" <th>Other</th>\n",
" <td>10</td>\n",
" <td>22.423795</td>\n",
" <td>21</td>\n",
" <td>11</td>\n",
" <td>24.910521</td>\n",
" <td>24</td>\n",
" <td>11</td>\n",
" <td>29.208456</td>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>2.500000</td>\n",
" <td>500</td>\n",
" <td>2114.110128</td>\n",
" <td>2150</td>\n",
" <td>0</td>\n",
" <td>1.243854</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>7.474926</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>11</td>\n",
" <td>24.003623</td>\n",
" <td>131</td>\n",
" <td>12</td>\n",
" <td>26.496377</td>\n",
" <td>126</td>\n",
" <td>15</td>\n",
" <td>30.829710</td>\n",
" <td>120</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>500</td>\n",
" <td>2031.884058</td>\n",
" <td>500</td>\n",
" <td>0</td>\n",
" <td>15.731884</td>\n",
" <td>310</td>\n",
" <td>0</td>\n",
" <td>6.304348</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2018</th>\n",
" <th>Other</th>\n",
" <td>9</td>\n",
" <td>22.310442</td>\n",
" <td>11</td>\n",
" <td>11</td>\n",
" <td>24.779868</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>29.042333</td>\n",
" <td>15</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>5.000000</td>\n",
" <td>500</td>\n",
" <td>2121.448730</td>\n",
" <td>4300</td>\n",
" <td>0</td>\n",
" <td>1.135466</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.391345</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>11</td>\n",
" <td>23.526690</td>\n",
" <td>120</td>\n",
" <td>14</td>\n",
" <td>25.925267</td>\n",
" <td>116</td>\n",
" <td>15</td>\n",
" <td>30.145907</td>\n",
" <td>112</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>500</td>\n",
" <td>2037.900356</td>\n",
" <td>550</td>\n",
" <td>0</td>\n",
" <td>12.537367</td>\n",
" <td>310</td>\n",
" <td>0</td>\n",
" <td>5.601423</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2019</th>\n",
" <th>Other</th>\n",
" <td>9</td>\n",
" <td>23.084221</td>\n",
" <td>19</td>\n",
" <td>11</td>\n",
" <td>25.456922</td>\n",
" <td>22</td>\n",
" <td>14</td>\n",
" <td>29.560503</td>\n",
" <td>27</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>3.000000</td>\n",
" <td>500</td>\n",
" <td>2093.659245</td>\n",
" <td>2150</td>\n",
" <td>0</td>\n",
" <td>2.581801</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.545983</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>11</td>\n",
" <td>24.169014</td>\n",
" <td>104</td>\n",
" <td>14</td>\n",
" <td>26.250000</td>\n",
" <td>104</td>\n",
" <td>15</td>\n",
" <td>30.042254</td>\n",
" <td>104</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>500</td>\n",
" <td>2093.133803</td>\n",
" <td>650</td>\n",
" <td>0</td>\n",
" <td>16.419014</td>\n",
" <td>345</td>\n",
" <td>0</td>\n",
" <td>5.647887</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">2020</th>\n",
" <th>Other</th>\n",
" <td>13</td>\n",
" <td>22.579487</td>\n",
" <td>17</td>\n",
" <td>15</td>\n",
" <td>25.174359</td>\n",
" <td>20</td>\n",
" <td>18</td>\n",
" <td>29.543590</td>\n",
" <td>24</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>3.000000</td>\n",
" <td>600</td>\n",
" <td>2050.256410</td>\n",
" <td>2100</td>\n",
" <td>0</td>\n",
" <td>2.446154</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.743590</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>20</td>\n",
" <td>24.071429</td>\n",
" <td>21</td>\n",
" <td>22</td>\n",
" <td>26.571429</td>\n",
" <td>24</td>\n",
" <td>26</td>\n",
" <td>30.642857</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>...</td>\n",
" <td>2.300781</td>\n",
" <td>1300</td>\n",
" <td>1650.000000</td>\n",
" <td>1750</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.785714</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>74 rows × 24 columns</p>\n",
"</div>"
],
"text/plain": [
" city08 comb08 \\\n",
" min mean second_to_last min mean \n",
"year country \n",
"1984 Other 7 19.384615 14 8 21.417330 \n",
" US 8 16.079232 15 9 17.797119 \n",
"1985 Other 7 19.284768 19 8 21.373068 \n",
" US 8 16.275472 14 10 18.025157 \n",
"1986 Other 6 19.167183 10 7 21.213622 \n",
" US 9 15.945035 16 10 17.645390 \n",
"1987 Other 6 18.633381 12 7 20.710414 \n",
" US 8 15.611722 12 9 17.326007 \n",
"1988 Other 6 18.668224 12 7 20.814642 \n",
" US 8 15.577869 14 9 17.372951 \n",
"... ... ... ... ... ... \n",
"2016 Other 10 21.903749 28 12 24.439716 \n",
" US 11 25.061818 91 12 27.701818 \n",
"2017 Other 10 22.423795 21 11 24.910521 \n",
" US 11 24.003623 131 12 26.496377 \n",
"2018 Other 9 22.310442 11 11 24.779868 \n",
" US 11 23.526690 120 14 25.925267 \n",
"2019 Other 9 23.084221 19 11 25.456922 \n",
" US 11 24.169014 104 14 26.250000 \n",
"2020 Other 13 22.579487 17 15 25.174359 \n",
" US 20 24.071429 21 22 26.571429 \n",
"\n",
" highway08 cylinders \\\n",
" second_to_last min mean second_to_last min \n",
"year country \n",
"1984 Other 14 9 24.847038 15 2 \n",
" US 17 10 20.669868 19 4 \n",
"1985 Other 20 9 24.816777 22 0 \n",
" US 15 10 21.020126 17 3 \n",
"1986 Other 11 9 24.650155 12 0 \n",
" US 17 11 20.464539 19 3 \n",
"1987 Other 12 9 24.186876 12 2 \n",
" US 13 10 20.208791 14 3 \n",
"1988 Other 12 10 24.437695 12 2 \n",
" US 14 10 20.420082 15 3 \n",
"... ... ... ... ... ... \n",
"2016 Other 30 13 28.866261 32 0 \n",
" US 93 16 32.265455 94 0 \n",
"2017 Other 24 11 29.208456 28 0 \n",
" US 126 15 30.829710 120 0 \n",
"2018 Other 12 11 29.042333 15 0 \n",
" US 116 15 30.145907 112 0 \n",
"2019 Other 22 14 29.560503 27 0 \n",
" US 104 15 30.042254 104 0 \n",
"2020 Other 20 18 29.543590 24 0 \n",
" US 24 26 30.642857 28 4 \n",
"\n",
" ... displ fuelCost08 range \\\n",
" ... second_to_last min mean second_to_last min \n",
"year country ... \n",
"1984 Other ... 2.400391 1050 2118.125553 3000 0 \n",
" US ... 4.101562 1200 2578.871549 2500 0 \n",
"1985 Other ... 2.000000 1000 2141.997792 2100 0 \n",
" US ... 3.699219 1000 2553.899371 2800 0 \n",
"1986 Other ... 4.199219 900 2149.148607 3850 0 \n",
" US ... 4.300781 900 2588.741135 2500 0 \n",
"1987 Other ... 2.400391 900 2227.318117 3500 0 \n",
" US ... 2.800781 900 2630.036630 3250 0 \n",
"1988 Other ... 2.400391 950 2207.476636 3500 0 \n",
" US ... 2.800781 900 2623.258197 3000 0 \n",
"... ... ... ... ... ... ... \n",
"2016 Other ... 2.000000 550 2127.608916 1700 0 \n",
" US ... 0.000000 550 1960.545455 700 0 \n",
"2017 Other ... 2.500000 500 2114.110128 2150 0 \n",
" US ... 0.000000 500 2031.884058 500 0 \n",
"2018 Other ... 5.000000 500 2121.448730 4300 0 \n",
" US ... 0.000000 500 2037.900356 550 0 \n",
"2019 Other ... 3.000000 500 2093.659245 2150 0 \n",
" US ... 0.000000 500 2093.133803 650 0 \n",
"2020 Other ... 3.000000 600 2050.256410 2100 0 \n",
" US ... 2.300781 1300 1650.000000 1750 0 \n",
"\n",
" speeds \n",
" mean second_to_last min mean second_to_last \n",
"year country \n",
"1984 Other 0.000000 0 3 3.969054 5 \n",
" US 0.000000 0 3 3.872749 4 \n",
"1985 Other 0.000000 0 3 3.958057 4 \n",
" US 0.000000 0 3 3.886792 4 \n",
"1986 Other 0.000000 0 3 4.069659 4 \n",
" US 0.000000 0 3 3.886525 4 \n",
"1987 Other 0.000000 0 3 4.142653 4 \n",
" US 0.000000 0 3 3.902930 4 \n",
"1988 Other 0.000000 0 3 4.205607 4 \n",
" US 0.000000 0 3 4.028689 4 \n",
"... ... ... ... ... ... \n",
"2016 Other 1.017224 0 1 7.296859 7 \n",
" US 17.214545 200 1 6.301818 1 \n",
"2017 Other 1.243854 0 1 7.474926 7 \n",
" US 15.731884 310 0 6.304348 1 \n",
"2018 Other 1.135466 0 0 7.391345 0 \n",
" US 12.537367 310 0 5.601423 1 \n",
"2019 Other 2.581801 0 0 7.545983 8 \n",
" US 16.419014 345 0 5.647887 1 \n",
"2020 Other 2.446154 0 0 7.743590 0 \n",
" US 0.000000 0 0 7.785714 0 \n",
"\n",
"[74 rows x 24 columns]"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# can go deeper and apply multiple aggregates\n",
"def second_to_last(ser):\n",
"    # next-to-last value of each group; raises IndexError if a group has fewer than two rows\n",
"    return ser.iloc[-2]\n",
"\n",
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
" .groupby(['year', 'country'])\n",
" .agg(['min', 'mean', second_to_last])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='year,country'>"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# back to simpler example, adding plots\n",
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
" .groupby(['year', 'country'])\n",
" .mean(numeric_only=True)  # average only the numeric columns\n",
" .plot()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th colspan=\"2\" halign=\"left\">city08</th>\n",
" <th colspan=\"2\" halign=\"left\">comb08</th>\n",
" <th colspan=\"2\" halign=\"left\">highway08</th>\n",
" <th colspan=\"2\" halign=\"left\">cylinders</th>\n",
" <th colspan=\"2\" halign=\"left\">displ</th>\n",
" <th colspan=\"2\" halign=\"left\">fuelCost08</th>\n",
" <th colspan=\"2\" halign=\"left\">range</th>\n",
" <th colspan=\"2\" halign=\"left\">speeds</th>\n",
" </tr>\n",
" <tr>\n",
" <th>country</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" <th>Other</th>\n",
" <th>US</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1984</th>\n",
" <td>19.384615</td>\n",
" <td>16.079232</td>\n",
" <td>21.417330</td>\n",
" <td>17.797119</td>\n",
" <td>24.847038</td>\n",
" <td>20.669868</td>\n",
" <td>4.908046</td>\n",
" <td>6.033613</td>\n",
" <td>2.691406</td>\n",
" <td>3.808594</td>\n",
" <td>2118.125553</td>\n",
" <td>2578.871549</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.969054</td>\n",
" <td>3.872749</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1985</th>\n",
" <td>19.284768</td>\n",
" <td>16.275472</td>\n",
" <td>21.373068</td>\n",
" <td>18.025157</td>\n",
" <td>24.816777</td>\n",
" <td>21.020126</td>\n",
" <td>4.871965</td>\n",
" <td>5.949686</td>\n",
" <td>2.636719</td>\n",
" <td>3.765625</td>\n",
" <td>2141.997792</td>\n",
" <td>2553.899371</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.958057</td>\n",
" <td>3.886792</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1986</th>\n",
" <td>19.167183</td>\n",
" <td>15.945035</td>\n",
" <td>21.213622</td>\n",
" <td>17.645390</td>\n",
" <td>24.650155</td>\n",
" <td>20.464539</td>\n",
" <td>4.804954</td>\n",
" <td>6.136525</td>\n",
" <td>2.537109</td>\n",
" <td>3.925781</td>\n",
" <td>2149.148607</td>\n",
" <td>2588.741135</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.069659</td>\n",
" <td>3.886525</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1987</th>\n",
" <td>18.633381</td>\n",
" <td>15.611722</td>\n",
" <td>20.710414</td>\n",
" <td>17.326007</td>\n",
" <td>24.186876</td>\n",
" <td>20.208791</td>\n",
" <td>4.825963</td>\n",
" <td>6.164835</td>\n",
" <td>2.583984</td>\n",
" <td>3.931641</td>\n",
" <td>2227.318117</td>\n",
" <td>2630.036630</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.142653</td>\n",
" <td>3.902930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1988</th>\n",
" <td>18.668224</td>\n",
" <td>15.577869</td>\n",
" <td>20.814642</td>\n",
" <td>17.372951</td>\n",
" <td>24.437695</td>\n",
" <td>20.420082</td>\n",
" <td>4.819315</td>\n",
" <td>6.307377</td>\n",
" <td>2.531250</td>\n",
" <td>4.066406</td>\n",
" <td>2207.476636</td>\n",
" <td>2623.258197</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.205607</td>\n",
" <td>4.028689</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1989</th>\n",
" <td>18.533040</td>\n",
" <td>15.139831</td>\n",
" <td>20.662261</td>\n",
" <td>16.908898</td>\n",
" <td>24.252570</td>\n",
" <td>19.887712</td>\n",
" <td>4.879589</td>\n",
" <td>6.366525</td>\n",
" <td>2.542969</td>\n",
" <td>4.171875</td>\n",
" <td>2250.000000</td>\n",
" <td>2698.093220</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.264317</td>\n",
" <td>4.025424</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1990</th>\n",
" <td>18.510109</td>\n",
" <td>14.850575</td>\n",
" <td>20.640747</td>\n",
" <td>16.577011</td>\n",
" <td>24.267496</td>\n",
" <td>19.485057</td>\n",
" <td>4.839813</td>\n",
" <td>6.466667</td>\n",
" <td>2.507812</td>\n",
" <td>4.265625</td>\n",
" <td>2238.258165</td>\n",
" <td>2728.735632</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.328149</td>\n",
" <td>4.105747</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1991</th>\n",
" <td>18.087943</td>\n",
" <td>14.803279</td>\n",
" <td>20.174468</td>\n",
" <td>16.599532</td>\n",
" <td>23.809929</td>\n",
" <td>19.683841</td>\n",
" <td>5.029787</td>\n",
" <td>6.538642</td>\n",
" <td>2.609375</td>\n",
" <td>4.351562</td>\n",
" <td>2348.581560</td>\n",
" <td>2725.761124</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.341844</td>\n",
" <td>4.234192</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992</th>\n",
" <td>17.915374</td>\n",
" <td>14.895631</td>\n",
" <td>20.098731</td>\n",
" <td>16.735437</td>\n",
" <td>23.820874</td>\n",
" <td>20.063107</td>\n",
" <td>5.145275</td>\n",
" <td>6.446602</td>\n",
" <td>2.708984</td>\n",
" <td>4.250000</td>\n",
" <td>2373.272214</td>\n",
" <td>2703.762136</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.356841</td>\n",
" <td>4.252427</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1993</th>\n",
" <td>18.084866</td>\n",
" <td>15.007772</td>\n",
" <td>20.309760</td>\n",
" <td>16.896373</td>\n",
" <td>24.172560</td>\n",
" <td>20.230570</td>\n",
" <td>5.114569</td>\n",
" <td>6.497409</td>\n",
" <td>2.683594</td>\n",
" <td>4.281250</td>\n",
" <td>2333.097595</td>\n",
" <td>2677.202073</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.371994</td>\n",
" <td>4.279793</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1994</th>\n",
" <td>18.046474</td>\n",
" <td>14.952514</td>\n",
" <td>20.264423</td>\n",
" <td>16.829609</td>\n",
" <td>24.173077</td>\n",
" <td>20.201117</td>\n",
" <td>5.185897</td>\n",
" <td>6.608939</td>\n",
" <td>2.712891</td>\n",
" <td>4.414062</td>\n",
" <td>2326.041667</td>\n",
" <td>2697.625698</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.355769</td>\n",
" <td>4.293296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1995</th>\n",
" <td>17.678914</td>\n",
" <td>14.533724</td>\n",
" <td>20.091054</td>\n",
" <td>16.422287</td>\n",
" <td>24.263578</td>\n",
" <td>19.747801</td>\n",
" <td>5.444089</td>\n",
" <td>6.715543</td>\n",
" <td>2.908203</td>\n",
" <td>4.507812</td>\n",
" <td>2355.191693</td>\n",
" <td>2759.677419</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.380192</td>\n",
" <td>4.313783</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1996</th>\n",
" <td>18.480545</td>\n",
" <td>14.926641</td>\n",
" <td>20.906615</td>\n",
" <td>16.961390</td>\n",
" <td>25.093385</td>\n",
" <td>20.544402</td>\n",
" <td>5.147860</td>\n",
" <td>6.579151</td>\n",
" <td>2.708984</td>\n",
" <td>4.277344</td>\n",
" <td>2250.291829</td>\n",
" <td>2622.586873</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.416342</td>\n",
" <td>4.262548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997</th>\n",
" <td>18.090909</td>\n",
" <td>14.978632</td>\n",
" <td>20.509470</td>\n",
" <td>16.991453</td>\n",
" <td>24.678030</td>\n",
" <td>20.683761</td>\n",
" <td>5.261364</td>\n",
" <td>6.581197</td>\n",
" <td>2.787109</td>\n",
" <td>4.218750</td>\n",
" <td>2319.128788</td>\n",
" <td>2600.427350</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.452652</td>\n",
" <td>4.290598</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1998</th>\n",
" <td>17.925267</td>\n",
" <td>15.288000</td>\n",
" <td>20.457295</td>\n",
" <td>17.408000</td>\n",
" <td>24.704626</td>\n",
" <td>20.944000</td>\n",
" <td>5.275801</td>\n",
" <td>6.436000</td>\n",
" <td>2.800781</td>\n",
" <td>4.105469</td>\n",
" <td>2295.373665</td>\n",
" <td>2578.800000</td>\n",
" <td>0.144128</td>\n",
" <td>0.420000</td>\n",
" <td>4.485765</td>\n",
" <td>4.272000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999</th>\n",
" <td>17.925125</td>\n",
" <td>15.709163</td>\n",
" <td>20.386023</td>\n",
" <td>17.756972</td>\n",
" <td>24.577371</td>\n",
" <td>21.099602</td>\n",
" <td>5.377704</td>\n",
" <td>6.362550</td>\n",
" <td>2.832031</td>\n",
" <td>4.042969</td>\n",
" <td>2312.728785</td>\n",
" <td>2582.470120</td>\n",
" <td>0.251248</td>\n",
" <td>1.334661</td>\n",
" <td>4.507488</td>\n",
" <td>4.215139</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000</th>\n",
" <td>17.881849</td>\n",
" <td>15.714844</td>\n",
" <td>20.301370</td>\n",
" <td>17.757812</td>\n",
" <td>24.416096</td>\n",
" <td>21.128906</td>\n",
" <td>5.441781</td>\n",
" <td>6.332031</td>\n",
" <td>2.859375</td>\n",
" <td>3.978516</td>\n",
" <td>2385.958904</td>\n",
" <td>2529.492188</td>\n",
" <td>0.304795</td>\n",
" <td>0.449219</td>\n",
" <td>4.619863</td>\n",
" <td>4.253906</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2001</th>\n",
" <td>17.941267</td>\n",
" <td>15.643939</td>\n",
" <td>20.289026</td>\n",
" <td>17.496212</td>\n",
" <td>24.372488</td>\n",
" <td>20.768939</td>\n",
" <td>5.479134</td>\n",
" <td>6.310606</td>\n",
" <td>2.873047</td>\n",
" <td>3.976562</td>\n",
" <td>2399.536321</td>\n",
" <td>2568.371212</td>\n",
" <td>0.187017</td>\n",
" <td>0.443182</td>\n",
" <td>4.761978</td>\n",
" <td>4.412879</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2002</th>\n",
" <td>17.644412</td>\n",
" <td>15.083916</td>\n",
" <td>20.076923</td>\n",
" <td>16.979021</td>\n",
" <td>24.207547</td>\n",
" <td>20.195804</td>\n",
" <td>5.576197</td>\n",
" <td>6.433566</td>\n",
" <td>2.935547</td>\n",
" <td>4.058594</td>\n",
" <td>2425.689405</td>\n",
" <td>2610.139860</td>\n",
" <td>0.137881</td>\n",
" <td>0.132867</td>\n",
" <td>4.920174</td>\n",
" <td>4.367133</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2003</th>\n",
" <td>17.565101</td>\n",
" <td>14.826087</td>\n",
" <td>19.953020</td>\n",
" <td>16.628763</td>\n",
" <td>24.052349</td>\n",
" <td>19.806020</td>\n",
" <td>5.683221</td>\n",
" <td>6.588629</td>\n",
" <td>3.031250</td>\n",
" <td>4.171875</td>\n",
" <td>2480.604027</td>\n",
" <td>2637.625418</td>\n",
" <td>0.127517</td>\n",
" <td>0.000000</td>\n",
" <td>5.154362</td>\n",
" <td>4.307692</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2004</th>\n",
" <td>17.426290</td>\n",
" <td>14.928571</td>\n",
" <td>19.923833</td>\n",
" <td>16.805195</td>\n",
" <td>24.160934</td>\n",
" <td>20.165584</td>\n",
" <td>5.729730</td>\n",
" <td>6.558442</td>\n",
" <td>3.087891</td>\n",
" <td>4.199219</td>\n",
" <td>2476.719902</td>\n",
" <td>2607.305195</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>5.229730</td>\n",
" <td>4.308442</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2005</th>\n",
" <td>17.412170</td>\n",
" <td>15.196610</td>\n",
" <td>19.892078</td>\n",
" <td>17.132203</td>\n",
" <td>24.189437</td>\n",
" <td>20.664407</td>\n",
" <td>5.773823</td>\n",
" <td>6.447458</td>\n",
" <td>3.152344</td>\n",
" <td>4.132812</td>\n",
" <td>2493.455798</td>\n",
" <td>2592.881356</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>5.362801</td>\n",
" <td>4.688136</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2006</th>\n",
" <td>17.062575</td>\n",
" <td>15.300366</td>\n",
" <td>19.509025</td>\n",
" <td>17.285714</td>\n",
" <td>23.762936</td>\n",
" <td>20.875458</td>\n",
" <td>5.977136</td>\n",
" <td>6.476190</td>\n",
" <td>3.345703</td>\n",
" <td>4.171875</td>\n",
" <td>2527.496992</td>\n",
" <td>2574.725275</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>5.492178</td>\n",
" <td>4.776557</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2007</th>\n",
" <td>16.996403</td>\n",
" <td>15.489726</td>\n",
" <td>19.452038</td>\n",
" <td>17.626712</td>\n",
" <td>23.742206</td>\n",
" <td>21.202055</td>\n",
" <td>6.044365</td>\n",
" <td>6.513699</td>\n",
" <td>3.423828</td>\n",
" <td>4.210938</td>\n",
" <td>2544.664269</td>\n",
" <td>2510.958904</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>5.864508</td>\n",
" <td>4.883562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2008</th>\n",
" <td>17.239869</td>\n",
" <td>15.770073</td>\n",
" <td>19.677985</td>\n",
" <td>17.937956</td>\n",
" <td>23.983571</td>\n",
" <td>21.697080</td>\n",
" <td>6.095290</td>\n",
" <td>6.518248</td>\n",
" <td>3.462891</td>\n",
" <td>4.222656</td>\n",
" <td>2551.369113</td>\n",
" <td>2486.678832</td>\n",
" <td>0.109529</td>\n",
" <td>0.000000</td>\n",
" <td>5.969332</td>\n",
" <td>5.120438</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2009</th>\n",
" <td>17.696803</td>\n",
" <td>16.148014</td>\n",
" <td>20.186329</td>\n",
" <td>18.259928</td>\n",
" <td>24.590959</td>\n",
" <td>22.140794</td>\n",
" <td>5.970232</td>\n",
" <td>6.620939</td>\n",
" <td>3.402344</td>\n",
" <td>4.351562</td>\n",
" <td>2433.076075</td>\n",
" <td>2407.220217</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>6.189636</td>\n",
" <td>5.563177</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010</th>\n",
" <td>18.325342</td>\n",
" <td>17.278970</td>\n",
" <td>20.851598</td>\n",
" <td>19.600858</td>\n",
" <td>25.256849</td>\n",
" <td>23.785408</td>\n",
" <td>5.897260</td>\n",
" <td>6.223176</td>\n",
" <td>3.357422</td>\n",
" <td>4.050781</td>\n",
" <td>2374.429224</td>\n",
" <td>2263.304721</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>6.378995</td>\n",
" <td>5.866953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011</th>\n",
" <td>19.247387</td>\n",
" <td>16.817844</td>\n",
" <td>21.635308</td>\n",
" <td>19.014870</td>\n",
" <td>25.855981</td>\n",
" <td>22.973978</td>\n",
" <td>5.851336</td>\n",
" <td>6.394052</td>\n",
" <td>3.320312</td>\n",
" <td>4.167969</td>\n",
" <td>2326.248548</td>\n",
" <td>2358.736059</td>\n",
" <td>0.340302</td>\n",
" <td>0.000000</td>\n",
" <td>6.714286</td>\n",
" <td>6.066914</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012</th>\n",
" <td>19.838052</td>\n",
" <td>17.802974</td>\n",
" <td>22.339751</td>\n",
" <td>20.111524</td>\n",
" <td>26.695357</td>\n",
" <td>24.167286</td>\n",
" <td>5.792752</td>\n",
" <td>6.297398</td>\n",
" <td>3.269531</td>\n",
" <td>4.085938</td>\n",
" <td>2282.502831</td>\n",
" <td>2314.498141</td>\n",
" <td>0.634202</td>\n",
" <td>1.267658</td>\n",
" <td>6.834655</td>\n",
" <td>6.286245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013</th>\n",
" <td>20.982888</td>\n",
" <td>19.453815</td>\n",
" <td>23.471658</td>\n",
" <td>21.823293</td>\n",
" <td>27.860963</td>\n",
" <td>26.164659</td>\n",
" <td>5.658824</td>\n",
" <td>6.152610</td>\n",
" <td>3.179688</td>\n",
" <td>3.884766</td>\n",
" <td>2208.288770</td>\n",
" <td>2220.080321</td>\n",
" <td>0.853476</td>\n",
" <td>2.763052</td>\n",
" <td>7.033155</td>\n",
" <td>6.385542</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td>21.159919</td>\n",
" <td>20.506329</td>\n",
" <td>23.655870</td>\n",
" <td>23.012658</td>\n",
" <td>28.088057</td>\n",
" <td>27.523207</td>\n",
" <td>5.719636</td>\n",
" <td>5.852321</td>\n",
" <td>3.210938</td>\n",
" <td>3.615234</td>\n",
" <td>2212.196356</td>\n",
" <td>2139.029536</td>\n",
" <td>0.859312</td>\n",
" <td>3.683544</td>\n",
" <td>7.210526</td>\n",
" <td>6.046414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td>21.350000</td>\n",
" <td>21.817490</td>\n",
" <td>23.935294</td>\n",
" <td>24.441065</td>\n",
" <td>28.481373</td>\n",
" <td>28.996198</td>\n",
" <td>5.604902</td>\n",
" <td>5.752852</td>\n",
" <td>3.101562</td>\n",
" <td>3.605469</td>\n",
" <td>2164.215686</td>\n",
" <td>2089.353612</td>\n",
" <td>0.638235</td>\n",
" <td>8.296578</td>\n",
" <td>7.211765</td>\n",
" <td>6.353612</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016</th>\n",
" <td>21.903749</td>\n",
" <td>25.061818</td>\n",
" <td>24.439716</td>\n",
" <td>27.701818</td>\n",
" <td>28.866261</td>\n",
" <td>32.265455</td>\n",
" <td>5.493414</td>\n",
" <td>5.356364</td>\n",
" <td>2.992188</td>\n",
" <td>3.277344</td>\n",
" <td>2127.608916</td>\n",
" <td>1960.545455</td>\n",
" <td>1.017224</td>\n",
" <td>17.214545</td>\n",
" <td>7.296859</td>\n",
" <td>6.301818</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017</th>\n",
" <td>22.423795</td>\n",
" <td>24.003623</td>\n",
" <td>24.910521</td>\n",
" <td>26.496377</td>\n",
" <td>29.208456</td>\n",
" <td>30.829710</td>\n",
" <td>5.431662</td>\n",
" <td>5.532609</td>\n",
" <td>2.919922</td>\n",
" <td>3.419922</td>\n",
" <td>2114.110128</td>\n",
" <td>2031.884058</td>\n",
" <td>1.243854</td>\n",
" <td>15.731884</td>\n",
" <td>7.474926</td>\n",
" <td>6.304348</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018</th>\n",
" <td>22.310442</td>\n",
" <td>23.526690</td>\n",
" <td>24.779868</td>\n",
" <td>25.925267</td>\n",
" <td>29.042333</td>\n",
" <td>30.145907</td>\n",
" <td>5.396990</td>\n",
" <td>5.597865</td>\n",
" <td>2.886719</td>\n",
" <td>3.390625</td>\n",
" <td>2121.448730</td>\n",
" <td>2037.900356</td>\n",
" <td>1.135466</td>\n",
" <td>12.537367</td>\n",
" <td>7.391345</td>\n",
" <td>5.601423</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019</th>\n",
" <td>23.084221</td>\n",
" <td>24.169014</td>\n",
" <td>25.456922</td>\n",
" <td>26.250000</td>\n",
" <td>29.560503</td>\n",
" <td>30.042254</td>\n",
" <td>5.315586</td>\n",
" <td>5.559859</td>\n",
" <td>2.839844</td>\n",
" <td>3.419922</td>\n",
" <td>2093.659245</td>\n",
" <td>2093.133803</td>\n",
" <td>2.581801</td>\n",
" <td>16.419014</td>\n",
" <td>7.545983</td>\n",
" <td>5.647887</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020</th>\n",
" <td>22.579487</td>\n",
" <td>24.071429</td>\n",
" <td>25.174359</td>\n",
" <td>26.571429</td>\n",
" <td>29.543590</td>\n",
" <td>30.642857</td>\n",
" <td>5.148718</td>\n",
" <td>4.000000</td>\n",
" <td>2.693359</td>\n",
" <td>1.978516</td>\n",
" <td>2050.256410</td>\n",
" <td>1650.000000</td>\n",
" <td>2.446154</td>\n",
" <td>0.000000</td>\n",
" <td>7.743590</td>\n",
" <td>7.785714</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city08 comb08 highway08 \\\n",
"country Other US Other US Other US \n",
"year \n",
"1984 19.384615 16.079232 21.417330 17.797119 24.847038 20.669868 \n",
"1985 19.284768 16.275472 21.373068 18.025157 24.816777 21.020126 \n",
"1986 19.167183 15.945035 21.213622 17.645390 24.650155 20.464539 \n",
"1987 18.633381 15.611722 20.710414 17.326007 24.186876 20.208791 \n",
"1988 18.668224 15.577869 20.814642 17.372951 24.437695 20.420082 \n",
"1989 18.533040 15.139831 20.662261 16.908898 24.252570 19.887712 \n",
"1990 18.510109 14.850575 20.640747 16.577011 24.267496 19.485057 \n",
"1991 18.087943 14.803279 20.174468 16.599532 23.809929 19.683841 \n",
"1992 17.915374 14.895631 20.098731 16.735437 23.820874 20.063107 \n",
"1993 18.084866 15.007772 20.309760 16.896373 24.172560 20.230570 \n",
"1994 18.046474 14.952514 20.264423 16.829609 24.173077 20.201117 \n",
"1995 17.678914 14.533724 20.091054 16.422287 24.263578 19.747801 \n",
"1996 18.480545 14.926641 20.906615 16.961390 25.093385 20.544402 \n",
"1997 18.090909 14.978632 20.509470 16.991453 24.678030 20.683761 \n",
"1998 17.925267 15.288000 20.457295 17.408000 24.704626 20.944000 \n",
"1999 17.925125 15.709163 20.386023 17.756972 24.577371 21.099602 \n",
"2000 17.881849 15.714844 20.301370 17.757812 24.416096 21.128906 \n",
"2001 17.941267 15.643939 20.289026 17.496212 24.372488 20.768939 \n",
"2002 17.644412 15.083916 20.076923 16.979021 24.207547 20.195804 \n",
"2003 17.565101 14.826087 19.953020 16.628763 24.052349 19.806020 \n",
"2004 17.426290 14.928571 19.923833 16.805195 24.160934 20.165584 \n",
"2005 17.412170 15.196610 19.892078 17.132203 24.189437 20.664407 \n",
"2006 17.062575 15.300366 19.509025 17.285714 23.762936 20.875458 \n",
"2007 16.996403 15.489726 19.452038 17.626712 23.742206 21.202055 \n",
"2008 17.239869 15.770073 19.677985 17.937956 23.983571 21.697080 \n",
"2009 17.696803 16.148014 20.186329 18.259928 24.590959 22.140794 \n",
"2010 18.325342 17.278970 20.851598 19.600858 25.256849 23.785408 \n",
"2011 19.247387 16.817844 21.635308 19.014870 25.855981 22.973978 \n",
"2012 19.838052 17.802974 22.339751 20.111524 26.695357 24.167286 \n",
"2013 20.982888 19.453815 23.471658 21.823293 27.860963 26.164659 \n",
"2014 21.159919 20.506329 23.655870 23.012658 28.088057 27.523207 \n",
"2015 21.350000 21.817490 23.935294 24.441065 28.481373 28.996198 \n",
"2016 21.903749 25.061818 24.439716 27.701818 28.866261 32.265455 \n",
"2017 22.423795 24.003623 24.910521 26.496377 29.208456 30.829710 \n",
"2018 22.310442 23.526690 24.779868 25.925267 29.042333 30.145907 \n",
"2019 23.084221 24.169014 25.456922 26.250000 29.560503 30.042254 \n",
"2020 22.579487 24.071429 25.174359 26.571429 29.543590 30.642857 \n",
"\n",
" cylinders displ fuelCost08 \\\n",
"country Other US Other US Other US \n",
"year \n",
"1984 4.908046 6.033613 2.691406 3.808594 2118.125553 2578.871549 \n",
"1985 4.871965 5.949686 2.636719 3.765625 2141.997792 2553.899371 \n",
"1986 4.804954 6.136525 2.537109 3.925781 2149.148607 2588.741135 \n",
"1987 4.825963 6.164835 2.583984 3.931641 2227.318117 2630.036630 \n",
"1988 4.819315 6.307377 2.531250 4.066406 2207.476636 2623.258197 \n",
"1989 4.879589 6.366525 2.542969 4.171875 2250.000000 2698.093220 \n",
"1990 4.839813 6.466667 2.507812 4.265625 2238.258165 2728.735632 \n",
"1991 5.029787 6.538642 2.609375 4.351562 2348.581560 2725.761124 \n",
"1992 5.145275 6.446602 2.708984 4.250000 2373.272214 2703.762136 \n",
"1993 5.114569 6.497409 2.683594 4.281250 2333.097595 2677.202073 \n",
"1994 5.185897 6.608939 2.712891 4.414062 2326.041667 2697.625698 \n",
"1995 5.444089 6.715543 2.908203 4.507812 2355.191693 2759.677419 \n",
"1996 5.147860 6.579151 2.708984 4.277344 2250.291829 2622.586873 \n",
"1997 5.261364 6.581197 2.787109 4.218750 2319.128788 2600.427350 \n",
"1998 5.275801 6.436000 2.800781 4.105469 2295.373665 2578.800000 \n",
"1999 5.377704 6.362550 2.832031 4.042969 2312.728785 2582.470120 \n",
"2000 5.441781 6.332031 2.859375 3.978516 2385.958904 2529.492188 \n",
"2001 5.479134 6.310606 2.873047 3.976562 2399.536321 2568.371212 \n",
"2002 5.576197 6.433566 2.935547 4.058594 2425.689405 2610.139860 \n",
"2003 5.683221 6.588629 3.031250 4.171875 2480.604027 2637.625418 \n",
"2004 5.729730 6.558442 3.087891 4.199219 2476.719902 2607.305195 \n",
"2005 5.773823 6.447458 3.152344 4.132812 2493.455798 2592.881356 \n",
"2006 5.977136 6.476190 3.345703 4.171875 2527.496992 2574.725275 \n",
"2007 6.044365 6.513699 3.423828 4.210938 2544.664269 2510.958904 \n",
"2008 6.095290 6.518248 3.462891 4.222656 2551.369113 2486.678832 \n",
"2009 5.970232 6.620939 3.402344 4.351562 2433.076075 2407.220217 \n",
"2010 5.897260 6.223176 3.357422 4.050781 2374.429224 2263.304721 \n",
"2011 5.851336 6.394052 3.320312 4.167969 2326.248548 2358.736059 \n",
"2012 5.792752 6.297398 3.269531 4.085938 2282.502831 2314.498141 \n",
"2013 5.658824 6.152610 3.179688 3.884766 2208.288770 2220.080321 \n",
"2014 5.719636 5.852321 3.210938 3.615234 2212.196356 2139.029536 \n",
"2015 5.604902 5.752852 3.101562 3.605469 2164.215686 2089.353612 \n",
"2016 5.493414 5.356364 2.992188 3.277344 2127.608916 1960.545455 \n",
"2017 5.431662 5.532609 2.919922 3.419922 2114.110128 2031.884058 \n",
"2018 5.396990 5.597865 2.886719 3.390625 2121.448730 2037.900356 \n",
"2019 5.315586 5.559859 2.839844 3.419922 2093.659245 2093.133803 \n",
"2020 5.148718 4.000000 2.693359 1.978516 2050.256410 1650.000000 \n",
"\n",
" range speeds \n",
"country Other US Other US \n",
"year \n",
"1984 0.000000 0.000000 3.969054 3.872749 \n",
"1985 0.000000 0.000000 3.958057 3.886792 \n",
"1986 0.000000 0.000000 4.069659 3.886525 \n",
"1987 0.000000 0.000000 4.142653 3.902930 \n",
"1988 0.000000 0.000000 4.205607 4.028689 \n",
"1989 0.000000 0.000000 4.264317 4.025424 \n",
"1990 0.000000 0.000000 4.328149 4.105747 \n",
"1991 0.000000 0.000000 4.341844 4.234192 \n",
"1992 0.000000 0.000000 4.356841 4.252427 \n",
"1993 0.000000 0.000000 4.371994 4.279793 \n",
"1994 0.000000 0.000000 4.355769 4.293296 \n",
"1995 0.000000 0.000000 4.380192 4.313783 \n",
"1996 0.000000 0.000000 4.416342 4.262548 \n",
"1997 0.000000 0.000000 4.452652 4.290598 \n",
"1998 0.144128 0.420000 4.485765 4.272000 \n",
"1999 0.251248 1.334661 4.507488 4.215139 \n",
"2000 0.304795 0.449219 4.619863 4.253906 \n",
"2001 0.187017 0.443182 4.761978 4.412879 \n",
"2002 0.137881 0.132867 4.920174 4.367133 \n",
"2003 0.127517 0.000000 5.154362 4.307692 \n",
"2004 0.000000 0.000000 5.229730 4.308442 \n",
"2005 0.000000 0.000000 5.362801 4.688136 \n",
"2006 0.000000 0.000000 5.492178 4.776557 \n",
"2007 0.000000 0.000000 5.864508 4.883562 \n",
"2008 0.109529 0.000000 5.969332 5.120438 \n",
"2009 0.000000 0.000000 6.189636 5.563177 \n",
"2010 0.000000 0.000000 6.378995 5.866953 \n",
"2011 0.340302 0.000000 6.714286 6.066914 \n",
"2012 0.634202 1.267658 6.834655 6.286245 \n",
"2013 0.853476 2.763052 7.033155 6.385542 \n",
"2014 0.859312 3.683544 7.210526 6.046414 \n",
"2015 0.638235 8.296578 7.211765 6.353612 \n",
"2016 1.017224 17.214545 7.296859 6.301818 \n",
"2017 1.243854 15.731884 7.474926 6.304348 \n",
"2018 1.135466 12.537367 7.391345 5.601423 \n",
"2019 2.581801 16.419014 7.545983 5.647887 \n",
"2020 2.446154 0.000000 7.743590 7.785714 "
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# unstack moves country (the inner index level) into the columns\n",
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
" .groupby(['year', 'country'])\n",
" .mean(numeric_only=True)\n",
" .unstack()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f16ad9528b0>"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
" .groupby(['year', 'country'])\n",
" .mean()\n",
" #.std()\n",
" .unstack()\n",
" .city08\n",
" .plot()\n",
" .legend(bbox_to_anchor=(1,1))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f167dd8a6d0>"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# smooth it out a bit w/ rolling\n",
"(autos2\n",
" .assign(country=autos2.make.apply(country))\n",
" .groupby(['year', 'country'])\n",
" .mean()\n",
" .unstack()\n",
" .city08\n",
" .rolling(3)\n",
" .mean()\n",
" .plot()\n",
" .legend(bbox_to_anchor=(1,1))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* Correct types save space and enable convenient math, string, and date functionality\n",
"* Chaining operations will:\n",
" * Make code more readable\n",
" * Remove bugs\n",
" * Make code easier to debug\n",
"* Don't mutate (there's no point). Embrace chaining.\n",
"* ``.apply`` is slow for math\n",
"* Aggregations are powerful. Play with them until they make sense\n",
"\n",
"Follow me on Twitter ``@__mharrison__``\n",
"\n",
"Idiomatic Pandas Workshop coming up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"encoding": "# -*- coding: utf-8 -*-",
"formats": "ipynb,py:light"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
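
The summary's claim that ``.apply`` is slow for math can be illustrated with a minimal sketch. It assumes a made-up frame with only a `city08` column (the real notebook uses the full vehicles data), and the `to_l_per_100km` helper is a hypothetical name chosen for illustration. Both expressions compute the same values; the difference is that `.apply` calls a Python function once per row, while the vectorized form is a single array-level operation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's data: only a city08 (city MPG) column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"city08": rng.integers(10, 50, size=1_000)})


def to_l_per_100km(mpg):
    # 235.215 / mpg converts US miles per gallon to litres per 100 km.
    return 235.215 / mpg


# Row-by-row: the Python function is called once for every value.
via_apply = df.city08.apply(to_l_per_100km)

# Vectorized: one division over the whole column at once.
vectorized = 235.215 / df.city08

# Both approaches should agree on the values.
same_values = np.allclose(via_apply, vectorized)
```

Timing the two lines (for example with `%timeit` in a notebook) is the easiest way to see the gap on your own data; the exact ratio depends on the frame size and pandas version.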
@theAfricanQuant

I am getting an error (shown in the attached image) even after downgrading my pandas to "pandas==1.2.3".

@theAfricanQuant

theAfricanQuant commented Feb 20, 2022

(autos[cols].query('cylinders.isnull()', engine='python'))

The code above (chained of course) worked for me after some stackoverflowing. Thanks
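
For readers who hit the same problem, here is a minimal sketch of that chained filter. It assumes a small made-up `autos` frame and a hypothetical `cols` list rather than the notebook's real data. It also shows an alternative using `.loc` with a lambda, which avoids the string expression and should select the same rows.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; the notebook loads a much larger dataset.
autos = pd.DataFrame({
    "make": ["Tesla", "Ford", "Nissan"],
    "cylinders": [np.nan, 8.0, np.nan],
    "city08": [102, 15, 126],
})
cols = ["make", "cylinders"]  # hypothetical column selection

# The form from the comment: method calls inside .query need engine='python'.
missing_query = (autos
    [cols]
    .query("cylinders.isnull()", engine="python")
)

# Alternative without a string expression: boolean mask via a lambda.
missing_loc = (autos
    [cols]
    .loc[lambda df_: df_.cylinders.isna()]
)
```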
