@ClebsonDantasUchoa
Last active August 28, 2018 23:30
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predicting house prices\n",
"### Datasets often arrive with missing values, or with information in formats that Machine Learning algorithms cannot use directly. To address this, we need to treat the data properly before using it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
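"%matplotlib inline\n",
"import pandas as pd  # tabular data loading and manipulation\n",
"from sklearn import tree  # decision-tree models\n",
"# Imputer fills in missing values; it exists only in scikit-learn < 0.22\n",
"# (newer releases provide sklearn.impute.SimpleImputer instead).\n",
"from sklearn.preprocessing import Imputer"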
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('train.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>MSSubClass</th>\n",
" <th>MSZoning</th>\n",
" <th>LotFrontage</th>\n",
" <th>LotArea</th>\n",
" <th>Street</th>\n",
" <th>Alley</th>\n",
" <th>LotShape</th>\n",
" <th>LandContour</th>\n",
" <th>Utilities</th>\n",
" <th>...</th>\n",
" <th>PoolArea</th>\n",
" <th>PoolQC</th>\n",
" <th>Fence</th>\n",
" <th>MiscFeature</th>\n",
" <th>MiscVal</th>\n",
" <th>MoSold</th>\n",
" <th>YrSold</th>\n",
" <th>SaleType</th>\n",
" <th>SaleCondition</th>\n",
" <th>SalePrice</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>65.0</td>\n",
" <td>8450</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>208500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>80.0</td>\n",
" <td>9600</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>181500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>68.0</td>\n",
" <td>11250</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>223500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>70</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>9550</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Abnorml</td>\n",
" <td>140000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>84.0</td>\n",
" <td>14260</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>50</td>\n",
" <td>RL</td>\n",
" <td>85.0</td>\n",
" <td>14115</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>MnPrv</td>\n",
" <td>Shed</td>\n",
" <td>700</td>\n",
" <td>10</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>143000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>75.0</td>\n",
" <td>10084</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>307000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>10382</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Shed</td>\n",
" <td>350</td>\n",
" <td>11</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>50</td>\n",
" <td>RM</td>\n",
" <td>51.0</td>\n",
" <td>6120</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Abnorml</td>\n",
" <td>129900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>190</td>\n",
" <td>RL</td>\n",
" <td>50.0</td>\n",
" <td>7420</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>118000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 81 columns</p>\n",
"</div>"
],
"text/plain": [
" Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n",
"0 1 60 RL 65.0 8450 Pave NaN Reg \n",
"1 2 20 RL 80.0 9600 Pave NaN Reg \n",
"2 3 60 RL 68.0 11250 Pave NaN IR1 \n",
"3 4 70 RL 60.0 9550 Pave NaN IR1 \n",
"4 5 60 RL 84.0 14260 Pave NaN IR1 \n",
"5 6 50 RL 85.0 14115 Pave NaN IR1 \n",
"6 7 20 RL 75.0 10084 Pave NaN Reg \n",
"7 8 60 RL NaN 10382 Pave NaN IR1 \n",
"8 9 50 RM 51.0 6120 Pave NaN Reg \n",
"9 10 190 RL 50.0 7420 Pave NaN Reg \n",
"\n",
" LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \\\n",
"0 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"1 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"2 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"3 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"4 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"5 Lvl AllPub ... 0 NaN MnPrv Shed 700 \n",
"6 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"7 Lvl AllPub ... 0 NaN NaN Shed 350 \n",
"8 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"9 Lvl AllPub ... 0 NaN NaN NaN 0 \n",
"\n",
" MoSold YrSold SaleType SaleCondition SalePrice \n",
"0 2 2008 WD Normal 208500 \n",
"1 5 2007 WD Normal 181500 \n",
"2 9 2008 WD Normal 223500 \n",
"3 2 2006 WD Abnorml 140000 \n",
"4 12 2008 WD Normal 250000 \n",
"5 10 2009 WD Normal 143000 \n",
"6 8 2007 WD Normal 307000 \n",
"7 11 2009 WD Normal 200000 \n",
"8 4 2008 WD Abnorml 129900 \n",
"9 1 2008 WD Normal 118000 \n",
"\n",
"[10 rows x 81 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Columns with many missing values may not be useful for the final result"
]
},
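{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to see which columns are mostly empty is to rank them by their share of missing values. The cell below is a minimal sketch, not necessarily the method used in this notebook: the names `missing_share`, `threshold` and `sparse_columns` and the 0.5 cutoff are assumptions chosen for illustration. It only inspects `df` and does not modify it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: fraction of missing values per column, largest first.\n",
"missing_share = df.isnull().mean().sort_values(ascending=False)\n",
"\n",
"# Hypothetical cutoff: flag columns where more than half of the rows are missing.\n",
"threshold = 0.5\n",
"sparse_columns = missing_share[missing_share > threshold].index.tolist()\n",
"\n",
"print(missing_share.head(10))\n",
"print(sparse_columns)"
]
},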
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>MSSubClass</th>\n",
" <th>MSZoning</th>\n",
" <th>LotFrontage</th>\n",
" <th>LotArea</th>\n",
" <th>Street</th>\n",
" <th>LotShape</th>\n",
" <th>LandContour</th>\n",
" <th>Utilities</th>\n",
" <th>LotConfig</th>\n",
" <th>...</th>\n",
" <th>EnclosedPorch</th>\n",
" <th>3SsnPorch</th>\n",
" <th>ScreenPorch</th>\n",
" <th>PoolArea</th>\n",
" <th>MiscVal</th>\n",
" <th>MoSold</th>\n",
" <th>YrSold</th>\n",
" <th>SaleType</th>\n",
" <th>SaleCondition</th>\n",
" <th>SalePrice</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>65.0</td>\n",
" <td>8450</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>208500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>80.0</td>\n",
" <td>9600</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>FR2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>181500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>68.0</td>\n",
" <td>11250</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>223500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>70</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>9550</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>272</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Abnorml</td>\n",
" <td>140000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>84.0</td>\n",
" <td>14260</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>FR2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>50</td>\n",
" <td>RL</td>\n",
" <td>85.0</td>\n",
" <td>14115</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>320</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>700</td>\n",
" <td>10</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>143000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>75.0</td>\n",
" <td>10084</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>307000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>10382</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>228</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>350</td>\n",
" <td>11</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>50</td>\n",
" <td>RM</td>\n",
" <td>51.0</td>\n",
" <td>6120</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>205</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Abnorml</td>\n",
" <td>129900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>190</td>\n",
" <td>RL</td>\n",
" <td>50.0</td>\n",
" <td>7420</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>118000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>11</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>70.0</td>\n",
" <td>11200</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>129500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>12</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>85.0</td>\n",
" <td>11924</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2006</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>345000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>13</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>12968</td>\n",
" <td>Pave</td>\n",
" <td>IR2</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>176</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>144000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>14</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>91.0</td>\n",
" <td>10652</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2007</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>279500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>10920</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>176</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>157000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>16</td>\n",
" <td>45</td>\n",
" <td>RM</td>\n",
" <td>51.0</td>\n",
" <td>6120</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>132000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>11241</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>CulDSac</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>700</td>\n",
" <td>3</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>149000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>18</td>\n",
" <td>90</td>\n",
" <td>RL</td>\n",
" <td>72.0</td>\n",
" <td>10791</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>500</td>\n",
" <td>10</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>90000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>19</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>66.0</td>\n",
" <td>13695</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>159000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>20</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>70.0</td>\n",
" <td>7560</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2009</td>\n",
" <td>COD</td>\n",
" <td>Abnorml</td>\n",
" <td>139000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>21</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>101.0</td>\n",
" <td>14215</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>2006</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>325300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>22</td>\n",
" <td>45</td>\n",
" <td>RM</td>\n",
" <td>57.0</td>\n",
" <td>7449</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>205</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>139400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>23</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>75.0</td>\n",
" <td>9742</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>230000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>24</td>\n",
" <td>120</td>\n",
" <td>RM</td>\n",
" <td>44.0</td>\n",
" <td>4224</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>129900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>25</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>8246</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>154000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>26</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>110.0</td>\n",
" <td>14230</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>256300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>27</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>7200</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Corner</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>134800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>98.0</td>\n",
" <td>11478</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>306000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>29</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>47.0</td>\n",
" <td>16321</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>CulDSac</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>207500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>30</td>\n",
" <td>30</td>\n",
" <td>RM</td>\n",
" <td>60.0</td>\n",
" <td>6324</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>87</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>68500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1430</th>\n",
" <td>1431</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>21930</td>\n",
" <td>Pave</td>\n",
" <td>IR3</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>192140</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1431</th>\n",
" <td>1432</td>\n",
" <td>120</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>4928</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>143750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1432</th>\n",
" <td>1433</td>\n",
" <td>30</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>10800</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>64500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1433</th>\n",
" <td>1434</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>93.0</td>\n",
" <td>10261</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>186500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1434</th>\n",
" <td>1435</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>80.0</td>\n",
" <td>17400</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Low</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>160000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1435</th>\n",
" <td>1436</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>80.0</td>\n",
" <td>8400</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2008</td>\n",
" <td>COD</td>\n",
" <td>Abnorml</td>\n",
" <td>174000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1436</th>\n",
" <td>1437</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>9000</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>FR2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>120500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1437</th>\n",
" <td>1438</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>96.0</td>\n",
" <td>12444</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>FR2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>304</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>2008</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>394617</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1438</th>\n",
" <td>1439</td>\n",
" <td>20</td>\n",
" <td>RM</td>\n",
" <td>90.0</td>\n",
" <td>7407</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>158</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>149700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1439</th>\n",
" <td>1440</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>80.0</td>\n",
" <td>11584</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>216</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>197000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1440</th>\n",
" <td>1441</td>\n",
" <td>70</td>\n",
" <td>RL</td>\n",
" <td>79.0</td>\n",
" <td>11526</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>191000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1441</th>\n",
" <td>1442</td>\n",
" <td>120</td>\n",
" <td>RM</td>\n",
" <td>NaN</td>\n",
" <td>4426</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>149300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1442</th>\n",
" <td>1443</td>\n",
" <td>60</td>\n",
" <td>FV</td>\n",
" <td>85.0</td>\n",
" <td>11003</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>310000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1443</th>\n",
" <td>1444</td>\n",
" <td>30</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>8854</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>121000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1444</th>\n",
" <td>1445</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>63.0</td>\n",
" <td>8500</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>FR2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>179600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1445</th>\n",
" <td>1446</td>\n",
" <td>85</td>\n",
" <td>RL</td>\n",
" <td>70.0</td>\n",
" <td>8400</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>252</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>129000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1446</th>\n",
" <td>1447</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>26142</td>\n",
" <td>Pave</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>CulDSac</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>157900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1447</th>\n",
" <td>1448</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>80.0</td>\n",
" <td>10000</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>240000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1448</th>\n",
" <td>1449</td>\n",
" <td>50</td>\n",
" <td>RL</td>\n",
" <td>70.0</td>\n",
" <td>11767</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>112000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1449</th>\n",
" <td>1450</td>\n",
" <td>180</td>\n",
" <td>RM</td>\n",
" <td>21.0</td>\n",
" <td>1533</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Abnorml</td>\n",
" <td>92000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1450</th>\n",
" <td>1451</td>\n",
" <td>90</td>\n",
" <td>RL</td>\n",
" <td>60.0</td>\n",
" <td>9000</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>FR2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>136000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1451</th>\n",
" <td>1452</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>78.0</td>\n",
" <td>9262</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2009</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>287090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1452</th>\n",
" <td>1453</td>\n",
" <td>180</td>\n",
" <td>RM</td>\n",
" <td>35.0</td>\n",
" <td>3675</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>145000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1453</th>\n",
" <td>1454</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>90.0</td>\n",
" <td>17217</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2006</td>\n",
" <td>WD</td>\n",
" <td>Abnorml</td>\n",
" <td>84500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1454</th>\n",
" <td>1455</td>\n",
" <td>20</td>\n",
" <td>FV</td>\n",
" <td>62.0</td>\n",
" <td>7500</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>2009</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>185000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1455</th>\n",
" <td>1456</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>62.0</td>\n",
" <td>7917</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>175000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1456</th>\n",
" <td>1457</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>85.0</td>\n",
" <td>13175</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>210000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1457</th>\n",
" <td>1458</td>\n",
" <td>70</td>\n",
" <td>RL</td>\n",
" <td>66.0</td>\n",
" <td>9042</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2500</td>\n",
" <td>5</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>266500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1458</th>\n",
" <td>1459</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>68.0</td>\n",
" <td>9717</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>112</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>142125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1459</th>\n",
" <td>1460</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>75.0</td>\n",
" <td>9937</td>\n",
" <td>Pave</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>Inside</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>147500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1460 rows × 77 columns</p>\n",
"</div>"
],
"text/plain": [
" Id MSSubClass MSZoning LotFrontage LotArea Street LotShape \\\n",
"0 1 60 RL 65.0 8450 Pave Reg \n",
"1 2 20 RL 80.0 9600 Pave Reg \n",
"2 3 60 RL 68.0 11250 Pave IR1 \n",
"3 4 70 RL 60.0 9550 Pave IR1 \n",
"4 5 60 RL 84.0 14260 Pave IR1 \n",
"5 6 50 RL 85.0 14115 Pave IR1 \n",
"6 7 20 RL 75.0 10084 Pave Reg \n",
"7 8 60 RL NaN 10382 Pave IR1 \n",
"8 9 50 RM 51.0 6120 Pave Reg \n",
"9 10 190 RL 50.0 7420 Pave Reg \n",
"10 11 20 RL 70.0 11200 Pave Reg \n",
"11 12 60 RL 85.0 11924 Pave IR1 \n",
"12 13 20 RL NaN 12968 Pave IR2 \n",
"13 14 20 RL 91.0 10652 Pave IR1 \n",
"14 15 20 RL NaN 10920 Pave IR1 \n",
"15 16 45 RM 51.0 6120 Pave Reg \n",
"16 17 20 RL NaN 11241 Pave IR1 \n",
"17 18 90 RL 72.0 10791 Pave Reg \n",
"18 19 20 RL 66.0 13695 Pave Reg \n",
"19 20 20 RL 70.0 7560 Pave Reg \n",
"20 21 60 RL 101.0 14215 Pave IR1 \n",
"21 22 45 RM 57.0 7449 Pave Reg \n",
"22 23 20 RL 75.0 9742 Pave Reg \n",
"23 24 120 RM 44.0 4224 Pave Reg \n",
"24 25 20 RL NaN 8246 Pave IR1 \n",
"25 26 20 RL 110.0 14230 Pave Reg \n",
"26 27 20 RL 60.0 7200 Pave Reg \n",
"27 28 20 RL 98.0 11478 Pave Reg \n",
"28 29 20 RL 47.0 16321 Pave IR1 \n",
"29 30 30 RM 60.0 6324 Pave IR1 \n",
"... ... ... ... ... ... ... ... \n",
"1430 1431 60 RL 60.0 21930 Pave IR3 \n",
"1431 1432 120 RL NaN 4928 Pave IR1 \n",
"1432 1433 30 RL 60.0 10800 Pave Reg \n",
"1433 1434 60 RL 93.0 10261 Pave IR1 \n",
"1434 1435 20 RL 80.0 17400 Pave Reg \n",
"1435 1436 20 RL 80.0 8400 Pave Reg \n",
"1436 1437 20 RL 60.0 9000 Pave Reg \n",
"1437 1438 20 RL 96.0 12444 Pave Reg \n",
"1438 1439 20 RM 90.0 7407 Pave Reg \n",
"1439 1440 60 RL 80.0 11584 Pave Reg \n",
"1440 1441 70 RL 79.0 11526 Pave IR1 \n",
"1441 1442 120 RM NaN 4426 Pave Reg \n",
"1442 1443 60 FV 85.0 11003 Pave Reg \n",
"1443 1444 30 RL NaN 8854 Pave Reg \n",
"1444 1445 20 RL 63.0 8500 Pave Reg \n",
"1445 1446 85 RL 70.0 8400 Pave Reg \n",
"1446 1447 20 RL NaN 26142 Pave IR1 \n",
"1447 1448 60 RL 80.0 10000 Pave Reg \n",
"1448 1449 50 RL 70.0 11767 Pave Reg \n",
"1449 1450 180 RM 21.0 1533 Pave Reg \n",
"1450 1451 90 RL 60.0 9000 Pave Reg \n",
"1451 1452 20 RL 78.0 9262 Pave Reg \n",
"1452 1453 180 RM 35.0 3675 Pave Reg \n",
"1453 1454 20 RL 90.0 17217 Pave Reg \n",
"1454 1455 20 FV 62.0 7500 Pave Reg \n",
"1455 1456 60 RL 62.0 7917 Pave Reg \n",
"1456 1457 20 RL 85.0 13175 Pave Reg \n",
"1457 1458 70 RL 66.0 9042 Pave Reg \n",
"1458 1459 20 RL 68.0 9717 Pave Reg \n",
"1459 1460 20 RL 75.0 9937 Pave Reg \n",
"\n",
" LandContour Utilities LotConfig ... EnclosedPorch 3SsnPorch \\\n",
"0 Lvl AllPub Inside ... 0 0 \n",
"1 Lvl AllPub FR2 ... 0 0 \n",
"2 Lvl AllPub Inside ... 0 0 \n",
"3 Lvl AllPub Corner ... 272 0 \n",
"4 Lvl AllPub FR2 ... 0 0 \n",
"5 Lvl AllPub Inside ... 0 320 \n",
"6 Lvl AllPub Inside ... 0 0 \n",
"7 Lvl AllPub Corner ... 228 0 \n",
"8 Lvl AllPub Inside ... 205 0 \n",
"9 Lvl AllPub Corner ... 0 0 \n",
"10 Lvl AllPub Inside ... 0 0 \n",
"11 Lvl AllPub Inside ... 0 0 \n",
"12 Lvl AllPub Inside ... 0 0 \n",
"13 Lvl AllPub Inside ... 0 0 \n",
"14 Lvl AllPub Corner ... 176 0 \n",
"15 Lvl AllPub Corner ... 0 0 \n",
"16 Lvl AllPub CulDSac ... 0 0 \n",
"17 Lvl AllPub Inside ... 0 0 \n",
"18 Lvl AllPub Inside ... 0 0 \n",
"19 Lvl AllPub Inside ... 0 0 \n",
"20 Lvl AllPub Corner ... 0 0 \n",
"21 Bnk AllPub Inside ... 205 0 \n",
"22 Lvl AllPub Inside ... 0 0 \n",
"23 Lvl AllPub Inside ... 0 0 \n",
"24 Lvl AllPub Inside ... 0 0 \n",
"25 Lvl AllPub Corner ... 0 0 \n",
"26 Lvl AllPub Corner ... 0 0 \n",
"27 Lvl AllPub Inside ... 0 0 \n",
"28 Lvl AllPub CulDSac ... 0 0 \n",
"29 Lvl AllPub Inside ... 87 0 \n",
"... ... ... ... ... ... ... \n",
"1430 Lvl AllPub Inside ... 0 0 \n",
"1431 Lvl AllPub Inside ... 0 0 \n",
"1432 Lvl AllPub Inside ... 0 0 \n",
"1433 Lvl AllPub Inside ... 0 0 \n",
"1434 Low AllPub Inside ... 0 0 \n",
"1435 Lvl AllPub Inside ... 0 0 \n",
"1436 Lvl AllPub FR2 ... 0 0 \n",
"1437 Lvl AllPub FR2 ... 0 304 \n",
"1438 Lvl AllPub Inside ... 158 0 \n",
"1439 Lvl AllPub Inside ... 216 0 \n",
"1440 Bnk AllPub Inside ... 0 0 \n",
"1441 Lvl AllPub Inside ... 0 0 \n",
"1442 Lvl AllPub Inside ... 0 0 \n",
"1443 Lvl AllPub Inside ... 0 0 \n",
"1444 Lvl AllPub FR2 ... 0 0 \n",
"1445 Lvl AllPub Inside ... 252 0 \n",
"1446 Lvl AllPub CulDSac ... 0 0 \n",
"1447 Lvl AllPub Inside ... 0 0 \n",
"1448 Lvl AllPub Inside ... 0 0 \n",
"1449 Lvl AllPub Inside ... 0 0 \n",
"1450 Lvl AllPub FR2 ... 0 0 \n",
"1451 Lvl AllPub Inside ... 0 0 \n",
"1452 Lvl AllPub Inside ... 0 0 \n",
"1453 Lvl AllPub Inside ... 0 0 \n",
"1454 Lvl AllPub Inside ... 0 0 \n",
"1455 Lvl AllPub Inside ... 0 0 \n",
"1456 Lvl AllPub Inside ... 0 0 \n",
"1457 Lvl AllPub Inside ... 0 0 \n",
"1458 Lvl AllPub Inside ... 112 0 \n",
"1459 Lvl AllPub Inside ... 0 0 \n",
"\n",
" ScreenPorch PoolArea MiscVal MoSold YrSold SaleType SaleCondition \\\n",
"0 0 0 0 2 2008 WD Normal \n",
"1 0 0 0 5 2007 WD Normal \n",
"2 0 0 0 9 2008 WD Normal \n",
"3 0 0 0 2 2006 WD Abnorml \n",
"4 0 0 0 12 2008 WD Normal \n",
"5 0 0 700 10 2009 WD Normal \n",
"6 0 0 0 8 2007 WD Normal \n",
"7 0 0 350 11 2009 WD Normal \n",
"8 0 0 0 4 2008 WD Abnorml \n",
"9 0 0 0 1 2008 WD Normal \n",
"10 0 0 0 2 2008 WD Normal \n",
"11 0 0 0 7 2006 New Partial \n",
"12 176 0 0 9 2008 WD Normal \n",
"13 0 0 0 8 2007 New Partial \n",
"14 0 0 0 5 2008 WD Normal \n",
"15 0 0 0 7 2007 WD Normal \n",
"16 0 0 700 3 2010 WD Normal \n",
"17 0 0 500 10 2006 WD Normal \n",
"18 0 0 0 6 2008 WD Normal \n",
"19 0 0 0 5 2009 COD Abnorml \n",
"20 0 0 0 11 2006 New Partial \n",
"21 0 0 0 6 2007 WD Normal \n",
"22 0 0 0 9 2008 WD Normal \n",
"23 0 0 0 6 2007 WD Normal \n",
"24 0 0 0 5 2010 WD Normal \n",
"25 0 0 0 7 2009 WD Normal \n",
"26 0 0 0 5 2010 WD Normal \n",
"27 0 0 0 5 2010 WD Normal \n",
"28 0 0 0 12 2006 WD Normal \n",
"29 0 0 0 5 2008 WD Normal \n",
"... ... ... ... ... ... ... ... \n",
"1430 0 0 0 7 2006 WD Normal \n",
"1431 0 0 0 10 2009 WD Normal \n",
"1432 0 0 0 8 2007 WD Normal \n",
"1433 0 0 0 5 2008 WD Normal \n",
"1434 0 0 0 5 2006 WD Normal \n",
"1435 0 0 0 7 2008 COD Abnorml \n",
"1436 0 0 0 5 2007 WD Normal \n",
"1437 0 0 0 11 2008 New Partial \n",
"1438 0 0 0 4 2010 WD Normal \n",
"1439 0 0 0 11 2007 WD Normal \n",
"1440 0 0 0 9 2008 WD Normal \n",
"1441 0 0 0 5 2008 WD Normal \n",
"1442 0 0 0 4 2009 WD Normal \n",
"1443 40 0 0 5 2009 WD Normal \n",
"1444 0 0 0 11 2007 WD Normal \n",
"1445 0 0 0 5 2007 WD Normal \n",
"1446 0 0 0 4 2010 WD Normal \n",
"1447 0 0 0 12 2007 WD Normal \n",
"1448 0 0 0 5 2007 WD Normal \n",
"1449 0 0 0 8 2006 WD Abnorml \n",
"1450 0 0 0 9 2009 WD Normal \n",
"1451 0 0 0 5 2009 New Partial \n",
"1452 0 0 0 5 2006 WD Normal \n",
"1453 0 0 0 7 2006 WD Abnorml \n",
"1454 0 0 0 10 2009 WD Normal \n",
"1455 0 0 0 8 2007 WD Normal \n",
"1456 0 0 0 2 2010 WD Normal \n",
"1457 0 0 2500 5 2010 WD Normal \n",
"1458 0 0 0 4 2010 WD Normal \n",
"1459 0 0 0 6 2008 WD Normal \n",
"\n",
" SalePrice \n",
"0 208500 \n",
"1 181500 \n",
"2 223500 \n",
"3 140000 \n",
"4 250000 \n",
"5 143000 \n",
"6 307000 \n",
"7 200000 \n",
"8 129900 \n",
"9 118000 \n",
"10 129500 \n",
"11 345000 \n",
"12 144000 \n",
"13 279500 \n",
"14 157000 \n",
"15 132000 \n",
"16 149000 \n",
"17 90000 \n",
"18 159000 \n",
"19 139000 \n",
"20 325300 \n",
"21 139400 \n",
"22 230000 \n",
"23 129900 \n",
"24 154000 \n",
"25 256300 \n",
"26 134800 \n",
"27 306000 \n",
"28 207500 \n",
"29 68500 \n",
"... ... \n",
"1430 192140 \n",
"1431 143750 \n",
"1432 64500 \n",
"1433 186500 \n",
"1434 160000 \n",
"1435 174000 \n",
"1436 120500 \n",
"1437 394617 \n",
"1438 149700 \n",
"1439 197000 \n",
"1440 191000 \n",
"1441 149300 \n",
"1442 310000 \n",
"1443 121000 \n",
"1444 179600 \n",
"1445 129000 \n",
"1446 157900 \n",
"1447 240000 \n",
"1448 112000 \n",
"1449 92000 \n",
"1450 136000 \n",
"1451 287090 \n",
"1452 145000 \n",
"1453 84500 \n",
"1454 185000 \n",
"1455 175000 \n",
"1456 210000 \n",
"1457 266500 \n",
"1458 142125 \n",
"1459 147500 \n",
"\n",
"[1460 rows x 77 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dropna(axis=1, thresh=int(1460 * 0.25))  # drops columns with fewer than 25% non-missing values; the result is only displayed, df itself is not modified"
]
},
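{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how `thresh` behaves in `dropna`, assuming a small made-up DataFrame (`toy` and its column names are hypothetical and not part of `train.csv`). `thresh` is the minimum number of non-missing values a column needs in order to be kept. `dropna` returns a new DataFrame, so the original only changes if the result is assigned back.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: 'mostly_empty' has only 1 real value out of 8.\n",
"toy = pd.DataFrame({\n",
"    'full': [1, 2, 3, 4, 5, 6, 7, 8],\n",
"    'half': [1.0, np.nan, 3.0, np.nan, 5.0, np.nan, 7.0, np.nan],\n",
"    'mostly_empty': [np.nan] * 7 + [1.0],\n",
"})\n",
"\n",
"# Keep only the columns with at least 25% non-missing values.\n",
"min_real = int(len(toy) * 0.25)  # 2 of 8 rows\n",
"kept = toy.dropna(axis=1, thresh=min_real)\n",
"\n",
"print(list(kept.columns))  # 'mostly_empty' is dropped\n",
"print(list(toy.columns))   # the original DataFrame is unchanged\n",
"```"
]
},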
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Some column values are strings. Machine learning algorithms generally do not accept strings, so we need to convert them to numbers."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>MSSubClass</th>\n",
" <th>LotFrontage</th>\n",
" <th>LotArea</th>\n",
" <th>OverallQual</th>\n",
" <th>OverallCond</th>\n",
" <th>YearBuilt</th>\n",
" <th>YearRemodAdd</th>\n",
" <th>MasVnrArea</th>\n",
" <th>BsmtFinSF1</th>\n",
" <th>...</th>\n",
" <th>SaleType_ConLI</th>\n",
" <th>SaleType_ConLw</th>\n",
" <th>SaleType_New</th>\n",
" <th>SaleType_Oth</th>\n",
" <th>SaleType_WD</th>\n",
" <th>SaleCondition_AdjLand</th>\n",
" <th>SaleCondition_Alloca</th>\n",
" <th>SaleCondition_Family</th>\n",
" <th>SaleCondition_Normal</th>\n",
" <th>SaleCondition_Partial</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>60</td>\n",
" <td>65.0</td>\n",
" <td>8450</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2003</td>\n",
" <td>2003</td>\n",
" <td>196.0</td>\n",
" <td>706</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>20</td>\n",
" <td>80.0</td>\n",
" <td>9600</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" <td>1976</td>\n",
" <td>1976</td>\n",
" <td>0.0</td>\n",
" <td>978</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>60</td>\n",
" <td>68.0</td>\n",
" <td>11250</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2001</td>\n",
" <td>2002</td>\n",
" <td>162.0</td>\n",
" <td>486</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>70</td>\n",
" <td>60.0</td>\n",
" <td>9550</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>1915</td>\n",
" <td>1970</td>\n",
" <td>0.0</td>\n",
" <td>216</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>60</td>\n",
" <td>84.0</td>\n",
" <td>14260</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2000</td>\n",
" <td>2000</td>\n",
" <td>350.0</td>\n",
" <td>655</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>50</td>\n",
" <td>85.0</td>\n",
" <td>14115</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>1993</td>\n",
" <td>1995</td>\n",
" <td>0.0</td>\n",
" <td>732</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>20</td>\n",
" <td>75.0</td>\n",
" <td>10084</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2004</td>\n",
" <td>2005</td>\n",
" <td>186.0</td>\n",
" <td>1369</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>60</td>\n",
" <td>NaN</td>\n",
" <td>10382</td>\n",
" <td>7</td>\n",
" <td>6</td>\n",
" <td>1973</td>\n",
" <td>1973</td>\n",
" <td>240.0</td>\n",
" <td>859</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>50</td>\n",
" <td>51.0</td>\n",
" <td>6120</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>1931</td>\n",
" <td>1950</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>190</td>\n",
" <td>50.0</td>\n",
" <td>7420</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1939</td>\n",
" <td>1950</td>\n",
" <td>0.0</td>\n",
" <td>851</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 247 columns</p>\n",
"</div>"
],
"text/plain": [
" Id MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt \\\n",
"0 1 60 65.0 8450 7 5 2003 \n",
"1 2 20 80.0 9600 6 8 1976 \n",
"2 3 60 68.0 11250 7 5 2001 \n",
"3 4 70 60.0 9550 7 5 1915 \n",
"4 5 60 84.0 14260 8 5 2000 \n",
"5 6 50 85.0 14115 5 5 1993 \n",
"6 7 20 75.0 10084 8 5 2004 \n",
"7 8 60 NaN 10382 7 6 1973 \n",
"8 9 50 51.0 6120 7 5 1931 \n",
"9 10 190 50.0 7420 5 6 1939 \n",
"\n",
" YearRemodAdd MasVnrArea BsmtFinSF1 ... \\\n",
"0 2003 196.0 706 ... \n",
"1 1976 0.0 978 ... \n",
"2 2002 162.0 486 ... \n",
"3 1970 0.0 216 ... \n",
"4 2000 350.0 655 ... \n",
"5 1995 0.0 732 ... \n",
"6 2005 186.0 1369 ... \n",
"7 1973 240.0 859 ... \n",
"8 1950 0.0 0 ... \n",
"9 1950 0.0 851 ... \n",
"\n",
" SaleType_ConLI SaleType_ConLw SaleType_New SaleType_Oth SaleType_WD \\\n",
"0 0 0 0 0 1 \n",
"1 0 0 0 0 1 \n",
"2 0 0 0 0 1 \n",
"3 0 0 0 0 1 \n",
"4 0 0 0 0 1 \n",
"5 0 0 0 0 1 \n",
"6 0 0 0 0 1 \n",
"7 0 0 0 0 1 \n",
"8 0 0 0 0 1 \n",
"9 0 0 0 0 1 \n",
"\n",
" SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"5 0 0 0 \n",
"6 0 0 0 \n",
"7 0 0 0 \n",
"8 0 0 0 \n",
"9 0 0 0 \n",
"\n",
" SaleCondition_Normal SaleCondition_Partial \n",
"0 1 0 \n",
"1 1 0 \n",
"2 1 0 \n",
"3 0 0 \n",
"4 1 0 \n",
"5 1 0 \n",
"6 1 0 \n",
"7 1 0 \n",
"8 0 0 \n",
"9 1 0 \n",
"\n",
"[10 rows x 247 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.get_dummies(df, drop_first=True)\n",
"# drop_first removes the first dummy column created for each string column,\n",
"# because it can be inferred from the remaining ones.\n",
"# Note that the number of columns increased from 81 to 247.\n",
"df.head(10)"
]
},
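{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what `pd.get_dummies(..., drop_first=True)` does, here is a sketch on a tiny made-up DataFrame (`toy`, `area` and `street` are hypothetical names chosen for the example). Each string column is expanded into 0/1 indicator columns, and the first category level is left out.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one numeric and one string column.\n",
"toy = pd.DataFrame({\n",
"    'area': [50, 80, 120],\n",
"    'street': ['Grvl', 'Pave', 'Grvl'],\n",
"})\n",
"\n",
"encoded = pd.get_dummies(toy, drop_first=True)\n",
"\n",
"# Categories are sorted, so 'Grvl' is the first level and is dropped;\n",
"# only 'street_Pave' remains. Numeric columns pass through untouched.\n",
"print(list(encoded.columns))\n",
"```\n",
"\n",
"A row with `street_Pave` equal to 0 can only be `Grvl`, which is why the dropped column carries no extra information."
]
},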
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## To check the number of missing values in each column, we use:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Id 0\n",
"MSSubClass 0\n",
"LotFrontage 259\n",
"LotArea 0\n",
"OverallQual 0\n",
"OverallCond 0\n",
"YearBuilt 0\n",
"YearRemodAdd 0\n",
"MasVnrArea 8\n",
"BsmtFinSF1 0\n",
"BsmtFinSF2 0\n",
"BsmtUnfSF 0\n",
"TotalBsmtSF 0\n",
"1stFlrSF 0\n",
"2ndFlrSF 0\n",
"LowQualFinSF 0\n",
"GrLivArea 0\n",
"BsmtFullBath 0\n",
"BsmtHalfBath 0\n",
"FullBath 0\n",
"HalfBath 0\n",
"BedroomAbvGr 0\n",
"KitchenAbvGr 0\n",
"TotRmsAbvGrd 0\n",
"Fireplaces 0\n",
"GarageYrBlt 81\n",
"GarageCars 0\n",
"GarageArea 0\n",
"WoodDeckSF 0\n",
"OpenPorchSF 0\n",
" ... \n",
"GarageQual_Gd 0\n",
"GarageQual_Po 0\n",
"GarageQual_TA 0\n",
"GarageCond_Fa 0\n",
"GarageCond_Gd 0\n",
"GarageCond_Po 0\n",
"GarageCond_TA 0\n",
"PavedDrive_P 0\n",
"PavedDrive_Y 0\n",
"PoolQC_Fa 0\n",
"PoolQC_Gd 0\n",
"Fence_GdWo 0\n",
"Fence_MnPrv 0\n",
"Fence_MnWw 0\n",
"MiscFeature_Othr 0\n",
"MiscFeature_Shed 0\n",
"MiscFeature_TenC 0\n",
"SaleType_CWD 0\n",
"SaleType_Con 0\n",
"SaleType_ConLD 0\n",
"SaleType_ConLI 0\n",
"SaleType_ConLw 0\n",
"SaleType_New 0\n",
"SaleType_Oth 0\n",
"SaleType_WD 0\n",
"SaleCondition_AdjLand 0\n",
"SaleCondition_Alloca 0\n",
"SaleCondition_Family 0\n",
"SaleCondition_Normal 0\n",
"SaleCondition_Partial 0\n",
"Length: 247, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The machine learning algorithm we will use does not accept missing data, so we need to handle these values.\n",
"### There are a few alternatives, such as:\n",
" df = df.fillna(0) - Replaces every missing value with zero. This is not advisable in some cases: if we were working with salary data, for example, we should not fill missing salaries with zeros.\n",
" \n",
" df.dropna(axis=1) - Removes every column that contains missing values. This is not advisable in our case, because many columns have missing values and dropping them would mean a large loss of data.\n",
" \n",
" Imputer(missing_values='NaN', strategy='mean', axis=0) - An imputer that, once fitted and applied to the data, replaces the missing values with the mean of the other values in the column. (This is the one we will use.)\n",
" \n",
" "
]
},
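{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three alternatives above can be compared on a small made-up DataFrame. This is only a sketch: `toy`, `frontage` and `area` are hypothetical, and the mean fill is done here with plain pandas (`fillna(toy.mean())`) instead of the `Imputer` class, as a stand-in for the same idea.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one missing value in 'frontage'.\n",
"toy = pd.DataFrame({\n",
"    'frontage': [60.0, np.nan, 80.0],\n",
"    'area': [8000, 9000, 10000],\n",
"})\n",
"\n",
"zero_filled = toy.fillna(0)           # missing value becomes 0.0\n",
"cols_dropped = toy.dropna(axis=1)     # 'frontage' is removed entirely\n",
"mean_filled = toy.fillna(toy.mean())  # missing value becomes 70.0\n",
"\n",
"print(zero_filled['frontage'].tolist())\n",
"print(list(cols_dropped.columns))\n",
"print(mean_filled['frontage'].tolist())\n",
"```\n",
"\n",
"Note the parentheses in `toy.mean()`: the column means have to be computed first and then passed to `fillna`."
]
},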
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>MSSubClass</th>\n",
" <th>LotFrontage</th>\n",
" <th>LotArea</th>\n",
" <th>OverallQual</th>\n",
" <th>OverallCond</th>\n",
" <th>YearBuilt</th>\n",
" <th>YearRemodAdd</th>\n",
" <th>MasVnrArea</th>\n",
" <th>BsmtFinSF1</th>\n",
" <th>...</th>\n",
" <th>SaleType_ConLI</th>\n",
" <th>SaleType_ConLw</th>\n",
" <th>SaleType_New</th>\n",
" <th>SaleType_Oth</th>\n",
" <th>SaleType_WD</th>\n",
" <th>SaleCondition_AdjLand</th>\n",
" <th>SaleCondition_Alloca</th>\n",
" <th>SaleCondition_Family</th>\n",
" <th>SaleCondition_Normal</th>\n",
" <th>SaleCondition_Partial</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>60</td>\n",
" <td>65</td>\n",
" <td>8450</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2003</td>\n",
" <td>2003</td>\n",
" <td>196</td>\n",
" <td>706</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>20</td>\n",
" <td>80</td>\n",
" <td>9600</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" <td>1976</td>\n",
" <td>1976</td>\n",
" <td>0</td>\n",
" <td>978</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>60</td>\n",
" <td>68</td>\n",
" <td>11250</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2001</td>\n",
" <td>2002</td>\n",
" <td>162</td>\n",
" <td>486</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>70</td>\n",
" <td>60</td>\n",
" <td>9550</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>1915</td>\n",
" <td>1970</td>\n",
" <td>0</td>\n",
" <td>216</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>60</td>\n",
" <td>84</td>\n",
" <td>14260</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2000</td>\n",
" <td>2000</td>\n",
" <td>350</td>\n",
" <td>655</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>50</td>\n",
" <td>85</td>\n",
" <td>14115</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>1993</td>\n",
" <td>1995</td>\n",
" <td>0</td>\n",
" <td>732</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>20</td>\n",
" <td>75</td>\n",
" <td>10084</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2004</td>\n",
" <td>2005</td>\n",
" <td>186</td>\n",
" <td>1369</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>60</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>10382</td>\n",
" <td>7</td>\n",
" <td>6</td>\n",
" <td>1973</td>\n",
" <td>1973</td>\n",
" <td>240</td>\n",
" <td>859</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>50</td>\n",
" <td>51</td>\n",
" <td>6120</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>1931</td>\n",
" <td>1950</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>190</td>\n",
" <td>50</td>\n",
" <td>7420</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1939</td>\n",
" <td>1950</td>\n",
" <td>0</td>\n",
" <td>851</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>11</td>\n",
" <td>20</td>\n",
" <td>70</td>\n",
" <td>11200</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>1965</td>\n",
" <td>1965</td>\n",
" <td>0</td>\n",
" <td>906</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>12</td>\n",
" <td>60</td>\n",
" <td>85</td>\n",
" <td>11924</td>\n",
" <td>9</td>\n",
" <td>5</td>\n",
" <td>2005</td>\n",
" <td>2006</td>\n",
" <td>286</td>\n",
" <td>998</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>13</td>\n",
" <td>20</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>12968</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1962</td>\n",
" <td>1962</td>\n",
" <td>0</td>\n",
" <td>737</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>14</td>\n",
" <td>20</td>\n",
" <td>91</td>\n",
" <td>10652</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2006</td>\n",
" <td>2007</td>\n",
" <td>306</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>10920</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>1960</td>\n",
" <td>1960</td>\n",
" <td>212</td>\n",
" <td>733</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>16</td>\n",
" <td>45</td>\n",
" <td>51</td>\n",
" <td>6120</td>\n",
" <td>7</td>\n",
" <td>8</td>\n",
" <td>1929</td>\n",
" <td>2001</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>20</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>11241</td>\n",
" <td>6</td>\n",
" <td>7</td>\n",
" <td>1970</td>\n",
" <td>1970</td>\n",
" <td>180</td>\n",
" <td>578</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>18</td>\n",
" <td>90</td>\n",
" <td>72</td>\n",
" <td>10791</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>1967</td>\n",
" <td>1967</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>19</td>\n",
" <td>20</td>\n",
" <td>66</td>\n",
" <td>13695</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>2004</td>\n",
" <td>2004</td>\n",
" <td>0</td>\n",
" <td>646</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>20</td>\n",
" <td>20</td>\n",
" <td>70</td>\n",
" <td>7560</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1958</td>\n",
" <td>1965</td>\n",
" <td>0</td>\n",
" <td>504</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>21</td>\n",
" <td>60</td>\n",
" <td>101</td>\n",
" <td>14215</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2005</td>\n",
" <td>2006</td>\n",
" <td>380</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>22</td>\n",
" <td>45</td>\n",
" <td>57</td>\n",
" <td>7449</td>\n",
" <td>7</td>\n",
" <td>7</td>\n",
" <td>1930</td>\n",
" <td>1950</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>23</td>\n",
" <td>20</td>\n",
" <td>75</td>\n",
" <td>9742</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2002</td>\n",
" <td>2002</td>\n",
" <td>281</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>24</td>\n",
" <td>120</td>\n",
" <td>44</td>\n",
" <td>4224</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>1976</td>\n",
" <td>1976</td>\n",
" <td>0</td>\n",
" <td>840</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>25</td>\n",
" <td>20</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>8246</td>\n",
" <td>5</td>\n",
" <td>8</td>\n",
" <td>1968</td>\n",
" <td>2001</td>\n",
" <td>0</td>\n",
" <td>188</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>26</td>\n",
" <td>20</td>\n",
" <td>110</td>\n",
" <td>14230</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>2007</td>\n",
" <td>640</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>27</td>\n",
" <td>20</td>\n",
" <td>60</td>\n",
" <td>7200</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>1951</td>\n",
" <td>2000</td>\n",
" <td>0</td>\n",
" <td>234</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>28</td>\n",
" <td>20</td>\n",
" <td>98</td>\n",
" <td>11478</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2007</td>\n",
" <td>2008</td>\n",
" <td>200</td>\n",
" <td>1218</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>29</td>\n",
" <td>20</td>\n",
" <td>47</td>\n",
" <td>16321</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1957</td>\n",
" <td>1997</td>\n",
" <td>0</td>\n",
" <td>1277</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>30</td>\n",
" <td>30</td>\n",
" <td>60</td>\n",
" <td>6324</td>\n",
" <td>4</td>\n",
" <td>6</td>\n",
" <td>1927</td>\n",
" <td>1950</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1430</th>\n",
" <td>1431</td>\n",
" <td>60</td>\n",
" <td>60</td>\n",
" <td>21930</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>2005</td>\n",
" <td>2005</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1431</th>\n",
" <td>1432</td>\n",
" <td>120</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>4928</td>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>1976</td>\n",
" <td>1976</td>\n",
" <td>0</td>\n",
" <td>958</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1432</th>\n",
" <td>1433</td>\n",
" <td>30</td>\n",
" <td>60</td>\n",
" <td>10800</td>\n",
" <td>4</td>\n",
" <td>6</td>\n",
" <td>1927</td>\n",
" <td>2007</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1433</th>\n",
" <td>1434</td>\n",
" <td>60</td>\n",
" <td>93</td>\n",
" <td>10261</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>2000</td>\n",
" <td>2000</td>\n",
" <td>318</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1434</th>\n",
" <td>1435</td>\n",
" <td>20</td>\n",
" <td>80</td>\n",
" <td>17400</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>1977</td>\n",
" <td>1977</td>\n",
" <td>0</td>\n",
" <td>936</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1435</th>\n",
" <td>1436</td>\n",
" <td>20</td>\n",
" <td>80</td>\n",
" <td>8400</td>\n",
" <td>6</td>\n",
" <td>9</td>\n",
" <td>1962</td>\n",
" <td>2005</td>\n",
" <td>237</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1436</th>\n",
" <td>1437</td>\n",
" <td>20</td>\n",
" <td>60</td>\n",
" <td>9000</td>\n",
" <td>4</td>\n",
" <td>6</td>\n",
" <td>1971</td>\n",
" <td>1971</td>\n",
" <td>0</td>\n",
" <td>616</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1437</th>\n",
" <td>1438</td>\n",
" <td>20</td>\n",
" <td>96</td>\n",
" <td>12444</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>2008</td>\n",
" <td>426</td>\n",
" <td>1336</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1438</th>\n",
" <td>1439</td>\n",
" <td>20</td>\n",
" <td>90</td>\n",
" <td>7407</td>\n",
" <td>6</td>\n",
" <td>7</td>\n",
" <td>1957</td>\n",
" <td>1996</td>\n",
" <td>0</td>\n",
" <td>600</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1439</th>\n",
" <td>1440</td>\n",
" <td>60</td>\n",
" <td>80</td>\n",
" <td>11584</td>\n",
" <td>7</td>\n",
" <td>6</td>\n",
" <td>1979</td>\n",
" <td>1979</td>\n",
" <td>96</td>\n",
" <td>315</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1440</th>\n",
" <td>1441</td>\n",
" <td>70</td>\n",
" <td>79</td>\n",
" <td>11526</td>\n",
" <td>6</td>\n",
" <td>7</td>\n",
" <td>1922</td>\n",
" <td>1994</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1441</th>\n",
" <td>1442</td>\n",
" <td>120</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>4426</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>2004</td>\n",
" <td>2004</td>\n",
" <td>147</td>\n",
" <td>697</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1442</th>\n",
" <td>1443</td>\n",
" <td>60</td>\n",
" <td>85</td>\n",
" <td>11003</td>\n",
" <td>10</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>2008</td>\n",
" <td>160</td>\n",
" <td>765</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1443</th>\n",
" <td>1444</td>\n",
" <td>30</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>8854</td>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>1916</td>\n",
" <td>1950</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1444</th>\n",
" <td>1445</td>\n",
" <td>20</td>\n",
" <td>63</td>\n",
" <td>8500</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2004</td>\n",
" <td>2004</td>\n",
" <td>106</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1445</th>\n",
" <td>1446</td>\n",
" <td>85</td>\n",
" <td>70</td>\n",
" <td>8400</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>1966</td>\n",
" <td>1966</td>\n",
" <td>0</td>\n",
" <td>187</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1446</th>\n",
" <td>1447</td>\n",
" <td>20</td>\n",
" <td>&lt;bound method DataFrame.mean of Id MS...</td>\n",
" <td>26142</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>1962</td>\n",
" <td>1962</td>\n",
" <td>189</td>\n",
" <td>593</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1447</th>\n",
" <td>1448</td>\n",
" <td>60</td>\n",
" <td>80</td>\n",
" <td>10000</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>1995</td>\n",
" <td>1996</td>\n",
" <td>438</td>\n",
" <td>1079</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1448</th>\n",
" <td>1449</td>\n",
" <td>50</td>\n",
" <td>70</td>\n",
" <td>11767</td>\n",
" <td>4</td>\n",
" <td>7</td>\n",
" <td>1910</td>\n",
" <td>2000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1449</th>\n",
" <td>1450</td>\n",
" <td>180</td>\n",
" <td>21</td>\n",
" <td>1533</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>1970</td>\n",
" <td>1970</td>\n",
" <td>0</td>\n",
" <td>553</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1450</th>\n",
" <td>1451</td>\n",
" <td>90</td>\n",
" <td>60</td>\n",
" <td>9000</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>1974</td>\n",
" <td>1974</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1451</th>\n",
" <td>1452</td>\n",
" <td>20</td>\n",
" <td>78</td>\n",
" <td>9262</td>\n",
" <td>8</td>\n",
" <td>5</td>\n",
" <td>2008</td>\n",
" <td>2009</td>\n",
" <td>194</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1452</th>\n",
" <td>1453</td>\n",
" <td>180</td>\n",
" <td>35</td>\n",
" <td>3675</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>2005</td>\n",
" <td>2005</td>\n",
" <td>80</td>\n",
" <td>547</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1453</th>\n",
" <td>1454</td>\n",
" <td>20</td>\n",
" <td>90</td>\n",
" <td>17217</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>2006</td>\n",
" <td>2006</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1454</th>\n",
" <td>1455</td>\n",
" <td>20</td>\n",
" <td>62</td>\n",
" <td>7500</td>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>2004</td>\n",
" <td>2005</td>\n",
" <td>0</td>\n",
" <td>410</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1455</th>\n",
" <td>1456</td>\n",
" <td>60</td>\n",
" <td>62</td>\n",
" <td>7917</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>1999</td>\n",
" <td>2000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1456</th>\n",
" <td>1457</td>\n",
" <td>20</td>\n",
" <td>85</td>\n",
" <td>13175</td>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>1978</td>\n",
" <td>1988</td>\n",
" <td>119</td>\n",
" <td>790</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1457</th>\n",
" <td>1458</td>\n",
" <td>70</td>\n",
" <td>66</td>\n",
" <td>9042</td>\n",
" <td>7</td>\n",
" <td>9</td>\n",
" <td>1941</td>\n",
" <td>2006</td>\n",
" <td>0</td>\n",
" <td>275</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1458</th>\n",
" <td>1459</td>\n",
" <td>20</td>\n",
" <td>68</td>\n",
" <td>9717</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1950</td>\n",
" <td>1996</td>\n",
" <td>0</td>\n",
" <td>49</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1459</th>\n",
" <td>1460</td>\n",
" <td>20</td>\n",
" <td>75</td>\n",
" <td>9937</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1965</td>\n",
" <td>1965</td>\n",
" <td>0</td>\n",
" <td>830</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1460 rows × 247 columns</p>\n",
"</div>"
],
"text/plain": [
" Id MSSubClass LotFrontage \\\n",
"0 1 60 65 \n",
"1 2 20 80 \n",
"2 3 60 68 \n",
"3 4 70 60 \n",
"4 5 60 84 \n",
"5 6 50 85 \n",
"6 7 20 75 \n",
"7 8 60 <bound method DataFrame.mean of Id MS... \n",
"8 9 50 51 \n",
"9 10 190 50 \n",
"10 11 20 70 \n",
"11 12 60 85 \n",
"12 13 20 <bound method DataFrame.mean of Id MS... \n",
"13 14 20 91 \n",
"14 15 20 <bound method DataFrame.mean of Id MS... \n",
"15 16 45 51 \n",
"16 17 20 <bound method DataFrame.mean of Id MS... \n",
"17 18 90 72 \n",
"18 19 20 66 \n",
"19 20 20 70 \n",
"20 21 60 101 \n",
"21 22 45 57 \n",
"22 23 20 75 \n",
"23 24 120 44 \n",
"24 25 20 <bound method DataFrame.mean of Id MS... \n",
"25 26 20 110 \n",
"26 27 20 60 \n",
"27 28 20 98 \n",
"28 29 20 47 \n",
"29 30 30 60 \n",
"... ... ... ... \n",
"1430 1431 60 60 \n",
"1431 1432 120 <bound method DataFrame.mean of Id MS... \n",
"1432 1433 30 60 \n",
"1433 1434 60 93 \n",
"1434 1435 20 80 \n",
"1435 1436 20 80 \n",
"1436 1437 20 60 \n",
"1437 1438 20 96 \n",
"1438 1439 20 90 \n",
"1439 1440 60 80 \n",
"1440 1441 70 79 \n",
"1441 1442 120 <bound method DataFrame.mean of Id MS... \n",
"1442 1443 60 85 \n",
"1443 1444 30 <bound method DataFrame.mean of Id MS... \n",
"1444 1445 20 63 \n",
"1445 1446 85 70 \n",
"1446 1447 20 <bound method DataFrame.mean of Id MS... \n",
"1447 1448 60 80 \n",
"1448 1449 50 70 \n",
"1449 1450 180 21 \n",
"1450 1451 90 60 \n",
"1451 1452 20 78 \n",
"1452 1453 180 35 \n",
"1453 1454 20 90 \n",
"1454 1455 20 62 \n",
"1455 1456 60 62 \n",
"1456 1457 20 85 \n",
"1457 1458 70 66 \n",
"1458 1459 20 68 \n",
"1459 1460 20 75 \n",
"\n",
" LotArea OverallQual OverallCond YearBuilt YearRemodAdd MasVnrArea \\\n",
"0 8450 7 5 2003 2003 196 \n",
"1 9600 6 8 1976 1976 0 \n",
"2 11250 7 5 2001 2002 162 \n",
"3 9550 7 5 1915 1970 0 \n",
"4 14260 8 5 2000 2000 350 \n",
"5 14115 5 5 1993 1995 0 \n",
"6 10084 8 5 2004 2005 186 \n",
"7 10382 7 6 1973 1973 240 \n",
"8 6120 7 5 1931 1950 0 \n",
"9 7420 5 6 1939 1950 0 \n",
"10 11200 5 5 1965 1965 0 \n",
"11 11924 9 5 2005 2006 286 \n",
"12 12968 5 6 1962 1962 0 \n",
"13 10652 7 5 2006 2007 306 \n",
"14 10920 6 5 1960 1960 212 \n",
"15 6120 7 8 1929 2001 0 \n",
"16 11241 6 7 1970 1970 180 \n",
"17 10791 4 5 1967 1967 0 \n",
"18 13695 5 5 2004 2004 0 \n",
"19 7560 5 6 1958 1965 0 \n",
"20 14215 8 5 2005 2006 380 \n",
"21 7449 7 7 1930 1950 0 \n",
"22 9742 8 5 2002 2002 281 \n",
"23 4224 5 7 1976 1976 0 \n",
"24 8246 5 8 1968 2001 0 \n",
"25 14230 8 5 2007 2007 640 \n",
"26 7200 5 7 1951 2000 0 \n",
"27 11478 8 5 2007 2008 200 \n",
"28 16321 5 6 1957 1997 0 \n",
"29 6324 4 6 1927 1950 0 \n",
"... ... ... ... ... ... ... \n",
"1430 21930 5 5 2005 2005 0 \n",
"1431 4928 6 6 1976 1976 0 \n",
"1432 10800 4 6 1927 2007 0 \n",
"1433 10261 6 5 2000 2000 318 \n",
"1434 17400 5 5 1977 1977 0 \n",
"1435 8400 6 9 1962 2005 237 \n",
"1436 9000 4 6 1971 1971 0 \n",
"1437 12444 8 5 2008 2008 426 \n",
"1438 7407 6 7 1957 1996 0 \n",
"1439 11584 7 6 1979 1979 96 \n",
"1440 11526 6 7 1922 1994 0 \n",
"1441 4426 6 5 2004 2004 147 \n",
"1442 11003 10 5 2008 2008 160 \n",
"1443 8854 6 6 1916 1950 0 \n",
"1444 8500 7 5 2004 2004 106 \n",
"1445 8400 6 5 1966 1966 0 \n",
"1446 26142 5 7 1962 1962 189 \n",
"1447 10000 8 5 1995 1996 438 \n",
"1448 11767 4 7 1910 2000 0 \n",
"1449 1533 5 7 1970 1970 0 \n",
"1450 9000 5 5 1974 1974 0 \n",
"1451 9262 8 5 2008 2009 194 \n",
"1452 3675 5 5 2005 2005 80 \n",
"1453 17217 5 5 2006 2006 0 \n",
"1454 7500 7 5 2004 2005 0 \n",
"1455 7917 6 5 1999 2000 0 \n",
"1456 13175 6 6 1978 1988 119 \n",
"1457 9042 7 9 1941 2006 0 \n",
"1458 9717 5 6 1950 1996 0 \n",
"1459 9937 5 6 1965 1965 0 \n",
"\n",
" BsmtFinSF1 ... SaleType_ConLI SaleType_ConLw \\\n",
"0 706 ... 0 0 \n",
"1 978 ... 0 0 \n",
"2 486 ... 0 0 \n",
"3 216 ... 0 0 \n",
"4 655 ... 0 0 \n",
"5 732 ... 0 0 \n",
"6 1369 ... 0 0 \n",
"7 859 ... 0 0 \n",
"8 0 ... 0 0 \n",
"9 851 ... 0 0 \n",
"10 906 ... 0 0 \n",
"11 998 ... 0 0 \n",
"12 737 ... 0 0 \n",
"13 0 ... 0 0 \n",
"14 733 ... 0 0 \n",
"15 0 ... 0 0 \n",
"16 578 ... 0 0 \n",
"17 0 ... 0 0 \n",
"18 646 ... 0 0 \n",
"19 504 ... 0 0 \n",
"20 0 ... 0 0 \n",
"21 0 ... 0 0 \n",
"22 0 ... 0 0 \n",
"23 840 ... 0 0 \n",
"24 188 ... 0 0 \n",
"25 0 ... 0 0 \n",
"26 234 ... 0 0 \n",
"27 1218 ... 0 0 \n",
"28 1277 ... 0 0 \n",
"29 0 ... 0 0 \n",
"... ... ... ... ... \n",
"1430 0 ... 0 0 \n",
"1431 958 ... 0 0 \n",
"1432 0 ... 0 0 \n",
"1433 0 ... 0 0 \n",
"1434 936 ... 0 0 \n",
"1435 0 ... 0 0 \n",
"1436 616 ... 0 0 \n",
"1437 1336 ... 0 0 \n",
"1438 600 ... 0 0 \n",
"1439 315 ... 0 0 \n",
"1440 0 ... 0 0 \n",
"1441 697 ... 0 0 \n",
"1442 765 ... 0 0 \n",
"1443 0 ... 0 0 \n",
"1444 0 ... 0 0 \n",
"1445 187 ... 0 0 \n",
"1446 593 ... 0 0 \n",
"1447 1079 ... 0 0 \n",
"1448 0 ... 0 0 \n",
"1449 553 ... 0 0 \n",
"1450 0 ... 0 0 \n",
"1451 0 ... 0 0 \n",
"1452 547 ... 0 0 \n",
"1453 0 ... 0 0 \n",
"1454 410 ... 0 0 \n",
"1455 0 ... 0 0 \n",
"1456 790 ... 0 0 \n",
"1457 275 ... 0 0 \n",
"1458 49 ... 0 0 \n",
"1459 830 ... 0 0 \n",
"\n",
" SaleType_New SaleType_Oth SaleType_WD SaleCondition_AdjLand \\\n",
"0 0 0 1 0 \n",
"1 0 0 1 0 \n",
"2 0 0 1 0 \n",
"3 0 0 1 0 \n",
"4 0 0 1 0 \n",
"5 0 0 1 0 \n",
"6 0 0 1 0 \n",
"7 0 0 1 0 \n",
"8 0 0 1 0 \n",
"9 0 0 1 0 \n",
"10 0 0 1 0 \n",
"11 1 0 0 0 \n",
"12 0 0 1 0 \n",
"13 1 0 0 0 \n",
"14 0 0 1 0 \n",
"15 0 0 1 0 \n",
"16 0 0 1 0 \n",
"17 0 0 1 0 \n",
"18 0 0 1 0 \n",
"19 0 0 0 0 \n",
"20 1 0 0 0 \n",
"21 0 0 1 0 \n",
"22 0 0 1 0 \n",
"23 0 0 1 0 \n",
"24 0 0 1 0 \n",
"25 0 0 1 0 \n",
"26 0 0 1 0 \n",
"27 0 0 1 0 \n",
"28 0 0 1 0 \n",
"29 0 0 1 0 \n",
"... ... ... ... ... \n",
"1430 0 0 1 0 \n",
"1431 0 0 1 0 \n",
"1432 0 0 1 0 \n",
"1433 0 0 1 0 \n",
"1434 0 0 1 0 \n",
"1435 0 0 0 0 \n",
"1436 0 0 1 0 \n",
"1437 1 0 0 0 \n",
"1438 0 0 1 0 \n",
"1439 0 0 1 0 \n",
"1440 0 0 1 0 \n",
"1441 0 0 1 0 \n",
"1442 0 0 1 0 \n",
"1443 0 0 1 0 \n",
"1444 0 0 1 0 \n",
"1445 0 0 1 0 \n",
"1446 0 0 1 0 \n",
"1447 0 0 1 0 \n",
"1448 0 0 1 0 \n",
"1449 0 0 1 0 \n",
"1450 0 0 1 0 \n",
"1451 1 0 0 0 \n",
"1452 0 0 1 0 \n",
"1453 0 0 1 0 \n",
"1454 0 0 1 0 \n",
"1455 0 0 1 0 \n",
"1456 0 0 1 0 \n",
"1457 0 0 1 0 \n",
"1458 0 0 1 0 \n",
"1459 0 0 1 0 \n",
"\n",
" SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal \\\n",
"0 0 0 1 \n",
"1 0 0 1 \n",
"2 0 0 1 \n",
"3 0 0 0 \n",
"4 0 0 1 \n",
"5 0 0 1 \n",
"6 0 0 1 \n",
"7 0 0 1 \n",
"8 0 0 0 \n",
"9 0 0 1 \n",
"10 0 0 1 \n",
"11 0 0 0 \n",
"12 0 0 1 \n",
"13 0 0 0 \n",
"14 0 0 1 \n",
"15 0 0 1 \n",
"16 0 0 1 \n",
"17 0 0 1 \n",
"18 0 0 1 \n",
"19 0 0 0 \n",
"20 0 0 0 \n",
"21 0 0 1 \n",
"22 0 0 1 \n",
"23 0 0 1 \n",
"24 0 0 1 \n",
"25 0 0 1 \n",
"26 0 0 1 \n",
"27 0 0 1 \n",
"28 0 0 1 \n",
"29 0 0 1 \n",
"... ... ... ... \n",
"1430 0 0 1 \n",
"1431 0 0 1 \n",
"1432 0 0 1 \n",
"1433 0 0 1 \n",
"1434 0 0 1 \n",
"1435 0 0 0 \n",
"1436 0 0 1 \n",
"1437 0 0 0 \n",
"1438 0 0 1 \n",
"1439 0 0 1 \n",
"1440 0 0 1 \n",
"1441 0 0 1 \n",
"1442 0 0 1 \n",
"1443 0 0 1 \n",
"1444 0 0 1 \n",
"1445 0 0 1 \n",
"1446 0 0 1 \n",
"1447 0 0 1 \n",
"1448 0 0 1 \n",
"1449 0 0 0 \n",
"1450 0 0 1 \n",
"1451 0 0 0 \n",
"1452 0 0 1 \n",
"1453 0 0 0 \n",
"1454 0 0 1 \n",
"1455 0 0 1 \n",
"1456 0 0 1 \n",
"1457 0 0 1 \n",
"1458 0 0 1 \n",
"1459 0 0 1 \n",
"\n",
" SaleCondition_Partial \n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"5 0 \n",
"6 0 \n",
"7 0 \n",
"8 0 \n",
"9 0 \n",
"10 0 \n",
"11 1 \n",
"12 0 \n",
"13 1 \n",
"14 0 \n",
"15 0 \n",
"16 0 \n",
"17 0 \n",
"18 0 \n",
"19 0 \n",
"20 1 \n",
"21 0 \n",
"22 0 \n",
"23 0 \n",
"24 0 \n",
"25 0 \n",
"26 0 \n",
"27 0 \n",
"28 0 \n",
"29 0 \n",
"... ... \n",
"1430 0 \n",
"1431 0 \n",
"1432 0 \n",
"1433 0 \n",
"1434 0 \n",
"1435 0 \n",
"1436 0 \n",
"1437 1 \n",
"1438 0 \n",
"1439 0 \n",
"1440 0 \n",
"1441 0 \n",
"1442 0 \n",
"1443 0 \n",
"1444 0 \n",
"1445 0 \n",
"1446 0 \n",
"1447 0 \n",
"1448 0 \n",
"1449 0 \n",
"1450 0 \n",
"1451 1 \n",
"1452 0 \n",
"1453 0 \n",
"1454 0 \n",
"1455 0 \n",
"1456 0 \n",
"1457 0 \n",
"1458 0 \n",
"1459 0 \n",
"\n",
"[1460 rows x 247 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
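],
"source": [
"# Fill missing values with each column's mean. The parentheses matter: df.mean() returns the means,\n",
"# while passing df.mean without calling it puts the bound method itself into the cells.\n",
"df = df.fillna(df.mean())\n",
"df"
]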
},
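{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Mean imputation with scikit-learn: each NaN is replaced by the mean of its column (axis=0).\n",
"# Note: Imputer was removed in scikit-learn 0.22; newer versions provide sklearn.impute.SimpleImputer.\n",
"imr = Imputer(missing_values='NaN', strategy='mean', axis=0)\n",
"# fit_transform learns the column means and applies them in one step.\n",
"imputed_data = imr.fit_transform(df.values)\n",
"imputed_data"
]
},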
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}