@esjacobs
Created July 25, 2018 17:23
Ame Iowa Housing Prices
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ames Housing Data and House Price Prediction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we start off by importing anything and everything that might be helpful here."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats as stats\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"from sklearn import linear_model\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression, RidgeCV, LassoCV, ElasticNetCV\n",
"from sklearn.model_selection import cross_val_score, cross_val_predict\n",
"import warnings\n",
"warnings.simplefilter(\"ignore\")\n",
"\n",
"sns.set_style('darkgrid')\n",
"\n",
"%config InlineBackend.figure_format = 'retina'\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we import our data files, first saving the file paths as variables."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"train_csv = '/Users/evanjacobs/dsi/DSI-US-4/project-2/train.csv'\n",
"test_csv = '/Users/evanjacobs/dsi/DSI-US-4/project-2/test.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load both files, but for now we'll work only with our training data, so we don't accidentally alter the precious testing data."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(train_csv)\n",
"finaltest = pd.read_csv(test_csv)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we're going to do our train/test split, setting y to be our target, 'SalePrice'."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"X = df.drop(['SalePrice'], axis=1)\n",
"y = df.SalePrice.values\n",
"X_full = df.drop(['SalePrice'], axis=1)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y)"
]
},
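{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of a reproducible split, using a small made-up dataframe in place of the Ames data. The `random_state` and `test_size` values are illustrative assumptions, not settings used above; without a fixed seed, `train_test_split` shuffles differently on every run, so the sampled rows change each time the notebook is re-executed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Toy stand-in for the Ames data (hypothetical values)\n",
"toy = pd.DataFrame({\n",
"    'Lot Area': [8050, 5500, 13175, 3087, 10970, 9600, 10134, 9672],\n",
"    'SalePrice': [130000, 95000, 210000, 150000, 175000, 140000, 120000, 160000],\n",
"})\n",
"toy_X = toy.drop(['SalePrice'], axis=1)\n",
"toy_y = toy['SalePrice'].values\n",
"\n",
"# A fixed random_state makes the shuffle repeatable across runs\n",
"toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(\n",
"    toy_X, toy_y, test_size=0.25, random_state=42\n",
")\n",
"print(toy_X_train.shape, toy_X_test.shape)"
]
},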
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's have a look, shall we?"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>PID</th>\n",
" <th>MS SubClass</th>\n",
" <th>MS Zoning</th>\n",
" <th>Lot Frontage</th>\n",
" <th>Lot Area</th>\n",
" <th>Street</th>\n",
" <th>Alley</th>\n",
" <th>Lot Shape</th>\n",
" <th>Land Contour</th>\n",
" <th>...</th>\n",
" <th>3Ssn Porch</th>\n",
" <th>Screen Porch</th>\n",
" <th>Pool Area</th>\n",
" <th>Pool QC</th>\n",
" <th>Fence</th>\n",
" <th>Misc Feature</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Yr Sold</th>\n",
" <th>Sale Type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>484</th>\n",
" <td>1875</td>\n",
" <td>534201040</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>70.0</td>\n",
" <td>8050</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>2007</td>\n",
" <td>WD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1234</th>\n",
" <td>178</td>\n",
" <td>902206040</td>\n",
" <td>50</td>\n",
" <td>RM</td>\n",
" <td>50.0</td>\n",
" <td>5500</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1917</th>\n",
" <td>20</td>\n",
" <td>527302110</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>85.0</td>\n",
" <td>13175</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>MnPrv</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>640</th>\n",
" <td>2420</td>\n",
" <td>528228280</td>\n",
" <td>120</td>\n",
" <td>RL</td>\n",
" <td>43.0</td>\n",
" <td>3087</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>2006</td>\n",
" <td>New</td>\n",
" </tr>\n",
" <tr>\n",
" <th>811</th>\n",
" <td>1448</td>\n",
" <td>907202160</td>\n",
" <td>80</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>10970</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Low</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>MnPrv</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 80 columns</p>\n",
"</div>"
],
"text/plain": [
" Id PID MS SubClass MS Zoning Lot Frontage Lot Area Street \\\n",
"484 1875 534201040 20 RL 70.0 8050 Pave \n",
"1234 178 902206040 50 RM 50.0 5500 Pave \n",
"1917 20 527302110 20 RL 85.0 13175 Pave \n",
"640 2420 528228280 120 RL 43.0 3087 Pave \n",
"811 1448 907202160 80 RL NaN 10970 Pave \n",
"\n",
" Alley Lot Shape Land Contour ... 3Ssn Porch Screen Porch \\\n",
"484 NaN Reg Lvl ... 0 0 \n",
"1234 NaN Reg Lvl ... 0 0 \n",
"1917 NaN Reg Lvl ... 0 0 \n",
"640 NaN Reg Lvl ... 0 0 \n",
"811 NaN IR1 Low ... 0 0 \n",
"\n",
" Pool Area Pool QC Fence Misc Feature Misc Val Mo Sold Yr Sold \\\n",
"484 0 NaN NaN NaN 0 3 2007 \n",
"1234 0 NaN NaN NaN 0 4 2010 \n",
"1917 0 NaN MnPrv NaN 0 2 2010 \n",
"640 0 NaN NaN NaN 0 11 2006 \n",
"811 0 NaN MnPrv NaN 0 10 2008 \n",
"\n",
" Sale Type \n",
"484 WD \n",
"1234 WD \n",
"1917 WD \n",
"640 New \n",
"811 WD \n",
"\n",
"[5 rows x 80 columns]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.head()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>PID</th>\n",
" <th>MS SubClass</th>\n",
" <th>Lot Frontage</th>\n",
" <th>Lot Area</th>\n",
" <th>Overall Qual</th>\n",
" <th>Overall Cond</th>\n",
" <th>Year Built</th>\n",
" <th>Year Remod/Add</th>\n",
" <th>Mas Vnr Area</th>\n",
" <th>...</th>\n",
" <th>Garage Area</th>\n",
" <th>Wood Deck SF</th>\n",
" <th>Open Porch SF</th>\n",
" <th>Enclosed Porch</th>\n",
" <th>3Ssn Porch</th>\n",
" <th>Screen Porch</th>\n",
" <th>Pool Area</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Yr Sold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>1538.000000</td>\n",
" <td>1.538000e+03</td>\n",
" <td>1538.000000</td>\n",
" <td>1294.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1521.000000</td>\n",
" <td>...</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" <td>1538.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>1469.118336</td>\n",
" <td>7.148299e+08</td>\n",
" <td>57.542263</td>\n",
" <td>69.540958</td>\n",
" <td>10179.084525</td>\n",
" <td>6.109883</td>\n",
" <td>5.571521</td>\n",
" <td>1971.674252</td>\n",
" <td>1984.081274</td>\n",
" <td>99.113083</td>\n",
" <td>...</td>\n",
" <td>471.424577</td>\n",
" <td>95.207412</td>\n",
" <td>49.256177</td>\n",
" <td>23.018856</td>\n",
" <td>2.914174</td>\n",
" <td>17.200260</td>\n",
" <td>3.197659</td>\n",
" <td>55.282835</td>\n",
" <td>6.195709</td>\n",
" <td>2007.784785</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>844.226713</td>\n",
" <td>1.887552e+08</td>\n",
" <td>43.351837</td>\n",
" <td>22.987056</td>\n",
" <td>7353.026485</td>\n",
" <td>1.405082</td>\n",
" <td>1.110848</td>\n",
" <td>30.258868</td>\n",
" <td>21.200024</td>\n",
" <td>174.156041</td>\n",
" <td>...</td>\n",
" <td>216.396308</td>\n",
" <td>132.411630</td>\n",
" <td>69.244398</td>\n",
" <td>60.037423</td>\n",
" <td>27.776465</td>\n",
" <td>59.571394</td>\n",
" <td>43.605315</td>\n",
" <td>617.362905</td>\n",
" <td>2.753136</td>\n",
" <td>1.313997</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>5.263011e+08</td>\n",
" <td>20.000000</td>\n",
" <td>21.000000</td>\n",
" <td>1300.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1879.000000</td>\n",
" <td>1950.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2006.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>746.500000</td>\n",
" <td>5.284567e+08</td>\n",
" <td>20.000000</td>\n",
" <td>59.000000</td>\n",
" <td>7455.500000</td>\n",
" <td>5.000000</td>\n",
" <td>5.000000</td>\n",
" <td>1953.000000</td>\n",
" <td>1964.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>316.250000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>4.000000</td>\n",
" <td>2007.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1496.500000</td>\n",
" <td>5.354546e+08</td>\n",
" <td>50.000000</td>\n",
" <td>69.000000</td>\n",
" <td>9465.000000</td>\n",
" <td>6.000000</td>\n",
" <td>5.000000</td>\n",
" <td>1975.000000</td>\n",
" <td>1993.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>480.000000</td>\n",
" <td>0.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>6.000000</td>\n",
" <td>2008.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2174.750000</td>\n",
" <td>9.071855e+08</td>\n",
" <td>70.000000</td>\n",
" <td>80.000000</td>\n",
" <td>11635.500000</td>\n",
" <td>7.000000</td>\n",
" <td>6.000000</td>\n",
" <td>2001.000000</td>\n",
" <td>2004.000000</td>\n",
" <td>162.000000</td>\n",
" <td>...</td>\n",
" <td>576.000000</td>\n",
" <td>168.000000</td>\n",
" <td>72.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>8.000000</td>\n",
" <td>2009.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>2930.000000</td>\n",
" <td>9.241520e+08</td>\n",
" <td>190.000000</td>\n",
" <td>313.000000</td>\n",
" <td>159000.000000</td>\n",
" <td>10.000000</td>\n",
" <td>9.000000</td>\n",
" <td>2010.000000</td>\n",
" <td>2010.000000</td>\n",
" <td>1600.000000</td>\n",
" <td>...</td>\n",
" <td>1418.000000</td>\n",
" <td>1424.000000</td>\n",
" <td>547.000000</td>\n",
" <td>432.000000</td>\n",
" <td>508.000000</td>\n",
" <td>490.000000</td>\n",
" <td>800.000000</td>\n",
" <td>17000.000000</td>\n",
" <td>12.000000</td>\n",
" <td>2010.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows × 38 columns</p>\n",
"</div>"
],
"text/plain": [
" Id PID MS SubClass Lot Frontage Lot Area \\\n",
"count 1538.000000 1.538000e+03 1538.000000 1294.000000 1538.000000 \n",
"mean 1469.118336 7.148299e+08 57.542263 69.540958 10179.084525 \n",
"std 844.226713 1.887552e+08 43.351837 22.987056 7353.026485 \n",
"min 1.000000 5.263011e+08 20.000000 21.000000 1300.000000 \n",
"25% 746.500000 5.284567e+08 20.000000 59.000000 7455.500000 \n",
"50% 1496.500000 5.354546e+08 50.000000 69.000000 9465.000000 \n",
"75% 2174.750000 9.071855e+08 70.000000 80.000000 11635.500000 \n",
"max 2930.000000 9.241520e+08 190.000000 313.000000 159000.000000 \n",
"\n",
" Overall Qual Overall Cond Year Built Year Remod/Add Mas Vnr Area \\\n",
"count 1538.000000 1538.000000 1538.000000 1538.000000 1521.000000 \n",
"mean 6.109883 5.571521 1971.674252 1984.081274 99.113083 \n",
"std 1.405082 1.110848 30.258868 21.200024 174.156041 \n",
"min 1.000000 1.000000 1879.000000 1950.000000 0.000000 \n",
"25% 5.000000 5.000000 1953.000000 1964.000000 0.000000 \n",
"50% 6.000000 5.000000 1975.000000 1993.000000 0.000000 \n",
"75% 7.000000 6.000000 2001.000000 2004.000000 162.000000 \n",
"max 10.000000 9.000000 2010.000000 2010.000000 1600.000000 \n",
"\n",
" ... Garage Area Wood Deck SF Open Porch SF Enclosed Porch \\\n",
"count ... 1538.000000 1538.000000 1538.000000 1538.000000 \n",
"mean ... 471.424577 95.207412 49.256177 23.018856 \n",
"std ... 216.396308 132.411630 69.244398 60.037423 \n",
"min ... 0.000000 0.000000 0.000000 0.000000 \n",
"25% ... 316.250000 0.000000 0.000000 0.000000 \n",
"50% ... 480.000000 0.000000 28.000000 0.000000 \n",
"75% ... 576.000000 168.000000 72.000000 0.000000 \n",
"max ... 1418.000000 1424.000000 547.000000 432.000000 \n",
"\n",
" 3Ssn Porch Screen Porch Pool Area Misc Val Mo Sold \\\n",
"count 1538.000000 1538.000000 1538.000000 1538.000000 1538.000000 \n",
"mean 2.914174 17.200260 3.197659 55.282835 6.195709 \n",
"std 27.776465 59.571394 43.605315 617.362905 2.753136 \n",
"min 0.000000 0.000000 0.000000 0.000000 1.000000 \n",
"25% 0.000000 0.000000 0.000000 0.000000 4.000000 \n",
"50% 0.000000 0.000000 0.000000 0.000000 6.000000 \n",
"75% 0.000000 0.000000 0.000000 0.000000 8.000000 \n",
"max 508.000000 490.000000 800.000000 17000.000000 12.000000 \n",
"\n",
" Yr Sold \n",
"count 1538.000000 \n",
"mean 2007.784785 \n",
"std 1.313997 \n",
"min 2006.000000 \n",
"25% 2007.000000 \n",
"50% 2008.000000 \n",
"75% 2009.000000 \n",
"max 2010.000000 \n",
"\n",
"[8 rows x 38 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.describe()"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 1538 entries, 484 to 1169\n",
"Data columns (total 80 columns):\n",
"Id 1538 non-null int64\n",
"PID 1538 non-null int64\n",
"MS SubClass 1538 non-null int64\n",
"MS Zoning 1538 non-null object\n",
"Lot Frontage 1294 non-null float64\n",
"Lot Area 1538 non-null int64\n",
"Street 1538 non-null object\n",
"Alley 110 non-null object\n",
"Lot Shape 1538 non-null object\n",
"Land Contour 1538 non-null object\n",
"Utilities 1538 non-null object\n",
"Lot Config 1538 non-null object\n",
"Land Slope 1538 non-null object\n",
"Neighborhood 1538 non-null object\n",
"Condition 1 1538 non-null object\n",
"Condition 2 1538 non-null object\n",
"Bldg Type 1538 non-null object\n",
"House Style 1538 non-null object\n",
"Overall Qual 1538 non-null int64\n",
"Overall Cond 1538 non-null int64\n",
"Year Built 1538 non-null int64\n",
"Year Remod/Add 1538 non-null int64\n",
"Roof Style 1538 non-null object\n",
"Roof Matl 1538 non-null object\n",
"Exterior 1st 1538 non-null object\n",
"Exterior 2nd 1538 non-null object\n",
"Mas Vnr Type 1521 non-null object\n",
"Mas Vnr Area 1521 non-null float64\n",
"Exter Qual 1538 non-null object\n",
"Exter Cond 1538 non-null object\n",
"Foundation 1538 non-null object\n",
"Bsmt Qual 1501 non-null object\n",
"Bsmt Cond 1501 non-null object\n",
"Bsmt Exposure 1498 non-null object\n",
"BsmtFin Type 1 1501 non-null object\n",
"BsmtFin SF 1 1538 non-null float64\n",
"BsmtFin Type 2 1500 non-null object\n",
"BsmtFin SF 2 1538 non-null float64\n",
"Bsmt Unf SF 1538 non-null float64\n",
"Total Bsmt SF 1538 non-null float64\n",
"Heating 1538 non-null object\n",
"Heating QC 1538 non-null object\n",
"Central Air 1538 non-null object\n",
"Electrical 1538 non-null object\n",
"1st Flr SF 1538 non-null int64\n",
"2nd Flr SF 1538 non-null int64\n",
"Low Qual Fin SF 1538 non-null int64\n",
"Gr Liv Area 1538 non-null int64\n",
"Bsmt Full Bath 1537 non-null float64\n",
"Bsmt Half Bath 1537 non-null float64\n",
"Full Bath 1538 non-null int64\n",
"Half Bath 1538 non-null int64\n",
"Bedroom AbvGr 1538 non-null int64\n",
"Kitchen AbvGr 1538 non-null int64\n",
"Kitchen Qual 1538 non-null object\n",
"TotRms AbvGrd 1538 non-null int64\n",
"Functional 1538 non-null object\n",
"Fireplaces 1538 non-null int64\n",
"Fireplace Qu 798 non-null object\n",
"Garage Type 1449 non-null object\n",
"Garage Yr Blt 1449 non-null float64\n",
"Garage Finish 1449 non-null object\n",
"Garage Cars 1538 non-null float64\n",
"Garage Area 1538 non-null float64\n",
"Garage Qual 1449 non-null object\n",
"Garage Cond 1449 non-null object\n",
"Paved Drive 1538 non-null object\n",
"Wood Deck SF 1538 non-null int64\n",
"Open Porch SF 1538 non-null int64\n",
"Enclosed Porch 1538 non-null int64\n",
"3Ssn Porch 1538 non-null int64\n",
"Screen Porch 1538 non-null int64\n",
"Pool Area 1538 non-null int64\n",
"Pool QC 9 non-null object\n",
"Fence 286 non-null object\n",
"Misc Feature 50 non-null object\n",
"Misc Val 1538 non-null int64\n",
"Mo Sold 1538 non-null int64\n",
"Yr Sold 1538 non-null int64\n",
"Sale Type 1538 non-null object\n",
"dtypes: float64(11), int64(27), object(42)\n",
"memory usage: 973.3+ KB\n"
]
}
],
"source": [
"X_train.info()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Id', 'PID', 'MS SubClass', 'MS Zoning', 'Lot Frontage', 'Lot Area',\n",
" 'Street', 'Alley', 'Lot Shape', 'Land Contour', 'Utilities',\n",
" 'Lot Config', 'Land Slope', 'Neighborhood', 'Condition 1',\n",
" 'Condition 2', 'Bldg Type', 'House Style', 'Overall Qual',\n",
" 'Overall Cond', 'Year Built', 'Year Remod/Add', 'Roof Style',\n",
" 'Roof Matl', 'Exterior 1st', 'Exterior 2nd', 'Mas Vnr Type',\n",
" 'Mas Vnr Area', 'Exter Qual', 'Exter Cond', 'Foundation', 'Bsmt Qual',\n",
" 'Bsmt Cond', 'Bsmt Exposure', 'BsmtFin Type 1', 'BsmtFin SF 1',\n",
" 'BsmtFin Type 2', 'BsmtFin SF 2', 'Bsmt Unf SF', 'Total Bsmt SF',\n",
" 'Heating', 'Heating QC', 'Central Air', 'Electrical', '1st Flr SF',\n",
" '2nd Flr SF', 'Low Qual Fin SF', 'Gr Liv Area', 'Bsmt Full Bath',\n",
" 'Bsmt Half Bath', 'Full Bath', 'Half Bath', 'Bedroom AbvGr',\n",
" 'Kitchen AbvGr', 'Kitchen Qual', 'TotRms AbvGrd', 'Functional',\n",
" 'Fireplaces', 'Fireplace Qu', 'Garage Type', 'Garage Yr Blt',\n",
" 'Garage Finish', 'Garage Cars', 'Garage Area', 'Garage Qual',\n",
" 'Garage Cond', 'Paved Drive', 'Wood Deck SF', 'Open Porch SF',\n",
" 'Enclosed Porch', '3Ssn Porch', 'Screen Porch', 'Pool Area', 'Pool QC',\n",
" 'Fence', 'Misc Feature', 'Misc Val', 'Mo Sold', 'Yr Sold', 'Sale Type'],\n",
" dtype='object')"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Closing up the column names so I can use dot notation."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"def colclean(column_list):\n",
"    # Lowercase each column name and strip out the spaces\n",
"    columns = []\n",
"    for n in column_list:\n",
"        n = n.lower().replace(' ', '')\n",
"        columns.append(n)\n",
"    return columns\n",
"\n",
"X_train.columns = colclean(X_train.columns)\n",
"X_test.columns = colclean(X_test.columns)\n",
"X_full.columns = colclean(X_full.columns)\n",
"finaltest.columns = colclean(finaltest.columns)"
]
},
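{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the same cleanup can probably be done without a loop by using the pandas string accessor on the column index. This is a sketch on a made-up dataframe, not the method used above. Note that names such as `yearremod/add` or `1stflrsf` still cannot be reached with dot notation, because they are not valid Python identifiers; bracket indexing is needed for those."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy frame with Ames-style column names (no rows needed for the demo)\n",
"toy = pd.DataFrame(columns=['MS SubClass', 'Lot Area', 'Year Remod/Add'])\n",
"\n",
"# Lowercase and remove spaces in one vectorized pass\n",
"toy.columns = toy.columns.str.lower().str.replace(' ', '', regex=False)\n",
"print(list(toy.columns))"
]
},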
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Checking for duplicate PIDs. "
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.duplicated(subset='pid', keep='first').sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Got any nulls lying around?"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id 0\n",
"pid 0\n",
"mssubclass 0\n",
"mszoning 0\n",
"lotfrontage 244\n",
"lotarea 0\n",
"street 0\n",
"alley 1428\n",
"lotshape 0\n",
"landcontour 0\n",
"utilities 0\n",
"lotconfig 0\n",
"landslope 0\n",
"neighborhood 0\n",
"condition1 0\n",
"condition2 0\n",
"bldgtype 0\n",
"housestyle 0\n",
"overallqual 0\n",
"overallcond 0\n",
"yearbuilt 0\n",
"yearremod/add 0\n",
"roofstyle 0\n",
"roofmatl 0\n",
"exterior1st 0\n",
"exterior2nd 0\n",
"masvnrtype 17\n",
"masvnrarea 17\n",
"exterqual 0\n",
"extercond 0\n",
" ... \n",
"fullbath 0\n",
"halfbath 0\n",
"bedroomabvgr 0\n",
"kitchenabvgr 0\n",
"kitchenqual 0\n",
"totrmsabvgrd 0\n",
"functional 0\n",
"fireplaces 0\n",
"fireplacequ 740\n",
"garagetype 89\n",
"garageyrblt 89\n",
"garagefinish 89\n",
"garagecars 0\n",
"garagearea 0\n",
"garagequal 89\n",
"garagecond 89\n",
"paveddrive 0\n",
"wooddecksf 0\n",
"openporchsf 0\n",
"enclosedporch 0\n",
"3ssnporch 0\n",
"screenporch 0\n",
"poolarea 0\n",
"poolqc 1529\n",
"fence 1252\n",
"miscfeature 1488\n",
"miscval 0\n",
"mosold 0\n",
"yrsold 0\n",
"saletype 0\n",
"Length: 80, dtype: int64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a trick I learned: filter the null counts so only the columns that actually contain nulls are shown."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"lotfrontage 244\n",
"alley 1428\n",
"masvnrtype 17\n",
"masvnrarea 17\n",
"bsmtqual 37\n",
"bsmtcond 37\n",
"bsmtexposure 40\n",
"bsmtfintype1 37\n",
"bsmtfintype2 38\n",
"bsmtfullbath 1\n",
"bsmthalfbath 1\n",
"fireplacequ 740\n",
"garagetype 89\n",
"garageyrblt 89\n",
"garagefinish 89\n",
"garagequal 89\n",
"garagecond 89\n",
"poolqc 1529\n",
"fence 1252\n",
"miscfeature 1488\n",
"dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.isna().sum()[X_train.isna().sum() !=0]"
]
},
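{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small variation on that trick, sketched on a made-up dataframe: store the null counts once instead of computing `isna().sum()` twice, then filter and sort them. The variable names here are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy frame with a mix of complete and incomplete columns (hypothetical values)\n",
"toy = pd.DataFrame({\n",
"    'lotfrontage': [70.0, np.nan, 85.0],\n",
"    'lotarea': [8050, 5500, 13175],\n",
"    'alley': [np.nan, np.nan, 'Pave'],\n",
"})\n",
"\n",
"# Count nulls once, keep only the non-zero counts, largest first\n",
"null_counts = toy.isna().sum()\n",
"print(null_counts[null_counts != 0].sort_values(ascending=False))"
]
},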
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Having a look at what an object column with null values looks like."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([nan, 'TA', 'Gd', 'Ex', 'Fa'], dtype=object)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.poolqc.unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just for the sake of time, I'm going to fill the null values in each numerical column with that column's mean. Object columns are left as they are."
]
},
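{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"# numeric_only=True keeps mean() from failing on object columns in newer pandas\n",
"X_train = X_train.fillna(X_train.mean(numeric_only=True))\n",
"X_test = X_test.fillna(X_test.mean(numeric_only=True))\n",
"X_full = X_full.fillna(X_full.mean(numeric_only=True))\n",
"finaltest = finaltest.fillna(finaltest.mean(numeric_only=True))"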
]
},
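{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above fills each dataframe with its own column means. A common alternative, sketched here on made-up data and not what this notebook does, is to compute the means on the training rows only and reuse them for the held-out rows, so no information from the test set leaks into preprocessing. All names and values below are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy train/test frames (hypothetical values)\n",
"toy_train = pd.DataFrame({'lotfrontage': [70.0, np.nan, 80.0], 'alley': ['Pave', None, None]})\n",
"toy_test = pd.DataFrame({'lotfrontage': [np.nan, 60.0], 'alley': [None, 'Grvl']})\n",
"\n",
"# Means come from the training data only; object columns are skipped\n",
"train_means = toy_train.mean(numeric_only=True)\n",
"\n",
"# Apply the same training means to both frames\n",
"toy_train = toy_train.fillna(train_means)\n",
"toy_test = toy_test.fillna(train_means)\n",
"print(toy_test)"
]
},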
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 1538 entries, 484 to 1169\n",
"Data columns (total 80 columns):\n",
"id 1538 non-null int64\n",
"pid 1538 non-null int64\n",
"mssubclass 1538 non-null int64\n",
"mszoning 1538 non-null object\n",
"lotfrontage 1538 non-null float64\n",
"lotarea 1538 non-null int64\n",
"street 1538 non-null object\n",
"alley 110 non-null object\n",
"lotshape 1538 non-null object\n",
"landcontour 1538 non-null object\n",
"utilities 1538 non-null object\n",
"lotconfig 1538 non-null object\n",
"landslope 1538 non-null object\n",
"neighborhood 1538 non-null object\n",
"condition1 1538 non-null object\n",
"condition2 1538 non-null object\n",
"bldgtype 1538 non-null object\n",
"housestyle 1538 non-null object\n",
"overallqual 1538 non-null int64\n",
"overallcond 1538 non-null int64\n",
"yearbuilt 1538 non-null int64\n",
"yearremod/add 1538 non-null int64\n",
"roofstyle 1538 non-null object\n",
"roofmatl 1538 non-null object\n",
"exterior1st 1538 non-null object\n",
"exterior2nd 1538 non-null object\n",
"masvnrtype 1521 non-null object\n",
"masvnrarea 1538 non-null float64\n",
"exterqual 1538 non-null object\n",
"extercond 1538 non-null object\n",
"foundation 1538 non-null object\n",
"bsmtqual 1501 non-null object\n",
"bsmtcond 1501 non-null object\n",
"bsmtexposure 1498 non-null object\n",
"bsmtfintype1 1501 non-null object\n",
"bsmtfinsf1 1538 non-null float64\n",
"bsmtfintype2 1500 non-null object\n",
"bsmtfinsf2 1538 non-null float64\n",
"bsmtunfsf 1538 non-null float64\n",
"totalbsmtsf 1538 non-null float64\n",
"heating 1538 non-null object\n",
"heatingqc 1538 non-null object\n",
"centralair 1538 non-null object\n",
"electrical 1538 non-null object\n",
"1stflrsf 1538 non-null int64\n",
"2ndflrsf 1538 non-null int64\n",
"lowqualfinsf 1538 non-null int64\n",
"grlivarea 1538 non-null int64\n",
"bsmtfullbath 1538 non-null float64\n",
"bsmthalfbath 1538 non-null float64\n",
"fullbath 1538 non-null int64\n",
"halfbath 1538 non-null int64\n",
"bedroomabvgr 1538 non-null int64\n",
"kitchenabvgr 1538 non-null int64\n",
"kitchenqual 1538 non-null object\n",
"totrmsabvgrd 1538 non-null int64\n",
"functional 1538 non-null object\n",
"fireplaces 1538 non-null int64\n",
"fireplacequ 798 non-null object\n",
"garagetype 1449 non-null object\n",
"garageyrblt 1538 non-null float64\n",
"garagefinish 1449 non-null object\n",
"garagecars 1538 non-null float64\n",
"garagearea 1538 non-null float64\n",
"garagequal 1449 non-null object\n",
"garagecond 1449 non-null object\n",
"paveddrive 1538 non-null object\n",
"wooddecksf 1538 non-null int64\n",
"openporchsf 1538 non-null int64\n",
"enclosedporch 1538 non-null int64\n",
"3ssnporch 1538 non-null int64\n",
"screenporch 1538 non-null int64\n",
"poolarea 1538 non-null int64\n",
"poolqc 9 non-null object\n",
"fence 286 non-null object\n",
"miscfeature 50 non-null object\n",
"miscval 1538 non-null int64\n",
"mosold 1538 non-null int64\n",
"yrsold 1538 non-null int64\n",
"saletype 1538 non-null object\n",
"dtypes: float64(11), int64(27), object(42)\n",
"memory usage: 973.3+ KB\n"
]
}
],
"source": [
"X_train.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting each dataframe in two, one with the numeric columns and one with the object columns, so I can look at them more easily."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"X_tr_obj = X_train.select_dtypes(exclude=[np.number])\n",
"X_tr_num = X_train.select_dtypes(include=[np.number])\n",
"X_ts_obj = X_test.select_dtypes(exclude=[np.number])\n",
"X_ts_num = X_test.select_dtypes(include=[np.number])\n",
"X_full_obj = X_full.select_dtypes(exclude=[np.number])\n",
"X_full_num = X_full.select_dtypes(include=[np.number])\n",
"finaltest_obj = finaltest.select_dtypes(exclude=[np.number])\n",
"finaltest_num = finaltest.select_dtypes(include=[np.number])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>pid</th>\n",
" <th>mssubclass</th>\n",
" <th>lotfrontage</th>\n",
" <th>lotarea</th>\n",
" <th>overallqual</th>\n",
" <th>overallcond</th>\n",
" <th>yearbuilt</th>\n",
" <th>yearremod/add</th>\n",
" <th>masvnrarea</th>\n",
" <th>...</th>\n",
" <th>wooddecksf</th>\n",
" <th>openporchsf</th>\n",
" <th>enclosedporch</th>\n",
" <th>3ssnporch</th>\n",
" <th>screenporch</th>\n",
" <th>poolarea</th>\n",
" <th>poolqc</th>\n",
" <th>miscval</th>\n",
" <th>mosold</th>\n",
" <th>yrsold</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>908</th>\n",
" <td>2559</td>\n",
" <td>534455080</td>\n",
" <td>20</td>\n",
" <td>80.0</td>\n",
" <td>9600</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1961</td>\n",
" <td>1990</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>144</td>\n",
" <td>0</td>\n",
" <td>205</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1619</th>\n",
" <td>1947</td>\n",
" <td>535375130</td>\n",
" <td>50</td>\n",
" <td>60.0</td>\n",
" <td>10134</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1940</td>\n",
" <td>1950</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>39</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>2007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>391</th>\n",
" <td>81</td>\n",
" <td>531453010</td>\n",
" <td>20</td>\n",
" <td>81.0</td>\n",
" <td>9672</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>1984</td>\n",
" <td>1985</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>861</th>\n",
" <td>2573</td>\n",
" <td>535151130</td>\n",
" <td>90</td>\n",
" <td>70.0</td>\n",
" <td>7728</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>1962</td>\n",
" <td>1962</td>\n",
" <td>120.0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>18</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1270</th>\n",
" <td>1569</td>\n",
" <td>914476080</td>\n",
" <td>90</td>\n",
" <td>76.0</td>\n",
" <td>10260</td>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" <td>1976</td>\n",
" <td>1976</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 39 columns</p>\n",
"</div>"
],
"text/plain": [
" id pid mssubclass lotfrontage lotarea overallqual \\\n",
"908 2559 534455080 20 80.0 9600 5 \n",
"1619 1947 535375130 50 60.0 10134 5 \n",
"391 81 531453010 20 81.0 9672 6 \n",
"861 2573 535151130 90 70.0 7728 5 \n",
"1270 1569 914476080 90 76.0 10260 5 \n",
"\n",
" overallcond yearbuilt yearremod/add masvnrarea ... wooddecksf \\\n",
"908 6 1961 1990 0.0 ... 144 \n",
"1619 6 1940 1950 0.0 ... 0 \n",
"391 5 1984 1985 0.0 ... 0 \n",
"861 6 1962 1962 120.0 ... 0 \n",
"1270 4 1976 1976 0.0 ... 0 \n",
"\n",
" openporchsf enclosedporch 3ssnporch screenporch poolarea poolqc \\\n",
"908 0 205 0 0 0 NaN \n",
"1619 39 0 0 0 0 NaN \n",
"391 0 0 0 0 0 NaN \n",
"861 18 0 0 0 0 NaN \n",
"1270 0 0 0 0 0 NaN \n",
"\n",
" miscval mosold yrsold \n",
"908 0 6 2006 \n",
"1619 0 7 2007 \n",
"391 0 5 2010 \n",
"861 0 5 2006 \n",
"1270 0 11 2008 \n",
"\n",
"[5 rows x 39 columns]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_ts_num.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Checking to make sure it worked. "
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(1538, 80)\n",
"(1538, 42)\n",
"(1538, 38)\n"
]
}
],
"source": [
"print(X_train.shape)\n",
"print(X_tr_obj.shape)\n",
"print(X_tr_num.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check for potential outliers."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>id</th>\n",
" <td>1538.0</td>\n",
" <td>1.469118e+03</td>\n",
" <td>8.442267e+02</td>\n",
" <td>1.0</td>\n",
" <td>7.465000e+02</td>\n",
" <td>1.496500e+03</td>\n",
" <td>2.174750e+03</td>\n",
" <td>2930.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>pid</th>\n",
" <td>1538.0</td>\n",
" <td>7.148299e+08</td>\n",
" <td>1.887552e+08</td>\n",
" <td>526301100.0</td>\n",
" <td>5.284567e+08</td>\n",
" <td>5.354546e+08</td>\n",
" <td>9.071855e+08</td>\n",
" <td>924152030.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mssubclass</th>\n",
" <td>1538.0</td>\n",
" <td>5.754226e+01</td>\n",
" <td>4.335184e+01</td>\n",
" <td>20.0</td>\n",
" <td>2.000000e+01</td>\n",
" <td>5.000000e+01</td>\n",
" <td>7.000000e+01</td>\n",
" <td>190.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lotfrontage</th>\n",
" <td>1538.0</td>\n",
" <td>6.954096e+01</td>\n",
" <td>2.108364e+01</td>\n",
" <td>21.0</td>\n",
" <td>6.000000e+01</td>\n",
" <td>6.954096e+01</td>\n",
" <td>7.900000e+01</td>\n",
" <td>313.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lotarea</th>\n",
" <td>1538.0</td>\n",
" <td>1.017908e+04</td>\n",
" <td>7.353026e+03</td>\n",
" <td>1300.0</td>\n",
" <td>7.455500e+03</td>\n",
" <td>9.465000e+03</td>\n",
" <td>1.163550e+04</td>\n",
" <td>159000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>overallqual</th>\n",
" <td>1538.0</td>\n",
" <td>6.109883e+00</td>\n",
" <td>1.405082e+00</td>\n",
" <td>1.0</td>\n",
" <td>5.000000e+00</td>\n",
" <td>6.000000e+00</td>\n",
" <td>7.000000e+00</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>overallcond</th>\n",
" <td>1538.0</td>\n",
" <td>5.571521e+00</td>\n",
" <td>1.110848e+00</td>\n",
" <td>1.0</td>\n",
" <td>5.000000e+00</td>\n",
" <td>5.000000e+00</td>\n",
" <td>6.000000e+00</td>\n",
" <td>9.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>yearbuilt</th>\n",
" <td>1538.0</td>\n",
" <td>1.971674e+03</td>\n",
" <td>3.025887e+01</td>\n",
" <td>1879.0</td>\n",
" <td>1.953000e+03</td>\n",
" <td>1.975000e+03</td>\n",
" <td>2.001000e+03</td>\n",
" <td>2010.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>yearremod/add</th>\n",
" <td>1538.0</td>\n",
" <td>1.984081e+03</td>\n",
" <td>2.120002e+01</td>\n",
" <td>1950.0</td>\n",
" <td>1.964000e+03</td>\n",
" <td>1.993000e+03</td>\n",
" <td>2.004000e+03</td>\n",
" <td>2010.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>masvnrarea</th>\n",
" <td>1538.0</td>\n",
" <td>9.911308e+01</td>\n",
" <td>1.731902e+02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.600000e+02</td>\n",
" <td>1600.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bsmtfinsf1</th>\n",
" <td>1538.0</td>\n",
" <td>4.402523e+02</td>\n",
" <td>4.704420e+02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>3.610000e+02</td>\n",
" <td>7.287500e+02</td>\n",
" <td>5644.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bsmtfinsf2</th>\n",
" <td>1538.0</td>\n",
" <td>4.791873e+01</td>\n",
" <td>1.636097e+02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1474.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bsmtunfsf</th>\n",
" <td>1538.0</td>\n",
" <td>5.709402e+02</td>\n",
" <td>4.445010e+02</td>\n",
" <td>0.0</td>\n",
" <td>2.222500e+02</td>\n",
" <td>4.800000e+02</td>\n",
" <td>8.150000e+02</td>\n",
" <td>2336.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totalbsmtsf</th>\n",
" <td>1538.0</td>\n",
" <td>1.059111e+03</td>\n",
" <td>4.524990e+02</td>\n",
" <td>0.0</td>\n",
" <td>7.895000e+02</td>\n",
" <td>9.945000e+02</td>\n",
" <td>1.324000e+03</td>\n",
" <td>6110.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1stflrsf</th>\n",
" <td>1538.0</td>\n",
" <td>1.165298e+03</td>\n",
" <td>4.039276e+02</td>\n",
" <td>334.0</td>\n",
" <td>8.782500e+02</td>\n",
" <td>1.092500e+03</td>\n",
" <td>1.408500e+03</td>\n",
" <td>5095.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2ndflrsf</th>\n",
" <td>1538.0</td>\n",
" <td>3.328362e+02</td>\n",
" <td>4.238517e+02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>6.995000e+02</td>\n",
" <td>1836.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lowqualfinsf</th>\n",
" <td>1538.0</td>\n",
" <td>5.667100e+00</td>\n",
" <td>5.337538e+01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1064.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>grlivarea</th>\n",
" <td>1538.0</td>\n",
" <td>1.503801e+03</td>\n",
" <td>5.047836e+02</td>\n",
" <td>334.0</td>\n",
" <td>1.143000e+03</td>\n",
" <td>1.452000e+03</td>\n",
" <td>1.724000e+03</td>\n",
" <td>5642.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bsmtfullbath</th>\n",
" <td>1538.0</td>\n",
" <td>4.307092e-01</td>\n",
" <td>5.182866e-01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bsmthalfbath</th>\n",
" <td>1538.0</td>\n",
" <td>6.115810e-02</td>\n",
" <td>2.476318e-01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fullbath</th>\n",
" <td>1538.0</td>\n",
" <td>1.583225e+00</td>\n",
" <td>5.445938e-01</td>\n",
" <td>0.0</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>halfbath</th>\n",
" <td>1538.0</td>\n",
" <td>3.719116e-01</td>\n",
" <td>4.993598e-01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bedroomabvgr</th>\n",
" <td>1538.0</td>\n",
" <td>2.843953e+00</td>\n",
" <td>8.124554e-01</td>\n",
" <td>0.0</td>\n",
" <td>2.000000e+00</td>\n",
" <td>3.000000e+00</td>\n",
" <td>3.000000e+00</td>\n",
" <td>8.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>kitchenabvgr</th>\n",
" <td>1538.0</td>\n",
" <td>1.041612e+00</td>\n",
" <td>2.093097e-01</td>\n",
" <td>0.0</td>\n",
" <td>1.000000e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>totrmsabvgrd</th>\n",
" <td>1538.0</td>\n",
" <td>6.445384e+00</td>\n",
" <td>1.545643e+00</td>\n",
" <td>2.0</td>\n",
" <td>5.000000e+00</td>\n",
" <td>6.000000e+00</td>\n",
" <td>7.000000e+00</td>\n",
" <td>15.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fireplaces</th>\n",
" <td>1538.0</td>\n",
" <td>6.046814e-01</td>\n",
" <td>6.481274e-01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>garageyrblt</th>\n",
" <td>1538.0</td>\n",
" <td>1.978795e+03</td>\n",
" <td>2.501178e+01</td>\n",
" <td>1895.0</td>\n",
" <td>1.962000e+03</td>\n",
" <td>1.978795e+03</td>\n",
" <td>2.001000e+03</td>\n",
" <td>2207.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>garagecars</th>\n",
" <td>1538.0</td>\n",
" <td>1.774382e+00</td>\n",
" <td>7.672165e-01</td>\n",
" <td>0.0</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>garagearea</th>\n",
" <td>1538.0</td>\n",
" <td>4.714246e+02</td>\n",
" <td>2.163963e+02</td>\n",
" <td>0.0</td>\n",
" <td>3.162500e+02</td>\n",
" <td>4.800000e+02</td>\n",
" <td>5.760000e+02</td>\n",
" <td>1418.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>wooddecksf</th>\n",
" <td>1538.0</td>\n",
" <td>9.520741e+01</td>\n",
" <td>1.324116e+02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>1.680000e+02</td>\n",
" <td>1424.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>openporchsf</th>\n",
" <td>1538.0</td>\n",
" <td>4.925618e+01</td>\n",
" <td>6.924440e+01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>2.800000e+01</td>\n",
" <td>7.200000e+01</td>\n",
" <td>547.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>enclosedporch</th>\n",
" <td>1538.0</td>\n",
" <td>2.301886e+01</td>\n",
" <td>6.003742e+01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>432.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3ssnporch</th>\n",
" <td>1538.0</td>\n",
" <td>2.914174e+00</td>\n",
" <td>2.777646e+01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>508.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>screenporch</th>\n",
" <td>1538.0</td>\n",
" <td>1.720026e+01</td>\n",
" <td>5.957139e+01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>490.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>poolarea</th>\n",
" <td>1538.0</td>\n",
" <td>3.197659e+00</td>\n",
" <td>4.360531e+01</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>800.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>miscval</th>\n",
" <td>1538.0</td>\n",
" <td>5.528283e+01</td>\n",
" <td>6.173629e+02</td>\n",
" <td>0.0</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>17000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mosold</th>\n",
" <td>1538.0</td>\n",
" <td>6.195709e+00</td>\n",
" <td>2.753136e+00</td>\n",
" <td>1.0</td>\n",
" <td>4.000000e+00</td>\n",
" <td>6.000000e+00</td>\n",
" <td>8.000000e+00</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>yrsold</th>\n",
" <td>1538.0</td>\n",
" <td>2.007785e+03</td>\n",
" <td>1.313997e+00</td>\n",
" <td>2006.0</td>\n",
" <td>2.007000e+03</td>\n",
" <td>2.008000e+03</td>\n",
" <td>2.009000e+03</td>\n",
" <td>2010.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min 25% \\\n",
"id 1538.0 1.469118e+03 8.442267e+02 1.0 7.465000e+02 \n",
"pid 1538.0 7.148299e+08 1.887552e+08 526301100.0 5.284567e+08 \n",
"mssubclass 1538.0 5.754226e+01 4.335184e+01 20.0 2.000000e+01 \n",
"lotfrontage 1538.0 6.954096e+01 2.108364e+01 21.0 6.000000e+01 \n",
"lotarea 1538.0 1.017908e+04 7.353026e+03 1300.0 7.455500e+03 \n",
"overallqual 1538.0 6.109883e+00 1.405082e+00 1.0 5.000000e+00 \n",
"overallcond 1538.0 5.571521e+00 1.110848e+00 1.0 5.000000e+00 \n",
"yearbuilt 1538.0 1.971674e+03 3.025887e+01 1879.0 1.953000e+03 \n",
"yearremod/add 1538.0 1.984081e+03 2.120002e+01 1950.0 1.964000e+03 \n",
"masvnrarea 1538.0 9.911308e+01 1.731902e+02 0.0 0.000000e+00 \n",
"bsmtfinsf1 1538.0 4.402523e+02 4.704420e+02 0.0 0.000000e+00 \n",
"bsmtfinsf2 1538.0 4.791873e+01 1.636097e+02 0.0 0.000000e+00 \n",
"bsmtunfsf 1538.0 5.709402e+02 4.445010e+02 0.0 2.222500e+02 \n",
"totalbsmtsf 1538.0 1.059111e+03 4.524990e+02 0.0 7.895000e+02 \n",
"1stflrsf 1538.0 1.165298e+03 4.039276e+02 334.0 8.782500e+02 \n",
"2ndflrsf 1538.0 3.328362e+02 4.238517e+02 0.0 0.000000e+00 \n",
"lowqualfinsf 1538.0 5.667100e+00 5.337538e+01 0.0 0.000000e+00 \n",
"grlivarea 1538.0 1.503801e+03 5.047836e+02 334.0 1.143000e+03 \n",
"bsmtfullbath 1538.0 4.307092e-01 5.182866e-01 0.0 0.000000e+00 \n",
"bsmthalfbath 1538.0 6.115810e-02 2.476318e-01 0.0 0.000000e+00 \n",
"fullbath 1538.0 1.583225e+00 5.445938e-01 0.0 1.000000e+00 \n",
"halfbath 1538.0 3.719116e-01 4.993598e-01 0.0 0.000000e+00 \n",
"bedroomabvgr 1538.0 2.843953e+00 8.124554e-01 0.0 2.000000e+00 \n",
"kitchenabvgr 1538.0 1.041612e+00 2.093097e-01 0.0 1.000000e+00 \n",
"totrmsabvgrd 1538.0 6.445384e+00 1.545643e+00 2.0 5.000000e+00 \n",
"fireplaces 1538.0 6.046814e-01 6.481274e-01 0.0 0.000000e+00 \n",
"garageyrblt 1538.0 1.978795e+03 2.501178e+01 1895.0 1.962000e+03 \n",
"garagecars 1538.0 1.774382e+00 7.672165e-01 0.0 1.000000e+00 \n",
"garagearea 1538.0 4.714246e+02 2.163963e+02 0.0 3.162500e+02 \n",
"wooddecksf 1538.0 9.520741e+01 1.324116e+02 0.0 0.000000e+00 \n",
"openporchsf 1538.0 4.925618e+01 6.924440e+01 0.0 0.000000e+00 \n",
"enclosedporch 1538.0 2.301886e+01 6.003742e+01 0.0 0.000000e+00 \n",
"3ssnporch 1538.0 2.914174e+00 2.777646e+01 0.0 0.000000e+00 \n",
"screenporch 1538.0 1.720026e+01 5.957139e+01 0.0 0.000000e+00 \n",
"poolarea 1538.0 3.197659e+00 4.360531e+01 0.0 0.000000e+00 \n",
"miscval 1538.0 5.528283e+01 6.173629e+02 0.0 0.000000e+00 \n",
"mosold 1538.0 6.195709e+00 2.753136e+00 1.0 4.000000e+00 \n",
"yrsold 1538.0 2.007785e+03 1.313997e+00 2006.0 2.007000e+03 \n",
"\n",
" 50% 75% max \n",
"id 1.496500e+03 2.174750e+03 2930.0 \n",
"pid 5.354546e+08 9.071855e+08 924152030.0 \n",
"mssubclass 5.000000e+01 7.000000e+01 190.0 \n",
"lotfrontage 6.954096e+01 7.900000e+01 313.0 \n",
"lotarea 9.465000e+03 1.163550e+04 159000.0 \n",
"overallqual 6.000000e+00 7.000000e+00 10.0 \n",
"overallcond 5.000000e+00 6.000000e+00 9.0 \n",
"yearbuilt 1.975000e+03 2.001000e+03 2010.0 \n",
"yearremod/add 1.993000e+03 2.004000e+03 2010.0 \n",
"masvnrarea 0.000000e+00 1.600000e+02 1600.0 \n",
"bsmtfinsf1 3.610000e+02 7.287500e+02 5644.0 \n",
"bsmtfinsf2 0.000000e+00 0.000000e+00 1474.0 \n",
"bsmtunfsf 4.800000e+02 8.150000e+02 2336.0 \n",
"totalbsmtsf 9.945000e+02 1.324000e+03 6110.0 \n",
"1stflrsf 1.092500e+03 1.408500e+03 5095.0 \n",
"2ndflrsf 0.000000e+00 6.995000e+02 1836.0 \n",
"lowqualfinsf 0.000000e+00 0.000000e+00 1064.0 \n",
"grlivarea 1.452000e+03 1.724000e+03 5642.0 \n",
"bsmtfullbath 0.000000e+00 1.000000e+00 2.0 \n",
"bsmthalfbath 0.000000e+00 0.000000e+00 2.0 \n",
"fullbath 2.000000e+00 2.000000e+00 4.0 \n",
"halfbath 0.000000e+00 1.000000e+00 2.0 \n",
"bedroomabvgr 3.000000e+00 3.000000e+00 8.0 \n",
"kitchenabvgr 1.000000e+00 1.000000e+00 3.0 \n",
"totrmsabvgrd 6.000000e+00 7.000000e+00 15.0 \n",
"fireplaces 1.000000e+00 1.000000e+00 4.0 \n",
"garageyrblt 1.978795e+03 2.001000e+03 2207.0 \n",
"garagecars 2.000000e+00 2.000000e+00 4.0 \n",
"garagearea 4.800000e+02 5.760000e+02 1418.0 \n",
"wooddecksf 0.000000e+00 1.680000e+02 1424.0 \n",
"openporchsf 2.800000e+01 7.200000e+01 547.0 \n",
"enclosedporch 0.000000e+00 0.000000e+00 432.0 \n",
"3ssnporch 0.000000e+00 0.000000e+00 508.0 \n",
"screenporch 0.000000e+00 0.000000e+00 490.0 \n",
"poolarea 0.000000e+00 0.000000e+00 800.0 \n",
"miscval 0.000000e+00 0.000000e+00 17000.0 \n",
"mosold 6.000000e+00 8.000000e+00 12.0 \n",
"yrsold 2.008000e+03 2.009000e+03 2010.0 "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_tr_num.describe().T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some of the maxes are large, but that's the way real estate works. One value does look wrong, though: the maximum garageyrblt is 2207, which cannot be a real year and is most likely a data-entry error. "
]
},
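{
"cell_type": "markdown",
"metadata": {},
"source": [
"The summary above also lends itself to a programmatic sanity check: for example, the garageyrblt maximum of 2207 lies after the latest yrsold of 2010. The sketch below is illustrative only. It runs on a tiny made-up frame rather than the Ames data, and the helper name `flag_bad_years` and the 2010 upper bound are assumptions, not part of the original analysis.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def flag_bad_years(df, year_cols, latest_year=2010):\n",
"    # Collect, per column, the values that lie after latest_year\n",
"    bad = {}\n",
"    for col in year_cols:\n",
"        values = df.loc[df[col] > latest_year, col]\n",
"        if not values.empty:\n",
"            bad[col] = values.tolist()\n",
"    return bad\n",
"\n",
"# Hypothetical toy data; 2207 mimics the garageyrblt max in the summary above\n",
"toy = pd.DataFrame({\n",
"    'yearbuilt': [1961, 1984, 2006],\n",
"    'garageyrblt': [1961.0, 1985.0, 2207.0],\n",
"})\n",
"print(flag_bad_years(toy, ['yearbuilt', 'garageyrblt']))\n",
"```\n",
"\n",
"On the real data, the same helper could presumably be called on X_tr_num with the year columns (yearbuilt, yearremod/add, garageyrblt, yrsold)."
]
},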
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, now that we have separated our dataframe into a numerical one and a categorical one, let's take a look-see at the numerical correlations."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"# Attach the target as 'sp' so it appears in the correlation matrix\n",
"X_tr_num['sp'] = y_train"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"id 0.046963\n",
"pid 0.243312\n",
"mssubclass 0.104183\n",
"lotfrontage 0.320966\n",
"lotarea 0.301233\n",
"overallqual 0.787963\n",
"overallcond 0.094205\n",
"yearbuilt 0.560274\n",
"yearremod/add 0.531071\n",
"masvnrarea 0.503817\n",
"bsmtfinsf1 0.423380\n",
"bsmtfinsf2 0.010237\n",
"bsmtunfsf 0.180959\n",
"totalbsmtsf 0.621630\n",
"1stflrsf 0.616752\n",
"2ndflrsf 0.244477\n",
"lowqualfinsf 0.031802\n",
"grlivarea 0.695442\n",
"bsmtfullbath 0.279659\n",
"bsmthalfbath 0.036105\n",
"fullbath 0.541253\n",
"halfbath 0.296527\n",
"bedroomabvgr 0.150723\n",
"kitchenabvgr 0.131280\n",
"totrmsabvgrd 0.495609\n",
"fireplaces 0.467371\n",
"garageyrblt 0.499246\n",
"garagecars 0.646358\n",
"garagearea 0.654564\n",
"wooddecksf 0.322195\n",
"openporchsf 0.326194\n",
"enclosedporch 0.113088\n",
"3ssnporch 0.062105\n",
"screenporch 0.128890\n",
"poolarea 0.026498\n",
"miscval 0.013014\n",
"mosold 0.026836\n",
"yrsold 0.014487\n",
"sp 1.000000\n",
"Name: sp, dtype: float64"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"abs(X_tr_num.corr().sp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this first model, we're going to keep every column whose absolute correlation coefficient with SalePrice is at least a cutoff I choose (0.3 here)."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"['overallqual',\n",
" 'grlivarea',\n",
" 'garagearea',\n",
" 'garagecars',\n",
" 'totalbsmtsf',\n",
" '1stflrsf',\n",
" 'yearbuilt',\n",
" 'fullbath',\n",
" 'yearremod/add',\n",
" 'masvnrarea',\n",
" 'garageyrblt',\n",
" 'totrmsabvgrd',\n",
" 'fireplaces',\n",
" 'bsmtfinsf1',\n",
" 'openporchsf',\n",
" 'wooddecksf',\n",
" 'lotfrontage',\n",
" 'lotarea']"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vals = abs(X_tr_num.corr().sp).drop('sp').sort_values(ascending=False)\n",
"corr_cols = list(vals[vals >= 0.3].index)\n",
"\n",
"X_tr_mod1 = X_tr_num[corr_cols]\n",
"X_ts_mod1 = X_ts_num[corr_cols]\n",
"X_full_mod1 = X_full_num[corr_cols]\n",
"finaltest_num = finaltest_num[corr_cols]\n",
"\n",
"corr_cols"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
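"First, notice that several of these features should be correlated with each other: garageyrblt with yearbuilt; garagearea with garagecars; and totalbsmtsf, masvnrarea, grlivarea, and 1stflrsf. So let's make some interaction variables. With degree-2 polynomial features we get every pairwise product as well as each feature's square. "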
]
},
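{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check a hunch like this rather than assume it, one option is to list the feature pairs whose absolute correlation exceeds a cutoff. This is a sketch on synthetic data; `correlated_pairs` and the 0.8 cutoff are hypothetical choices, not something the original analysis ran.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def correlated_pairs(df, cutoff=0.8):\n",
"    # Absolute pairwise correlations between all columns\n",
"    corr = df.corr().abs()\n",
"    cols = list(corr.columns)\n",
"    pairs = []\n",
"    for i, a in enumerate(cols):\n",
"        for b in cols[i + 1:]:\n",
"            if corr.loc[a, b] >= cutoff:\n",
"                pairs.append((a, b, round(float(corr.loc[a, b]), 3)))\n",
"    return pairs\n",
"\n",
"rng = np.random.default_rng(0)\n",
"garagecars = rng.integers(0, 4, size=200)\n",
"toy = pd.DataFrame({\n",
"    'garagecars': garagecars,\n",
"    # Area is built from the car count plus noise, so the two correlate strongly\n",
"    'garagearea': garagecars * 250 + rng.normal(0, 40, size=200),\n",
"    # Independent of the other two columns\n",
"    'mosold': rng.integers(1, 13, size=200),\n",
"})\n",
"pairs = correlated_pairs(toy)\n",
"print(pairs)\n",
"```\n",
"\n",
"Pairs that show up here are natural candidates for interaction terms, or for dropping one of the two features."
]
},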
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import PolynomialFeatures\n",
"pf = PolynomialFeatures(degree=2, interaction_only=False, \n",
" include_bias=True)\n",
"pf.fit(X_tr_mod1)\n",
"X_tr_mod1 = pf.transform(X_tr_mod1)\n",
"X_ts_mod1 = pf.transform(X_ts_mod1)\n",
"X_full_mod1 = pf.transform(X_full_mod1)\n",
"finaltest_num = pf.transform(finaltest_num)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, now let's use a standard scaler to make everything line up nicely. "
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
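"from sklearn.preprocessing import StandardScaler\n",
"ss = StandardScaler()\n",
"# Fit the scaler on the training split only, then apply it to the validation split\n",
"X_tr_mod1 = ss.fit_transform(X_tr_mod1)\n",
"X_ts_mod1 = ss.transform(X_ts_mod1)\n",
"# Refit on the full training data; this refit scaler is the one applied to the final test set\n",
"X_full_mod1 = ss.fit_transform(X_full_mod1)\n",
"finaltest_num = ss.transform(finaltest_num)"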
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, for lasso, ridge, and enet, I played with different alpha ranges, different numbers of iterations, and also different correlation thresholds. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try a lasso!"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"l_alphas = np.arange(.001, .15, .0025)\n",
"lasso_model = LassoCV(alphas=l_alphas, max_iter=2000, cv=5)\n",
"# lasso_model = LassoCV(max_iter=10000, cv=5)\n",
"\n",
"model_1 = lasso_model.fit(X_tr_mod1, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, let's score the lasso."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.8775294669615423\n"
]
}
],
"source": [
"print(model_1.score(X_ts_mod1, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hey, that's not a bad score at all! What if we tried ridge? "
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RidgeCV(alphas=array([1.00000e+00, 1.05956e+00, ..., 9.43788e+04, 1.00000e+05]),\n",
" cv=10, fit_intercept=True, gcv_mode=None, normalize=False,\n",
" scoring=None, store_cv_values=False)"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ridge_alphas = np.logspace(0, 5, 200)\n",
"\n",
"ridge_model = RidgeCV(alphas=ridge_alphas, cv=10)\n",
"# ridge_model = RidgeCV(cv=10)\n",
"ridge_model.fit(X_tr_mod1, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.41482135 0.92429854 0.93223297 0.87297726 0.96470258 0.92663356\n",
" 0.86130404 0.90217608 0.76066959 0.89760725 0.88741725 0.9228639\n",
" 0.79991588 0.71831878 0.93233873]\n",
"0.8478851837319284\n"
]
}
],
"source": [
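"ridge = Ridge(alpha=ridge_model.alpha_)\n",
"\n",
"# Note: this cross-validates a fresh Ridge on the held-out split (refitting on its folds)\n",
"# rather than scoring the model that was fitted on the training data\n",
"ridge_scores = cross_val_score(ridge, X_ts_mod1, y_test, cv=15)\n",
"\n",
"print(ridge_scores)\n",
"print(np.mean(ridge_scores))"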
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hmmm. And last, everyone's favorite, the elastic net. "
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"161.32818304593434\n",
"1.0\n"
]
}
],
"source": [
"l1_ratios = np.linspace(0.01, 1.0, 25)\n",
"\n",
"enet = ElasticNetCV(l1_ratio=l1_ratios, n_alphas=100, cv=10,\n",
" verbose=0)\n",
"# enet = ElasticNetCV(cv=10, verbose=0)\n",
"enet.fit(X_tr_mod1, y_train)\n",
"\n",
"print(enet.alpha_)\n",
"print(enet.l1_ratio_)\n"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.55454395 0.94231695 0.90016697 0.95585052 0.88206291 0.76624644\n",
" 0.90431674 0.94806442 0.8154628 0.90050468]\n",
"0.8569536375192447\n"
]
}
],
"source": [
"enet = ElasticNet(alpha=enet.alpha_, l1_ratio=enet.l1_ratio_)\n",
"\n",
"enet_scores = cross_val_score(enet, X_ts_mod1, y_test, cv=10)\n",
"\n",
"print(enet_scores)\n",
"print(np.mean(enet_scores))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's basically the same. Note that the selected l1_ratio of 1.0 means this elastic net is effectively a lasso. I also found that as I lowered my correlation cutoff, my lasso score stayed largely the same while my ridge and elastic net scores went up a tiny bit, culminating in an R-squared of .89 from elastic net. "
]
},
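{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cutoff experiment described above could be organized as a loop. The following is only a sketch of that idea on synthetic data: the function name `score_by_cutoff`, the cutoff grid, and the use of a plain `Ridge(alpha=1.0)` with 5-fold cross-validation are all assumptions for illustration, not the settings behind the reported scores.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"def score_by_cutoff(X, y, cutoffs):\n",
"    # Absolute correlation of every feature with the target\n",
"    strength = X.corrwith(y).abs()\n",
"    results = {}\n",
"    for cutoff in cutoffs:\n",
"        cols = list(strength[strength >= cutoff].index)\n",
"        if not cols:\n",
"            continue  # nothing survives this cutoff\n",
"        # Scaling lives inside the pipeline so each CV fold is scaled on its own training part\n",
"        model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))\n",
"        scores = cross_val_score(model, X[cols], y, cv=5)\n",
"        results[cutoff] = (len(cols), scores.mean())\n",
"    return results\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = pd.DataFrame(rng.normal(size=(300, 6)), columns=list('abcdef'))\n",
"# Target depends strongly on a, moderately on b, and not at all on the rest\n",
"y = pd.Series(3 * X['a'] + 1.0 * X['b'] + rng.normal(scale=0.5, size=300))\n",
"\n",
"for cutoff, (n_cols, r2) in score_by_cutoff(X, y, [0.5, 0.1, 0.0]).items():\n",
"    print(cutoff, n_cols, round(r2, 3))\n",
"```\n",
"\n",
"Each row reports the cutoff, how many features survived it, and the mean cross-validated R-squared, which makes it easy to see whether loosening the cutoff actually helps."
]
},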
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# l_alphas = np.arange(.001, .15, .0025)\n",
"# lasso_model_final = LassoCV(alphas=l_alphas, cv=5)\n",
"# model_1_final = lasso_model.fit(X_full_mod1, y)\n",
"\n",
"enet = ElasticNetCV(l1_ratio=l1_ratios, n_alphas=100, cv=10,\n",
" verbose=0)\n",
"model_1_final = enet.fit(X_full_mod1, y)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is how I send my model's predictions to a file."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"evansubmission1 = pd.DataFrame(data = model_1_final.predict(finaltest_num), columns = ['SalePrice'], index=finaltest['id'])\n",
"evansubmission1.to_csv('./evansubmission1.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plotting my test predictions vs. my test y for a nice visualization of the efficacy of my model. "
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 576x576 with 1 Axes>"
]
},
"metadata": {
"image/png": {
"height": 479,
"width": 515
}
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"score: 0.855802088154267\n"
]
}
],
"source": [
"predictions = model_1_final.predict(X_ts_mod1)\n",
"\n",
"# Plot predicted vs. actual prices. y_test is used directly so that the\n",
"# full-data target y from the earlier fit is not overwritten.\n",
"plt.figure(figsize=(8,8))\n",
"plt.scatter(predictions, y_test, s=30, c='g', marker='*', zorder=10)\n",
"plt.xlabel(\"Predicted Values of Price From My Horrible Model\")\n",
"plt.ylabel(\"Actual Values of Price\")\n",
"\n",
"# Reference line where predicted == actual\n",
"plt.plot([0, np.max(y_test)], [0, np.max(y_test)], c = 'k')\n",
"\n",
"plt.show()\n",
"score = cross_val_score(model_1_final, X_ts_mod1, y_test, cv=10)\n",
"print(\"score: \", score.mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And that's it for the first model! Don't forget to look at the other two model files by checking out the GitHub [repository](https://github.com/esjacobs/Predicting-Ames-Housing-Prices). I've included the data dictionary below. "
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# evansubmission1 = pd.DataFrame(data = model_1.predict(X_ts_mod1), columns = ['SalePrice'], index=y_test['Id'])\n",
"# evansubmission1.to_csv('./evansubmission1.csv')"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"# There are three files:\n",
"\n",
"# train.csv -- this data contains all of the training data for your model.\n",
"# The target variable (SalePrice) is removed from the test set!\n",
"# test.csv -- this data contains the test data for your model. You will feed this data into your regression model to make predictions.\n",
"# sample_sub_reg.csv -- An example of a correctly formatted submission for this challenge (with a random number provided as predictions for SalePrice). Please ensure that your submission to Kaggle matches this format.\n",
"# Codebook / Data Dictionary:\n",
"\n",
"# SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict for this challenge.\n",
"# MSSubClass: The building class\n",
"# 20 1-STORY 1946 & NEWER ALL STYLES\n",
"# 30 1-STORY 1945 & OLDER\n",
"# 40 1-STORY W/FINISHED ATTIC ALL AGES\n",
"# 45 1-1/2 STORY - UNFINISHED ALL AGES\n",
"# 50 1-1/2 STORY FINISHED ALL AGES\n",
"# 60 2-STORY 1946 & NEWER\n",
"# 70 2-STORY 1945 & OLDER\n",
"# 75 2-1/2 STORY ALL AGES\n",
"# 80 SPLIT OR MULTI-LEVEL\n",
"# 85 SPLIT FOYER\n",
"# 90 DUPLEX - ALL STYLES AND AGES\n",
"# 120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER\n",
"# 150 1-1/2 STORY PUD - ALL AGES\n",
"# 160 2-STORY PUD - 1946 & NEWER\n",
"# 180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER\n",
"# 190 2 FAMILY CONVERSION - ALL STYLES AND AGES\n",
"# MSZoning: Identifies the general zoning classification of the sale.\n",
"# A Agriculture\n",
"# C Commercial\n",
"# FV Floating Village Residential\n",
"# I Industrial\n",
"# RH Residential High Density\n",
"# RL Residential Low Density\n",
"# RP Residential Low Density Park\n",
"# RM Residential Medium Density\n",
"# LotFrontage: Linear feet of street connected to property\n",
"# LotArea: Lot size in square feet\n",
"# Street: Type of road access to property\n",
"# Grvl Gravel\n",
"# Pave Paved\n",
"# Alley: Type of alley access to property\n",
"# Grvl Gravel\n",
"# Pave Paved\n",
"# NA No alley access\n",
"# LotShape: General shape of property\n",
"# Reg Regular\n",
"# IR1 Slightly irregular\n",
"# IR2 Moderately Irregular\n",
"# IR3 Irregular\n",
"# LandContour: Flatness of the property\n",
"# Lvl Near Flat/Level\n",
"# Bnk Banked - Quick and significant rise from street grade to building\n",
"# HLS Hillside - Significant slope from side to side\n",
"# Low Depression\n",
"# Utilities: Type of utilities available\n",
"# AllPub All public Utilities (E,G,W,& S)\n",
"# NoSewr Electricity, Gas, and Water (Septic Tank)\n",
"# NoSeWa Electricity and Gas Only\n",
"# ELO Electricity only\n",
"# LotConfig: Lot configuration\n",
"# Inside Inside lot\n",
"# Corner Corner lot\n",
"# CulDSac Cul-de-sac\n",
"# FR2 Frontage on 2 sides of property\n",
"# FR3 Frontage on 3 sides of property\n",
"# LandSlope: Slope of property\n",
"# Gtl Gentle slope\n",
"# Mod Moderate Slope\n",
"# Sev Severe Slope\n",
"# Neighborhood: Physical locations within Ames city limits\n",
"# Blmngtn Bloomington Heights\n",
"# Blueste Bluestem\n",
"# BrDale Briardale\n",
"# BrkSide Brookside\n",
"# ClearCr Clear Creek\n",
"# CollgCr College Creek\n",
"# Crawfor Crawford\n",
"# Edwards Edwards\n",
"# Gilbert Gilbert\n",
"# IDOTRR Iowa DOT and Rail Road\n",
"# MeadowV Meadow Village\n",
"# Mitchel Mitchell\n",
"# Names North Ames\n",
"# NoRidge Northridge\n",
"# NPkVill Northpark Villa\n",
"# NridgHt Northridge Heights\n",
"# NWAmes Northwest Ames\n",
"# OldTown Old Town\n",
"# SWISU South & West of Iowa State University\n",
"# Sawyer Sawyer\n",
"# SawyerW Sawyer West\n",
"# Somerst Somerset\n",
"# StoneBr Stone Brook\n",
"# Timber Timberland\n",
"# Veenker Veenker\n",
"# Condition1: Proximity to main road or railroad\n",
"# Artery Adjacent to arterial street\n",
"# Feedr Adjacent to feeder street\n",
"# Norm Normal\n",
"# RRNn Within 200' of North-South Railroad\n",
"# RRAn Adjacent to North-South Railroad\n",
"# PosN Near positive off-site feature--park, greenbelt, etc.\n",
"# PosA Adjacent to positive off-site feature\n",
"# RRNe Within 200' of East-West Railroad\n",
"# RRAe Adjacent to East-West Railroad\n",
"# Condition2: Proximity to main road or railroad (if a second is present)\n",
"# Artery Adjacent to arterial street\n",
"# Feedr Adjacent to feeder street\n",
"# Norm Normal\n",
"# RRNn Within 200' of North-South Railroad\n",
"# RRAn Adjacent to North-South Railroad\n",
"# PosN Near positive off-site feature--park, greenbelt, etc.\n",
"# PosA Adjacent to positive off-site feature\n",
"# RRNe Within 200' of East-West Railroad\n",
"# RRAe Adjacent to East-West Railroad\n",
"# BldgType: Type of dwelling\n",
"# 1Fam Single-family Detached\n",
"# 2FmCon Two-family Conversion; originally built as one-family dwelling\n",
"# Duplx Duplex\n",
"# TwnhsE Townhouse End Unit\n",
"# TwnhsI Townhouse Inside Unit\n",
"# HouseStyle: Style of dwelling\n",
"# 1Story One story\n",
"# 1.5Fin One and one-half story: 2nd level finished\n",
"# 1.5Unf One and one-half story: 2nd level unfinished\n",
"# 2Story Two story\n",
"# 2.5Fin Two and one-half story: 2nd level finished\n",
"# 2.5Unf Two and one-half story: 2nd level unfinished\n",
"# SFoyer Split Foyer\n",
"# SLvl Split Level\n",
"# OverallQual: Overall material and finish quality\n",
"# 10 Very Excellent\n",
"# 9 Excellent\n",
"# 8 Very Good\n",
"# 7 Good\n",
"# 6 Above Average\n",
"# 5 Average\n",
"# 4 Below Average\n",
"# 3 Fair\n",
"# 2 Poor\n",
"# 1 Very Poor\n",
"# OverallCond: Overall condition rating\n",
"# 10 Very Excellent\n",
"# 9 Excellent\n",
"# 8 Very Good\n",
"# 7 Good\n",
"# 6 Above Average\n",
"# 5 Average\n",
"# 4 Below Average\n",
"# 3 Fair\n",
"# 2 Poor\n",
"# 1 Very Poor\n",
"# YearBuilt: Original construction date\n",
"# YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)\n",
"# RoofStyle: Type of roof\n",
"# Flat Flat\n",
"# Gable Gable\n",
"# Gambrel Gambrel (Barn)\n",
"# Hip Hip\n",
"# Mansard Mansard\n",
"# Shed Shed\n",
"# RoofMatl: Roof material\n",
"# ClyTile Clay or Tile\n",
"# CompShg Standard (Composite) Shingle\n",
"# Membran Membrane\n",
"# Metal Metal\n",
"# Roll Roll\n",
"# Tar&Grv Gravel & Tar\n",
"# WdShake Wood Shakes\n",
"# WdShngl Wood Shingles\n",
"# Exterior1st: Exterior covering on house\n",
"# AsbShng Asbestos Shingles\n",
"# AsphShn Asphalt Shingles\n",
"# BrkComm Brick Common\n",
"# BrkFace Brick Face\n",
"# CBlock Cinder Block\n",
"# CemntBd Cement Board\n",
"# HdBoard Hard Board\n",
"# ImStucc Imitation Stucco\n",
"# MetalSd Metal Siding\n",
"# Other Other\n",
"# Plywood Plywood\n",
"# PreCast PreCast\n",
"# Stone Stone\n",
"# Stucco Stucco\n",
"# VinylSd Vinyl Siding\n",
"# Wd Sdng Wood Siding\n",
"# WdShing Wood Shingles\n",
"# Exterior2nd: Exterior covering on house (if more than one material)\n",
"# AsbShng Asbestos Shingles\n",
"# AsphShn Asphalt Shingles\n",
"# BrkComm Brick Common\n",
"# BrkFace Brick Face\n",
"# CBlock Cinder Block\n",
"# CemntBd Cement Board\n",
"# HdBoard Hard Board\n",
"# ImStucc Imitation Stucco\n",
"# MetalSd Metal Siding\n",
"# Other Other\n",
"# Plywood Plywood\n",
"# PreCast PreCast\n",
"# Stone Stone\n",
"# Stucco Stucco\n",
"# VinylSd Vinyl Siding\n",
"# Wd Sdng Wood Siding\n",
"# WdShing Wood Shingles\n",
"# MasVnrType: Masonry veneer type\n",
"# BrkCmn Brick Common\n",
"# BrkFace Brick Face\n",
"# CBlock Cinder Block\n",
"# None None\n",
"# Stone Stone\n",
"# MasVnrArea: Masonry veneer area in square feet\n",
"# ExterQual: Exterior material quality\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Average/Typical\n",
"# Fa Fair\n",
"# Po Poor\n",
"# ExterCond: Present condition of the material on the exterior\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Average/Typical\n",
"# Fa Fair\n",
"# Po Poor\n",
"# Foundation: Type of foundation\n",
"# BrkTil Brick & Tile\n",
"# CBlock Cinder Block\n",
"# PConc Poured Contrete\n",
"# Slab Slab\n",
"# Stone Stone\n",
"# Wood Wood\n",
"# BsmtQual: Height of the basement\n",
"# Ex Excellent (100+ inches)\n",
"# Gd Good (90-99 inches)\n",
"# TA Typical (80-89 inches)\n",
"# Fa Fair (70-79 inches)\n",
"# Po Poor (<70 inches)\n",
"# NA No Basement\n",
"# BsmtCond: General condition of the basement\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Typical - slight dampness allowed\n",
"# Fa Fair - dampness or some cracking or settling\n",
"# Po Poor - Severe cracking, settling, or wetness\n",
"# NA No Basement\n",
"# BsmtExposure: Walkout or garden level basement walls\n",
"# Gd Good Exposure\n",
"# Av Average Exposure (split levels or foyers typically score average or above)\n",
"# Mn Mimimum Exposure\n",
"# No No Exposure\n",
"# NA No Basement\n",
"# BsmtFinType1: Quality of basement finished area\n",
"# GLQ Good Living Quarters\n",
"# ALQ Average Living Quarters\n",
"# BLQ Below Average Living Quarters\n",
"# Rec Average Rec Room\n",
"# LwQ Low Quality\n",
"# Unf Unfinshed\n",
"# NA No Basement\n",
"# BsmtFinSF1: Type 1 finished square feet\n",
"# BsmtFinType2: Quality of second finished area (if present)\n",
"# GLQ Good Living Quarters\n",
"# ALQ Average Living Quarters\n",
"# BLQ Below Average Living Quarters\n",
"# Rec Average Rec Room\n",
"# LwQ Low Quality\n",
"# Unf Unfinshed\n",
"# NA No Basement\n",
"# BsmtFinSF2: Type 2 finished square feet\n",
"# BsmtUnfSF: Unfinished square feet of basement area\n",
"# TotalBsmtSF: Total square feet of basement area\n",
"# Heating: Type of heating\n",
"# Floor Floor Furnace\n",
"# GasA Gas forced warm air furnace\n",
"# GasW Gas hot water or steam heat\n",
"# Grav Gravity furnace\n",
"# OthW Hot water or steam heat other than gas\n",
"# Wall Wall furnace\n",
"# HeatingQC: Heating quality and condition\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Average/Typical\n",
"# Fa Fair\n",
"# Po Poor\n",
"# CentralAir: Central air conditioning\n",
"# N No\n",
"# Y Yes\n",
"# Electrical: Electrical system\n",
"# SBrkr Standard Circuit Breakers & Romex\n",
"# FuseA Fuse Box over 60 AMP and all Romex wiring (Average)\n",
"# FuseF 60 AMP Fuse Box and mostly Romex wiring (Fair)\n",
"# FuseP 60 AMP Fuse Box and mostly knob & tube wiring (poor)\n",
"# Mix Mixed\n",
"# 1stFlrSF: First Floor square feet\n",
"# 2ndFlrSF: Second floor square feet\n",
"# LowQualFinSF: Low quality finished square feet (all floors)\n",
"# GrLivArea: Above grade (ground) living area square feet\n",
"# BsmtFullBath: Basement full bathrooms\n",
"# BsmtHalfBath: Basement half bathrooms\n",
"# FullBath: Full bathrooms above grade\n",
"# HalfBath: Half baths above grade\n",
"# Bedroom: Number of bedrooms above basement level\n",
"# Kitchen: Number of kitchens\n",
"# KitchenQual: Kitchen quality\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Typical/Average\n",
"# Fa Fair\n",
"# Po Poor\n",
"# TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)\n",
"# Functional: Home functionality rating\n",
"# Typ Typical Functionality\n",
"# Min1 Minor Deductions 1\n",
"# Min2 Minor Deductions 2\n",
"# Mod Moderate Deductions\n",
"# Maj1 Major Deductions 1\n",
"# Maj2 Major Deductions 2\n",
"# Sev Severely Damaged\n",
"# Sal Salvage only\n",
"# Fireplaces: Number of fireplaces\n",
"# FireplaceQu: Fireplace quality\n",
"# Ex Excellent - Exceptional Masonry Fireplace\n",
"# Gd Good - Masonry Fireplace in main level\n",
"# TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement\n",
"# Fa Fair - Prefabricated Fireplace in basement\n",
"# Po Poor - Ben Franklin Stove\n",
"# NA No Fireplace\n",
"# GarageType: Garage location\n",
"# 2Types More than one type of garage\n",
"# Attchd Attached to home\n",
"# Basment Basement Garage\n",
"# BuiltIn Built-In (Garage part of house - typically has room above garage)\n",
"# CarPort Car Port\n",
"# Detchd Detached from home\n",
"# NA No Garage\n",
"# GarageYrBlt: Year garage was built\n",
"# GarageFinish: Interior finish of the garage\n",
"# Fin Finished\n",
"# RFn Rough Finished\n",
"# Unf Unfinished\n",
"# NA No Garage\n",
"# GarageCars: Size of garage in car capacity\n",
"# GarageArea: Size of garage in square feet\n",
"# GarageQual: Garage quality\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Typical/Average\n",
"# Fa Fair\n",
"# Po Poor\n",
"# NA No Garage\n",
"# GarageCond: Garage condition\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Typical/Average\n",
"# Fa Fair\n",
"# Po Poor\n",
"# NA No Garage\n",
"# PavedDrive: Paved driveway\n",
"# Y Paved\n",
"# P Partial Pavement\n",
"# N Dirt/Gravel\n",
"# WoodDeckSF: Wood deck area in square feet\n",
"# OpenPorchSF: Open porch area in square feet\n",
"# EnclosedPorch: Enclosed porch area in square feet\n",
"# 3SsnPorch: Three season porch area in square feet\n",
"# ScreenPorch: Screen porch area in square feet\n",
"# PoolArea: Pool area in square feet\n",
"# PoolQC: Pool quality\n",
"# Ex Excellent\n",
"# Gd Good\n",
"# TA Average/Typical\n",
"# Fa Fair\n",
"# NA No Pool\n",
"# Fence: Fence quality\n",
"# GdPrv Good Privacy\n",
"# MnPrv Minimum Privacy\n",
"# GdWo Good Wood\n",
"# MnWw Minimum Wood/Wire\n",
"# NA No Fence\n",
"# MiscFeature: Miscellaneous feature not covered in other categories\n",
"# Elev Elevator\n",
"# Gar2 2nd Garage (if not described in garage section)\n",
"# Othr Other\n",
"# Shed Shed (over 100 SF)\n",
"# TenC Tennis Court\n",
"# NA None\n",
"# MiscVal: $Value of miscellaneous feature\n",
"# MoSold: Month Sold\n",
"# YrSold: Year Sold\n",
"# SaleType: Type of sale\n",
"# WD Warranty Deed - Conventional\n",
"# CWD Warranty Deed - Cash\n",
"# VWD Warranty Deed - VA Loan\n",
"# New Home just constructed and sold\n",
"# COD Court Officer Deed/Estate\n",
"# Con Contract 15% Down payment regular terms\n",
"# ConLw Contract Low Down payment and low interest\n",
"# ConLI Contract Low Interest\n",
"# ConLD Contract Low Down\n",
"# Oth Other\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:dsi]",
"language": "python",
"name": "conda-env-dsi-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}