Dataaspirant-XGBoost-Boston-Housing-Price-Prediction.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "Dataaspirant-XGBoost-Boston-Housing-Price-Prediction.ipynb", | |
"provenance": [], | |
"collapsed_sections": [], | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/saimadhu-polamuri/92f91ad5b7a3931154e236918931f4a7/dataaspirant-xgboost-boston-housing-price-prediction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "kgxp85MHjyLf" | |
}, | |
"source": [ | |
"XGBoost for a Regression Problem: Overview in Python 3.x\n", | |
"Pipeline: \n", | |
"1. Import the libraries/modules needed\n", | |
"2. Import data\n", | |
"3. Data cleaning and pre-processing\n", | |
"4. Train-test split\n", | |
"5. XGBoost training and prediction\n", | |
"6. Model Evaluation" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "rZqliePiepxq" | |
}, | |
"source": [ | |
"## Import the libraries/modules needed" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "Lpmc6xllkJ0-" | |
}, | |
"source": [ | |
"## import the libraries needed\n", | |
"import pandas as pd\n", | |
"import numpy as np" | |
], | |
"execution_count": 2, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "-ffO3aAHew0L" | |
}, | |
"source": [ | |
"## Import data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "9Wmpq1HqkUoK" | |
}, | |
"source": [ | |
"## Import the dataset from the scikit-learn library and assign it to a variable\n", | |
"## Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version\n", | |
"from sklearn.datasets import load_boston\n", | |
"boston = load_boston()\n", | |
"## If you have another practice dataset, import it at this step" | |
], | |
"execution_count": 3, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "1B3OelIie8I6" | |
}, | |
"source": [ | |
"## Data cleaning and pre-processing" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "jAWUxwmoksce" | |
}, | |
"source": [ | |
"## assign your target\n", | |
"boston['PRICE'] = boston.target " | |
], | |
"execution_count": 4, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "lwK91yRxm3BG" | |
}, | |
"source": [ | |
"## assign the independent variables (X) and the target (y)\n", | |
"X = boston.data\n", | |
"y = boston['PRICE']" | |
], | |
"execution_count": 5, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "6Y5kwvRUfBnP" | |
}, | |
"source": [ | |
"## Train-test split" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "r7XiMTcElQQd" | |
}, | |
"source": [ | |
"## split the data into train and test sets; the test set is 30% of the data\n", | |
"from sklearn.model_selection import train_test_split\n", | |
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)" | |
], | |
"execution_count": 6, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "1wGBnVmBfHGC" | |
}, | |
"source": [ | |
"## XGBoost training and prediction" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "q7xv1dsYlSk_", | |
"outputId": "00314454-805d-46f1-dc6c-5d5c4743b010", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"## import xgboost regressor algorithm and fit the model\n", | |
"from xgboost import XGBRegressor\n", | |
"xgb = XGBRegressor()\n", | |
"xgb.fit(X_train, y_train)" | |
], | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
"[15:20:29] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.\n" | |
], | |
"name": "stdout" | |
}, | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", | |
" colsample_bynode=1, colsample_bytree=1, gamma=0,\n", | |
" importance_type='gain', learning_rate=0.1, max_delta_step=0,\n", | |
" max_depth=3, min_child_weight=1, missing=None, n_estimators=100,\n", | |
" n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n", | |
" reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n", | |
" silent=None, subsample=1, verbosity=1)" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 7 | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "0UNg_XTpld-T" | |
}, | |
"source": [ | |
"## After training the model, make a prediction on the train data\n", | |
"y_pred = xgb.predict(X_train)" | |
], | |
"execution_count": 8, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "8_BcfmP0fOtU" | |
}, | |
"source": [ | |
"## Model Evaluation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "zNX8iUconT3O", | |
"outputId": "6551266a-9f73-46a5-e51c-bfb0ecab8197", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"## import metrics and evaluate the XGBoost model on the training set\n", | |
"from sklearn import metrics\n", | |
"print('R^2:',metrics.r2_score(y_train, y_pred))\n", | |
"print('Adjusted R^2:',1 - (1-metrics.r2_score(y_train, y_pred))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))\n", | |
"print('MAE:',metrics.mean_absolute_error(y_train, y_pred))\n", | |
"print('MSE:',metrics.mean_squared_error(y_train, y_pred))\n", | |
"print('RMSE:',np.sqrt(metrics.mean_squared_error(y_train, y_pred)))" | |
], | |
"execution_count": 9, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
"R^2: 0.9703652512761263\n", | |
"Adjusted R^2: 0.9692321579425663\n", | |
"MAE: 1.1372202838208043\n", | |
"MSE: 2.230632123289034\n", | |
"RMSE: 1.4935300878419002\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "jox6hGl5opSF" | |
}, | |
"source": [ | |
"## Apply the model to the test set\n", | |
"y_test_pred = xgb.predict(X_test)" | |
], | |
"execution_count": 10, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "afxZ1a14opwq", | |
"outputId": "6af1e52d-d814-4596-c374-72700cd881a1", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"## Evaluate the performance of the model on the test set\n", | |
"r2_xgb = metrics.r2_score(y_test, y_test_pred)  ## R^2 score, not classification accuracy\n", | |
"print('R^2:', r2_xgb)\n", | |
"print('Adjusted R^2:',1 - (1-metrics.r2_score(y_test, y_test_pred))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))\n", | |
"print('MAE:',metrics.mean_absolute_error(y_test, y_test_pred))\n", | |
"print('MSE:',metrics.mean_squared_error(y_test, y_test_pred))\n", | |
"print('RMSE:',np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)))" | |
], | |
"execution_count": 11, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
"R^2: 0.8494894736313225\n", | |
"Adjusted R^2: 0.8353109457849979\n", | |
"MAE: 2.4509708843733136\n", | |
"MSE: 15.716320042597493\n", | |
"RMSE: 3.9643814199188117\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "YN0OhXatPMIx", | |
"outputId": "81b5a04d-bbe7-4655-ffba-ad09f889534e", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"y_test_pred.shape" | |
], | |
"execution_count": 12, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"(152,)" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 12 | |
} | |
] | |
} | |
] | |
} |
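
The evaluation cells in the notebook compute adjusted R^2 inline, once for the training set and once for the test set. As a minimal sketch, the same arithmetic could be pulled into a small helper. The function name `adjusted_r2` and its parameter names are hypothetical and do not appear in the notebook; the formula is the one used in its print statements.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 as computed inline in the notebook:
    1 - (1 - R^2) * (n - 1) / (n - p - 1).

    `adjusted_r2` is an illustrative helper, not part of scikit-learn.
    """
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        # The adjustment is undefined unless there are more samples than features + 1.
        raise ValueError("n_samples must be greater than n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / denominator


# Test-set values reported in the notebook output: R^2 on 152 rows with 13 features.
print("Adjusted R^2:", adjusted_r2(0.8494894736313225, 152, 13))
```

In the notebook this would be called as `adjusted_r2(metrics.r2_score(y_test, y_test_pred), len(y_test), X_test.shape[1])`, which avoids repeating the expression for each split.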