Dataaspirant-XGBoost-Boston-Housing-Price-Prediction.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Dataaspirant-XGBoost-Boston-Housing-Price-Prediction.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/saimadhu-polamuri/92f91ad5b7a3931154e236918931f4a7/dataaspirant-xgboost-boston-housing-price-prediction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kgxp85MHjyLf"
},
"source": [
"XGBoost for a Regression Problem: Overview in Python 3.x\n",
"\n",
"Pipeline:\n",
"1. Import the libraries/modules needed\n",
"2. Import data\n",
"3. Data cleaning and pre-processing\n",
"4. Train-test split\n",
"5. XGBoost training and prediction\n",
"6. Model Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rZqliePiepxq"
},
"source": [
"## Import the libraries/modules needed"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Lpmc6xllkJ0-"
},
"source": [
"## import the libraries needed\n",
"import pandas as pd\n",
"import numpy as np"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "-ffO3aAHew0L"
},
"source": [
"## Import data"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9Wmpq1HqkUoK"
},
"source": [
"## Import the dataset from the scikit-learn library and assign it to a variable\n",
"## Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release\n",
"from sklearn.datasets import load_boston\n",
"boston = load_boston()\n",
"## If you have another practice dataset, import it at this step"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "1B3OelIie8I6"
},
"source": [
"## Data cleaning and pre-processing"
]
},
{
"cell_type": "code",
"metadata": {
"id": "jAWUxwmoksce"
},
"source": [
"## store the target on the dataset object under the key 'PRICE'\n",
"boston['PRICE'] = boston.target"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "lwK91yRxm3BG"
},
"source": [
"## assign the features (independent variables) to X and the target to y\n",
"X = boston.data\n",
"y = boston['PRICE']"
],
"execution_count": 5,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6Y5kwvRUfBnP"
},
"source": [
"## Train-test split"
]
},
{
"cell_type": "code",
"metadata": {
"id": "r7XiMTcElQQd"
},
"source": [
"## split the data into train and test sets; the test set is 30% of the data\n",
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)"
],
"execution_count": 6,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "1wGBnVmBfHGC"
},
"source": [
"## XGBoost training and prediction"
]
},
{
"cell_type": "code",
"metadata": {
"id": "q7xv1dsYlSk_",
"outputId": "00314454-805d-46f1-dc6c-5d5c4743b010",
"colab": {
"base_uri": "https://localhost:8080/"
}
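},
"source": [
"## import the XGBoost regressor and fit the model on the training data\n",
"## (the default objective in this run is 'reg:linear'; the warning below reports it as deprecated in favor of 'reg:squarederror')\n",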
"from xgboost import XGBRegressor\n",
"xgb = XGBRegressor()\n",
"xgb.fit(X_train, y_train)"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": [
"[15:20:29] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
" colsample_bynode=1, colsample_bytree=1, gamma=0,\n",
" importance_type='gain', learning_rate=0.1, max_delta_step=0,\n",
" max_depth=3, min_child_weight=1, missing=None, n_estimators=100,\n",
" n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n",
" reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n",
" silent=None, subsample=1, verbosity=1)"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "0UNg_XTpld-T"
},
"source": [
"## After training the model, make predictions on the training data\n",
"y_pred = xgb.predict(X_train)"
],
"execution_count": 8,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "8_BcfmP0fOtU"
},
"source": [
"## Model Evaluation"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zNX8iUconT3O",
"outputId": "6551266a-9f73-46a5-e51c-bfb0ecab8197",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"## import metrics to evaluate the performance of the XGBoost model on the training data\n",
"from sklearn import metrics\n",
"print('R^2:',metrics.r2_score(y_train, y_pred))\n",
"print('Adjusted R^2:',1 - (1-metrics.r2_score(y_train, y_pred))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))\n",
"print('MAE:',metrics.mean_absolute_error(y_train, y_pred))\n",
"print('MSE:',metrics.mean_squared_error(y_train, y_pred))\n",
"print('RMSE:',np.sqrt(metrics.mean_squared_error(y_train, y_pred)))"
],
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": [
"R^2: 0.9703652512761263\n",
"Adjusted R^2: 0.9692321579425663\n",
"MAE: 1.1372202838208043\n",
"MSE: 2.230632123289034\n",
"RMSE: 1.4935300878419002\n"
],
"name": "stdout"
}
]
},
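{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a sketch of the standard definitions behind the metrics printed above. The symbols are notation introduced here, not names from the notebook: $y_i$ is an observed price, $\\hat{y}_i$ the model's prediction, $\\bar{y}$ the mean observed price, and $n$ the number of rows scored.\n",
"\n",
"$$\\mathrm{MAE} = \\frac{1}{n}\\sum_{i=1}^{n}\\left|y_i - \\hat{y}_i\\right|$$\n",
"\n",
"$$\\mathrm{MSE} = \\frac{1}{n}\\sum_{i=1}^{n}\\left(y_i - \\hat{y}_i\\right)^2$$\n",
"\n",
"$$\\mathrm{RMSE} = \\sqrt{\\mathrm{MSE}}$$\n",
"\n",
"$$R^2 = 1 - \\frac{\\sum_{i=1}^{n}\\left(y_i - \\hat{y}_i\\right)^2}{\\sum_{i=1}^{n}\\left(y_i - \\bar{y}\\right)^2}$$\n",
"\n",
"As a check on the training output, $\\sqrt{2.2306} \\approx 1.4935$, which agrees with the printed RMSE. Because these scores are computed on the data the model was fitted to, they are likely to be optimistic."
]
},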
{
"cell_type": "code",
"metadata": {
"id": "jox6hGl5opSF"
},
"source": [
"## Apply the model to the test set\n",
"y_test_pred = xgb.predict(X_test)"
],
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "afxZ1a14opwq",
"outputId": "6af1e52d-d814-4596-c374-72700cd881a1",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"## Evaluate the performance of the model on the test set\n",
"r2_xgb = metrics.r2_score(y_test, y_test_pred)\n",
"print('R^2:', r2_xgb)\n",
"print('Adjusted R^2:',1 - (1-metrics.r2_score(y_test, y_test_pred))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))\n",
"print('MAE:',metrics.mean_absolute_error(y_test, y_test_pred))\n",
"print('MSE:',metrics.mean_squared_error(y_test, y_test_pred))\n",
"print('RMSE:',np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)))"
],
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": [
"R^2: 0.8494894736313225\n",
"Adjusted R^2: 0.8353109457849979\n",
"MAE: 2.4509708843733136\n",
"MSE: 15.716320042597493\n",
"RMSE: 3.9643814199188117\n"
],
"name": "stdout"
}
]
},
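{
"cell_type": "markdown",
"metadata": {},
"source": [
"The adjusted $R^2$ line above rescales $R^2$ by the number of predictors. A sketch of the formula that expression evaluates, with $n$ the number of rows scored (`len(y_test)`) and $p$ the number of feature columns (`X_test.shape[1]`); $n$ and $p$ are notation introduced here, not names from the notebook:\n",
"\n",
"$$R^2_{\\mathrm{adj}} = 1 - \\left(1 - R^2\\right)\\frac{n - 1}{n - p - 1}$$\n",
"\n",
"Assuming $n = 152$ test rows and $p = 13$ features for this dataset, the printed test $R^2$ of about $0.84949$ gives $1 - 0.15051 \\times \\frac{151}{138} \\approx 0.8353$, in line with the printed adjusted value."
]
},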
{
"cell_type": "code",
"metadata": {
"id": "YN0OhXatPMIx",
"outputId": "81b5a04d-bbe7-4655-ffba-ad09f889534e",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"y_test_pred.shape"
],
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(152,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
}
]
}