Prediction using Supervised Machine Learning.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Prediction using Supervised Machine Learning.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/kirthikmicrosoft/b3527423b83cd09cbc079bc45ce47f41/prediction-using-supervised-machine-learning.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Qc6iV5SZnvNB"
},
"source": [
"Name: Kirthik.D\n",
"<br>\n",
"Task: Prediction using Supervised Machine Learning.Supervised Learning is to prediction with a labelled data <br>\n",
"Algorithm: Linear Regression\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "81oKB31Ungtn"
},
"source": [
"#Import Libraries \n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 225
},
"id": "ATb_PffGoRyQ",
"outputId": "33a1be03-d8d1-447b-8b86-b7d2de7992ba"
},
"source": [
"#Read data from a link\n",
"url = \"http://bit.ly/w-data\"\n",
"s_data = pd.read_csv(url)\n",
"print(\"Data has been imported\")\n",
"\n",
"s_data.head()"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": [
"Data has been imported\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Hours</th>\n",
" <th>Scores</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2.5</td>\n",
" <td>21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5.1</td>\n",
" <td>47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.2</td>\n",
" <td>27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>8.5</td>\n",
" <td>75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3.5</td>\n",
" <td>30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Hours Scores\n",
"0 2.5 21\n",
"1 5.1 47\n",
"2 3.2 27\n",
"3 8.5 75\n",
"4 3.5 30"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b40NkBFj0_b1"
},
"source": [
"##Plotting the data\n",
"Plotting the dataset to find if any relationship exists between the data\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"id": "H-JAhfMUqadF",
"outputId": "02404a51-8709-45bd-85e6-198e90112e4f"
},
"source": [
"s_data.plot(x='Hours', y='Scores', style='o', grid=True) \n",
"plt.title('Hours vs Percentage')\n",
"plt.xlabel('Hours Studied')\n",
"plt.ylabel('Percentage Score')\n",
"plt.show()"
],
"execution_count": 18,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rgTz8t5HvrWH"
},
"source": [
"**From the graph above, we can clearly see that there is a positive linear relation between the number of hours studied and percentage of score.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pCSSt9Cj9Ebf"
},
"source": [
"#Prepearing the Data\n",
"The next step is to divide the data into \"attributes\" (inputs) and \"labels\" (outputs)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "eMuRsvoU87d6"
},
"source": [
"X = s_data.iloc[:, :-1].values\n",
"y = s_data.iloc[:, 1].values\n"
],
"execution_count": 9,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "U6xdceinv3SI"
},
"source": [
"Split data to training and testing set"
]
},
{
"cell_type": "code",
"metadata": {
"id": "JWMM3cHL_2lw"
},
"source": [
"from sklearn.model_selection import train_test_split \n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, \n",
" test_size=0.2, random_state=0) "
],
"execution_count": 10,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZiPiwTwrwE_U"
},
"source": [
"### **Training the Algorithm**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qoRPJJRFBCTr",
"outputId": "c79e83dc-e79e-410c-ff5f-58eca797db79"
},
"source": [
"from sklearn.linear_model import LinearRegression\n",
"regressor = LinearRegression()\n",
"regressor.fit(X_train, y_train)\n",
"\n",
"print(\"Training Complete\")"
],
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": [
"Training Complete\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ph32Ya8iwW3W"
},
"source": [
"### **Plotting Regression Line**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 265
},
"id": "1rNFEWjlBUYh",
"outputId": "78e94b84-98a4-49e7-cc42-1eeb138c5658"
},
"source": [
"line = regressor.coef_*X+regressor.intercept_\n",
"\n",
"#Plotting for the test data\n",
"plt.scatter(X,y)\n",
"plt.plot(X, line);\n",
"plt.show()"
],
"execution_count": 12,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VU2B1xy9wljT"
},
"source": [
"### **Making Predictions**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cvCLn4QZwzkJ",
"outputId": "7bce7727-6703-4540-eaf4-d597adba0c29"
},
"source": [
"#Testing data in hours\n",
"print(X_test)\n",
"\n",
"#Making predictions for the scores \n",
"y_pred = regressor.predict(X_test)"
],
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": [
"[[1.5]\n",
" [3.2]\n",
" [7.4]\n",
" [2.5]\n",
" [5.9]]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aq49u0nRxFpY"
},
"source": [
"Comparing actual vs predicted data"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "qxzkaj6-CAKB",
"outputId": "8038694f-436d-45da-e7c4-7cf78d4a6392"
},
"source": [
"df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}) \n",
"df "
],
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Actual</th>\n",
" <th>Predicted</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>16.884145</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>27</td>\n",
" <td>33.732261</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>69</td>\n",
" <td>75.357018</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>26.794801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>62</td>\n",
" <td>60.491033</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Actual Predicted\n",
"0 20 16.884145\n",
"1 27 33.732261\n",
"2 69 75.357018\n",
"3 30 26.794801\n",
"4 62 60.491033"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AkmMrBDrxMQz"
},
"source": [
"Testing again with a new data"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mkWYu2-8seMo",
"outputId": "b69cb85f-dd91-488e-ddf5-e806aee3009e"
},
"source": [
"#Testing with new data\n",
"hours = [[9.25]]\n",
"my_pred = regressor.predict(hours)\n",
"print(\"No of Hours = {}\".format(hours[0][0]))\n",
"print(\"Predicted Score = {}\".format(my_pred[0]))"
],
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"text": [
"No of Hours = 9.25\n",
"Predicted Score = 93.69173248737539\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NSiCGAz4xXxd"
},
"source": [
"###**Evaluating the model**\n",
"Evaluating the performance of algorithm. "
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_Gz6ovPZt_B8",
"outputId": "269682b9-54f7-4e01-858d-2dc0bf4a5363"
},
"source": [
"from sklearn import metrics\n",
"print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))"
],
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": [
"Mean Absolute Error: 4.183859899002982\n"
],
"name": "stdout"
}
]
}
]
}
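The evaluation cell in the notebook reports the mean absolute error through scikit-learn. As a minimal sketch of what that metric computes, the snippet below reproduces it by hand from the Actual and Predicted values shown in the notebook's comparison table. The helper name `mean_absolute_error_manual` is hypothetical, and the predicted values are rounded as displayed in that table, so the result is only expected to agree approximately.

```python
# Values copied from the notebook's "Actual vs Predicted" output table.
actual = [20, 27, 69, 30, 62]
predicted = [16.884145, 33.732261, 75.357018, 26.794801, 60.491033]


def mean_absolute_error_manual(y_true, y_pred):
    """Hypothetical helper: average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


mae = mean_absolute_error_manual(actual, predicted)
print("Mean Absolute Error:", mae)
```

The printed value should agree with the roughly 4.18 reported by `metrics.mean_absolute_error` in the notebook, up to the rounding of the displayed predictions.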