@arghyadeep99
Created August 16, 2020 12:29
SVMs.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "SVMs.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyPpO34kfHjFpfT/AcLKo0oI",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/arghyadeep99/feeb6bee49b028313c7f09eb65245f11/svms.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lfx7mMhA2cFe",
"colab_type": "text"
},
"source": [
"# **Support Vector Machines (Classification)** "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Tu6Arl9x2R6D",
"colab_type": "text"
},
"source": [
"# Importing Libraries and data"
]
},
{
"cell_type": "code",
"metadata": {
"id": "GeBTF7K7Uljz",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 74
},
"outputId": "c178017f-c15e-4115-cad1-7a53bf86162e"
},
"source": [
"# First import the necessary libraries\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt  # plotting graphs\n",
"import seaborn as sns  # statistical plotting built on matplotlib\n",
"\n",
"# confusion_matrix tabulates the actual vs. predicted classes of the test examples\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import datasets\n",
"from sklearn.svm import LinearSVC  # Linear Support Vector Classifier from scikit-learn"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n",
" import pandas.util.testing as tm\n"
],
"name": "stderr"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "hVV3aGqAU0Tr",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "a30d1732-5188-4427-c10b-1cb88164eb71"
},
"source": [
"iris = datasets.load_iris()  # scikit-learn ships with a few small built-in datasets; Iris is one of them\n",
"print(type(iris))  # iris is a Bunch (sklearn.utils.Bunch), a dict-like container whose keys are also attributes\n",
"iris"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'sklearn.utils.Bunch'>\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'DESCR': '.. _iris_dataset:\\n\\nIris plants dataset\\n--------------------\\n\\n**Data Set Characteristics:**\\n\\n :Number of Instances: 150 (50 in each of three classes)\\n :Number of Attributes: 4 numeric, predictive attributes and the class\\n :Attribute Information:\\n - sepal length in cm\\n - sepal width in cm\\n - petal length in cm\\n - petal width in cm\\n - class:\\n - Iris-Setosa\\n - Iris-Versicolour\\n - Iris-Virginica\\n \\n :Summary Statistics:\\n\\n ============== ==== ==== ======= ===== ====================\\n Min Max Mean SD Class Correlation\\n ============== ==== ==== ======= ===== ====================\\n sepal length: 4.3 7.9 5.84 0.83 0.7826\\n sepal width: 2.0 4.4 3.05 0.43 -0.4194\\n petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\\n petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)\\n ============== ==== ==== ======= ===== ====================\\n\\n :Missing Attribute Values: None\\n :Class Distribution: 33.3% for each of 3 classes.\\n :Creator: R.A. Fisher\\n :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\\n :Date: July, 1988\\n\\nThe famous Iris database, first used by Sir R.A. Fisher. The dataset is taken\\nfrom Fisher\\'s paper. Note that it\\'s the same as in R, but not as in the UCI\\nMachine Learning Repository, which has two wrong data points.\\n\\nThis is perhaps the best known database to be found in the\\npattern recognition literature. Fisher\\'s paper is a classic in the field and\\nis referenced frequently to this day. (See Duda & Hart, for example.) The\\ndata set contains 3 classes of 50 instances each, where each class refers to a\\ntype of iris plant. One class is linearly separable from the other 2; the\\nlatter are NOT linearly separable from each other.\\n\\n.. topic:: References\\n\\n - Fisher, R.A. 
\"The use of multiple measurements in taxonomic problems\"\\n Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to\\n Mathematical Statistics\" (John Wiley, NY, 1950).\\n - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.\\n (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.\\n - Dasarathy, B.V. (1980) \"Nosing Around the Neighborhood: A New System\\n Structure and Classification Rule for Recognition in Partially Exposed\\n Environments\". IEEE Transactions on Pattern Analysis and Machine\\n Intelligence, Vol. PAMI-2, No. 1, 67-71.\\n - Gates, G.W. (1972) \"The Reduced Nearest Neighbor Rule\". IEEE Transactions\\n on Information Theory, May 1972, 431-433.\\n - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al\"s AUTOCLASS II\\n conceptual clustering system finds 3 classes in the data.\\n - Many, many more ...',\n",
" 'data': array([[5.1, 3.5, 1.4, 0.2],\n",
" [4.9, 3. , 1.4, 0.2],\n",
" [4.7, 3.2, 1.3, 0.2],\n",
" [4.6, 3.1, 1.5, 0.2],\n",
" [5. , 3.6, 1.4, 0.2],\n",
" [5.4, 3.9, 1.7, 0.4],\n",
" [4.6, 3.4, 1.4, 0.3],\n",
" [5. , 3.4, 1.5, 0.2],\n",
" [4.4, 2.9, 1.4, 0.2],\n",
" [4.9, 3.1, 1.5, 0.1],\n",
" [5.4, 3.7, 1.5, 0.2],\n",
" [4.8, 3.4, 1.6, 0.2],\n",
" [4.8, 3. , 1.4, 0.1],\n",
" [4.3, 3. , 1.1, 0.1],\n",
" [5.8, 4. , 1.2, 0.2],\n",
" [5.7, 4.4, 1.5, 0.4],\n",
" [5.4, 3.9, 1.3, 0.4],\n",
" [5.1, 3.5, 1.4, 0.3],\n",
" [5.7, 3.8, 1.7, 0.3],\n",
" [5.1, 3.8, 1.5, 0.3],\n",
" [5.4, 3.4, 1.7, 0.2],\n",
" [5.1, 3.7, 1.5, 0.4],\n",
" [4.6, 3.6, 1. , 0.2],\n",
" [5.1, 3.3, 1.7, 0.5],\n",
" [4.8, 3.4, 1.9, 0.2],\n",
" [5. , 3. , 1.6, 0.2],\n",
" [5. , 3.4, 1.6, 0.4],\n",
" [5.2, 3.5, 1.5, 0.2],\n",
" [5.2, 3.4, 1.4, 0.2],\n",
" [4.7, 3.2, 1.6, 0.2],\n",
" [4.8, 3.1, 1.6, 0.2],\n",
" [5.4, 3.4, 1.5, 0.4],\n",
" [5.2, 4.1, 1.5, 0.1],\n",
" [5.5, 4.2, 1.4, 0.2],\n",
" [4.9, 3.1, 1.5, 0.2],\n",
" [5. , 3.2, 1.2, 0.2],\n",
" [5.5, 3.5, 1.3, 0.2],\n",
" [4.9, 3.6, 1.4, 0.1],\n",
" [4.4, 3. , 1.3, 0.2],\n",
" [5.1, 3.4, 1.5, 0.2],\n",
" [5. , 3.5, 1.3, 0.3],\n",
" [4.5, 2.3, 1.3, 0.3],\n",
" [4.4, 3.2, 1.3, 0.2],\n",
" [5. , 3.5, 1.6, 0.6],\n",
" [5.1, 3.8, 1.9, 0.4],\n",
" [4.8, 3. , 1.4, 0.3],\n",
" [5.1, 3.8, 1.6, 0.2],\n",
" [4.6, 3.2, 1.4, 0.2],\n",
" [5.3, 3.7, 1.5, 0.2],\n",
" [5. , 3.3, 1.4, 0.2],\n",
" [7. , 3.2, 4.7, 1.4],\n",
" [6.4, 3.2, 4.5, 1.5],\n",
" [6.9, 3.1, 4.9, 1.5],\n",
" [5.5, 2.3, 4. , 1.3],\n",
" [6.5, 2.8, 4.6, 1.5],\n",
" [5.7, 2.8, 4.5, 1.3],\n",
" [6.3, 3.3, 4.7, 1.6],\n",
" [4.9, 2.4, 3.3, 1. ],\n",
" [6.6, 2.9, 4.6, 1.3],\n",
" [5.2, 2.7, 3.9, 1.4],\n",
" [5. , 2. , 3.5, 1. ],\n",
" [5.9, 3. , 4.2, 1.5],\n",
" [6. , 2.2, 4. , 1. ],\n",
" [6.1, 2.9, 4.7, 1.4],\n",
" [5.6, 2.9, 3.6, 1.3],\n",
" [6.7, 3.1, 4.4, 1.4],\n",
" [5.6, 3. , 4.5, 1.5],\n",
" [5.8, 2.7, 4.1, 1. ],\n",
" [6.2, 2.2, 4.5, 1.5],\n",
" [5.6, 2.5, 3.9, 1.1],\n",
" [5.9, 3.2, 4.8, 1.8],\n",
" [6.1, 2.8, 4. , 1.3],\n",
" [6.3, 2.5, 4.9, 1.5],\n",
" [6.1, 2.8, 4.7, 1.2],\n",
" [6.4, 2.9, 4.3, 1.3],\n",
" [6.6, 3. , 4.4, 1.4],\n",
" [6.8, 2.8, 4.8, 1.4],\n",
" [6.7, 3. , 5. , 1.7],\n",
" [6. , 2.9, 4.5, 1.5],\n",
" [5.7, 2.6, 3.5, 1. ],\n",
" [5.5, 2.4, 3.8, 1.1],\n",
" [5.5, 2.4, 3.7, 1. ],\n",
" [5.8, 2.7, 3.9, 1.2],\n",
" [6. , 2.7, 5.1, 1.6],\n",
" [5.4, 3. , 4.5, 1.5],\n",
" [6. , 3.4, 4.5, 1.6],\n",
" [6.7, 3.1, 4.7, 1.5],\n",
" [6.3, 2.3, 4.4, 1.3],\n",
" [5.6, 3. , 4.1, 1.3],\n",
" [5.5, 2.5, 4. , 1.3],\n",
" [5.5, 2.6, 4.4, 1.2],\n",
" [6.1, 3. , 4.6, 1.4],\n",
" [5.8, 2.6, 4. , 1.2],\n",
" [5. , 2.3, 3.3, 1. ],\n",
" [5.6, 2.7, 4.2, 1.3],\n",
" [5.7, 3. , 4.2, 1.2],\n",
" [5.7, 2.9, 4.2, 1.3],\n",
" [6.2, 2.9, 4.3, 1.3],\n",
" [5.1, 2.5, 3. , 1.1],\n",
" [5.7, 2.8, 4.1, 1.3],\n",
" [6.3, 3.3, 6. , 2.5],\n",
" [5.8, 2.7, 5.1, 1.9],\n",
" [7.1, 3. , 5.9, 2.1],\n",
" [6.3, 2.9, 5.6, 1.8],\n",
" [6.5, 3. , 5.8, 2.2],\n",
" [7.6, 3. , 6.6, 2.1],\n",
" [4.9, 2.5, 4.5, 1.7],\n",
" [7.3, 2.9, 6.3, 1.8],\n",
" [6.7, 2.5, 5.8, 1.8],\n",
" [7.2, 3.6, 6.1, 2.5],\n",
" [6.5, 3.2, 5.1, 2. ],\n",
" [6.4, 2.7, 5.3, 1.9],\n",
" [6.8, 3. , 5.5, 2.1],\n",
" [5.7, 2.5, 5. , 2. ],\n",
" [5.8, 2.8, 5.1, 2.4],\n",
" [6.4, 3.2, 5.3, 2.3],\n",
" [6.5, 3. , 5.5, 1.8],\n",
" [7.7, 3.8, 6.7, 2.2],\n",
" [7.7, 2.6, 6.9, 2.3],\n",
" [6. , 2.2, 5. , 1.5],\n",
" [6.9, 3.2, 5.7, 2.3],\n",
" [5.6, 2.8, 4.9, 2. ],\n",
" [7.7, 2.8, 6.7, 2. ],\n",
" [6.3, 2.7, 4.9, 1.8],\n",
" [6.7, 3.3, 5.7, 2.1],\n",
" [7.2, 3.2, 6. , 1.8],\n",
" [6.2, 2.8, 4.8, 1.8],\n",
" [6.1, 3. , 4.9, 1.8],\n",
" [6.4, 2.8, 5.6, 2.1],\n",
" [7.2, 3. , 5.8, 1.6],\n",
" [7.4, 2.8, 6.1, 1.9],\n",
" [7.9, 3.8, 6.4, 2. ],\n",
" [6.4, 2.8, 5.6, 2.2],\n",
" [6.3, 2.8, 5.1, 1.5],\n",
" [6.1, 2.6, 5.6, 1.4],\n",
" [7.7, 3. , 6.1, 2.3],\n",
" [6.3, 3.4, 5.6, 2.4],\n",
" [6.4, 3.1, 5.5, 1.8],\n",
" [6. , 3. , 4.8, 1.8],\n",
" [6.9, 3.1, 5.4, 2.1],\n",
" [6.7, 3.1, 5.6, 2.4],\n",
" [6.9, 3.1, 5.1, 2.3],\n",
" [5.8, 2.7, 5.1, 1.9],\n",
" [6.8, 3.2, 5.9, 2.3],\n",
" [6.7, 3.3, 5.7, 2.5],\n",
" [6.7, 3. , 5.2, 2.3],\n",
" [6.3, 2.5, 5. , 1.9],\n",
" [6.5, 3. , 5.2, 2. ],\n",
" [6.2, 3.4, 5.4, 2.3],\n",
" [5.9, 3. , 5.1, 1.8]]),\n",
" 'feature_names': ['sepal length (cm)',\n",
" 'sepal width (cm)',\n",
" 'petal length (cm)',\n",
" 'petal width (cm)'],\n",
" 'filename': '/usr/local/lib/python3.6/dist-packages/sklearn/datasets/data/iris.csv',\n",
" 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),\n",
" 'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10')}"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DnAh4gDgYLGN",
"colab_type": "text"
},
"source": [
"For simplicity, we demonstrate binary classification with SVMs. We therefore keep only two of the three classes, Setosa and Versicolor, and use only two of their features: petal length and petal width. "
]
},
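{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"As a quick check of why these two classes suit a linear SVM, the next cell (a sketch added for illustration; the name `iris_check` is introduced here) tests whether Setosa and Versicolor can be separated by petal length alone. It assumes scikit-learn's Iris ordering, in which rows 0-49 are Setosa and rows 50-99 are Versicolor. The dataset description printed above already notes that one class is linearly separable from the other two."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"from sklearn import datasets\n",
"\n",
"iris_check = datasets.load_iris()\n",
"petal_length = iris_check['data'][:, 2]  # column index 2 is petal length\n",
"\n",
"# Largest Setosa petal length vs. smallest Versicolor petal length\n",
"setosa_max = petal_length[:50].max()\n",
"versicolor_min = petal_length[50:100].min()\n",
"print(setosa_max, versicolor_min)\n",
"\n",
"# If the two ranges do not overlap, a single vertical line already separates the classes,\n",
"# so a linear classifier can in principle classify all of them correctly\n",
"linearly_separable = setosa_max < versicolor_min\n",
"print(linearly_separable)"
],
"execution_count": null,
"outputs": []
},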
{
"cell_type": "markdown",
"metadata": {
"id": "tpcuvCan2s8F",
"colab_type": "text"
},
"source": [
"# Data Pre-processing and Plotting "
]
},
{
"cell_type": "code",
"metadata": {
"id": "OK548y_6U5d6",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 90
},
"outputId": "17e31ab8-e024-46d9-baf6-2783fa0c091d"
},
"source": [
"# Extract the petal measurements from the iris Bunch into arrays for easier plotting.\n",
"# iris['data'][:50, 2] takes the first 50 rows (the Setosa samples) of the 3rd column\n",
"# (column index 2, since indexing starts at 0), i.e. petal length; column 3 is petal width.\n",
"# Rows 50-99 are the Versicolor samples.\n",
"petal_length_setosa = iris['data'][:50, 2]\n",
"petal_width_setosa = iris['data'][:50, 3]\n",
"petal_length_versicolor = iris['data'][50:100, 2]\n",
"petal_width_versicolor = iris['data'][50:100, 3]\n",
"\n",
"petal_length_setosa"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4,\n",
" 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1. , 1.7, 1.9, 1.6,\n",
" 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3,\n",
" 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4])"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "EmFVNjvgW0Eh",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 298
},
"outputId": "212d56c7-113b-46e1-9a45-72b3f32be3bb"
},
"source": [
"# Plot the two classes in the (petal length, petal width) plane\n",
"\n",
"plt.xlabel('petal_length')\n",
"plt.ylabel('petal_width')\n",
"plt.scatter(petal_length_setosa,petal_width_setosa, color='red') #plotting setosa flower\n",
"plt.scatter(petal_length_versicolor,petal_width_versicolor, color='green') #plotting versicolor flower"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7fe6e53b4e48>"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "xK2p_Av4aUN3",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "6c657e34-cf0a-4b0b-e142-664dac999b74"
},
"source": [
"X = iris['data'][:100, 2:]  # 2-D array: petal length and petal width of the first 100 samples only\n",
"y = iris['target'][:100]  # 1-D array of the corresponding class labels\n",
"# class 0 is Iris setosa and class 1 is Iris versicolor\n",
"\n",
"# Hold out 20% of the samples for testing; without random_state the split changes on every run\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)\n",
"print(y_test)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"[0 1 1 1 0 0 1 1 1 0 0 0 1 0 0 1 0 1 0 1]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i02IF5p421W8",
"colab_type": "text"
},
"source": [
"# Training and Testing"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5ENn1PGNnO0_",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "e163de94-36bb-447f-d064-eafc07868d2e"
},
"source": [
"model = LinearSVC()  # create a linear support vector classifier\n",
"model.fit(X_train, y_train)  # train the model on the training set\n",
"y_pred = model.predict(X_test)  # predict the class of each test sample\n",
"y_pred"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1])"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YDXAdSJKMLqV",
"colab_type": "text"
},
"source": [
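"A confusion matrix for n classes is an n × n matrix whose entry in row i, column j counts the test samples whose actual class is i and whose predicted class is j. For binary classification, scikit-learn's `confusion_matrix` (with class 1 as the positive class) lays it out as follows: \n",
"\n",
"| | Predicted 0 | Predicted 1 |\n",
"|---|---|---|\n",
"| **Actual 0** | TN | FP |\n",
"| **Actual 1** | FN | TP |\n",
"\n",
"TP: True Positives \n",
"\n",
"FN: False Negatives \n",
"\n",
"TN: True Negatives \n",
"\n",
"FP: False Positives"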
]
},
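{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The four counts can be computed directly from the labels and compared with scikit-learn's output. This is a minimal sketch using made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical, not the notebook's test split), treating class 1 as the positive class."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import numpy as np\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"# Hypothetical labels for illustration only; class 1 is the positive class\n",
"y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])\n",
"y_pred_demo = np.array([0, 1, 0, 1, 1, 0, 1, 0])\n",
"\n",
"tp = np.sum((y_true_demo == 1) & (y_pred_demo == 1))  # actual 1, predicted 1\n",
"fn = np.sum((y_true_demo == 1) & (y_pred_demo == 0))  # actual 1, predicted 0\n",
"tn = np.sum((y_true_demo == 0) & (y_pred_demo == 0))  # actual 0, predicted 0\n",
"fp = np.sum((y_true_demo == 0) & (y_pred_demo == 1))  # actual 0, predicted 1\n",
"\n",
"cm = confusion_matrix(y_true_demo, y_pred_demo)\n",
"print(cm)  # rows: actual class, columns: predicted class, i.e. [[TN, FP], [FN, TP]]\n",
"print(cm.ravel().tolist() == [tn, fp, fn, tp])"
],
"execution_count": null,
"outputs": []
},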
{
"cell_type": "code",
"metadata": {
"id": "mcO6HtK4Ki8j",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 72
},
"outputId": "f7a3fd93-ef14-4448-fc94-da6389dc8b16"
},
"source": [
"print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class\n",
"\n",
"print(model.coef_[0])  # learned weight vector w (one weight per feature)\n",
"# In the run shown, the classifier predicted 10 test points as class 0 and 10 as class 1.\n",
"# The off-diagonal entries of the confusion matrix are 0, so every test prediction is correct."
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"[[10 0]\n",
" [ 0 10]]\n",
"[0.61997277 0.77965952]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fvAQPSw3zyqX",
"colab_type": "text"
},
"source": [
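"The decision boundary of a linear SVM is the line (in general, the hyperplane) $\\mathbf{w}\\cdot\\mathbf{x} + b = 0$. Here $\\mathbf{w}$ and $\\mathbf{x}$ are both vectors: $\\mathbf{x}$ is the position vector of a data point, not its x-coordinate. Writing $\\mathbf{x} = x\\,\\hat{i} + y\\,\\hat{j}$, with $x$ the petal length and $y$ the petal width, the x and y coordinates on the boundary are related as follows:\n",
"\n",
"$$(w_0\\,\\hat{i} + w_1\\,\\hat{j})\\cdot(x\\,\\hat{i} + y\\,\\hat{j}) + b = 0$$\n",
"\n",
"$$w_0 x + w_1 y + b = 0$$\n",
"\n",
"$$y = \\frac{-w_0 x - b}{w_1}$$\n",
"\n",
"In code, $w_0$ and $w_1$ are `model.coef_[0][0]` and `model.coef_[0][1]`, and $b$ is `model.intercept_[0]`, taken from the `intercept_` attribute of `LinearSVC`."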
]
},
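{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The algebra above can be checked numerically. The next cell is a sketch, not part of the original notebook: it refits a `LinearSVC` on the same two Iris classes (all 100 samples; the names `clf_demo`, `w_demo`, `b_demo` and `random_state=0` are choices made here), confirms that `decision_function` returns $\\mathbf{w}\\cdot\\mathbf{x} + b$, and checks that points built with $y = (-w_0 x - b)/w_1$ lie on the boundary. It also prints the margin width $2/\\lVert\\mathbf{w}\\rVert$, the standard distance between the lines $\\mathbf{w}\\cdot\\mathbf{x} + b = \\pm 1$."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"from sklearn.svm import LinearSVC\n",
"\n",
"iris_demo = datasets.load_iris()\n",
"X_demo = iris_demo['data'][:100, 2:]  # petal length and petal width of Setosa and Versicolor\n",
"y_demo = iris_demo['target'][:100]\n",
"\n",
"clf_demo = LinearSVC(random_state=0)  # random_state only makes the run repeatable\n",
"clf_demo.fit(X_demo, y_demo)\n",
"w_demo = clf_demo.coef_[0]  # weight vector w\n",
"b_demo = clf_demo.intercept_[0]  # bias b\n",
"\n",
"# For a linear model, decision_function(x) is exactly w.x + b\n",
"scores_manual = X_demo @ w_demo + b_demo\n",
"print(np.allclose(clf_demo.decision_function(X_demo), scores_manual))\n",
"\n",
"# Points built from y = (-w[0]*x - b) / w[1] satisfy w.x + b = 0\n",
"xs = np.array([1.0, 4.0])\n",
"ys = (-w_demo[0] * xs - b_demo) / w_demo[1]\n",
"print(np.allclose(np.c_[xs, ys] @ w_demo + b_demo, 0.0))\n",
"\n",
"# Geometric distance between the lines w.x + b = +1 and w.x + b = -1\n",
"margin_width = 2 / np.linalg.norm(w_demo)\n",
"print(margin_width)"
],
"execution_count": null,
"outputs": []
},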
{
"cell_type": "code",
"metadata": {
"id": "d2iJgyT2LS7d",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 298
},
"outputId": "c08cda3a-b393-470c-bac8-0ec781f5fdb5"
},
"source": [
"# First plot the data points\n",
"plt.xlabel('petal_length')\n",
"plt.ylabel('petal_width')\n",
"plt.scatter(petal_length_setosa, petal_width_setosa, color='red')  # Setosa\n",
"plt.scatter(petal_length_versicolor, petal_width_versicolor, color='green')  # Versicolor\n",
"\n",
"# Now plot the decision boundary w.x + b = 0\n",
"w = model.coef_[0]\n",
"b = model.intercept_[0]\n",
"\n",
"x1 = 1  # two arbitrary x values (petal lengths) used to draw each line\n",
"x2 = 4\n",
"y1 = -(x1*w[0] + b)/w[1]  # solve w[0]*x + w[1]*y + b = 0 for y\n",
"y2 = -(x2*w[0] + b)/w[1]\n",
"plt.plot([x1, x2], [y1, y2])  # a straight line through the two points\n",
"\n",
"# Now plot the margin lines w.x + b = +1 and w.x + b = -1.\n",
"# The +1/-1 are values of the decision function w.x + b, not distances: the geometric\n",
"# distance from the decision boundary to each margin line is 1/||w||.\n",
"# Support vectors are the training points that lie on or within these margin lines.\n",
"y1 = -(x1*w[0] + b - 1)/w[1]  # w.x + b = +1\n",
"y2 = -(x2*w[0] + b - 1)/w[1]\n",
"plt.plot([x1, x2], [y1, y2], 'k--')  # 'k--' draws a black dashed line\n",
"\n",
"y1 = -(x1*w[0] + b + 1)/w[1]  # w.x + b = -1\n",
"y2 = -(x2*w[0] + b + 1)/w[1]\n",
"plt.plot([x1, x2], [y1, y2], 'k--')\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7fe6e4eb9080>]"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qqxQhDmm3JhJ",
"colab_type": "text"
},
"source": [
"# Kernels\n",
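"Kernels let a linear classifier separate data that is not linearly separable in the original feature space. A kernel function computes inner products as if the data had first been mapped into a higher-dimensional feature space, without computing that mapping explicitly (the *kernel trick*). The linear classifier then works in that space, and a linear boundary there corresponds to a non-linear boundary in the original space. "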
]
},
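{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A small worked example of the kernel trick, added as a sketch: it uses a degree-2 polynomial kernel rather than the RBF kernel used below, because its feature map is short enough to write out. For 2-D inputs, $K(\\mathbf{x}, \\mathbf{z}) = (\\mathbf{x}\\cdot\\mathbf{z})^2$ equals the ordinary dot product $\\phi(\\mathbf{x})\\cdot\\phi(\\mathbf{z})$ with $\\phi(\\mathbf{x}) = (x_1^2, \\sqrt{2}\\,x_1 x_2, x_2^2)$. The helpers `phi` and `poly_kernel` and the two sample points are illustrative."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import numpy as np\n",
"\n",
"def phi(v):\n",
"    # Explicit degree-2 feature map from 2-D to 3-D (illustrative helper)\n",
"    v1, v2 = v\n",
"    return np.array([v1**2, np.sqrt(2) * v1 * v2, v2**2])\n",
"\n",
"def poly_kernel(a, b):\n",
"    # Degree-2 homogeneous polynomial kernel, computed in the original 2-D space\n",
"    return np.dot(a, b) ** 2\n",
"\n",
"x_vec = np.array([1.0, 2.0])\n",
"z_vec = np.array([3.0, 1.0])\n",
"\n",
"k_direct = poly_kernel(x_vec, z_vec)  # never builds the 3-D features\n",
"k_mapped = np.dot(phi(x_vec), phi(z_vec))  # ordinary dot product in the 3-D feature space\n",
"print(k_direct, k_mapped)"
],
"execution_count": null,
"outputs": []
},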
{
"cell_type": "code",
"metadata": {
"id": "oGiTS1K33I1Q",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "40157bd3-bc0c-42e8-e4af-fdff2e35f97b"
},
"source": [
"# Now take a dataset that is not linearly separable\n",
"from sklearn.datasets import make_moons  # generates two interleaving half-moons\n",
"moon = make_moons(n_samples=100)  # 100 samples, 50 per class, returned as a tuple (X, y)\n",
"print(moon)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"(array([[ 1.99179001e+00, 3.72122838e-01],\n",
" [ 7.18349350e-01, 6.95682551e-01],\n",
" [ 7.15472413e-01, -4.58667853e-01],\n",
" [ 4.04783343e-01, 9.14412623e-01],\n",
" [-9.49055747e-01, 3.15108218e-01],\n",
" [ 1.22252093e+00, -4.74927912e-01],\n",
" [ 7.61445958e-01, 6.48228395e-01],\n",
" [ 4.81607432e-01, -3.55142763e-01],\n",
" [ 1.99794539e+00, 4.35929780e-01],\n",
" [ 9.81559157e-01, 1.91158629e-01],\n",
" [ 3.20515776e-02, 9.99486216e-01],\n",
" [ 8.01413622e-01, 5.98110530e-01],\n",
" [ 8.20998618e-03, 3.72122838e-01],\n",
" [ 9.91790014e-01, 1.27877162e-01],\n",
" [-1.00000000e+00, 1.22464680e-16],\n",
" [ 9.49055747e-01, 3.15108218e-01],\n",
" [-3.20515776e-02, 9.99486216e-01],\n",
" [ 6.23489802e-01, 7.81831482e-01],\n",
" [ 1.00000000e+00, 0.00000000e+00],\n",
" [ 6.72300890e-01, 7.40277997e-01],\n",
" [ 3.45365054e-01, 9.38468422e-01],\n",
" [-3.45365054e-01, 9.38468422e-01],\n",
" [ 1.46253829e+00, -3.86599306e-01],\n",
" [ 9.67948422e-01, -4.99486216e-01],\n",
" [ 5.09442530e-02, 1.84891782e-01],\n",
" [ 8.38088105e-01, 5.45534901e-01],\n",
" [ 5.18392568e-01, 8.55142763e-01],\n",
" [ 1.98586378e-01, -9.81105305e-02],\n",
" [-1.59599895e-01, 9.87181783e-01],\n",
" [-4.62538290e-01, 8.86599306e-01],\n",
" [ 1.40478334e+00, -4.14412623e-01],\n",
" [-9.67294863e-01, 2.53654584e-01],\n",
" [ 3.27051370e-02, 2.46345416e-01],\n",
" [-7.61445958e-01, 6.48228395e-01],\n",
" [ 4.62538290e-01, 8.86599306e-01],\n",
" [ 1.94905575e+00, 1.84891782e-01],\n",
" [ 1.96729486e+00, 2.46345416e-01],\n",
" [-9.91790014e-01, 1.27877162e-01],\n",
" [ 9.26916757e-01, 3.75267005e-01],\n",
" [ 1.87131870e+00, 9.28244800e-03],\n",
" [ 0.00000000e+00, 5.00000000e-01],\n",
" [ 1.15959990e+00, -4.87181783e-01],\n",
" [ 1.80141362e+00, -9.81105305e-02],\n",
" [ 1.62348980e+00, -2.81831482e-01],\n",
" [-8.71318704e-01, 4.90717552e-01],\n",
" [ 7.77479066e-01, -4.74927912e-01],\n",
" [ 1.28681296e-01, 9.28244800e-03],\n",
" [ 2.38554042e-01, -1.48228395e-01],\n",
" [ 1.92691676e+00, 1.24732995e-01],\n",
" [ 9.67294863e-01, 2.53654584e-01],\n",
" [-8.01413622e-01, 5.98110530e-01],\n",
" [-9.60230259e-02, 9.95379113e-01],\n",
" [ 1.51839257e+00, -3.55142763e-01],\n",
" [-5.18392568e-01, 8.55142763e-01],\n",
" [ 1.03205158e+00, -4.99486216e-01],\n",
" [ 2.22520934e-01, 9.74927912e-01],\n",
" [ 1.67230089e+00, -2.40277997e-01],\n",
" [-9.97945393e-01, 6.40702200e-02],\n",
" [ 1.61911895e-01, -4.55349012e-02],\n",
" [ 8.40400105e-01, -4.87181783e-01],\n",
" [-6.23489802e-01, 7.81831482e-01],\n",
" [ 4.27883340e-01, -3.20172255e-01],\n",
" [ 9.90311321e-02, 6.61162609e-02],\n",
" [ 1.71834935e+00, -1.95682551e-01],\n",
" [ 3.76510198e-01, -2.81831482e-01],\n",
" [-2.84527587e-01, 9.58667853e-01],\n",
" [ 9.60230259e-02, 9.95379113e-01],\n",
" [ 1.28452759e+00, -4.58667853e-01],\n",
" [-6.72300890e-01, 7.40277997e-01],\n",
" [ 5.72116660e-01, 8.20172255e-01],\n",
" [ 1.34536505e+00, -4.38468422e-01],\n",
" [ 1.83808810e+00, -4.55349012e-02],\n",
" [ 9.97945393e-01, 6.40702200e-02],\n",
" [ 9.00968868e-01, 4.33883739e-01],\n",
" [ 2.84527587e-01, 9.58667853e-01],\n",
" [-2.22520934e-01, 9.74927912e-01],\n",
" [-9.26916757e-01, 3.75267005e-01],\n",
" [ 1.57211666e+00, -3.20172255e-01],\n",
" [ 5.37461710e-01, -3.86599306e-01],\n",
" [ 7.30832427e-02, 1.24732995e-01],\n",
" [ 6.54634946e-01, -4.38468422e-01],\n",
" [ 1.90096887e+00, 6.61162609e-02],\n",
" [-8.38088105e-01, 5.45534901e-01],\n",
" [-7.18349350e-01, 6.95682551e-01],\n",
" [ 1.59599895e-01, 9.87181783e-01],\n",
" [-9.00968868e-01, 4.33883739e-01],\n",
" [ 2.81650650e-01, -1.95682551e-01],\n",
" [ 1.98155916e+00, 3.08841371e-01],\n",
" [ 5.95216657e-01, -4.14412623e-01],\n",
" [-5.72116660e-01, 8.20172255e-01],\n",
" [ 8.71318704e-01, 4.90717552e-01],\n",
" [ 1.84408430e-02, 3.08841371e-01],\n",
" [-4.04783343e-01, 9.14412623e-01],\n",
" [ 2.00000000e+00, 5.00000000e-01],\n",
" [-9.81559157e-01, 1.91158629e-01],\n",
" [ 1.76144596e+00, -1.48228395e-01],\n",
" [ 3.27699110e-01, -2.40277997e-01],\n",
" [ 9.03976974e-01, -4.95379113e-01],\n",
" [ 1.09602303e+00, -4.95379113e-01],\n",
" [ 2.05460725e-03, 4.35929780e-01]]), array([1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,\n",
" 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0,\n",
" 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1,\n",
" 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]))\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "hKiarNEn3f4_",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 283
},
"outputId": "45b31d9e-14d1-44d5-9d83-88d5ebb0309f"
},
"source": [
"x1, x2, y1, y2 = list(),list(),list(),list()\n",
"\n",
"# Split the points into separate lists by class for plotting\n",
"for i in range(100):\n",
"    if moon[1][i] == 0:\n",
"        x1.append(moon[0][i][0])\n",
"        y1.append(moon[0][i][1])\n",
"    else:\n",
"        x2.append(moon[0][i][0])\n",
"        y2.append(moon[0][i][1])\n",
"\n",
"# We now plot the data \n",
"plt.scatter(x1,y1,color='red') \n",
"plt.scatter(x2,y2,color='green')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7fe6f400d940>"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "hI3n5RaK3kKS",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 109
},
"outputId": "b5823d5e-9ffa-4737-e7df-b2a2777014d2"
},
"source": [
"# SVC in scikit-learn uses the RBF kernel by default\n",
"from sklearn.svm import SVC\n",
"\n",
"X = moon[0]  # feature array from the moon tuple\n",
"y = moon[1]  # class labels\n",
"\n",
"model = SVC()  # create the classifier (kernel='rbf' by default)\n",
"model.fit(X, y)  # train it on the whole moons dataset\n",
"\n",
"# Predict the class of the first data point and compare it with its true label.\n",
"# make_moons shuffles the samples, so the first point differs between runs; in the run shown its label is 1.\n",
"print(moon[0][0])\n",
"print(moon[1][0])\n",
"\n",
"print('\\nModel Prediction ')\n",
"model.predict([moon[0][0]])"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"[1.99179001 0.37212284]\n",
"1\n",
"\n",
"Model Prediction \n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([1])"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bhWXP46c4cTY",
"colab_type": "text"
},
"source": [
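"We now visualize the decision boundary learned by our Support Vector Classifier, which uses the Radial Basis Function (RBF) kernel, popularly known as the Gaussian kernel. \n",
"\n",
"The Gaussian kernel function is given by \n",
"\n",
"$$K(\\mathbf{x}, \\mathbf{x}') = \\exp\\left(-\\gamma \\lVert \\mathbf{x} - \\mathbf{x}' \\rVert^2\\right)$$\n",
"\n",
"where $\\gamma > 0$ controls how quickly the similarity decays with distance; it is often written as $\\gamma = 1/(2\\sigma^2)$. In scikit-learn's `SVC`, $\\gamma$ is the `gamma` parameter."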
]
},
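{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The formula can be checked against scikit-learn. The next cell is a sketch added for illustration: it computes the Gaussian kernel by hand (the helper `gaussian_kernel` is defined only for this sketch), compares it with `sklearn.metrics.pairwise.rbf_kernel`, and rebuilds the decision function of an RBF `SVC` as $\\sum_i \\alpha_i y_i K(\\mathbf{x}_i, \\mathbf{x}) + b$ over the support vectors $\\mathbf{x}_i$, where `dual_coef_` stores the products $\\alpha_i y_i$. It passes an explicit `gamma=0.5` (an arbitrary choice) instead of the default `gamma='scale'`, so that the $\\gamma$ in the formula is known, and a fixed `random_state` so the data is repeatable."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_moons\n",
"from sklearn.metrics.pairwise import rbf_kernel\n",
"from sklearn.svm import SVC\n",
"\n",
"X_m, y_m = make_moons(n_samples=100, random_state=0)  # fixed seed so the run is repeatable\n",
"gamma = 0.5  # arbitrary kernel width chosen for this sketch\n",
"\n",
"def gaussian_kernel(A, B, gamma):\n",
"    # K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), computed with broadcasting\n",
"    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)\n",
"    return np.exp(-gamma * sq_dists)\n",
"\n",
"K_manual = gaussian_kernel(X_m, X_m, gamma)\n",
"print(np.allclose(K_manual, rbf_kernel(X_m, X_m, gamma=gamma)))  # matches scikit-learn\n",
"\n",
"svc = SVC(kernel='rbf', gamma=gamma).fit(X_m, y_m)\n",
"\n",
"# Rebuild the decision function from the support vectors:\n",
"# f(x) = sum_i dual_coef_[0, i] * K(sv_i, x) + intercept_[0]\n",
"K_sv = gaussian_kernel(svc.support_vectors_, X_m, gamma)\n",
"f_manual = svc.dual_coef_[0] @ K_sv + svc.intercept_[0]\n",
"print(np.allclose(f_manual, svc.decision_function(X_m)))"
],
"execution_count": null,
"outputs": []
},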
{
"cell_type": "code",
"metadata": {
"id": "pZkhYNCK3n4V",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 283
},
"outputId": "8fbac2bd-1d98-4be1-9532-06c8a68c704f"
},
"source": [
"# Build a grid of (x, y) points covering the data, with step size h.\n",
"# np.meshgrid turns the 1-D coordinate range of each dimension into 2-D coordinate grids.\n",
"def make_meshgrid(x, y, h=.02):\n",
"    x_min, x_max = x.min() - 1, x.max() + 1\n",
"    y_min, y_max = y.min() - 1, y.max() + 1\n",
"    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
"    return xx, yy\n",
"\n",
"# Classify every grid point with clf and draw filled contours; the edges between colours form the decision boundary\n",
"def plot_contours(ax, clf, xx, yy, **params):\n",
"    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])  # ravel flattens each grid to 1-D; np.c_ stacks them as columns\n",
"    Z = Z.reshape(xx.shape)  # reshape the predictions back to the grid's shape\n",
"    out = ax.contourf(xx, yy, Z, **params)  # contourf draws filled contours\n",
"    return out\n",
"\n",
"fig, ax = plt.subplots()\n",
"X0, X1 = X[:, 0], X[:, 1]\n",
"xx, yy = make_meshgrid(X0, X1)\n",
"plot_contours(ax, model, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)\n",
"\n",
"# Plot the data points on top of the decision regions\n",
"plt.scatter(x1, y1, color='red')\n",
"plt.scatter(x2, y2, color='green')"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7fe6e4de22b0>"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
}
]
}