{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.) Linear SVC in case of linear separation\n",
"\n",
"- load the Wine dataset (can be found in sklearn API) (sklearn.datasets.load_wine)\n",
"- scale the data and plot the flavanoids vs hue in a scatterplot colored with the target, where the target should be class_0 and class_2, so class_1 left out\n",
"- train an SVC model with linear kernel with default parameter settings, but once with C=0.1 and then C=1000\n",
"- visualize the model's decision boundary and the margins based on the coefficients learnt by the model\n",
"- interpret the results, what is the role of the C hyperparameter?\n",
"\n",
"\n",
"### 2.) Linear SVC but non-linear separation\n",
"\n",
"- create a dataset with the following: X, y = make_circles(n_samples=100, noise=0.075, random_state=0)\n",
"- perform the same steps just as in the previous exercise and use the linear kernel for the SVC\n",
"- since linear SVC cannot do non-linear separation, you will need to do some workaround, for example adding polynomial features (find the simpest combination for this dataset)\n",
"- write down with your own words in few sentences how the support vector machine works \n",
"\n",
" * Here are some hints:\n",
" -use PolynomialFeatures on the two input features, if you omit the bias, it will be with shape (100,9)\n",
" -scale the input features\n",
" -train SVC on the new, transformed data\n",
" -visualization can be made in the original feature space, so (x1, x2), for this:\n",
" -create a meshgrid on the original feature space that will be used for visualization\n",
" -transform this grid to the space where you have the new, transformed data (use the same PolynomialFeatures() function)\n",
" -scale this too, since you already scaled the features\n",
" -get the coefs from SVC (w) and perform z=w*x+b, store this for every grid points\n",
" -after reshaping arrays you should be able to get this z on the original (x1, x2) and you are able to do for example a contourplot with levels showing -1, 1 margins and the z=0. \n",
"\n",
" * Other possible solution is made by the built in decision function of the SVM, but this also needs the transformed grid that is made for visualization.\n",
"\n",
" * In addition, there is another solution with the use of kernel poly in SVC, but then you need to set degree=3. A meshgrid for visualization is needed here, too. At every grid points you can get a prediction that will act as the variable z in the previous solution.\n",
"\n",
"### 3.) Load the dataset from 2 weeks ago and build/evaluate the SVC with default settings\n",
"\n",
"- you need to build a classifier that predicts the wine quality. It is based on a score from 0-10, but you need to split it into two classes with the treshold 6.5. Values below should be 0, above 1.\n",
"\n",
"- data can be downloaded from this site: https://archive.ics.uci.edu/ml/datasets/Wine+Quality (csv file also attached). Use the winequality-white.csv for this time.\n",
"\n",
"- train the SVM classifier (SVC in sklearn API) on every second sample (not first 50% of the data (!), use every second line) and generate (probabilistic) prediction for the samples that were not used during the training\n",
"\n",
"- build default SVC, but set it to predict probabilities\n",
"\n",
"- plot the ROC curve and calculate the confusion matrix for the predictions\n",
"\n",
"- how good is the performance of the model? What are your experiences?\n",
"\n",
"\n",
"### 4.) Scale data and try different kernels\n",
"\n",
"- scale your data before applying the SVC model\n",
"- plot the ROC curve and calculate the confusion matrix for the predictions\n",
"- do your model perform better or worse after scaling? \n",
"- try out other kernels (linear, poly) and evaluate the performance of the model the same way\n",
"\n",
"### 5.) Split the data randomly to 3 parts: 70% train, 15% validation, 15% test data and tune hyperparameters\n",
"\n",
"- prepare data as described in the title, then scale all input based on the training set (IMPORTANT: you can use ONLY the training data for scaling, otherwise the model will be fake (!) and we cannot give you points)\n",
"- select your best performing SVC model from the previous exercise\n",
"- check the behaviour of the SVC by modifying at least 3 of its hyperparameters (C, gamma, ...) and plot the AUC value vs the modified parameter (logscale may be better for visualization)\n",
"- create plots (at least 2) that shows the train, val and test accuracy based on a given hyperparameter's different values. Is it a good idea to rely on validation data when tuning hyperparameter in this case?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}