@iaroslav-ai
Created October 31, 2018 14:22
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analysis of SMOTE data preprocessing approach\n",
"\n",
"It is important to ensure that a data augmentation approach does not produce outliers. This can happen with SMOTE when convex combinations of minority-class feature vectors land inside a cluster of another class. A simple way to test for this situation is to compare validation accuracy with and without data augmentation."
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Positive class instances: 11\n",
"Negative class instances: 86\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAEICAYAAABLdt/UAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJztnX+YHGWV779nJh3SSXQmYXIXMkkIj4uwC0SyDAImz+OSCAElMAvrKOgqV9no3auuwAaCujGwKoHcCxjvg7uIii6oRAkhgiwosLrkGm4mRhJAUFAxmcBuQpjwI22mZ+bcP6orU93zvlVvddevt+p8nmeeyVRVV1d3vn361HnPD2JmCIIgCPmhLe0LEARBEKJFDLsgCELOEMMuCIKQM8SwC4Ig5Awx7IIgCDlDDLsgCELOEMOeMkT0GSK6LepjDc7FRPSnUZxLEIRsIYY9YojoEiLaQUQHiOglIvoqEXXqjmfmLzHzpSbnDnOsIMQBES0kov9LRPuJaB8RbSKiU2q6ZyK6qeH482vbb/dsO4yIriOiPxBRhYh+Q0TLiYhq+58iotdrPyNE9EfP35+pPdeIZ5v7MzPhtyOziGGPECK6AsD1AJYD6ABwGoCjAPyYiCYqjp+Q7BUKQvMQ0ZsB3AfgKwCmA+gGcA2Ag7VDngfQ16DrDwP4dcOpvg9gMYB3A3gTgL8BsAzAlwGAmY9n5qnMPBXAfwD4hPs3M3+pdo6fe7a5P7ujfs22IoY9ImqivwbAJ5n535i5ysy/B9AHYC6ADxLRKiL6ARHdQUSvAriktu0Oz3k+REQvENHLRPSPRPR7InpXbd+hY4lobs0T+nDN89lLRJ/1nOftRPRzIhokoheJ6P+ovlwEIQRvBQBm/i4zjzBzhZkfYubttf0vAdgBYAkAENF0AO8AsNE9AREtBnAWgAuZ+UlmHmbmzQA+COB/SngwGsSwR8c7AEwCsN67kZlfB/AjAGfWNp0P4AcAOgHc6T2WiP4cwC0APgDgSDhef3fA8y4EcCwcD2glEf1ZbfsIgMsAdAE4vbb/75p4XYLg8msAI0T0LSI6h4imKY75NoAP1f79fgD3YsyjB5zPwePMvNP7IGZ+HMAuODoVWkQMe3R0AdjLzMOKfS/W9gPOLeQGZh5l5krDcX8N4IfM/BgzDwFYCSComc81Nc/pCQBPAHgbADDzVmbeXPOIfg/gXwC8s7mXJggAM78Kx5FgAF8DsIeINhLRn3gOuwfAXxJRBxwD/+2G03TB+Tyo8H5Ogjitdjfq/jxv/EIKgBj26NgLoEsTNz+yth8Adir2u8z07mfmAwBeDnjelzz/PgBgKgAQ0VuJ6L7aAu6rAL4E8w+NIChh5l8x8yXMPAvACXA0e7NnfwXA/QA+B+BwZt7UcIq9cD4PKryfkyA2M3On5+ctoV5IzhHDHh0/h3PLeYF3IxFNBXAOgIdrm/w88BcBzPI8tgzg8Cav56sAngFwDDO/GcBnAFCT5xKEcTDzMwBuh2PgvXwbwBUA7mh8DICfADiViGZ7NxLRqQBmA3gk+istHmLYI4KZ98NZPP0KEZ1NRCUimgtgHZzY4b8anOYHAJYS0TtqC52r0LwxfhOAVwG8TkTHAfgfTZ5HEAAARHQcEV1BRLNqf88GcBGAzQ2H/hROLP0rjedg5p/AcXLuJqLjiaidiE6D8yXwVWb+TawvoiCIYY8QZr4Bjmf8v+AY1cfhhFYWM/NBv8fWHv8UgE8C+B4c7/11AP+F+sUnU/4BwMUAXoMTD72riXMIgpfXAJwK4HEiegOOQX8Sjnd+CHZ4mJn3ac5zIYBHAfwbHI3fAeDrcLRvyumKPPZTQr6e3EIyaCO71MI4g3DCKb9L+3oEQbAD8dgzBhEtJaLJRDQFjue/A8Dv070qQRBsQgx79jgfwO7azzEA3s9yWyUIQggkFCMIgpAzxGMXBEHIGak0oerq6uK5c+em8d
RCAdi6deteZp6RxnOLtoU4MdV2KoZ97ty56O/vT+OphQJARC+k9dyibSFOTLUtoRhBEIScIYZdEAQhZ4hhFwRByBli2AVBEHKGGHZBEIScIYZdEAQhZ4hhFwRByBli2AVBEHJGKgVKQhNsXwc8fC2wfxfQMQtYvBKY15f2VQmCOYc07JkOSW3AhDJQPSC6jhAx7FkgyGhvXwf88FNAtTb7ev9O529APgRCdrnvcmDr7QCPACCgrQ0YHak/hkeB6hvOv0XXkSGhmLRxjfb+nQB4TNzb140d8/C1Y0bdpVpxtgtCFrnvcqD/6zWjDgA83qirEF1Hghj2tDEx2vt3qR+r2y4IabP19uYfK7puGTHsaWNitDtmqY/RbReEtGED71yH6LplxLCnjYnRXrwSKJXr95fKznZByCLU3tzjRNeRIIY9bUyM9rw+YOlaoGM2AHJ+L10rC0xCdjn5EvX2iVP0j6F2f11vXwfcdAKwqtP57V2HEuqQrJi0cUUclMo4r08MuWAP597o/HazYqjdMfbn3ugYZihGcvKov1GXzDBjxLBnATHaQh4598YxA++lY1Z9Lrt3uxdvGjC1jY/bu0kG8tkZh4RiBEFIFpPwY2MasG4xVjJolIhhFwQhWUzWjFRpwCokg0ZJy6EYIpoN4NsA/gRO4OxWZv5yq+cVhLQRbcdIUPjRxBNXefnSdgNANDH2YQBXMPMviOhNALYS0Y+Z+ekIzi0IaSLaTgtdHJ7anUXWjlnAMWc5hnz9MqA8DRh6HRgZco4r+OJqy6EYZn6RmX9R+/drAH4FoLvV8wpC2oi2QxJlOqIuDv9X/wysGnT2P/GdsRh8Zd+YUXcpcHuCSGPsRDQXwHwAjyv2LSOifiLq37NnT5RPKwixI9oOwKTnURiC4vCmMfiCLq4SsyKftJkTEU0F8FMAX2Tm9X7H9vT0cH9/fyTPKwiNENFWZu6J8Hyi7SBuOkEdOilPd4qSoo5763LhG+mYDVz2ZOvPlxFMtR1JHjsRlQDcDeDOIOELybNh2wDWPPgsdg9WMLOzjOVLjkXvfIkomCDaNkTnGVf2OT9AtHFvXQzeS4HbE7QciiEiAvB1AL9iZkU1gpAmG7YN4Or1OzAwWAEDGBis4Or1O7Bh20Dal5Z5RNshME07jCjuveUtn0SFJ9ZtG+J2HCx1Qhm6KVg7gihi7AsA/A2ARUT0y9rPuyM4rxABax58FpVqfXFHpTqCNQ8+m9IVWYVo2xTVYqeOCOLen376GFxVvRS7RrswyoRdo134h+rHsKj9m87i6mVP1hv1KOP/FtByKIaZHwNAEVyLEBKTEMvuQfUCk267MIZoOwSqnkdDb4yFYbwYePdB2t49WMEAFmLj0MK6x5FK134zD3KaCim9YizFDbG43rgbYgFQ9wGY2VnGgELsMzsNvStBMKWx6KixcRdgFPc20XYoXRdwUI20FLAU0xDL8iXHolyq741dLrVj+ZJjY79GoeA02W7aRNuhdF3AQTXisVuKylsBxodYXA9HsmKEVGiic6mJtkPpevHKpu4cbEYMe8YwiZtv2DYAgjqLV3Ur2ju/Wwy5kDphtX1e22O4csI6zKS92M1duG3iBwG859Cxxro2nXmQI8SwZwjTuPmaB59VGnUCJMQiZJKw2j6v7TGsLt2GyeS0CZhFe/E5/mdg+/HNGeSCzTyQGHuGMI2b6zJaGBDPXIiPFnLBw2r7ygnrDhl1lwkjfyxs75ewiMeeIUxTE3UZAd0NYRipOBUio8XRdGG1PZP2qk+0f5fo2gDx2DOELgWxcbtJRoBJxemGbQNYsPoRHL3ifixY/YhUowp6/HLBDQir7d3cpTz+QPkIo0rqomtbDHuGME3h6p3fjesuOBHdnWUQHE/9ugtOHBer9Lv1lVYDQii0ueA7jcIyYbV928QP4kBDywCUyrih+r7AkI5oW0IxmSJMCldQRkDQra+f4ZfbWmEcfk23DMIy4bV9jbNQ2pDJ8q3vTFGe36
t30bYY9sxhmsIVFGcMqsyTVgNCIN5Rc+VpQFsJGK2qjzUo0TfRdr2uu7B8yYP1uv7RI3W6PpQS2fYycJNj/HcPBhv/vCOhGAsxudUMuvU1jXnGSsE67llFY+Osyj6ACCAfk9FiiX5YXbspkbPa9qLN09zrw1P/n/L8iWk7A7oWw24hJqljQXH41FsNFLDjnlWoFktHhpx5ozpaLNEPq2tVSiSqFVxZuis9bWdE1xKKsRDTMIrfrW/qrQYK2HHPKkJ739RyiX5oXa96WXn85MqLeLr9/fjP9i5cN/Re9L/5zOS0nRFdi2G3kLAdG3Xx+FRbDRSw455VlKepW+6WpwPDlQbjRUDPR1o2XKE7kfos6BIYR2APvjzlm8C75wPzFrV0bcZkRNcSirGQMGGUzKZ+FbDjnjVsXwccfG389vaJwDnXj+/YeMGtwLmtD5gKGx58vnMBRoPGnkY0scmYjOhaPHYLCRNGyWzqVwE77lnDw9eqs18mTh3zymMIK4QKD25fh5kv3IM2zxgUZmd9dxxJessZ0bUYdksxDaNkNq2xgB33rEE7mPqV2J/aODz48LUo42DdJqVRB5L1ljOiazHsUeLN+43wP7SV3hiZnqBUsI571qCLXVObk8KXBW1rvnzGee0FvQuUGHtUxJTm1GqMPPW0RsE+dIOpeQSZ0bbGC38FU3GgfCTCTGyKlIykO4phj4oWmyTpMG13qsOkr4wg1FE30k5D2tpWfPlUcBieP3klJl/1DLBqELjsyeTvCGOyA2GRUExUxJTmFEWMPKo2BUKBcMNk1x+tTnsE0tW2IpZdXrwSp2gMeWLazki6oxj2qNDFJVtcuGk2Rh5WyKYTboSCoTPqQAa0PQUzO9di+fkZ0nZMdiAsEoqJClVcMoKFm2Zi5LrY5ec27ND2qG415CMUkIxoe/kPnsBJ1zyk7b2eqLZjsgNhEY89KmJKc2qm9F8n5Ds3/+HQrNRGryWzaZFCupSnq7320hS9tg2zw6LSdnWEMVhx8u5V3nii2pZ0xxwSU/pe2NJ/v5moXryFSplOixTS45zrgQ1/V1+w1FYClt6sPj7kCL2otO2lsQCviNqWUEwOCSNY94MiaZGCknl9QO8t9S0Eem/ROzAxZ4WYatv7BZCotjOS7igeew5ZvuTYusUiP9wPSurdHoVkaRyiATiVparQQZg70ZizQky17f0CSFTb0t1RiItGIbcRYYTHd0sioM5rSbXbo5AcjeESbwzdYMydLzFnhXi1rQqvAON17T6ud3537QvtKuDeXcC/xxD/zki6o4RiLMN0+nrv/G5sWrEIv1v9HowqjDrgxNzFkBcQlVfppZXQSZNZIaa6Bsa0rWsNo9V1EmES6e4ohKXZfFzd4lF3yMUjKWDKCSbeY7MeZhNZIYnpWhMmOfDASpz5o65odJ2R7o6ReOxE9A0i+i8iejKK8xWVIK+l2XzcKBaPMtvXPUZyq2sT77EVD3Nen1PO7ynr99N2YrrWfFlNOvBSdLqua8eQUr8aRBeKuR3A2RGdq5CYGM5m83Gj6BdT0AKm25FHXeuafLlE7GEGaTsxXWu+rHbz4XV/t6xrxRdb0kQSimHmnxHR3CjOVVRMBmLobj3biHD0ivt9byPDLow2hl10C1V5LmDKra4bwyVBWTEtEqTtxHS9eCWG7/0kJoz88dCmAzwRNwyPf6226zqxGDsRLQOwDADmzJmT1NNag4nXokv1cjNemu2B0WjEzzhuBu7eOlAX8ySML3AC8l3kYYqV2k6wF36QtoN0ffKrP8YpG/4WfO/LoJBfPF5td5Q7cUb1o7ii7S7MpJexmw/HmuE+bBxdOO5xtus6McPOzLcCuBUAenp6giYVFg6T6jiTNMawY+9UC1fe1gMuDIwz7lLA5GC1tmMaDuMlSNt+uj6v7TGsLt2GyTTkPChEOmajtgcrVdyDBbgHC+qO0+o6gfcmLiTdMSOYLgR50xhVuekAtGETFarbZJ1lYkD6uueJhK
okTbSt0/WVE9aNGXUXw3RMlbZVKHXdvikTFaTNIumOGaGZ6rh2TeFRu3b443jCxBK7O8vYtGKR8fFCxkmoSjKstr26nkl71Sc1SMc01bZS1zdF+N6k4PlHYtiJ6LsA/hJAFxHtAvB5Zv56FOcuEmEXOHUeu267Ct1tsoRdCqDrBKskw2jbq9/d3IVZKuNukI7pt+jvotV1VO9NyKZoURFJKIaZL2LmI5m5xMyzciX+lDCpxJtc0v/3tToT9QOnzSl82CX3uk6pSjJI29Mmlw79+4bhPhzgiXX7h9snGaVjqrRdaiNMm1wK1nVU701Ko/IkFJNB/CrxAP8+Gd5jTAyxNP8qMClUSeq03f/CPjz6zJ5xut44uhCoOrF2N5PltgkfxCoDb7clbUf13qTUO0YMewbR5f2u2vgUDg6PGi0I+cUXVa0BJHZeEBrjvW+7GPjNQ4nFf02GwDSycXQhNg6NpSTSELBKcZyu5UVTTkpUAzNSGpUnhj2D6IyyOyXGBF0ersw2LTCqeO8T30m05N10CIwfKm3Housocv1T6h0j6Y4ZJIriiDOOm6HcXtDWAAKQWrzXS1zaNtb19nXATScAqzqd33GnL6bUO0YMewZRLfqE5e6tA8oFVNO+HGHaqAopEmSovPtVIQEg0V7hy5cci1KbeTquCpW2jXStyNuvrP8Etmz8l5auJ5AUeseIYc8gbnMjk3x03RE6L7yjXFIcXe9JFbGTo5UEFRg17teRYK/w3vndmDopOALs6lr1GVBpW3cnULddccdSxkHM3HpD7rQthj2j9M7v1g7IcGknwgdOm6M17iov/I2h4XHHldqoLpdXwjWWEBRaCRqoAaTSK3zwQPBaUUe5hJvfd5L2M9CobV3osW675s7kSLycO22LYc8wQfHIEWbcvXUAnZODvXDAMdjVkfEflKmTJtQtMDXbRlVImKBUOt8QS3q9wnW69joog5Uqrl6/w1jbjz6zR3lc3Xaftr1507YY9gxjEmuvVEfADKM+M9psmwYPSheu6SiXJO6eJYKKaLT7Z6faK1yla1X30Ci07d2+5S2fHFfs5LbtdVsE50XXYthTxm+RsnGQgI79larRwAGTOKQuXNMG4I2hYYm7Z4mg+aJNzh+NCp22VQMydEHHqLS9YdsAPrTlKKyoXopdo10YZcKu0S6sqF6KjaMLMcKcK10Th+grEhU9PT3c39+f+PNmjcbcW8DxRnRlzgtWP6Kd8WhSYGTyfLrnaCNgVCGVLDYGI6KtzNyTxnMnru2gBlMptZ7NmrZ159eRRV0D5tqWAqUUMZma5EU1kCBMcy6TEmvdLa3KqPsdL0TIfZcDW28HeASgduDkS4Bzb3T2BRXRJDhQw0vWtB1Wp7brWgx7ioRdpIyir0tQibWuI56uRbDtk2Yyz32XA/2e3mM8Mva3a9wzSNa0XTRdi2FPEZOpSY003fvCEJ3ndOHJ3XXj8tztRWvlmzhbb9dvz7Bhz5q2i6ZrWTxNEdOpSUmim/z+hd4Tw02EF6KBNQ3fdNszQta0XTRdi8eeIra1zI37bkFQQO1qI06ttZyIG5u0nUddi2FPmayJKtHujxYPC06Mky+pj7F7twOZfg+zpO2i6VoMu0Xo+k1HSdhshqZJaWSYdbhxdFVWTE7eQ9F19Ihht4SkPI7E2gkkNEg5F5x7o3qhVPce3vNx598WvI+i63iQxVNLSKoxl1GXvChIaWRYrtC9VzxS3+Uxw4iu40EMuyUk5XEkls2Q0iDlXOH3XiU8QKNZRNfxIIbdEpLyOHRpYZEvMKXcxyQXLF4JtKkbtgGw4u6nELoGgKE3Er2DymeMPQOr0lHTasl1GBLJZohqWHCRaNT1MWcBfsNYLLj7ya2uH7gKqOwb217Zl+giav4Me0ZWpaPGprxgY1LqY5J5DhnwnWN57OXpwMHXgNFai+X9O4H+b0A7GcmSu5/c6vrha+sNO+DYpAeuEsPeFBlZlY6DLOUFCzHR6Ji4xUmNRs
LZqT9PCgM0miWXutaFwSr7nP/jwg2zbnWKeEZWpQVhHPddDlwzHVjV4fy+7/Lxx5iMswuiY7Y1Rj23+IXBEljUzpbHHkUYpWOWehq7BfHGZkiiuEOIANMuja06IJaEYEywWtuLVwLr/1a9b/9Ox2mNcW0pWx570HBeEwqUbbFh2wCWf/+JuqlGy7//hPXTX3KJX5dGL804IB2zkeYM0zj43IYduOyuX9o7sWten7MuooRqziePOa8RZ8xky7BHEUaZ1+eIO4dib2TVxqdQbZiAUR1lrNr4VEpXJGjx69LoDT0ec5Y6XU5Hx2xndmmKM0yjZsO2Ady5+Q/KGahRFy7FyjnXK/4vFdNdY6g5yFYoJqowSkGyLQYr1VDbhRTRdWkExjS/fyfwxHeAt10M/OYh/6wYILd3omsefFa7LGzVZCNVSq/KvgGRrwFmy2MvUBhFKBhuN8YgqpWx2PsFXwM+vw9YtR+46nfAX3xorF0vtTtfADl0YPyMt3WTjeb11d9RdcxWHxfxGmAkhp2IziaiZ4noOSJa0fSJChRGiYJpk9VVh7rtQngi0/a5NwI9H603zH40xl63r3O8edfr5xHnbwv6wYRFZ7wJsH6yUVLOK7Fi3l+oExC1A/g1gDMB7AKwBcBFzPy07jGJT3LPKRu2DWD5D55AdWTs/7DUTljz12+zJ3sgBkwnuRucJ15t33SC/tbcS8dspyRdlcvuxthzRGPHR8Ax6h84bQ6+0HtiehcWFS1UxptqO4oY+9sBPMfMv6098fcAnA9AK34hGnJZtZct4tX24pX16b06/Ix/DuszcqvrRoN+wa2xRSOiMOzdALzK2wXg1MaDiGgZgGUAMGfOnAietoEI+sPYmDeby6q97BCvtusW1ww8dxWGsVnbtJ07XSfc6iSxxVNmvpWZe5i5Z8aMGdGe3H3TWsgNdW//rM2bFVKjJW27i2sXfC1cmiNgHJsVbWeAKGp0QhCFYR8A4F3qnVXblhwRvGm6hv/X/PApLFj9CI5ecT8WrH5EPgzFIj5tN7bOADyJAwZQu3FigWg7AyTc6iQKw74FwDFEdDQRTQTwfgAbIzivORG8aboUq1cOVMXTKS7xaFt3hwmYe+88anwLL9rOAAkP4Gg5xs7Mw0T0CQAPAmgH8A1mTrb0UZP4/xK6cPqK+9E5uQRmYH+lqo0vzuwsY8Cg+CGWAbhCJolN20EdSA1i7y+hC6etuB/tRBhhRrdP3Fy0nQFUC+Ux1uhEEmNn5h8x81uZ+S3M/MUozhkKRW5ohSfiS0PvBcPxTAYrVV/PRDU6S4dV1W9CS8SibZM7TJ/Yu6ttABippSv7edyi7YwwwfP/WJ4ea41OtipPm6WhsOklzMBV1UuxcXSh8nBVzwnV6KzOsrrQhwGJSQrNU55mvj2EtnW9VETbKeOG3rx1CMPxfoFmq1dMK3huYU9fcb/fCAIAas/Em2K1YdsArvmh/q7b9ZDcxwmCMSMHwx0fQts6j1u0nSIpDP/Jh8fegEk/Cb9j3PSwVw74N9OyrtuckD7b1zlVpCoqrwQ+PEjbQftF2ymQwvCfXBr2oJhi0LBcVXqYDolJCqHwS8E1yJDw07bJEGjRdgoknBED5NSwN8YUp00uobNcOhRfvO6CE31vMcMI2rpuc0K6+HlpQRkS29eh99+X4On292PzpL/HeW2PoZ0IgJmuAdF2KqTQtTY/MfYGWilJNk0PM/GQBKEOXU/u8nT/eKunJJ0AHIE9WDvlm1i7dH6oOG1HuWTUr1+0HSGqvuwxjMPzklvD3iwbtg3gjYPD2v3tRBhltqLfhpBBdPnM51zv/7gWF+DcXjE6oz651IZpUw6zppeMdSQ8/EcMuwdVu9BGRpnxu9XvSfCqhFzRrPfWwgKcia4r1VE8vWJR4LkEOyiUYffrcLdh2wCuWPfEoYIPHRJ3FFqmGe8tYGykTtui62JSGMPe6LV4c3UB4Or1OwLFL3FHITV8StJ12u5/YR/u3jogui
4ghTHsug53bq5uUAqYXy8OIQEi6LdvLe5rr1bGhlt3zD70HqxZ/YhS2999fGegURddZ4AYtF0Yw65L8wpK/yqX2o3SyIQYSXhIQaZofO08MpYqV3vtOg37GXXRdUaISdu5y2PfsG1A2WNaF0Oc2VnW7msnEvFngYSHFGQKzWt/af1nArXt5rirtouuM0JM2s6VYfebFKOq2HNji7p9/7uv2EOhM0MKJdmZQfMa/xvvDdT2RafOFl1nnZi0natQjF8cfVMtlctv7mMzMyFtmyVpJQEZIblG89p38+FG2u45anrT+hRtJ0BM2s6VYQ+Ko/tVozZTqeqXaSMfgAhJeEhBaqgW0RSv/QBPxA3D9fF1nX6brcAWbSdETNq22rA3ehSdk0vKrnVx5ej63SGI+CMkhZLsxFEsolXWfwIrhj6KaZM/hmW4A0fwy9jNh+OG4b5D/dhF25YTk7atNewqj6LURii1E6ojY9kAceboNptpIzRBwiXZiaNYRCvjIJZPWIeFr6/FnW2nAgTRdh6JQdvWLp6qPIrqKGPKxAl1k2LiXP33y7QRhFBoFstm0ssAktd2h2bCkmjbDqz12HWew/5KFb/8/FmJXMMZx83AHZv/oNwuCKHwWSR1SUrbG7YN4I2h8Y3wSm0kFaqWYK3HngVv+dFn9oTaLghaFD27vYukQHLaXvPgs3UhH5epkyZIfN0SrPXYly85dlzHujhijn4pXxKHFCLDs4jG+3dhNx+O66tji6RJalun38GAcXpCdrDWsLvGNc4826CUL91ADolDCk1RW0QjAFu2DWDrg8+CUtC26Np+rDXsQGtTkkwISvlK6q5BKB5palt0bT9WG/aoCHtL6i0KAeK9axAEJYYdAZvRtujafgpv2Fu9JY3bsxKEcRh2BGxF26Jru7E2KyYqgm5JdY3DdF0kBSF2DDsCiraLS+E99mZuSQFIHw0hWsIMWzDsCCjaLi6FN+xhb0l1MySlj4bQNGGHLeg6Apan1f0ZRttuLF51vGjbPgofivG7JW3EjVnqJtO4HpLcygqhCDtsYfFKoE1R8j/0uvMlUcNU2945BjpE23ZRGMOuE2Tv/G5cd8GJRj04VDFLLzM7y77DPgRBSdhhC/P6gMPeNH63lbS7AAAODklEQVT7yBAOPDDW7tVU20G6BkTbttFSKIaI3gtgFYA/A/B2Zu6P4qKiRpUdsPz7T+CaHz6FwQNV43Quv4pS1xOSdqf5IFFtNzNsofKKcvOkAy9h/rUP1enaHcShw2Tur2jbLlr12J8EcAGAn0VwLbGh6wT5yoFqKM/DZDaqtBnIDclpW9EnJnDYgsbo7+bDI9M1UO/li7btoSXDzsy/YuZno7qYuDARnut5+GEyGzULzcmE1klU2/P6gKVrgY7ZAMj5vXStf4/uxStxgCfWbWpsGga0puub33cSNq1YJNq2kMSyYohoGYBlADBnzpyknhaAPjugEdUXQGPl3oUnd+PRZ/ZoK/KkHLt4RKLtsMMW5vXhho1P4dKhOzCTxk9W8hKFrgHRtk0EGnYi+gmAIxS7PsvM95o+ETPfCuBWAOjp6VGnlcSESpAqGj0PVWz+ri07MWWi/m2Tcmx7sF3bJ71nGc5cfzoqQ63r+u6tA3XG3fXyvboVbdtDoGFn5nclcSFx0ijISaU2VKqjdceoPA9lbH6EMVhx2pfqijekHNsObNd2o647yiW8dnAYI6Nj3y2l9vHDMXSLoHdu/gPcR4q27aYw6Y6987uxacUi3PS+kwBQ3T4CcOHJ4wUbVWxeEOLC1fXvVr8Hq847fvwHWnH/oNN146GibXtpybAT0V8R0S4ApwO4n4gejOay4kPlrTDUU49MF4VM4veCXdiq7epovXmujvI44xxmsVO0bSctLZ4y8z0A7onoWkLhN9nID5OULW95NUHp9IyjMXdYblftJo/a9tO1n87nrrgf3aJrq7CyV0zQZCM/dBkyHeUSFqx+ZJzoGWOinza5hFc048Hc7d
I0SWgFnba7d96HU57/im+TMJ2224gwd8X9Wl13d5ZxxnEz6mLsjYiu7cLKGLtfBVwQqpzdUhvhjaHhQx+KRnG74t+20mxCvMQmhWZRafvMkZ/ihF/8Y606lceahHn6wgBqbQM41NtIp+tNKxbhC70nBt6Ziq7twUrD3koFnKp/xtRJE5RT2VXn7jaMT0o1ntAMKt1cOWEdyjhYv1HRJKxR2+1UnyQQ9Hwm2hZd24GVhr3VCjhvJsGmFYuMpq+3EWHDtgGtV9TstQiCF5VuZtJe9cGKJmFebY9qupB6YeBQU7zlS45Fqd3/y0B0bQdWGvYwrXZNMBHrCPOhGOOFJ3fDT/5SjSc0i0rbL6JLfbBfkzCEy+q6ev0O9L+wzzdTQHRtD1Ya9jCtdk0w9cLdGOOjz+zR6r/VaxGKjUrbu0++MnyTMKh1rXNIKtURfPfxnePSJV1E13ZhZVYMEG0FnKpUWpe/6xdjJACbViw61Ptdyq6FZhiv7UXA3Gnmo/M85wHGtwC47K5fKh0T3QAZV9dA86mYQrJYa9iBaEXW+GFyUx8bcW9vdftaScUUBECn65BNwmqoHCDdCLx2IqVxdzUv2rYHK0MxAGKf5uIXx/fb10oqpiAkMaVIp9+LTp3tu3Yl2rYHaz32uKe5mHSy8zZfIoL2FheQNDHBDCNdb18XOizjxU/bPUdNV+pa5+UDou0sYq1hj3Kaiy6k4xfHd/c13p7qkDQxwYRAXW9f5xQnucOv3WIlQGncw2pbp2u/9hqi7exhbSgmqmkurd76mgwCljQxwZRAXT987ZhRd1EUKwGtaVvXLK8xq0a0nU2sNexR5bK3GjcMypKRNDEhDIG6VhQl6ba3om2/1r5RpRkL8WFtKAZAbWCGI9zOcgmrzjs+tMhaDenoUiPdHhyCEIbe+d3of2Efvvv4Towwo52oflZAx6xaz5gGFMVKrWhbdG03Vnrs7i2mt9PiweFRn0foaTWkE3UVrFBsNmwbwN1bBw6lHY4w4+6tA2Phk8UrjYuVWtG26NpurDTsUaZdtSrgqKtghWITqO15fcDStUDHbADk/F66Vrlw2oq2Rdd2Y2UoppVbTO+wAbcgo7NcwqRSm9GgDL8sA0FoFSNtzxtfrKTSdXdnuW5AtZ+2Rdf5wkrDrov/Bd1iNqZwube7g5UqSu2EjnJJO6Fd9XipvBOiphlt63Q9MFjBXVt2YspE/4+56Dp/WBmKCbrFdHu1HL3i/kMtSQH/1MTqCGOwUvVNC5PKOyFu/LQtuhZMsdJj96uc8/M+whQvqapYdY+Xgb9CVOi0DSAVXW/YNiBeu4VYadgBfXdHP+/Dr2ujikbB6x5PgHwAhMhQaXvB6kcS1zUACclYipWhGD/8Fp9M+667uN0a3dvfA0PDyuMYkNtWIVbi1nWpTd2pXUIydpI7w+6Xu+tN4QKCZ0KecdyMupJsb958I9IISYiT2HXt8xDRtn3kzrAHLaz2zu/G8iXHoruzrB0sAADTJpfw6DN7AvvAuEgjJCFOTHS9acUi3Py+k/Dmsj7CqtN1dYS1XwiibfvInWEPKqzwNkbSQQA+v/R4Y09FKvKEuDEpGFJVZHsJ0vUIs1Sb5gRrF0/98CusMOnGyLVzrNr4FAYr4z8k5VIbpk85TMaDCYkSVDAUpO0gXbv9lmT0nf3k0rD7YeKFu7FKXahyUqldGiEJmSNI20G6Jop2lrCQHrkLxQQRFC/03noOam5pddsFIU38tC26LhaFM+yqRSjXgWmMW0Y1zEMQkuDmP/8NNh32Kfz2sIvx2MRP4by2xwA4IRbRdbEoXChG1e/6olNn4wu9J447dvmSY8eNvZPFJCGTbF+HU3Z8HiAnHDOL9mJ16TYsPKoLfR+5ou5Q0XX+KZzHHtjv2oO0LhWsQTEybzINYcELt4zTtug6/7TksRPRGgBLAQwBeB7Af2fmwSgurCkMprcbTYH3IItJxcQ6bWtG5s3EXpx27zuB9i/VHS+6zjeteu
w/BnACM88D8GsAV7d+SU3iTm/fvxMAj01v376u7rBWR+EJhcEubStG4wFOpssR2KP8LAj5pSXDzswPMbPbQGUzALW6ksBwerssHAkmWKftxStRwWH6cyg+C0J+iTLG/hEAD+h2EtEyIuonov49e/ZE+LQ1DKe3yyxHoQmyr+15fXjyL/4JA9wFbacM3XmE3BFo2InoJ0T0pOLnfM8xnwUwDOBO3XmY+VZm7mHmnhkzZkRz9V40t6KN22XhSHDJm7ZPOe9j2NL7M/wnaa5Bdx4hdwQunjLzu/z2E9ElAM4FsJjZp6tW3Cxe6cQRvbesmuntsnAkAJZrG+TE2m86oW4htXd+t7NQavhZEPJJq1kxZwO4EsA7mflANJfUJO6Kf0BWjCCYkF1t74RTUlf7nnEXUr3HyWeh8FArjggRPQfgMAAv1zZtZuaPBz2up6eH+/v7m35eQfCDiLYyc0+L58imtm86oWbcG+iYDVz2ZHzPK2QCU2235LEz85+28nhByCqZ1bZhkoBQbApXeSoIVmO4kCoUGzHsgmATi1c6C6FeZGFUaEAMuyDYxLw+YOlaJ6YOcn4vXSsLo0IdhevuKAjWM69PbcgNeiUJxUAMuyDkAbefjJu7rkqDFAqDGHZByANB/WTEky8UYtgFIQ9o0yB3iidfQGTxVBDygC7dkdqNup4K+UIMuyDkAV0aJI+oj1dVrwq5QQy7IOQBVRrk2y7WH0/t+n2C9UiMXRDygjcN0s2S0aHz5IVcIB67IOQRVZaMl47ZyV2LkDhi2AUhjwQ1BTvmrGSuQ0gFMexetq9z2qKu6nR+y/BfwVaCmoJt+1fRd44Rw+5iMgleEGxh8Uo4Azk0jAwBD1yV2OUIySKG3cVkErwg2MK8PqDnI/A17pV9iV2OkCxi2F1kgIGQN869Ebjg1rSvQkgBMewuMsBAyCPz+oDydPU+3XbBesSwu8gAAyGvnHM90Faq39ZWcrYLuUQMu4sMMBDyyrw+oPeWem333iLazjFSeepFN8BAEGxHtF0oxGMXBEHIGWLYBUEQcoYYdkEQhJwhhl0QBCFniGEXBEHIGWLYBUEQcoYYdkEQhJxBzJz8kxLtAfBCiId0Adgb0+UkTZ5eC5DN13MUM89I44lDajuL710ryOuJHyNtp2LYw0JE/czck/Z1REGeXguQv9eTJHl77+T1ZAcJxQiCIOQMMeyCIAg5wxbDnqem0nl6LUD+Xk+S5O29k9eTEayIsQuCIAjm2OKxC4IgCIaIYRcEQcgZVhh2IlpDRM8Q0XYiuoeIOtO+pmYgorOJ6Fkieo6IVqR9Pa1ARLOJ6FEiepqIniKiv0/7mmxEtJ098qBtK2LsRHQWgEeYeZiIrgcAZr4q5csKBRG1A/g1gDMB7AKwBcBFzPx0qhfWJER0JIAjmfkXRPQmAFsB9Nr6etJCtJ098qBtKzx2Zn6ImYdrf24GYOOE6bcDeI6Zf8vMQwC+B+D8lK+paZj5RWb+Re3frwH4FYDudK/KPkTb2SMP2rbCsDfwEQAPpH0RTdANYKfn712wTCw6iGgugPkAHk/3SqxHtJ0xbNV2ZmaeEtFPAByh2PVZZr63dsxnAQwDuDPJaxP0ENFUAHcD+DQzv5r29WQR0bad2KztzBh2Zn6X334iugTAuQAWsw0LA+MZADDb8/es2jZrIaISHOHfyczr076erCLatg/btW3L4unZAG4E8E5m3pP29TQDEU2As8C0GI7otwC4mJmfSvXCmoSICMC3AOxj5k+nfT22ItrOHnnQti2G/TkAhwF4ubZpMzN/PMVLagoiejeAmwG0A/gGM38x5UtqGiJaCOA/AOwAMFrb/Blm/lF6V2Ufou3skQdtW2HYBUEQBHNszIoRBEEQfBDDLgiCkDPEsAuCIOQMMeyCIAg5Qwy7IAhCzhDDLgiCkDPEsAuCIOSM/w9qWpnmcrI+iAAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from imblearn.over_sampling import SMOTE\n",
"\n",
"np.random.seed(1)\n",
"\n",
"# Example imbalanced data to be fixed by SMOTE.\n",
"# Positive class points surround two clusters\n",
"# of negative data points.\n",
"X = np.random.randn(200, 2)\n",
"# Positive class: points outside the unit circle\n",
"y = np.sum(X ** 2, axis=-1) > 1\n",
"# Shift every point by (+1, +1) or (-1, -1) to form two clusters\n",
"I = 2*(np.random.rand(len(X)) > 0.5)-1\n",
"X = X + I[:, None]\n",
"\n",
"# Make the data imbalanced: keep all negative points, but only\n",
"# the positive points whose index is a multiple of 10\n",
"I = np.copy(y)\n",
"I[::10] = False\n",
"I = ~I\n",
"X = X[I]\n",
"y = y[I]\n",
"\n",
"print('Positive class instances: %s' % np.sum(y))\n",
"print('Negative class instances: %s' % np.sum(~y))\n",
"\n",
"# resample\n",
"smote = SMOTE()\n",
"Xr, yr = smote.fit_resample(X, y)\n",
"\n",
"# visualize results\n",
"def plot_data(X, y, title):\n",
"    plt.title(title)\n",
"    plt.scatter(X[~y, 0], X[~y, 1])\n",
"    plt.scatter(X[y, 0], X[y, 1])\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"plot_data(X, y, 'Original')\n",
"plt.subplot(1, 2, 2)\n",
"plot_data(Xr, yr, 'SMOTE')\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}