@GabrielCzar
Last active June 25, 2018 03:36
Machine Learning Final Project - Salary Prediction Based on Any UK Job Ad
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "BsX4l60fEwAa"
},
"source": [
"# Job Salary Prediction\n",
"_Predict the salary of any UK job ad based on its contents_\n",
"\n",
"### Job Data\n",
"\n",
"- **Id**: Identificador para cada job.\n",
"\n",
"- **Title**: Texto livre com o titulo ou resumo da vaga.\n",
"\n",
"- **FullDescription**: Descrição da vaga sem qualquer informação salarial.\n",
"\n",
"- **LocationRaw**: Localização da vaga em texto livre.\n",
"\n",
"- **LocationNormalized**: Localização aproximada a partir da convesao do texto livre.\n",
"\n",
"- **ContractType**: full_time ou part_time.\n",
"\n",
"- **ContractTime**: permanent or contract.\n",
"\n",
"- **Company**: Nome da empresa.\n",
"\n",
"- **Category**: Qual das 30 categorias de trabalho padrão esse anúncio se encaixa, inferida de uma maneira muito confusa com base na origem da origem do anúncio. Sabemos que há muito barulho e erro nesse campo.\n",
"\n",
"- **SalaryRaw**: Descrição salarial em texto livre.\n",
"\n",
"- **SalaryNormalised**: Salario bruto anual. Valor que estamos tentando prever.\n",
"\n",
"- **SourceName**: Nome do site ou anunciante da vaga.\n",
"\n",
"### Location Tree\n",
"\n",
"Este é um conjunto de dados suplementares que descreve o relacionamento hierárquico entre os diferentes locais normalizados mostrados nos dados do trabalho. É provável que existam relações significativas entre os salários dos empregos em uma área geográfica semelhante, por exemplo, os salários médios em Londres e no Sudeste são mais altos do que no resto do Reino Unido.\n",
"\n",
"### Saida\n",
"\n",
"\n",
" Id,SalaryNormalized\n",
" 13656201,36205\n",
" 14663195,74570\n",
" 16530664,31910.50\n",
" ... \n",
" \n",
"### Sizes\n",
"\n",
"- Train:\n",
" - 421M \n",
" - 244768 entries\n",
"- Test: \n",
" - 206M\n",
" - 122463 entries\n",
" \n",
"### Problema\n",
"- Regressão Linear\n",
" - Determinar os salarios a partir de anúncios\n",
" \n",
"### Métricas\n",
"- Mean Squared Error – MSE\n",
"- Mean Absolute Error – MAE"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "gx6OgD6XEwAd"
},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1333,
"status": "ok",
"timestamp": 1529805951283,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "AA0eH24pEwAe",
"outputId": "7b086481-a9af-48e5-f9dc-bab2a63cb8c6"
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.svm import SVR\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"from sklearn.model_selection import KFold, cross_validate\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"from sklearn.linear_model import LinearRegression, LogisticRegression\n",
"from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA\n",
"from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "cLl3SdOOEwAh"
},
"source": [
"## Dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1837,
"status": "ok",
"timestamp": 1529800925284,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "jJyRejGsGJ78",
"outputId": "f37fe0c7-41b3-4850-ae3b-f457e006c051"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 811M\r\n",
"drwxr-xr-x 4 unknown unknown 4,0K jun 24 16:18 .\r\n",
"drwxr-xr-x 7 unknown unknown 4,0K jun 19 01:34 ..\r\n",
"drwxr-xr-x 8 unknown unknown 4,0K jun 19 01:37 .git\r\n",
"-rw-r--r-- 1 unknown unknown 19 jun 17 18:09 .gitignore\r\n",
"drwxr-xr-x 2 unknown unknown 4,0K jun 23 23:38 .ipynb_checkpoints\r\n",
"-rw-r--r-- 1 unknown unknown 108K jun 24 16:18 Job Salary Prediction.ipynb\r\n",
"-rw-r--r-- 1 unknown unknown 111K jun 24 01:47 Job_Salary_Prediction__v1.ipynb\r\n",
"-rw-r--r-- 1 unknown unknown 108K jun 24 04:30 Job_Salary_Prediction__v2.ipynb\r\n",
"-rw-r--r-- 1 unknown unknown 161K jun 18 02:07 List_12__Clustering.ipynb\r\n",
"-rw-r--r-- 1 unknown unknown 376K jun 19 01:31 List_13__Clusterization_Hierarchical.ipynb\r\n",
"-rw-r--r-- 1 unknown unknown 216 jun 18 01:16 README.md\r\n",
"-rw-r--r-- 1 unknown unknown 206M fev 21 2013 Test_rev1.csv\r\n",
"-rw-r--r-- 1 unknown unknown 62M jun 23 23:55 Test_rev1.zip\r\n",
"-rw-r--r-- 1 unknown unknown 421M fev 21 2013 Train_rev1.csv\r\n",
"-rw-r--r-- 1 unknown unknown 123M jun 23 23:49 Train_rev1.zip\r\n"
]
}
],
"source": [
"!ls -lha"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df_job_data = pd.read_csv('Train_rev1.csv')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"df_test_rev1 = pd.read_csv('Test_rev1.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "PRC1vAQrEwAr"
},
"source": [
"## Informações"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "bQe_4GOHEwAr"
},
"source": [
"### Job Data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 216
},
"colab_type": "code",
"executionInfo": {
"elapsed": 3211,
"status": "ok",
"timestamp": 1529801029609,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "GozC7kFvEwAs",
"outputId": "825320fc-27e0-4ece-8a9e-773c6e668fbb"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Title</th>\n",
" <th>FullDescription</th>\n",
" <th>LocationRaw</th>\n",
" <th>LocationNormalized</th>\n",
" <th>ContractType</th>\n",
" <th>ContractTime</th>\n",
" <th>Company</th>\n",
" <th>Category</th>\n",
" <th>SalaryRaw</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>Engineering Systems Analyst</td>\n",
" <td>Engineering Systems Analyst Dorking Surrey Sal...</td>\n",
" <td>Dorking, Surrey, Surrey</td>\n",
" <td>Dorking</td>\n",
" <td>NaN</td>\n",
" <td>permanent</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>Engineering Jobs</td>\n",
" <td>20000 - 30000/annum 20-30K</td>\n",
" <td>25000</td>\n",
" <td>cv-library.co.uk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>Stress Engineer Glasgow</td>\n",
" <td>Stress Engineer Glasgow Salary **** to **** We...</td>\n",
" <td>Glasgow, Scotland, Scotland</td>\n",
" <td>Glasgow</td>\n",
" <td>NaN</td>\n",
" <td>permanent</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>Engineering Jobs</td>\n",
" <td>25000 - 35000/annum 25-35K</td>\n",
" <td>30000</td>\n",
" <td>cv-library.co.uk</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Title \\\n",
"0 12612628 Engineering Systems Analyst \n",
"1 12612830 Stress Engineer Glasgow \n",
"\n",
" FullDescription \\\n",
"0 Engineering Systems Analyst Dorking Surrey Sal... \n",
"1 Stress Engineer Glasgow Salary **** to **** We... \n",
"\n",
" LocationRaw LocationNormalized ContractType ContractTime \\\n",
"0 Dorking, Surrey, Surrey Dorking NaN permanent \n",
"1 Glasgow, Scotland, Scotland Glasgow NaN permanent \n",
"\n",
" Company Category SalaryRaw \\\n",
"0 Gregory Martin International Engineering Jobs 20000 - 30000/annum 20-30K \n",
"1 Gregory Martin International Engineering Jobs 25000 - 35000/annum 25-35K \n",
"\n",
" SalaryNormalized SourceName \n",
"0 25000 cv-library.co.uk \n",
"1 30000 cv-library.co.uk "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_job_data.head(n=2)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 306
},
"colab_type": "code",
"executionInfo": {
"elapsed": 5385,
"status": "ok",
"timestamp": 1529801041548,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "Hvfq0CKlEwA0",
"outputId": "8184344d-3cb5-43bb-fb11-c8fbbc94eb5a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 244768 entries, 0 to 244767\n",
"Data columns (total 12 columns):\n",
"Id 244768 non-null int64\n",
"Title 244767 non-null object\n",
"FullDescription 244768 non-null object\n",
"LocationRaw 244768 non-null object\n",
"LocationNormalized 244768 non-null object\n",
"ContractType 65442 non-null object\n",
"ContractTime 180863 non-null object\n",
"Company 212338 non-null object\n",
"Category 244768 non-null object\n",
"SalaryRaw 244768 non-null object\n",
"SalaryNormalized 244768 non-null int64\n",
"SourceName 244767 non-null object\n",
"dtypes: int64(2), object(10)\n",
"memory usage: 22.4+ MB\n"
]
}
],
"source": [
"df_job_data.info()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f63273053c8>]],\n",
" dtype=object)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0sAAAF1CAYAAAA5q3GCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3X20ZWV9J/jvr6mgRkRAkxoDJIVp8kLLdII1Sl5MF8GloHawZ7QHYwdI20OPrYnpwRnLdiXYRic6PeaFads0aWnxJZZKkpERHMKY1GSlRxEwRiSolAS1gGCUFy0l2mX/5o+zyxyuz60q7q17blH1+ax11t3n2c9+9nN+d9e593v3PruquwMAAMCD/Z31ngAAAMDBSFgCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgA4YKrq9qp6+nrPYz1U1aur6h3T8vdW1a6qOuIA7+OwrS/AehCWAPg2VfWTVfX/VdX9VXVPVf2nqvpv1nlOt1fV3VX16Lm2f1ZV29dxWkPd/bnuPqq7v7necwFg5YQlAB6kqo5O8v4k/0eS45Icn+RfJ/n6Gu5zw3523ZDkZQdgf1VVfgYCsFd+UACw1A8kSXe/q7u/2d0PdPcfdvfHq+r7q+qPqupLVfXFqnpnVR0zGqSqnlJVH6qq+6rqrqr6t1V15Nz6rqqXVNWtSW6tqjdV1RuXjPF/VdUvzTX9myQv38s+f7yqrp/OiF1fVT8+t257Vb2uqv5Tkq8leeLU9trpLNquaX+Pm17Xl6cxNs2N8VtV9flp3Y1V9bRl5rFpen0bqurHprH3PP6mqm6f+v2dqtpaVZ+Zavqeqjpubpyfq6rPTuteNf52AbBWhCUAlvp0km9W1eVVdXZVHTu3rpL8WpLvSfLDSU5M8uplxvlmkn+Z5PFJfizJmUn+xZI+z03y1CSnJLk8yQv2nPGpqsdP27xrrv8NSbYnefnSnU0h46oklyR5XJJfT3JVVT1urtvPJbkwyWOSfHZqO3dqPz7J9yf5UJL/mNlZtVuSXDy3/fVJfmRa97tJ3ltVj1zm9SdJuvtD0yV5RyU5NsmH517TL041+AeZ1fTeJG+aXs8pSd48ze17ptd0wt72BcCBJSwB8CDd/eUkP5mkk/xOkr+uqiuramN37+jua7v7693915kFkn+wzDg3dveHu3t3d9+e5N8P+v5ad98znb36SJL7MwtIySzEbO/uu5ds8ytJfqGqvmtJ+7OT3Nrdb5/2+a4kn0zyD+f6vLW7b57W/+ep7T9292e6+/4kH0jyme7+f7p7d5L3JvnRudf0ju7+0rT9G5M8IskPLl/Nb3NJkq8m2XOW6J8neVV37+zur2cWPJ83XZb4vCTv7+4/mdb9cpL/8hD2BcAqCUsAfJvuvqW7L+juE5I8KbMzG79ZVd9dVduq6o6q+nKSd2R25ujbVNUPVNX7q+qvpr7/66Dv55c8vzzJP5mW/0mStw/m9onMPlO1dcmq78nfni3a47OZnTFabn9JMh/GHhg8P2rPk6q6qKpumS7zuy/JY7PM61+qqv55ki1Jfra794Se70vyB9Olivdldibrm0k2Tq/nW/Pt7q8m+dL+7AuAA0NYAmCvuvuTSd6aWWj6tczOOP3X3X10ZoGmltn0zZmd2Tl56vuvBn17yfN3JDmnqv5+Zpf5/Z/LjH1xkv8hDw5Cd2YWPuZ9b5I79rK//TZ9PukVSf5xkmO7+5jMzoQt9/qXbvurSc6ZzmDt8fkkZ3f3MXOPR3b3HUnuyuwyxz1jfGdml+IBsCDCEgAPUlU/NJ1BOWF6fmKSF2T2WZvHJNmV5L6qOj7J/7yXoR6T5MtJdlXVDyV58b723d07M/tc0NuT/F53P7BMvx1J3p3ZZ372uDrJD1TVz043VvjvM/ss1Pv3td/99Jgku5P8dZINVfUrSY7e10ZT/d6d5Lzu/vSS1b+d5HVV9X1T3++qqn
OmdVckec50G/cjk7wmfm4DLJQ3XQCW+kpmN124rqq+mllI+kSSizK7hfhpmZ1RuSrJ7+9lnJcn+dlpvN/JLDDsj8uTnJrBJXhLvCbJt/7Ppe7+UpLnTPP8UpL/JclzuvuL+7nffbkms880fTqzy/v+JuPL+pY6M8l/leSKuTvi3Tyt+60kVyb5w6r6Sma1fur0em5O8pLMbiRxV2Y3f9h5gF4LAPuhuld8RQIAHHBV9VOZXY63ae6zPQCwcM4sAXDQqKrvyOw/nf0PghIA601YAuCgUFU/nOS+JE9I8pvrPB0AcBkeAADAiDNLAAAAA8ISAADAwIb1nsCB9vjHP743bdq0qjG++tWv5tGPfvS+O7Jqar04ar04ar04ar04ar04ar04ar1YB1O9b7zxxi9293ftq98hF5Y2bdqUG264YVVjbN++PVu2bDkwE2Kv1Hpx1Hpx1Hpx1Hpx1Hpx1Hpx1HqxDqZ6V9Vn96efy/AAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABoQlAACAAWEJAABgQFgCAAAYEJYAAAAGNqz3BDg8bdp6VZLkolN354JpmZnbX//s9Z4CAABxZgkAAGBIWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABoQlAACAAWEJAABgQFgCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAb2GZaq6rKq+kJVfWKu7biquraqbp2+Hju1V1VdUlU7qurjVXXa3DbnT/1vrarz59qfXFU3TdtcUlW1t30AAAAswv6cWXprkrOWtG1N8sHuPjnJB6fnSXJ2kpOnx4VJ3pzMgk+Si5M8NclTklw8F37ePPXds91Z+9gHAADAmttnWOruP0lyz5Lmc5JcPi1fnuS5c+1v65kPJzmmqp6Q5JlJru3ue7r73iTXJjlrWnd0d3+ouzvJ25aMNdoHAADAmqtZRtlHp6pNSd7f3U+ant/X3cfMrb+3u4+tqvcneX13/+nU/sEkr0iyJckju/u1U/svJ3kgyfap/9On9qcleUV3P2e5fSwzvwszOzuVjRs3Pnnbtm0PqQhL7dq1K0cdddSqxmDvbrrj/iTJxkcldz+wzpM5yJx6/GPXZFzH9eKo9eKo9eKo9eKo9eKo9WIdTPU+44wzbuzuzfvqt+EA77cGbb2C9oekuy9NcmmSbN68ubds2fJQh3iQ7du3Z7VjsHcXbL0qSXLRqbvzxpsO9GH48Hb7C7esybiO68VR68VR68VR68VR68VR68V6ONZ7pXfDu3u6hC7T1y9M7TuTnDjX74Qkd+6j/YRB+972AQAAsOZWGpauTLLnjnbnJ3nfXPt5013xTk9yf3ffleSaJM+oqmOnGzs8I8k107qvVNXp013wzlsy1mgfAAAAa26f1z9V1bsy+8zR46tqZ2Z3tXt9kvdU1YuSfC7J86fuVyd5VpIdSb6W5OeTpLvvqapfTXL91O813b3nphEvzuyOe49K8oHpkb3sAwAAYM3tMyx19wuWWXXmoG8necky41yW5LJB+w1JnjRo/9JoHwAAAIuw0svwAAAADmnCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADCwYb0ncCjbtPWq9Z4CAACwQs4sAQAADKwqLFXVv6yqm6
vqE1X1rqp6ZFWdVFXXVdWtVfXuqjpy6vuI6fmOaf2muXFeObV/qqqeOdd+1tS2o6q2rmauAAAAD8WKw1JVHZ/kF5Ns7u4nJTkiyblJ3pDkN7r75CT3JnnRtMmLktzb3X83yW9M/VJVp0zb/b0kZyX5d1V1RFUdkeRNSc5OckqSF0x9AQAA1txqL8PbkORRVbUhyXcmuSvJTye5Ylp/eZLnTsvnTM8zrT+zqmpq39bdX+/uv0yyI8lTpseO7r6tu7+RZNvUFwAAYM2tOCx19x1J/vckn8ssJN2f5MYk93X37qnbziTHT8vHJ/n8tO3uqf/j5tuXbLNcOwAAwJpb8d3wqurYzM70nJTkviTvzeySuaV6zybLrFuufRTketCWqrowyYVJsnHjxmzfvn1vU9+nXbt2rXqMJLno1N377nSY2/godVrqQBx7IwfquGbf1Hpx1Hpx1Hpx1Hpx1HqxHo71Xs2tw5+e5C+7+6+TpKp+P8mPJzmmqjZMZ49OSHLn1H9nkhOT7Jwu23tsknvm2veY32a59gfp7kuTXJokmzdv7i1btqziZc1+WV3tGElygVuH79NFp+7OG29yB/t5t79wy5qMe6COa/ZNrRdHrRdHrRdHrRdHrRfr4Vjv1Xxm6XNJTq+q75w+e3Rmkr9I8sdJnjf1OT/J+6blK6fnmdb/UXf31H7udLe8k5KcnOQjSa5PcvJ0d70jM7sJxJWrmC8AAMB+W/Gf9Lv7uqq6IslHk+xO8meZnd25Ksm2qnrt1PaWaZO3JHl7Ve3I7IzSudM4N1fVezILWruTvKS7v5kkVfXSJNdkdqe9y7r75pXOFwAA4KFY1fVP3X1xkouXNN+W2Z3slvb9myTPX2ac1yV53aD96iRXr2aOAAAAK7HaW4cDAAAckoQlAACAAWEJAABgQFgCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABoQlAACAAWEJAABgQFgCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABoQlAACAAWEJAABgQFgCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABoQlAACAAWEJAABgQFgCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABoQlAACAgVWFpao6pqquqKpPVtUtVfVjVXVcVV1bVbdOX4+d+lZVXVJVO6rq41V12tw450/9b62q8+fan1xVN03bXFJVtZr5AgAA7K/Vnln6rST/d3f/UJK/n+SWJFuTfLC7T07ywel5kpyd5OTpcWGSNydJVR2X5OIkT03ylCQX7wlYU58L57Y7a5XzBQAA2C8rDktVdXSSn0ryliTp7m90931Jzkly+dTt8iTPnZbPSfK2nvlwkmOq6glJnpnk2u6+p7vvTXJtkrOmdUd394e6u5O8bW4sAACANVWzHLKCDat+JMmlSf4is7NKNyZ5WZI7uvuYuX73dvexVfX+JK/v7j+d2j+Y5BVJtiR5ZHe/dmr/5SQPJNk+9X/61P60JK/o7ucM5nJhZmegsnHjxidv27ZtRa9pj127duWoo45a1RhJctMd9696jEPdxkcldz+w3rM4uJx6/GPXZNwDdVyzb2q9OGq9OGq9OGq9OGq9WAdTvc8444wbu3vzvvptWMU+NiQ5LckvdPd1VfVb+dtL7kZGnzfqFbR/e2P3pZkFt2zevLm3bNmyl2ns2/bt27PaMZLkgq1XrXqMQ91Fp+7OG29azWF46Ln9hVvWZNwDdVyzb2q9OGq9OGq9OGq9OGq9WA/Heq/mM0s7k+zs7uum51dkFp7uni6hy/T1C3P9T5zb/oQkd+6j/YRBOwAAwJpbcVjq7r9K8vmq+sGp6czMLs
m7MsmeO9qdn+R90/KVSc6b7op3epL7u/uuJNckeUZVHTvd2OEZSa6Z1n2lqk6f7oJ33txYAAAAa2q11z/9QpJ3VtWRSW5L8vOZBbD3VNWLknwuyfOnvlcneVaSHUm+NvVNd99TVb+a5Pqp32u6+55p+cVJ3prkUUk+MD0AAADW3KrCUnd/LMnog1FnDvp2kpcsM85lSS4btN+Q5EmrmSMAAMBKrPb/WQIAADgkCUsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMrDosVdURVfVnVfX+6flJVXVdVd1aVe+uqiOn9kdMz3dM6zfNjfHKqf1TVfXMufazprYdVbV1tXMFAADYXwfizNLLktwy9/wNSX6ju09Ocm+SF03tL0pyb3f/3SS/MfVLVZ2S5Nwkfy/JWUn+3RTAjkjypiRnJzklyQumvgAAAGtuVWGpqk5I8uwk/2F6Xkl+OskVU5fLkzx3Wj5nep5p/ZlT/3OSbOvur3f3XybZkeQp02NHd9/W3d9Ism3qCwAAsOaqu1e+cdUVSX4tyWOSvDzJBUk+PJ09SlWdmOQD3f2kqvpEkrO6e+e07jNJnprk1dM275ja35LkA9Muzurufza1/1ySp3b3SwfzuDDJhUmycePGJ2/btm3FrylJdu3alaOOOmpVYyTJTXfcv+oxDnUbH5Xc/cB6z+Lgcurxj12TcQ/Ucc2+qfXiqPXiqPXiqPXiqPViHUz1PuOMM27s7s376rdhpTuoquck+UJ331hVW/Y0D7r2PtYt1z466zVMdt19aZJLk2Tz5s29ZcuWUbf9tn379qx2jCS5YOtVqx7jUHfRqbvzxptWfBgekm5/4ZY1GfdAHdfsm1ovjlovjlovjlovjlov1sOx3qv5LfUnkvxMVT0rySOTHJ3kN5McU1Ubunt3khOS3Dn135nkxCQ7q2pDkscmuWeufY/5bZZrBwAAWFMrDkvd/cokr0yS6czSy7v7hVX13iTPy+wzRucned+0yZXT8w9N6/+ou7uqrkzyu1X160m+J8nJST6S2Rmnk6vqpCR3ZHYTiJ9d6Xzh4WLTGp2RvOjU3Q/7s523v/7Z6z0FAOAwshbXP70iybaqem2SP0vylqn9LUneXlU7MjujdG6SdPfNVfWeJH+RZHeSl3T3N5Okql6a5JokRyS5rLtvXoP5AgAAfJsDEpa6e3uS7dPybZndyW5pn79J8vxltn9dktcN2q9OcvWBmCMAAMBDcSD+nyUAAIBDjrAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAwIrDUlWdWFV/XFW3VNXNVfWyqf24qrq2qm6dvh47tVdVXVJVO6rq41
V12txY50/9b62q8+fan1xVN03bXFJVtZoXCwAAsL9Wc2Zpd5KLuvuHk5ye5CVVdUqSrUk+2N0nJ/ng9DxJzk5y8vS4MMmbk1m4SnJxkqcmeUqSi/cErKnPhXPbnbWK+QIAAOy3FYel7r6ruz86LX8lyS1Jjk9yTpLLp26XJ3nutHxOkrf1zIeTHFNVT0jyzCTXdvc93X1vkmuTnDWtO7q7P9TdneRtc2MBAACsqQPymaWq2pTkR5Ncl2Rjd9+VzAJVku+euh2f5PNzm+2c2vbWvnPQDgAAsOY2rHaAqjoqye8l+aXu/vJePlY0WtEraB/N4cLMLtfLxo0bs3379n3Meu927dq16jGS5KJTd696jEPdxkep06IcCrU+EP8uF+FAvYewb2q9OGq9OGq9OGq9WA/Heq8qLFXVd2QWlN7Z3b8/Nd9dVU/o7rumS+m+MLXvTHLi3OYnJLlzat+ypH371H7CoP+36e5Lk1yaJJs3b+4tW7aMuu237du3Z7VjJMkFW69a9RiHuotO3Z033rTqzM5+OBRqffsLt6z3FPbLgXoPYd/UenHUenHUenHUerEejvVezd3wKslbktzS3b8+t+rKJHvuaHd+kvfNtZ833RXv9CT3T5fpXZPkGVV17HRjh2ckuWZa95WqOn3a13lzYwEAAKyp1fyZ+SeS/FySm6rqY1Pbv0ry+iTvqaoXJflckudP665O8qwkO5J8LcnPJ0l331NVv5rk+qnfa7r7nmn5xUnemuRRST4wPQAAANbcisNSd/9pxp8rSpIzB/07yUuWGeuyJJcN2m9I8qSVzhEAAGClDsjd8AAAAA41whIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwMCG9Z4AwP7atPWq9Z7Cfrno1N25YMFzvf31z17o/gDgcODMEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMCAsAQAADAgLAEAAAwISwAAAAPCEgAAwICwBAAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMbFjvCQCwepu2XrXeU1gXF526Oxfs47Xf/vpnL2g2ABxqnFkCAAAYEJYAAAAGhCUAAIABYQkAAGBAWAIAABgQlgAAAAaEJQAAgAFhCQAAYEBYAgAAGBCWAAAABg76sFRVZ1XVp6pqR1VtXe/5AAAAh4eDOixV1RFJ3pTk7CSnJHlBVZ2yvrMCAAAOBwd1WErylCQ7uvu27v5Gkm1JzlnnOQEAAIeBgz0sHZ/k83PPd05tAAAAa2rDek9gH2rQ1t/WqerCJBdOT3dV1adWud/HJ/niKsdgP/yiWi+MWi+OWi/O/tS63rCgyRz6HNeLo9aLo9aLdTDV+/v2p9PBHpZ2Jjlx7vkJSe5c2qm7L01y6YHaaVXd0N2bD9R4LE+tF0etF0etF0etF0etF0etF0etF+vhWO+D/TK865OcXFUnVdWRSc5NcuU6zwkAADgMHNRnlrp7d1W9NMk1SY5Icll337zO0wIAAA4DB3VYSpLuvjrJ1Qve7QG7pI99UuvFUevFUevFUevFUevFUevFUevFetjVu7q/7X4JAAAAh72D/TNLAAAA60JYWqKqzqqqT1XVjqraut7zeTioqhOr6o+r6paqurmqXja1v7qq7qiqj02PZ81t88qpxp+qqmfOtQ/rP93k47qqurWq3j3d8OOwVFW3V9VNU01vmNqOq6prp/pcW1XHTu1VVZdM9fx4VZ02N875U/9bq+r8ufYnT+PvmLYd3cL/kFdVPzh37H6sqr5cVb/kuD5wquqyqvpCVX1irm3Nj+Xl9nEoW6bW/6aqPjnV8w+q6pipfVNVPTB3jP/23DYPqaZ7+74dqpap9Zq/b1TVI6bnO6b1mxbzit
fPMrV+91ydb6+qj03tjusVquV/zzs83q+722N6ZHYTic8keWKSI5P8eZJT1nteB/sjyROSnDYtPybJp5OckuTVSV4+6H/KVNtHJDlpqvkRe6t/kvckOXda/u0kL17v172O9b49yeOXtP1vSbZOy1uTvGFaflaSD2T2f5adnuS6qf24JLdNX4+dlo+d1n0kyY9N23wgydnr/ZrX+zEdm3+V2f/J4Lg+cHX9qSSnJfnEXNuaH8vL7eNQfixT62ck2TAtv2Gu1pvm+y0Z5yHVdLnv26H8WKbWa/6+keRfJPntafncJO9e71qsR62XrH9jkl+Zlh3XK6/zcr/nHRbv184sPdhTkuzo7tu6+xtJtiU5Z53ndNDr7ru6+6PT8leS3JLk+L1sck6Sbd399e7+yyQ7Mqv9sP7TXxd+OskV0/aXJ3nu2ryah61zMqtL8uD6nJPkbT3z4STHVNUTkjwzybXdfU9335vk2iRnTeuO7u4P9eyd6W1R6yQ5M8lnuvuze+njuH6IuvtPktyzpHkRx/Jy+zhkjWrd3X/Y3bunpx/O7P8yXNYKa7rc9+2QtcxxvZwD+b4x/z24IsmZe/46f6jaW62n1/6Pk7xrb2M4rvdtL7/nHRbv18LSgx2f5PNzz3dm77/0s8R02v9Hk1w3Nb10OgV72dyp0+XqvFz745LcN/dD/XD/vnSSP6yqG6vqwqltY3fflcze1JJ899T+UGt9/LS8tP1wd24e/APXcb12FnEsL7ePw9k/zeyvuXucVFV/VlX/b1U9bWpbSU39XP1ba/2+8a1tpvX3T/0PV09Lcnd33zrX5rhepSW/5x0W79fC0oON/gLjdoH7qaqOSvJ7SX6pu7+c5M1Jvj/JjyS5K7PT4cnydX6o7Yern+ju05KcneQlVfVTe+mr1qs0fR7gZ5K8d2pyXK8P9V0jVfWqJLuTvHNquivJ93b3jyb5n5L8blUdnZXV1PdhZhHvG2r9YC/Ig//I5bhepcHvect2HbQ9bN+vhaUH25nkxLnnJyS5c53m8rBSVd+R2T+gd3b37ydJd9/d3d/s7v+S5Hcyu6wgWb7Oy7V/MbNTuBuWtB+WuvvO6esXkvxBZnW9e88lANPXL0zdH2qtd+bBl+Ic1rUKjqSHAAACYklEQVSenJ3ko919d+K4XoBFHMvL7eOwM33A+jlJXjhd/pLpkrAvTcs3ZvbZmR/Iymrq52oW9r7xrW2m9Y/N/l8OeEiZXv9/m+Tde9oc16sz+j0vh8n7tbD0YNcnOblmd5o5MrNLb65c5zkd9Kbrgt+S5Jbu/vW59vnrd/9Rkj13q7kyybk1u3PPSUlOzuyDfcP6Tz/A/zjJ86btz0/yvrV8TQerqnp0VT1mz3JmH9D+RGY13XNXmfn6XJnkvOnONKcnuX86jX1NkmdU1bHT5SDPSHLNtO4rVXX69H09L4dprec86K+Tjus1t4hjebl9HFaq6qwkr0jyM939tbn276qqI6blJ2Z2LN+2wpou9307rCzofWP+e/C8JH+0JwAfhp6e5JPd/a1LuxzXK7fc73k5XN6v+yC4y8bB9MjsDh6fzuwvDq9a7/k8HB5JfjKz06UfT/Kx6fGsJG9PctPUfmWSJ8xt86qpxp/K3N3Wlqt/ZncE+khmH359b5JHrPfrXqdaPzGzuyL9eZKb99Qos+vSP5jk1unrcVN7JXnTVM+bkmyeG+ufTvXckeTn59o3Z/aD/DNJ/m2m/7z6cHwk+c4kX0ry2Lk2x/WBq++7Mrs05j9n9pfFFy3iWF5uH4fyY5la78js8wN73rf33Entv5veX/48yUeT/MOV1nRv37dD9bFMrdf8fSPJI6fnO6b1T1zvWqxHraf2tyb5H5f0dVyvvM7L/Z53WLxf75kIAAAAc1yGBwAAMCAsAQAADAhLAAAAA8ISAADAgLAEAAAwICwBAAAMCEsAAAADwhIAAMDA/w839LRpTGjj0QAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 1008x432 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df_job_data.hist(column='SalaryNormalized', figsize=(14,6))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>SalaryNormalized</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>2.447680e+05</td>\n",
" <td>244768.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>6.970142e+07</td>\n",
" <td>34122.577576</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>3.129813e+06</td>\n",
" <td>17640.543124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.261263e+07</td>\n",
" <td>5000.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>6.869550e+07</td>\n",
" <td>21500.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>6.993700e+07</td>\n",
" <td>30000.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>7.162606e+07</td>\n",
" <td>42500.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.270524e+07</td>\n",
" <td>200000.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id SalaryNormalized\n",
"count 2.447680e+05 244768.000000\n",
"mean 6.970142e+07 34122.577576\n",
"std 3.129813e+06 17640.543124\n",
"min 1.261263e+07 5000.000000\n",
"25% 6.869550e+07 21500.000000\n",
"50% 6.993700e+07 30000.000000\n",
"75% 7.162606e+07 42500.000000\n",
"max 7.270524e+07 200000.000000"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_job_data.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "O5pvNAk2EwA6"
},
"source": [
"### Test"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 162
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2533,
"status": "ok",
"timestamp": 1529801091483,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "eLuOUoyREwA6",
"outputId": "92b02f00-f266-44ec-8661-23fd9d420996"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Title</th>\n",
" <th>FullDescription</th>\n",
" <th>LocationRaw</th>\n",
" <th>LocationNormalized</th>\n",
" <th>ContractType</th>\n",
" <th>ContractTime</th>\n",
" <th>Company</th>\n",
" <th>Category</th>\n",
" <th>SourceName</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11888454</td>\n",
" <td>Business Development Manager</td>\n",
" <td>The Company: Our client is a national training...</td>\n",
" <td>Tyne Wear, North East</td>\n",
" <td>Newcastle Upon Tyne</td>\n",
" <td>NaN</td>\n",
" <td>permanent</td>\n",
" <td>Asset Appointments</td>\n",
" <td>Teaching Jobs</td>\n",
" <td>cv-library.co.uk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>11988350</td>\n",
" <td>Internal Account Manager</td>\n",
" <td>The Company: Founded in **** our client is a U...</td>\n",
" <td>Tyne and Wear, North East</td>\n",
" <td>Newcastle Upon Tyne</td>\n",
" <td>NaN</td>\n",
" <td>permanent</td>\n",
" <td>Asset Appointments</td>\n",
" <td>Consultancy Jobs</td>\n",
" <td>cv-library.co.uk</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Title \\\n",
"0 11888454 Business Development Manager \n",
"1 11988350 Internal Account Manager \n",
"\n",
" FullDescription \\\n",
"0 The Company: Our client is a national training... \n",
"1 The Company: Founded in **** our client is a U... \n",
"\n",
" LocationRaw LocationNormalized ContractType ContractTime \\\n",
"0 Tyne Wear, North East Newcastle Upon Tyne NaN permanent \n",
"1 Tyne and Wear, North East Newcastle Upon Tyne NaN permanent \n",
"\n",
" Company Category SourceName \n",
"0 Asset Appointments Teaching Jobs cv-library.co.uk \n",
"1 Asset Appointments Consultancy Jobs cv-library.co.uk "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test_rev1.head(n=2)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 272
},
"colab_type": "code",
"executionInfo": {
"elapsed": 741,
"status": "ok",
"timestamp": 1529801095157,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "23KJVhIKEwA_",
"outputId": "e92fd119-9ad8-4f17-f9f1-c2cf9736e848"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 122463 entries, 0 to 122462\n",
"Data columns (total 10 columns):\n",
"Id 122463 non-null int64\n",
"Title 122463 non-null object\n",
"FullDescription 122463 non-null object\n",
"LocationRaw 122463 non-null object\n",
"LocationNormalized 122463 non-null object\n",
"ContractType 33013 non-null object\n",
"ContractTime 90702 non-null object\n",
"Company 106202 non-null object\n",
"Category 122463 non-null object\n",
"SourceName 122463 non-null object\n",
"dtypes: int64(1), object(9)\n",
"memory usage: 9.3+ MB\n"
]
}
],
"source": [
"df_test_rev1.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "OBLh0PIxEwBF"
},
"source": [
"## Pré-processamento"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 955,
"status": "ok",
"timestamp": 1529801099122,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "aqPyCLOfEwBG",
"outputId": "e842d9b0-3453-4c2b-fe3f-6c43ffce7024"
},
"outputs": [],
"source": [
"def normalizeTextField(df, field):\n",
" vectorizer = CountVectorizer(max_features=100)\n",
" fields = vectorizer.fit_transform(df[field]).toarray()\n",
" # Generate field names\n",
" fcols = np.vectorize(lambda x: field + str(x))(np.arange(2))\n",
" # Reduz a dimensionalidade para 2 \n",
" pca = PCA(n_components = 2)\n",
" _df = pd.DataFrame(pca.fit_transform(fields), columns=fcols)\n",
" # Concatena o dataframe com o novo\n",
" df = pd.concat([df, _df], join ='inner', axis=1)\n",
" del df[field]\n",
" return df"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "OHzogV28EwBJ"
},
"source": [
"### SalaryRaw"
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1138,
"status": "ok",
"timestamp": 1529801103174,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "Tf_wd3knEwBK",
"outputId": "f9844ff9-02c1-4641-9498-8250636e6d09"
},
"outputs": [],
"source": [
"del df_job_data['SalaryRaw']"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "SUuskyQsEwBP"
},
"source": [
"### Remove ContractType"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "SkLiRtjpEwBP"
},
"source": [
"Grande quantidade de valores null"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2707,
"status": "ok",
"timestamp": 1529801107862,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "PwsCuAoyEwBQ",
"outputId": "f083dfd4-58a1-4bbc-efcf-cb7b42e6ae3d"
},
"outputs": [],
"source": [
"del df_job_data['ContractType']\n",
"del df_test_rev1['ContractType']"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "TuGX7DRrEwBW"
},
"source": [
"### Remove ContractTime"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 887,
"status": "ok",
"timestamp": 1529801110023,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "L7JlYf_dEwBY",
"outputId": "6f32ac6e-6608-4fa6-b977-8059eae0b64a"
},
"outputs": [],
"source": [
"del df_job_data['ContractTime']\n",
"del df_test_rev1['ContractTime']"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "3qa4BYJUEwBb"
},
"source": [
"### Removing Category"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 738,
"status": "ok",
"timestamp": 1529801113956,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "9QF_BvwYEwBe",
"outputId": "916c7390-15a2-4db6-f7d5-728b75d8028b"
},
"outputs": [],
"source": [
"del df_job_data['Category']\n",
"del df_test_rev1['Category']"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "iILrtpDxEwBi"
},
"source": [
"### Removing LocationRaw"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 963,
"status": "ok",
"timestamp": 1529801118238,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "XYyvZMbjEwBi",
"outputId": "261cae9d-276a-4549-bd7d-8dbbbd23d067"
},
"outputs": [],
"source": [
"del df_job_data['LocationRaw']\n",
"del df_test_rev1['LocationRaw']"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "uIOf83lkEwBm"
},
"source": [
"### Company"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [],
"source": [
"df_job_data['Company'] = df_job_data['Company'].fillna('NULL')\n",
"df_test_rev1['Company'] = df_test_rev1['Company'].fillna('NULL')"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Gregory Martin International', 'Indigo 21 Ltd',\n",
" 'Code Blue Recruitment', ..., 'Jobs North ',\n",
" 'National Army Museum', 'DMC Healthcare'], dtype=object)"
]
},
"execution_count": 208,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_job_data['Company'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(20813,)"
]
},
"execution_count": 210,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_job_data['Company'].unique().shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JeIbKYJCEwBz"
},
"source": [
"### Removing rows with NULL values"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 778,
"status": "ok",
"timestamp": 1529801127234,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "1goRZOL-EwB1",
"outputId": "e2b250f2-03d5-48f8-ddf4-08d0f7764ce8"
},
"outputs": [],
"source": [
"df_job_data.dropna(subset=['Title'], inplace = True)"
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 748,
"status": "ok",
"timestamp": 1529801129354,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "MEUOmmdmEwB4",
"outputId": "ced2f425-5e70-476e-cb98-17dfbcd74d78"
},
"outputs": [],
"source": [
"df_job_data.dropna(subset=['SourceName'], inplace = True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "FDBRPSj_EwB8"
},
"source": [
"### Extracting the label"
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 789,
"status": "ok",
"timestamp": 1529801134604,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "8LftYmQUEwB8",
"outputId": "c0d28223-86eb-4b0f-889a-c8eceddfe985"
},
"outputs": [],
"source": [
"y = df_job_data['SalaryNormalized'].values"
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 741,
"status": "ok",
"timestamp": 1529801137435,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "bHra0m_cEwCA",
"outputId": "8ef54773-01f3-4faa-a8e4-cae54008bf11"
},
"outputs": [
{
"data": {
"text/plain": [
"array([25000, 30000, 30000, ..., 22800, 22800, 42500])"
]
},
"execution_count": 214,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ztzrX_FOEwCE"
},
"source": [
"### Extracting the IDs"
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 820,
"status": "ok",
"timestamp": 1529801142077,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "iLsjAp48EwCF",
"outputId": "cedfc00b-981e-41c6-c794-5c6c0baed4eb"
},
"outputs": [],
"source": [
"idx_job = df_job_data['Id'].values"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1004,
"status": "ok",
"timestamp": 1529801144276,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "2WC4Yn_-EwCI",
"outputId": "07086f3f-25a5-4a5e-bb4a-a5b1ef476189"
},
"outputs": [
{
"data": {
"text/plain": [
"array([12612628, 12612830, 12612844, ..., 72705213, 72705216, 72705235])"
]
},
"execution_count": 216,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx_job"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 762,
"status": "ok",
"timestamp": 1529801146895,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "QpyY_VXBEwCM",
"outputId": "e9977d70-2ff9-42e6-c5b3-0a39274c318e"
},
"outputs": [],
"source": [
"idx_test = df_test_rev1['Id'].values"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 732,
"status": "ok",
"timestamp": 1529801149196,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "noj4zYiaEwCT",
"outputId": "74e041cb-20cc-4c37-8778-c3c51041d3d9"
},
"outputs": [
{
"data": {
"text/plain": [
"array([11888454, 11988350, 12612558, ..., 72705210, 72705214, 72705218])"
]
},
"execution_count": 218,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx_test"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "slLrezFsEwCZ"
},
"source": [
"### Combining the train and test data"
]
},
{
"cell_type": "code",
"execution_count": 219,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 765,
"status": "ok",
"timestamp": 1529801154922,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "c4kx1jyOEwCa",
"outputId": "55517387-f516-4921-f719-5cb541f8b58c"
},
"outputs": [
{
"data": {
"text/plain": [
"(244766, 7)"
]
},
"execution_count": 219,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_job_tuple = df_job_data.shape\n",
"df_job_tuple"
]
},
{
"cell_type": "code",
"execution_count": 220,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 736,
"status": "ok",
"timestamp": 1529801157401,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "pZ2qSLGwEwCf",
"outputId": "b6916215-2bd4-42fe-dcd1-b37875b26a1e"
},
"outputs": [
{
"data": {
"text/plain": [
"(122463, 6)"
]
},
"execution_count": 220,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test_tuple = df_test_rev1.shape\n",
"df_test_tuple"
]
},
{
"cell_type": "code",
"execution_count": 221,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 755,
"status": "ok",
"timestamp": 1529801161403,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "eRG7FgLoEwCl",
"outputId": "0186aee5-5356-4a00-9be1-55b35d7908e6"
},
"outputs": [],
"source": [
"df = pd.concat([df_job_data, df_test_rev1], sort=False)"
]
},
{
"cell_type": "code",
"execution_count": 222,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 779,
"status": "ok",
"timestamp": 1529801163957,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "c2ZxM7ThEwCo",
"outputId": "f206b2c9-60b8-4f31-b590-899ab1321e55"
},
"outputs": [
{
"data": {
"text/plain": [
"(367229, 7)"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "WUHN-XxLEwCv"
},
"source": [
"#### LocationNormalized"
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 3215,
"status": "ok",
"timestamp": 1529801169844,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "iQKRzS_DEwCv",
"outputId": "c008de64-1b60-46e4-8f64-536af8a08e0b"
},
"outputs": [],
"source": [
"df = normalizeTextField(df, 'LocationNormalized')"
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 745,
"status": "ok",
"timestamp": 1529801171609,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "q2FnUvsLEwC0",
"outputId": "cb9bef89-0efd-43f9-88f5-b2a0b1c3eafa"
},
"outputs": [
{
"data": {
"text/plain": [
"(367229, 8)"
]
},
"execution_count": 224,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Title</th>\n",
" <th>FullDescription</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>Engineering Systems Analyst</td>\n",
" <td>Engineering Systems Analyst Dorking Surrey Sal...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>Stress Engineer Glasgow</td>\n",
" <td>Stress Engineer Glasgow Salary **** to **** We...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>12612844</td>\n",
" <td>Modelling and simulation analyst</td>\n",
" <td>Mathematical Modeller / Simulation Analyst / O...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.120516</td>\n",
" <td>-0.241914</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>12613049</td>\n",
" <td>Engineering Systems Analyst / Mathematical Mod...</td>\n",
" <td>Engineering Systems Analyst / Mathematical Mod...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>27500.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>12613647</td>\n",
" <td>Pioneer, Miser Engineering Systems Analyst</td>\n",
" <td>Pioneer, Miser Engineering Systems Analyst Do...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Title \\\n",
"0 12612628 Engineering Systems Analyst \n",
"1 12612830 Stress Engineer Glasgow \n",
"2 12612844 Modelling and simulation analyst \n",
"3 12613049 Engineering Systems Analyst / Mathematical Mod... \n",
"4 12613647 Pioneer, Miser Engineering Systems Analyst \n",
"\n",
" FullDescription \\\n",
"0 Engineering Systems Analyst Dorking Surrey Sal... \n",
"1 Stress Engineer Glasgow Salary **** to **** We... \n",
"2 Mathematical Modeller / Simulation Analyst / O... \n",
"3 Engineering Systems Analyst / Mathematical Mod... \n",
"4 Pioneer, Miser Engineering Systems Analyst Do... \n",
"\n",
" Company SalaryNormalized SourceName \\\n",
"0 Gregory Martin International 25000.0 cv-library.co.uk \n",
"1 Gregory Martin International 30000.0 cv-library.co.uk \n",
"2 Gregory Martin International 30000.0 cv-library.co.uk \n",
"3 Gregory Martin International 27500.0 cv-library.co.uk \n",
"4 Gregory Martin International 25000.0 cv-library.co.uk \n",
"\n",
" LocationNormalized0 LocationNormalized1 \n",
"0 -0.116790 -0.229172 \n",
"1 -0.118995 -0.237572 \n",
"2 -0.120516 -0.241914 \n",
"3 -0.122604 -0.249312 \n",
"4 -0.122604 -0.249312 "
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "tL-laH_pEwC-"
},
"source": [
"#### Title"
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 4337,
"status": "ok",
"timestamp": 1529801179499,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "TpmwNKR_EwC_",
"outputId": "f83246de-8b37-4d08-8a4b-f69de188cfcc"
},
"outputs": [],
"source": [
"df = normalizeTextField(df, 'Title')"
]
},
{
"cell_type": "code",
"execution_count": 227,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 991,
"status": "ok",
"timestamp": 1529801182206,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "kB93el4PEwDC",
"outputId": "223e13d3-1c58-4d59-d82b-8c15f5517cc2"
},
"outputs": [
{
"data": {
"text/plain": [
"(367229, 9)"
]
},
"execution_count": 227,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>FullDescription</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>Engineering Systems Analyst Dorking Surrey Sal...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>Stress Engineer Glasgow Salary **** to **** We...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" <td>-0.379568</td>\n",
" <td>-0.578663</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>12612844</td>\n",
" <td>Mathematical Modeller / Simulation Analyst / O...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.120516</td>\n",
" <td>-0.241914</td>\n",
" <td>-0.204017</td>\n",
" <td>0.064045</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>12613049</td>\n",
" <td>Engineering Systems Analyst / Mathematical Mod...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>27500.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>12613647</td>\n",
" <td>Pioneer, Miser Engineering Systems Analyst Do...</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id FullDescription \\\n",
"0 12612628 Engineering Systems Analyst Dorking Surrey Sal... \n",
"1 12612830 Stress Engineer Glasgow Salary **** to **** We... \n",
"2 12612844 Mathematical Modeller / Simulation Analyst / O... \n",
"3 12613049 Engineering Systems Analyst / Mathematical Mod... \n",
"4 12613647 Pioneer, Miser Engineering Systems Analyst Do... \n",
"\n",
" Company SalaryNormalized SourceName \\\n",
"0 Gregory Martin International 25000.0 cv-library.co.uk \n",
"1 Gregory Martin International 30000.0 cv-library.co.uk \n",
"2 Gregory Martin International 30000.0 cv-library.co.uk \n",
"3 Gregory Martin International 27500.0 cv-library.co.uk \n",
"4 Gregory Martin International 25000.0 cv-library.co.uk \n",
"\n",
" LocationNormalized0 LocationNormalized1 Title0 Title1 \n",
"0 -0.116790 -0.229172 -0.211709 0.010168 \n",
"1 -0.118995 -0.237572 -0.379568 -0.578663 \n",
"2 -0.120516 -0.241914 -0.204017 0.064045 \n",
"3 -0.122604 -0.249312 -0.211709 0.010168 \n",
"4 -0.122604 -0.249312 -0.211709 0.010168 "
]
},
"execution_count": 228,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xDIMGEN7EwDG"
},
"source": [
"#### Full Description"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 68085,
"status": "ok",
"timestamp": 1529801253123,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "nDp6SmCVEwDG",
"outputId": "e6f242f1-591f-47f1-8454-d2e2eb75ee00"
},
"outputs": [],
"source": [
"df = normalizeTextField(df, 'FullDescription')"
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2471,
"status": "ok",
"timestamp": 1529801284445,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "jOBnnMV8EwDK",
"outputId": "a027949e-77b5-4018-9e74-15a7a3dddfda"
},
"outputs": [
{
"data": {
"text/plain": [
"(367229, 10)"
]
},
"execution_count": 230,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 231,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.530014</td>\n",
" <td>2.881801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" <td>-0.379568</td>\n",
" <td>-0.578663</td>\n",
" <td>1.115408</td>\n",
" <td>-2.899837</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>12612844</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.120516</td>\n",
" <td>-0.241914</td>\n",
" <td>-0.204017</td>\n",
" <td>0.064045</td>\n",
" <td>-1.111251</td>\n",
" <td>2.198475</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>12613049</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>27500.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.890457</td>\n",
" <td>3.393423</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>12613647</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>cv-library.co.uk</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-19.451188</td>\n",
" <td>2.751042</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName \\\n",
"0 12612628 Gregory Martin International 25000.0 cv-library.co.uk \n",
"1 12612830 Gregory Martin International 30000.0 cv-library.co.uk \n",
"2 12612844 Gregory Martin International 30000.0 cv-library.co.uk \n",
"3 12613049 Gregory Martin International 27500.0 cv-library.co.uk \n",
"4 12613647 Gregory Martin International 25000.0 cv-library.co.uk \n",
"\n",
" LocationNormalized0 LocationNormalized1 Title0 Title1 \\\n",
"0 -0.116790 -0.229172 -0.211709 0.010168 \n",
"1 -0.118995 -0.237572 -0.379568 -0.578663 \n",
"2 -0.120516 -0.241914 -0.204017 0.064045 \n",
"3 -0.122604 -0.249312 -0.211709 0.010168 \n",
"4 -0.122604 -0.249312 -0.211709 0.010168 \n",
"\n",
" FullDescription0 FullDescription1 \n",
"0 -18.530014 2.881801 \n",
"1 1.115408 -2.899837 \n",
"2 -1.111251 2.198475 \n",
"3 -18.890457 3.393423 \n",
"4 -19.451188 2.751042 "
]
},
"execution_count": 231,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "3UqZ9i79EwDN"
},
"source": [
"#### Source Name"
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 819,
"status": "ok",
"timestamp": 1529801289739,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "Wakkvtn4EwDV",
"outputId": "39e2df5b-a80a-4d86-89da-8c83a777754b"
},
"outputs": [],
"source": [
"_, sources = np.unique(df['SourceName'], return_inverse=True)"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 3545,
"status": "ok",
"timestamp": 1529801294803,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "W_FwJ5TYEwDZ",
"outputId": "486c3805-18bc-40d4-e525-002b70e3453c"
},
"outputs": [
{
"data": {
"text/plain": [
"(367229,)"
]
},
"execution_count": 233,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sources.shape"
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 4481,
"status": "ok",
"timestamp": 1529801299695,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "JN1a4NevEwDj",
"outputId": "2c7e337a-0f1b-4ec8-cfcd-1f605d426501"
},
"outputs": [],
"source": [
"df['SourceName'] = sources"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 702,
"status": "ok",
"timestamp": 1529801300859,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "BBjy-ZqbEwDo",
"outputId": "639a972b-b8d7-49e6-e630-f6a234e1cb57"
},
"outputs": [
{
"data": {
"text/plain": [
"(367229, 10)"
]
},
"execution_count": 235,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 160
},
"colab_type": "code",
"executionInfo": {
"elapsed": 749,
"status": "ok",
"timestamp": 1529801304114,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "CTVv0buBEwDw",
"outputId": "5b58b9d4-c607-4c6d-a6a4-75a7d04bce92"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>25000.0</td>\n",
" <td>42</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.530014</td>\n",
" <td>2.881801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>Gregory Martin International</td>\n",
" <td>30000.0</td>\n",
" <td>42</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" <td>-0.379568</td>\n",
" <td>-0.578663</td>\n",
" <td>1.115408</td>\n",
" <td>-2.899837</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName \\\n",
"0 12612628 Gregory Martin International 25000.0 42 \n",
"1 12612830 Gregory Martin International 30000.0 42 \n",
"\n",
" LocationNormalized0 LocationNormalized1 Title0 Title1 \\\n",
"0 -0.116790 -0.229172 -0.211709 0.010168 \n",
"1 -0.118995 -0.237572 -0.379568 -0.578663 \n",
"\n",
" FullDescription0 FullDescription1 \n",
"0 -18.530014 2.881801 \n",
"1 1.115408 -2.899837 "
]
},
"execution_count": 236,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(n=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Company"
]
},
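{
"cell_type": "code",
"execution_count": 237,
"metadata": {},
"outputs": [],
"source": [
"# Label-encode Company: return_inverse gives, for each row, the index of its\n",
"# company name in the sorted array of unique names (which is discarded here).\n",
"_, companies = np.unique(df['Company'], return_inverse=True)"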
]
},
{
"cell_type": "code",
"execution_count": 238,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(367229,)"
]
},
"execution_count": 238,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"companies.shape"
]
},
{
"cell_type": "code",
"execution_count": 239,
"metadata": {},
"outputs": [],
"source": [
"# Overwrite the company names with their integer codes\n",
"df['Company'] = companies"
]
},
{
"cell_type": "code",
"execution_count": 240,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(367229, 10)"
]
},
"execution_count": 240,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 241,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>9229</td>\n",
" <td>25000.0</td>\n",
" <td>42</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.530014</td>\n",
" <td>2.881801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>9229</td>\n",
" <td>30000.0</td>\n",
" <td>42</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" <td>-0.379568</td>\n",
" <td>-0.578663</td>\n",
" <td>1.115408</td>\n",
" <td>-2.899837</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName LocationNormalized0 \\\n",
"0 12612628 9229 25000.0 42 -0.116790 \n",
"1 12612830 9229 30000.0 42 -0.118995 \n",
"\n",
" LocationNormalized1 Title0 Title1 FullDescription0 FullDescription1 \n",
"0 -0.229172 -0.211709 0.010168 -18.530014 2.881801 \n",
"1 -0.237572 -0.379568 -0.578663 1.115408 -2.899837 "
]
},
"execution_count": 241,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(n=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Post-processing"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>12612628</td>\n",
" <td>9229</td>\n",
" <td>25000.0</td>\n",
" <td>42</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.530014</td>\n",
" <td>2.881801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>12612830</td>\n",
" <td>9229</td>\n",
" <td>30000.0</td>\n",
" <td>42</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" <td>-0.379568</td>\n",
" <td>-0.578663</td>\n",
" <td>1.115408</td>\n",
" <td>-2.899837</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>12612844</td>\n",
" <td>9229</td>\n",
" <td>30000.0</td>\n",
" <td>42</td>\n",
" <td>-0.120516</td>\n",
" <td>-0.241914</td>\n",
" <td>-0.204017</td>\n",
" <td>0.064045</td>\n",
" <td>-1.111251</td>\n",
" <td>2.198475</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>12613049</td>\n",
" <td>9229</td>\n",
" <td>27500.0</td>\n",
" <td>42</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.890457</td>\n",
" <td>3.393423</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>12613647</td>\n",
" <td>9229</td>\n",
" <td>25000.0</td>\n",
" <td>42</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-19.451188</td>\n",
" <td>2.751042</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName LocationNormalized0 \\\n",
"0 12612628 9229 25000.0 42 -0.116790 \n",
"1 12612830 9229 30000.0 42 -0.118995 \n",
"2 12612844 9229 30000.0 42 -0.120516 \n",
"3 12613049 9229 27500.0 42 -0.122604 \n",
"4 12613647 9229 25000.0 42 -0.122604 \n",
"\n",
" LocationNormalized1 Title0 Title1 FullDescription0 FullDescription1 \n",
"0 -0.229172 -0.211709 0.010168 -18.530014 2.881801 \n",
"1 -0.237572 -0.379568 -0.578663 1.115408 -2.899837 \n",
"2 -0.241914 -0.204017 0.064045 -1.111251 2.198475 \n",
"3 -0.249312 -0.211709 0.010168 -18.890457 3.393423 \n",
"4 -0.249312 -0.211709 0.010168 -19.451188 2.751042 "
]
},
"execution_count": 242,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 160
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1926,
"status": "ok",
"timestamp": 1529801314400,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "QB7-RNIHEwD4",
"outputId": "1cc11811-1a6a-4572-e7c7-6f37bdef14f9"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>122458</th>\n",
" <td>72703426</td>\n",
" <td>22483</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.140759</td>\n",
" <td>0.027805</td>\n",
" <td>-16.425155</td>\n",
" <td>3.326807</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122459</th>\n",
" <td>72703453</td>\n",
" <td>232</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>-0.118020</td>\n",
" <td>-0.233316</td>\n",
" <td>-0.148008</td>\n",
" <td>0.061885</td>\n",
" <td>-17.558738</td>\n",
" <td>2.838631</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122460</th>\n",
" <td>72705210</td>\n",
" <td>14637</td>\n",
" <td>NaN</td>\n",
" <td>64</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.187463</td>\n",
" <td>0.364341</td>\n",
" <td>-11.138799</td>\n",
" <td>-0.978168</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122461</th>\n",
" <td>72705214</td>\n",
" <td>14637</td>\n",
" <td>NaN</td>\n",
" <td>64</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>0.868984</td>\n",
" <td>-0.102670</td>\n",
" <td>-3.389519</td>\n",
" <td>-0.760346</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122462</th>\n",
" <td>72705218</td>\n",
" <td>14637</td>\n",
" <td>NaN</td>\n",
" <td>64</td>\n",
" <td>-0.118635</td>\n",
" <td>-0.235408</td>\n",
" <td>-0.168568</td>\n",
" <td>0.034974</td>\n",
" <td>-13.765711</td>\n",
" <td>-0.120907</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName LocationNormalized0 \\\n",
"122458 72703426 22483 NaN 95 -0.116790 \n",
"122459 72703453 232 NaN 95 -0.118020 \n",
"122460 72705210 14637 NaN 64 -0.116790 \n",
"122461 72705214 14637 NaN 64 -0.116790 \n",
"122462 72705218 14637 NaN 64 -0.118635 \n",
"\n",
" LocationNormalized1 Title0 Title1 FullDescription0 \\\n",
"122458 -0.229172 -0.140759 0.027805 -16.425155 \n",
"122459 -0.233316 -0.148008 0.061885 -17.558738 \n",
"122460 -0.229172 -0.187463 0.364341 -11.138799 \n",
"122461 -0.229172 0.868984 -0.102670 -3.389519 \n",
"122462 -0.235408 -0.168568 0.034974 -13.765711 \n",
"\n",
" FullDescription1 \n",
"122458 3.326807 \n",
"122459 2.838631 \n",
"122460 -0.978168 \n",
"122461 -0.760346 \n",
"122462 -0.120907 "
]
},
"execution_count": 243,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
{
"cell_type": "code",
"execution_count": 244,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>3.672290e+05</td>\n",
" <td>367229.000000</td>\n",
" <td>244766.000000</td>\n",
" <td>367229.000000</td>\n",
" <td>367229.000000</td>\n",
" <td>367229.000000</td>\n",
" <td>367229.000000</td>\n",
" <td>367229.000000</td>\n",
" <td>367229.000000</td>\n",
" <td>367229.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>6.969881e+07</td>\n",
" <td>12360.855161</td>\n",
" <td>34122.192494</td>\n",
" <td>88.657734</td>\n",
" <td>-0.003500</td>\n",
" <td>-0.003753</td>\n",
" <td>-0.000547</td>\n",
" <td>0.002329</td>\n",
" <td>-0.026405</td>\n",
" <td>0.014674</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>3.127609e+06</td>\n",
" <td>6570.361799</td>\n",
" <td>17639.753029</td>\n",
" <td>56.313850</td>\n",
" <td>0.456049</td>\n",
" <td>0.351458</td>\n",
" <td>0.429128</td>\n",
" <td>0.318787</td>\n",
" <td>12.385516</td>\n",
" <td>4.597004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.188845e+07</td>\n",
" <td>0.000000</td>\n",
" <td>5000.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-0.568546</td>\n",
" <td>-0.420773</td>\n",
" <td>-1.119124</td>\n",
" <td>-2.701985</td>\n",
" <td>-19.732022</td>\n",
" <td>-40.437940</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>6.869505e+07</td>\n",
" <td>7040.000000</td>\n",
" <td>21500.000000</td>\n",
" <td>42.000000</td>\n",
" <td>-0.124085</td>\n",
" <td>-0.236447</td>\n",
" <td>-0.204409</td>\n",
" <td>-0.110315</td>\n",
" <td>-8.602899</td>\n",
" <td>-2.505604</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>6.993552e+07</td>\n",
" <td>13860.000000</td>\n",
" <td>30000.000000</td>\n",
" <td>85.000000</td>\n",
" <td>-0.117893</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.171657</td>\n",
" <td>0.034974</td>\n",
" <td>-2.243902</td>\n",
" <td>0.173910</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>7.162515e+07</td>\n",
" <td>17047.000000</td>\n",
" <td>42500.000000</td>\n",
" <td>154.000000</td>\n",
" <td>-0.116790</td>\n",
" <td>0.107018</td>\n",
" <td>-0.130477</td>\n",
" <td>0.057478</td>\n",
" <td>5.772098</td>\n",
" <td>2.474907</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.270524e+07</td>\n",
" <td>24854.000000</td>\n",
" <td>200000.000000</td>\n",
" <td>168.000000</td>\n",
" <td>1.290674</td>\n",
" <td>0.648287</td>\n",
" <td>3.708054</td>\n",
" <td>3.296137</td>\n",
" <td>244.121250</td>\n",
" <td>58.723964</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName \\\n",
"count 3.672290e+05 367229.000000 244766.000000 367229.000000 \n",
"mean 6.969881e+07 12360.855161 34122.192494 88.657734 \n",
"std 3.127609e+06 6570.361799 17639.753029 56.313850 \n",
"min 1.188845e+07 0.000000 5000.000000 0.000000 \n",
"25% 6.869505e+07 7040.000000 21500.000000 42.000000 \n",
"50% 6.993552e+07 13860.000000 30000.000000 85.000000 \n",
"75% 7.162515e+07 17047.000000 42500.000000 154.000000 \n",
"max 7.270524e+07 24854.000000 200000.000000 168.000000 \n",
"\n",
" LocationNormalized0 LocationNormalized1 Title0 Title1 \\\n",
"count 367229.000000 367229.000000 367229.000000 367229.000000 \n",
"mean -0.003500 -0.003753 -0.000547 0.002329 \n",
"std 0.456049 0.351458 0.429128 0.318787 \n",
"min -0.568546 -0.420773 -1.119124 -2.701985 \n",
"25% -0.124085 -0.236447 -0.204409 -0.110315 \n",
"50% -0.117893 -0.229172 -0.171657 0.034974 \n",
"75% -0.116790 0.107018 -0.130477 0.057478 \n",
"max 1.290674 0.648287 3.708054 3.296137 \n",
"\n",
" FullDescription0 FullDescription1 \n",
"count 367229.000000 367229.000000 \n",
"mean -0.026405 0.014674 \n",
"std 12.385516 4.597004 \n",
"min -19.732022 -40.437940 \n",
"25% -8.602899 -2.505604 \n",
"50% -2.243902 0.173910 \n",
"75% 5.772098 2.474907 \n",
"max 244.121250 58.723964 "
]
},
"execution_count": 244,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 248,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>Company</th>\n",
" <th>SalaryNormalized</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Id</th>\n",
" <td>1.000000</td>\n",
" <td>-0.020986</td>\n",
" <td>0.047094</td>\n",
" <td>0.109891</td>\n",
" <td>0.032935</td>\n",
" <td>0.057275</td>\n",
" <td>0.002192</td>\n",
" <td>-0.002024</td>\n",
" <td>0.035829</td>\n",
" <td>0.004801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <td>-0.020986</td>\n",
" <td>1.000000</td>\n",
" <td>0.004974</td>\n",
" <td>0.027165</td>\n",
" <td>-0.007489</td>\n",
" <td>-0.017697</td>\n",
" <td>-0.003113</td>\n",
" <td>0.001284</td>\n",
" <td>-0.003085</td>\n",
" <td>0.004680</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SalaryNormalized</th>\n",
" <td>0.047094</td>\n",
" <td>0.004974</td>\n",
" <td>1.000000</td>\n",
" <td>0.123441</td>\n",
" <td>0.082108</td>\n",
" <td>0.050715</td>\n",
" <td>0.013384</td>\n",
" <td>-0.077149</td>\n",
" <td>0.030054</td>\n",
" <td>0.031389</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SourceName</th>\n",
" <td>0.109891</td>\n",
" <td>0.027165</td>\n",
" <td>0.123441</td>\n",
" <td>1.000000</td>\n",
" <td>0.017216</td>\n",
" <td>0.112476</td>\n",
" <td>0.049994</td>\n",
" <td>0.020802</td>\n",
" <td>0.071979</td>\n",
" <td>-0.021501</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LocationNormalized0</th>\n",
" <td>0.032935</td>\n",
" <td>-0.007489</td>\n",
" <td>0.082108</td>\n",
" <td>0.017216</td>\n",
" <td>1.000000</td>\n",
" <td>0.000530</td>\n",
" <td>0.050502</td>\n",
" <td>0.044066</td>\n",
" <td>0.018854</td>\n",
" <td>0.003637</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LocationNormalized1</th>\n",
" <td>0.057275</td>\n",
" <td>-0.017697</td>\n",
" <td>0.050715</td>\n",
" <td>0.112476</td>\n",
" <td>0.000530</td>\n",
" <td>1.000000</td>\n",
" <td>0.039818</td>\n",
" <td>0.016730</td>\n",
" <td>0.046324</td>\n",
" <td>-0.014547</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Title0</th>\n",
" <td>0.002192</td>\n",
" <td>-0.003113</td>\n",
" <td>0.013384</td>\n",
" <td>0.049994</td>\n",
" <td>0.050502</td>\n",
" <td>0.039818</td>\n",
" <td>1.000000</td>\n",
" <td>-0.004641</td>\n",
" <td>0.120983</td>\n",
" <td>-0.020667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Title1</th>\n",
" <td>-0.002024</td>\n",
" <td>0.001284</td>\n",
" <td>-0.077149</td>\n",
" <td>0.020802</td>\n",
" <td>0.044066</td>\n",
" <td>0.016730</td>\n",
" <td>-0.004641</td>\n",
" <td>1.000000</td>\n",
" <td>0.004257</td>\n",
" <td>-0.139567</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FullDescription0</th>\n",
" <td>0.035829</td>\n",
" <td>-0.003085</td>\n",
" <td>0.030054</td>\n",
" <td>0.071979</td>\n",
" <td>0.018854</td>\n",
" <td>0.046324</td>\n",
" <td>0.120983</td>\n",
" <td>0.004257</td>\n",
" <td>1.000000</td>\n",
" <td>-0.002455</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FullDescription1</th>\n",
" <td>0.004801</td>\n",
" <td>0.004680</td>\n",
" <td>0.031389</td>\n",
" <td>-0.021501</td>\n",
" <td>0.003637</td>\n",
" <td>-0.014547</td>\n",
" <td>-0.020667</td>\n",
" <td>-0.139567</td>\n",
" <td>-0.002455</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id Company SalaryNormalized SourceName \\\n",
"Id 1.000000 -0.020986 0.047094 0.109891 \n",
"Company -0.020986 1.000000 0.004974 0.027165 \n",
"SalaryNormalized 0.047094 0.004974 1.000000 0.123441 \n",
"SourceName 0.109891 0.027165 0.123441 1.000000 \n",
"LocationNormalized0 0.032935 -0.007489 0.082108 0.017216 \n",
"LocationNormalized1 0.057275 -0.017697 0.050715 0.112476 \n",
"Title0 0.002192 -0.003113 0.013384 0.049994 \n",
"Title1 -0.002024 0.001284 -0.077149 0.020802 \n",
"FullDescription0 0.035829 -0.003085 0.030054 0.071979 \n",
"FullDescription1 0.004801 0.004680 0.031389 -0.021501 \n",
"\n",
" LocationNormalized0 LocationNormalized1 Title0 \\\n",
"Id 0.032935 0.057275 0.002192 \n",
"Company -0.007489 -0.017697 -0.003113 \n",
"SalaryNormalized 0.082108 0.050715 0.013384 \n",
"SourceName 0.017216 0.112476 0.049994 \n",
"LocationNormalized0 1.000000 0.000530 0.050502 \n",
"LocationNormalized1 0.000530 1.000000 0.039818 \n",
"Title0 0.050502 0.039818 1.000000 \n",
"Title1 0.044066 0.016730 -0.004641 \n",
"FullDescription0 0.018854 0.046324 0.120983 \n",
"FullDescription1 0.003637 -0.014547 -0.020667 \n",
"\n",
" Title1 FullDescription0 FullDescription1 \n",
"Id -0.002024 0.035829 0.004801 \n",
"Company 0.001284 -0.003085 0.004680 \n",
"SalaryNormalized -0.077149 0.030054 0.031389 \n",
"SourceName 0.020802 0.071979 -0.021501 \n",
"LocationNormalized0 0.044066 0.018854 0.003637 \n",
"LocationNormalized1 0.016730 0.046324 -0.014547 \n",
"Title0 -0.004641 0.120983 -0.020667 \n",
"Title1 1.000000 0.004257 -0.139567 \n",
"FullDescription0 0.004257 1.000000 -0.002455 \n",
"FullDescription1 -0.139567 -0.002455 1.000000 "
]
},
"execution_count": 248,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.corr()"
]
},
{
"cell_type": "code",
"execution_count": 254,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Id 0.047094\n",
"Company 0.004974\n",
"SalaryNormalized 1.000000\n",
"SourceName 0.123441\n",
"LocationNormalized0 0.082108\n",
"LocationNormalized1 0.050715\n",
"Title0 0.013384\n",
"Title1 -0.077149\n",
"FullDescription0 0.030054\n",
"FullDescription1 0.031389\n",
"dtype: float64"
]
},
"execution_count": 254,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pearson correlation of each column with the target, SalaryNormalized\n",
"df.corrwith(df['SalaryNormalized'])"
]
},
{
"cell_type": "code",
"execution_count": 256,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f631ee06d30>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f6333601be0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f63335fa2b0>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f631dcb7940>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f631e24bc88>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f6324e9e128>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f631de40320>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f631de78978>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f6320624048>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x7f63335f0d68>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f63335c4d68>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x7f631e3d9438>]],\n",
" dtype=object)"
]
},
"execution_count": 256,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0sAAAK7CAYAAAAjoRTbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzs3X+8XFV97//X24Rf8iv8kNOQpA2WaAVSEfKAWG/tqbQQwBpswQapBKVNr4WqNW0J6hUU8QveIooiLUhKsGhIEUsqoTEFzrXeS8IvkRAi5RijOSQmQgIkIODBz/ePvYbsDHvOmTkzc2afc97Px2MeZ2bttff6zJ7Myqy111pbEYGZmZmZmZnt6jWdDsDMzMzMzKyM3FgyMzMzMzMr4MaSmZmZmZlZATeWzMzMzMzMCrixZGZmZmZmVsCNJTMzMzMzswJuLJmZWVtIuljSv6TnUyWFpPGdjitP0u9KeqyJ/T8m6autjMnM6ud6xtrNjSUrJOm9ku6XtEPSJkl3SPofnY7LzDpD0npJv0h1QuVxaBPH65H0gqTtkp6V9ICkBZL2aGXcg4mI/4qIN9aTV1K3pL6q/T8bEX/eilgknSDph5Kel3S3pN9oxXHNRgrXM+2tZyTtLumWdJ5DUnezxxwL3FiyV5H0UeALwGeBLuDXga8AszsZl5l13B9FxD65x8Ymj3d+ROwLTATmA3OAZZLUdKR1KFPvs6SDgVuB/wUcCNwP3NzRoMw6w/VMe30P+DPgZ50OZKRwY8l2IWl/4NPAeRFxa0Q8FxG/jIh/j4i/k7SHpC9I2pgeX6j00FR6QyT9vaQt6YrUaZJOkfTfkrZK+liurItTD8fNqdfnQUlvzm1fIOlHadujkt6d23aOpO9J+gdJ2yT9WNLJadsZkh6oel/zJf1bu8+f2VhS1AOaeiz/oJHjpHqmB3gX8Fbg1HSs1+TqgackLZF0YNq2p6R/SelPS7pPUlfadqCkf0511LbKdz9XR10g6WfAP1e/hxT/hanO2ZaOs6ekvYE7gEPzPd7KDQFK+79L0poUU4+kN1Ud+28lPSzpmVT37Zk2/zGwJiL+NSJeAC4G3izptxo5l2ajjeuZ1tUzEfFSRHwhIr4HvNzI+RvL3Fiyam8F9gS+VWP7x4GZwNHAm4HjgE/ktv9a2n8S8EngOrIejGOB3wU+Ken1ufyzgX8l60n9OvBvknZL236U9tkf+BTwL5Im5vY9HngMOBj4HHC9JAFLgcPylUeK4Wv1nQIz64SI+CnZFZXfTUkfAk4Dfg84FNgGXJ22zSWrG6YABwH/E/hF2vY14LXAkcAhwJW5Yn6NrL75DWBejVDOAk4CfhN4A/CJiHgOOBnYWKvHW9IbgG8AHwFeBywD/l3S7rls7wFmAYcBvw2ck9KPBH6QOxfPkdWBR9aI0cyGYIzXMzYEbixZtYOAJyOiv8b2s4BPR8SWiPg5WSPmfbntvwQujYhfAovJGjJfjIjtEbEGWEP2xa14ICJuSfk/T9bQmgmQelg3RsSvIuJm4HGyxlnFTyLiuoh4GVhEdom9KyJeJBu+8mcAko4EpgLfHuI5MbPMv6WezKfbeKV2I9mPDIC/BD4eEX3pe30xcLqyYS2/JKuvDo+IlyPigYh4NnWonAz8z4jYlq6M/5/c8X8FXBQRL0bELyj25YjYEBFbgUuBM+uM/U+B2yNiRarT/gHYC/idXJ6rUr22Ffh3so4ngH2AZ6qO9wywb51lm40WrmcG1kw9Y0PgxpJVewo4WLXH2B4K/CT3+icp7ZX9U+MFdva+bM5t/wXZj4KKDZUnEfEroK9yPElnS3qoUmkCR5E1vip+ltv3+fS0cuxFwHvTlab3AUtSJWhmQ3daRExIj9PaVMYkYGt6/hvAt3J1wFqyoSNdZL26y4HFaRjM59JV6SnA1ojYVuP4P0/D3AayIfe8uo4byC71Y6rTNqT3VJGfJ/A8O+
usHcB+VcfbD9heZ9lmo4XrmYE1U8/YELixZNXuAV4guyRdZCNZxVLx6yltqKZUnkh6DTAZ2KhsFajrgPOBgyJiAvAIUNeEzIhYCbxEdpn9vXgInlk7PEc2DAUASePIhoUMiaQpZEN2/yslbQBOzv1wmhARe0bEE6kn91MRcQRZj+o7gbPTPgdKmlCjmKgjlCm55/k6brB9d6kfU2fNFOCJOspcQza0ubLv3mTDc9bUsa/ZaOZ6ZlfN1DM2BG4s2S4i4hmyuUZXK1uc4bWSdpN0sqTPkY2T/YSk1ylbvemTwL8MdMxBHCvpj9OVrI8ALwIrgb3JKoyfA0h6P9mVpUbcCHwZ6E+TGc2stf4b2FPSqam39RNAw0vypnrm94DbgHvJxuAD/CNwaeo8IdU7s9Pz35c0Pf1wepZsuMzLEbGJbIL0VyQdkOqvtzcY0nmSJqdJ3h9j56p0m4GDlC2EU2QJcKqyJcB3I1t560Xg/9VR5reAoyT9SZqM/Ung4Yj4YYOxm402rmd21Uw9g7KFuioLy+yeFpYYlpUBRyo3luxVIuLzwEfJKqSfk/WgnA/8G/AZsomRDwOrgQdT2lDdRjb+dhvZcLk/Tj05jwJXkF3p2gxMB/5vg8f+GlkDy1eVzNogda78FfBVsl7N58iG0tbry5K2k33HvwB8E5iVhpUAfJFswZbvpHwryRZ2gWwC9S1kP2DWAv+HnR037yP7UfNDYAtZR0wjvg58B1iXHp8BSA2XbwDr0pCdXYbNRMRjZHMlvwQ8CfwR2TLILw1WYJoD+idkcxe2pfc5p8G4zUYd1zOtq2eSx8imREwiG2L4C3YdMWRVFFHPlUKz1pN0MdmkyT9r0/H3IqvAjomIx9tRhpmNLpLWA38eEf/Z6VjMbHRyPTOy+MqSjWYfBO5zQ8nMzMzMhqJsdxU2a4nUayNqL1RhZmZmZjYgD8MzMzMzMzMr4GF4ZmZmZmZmBUbdMLyDDz44pk6dOmi+5557jr333rv9AQ1BmWMDx9eMMscG9cX3wAMPPBkRQ77HxWhQbz3TjDL+W3FM9StjXCMpJtczjdUzZfxs88ocX5ljA8fXrIHiq7ueiYhR9Tj22GOjHnfffXdd+TqhzLFFOL5mlDm2iPriA+6PEnzXO/mot55pRhn/rTim+pUxrpEUk+uZxuqZMn62eWWOr8yxRTi+Zg0UX731jIfhmZmZmZmZFXBjyczMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVmBUbd0uI0sUxfc3tLjrb/s1JYez6wZzfz7nj+9n3Oq9ve/bzOz1pm64PbCunaoXEePTk1dWZL0N5LWSHpE0jck7SnpMEmrJD0u6WZJu6e8e6TXvWn71NxxLkzpj0k6KZc+K6X1SlrQTKxmZmZmZmaNGHJjSdIk4EPAjIg4ChgHzAEuB66MiGnANuDctMu5wLaIOBy4MuVD0hFpvyOBWcBXJI2TNA64GjgZOAI4M+U1MzMza4ikKZLulrQ2dfR+OKUfKGlF6uRdIemAlC5JV6UO24clHZM71tyU/3FJc3Ppx0panfa5SpIGKsPMyq/ZOUvjgb0kjQdeC2wC3gHckrYvAk5Lz2en16TtJ6RKZDawOCJejIgfA73AcenRGxHrIuIlYHHKa2ZmZtaofmB+RLwJmAmclzphFwB3pk7eO9NryDprp6XHPOAayBo+wEXA8WS/VS7KNX6uSXkr+81K6bXKMLOSG/KcpYh4QtI/AD8FfgF8B3gAeDoi+lO2PmBSej4J2JD27Zf0DHBQSl+ZO3R+nw1V6ccXxSJpHlnlRFdXFz09PYPGv2PHjrrydUKZY4PWxjd/ev/gmRrQ09NT6vNX5tig/PGZmQ1VRGwi69QlIrZLWkv2e2M20J2yLQJ6gAtS+o0REcBKSRMkTUx5V0TEVgBJK4BZknqA/SLinpR+I1mH8R0DlGFmJTfkxlLqRZkNHAY8DfwrWS9MtajsUmNbrfSiq15RkEZEXA
tcCzBjxozo7u4eKHQg+1FdT75OKHNs0Nr4WjWpsmL9Wd2lPn9ljg3KH5+ZWSukedNvAVYBXakhRURsknRIyvZKJ29S6cwdKL2vIJ0ByqiOq+HOXyh/R1dZ45s/vZ+uvVrXcduO91jWc1cxFuJrZjW8PwB+HBE/B5B0K/A7wARJ49PVpcnAxpS/D5gC9KVhe/sDW3PpFfl9aqWbmZmZNUzSPsA3gY9ExLNpWlFh1oK0gTp5a6XXbSidv1D+jq6yxndOWg3vitWtWRx6/VndLTlOXlnPXcVYiK+ZOUs/BWZKem2ae3QC8ChwN3B6yjMXuC09X5pek7bflS5tLwXmpNXyDiMb43svcB8wLa2utzvZIhBLm4jXzMzMxjBJu5E1lG6KiFtT8uY0vI70d0tKr9WZO1D65IL0gcows5IbcmMpIlaRLdTwILA6HetasjG4H5XUSzYn6fq0y/XAQSn9o6TJjRGxBlhC1tD6D+C8iHg5XZk6H1gOrAWWpLxmZmZmDUkdu9cDayPi87lN+c7c6k7es9OqeDOBZ9JQuuXAiZIOSFMSTgSWp23bJc1MZZ1NcYdxvgwzK7mmrjtGxEVkK8LkrSNbHaY67wvAGTWOcylwaUH6MmBZMzGamZmZAW8D3geslvRQSvsYcBmwRNK5ZKNmKr9VlgGnkK3S+zzwfoCI2CrpErIRMACfriz2AHwQuAHYi2xhhztSeq0yzKzkWjNI08zMzKzEIuJ7FM8rgmwqQXX+AM6rcayFwMKC9PuBowrSnyoqw8zKr9n7LJmZNW2Am0VeLOkJSQ+lxym5fS5MN358TNJJufRZKa1X0oJc+mGSVqWbQt6c5kKS5kvenPKvSqtkmZmZmbmxZGalUOtmkQBXRsTR6bEMIG2bAxxJdtPHr0gaJ2kccDXZbQyOAM7MHefydKxpwDbg3JR+LrAtIg4Hrkz5zMzMzNxYMrPOi4hNEfFger6dbFGXSQPsMhtYHBEvRsSPyeYUHJcevRGxLiJeAhYDs9Nk63eQLUoD2U0hT8sda1F6fgtwggZYS9jMzMzGDs9ZMrNSqbpZ5NuA8yWdDdxPdvVpG1lDamVut/zNH6tvFnk82cqcT6dVNqvzv3KDyYjol/RMyv9kVVwN3yyymRsdFt0osdM3/ivjzQfLGBOUMy7HZGbWODeWzKw0Cm4WeQ1wCdmNHS8BrgA+QO2bPxZdLR/sZpF13UhyKDeLPGfB7YPmqaXoRontuOFhI8p488EyxgTljMsxmZk1zsPwzKwUim4WGRGb033XfgVcx87bEjR6s8gngQmSxlel73KstH1/YCtmZmY25rmxZGYdV+tmkZU73ifvBh5Jz5cCc9JKdocB04B7ye57Mi2tfLc72SIQS9MSwHcDp6f9q288WblZ5OnAXSm/mZmZjXEehmdmZVDrZpFnSjqabFjceuAvASJijaQlwKNkK+mdFxEvA0g6H1gOjAMWRsSadLwLgMWSPgN8n6xxRvr7NUm9ZFeU5rTzjZqZmdnI4caSmXXcADeLXDbAPpcClxakLyvaLyLWsXMYXz79BeCMRuI1MzOzscHD8MzMzMzMzAq4sWRmZmZmZlbAjSUzMzMzM7MCbiyZmZmZmZkVcGPJzMzMzMysgBtLZmZmZmZmBdxYMjMzMzMzK+DGkpmZmZmZWQE3lszMzMzMzAq4sWRmZmZmZlbAjSUzMzMzM7MCTTWWJE2QdIukH0paK+mtkg6UtELS4+nvASmvJF0lqVfSw5KOyR1nbsr/uKS5ufRjJa1O+1wlSc3Ea2ZmZmZmVq9mryx9EfiPiPgt4M3AWmABcGdETAPuTK8BTgampcc84BoASQcCFwHHA8cBF1UaWCnPvNx+s5qM18zMzMYgSQslbZH0SC7tYklPSHooPU7JbbswddY+JumkXPqslNYraUEu/TBJq1LH782Sdk/pe6TXvWn71OF5x2bWCkNuLEnaD3g7cD1ARLwUEU8Ds4FFKdsi4LT0fDZwY2RWAhMkTQ
ROAlZExNaI2AasAGalbftFxD0REcCNuWOZmZmZNeIGijtdr4yIo9NjGYCkI4A5wJFpn69IGidpHHA1WQfwEcCZKS/A5elY04BtwLkp/VxgW0QcDlyZ8pnZCNHMlaXXAz8H/lnS9yV9VdLeQFdEbAJIfw9J+ScBG3L796W0gdL7CtLNbJSRNEXS3Wk47xpJH07pbR/WW6sMMxtdIuK7wNY6s88GFkfEixHxY6CXbPTLcUBvRKyLiJeAxcDsVJ+8A7gl7V/dWVzpRL4FOMHTCsxGjvFN7nsM8NcRsUrSF9k55K5IUcUQQ0h/9YGleWTD9ejq6qKnp2eAMDI7duyoK18nlDk2aG1886f3t+Q4FT09PaU+f2WODToaXz8wPyIelLQv8ICkFcA5ZMN6L0vDXRYAF7DrsN7jyYbsHp8b1juDrL54QNLSdNW6Mqx3JbCMrLf4DnYOHa4uw8zGhvMlnQ3cT1YPbSPrnF2Zy5PvsK3u4D0eOAh4OiL6C/K/0ikcEf2Snkn5n2zDezGzFmumsdQH9EXEqvT6FrIfGZslTYyITWko3ZZc/im5/ScDG1N6d1V6T0qfXJD/VSLiWuBagBkzZkR3d3dRtl309PRQT75OKHNs0Nr4zllwe0uOU7H+rO5Sn78yxwadiy9dha5ckd4uaS3ZD4zZ7KwfFpHVDReQG9YLrEyLzUxMeVdExFaA1OCaJamHNKw3pVeG9d4xQBlmNvpdA1xC1rlyCXAF8AFqd9gWjcgZrIO3rZ2/4I64oZo/vZ+uvVrXcduO91jWc1cxFuIbcmMpIn4maYOkN0bEY8AJwKPpMRe4LP29Le2ylKz3ZjFZL8wzqUG1HPhsbujLicCFEbFV0nZJM4FVwNnAl4Yar5mNDGny81vIvve7DOuV1I5hvbXKqI6r4R8xzfwHXPQfeKf/Qyrjf4pljAnKGZdjerWI2Fx5Luk64NvpZa0OXmqkP0k2F3t8urqUz185Vp+k8cD+1BgOOJTOX3BH3FCds+B25k/v54rVzVw72Gn9Wd0tOU5eWc9dxViIr9l/HX8N3JRWfFkHvJ+s12WJpHOBnwJnpLzLgFPIxv0+n/KSGkWXAPelfJ+u9AoDHySbkLkXWQ/wHU3Ga2YlJmkf4JvARyLi2QGG9bdtWG8tQ/kR08yV06L/wNvxH3EjyvifYhljgnLG5ZherTISJr18N1BZKW8p8HVJnwcOJRvyey9ZXTJN0mHAE2SLQLw3IkLS3cDpZPOYqjuL5wL3pO13paviZjYCNNVYioiHyOYGVDuhIG8A59U4zkJgYUH6/cBRzcRoZiODpN3IGko3RcStKXk4hvXWKsPMRhFJ3yCrHw6W1Ec2v7Fb0tFknSfrgb8EiIg1kpaQjZbpB86LiJfTcc4HlgPjgIURsSYVcQGwWNJngO+TVgtOf78mqZfsitKcNr9VM2uh1lx3NDNrQloZ6npgbUR8Prep0iPbzmG9tcows1EkIs4sSL6+IK2S/1Lg0oL0ZWSjZarT15Gtlled/gI7R9mY2QjjxpKZlcHbgPcBqyU9lNI+RtaAafew3lplmJlZiUxt8aJQZvVwY8nMOi4ivkfxvCJo87DeiHiqqAwzMzOzZm5Ka2ZmZmZmNmq5sWRmZmZmZlbAjSUzMzMzM7MCbiyZmZmZmZkVcGPJzMzMzMysgBtLZmZmZmZmBdxYMjMzMzMzK+DGkpmZmZmZWQE3lszMzMzMzAq4sWRmZmZmZlbAjSUzMzMzM7MCbiyZmZmZmZkVcGPJzMzMzMysgBtLZtZxkhZK2iLpkVzaxZKekPRQepyS23ahpF5Jj0k6KZc+K6X1SlqQSz9M0ipJj0u6WdLuKX2P9Lo3bZ86PO/YzMzMRgI3lsysDG4AZhWkXxkRR6fHMgBJRwBzgCPTPl+RNE7SOOBq4GTgCODMlBfg8nSsacA24NyUfi6wLSIOB65M+czMzMwAN5bMrAQi4rvA1jqzzwYWR8SLEfFjoBc4Lj16I2JdRLwELA
ZmSxLwDuCWtP8i4LTcsRal57cAJ6T8ZmZmZozvdABmZgM4X9LZwP3A/IjYBkwCVuby9KU0gA1V6ccDBwFPR0R/Qf5JlX0iol/SMyn/k9WBSJoHzAPo6uqip6dn0ODnT+8fNE8tXXu9ev96ymynHTt2dDyGamWMCcoZl2MyM2ucG0tmVlbXAJcAkf5eAXwAKLryExRfKY8B8jPItl0TI64FrgWYMWNGdHd3DxB65pwFtw+ap5b50/u5YvWuVfT6swYvs516enqo530PpzLGBOWMyzGZmTXOw/DMrJQiYnNEvBwRvwKuIxtmB9mVoSm5rJOBjQOkPwlMkDS+Kn2XY6Xt+1P/cEAzMzMb5ZpuLKWJ1d+X9O30uuFVpxpd2crMRj9JE3Mv3w1UVspbCsxJdcphwDTgXuA+YFqqg3YnWwRiaUQEcDdwetp/LnBb7lhz0/PTgbtSfjMzM7OWXFn6MLA297qhVaeGuLKVmY0ikr4B3AO8UVKfpHOBz0laLelh4PeBvwGIiDXAEuBR4D+A89IVqH7gfGA5WZ20JOUFuAD4qKResjlJ16f064GDUvpHAXfKmI1iNW5TcKCkFamTd4WkA1K6JF2VOmwflnRMbp+5Kf/jkubm0o9N9VZv2lcDlWFm5ddUY0nSZOBU4Kvp9VBWnWpoZatm4jWzcoqIMyNiYkTsFhGTI+L6iHhfREyPiN+OiHdFxKZc/ksj4jcj4o0RcUcufVlEvCFtuzSXvi4ijouIwyPijIh4MaW/kF4fnravG953bmbD7AZefZuCBcCdqZP3TnZ2mpxMduV6GtniLtdA1vABLiJbQOY44KJc4+ealLey36xByjCzkmt2gYcvAH8P7JteD2XVqUZXtnqVoaxSVeYVeMocG7Q2vmZWCyvS09NT6vNX5tig/PGZmTUjIr5bcPPp2UB3er4I6CG7Gj0buDENzV0paUIaHtwNrIiIrQCSVgCzJPUA+0XEPSn9RrIO4zsGKMPMSm7IjSVJ7wS2RMQDkroryQVZB1t1qtGVrV6dOIRVqsq8Ak+ZY4PWxtfMamFF1p/VXerzV+bYoPzxmZm1QVflynVEbJJ0SEp/pZM3qXTmDpTeV5A+UBm7GErnL5S/o6tV8bW6gxWKb9MwVO34DMbKZ9surYivmStLbwPeJekUYE9gP7IrTRMkjU9Xl4pWneqrWnWq1gpWDJBuZmZm1k6NdvLWfSuCWobS+Qvl7+hqVXyt7mCF4ts0DFU7bu8wVj7bdmlFfEOesxQRF6a5BVPJFmi4KyLOovFVpxpa2Wqo8ZqZmZkV2FxZfTP93ZLSG71NQV96Xp0+UBlmVnLtuM9SQ6tODXFlKzMzM7NWyHfmVnfynp1WxZsJPJOG0i0HTpR0QFrY4URgedq2XdLMtIDV2RR3GOfLMLOSa8l1x4joIZusSFpN6riCPC8AZ9TY/1Lg0oL0ZcCyVsRoZmZmY1u6TUE3cLCkPrJV7S4DlqRbFvyUnb9VlgGnkK3S+zzwfoCI2CrpErIRMACfriz2AHyQbMW9vcgWdqis1lmrDDMrudYM0jQzMzMruYg4s8amEwryBnBejeMsBBYWpN8PHFWQ/lRRGWZWfu0YhmdmZmZmZjbijdkrS6ufeKYtq6q0wg2z9m7p8aa2+H22Or5WmrrgduZP72/ZZ7v+slNbchwzMzMzG3nGbGPJrB5jqaFpZmZmZrvyMDwzMzMzM7MCbiyZmZmZmZkVcGPJzMzMzMysgOcsWcPKvDiGjUySFgLvBLZExFEp7UDgZmAqsB54T0RsSzd7/CLZ/U+eB86JiAfTPnOBT6TDfiYiFqX0Y9l575NlwIcjImqV0ea3a2ZmZiOEryyZWRncAMyqSlsA3BkR04A702uAk4Fp6TEPuAZeaVxdBBxPdmPsiyQdkPa5JuWt7DdrkDLMzMzM3Fgys86LiO8CW6uSZwOL0vNFwGm59BsjsxKYIGkicBKwIiK2pqtDK4BZadt+EXFPusnkjVXHKirDzM
zMzI0lMyutrojYBJD+HpLSJwEbcvn6UtpA6X0F6QOVYWZmZuY5S2Y24qggLYaQ3lih0jyyoXx0dXXR09Mz6D7zp/c3WswruvZ69f71lNlOO3bs6HgM1coYE5QzLsdkZtY4N5bMRqhW3zB3/WWntvR4LbBZ0sSI2JSG0m1J6X3AlFy+ycDGlN5dld6T0icX5B+ojFeJiGuBawFmzJgR3d3dtbK+opmFUOZP7+eK1btW0evPGrzMdurp6aGe9z2cyhgTlDMux2Rm1jgPwzOzsloKzE3P5wK35dLPVmYm8EwaQrccOFHSAWlhhxOB5Wnbdkkz00p6Z1cdq6gMMzMzM19ZKiMvzW1jjaRvkF0VOlhSH9mqdpcBSySdC/wUOCNlX0a2bHgv2dLh7weIiK2SLgHuS/k+HRGVRSM+yM6lw+9IDwYow8zMzMyNJTPrvIg4s8amEwryBnBejeMsBBYWpN8PHFWQ/lRRGWZmZmbgYXhmZmZmZmaF3FgyMzMzMzMr4MaSmZmZmZlZATeWzMzMzMzMCniBB7Nh5JUOzczMzEYON5bMzEaIVt6IuIQ3ITYzMyudIQ/DkzRF0t2S1kpaI+nDKf1ASSskPZ7+HpDSJekqSb2SHpZ0TO5Yc1P+xyXNzaUfK2l12ueqdENJMzMzs5aRtD793nhI0v0pzb9nzKypOUv9wPyIeBMwEzhP0hHAAuDOiJgG3JleA5wMTEuPecA1kFVGZDegPB44DrioUiGlPPNy+81qIl4zMzOzWn4/Io6OiBnptX/PmNnQG0sRsSkiHkzPtwNrgUnAbGBRyrYIOC09nw3cGJmVwARJE4GTgBURsTUitgErgFlp234RcU+6CeWNuWOZmZmZtZN/z5hZa+YsSZoKvAVYBXRFxCbIGlSSDknZJgEbcrv1pbSB0vsK0ovKn0fWY0NXVxc9PT2Dxty1F8yf3j9ovk4oc2zg+JpR5th6enrYsWNHXd8fM7NRJoDvSArgnyLiWkbI7xmg9HV3q+Jrx/+frfx/uR2fwVj5bNulFfE13ViStA/wTeAjEfHsAMNwizbEENJfnZhVatcCzJgxI7q7uweJGr50021csbqc61vMn95f2tjA8TWjzLGtP6ubnp4e6vn+mJmNMm8GSkZeAAAgAElEQVSLiI2pQbRC0g8HyFuq3zNA6evuVsXXjtVkW/n/8vqzultynLyx8tm2Syvia+o+S5J2I2so3RQRt6bkzemSM+nvlpTeB0zJ7T4Z2DhI+uSCdDMzM7OWiYiN6e8W4Ftkc478e8bMmloNT8D1wNqI+Hxu01KgsgLMXOC2XPrZaRWZmcAz6fL2cuBESQekiZAnAsvTtu2SZqayzs4dy8zMzKxpkvaWtG/lOdnvkEfw7xkzo7lheG8D3geslvRQSvsYcBmwRNK5wE+BM9K2ZcApQC/wPPB+gIjYKukS4L6U79MRsTU9/yBwA7AXcEd6mJmZmbVKF/CtNI1gPPD1iPgPSffh3zNmY96QG0sR8T2Kx+ECnFCQP4DzahxrIbCwIP1+4KihxmhmI5+k9cB24GWgPyJmpCV6bwamAuuB90TEttRr+0WyHzLPA+dUVu1M9zz5RDrsZyJiUUo/lp0/YpYBH071lZmNARGxDnhzQfpT+PeM2ZjX1JwlM7Nh4vufmJmZ2bBzY8nMRiLf/8TMzMzarpxrGJuZ7TRi73/SzL072n1PrqHcd6KM99MoY0xQzrgck5lZ49xYMrOyG7H3P2nmniDtvifXUO4HUsb7aZQxJihnXI7JzKxxHoZnZqXm+5+YmZlZp7ixZGal5fufmJmZWSd5GJ6ZlZnvf2JmZmYd48aSmZWW739iZmZmneRheGZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAm4smZmZmZmZFXBjyczMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVmB8Z0OwMzMzMxspJu64P
aWHWv9Zae27FjWHF9ZMjMzMzMzK1D6K0uSZgFfBMYBX42IyzockpmNMq5nzKzdxmI9U7nSMn96P+e08KqL2XAqdWNJ0jjgauAPgT7gPklLI+LRzkZmZqPFWK1nhjJcZKAfPB4yYlbbWK1nzEaDsg/DOw7ojYh1EfESsBiY3eGYzGx0cT1jZu3mesZshCr1lSVgErAh97oPOL5DsZjZ6OR6pgVaObEZfKXKRh3XM9aQVg9hdJ06dGVvLKkgLV6VSZoHzEsvd0h6rI5jHww82URsbfOhEscGjq8ZZY5NlwP1xfcbbQ9meLWznhmyMv5bGc6Y0r/HepTuPCVljGskxeR6prF6poyf7SvKWJ9VlDk2aF18DdSpjSr1+WPg+OqqZ8reWOoDpuReTwY2VmeKiGuBaxs5sKT7I2JGc+G1R5ljA8fXjDLHBuWPr03aVs80o4yfhWOqXxnjckwd1dZ6puznsczxlTk2cHzNakV8ZZ+zdB8wTdJhknYH5gBLOxyTmY0urmfMrN1cz5iNUKW+shQR/ZLOB5aTLbW5MCLWdDgsMxtFXM+YWbu5njEbuUrdWAKIiGXAsjYcetiG0wxBmWMDx9eMMscG5Y+vLdpYzzSjjJ+FY6pfGeNyTB3U5nqm7OexzPGVOTZwfM1qOj5FvGp+oZmZmZmZ2ZhX9jlLZmZmZmZmHTHmGkuSZkl6TFKvpAXDXPZ6SaslPSTp/pR2oKQVkh5Pfw9I6ZJ0VYrzYUnH5I4zN+V/XNLcIcayUNIWSY/k0loWi6Rj03vtTfsWLZvaaHwXS3oinb+HJJ2S23ZhKusxSSfl0gs/7zTJdlWK++Y04bbe2KZIulvSWklrJH24TOdvgPhKcf5sYJ2so6riqLu+anMcLamrhiGmhr9fLY6pZfXSMMXV0fM1mkj6W0kh6eD0elg+2zri+t+Sfphi+JakCbltpfiMy1Lf5uJp6HvcoRjHSfq+pG+n16X5PSBpgqRb0r+7tZLe2pJzFxFj5kE2qfJHwOuB3YEfAEcMY/nrgYOr0j4HLEjPFwCXp+enAHeQ3ZthJrAqpR8IrEt/D0jPDxhCLG8HjgEeaUcswL3AW9M+dwAntyC+i4G/Lch7RPos9wAOS5/xuIE+b2AJMCc9/0fggw3ENhE4Jj3fF/jvFEMpzt8A8ZXi/Pkx4GfX0TqqKpb11FlftTmOpuuqYYqpoe9XG2JqSb00jHF19HyNlgfZcuTLgZ9Uvq/D9dnWEduJwPj0/PLcv71SfMZlqm9zMTX0Pe5QjB8Fvg58O70uze8BYBHw5+n57sCEVpy7sXZl6TigNyLWRcRLwGJgdodjmk324ZL+npZLvzEyK4EJkiYCJwErImJrRGwDVgCzGi00Ir4LbG1HLGnbfhFxT2T/Om/MHauZ+GqZDSyOiBcj4sdAL9lnXfh5SxLwDuCWgvdaT2ybIuLB9Hw7sJbs7uylOH8DxFfLsJ4/G1AZ66i8Wv/G26ZFddVwxFRLre9Xq2NqVb00XHHVMiznaxS5Evh7dr3B7bB8toOJiO9ERH96uZLs3lKV+MrwGZeuvh3C93hYSZoMnAp8Nb0uze8BSfuRdWRdDxARL0XE07Tg3I21xtIkYEPudR8DV9qtFsB3JD2g7C7dAF0RsQmyLwlwSEqvFWs730OrYpmUnrcjxvPTJf2FuUupjcZ3EPB0rhIfcnySpgJvAVZRwvNXFR+U7PzZq3S6jsprpL4abo1+14ZLI9+vtmmyXhquuKAk52ukkvQu4ImI+EHVpjKeww+QXe2C8sRXljgK1fk9Hm5fIGuc/yq9LtPvgdcDPwf+OQ0T/KqkvWnBuRtrjaWieR/DuRzg2yLiGOBk4DxJbx8gb61YO/EeGo2lXTFeA/wmcDSwCbiik/FJ2gf4JvCRiHh2oKwlia9U588KlencNlJflUUnz1+j36+2aEG91BYtqI/GJEn/KemRgsds4OPAJ4t2K0hryzkcJL5Kno8D/cBNwx3fIMoSx6s08D0eNp
LeCWyJiAfyyQVZO3UOx5MNj74mIt4CPEc27K4lBx5L+sjG91ZMBjYOV+ERsTH93SLpW2SXgDdLmhgRm9Jl8i2DxNoHdFel97QoxFbF0sfOy+35/E2JiM2V55KuA749SHzUSH+SbFjC+NQb0nB8knYjq8huiohbU3Jpzl9RfGU6f1ZTR+uovAbrq+HW6Het7Yb4/WqpFtVLwxJXGc7XSBARf1CULmk62XyfH2QjoZgMPCjpOIbxHNaKLxfnXOCdwAlpWDnDGd8gyhLHLhr8Hg+ntwHvUrYYy57AfmRXmsrye6AP6IuIypXrW8gaS02fu7F2Zek+YFpauWN3YA6wdDgKlrS3pH0rz8kmPj6Syq+sgjYXuC09XwqcrcxM4Jl0+XA5cKKkA9KwhRNTWiu0JJa0bbukmWk869m5Yw1Z1Zjrd5Odv0p8cyTtIekwYBrZAgmFn3eqsO8GTi94r/XEIbIxsWsj4vO5TaU4f7XiK8v5swF1rI7KG0J9Ndwa/a613RC+X60uv1X10rDE1enzNdJFxOqIOCQipkbEVLIfisdExM/o4PcgT9Is4ALgXRHxfG5TWT7jUtS3eUP4Hg+biLgwIianf29zgLsi4ixK8nsg/dvfIOmNKekE4FFace6ig6t+dOJBtkrMf5OtgPLxYSz39WQrrfwAWFMpm2y8553A4+nvgSldwNUpztXAjNyxPkA2IbIXeP8Q4/kG2dCHX5JVsue2MhZgBtl/fj8Cvky6AXKT8X0tlf9w+sc/MZf/46msx8itHFfr806fx70p7n8F9mggtv9Bdpn5YeCh9DilLOdvgPhKcf78GPTz60gdVRVDQ/VVm2NpSV01DDE1/P1qcUwtq5eGKa6Onq/R9iC3euVwfbZ1xNRLNieo8rn/Y9k+4zLUt1XxNPQ97mCc3excDa80vwfIhvXen87fv5GtNNz0uVM6uJmZmZmZmeWMtWF4ZmZmZmZmdXFjyczMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAm4sWSlImiopJI3vdCxm1nqSflfSY52OYzhIukHSZ9LztrzvVF8e3urjmo1krmdaXobrGdxYsmEmab2kP+h0HGZj0XB+/6r/k42I/4qIN7bw2KslvSaX9hlJN7Ti+K3UyvddizKXS3oqPT4nSe0s06wW1zPDb5jqmd+XdLekZyStb2dZZePGkpmZjUSHAnOaPcgouZo9DzgNeDPw28A7gb/saERmo4PrmZ2eAxYCf9fpQIabG0vWEZLGSfoHSU9KWgec2umYzMYqSX8hqVfSVklLJR2a23akpBVp22ZJH0vpx0m6R9LTkjZJ+rKk3dO276bdfyBph6Q/ldQtqS933DdJ6kn7r5H0rty2GyRdLel2SdslrZL0m1Vhfw74VK0fIZLelY77dCrnTblt6yVdIOlh4DlJ41Pa30l6WNJzkq6X1CXpjhTDf0o6IHeMf5X0s9TL+l1JR9aI45X3nc7DjtzjRUk9adseqU78aTrP/yhpr9xx/i6d542SPlBVzFzgiojoi4gngCuAc4riMesU1zMju56JiHsj4mvAuqIYRjM3lqxT/oKs9/MtwAzg9M6GYzY2SXoH8P8B7wEmAj8BFqdt+wL/CfwHWQ/r4cCdadeXgb8BDgbeCpwA/BVARLw95XlzROwTETdXlbkb8O/Ad4BDgL8GbpKUH0ZyJvAp4ACgF7i0KvRbgWcpaBRIegPwDeAjwOuAZcC/V35k5Y5/KjAhIvpT2p8Afwi8Afgj4A7gY+k9vgb4UG7/O4BpKf4HgZuq46gWETen87EP2flcl+IEuDyVezTZeZ4EfDK9n1nA36bYpgHVQ5yOBH6Qe/2DlGZWCq5nRkU9M2a5sWSd8h7gCxGxISK2klWiZjb8zgIWRsSDEfEicCHwVklTyTo0fhYRV0TECxGxPSJWAUTEAxGxMiL6I2I98E/A79VZ5kxgH+
CyiHgpIu4Cvk32w6Li1tST2U/2A+HoqmME8L+AT0rao2rbnwK3R8SKiPgl8A/AXsDv5PJcleqfX+TSvhQRm9PVmf8CVkXE99N5+RZZ5w7p/S9M5+NF4GLgzZL2r+fNK5sD8XWgJyL+SZLIOpD+JiK2RsR24LPsHP7zHuCfI+KRiHgulZe3D/BM7vUzwD7puGZl4Hpmp5Faz4xZo2EMpY1MhwIbcq9/0qlAzMa4Q8l6LAGIiB2SniLrcZwC/Khop9Sr+nmyK8OvJfv/5IEGytwQEb/Kpf0klVnxs9zz58l+9OwiIpZJ+inZnJ3q4/8kl+9XkjZUHX8Dr7Y59/wXBa/3gWwYMVkP9BlkPcqV93EwuzZaarkU2JedPcivIzuHD+TaNwLG5d5P/txW15c7gP1yr/cDdkRE1BGL2XBwPbPTSK1nxixfWbJO2URWQVb8eqcCMRvjNgK/UXkhaW/gIOAJsv/oq8fwV1wD/BCYFhH7kQ0jqfdKxkZginKrTJHVAU80FjoAnwA+TvYjIH/8/HsSWX2TP34zDYn3ArPJhqnsD0ytFDXYjpLmkPVsn556owGeJPuRdGRETEiP/dMwGhi8vlxDtrhDxZtTmllZuJ5pXNnqmTHLjSXrlCXAhyRNTpMZF3Q6ILMxYjdJe1YeZN/F90s6Og0z+SzZsJD1ZENWfk3SR9LE4H0lHZ+Osy/ZWP4dkn4L+GBVOZuB19eIYRXZykp/L2k3Sd1kY/cXN/pmIqIHWE22yEHFEuBUSSekeQvzgReB/9fo8WvYNx3vKbIfT5+tZydJbwG+BJwWET+vpKee7+uAKyUdkvJOknRS7v2cI+kISa8FLqo69I3AR9M+h5K93xuG+ubMWsD1TPNKVc9Iek36LHfLXmrPqvlZo5YbS9Yp1wHLySYiP0g2idLM2m8ZWe9i5fG7ZGPyv0nWs/ibpDHsaUz7H5L9wPgZ8Djw++k4f0vW87md7Pu8y+RqsvHui5StEvWe/IaIeAl4F3AyWW/nV4CzI+KHQ3xPnwAOzB3/MeDPyH4wPJni/6NUbivcSDZE5QngUWBlnfvNJptI/r3cSlV3pG0XkE0wXynpWbIJ729M7+cO4AvAXSnPXVXH/SeyieyrgUeA21OaWae4nmle2eqZt5N9lsvIrjr9gmzxjFFPHtJsZmZmZmb2ar6yZGZmZmZmVsCNJTMzMzMzswJuLJmZmZmZmRVwY8nMzMzMzKyAG0tmZmZmZmYFxnc6gFY7+OCDY+rUqa+8fu6559h77707Fs9YL78MMXS6/DLE0MryH3jggScj4nUtOdgIVV3PDEWn/020g9/TyDAS3pPrmaHXM2X8fB3T4MoWD4z+mOquZyJiVD2OPfbYyLv77rujk8Z6+WWIodPllyGGVpYP3B8l+K538lFdzwxFp/9NtIPf08gwEt6T65mh1zNl/Hwd0+DKFk/E6I+p3nrGw/DMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAm4smZmZmZmZFRh1q+GZjRVTF9xed9750/s5Z5D86y87tdmQbARp5N9PPfzvx2x0aGXd4HrBRgNfWTIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKuLFkZmZmZmZWwI0lMzMzG/UkTZF0t6S1ktZI+nBKP1DSCkmPp78HpHRJukpSr6SHJR2TO9bclP9xSXNz6cdKWp32uUqSBirDzMrPjSUzMzMbC/qB+RHxJmAmcJ6kI4AFwJ0RMQ24M70GOBmYlh7zgGsga/gAFwHHA8cBF+UaP9ekvJX9ZqX0WmWYWcm5sWRmZmajXkRsiogH0/PtwFpgEjAbWJSyLQJOS89nAzdGZiUwQdJE4CRgRURsjYhtwApgVtq2X0TcExEB3Fh1rKIyzKzk3FgyMzOzMUXSVOAtwCqgKyI2QdagAg5J2SYBG3K79aW0gdL7CtIZoAwzKznflNbMzMzGDEn7AN8EPhIRz6ZpRYVZC9JiCOmNxDaPbBgfXV1d9PT0NLI7AD
t27BjSfhXzp/cPed9qlTiajakdyhZT2eIBx1ThxpKZmZmNCZJ2I2so3RQRt6bkzZImRsSmNJRuS0rvA6bkdp8MbEzp3VXpPSl9ckH+gcrYRURcC1wLMGPGjOju7i7KNqCenh6Gsl/FOQtuH/K+1daflcXRbEztULaYyhYPOKYKD8Mzs47zKlVm1m7pO389sDYiPp/btBSo1BVzgdty6Wen+mYm8EwaQrccOFHSAam+OBFYnrZtlzQzlXV21bGKyjCzknNjyczKwKtUmVm7vQ14H/AOSQ+lxynAZcAfSnoc+MP0GmAZsA7oBa4D/gogIrYClwD3pcenUxrAB4Gvpn1+BNyR0muVYWYl52F4ZtZxqUe2Mvl5u6T8KlXdKdsisqEuF5BbpQpYKamySlU3aZUqAEmVVap6SKtUpfTKKlV3DFCGmY0iEfE9iucVAZxQkD+A82ocayGwsCD9fuCogvSnisows/Ib9MqSpD0l3SvpB2l4zKdS+mGSVqWhKzdL2j2l75Fe96btU3PHujClPybppFz6rJTWK2lBLr2wDDMbvbxKlZmZmZVFPVeWXgTeERE70sTI70m6A/gocGVELJb0j8C5ZMNczgW2RcThkuYAlwN/mobUzAGOBA4F/lPSG1IZV5Ndlu4D7pO0NCIeTfsWlWFmo9BoX6Uqr9OrDLVyxSvIJt12+j21g9+TmdnYNmhjKV2G3pFe7pYeAbwDeG9KXwRcTNaQmZ2eA9wCfDlNdJwNLI6IF4EfS+olm1MA0BsR6wAkLQZmp2E4tcows1FmLKxSldfpVYZaueIVZKtedfo9tYPfk5nZ2FbXnCVJ44AHgMPJrgL9CHg6Iipdk/khLa8Mg4mIfknPAAel9JW5w+b3qR42c3zap1YZ1fHV7PHtdA/aWC+/DDF0uvx2xdDIlYGuvQbP38lzVMcqVZfx6lWqzk+dK8eTVqmStBz4bG5RhxOBCyNiq6TtaUWrVWSrVH1pkDLMzMxsjKursRQRLwNHS5oAfAt4U1G29LfRYTBF86YaGjYzUI9vp3vQxnr5ZYih0+W3K4ZGrgzMn97PFasH/rpX7ofRIZVVqlZLeiilfYysAbNE0rnAT4Ez0rZlwClkK049D7wfslWqJFVWqYJXr1J1A7AX2cIO+VWqisowMzOzMa6h1fAi4um0qtRMYIKk8enKT35IS2V4TJ+k8cD+wFZqD5uhRvqTA5RhZqOIV6kyMzOzMqpnNbzXpStKSNoL+ANgLXA3cHrKVj08pnLjtdOBu9IPm6XAnLRa3mFk9zm5l6wHeFpa+W53skUglqZ9apVhZmZmZmbWVvVcWZoILErzll4DLImIb0t6FFgs6TPA98nmG5D+fi0t4LCVrPFDRKyRtAR4lOwGlOel4X1IOp/sjtjjgIURsSYd64IaZZiZmZmZmbVVPavhPUx2z5Pq9HXsXM0un/4CNcb8R8SlwKUF6cvI5iDUVYaZmZmZmVm7DToMz8zMzMzMbCxyY8nMzMzMzKyAG0tmZmZmZmYF3FgyMzMzMzMr4MaSmZmZmZlZATeWzMzMzMzMCrixZGZmZmZmVsCNJTMzMzMzswJuLJmZmZmZmRVwY8nMzMzMzKyAG0tmZmZmZmYF3FgyMzMzMzMr4MaSmZmZjXqSFkraIumRXNrFkp6Q9FB6nJLbdqGkXkmPSToplz4rpfVKWpBLP0zSKkmPS7pZ0u4pfY/0ujdtnzo879jMWsGNJTMzMxsLbgBmFaRfGRFHp8cyAElHAHOAI9M+X5E0TtI44GrgZOAI4MyUF+DydKxpwDbg3JR+LrAtIg4Hrkz5zGyEcGPJzMzMRr2I+C6wtc7ss4HFEfFiRPwY6AWOS4/eiFgXES8Bi4HZkgS8A7gl7b8IOC13rEXp+S3ACSm/mY0A4zsdgJmZmVkHnS/pbOB+YH5EbAMmAStzefpSGsCGqvTjgYOApyOivyD/pMo+EdEv6ZmU/8nqQCTNA+YBdHV10dPT0/Cb2bFjx5D2q5
g/vX/wTHWqxNFsTO1QtpjKFg84pgo3lsys4yQtBN4JbImIo1LaxcBfAD9P2T6WGyJzIdnQlpeBD0XE8pQ+C/giMA74akRcltIPI+sBPhB4EHhfRLwkaQ/gRuBY4CngTyNifdvfsJmVxTXAJUCkv1cAHwCKrvwExSNyYoD8DLJt18SIa4FrAWbMmBHd3d0DhF6sp6eHoexXcc6C24e8b7X1Z2VxNBtTO5QtprLFA46pwsPwzKwMbsBzCcxsmEXE5oh4OSJ+BVxHNswOsitDU3JZJwMbB0h/EpggaXxV+i7HStv3p/7hgGbWYW4smVnHeS6BmXWCpIm5l+8GKivlLQXmpJXsDgOmAfcC9wHT0sp3u5N13CyNiADuBk5P+88Fbssda256fjpwV8pvZiOAh+GZWZmVZi6BmY1skr4BdAMHS+oDLgK6JR1NNixuPfCXABGxRtIS4FGgHzgvIl5OxzkfWE423HdhRKxJRVwALJb0GeD7wPUp/Xrga5J6yTqF5rT5rZpZC7mxZGZlVaq5BK2YeJ3X6YmzrZzEDdk48k6/p3bwexo9IuLMguTrC9Iq+S8FLi1IXwYsK0hfx85hfPn0F4AzGgrWzErDjSUzK6WI2Fx5Luk64NvpZa05A9RIf2UuQbq6VDSXoG+wuQStmHid1+mJs62cxA3ZRO5Ov6d28HsyMxvbPGfJzErJcwnMzMys03xlycw6znMJzMzMrIzcWDKzjvNcAjMzMysjD8MzMzMzMzMr4MaSmZmZmZlZgUEbS5KmSLpb0lpJayR9OKUfKGmFpMfT3wNSuiRdJalX0sOSjskda27K/7ikubn0YyWtTvtcVbkpZK0yzMzMzMzM2q2eK0v9ZDeDfBMwEzhP0hHAAuDOiJgG3JleA5xMtjrVNLJ7klwDWcOHbNL28WRzBy7KNX6uSXkr+81K6bXKMDMzMzMza6tBG0sRsSkiHkzPtwNrye56PxtYlLItAk5Lz2cDN0ZmJdn9TSYCJwErImJrRGwDVgCz0rb9IuKetGTvjVXHKirDzMzMzMysrRqasyRpKvAWYBXQFRGbIGtQAYekbJOADbnd+lLaQOl9BekMUIaZmZmZmVlb1b10uKR9gG8CH4mIZ9O0osKsBWkxhPS6SZpHNoyPrq4uenp6Xtm2Y8eOXV4Pt7Fefhli6HT57Yph/vT+uvN27TV4/k6fIzMzM7OyqauxJGk3sobSTRFxa0reLGliRGxKQ+m2pPQ+YEpu98nAxpTeXZXek9InF+QfqIxdRMS1wLUAM2bMiO7uncX09PSQfz3cxnr5ZYih0+W3K4ZzFtxed9750/u5YvXAX/f1/z979x9nV1Xf+//1NgFEfkiQksYEDdaAIihCLsTS2qlUCOg19HFFg1QSipcWwWKb3hKst1AQi/0WLChFESKJPwgUtaQSjBGZy7WXYAICASNmgBQGIlECMRHFBj/fP9Y6cDjZZ+bMzPmxz5n38/E4jzln7bX3Wmt+rNnrx17r5L4x5sjMzMystzSyGp5Im0Oui4hLqw4tAyor2s0DbqoKPyWvijcL2JKn0K0AjpE0KS/scAywIh/bKmlWTuuUmmsVpWFmZmZmZtZSjYwsHQV8EFgr6Z4c9jHgYuAGSacBjwIn5mPLgeOBAeBZ4FSAiNgs6UJgdY53QURszu/PAK4FdgVuyS+GSMPMzMzMzKylhm0sRcT3KH6uCODogvgBnFnnWouARQXha4CDC8KfKkrDzMzMzMys1Ua0Gp6ZmZmZmdl44caSmZmZmZlZATeWzMzMzMzMCrixZGZmZmZmVsCNJTMzMzMzswJuLJmZmdm4IGmRpE2S7q8K21vSSknr89dJOVySLpc0IOk+SYdVnTMvx18vaV5V+OGS1uZzLs/7R9ZNw8zKz40lMzMzGy+uBWbXhC0Ebo2IGcCt+TPAccCM/DoduBJSwwc4DzgSOAI4r6rxc2WOWzlv9jBpmFnJubFkZmZm40JE3A5srgmeAyzO7xcDJ1SFL4
lkFbCXpCnAscDKiNgcEU8DK4HZ+dieEXFH3nNySc21itIws5JzY8nMOs5TY8ysgyZHxEaA/JSbnPwAACAASURBVHXfHD4VeKwq3mAOGyp8sCB8qDTMrOQmdjoDZmakqTGfJfXEVlSmrVwsaWH+fA4vnRpzJGnay5FVU2NmAgHcJWlZ7vmtTI1ZBSwnTY25ZYg0zMxUEBajCG88Qel0Ul3F5MmT6e/vH8npAGzbtm1U51UsOGT7qM+tVcnHWPPUCmXLU9nyA85ThRtLZtZxEXG7pOk1wXOAvvx+MdBPasi8MDUGWCWpMjWmjzw1BkBSZWpMP3lqTA6vTI25ZYg0zGz8eFLSlIjYmOuSTTl8ENivKt404Ikc3lcT3p/DpxXEHyqNl4iIq4CrAGbOnBl9fX1F0YbU39/PaM6rmL/w5lGfW2vDySkfY81TK5QtT2XLDzhPFZ6GZ2Zl5akxZtYOy4DKtN15wE1V4afkqb+zgC25nlgBHCNpUp66ewywIh/bKmlWnup7Ss21itIws5LzyJKZdZu2T42B5kyPqdbp6Q3NnGoDqbev02VqBZept0i6jjQqtI+kQdLU3YuBGySdBjwKnJijLweOBwaAZ4FTASJis6QLgdU53gWVEW3gDNK04l1Jo9e35PB6aZhZybmxZGZlVZqpMdCc6THVOj29oZlTbSBNt+l0mVrBZeotEXFSnUNHF8QN4Mw611kELCoIXwMcXBD+VFEaZlZ+noZnZmXlqTFmZmbWUR5ZMrOO89QYMzMzKyM3lsys4zw1xszMzMrI0/DMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAm4smZmZmZmZFXBjyczMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAsM2liQtkrRJ0v1VYXtLWilpff46KYdL0uWSBiTdJ+mwqnPm5fjrJc2rCj9c0tp8zuWSNFQaZmZmZmZm7dDIyNK1wOyasIXArRExA7g1fwY4DpiRX6cDV0Jq+ADnAUcCRwDnVTV+rsxxK+fNHiYNMzMzMzOzlhu2sRQRtwOba4LnAIvz+8XACVXhSyJZBewlaQpwLLAyIjZHxNPASmB2PrZnRNwREQEsqblWURpmZmZmZmYtN3GU502OiI0AEbFR0r45fCrwWFW8wRw2VPhgQfhQaexA0umk0SkmT55Mf3//C8e2bdv2ks/tNt7TL0MeOp1+q/Kw4JDtDcedvOvw8Tv9PTIzMzMrm9E2lupRQViMInxEIuIq4CqAmTNnRl9f3wvH+vv7qf7cbuM9/TLkodPptyoP8xfe3HDcBYds55K1Q/+5bzi5b4w5MjMzMxuZ6SO8nxnu/mfDxe8aa5ZeYrSNpSclTckjPlOATTl8ENivKt404Ikc3lcT3p/DpxXEHyoNM7OuMNQ/gEYq/FrN/gdgZmZmQxvt0uHLgMqKdvOAm6rCT8mr4s0CtuSpdCuAYyRNygs7HAOsyMe2SpqVV8E7peZaRWmYmZmZmZm1XCNLh18H3AEcKGlQ0mnAxcA7Ja0H3pk/AywHHgYGgC8AHwaIiM3AhcDq/LoghwGcAVydz3kIuCWH10vDzMzMrGkkbcjbmNwjaU0Oa/k2KWZWfsNOw4uIk+ocOrogbgBn1rnOImBRQfga4OCC8KeK0jCz8UXSBmAr8DywPSJm5u0IrgemAxuA90XE0/kG5DLgeOBZYH5E3J2vMw/4eL7sJyJicQ4/nLRFwq6kDp+zc11mZuPLH0bEz6o+V7YwuVjSwvz5HF66TcqRpC1QjqzaJmUm6fnruyQty6sAV7ZJWUWqZ2bzYuewmZXYaKfhmZm10x9GxKERMTN/bsdeb2Y2vrVjmxQzK7lmr4ZnZtYOc3hx0ZjFpAVjzqHqJgZYJalyE9NHvokBkFS5iekn38Tk8MpNjHt8zcaXAL4tKYDP51V227FNyksMtRVKo8a6VcVItqUYTiUfZd
jCo1bZ8lS2/ED78lT2rVDcWDKzsivFTYxZxUiWuW2EVzkshaMi4olcl6yU9KMh4rZsm5ShtkJp1Fi3qhjpKp1DqWxJUYYtPGr19/cz/1u/aOo1x/K3XNbvUTvyVPatUNxYMrOyK8VNzGh6fIfq/Wqkd6xWM3vLmtl7DClvZewZHauiMrXie9dOvfhzGquIeCJ/3STpG6Tpuu3YJsXMSs6NJTMrtbLcxIymx3eo3rJGesdqNbO3rJm9x5DyVsae0bEqKlMrvnft1Is/p7GQtBvwsojYmt8fA1zAi1uYXMyO26ScJWkp6TnILbkuWgF8sup5yGOAcyNis6SteUuVO0nbpHymXeUzs7HxAg9mVlqSdpO0R+U96ebjftqz15uZjQ+Tge9Juhf4PnBzRHyL9myTYmYl55ElMyuzycA38pYkE4GvRsS3JK0Gbsj7vj0KnJjjLyctGz5AWjr8VEg3MZIqNzGw403MtaSlw2/BNzFm40pEPAy8pSC8cAuTZm6TYmbl58aSmZWWb2LMzMyskzwNz8zMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAm4smZmZmZmZFXBjyczMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKTOx0BszMzKycpi+8uanX23Dxu5p6PTPrjGbXDWXmkSUzMzMzM7MCbiyZmZmZmZkV8DQ8MzMzsy5VOx1qwSHbmT+OpkiZtZobS2ZtNJ7m+JqZmZl1OzeWzMysdJrZseBFBcx6QzPrhQWHbMe3wdaI0j+zJGm2pAclDUha2On8mFnvcT1jZq3mesasO5W6SS1pAnAF8E5gEFgtaVlE/LCzOTOzXuF6xsxazfXM+DCWka/aZ808Il4epW4sAUcAAxHxMICkpcAcwJWLmTWL6xkzazXXMzYi3uOsPMreWJoKPFb1eRA4skN5sRZox4IHZVgZqAx5sLpcz/S4Zvb2doPhytuNZeoBrmfMupQiotN5qEvSicCxEfGh/PmDwBER8ZGaeKcDp+ePBwIPVh3eB/hZG7Jbz3hPvwx56HT6ZchDM9N/bUT8VpOu1XFNqmdGo9O/E63gMnWHbiiT65nR1zNl/Pk6T8MrW36g9/PUUD1T9pGlQWC/qs/TgCdqI0XEVcBVRReQtCYiZrYme8Mb7+mXIQ+dTr8Meeh0+iU35npmNHrxZ+IydYdeLFMXaFs9U8afr/M0vLLlB5ynirKvhrcamCFpf0k7A3OBZR3Ok5n1FtczZtZqrmfMulSpR5YiYruks4AVwARgUUQ80OFsmVkPcT1jZq3mesase5W6sQQQEcuB5WO4RNOmzTj9Uet0HjqdPnQ+D51Ov9SaUM+MRi/+TFym7tCLZSq9NtYzZfz5Ok/DK1t+wHkCSr7Ag5mZmZmZWaeU/ZklMzMzMzOzjujZxpKkRZI2Sbq/Q+nvJ+k2SeskPSDp7Dan/3JJ35d0b07/79uZflU+Jkj6gaRvdij9DZLWSrpH0poOpL+XpBsl/Sj/LrytzekfmMteef1c0kfbmQdLJO0taaWk9fnrpDrxnq/6eZXyAXBJsyU9KGlA0sKC47tIuj4fv1PS9PbncmQaKNN8ST+t+tl8qBP5bNRw/wOVXJ7Le5+kw9qdR2uORuuWHHdPSY9L+myn8yTpUEl35HuU+yS9vwX5KF1d1UCe/krSD/P35FZJr+10nqrivVdSSGr5anSN5EnS+/L36gFJX21ZZiKiJ1/A24HDgPs7lP4U4LD8fg/gx8BBbUxfwO75/U7AncCsDnwf/gr4KvDNDv0cNgD7dCLtnP5i4EP5/c7AXh3MywTgJ6R9BTqSh/H8Av4RWJjfLwQ+VSfetk7ndZhyTAAeAl6Xf6fvra3bgA8Dn8vv5wLXdzrfTSjTfOCznc7rCMo05P9A4Hjglvy/YhZwZ6fz7Neof9YN1S35+GX5f3
JLf5cbyRNwADAjv381sLGZ/yPLWFc1mKc/BF6R359RhjzleHsAtwOrgJmdzhMwA/gBMCl/3rdV+enZkaWIuB3Y3MH0N0bE3fn9VmAdaQfvdqUfEbEtf9wpv9r6gJqkacC7gKvbmW5ZSNqTdMNyDUBE/Doinulglo4GHoqI/+xgHsazOaTGM/nrCR3My1gcAQxExMMR8WtgKals1arLeiNwtCS1MY8j1UiZukoD/wPnAEvy/4pVwF6SprQnd9ZkDdUtkg4HJgPfLkOeIuLHEbE+v38C2AQ0cyPiMtZVw+YpIm6LiGfzx1WkPblaqdH670JSI/hXLc5Po3n6n8AVEfE0QERsalVmeraxVCZ5WPetpNGddqY7QdI9pApoZUS0NX3gn4G/AX7T5nSrBfBtSXcp7YzeTq8Dfgp8MU9FvFrSbm3OQ7W5wHUdTH+8mxwRGyF1pgD71on3cklrJK2SVMYG1VTgsarPg+zYEfRCnIjYDmwBXtWW3I1OI2UC+B95asyNkvYrON5NGi2zld+wdYuklwGXAP+rLHmqJukI0gjCQ03MQxnrqpH+3Z1GGgFupWHzJOmtwH4R0a5HKhr5Ph0AHCDpP/L/y9mtykzplw7vdpJ2B74GfDQift7OtCPieeBQSXsB35B0cES05RkuSe8GNkXEXZL62pFmHUdFxBOS9gVWSvpR7nFth4mkaTAfiYg7JV1Gmo7wv9uU/guUNkF8D3Buu9MeTyR9B/jtgkN/O4LLvCb/zr4O+K6ktRHRzBuIsSrqda0dtW4kTpk0kt9/B66LiOck/TmpN/odLc9Z63Tbz2hca0Ld8mFgeUQ81qyBkybVd+QRzS8B8yKimZ2rZayrGk5P0p8AM4E/aGF+YJg85Yb2p0lTkdulke/TRNJUvD7S6Nv/zfe5TZ/B48ZSC0naidRQ+kpEfL1T+YiIZyT1A7OBdi14cRTwHknHAy8H9pT05Yj4kzalD7wwtE9EbJL0DdLQbrsaS4PAYNWI3o2kxlInHAfcHRFPdij9cSEi/qjeMUlPSpoSERvzzUHhlIGq39mH89/tW2lub+tYDQLVoyrTgCfqxBmUNBF4JR2cFt2AYcsUEU9VffwC8Kk25KuVGvk5Wkk0oW55G/D7kj4M7A7sLGlbRIz6f1Iz6rs8Xf1m4ON5OmgzlbGuaujvTtIfkRqdfxARz7UwP43kaQ/gYKA/N7R/G1gm6T0R0aqFsxr92a2KiP8CHpH0IKnxtLrZmfE0vBbJc16vAdZFxKUdSP+38ogSknYF/gj4UbvSj4hzI2JaREwnTf/6brsbSpJ2k7RH5T1wDO1rLBIRPwEek3RgDjoa+GG70q9xEp6C12nLgHn5/TzgptoIkiZJ2iW/34fU6dCp35l6VgMzJO2fRyznkspWrbqs7yX9/Zd51GLYMtU8z/Me0nOo3WwZcIqSWcCWyrQp6zrD1i0RcXJEvCb/T/5r0vNqrey8a6S+2xn4Rs7Lv7YgD2Wsqxqpa94KfB54Tyufw2k0TxGxJSL2iYjp+fdnVc5bK1cYbuRn92+kxTAq/y8PAB5uSW5atXJEp1+kG8ONwH+RWp+ntTn93yMNGd4H3JNfx7cx/TeTVgm5j9RA+LsO/iz66MBqeKRnhu7NrweAv+1AHg4F1uSfw7+RV21pcx5eATwFvLJTvwN+BaR58LcC6/PXvXP4TODq/P53gbX5d3Ztu+utEZTleNIKnw9V/q6AC0j/QCGNJv8rMAB8H3hdp/PchDL9Q65H7gVuA97Q6TwPU54d/gcCfw78eT4u4Ipc3rW0eHUrv1r6sx62bqmJP5/Wr4bXSH33J/n3856q16FNzkfp6qoG8vQd4Mmq78myTuepJm5/O+qLBr5PAi4ldSiuBea2Ki/KCZqZmZmZmVkVT8MzMzMzMzMr4MaSmZmZmZlZATeWzMzMzMzMCrixZGZmZmZmVsCNJbNxQtIiSZskDbt8uqRPS7onv34sqembvJmZmZmVnVfDMxsnJL0d2E
ba0+LgEZz3EeCtEfGnLcucmZmZWQl5ZMlsnIiI26nZmVzS70j6lqS7JP1fSW8oONUb2pqZmdm4NLHTGTCzjrqKtEnleklHAv8CvKNyUNJrgf2B73Yof2ZmZmYd45GlcUTSBkl/1Ol8dIKk8yV9Ob9/jaRtkiY0OY2u+v5K2h34XeBfJd0DfB6YUhNtLnBjRDzf7vyZmZmVhaSPSbp6iOPzJX2vnXmy9nBjqQtJ+j1J/0/SFkmbJf2HpP/W4TxtkPSkpN2qwj4kqb+D2SoUEY9GxO5uAPAy4JmIOLTq9caaOHPxFDwboTLWUdUkXSspJB1RFfZ6SX6I12ycyp2olddvJP2y6vPJEfHJiPhQjjs91yGjmqGl5FOSnsqvf5Sk5pbImsWNpS4jaU/gm8BngL2BqcDfA8+1MM1GK4OJwNlNSE+S/LvZYhHxc+ARSSfCC9/3t1SOSzoQmATc0aEsWhdqVx012puUKpuBTzQjL2bW/XIn6u4RsTvwKPDfq8K+0uTkTgdOAN4CvBl4N/BnTU7DmsQ3pN3nAICIuC4ino+IX0bEtyPivvyw/ndzL8XPJH1F0l5FF5F0hKQ7JD0jaaOkz0rauep4SDpT0npgvaQrJF1Sc41/l/TRqqD/D/jrIdL8XUmrc2/zakm/W3WsX9JFkv4DeBZ4XQ77RO6h3pbTe1Uu18/zNaZXXeMySY/lY3dJ+v06+XihR0jS22p6k34laUOO9zJJCyU9lL+nN0jau+o6H5T0n/nY3xb/uMpD0nWkhs+BkgYlnQacDJwm6V7gAWBO1SknAUvDS2bayAxVR71M0sfz380mSUskvRJAUp+kweoLqWpqq9JU2hslfVnSz4H5kiYoTY15SNLW/He/X47/BkkrlUa2HpT0vpp8LgbeLOkPigoh6VRJ6/J1H5b0Z1XH+vLf0N/kcmyUdIKk45WW2t8s6WNV8YesS8ys/FQ1nR+4PX99Jt87vK0g/lB10DzgkogYjIjHgUuA+S0tgI2aG0vd58fA85IWSzpO0qSqYwL+AXg18EZgP+D8Otd5HvhLYB/gbcDRwIdr4pwAHAkcRLqxOEl5xEfSPvmc6ilaa4B+4K9rE8s3BjcDlwOvAi4Fbpb0qqpoHyT1tuwB/GcOm5vDpwK/Q7rZ/yKpx3odcF7V+auBQ/Oxr5KexXl5nfIDEBF3VPUkTQJWVZXpL/L34A9I39OngStyeQ4Crsx5e3Uu07Sh0uq0iDgpIqZExE4RMS0iromIRyJidkS8JSIOiogLquKfHxELO5ln60pD1VHz8+sPgdcBuwOfHcG15wA3AnsBXwH+itSoPx7YE/hT4Fml6cArSfXAvjnOv0h6U9W1ngU+CVxUJ61NpN7ePYFTgU9LOqzq+G8DLyfVTX8HfAH4E+Bw4PeBv5P0uhy3bl1iZl3p7fnrXvke4iUzMBqog94E3Ft1yr05zErIjaUuk6dO/R4QpH/OP5W0TNLkiBiIiJUR8VxE/JTUICnsNY2IuyJiVURsj4gNpIf7a+P+Q0Rszj3D3we2kBpIkBox/RHxZM05fwd8RNJv1YS/C1gfEV/KaV4H/Aj471Vxro2IB/Lx/8phX4yIhyJiC3AL8FBEfCcitgP/Cry1qkxfjoin8vmXALsAB9b/bu7gcuAXQGWU6M+Av809P8+RGp7vVZr+817gmxFxez72v4HfjCAts540VB1FGsm8NCIejohtwLnAXDU+pe6OiPi3iPhNRPwS+BDw8Yh4MJJ7I+IpUiNnQ0R8MdcHdwNfI/3dVvs88BpJxxWU4+Zc90RE/B/g26RGUMV/ARflumopqePpsojYGhEPkEZq35zjDlWXmFnvGa4O2p10T1WxBdhd8nNLZeTGUheKiHURMT8ipgEHk3oq/1nSvpKWSno8T1P5Mukf+A4kHSDpm5J+kuN+siDuYzWfF5N6Tslfv1SQt/tJzyvUjki8mhdHiyr+k9QrWy89gOrG2C8LPu9e+SBpQZ42s0XSM8ArqVP+Wn
mKTR/wgYioNHpeC3xDaariM6SRrOeBybk8L+Q3In4BPNVIWma9rl4dxY71wH+SnnWc3OCla+uI/YCHCuK9Fjiy8reb/35PJo0GVefzOeDC/HrJTUoeFVuVp9A8Qxq9qq5PnqpaJOaX+Wu9+mmousTMes9wddA20qh1xZ7ANk97Lyc3lrpcRPwIuJZ0Q/IPpN7cN0fEnqQGTb1eiitJIzszctyPFcSt/aP9MjBHaRGANwL/Vufa5wH/k5c2hJ4gVR7VXgM8PkR6DVN6Pukc4H3ApIjYi9RTM2wvTT73QmBOHsGqeAw4LiL2qnq9PM8v3ki6Uatc4xWkqXhmVqWmjqqtB14DbCc1Mn4BvKJyQGlp/9oR6to64jHS9NxajwH/p+Zvd/eIOKMg7hdJHSt/XJX2LqRe4H8CJuf6ZDkN1Cd1DFWXmFn3Ge5+Zbg66AHS4g4Vb8lhVkJuLHWZ/MDgAknT8uf9SHNhV5Ge9dlGeuBwKvC/hrjUHsDPgW2S3gAU3US8REQMkp4L+hLwtTwNpijeAHA9aZ5+xXLgAEkfUFpY4f2kZ6G+OVy6DdqDdNP1U2CipL/jpb02hfL373rglIj4cc3hzwEXKW3MiqTfklRZAOFG4N1KSyTvDFyA/57MhqujrgP+UtL+Svt8fRK4Pk+r/THwcknvkrQT8HHSVNqhXA1cKGmGkjfn5yC/SapvPihpp/z6b5Jql8Ynp30+qbOlYuec9k+B7Xma3jGj/Z4wdF1iZt3np6Sp96+rc3y4OmgJ8FeSpkp6NbCA1KlkJeSbu+6zlbTowp2SfkG6Abmf9If298BhpBGVm4GvD3GdvwY+kK/3BVKDoRGLgUMomIJX4wLghT2Xqp4jWECarvY3wLsj4mcNpjucFaRnmn5MmtrzK4qn9dU6mjQsfqNeXBGv0rtzGbAM+LakraTv9ZG5PA8AZ5Ie3txIemB7sPbiZuPQUHXUIlLdcTvwCOnv9CMAeVT3w6QG0OOkkabh/qYuBW4gPU/0c+AaYNeI2Epq3MwljWb9BPgU9Rtf15H+jsl52Urq7LmB9Lf9AVJdMFp16xIz6z4R8SxpcZj/yNPsZtUcH64O+jzw78BaUv14cw6zEpKnR9pISHo7aTre9Kpne8zMzMzMeo5HlqxheWrM2cDVbiiZmZmZWa9zY8kakufZPgNMIa1qZWZmZmbW0zwNz8zMzMzMrIBHlszMzMzMzAq4sWRmZmZmZlZgYqcz0Gz77LNPTJ8+/SVhv/jFL9htt92KT+givVIO6J2y9Eo5oPGy3HXXXT+LiNrNQseVonqmSC/9fgxnvJR1vJQTOltW1zON1zNFuuH3tOx5LHv+wHkcq4brmYjoqdfhhx8etW677bYdwrpRr5QjonfK0ivliGi8LMCaaPLfLWn/nU3A/VVhewMrgfX566QcLuByYAC4Dzis6px5Of56YF5V+OGk/SwG8rkaKo3hXkX1zFi+p71gvJR1vJQzorNlbUU9022vRuuZIt3we1r2PJY9fxHO41g1Ws94Gp6ZlcG1wOyasIXArRExA7g1fwY4DpiRX6cDVwJI2hs4j7TZ5xHAeZIm5XOuzHEr580eJg0zMzMzN5bMrPMi4nZgc03wHGBxfr8YOKEqfEnuGFoF7CVpCnAssDIiNkfE06SRotn52J4RcUfuSVpSc62iNMysx0jaT9JtktZJekDS2Tn8fEmPS7onv46vOudcSQOSHpR0bFX47Bw2IGlhVfj+ku6UtF7S9ZJ2zuG75M8D+fj09pXczMai555ZMrOeMTkiNgJExEZJ++bwqcBjVfEGc9hQ4YMF4UOlsQNJp5NGp5g8eTL9/f3DFmDbtm0NxesF46Ws46Wc0JNl3Q4siIi7Je0B3CVpZT726Yj4p+rIkg4C5gJvAl4NfEfSAfnwFcA7SfXJaknLIuKHwKfytZZK+hxwGmlk+zTg6Yh4vaS5Od77W1paM2sKN5bMrNuoICxGET4iEXEVcBXAzJkzo6+vb9hz+v
v7aSReLxgvZR0v5YTeK2vuGKl0jmyVtI4XO06KzAGWRsRzwCOSBkhTfAEGIuJhAElLgTn5eu8APpDjLAbOJzWW5uT3ADcCn5WkPNptZiXmxlITTF94c1Ovt+HidzX1emZd6klJU/KIzxTSAhCQenL3q4o3DXgih/fVhPfn8GkF8YdKw8zo3f9veRrcW4E7gaOAsySdAqwhjT49TWpIrao6rXpUunYU+0jgVcAzEbG9IP4LI98RsV3Slhz/ZzX5GvEIdpGxjAqufXzLqM4rcsjUV9Y9VvaRy7LnD5zHdnFjyczKahlpdbuL89ebqsLPyr25RwJbcmNnBfDJqkUdjgHOjYjNkrZKmkW6MToF+MwwaZhZj5K0O/A14KMR8XNJVwIXkkacLwQuAf6U+qPSRc97DzeK3dAI92hGsIuMZVRwfhMbyBtOrp+Hso9clj1/4Dy2ixtLZtZxkq4jjQrtI2mQtKrdxcANkk4DHgVOzNGXA8eTlgF/FjgVIDeKLgRW53gXRERl0YgzSCvu7Qrckl8MkYaZ9SBJO5EaSl+JiK8DRMSTVce/AHwzf6w3ik2d8J+RFpyZmEeXquNXrjUoaSLwSnZc1MbMSsiNJTPruIg4qc6howviBnBmnessIu3ZVBu+Bji4IPypojTMrPdIEnANsC4iLq0Kn1JZ6AX4Y+D+/H4Z8FVJl5IWeJgBfJ80SjRD0v7A46RFID4QESHpNuC9wFJ2HBGfB9yRj3/XzyuZdQc3lszMzGw8OAr4ILBW0j057GPASZIOJU2L2wD8GUBEPCDpBuCHpJX0zoyI5wEknQWsACYAiyLigXy9c4Clkj4B/IDUOCN//VJeJGIzqYFlZl1g2MaSpEXAu4FNEXFwDtsbuB6YTqpY3hcRT+dem8tIU2SeBeZHxN35nHnAx/NlPxERi3P44bw4PWY5cHbunSlMY8wlNjMzs3EnIr5H8bNDy4c45yLgooLw5UXn5RXyjigI/xWe5mvWlRrZlPZaXtztvqLe+26MyAAAIABJREFUrvfHkYapZ5BWc7kSXmhcnUd6GPsI4Lyqh7CvzHEr580eJg0zMzMzM7OWG7axFBG3s+NDiPV2vZ8DLIlkFelBxynAscDKiNicR4dWArPzsT0j4o48d3dJzbWK0jAzMzMzM2u50T6zVG/X+xf2EcgqewwMFT5YED5UGjsYbl+CVq/xvuCQ7cNHGoF6ee2FteoreqUsvVIO6K2ymJmZmTVDsxd4qLePwEjDR2S4fQlavcZ7M/ckgPr7EvTCWvUVvVKWXikH9FZZzMzMzJqhkWeWijyZp9BRs+t9vT0JhgqfVhA+VBpmZmZmZmYtN9rGUmW/ANhxH4FTlMwCtuSpdCuAYyRNygs7HAOsyMe2SpqVV9I7hR33JKhNw8zMzMzMrOUaWTr8OqAP2EfSIGlVu3q73i8nLRs+QFo6/FSAiNgs6UJgdY53QURUFo04gxeXDr8lvxgiDTMzMzMzs5YbtrEUESfVObTDrvd5Rbsz61xnEbCoIHwNcHBB+FNFaZiZmZmZmbXDaKfhmZmZmZmZ9TQ3lszMzMzMzAq4sWRmZmZmZlag2fssmZlZF5je5P3hrp29W1OvZ2ZmVgYeWTIzMzMzMyvgxpKZmZmZmVkBN5bMrNQk/aWkByTdL+k6SS+XtL+kOyWtl3S9pJ1z3F3y54F8fHrVdc7N4Q9KOrYqfHYOG5C0sP0lNDMzs7JyY8nMSkvSVOAvgJkRcTAwAZgLfAr4dETMAJ4GTsunnAY8HRGvBz6d4yHpoHzem4DZwL9ImiBpAnAFcBxwEHBSjmtmZmbmxpKZld5EYFdJE4FXABuBdwA35uOLgRPy+zn5M/n40ZKUw5dGxHMR8QgwAByRXwMR8XBE/BpYmuOaWY+RtJ+k2ySty6PVZ+fwvSWtzCPVKyVNyuGSdHkedb5P0mFV15qX46+XNK8q/HBJa/M5l+f6p24aZlZ+biyZWWlFxOPAPwGPkhpJW4C7gGciYn
uONghMze+nAo/lc7fn+K+qDq85p164mfWe7cCCiHgjMAs4M48kLwRuzSPVt+bPkEacZ+TX6cCVkBo+wHnAkaQOl/OqGj9X5riV82bn8HppmFnJeelwMyutfAMyB9gfeAb4V9INTK2onFLnWL3wog6jKAhD0umkmyAmT55Mf3//UFkHYNu2bQ3F64QFh2wfPtIIlLmszTReygmprAsOeb6p1+zk9y4iNpI6XYiIrZLWkTpH5gB9OdpioB84J4cviYgAVknaS9KUHHdlRGwGkLQSmC2pH9gzIu7I4UtIo963DJGGmZWcG0tmVmZ/BDwSET8FkPR14HeBvSRNzKNH04AncvxBYD9gME/beyWwuSq8ovqceuEvERFXAVcBzJw5M/r6+obNfH9/P43E64T5LdhnqaxlbaYy/0ybrb+/n0u+94umXnPDyX1Nvd5o5cVf3grcCUzODSkiYqOkfXO0kY5IT83va8MZIg0zKzk3lsyszB4FZkl6BfBL4GhgDXAb8F7SM0bzgJty/GX58x35+HcjIiQtA74q6VLg1aTpMd8njTjNkLQ/8DhpEYgPtKlsZtYBknYHvgZ8NCJ+nh8rKoxaEDbUSHW98JHkbcQj2EXGMgLazFHnofJQ9lHasucPnMd2cWPJzEorIu6UdCNwN+l5gx+QRnduBpZK+kQOuyafcg3wJUkDpBGlufk6D0i6Afhhvs6ZEfE8gKSzgBWklfYWRcQD7SqfmbWXpJ1IDaWvRMTXc/CTkqbkEZ8pwKYcXm9EepAXp9RVwvtz+LSC+EOl8RKjGcEuMpYR0GaOOg81klj2Udqy5w+cx3YZ0wIP3v/EzFotIs6LiDdExMER8cG8ot3DEXFERLw+Ik6MiOdy3F/lz6/Pxx+uus5FEfE7EXFgRNxSFb48Ig7Ixy7qRBnNrPXyynTXAOsi4tKqQ5URadhxpPqUvCreLGBLnkq3AjhG0qT8XOUxwIp8bKukWTmtU9hx1Ls2DTMruVGPLFXtf3JQRPwy99rOBY4n7X+yVNLnSPueXEnV/ieSKvukvL9m/5NXA9+RdEBO5grgnaTemtWSlkXED0eb52rTmzxf38zMzErtKOCDwFpJ9+SwjwEXAzdIOo009ffEfGw56Z5mAHgWOBUgIjZLuhBYneNdUFnsATgDuBbYlbSwQ6Vjpl4aZlZyY52GV9n/5L946f4nlTn/i4HzSY2lOfk9pP1PPlu7/wnwSJ4+c0SON1DpGZZU2f+kKY0lMzMzGz8i4nsUP1cE6XnI2vgBnFnnWouARQXha4CDC8KfKkrDzMpv1NPwvP+JmZmZmZn1srFMw+ua/U+KVuJo9h4jzVRv1ZBeWFGkolfK0ivlgN4qi5mZmVkzjGUaXtfsf1K0Ekez9xhppnqrx/TCiiIVvVKWXikH9FZZzMzMzJphLKvhvbD/SX726GjS80SV/U+geP8TqNr/JIfPzavl7c+L+5+sJu9/klfUm5vjmpmZmZmZtdyoR5a8/4mZmZmZmfWyMa2GFxHnAefVBD/Mi6vZVcf9FXWWysx7m+ywv0lELCct3WlmZmZmZtZWY9qU1szMzMzMrFe5sWRmZmZmZlbAjSUzMzMzM7MCbiyZmZmZmZkVcGPJzMzMzMysgBtLZmZmZmZmBdxYMjMzMzMzK+DGkpmZmZmZWQE3lszMzMzMzAq4sWRmpSZpL0k3SvqRpHWS3iZpb0krJa3PXyfluJJ0uaQBSfdJOqzqOvNy/PWS5lWFHy5pbT7ncknqRDnNzMysfNxYMrOyuwz4VkS8AXgLsA5YCNwaETOAW/NngOOAGfl1OnAlgKS9gfOAI4EjgPMqDawc5/Sq82a3oUxmZmbWBdxYMrPSkrQn8HbgGoCI+HVEPAPMARbnaIuBE/L7OcCSSFYBe0maAhwLrIyIzRHxNLASmJ2P7RkRd0REAEuqrmVmPUTSIkmbJN1fFXa+pMcl3ZNfx1cdOzePOD8o6diq8Nk5bEDSwqrw/SXdmUevr5e0cw
7fJX8eyMent6fEZtYMbiyZWZm9Dvgp8EVJP5B0taTdgMkRsREgf903x58KPFZ1/mAOGyp8sCDczHrPtRSPHH86Ig7Nr+UAkg4C5gJvyuf8i6QJkiYAV5BGsQ8CTspxAT6VrzUDeBo4LYefBjwdEa8HPp3jmVmXmNjpDJiZDWEicBjwkYi4U9JlvDjlrkjR80YxivAdLyydTpqux+TJk+nv7x8iG8m2bdsaitcJCw7Z3tTrlbmszTReygmprAsOeb6p1+zk9y4ibh/BqM4cYGlEPAc8ImmANIUXYCAiHgaQtBSYI2kd8A7gAznOYuB80jTfOfk9wI3AZyUpj2abWcm5sWRmZTYIDEbEnfnzjaTG0pOSpkTExjyVblNV/P2qzp8GPJHD+2rC+3P4tIL4O4iIq4CrAGbOnBl9fX1F0V6iv7+fRuJ1wvyFNzf1etfO3q20ZW2mMv9Mm62/v59LvveLpl5zw8l9Tb1ek5wl6RRgDbAgT9WdCqyqilM96lw7Sn0k8CrgmYjYXhD/hZHtiNguaUuO/7PajIymU6bIWBr1zexIGSoPZe94KHv+wHlslzE1liTtBVwNHEzqjf1T4EHgemA6sAF4X0Q8nVeYugw4HngWmB8Rd+frzAM+ni/7iYhYnMMPJw2b7wosB852T4zZ+BERP5H0mKQDI+JB4Gjgh/k1D7g4f70pn7KMdOOzlHQDsyU3qFYAn6xa1OEY4NyI2Cxpq6RZwJ3AKcBn2lZAM+u0K4ELSfcwFwKXkO5l6o06Fz2+MNwodcMj2KPplCkylkZ9MztShmocl73joez5A+exXcb6zJJXqTKzVvsI8BVJ9wGHAp8kNZLeKWk98M78GVKnysPAAPAF4MMAEbGZdCO0Or8uyGEAZ5A6fQaAh4Bb2lAmMyuBiHgyIp6PiN+Q6ozKVLuhRqmLwn9GWlBmYk34S66Vj78S2IyZdYVRjyxVrVI1H9IqVcCvJc3hxekui0lTXc6hapUqYFXeO2VKjruycuMiqbJKVT95laocXlmlyjcyZuNIRNwDzCw4dHRB3ADOrHOdRcCigvA1pNFxMxtnKtN588c/Bior5S0DvirpUuDVpA7b75NGiWZI2h94nLQIxAciIiTdBrwXWMqOI97zgDvy8e96loxZ9xjLNLzqVareAtwFnE3NKlWSWr5K1XBzfIvmSzb74eZmqje3sxfmfVb0Sll6pRzQW2UxM6sl6TpSB+0+kgZJs1r6JB1Kmha3AfgzgIh4QNINpCm/24EzI+L5fJ2zgBXABGBRRDyQkzgHWCrpE8APyFse5K9fyotEbCY1sMysS4ylsVSaVaqGm+NbNF+y2Q83N1O9Ob69MO+zolfK0ivlgN4qi5lZrYg4qSD4moKwSvyLgIsKwpeTpvzWhj/Mi9P4qsN/BZw4osyaWWmM5ZmlolWqDiOvUgVpeJvGVqmqF97QKlVmZmZmZmbNNurGUkT8BHhM0oE5qLJKVWVuLuw4Z/cUJbPIq1SRhrKPkTQpL+xwDLAiH9sqaVZeSe+UqmuZmZmZmZm11Fj3WaqsUrUzaQWqU0kNsBsknQY8yotDz8tJy4YPkJYOPxXSKlWSKqtUwY6rVF1LWjr8Fry4g5mZmZmZtcmYGktepcrMzMzMzHrVWPdZMjMzMzMz60luLJmZmZmZmRVwY8nMzMzMzKyAG0tmZmZmZmYF3FgyMzMzMzMr4MaSmZmZmZlZgbHus2RmZnWsfXwL8xfe3LTrbbj4XU27lpmZmQ3PI0tmZmZmZmYF3FgyMzMzMzMr4MaSmZmZmZlZATeWzMzMzMzMCrixZGalJ2mCpB9I+mb+vL+kOyWtl3S9pJ1z+C7580A+Pr3qGufm8AclHVsVPjuHDUha2O6ymZmZWXm5sWRm3eBsYF3V508Bn46IGcDTwGk5/DTg6Yh4PfDpHA9JBwFzgTcBs4F/yQ2wCcAVwHHAQcBJOa6Z9SBJiyRtknR/VdjeklbmzpeVkiblcEm6PHek3CfpsK
pz5uX46yXNqwo/XNLafM7lkjRUGmZWfm4smVmpSZoGvAu4On8W8A7gxhxlMXBCfj8nfyYfPzrHnwMsjYjnIuIRYAA4Ir8GIuLhiPg1sDTHNbPedC2pw6TaQuDW3Plya/4MqRNlRn6dDlwJqeEDnAccSapDzqtq/FyZ41bOmz1MGmZWct5nyczK7p+BvwH2yJ9fBTwTEdvz50Fgan4/FXgMICK2S9qS408FVlVds/qcx2rCjyzKhKTTSTdBTJ48mf7+/mEzPnlXWHDI9mHjNaqRNBvVzHwBbNu2ran5K6vxUk5IZV1wyPNNvWanv3cRcXv19NxsDtCX3y8G+oFzcviSiAhglaS9JE3JcVdGxGYASSuB2ZL6gT0j4o4cvoTUkXPLEGmYWcmNubGUp7GsAR6PiHdL2p/UO7s3cDfwwYj4taRdgCXA4cBTwPsjYkO+xrmk6TPPA38RESty+GzgMmACcHVEXDzW/JpZ95D0bmBTRNwlqa8SXBA1hjlWL7xodD0KwoiIq4CrAGbOnBl9fX1F0V7iM1+5iUvWNq9PasPJw6fZqGZulgtw7ezdaOR70u36+/vHRTkhlfWS7/2iqdds5u9wE02OiI0AEbFR0r45/IXOl6zSyTJU+GBB+FBpvMRoOmWKjKVR364OnrJ3PJQ9f+A8tksz/otXniXYM3+uPEuwVNLnSI2gK6l6lkDS3Bzv/TXPErwa+I6kA/K1rgDeSapwVktaFhE/bEKezaw7HAW8R9LxwMtJ9cw/A3tJmphHl6YBT+T4g8B+wKCkicArgc1V4RXV59QLN7PxbaSdL0N15DRkNJ0yRcbSqG9mR8pQjeOydzyUPX/gPLbLmJ5Z8rMEZtZKEXFuREyLiOmkTpXvRsTJwG3Ae3O0ecBN+f2y/Jl8/Lt5Cs0yYG5eLW9/0rME3wdWAzPy6no75zSWtaFoZlYeT+bpdeSvm3J4vU6WocKnFYQPlYaZldxYR5a64lmCoiHAZs/Xb6Z6w5W9MJRZ0Stl6ZVyQNeV5RxgqaRPAD8Arsnh1wBfkjRAGlGaCxARD0i6AfghsB04MyKeB5B0FrCCNN13UUQ80NaSmFmnVTpZLmbHzpezJC0l3X9syVPoVgCfrFrU4Rjg3IjYLGmrpFnAncApwGeGScPMSm7UjaVuepagaAiw2fP1m2pt8RzxBYc8P6r54xsuftdYc9R0vTAsC71TDih/WSKin/RQNBHxMGn0uTbOr4AT65x/EXBRQfhyYHkTs2pmJSXpOtJCC/tIGiStancxcIOk04BHebEOWQ4cT5rx8ixwKkBuFF1IGpkGuKCy2ANwBmnFvV1JCzvcksPrpWFmJTeWkSU/S2BmZmZdIyJOqnPo6IK4AZxZ5zqLgEUF4WuAgwvCnypKw8zKb9TPLPlZAjMzMzMz62Wt2GfJzxKYmZmZmVnXa0pjyc8SmJmZmZlZrxnT0uFmZmZmZma9qhXT8MzMzHrW9CavplrGFUvNzCzxyJKZmZmZmVkBN5bMzMzMzMwKuLFkZmZmZmZWwI0lMzMzMzOzAm4smZmZmZmZFXBjyczMzMzMrIAbS2ZmZmZmZgXcWDIzMzMzMyvgxpKZmZmZmVkBN5bMzMzMzMwKTOx0BszMzKx5pi+8uSnXWXDIdnybYGbjnWtBMystSfsBS4DfBn4DXBURl0naG7gemA5sAN4XEU9LEnAZcDzwLDA/Iu7O15oHfDxf+hMRsTiHHw5cC+wKLAfOjohoSwGtLZrVeAA3IHqVpA3AVuB5YHtEzHQ9Y2bgaXhmVm7bgQUR8UZgFnCmpIOAhcCtETEDuDV/BjgOmJFfpwNXAuSbnvOAI4EjgPMkTcrnXJnjVs6b3YZymVn5/GFEHBoRM/Nn1zNmNvruMff4mlmrRcRGYGN+v1XSOmAqMAfoy9EWA/3AOTl8Sa4nVknaS9KUHHdlRGwGkLQSmC2pH9gzIu7I4UuAE4Bb2lG+XrL28S3Mb+IIzoaL39W0a5mNkusZMxvTXI
JKj+/dkvYA7soVw3xST8zFkhaSemLO4aU9MUeSelmOrOqJmQlEvs6yiHiaF3tiVpEaS7Nx5WI2LkmaDrwVuBOYnBtSRMRGSfvmaFOBx6pOG8xhQ4UPFoQXpX86qT5i8uTJ9Pf3D5vnybtWpm01RyNpNqqZ+YLxU9ZmlxOaW1ZoXv66oaxNFMC3JQXw+Yi4ii6pZ4ps27Zt1Oe26+94LHlsh7LnD5zHdhl1Y8k9vmbWLpJ2B74GfDQifp4GqoujFoTFKMJ3DEw3T1cBzJw5M/r6+obJNXzmKzdxydrmPd+y4eTh02xUM0eBIN1gjYeyNruc0NyyQvPK2w1lbaKjIuKJ3CBaKelHQ8QtVT1TpL+/n9Ge29QR4iF+3mPJYzuUPX/gPLZLU2rBsvf4FrVqm91b1g6j7eUrY4u+F3oaoHfKAeUti6SdSA2lr0TE13Pwk5Km5DpmCrAphw8C+1WdPg14Iof31YT35/BpBfHNbByJiCfy102SvkF65sj1jJmNvbHUDT2+Ra3aZveqtsNoe/nK2JPXCz0N0DvlgHKWJT/reA2wLiIurTq0DJgHXJy/3lQVfpakpaTpvlvyjc4K4JNVD1sfA5wbEZslbZU0i9TZcwrwmZYXzMxKQ9JuwMvyLJndSPXDBbieMTPG2Fhyj6+ZtdhRwAeBtZLuyWEfI9283CDpNOBR4MR8bDlpEZkB0kIypwLkm5ULgdU53gWVqb/AGby4kMwteKqv2XgzGfhG7uydCHw1Ir4laTWuZ8ZkqGX7FxyyveMd115IxhoxltXw3ONrZi0VEd+jeJQZ4OiC+AGcWedai4BFBeFrgIPHkE0z62IR8TDwloLwp3A9YzbujWVkyT2+ZmZmZmbWs8ayGp57fM3MzMzMrGc1d01QMzNrmaHm/5uZmVnzubFkZmbWQW4Em5mVlxtLZmZWOm5AmJlZGbixZGZmZmbjTrOXNvdS5L3JjaVxoNk9tK4MzMzMzGw8eFmnM2BmZmZmZlZGbiyZmZmZmZkVcGPJzMzMzMysgBtLZmZmZmZmBdxYMjMzMzMzK+DGkpmZmZmZWQE3lszMzMzMzAp4nyUbsWbs21TZ7M17NpmZmZlZWXlkyczMzMzMrEDpG0uSZkt6UNKApIWdzo+Z9R7XM2bWaq5nzLpTqRtLkiYAVwDHAQcBJ0k6qLO5MrNe4nrGzFrN9YxZ9yp1Ywk4AhiIiIcj4tfAUmBOh/NkZr3F9YyZtZrrGbMuVfYFHqYCj1V9HgSO7FBezKw3uZ4xs1ZzPTMONGMBrJGoLJbVCC+oNXplbyypICx2iCSdDpyeP26T9GBNlH2AnzU5b233Fz1SDnixLPpUp3MyZj3zM6Hxsry21Rlps2bVM0V66fdjSL1UPw1lvJQTWlPWEdT5rmcar2eKlP73tOx/S2XPH4wsjx283yrz97GheqbsjaVBYL+qz9OAJ2ojRcRVwFX1LiJpTUTMbH722qtXygG9U5ZeKQf0VllGqCn1TJHx9D0dL2UdL+WE8VXWNmhZPVOkG352Zc9j2fMHzmO7lP2ZpdXADEn7S9oZmAss63CezKy3uJ4xs1ZzPWPWpUo9shQR2yWdBawAJgCLIuKBDmfLzHqI6xkzazXXM2bdq9SNJYCIWA4sH+NlxjykXRK9Ug7onbL0Sjmgt8oyIk2qZ4qMp+/peCnreCknjK+ytlwL65ki3fCzK3sey54/cB7bQhE7PF9oZmZmZmY27pX9mSUzMzMzM7OO6OnGkqTZkh6UNCBpYafzUyFpg6S1ku6RtCaH7S1ppaT1+eukHC5Jl+cy3CfpsKrrzMvx10uaVxV+eL7+QD63aMnS0eZ9kaRNku6vCmt53uul0YKynC/p8fyzuUfS8VXHzs35elDSsVXhhb9n+UHeO3Oer88P9SJpl/x5IB+fPsZy7CfpNknrJD0g6ewc3pU/l15Q1rqnWUZSh3WbZtVx3aBZdaCVj6QT8/+D30gq1UpkZa8fi/4uyqbe//0ykfRySd
+XdG/O4993Ok+jFhE9+SI9QPkQ8DpgZ+Be4KBO5yvnbQOwT03YPwIL8/uFwKfy++OBW0h7NMwC7szhewMP56+T8vtJ+dj3gbflc24Bjmti3t8OHAbc386810ujBWU5H/jrgrgH5d+hXYD98+/WhKF+z4AbgLn5/eeAM/L7DwOfy+/nAtePsRxTgMPy+z2AH+f8duXPpdtfQ/1O9MqLEdRh3fZqRh3XLa9m1IGdLoNfdX+2bwQOBPqBmZ3OT1W+Sl8/Fv1dlO1V7/9+p/NVk0cBu+f3OwF3ArM6na/RvHp5ZOkIYCAiHo6IXwNLgTkdztNQ5gCL8/vFwAlV4UsiWQXsJWkKcCywMiI2R8TTwEpgdj62Z0TcEek3dEnVtcYsIm4HNncg7/XSaHZZ6pkDLI2I5yLiEWCA9DtW+HuWR17eAdxYkOfqstwIHF0ZqRllOTZGxN35/VZgHWm3+K78ufSAbqt7mqUnfheaVMd1hSbVgVZCEbEuIka7oW0rlb5+HOHfRUcM8X+/NHK9uC1/3Cm/unKhhF5uLE0FHqv6PEh5fpEC+Laku5R26waYHBEbIf0RAPvm8HrlGCp8sCC8ldqR93pptMJZeUrNoqqpRCMty6uAZyJie034S66Vj2/J8ccsT+l7K6kHp9d+Lt2izHVPs4ykDusFI/1b6nYjqQPNRsK/R01W83+/VCRNkHQPsInUGVu6PDailxtLRT31ZWnRHhUR/3979x9kV1nfcfz9NQmQghrUNUOT1GBJUxAsooV06jCpYAjUGpyBDgwjQemkOjDVGdoxlRmpWjo6HbSDVaZpSQWKUlp0yAhOjMjW2hH5JRIgIiuirKGkGIIstNjAt3/cZ6fX5dmf7L1n7933a+bMvec5zz3P9+7ZvZtPzjnPHgecClwQESdO0He89zHd9ib0Yu1XAL8OHAs8BlxW2mfzvXTkfUbEIcANwAcz8+cTdR1n/Ll8XHrJfPh6TeczrJ/147Ge7megGhIRX4+I+yrLnDpTM4bfR7NoGr/3G5GZz2fmscBy4PiIOLrpmmain8PSMLCibX05sLuhWn5JZu4uj3uAL9M6Lf346OUb5XFP6T7e+5iofXmlvZO6Uft4Y8yqzHy8/HC/APw9/3+ZyXTfyxO0LslZOKb9l/ZVtr+Sl3jKPyIW0frAvDYzv1Sa++a49Jg5+9kzW6b5GdYPpvuz1LNm8BmohmTmyZl5dGW5senaJuD30SwZ5/f+nJSZ+2jdP7e+4VJmpJ/D0h3AqmjNSHYArRvptzVcExFxcES8fPQ5sA64j1Zto7OPbQRGP+y2AeeWWZfWAE+Vy0C2A+si4tBymcQ6YHvZ9nRErCn3wZzbtq9O6Ubt440xq8bcb/AuWsdmdPyzojWT3eHAKlqTHlS/z8q9PbcCZ1Rqbn8vZwDfKP1nWnMAVwK7MvNTbZv65rj0mDn52TNbZvAZ1g+m+7PUs2bwGShNR19/PnbLBL/354yIGIiIJeX5YuBk4PvNVjVDE83+0OsLrZmKfkBr5pWLm66n1PR6WrO/fA+4f7QuWves3AI8VB5fVdoD+Gx5Dztpm9UGeC+tm2yHgPe0tb+F1i+4HwJ/S/njw7NU/xdpXZrxv7T+h+j8btQ+3hgdeC/XlFrvpfUBflhb/4tLXQ/SNsPgeN9n5VjfXt7jvwAHlvaDyvpQ2f76l/g+3krrMoZ7gXvKclqvHpd+WMb7nuiHhWl+hvXaMlufcb2wzNZnoMvcW2gF3WHgOeBxWv/x1XhdpbY5/flY+7louqZKjdXf+03XNabGNwLfLTXeB3yk6Zpmuoz+g0eSJEmS1KafL8OTJEmSpBkzLEmSJElShWFJkiRJkioMS5IkSZJUYViSJEmSpArDkiRJkiRVGJYkSZIkqcKwJEmSJEkVhiV1TER8OCL+YYLt50XEt7pZkyRJkjRVC5suQL0rIkbaVn8FeA7fkjzrAAAP4ElEQVR4vqz/cWb+VVvflcCPgEWZuX8GY/0e8BHgOO
DJzFw5s6olSZKkqfHMkmYsMw8ZXYCfAH/Q1nbtLA/3DLAV+LNZ3q8kSZJUZVhSx0TEX0TEP5XVb5bHfRExEhG/U+n/mxGxIyL2RsSDEfGHo9sy8/bMvAZ4uAulS5IkSYYldc2J5XFJOfP07faNEXEwsAP4AvBa4GzgcxHxhu6WKUmSJLUYljRXvAN4JDP/MTP3Z+bdwA3AGQ3XJUmSpHnKCR40V7wOOCEi9rW1LQSuaageSZIkzXOGJXVLTrL9UeDfMvPt3ShGkiRJmoyX4alb/gt4AXj9ONu/AvxGRLw7IhaV5bcj4kiAiHhZRBwELGqtxkERcUB3SpckSdJ8ZFhSV2Tms8ClwH9ExL6IWDNm+9PAOuAsYDfwn8AngQNLlxOB/wZuBn6tPP9ad6qXJEnSfBSZk10dJUmSJEnzj2eWJEmSJKnCsCRJkiRJFYYlSZIkSaowLEmSJElShWFJkiRJkir67o/SLlmyJI844ohGa3jmmWc4+OCDraHhGpoev19ruOuuu57IzIFZ26EkSdIc1XdhaenSpdx5552N1jA4OMjatWutoeEamh6/X2uIiB/P2s4kSZLmsEkvw4uIgyLi9oj4XkTcHxEfLe2HR8R3IuKhiPjniDigtB9Y1ofK9pVt+/rz0v5gRJzS1r6+tA1FxOa29uoYkiRJktRpU7ln6TngbZn5W8CxwPqIWAN8Evh0Zq4CngTOL/3PB57MzCOAT5d+RMRRwFnAG4D1wOciYkFELAA+C5wKHAWcXfoywRiSJEmS1FGThqVsGSmri8qSwNuAfy3tVwGnl+cbyjpl+0kREaX9usx8LjN/BAwBx5dlKDMfzsxfANcBG8prxhtDkiRJkjpqSrPhlTNA9wB7gB3AD4F9mbm/dBkGlpXny4BHAcr2p4BXt7ePec147a+eYAxJkiRJ6qgpTfCQmc8Dx0bEEuDLwJG1buUxxtk2XnstsE3U/0UiYhOwCWBgYIDBwcFat64ZGRnp6xp2/vSpKfVbuhg+c+2NE/Y5ZtkrZ6Okqn4/Dr1UgyRJUi+a1mx4mbkvIgaBNcCSiFhYzvwsB3aXbsPACmA4IhYCrwT2trWPan9Nrf2JCcYYW9cWYAvA6tWrs99mH5trNZy3+aYp9bvomP1ctnPib7FHzlk7CxXV9ftx6KUaJEmSetFUZsMbKGeUiIjFwMnALuBW4IzSbSMwegphW1mnbP9GZmZpP6vMlnc4sAq4HbgDWFVmvjuA1iQQ28prxhtDkiRJkjpqKmeWDgOuKrPWvQy4PjO/EhEPANdFxF8C3wWuLP2vBK6JiCFaZ5TOAsjM+yPieuABYD9wQbm8j4i4ENgOLAC2Zub9ZV8fGmcMSZIkSeqoScNSZt4LvKnS/jCtmezGtv8PcOY4+7oUuLTSfjNw81THkCRJkqROm9JseJIkSZI03xiWJEmSJKnCsCRJkiRJFYYlSZIkSaowLEmSJElShWFJkiRJkioMS5IkSZJUYViSJEmSpArDkiRJkiRVGJYkSZIkqcKwJEmSJEkVhiVJkiRJqjAsSZIkSVKFYUmSJEmSKgxLkiRJklRhWJIkSZKkCsOSJEmSJFUYliRJkiSpwrAkSZIkSRWGJUmSJEmqMCxJkiRJUoVhSZIkSZIqDEuSJEmSVGFYkiRJkqQKw5IkSZIkVRiWJEmSJKnCsCRJkiRJFYYlSZIkSaowLEmSJElShWFJkiRJkioMS5IkSZJUYViSJEmSpArDkiRJkiRVGJYkSZIkqWLSsBQRKyLi1ojYFRH3R8QHSvurImJHRDxUHg8t7RERl0fEUETcGxHHte1rY+n/UERsbGt/c0TsLK+5PCJiojEkSZIkqdOmcmZpP3BRZh4JrAEuiIijgM3ALZm5CrilrAOcCqwqyybgCmgFH+AS4ATgeOCStvBzRek7+rr1pX28MSRJkiSpoyYNS5n5WGbeXZ4/DewClgEbgKtKt6uA08vzDcDV2XIbsCQiDgNOAXZk5t7MfBLYAawv21
6Rmd/OzASuHrOv2hiSJEmS1FELp9M5IlYCbwK+AyzNzMegFagi4rWl2zLg0baXDZe2idqHK+1MMMbYujbROjPFwMAAg4OD03lbs25kZKSva7jomP1T6rd08eR9O/l16vfj0Es1SJIk9aIph6WIOAS4AfhgZv683FZU7Vppyxm0T1lmbgG2AKxevTrXrl07nZfPusHBQfq5hvM23zSlfhcds5/Ldk78LfbIOWtnoaK6fj8OvVSDJElSL5rSbHgRsYhWULo2M79Umh8vl9BRHveU9mFgRdvLlwO7J2lfXmmfaAxJkiRJ6qipzIYXwJXArsz8VNumbcDojHYbgRvb2s8ts+KtAZ4ql9JtB9ZFxKFlYod1wPay7emIWFPGOnfMvmpjSJIkSVJHTeUyvN8F3g3sjIh7StuHgU8A10fE+cBPgDPLtpuB04Ah4FngPQCZuTciPg7cUfp9LDP3lufvBz4PLAa+WhYmGEOSJEmSOmrSsJSZ36J+XxHASZX+CVwwzr62Alsr7XcCR1faf1YbQ5IkSZI6bUr3LEmSJEnSfGNYkiRJkqQKw5IkSZIkVRiWJEmSJKnCsCRJkiRJFYYlSZIkSaowLEmSJElShWFJkiRJkioMS5IkSZJUYViSJEmSpArDkiRJkiRVGJYkSZIkqcKwJEmSJEkVhiVJkiRJqjAsSZIkSVKFYUmSJEmSKgxLkiRJklRhWJIkSZKkCsOSJEmSJFUYliRJkiSpwrAkSZIkSRWGJUmSJEmqMCxJkiRJUoVhSZIkSZIqDEuSJEmSVGFYkiRJkqQKw5IkSZIkVRiWJEmSJKnCsCRJkiRJFYYlSZIkSaowLEmSJElShWFJkiRJkioMS5IkSZJUMWlYioitEbEnIu5ra3tVROyIiIfK46GlPSLi8ogYioh7I+K4ttdsLP0fioiNbe1vjoid5TWXR0RMNIYkSZIkdcNUzix9Hlg/pm0zcEtmrgJuKesApwKryrIJuAJawQe4BDgBOB64pC38XFH6jr5u/SRjSJIkSVLHTRqWMvObwN4xzRuAq8rzq4DT29qvzpbbgCURcRhwCrAjM/dm5pPADmB92faKzPx2ZiZw9Zh91caQJEmSpI5bOMPXLc3MxwAy87GIeG1pXwY82tZvuLRN1D5caZ9ojBeJiE20zk4xMDDA4ODgDN/W7BgZGenrGi46Zv+U+i1dPHnfTn6d+v049FINkiRJvWimYWk8UWnLGbRPS2ZuAbYArF69OteuXTvdXcyqwcFB+rmG8zbfNKV+Fx2zn8t2Tvwt9sg5a2ehorp+Pw69VIMkSVIvmulseI+XS+goj3tK+zCwoq3fcmD3JO3LK+0TjSFJkiRJHTfTsLQNGJ3RbiNwY1v7uWVWvDXAU+VSuu3Auog4tEzssA7YXrY9HRFryix4547ZV20MSZIkSeq4SS/Di4gvAmuB10TEMK1Z7T4BXB8R5wM/Ac4s3W8GTgOGgGeB9wBk5t6I+DhwR+n3scwcnTTi/bRm3FsMfLUsTDCGJEmSJHXcpGEpM88eZ9NJlb4JXDDOfrYCWyvtdwJHV9p/VhtDkiRJkrphppfhSZIkSVJfMyxJkiRJUoVhSZIkSZIqDEuSJEmSVGFYkiRJkqQKw5IkSZIkVRiWJEmSJKnCsCRJkiRJFYYlSZIkSaowLEmSJElShWFJkiRJkioMS5IkSZJUYViSJEmSpArDkiRJkiRVGJYkSZIkqcKwJEmSJEkVhiVJkiRJqjAsSZIkSVKFYUmSJEmSKgxLkiRJklRhWJIkSZKkCsOSJEmSJFUsbLoAdd7KzTc1XYIkSZLUczyzJEmSJEkVhiVJkiRJqjAsSZIkSVKF9yypUbN9P9Ujn/j9Wd2fJEmS5i/PLEmSJElShWFJkiRJkioMS5IkSZJUYViSJEmSpArDkiRJkiRVGJYkSZIkqcKpw9VX2qciv+iY/Zz3EqcmdypySZKk+WvOn1mKiPUR8WBEDEXE5qbrkSRJkjQ/zOmwFBELgM
8CpwJHAWdHxFHNViVJkiRpPpjrl+EdDwxl5sMAEXEdsAF4oNGqOmzlS7x0DGbnEjS99GMx9jh4WZ8kSVLviMxsuoZxRcQZwPrM/KOy/m7ghMy8cEy/TcCmsno0cF9XC32x1wBPWEPjNTQ9fr/W8LrMHJjF/UmSJM1Jc/3MUlTaXpTuMnMLsAUgIu7MzLd0urCJWMPcqKHp8a1BkiSpt83pe5aAYWBF2/pyYHdDtUiSJEmaR+Z6WLoDWBURh0fEAcBZwLaGa5IkSZI0D8zpy/Ayc39EXAhsBxYAWzPz/kletqXzlU3KGlqarqHp8cEaJEmSetacnuBBkiRJkpoy1y/DkyRJkqRGGJYkSZIkqaIvw1JEfDwi7o2IeyLiaxHxqw3U8NcR8f1Sx5cjYkmXxz8zIu6PiBcioqvTRkfE+oh4MCKGImJzN8cu42+NiD0R0djf24qIFRFxa0TsKsfhAw3UcFBE3B4R3ys1fLTbNUiSJPWyvrxnKSJekZk/L8//BDgqM9/X5RrWAd8ok1R8EiAzP9TF8Y8EXgD+DvjTzLyzS+MuAH4AvJ3W1O93AGdn5gPdGL/UcCIwAlydmUd3a9wxNRwGHJaZd0fEy4G7gNO7/HUI4ODMHImIRcC3gA9k5m3dqkGSJKmX9eWZpdGgVBxM5Q/ZdqGGr2Xm/rJ6G62/EdXN8Xdl5oPdHLM4HhjKzIcz8xfAdcCGbhaQmd8E9nZzzEoNj2Xm3eX508AuYFmXa8jMHCmri8rSf/87IkmS1CF9GZYAIuLSiHgUOAf4SMPlvBf4asM1dMsy4NG29WG6HBLmmohYCbwJ+E4DYy+IiHuAPcCOzOx6DZIkSb2qZ8NSRHw9Iu6rLBsAMvPizFwBXAtc2EQNpc/FwP5SR9fHb0BU2ubt2YyIOAS4AfjgmDOeXZGZz2fmsbTObB4fEY1clihJktSL5vQfpZ1IZp48xa5fAG4CLul2DRGxEXgHcFJ24OawaXwNumkYWNG2vhzY3VAtjSr3Cd0AXJuZX2qylszcFxGDwHqgsYkvJEmSeknPnlmaSESsalt9J/D9BmpYD3wIeGdmPtvt8Rt0B7AqIg6PiAOAs4BtDdfUdWVyhSuBXZn5qYZqGBidhTEiFgMn08DPgiRJUq/q19nwbgBW05oN7sfA+zLzp12uYQg4EPhZabqtmzPyRcS7gM8AA8A+4J7MPKVLY58G/A2wANiamZd2Y9y28b8IrAVeAzwOXJKZV3a5hrcC/w7spPV9CPDhzLy5izW8EbiK1nF4GXB9Zn6sW+NLkiT1ur4MS5IkSZL0UvXlZXiSJEmS9FIZliRJkiSpwrAkSZIkSRWGJUmSJEmqMCxJkiRJUoVhSZIkSZIqDEuSJEmSVPF/26d8YB8nlW4AAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 1008x864 with 12 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.hist(figsize=(14, 12))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-EhpTVryEwD9"
},
"source": [
"### Splitting Train and Test"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {},
"outputs": [],
"source": [
"del df['Id']\n",
"del df['SalaryNormalized']"
]
},
{
"cell_type": "code",
"execution_count": 258,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "qKGGNDhVVf4B"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Company</th>\n",
" <th>SourceName</th>\n",
" <th>LocationNormalized0</th>\n",
" <th>LocationNormalized1</th>\n",
" <th>Title0</th>\n",
" <th>Title1</th>\n",
" <th>FullDescription0</th>\n",
" <th>FullDescription1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9229</td>\n",
" <td>42</td>\n",
" <td>-0.116790</td>\n",
" <td>-0.229172</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.530014</td>\n",
" <td>2.881801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9229</td>\n",
" <td>42</td>\n",
" <td>-0.118995</td>\n",
" <td>-0.237572</td>\n",
" <td>-0.379568</td>\n",
" <td>-0.578663</td>\n",
" <td>1.115408</td>\n",
" <td>-2.899837</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>9229</td>\n",
" <td>42</td>\n",
" <td>-0.120516</td>\n",
" <td>-0.241914</td>\n",
" <td>-0.204017</td>\n",
" <td>0.064045</td>\n",
" <td>-1.111251</td>\n",
" <td>2.198475</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>9229</td>\n",
" <td>42</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-18.890457</td>\n",
" <td>3.393423</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>9229</td>\n",
" <td>42</td>\n",
" <td>-0.122604</td>\n",
" <td>-0.249312</td>\n",
" <td>-0.211709</td>\n",
" <td>0.010168</td>\n",
" <td>-19.451188</td>\n",
" <td>2.751042</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Company SourceName LocationNormalized0 LocationNormalized1 Title0 \\\n",
"0 9229 42 -0.116790 -0.229172 -0.211709 \n",
"1 9229 42 -0.118995 -0.237572 -0.379568 \n",
"2 9229 42 -0.120516 -0.241914 -0.204017 \n",
"3 9229 42 -0.122604 -0.249312 -0.211709 \n",
"4 9229 42 -0.122604 -0.249312 -0.211709 \n",
"\n",
" Title1 FullDescription0 FullDescription1 \n",
"0 0.010168 -18.530014 2.881801 \n",
"1 -0.578663 1.115408 -2.899837 \n",
"2 0.064045 -1.111251 2.198475 \n",
"3 0.010168 -18.890457 3.393423 \n",
"4 0.010168 -19.451188 2.751042 "
]
},
"execution_count": 258,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 259,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1817,
"status": "ok",
"timestamp": 1529801321689,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "doodiP6IEwD_",
"outputId": "c8d75d4e-d9d0-4969-cf45-96d28e9c2b46"
},
"outputs": [],
"source": [
"X_train = df.values[:df_job_tuple[0], :]"
]
},
{
"cell_type": "code",
"execution_count": 260,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1466,
"status": "ok",
"timestamp": 1529801324123,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "2Ryw0iShEwEE",
"outputId": "5c518610-2da8-4e01-8c86-9def87aa91aa"
},
"outputs": [],
"source": [
"# Assumes the test rows follow the training rows inside df\n",
"X_test = df.values[df_job_tuple[0]:df_job_tuple[0] + df_test_tuple[0], :]"
]
},
{
"cell_type": "code",
"execution_count": 261,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 728,
"status": "ok",
"timestamp": 1529801329170,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "9RLhOM49EwEI",
"outputId": "e666aaef-3a0e-4ccf-bace-2ad37b65971e"
},
"outputs": [
{
"data": {
"text/plain": [
"((244766, 8), (122463, 8))"
]
},
"execution_count": 261,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.shape, X_test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2QZB6KKaEwEM"
},
"source": [
"### Creating the Scaler"
]
},
{
"cell_type": "code",
"execution_count": 263,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 780,
"status": "ok",
"timestamp": 1529801336408,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "yl3cggxyEwEN",
"outputId": "c223576e-ef0a-460a-ed48-dd16d9e76615"
},
"outputs": [],
"source": [
"scaler = StandardScaler()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "btL2O-tJEwEn"
},
"source": [
"### Creating the Folds"
]
},
{
"cell_type": "code",
"execution_count": 264,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 773,
"status": "ok",
"timestamp": 1529801345950,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "SLlImgbfEwEo",
"outputId": "e546ce8e-65e9-4332-df8d-bbe1b14a723e"
},
"outputs": [],
"source": [
"n_splits = 10\n",
"kfold = KFold(n_splits=n_splits)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Function to Run the Models"
]
},
{
"cell_type": "code",
"execution_count": 265,
"metadata": {},
"outputs": [],
"source": [
"def cross_validation(model, X, y):\n",
"    '''Cross-validate a scaler + model pipeline, scoring MAE and MSE on every fold.'''\n",
"    scoring = ['neg_mean_absolute_error', 'neg_mean_squared_error']\n",
"    pipeline = Pipeline([('transformer', scaler), ('estimator', model)])\n",
"\n",
"    return cross_validate(pipeline, X=X, y=y, cv=kfold, n_jobs=1, verbose=5, scoring=scoring, return_train_score=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "YfAiDB8dEwEv"
},
"source": [
"## Creating the Models"
]
},
{
"cell_type": "code",
"execution_count": 266,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 753,
"status": "ok",
"timestamp": 1529801353639,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "GwdEADLvEwEw",
"outputId": "38ba3f27-9e4d-4adf-b0fd-0612650296b4"
},
"outputs": [],
"source": [
"rf_model = RandomForestRegressor(n_estimators=50, min_samples_split=30, random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 267,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 672,
"status": "ok",
"timestamp": 1529801355131,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "JEHZ_sfvEwE0",
"outputId": "cadc9dd1-3a5e-4e30-a4c2-83d40c5cdb36"
},
"outputs": [],
"source": [
"gb_model = GradientBoostingRegressor(min_samples_split=30, random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 268,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1055,
"status": "ok",
"timestamp": 1529801357880,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "WvFkxrQsEwE6",
"outputId": "6f3778eb-e7fc-4f03-fb58-68cf164a6a52"
},
"outputs": [],
"source": [
"ada_model = AdaBoostRegressor(random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 269,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 37
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1476,
"status": "ok",
"timestamp": 1529801359857,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "KtNyotnoEwE9",
"outputId": "8d3f5269-5f05-4dfc-9fd4-5718ca61f69a"
},
"outputs": [],
"source": [
"knn_model = KNeighborsRegressor()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "p4gR0Ps9EwFH"
},
"source": [
"## Training"
]
},
{
"cell_type": "code",
"execution_count": 270,
"metadata": {},
"outputs": [],
"source": [
"def calc_metrics(cv):\n",
"    '''Print the mean fit time, RMSE and MAE; return (rmse_train, rmse_test), (mae_train, mae_test).'''\n",
"    time_train = np.mean(cv['fit_time'])\n",
"    print('Mean training time: %f s per fold (%d folds)' % (time_train, n_splits))\n",
"    # Scores are negated errors, so take the absolute value before averaging\n",
"    train_rmse = np.mean(np.sqrt(np.abs(cv['train_neg_mean_squared_error'])))\n",
"    print('RMSE Train: %.2f' % train_rmse)\n",
"    test_rmse = np.mean(np.sqrt(np.abs(cv['test_neg_mean_squared_error'])))\n",
"    print('RMSE Test: %.2f' % test_rmse)\n",
"    mae_train = np.mean(np.abs(cv['train_neg_mean_absolute_error']))\n",
"    print('MAE Train: %.2f' % mae_train)\n",
"    mae_test = np.mean(np.abs(cv['test_neg_mean_absolute_error']))\n",
"    print('MAE Test: %.2f' % mae_test)\n",
"    return (train_rmse, test_rmse), (mae_train, mae_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### KNN"
]
},
{
"cell_type": "code",
"execution_count": 271,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "3EU7lJjeAm3G"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-12685.973305552152, neg_mean_squared_error=-304122865.90516645, total= 11.0s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.4min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-13356.993520447766, neg_mean_squared_error=-325638520.3811414, total= 9.9s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 2.8min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-13712.026449319768, neg_mean_squared_error=-320672771.6496646, total= 8.4s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.1min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-13522.491236671161, neg_mean_squared_error=-325192332.7262622, total= 9.4s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 5.4min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-13496.07007394697, neg_mean_squared_error=-325081517.22830087, total= 9.4s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-13322.775740491072, neg_mean_squared_error=-325439213.57288396, total= 9.1s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-13444.718589638831, neg_mean_squared_error=-330733038.4749485, total= 9.4s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-13544.787800294167, neg_mean_squared_error=-335721280.6270453, total= 9.2s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-13538.33361660402, neg_mean_squared_error=-325569272.2439647, total= 8.8s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-13588.120166693907, neg_mean_squared_error=-331692605.1628795, total= 9.5s\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 13.2min finished\n"
]
},
{
"data": {
"text/plain": [
"{'fit_time': array([0.93913698, 0.95052528, 0.84312224, 0.82429481, 0.79533362,\n",
" 0.80535126, 0.97619629, 0.81978226, 0.80367112, 0.82421613]),\n",
" 'score_time': array([10.03360677, 8.97757196, 7.5689044 , 8.57959175, 8.64531207,\n",
" 8.25824142, 8.41873646, 8.33647299, 7.96712136, 8.70186543]),\n",
" 'test_neg_mean_absolute_error': array([-12685.97330555, -13356.99352045, -13712.02644932, -13522.49123667,\n",
" -13496.07007395, -13322.77574049, -13444.71858964, -13544.78780029,\n",
" -13538.3336166 , -13588.12016669]),\n",
" 'train_neg_mean_absolute_error': array([-10891.56388472, -10806.09391027, -10788.03617521, -10809.45374848,\n",
" -10802.44566728, -10824.32854931, -10805.95799174, -10783.61143039,\n",
" -10805.23312543, -10775.263036 ]),\n",
" 'test_neg_mean_squared_error': array([-3.04122866e+08, -3.25638520e+08, -3.20672772e+08, -3.25192333e+08,\n",
" -3.25081517e+08, -3.25439214e+08, -3.30733038e+08, -3.35721281e+08,\n",
" -3.25569272e+08, -3.31692605e+08]),\n",
" 'train_neg_mean_squared_error': array([-2.14095979e+08, -2.12305434e+08, -2.12535813e+08, -2.12849730e+08,\n",
" -2.12372106e+08, -2.12559687e+08, -2.12092152e+08, -2.11452972e+08,\n",
" -2.12838701e+08, -2.11672096e+08])}"
]
},
"execution_count": 271,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cv_knn = cross_validation(model=knn_model, X=X_train, y=y)\n",
"cv_knn"
]
},
{
"cell_type": "code",
"execution_count": 272,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean training time: 0.858163 s per fold (10 folds)\n",
"RMSE Train: 14576.59\n",
"RMSE Test: 18025.97\n",
"MAE Train: 10809.20\n",
"MAE Test: 13421.23\n"
]
}
],
"source": [
"knn_rmse, knn_mae = calc_metrics(cv_knn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ADA"
]
},
{
"cell_type": "code",
"execution_count": 273,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab_type": "code",
"id": "W_R1eykiEwFO"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-18026.59717156553, neg_mean_squared_error=-436203278.82177216, total= 10.0s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 10.6s remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-14086.39406115166, neg_mean_squared_error=-303697370.69939584, total= 6.5s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 17.4s remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-14497.848275125934, neg_mean_squared_error=-305129969.17546636, total= 8.1s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 26.0s remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-16808.006081666368, neg_mean_squared_error=-390860148.630144, total= 10.2s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 36.8s remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-14068.020373887357, neg_mean_squared_error=-302624342.10342324, total= 6.9s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-14126.87077556214, neg_mean_squared_error=-306093109.0017542, total= 7.4s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-13732.683230498624, neg_mean_squared_error=-293095706.45876855, total= 7.5s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-14217.656180605974, neg_mean_squared_error=-312771073.1200797, total= 6.9s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-14173.262098189978, neg_mean_squared_error=-306589571.9856712, total= 7.4s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-14050.543581768403, neg_mean_squared_error=-304371643.70640105, total= 5.7s\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 1.3min finished\n"
]
},
{
"data": {
"text/plain": [
"{'fit_time': array([ 9.90668845, 6.45981526, 8.04847336, 10.13515735, 6.87510085,\n",
" 7.37868142, 7.44444728, 6.859236 , 7.38190651, 5.6998508 ]),\n",
" 'score_time': array([0.0626862 , 0.03798509, 0.04893899, 0.06131077, 0.04354501,\n",
" 0.0439086 , 0.04419541, 0.04118729, 0.04363227, 0.03371882]),\n",
" 'test_neg_mean_absolute_error': array([-18026.59717157, -14086.39406115, -14497.84827513, -16808.00608167,\n",
" -14068.02037389, -14126.87077556, -13732.6832305 , -14217.65618061,\n",
" -14173.26209819, -14050.54358177]),\n",
" 'train_neg_mean_absolute_error': array([-16281.59898887, -14006.81536966, -14404.75011922, -16784.77793312,\n",
" -14042.15380637, -14185.868585 , -13943.36532318, -14082.49068848,\n",
" -14081.79809772, -13897.6004175 ]),\n",
" 'test_neg_mean_squared_error': array([-4.36203279e+08, -3.03697371e+08, -3.05129969e+08, -3.90860149e+08,\n",
" -3.02624342e+08, -3.06093109e+08, -2.93095706e+08, -3.12771073e+08,\n",
" -3.06589572e+08, -3.04371644e+08]),\n",
" 'train_neg_mean_squared_error': array([-3.72203634e+08, -3.00704999e+08, -3.11884955e+08, -3.91016317e+08,\n",
" -3.01096915e+08, -3.05386553e+08, -2.98062609e+08, -3.03837926e+08,\n",
" -3.02289561e+08, -2.96651457e+08])}"
]
},
"execution_count": 273,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cv_ada = cross_validation(model=ada_model, X=X_train, y=y)\n",
"cv_ada"
]
},
{
"cell_type": "code",
"execution_count": 274,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean training time: 7.618936 s per fold (10 folds)\n",
"RMSE Train: 17820.08\n",
"RMSE Test: 18020.35\n",
"MAE Train: 14571.12\n",
"MAE Test: 14778.79\n"
]
}
],
"source": [
"ada_rmse, ada_mae = calc_metrics(cv_ada)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradient Boosting"
]
},
{
"cell_type": "code",
"execution_count": 275,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 1866
},
"colab_type": "code",
"executionInfo": {
"elapsed": 620046,
"status": "error",
"timestamp": 1527489481723,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "2IRh66SLAgbA",
"outputId": "dce47b87-c196-4d84-d65c-e12a64a195db"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-11157.757141783874, neg_mean_squared_error=-236307514.546646, total= 27.7s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 28.4s remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-11794.54542317465, neg_mean_squared_error=-252768448.2961616, total= 27.7s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 56.7s remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-11974.000602985794, neg_mean_squared_error=-248563334.5069125, total= 27.4s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 1.4min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-11718.60288485207, neg_mean_squared_error=-249430319.5574618, total= 27.7s\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 1.9min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-11680.106617317362, neg_mean_squared_error=-249995110.64746425, total= 27.7s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-11418.145479275334, neg_mean_squared_error=-244904057.39809507, total= 27.1s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-11568.160365604053, neg_mean_squared_error=-248934116.92652842, total= 27.1s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-11794.06479295684, neg_mean_squared_error=-256482078.08974105, total= 27.3s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-11780.26542657853, neg_mean_squared_error=-249402499.14327267, total= 27.2s\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-12171.39535453055, neg_mean_squared_error=-267072995.2332186, total= 26.9s\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 4.7min finished\n"
]
},
{
"data": {
"text/plain": [
"{'fit_time': array([27.61209464, 27.65011168, 27.2816534 , 27.59758711, 27.61559343,\n",
" 27.00480771, 27.07606125, 27.19110966, 27.1590209 , 26.84476233]),\n",
" 'score_time': array([0.06998491, 0.07221317, 0.0750978 , 0.07314777, 0.07354879,\n",
" 0.07360482, 0.07236648, 0.07219744, 0.07217264, 0.07455015]),\n",
" 'test_neg_mean_absolute_error': array([-11157.75714178, -11794.54542317, -11974.00060299, -11718.60288485,\n",
" -11680.10661732, -11418.14547928, -11568.1603656 , -11794.06479296,\n",
" -11780.26542658, -12171.39535453]),\n",
" 'train_neg_mean_absolute_error': array([-11722.73349644, -11628.15233363, -11615.83555431, -11668.20734179,\n",
" -11658.00362845, -11690.10426759, -11671.62031954, -11645.17506191,\n",
" -11630.84285931, -11582.4413844 ]),\n",
" 'test_neg_mean_squared_error': array([-2.36307515e+08, -2.52768448e+08, -2.48563335e+08, -2.49430320e+08,\n",
" -2.49995111e+08, -2.44904057e+08, -2.48934117e+08, -2.56482078e+08,\n",
" -2.49402499e+08, -2.67072995e+08]),\n",
" 'train_neg_mean_squared_error': array([-2.49767278e+08, -2.48153348e+08, -2.48356487e+08, -2.48617632e+08,\n",
" -2.48557717e+08, -2.49443714e+08, -2.48808790e+08, -2.48419772e+08,\n",
" -2.47943080e+08, -2.46573006e+08])}"
]
},
"execution_count": 275,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 10-fold cross-validation of gb_model; returns fit/score times and train/test scores\n",
"cv_gb = cross_validation(model=gb_model, X=X_train, y=y)\n",
"cv_gb"
]
},
{
"cell_type": "code",
"execution_count": 276,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean training time: 27.303280 s per fold (10 folds)\n",
"RMSE Train: 15762.72\n",
"RMSE Test: 15821.84\n",
"MSE Train: 248464082.25\n",
"MSE Test: 250386047.43\n"
]
}
],
"source": [
"gb_rmse, gb_mae = calc_metrics(cv_gb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Random Forest"
]
},
{
"cell_type": "code",
"execution_count": 277,
"metadata": {
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"base_uri": "https://localhost:8080/",
"height": 442
},
"colab_type": "code",
"executionInfo": {
"elapsed": 6282845,
"status": "ok",
"timestamp": 1527482202118,
"user": {
"displayName": "Gabriel Cesar",
"photoUrl": "//lh6.googleusercontent.com/-p5vDPiaCNfw/AAAAAAAAAAI/AAAAAAAAABs/bf-pbMKqe5c/s50-c-k-no/photo.jpg",
"userId": "109223051625932368282"
},
"user_tz": 180
},
"id": "qxLDy6a0EwFI",
"outputId": "6bd7664a-5ea8-4c8d-ea48-d055e86a75e1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-10455.372119829986, neg_mean_squared_error=-217434181.2922031, total= 1.7min\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.8min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-10928.521278788283, neg_mean_squared_error=-220613589.8554696, total= 1.8min\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 3.7min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-11142.203312201136, neg_mean_squared_error=-222814795.3751126, total= 1.8min\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 5.6min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-10898.332185039357, neg_mean_squared_error=-219861113.39863864, total= 1.8min\n",
"[CV] ................................................................\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 7.4min remaining: 0.0s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[CV] , neg_mean_absolute_error=-10757.151731450098, neg_mean_squared_error=-216575519.43371564, total= 1.7min\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-10533.368663378076, neg_mean_squared_error=-213042538.59741327, total= 1.8min\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-10782.932878683925, neg_mean_squared_error=-218226930.35436878, total= 1.8min\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-10895.059390163617, neg_mean_squared_error=-223279548.11551553, total= 1.7min\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-10771.6545234562, neg_mean_squared_error=-215919823.7076418, total= 1.7min\n",
"[CV] ................................................................\n",
"[CV] , neg_mean_absolute_error=-11297.096975680786, neg_mean_squared_error=-235400339.70382506, total= 1.7min\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Parallel(n_jobs=1)]: Done 10 out of 10 | elapsed: 18.4min finished\n"
]
},
{
"data": {
"text/plain": [
"{'fit_time': array([103.4280529 , 106.41595149, 106.93811679, 105.83180761,\n",
" 104.17053199, 106.16159153, 108.26112056, 101.72895241,\n",
" 100.85266733, 100.95680785]),\n",
" 'score_time': array([0.60349679, 0.60762167, 0.62306309, 0.60878992, 0.64006448,\n",
" 0.68612719, 0.59401655, 0.6063838 , 0.59904742, 0.59874058]),\n",
" 'test_neg_mean_absolute_error': array([-10455.37211983, -10928.52127879, -11142.2033122 , -10898.33218504,\n",
" -10757.15173145, -10533.36866338, -10782.93287868, -10895.05939016,\n",
" -10771.65452346, -11297.09697568]),\n",
" 'train_neg_mean_absolute_error': array([-8369.41769811, -8320.75310612, -8326.0715284 , -8345.51824293,\n",
" -8325.88116065, -8363.95158476, -8342.9088071 , -8324.69521833,\n",
" -8343.51265885, -8257.02241748]),\n",
" 'test_neg_mean_squared_error': array([-2.17434181e+08, -2.20613590e+08, -2.22814795e+08, -2.19861113e+08,\n",
" -2.16575519e+08, -2.13042539e+08, -2.18226930e+08, -2.23279548e+08,\n",
" -2.15919824e+08, -2.35400340e+08]),\n",
" 'train_neg_mean_squared_error': array([-1.31514389e+08, -1.31258313e+08, -1.31510278e+08, -1.31873412e+08,\n",
" -1.31235206e+08, -1.32228487e+08, -1.31570988e+08, -1.30997437e+08,\n",
" -1.31894629e+08, -1.29364998e+08])}"
]
},
"execution_count": 277,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 10-fold cross-validation of rf_model; returns fit/score times and train/test scores\n",
"cv_rf = cross_validation(model=rf_model, X=X_train, y=y)\n",
"cv_rf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### RMSE & MAE"
]
},
{
"cell_type": "code",
"execution_count": 278,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean training time: 104.474560 s per fold (10 folds)\n",
"RMSE Train: 11460.53\n",
"RMSE Test: 14841.79\n",
"MSE Train: 131344813.73\n",
"MSE Test: 220316837.98\n"
]
}
],
"source": [
"rf_rmse, rf_mae = calc_metrics(cv_rf)"
]
},
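{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pairs that `calc_metrics` returns as `gb_mae` and `rf_mae` are on the order of 10^8. That matches the mean of the `neg_mean_squared_error` arrays rather than the `neg_mean_absolute_error` arrays, whose test-fold values are close to 1.2e4 for gradient boosting and 1.1e4 for random forest. The sketch below is one way to derive all three metrics from a `cross_validate`-style dictionary. The helper name `summarize_cv` is hypothetical and not part of this notebook, and averaging the per-fold RMSE is an assumption about the intended convention.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def summarize_cv(cv_results):\n",
"    # Scores are stored negated (higher is better), so flip the sign first.\n",
"    def fold_errors(key):\n",
"        return [-score for score in cv_results[key]]\n",
"\n",
"    def mean(values):\n",
"        return sum(values) / len(values)\n",
"\n",
"    metrics = {}\n",
"    for split in ('train', 'test'):\n",
"        mse_folds = fold_errors(split + '_neg_mean_squared_error')\n",
"        mae_folds = fold_errors(split + '_neg_mean_absolute_error')\n",
"        metrics[split] = (\n",
"            mean([math.sqrt(m) for m in mse_folds]),  # RMSE, averaged over folds\n",
"            mean(mae_folds),                          # MAE\n",
"            mean(mse_folds),                          # MSE\n",
"        )\n",
"    # Regroup into (train, test) pairs, one pair per metric.\n",
"    rmse, mae, mse = zip(metrics['train'], metrics['test'])\n",
"    return rmse, mae, mse\n",
"```\n",
"\n",
"Called as `summarize_cv(cv_gb)` or `summarize_cv(cv_rf)`, this would give the RMSE, MAE and MSE as `(train, test)` pairs, so a true MAE comparison could be built from the second element."
]
},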
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing the results"
]
},
{
"cell_type": "code",
"execution_count": 279,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[14576.5875465 , 18025.96891592],\n",
" [17820.07682486, 18020.34891532],\n",
" [15762.72186577, 15821.84438009],\n",
" [11460.53036498, 14841.79143433]])"
]
},
"execution_count": 279,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# One row per model (KNN, AdaBoost, GB, RF); columns are (train, test)\n",
"rmses = np.array([knn_rmse, ada_rmse, gb_rmse, rf_rmse])\n",
"rmses"
]
},
{
"cell_type": "code",
"execution_count": 280,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2.12477467e+08, 3.24986342e+08],\n",
" [3.18313493e+08, 3.26143621e+08],\n",
" [2.48464082e+08, 2.50386047e+08],\n",
" [1.31344814e+08, 2.20316838e+08]])"
]
},
"execution_count": 280,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Same layout as rmses; note that these values are on the MSE scale (about 1e8)\n",
"maes = np.array([knn_mae, ada_mae, gb_mae, rf_mae])\n",
"maes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### RMSE"
]
},
{
"cell_type": "code",
"execution_count": 281,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'RMSE Train')"
]
},
"execution_count": 281,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEICAYAAAC0+DhzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFt1JREFUeJzt3X+w3XV95/Hnq6FYVBCQC4sEDGrUItuNkqIzLpaKaEBXsPVHMo6kLt2IhVmtdtfgdgfGlpa6Uh1cFgc1Am0FWSkSFYuR7co6hUoQRBCRJES4JkIgqDggGnzvH+dz9XC/5ybxnpuc/Hg+Zs6c7/f9+XzO+ZwzyX3d7+f7PeemqpAkqd9vjHoCkqQdj+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0HayST5RJL3j3oe2rUZDtqlJFmb5LEkP0nygyQXJ3l6X/vFSSrJ6yeN+0ir/1Hb3zPJeUnG22Pdk+TDUzzPxO1/DpjPx/raf5bk5337X5rOa6yqP66qv5rOWGlrGQ7aFf2Hqno6MA94MXDmpPbvAosndpLsAbwJWN3X50xgPnA0sDfw+8Atg56n73bG5IlU1WkT7cBfAZ/p63/C5P5tLtLIGQ7aZVXVD4Br6YVEv88DL0+yX9tfANwG/KCvz+8CV1XVuupZW1WXzvQckzyvHbG8Pcm9wJeT/EaSz7Yjnx8m+b9JfrtvzN8nObttv6odxfzXJBuSrEtyykzPU7sfw0G7rCSzgROAVZOafgosBxa2/VOAyT/4bwTek+RPkvzbJNmmk4VXAC8EXtv2vwDMBf4NcDvwd5sZOxvYC3gWcBpwYZJ9tt1UtTswHLQr+lySR4D7gAeAswb0uRQ4JckzgN8DPjep/a+BvwHeCqwEvp9k8aQ+n2u/2U/c/tMQcz6rqh6tqseq6hdVdXFVPVJVPwXOBo5K8rQpxv4U+Muq+nlVLQceB54/xFwkw0G7pJOram/gWHq/jR8wuUNVfQ0YA/4c+EJVPTap/YmquqCqXg7sC5wDLOtf3mnPs2/f7eNDzPm+iY0ks5J8MMmaJD/mV0c+ndfRPFhVT/TtPwo8fYq+0lYxHLTLqqqvAhcDH5qiy98D76W7pDT5cR6rqguAh4EjZnKOfc/R//XIpwAnAq8EngE8r9W39dKW9EuGg3Z1HwGOTzL5pDTA+cDxwPWTG5K8O8mxSfZKskdbUtqb7hVL28Le9JaGHgKeSu+oRdquDAft0qpqA70jg/8+oG1jVV036bf2CY8B59G7gulB4HTgD6tqTV+fz0/6nMNVMzTtTwHr2u0O4F9m6HGlrRb/2I8kaTKPHCRJHYaDJKnDcJAkdRgOkqSOnfZLvg444ICaM2fOqKchSTuVm2+++cGqGttSv502HObMmcPKlStHPQ1J2qkk+d7W9HNZSZLUYThIkjoMB0lSh+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0GS1LHFT0gnWQa8Dnigqo5stc8AL2hd9gV+WFXzkswB7gTuam03VtVpbcxR9P5k417ANcC7qqqS7A98BpgDrAXeXFUPz8Br0zYyZ+kXRz2FkVp77mtHPQVpm9uaI4eLgQX9hap6S1XNq6p5wJXAP/Y1r55omwiG5kJgCTC33SYecylwXVXNBa5r+5KkEdpiOFTV9cDGQW1JArwZuGxzj5HkYGCfqrqh/UnGS4GTW/NJwCVt+5K+uiRpRIY953AMcH9V3d1XOzzJLUm+muSYVjsEGO/rM95qAAdV1XqAdn/gVE+WZEmSlUlWbtiwYcipS5KmMmw4LOLJRw3rgcOq6sXAe4BPJ9kHyICxv/Yfr66qi6pqflXNHxvb4jfOSpKmadpf2Z1kD+APgKMmalX1OPB42745yWrg+fSOFGb3DZ8NrGvb9yc5uKrWt+WnB6Y7J0nSzBjmyOFVwHeq6pfLRUnGksxq28+hd+J5TVsueiTJy9p5ilOAq9uw5c
Ditr24ry5JGpEthkOSy4AbgBckGU9yamtaSPdE9CuA25J8E/gscFpVTZzMfifwCWAVsBr4UqufCxyf5G7g+LYvSRqhLS4rVdWiKep/NKB2Jb1LWwf1XwkcOaD+EHDcluYhSdp+/IS0JKnDcJAkdRgOkqQOw0GS1GE4SJI6DAdJUofhIEnqMBwkSR2GgySpw3CQJHUYDpKkDsNBktRhOEiSOgwHSVKH4SBJ6pj2nwmVND1zln5x1FMYqbXnvnbUU9BW8MhBktRhOEiSOgwHSVKH4SBJ6thiOCRZluSBJLf31c5O8v0kt7bbiX1tZyZZleSuJK/pqy9otVVJlvbVD0/yr0nuTvKZJHvO5AuUJP36tubI4WJgwYD6h6tqXrtdA5DkCGAh8KI25n8lmZVkFnABcAJwBLCo9QX4m/ZYc4GHgVOHeUGSpOFtMRyq6npg41Y+3knA5VX1eFXdA6wCjm63VVW1pqp+BlwOnJQkwCuBz7bxlwAn/5qvQZI0w4Y553BGktvastN+rXYIcF9fn/FWm6r+TOCHVbVpUn2gJEuSrEyycsOGDUNMXZK0OdMNhwuB5wLzgPXAea2eAX1rGvWBquqiqppfVfPHxsZ+vRlLkrbatD4hXVX3T2wn+TjwhbY7Dhza13U2sK5tD6o/COybZI929NDfX5I0ItM6ckhycN/uG4CJK5mWAwuTPCXJ4cBc4OvATcDcdmXSnvROWi+vqgL+GXhjG78YuHo6c5IkzZwtHjkkuQw4FjggyThwFnBsknn0loDWAu8AqKo7klwBfBvYBJxeVU+0xzkDuBaYBSyrqjvaU7wPuDzJXwK3AJ+csVcnSZqWLYZDVS0aUJ7yB3hVnQOcM6B+DXDNgPoaelczbTd+8ZlffCZp8/yEtCSpw3CQJHUYDpKkDsNBktRhOEiSOgwHSVKH4SBJ6jAcJEkdhoMkqcNwkCR1GA6SpA7DQZLUYThIkjoMB0lSh+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0GS1LHFcEiyLMkDSW7vq/2PJN9JcluSq5Ls2+pzkjyW5NZ2+1jfmKOSfCvJqiTnJ0mr759kRZK72/1+2+KFSpK23tYcOVwMLJhUWwEcWVW/A3wXOLOvbXVVzWu30/rqFwJLgLntNvGYS4HrqmoucF3blySN0BbDoaquBzZOqn25qja13RuB2Zt7jCQHA/tU1Q1VVcClwMmt+STgkrZ9SV9dkjQiM3HO4T8CX+rbPzzJLUm+muSYVjsEGO/rM95qAAdV1XqAdn/gDMxJkjSEPYYZnOS/AZuAf2il9cBhVfVQkqOAzyV5EZABw2saz7eE3tIUhx122PQmLUnaomkfOSRZDLwOeGtbKqKqHq+qh9r2zcBq4Pn0jhT6l55mA+va9v1t2Wli+emBqZ6zqi6qqvlVNX9sbGy6U5ckbcG0wiHJAuB9wOur6tG++liSWW37OfROPK9py0WPJHlZu0rpFODqNmw5sLhtL+6rS5JGZIvLSkkuA44FDkgyDpxF7+qkpwAr2hWpN7Yrk14BfCDJJuAJ4LSqmjiZ/U56Vz7tRe8cxcR5inOBK5KcCtwLvGlGXpkkadq2GA5VtWhA+ZNT9L0SuHKKtpXAkQPqDwHHbWkekqTtx09IS5I6DAdJUofhIEnqMBwkSR2GgySpw3CQJHUYDpKkjqG+W0mStrc5S7846imM1NpzX7tdnscjB0lSh+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0GS1GE4SJI6DAdJUofhIEnqMBwkSR2GgySpY6vCIcmyJA8kub2vtn+SFUnubvf7tXqSnJ9kVZLbkrykb8zi1v/uJIv76kcl+VYbc36SzOSLlCT9erb2yOFiYMGk2lLguqqaC1zX9gFOAOa22xLgQuiFCXAW8FLgaOCsiUBpfZb0jZv8XJKk7WirwqGqrgc2TiqfBFzSti8BTu6rX1o9NwL7JjkYeA2woqo2VtXDwApgQWvbp6puqKoCLu17LE
nSCAxzzuGgqloP0O4PbPVDgPv6+o232ubq4wPqkqQR2RYnpAedL6hp1LsPnCxJsjLJyg0bNgwxRUnS5gwTDve3JSHa/QOtPg4c2tdvNrBuC/XZA+odVXVRVc2vqvljY2NDTF2StDnDhMNyYOKKo8XA1X31U9pVSy8DftSWna4FXp1kv3Yi+tXAta3tkSQva1cpndL3WJKkEdiqvyGd5DLgWOCAJOP0rjo6F7giyanAvcCbWvdrgBOBVcCjwNsBqmpjkr8Abmr9PlBVEye530nviqi9gC+1myRpRLYqHKpq0RRNxw3oW8DpUzzOMmDZgPpK4MitmYskadvzE9KSpA7DQZLUYThIkjoMB0lSh+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0GS1GE4SJI6DAdJUofhIEnqMBwkSR2GgySpw3CQJHUYDpKkDsNBktRhOEiSOgwHSVLHtMMhyQuS3Np3+3GSdyc5O8n3++on9o05M8mqJHcleU1ffUGrrUqydNgXJUkazh7THVhVdwHzAJLMAr4PXAW8HfhwVX2ov3+SI4CFwIuAZwFfSfL81nwBcDwwDtyUZHlVfXu6c5MkDWfa4TDJccDqqvpekqn6nARcXlWPA/ckWQUc3dpWVdUagCSXt76GgySNyEydc1gIXNa3f0aS25IsS7Jfqx0C3NfXZ7zVpqpLkkZk6HBIsifweuB/t9KFwHPpLTmtB86b6DpgeG2mPui5liRZmWTlhg0bhpq3JGlqM3HkcALwjaq6H6Cq7q+qJ6rqF8DH+dXS0ThwaN+42cC6zdQ7quqiqppfVfPHxsZmYOqSpEFmIhwW0beklOTgvrY3ALe37eXAwiRPSXI4MBf4OnATMDfJ4e0oZGHrK0kakaFOSCd5Kr2rjN7RV/5gknn0lobWTrRV1R1JrqB3onkTcHpVPdEe5wzgWmAWsKyq7hhmXpKk4QwVDlX1KPDMSbW3bab/OcA5A+rXANcMMxdJ0szxE9KSpA7DQZLUYThIkjoMB0lSh+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0GS1GE4SJI6DAdJUofhIEnqMBwkSR2GgySpw3CQJHUYDpKkDsNBktRhOEiSOgwHSVKH4SBJ6hg6HJKsTfKtJLcmWdlq+ydZkeTudr9fqyfJ+UlWJbktyUv6Hmdx6393ksXDzkuSNH0zdeTw+1U1r6rmt/2lwHVVNRe4ru0DnADMbbclwIXQCxPgLOClwNHAWROBIkna/rbVstJJwCVt+xLg5L76pdVzI7BvkoOB1wArqmpjVT0MrAAWbKO5SZK2YCbCoYAvJ7k5yZJWO6iq1gO0+wNb/RDgvr6x4602Vf1JkixJsjLJyg0bNszA1CVJg+wxA4/x8qpal+RAYEWS72ymbwbUajP1JxeqLgIuApg/f36nXZI0M4Y+cqiqde3+AeAqeucM7m/LRbT7B1r3ceDQvuGzgXWbqUuSRmCocEjytCR7T2wDrwZuB5YDE1ccLQaubtvLgVPaVUsvA37Ulp2uBV6dZL92IvrVrSZJGoFhl5UOAq5KMvFYn66qf0pyE3BFklOBe4E3tf7XACcCq4BHgbcDVNXGJH8B3NT6faCqNg45N0nSNA0VDlW1Bvh3A+oPAccNqBdw+hSPtQxYNsx8JEkzw09IS5I6DAdJUofhIEnqMBwkSR2GgySpw3CQJHUYDpKkDsNBktRhOEiSOgwHSVKH4SBJ6jAcJEkdhoMkqcNwkCR1GA6SpA7DQZLUYThIkjoMB0lSh+EgSeowHCRJHdMOhySHJvnnJHcmuSPJu1r97CTfT3Jru53YN+bMJKuS3JXkNX31Ba22KsnS4V6SJGlYewwxdhPw3qr6RpK9gZuTrGhtH66qD/V3TnIEsBB4EfAs4CtJnt+aLwCOB8aBm5Isr6pvDzE3SdIQph0OVbUeWN+2H0lyJ3DIZoacBFxeVY8D9yRZBRzd2lZV1RqAJJe3voaDJI3IjJxzSDIHeDHwr610Rp
LbkixLsl+rHQLc1zdsvNWmqg96niVJViZZuWHDhpmYuiRpgKHDIcnTgSuBd1fVj4ELgecC8+gdWZw30XXA8NpMvVusuqiq5lfV/LGxsWGnLkmawjDnHEjym/SC4R+q6h8Bqur+vvaPA19ou+PAoX3DZwPr2vZUdUnSCAxztVKATwJ3VtXf9tUP7uv2BuD2tr0cWJjkKUkOB+YCXwduAuYmOTzJnvROWi+f7rwkScMb5sjh5cDbgG8lubXV3g8sSjKP3tLQWuAdAFV1R5Ir6J1o3gScXlVPACQ5A7gWmAUsq6o7hpiXJGlIw1yt9DUGny+4ZjNjzgHOGVC/ZnPjJEnbl5+QliR1GA6SpA7DQZLUYThIkjoMB0lSh+EgSeowHCRJHYaDJKnDcJAkdRgOkqQOw0GS1GE4SJI6DAdJUofhIEnqMBwkSR2GgySpw3CQJHUYDpKkDsNBktRhOEiSOnaYcEiyIMldSVYlWTrq+UjS7myHCIcks4ALgBOAI4BFSY4Y7awkafe1Q4QDcDSwqqrWVNXPgMuBk0Y8J0nabaWqRj0HkrwRWFBVf9z23wa8tKrOmNRvCbCk7b4AuGu7TnTmHAA8OOpJ7MR8/4bj+zecnf39e3ZVjW2p0x7bYyZbIQNqndSqqouAi7b9dLatJCurav6o57Gz8v0bju/fcHaX929HWVYaBw7t258NrBvRXCRpt7ejhMNNwNwkhyfZE1gILB/xnCRpt7VDLCtV1aYkZwDXArOAZVV1x4intS3t9EtjI+b7Nxzfv+HsFu/fDnFCWpK0Y9lRlpUkSTsQw0GS1GE4zKAkP+nbPjHJ3UkOS3J2kkeTHDhF30pyXt/+nyU5e7tNfAeT5A3tPXlh25+T5LEktyS5M8nXkyweMO7qJDds/xnvmJIclOTTSdYkuTnJDe29PTbJj5LcmuS2JF/p/7epX0nyRHufbk/y+ST7tvrEv8lb+257jnq+M8lw2AaSHAd8lN4H++5t5QeB904x5HHgD5IcsD3mtxNYBHyN3lVrE1ZX1Yur6rdb/U+TvH2isf2nfQmwb5LDt+tsd0BJAnwOuL6qnlNVR9F732a3Lv+vquZV1e/Qu1rw9BFNdUf3WHufjgQ28uT3aXVrm7j9bERz3CYMhxmW5Bjg48Brq2p1X9My4C1J9h8wbBO9KyD+dDtMcYeW5OnAy4FTeXI4/FJVrQHeA/znvvIfAp+n99UrA8ftZl4J/KyqPjZRqKrvVdVH+zu1ENkbeHg7z29ndANwyKgnsb0YDjPrKcDVwMlV9Z1JbT+hFxDvmmLsBcBbkzxjG85vZ3Ay8E9V9V1gY5KXTNHvG8AL+/YXAZe126JtO8WdwovovUdTOSbJrcC9wKvo/dvUFNqXgx7Hkz9/9dy+JaULRjS1bcZwmFk/B/6F3m+9g5wPLE6yz+SGqvoxcClP/m14d7SI3m//tPupftD/8itXkhwEPA/4WguVTUmO3Kaz3MkkuSDJN5Pc1EoTy0qHAp8CPjjC6e3I9moh+hCwP7Cir61/WWmXW5YzHGbWL4A3A7+b5P2TG6vqh8CngT+ZYvxH6AXL07bZDHdgSZ5JbznkE0nWAv8FeAuDv3vrxcCdbfstwH7APW3cHFxauoPeORgA2g+v44BBX7i2HHjFdprXzuaxqpoHPBvYk93o3IzhMMOq6lHgdfSWiAYdQfwt8A4GfDq9qjYCVzD1kceu7o3ApVX17Kqa036rvYdfnUQFeleKAB+id9IfekcXC9qYOcDEydfd2f8BfivJO/tqT52i778HVk/RJqCqfkTvqP7PkvzmqOezPRgO20D7Ib8A+PMkJ01qexC4it75iUHOo/eVwLujRfTem35XAu+nt757S5I76QXoR6vqUy0oDgNunBhQVfcAP07y0u0y6x1Q9b764GTg95Lck+TrwCXA+1qXY9pa+TeBtzH1lXRqquoW4JvsJr94+PUZkqQOjxwkSR2GgySpw3CQJHUYDpKkDsNBktRhOEiSOg
wHSVLH/wdaz+AU/Kp2/wAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Bar chart of the training RMSE (column 0 of rmses), one bar per model\n",
"class_names = np.array(['KNN', 'ADA', 'GB', 'RF'])\n",
"plt.bar(range(rmses.shape[0]), rmses[:, 0])\n",
"plt.xticks(range(rmses.shape[0]), class_names)\n",
"plt.title('RMSE Train')"
]
},
{
"cell_type": "code",
"execution_count": 282,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'RMSE Test')"
]
},
"execution_count": 282,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEICAYAAAC0+DhzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFuFJREFUeJzt3X+w3XV95/Hna4NYf0AJEigSNGijFtlu0BSdYbG0KRiwFaxSyXQlddlGLGy12h3BdgfXll3bSnVwWRzUFJhWkC0iUVEa2a6ss6BcJCKISBIQLkQSDCoOLBZ87x/nc/XL/Z6bhHtuchPyfMx853y/78/nc87nfCe5r/v9cc5NVSFJUte/mu0JSJJ2PoaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hoKetJHcneTTJj5N8L8lFSZ7bab8oSSV5/aRxH271P2jbeyY5N8l4e667knxoiteZWP77kPl8tNP+kyT/0tn+wgjv87QkX5rueGkYw0FPd79TVc8FFgGHA2dNav8OsHxiI8kewEnAuk6fs4DFwBHAXsBvADcPe53OcsbkiVTVaRPtwH8FPtXpf9xob1OaWYaDdgtV9T3gGgYh0fVZ4Mgkc9v2UuAW4HudPr8GXFlV99fA3VV1yfaYZ5Kjknw1yQ+SfD3JkZ22P2xHKQ8nWZ/kpCSHAx8Gjp44Qtoe89Lux3DQbiHJfOA4YO2kpv8HrAJObtunAJN/8N8AvCvJHyX510mynea4APgM8GfAvsCfA59JMreF198AS6pqL+Ao4Naquhl4J/C/2xHIL22PuWn3Yzjo6e4zSR4G7gU2AmcP6XMJcEqSXwR+ncEP6K7/BvwV8PvAGHBfkuWT+nym/bY/sfzhNOa6HPh0VX2pqn5aVVcD3wKO7fQ5LMkvVNV9VXX7NF5D2iaGg57uTmy/aR8NvAzYb3KHqvoKMI/Bb+qfq6pHJ7U/UVXnV9WRwD7AOcDKJL8y6XX26Swfm8ZcXwj8u27IMLjW8fyqeohBOP0x8L0kq5L88jReQ9omhoN2C1X1ZeAi4INTdPl74N30TylNfp5Hq+p84CHg0JmcI4Ojm49PCpnnVNWH2mt/vqqWAM8H7gEumJjWDM9DMhy0W/kwcEySyRelAc4DjgGum9yQ5J1Jjk7yrCR7tFNKe9G/Y2lUFwMnJVmSZE57vSVJfinJQUlel+TZwGPAj4En2rgHgIOTPGOG56PdmOGg3UZVbWJwZPCfh7Rtrqpra/gfOHkUOJfBHUwPAqcDb6yq9Z0+n530OYcrpzG/9cAbgf/SXue7wDsY/D+dw+CW2u8B32dwB9V/bEO/CNwNbEwy/lRfVxom/rEfSdJkHjlIknoMB0lSj+EgSeoxHCRJPXvM9gSma7/99qsFCxbM9jQkaZdy0003PVhV87bWb5cNhwULFjA2Njbb05CkXUqS725LP08rSZJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSenbZT0iPYsGZn5/tKcyquz/wupHGu/9G23/SrsAjB0lSj+EgSerZajgkWZlkY5JbO7VPJVnTlruTrGn1BUke7bR9tDPmlUm+mWRtkvOSpNX3TbI6yZ3tce72eKOSpG23LUcOFwFLu4WqenNVLaqqRcAVwKc7zesm2qrqtE79AmAFsLAtE895JnBtVS0Erm3bkqRZtNVwqKrrgM3D2tpv/78HXLql50hyILB3VV1fVQVcApzYmk8ALm7rF3fqkqRZMuo1h6OAB6rqzk7tkCQ3J/lykqNa7SBgvNNnvNUADqiqDQDtcf+pXizJiiRjScY2bdo04tQlSVMZNRyW8eSjhg3AC6rqcOBdwCeT7A1kyNh6qi9WVRdW1eKqWjxv3lb/kJEkaZqm/TmHJHsAvwu8cqJWVY8Bj7X1m5KsA17C4Ehhfmf4fOD+tv5AkgOrakM7/bRxunOSJM2MUY4cfg
v4dlX97HRRknlJ5rT1FzG48Ly+nS56OMmr23WKU4Cr2rBVwPK2vrxTlyTNkm25lfVS4HrgpUnGk5zamk6mfyH6NcAtSb4B/CNwWlVNXMx+O/BxYC2wDvhCq38AOCbJncAxbVuSNIu2elqpqpZNUf+DIbUrGNzaOqz/GHDYkPr3gSVbm4ckacfxE9KSpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqmfYf+5E0PQvO/PxsT2FW3f2B1832FLQNPHKQJPUYDpKkHsNBktRjOEiSegwHSVLPVsMhycokG5Pc2qm9L8l9Sda05fhO21lJ1ia5I8lrO/WlrbY2yZmd+iFJvprkziSfSrLnTL5BSdJTty1HDhcBS4fUP1RVi9pyNUCSQ4GTgZe3Mf8jyZwkc4DzgeOAQ4FlrS/AX7XnWgg8BJw6yhuSJI1uq+FQVdcBm7fx+U4ALquqx6rqLmAtcERb1lbV+qr6CXAZcEKSAL8J/GMbfzFw4lN8D5KkGTbKNYczktzSTjvNbbWDgHs7fcZbbar684AfVNXjk+pDJVmRZCzJ2KZNm0aYuiRpS6YbDhcALwYWARuAc1s9Q/rWNOpDVdWFVbW4qhbPmzfvqc1YkrTNpvX1GVX1wMR6ko8Bn2ub48DBna7zgfvb+rD6g8A+SfZoRw/d/pLU49eP7JivH5nWkUOSAzubbwAm7mRaBZyc5JlJDgEWAl8DbgQWtjuT9mRw0XpVVRXwz8Cb2vjlwFXTmZMkaeZs9cghyaXA0cB+ScaBs4GjkyxicArobuBtAFV1W5LLgW8BjwOnV9UT7XnOAK4B5gArq+q29hLvAS5L8pfAzcAnZuzdSZKmZavhUFXLhpSn/AFeVecA5wypXw1cPaS+nsHdTJKknYSfkJYk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknq2Gg5JVibZmOTWTu1vknw7yS1JrkyyT6svSPJokjVt+WhnzCuTfDPJ2iTnJUmr75tkdZI72+Pc7fFGJUnbbluOHC4Clk6qrQYOq6pfBb4DnNVpW1dVi9pyWqd+AbACWNiWiec8E7i2qhYC17ZtSdIs2mo4VNV1wOZJtX+qqsfb5g3A/C09R5IDgb2r6vqqKuAS4MTWfAJwcVu/uFOXJM2Smbjm8O+BL3S2D0lyc5IvJzmq1Q4Cxjt9xlsN4ICq2gDQHvef6oWSrEgylmRs06ZNMzB1SdIwI4VDkj8DHgf+oZU2AC+oqsOBdwGfTLI3kCHD66m+XlVdWFWLq2rxvHnzpjttSdJW7DHdgUmWA78NLGmniqiqx4DH2vpNSdYBL2FwpNA99TQfuL+tP5DkwKra0E4/bZzunCRJM2NaRw5JlgLvAV5fVY906vOSzGnrL2Jw4Xl9O130cJJXt7uUTgGuasNWAcvb+vJOXZI0S7Z65JDkUuBoYL8k48DZDO5Oeiawut2RekO7M+k1wPuTPA48AZxWVRMXs9/O4M6nZzG4RjFxneIDwOVJTgXuAU6akXcmSZq2rYZDVS0bUv7EFH2vAK6Yom0MOGxI/fvAkq3NQ5K04/gJaUlSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqWebwiHJyiQbk9zaqe2bZHWSO9vj3FZPkvOSrE1yS5JXdMYsb/3vTLK8U39lkm+2MeclyUy+SUnSU7OtRw4XAUsn1c4Erq2qhcC1bRvgOGBhW1YAF8AgTICzgVcBRwBnTwRK67OiM27ya0mSdqBtCoequg7YPKl8AnBxW78YOLFTv6QGbgD2SXIg8FpgdVVtrqqHgN
XA0ta2d1VdX1UFXNJ5LknSLBjlmsMBVbUBoD3u3+oHAfd2+o232pbq40PqPUlWJBlLMrZp06YRpi5J2pLtcUF62PWCmka9X6y6sKoWV9XiefPmjTBFSdKWjBIOD7RTQrTHja0+Dhzc6TcfuH8r9flD6pKkWTJKOKwCJu44Wg5c1amf0u5aejXww3ba6Rrg2CRz24XoY4FrWtvDSV7d7lI6pfNckqRZsMe2dEpyKXA0sF+ScQZ3HX0AuDzJqcA9wEmt+9XA8cBa4BHgrQBVtTnJXwA3tn7vr6qJi9xvZ3BH1LOAL7RFkjRLtikcqmrZFE1LhvQt4PQpnmclsHJIfQw4bFvmIkna/vyEtCSpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqSeaYdDkpcmWdNZfpTknUnel+S+Tv34zpizkqxNckeS13bqS1ttbZIzR31TkqTR7DHdgVV1B7AIIMkc4D7gSuCtwIeq6oPd/kkOBU4GXg48H/hSkpe05vOBY4Bx4MYkq6rqW9OdmyRpNNMOh0mWAOuq6rtJpupzAnBZVT0G3JVkLXBEa1tbVesBklzW+hoOkjRLZuqaw8nApZ3tM5LckmRlkrmtdhBwb6fPeKtNVZckzZKRwyHJnsDrgf/ZShcAL2ZwymkDcO5E1yHDawv1Ya+1IslYkrFNmzaNNG9J0tRm4sjhOODrVfUAQFU9UFVPVNVPgY/x81NH48DBnXHzgfu3UO+pqguranFVLZ43b94MTF2SNMxMhMMyOqeUkhzYaXsDcGtbXwWcnOSZSQ4BFgJfA24EFiY5pB2FnNz6SpJmyUgXpJM8m8FdRm/rlP86ySIGp4bunmirqtuSXM7gQvPjwOlV9UR7njOAa4A5wMqqum2UeUmSRjNSOFTVI8DzJtXesoX+5wDnDKlfDVw9ylwkSTPHT0hLknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9I4dDkruTfDPJmiRjrbZvktVJ7myPc1s9Sc5LsjbJLUle0Xme5a3/nUmWjzovSdL0zdSRw29U1aKqWty2zwSuraqFwLVtG+A4YGFbVgAXwCBMgLOBVwFHAGdPBIokacfbXqeVTgAubusXAyd26pfUwA3APkkOBF4LrK6qzVX1ELAaWLqd5iZJ2oqZCIcC/inJTUlWtNoBVbUBoD3u3+oHAfd2xo632lT1J0myIslYkrFNmzbNwNQlScPsMQPPcWRV3Z9kf2B1km9voW+G1GoL9ScXqi4ELgRYvHhxr12SNDNGPnKoqvvb40bgSgbXDB5op4tojxtb93Hg4M7w+cD9W6hLkmbBSOGQ5DlJ9ppYB44FbgVWARN3HC0Hrmrrq4BT2l1LrwZ+2E47XQMcm2RuuxB9bKtJkmbBqKeVDgCuTDLxXJ+sqi8muRG4PMmpwD3ASa3/1cDxwFrgEeCtAFW1OclfADe2fu+vqs0jzk2SNE0jhUNVrQf+zZD694ElQ+oFnD7Fc60EVo4yH0nSzPAT0pKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUs+0wyHJwUn+OcntSW5L8o5Wf1+S+5KsacvxnTFnJVmb5I4kr+3Ul7ba2iRnjvaWJEmj2mOEsY8D766qryfZC7gpyerW9qGq+mC3c5JDgZOBlwPPB76U5CWt+XzgGGAcuDHJqqr61ghzkySNYNrhUFUbgA1t/eEktwMHbWHICcBlVfUYcFeStcARrW1tVa0HSHJZ62s4SNIsmZFrDkkWAIcDX2
2lM5LckmRlkrmtdhBwb2fYeKtNVR/2OiuSjCUZ27Rp00xMXZI0xMjhkOS5wBXAO6vqR8AFwIuBRQyOLM6d6DpkeG2h3i9WXVhVi6tq8bx580aduiRpCqNccyDJMxgEwz9U1acBquqBTvvHgM+1zXHg4M7w+cD9bX2quiRpFoxyt1KATwC3V9XfduoHdrq9Abi1ra8CTk7yzCSHAAuBrwE3AguTHJJkTwYXrVdNd16SpNGNcuRwJPAW4JtJ1rTae4FlSRYxODV0N/A2gKq6LcnlDC40Pw6cXlVPACQ5A7gGmAOsrKrbRpiXJGlEo9yt9BWGXy+4egtjzgHOGVK/ekvjJEk7lp+QliT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSenaacEiyNMkdSdYmOXO25yNJu7OdIhySzAHOB44DDgWWJTl0dmclSbuvnSIcgCOAtVW1vqp+AlwGnDDLc5Kk3VaqarbnQJI3AUur6j+07bcAr6qqMyb1WwGsaJsvBe7YoROdOfsBD872JHZh7r/RuP9Gs6vvvxdW1bytddpjR8xkG2RIrZdaVXUhcOH2n872lWSsqhbP9jx2Ve6/0bj/RrO77L+d5bTSOHBwZ3s+cP8szUWSdns7SzjcCCxMckiSPYGTgVWzPCdJ2m3tFKeVqurxJGcA1wBzgJVVddssT2t72uVPjc0y999o3H+j2S32305xQVqStHPZWU4rSZJ2IoaDJKnHcJhBSX7cWT8+yZ1JXpDkfUkeSbL/FH0rybmd7T9N8r4dNvGdTJI3tH3ysra9IMmjSW5OcnuSryVZPmTcVUmu3/Ez3jklOSDJJ5OsT3JTkuvbvj06yQ+TrElyS5Ivdf9t6ueSPNH2061JPptkn1af+De5prPsOdvznUmGw3aQZAnwEQYf7LunlR8E3j3FkMeA302y346Y3y5gGfAVBnetTVhXVYdX1a+0+p8keetEY/tP+wpgnySH7NDZ7oSSBPgMcF1VvaiqXslgv81vXf5PVS2qql9lcLfg6bM01Z3do20/HQZs5sn7aV1rm1h+Mktz3C4MhxmW5CjgY8Drqmpdp2kl8OYk+w4Z9jiDOyD+ZAdMcaeW5LnAkcCpPDkcfqaq1gPvAv64U34j8FkGX70ydNxu5jeBn1TVRycKVfXdqvpIt1MLkb2Ah3bw/HZF1wMHzfYkdhTDYWY9E7gKOLGqvj2p7ccMAuIdU4w9H/j9JL+4Hee3KzgR+GJVfQfYnOQVU/T7OvCyzvYy4NK2LNu+U9wlvJzBPprKUUnWAPcAv8Xg36am0L4cdAlP/vzVizunlM6fpaltN4bDzPoX4P8y+K13mPOA5Un2ntxQVT8CLuHJvw3vjpYx+O2f9jjVD/qffeVKkgOAXwa+0kLl8SSHbddZ7mKSnJ/kG0lubKWJ00oHA38H/PUsTm9n9qwWot8H9gVWd9q6p5WedqflDIeZ9VPg94BfS/LeyY1V9QPgk8AfTTH+wwyC5TnbbYY7sSTPY3A65ONJ7gb+E/Bmhn/31uHA7W39zcBc4K42bgGeWrqNwTUYANoPryXAsC9cWwW8ZgfNa1fzaFUtAl4I7MludG3GcJhhVfUI8NsMThENO4L4W+BtDPl0elVtBi5n6iOPp7s3AZdU1QurakH7rfYufn4RFRjcKQJ8kMFFfxgcXSxtYxYAExdfd2f/C/iFJG/v1J49Rd9/C6ybok1AVf2QwVH9nyZ5xmzPZ0cwHLaD9kN+KfDnSU6Y1PYgcCWD6xPDnMvgK4F3R8sY7JuuK4D3Mji/e3OS2xkE6Eeq6u9aULwAuGFiQFXdBfwoyat2yKx3QjX46oMTgV9PcleSrwEXA+9pXY5q58q/AbyFqe+kU1NVNwPfYDf5xcOvz5Ak9XjkIEnqMRwkST2GgySpx3CQJPUYDpKkHs
NBktRjOEiSev4/okrw+8VVvn8AAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Bar chart of the test RMSE (column 1 of rmses), one bar per model\n",
"class_names = np.array(['KNN', 'ADA', 'GB', 'RF'])\n",
"plt.bar(range(rmses.shape[0]), rmses[:, 1])\n",
"plt.xticks(range(rmses.shape[0]), class_names)\n",
"plt.title('RMSE Test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MAE (the `maes` values plotted here are mean squared errors)"
]
},
{
"cell_type": "code",
"execution_count": 283,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'MAE Train')"
]
},
"execution_count": 283,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAEmJJREFUeJzt3XuQZGV5x/HvL7B4CQjRnRgCi2OUKi8kgtkgxJAQMVULGDEJRjYJomVqE6MJJuZCrMTbX+aiSaFEag2o4AWNGrMqXgtTQgWQARcCrsblImwwYXR1cYVIVp/80WdN09vD9Oz0TM+++/1Ude25PKf7mVOzv3nnPad7UlVIktryQ5NuQJI0foa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdpiSX5iSQ7J92H9i+Gu1a0JHckeSDJ6oHtm5NUkumB7a/tth8/sP1FSb6XZOfA48cH6o4a2F9JvtO3ftJCv4aquq2qDl7ocdJiGO7aF9wOrN+9kuQngUcMFiUJcDawHThnyPNcXVUHDzzu7i+oqjv793ebn9a37cohr3vAIr42aUkY7toXXAq8sG/9HOCSIXUnAT8OnAucleSgpWgmybuSXJDkE0m+A5yU5LndbxPfTnJnkr/sq39ikupbvyrJ65L8W1f/iSSPXopetf8y3LUvuAZ4VJInd6PkFwDvGlJ3DvAR4H3d+nOWsKffAF4HHAJcDewEfgs4FPhl4NwkD/X6v0Gv38cCPwz80RL2qv3QRMM9ycVJ7kly8wi1RyX5bJIvJLkpyWnL0aNWjN2j918CvgT8Z//OJI8Eng+8p6r+F/gAe07NnJDkW32PWxfRzz9X1dVV9f2q+m5VXVFVN3frNwKXAb/wEMdfVFVfqar7gH8Cjl1EL9IeJj1yfwewbsTavwDeX1XHAWcB/7BUTWlFupTeaPdFDJ+S+RVgF3B5t/5u4NQkU30111TVYX2PJyyin7v6V5KcmORfk8wm2QH8NrB6+KEA/Fff8n2AF1w1VhMN96r6HL2LXz+Q5AndHOT1Sa5M8qTd5cCjuuVDgQddCFPbquqr9C6sngZ8aEjJOfQC8s4k/0VvNLyKvgux425pYP0y4IPAmqo6FPhHIEv02tK8Dpx0A0NsBH63qr6S5Bn0RujPAl4LfCrJ79Obo3z25FrUhLwE+JGq+k6SH3zvJjkCOAU4Fbipr/4V9EL//GXo7RBge1X9T5IT6P12+dFleF1pqBUV7kkOBn4W+KfeXW0APKz7dz3wjqp6Y5ITgUuTHFNV359Aq5qAqpprjvxsYHNVfap/Y5LzgVcmOabbdOKQNxP9YlVdN4b2Xgr8TZILgc8C7wceOYbnlfZKJv3HOro3oXy0qo5J8ijgy1V1+JC6W4B1VXVXt34bcEJV3bOc/UrSvmDSF1QfpKruBW5P8nzovSklydO63XfS+9WbJE8GHg7MTqRRSVrhJjpyT/Je4GR6dxX8N/Aa4ArgrcDh9C6IXVZVr0/yFOBt9C6aFfCng7+GS5J6Jj4tI0kavxU1LSNJGo+J3S2zevXqmp6entTLS9I+6frrr/96VU3NVzexcJ+enmZmZmZSLy9J+6QkXx2lzmkZSWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lq0Ir6Yx1aPtPnfWzSLUzUHW84fdItSEvKkbskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ2aN9yTPDzJ55PcmOSWJK8bUvOwJO9LsjXJtUmml6JZSdJoRhm5fxd4VlU9DTgWWJfkhIGalwDfrKonAn8H/NV425QkLcS84V49O7vVVd2jBsrOAN7ZLX8AOCVJxtalJGlBRppzT3JAks3APcCnq+ragZIjgLsAqmoXsAN4zDgblSSNbqRwr6rvVdWxwJHA8UmOGSgZNkofHN2TZEOSmSQzs7OzC+9WkjSSBd0tU1XfAv4VWDewax
uwBiDJgcChwPYhx2+sqrVVtXZqamqvGpYkzW+Uu2WmkhzWLT8CeDbwpYGyTcA53fKZwBVVtcfIXZK0PEb5yN/DgXcmOYDeD4P3V9VHk7wemKmqTcBFwKVJttIbsZ+1ZB1LkuY1b7hX1U3AcUO2v7pv+X+A54+3NUnS3vIdqpLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAbNG+5J1iT5bJItSW5Jcu6QmpOT7EiyuXu8emnalSSN4sARanYBr6yqG5IcAlyf5NNV9cWBuiur6jnjb1GStFDzjtyr6mtVdUO3/G1gC3DEUjcmSdp7C5pzTzINHAdcO2T3iUluTPLxJE+d4/gNSWaSzMzOzi64WUnSaEaZlgEgycHAB4FXVNW9A7tvAB5XVTuTnAZ8GDh68DmqaiOwEWDt2rW1111LEzZ93scm3cJE3fGG0yfdguYx0sg9ySp6wf7uqvrQ4P6qureqdnbLlwOrkqwea6eSpJGNcrdMgIuALVX1pjlqfqyrI8nx3fN+Y5yNSpJGN8q0zDOBs4F/T7K52/Yq4CiAqroQOBN4aZJdwP3AWVXltIskTci84V5VVwGZp+YtwFvG1ZQkaXF8h6okNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUoFH+QPaKM33exybdwkTd8YbTJ92CpBXOkbskNchwl6QGGe6S1CDDXZIaNG+4J1mT5LNJtiS5Jcm5Q2qS5PwkW5PclOTpS9OuJGkUo9wtswt4ZVXdkOQQ4Pokn66qL/bVnAoc3T2eAby1+1eSNAHzjtyr6mtVdUO3/G1gC3DEQNkZwCXVcw1wWJLDx96tJGkkC5pzTzINHAdcO7DrCOCuvvVt7PkDgCQbkswkmZmdnV1Yp5KkkY0c7kkOBj4IvKKq7h3cPeSQ2mND1caqWltVa6emphbWqSRpZCOFe5JV9IL93VX1oSEl24A1fetHAncvvj1J0t4Y5W6ZABcBW6rqTXOUbQJe2N01cwKwo6q+NsY+JUkLMMrdMs8Ezgb+PcnmbturgKMAqupC4HLgNGArcB/w4vG3Kkka1bzhXlVXMXxOvb+mgJeNqylJ0uL4DlVJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGjRvuCe5OMk9SW6eY//JSXYk2dw9Xj3+NiVJC3HgCDXvAN4CXPIQNVdW1XPG0pEkadHmHblX1eeA7cvQiyRpTMY1535ikhuTfDzJU+cqSrIhyUySmdnZ2TG9tCRp0DjC/QbgcVX1NODNwIfnKqyqjVW1tqrWTk1NjeGlJUnDLDrcq+reqtrZLV8OrEqyetGdSZL22qLDPcmPJUm3fHz3nN9Y7PNKkvbevHfLJHkvcDKwOsk24DXAKoCquhA4E3hpkl3A/cBZVVVL1rEkaV7zhntVrZ9n/1vo3SopSVohfIeqJDXIcJekBhnuktQgw12SGmS4S1KDDHdJatAonwopSWM1fd7HJt3CRN3xhtOX/DUcuUtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KD5g33JBcnuSfJzXPsT5Lzk2xNclOSp4+/TUnSQowycn8HsO4h9p8KHN09NgBvXXxbkqTFmDfcq+pzwPaHKDkDuKR6rgEOS3L4uBqUJC3cOObcjwDu6lvf1m
3bQ5INSWaSzMzOzo7hpSVJw4wj3DNkWw0rrKqNVbW2qtZOTU2N4aUlScOMI9y3AWv61o8E7h7D80qS9tI4wn0T8MLurpkTgB1V9bUxPK8kaS8dOF9BkvcCJwOrk2wDXgOsAqiqC4HLgdOArcB9wIuXqllJ0mjmDfeqWj/P/gJeNraOJEmL5jtUJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWrQSOGeZF2SLyfZmuS8IftflGQ2yebu8dvjb1WSNKoD5ytIcgBwAfBLwDbguiSbquqLA6Xvq6qXL0GPkqQFGmXkfjywtapuq6oHgMuAM5a2LUnSYowS7kcAd/Wtb+u2Dfq1JDcl+UCSNcOeKMmGJDNJZmZnZ/eiXUnSKEYJ9wzZVgPrHwGmq+qngM8A7xz2RFW1sarWVtXaqamphXUqSRrZKOG+DegfiR8J3N1fUFXfqKrvdqtvA356PO1JkvbGKOF+HXB0kscnOQg4C9jUX5Dk8L7V5wJbxteiJGmh5r1bpqp2JXk58EngAODiqrolyeuBmaraBPxBkucCu4DtwIuWsGdJ0jzmDXeAqrocuHxg26v7lv8c+PPxtiZJ2lu+Q1WSGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktSgkcI9ybokX06yNcl5Q/Y/LMn7uv3XJpked6OSpNHNG+5JDgAuAE4FngKsT/KUgbKXAN+sqicCfwf81bgblSSNbpSR+/HA1qq6raoeAC4DzhioOQN4Z7f8AeCUJBlfm5KkhThwhJojgLv61rcBz5irpqp2JdkBPAb4en9Rkg3Ahm51Z5Iv703TK8BqBr625ZQ2fi/yHC6O529x9uXz97hRikYJ92Ej8NqLGqpqI7BxhNdc0ZLMVNXaSfexL/McLo7nb3H2h/M3yrTMNmBN3/qRwN1z1SQ5EDgU2D6OBiVJCzdKuF8HHJ3k8UkOAs4CNg3UbALO6ZbPBK6oqj1G7pKk5THvtEw3h/5y4JPAAcDFVXVLktcDM1W1CbgIuDTJVnoj9rOWsukVYJ+fWloBPIeL4/lbnObPXxxgS1J7fIeqJDXIcJekBhnuA5Ls7Fs+LclXkhyV5LVJ7kvyo3PUVpI39q3/cZLXLlvjK0iSX+nOx5O69ekk9yf5QpItST6f5Jwhx/1LkquXv+OVK8ljk7wnyW1Jrk9ydXd+T06yI8nmJDcl+Uz/96Z6knyvO0c3J/lIksO67bu/Jzf3PQ6adL/jZLjPIckpwJuBdVV1Z7f568Ar5zjku8CvJlm9HP2tcOuBq3jwhfVbq+q4qnpyt/0Pk7x4987uP93TgcOSPH5Zu12hund5fxj4XFX9RFX9NL1zd2RXcmVVHVtVP0XvrraXTajVlez+7hwdQ+9mj/5zdGu3b/fjgQn1uCQM9yGSnAS8DTi9qm7t23Ux8IIkjx5y2C56V+D/cBlaXLGSHAw8k97nDQ29a6qqbgP+CPiDvs2/BnyE3sdbtH631aieBTxQVRfu3lBVX62qN/cXdT8EDgG+ucz97Wuupvdu+v2C4b6nhwH/Ajyvqr40sG8nvYA/d45jLwB+M8mhS9jfSvc84BNV9R/A9iRPn6PuBuBJfevrgfd2j/VL2+I+46n0ztNcTkqyGbgTeDa9700N0X0A4ik8+D06T+ibkrlgQq0tGcN9T/8L/Bu9kecw5wPnJHnU4I6quhe4hAePSPc36+mNvun+nSuof/CRFUkeCzwRuKr7obAryTFL2uU+KMkFSW5Mcl23afe0zBrg7cBfT7C9leoR3Q/AbwCPBj7dt69/Wq
a5KS3DfU/fB34d+JkkrxrcWVXfAt4D/N4cx/89vR8MP7xkHa5QSR5DbyrhH5PcAfwJ8AKGf/bQccCWbvkFwI8At3fHTePUDMAt9K5DANAF0CnA1JDaTcDPL1Nf+5L7q+pYeh+2dRD70XUJw32IqroPeA69KZZhI/g3Ab/DkHf4VtV24P3MPfJv2ZnAJVX1uKqa7kaUt/P/FwCB3p0KwN/Su2ANvdH9uu6YaWD3hcP93RXAw5O8tG/bI+eo/Tng1jn27feqage936j/OMmqSfezHAz3OXQhvQ74iyRnDOz7OvDP9Obnh3kjvY8U3d+sp3de+n0QeBW9+c0vJNlC74ffm6vq7V3QHwVcs/uAqroduDfJ4EdL71e6z2d6HvALSW5P8nl6fzfhz7qSk7r54huBs5n7Ti4BVfUF4Eb2k4GDHz8gSQ1y5C5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoP+D9bs5l7GT47SAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"class_names = np.array(['KNN', 'ADA', 'GB', 'RF'])\n",
"plt.bar(range(maes.shape[0]), maes[:, 0])\n",
"plt.xticks(range(maes.shape[0]), class_names)\n",
"plt.title('MAE Train')"
]
},
{
"cell_type": "code",
"execution_count": 284,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,1,'MAE Test')"
]
},
"execution_count": 284,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAEoFJREFUeJzt3X+w5XVdx/HnK3ZRC4V0r4mwcE0pf6CAboQZxUg2KziiiclWiGazjWliaUWOo+SMM9iUFkjSmqSQv39ka+KvBhtwEvSCC4krtfwQNjCurC6SJK69++N8t45nz/Wee++5e+5+9vmY+c6e7/fzPue873d2X+dzP+d7zqaqkCS15Ucm3YAkafwMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw137jCS3Jrk/yZqB41uSVJLpgePndsePHzj+oiTfT3LvwPbIgbojBsYryX/17Z+4hJ/l60l+frH3l+ZjuGtfcwuwYfdOkicCDxosShLgTGAHcNaQx/l8VR00sN3RX1BVt/WPd4eP6Tt25dh+KmnMDHftay4FXti3fxZwyZC6E4FHAmcDZyQ5cDmaSfKgJH+R5PZuNn5Bkgd0Y49I8skk30pyd5LLu+MfBB4OfLr7DeAVy9Gb9m+Gu/Y1VwEPSfK4JAcALwD+bkjdWcDHgPd3+89apn7eAhwOPBH4aeCngHO6sT8CbgTWAIcC5wJU1fOBu4Bf7n4DOH+ZetN+bKLhnuTiJHcl+fIItUck+WySLyW5Pskpe6NHrUi7Z+/PAL4K/Ef/YJIfBZ4PvKeqvgd8iD2XZk7oZtS7t5sW2kSSVcBvAmdX1beqaidwHnBGV/I9er89HFFV91fVFQt9DmmxJj1zfyewfsTa1wIfqKrj6P3j+avlakor3qXArwEvYviSzHOBXcBl3f67gWcmmeqruaqqDunbHr2IPh4JrAZu2P0iAXyU3pILwBuBO4DPJtmW5PcX8RzSokw03LuZzI7+Y0ke3a1TXpPkyiSP3V0OPKS7fTC9fzTaD1XV1+i9sXoK8JEhJWcBBwG3Jfk68EF6IbxhSO1S3EnvReTRfS8SB1fVw7o+d1bV2VV1JPA84LVJnrb7xxhzL9IPmPTMfZhNwO9W1VOAV/P/M/Rzgd9Isp3ejOx3J9OeVoiXAE+vqv/qP5jkMOBkemvsx3bbMcCbGH7VzKJ1Sz4XA3+ZZE161iZ5RtfLs5M8qrtyZyfw/W4D+E/gJ8fZj9RvRYV7koOAnwM+mGQL8Nf03oiC3qzrnVV1OL0Z26VJVlT/2nuq6qaqmhkydCawpao+XVVf370B5wNPSnJ0V/fUIde5/8wiWnklvd8iZ+gF+CeBx3RjjwP+Gfg2cAXwZ1V1VTf2RuCN3XLOyxfxvNIPlUn/Zx3dB0/+saqOTvIQ4MaqOnRI3Q3A+qq6vdu/GTihqu7am/1K0r5gRc18q+oe4JYkz4feB1GSHNMN30bv122SPA54IDA7kUYlaYWb6Mw9yXuBk+hdB/yfwOuBy4G30VuOWQ28r6rekOTxwNvpvVFWwB9W1acn0bckrXQTX5aRJI3filqWkSSNx6pJPfGaNWtqenp6Uk8vSfuka6655htVNTVf3cTCfXp6mpmZYVeySZLmkuRro9S5LCNJDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ2a2CdUNVnT53x80i1M1K3nnTrpFqRl5cxdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGzRvuSR6Y5AtJrktyQ5I/GVLzgCTvT7ItydVJppejWUnSaEaZuX8XeHpVHQMcC6xPcsJAzUuAb1bVY4C3AG8ab5uSpIWYN9yr595ud3W31UDZacC7utsfAk5OkrF1KUlakJHW3JMckGQLcBfwmaq6eqDkMOB2gKraBewEHjbORiVJoxsp3Kvq+1V1LHA4cHySowdKhs3SB2
f3JNmYZCbJzOzs7MK7lSSNZEFXy1TVt4B/BtYPDG0H1gIkWQUcDOwYcv9NVbWuqtZNTU0tqmFJ0vxGuVpmKskh3e0HAb8EfHWgbDNwVnf7dODyqtpj5i5J2jtG+crfQ4F3JTmA3ovBB6rqH5O8AZipqs3AO4BLk2yjN2M/Y9k6liTNa95wr6rrgeOGHH9d3+3/Bp4/3tYkSYvlJ1QlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNmjfck6xN8tkkW5PckOTsITUnJdmZZEu3vW552pUkjWLVCDW7gFdV1bVJHgxck+QzVfWVgborq+pZ429RkrRQ887cq+rOqrq2u/1tYCtw2HI3JklavAWtuSeZBo4Drh4y/NQk1yX5RJInzHH/jUlmkszMzs4uuFlJ0mhGWZYBIMlBwIeBV1bVPQPD1wJHVtW9SU4BPgocNfgYVbUJ2ASwbt26WnTX0oRNn/PxSbcwUbeed+qkW9A8Rpq5J1lNL9jfXVUfGRyvqnuq6t7u9mXA6iRrxtqpJGlko1wtE+AdwNaqevMcNY/o6khyfPe4d4+zUUnS6EZZlnkacCbwr0m2dMdeAxwBUFUXAacDL02yC7gPOKOqXHaRpAmZN9yr6nNA5ql5K/DWcTUlSVoaP6EqSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNWiU/yB7xZk+5+OTbmGibj3v1Em3IGmFc+YuSQ0y3CWpQYa7JDXIcJekBs0b7knWJvlskq1Jbkhy9pCaJDk/ybYk1yd58vK0K0kaxShXy+wCXlVV1yZ5MHBNks9U1Vf6ap4JHNVtPwu8rftTkjQB887cq+rOqrq2u/1tYCtw2EDZacAl1XMVcEiSQ8ferSRpJAtac08yDRwHXD0wdBhwe9/+dvZ8ASDJxiQzSWZmZ2cX1qkkaWQjh3uSg4APA6+sqnsGh4fcpfY4ULWpqtZV1bqpqamFdSpJGtlI4Z5kNb1gf3dVfWRIyXZgbd/+4cAdS29PkrQYo1wtE+AdwNaqevMcZZuBF3ZXzZwA7KyqO8fYpyRpAUa5WuZpwJnAvybZ0h17DXAEQFVdBFwGnAJsA74DvHj8rUqSRjVvuFfV5xi+pt5fU8DLxtWUJGlp/ISqJDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ2aN9yTXJzkriRfnmP8pCQ7k2zptteNv01J0kKsGqHmncBbgUt+SM2VVfWssXQkSVqyeWfuVXUFsGMv9CJJGpNxrbk/Ncl1ST6R5AlzFSXZmGQmyczs7OyYnlqSNGgc4X4tcGRVHQNcAHx0rsKq2lRV66pq3dTU1BieWpI0zJLDvaruqap7u9uXAauTrFlyZ5KkRVtyuCd5RJJ0t4/vHvPupT6uJGnx5r1aJsl7gZOANUm2A68HVgNU1UXA6cBLk+wC7gPOqKpato4lSfOaN9yrasM842+ld6mkJGmF8BOqktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGjfCukJI3V9Dkfn3QLE3Xreacu+3M4c5ekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGzRvuSS5OcleSL88xniTnJ9mW5PokTx5/m5KkhRhl5v5OYP0PGX8mcFS3bQTetvS2JElLMW+4V9UVwI
4fUnIacEn1XAUckuTQcTUoSVq4cay5Hwbc3re/vTu2hyQbk8wkmZmdnR3DU0uShhlHuGfIsRpWWFWbqmpdVa2bmpoaw1NLkoYZR7hvB9b27R8O3DGGx5UkLdI4wn0z8MLuqpkTgJ1VdecYHleStEir5itI8l7gJGBNku3A64HVAFV1EXAZcAqwDfgO8OLlalaSNJp5w72qNswzXsDLxtaRJGnJ/ISqJDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0aKdyTrE9yY5JtSc4ZMv6iJLNJtnTbb42/VUnSqFbNV5DkAOBC4BnAduCLSTZX1VcGSt9fVS9fhh4lSQs0ysz9eGBbVd1cVfcD7wNOW962JElLMUq4Hwbc3re/vTs26HlJrk/yoSRrhz1Qko1JZpLMzM7OLqJdSdIoRgn3DDlWA/sfA6ar6knAPwHvGvZAVbWpqtZV1bqpqamFdSpJGtko4b4d6J+JHw7c0V9QVXdX1Xe73bcDTxlPe5KkxRgl3L8IHJXkUUkOBM4ANvcXJDm0b/fZwNbxtShJWqh5r5apql1JXg58CjgAuLiqbkjyBmCmqjYDr0jybGAXsAN40TL2LEmax7zhDlBVlwGXDRx7Xd/tPwb+eLytSZIWy0+oSlKDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGjRSuCdZn+TGJNuSnDNk/AFJ3t+NX51ketyNSpJGN2+4JzkAuBB4JvB4YEOSxw+UvQT4ZlU9BngL8KZxNypJGt0oM/fjgW1VdXNV3Q+8DzhtoOY04F3d7Q8BJyfJ+NqUJC3EqhFqDgNu79vfDvzsXDVVtSvJTuBhwDf6i5JsBDZ2u/cmuXExTa8Aaxj42famtPF7kedwaTx/S7Mvn78jRykaJdyHzcBrETVU1SZg0wjPuaIlmamqdZPuY1/mOVwaz9/S7A/nb5Rlme3A2r79w4E75qpJsgo4GNgxjgYlSQs3Srh/ETgqyaOSHAicAWweqNkMnNXdPh24vKr2mLlLkvaOeZdlujX0lwOfAg4ALq6qG5K8AZipqs3AO4BLk2yjN2M/YzmbXgH2+aWlFcBzuDSev6Vp/vzFCbYktcdPqEpSgwx3SWqQ4T4gyb19t09J8u9JjkhybpLvJHn4HLWV5M/79l+d5Ny91vgKkuS53fl4bLc/neS+JF9KsjXJF5KcNeR+/5Dk83u/45UryU8keU+Sm5Nck+Tz3fk9KcnOJFuSXJ/kn/r/bqonyfe7c/TlJB9Lckh3fPffyS1924GT7necDPc5JDkZuABYX1W3dYe/Abxqjrt8F/iVJGv2Rn8r3Abgc/zgG+s3VdVxVfW47vjvJXnx7sHuH92TgUOSPGqvdrtCdZ/y/ihwRVX9ZFU9hd65O7wrubKqjq2qJ9G7qu1lE2p1JbuvO0dH07vYo/8c3dSN7d7un1CPy8JwHyLJicDbgVOr6qa+oYuBFyR56JC77aL3Dvzv7YUWV6wkBwFPo/d9Q0Ovmqqqm4HfB17Rd/h5wMfofb1F61dbjerpwP1VddHuA1X1taq6oL+oexF4MPDNvdzfvubz9D5Nv18w3Pf0AOAfgOdU1VcHxu6lF/Bnz3HfC4FfT3LwMva30j0H+GRV/RuwI8mT56i7Fnhs3/4G4L3dtmF5W9xnPIHeeZrLiUm2ALcBv0Tv76aG6L4A8WR+8DM6j+5bkrlwQq0tG8N9T98D/oXezHOY84GzkjxkcKCq7gEu4QdnpPubDfRm33R/zhXU//eVFUl+AngM8LnuRWFXkqOXtct9UJILk1yX5I
vdod3LMmuBvwX+dILtrVQP6l4A7wYeCnymb6x/Waa5JS3DfU//A/wq8DNJXjM4WFXfAt4D/M4c9/8Lei8MP7ZsHa5QSR5Gbynhb5LcCvwB8AKGf/fQccDW7vYLgB8HbunuN41LMwA30HsfAoAugE4GpobUbgZ+YS/1tS+5r6qOpfdlWweyH70vYbgPUVXfAZ5Fb4ll2Az+zcBvM+QTvlW1A/gAc8/8W3Y6cElVHVlV092M8hb+/w1AoHelAvBn9N6wht7sfn13n2lg9xuH+7vLgQcmeWnfsR+do/bngZvmGNvvVdVOer9RvzrJ6kn3szcY7nPoQno98Nokpw2MfQP4e3rr88P8Ob2vFN3fbKB3Xvp9GHgNvfXNLyXZSu/F74Kq+tsu6I8Artp9h6q6BbgnyeBXS+9Xuu9neg7wi0luSfIFev9vwh91JSd268XXAWcy95VcAqrqS8B17CcTB79+QJIa5MxdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QG/S806QHj5O9/ZAAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
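],
"source": [
"class_names = np.array(['KNN', 'ADA', 'GB', 'RF'])\n",
"# Assumption: column 1 of maes holds the test-set MAE (column 0 is plotted as 'MAE Train' above).\n",
"plt.bar(range(maes.shape[0]), maes[:, 1])\n",
"plt.xticks(range(maes.shape[0]), class_names)\n",
"plt.title('MAE Test')"
]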
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Difficulties"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Working with free-text features in the dataset.\n",
"- Replacing missing values.\n",
"- Knowing when to drop a feature."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Lessons learned"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Using existing tools to carry out tasks that would otherwise be done by hand.\n",
"- Some understanding of how prediction works when the features contain text."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Possible future improvements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- The returned score is negative when the underlying metric should be minimized (an error such as MAE) and positive when it should be maximized. Reducing the error, that is, bringing the negated score closer to zero, is therefore a future improvement.\n",
"- Check how the models behave with the features that were not used.\n",
"- Try other algorithms for the prediction."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"OHzogV28EwBJ",
"SUuskyQsEwBP",
"TuGX7DRrEwBW",
"3qa4BYJUEwBb",
"iILrtpDxEwBi",
"uIOf83lkEwBm",
"JeIbKYJCEwBz",
"FDBRPSj_EwB8",
"ztzrX_FOEwCE",
"slLrezFsEwCZ",
"WUHN-XxLEwCv",
"tL-laH_pEwC-",
"xDIMGEN7EwDG",
"3UqZ9i79EwDN",
"-EhpTVryEwD9",
"btL2O-tJEwEn"
],
"default_view": {},
"name": "Job Salary Prediction.ipynb",
"provenance": [],
"version": "0.3.2",
"views": {}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}