@borgeslt
Created July 20, 2020 11:25
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "IGTI_Trabalho_prático_1.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyMSLLSxKQmpT4qAmTSgXvNz"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "I9W-ehIbbUX7",
"colab_type": "text"
},
"source": [
"# *Bootcamp* IGTI - Analista de *Machine Learning*: Projeto Prático 1 -- Fundamentos\n",
"\n",
"Aplicação dos conceitos de análise e modelamento de *Machine Learning* aprendidos no Módulo 1 do Bootcamp.\n",
"\n",
"\n",
"**Objetivos:**\n",
"* Conhecimento do dataset\n",
"* Limpeza dos dados\n",
"* Identificação de *Outliers*\n",
"* Análise de regressão linear\n",
"\n",
"\n",
"Para qualquer aplicação que utilize algoritmos de *Machine Learning*, precisamos realizar 7 etapas básicas:\n",
"\n",
"* Coleta de dados\n",
"* Preparação dos dados\n",
"* Treinamento do modelo\n",
"* Avaliação do modelo\n",
"* Sintonia dos parâmetros\n",
"* Previsão\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZoxNm0NxyeXT",
"colab_type": "text"
},
"source": [
"Mas antes de começar, vamos entender...\n",
"\n",
"> ### **O que é Regressão Linear?**\n",
"\n",
"É um dos algoritmos de *Machine Learning* mais conhecidos, ele é utilizado para estimar valores reais baseado na relação entre variáveis dependentes e independentes contínuas. \n",
"\n",
"Essa relação pode ser traduzida para uma equação matemática que tem como saída uma linha na qual vamos ajustando nosso parâmetros, para chegar o mais próximo possível dessa linha com os resultados do modelo de ML criado. \n",
"\n",
"Essa linha é conhecida como **Linha de Regressão** e é representado por uma equação linear:\n",
"\n",
"<p align=\"center\"> Y = a * x + b </p>\n",
"\n",
"Onde:\n",
"* Y - Variável Dependente\n",
"* a - Coeficiente Angular\n",
"* x - Variável Independente\n",
"* b - Intercepção\n",
"\n",
"Os coeficientes a e b são derivados baseados na minimização da soma dos quadrados da diferença da distância entre os pontos da regressão linear.\n"
]
},
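{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what the least-squares criterion means for the line Y = a * x + b: with $n$ observations $(x_i, Y_i)$ and sample means $\\bar{x}$ and $\\bar{Y}$ (notation introduced here for illustration, not taken from the course material), the coefficients minimize\n",
"\n",
"$$S(a, b) = \\sum_{i=1}^{n} \\bigl(Y_i - (a\\,x_i + b)\\bigr)^2$$\n",
"\n",
"Setting the partial derivatives of $S$ with respect to $a$ and $b$ to zero gives the standard closed-form solution for a single independent variable:\n",
"\n",
"$$a = \\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})(Y_i - \\bar{Y})}{\\sum_{i=1}^{n} (x_i - \\bar{x})^2}, \\qquad b = \\bar{Y} - a\\,\\bar{x}$$\n",
"\n",
"This assumes the $x_i$ are not all equal, so the denominator is nonzero."
]
},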
{
"cell_type": "code",
"metadata": {
"id": "DvtuDkI7c6z_",
"colab_type": "code",
"colab": {}
},
"source": [
"# importar bibliotecas\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n"
],
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "5UBSMbtPdPoc",
"colab_type": "code",
"colab": {
"resources": {
"http://localhost:8080/nbextensions/google.colab/files.js": {
"data": "Ly8gQ29weXJpZ2h0IDIwMTcgR29vZ2xlIExMQwovLwovLyBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNpb24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKLy8geW91IG1heSBub3QgdXNlIHRoaXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNlLgovLyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQKLy8KLy8gICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKLy8KLy8gVW5sZXNzIHJlcXVpcmVkIGJ5IGFwcGxpY2FibGUgbGF3IG9yIGFncmVlZCB0byBpbiB3cml0aW5nLCBzb2Z0d2FyZQovLyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRlZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAovLyBXSVRIT1VUIFdBUlJBTlRJRVMgT1IgQ09ORElUSU9OUyBPRiBBTlkgS0lORCwgZWl0aGVyIGV4cHJlc3Mgb3IgaW1wbGllZC4KLy8gU2VlIHRoZSBMaWNlbnNlIGZvciB0aGUgc3BlY2lmaWMgbGFuZ3VhZ2UgZ292ZXJuaW5nIHBlcm1pc3Npb25zIGFuZAovLyBsaW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCi8qKgogKiBAZmlsZW92ZXJ2aWV3IEhlbHBlcnMgZm9yIGdvb2dsZS5jb2xhYiBQeXRob24gbW9kdWxlLgogKi8KKGZ1bmN0aW9uKHNjb3BlKSB7CmZ1bmN0aW9uIHNwYW4odGV4dCwgc3R5bGVBdHRyaWJ1dGVzID0ge30pIHsKICBjb25zdCBlbGVtZW50ID0gZG9jdW1lbnQuY3JlYXRlRWxlbWVudCgnc3BhbicpOwogIGVsZW1lbnQudGV4dENvbnRlbnQgPSB0ZXh0OwogIGZvciAoY29uc3Qga2V5IG9mIE9iamVjdC5rZXlzKHN0eWxlQXR0cmlidXRlcykpIHsKICAgIGVsZW1lbnQuc3R5bGVba2V5XSA9IHN0eWxlQXR0cmlidXRlc1trZXldOwogIH0KICByZXR1cm4gZWxlbWVudDsKfQoKLy8gTWF4IG51bWJlciBvZiBieXRlcyB3aGljaCB3aWxsIGJlIHVwbG9hZGVkIGF0IGEgdGltZS4KY29uc3QgTUFYX1BBWUxPQURfU0laRSA9IDEwMCAqIDEwMjQ7CgpmdW5jdGlvbiBfdXBsb2FkRmlsZXMoaW5wdXRJZCwgb3V0cHV0SWQpIHsKICBjb25zdCBzdGVwcyA9IHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCk7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICAvLyBDYWNoZSBzdGVwcyBvbiB0aGUgb3V0cHV0RWxlbWVudCB0byBtYWtlIGl0IGF2YWlsYWJsZSBmb3IgdGhlIG5leHQgY2FsbAogIC8vIHRvIHVwbG9hZEZpbGVzQ29udGludWUgZnJvbSBQeXRob24uCiAgb3V0cHV0RWxlbWVudC5zdGVwcyA9IHN0ZXBzOwoKICByZXR1cm4gX3VwbG9hZEZpbGVzQ29udGludWUob3V0cHV0SWQpOwp9CgovLyBUaGlzIGlzIHJvdWdobHkgYW4gYXN5bmMgZ2VuZXJhdG9yIChub3Qgc3VwcG9ydGVkIGluIHRoZSBicm93c2VyIHlldCksCi8vIHdoZXJlIHRoZXJlIGFyZSBtdWx0aXBsZSBhc3luY2hyb25vdXMgc3RlcHMgYW5kIHRoZSB
QeXRob24gc2lkZSBpcyBnb2luZwovLyB0byBwb2xsIGZvciBjb21wbGV0aW9uIG9mIGVhY2ggc3RlcC4KLy8gVGhpcyB1c2VzIGEgUHJvbWlzZSB0byBibG9jayB0aGUgcHl0aG9uIHNpZGUgb24gY29tcGxldGlvbiBvZiBlYWNoIHN0ZXAsCi8vIHRoZW4gcGFzc2VzIHRoZSByZXN1bHQgb2YgdGhlIHByZXZpb3VzIHN0ZXAgYXMgdGhlIGlucHV0IHRvIHRoZSBuZXh0IHN0ZXAuCmZ1bmN0aW9uIF91cGxvYWRGaWxlc0NvbnRpbnVlKG91dHB1dElkKSB7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICBjb25zdCBzdGVwcyA9IG91dHB1dEVsZW1lbnQuc3RlcHM7CgogIGNvbnN0IG5leHQgPSBzdGVwcy5uZXh0KG91dHB1dEVsZW1lbnQubGFzdFByb21pc2VWYWx1ZSk7CiAgcmV0dXJuIFByb21pc2UucmVzb2x2ZShuZXh0LnZhbHVlLnByb21pc2UpLnRoZW4oKHZhbHVlKSA9PiB7CiAgICAvLyBDYWNoZSB0aGUgbGFzdCBwcm9taXNlIHZhbHVlIHRvIG1ha2UgaXQgYXZhaWxhYmxlIHRvIHRoZSBuZXh0CiAgICAvLyBzdGVwIG9mIHRoZSBnZW5lcmF0b3IuCiAgICBvdXRwdXRFbGVtZW50Lmxhc3RQcm9taXNlVmFsdWUgPSB2YWx1ZTsKICAgIHJldHVybiBuZXh0LnZhbHVlLnJlc3BvbnNlOwogIH0pOwp9CgovKioKICogR2VuZXJhdG9yIGZ1bmN0aW9uIHdoaWNoIGlzIGNhbGxlZCBiZXR3ZWVuIGVhY2ggYXN5bmMgc3RlcCBvZiB0aGUgdXBsb2FkCiAqIHByb2Nlc3MuCiAqIEBwYXJhbSB7c3RyaW5nfSBpbnB1dElkIEVsZW1lbnQgSUQgb2YgdGhlIGlucHV0IGZpbGUgcGlja2VyIGVsZW1lbnQuCiAqIEBwYXJhbSB7c3RyaW5nfSBvdXRwdXRJZCBFbGVtZW50IElEIG9mIHRoZSBvdXRwdXQgZGlzcGxheS4KICogQHJldHVybiB7IUl0ZXJhYmxlPCFPYmplY3Q+fSBJdGVyYWJsZSBvZiBuZXh0IHN0ZXBzLgogKi8KZnVuY3Rpb24qIHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCkgewogIGNvbnN0IGlucHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKGlucHV0SWQpOwogIGlucHV0RWxlbWVudC5kaXNhYmxlZCA9IGZhbHNlOwoKICBjb25zdCBvdXRwdXRFbGVtZW50ID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQob3V0cHV0SWQpOwogIG91dHB1dEVsZW1lbnQuaW5uZXJIVE1MID0gJyc7CgogIGNvbnN0IHBpY2tlZFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgaW5wdXRFbGVtZW50LmFkZEV2ZW50TGlzdGVuZXIoJ2NoYW5nZScsIChlKSA9PiB7CiAgICAgIHJlc29sdmUoZS50YXJnZXQuZmlsZXMpOwogICAgfSk7CiAgfSk7CgogIGNvbnN0IGNhbmNlbCA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2J1dHRvbicpOwogIGlucHV0RWxlbWVudC5wYXJlbnRFbGVtZW50LmFwcGVuZENoaWxkKGNhbmNlbCk7CiAgY2FuY2VsLnRleHRDb250ZW50ID0gJ0NhbmNlbCB1cGxvYWQnOwogIGNvbnN0IGNhbmNlbFByb21pc2UgPSBuZXcgUHJvbWl
zZSgocmVzb2x2ZSkgPT4gewogICAgY2FuY2VsLm9uY2xpY2sgPSAoKSA9PiB7CiAgICAgIHJlc29sdmUobnVsbCk7CiAgICB9OwogIH0pOwoKICAvLyBXYWl0IGZvciB0aGUgdXNlciB0byBwaWNrIHRoZSBmaWxlcy4KICBjb25zdCBmaWxlcyA9IHlpZWxkIHsKICAgIHByb21pc2U6IFByb21pc2UucmFjZShbcGlja2VkUHJvbWlzZSwgY2FuY2VsUHJvbWlzZV0pLAogICAgcmVzcG9uc2U6IHsKICAgICAgYWN0aW9uOiAnc3RhcnRpbmcnLAogICAgfQogIH07CgogIGNhbmNlbC5yZW1vdmUoKTsKCiAgLy8gRGlzYWJsZSB0aGUgaW5wdXQgZWxlbWVudCBzaW5jZSBmdXJ0aGVyIHBpY2tzIGFyZSBub3QgYWxsb3dlZC4KICBpbnB1dEVsZW1lbnQuZGlzYWJsZWQgPSB0cnVlOwoKICBpZiAoIWZpbGVzKSB7CiAgICByZXR1cm4gewogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbXBsZXRlJywKICAgICAgfQogICAgfTsKICB9CgogIGZvciAoY29uc3QgZmlsZSBvZiBmaWxlcykgewogICAgY29uc3QgbGkgPSBkb2N1bWVudC5jcmVhdGVFbGVtZW50KCdsaScpOwogICAgbGkuYXBwZW5kKHNwYW4oZmlsZS5uYW1lLCB7Zm9udFdlaWdodDogJ2JvbGQnfSkpOwogICAgbGkuYXBwZW5kKHNwYW4oCiAgICAgICAgYCgke2ZpbGUudHlwZSB8fCAnbi9hJ30pIC0gJHtmaWxlLnNpemV9IGJ5dGVzLCBgICsKICAgICAgICBgbGFzdCBtb2RpZmllZDogJHsKICAgICAgICAgICAgZmlsZS5sYXN0TW9kaWZpZWREYXRlID8gZmlsZS5sYXN0TW9kaWZpZWREYXRlLnRvTG9jYWxlRGF0ZVN0cmluZygpIDoKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJ24vYSd9IC0gYCkpOwogICAgY29uc3QgcGVyY2VudCA9IHNwYW4oJzAlIGRvbmUnKTsKICAgIGxpLmFwcGVuZENoaWxkKHBlcmNlbnQpOwoKICAgIG91dHB1dEVsZW1lbnQuYXBwZW5kQ2hpbGQobGkpOwoKICAgIGNvbnN0IGZpbGVEYXRhUHJvbWlzZSA9IG5ldyBQcm9taXNlKChyZXNvbHZlKSA9PiB7CiAgICAgIGNvbnN0IHJlYWRlciA9IG5ldyBGaWxlUmVhZGVyKCk7CiAgICAgIHJlYWRlci5vbmxvYWQgPSAoZSkgPT4gewogICAgICAgIHJlc29sdmUoZS50YXJnZXQucmVzdWx0KTsKICAgICAgfTsKICAgICAgcmVhZGVyLnJlYWRBc0FycmF5QnVmZmVyKGZpbGUpOwogICAgfSk7CiAgICAvLyBXYWl0IGZvciB0aGUgZGF0YSB0byBiZSByZWFkeS4KICAgIGxldCBmaWxlRGF0YSA9IHlpZWxkIHsKICAgICAgcHJvbWlzZTogZmlsZURhdGFQcm9taXNlLAogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbnRpbnVlJywKICAgICAgfQogICAgfTsKCiAgICAvLyBVc2UgYSBjaHVua2VkIHNlbmRpbmcgdG8gYXZvaWQgbWVzc2FnZSBzaXplIGxpbWl0cy4gU2VlIGIvNjIxMTU2NjAuCiAgICBsZXQgcG9zaXRpb24gPSAwOwogICAgd2hpbGUgKHBvc2l0aW9uIDwgZmlsZURhdGEuYnl0ZUxlbmd0aCkgewogICAgICBjb25zdCBsZW5ndGggPSBNYXRoLm1pbihmaWxlRGF0YS5
ieXRlTGVuZ3RoIC0gcG9zaXRpb24sIE1BWF9QQVlMT0FEX1NJWkUpOwogICAgICBjb25zdCBjaHVuayA9IG5ldyBVaW50OEFycmF5KGZpbGVEYXRhLCBwb3NpdGlvbiwgbGVuZ3RoKTsKICAgICAgcG9zaXRpb24gKz0gbGVuZ3RoOwoKICAgICAgY29uc3QgYmFzZTY0ID0gYnRvYShTdHJpbmcuZnJvbUNoYXJDb2RlLmFwcGx5KG51bGwsIGNodW5rKSk7CiAgICAgIHlpZWxkIHsKICAgICAgICByZXNwb25zZTogewogICAgICAgICAgYWN0aW9uOiAnYXBwZW5kJywKICAgICAgICAgIGZpbGU6IGZpbGUubmFtZSwKICAgICAgICAgIGRhdGE6IGJhc2U2NCwKICAgICAgICB9LAogICAgICB9OwogICAgICBwZXJjZW50LnRleHRDb250ZW50ID0KICAgICAgICAgIGAke01hdGgucm91bmQoKHBvc2l0aW9uIC8gZmlsZURhdGEuYnl0ZUxlbmd0aCkgKiAxMDApfSUgZG9uZWA7CiAgICB9CiAgfQoKICAvLyBBbGwgZG9uZS4KICB5aWVsZCB7CiAgICByZXNwb25zZTogewogICAgICBhY3Rpb246ICdjb21wbGV0ZScsCiAgICB9CiAgfTsKfQoKc2NvcGUuZ29vZ2xlID0gc2NvcGUuZ29vZ2xlIHx8IHt9OwpzY29wZS5nb29nbGUuY29sYWIgPSBzY29wZS5nb29nbGUuY29sYWIgfHwge307CnNjb3BlLmdvb2dsZS5jb2xhYi5fZmlsZXMgPSB7CiAgX3VwbG9hZEZpbGVzLAogIF91cGxvYWRGaWxlc0NvbnRpbnVlLAp9Owp9KShzZWxmKTsK",
"ok": true,
"headers": [
[
"content-type",
"application/javascript"
]
],
"status": 200,
"status_text": ""
}
},
"base_uri": "https://localhost:8080/",
"height": 76
},
"outputId": "46453f6c-ed62-45c6-f182-21bce1e630fa"
},
"source": [
"# upload do aquivo data.csv\n",
"from google.colab import files\n",
"uploaded = files.upload()"
],
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": [
"Saving data.csv to data (1).csv\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "WmGYOOVKddCH",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 230
},
"outputId": "c0113a76-0027-48a8-bf47-c959b5e935b1"
},
"source": [
"# importar o dataframe\n",
"df = pd.read_csv(\"data.csv\")\n",
"\n",
"# visualizar as primeiras entradas do dataset\n",
"df.head()"
],
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>valid_import</th>\n",
" <th>item</th>\n",
" <th>importer_id</th>\n",
" <th>exporter_id</th>\n",
" <th>country_of_origin</th>\n",
" <th>declared_quantity</th>\n",
" <th>declared_cost</th>\n",
" <th>mode_of_transport</th>\n",
" <th>route</th>\n",
" <th>date_of_departure</th>\n",
" <th>date_of_arrival</th>\n",
" <th>declared_weight</th>\n",
" <th>actual_weight</th>\n",
" <th>days_in_transit</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>True</td>\n",
" <td>cigar</td>\n",
" <td>111</td>\n",
" <td>222</td>\n",
" <td>India</td>\n",
" <td>129</td>\n",
" <td>3784.402551</td>\n",
" <td>sea</td>\n",
" <td>asia</td>\n",
" <td>04/25/2019</td>\n",
" <td>05/13/2019</td>\n",
" <td>1608.605135</td>\n",
" <td>1637.661221</td>\n",
" <td>18.232857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>True</td>\n",
" <td>cigar</td>\n",
" <td>111</td>\n",
" <td>222</td>\n",
" <td>India</td>\n",
" <td>104</td>\n",
" <td>3081.350806</td>\n",
" <td>sea</td>\n",
" <td>america</td>\n",
" <td>04/22/2019</td>\n",
" <td>05/24/2019</td>\n",
" <td>831.719301</td>\n",
" <td>848.273419</td>\n",
" <td>32.436029</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>True</td>\n",
" <td>cigar</td>\n",
" <td>111</td>\n",
" <td>222</td>\n",
" <td>India</td>\n",
" <td>130</td>\n",
" <td>4414.125741</td>\n",
" <td>sea</td>\n",
" <td>europe</td>\n",
" <td>04/29/2019</td>\n",
" <td>05/16/2019</td>\n",
" <td>1527.704165</td>\n",
" <td>1582.063911</td>\n",
" <td>16.996206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>True</td>\n",
" <td>cigar</td>\n",
" <td>111</td>\n",
" <td>222</td>\n",
" <td>India</td>\n",
" <td>143</td>\n",
" <td>2533.535991</td>\n",
" <td>sea</td>\n",
" <td>panama</td>\n",
" <td>05/05/2019</td>\n",
" <td>05/25/2019</td>\n",
" <td>1138.680563</td>\n",
" <td>1179.993817</td>\n",
" <td>19.965886</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>True</td>\n",
" <td>cigar</td>\n",
" <td>111</td>\n",
" <td>222</td>\n",
" <td>China</td>\n",
" <td>141</td>\n",
" <td>4396.397887</td>\n",
" <td>sea</td>\n",
" <td>asia</td>\n",
" <td>05/14/2019</td>\n",
" <td>06/05/2019</td>\n",
" <td>761.744581</td>\n",
" <td>781.735080</td>\n",
" <td>22.160034</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" valid_import item ... actual_weight days_in_transit\n",
"0 True cigar ... 1637.661221 18.232857\n",
"1 True cigar ... 848.273419 32.436029\n",
"2 True cigar ... 1582.063911 16.996206\n",
"3 True cigar ... 1179.993817 19.965886\n",
"4 True cigar ... 781.735080 22.160034\n",
"\n",
"[5 rows x 14 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "yVUhl6JFdtWg",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 395
},
"outputId": "f98d0983-4d23-4ade-e40c-e71e9ff3f973"
},
"source": [
"# ver as características do dataset\n",
"df.info()"
],
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 120 entries, 0 to 119\n",
"Data columns (total 14 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 valid_import 120 non-null bool \n",
" 1 item 120 non-null object \n",
" 2 importer_id 120 non-null int64 \n",
" 3 exporter_id 120 non-null int64 \n",
" 4 country_of_origin 120 non-null object \n",
" 5 declared_quantity 120 non-null int64 \n",
" 6 declared_cost 120 non-null float64\n",
" 7 mode_of_transport 120 non-null object \n",
" 8 route 120 non-null object \n",
" 9 date_of_departure 120 non-null object \n",
" 10 date_of_arrival 120 non-null object \n",
" 11 declared_weight 120 non-null float64\n",
" 12 actual_weight 120 non-null float64\n",
" 13 days_in_transit 120 non-null float64\n",
"dtypes: bool(1), float64(4), int64(3), object(6)\n",
"memory usage: 12.4+ KB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yLf18PbLm0pw",
"colab_type": "text"
},
"source": [
"### **Quantas colunas e linhas existem no *dataset***"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TYAnb3aEnCxP",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 53
},
"outputId": "71028984-dbb9-48e1-ea90-8b8ed58db82f"
},
"source": [
"print(\"Número de Linhas:\", df.shape[0])\n",
"print(\"Número de Colunas:\", df.shape[1])"
],
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": [
"Número de Linhas: 120\n",
"Número de Colunas: 14\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3Ii8nMvGk2c_",
"colab_type": "text"
},
"source": [
"### **Vamos analisar se existem colunas com valores nulos**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "AaxhVlfYlMOa",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 287
},
"outputId": "647873b4-8bd3-4968-9b68-b522df022cc3"
},
"source": [
"df.isnull().sum()"
],
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"valid_import 0\n",
"item 0\n",
"importer_id 0\n",
"exporter_id 0\n",
"country_of_origin 0\n",
"declared_quantity 0\n",
"declared_cost 0\n",
"mode_of_transport 0\n",
"route 0\n",
"date_of_departure 0\n",
"date_of_arrival 0\n",
"declared_weight 0\n",
"actual_weight 0\n",
"days_in_transit 0\n",
"dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 18
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "k0LNJy7zlPJ-",
"colab_type": "text"
},
"source": [
"### **Vamos analisar as estatísticas do *Dataset***"
]
},
{
"cell_type": "code",
"metadata": {
"id": "_7wthDB3l-A8",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 326
},
"outputId": "2e30b5ab-d0a7-4135-dd13-6f106d178578"
},
"source": [
"df.describe()"
],
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>importer_id</th>\n",
" <th>exporter_id</th>\n",
" <th>declared_quantity</th>\n",
" <th>declared_cost</th>\n",
" <th>declared_weight</th>\n",
" <th>actual_weight</th>\n",
" <th>days_in_transit</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>120.0</td>\n",
" <td>120.0</td>\n",
" <td>120.000000</td>\n",
" <td>120.000000</td>\n",
" <td>120.000000</td>\n",
" <td>120.000000</td>\n",
" <td>120.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>111.0</td>\n",
" <td>222.0</td>\n",
" <td>127.458333</td>\n",
" <td>6743.649881</td>\n",
" <td>1264.702934</td>\n",
" <td>1306.429806</td>\n",
" <td>35.424705</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>14.641311</td>\n",
" <td>2991.797050</td>\n",
" <td>633.149971</td>\n",
" <td>656.911704</td>\n",
" <td>26.571591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>111.0</td>\n",
" <td>222.0</td>\n",
" <td>100.000000</td>\n",
" <td>1441.012419</td>\n",
" <td>18.459509</td>\n",
" <td>19.275241</td>\n",
" <td>12.410325</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>111.0</td>\n",
" <td>222.0</td>\n",
" <td>115.750000</td>\n",
" <td>4442.903914</td>\n",
" <td>820.314400</td>\n",
" <td>841.763738</td>\n",
" <td>18.225625</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>111.0</td>\n",
" <td>222.0</td>\n",
" <td>131.500000</td>\n",
" <td>6010.218745</td>\n",
" <td>1255.597743</td>\n",
" <td>1305.716419</td>\n",
" <td>27.044293</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>111.0</td>\n",
" <td>222.0</td>\n",
" <td>139.000000</td>\n",
" <td>8887.095370</td>\n",
" <td>1711.314045</td>\n",
" <td>1763.681083</td>\n",
" <td>44.356374</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>111.0</td>\n",
" <td>222.0</td>\n",
" <td>149.000000</td>\n",
" <td>14281.325362</td>\n",
" <td>2806.338955</td>\n",
" <td>2918.681683</td>\n",
" <td>147.787560</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" importer_id exporter_id ... actual_weight days_in_transit\n",
"count 120.0 120.0 ... 120.000000 120.000000\n",
"mean 111.0 222.0 ... 1306.429806 35.424705\n",
"std 0.0 0.0 ... 656.911704 26.571591\n",
"min 111.0 222.0 ... 19.275241 12.410325\n",
"25% 111.0 222.0 ... 841.763738 18.225625\n",
"50% 111.0 222.0 ... 1305.716419 27.044293\n",
"75% 111.0 222.0 ... 1763.681083 44.356374\n",
"max 111.0 222.0 ... 2918.681683 147.787560\n",
"\n",
"[8 rows x 7 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4-7nemEm4K-j",
"colab_type": "text"
},
"source": [
"Pela tabela podemos encontrar, facilmente, informações importantes como a média e o desvio-padrão das variáveis."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EObtsGfFnNx9",
"colab_type": "text"
},
"source": [
"### **Existem *outliers* nas variáveis ``declared_quantity`` e ``days_in_transit``?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5ukSxC7rnWQ3",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 266
},
"outputId": "62a58d1f-531a-46eb-bbfe-4f2704217649"
},
"source": [
"# identificar possíveis outliers\n",
"df[['declared_quantity', 'days_in_transit']].boxplot();"
],
"execution_count": 20,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD5CAYAAADcDXXiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAWEUlEQVR4nO3de5CddX3H8ffHBCHIXXAHkwzLYAYWg1DdIkK0u4ZSEDXpDGXconLZIUOlEfGW6LZDM9MtydgqUC0aXUiwcbkpkCaSQuOeoQGhEiDcFjUDQYLB6CjIAqUkfvvH+SWebPZ2bns2v3xeM2f2Ob/n9s3Z3/nsk995nvMoIjAzs7y8qdEFmJlZ7Tnczcwy5HA3M8uQw93MLEMOdzOzDE1udAEAhx9+eDQ3Nze6jGy88sorvOUtb2l0GWa7cd+srfXr1/8mIo4Yat6ECPfm5mYefPDBRpeRjUKhQFtbW6PLMNuN+2ZtSXp2uHkeljEzy5DD3cwsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53M7MMOdzNrO56e3uZOXMms2fPZubMmfT29ja6pOxNiIuYrDKSyl7H399v4623t5euri56enrYvn07kyZNorOzE4COjo4GV5cvH7nvwSJiyMdRC1YNO89svHV3d9PT00N7ezuTJ0+mvb2dnp4euru7G11a1hzuZlZX/f39zJo1a5e2WbNm0d/f36CK9g4OdzOrq5aWFtatW7dL27p162hpaWlQRXsHh7uZ1VVXVxednZ309fWxbds2+vr66OzspKurq9GlZc0fqJpZXe340HT+/Pn09/fT0tJCd3e3P0ytM4e7mdVdR0cHHR0d/srfceRhGTOzDDnczcwy5HA3M8vQqOEu6TpJWyU9PsS8z0kKSYen55J0jaSNkh6V9O56FG1mZiMby5H7MuDMwY2SpgNnAL8oaT4LmJEe84Brqy/RzMzKNWq4R8Q9wG+HmPU14ItA6TXtc4Abouh+4BBJR9akUjMzG7OKToWUNAd4PiI2DPryqqnAcyXPN6e2LUNsYx7Fo3uampooFAqVlGLD8OtpE9HAwID75jgpO9wl7Q98meKQTMUiYimwFKC1tTV87msNrVntc4ltQvJ57uOnkiP3Y4CjgR1H7dOAhySdDDwPTC9ZdlpqswqduOguXnrtjbLXa164eszLHjxlHzZcUdXfajObYMoO94h4DHjbjueSNgGtEfEbSSuBv5V0I/Be4KWI2G1IxsbupdfeYNPis8tap9yjo3L+EJjZnmEsp0L2Aj8GjpW0WVLnCIv/EHga2Ah8G/hUTao0sz2a78Q0/kY9co+IEb/dJyKaS6YDuLT6sswsF74TU2P4ClUzqyvfiakxHO5mVle+E1NjONzNrK58J6bGcLibWV35TkyN4Zt1mFld+U5MjeFwN7O6852Yxp+HZczMMuQj9wnuwJaFnLB8YfkrLi9nHwDlXQVrZhObw32Ce7l/sb9+wPZ4vb29dHd37xxz7+rq8ph7nTnczayufIVqY3jM3czqyleoNobD3czqyleoNobD3czqyleoNobD3czqyleoNoY/UDWzuvIVqo3hcDezuvMVquPPwzJmVne+E9P485G7mdWVz3NvDB+5m1ld+Tz3xnC4m1ld+Tz3xhg13CVdJ2mrpMdL2r4i6SlJj0q6TdIhJfO+JGmjpJ9K+ot6FW5me4aWlhYWLVq0y5j7okWLfJ57nY3lyH0ZcOagtruBmRHxLuBnwJcAJB0PfAx4Z1rn3yRNqlm1ZrbHaW9vZ8mSJVx00UWsXr2aiy66iCVLltDe3t7o0rI26geqEXGPpOZBbXeVPL0fOCdNzwFujIjXgWckbQROBn5ck2rNbI/T19fHggULuO6663ae575gwQJuv/32RpeWtVqcLXMRcFOankox7HfYnNp2I2keMA+gqamJQqFQg1LyVO5rMzAwUPY6fv2tXvr7+7nqqqs4/f
TTGRgY4IADDmDbtm1ceeWV7nd1VFW4S+oCtgEryl03IpYCSwFaW1vDFzYMY83qsi/6KPtCkQr2YTZWLS0tTJo0iba2tp19s6+vj5aWFve7Oqr4bBlJFwAfBs6LiEjNzwPTSxabltrMbC/l75ZpjIqO3CWdCXwR+LOIeLVk1krge5K+CrwdmAH8T9VVmtkeq6Ojg/vuu4+zzjqL119/nX333ZeLL77YFzDV2ajhLqkXaAMOl7QZuILi2TH7AndLArg/Ii6JiCck3Qw8SXG45tKI2F6v4s1s4uvt7WX16tXceeedu1yheuqppzrg62jUYZmI6IiIIyNin4iYFhE9EfGOiJgeESelxyUly3dHxDERcWxE3Fnf8s1sovMVqo3h75bZA1R0A+s1Y1/n4Cn7lL99szHyFaqN4XCf4DYtPrvsdZoXrq5oPbN62HEnptKLlnwnpvrzd8uYWV35bJnG8JG7mdWV78TUGA53M6s734lp/HlYxswsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53M7MMOdzNzDI0arhLuk7SVkmPl7QdJuluST9PPw9N7ZJ0jaSNkh6V9O56Fm9mZkMby5H7MuDMQW0LgbURMQNYm54DnAXMSI95wLW1KdPMzMoxarhHxD3Abwc1zwGWp+nlwNyS9hui6H7gEElH1qpYMzMbm0pvs9cUEVvS9AtAU5qeCjxXstzm1LaFQSTNo3h0T1NTE4VCocJS9l6ld5MfTEuGbu/r66tTNWajGxgY8Ht9nFR9D9WICElRwXpLgaUAra2t4fsqli9i6Jfd96m0icp9c/xUerbMr3YMt6SfW1P788D0kuWmpTYzMxtHlYb7SuD8NH0+cEdJ+yfTWTOnAC+VDN+Ymdk4GXVYRlIv0AYcLmkzcAWwGLhZUifwLHBuWvyHwIeAjcCrwIV1qNnMzEYxarhHRMcws2YPsWwAl1ZblJmZVcdXqJqZZcjhbmZ119vby8yZM5k9ezYzZ86kt7e30SVlr+pTIc3MRtLb20tXVxc9PT1s376dSZMm0dnZCUBHx3CjvlYtH7mbWV11d3fT09NDe3s7kydPpr29nZ6eHrq7uxtdWtYc7mZWV/39/cyaNWuXtlmzZtHf39+givYODnczq6uWlhbWrVu3S9u6detoaWlpUEV7B4e7mdVVV1cXnZ2d9PX1sW3bNvr6+ujs7KSrq6vRpWXNH6iaWV3t+NB0/vz59Pf309LSQnd3tz9MrTOHu5nVXUdHBx0dHf7isHHkYRkzsww53M3MMuRwNzPLkMPdzCxDDnczsww53M3MMuRwNzPLkMPdzCxDDnczsww53M3MMuRwNzPLUFXhLulySU9IelxSr6T9JB0t6QFJGyXdJOnNtSrWzMzGpuJwlzQV+DTQGhEzgUnAx4AlwNci4h3A74DOWhRqZmZjV+2wzGRgiqTJwP7AFuCDwK1p/nJgbpX7MDOzMlX8lb8R8bykfwZ+AbwG3AWsB16MiG1psc3A1KHWlzQPmAfQ1NREoVCotBQbZGBgwK+nTUjum+On4nCXdCgwBzgaeBG4BThzrOtHxFJgKUBra2v4O55rx9+ZbROV++b4qWZY5nTgmYj4dUS8AfwAOA04JA3TAEwDnq+yRjMzK1M1d2L6BXCKpP0pDsvMBh4E+oBzgBuB84E7qi3SzPYskipaLyJqXMneq+Ij94h4gOIHpw8Bj6VtLQUWAJ+VtBF4K9BTgzrNbA8SEUM+jlqwath5DvbaquoeqhFxBXDFoOangZOr2a6ZmVXHV6iamWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZqircJR0i6VZJT0nql/Q+SYdJulvSz9
PPQ2tVrJmZjU21R+5XA2si4jjgRKAfWAisjYgZwNr03MzMxlHF4S7pYOADQA9ARPxfRLwIzAGWp8WWA3OrLdLMzMozuYp1jwZ+DVwv6URgPXAZ0BQRW9IyLwBNQ60saR4wD6CpqYlCoVBFKVZqYGDAr6dNWO6b46OacJ8MvBuYHxEPSLqaQUMwERGSYqiVI2IpsBSgtbU12traqijFShUKBfx62oS0ZrX75jipZsx9M7A5Ih5Iz2+lGPa/knQkQPq5tboSzcysXBWHe0S8ADwn6djUNBt4ElgJnJ/azgfuqKpCMzMrWzXDMgDzgRWS3gw8DVxI8Q/GzZI6gWeBc6vch5mZlamqcI+IR4DWIWbNrma7ZmZWHV+hamaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWoWpvs2dme7ETF93FS6+9UdY6zQtXl7X8wVP2YcMVZ5S1jjnczawKL732BpsWnz3m5QuFAm1tbWXto9w/BlbkYRkzswxVHe6SJkl6WNKq9PxoSQ9I2ijpJklvrr5MMzMrRy2O3C8D+kueLwG+FhHvAH4HdNZgH2ZmVoaqwl3SNOBs4DvpuYAPAremRZYDc6vZh5mZla/aD1SvAr4IHJievxV4MSK2peebgalDrShpHjAPoKmpiUKhUGUptsPAwIBfTxs35fS1Svum+3P5Kg53SR8GtkbEeklt5a4fEUuBpQCtra1R7ifoNrxKzkgwq8ia1WX1tYr6Zpn7sKJqjtxPAz4q6UPAfsBBwNXAIZImp6P3acDz1ZdpZmblqHjMPSK+FBHTIqIZ+Bjwo4g4D+gDzkmLnQ/cUXWVZmZWlnqc574A+KykjRTH4HvqsA8zMxtBTa5QjYgCUEjTTwMn12K7ZmZWGV+hamaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWoZp8K6SZ7Z0ObFnICcsXlrfS8nL3AcVbNVs5HO5mVrGX+xezafHYg7eS2+w1L1xdZlUGHpYxM8uSw93MLEMOdzOzDDnczcwy5HA3M8tQxeEuabqkPklPSnpC0mWp/TBJd0v6efp5aO3KNTOzsajmyH0b8LmIOB44BbhU0vHAQmBtRMwA1qbnZmY2jioO94jYEhEPpemXgX5gKjCHP16msByYW22RZmZWnppcxCSpGfgT4AGgKSK2pFkvAE3DrDMPmAfQ1NREoVCoRSkGDAwM+PW0cVNOX6u0b7o/l6/qcJd0APB94DMR8XtJO+dFREiKodaLiKXAUoDW1tYo96o1G14lVwGaVWTN6rL6WkV9s8x9WFFVZ8tI2odisK+IiB+k5l9JOjLNPxLYWl2JZmZWroqP3FU8RO8B+iPiqyWzVgLnA4vTzzuqqtDMJrSyv/tlTXnLHzxln/K2b0B1wzKnAZ8AHpP0SGr7MsVQv1lSJ/AscG51JZrZRFXOl4ZB8Q9BuetYZSoO94hYB2iY2bMr3a6ZmVXPV6iamWXI4W5mliGHu5lZhhzuZmYZcribmWXI91A1s5orvVJ9t3lLhl8vYsgL2q0CPnI3s5qLiCEffX19w85zsNeWw93MLEMOdzOzDDnczcwy5HA3M8uQw93MLEMOdzOzDDnczcwy5HA3M8uQJsKFA5J+TfHGHlYbhwO/aXQRZkNw36ytoyLiiKFmTIhwt9qS9GBEtDa6DrPB3DfHj4dlzMwy5HA3M8uQwz1PSxtdgNkw3DfHicfczcwy5CN3M7MMOdzNzDLkcDczy5DDvUyS/kHS5ytYb2Ai1VPjGtoknVry/BJJn0zTF0h6e+Oqsx3q3VckfVTSwgrWa5b01/WoadB+dtYnaa6k4+u9z0ZyuE9AKtqTfjdtwM5wj4hvRsQN6ekFgMN9LxARKyNicQWrNg
NDhrukmt3neVB9cwGH+95OUpekn0laBxyb2o6RtEbSekn/Lem41N4k6TZJG9Lj1EHbOkDSWkkPSXpM0pzU3izpp5JuAB4Hpkv6gqSfSHpU0qKR6hmh9veU1PIVSY+n9gskfb1kuVWS2tL0tZIelPTEoP1ukrSopPbjJDUDlwCXS3pE0vt3HCFKOgdoBVakeWdLur1ke38u6bayfyE2ZsP03YtTv9og6fuS9pd0oKRnJO2Tljlox3NJn5b0ZOqHN46wr519StIySddIuk/S06kvDGcx8P7URy5P21kp6UfA2lHeM/2Svp366l2SpqR5u9W8o770nvwo8JW0z2Oqf6UnoJFuVutHALwHeAzYHzgI2Ah8HlgLzEjLvBf4UZq+CfhMmp4EHJymB9LPycBBafrwtD1RPHr5A3BKmncGxXOCRfGP8CrgA8PVM0L9jwIfSNNfAR5P0xcAXy9ZbhXQlqYPK6m/ALwrPd8EzE/TnwK+k6b/obSG0udp/dY0LeAp4Ij0/HvARxr9O871MULffWvJMv9Y8ju9HpibpucB/5Kmfwnsm6YPGWF/O/sUsAy4JfXd44GNI6zXBqwatJ3NJf1wpPfMNuCkNO9m4OPD1TxEfec0+ndUz4eP3Ef3fuC2iHg1In4PrAT2ozgMcYukR4BvAUem5T8IXAsQEdsj4qVB2xPwT5IeBf4LmAo0pXnPRsT9afqM9HgYeAg4DpgxTD1DknQIxY59T2r67hj/zedKeijt+53s+t/XH6Sf6ym+ucYsiu+q7wIfT7W9D7iznG1YWYbrKzPT/zYfA86j+DsG+A5wYZq+kGLYQ/EAYYWkj1MM07G6PSL+EBFP8sc+PlZ3R8Rv0/RI75lnIuKRNF3aJyutORs1G8/ay7wJeDEiTqpg3fOAI4D3RMQbkjZR/GMB8ErJcgKujIhvla4s6TMV7HMo29h1WG6/tP2jKR7d/WlE/E7SspL6AF5PP7dTWf+5HvgP4H+BWyJir3zjNdgyikfoGyRdQPHImYi4Nw11tAGTIuLxtPzZFP/X+BGgS9IJY/y9vV4yrTJrLH0vjPSeKd3HdmDKcDWXuf89no/cR3cPMFfSFEkHUuwsrwLPSPor2PkB6Ilp+bXA36T2SZIOHrS9g4GtqZO2A0cNs9//BC6SdEDa1lRJbxumniFFxIvAi5JmpabzSmZvAk6S9CZJ04GTU/tBFN9YL0lqAs4a/qXZ6WXgwLHMi4hfUvwv89/xxyNDq4/h+sqBwJY0vn7eoHVuoDhcdj2Aih/sT4+IPmABxf57QI3rHKn/wNjfM8CYax5tn3s8h/soIuIhiuPoGygOIfwkzToP6JS0AXgCmJPaLwPa039517P7J/IrgNY0/5MUx6CH2u9dFN9kP07L3gocOEI9w7kQ+EYaPio9eroXeAZ4EriG4tAPEbGB4nDMU2n/946yfSgeif/ljg9UB81bBnwzzdtxVLUCeC4i+sewbavQCH3l74EHKP5uB/e/FcChQG96Pgn499QHHwauSQcNtfQosD19wHv5EPPH9J4pMZaabwS+IOnhXD9Q9XfL7EXSmS2rImJmg+v4OvBwRPQ0sg7bXTqrZU5EfKLRtVh1POZu40rSeorDPp9rdC22K0n/SnEY7kONrsWq5yP3TEj6BnDaoOarI8Lj2lZTki6kOPxY6t6IuHSU9U5g9zO2Xo+I99ayPityuJuZZcgfqJqZZcjhbmaWIYe7mVmGHO5mZhn6f4hfZezlxGJIAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BKEia4Kl34El",
"colab_type": "text"
},
"source": [
"A partir dos gráficos, chegamos a conclusão que na variável ```days_in_transit``` existem possíveis *outliers*."
]
},
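{
"cell_type": "markdown",
"metadata": {},
"source": [
"The boxplot reading can be checked by hand with the common 1.5 × IQR rule, which is also matplotlib's default whisker setting. Treating it as the outlier criterion is an assumption here, since the notebook does not state one. The quartiles come from the `df.describe()` table, with $Q_1$ and $Q_3$ taken from the 25% and 75% rows:\n",
"\n",
"$$\\mathrm{IQR} = Q_3 - Q_1, \\qquad \\text{fences} = [\\,Q_1 - 1.5\\,\\mathrm{IQR},\\; Q_3 + 1.5\\,\\mathrm{IQR}\\,]$$\n",
"\n",
"For `days_in_transit`:\n",
"\n",
"$$\\mathrm{IQR} = 44.356374 - 18.225625 = 26.130749, \\qquad \\text{fences} \\approx [-20.97,\\; 83.55]$$\n",
"\n",
"The maximum of 147.787560 lies above the upper fence, so at least one point is flagged.\n",
"\n",
"For `declared_quantity`:\n",
"\n",
"$$\\mathrm{IQR} = 139.00 - 115.75 = 23.25, \\qquad \\text{fences} = [80.875,\\; 173.875]$$\n",
"\n",
"The minimum (100) and maximum (149) both fall inside the fences, so no point is flagged."
]
},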
{
"cell_type": "markdown",
"metadata": {
"id": "DaPGIVkhpsiv",
"colab_type": "text"
},
"source": [
"## Aplicando o modelo de *Machine Learning*\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "qRvMpmNCn8Le",
"colab_type": "code",
"colab": {}
},
"source": [
"# realizando a análise da regressão\n",
"x = df.declared_weight.values # variável independente\n",
"y = df.actual_weight # variável dependente"
],
"execution_count": 21,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Al_cgSH4o5yj",
"colab_type": "code",
"colab": {}
},
"source": [
"# importar o modelo de regressão linear univariada\n",
"from sklearn.linear_model import LinearRegression"
],
"execution_count": 22,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "6XVaDrJ6pERA",
"colab_type": "code",
"colab": {}
},
"source": [
"# realiza a contrução do modelo de regressão\n",
"reg = LinearRegression()\n",
"x_reshapeed = x.reshape((-1, 1)) # transforma os dados para o plano 2D\n",
"regressao = reg.fit (x_reshapeed, y) #encontra os coeficientes (realiza a regressão)"
],
"execution_count": 23,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "a1G35DZQpgWj",
"colab_type": "code",
"colab": {}
},
"source": [
"# realiza a previsão\n",
"previsao = reg.predict(x_reshapeed)"
],
"execution_count": 24,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "8dBM9WUWpndu",
"colab_type": "code",
"colab": {}
},
"source": [
"# análise do modelo\n",
"from sklearn.metrics import r2_score"
],
"execution_count": 25,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LYv3thynp3ON",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 53
},
"outputId": "98a8cfcb-de4f-40e6-c74a-4dc9aa25fb4c"
},
"source": [
"# parâmetros encontrados\n",
"print('Y = {}X {}'.format(reg.coef_, reg.intercept_))\n",
"\n",
"R_2 = r2_score(y, previsao) # calcula o R2\n",
"\n",
"print(\"Coeficiente de Correlação (R2):\", R_2)"
],
"execution_count": 26,
"outputs": [
{
"output_type": "stream",
"text": [
"Y = [1.03718115]X -5.296233030439225\n",
"Coeficiente de Correlação (R2): 0.9993288165644932\n"
],
"name": "stdout"
}
]
},
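{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example of using the fitted line, take the first row shown by `df.head()`, whose `declared_weight` is 1608.605135. The coefficients are used as printed (rounded), so the result is approximate, and the symbol $\\hat{Y}$ for the predicted value is notation introduced here:\n",
"\n",
"$$\\hat{Y} = 1.03718115 \\times 1608.605135 - 5.296233 \\approx 1663.12$$\n",
"\n",
"The recorded `actual_weight` for that row is 1637.661221, so the prediction is off by roughly 25.5, or about 1.6%."
]
},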
{
"cell_type": "markdown",
"metadata": {
"id": "5qWPvtSfqOrm",
"colab_type": "text"
},
"source": [
"### **Pelo Coeficiente de Correlação (R2), o que é possível afirmar sobre a relação entre as variáveis?**\n",
"\n",
"Podemos afirmar que a análise possui um bom 'fit' dos dados, ou seja, é possível prever o peso real de um indivíduo a partir do seu peso declarado."
]
},
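{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the quantity returned by `r2_score` is the coefficient of determination. In the notation below (introduced here for illustration), $y_i$ are the observed values, $\\hat{y}_i$ the predictions, and $\\bar{y}$ the mean of the observed values:\n",
"\n",
"$$R^2 = 1 - \\frac{\\sum_{i} (y_i - \\hat{y}_i)^2}{\\sum_{i} (y_i - \\bar{y})^2}$$\n",
"\n",
"For a least-squares line with an intercept and a single independent variable, $R^2$ equals the square of the Pearson correlation coefficient $r$ between the two variables. With the R2 value of 0.9993288 and the positive slope reported for this fit:\n",
"\n",
"$$r = \\sqrt{0.9993288} \\approx 0.9997$$"
]
},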
{
"cell_type": "code",
"metadata": {
"id": "JEyMOqCorLlP",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 382
},
"outputId": "def1b679-0d30-41c7-b008-ea6611fc399f"
},
"source": [
"# plotar o gráfico dos dados\n",
"plt.figure(figsize=(4, 4), dpi=100)\n",
"plt.scatter(x, y, color='red') # plota o gráfio de dispersão\n",
"plt.plot(x, previsao, color='black', linewidth = 2) # plota a linha do gráfico\n",
"plt.xlabel(\"Peso Declarado\")\n",
"plt.ylabel(\"Peso Real\")\n",
"plt.show();"
],
"execution_count": 27,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 400x400 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
}
]
}